Abstract: Social media sites (e.g., Twitter and Pinterest) allow users to change the names of their accounts. A change in the account name results in a change in the URL of the user's homepage. We develop an algorithm that discovers a large number of social media accounts performing synchronous and collaborative URL changes. We identify various types of URL changes, such as handover, exchange, serial handover, and loop exchange. All such behaviors are likely automated and may indicate accounts that are either already involved in malicious activities or being prepared for them. We focus on URL handovers, where a URL is released by one user and claimed by another. In this paper, we analyze URL changes and handovers in social media, and find interesting associations between handovers and the temporal, textual, and network behaviors of users. We show several anomalous behaviors of suspicious users for each of these associations. We identify that URL handovers are instantaneous, automated operations. We further investigate the benefits of URL handovers, and identify a strong association with misleading internal links and evading the suspension mechanisms of the hosting sites. Our handover detection algorithm, which makes such analysis possible, scales to millions of posts (e.g., tweets, pins) and is shared publicly online.
PDF: A PDF version of the paper is available here.
Slides: The slides are available here.
Java Source Code: The main part of the project is implemented in Java. You can find the source code here.
Download the JAR file and install Spark: In order to run the program, you need the compiled JAR file of the project and Spark. Run the following commands in your terminal to do both tasks.
$mkdir handover
$cd handover
$wget http://cs.unm.edu/~hamooni/papers/handovers/spark-1.3.1-bin-hadoop2.6.tgz
$tar xvfz spark-1.3.1-bin-hadoop2.6.tgz
$wget http://cs.unm.edu/~hamooni/papers/handovers/SparkSort-1.0-SNAPSHOT.jar
Run the project: After downloading Spark and the project's JAR file, run the following command in your terminal to detect handovers in a given set of tweets (you should be in the "handover" directory):
$spark-1.3.1-bin-hadoop2.6/bin/spark-submit --class Sorter --master local[*] SparkSort-1.0-SNAPSHOT.jar input_file output_file
Each line in the input file should contain a tweet object with the following format:
timestamp, URL, ID
Here is a sample input file. The output file has the following format. Each line shows a set of handovers on a URL:
URL, ID1, t1, t2, ID2, t3, t4, ...
which means ID1 owned the URL from t1 to t2, ID2 owned the URL from t3 to t4, and so on.
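For illustration, here is a minimal sketch (not part of the released code) of how the output format above could be parsed into per-user ownership intervals. It assumes comma-separated fields and numeric timestamps, matching the format description; adjust as needed for the actual output files.

# parse_handovers.py: reads lines of the form "URL, ID1, t1, t2, ID2, t3, t4, ..."
# and prints who owned each URL and when. Comma-separated fields and numeric
# timestamps are assumptions based on the format described above.
import sys

def parse_line(line):
    fields = [f.strip() for f in line.rstrip("\n").split(",")]
    url, rest = fields[0], fields[1:]
    intervals = []
    # Each ownership record is a triple: user ID, start time, end time.
    for i in range(0, len(rest) - 2, 3):
        intervals.append((rest[i], float(rest[i + 1]), float(rest[i + 2])))
    return url, intervals

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        for line in f:
            url, intervals = parse_line(line)
            for user_id, start, end in intervals:
                print(f"{url}: {user_id} owned it from {start} to {end}")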
Utilities: If you use the Twitter streaming API to collect tweets, the following script converts the API output to the format required by our code:
$python convert_twitter_output.py File1 File2
where File1 is the API output and File2 is the newly formatted file, which can be used as the input to our code.
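If convert_twitter_output.py is unavailable, the following is a rough sketch of the same kind of conversion. It is not the original script: it assumes the API output (File1) contains one tweet JSON object per line, that the URL column is the user's profile URL built from the screen name, and that the ID column is the numeric user ID; the actual script may choose these fields differently.

# convert_sketch.py: hypothetical stand-in for convert_twitter_output.py.
# Reads raw Twitter streaming API output (one JSON object per line) and writes
# "timestamp, URL, ID" lines. The field choices here are assumptions, not the
# original script's definitions.
import json
import sys
from datetime import datetime

def convert(in_path, out_path):
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            line = line.strip()
            if not line:
                continue
            tweet = json.loads(line)
            user = tweet.get("user")
            if not user:
                continue  # skip delete notices and other non-tweet messages
            # Twitter's created_at looks like "Wed Aug 27 13:08:45 +0000 2008".
            ts = int(datetime.strptime(tweet["created_at"],
                                       "%a %b %d %H:%M:%S %z %Y").timestamp())
            url = "twitter.com/" + user["screen_name"]
            fout.write(f"{ts}, {url}, {user['id']}\n")

if __name__ == "__main__":
    convert(sys.argv[1], sys.argv[2])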
Dataset: All detected handovers over 3 months are available here. All detected URL changes over 3 months are available here.