Build a Twitter Data Collector
In recent years, influencing people by disseminating false information has been a recurring topic in the media. For example, the 2016 US presidential election sparked a broad debate about the influence of Russian bots on social media. Furthermore, Donald Trump, who popularized the term "Fake News", is himself a considerable source of misinformation (e.g., on Sweden (2) and on the inauguration (1)). The consequences of such Fake News are not limited to election results. For example, the claim made in 2013 by the "Syrian Electronic Army" that a terrorist attack on the White House had injured President Obama caused a crash on the stock markets. Meanwhile, WhatsApp had to deactivate its group share function because it had led to lynchings of innocent people in India. When fast-spreading misinformation has severe real-world consequences, we speak of "Digital Wildfires".
We have created the UMOD project (3), which aims to understand such Digital Wildfires across all forms of electronic news platforms. The work described here focuses on social networks, in particular Twitter.
The goal is to develop a tool for acquiring, processing, and evaluating Twitter data. The following technologies are to be integrated for this purpose:
- Sentiment Analysis for Tweets
- Language Analysis for Tweets
- Output in Different Formats
- Bot Detection
- Fake Follower Detection
- Load Balancing for Requests to the Twitter API
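The load-balancing requirement could, for instance, be met by rotating requests over several sets of API credentials. The following Java sketch assumes that each credential set has a fixed request quota per time window (Twitter's REST API v1.1 counts many of its limits per 15-minute window); the class and method names (`CredentialPool`, `acquire`) are hypothetical, and no actual Twitter client library is involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: rotates Twitter API requests over several credential
// sets, each with its own request quota per rate-limit window.
class CredentialPool {

    // One set of API credentials and its remaining quota in the current window.
    static final class Credential {
        final String name;
        final int limitPerWindow;
        int remaining;
        long windowEndMillis; // 0 means "no window started yet"

        Credential(String name, int limitPerWindow) {
            this.name = name;
            this.limitPerWindow = limitPerWindow;
        }
    }

    private final List<Credential> credentials = new ArrayList<>();
    private final long windowMillis;
    private int next = 0; // round-robin cursor

    CredentialPool(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    synchronized void add(String name, int limitPerWindow) {
        credentials.add(new Credential(name, limitPerWindow));
    }

    // Returns the next credential in round-robin order that still has quota at
    // time nowMillis, or null if all of them are exhausted until their windows end.
    synchronized Credential acquire(long nowMillis) {
        int n = credentials.size();
        for (int i = 0; i < n; i++) {
            int idx = (next + i) % n;
            Credential c = credentials.get(idx);
            if (nowMillis >= c.windowEndMillis) { // previous window expired: refill
                c.remaining = c.limitPerWindow;
                c.windowEndMillis = nowMillis + windowMillis;
            }
            if (c.remaining > 0) {
                c.remaining--;
                next = (idx + 1) % n; // continue after this credential next time
                return c;
            }
        }
        return null;
    }
}
```

A caller would request a credential before each API call and wait, or queue the request, whenever `acquire` returns `null`. Passing the current time in explicitly, instead of reading the clock inside the pool, keeps the logic easy to test.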
Web services or existing implementations are available for all of these features. The task is mainly to combine these components into an API that provides useful functionality in the project context. Suggested technologies are:
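One way to combine the components is to hide each external service behind a small interface, so that, for example, a Google-based sentiment service or a Java language detection library can be plugged in without changing the rest of the tool. The Java sketch below assumes this design; the interfaces `LanguageDetector` and `SentimentAnalyzer`, the surrounding classes, and the sentiment range of [-1, 1] are illustrative assumptions, not part of any existing library. CSV stands in for one of the output formats.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of how independent analysis components could be
// combined behind one API; all names here are illustrative assumptions.
class TweetPipeline {

    // Adapter for a language detection component; returns a language code.
    interface LanguageDetector {
        String detect(String text);
    }

    // Adapter for a sentiment service; assumed to return a score in [-1, 1].
    interface SentimentAnalyzer {
        double score(String text);
    }

    // Minimal tweet representation (assumed fields).
    static final class Tweet {
        final String id;
        final String text;

        Tweet(String id, String text) {
            this.id = id;
            this.text = text;
        }
    }

    // A tweet together with the results of the analysis components.
    static final class AnnotatedTweet {
        final Tweet tweet;
        final String language;
        final double sentiment;

        AnnotatedTweet(Tweet tweet, String language, double sentiment) {
            this.tweet = tweet;
            this.language = language;
            this.sentiment = sentiment;
        }
    }

    private final LanguageDetector languageDetector;
    private final SentimentAnalyzer sentimentAnalyzer;

    TweetPipeline(LanguageDetector languageDetector, SentimentAnalyzer sentimentAnalyzer) {
        this.languageDetector = languageDetector;
        this.sentimentAnalyzer = sentimentAnalyzer;
    }

    // Runs every component on every tweet.
    List<AnnotatedTweet> annotate(List<Tweet> tweets) {
        List<AnnotatedTweet> result = new ArrayList<>();
        for (Tweet t : tweets) {
            result.add(new AnnotatedTweet(t,
                    languageDetector.detect(t.text),
                    sentimentAnalyzer.score(t.text)));
        }
        return result;
    }

    // One possible output format: CSV with a header row. The tweet text is
    // omitted so that no CSV quoting is needed.
    static String toCsv(List<AnnotatedTweet> rows) {
        StringBuilder sb = new StringBuilder("id,language,sentiment\n");
        for (AnnotatedTweet r : rows) {
            sb.append(r.tweet.id).append(',')
              .append(r.language).append(',')
              .append(String.format(Locale.ROOT, "%.2f", r.sentiment))
              .append('\n');
        }
        return sb.toString();
    }
}
```

Real adapters would wrap the chosen services; in tests, lambdas such as `text -> "en"` can stand in for them. Further output formats (e.g., JSON) would be additional static methods or writer classes over the same `AnnotatedTweet` list.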
- Google Sentiment
- Java Language detection library
If possible, the tool should be written in Java. Here are a few examples of basic requests the tool should be able to answer:
- The network of all users who speak Norwegian, Swedish, or Danish, are older than 18, and have fewer than 100 tweets on their timeline.
- The follower network of X up to degree Y that deals with a specific topic, e.g., climate change.
- All tweets related to a URL X within a certain time frame.
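The example requests boil down to filtering users, expanding a follower graph breadth-first up to a given degree, and filtering tweets by URL and time. The following Java sketch works on an assumed in-memory model (`User`, `Tweet`, and a map from user ID to follower IDs); the topic predicate, the availability of a user's age, and the language codes are assumptions, and fetching the data from the Twitter API is left out entirely.

```java
import java.time.Instant;
import java.util.*;
import java.util.function.LongPredicate;

// Hypothetical in-memory model used to illustrate the example requests;
// in the actual tool the data would be fetched from the Twitter API.
class QuerySketch {

    static final class User {
        final long id;
        final String lang;    // assumed ISO 639-1 code, e.g. "no", "sv", "da"
        final int age;        // assumed to be known or estimated elsewhere
        final int tweetCount;

        User(long id, String lang, int age, int tweetCount) {
            this.id = id;
            this.lang = lang;
            this.age = age;
            this.tweetCount = tweetCount;
        }
    }

    static final class Tweet {
        final long authorId;
        final String text;
        final Instant createdAt;

        Tweet(long authorId, String text, Instant createdAt) {
            this.authorId = authorId;
            this.text = text;
            this.createdAt = createdAt;
        }
    }

    // Request 1, nodes: Norwegian, Swedish or Danish speakers older than 18
    // with fewer than 100 tweets.
    static Set<Long> scandinavianLowActivity(Collection<User> users) {
        Set<String> langs = Set.of("no", "sv", "da");
        Set<Long> ids = new LinkedHashSet<>();
        for (User u : users) {
            if (langs.contains(u.lang) && u.age > 18 && u.tweetCount < 100) {
                ids.add(u.id);
            }
        }
        return ids;
    }

    // Request 1, edges: keep only follower edges between the selected users.
    static Map<Long, List<Long>> induced(Map<Long, List<Long>> followers, Set<Long> nodes) {
        Map<Long, List<Long>> sub = new HashMap<>();
        for (long u : nodes) {
            List<Long> kept = new ArrayList<>();
            for (long f : followers.getOrDefault(u, List.of())) {
                if (nodes.contains(f)) kept.add(f);
            }
            sub.put(u, kept);
        }
        return sub;
    }

    // Request 2: breadth-first search from root over the follower map, up to
    // maxDegree hops; users rejected by onTopic are neither added nor expanded.
    static Set<Long> followerNetwork(long root, int maxDegree,
                                     Map<Long, List<Long>> followers, LongPredicate onTopic) {
        Set<Long> visited = new LinkedHashSet<>(List.of(root));
        Deque<Long> frontier = new ArrayDeque<>(List.of(root));
        for (int depth = 0; depth < maxDegree && !frontier.isEmpty(); depth++) {
            Deque<Long> next = new ArrayDeque<>();
            for (long u : frontier) {
                for (long f : followers.getOrDefault(u, List.of())) {
                    if (onTopic.test(f) && visited.add(f)) next.add(f);
                }
            }
            frontier = next;
        }
        return visited;
    }

    // Request 3: tweets whose text contains url, created in [from, to).
    static List<Tweet> tweetsWithUrl(Collection<Tweet> tweets, String url, Instant from, Instant to) {
        List<Tweet> hits = new ArrayList<>();
        for (Tweet t : tweets) {
            if (t.text.contains(url) && !t.createdAt.isBefore(from) && t.createdAt.isBefore(to)) {
                hits.add(t);
            }
        }
        return hits;
    }
}
```

Links in tweets are usually shortened via t.co, so a real implementation of the third request would likely match against the expanded URLs in the tweet metadata rather than the raw text. The topic predicate passed to `followerNetwork` (for example, whether a user's recent tweets mention climate change) is left entirely to the caller.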
We expect self-motivation, initiative and the ability to work independently, as well as working knowledge of a programming language such as Python or Java. Knowledge of social network analysis is not required, although familiarity with Twitter would be helpful.
- Pål Halvorsen
- Carsten Griwodz
- Johannes Langguth
- Daniel Thilo Schroeder
TUB - Technical University of Berlin
UiO - University of Oslo
(1) Inauguration of Donald Trump. N.d. goo.gl/zNk3rA. Accessed: 2019-01-22.
(2) Sweden to Trump: What happened last night? N.d. goo.gl/sh4vZg. Accessed: 2019-01-22.
(3) UMOD: Understanding and Monitoring Digital Wildfires. N.d. goo.gl/RwBqH8. Accessed: 2019-01-22.