NLP-Based Automated Conspiracy Detection for Massive Twitter Datasets

Manually tagging social media posts is time-consuming, especially when datasets grow into the billions of posts. We aim to develop a machine learning-based system that automatically detects tweets that support or promote conspiracy theories, using tweets collected during the COVID-19 pandemic.

Digital wildfires, i.e., fast-spreading inaccurate, counterfactual, or intentionally misleading information, can quickly permeate public consciousness and have severe real-world implications. While a seemingly endless amount of misinformation exists on the internet, only a small fraction of it spreads far and affects people to the degree that they commit harmful or criminal acts in the real world. The COVID-19 pandemic has severely affected people worldwide and, consequently, dominated world news for months. It is therefore no surprise that it has also been the subject of a massive amount of misinformation, most likely amplified by the fact that many details about the virus were unknown at the start of the pandemic. This thesis aims to develop methods capable of detecting such misinformation and its active spreaders. We primarily consider the narrative that the COVID-19 outbreak is a deliberate result of human activity or is somehow connected to emerging technologies, a narrative that covers various conspiracy theories about COVID-19.


The goal is to assign massive numbers of COVID-19-related tweets to different conspiracy theories.

Learning outcome

AI & machine learning

Big data analysis


You should be open-minded and able to work in an international team of researchers from different institutions. You should have machine learning skills or at least be very interested in machine learning. Moreover, you should have the ability to plan your work schedule independently. However, the most important requirement is motivation.


  • Johannes Langguth
  • Daniel Thilo Schroeder

Contact person