Clustering Algorithms for Long-Term Twitter Observation
Clustering is a fundamental concept in unsupervised learning. However, most clustering techniques are offline algorithms: the complete dataset must be available before clustering starts. Many current problems instead require online algorithms, which cluster data while it is still being produced. In this thesis, we will implement an online algorithm that is capable of clustering massive datasets.
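To make the offline/online distinction concrete, the following is a minimal sketch of one well-known online method, sequential k-means, in which each incoming point updates a centroid immediately and is then discarded. It is only an illustration and not the algorithm this thesis will implement; the class name `OnlineKMeans`, the fixed cluster count, and the representation of each item as a numeric vector are all assumptions made for this example. Real Twitter data would likely call for a different representation, such as a graph.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical illustration of sequential (online) k-means.
// Points arrive one at a time; no point is stored after it is processed.
class OnlineKMeans {
public:
    OnlineKMeans(std::size_t k, std::size_t dim) : k_(k), dim_(dim) {
        assert(k > 0 && dim > 0);
    }

    // Processes one point and returns the index of the cluster it joins.
    std::size_t observe(const std::vector<double>& x) {
        assert(x.size() == dim_);

        // Assumption: the first k points simply seed the k centroids.
        if (centroids_.size() < k_) {
            centroids_.push_back(x);
            counts_.push_back(1);
            return centroids_.size() - 1;
        }

        // Find the nearest centroid by squared Euclidean distance.
        std::size_t best = 0;
        double best_dist = std::numeric_limits<double>::infinity();
        for (std::size_t c = 0; c < centroids_.size(); ++c) {
            double dist = 0.0;
            for (std::size_t d = 0; d < dim_; ++d) {
                const double diff = x[d] - centroids_[c][d];
                dist += diff * diff;
            }
            if (dist < best_dist) {
                best_dist = dist;
                best = c;
            }
        }

        // Move the winning centroid toward the point. With step size 1/n,
        // the centroid stays the running mean of all points assigned to it.
        ++counts_[best];
        const double rate = 1.0 / static_cast<double>(counts_[best]);
        for (std::size_t d = 0; d < dim_; ++d) {
            centroids_[best][d] += rate * (x[d] - centroids_[best][d]);
        }
        return best;
    }

    const std::vector<std::vector<double>>& centroids() const {
        return centroids_;
    }

private:
    std::size_t k_;
    std::size_t dim_;
    std::vector<std::vector<double>> centroids_;
    std::vector<std::size_t> counts_;
};
```

A caller would construct, for example, `OnlineKMeans model(2, 1);` and call `model.observe(point)` once per arriving item. Memory use depends only on the number of clusters and the dimension, not on the length of the stream, which is the property that separates this from an offline method.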
The goal of this thesis is to implement an online clustering algorithm for massive datasets on a supercomputer, and test it with a continuous stream of Twitter data.
Implementation of large scale parallel algorithms
Use of supercomputers
Online clustering algorithms
Knowledge of C/C++
Familiarity with parallel graph algorithms
Experience with MPI and/or OpenMP
- Johannes Langguth
- Xing Cai
Indiana University Bloomington