|Authors||L. Burchard, D. T. Schroeder, K. Pogorelov, S. Becker, E. Dietrich, P. Filkukova and J. Langguth|
|Title||A Scalable System for Bundling Online Social Network Mining Research|
|Project(s)||UMOD: Understanding and Monitoring Digital Wildfires, Department of High Performance Computing|
|Publication Type||Proceedings, refereed|
|Year of Publication||2020|
|Conference Name||2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS)|
Online social networks such as Facebook and Twitter are part of the everyday life of millions of people. They are not only used for interaction but also play an essential role in information acquisition and knowledge gain. The abundance and detail of the data accumulated in these online social networks open up new possibilities for social researchers and psychologists, allowing them to study behavior in a large test population. However, complex application programming interfaces (APIs) and data scraping restrictions are, in many cases, a limiting factor when accessing this data. Furthermore, research projects are typically granted restricted access based on quotas. Thus, research tools such as scrapers that access social network data through an API must manage these quotas. While this is generally feasible, it becomes a problem when more than one tool, or multiple instances of the same tool, is used in the same research group. Since different tools typically cannot balance access to a shared quota on their own, additional software is needed to prevent the individual tools from overusing the shared quota. In this paper, we present a proxy server that manages several researchers' data contingents in a cooperative research environment and thus enables a transparent view of a subset of Twitter's API. Our proxy scales linearly with the number of clients in use and imposes almost no performance penalty or implementation overhead on additional layers or applications that need to work with the Twitter API. Thus, it allows seamless integration of multiple API-accessing programs within the same research group.
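The shared-quota problem described in the abstract can be illustrated with a minimal sketch. This is not the proxy presented in the paper: the class `SharedQuota`, its method `try_acquire`, the fixed-window policy, and the limit and window values are all assumptions chosen for illustration. The sketch only shows how a single component could gate requests from several tools against one common budget.

```python
import threading
import time
from collections import defaultdict


class SharedQuota:
    """Fixed-window request budget shared by several clients.

    Hypothetical helper for illustration; not the paper's implementation.
    """

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit                    # requests allowed per window (assumed value)
        self.window_seconds = window_seconds  # window length in seconds (assumed value)
        self._clock = clock                   # injectable clock, which simplifies testing
        self._lock = threading.Lock()         # serializes access from concurrent clients
        self._window_start = clock()
        self._used = 0
        self.per_client = defaultdict(int)    # total granted requests per client

    def try_acquire(self, client_id):
        """Reserve one request for client_id; return False if the window is exhausted."""
        with self._lock:
            now = self._clock()
            # Start a fresh window once the current one has elapsed.
            if now - self._window_start >= self.window_seconds:
                self._window_start = now
                self._used = 0
            if self._used >= self.limit:
                return False
            self._used += 1
            self.per_client[client_id] += 1
            return True
```

In such a design, each tool would ask the shared component before forwarding a request, for example `quota.try_acquire("scraper-a")`, and would wait or back off on `False`. Because all tools go through the same object, no single tool can exhaust the group's budget unnoticed.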