IPFS for large-scale and long-term DNS data storage and access
Driven by economic incentives, the DNS ecosystem limits researchers' ability to collect, access and share data. For instance, each stakeholder controls only a limited part of the DNS space, which restricts analysis of real-world DNS behaviour. Researchers can therefore 1) create a new dataset, 2) collect existing datasets in part or in full, or 3) combine datasets.
Collected data can be public or private, long-term or short-term, and gathered through passive or active measurement. Whatever the approach, the collected data is massive, which makes long-term centralised storage difficult. For instance, OpenINTEL (van Rijswijk-Deij et al.) reported 10 TB of compressed data as of February 2015 for .com, .net and .org.
Several attempts have been made to share this massive amount of DNS data, including but not limited to BitTorrent and publicly available FTP or HTTP servers. BitTorrent usage is limited, and most DNS data files are available through HTTP. Accessing DNS data is therefore challenging. Moreover, in practice it is not possible to obtain part of the content of a DNS data file without downloading the complete file.
The InterPlanetary File System (IPFS), by providing distributed storage and delivery, may help to overcome the limitations of current protocols for DNS data access. According to Trautwein et al., IPFS provides better performance than BitTorrent while allowing content-based addressing. Thus, and similarly to BitTorrent, IPFS allows access to part of the content of a file. Moreover, by adopting a distributed design, IPFS reduces the need for centralised DNS data collectors. Each DNS data provider can control access to its data through the IPFS p2p network without the burden of a centralised HTTP server.
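To make the idea of content-based addressing and partial access more concrete, the following is a minimal sketch in Python. It is a simplified illustration only and does not reproduce the actual IPFS chunking, CID or Merkle DAG formats: the in-memory `store` dictionary, the chunk size, the newline-separated manifest and the function names `add` and `read_range` are all assumptions made for this example. Data is split into chunks, each chunk is stored under its SHA-256 hash, and a byte range is read by fetching only the chunks that overlap it.

```python
import hashlib


def chunk_id(data: bytes) -> str:
    """Identifier derived from the content itself (hypothetical, not an IPFS CID)."""
    return hashlib.sha256(data).hexdigest()


def add(store: dict, data: bytes, chunk_size: int = 8) -> str:
    """Split data into chunks, store each under its hash, and return a root id."""
    ids = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        cid = chunk_id(chunk)
        store[cid] = chunk
        ids.append(cid)
    # The manifest lists the chunk ids in order; its own hash is the root id.
    manifest = "\n".join(ids).encode()
    root = chunk_id(manifest)
    store[root] = manifest
    return root


def read_range(store: dict, root: str, start: int, end: int,
               chunk_size: int = 8) -> bytes:
    """Return bytes [start, end) while fetching only the overlapping chunks."""
    manifest = store[root]
    ids = manifest.decode().split("\n") if manifest else []
    if start >= end or not ids:
        return b""
    first = start // chunk_size
    last = min((end - 1) // chunk_size, len(ids) - 1)
    out = bytearray()
    for idx in range(first, last + 1):
        chunk = store[ids[idx]]
        # Integrity check: the content must match the id it was requested by.
        if chunk_id(chunk) != ids[idx]:
            raise ValueError("chunk does not match its identifier")
        out += chunk
    offset = start - first * chunk_size
    return bytes(out[offset:offset + (end - start)])


store = {}
records = b"example.com A 192.0.2.1\nexample.net A 192.0.2.2\n"
root = add(store, records)
# Read only the second record; chunks before byte 24 are never fetched.
second = read_range(store, root, 24, len(records))
```

Because every identifier is a hash of the content, the same data always yields the same root identifier, and a reader can verify each chunk independently of who served it. In this sketch, that is what allows a provider to serve parts of a large DNS dataset without a central server vouching for the whole file.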
The goal of this thesis is to evaluate the use of IPFS for large-scale DNS data storage and access. The work will assess whether IPFS can be used as is or needs to be extended with new features.
Learning outcomes:
- Gain understanding of DNS distributed data collection challenges
- Increase knowledge on IPFS and/or other distributed storage systems
- Enhance programming skills on distributed networks
- Work in an interdisciplinary field, gaining experience with distributed networking, p2p networks and storage modelling
- Work on real-world data
- Collaborate with researchers
The master’s student will ideally have:
- Completed a bachelor’s degree in informatics/maths or a related discipline
- General programming skills, preferably in C, Rust or Go
- Familiarity with version control systems such as Git (e.g. via GitHub)
- Exposure to or interest in DNS and/or distributed storage and access
Supervisors:
- Alfred Arouna
- Ioana Alexandrina Livadariu
References:
- van Rijswijk-Deij, Roland, et al. "A high-performance, scalable infrastructure for large-scale active DNS measurements." IEEE Journal on Selected Areas in Communications 34.6 (2016): 1877-1888.
- Trautwein, Dennis, et al. "Design and evaluation of IPFS: a storage layer for the decentralized web." Proceedings of the ACM SIGCOMM 2022 Conference. 2022.