Topic Classification of Scientific Papers

Millions of scientific papers are published every year. Can we use AI to understand their contents?

Topic classification of scientific papers is a widely studied problem, but solving it with high accuracy remains very challenging. Most current approaches apply graph neural networks (GNNs) to the citation graph that links papers to one another, combined with BERT features extracted from the paper text. Understanding a paper's topic is closely linked to other relevant questions, such as evaluating its quality, importance, and novelty. The thesis will build on existing work, with the aim of improving the state of the art using large-scale computation.
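The idea of combining text features with the citation graph can be illustrated with a minimal sketch. It assumes that each paper already has a text embedding; the toy two-dimensional vectors below stand in for BERT features. In place of learned GNN layers, it uses parameter-free neighbour averaging (similar in spirit to simplified graph convolution) followed by a nearest-centroid classifier. The function names and the data are hypothetical, and this is not the method used in the existing work the thesis builds on.

```python
import math
from collections import defaultdict

def propagate(features, edges, hops=1):
    """Average each paper's vector with its citation neighbours' vectors.

    features: dict paper_id -> list[float] (stand-in for BERT embeddings)
    edges: (citing, cited) pairs, treated as undirected in this sketch
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    current = {p: list(v) for p, v in features.items()}
    for _ in range(hops):
        updated = {}
        for p, vec in current.items():
            # The paper itself plus every neighbour that has features.
            group = [vec] + [current[q] for q in neighbours[p] if q in current]
            updated[p] = [sum(v[i] for v in group) / len(group)
                          for i in range(len(vec))]
        current = updated
    return current

def centroids(features, labels):
    """Mean feature vector of the labelled papers in each topic."""
    sums, counts = {}, defaultdict(int)
    for p, topic in labels.items():
        vec = features[p]
        acc = sums.setdefault(topic, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[topic] += 1
    return {t: [x / counts[t] for x in s] for t, s in sums.items()}

def classify(vec, cents):
    """Assign the topic whose centroid is closest in Euclidean distance."""
    return min(cents, key=lambda t: math.dist(vec, cents[t]))

# Hypothetical toy data: p5 is unlabelled and cites p1 and p2.
features = {"p1": [1.0, 0.0], "p2": [0.9, 0.1], "p3": [0.0, 1.0],
            "p4": [0.1, 0.9], "p5": [0.5, 0.5]}
edges = [("p5", "p1"), ("p5", "p2"), ("p3", "p4")]
labels = {"p1": "ML", "p2": "ML", "p3": "Bio", "p4": "Bio"}

if __name__ == "__main__":
    smoothed = propagate(features, edges, hops=1)
    print(classify(smoothed["p5"], centroids(smoothed, labels)))
```

With the text vectors alone, p5 lies exactly halfway between the two topic centroids. After one hop of averaging over the citation graph, its vector moves toward the papers it cites, and it is assigned "ML". A real system would replace the averaging with trained GNN layers and the centroid rule with a learned classifier.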


The goal of this thesis is to build a classifier for the topics of scientific papers, using text, images, and connections to other papers.

Requirements

  • Experience with Python
  • Experience with deep learning software such as PyTorch
  • Experience with GNNs is helpful
  • Experience with NLP applications is very helpful

Collaboration partners

  • University of Lorraine

Associated contacts

Johannes Langguth

Senior Research Scientist