Automatic Program Repair using Deep Learning

Develop and evaluate data-driven techniques and prototypes for automatically repairing bugs in source code.
Master

Keywords: Automated Program Repair, Deep Learning, Natural Language Processing, ML4Code

Description: Faults (also known as bugs) in software systems can affect large groups of people and cause massive financial damage. Correcting them accounts for a significant portion of overall software development costs. Automated program repair (APR) techniques aim to reduce these costs by automatically generating program patches (edits to the code) that remove bugs from software systems. In this project, you will investigate APR approaches that are developed using data-driven techniques.

Data-driven software engineering aims to use the wealth of data produced during software development and operation to support the development, maintenance, and evolution of software. Concretely, we apply deep learning and data mining techniques to software engineering data, such as source code, versioning histories, issue tracking, build and test logs, and operational data. From this data, we derive actionable insights, which in this case are suggested code edits that repair bugs in software. The underlying assumptions are that the vast amounts of existing code implicitly embed knowledge of how good code should be written, and that this knowledge can be uncovered through deep learning and data mining.

Goal

The goal of this project is to investigate how, and to what extent, it is possible to automatically repair bugs in source code based on frequent patterns learned from large corpora of source code. The most interesting starting points for investigation come from modern deep learning-based natural language processing and are very similar to the techniques that help today's email programs suggest how to continue or finish a sentence. Recently, these techniques have also been used in IDEs for advanced code completion, which we hypothesize makes them ideal candidates for generating repair suggestions.
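
As a toy illustration of repairing code from frequent patterns, the sketch below ranks candidate patches by how likely they are under a model trained on correct code. It is not the approach to be developed in the project: a simple add-one smoothed bigram token model stands in for a deep neural language model, and the corpus, the operator list, and all names (`BigramModel`, `candidate_patches`, `rank_repairs`) are hypothetical choices made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for a large corpus of correct code.
CORPUS = [
    "while i < len(items):",
    "while j < len(values):",
    "while k < len(names):",
    "if i >= len(items):",
    "if count == 0:",
    "while n > 0:",
]

# Assumed repair space for this sketch: swap one comparison operator.
COMPARISON_OPS = ["<", "<=", ">", ">=", "==", "!="]
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|==|!=|<=|>=|[^\s\w]")


def tokenize(line):
    """Split a line of code into identifier, number, and symbol tokens."""
    return TOKEN_RE.findall(line)


class BigramModel:
    """Add-one smoothed bigram model over code tokens."""

    def __init__(self, corpus):
        self.bigrams = Counter()
        self.contexts = Counter()
        self.vocab = set()
        for line in corpus:
            tokens = ["<s>"] + tokenize(line) + ["</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1
                self.vocab.update((prev, cur))

    def log_prob(self, tokens):
        """Return the smoothed log-probability of a token sequence."""
        padded = ["<s>"] + list(tokens) + ["</s>"]
        size = len(self.vocab) + 1  # one extra slot for unseen tokens
        return sum(
            math.log((self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + size))
            for prev, cur in zip(padded, padded[1:])
        )


def candidate_patches(tokens):
    """Yield copies of `tokens` with exactly one comparison operator replaced."""
    for idx, tok in enumerate(tokens):
        if tok in COMPARISON_OPS:
            for op in COMPARISON_OPS:
                if op != tok:
                    yield tokens[:idx] + [op] + tokens[idx + 1:]


def rank_repairs(model, buggy_line):
    """Return candidate patches for `buggy_line`, most likely first."""
    candidates = list(candidate_patches(tokenize(buggy_line)))
    return sorted(candidates, key=model.log_prob, reverse=True)


model = BigramModel(CORPUS)
ranked = rank_repairs(model, "while i <= len(items):")  # off-by-one bug
print(" ".join(ranked[0]))
```

In a realistic setting, the bigram model would be replaced by a pretrained neural model of code, and the top-ranked candidates would be validated against the program's test suite before being suggested to a developer.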

Learning outcome

  • Application of data science in a software engineering context
  • Proficiency in implementing and evaluating data-driven software engineering techniques and prototypes
  • Appreciation of the state of the art in machine learning on source code
  • Experience with working in an exciting and active research environment
  • Excellent opportunities to publish your research results in the form of a scientific publication

Qualifications

  • Interested in deep learning, in particular natural language processing, and in machine learning for source code analysis (an area also known as ML4Code)
  • Interested in experimenting with APR approaches, reusing and building models for source code, as well as evaluating them
  • Preferably knowledge of Python and LaTeX

Supervisors

  • Anastasiia Grishina
  • Leon Moonen

Contact person