Analyzing the Impact of Developer Sentiment on Software Fault-Proneness or Maintainability

Develop and evaluate data-driven techniques and prototypes for automatically analyzing developer sentiment in commit messages or code comments, and investigate if and how developer sentiment impacts software fault-proneness or software maintainability.
Master

Previous research has shown that human factors such as emotions, mood, and stress affect task quality, productivity, creativity, group rapport, and job satisfaction. It is therefore plausible that the emotions of a software engineer affect the quality of the code that they deliver, for example in terms of its fault-proneness or maintainability. Being aware of such impacts could help development teams, for example by allocating more code review or testing effort to the affected parts of the code.

Sentiment analysis (also referred to as opinion mining or emotion AI) uses text analysis, natural language processing, and computational linguistics to identify and quantify subjective information in a piece of text, such as the emotional state of the writer while writing that text. Recent advances in deep learning have considerably improved the ability of algorithms to analyze text, making deep learning an exciting technique for in-depth research into how the emotional state of software developers may impact software fault-proneness or software maintainability.

Data-driven software engineering aims to use the wealth of data produced during software development and operation to support its development, maintenance, and evolution. Concretely, we apply machine learning and data mining techniques to software engineering data (such as source code, versioning histories, issue tracking, build & test logs, operational data) to derive actionable insights.
As a carrier for the emotions expressed by developers in connection to source code, we can, for example, consider the commit messages in a versioning system, or the comments left in the code.
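
To make the idea of scoring commit messages concrete, here is a minimal sketch in Python. It assumes a tiny, hypothetical word lexicon (`LEXICON`) and a made-up helper `sentiment_score`; neither comes from the project itself, and an actual study would more likely rely on a validated lexicon or a trained deep learning model.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only);
# a real study would use a validated resource or a trained model.
LEXICON = {
    "great": 1, "clean": 1, "nice": 1, "finally": 1,
    "ugly": -1, "hack": -1, "broken": -1, "hate": -1, "stupid": -1,
}

def sentiment_score(text):
    """Return the mean polarity of the lexicon words found in text.

    The result lies in [-1, 1]; it is 0.0 when no lexicon word occurs.
    """
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

# Invented example commit messages.
messages = [
    "Finally a clean fix for the parser",
    "Ugly hack to work around broken build",
    "Update version number",
]
for msg in messages:
    print(f"{sentiment_score(msg):+.2f}  {msg}")
```

A word-list approach like this ignores negation, sarcasm, and technical jargon (for example, "kill" or "abort" are neutral in many code bases), which is one reason to evaluate more capable techniques.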

The goal of this project is first to develop data-driven techniques and prototypes for automatically analyzing developer sentiment in commit messages or code comments. In the next step, you will investigate if and how developer sentiment impacts software fault-proneness or software maintainability. These studies can be done on open source projects, or on four industrially developed systems that were commissioned by Simula for earlier research projects.
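
As a rough illustration of the second step, the sketch below relates a per-file sentiment value to a per-file bug-fix count using a plain Pearson correlation. The file names, the numbers in `records`, and the choice of Pearson correlation are all assumptions made for this example; they are invented to show the shape of the analysis and are not results or the method prescribed by the project.

```python
from statistics import mean

# Hypothetical per-file records: (file, mean commit sentiment, later bug-fix count).
# All values are invented for illustration; they are not measurements.
records = [
    ("parser.c", -0.6, 5),
    ("lexer.c", -0.2, 3),
    ("util.c", 0.0, 1),
    ("cli.c", 0.4, 0),
    ("io.c", 0.1, 2),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

sentiments = [s for _, s, _ in records]
bug_fixes = [b for _, _, b in records]
r = pearson(sentiments, bug_fixes)
print(f"Pearson r between sentiment and bug-fix count: {r:.2f}")
```

A correlation of this kind would only indicate an association. An actual study would also need to control for confounders such as file size and change frequency, and to use appropriate statistical tests.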

Learning outcome

  • application of data science in a software engineering context
  • proficiency with implementing and evaluating data-driven software engineering techniques and prototypes
  • an appreciation of the state of the art in sentiment analysis and empirical software engineering
  • experience with working in an exciting and active research environment
  • excellent opportunities to publish your research results in the form of a scientific publication

Qualifications

  • interested in human factors and empirical software engineering
  • interested in machine learning, in particular, sentiment analysis
  • preferably knowledge of Python, R, and LaTeX

Supervisors

  • Leon Moonen

Contact person