Knowledge Graphs for Software Vulnerability Assessments
Software vulnerabilities are weaknesses in software systems that can trigger unintended actions. The exploitation of security vulnerabilities can affect large groups of people and lead to substantial financial damage. Several automated software vulnerability assessment techniques build on data sources that collect, rank, and abstract knowledge about concrete vulnerabilities found in existing systems. However, many of these data sources store their data in ways that make it difficult for machines to understand, combine, and reuse the knowledge automatically. To discover and reason about implicit connections among weaknesses, security experts therefore have to manually extract the vulnerability knowledge, analyse the vulnerability descriptions, and link the results to related issues.
Knowledge graphs are knowledge bases that can be constructed programmatically from heterogeneous data sources using systematic open information extraction methods. A vulnerability knowledge graph can be thought of as a knowledge graph that combines information about source code, commits, issue tracking, build and test logs, and vulnerability descriptions and scores from several vulnerability data sources, such as the National Vulnerability Database (NVD), Common Vulnerabilities and Exposures (CVE), the Common Weakness Enumeration (CWE), the Common Vulnerability Scoring System (CVSS), the Open Web Application Security Project (OWASP), and Bugzilla. Software repository mining, machine learning, natural language processing, logical inference, and correlation analysis techniques can be used to process and analyse the different entities in the vulnerability data. The information derived from these data sources can be added to the graph as new knowledge and linked to the vulnerability knowledge graph ontology. Reasoning over the vulnerability knowledge graph can help identify potential vulnerabilities in a system and uncover implicit relationships among weaknesses.
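To make the idea of combining sources and reasoning over them more concrete, here is a minimal sketch in plain Python. It assumes a graph stored as a set of (subject, predicate, object) triples; the records, identifiers (CVE-EX-1, CWE-X1, ...), and predicate names (hasWeakness, childOf, descendantOf, relatedTo, fixes, cvssBaseScore) are all hypothetical placeholders forming a toy ontology, not real NVD/CWE data or formats. Two simple rules stand in for logical inference: childOf is made transitive, and two CVEs are marked as related when their weaknesses share a weakness class.

```python
"""Toy vulnerability knowledge graph: ingestion from several sources plus rule-based reasoning."""


class KnowledgeGraph:
    """A tiny in-memory triple store of (subject, predicate, object) facts."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def objects(self, s, p):
        return {o for (s2, p2, o) in self.triples if (s2, p2) == (s, p)}

    def subjects(self, p, o):
        return {s for (s, p2, o2) in self.triples if (p2, o2) == (p, o)}


# Hypothetical records, each shaped like a different kind of source (not real formats).
CVE_RECORDS = [  # vulnerability database entries
    {"id": "CVE-EX-1", "cwe": "CWE-X1", "cvss": 9.8},
    {"id": "CVE-EX-2", "cwe": "CWE-X2", "cvss": 6.1},
    {"id": "CVE-EX-3", "cwe": "CWE-Y1", "cvss": 4.3},
]
CWE_CHILD_OF = [  # weakness taxonomy edges: (child, parent)
    ("CWE-X1", "CWE-XMID"),
    ("CWE-XMID", "CWE-XROOT"),
    ("CWE-X2", "CWE-XROOT"),
    ("CWE-Y1", "CWE-YROOT"),
]
COMMITS = [{"sha": "abc123", "fixes": "CVE-EX-1"}]  # repository mining output


def ingest(kg):
    """Map each source-specific record onto shared predicates (the toy ontology)."""
    for rec in CVE_RECORDS:
        kg.add(rec["id"], "hasWeakness", rec["cwe"])
        kg.add(rec["id"], "cvssBaseScore", rec["cvss"])
    for child, parent in CWE_CHILD_OF:
        kg.add(child, "childOf", parent)
    for commit in COMMITS:
        kg.add("commit:" + commit["sha"], "fixes", commit["fixes"])


def infer_ancestors(kg):
    """Rule: descendantOf is the transitive closure of childOf, computed to a fixpoint."""
    for s, p, o in list(kg.triples):
        if p == "childOf":
            kg.add(s, "descendantOf", o)
    changed = True
    while changed:
        changed = False
        for a, p, b in list(kg.triples):
            if p != "descendantOf":
                continue
            for c in kg.objects(b, "childOf"):
                if (a, "descendantOf", c) not in kg.triples:
                    kg.add(a, "descendantOf", c)
                    changed = True


def infer_related(kg):
    """Rule: two CVEs are relatedTo each other if their weaknesses share a class."""
    classes = {}
    for cve, p, cwe in list(kg.triples):
        if p == "hasWeakness":
            classes[cve] = {cwe} | kg.objects(cwe, "descendantOf")
    cves = sorted(classes)
    for i, a in enumerate(cves):
        for b in cves[i + 1:]:
            if classes[a] & classes[b]:
                kg.add(a, "relatedTo", b)
                kg.add(b, "relatedTo", a)


kg = KnowledgeGraph()
ingest(kg)
infer_ancestors(kg)
infer_related(kg)

print(sorted(kg.objects("CVE-EX-2", "relatedTo")))  # ['CVE-EX-1']
# Implicit link: which commits fix a vulnerability related to CVE-EX-2?
related = kg.objects("CVE-EX-2", "relatedTo")
print(sorted(s for cve in related for s in kg.subjects("fixes", cve)))  # ['commit:abc123']
```

Neither CVE record mentions the other; the relationship between CVE-EX-1 and CVE-EX-2 only appears after the transitive rule derives that CWE-X1 descends from CWE-XROOT. A real system would more likely use an RDF store, an ontology language, and a proper reasoner; the hand-written fixpoint loop here only illustrates the principle.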
The goal of this project is to investigate how to construct vulnerability knowledge graphs and to evaluate their use for software vulnerability assessment by reasoning over the constructed graph. The project also aims to evaluate how existing knowledge graphs can be enriched with new knowledge through additional fact extraction and inference.
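As a minimal illustration of enriching an existing graph through fact extraction, the sketch below pulls candidate facts out of a free-text vulnerability description and adds only those the graph does not already contain, recording where each new fact came from. Everything here is hypothetical: the description text is made up, the identifier CVE-EX-9 is a placeholder, and the two regular expressions and the predicates hasWeakness and affectedBefore are assumptions chosen for brevity. A real pipeline would more likely rely on natural language processing or open information extraction than on regexes.

```python
import re

# Hypothetical existing graph: a set of (subject, predicate, object) triples.
graph = {("CVE-EX-9", "hasWeakness", "CWE-119")}

# Made-up free-text description standing in for a vulnerability database entry.
DESCRIPTIONS = {
    "CVE-EX-9": "A crafted request triggers a buffer over-read (CWE-125) in the "
                "parser of versions before 2.4.1; see also CWE-119.",
}

# Hypothetical, deliberately simple extraction patterns.
CWE_PATTERN = re.compile(r"\bCWE-\d+\b")
AFFECTED_BEFORE_PATTERN = re.compile(r"\bbefore (\d+(?:\.\d+)+)")


def extract_facts(cve_id, text):
    """Turn weakness and version mentions in free text into candidate triples."""
    facts = {(cve_id, "hasWeakness", cwe) for cwe in CWE_PATTERN.findall(text)}
    facts |= {(cve_id, "affectedBefore", v) for v in AFFECTED_BEFORE_PATTERN.findall(text)}
    return facts


def enrich(graph, descriptions):
    """Add only facts the graph lacks and return them with their provenance."""
    provenance = {}
    for cve_id, text in descriptions.items():
        for fact in extract_facts(cve_id, text) - graph:
            graph.add(fact)
            provenance[fact] = "extracted-from-description"
    return provenance


new_facts = enrich(graph, DESCRIPTIONS)
for fact in sorted(new_facts):
    print(fact)
# ('CVE-EX-9', 'affectedBefore', '2.4.1')
# ('CVE-EX-9', 'hasWeakness', 'CWE-125')
```

The mention of CWE-119 is not reported as new because the graph already holds that fact. Keeping provenance separate from the triples makes it possible to evaluate extracted facts on their own, for example against manually curated links, before trusting them in downstream reasoning.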
Learning outcomes:
- application of data science in a software engineering context
- proficiency in implementing and evaluating data-driven software engineering techniques and prototypes
- an appreciation of the state of the art in machine learning on source code
- experience working in an exciting and active research environment
- excellent opportunities to publish your research results as a scientific paper
Qualifications:
- interest in software security and application security
- interest in machine learning, in particular machine learning on source code, logs, and commits
- experience programming in Python and, preferably, writing papers in LaTeX
Supervisors:
- Guru Prasad Bhandari
- Leon Moonen