|Authors||D. Falessi, G. Cantone and G. Canfora|
|Title||Empirical Principles and an Industrial Case Study in Retrieving Equivalent Requirements Via Natural Language Processing Techniques|
|Affiliation||Software Engineering, The Certus Centre (SFI), Software Engineering|
|Project(s)||The Certus Centre (SFI)|
|Publication Type||Journal Article|
|Year of Publication||2012|
|Journal||IEEE Transactions on Software Engineering|
Though very important in software engineering, linking artifacts of the same type (clone detection) or different types (traceability recovery) is extremely tedious, error-prone, and effort-intensive. Past research has focused on supporting analysts with mechanisms based on Natural Language Processing (NLP) to identify candidate links. Because a plethora of NLP techniques exists, and their performance varies among contexts, it is important to define and use reliable approaches for selecting the right NLP technique for the task at hand. In this paper we propose a novel analysis procedure to evaluate the performance of NLP techniques; the procedure aims at alleviating the influence of the adopted dataset on the results and is independent of both the type of artifacts being linked and the type of NLP technique analyzed. The paper presents a case study that applies the analysis procedure to characterize a large number of NLP techniques for identifying equivalent requirements in the context of an Italian company in the defense and aerospace domain. Major results from the case study include: i) performance varies significantly among NLP techniques; ii) NLP techniques that are able to detect similarity even when the terms are not identical resulted in worse performance; iii) combining NLP techniques significantly improves on the benefits of adopting any single NLP technique; iv) only 12% of the NLP techniques provide different information.
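To make the idea of NLP-based candidate links concrete, the sketch below scores pairs of requirements by the cosine similarity of their raw term-frequency vectors and keeps the pairs above a threshold. It is an illustration only, not the procedure or any of the techniques evaluated in the paper: the function names, the 0.5 threshold, and the three sample requirements are all assumptions made for this example.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the raw term-frequency vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_links(requirements, threshold=0.5):
    """Return (id_a, id_b, score) for pairs at or above the threshold, best first.

    The threshold value is an arbitrary assumption, not taken from the paper.
    """
    scored = [
        (id_a, id_b, cosine_similarity(text_a, text_b))
        for (id_a, text_a), (id_b, text_b) in combinations(requirements.items(), 2)
    ]
    kept = [s for s in scored if s[2] >= threshold]
    return sorted(kept, key=lambda s: s[2], reverse=True)


# Hypothetical requirements, invented for illustration only.
requirements = {
    "R1": "The system shall log every failed login attempt",
    "R2": "Every failed login attempt shall be logged by the system",
    "R3": "The radar display shall refresh once per second",
}
links = candidate_links(requirements)
for id_a, id_b, score in links:
    print(f"{id_a} ~ {id_b}: {score:.2f}")
```

This sketch matches terms only when they are identical, so "log" and "logged" count as different terms. Techniques that add stemming or synonym handling would treat such pairs as related, which is the kind of variation among techniques the case study compares.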