|Authors||G. Fraser and A. Arcuri|
|Title||A Large Scale Evaluation of Automated Unit Test Generation Using EvoSuite|
|Affiliation||Software Engineering, Software Engineering, Software Engineering|
|Project(s)||The Certus Centre (SFI)|
|Publication Type||Journal Article|
|Year of Publication||2014|
|Journal||ACM Transactions on Software Engineering and Methodology|
Research on software testing produces many innovative automated techniques, but because software testing is by necessity incomplete and approximate, any new technique faces the challenge of an empirical assessment. In the past, we have demonstrated scientific advance in automated unit test generation with the EVOSUITE tool by evaluating it on manually selected open source projects or examples that represent a particular problem addressed by the underlying technique. However, demonstrating scientific advance is not necessarily the same as demonstrating practical value: Even if EVOSUITE worked well on the software projects we selected for evaluation, it might not scale up to the complexity of real systems. Ideally, one would use large “real-world” software systems to minimize the threats to external validity when evaluating research tools. However, neither choosing such software systems nor applying research prototypes to them is a trivial task. In this paper we present the results of a large experiment in unit test generation using the EVOSUITE tool on 100 randomly chosen open source projects, the 10 most popular open source projects according to the SourceForge website, 7 industrial projects, and 11 automatically generated software projects. The study confirms that EVOSUITE can achieve good levels of branch coverage (on average 71% per class) in practice. However, the study also exemplifies how the choice of software systems for an empirical study can influence the results of the experiments, which can help researchers make more conscious choices when selecting software systems as subjects. Furthermore, our experiments demonstrate how practical limitations interfere with scientific advances: Branch coverage on an unbiased sample is affected by predominant environmental dependencies.
The surprisingly large effect of such practical engineering problems in unit testing will hopefully lead to a larger appreciation of work in this area, thus supporting transfer of knowledge from software testing research to practice.