|Authors||C. Laaber, T. Yue and S. Ali|
|Title||Multi-Objective Search-Based Software Microbenchmark Prioritization|
|Project(s)||Department of Engineering Complex Software Systems, AIT4CR: AI-Powered Testing Infrastructure for Cancer Registry System|
|Publication Type||Journal Article|
|Year of Publication||2022|
|Keywords||JMH, Multi-objective optimization, Performance Testing, Regression testing, Search-Based Software Engineering, software microbenchmarking, Test Case Prioritization|
Ensuring that software performance does not degrade after a code change is paramount. A potential solution, particularly for libraries and frameworks, is regularly executing software microbenchmarks, a performance testing technique similar to (functional) unit tests. This often becomes infeasible due to the extensive runtimes of microbenchmark suites, however. To address that challenge, research has investigated regression testing techniques, such as test case prioritization (TCP), which reorder the execution within a microbenchmark suite to detect larger performance changes sooner. Such techniques are either designed for unit tests and perform sub-par on microbenchmarks or require complex performance models, reducing their potential application drastically. In this paper, we propose a search-based technique based on multi-objective evolutionary algorithms (MOEAs) to improve the current state of microbenchmark prioritization. The technique utilizes three objectives, i.e., coverage to maximize, coverage overlap to minimize, and historical performance change detection to maximize. We find that our technique improves over the best coverage-based, greedy baselines in terms of average percentage of fault-detection on performance (APFD-P) and Top-3 effectiveness by 26 percentage points (pp) and 43 pp (for Additional) and 17 pp and 32 pp (for Total) to 0.77 and 0.24, respectively. Employing the Indicator-Based Evolutionary Algorithm (IBEA) as MOEA leads to the best effectiveness among six MOEAs. Finally, the technique's runtime overhead is acceptable at 19% of the overall benchmark suite runtime, if we consider the enormous runtimes often spanning multiple hours. The added overhead compared to the greedy baselines is miniscule at 1%.These results mark a step forward for universally applicable performance regression testing techniques.