|Authors||M. Grambow, D. Kovalev, C. Laaber, P. Leitner and D. Bermbach|
|Title||Using Microbenchmark Suites to Detect Application Performance Changes|
|Project(s)||Department of Engineering Complex Software Systems, AIT4CR: AI-Powered Testing Infrastructure for Cancer Registry System|
|Publication Type||Journal Article|
|Year of Publication||2022|
|Journal||IEEE Transactions on Cloud Computing|
|Keywords||Benchmarking, Microbenchmarks, Performance Change Detection, Performance Testing, Regression Detection|
Software performance changes are costly and often hard to detect pre-release. Similar to software testing frameworks, either application benchmarks or microbenchmarks can be integrated into quality assurance pipelines to detect performance changes before releasing a new application version. Unfortunately, extensive benchmarking studies usually take several hours which is problematic when examining dozens of daily code changes in detail; hence, trade-offs have to be made. Optimized microbenchmark suites, which only include a small subset of the full suite, are a potential solution for this problem, given that they still reliably detect the majority of the application performance changes such as an increased request latency. It is, however, unclear whether microbenchmarks and application benchmarks detect the same performance problems and one can be a proxy for the other. In this paper, we explore whether microbenchmark suites can detect the same application performance changes as an application benchmark. For this, we run extensive benchmark experiments with both the complete and the optimized microbenchmark suites of two time-series database systems, i.e., InfluxDB and VictoriaMetrics, and compare their results to the results of corresponding application benchmarks. We do this for 70 and 110 commits, respectively. Our results show that it is not trivial to detect application performance changes using an optimized microbenchmark suite. The detection (i) is only possible if the optimized microbenchmark suite covers all application-relevant code sections, (ii) is prone to false alarms, and (iii) cannot precisely quantify the impact on application performance. For certain software projects, an optimized microbenchmark suite can, thus, provide fast performance feedback to developers (e.g., as part of a local build process), help estimating the impact of code changes on application performance, and support a detailed analysis while a daily application benchmark detects major performance problems. Thus, although a regular application benchmark cannot be substituted for both studied systems, our results motivate further studies to validate and optimize microbenchmark suites.