Modern distributed component-based and microservice applications can be deployed in various ways and configured with different settings and software stacks. These applications have complex performance characteristics, as their constituent services exhibit different bottlenecks that may change over time depending on the usage profile. Microservice architectures therefore pose new challenges for performance monitoring, performance evaluation, performance modeling, and performance prediction. The performance community has proposed many approaches that exploit the degrees of freedom of such applications at different points of the software life cycle to tackle these emerging challenges.
Verifying, comparing, and evaluating the results of such research is difficult. To enable practical evaluation, researchers need a reference application that they can deploy, that offers realistic degrees of freedom, and that exhibits sufficiently complex performance behavior. This application should be open source, provide sufficient instrumentation, and produce results that enable the analysis and comparison of research findings, all while remaining indicative of how the evaluated research would affect applications in production use.
Real-world distributed software is usually proprietary and therefore unavailable for experimentation. In addition, results from evaluations conducted with such software are difficult to reproduce and compare, as the software remains inaccessible to other researchers. Existing, broadly used test software does not offer the necessary degrees of freedom and is often adapted manually. Some of the most widely used test and reference applications, such as RUBiS, the Dell DVD Store, or SPECjEnterprise2010, are outdated and not representative of modern real-world applications. Reference applications from software vendors, such as the Sock Shop, on the other hand, use a state-of-the-art software stack that is representative of modern real-world applications. However, these applications lack complex business logic and therefore do not exhibit representative performance behavior.
We present the TeaStore, a microservice-based test and reference application that researchers can use as a benchmarking framework. The TeaStore consists of five services, each featuring unique performance characteristics and bottlenecks:
Service discovery is implemented using the Netflix Ribbon client-side load balancer and a simplified implementation of the Netflix Eureka registry, which enables distributed deployment and high scalability. The TeaStore offers multiple deployment options: manual deployment of WAR files, public Docker containers, or deployment using container orchestration frameworks such as Kubernetes. Deployment in a container orchestration framework enables dynamic autoscaling of service instances, container health monitoring, and failure recovery. We provide Docker containers with a tailored Kieker instrumentation and a central trace repository that collects the monitoring traces from all service instances via a RabbitMQ server.
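To illustrate the division of labor between a service registry and a client-side load balancer, the following Python sketch models a minimal Eureka-like registry, in which instances register and renew a time-limited lease through heartbeats, and a Ribbon-like balancer that looks up the live instances of a service and picks one in round-robin order. This is not the TeaStore's or Netflix's implementation: the class names `ServiceRegistry` and `RoundRobinBalancer`, the lease mechanism with its 30-second default, and the service name `persistence` are hypothetical choices made for illustration.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Instance:
    service: str
    host: str
    port: int
    last_heartbeat: float


class ServiceRegistry:
    """Hypothetical Eureka-like registry: instances register and renew a lease via heartbeats."""

    def __init__(self, lease_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock  # injectable so lease expiry can be tested deterministically
        self._instances: dict[tuple[str, str, int], Instance] = {}

    def register(self, service: str, host: str, port: int) -> None:
        self._instances[(service, host, port)] = Instance(service, host, port, self.clock())

    def heartbeat(self, service: str, host: str, port: int) -> None:
        # Renewing the lease keeps the instance visible to clients.
        inst = self._instances.get((service, host, port))
        if inst is not None:
            inst.last_heartbeat = self.clock()

    def lookup(self, service: str) -> list[Instance]:
        # Return only instances of `service` whose lease has not yet expired.
        now = self.clock()
        return [i for i in self._instances.values()
                if i.service == service and now - i.last_heartbeat <= self.lease_seconds]


class RoundRobinBalancer:
    """Hypothetical Ribbon-like client-side balancer: the caller itself picks the instance."""

    def __init__(self, registry: ServiceRegistry, service: str):
        self.registry = registry
        self.service = service
        self._counter = itertools.count()

    def choose(self) -> Instance:
        # Sort for a stable order, then rotate through the currently live instances.
        live = sorted(self.registry.lookup(self.service), key=lambda i: (i.host, i.port))
        if not live:
            raise LookupError(f"no live instance of {self.service!r}")
        return live[next(self._counter) % len(live)]


if __name__ == "__main__":
    now = [0.0]  # fake clock for the demo
    registry = ServiceRegistry(lease_seconds=30.0, clock=lambda: now[0])
    registry.register("persistence", "10.0.0.1", 8080)
    registry.register("persistence", "10.0.0.2", 8080)
    balancer = RoundRobinBalancer(registry, "persistence")
    print([balancer.choose().host for _ in range(4)])
    # ['10.0.0.1', '10.0.0.2', '10.0.0.1', '10.0.0.2']
    now[0] = 40.0  # only 10.0.0.2 renews its lease; 10.0.0.1 expires
    registry.heartbeat("persistence", "10.0.0.2", 8080)
    print(balancer.choose().host)  # 10.0.0.2
```

Because the caller makes the choice, a newly started instance only has to register to start receiving requests, and an instance that stops renewing its lease silently drops out of rotation; no central proxy has to be reconfigured when instances are scaled up or down.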
Using these monitoring traces, tools such as the Performance Model Extractor (PMX) (Walter et al., 2017) can extract Palladio Component Model (PCM) instances, and the accuracy of the extracted models can then be evaluated against the running application. As the TeaStore is under continuous development, it can be used to evaluate approaches that incorporate architectural performance models into the DevOps cycle, such as the continuous integration of performance models proposed by Mazkatli and Koziolek (2018). More generally, the services' different resource usage profiles enable performance and efficiency optimization through non-trivial service placement and resource provisioning decisions.
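As a rough illustration of what model extraction derives from such traces, the following Python sketch aggregates trace spans into the mean response time of each operation and the average number of calls each operation issues to another operation per invocation, two ingredients of an architectural performance model. It is not PMX: the `Span` record with explicit parent IDs is a hypothetical simplification (Kieker's records reconstruct call relationships differently), the operation names are illustrative, and a real extractor additionally estimates resource demands and maps operations onto components.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified trace record; not Kieker's actual record format."""
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the entry span of a trace
    operation: str            # illustrative names such as "webui.category"
    start_ms: float
    end_ms: float


def summarize(spans: list[Span]) -> tuple[dict[str, float], dict[tuple[str, str], float]]:
    """Return mean response time per operation and mean calls per parent invocation."""
    by_id = {(s.trace_id, s.span_id): s for s in spans}  # assumes every parent span is present
    durations: dict[str, list[float]] = defaultdict(list)
    invocations: dict[str, int] = defaultdict(int)              # how often each operation ran
    child_calls: dict[tuple[str, str], int] = defaultdict(int)  # (parent op, child op) -> calls
    for s in spans:
        durations[s.operation].append(s.end_ms - s.start_ms)
        invocations[s.operation] += 1
        if s.parent_id is not None:
            parent = by_id[(s.trace_id, s.parent_id)]
            child_calls[(parent.operation, s.operation)] += 1
    response_times = {op: mean(d) for op, d in durations.items()}
    # Normalize total child calls by the number of parent invocations.
    call_frequencies = {(p, c): n / invocations[p] for (p, c), n in child_calls.items()}
    return response_times, call_frequencies
```

For two traces in which one `webui.category` request calls `persistence.getProducts` once and another calls it twice, the sketch reports a call frequency of 1.5 for that pair, which a performance model could use as the expected number of downstream calls per request.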
Jürgen Walter, Christian Stier, Heiko Koziolek, and Samuel Kounev. 2017. An Expandable Extraction Framework for Architectural Performance Models. In Proceedings of the 3rd International Workshop on Quality-Aware DevOps (QUDOS '17), L'Aquila, Italy, April 2017. ACM.
Manar Mazkatli and Anne Koziolek. 2018. Continuous Integration of Performance Model. In Companion of the 2018 ACM/SPEC International Conference on Performance Engineering (ICPE '18). ACM, New York, NY, USA, 153-158. DOI: https://doi.org/10.1145/3185768.3186285