Tales from the field: Building an effective test environment for a scalable service


Read this blog to see how we use our own AppDynamics products to test our Events Service, which currently receives more than three trillion events per month.

This blog series highlights how we at AppDynamics use our own products to test our Events Service, which currently receives more than three trillion events per month. The first post in this series, Automation Framework in Analytics – Part 1, was published in September.

To summarize part one of the series, the key requirements to have a successful testing effort against our Events Service architecture are as follows:

– To find the best way to bring up a test environment. This required evaluating the possible test infrastructures to find out which best met our needs.

– To design a robust and scalable framework that could make use of these environment(s) and run tests effectively and reliably as part of CI.

Let’s go down a step and analyze the various options we had to bring up the best test environment.

Bring up an environment on the local box by starting the individual processes


Pros

– Does not need separate hardware/machines.

Cons

– Deploying the processes in a single environment with custom properties gets complex as the test environment scales. For example, we had to test our product’s behavior by spawning a large number of Kafka brokers and Elasticsearch data nodes, along with the other components we use. It’s better to deploy isolated environments than to bring everything up together in a complex, hard-to-manage setup.

– Negative tests, such as bringing components down and then bringing them up, may not be seamless.

– The development and deployment environments differ, so component-level scripts are sometimes required to bring up test environments.

– The deployment is not isolated: all the components/stacks are deployed monolithically on the same host as a set of individual processes. This is not close to how we deploy the stack in production.

– Hurts test reliability and may cause flaky tests at times, especially when tests have to shut down and restart processes.

Have some separate VMs/bare metal hosts allocated for these tests to run


Pros

– Reliable, isolated environment for deployment and execution.

– A properly managed test environment, which can scale up to a reasonable point.

– Running negative tests is possible.

Cons

– Incurs additional hardware costs for tests, which may not be absolutely necessary.

– Requires special attention to manage the VMs and use them periodically. Tests have little flexibility to scale the environment elastically.

– Flexible configurations are not easy. For example, if the tester needs to run the same test twice, once against a 3-broker Kafka cluster and once against a 5-broker cluster, manual intervention may be required to make a separate setup available for each run.

– Costly environments such as these are a luxury for per-check-in tests (tests that run on every code check-in). At a minimum, these environments should be used wisely to get the best results.
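The Kafka example above is easier to picture when the cluster shape is treated as data that a test receives, rather than as hosts prepared by hand. The following is a minimal Python sketch of that idea only; `ClusterSpec`, its fields, and the node naming are hypothetical and are not part of our framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Hypothetical description of a test cluster; field names are illustrative."""
    kafka_brokers: int
    elasticsearch_data_nodes: int = 1

    def node_names(self):
        # Derive one name per node so a deployer could create them in a loop.
        brokers = [f"kafka-{i}" for i in range(1, self.kafka_brokers + 1)]
        data = [f"es-data-{i}" for i in range(1, self.elasticsearch_data_nodes + 1)]
        return brokers + data


def run_for_each(specs, test):
    """Run the same test callable once per cluster spec and collect the results."""
    return {spec: test(spec) for spec in specs}


# The same test body runs against a 3-broker and a 5-broker setup.
specs = [ClusterSpec(kafka_brokers=3), ClusterSpec(kafka_brokers=5)]
results = run_for_each(specs, lambda spec: len(spec.node_names()))
```

With fixed VMs, each of these specs would need its own manually prepared setup, which is the limitation described above.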

Use AWS for deploying these components and use them for testing


Pros

– Reliable, isolated environment for deployment and execution.

– More flexibility when it comes to scaling.

– No need to maintain separate resources to manage the test infrastructure.

– Running negative tests is still possible.

– Instances can be chosen with the computing and storage power that the situation requires.

– Infrastructure can be managed by tests with ease.

– Can be scaled up to run performance tests.

Cons

– We should always answer the question, “How can we most effectively reduce our AWS bill?”

– Running per-check-in tests/smoke tests in AWS is too costly for the requirement.

Use Docker containers


Pros

– Creates an isolated environment to deploy the components separately without new machines/hardware.

– Easy to handle negative tests by bringing up or shutting down a container.

– Less costly. All that is needed is to pull the image from a Docker registry or build a Docker image locally.

– Easy to have a uniform script across Unix-based systems.

– Sanity and smoke tests can be run in a more flexible and cost-effective environment.

– Infrastructure can be managed by tests with ease.

Cons

– Startup takes time, depending on how heavy the components are and how much wiring they need. This becomes a bottleneck when tests require frequent cluster restarts. For example, an on-premises test may need just one container, which can be spawned and set up almost immediately. A SaaS test, however, may spawn 16 containers, and the heaviest component (the API node) may take some time to ensure that all the components are ready to talk to each other. Restarting the cluster is therefore a costly operation, given that the hardware resources we give a Docker container are not as powerful as a physical system.

– May not be the right methodology for running tests with larger clusters, given that we have to run our tests in both the development and test environments.

– May not scale to running performance tests.
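The startup cost described above usually shows up in tests as a readiness wait: the test cannot start until every component reports that it is up. Below is a minimal Python sketch of such a wait, assuming each component exposes some boolean readiness check. The function name, parameters, and component names are hypothetical, not our framework's API.

```python
import time


def wait_until_ready(checks, timeout_s=120.0, interval_s=1.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll readiness checks until all pass or the timeout expires.

    checks maps a component name to a zero-argument callable returning bool.
    Returns the elapsed seconds; raises TimeoutError naming what is still pending.
    """
    start = clock()
    pending = set(checks)
    while True:
        # Re-check only the components that have not reported ready yet.
        pending = {name for name in pending if not checks[name]()}
        if not pending:
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"not ready after {timeout_s}s: {sorted(pending)}")
        sleep(interval_s)


# Usage with fake checks: the "api" component becomes ready on its third poll.
polls = {"api": 0}

def api_ready():
    polls["api"] += 1
    return polls["api"] >= 3

elapsed = wait_until_ready({"api": api_ready, "kafka": lambda: True},
                           timeout_s=5, interval_s=0, sleep=lambda s: None)
```

Every cluster restart pays this wait again, which is why tests that restart the cluster frequently become slow as the container count grows.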

Coming up with a choice

Since per-check-in tests run very frequently, we felt that a Docker-based test framework would be best for us. However, we also needed a more powerful environment for tests that are more time-consuming and processing-intensive. Since we are a SaaS-based company with established operational excellence in handling cloud systems, we opted for AWS as a parallel environment to run such tests. In the end, we decided to build a hybrid test framework with the flexibility to run tests in either Docker or AWS, depending on what the test requires.

Test framework architecture

Test flow

Below is a more detailed explanation of the framework architecture represented above. This framework has proven to be effective in our build pipeline.

– Test Categorization

  • Tests that are time-consuming or that should be part of nightly builds are categorized in AWSTest.xml.
  • The regular smoke/sanity tests are added to Docker-SaaS.xml and Docker-OnPrem.xml.


– BaseTest finds the environment required based on the test XML specified.

– BaseTest passes control to ClusterFactory, which spawns a cluster as required by the test and returns the cluster as an object to the test. The contextual wiring of the containers or nodes is done inside the specific object returned.

– Every test created is an extension of BaseTest, which will start running the test against the environment.

– The test environments are torn down at the end.
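The flow above can be sketched roughly as follows. This is a simplified Python illustration, not our actual framework code. Only the suite file names, BaseTest, and ClusterFactory come from the description above; the mapping, the cluster classes, and all method names are assumptions made for the example.

```python
from abc import ABC, abstractmethod

# Suite file names are from the post; the mapping itself is an assumption.
SUITE_TO_ENVIRONMENT = {
    "AWSTest.xml": "aws",
    "Docker-SaaS.xml": "docker-saas",
    "Docker-OnPrem.xml": "docker-onprem",
}


class Cluster(ABC):
    """Hypothetical cluster object; real wiring of nodes would live in here."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    @abstractmethod
    def describe(self):
        ...


class DockerCluster(Cluster):
    def __init__(self, containers):
        super().__init__()
        self.containers = containers

    def describe(self):
        return f"docker cluster with {self.containers} container(s)"


class AwsCluster(Cluster):
    def describe(self):
        return "aws cluster"


class ClusterFactory:
    @staticmethod
    def create(suite_file):
        environment = SUITE_TO_ENVIRONMENT[suite_file]
        if environment == "aws":
            return AwsCluster()
        # 16 and 1 echo the SaaS and on-premises container counts mentioned earlier.
        return DockerCluster(containers=16 if environment == "docker-saas" else 1)


class BaseTest:
    def __init__(self, suite_file):
        self.suite_file = suite_file
        self.last_cluster = None

    def run_test(self, cluster):
        raise NotImplementedError

    def execute(self):
        cluster = ClusterFactory.create(self.suite_file)
        self.last_cluster = cluster
        cluster.start()
        try:
            return self.run_test(cluster)
        finally:
            cluster.stop()  # tear the environment down even if the test fails


class SmokeTest(BaseTest):
    def run_test(self, cluster):
        return cluster.describe()


smoke = SmokeTest("Docker-SaaS.xml")
result = smoke.execute()
```

Keeping the teardown in a `finally` block is one way to make sure a failing test does not leave containers or instances behind.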

Performance on various aspects

There has been a notable improvement in performance in the following areas:

– The overall test running time

– Infrastructure-based tests, such as testing resiliency by bringing individual components up and down

– Infrastructure and environment setup time

– Effort needed to expand or shrink the environment while running the test

– Ability to add more tests and expand the framework

– Ease of debugging

Two areas above where performance especially improved were infra setup time and running infra-based tests, which are elaborated on below.

Performance improvement on infra setup time

We observed a definite improvement in infra setup performance with every iteration. The time taken to make the environment test-ready in manual/semi-automated ways was measured and compared against fully automated ways in the charts below.

Performance improvement on running infra-based tests

As part of our daily regressions, we executed over 20 tests, which used to consume too much time when not run in a fully automated way. For instance, resilience tests, which bring down services and nodes and generate interruptions in the services, are time-consuming if done in semi-automated or manual ways. These tests would also become too complex and time-consuming to handle if the number of nodes/hosts used for the services had to be increased as part of the test.

Also, these services take time to bring up in AWS infrastructure, which may not be the ideal environment for such tests when keeping ROI and cost-effectiveness in mind. Using Docker helped us run these tests. Docker is easily managed through the APIs and scripts we have created, and the tests were effective at managing an Events Service infrastructure. Scaling and shrinking the nodes in Docker through our tests proved to be much simpler and less expensive than in AWS, and allowed us to run tests on our laptops as well as in TeamCity/Jenkins build farms. This approach turned out to be especially effective when the number of resilience tests started to increase.
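As an illustration of why container-based resilience tests are cheap to script, here is a minimal Python sketch that takes one node down for the duration of a test step and always brings it back. It relies only on the standard `docker stop` and `docker start` CLI commands. The helper names and the container name `kafka-2` are hypothetical, and this is not the API our framework exposes.

```python
import subprocess
from contextlib import contextmanager


def run_docker(args):
    """Run a docker CLI command; raises CalledProcessError if it fails."""
    subprocess.run(["docker", *args], check=True)


@contextmanager
def node_down(container_name, run=run_docker):
    """Stop one container while the block runs, then start it again."""
    run(["stop", container_name])
    try:
        yield
    finally:
        run(["start", container_name])  # restore the node even if assertions fail


# Usage with a recording runner, so the sketch runs without a Docker daemon.
issued = []
with node_down("kafka-2", run=issued.append):
    issued.append("check that the service still ingests events")
```

In a real run, the default `run_docker` would be used and the body of the `with` block would hold the actual resilience assertions.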

Future work

Though we’re pleased with the results of our testing framework, we still recognize the improvements that can be made. Below are some of the enhancements we’re working on:

– Making tests self-aware of the elasticity and environment to be chosen.

– Some architecture and code revamps to comply with best practices.

– The ability to plug in reliability tests and performance tests.
