Tips for Modern Software Testing

Developers often dream of starting a new code project from scratch, but most have to make do with a legacy project. The best they can hope for is a rewrite.

I was lucky. In my previous role as a development manager at a large financial institution, I had the pleasure of working on a greenfield project, one of the company’s first truly agile efforts. How did we start? The first step was to form scrum teams and fill the backlog of work. One of the problems that arose early on involved software testing.

Many developers on our team were big fans of test-driven development (TDD), a software development process where you write tests before writing the code. The goal is to write a failing test first, though this test is really more of a specification you write before fleshing out the code. The major problem with TDD is that it focuses primarily on developers rather than embracing the entire organization. Enter BDD.

BDD: A Better Approach

Behavior Driven Development, or BDD, is a modern software testing method of getting testers, developers and the business to work together. The idea is to take things up a level and start talking about what your business and customers really want from a software product.

Once you manage to get these groups talking and collaborating with each other, you need a common language. This is often achieved with the famous Given-When-Then method of writing user stories. We used it on our project, in fact.
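To make this concrete, here is a hypothetical story in Given-When-Then form, along with a minimal Python step-definition sketch using the behave library (the scenario, step wording, and checkout logic are illustrative assumptions, not from our actual project):

# features/steps/checkout_steps.py -- hypothetical sketch using the behave BDD library.
# The matching feature file (features/checkout.feature) would read:
#
#   Scenario: Returning customer checks out with a saved card
#     Given a registered customer with one item in their cart
#     When the customer checks out using their saved credit card
#     Then an order confirmation is shown
#
from behave import given, when, then


@given("a registered customer with one item in their cart")
def step_given_cart(context):
    context.cart = {"customer": "registered", "items": ["book"]}


@when("the customer checks out using their saved credit card")
def step_when_checkout(context):
    # Stand-in for a call to the real checkout service.
    context.result = {"status": "confirmed", "items": context.cart["items"]}


@then("an order confirmation is shown")
def step_then_confirmation(context):
    assert context.result["status"] == "confirmed"

The value is not in the tooling, though; it is that the business, developers, and testers all read and agree on the plain-language scenario at the top.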

However, there’s a risk that this formula gets abused, and that business leaders start writing stories with Given-When-Then syntax, only without the collaboration. This simply does not work.

Why? In my experience, the business often didn’t understand the terminology and would use it in a way that made no sense at all. More importantly, their approach to Given-When-Then was all wrong. For this method to work, the business needs to collaborate with developers and testers to write user stories.

Automating Your Tests

Once your team has a shared understanding of user behavior—and has articulated this in a common language—it’s time to start automating tests to prove the success of your business definition.

Unfortunately, many companies have a bad strategy when it comes to software testing: too many end-to-end tests and not enough unit tests. The problem here is that end-to-end tests take a long time to run and can be very difficult to maintain. A better approach is to run end-to-end tests and manual tests to cover only the business specification of the user story.

You want to avoid looking at edge cases and bad data in heavy end-to-end (e2e) tests, which are complicated, time-consuming and expensive to run. It’s smarter to do this in other parts of the test package, mainly unit, component, integration and contract tests.

In a unit test, for instance, you only have to exercise a tiny piece of code to check for proper operation. The idea here is to devise a test that achieves a small goal. But too often companies find themselves in a situation with a million end-to-end tests, and a test suite that takes weeks and weeks to run. The end result? They’re not getting any value from their tests.
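For instance, a unit test of this kind might look like the following sketch (the discount function and its rules are hypothetical, just to show the scale of a "tiny piece of code" test):

# test_discount.py -- hypothetical unit test exercising one small function.
import unittest


def apply_discount(price, percent):
    """Return the price after applying a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100.0), 2)


class ApplyDiscountTest(unittest.TestCase):
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(20.00, 10), 18.00)

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(20.00, 150)


if __name__ == "__main__":
    unittest.main()

Hundreds of tests like this run in seconds, which is exactly why the bulk of your suite should live at this level.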

Your ratio of UI and manual tests to unit tests should follow the classic test pyramid: the higher a test type sits on the pyramid, the fewer tests of that type you should have.



Due to the complexity of maintaining end-to-end tests, some companies have ditched them altogether in favor of synthetic testing in production, something that AppDynamics is uniquely positioned to provide with our browser synthetic monitoring. For example, most organizations use Selenium to write their end-to-end tests, and we use Selenium browser automation on the backend for synthetic testing.

The Virtues of Contract Testing

For microservices testing, contract tests can provide the greatest value. When a consumer binds to a producer, a contract forms between them. Since components in a microservices architecture have many contracts, this area of testing becomes critical. In the old days of the monolithic application, the code was compiled together. Although there were still implied contracts between various parts of the code, those contracts were enforced at compile time. Since this safety net isn’t available in a microservices environment, you absolutely need reliable contract tests in place.

Each consumer forms a different contract based on how it uses a service. Here’s a quick example I’ve borrowed from Martin Fowler:

  • A service exposes a resource with three fields: identifier, name and age.

  • Three consumers, each coupled to different parts of the resource, have adopted the service.

  • Consumer A couples only to the identifier and name fields. The corresponding contract test suite asserts that the resource response contains those fields, but makes no assertion concerning the age field.

  • Consumer B uses the identifier and age fields, but not the name field. Consumer C uses all three fields. The contract test suite for Consumer B makes no assertion for the name field. But for Consumer C, the test suite asserts that the resource response contains all three fields.

It’s important in microservices that these contracts be satisfied, even though downstream services are likely to change over time.

With contract testing, each consuming service runs a suite of unit tests that check whether the inputs and outputs are valid for their purposes. Those tests are then run by the producer as part of its build pipeline should any changes occur to that service.
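As a rough illustration, a consumer-side contract test for Consumer A in Fowler’s example could look something like the sketch below (the endpoint URL is a placeholder, and in practice teams often reach for a dedicated tool such as Pact rather than hand-rolled tests):

# Hypothetical contract test for Consumer A, which couples only to the
# identifier and name fields of the resource.
import unittest

import requests

RESOURCE_URL = "http://localhost:8080/customers/42"  # placeholder endpoint


class ConsumerAContractTest(unittest.TestCase):
    def test_resource_contains_fields_consumer_a_uses(self):
        response = requests.get(RESOURCE_URL, timeout=5)
        self.assertEqual(response.status_code, 200)

        body = response.json()
        self.assertIn("identifier", body)
        self.assertIn("name", body)
        # Deliberately no assertion about the age field: Consumer A does not
        # use it, so the producer remains free to change it.


if __name__ == "__main__":
    unittest.main()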

In my next blog, I’ll cover the practical aspects of setting up a successful build pipeline, which covers contract and microservice testing.

4 Types of Continuous Performance Testing for a DevOps World

No one wants to write fragile, unreliable code. Developers want to build software that is bulletproof and that bounces back if there is a deficiency in a backend service. Coding well takes talent and experience. But resiliency is ultimately the result of performance testing, rigor, and quality feedback. In my last blog post, “The Importance of Application Decomposition,” I laid out a methodology for breaking down an app and properly instrumenting it. Today, we’ll look at test patterns that will help you build software that operates efficiently in an environment that is intrinsically unreliable.

1) Establishing a baseline

This test is an extension of App Decomposition: we want to look at the performance of the application under no load whatsoever to derive a baseline. To do this, we run a single virtual user in a loop against every touchpoint, e.g., the endpoints on the APIs and batch job triggers, and ensure transaction detection is well defined. While I don’t put much emphasis on test harnesses here, it should be relatively obvious that deriving quality baselines depends heavily on your ability to invoke your application to do important things. Make sure you are investing time in building out your ability to test as much of the app as seems reasonable.
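A minimal sketch of that single-virtual-user loop, assuming plain HTTP touchpoints and the Python requests library (the endpoints, iteration count, and pacing are placeholders):

# Hypothetical baseline loop: one virtual user, no concurrency, every
# touchpoint exercised in turn while a monitoring solution watches.
import statistics
import time

import requests

TOUCHPOINTS = [                       # placeholder endpoints
    "http://localhost:8080/api/cart",
    "http://localhost:8080/api/checkout",
    "http://localhost:8080/api/orders/recent",
]

timings = {url: [] for url in TOUCHPOINTS}

for _ in range(100):                  # long enough for a stable sample
    for url in TOUCHPOINTS:
        start = time.perf_counter()
        requests.get(url, timeout=10)
        timings[url].append(time.perf_counter() - start)
        time.sleep(0.5)               # pace the single user; the goal is zero load

for url, samples in timings.items():
    print(f"{url}: best={min(samples) * 1000:.1f} ms  "
          f"median={statistics.median(samples) * 1000:.1f} ms")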

Baseline tests represent the best case scenario, i.e., our business transactions are never going to respond in less than this amount of time. I generally like to have a single instance of the current release candidate running continuously and reporting to a monitoring solution for this.  With every service running in a loop, I have baselines for the fastest and slowest calls and can easily pull up the transactions with the lowest and highest percent time spent on the CPU. I’ll discuss how to leverage this information in my next post.

2) Finding the breakpoints

Now that we have baselines, we want to ramp up a single instance of the service until it breaks. From a charting perspective, we are tracking response time and throughput for each transaction. We want to figure out the performance ceiling relatively quickly, in 30 minutes or less, because we don’t want 4-6 hours of data to comb through. We want to know the throughput at the breaking point, and we want to look at secondary metrics that might be trending as the system is breaking down.  Above all, we want to identify the root cause of the break and determine if we want to optimize.
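One way to sketch that ramp with nothing more than the Python standard library and requests is shown below; the endpoint, step size, and breakpoint criteria are placeholders, and a real harness such as JMeter or Locust would normally drive this:

# Hypothetical step ramp: keep adding concurrent workers until error rate or
# response time shows the single service instance has hit its ceiling.
import concurrent.futures
import time

import requests

URL = "http://localhost:8080/api/cart"        # placeholder endpoint


def hit(url):
    try:
        start = time.perf_counter()
        ok = requests.get(url, timeout=5).status_code < 500
        return ok, time.perf_counter() - start
    except requests.RequestException:
        return False, None


for workers in range(5, 105, 5):              # ramp in steps of 5 workers
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(hit, [URL] * (workers * 20)))
    errors = sum(1 for ok, _ in results if not ok)
    latencies = [t for ok, t in results if ok and t is not None]
    avg = sum(latencies) / len(latencies) if latencies else float("inf")
    print(f"{workers} workers: avg={avg * 1000:.0f} ms, errors={errors}/{len(results)}")
    if errors / len(results) > 0.05 or avg > 2.0:     # crude breakpoint criteria
        print(f"Breakpoint reached at roughly {workers} concurrent workers")
        break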

As most app teams are only concerned with their code, we want to ensure we’re only testing our code and remove the possibility of a dependency being the cause of a breakage during this optimization testing. This is where it becomes really beneficial to have mocks for your downstream dependencies.
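A minimal sketch of such a mock, here a tiny Flask app standing in for a downstream payment service (the route, payload, and simulated latency are assumptions for illustration):

# Hypothetical mock of a downstream payment dependency, so breakpoint tests
# exercise only our code and the dependency responds with predictable latency.
import time

from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/payments/authorize", methods=["POST"])
def authorize():
    time.sleep(0.05)   # simulate the dependency's typical 50 ms response
    return jsonify({"status": "AUTHORIZED", "authCode": "MOCK-1234"})


if __name__ == "__main__":
    app.run(port=9090)

Point your service’s payment URL at this mock during the breakpoint run, and any failure you see is, by construction, yours.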

3) Scaling Factors

Instead of a single instance under load, with these tests we are looking at 2-5 instances, and we are load balancing among them. Essentially, we are trying to determine if the scaling factor is close to one to one. We want to understand whether the application performs better if we scale it horizontally by adding instances or vertically by adding more memory, compute, and I/O shares.
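Expressed as a quick calculation (the throughput numbers are made up for illustration):

def scaling_efficiency(single_instance_tps, instances, measured_tps):
    """Ratio of measured throughput to ideal linear scaling (1.0 = perfect 1:1)."""
    return measured_tps / (instances * single_instance_tps)


# Example: one instance breaks at 400 TPS; three load-balanced instances
# together sustain 1,020 TPS, for an efficiency of 0.85 -- not quite 1:1.
print(scaling_efficiency(400, 3, 1020))   # 0.85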

4) Soaking

This pattern is designed to expose how well our software can recover from a high-stress situation. We want to take our application up to the breakpoint and then start backing off. We want to see how long we can sustain running a service at 80% or 90% of the breakpoint. And we want to know: if we back off the load and then increase it again, is the service stable? Is it resilient? A lot of applications are plagued by memory leaks and other code-level antipatterns that prevent them from recovering without a process restart. Soak tests offer the opportunity to uncover a lot of code deficiencies. Running them will help you lower your cost per transaction in terms of memory and compute resources.

Repeat

Every time you make a change, whether you are tweaking a configuration or refactoring the code, you start a new baseline, you find the breakpoint, identify the scaling factors, and validate behavior with soak tests. In this way you progressively fine-tune your code.

If you think about continuous performance testing, what we are doing is essentially striving for Six Sigma—the defect-eliminating methodology that drives toward six standard deviations between the mean and the nearest specification limit. We’re looking to run tests frequently enough, and continuously enough, that variations in underlying dependencies, including software, hardware, virtualization, storage, latency, and network, are distilled away until you’re left with statistical deviations that are relevant and baselines you can really rely on.

In my final post we’ll cover best practices for using AppDynamics as part of your continuous performance testing initiative. Stay tuned!

Colin Fallwell is part of the AppDynamics Global Services team, which is dedicated to helping enterprises realize the value of business and application performance monitoring. AppDynamics Global Services consultants, architects, and project managers are experts in unlocking the cross-stack intelligence needed to improve business outcomes and increase organizational efficiency.

 

 

The Importance of Application Decomposition in App Performance Testing

DevOps is changing the way companies develop and maintain software. By embedding operations engineers into software development teams, companies are reducing the average number of days from code completion to live production and eliminating wait time and rework, among other benefits. But as I pointed out in my previous post, “Performance Testing in a DevOps World,” performance testing remains the weakest link in the process.

Along with continuous integration and continuous delivery (CI/CD), companies need to practice continuous performance testing to fully realize the benefits of DevOps. Proper instrumentation is crucial to ensuring you are collecting rock-solid metric data, and having a repeatable process for collecting it will benefit you regardless of how you go about finding correlation. The old adage “garbage in, garbage out” still applies and always will.

My process starts with what I like to call “Application Decomposition.” In essence, I want to understand all the dimensions or attributes of each transaction in an application or service. Most, if not all, modern APM software like AppDynamics uses bytecode injection to tag and trace transactions from known entry points to exit points in code. In this way, a transaction comes into an API, or starts on a particular class/method, and is traced across threads, exiting with a header tag that the next service running an agent is able to correlate.

Where the bytecode injection is configured matters a great deal, as does how we define the transactions to the proper degree of uniqueness. To illustrate this, let us assume we have an API that offers an endpoint for a ShoppingCart.

Now, let us assume that this ShoppingCart endpoint can do a few things depending on the HTTP method used (i.e., GET, PUT, POST, etc.). Each one of these methods invokes a different path through the code. To decompose the app, we want to capture each of these as its own transaction by monitoring the servlet and splitting on the HTTP method.

Once you have configured your transactions, you will want to call the endpoint with each of the HTTP methods and preview the results to make sure everything is working right. Once this is done, save the settings and start some load. Enabling developer mode will ensure you are capturing snapshots so you can analyze the call stack and understand how the application ticks. Make note of how much time is spent in data transformation, on backend calls, in the DB, on CPU, and on disk (if applicable). What are its dependencies? And so on. App decomposition is about understanding the fundamental behavior of what an app or service does, and how it does it. I can’t tell you how many times I have done this and found opportunities for optimizing an app without ever driving load. Time spent doing this is never wasted.

Evaluating how you derive metrics is a crucial step in building or improving your performance engineering program. While the strategy does change depending on the framework in question, the principles are the same. You will want to develop a strategy for monitoring the mission-critical code paths, so that like transactions are bucketed and baselined under the same business transaction names. If you combine too many different transactions together (or fail to split them properly), your baselines will reflect the differences in the characteristics of the transactions and the rate at which they are being processed rather than changes in system performance. While you are at it, make sure you are naming these transactions in a way that will have meaning not only to your DevOps engineers, but to the business as well.

When decomposing apps for my customers, I prefer to script up a load generator that will run a single virtual user through each of the transactions, over and over in a loop. Ideally I do this in a quiet environment where nothing else is happening, mostly because I am after a nice, clean dataset that I can dive deep into and understand. I capture all the data on the request and response because I primarily want to understand whether changes in any of those attribute values result in significant changes in response time or resource consumption. If they do, chances are we are invoking a different code path, and I want to break it out into its own transaction.
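A sketch of that idea, grouping response times by a hypothetical request attribute (here a customerType header) to spot values that select a different code path:

# Hypothetical check: if response times differ sharply by attribute value, that
# attribute is probably selecting a different code path and the transaction
# deserves to be split.
import statistics
import time

import requests

URL = "http://localhost:8080/api/cart"                   # placeholder endpoint
ATTRIBUTE_VALUES = ["guest", "registered", "premium"]    # placeholder values

samples = {value: [] for value in ATTRIBUTE_VALUES}

for _ in range(50):
    for value in ATTRIBUTE_VALUES:
        start = time.perf_counter()
        requests.get(URL, headers={"customerType": value}, timeout=10)
        samples[value].append(time.perf_counter() - start)

for value, times in samples.items():
    print(f"customerType={value}: median={statistics.median(times) * 1000:.1f} ms")
# A large spread between these medians is a hint to break the transaction out.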

If you have decomposed the app properly, the performance profile of a business transaction should be deterministic as you progress into driving some load. Meaning, you should see a response-time hockey stick in your chart data, and not the Rocky Mountains. If the response time varies greatly, and seemingly for no reason, that could be an indication that something is wrong with the system or the test harness, or that the app is improperly decomposed. There are edge cases where apps are just not well written and don’t align with performance engineering practices. Nondeterministic interfaces are more common in legacy code, and sometimes it is not easy to get predictable response times based on the data present in headers or request payloads. In these cases, we may elect to add logic to derive metrics from the code being executed, either through Data Collectors or through the use of our Agent SDKs.

Most business transactions can be broken down easily, however, making it relatively easy to do continuous testing, particularly if you have a unified APM solution like AppDynamics that will automate much of the process.

Now that you’ve ensured the quality of your metric data, you’ll want to think about testing. In my next blog post, “Five Use Cases for Performance Testing in a DevOps World,” I’ll cover recommended test patterns for improving the efficiency and reliability of your application.

Colin Fallwell is part of the AppDynamics Global Services team, which is dedicated to helping enterprises realize the value of business and application performance monitoring. AppDynamics Global Services consultants, architects, and project managers are experts in unlocking the cross-stack intelligence needed to improve business outcomes and increase organizational efficiency.

 

Continuous Performance Testing in a DevOps World

The benefits of adopting a DevOps approach are widely known. By unifying developer and operations groups and emphasizing monitoring and automation, companies are increasing the speed and frequency of deployments and recovering faster from failures. The results of firms that successfully implement DevOps can be eye-opening. According to the 2017 State of DevOps report produced by Puppet and DORA (DevOps Research and Assessment), high-performing DevOps organizations reported 46 times more frequent code deployments and 440 times faster lead times from commit to deploy. Results like these are inspiring more companies to adopt DevOps. A separate survey by Forrester Research found 50% of companies implemented or expanded DevOps initiatives last year. Another 27% are planning to adopt DevOps by the fall of 2018.

Historically, however, the failure of DevOps initiatives is high. In 2015, Ian Head, a research director at Gartner, famously predicted that “90% of I&O organizations attempting to use DevOps without specifically addressing their cultural foundations will fail.”

In this four-part blog series, I argue that continuous performance testing holds the key to unlocking organizational transformation and DevOps success, and I lay out a methodology for creating an effective performance engineering program. This adaptable methodology is one that I have developed over years of trial and error working as a performance engineering lead and architect. The approach I recommend supports not just CI/CD, traditionally understood as continuous integration/continuous delivery, but also leads directly to a culture of continuous improvement in process, borrowing from Lean-Agile and Six Sigma concepts.

A NASA mindset

To be good at DevOps, the ops side needs to embrace the dev side. Ops needs to get on the Agile train. And, once ops is onboard, everyone will discover the importance of striving for Lean-Agile. Lean-Agile is rooted in Lean Manufacturing and focuses on eliminating waste. By reducing waste, you increase the speed at which you get things done. Good CI/CD and DevOps not only continuously improve code and code quality, they enable the continuous improvement of the processes and automation serving the Software Development Life Cycle.

Think about how much time systems in QA, UAT, and other lower environments sit idle. If you are like many organizations, there is an immense amount of wasted compute capacity that can be converted into productive time by implementing and automating continuous testing.

Simply decreasing idle time is not enough, however; highly optimized processes for gathering metric data are vital if you are to be successful. To have really good metric data and telemetry, you need to approach performance testing like NASA does.

On a NASA mission, the failure of even a small component can be catastrophic. Long before an initial launch occurs, components are modeled with equations, assembled into composite systems, working their way up to increasingly complex systems, all with rigor in testing to ensure all variables are known. By the time a system is ready for launch, engineers fully understand all the environmental variables that lead to component failure and have optimized the processes for achieving the desired outcome.

In performance testing of software systems, variations or deviations in metrics during component tests must likewise be completely understood. They must have a positive or negative correlation coefficient to other metrics. If a metric deviation exists with a neutral coefficient, meaning the deviation is uncorrelated with another variable or cannot be explained, you cannot predict its behavior. In the modern software-defined world, this is an all-too-common problem for DevOps teams when companies implement application performance monitoring absent a well-defined strategy. While AI and ML promise to rescue us, it’s still vital that teams understand why metrics deviate and strive to deeply understand the relationships between the variables that cause those deviations.
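As a simple illustration of that check, Python 3.10+ can compute a Pearson correlation coefficient between two metric series directly (the sample values below are invented):

# Hypothetical metric series from a component test: response time vs. heap usage.
from statistics import correlation

response_time_ms = [110, 118, 131, 150, 176, 210]
heap_used_mb = [512, 540, 575, 630, 700, 790]

r = correlation(response_time_ms, heap_used_mb)
print(f"Pearson r = {r:.2f}")   # close to +1.0, so the deviation is explainable

A coefficient near zero for a metric that is clearly deviating is the warning sign: you have a variable you do not yet understand.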

Organizations need to test mission-critical code with NASA-like rigor. You may be thinking that such meticulousness would lead to impossible bottlenecks. In fact, the opposite happens. By break-testing individual services continuously rather than trying to performance test the entire system, you build confidence in every component’s tolerances and its impact on parent and child dependencies. Organizations will eliminate waste and achieve “Leanness” by continuously running a multitude of small, repeatable, concurrent tests. Coupled with precision monitoring and pipeline automation serving as the basis for amplified feedback loops, this approach supercharges the CI/CD pipeline, and DevOps is unchained.

Hobbled by old habits

During the days of monolithic applications, the approach of discretely testing individual components would have been hard, if not impossible. Today, I work with many organizations that are investing heavily in decomposing legacy apps into microservices, yet they still struggle to shift the mindset of their teams toward more effective component-level and scale-model testing.

Instead of testing individual services of an application with fast, repeatable performance tests and leveraging mocks for dependencies, many organizations run performance tests against the entire system as if it were still monolithic.

Performance teams spend an incredible amount of time setting up, configuring, and stressing an application at production-level load for 24, 36, or 72 hours to prove it’s stable, sometimes requiring developers and technical leads to bail out of the next sprint cycle to help out.

When these large-scale tests do break, it’s often hard to pinpoint the failure because they cannot be replicated consistently. Teams end up spending inordinate hours—days and sometimes weeks—troubleshooting issues and re-tuning parameters to keep an app from blowing up so they can release it to production.

Three steps to continuous testing

Three things need to happen for developers and operations engineers to break their old habits and achieve the impressive DevOps results mentioned earlier.

First, basic DevOps principles should be in place. If QA and performance teams are still siloed, they should be reorganized and rolled up under operations. Operations team members should then be embedded with development teams. Ops engineers should take an active partnering role in amplifying the feedback loops to developers by writing stories in the backlog for performance issues and by participating in scrums and sprint retros. These ops engineers should become the automation experts at isolating, replicating, and describing the environmental variables causing issues and ensuring that test harnesses are continuously improving. In this way, the pipeline becomes more efficient, giving developers, tech leads, QA engineers, and product managers direct insight into what is happening in every stage leading up to production.

Second, if your tests are large, you need to start breaking them up. The goal is to componentize the tests and run as many tests as you can in a half hour to an hour. This should be done at the API layer so that different services are tested at the same time but independently of one another. Each test should have an underpinning goal and should provide an answer to a specific what-if scenario.

Third, you want to replace downstream services with mocks wherever possible. This allows you to more easily test what-if scenarios for dependent services without relying on them to be up or stable.

As you continuously repeat the runs of smaller tests and begin to provide real-time feedback to developers, you should get to the point where you are able to form a hypothesis about how to improve your code and then quickly make the improvements. And, as you get into a more Lean-Agile state, you will be equipped to multiply the number of hypotheses that you are trying to derive answers for at any given time.

In today’s blog post, I’ve provided an overview of an approach to performance testing that enables effective DevOps, borrowing from Lean-Agile and Six-Sigma. In my next blog, “The Importance of Application Decomposition in Performance Testing,” I’ll lay out the basis for how to properly instrument your applications so you can begin collecting high-quality metric data.

Colin Fallwell is part of the AppDynamics Global Services team, which is dedicated to helping enterprises realize the value of business and application performance monitoring. AppDynamics Global Services consultants, architects, and project managers are experts in unlocking the cross-stack intelligence needed to improve business outcomes and increase organizational efficiency.

Performance Testing for Modern Apps

The performance of your application affects your business more than you might think. Top engineering organizations think of performance not as a nice-to-have, but as a crucial feature of their product. Unfortunately, most engineering teams do not regularly test the performance and scalability of their infrastructure. To understand how to test performance in modern applications, you must start by understanding that performance is key to a great user experience. The reality is that the only performance metric that matters is the user’s perceived load time.

The value of performance

A common question is: how fast is fast enough for a web application? Answering it comes down to a few key performance metrics, starting with that perceived load time.

Most engineering teams understand the need to treat performance as a feature. When it comes to performance testing, you should understand the baseline performance of your application. The performance of each transaction is unique. For example, in an e-commerce application, a homepage transaction is likely highly cached and very fast, whereas a checkout transaction is more complicated and has to talk to a payment service, shipping service, etc. To ensure users have a great experience, you need to test the most common flows of your users and understand performance both in the browser and on the server.

Understanding server-side performance

Apache Bench and Siege are great for quick load tests of a single endpoint. If you just need to get a sense of the requests per second for an endpoint, these are a great solution. A more advanced approach, and my personal preference, is Locust.io, a load testing framework that enables complex transactions and can generate high levels of concurrency with ease (see the sketch after the list below).

  • Locust.io is a great tool for understanding the performance of the server side.

  • Bees with Machine Guns – A utility for arming (creating) many bees (micro EC2 instances) to attack (load test) targets (web applications).

  • MultiMechanize – Multi-Mechanize is an open source framework for performance and load testing. It runs concurrent Python scripts to generate load (synthetic transactions) against a remote site or service. Multi-Mechanize is most commonly used for web performance and scalability testing, but can be used to generate workload against any remote API accessible from Python.

  • Siege – Siege is an http load testing and benchmarking utility. It was designed to let web developers measure their code under duress, to see how it will stand up to load on the internet. Siege supports basic authentication, cookies, HTTP and HTTPS protocols. It lets its user hit a web server with a configurable number of simulated web browsers. Those browsers place the server “under siege.”

  • Apache Bench – AB is a tool for benchmarking your Apache HTTP server. It is designed to give you an impression of how Apache performs.

  • HttpPerf – Httperf is a tool for measuring web server performance. It provides a flexible facility for generating various HTTP workloads and for measuring server performance. The focus of httperf is not on implementing one particular benchmark but on providing a robust, high-performance tool that facilitates the construction of both micro- and macro-level benchmarks. The three distinguishing characteristics of httperf are its robustness, which includes the ability to generate and sustain server overload, support for the HTTP/1.1 and SSL protocols, and its extensibility to new workload generators and performance measurements.

  • JMeter – Apache JMeter may be used to test performance both on static and dynamic resources (files, Servlets, Perl scripts, Java Objects, databases and queries, FTP servers and more). It can be used to simulate a heavy load on a server, network or object to test its strength or to analyze overall performance under different load types. You can use it to make a graphical analysis of performance or to test your server/script/object behavior under heavy concurrent load.
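Since Locust.io is called out above as a personal preference, here is a minimal locustfile sketch (the host, endpoints, and task weights are placeholders):

# locustfile.py -- minimal sketch of a Locust test with two transactions.
from locust import HttpUser, task, between


class ShopperUser(HttpUser):
    wait_time = between(1, 3)     # think time between requests

    @task(3)                      # homepage is hit three times as often as checkout
    def homepage(self):
        self.client.get("/")

    @task(1)
    def checkout(self):
        self.client.post("/checkout", json={"cart_id": "demo"})

Run it with something like locust -f locustfile.py --host https://example.com and ramp up virtual users from the Locust web UI.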

Understanding client-side performance

Modern applications spend more time in the browser than on the server side. The best tool to get started with understanding client-side performance is Google PageSpeed Insights, a service that analyzes the content of a web page and generates suggestions to make that page faster.

  • Google PageSpeed Insights – PageSpeed Insights analyzes the content of a web page, then generates suggestions to make that page faster. Reducing page load times can reduce bounce rates and increase conversion rates.

Understanding real-world performance

SiteSpeed.io is my favorite tool for evaluating the client-side performance from real browsers. Sitespeed.io is an open source tool that helps you analyze your website speed and performance based on performance best practices and timing metrics. You can analyze one site, analyze and compare multiple sites or let your continuous integration server break your build when your performance budget is exceeded.

It is not always possible for teams to modify their applications to optimize client-side performance. Google has invested in ngx_pagespeed and mod_pagespeed, web server extensions that automate performance improvements without code changes.

  • Google ngx_pagespeed – ngx_pagespeed speeds up your site and reduces page load time. This open-source nginx server module automatically applies web performance best practices to pages, and associated assets (CSS, JavaScript, images) without requiring that you modify your existing content or workflow.

  • Google mod_pagespeed – mod_pagespeed speeds up your site and reduces page load time. This open-source Apache HTTP server module automatically applies web performance best practices to pages, and associated assets (CSS, JavaScript, images) without requiring that you modify your existing content or workflow.

  • Cloudflare, Incapsula, Torbit, Visual Website Optimizer are all commercial services that will proxy your website and automatically improve performance without code or infrastructure changes.

WebPageTest.org is an excellent utility for testing a web page in any browser, from any location, over any network condition for free. WebPageTest gives deep insight into the performance of the client-side in a variety of real browsers.

It is not always wise to build and manage your own performance testing tools and infrastructure. Through these services you can build, execute, and analyze performance tests.

  • Soasta – Build, execute, and analyze performance tests on a single, powerful, intuitive platform.

  • Apica  – Cloud-based load testing for web and mobile applications

  • Blitz.io – Blitz allows you to continuously monitor your app 24×7 from around the world. You can emulate a single user or hundreds of users all day, every day and be notified immediately if anything goes wrong.

  • Blazemeter – BlazeMeter is a self-service performance & load testing cloud, 100% JMeter-compatible. Easily run tests of 30k, 50k, 80k or more concurrent users, on demand.

Deriving insights from performance and load testing

The goal of performance testing is to understand how your applications behave under heavy load conditions. Performance testing is only as useful as the intelligence it yields about your application’s bottlenecks. When running performance tests, you should always instrument your applications and infrastructure to understand what breaks and why. APM tools enable you to see the performance of your applications and infrastructure in real time. To completely understand performance, modern teams leverage real user monitoring to gain visibility into the real performance of end users across many browsers and platforms.

 

Everyone knows you should treat performance as a feature. Hopefully, you now have a path to get started with capacity planning and load testing on the server side, optimizing and performance testing the client side, and monitoring performance from end to end to derive meaningful insights from your tests. For a more in-depth, step-by-step walkthrough of the tools mentioned, see my presentation on Performance Testing for Modern Apps on SpeakerDeck.

Best Practices from the Field: Performance Testing with AppDynamics

Recently I’ve been working with some of my larger customers on ways they can revamp their performance testing using AppDynamics. During this process, it became evident that performance testing with an APM tool is much different than performance testing without one. As part of this, we uncovered a few best practices that I don’t see very often, even among more sophisticated users of APM. I think these principles apply whether you’re a household name testing a massive web application or a startup just getting off the ground.

Performance testing is a very broad category, so for this post I’ll narrow it down to a few areas:

  • Maintaining current performance baselines
  • Optimizing performance and increasing throughput of the application

Maintaining Status Quo

Stop me if you’ve heard this one before: Every release, after the build, the performance team steps in and runs their suite of tests. Their test harness spits out a couple of numbers. Based on what they’ve seen before, the build gets a thumbs up or down.

That’s great. You know if your static use case improved or degraded. Now let’s step back. Let’s say this application is made up of over ten different application tiers, each with a multitude of potential bottlenecks and knobs to turn to optimize performance. What that process leaves unanswered is: What changed? How do I fix it?

This example came out of a real-life scenario. Starting from a process that resembled the one I just described, we did a few pretty simple things that produced some quick, dramatic improvements. First we installed AppDynamics in their test environment and ran a test cycle.

From that information, we created a baseline for that time period.


That allowed us to monitor and baseline not just average response time, but much more granular things like backend performance, JVM health, specific Business Transactions, etc.

We then set up health rules for key metrics across those baselines. Now, rather than just relying on the test suite’s metrics, we can test for performance deviations in depth.

Lastly, we automated this process. The testing was already part of the CI/CD process, so we attached alerts to each of these health rules so that the team is notified any time build performance degrades and can take action. Alternatively, we could have used these health rules to fail the build automatically via the REST API or actions from the controller.
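The same idea can be wired into any pipeline as a simple gate. Here is a tool-agnostic sketch that fails the build when a key metric drifts too far from its baseline; the metric values and the three-sigma health rule are placeholders, not the AppDynamics API:

# Hypothetical build gate: compare this run's response times against the stored
# baseline and exit non-zero if the deviation breaches the health rule.
import statistics
import sys

baseline_ms = [210, 195, 205, 220, 199, 208]     # stored from the baseline period
current_ms = [240, 255, 238, 262, 249, 251]      # collected during this test run

baseline_mean = statistics.mean(baseline_ms)
baseline_stdev = statistics.stdev(baseline_ms)
current_mean = statistics.mean(current_ms)

# Health rule: fail if the current mean is more than 3 standard deviations
# above the baseline mean.
if current_mean > baseline_mean + 3 * baseline_stdev:
    print(f"FAIL: {current_mean:.0f} ms vs baseline {baseline_mean:.0f} ms")
    sys.exit(1)

print("PASS: performance within baseline")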

Optimizing Performance

Once we had reasonable assurance that application performance was stable, that freed up time to go attack performance problems. I see most performance teams doing this, but what’s lacking is an easy analysis of what’s causing those performance problems. Using AppDynamics here allowed us to iterate much more quickly than a traditional test/fix model.

From the same environment, we were able to turn up the heat a little with load. Each time we did this in different scenarios, we were able to identify bottlenecks in the environment: JDBC connections, then heap usage, then JVM parameters, inefficient caching protocols, and so on. During our concentrated effort over the course of two weeks, we found over 20 significant performance improvements and improved throughput of the system by over 40% with that test suite. Arguably these would have been very challenging to find without deep visibility into the application under load.


What it All Means

While these techniques seem basic, I see a lot of test teams hesitant to engage in this way, potentially forgoing an opportunity to drive significant value. Hopefully, some of these strategies provide some new insight. Happy testing!

Agile Performance Testing – Proactively Managing Performance

Just in case you haven’t heard, Waterfall is out and Agile is in.  For organizations that thrive on innovation, successful agile development and continuous deployment processes are paramount to reducing go-to-market time, fast-tracking product enhancements, and quickly resolving defects.

Executed successfully, with the right team in place, Agile practices should result in higher functional product quality.  Operating in small, focused teams that work well-defined sprints with clearly groomed stories is ideal for early QA involvement, parallel test planning and execution.

But how do you manage non-functional performance quality in an Agile model?  The reality is that traditional performance engineering and testing is often best performed over longer periods of time; workload characterizations, capacity planning, script development, test user creation, test data development, multi-day soak tests and more are not always easily adaptable into 2-week, or shorter, sprints.  And the high velocity of development change often causes continuous, and sometimes large, ripples that disrupt a team’s ability to keep up with these activities; anyone ever had a data model change break their test dataset?

Before joining AppDynamics I faced this exact scenario as the Lead Performance Engineer for PayPal’s Java Middleware team.  PayPal was undergoing an Agile transformation, and our small team of historically matrix-aligned specialty engineers was challenged to adapt.

Here are my best practices and lessons learned, sometimes the hard way, for adapting performance-engineering practices to an agile development model:

  1. Fully integrate yourself into the Sprint team, immediately.  My first big success at PayPal was the day I had my desk moved to sit in the middle of the Dev team.  I joined the water cooler talk, attended every standup, shot nerf missiles across the room, and wrote and groomed stories as a core part of the scrum team.  Performance awareness, practices, and results organically increased because performance was a well-represented function within the team as opposed to an afterthought farmed out to a remote organization.
  2. Build multiple performance and stress test scenarios with distinct goals and execution schedules.  Plan for longer soak and stress tests as part of the release process, but have one or more per-sprint, and even nightly, performance tests that can be continually executed to proactively measure performance, and identify defects as they are introduced.  Consider it your mission to quantify the performance impact of a code change.
  3. Extend your Continuous Integration (CI) pipelines to include performance testing.  At PayPal, I built custom integrations between Jenkins and JMeter to automate test execution and report generation.  Our pipelines triggered automated nightly regressions on development branches within a well-understood platform where QA and development could parameterize workload, kick off a performance test, and interpret a test report.  Unless you like working 18-hour days, I can’t overstate the importance of building integrations into tools that are already or easily adopted by the broader team.  If you’re using Jenkins, you might take a look at the Jenkins Performance Plugin.
  4. Define Key Performance Indicators (KPIs).  In an Agile model you should expect smaller scoped tests, executed at a higher frequency.  It’s critical to have a set of KPIs the group understands, and buys into, so you can quickly look at a test and interpret if a) things look good, or b) something funky happened and additional investigation is needed. Some organizations have clearly defined non-functional criteria, or SLAs, and many don’t. Be Agile with your KPIs, and refine them over time. Here are some of the KPIs we commonly evaluated:
  • Percentile Response-Time – 90th, 95th, 99th – Summary and Per-Transaction
  • Throughput – Summary and Per-Transaction
  • Garbage Collector (GC) Performance – % non-paused time, number of collections (major and minor), and collection times.
  • Heap Utilization – Young Generation and Tenured Space
  • Resource Pools – Connection Pools and Thread Pools

5. Invest in best-of-breed tooling.  With higher-velocity code change and release schedules, it’s essential to have deep visibility into your performance environment. Embrace tooling, but consider these factors impacted by Agile development:

  • Can your toolset automatically, and continuously discover, map and diagnose failures in a distributed system without asking you to configure what methods should be monitored?  In an Agile team the code base is constantly shifting.  If you have to configure method-level monitoring, you’ll spend significant time maintaining tooling, rather than solving problems.
  • Can the solution be enabled out of the box under heavy loads?  If the overhead of your tooling degrades performance under high loads, it’s ineffective in a performance environment.  Don’t let your performance monitoring become your performance problem.

When a vendor recommends you reduce monitoring coverage to support load testing, consider a) the effectiveness of a tool which won’t provide 100% visibility, and b) how much time will be spent consistently reconfiguring monitoring for optimal overhead.

Performance testing within an Agile organization challenges us as engineers to adapt to a high velocity of change.  Applying best practices gives us the opportunity to work as part of the development team to proactively identify and diagnose performance defects as code changes are introduced.  Because the fastest way to resolve a defect in production is to fix it before it gets there.

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics today.

AppDynamics goes to QCon San Francisco

AppDynamics is at QCon San Francisco this week for another stellar event from the folks at InfoQ. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. If you are in the area this week stop by our booth and say hello!

I presented the Performance Testing Crash Course highlighting how to capacity plan and load test your applications to guarantee a smooth launch.

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics Pro today.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Performance testing tools explained: The client side

In my last post I showed different approaches for load testing the server side. In this post I will highlight some tools I use to monitor the performance of the client side.

In modern JavaScript-intensive web applications, users spend more time waiting on client-side rendering than on server-side processing. The reality is that for all the effort that goes into optimizing the server side, even more effort should go into optimizing the client side. End user monitoring has never been so important.

Why does performance matter?

Just to recap a few statistics about the business impact of performance at major internet companies:

Performance Impact

As you iterate through the software development cycle it is important to measure application performance on both the server side and the client side and understand the impact of every release. Here are some tools you can use to test the performance of the entire end user experience:

Google PageSpeed Insights

Google PageSpeed Insights provides actionable advice for improving the performance of client-side web applications. PageSpeed Insights analyzes the content of a web page, then generates suggestions to make that page faster. Reducing page load times can reduce bounce rates and increase conversion rates. The service is available as part of the Google Chrome Developer Tools as an extension, a web service, and extensions for Apache and Nginx.

Use the Google PageSpeed Insight API to integrate client-side optimizations into your continuous integration setup.

curl "https://www.googleapis.com/pagespeedonline/v1/runPagespeed?url=http://dustinwhittle.com/&key=xxx"

WBench

WBench is a tool that uses the HTML5 navigation timing API to benchmark end user load times for websites.

1) Install WBench:

gem install wbench

2) Run WBench:

wbench http://dustinwhittle.com/

WebPageTest.org

WebPageTest.org enables anyone to test the client-side performance on a range of browsers from anywhere in the world for free. This service is great and worth every penny I didn’t pay for it. Not only does it provide a range of mobile/desktop browsers and locations, but it also shows a waterfall timeline and a video of the rendering.


AppDynamics

With AppDynamics Pro you get in-depth performance metrics to evaluate the scalability and performance of your application. Use the AppDynamics Pro Metrics Browser to track end user experience times and errors over the duration of the load tests.

With the AppDynamics Pro End User Experience Dashboard you get visibility into both the server side and the client side.

Use AppDynamics Pro to compare multiple application releases to see the change in performance and stability.


In my next post in this series I will cover load testing tools for native mobile applications. Get started with AppDynamics Pro today for in-depth application performance management.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Performance testing tools explained: the server side

The performance of your application affects your business more than you might think. Top engineering organizations think of performance not as a nice-to-have, but as a crucial feature of their product. Those organizations understand that performance has a direct impact on user experience and, ultimately, their bottom line. Unfortunately, most engineering teams do not regularly test the performance and scalability of their infrastructure. In my last post on performance testing I highlighted a long list of tools that can be used for load testing. In this post we will walk through four performance testing tools: Apache Bench, Siege, Multi-Mechanize, and Bees with Machine Guns. I will show simple examples to get started performance testing your web applications regardless of the language.

Why does performance matter?

A few statistics about the business impact of performance at major internet companies:

Performance Impact

As you iterate through the software development cycle it is important to measure application performance and understand the impact of every release. As your production infrastructure evolves you should also track the impact of package and operating system upgrades. Here are some tools you can use to load test your production applications:

Apache Bench

Apache Bench is a simple tool for load testing applications, provided by default with the Apache httpd server. Here is a simple example that load tests example.com with 10 concurrent users for 10 seconds.

Install Apache Bench:

apt-get install apache2-utils

Run Apache Bench against a web server with 10 concurrent connections for 10 seconds:

ab -c 10 -t 10 -k http://example.com/

Benchmarking example.com (be patient)
Finished 286 requests

Server Software:        nginx
Server Hostname:        example.com
Server Port:            80

Document Path:          /
Document Length:        6642 bytes

Concurrency Level:      10
Time taken for tests:   10.042 seconds
Complete requests:      286
Failed requests:        0
Write errors:           0
Keep-Alive requests:    0
Total transferred:      2080364 bytes
HTML transferred:       1899612 bytes
Requests per second:    28.48 [#/sec] (mean)
Time per request:       351.133 [ms] (mean)
Time per request:       35.113 [ms] (mean, across all concurrent requests)
Transfer rate:          202.30 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        9   20  14.2     17     148
Processing:   117  325  47.8    323     574
Waiting:      112  317  46.3    314     561
Total:        140  346  46.0    341     589

Percentage of the requests served within a certain time (ms)
  50%    341
  66%    356
  75%    366
  80%    372
  90%    388
  95%    408
  98%    463
  99%    507
 100%    589 (longest request)

 

Siege + Sproxy

Personally, I prefer Siege to Apache Bench for simple load testing as it is a bit more flexible.

Install Siege:

apt-get install siege

Siege a web server with 10 concurrent connections for 10 seconds:

siege -c 10 -b -t 10S http://example.com/

** SIEGE 2.72
** Preparing 10 concurrent users for battle.
The server is now under siege...
Lifting the server siege...      done.

Transactions:              263 hits
Availability:              100.00 %
Elapsed time:              9.36 secs
Data transferred:          0.35 MB
Response time:             0.35 secs
Transaction rate:          28.10 trans/sec
Throughput:                0.04 MB/sec
Concurrency:               9.82
Successful transactions:   263
Failed transactions:       0
Longest transaction:       0.54
Shortest transaction:      0.19

 

More often than not you want to load test an entire site, not just a single endpoint. A common approach is to crawl the entire application to discover all URLs and then load test a sample of them. The makers of Siege also make Sproxy, which, in conjunction with wget, enables you to crawl an entire site through a proxy and record all of the URLs accessed. It makes for an easy way to compile a list of every possible URL in your application.

 

1) Enable sproxy and specify that all the urls be output to a file, urls.txt:

sproxy -o ./urls.txt

2) Use wget with sproxy to crawl all the urls of example.com:

wget -r -o verbose.txt -l 0 -t 1 --spider -w 1 -e robots=on -e "http_proxy = http://127.0.0.1:9001" "http://example.com/"

3) Sort and de-duplicate the list of urls from our application:

sort -u -o urls.txt urls.txt

4) Siege the list of urls with 100 concurrent users for 3 minutes:

siege -v -c 100 -i -t 3M -f urls.txt

** SIEGE 2.72
** Preparing 100 concurrent users for battle.
The server is now under siege...
Lifting the server siege...      done.

Transactions:              2630 hits
Availability:              100.00 %
Elapsed time:              90.36 secs
Data transferred:          3.51 MB
Response time:             0.35 secs
Transaction rate:          88.10 trans/sec
Throughput:                0.28 MB/sec
Concurrency:               9.82
Successful transactions:   2630
Failed transactions:       0
Longest transaction:       0.54
Shortest transaction:      0.19

 

Multi-Mechanize

When testing web applications sometimes you need to write test scripts that simulate virtual user activity against a site/service/api. Multi-Mechanize is an open source framework for performance and load testing. It runs concurrent Python scripts to generate load (synthetic transactions) against a remote site or service. Multi-Mechanize is most commonly used for web performance and scalability testing, but can be used to generate workload against any remote API accessible from Python. Test output reports are saved as HTML or JMeter-compatible XML.

 

1) Install Multi-Mechanize:

pip install multi-mechanize

2) Bootstrapping a new multi mechanize project is easy:

multimech-newproject demo

import mechanize
import time

class Transaction(object):
    def __init__(self):
        # Ensure the custom timer dict exists; Multi-Mechanize collects these
        # per-transaction timings for its reports.
        self.custom_timers = {}

    def run(self):
        br = mechanize.Browser()
        br.set_handle_robots(False)

        # Time the full request/response cycle for the homepage.
        start_timer = time.time()
        resp = br.open('http://www.example.com/')
        resp.read()
        latency = time.time() - start_timer

        self.custom_timers['homepage'] = latency

        # Basic sanity checks on the response.
        assert (resp.code == 200)
        assert ('Example' in resp.get_data())

3) Run the multi-mechanize project and review the generated reports:

multimech-run demo


 

Bees with Machine Guns

In the real world you need to test your production infrastructure with realistic traffic. In order to generate the amount of load that realistically represents production, you need to use more than one machine. The Chicago Tribune has invested in helping the world solve this problem by creating Bees with Machine Guns. Not only does it have an epic name, but it is also incredibly useful for load testing using many cloud instances via Amazon Web Services. Bees with Machine Guns is a utility for arming (creating) many bees (micro EC2 instances) to attack (load test) targets (web applications).

1) Install Bees with Machine Guns:

pip install beeswithmachineguns

 

2) Configure Amazon Web Services credentials in ~/.boto:

[Credentials]

aws_access_key_id=xxx
aws_secret_access_key=xxx

[Boto]

ec2_region_name = us-west-2
ec2_region_endpoint = ec2.us-west-2.amazonaws.com

 

3) Create 2 EC2 instances using the default security group in the us-west-2b availability zone, using the ami-bc05898c image, and log in using the ec2-user user name.

bees up -s 2 -g default -z us-west-2b -i ami-bc05898c -k aws-us-west-2 -l ec2-user

Connecting to the hive.
Attempting to call up 2 bees.
Waiting for bees to load their machine guns...
.
.
.
.
Bee i-3828400c is ready for the attack.
Bee i-3928400d is ready for the attack.
The swarm has assembled 2 bees.

 

4) Check if the ec2 instances are ready for battle

bees report

Read 2 bees from the roster.
Bee i-3828400c: running @ 54.212.22.176
Bee i-3928400d: running @ 50.112.6.191

 

5) Attack a url if the ec2 instances are ready for battle

bees attack -n 100000 -c 1000 -u http://example.com/

Read 2 bees from the roster.
Connecting to the hive.
Assembling bees.
Each of 2 bees will fire 50000 rounds, 125 at a time.
Stinging URL so it will be cached for the attack.
Organizing the swarm.
Bee 0 is joining the swarm.
Bee 1 is joining the swarm.
Bee 0 is firing his machine gun. Bang bang!
Bee 1 is firing his machine gun. Bang bang!
Bee 1 is out of ammo.
Bee 0 is out of ammo.
Offensive complete.
     Complete requests:   100000
     Requests per second: 1067.110000 [#/sec] (mean)
     Time per request:    278.348000 [ms] (mean)
     50% response time:   47.500000 [ms] (mean)
     90% response time:   114.000000 [ms] (mean)
Mission Assessment: Target crushed bee offensive.
The swarm is awaiting new orders.

6) Spin down all the EC2 instances

bees down

 

AppDynamics

With AppDynamics Pro you get in-depth performance metrics to evaluate the scalability and performance of your application. Use the AppDynamics Pro Metrics Browser to track key response times and errors over the duration of the load tests.


Use the AppDynamics Scalability Analysis Report to evaluate the performance of your application against load tests.


Use AppDynamics Pro to compare multiple application releases to see the change in performance and stability.


Get started with AppDynamics Pro today for in-depth application performance management.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.