Performance Testing for Modern Apps

The performance of your application affects your business more than you might think. Top engineering organizations treat performance not as a nice-to-have, but as a crucial feature of their product. Unfortunately, most engineering teams do not regularly test the performance and scalability of their infrastructure. To understand how to test performance in modern applications, start from the premise that performance is key to a great user experience; ultimately, the only performance metric that matters is the user’s perceived load time.

The value of performance

A common question is: how fast is fast enough for a web application? Let’s start with a quick overview of the key performance metrics.

Most engineering teams understand they need to treat performance as a feature. When it comes to performance testing, you should start by understanding the baseline performance of your application. The performance of each transaction is unique. For example, in an e-commerce application a homepage transaction is likely highly cached and very fast, whereas a checkout transaction is more complicated and has to talk to a payment service, a shipping service, and so on. To ensure users have a great experience, you need to test your users’ most common flows and understand performance both in the browser and on the server.

Understanding server-side performance

Apache Bench and Siege are great for quick load tests of a single endpoint. If you just need a sense of the requests per second an endpoint can sustain, they are a great solution. A more advanced approach, and my personal preference, is locust.io, a load testing framework that supports complex transactions and generates high levels of concurrency with ease.

  • Locust.io is a great tool for understanding the performance of the server side.

  • Bees with Machine Guns – A utility for arming (creating) many bees (micro EC2 instances) to attack (load test) targets (web applications).

  • MultiMechanize – Multi-Mechanize is an open source framework for performance and load testing. It runs concurrent Python scripts to generate load (synthetic transactions) against a remote site or service. Multi-Mechanize is most commonly used for web performance and scalability testing, but can be used to generate workload against any remote API accessible from Python.

  • Siege – Siege is an http load testing and benchmarking utility. It was designed to let web developers measure their code under duress, to see how it will stand up to load on the internet. Siege supports basic authentication, cookies, HTTP and HTTPS protocols. It lets its user hit a web server with a configurable number of simulated web browsers. Those browsers place the server “under siege.”

  • Apache Bench – AB is a tool for benchmarking your Apache HTTP server. It is designed to give you an impression of how Apache performs.

  • Httperf – Httperf is a tool for measuring web server performance. It provides a flexible facility for generating various HTTP workloads and for measuring server performance. The focus of httperf is not on implementing one particular benchmark but on providing a robust, high-performance tool that facilitates the construction of both micro- and macro-level benchmarks. The three distinguishing characteristics of httperf are its robustness, which includes the ability to generate and sustain server overload, its support for the HTTP/1.1 and SSL protocols, and its extensibility to new workload generators and performance measurements.

  • JMeter – Apache JMeter may be used to test performance both on static and dynamic resources (files, Servlets, Perl scripts, Java Objects, databases and queries, FTP servers and more). It can be used to simulate a heavy load on a server, network or object to test its strength or to analyze overall performance under different load types. You can use it to make a graphical analysis of performance or to test your server/script/object behavior under heavy concurrent load.

Understanding client-side performance

Modern applications spend more time in the browser than on the server side. The best tool for getting started with client-side performance is Google PageSpeed Insights, a service that analyzes the content of a web page and generates suggestions to make that page faster.

  • Google PageSpeed Insights – PageSpeed Insights analyzes the content of a web page, then generates suggestions to make that page faster. Reducing page load times can reduce bounce rates and increase conversion rates.

Understanding real-world performance

Sitespeed.io is my favorite tool for evaluating client-side performance from real browsers. It is an open-source tool that helps you analyze your website’s speed and performance based on performance best practices and timing metrics. You can analyze one site, analyze and compare multiple sites, or let your continuous integration server break your build when your performance budget is exceeded.

It is not always possible for teams to modify their applications to optimize client-side performance. Google has invested in ngx_pagespeed and mod_pagespeed, web server extensions that automate performance improvements without code changes.

  • Google ngx_pagespeed – ngx_pagespeed speeds up your site and reduces page load time. This open-source nginx server module automatically applies web performance best practices to pages, and associated assets (CSS, JavaScript, images) without requiring that you modify your existing content or workflow.

  • Google mod_pagespeed – mod_pagespeed speeds up your site and reduces page load time. This open-source Apache HTTP server module automatically applies web performance best practices to pages, and associated assets (CSS, JavaScript, images) without requiring that you modify your existing content or workflow.

  • Cloudflare, Incapsula, Torbit, and Visual Website Optimizer are all commercial services that will proxy your website and automatically improve performance without code or infrastructure changes.

WebPageTest.org is an excellent utility for testing a web page in any browser, from any location, over any network condition for free. WebPageTest gives deep insight into the performance of the client-side in a variety of real browsers.

It is not always wise to build and manage your own performance testing tools and infrastructure. These commercial services let you build, execute, and analyze performance tests:

  • Soasta – Build, execute, and analyze performance tests on a single, powerful, intuitive platform.

  • Apica  – Cloud-based load testing for web and mobile applications

  • Blitz.io – Blitz allows you to continuously monitor your app 24×7 from around the world. You can emulate a single user or hundreds of users all day, every day and be notified immediately if anything goes wrong.

  • Blazemeter – BlazeMeter is a self-service performance and load testing cloud, 100% JMeter-compatible. Easily run tests of 30k, 50k, 80k or more concurrent users, on demand.

Deriving insights from performance and load testing

The goal of performance testing is to understand how your applications behave under heavy load. Performance testing is only as useful as the intelligence it yields about your application’s bottlenecks. When running performance tests, you should always instrument your applications and infrastructure to understand what breaks and why. APM tools let you see the performance of your applications and infrastructure in real time. To understand performance completely, modern teams also leverage real user monitoring to gain visibility into the actual experience of end users across many browsers and platforms.

 

Everyone knows you should treat performance as a feature. Hopefully, you now have a path to get started: capacity planning and load testing the server side, optimizing and performance testing the client side, and monitoring performance end to end to derive meaningful insights from your tests. For a more in-depth, step-by-step walkthrough of the tools mentioned, see my presentation on Performance Testing for Modern Apps on SpeakerDeck.

Best Practices from the Field: Performance Testing with AppDynamics

Recently I’ve been working with some of my larger customers on ways they can revamp their performance testing using AppDynamics. During this process, it became evident that performance testing with an APM tool is much different than performance testing without one. Along the way we uncovered a few best practices that I don’t see very often, even among more sophisticated users of APM. I think these principles apply whether you’re a household name testing a massive web application or a startup just getting off the ground.

Performance testing is a very broad category, so for this post I’ll narrow it down to a few areas:

  • Maintaining current performance baselines
  • Optimizing performance and increasing throughput of the application

Maintaining Status Quo

Stop me if you’ve heard this one before: Every release after the build, the performance team steps in and runs their suite of tests. Their test harness spits out a couple of numbers. Based on what they’ve seen before, the build gets a thumbs up or down.

That’s great: you know whether your static use case improved or degraded. Now let’s step back. Say this application is made up of more than ten application tiers, each with a multitude of potential bottlenecks and knobs to turn to optimize performance. What that process leaves unanswered is: what changed, and how do I fix it?

This instance came out of a real-life scenario. Starting from a process that resembled the one I just described, we did a few pretty simple things that produced some quick, dramatic improvements. First, we installed AppDynamics in their test environment and ran a test cycle.

From that information, we created a baseline for that time period.


That allowed us to monitor and baseline not just average response time, but much more granular things like backend performance, JVM health, specific Business Transactions, etc.

We then set up health rules for key metrics across those baselines. Now, rather than just relying on the test suite’s metrics, we can test for performance deviations in depth.
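To make the idea concrete, here is a minimal sketch of baselining a metric and flagging deviations. The three-standard-deviations threshold and the sample data are illustrative, not AppDynamics’ actual baselining algorithm:

```python
import statistics

def build_baseline(samples):
    """Compute a simple baseline (mean and standard deviation) from
    historical response-time samples, in milliseconds."""
    return statistics.mean(samples), statistics.stdev(samples)

def violates_health_rule(value, baseline, num_stdevs=3):
    """Flag a metric value that deviates more than num_stdevs standard
    deviations above its baseline mean -- the same idea a health rule
    applies to a baselined metric."""
    mean, stdev = baseline
    return value > mean + num_stdevs * stdev

# Response times (ms) for one Business Transaction from previous test cycles.
history = [120, 132, 128, 119, 125, 130, 127, 122]
baseline = build_baseline(history)

print(violates_health_rule(126, baseline))  # within the baseline: False
print(violates_health_rule(210, baseline))  # significant degradation: True
```

The same check can run per Business Transaction, per backend, or per JVM metric, which is what makes baselined health rules so much richer than a single average response-time number.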

Lastly, we automated this process. Performance testing was already part of the CI/CD pipeline, so we attached alerts to each of these health rules; now the team is notified any time build performance degrades and can take action. Alternatively, we could have used these health rules to fail the build automatically via the REST API or actions from the controller.
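As a sketch of what that automated gate can look like: the payload shape below is hypothetical (a real integration would query the controller’s REST API for health-rule violations), but the pass/fail decision logic is the interesting part:

```python
# Violations as they might be returned by a controller's health-rule
# violations endpoint -- this payload shape is illustrative, not the
# actual API schema.
violations = [
    {"rule": "Checkout response time above baseline", "severity": "WARNING"},
    {"rule": "JVM heap utilization critical", "severity": "CRITICAL"},
]

def build_should_fail(violations, blocking=("CRITICAL",)):
    """Fail the build only on blocking severities, so warnings can
    surface in reports without stopping the pipeline."""
    return any(v["severity"] in blocking for v in violations)

print("FAIL" if build_should_fail(violations) else "PASS")
```

Keeping the gate logic this simple makes it easy to tune which severities block a release as the team gains confidence in its baselines.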

Optimizing Performance

Once we had reasonable assurance that application performance was stable, that freed up time to go attack performance problems. I see most performance teams doing this, but what’s lacking is an easy analysis of what’s causing those performance problems. Using AppDynamics here allowed us to iterate much more quickly than a traditional test/fix model.

From the same environment, we were able to turn up the heat a little with load. Each time we did this under different scenarios, we identified bottlenecks in the environment: JDBC connection pools, heap usage, JVM parameters, inefficient caching protocols, and so on. During a concentrated effort over the course of two weeks, we found over 20 significant performance improvements and improved throughput of the system by over 40% with that test suite. Arguably, these would have been very challenging to find without deep visibility into the application under load.


What it All Means

While these techniques seem basic, I see a lot of test teams hesitant to engage in this way, potentially foregoing an opportunity to drive significant value. Hopefully some of these strategies provide new insight. Happy testing!

Agile Performance Testing – Proactively Managing Performance

Just in case you haven’t heard, Waterfall is out and Agile is in.  For organizations that thrive on innovation, successful agile development and continuous deployment processes are paramount to reducing go-to-market time, fast-tracking product enhancements, and quickly resolving defects.

Executed successfully, with the right team in place, Agile practices should result in higher functional product quality.  Operating in small, focused teams that work well-defined sprints with clearly groomed stories is ideal for early QA involvement and parallel test planning and execution.

But how do you manage non-functional performance quality in an Agile model?  The reality is that traditional performance engineering and testing is often best performed over longer periods of time; workload characterization, capacity planning, script development, test user creation, test data development, multi-day soak tests, and more are not always easily adaptable into 2-week, or shorter, sprints.  And the high velocity of development change often causes continuous, and sometimes large, ripples that disrupt a team’s ability to keep up with these activities; anyone ever had a data model change break their test dataset?

Before joining AppDynamics I faced this exact scenario as the Lead Performance Engineer for PayPal’s Java Middleware team.  PayPal was undergoing an Agile transformation, and our small team of historically matrix-aligned specialty engineers was challenged to adapt.

Here are my best practices and lessons learned, sometimes the hard way, for adapting performance-engineering practices into an agile development model:

  1. Fully integrate yourself into the Sprint team, immediately.  My first big success at PayPal was the day I had my desk moved to sit in the middle of the Dev team.  I joined the water cooler talk, attended every standup, shot nerf missiles across the room, and wrote and groomed stories as a core part of the scrum team.  Performance awareness, practices, and results organically increased because performance was a well-represented function within the team rather than an afterthought farmed out to a remote organization.
  2. Build multiple performance and stress test scenarios with distinct goals and execution schedules.  Plan for longer soak and stress tests as part of the release process, but have one or more per-sprint, and even nightly, performance tests that can be continually executed to proactively measure performance, and identify defects as they are introduced.  Consider it your mission to quantify the performance impact of a code change.
  3. Extend your Continuous Integration (CI) pipelines to include performance testing.  At PayPal, I built custom integrations between Jenkins and JMeter to automate test execution and report generation.  Our pipelines triggered automated nightly regressions on development branches and within a well-understood platform where QA and development could parameterize workload, kick-off a performance test and interpret a test report.  Unless you like working 18-hour days, I can’t overstate the importance of building integrations into tools that are already or easily adopted by the broader team.  If you’re using Jenkins, you might take a look at the Jenkins Performance Plugin.
  4. Define Key Performance Indicators (KPIs).  In an Agile model you should expect smaller scoped tests, executed at a higher frequency.  It’s critical to have a set of KPIs the group understands, and buys into, so you can quickly look at a test and interpret if a) things look good, or b) something funky happened and additional investigation is needed. Some organizations have clearly defined non-functional criteria, or SLAs, and many don’t. Be Agile with your KPIs, and refine them over time. Here are some of the KPIs we commonly evaluated:
  • Percentile Response-Time – 90th, 95th, 99th – Summary and Per-Transaction
  • Throughput – Summary and Per-Transaction
  • Garbage Collector (GC) Performance – % non-paused time, number of collections (major and minor), and collection times.
  • Heap Utilization – Young Generation and Tenured Space
  • Resource Pools – Connection Pools and Thread Pools
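Most load testing tools report these percentiles for you, but when you are post-processing raw timings yourself, a nearest-rank percentile is easy to compute. A minimal sketch with illustrative latencies:

```python
def percentile(samples, pct):
    """Return the pct-th percentile using nearest-rank on sorted samples."""
    ordered = sorted(samples)
    # Nearest-rank: ceil(pct/100 * N), converted to a 0-based index.
    rank = max(1, -(-len(ordered) * pct // 100))  # ceiling division
    return ordered[rank - 1]

# Per-transaction response times (ms) collected from a test run.
latencies = [101, 98, 110, 250, 95, 102, 99, 97, 480, 105]

for pct in (90, 95, 99):
    print(f"p{pct}: {percentile(latencies, pct)} ms")
```

Note how the p99 here (480 ms) tells a very different story than the mean would; that long tail is exactly why percentile KPIs belong in the list above.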

  5. Invest in best-of-breed tooling.  With higher velocity code change and release schedules, it’s essential to have deep visibility into your performance environment.  Embrace tooling, but consider these factors impacted by Agile development:

  • Can your toolset automatically, and continuously discover, map and diagnose failures in a distributed system without asking you to configure what methods should be monitored?  In an Agile team the code base is constantly shifting.  If you have to configure method-level monitoring, you’ll spend significant time maintaining tooling, rather than solving problems.
  • Can the solution be enabled out of the box under heavy loads?  If the overhead of your tooling degrades performance under high loads, it’s ineffective in a performance environment.  Don’t let your performance monitoring become your performance problem.

When a vendor recommends you reduce monitoring coverage to support load testing, consider a) the effectiveness of a tool that won’t provide 100% visibility, and b) how much time will be spent repeatedly reconfiguring monitoring for optimal overhead.
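Circling back to the Jenkins and JMeter integration in tip 3: a nightly pipeline needs to turn raw test results into a pass/fail signal. Here is one way to post-process JMeter’s XML (JTL) results with the standard library; the sample is trimmed to the commonly used attributes (t = elapsed ms, s = success flag, lb = sampler label):

```python
import xml.etree.ElementTree as ET

# A trimmed JTL sample in JMeter's XML result format.
JTL = """<testResults version="1.2">
  <httpSample t="187" s="true"  lb="homepage" rc="200"/>
  <httpSample t="852" s="true"  lb="checkout" rc="200"/>
  <httpSample t="94"  s="false" lb="checkout" rc="500"/>
</testResults>"""

root = ET.fromstring(JTL)
samples = root.findall("httpSample")
elapsed = [int(s.get("t")) for s in samples]
errors = sum(1 for s in samples if s.get("s") != "true")

print(f"samples={len(samples)} avg={sum(elapsed) / len(elapsed):.0f}ms errors={errors}")
```

A script like this, run as a post-build step, is enough to trend averages per label and fail the build on any error count above zero.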

Performance testing within an Agile organization challenges us as engineers to adapt to a high velocity of change.  Applying best practices gives us the opportunity to work as part of the development team to proactively identify and diagnose performance defects as code changes are introduced.  Because the fastest way to resolve a defect in production is to fix it before it gets there.

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics today.

AppDynamics goes to QCon San Francisco

AppDynamics is at QCon San Francisco this week for another stellar event from the folks at InfoQ. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. If you are in the area this week stop by our booth and say hello!

I presented the Performance Testing Crash Course highlighting how to capacity plan and load test your applications to guarantee a smooth launch.

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics Pro today.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Performance testing tools explained: The client side

In my last post I showed different approaches for load testing the server side. In this post I will highlight some tools I use to monitor the performance of the client side.

In modern JavaScript-intensive web applications, users spend more time waiting on client-side rendering than on server-side processing. The reality is that for all the effort that goes into optimizing the server side, even more effort should go into optimizing the client side. End user monitoring has never been so important.

Why does performance matter?

Just to recap, statistics from major internet companies tell a consistent story about the business impact of performance: faster pages drive higher engagement, conversion, and revenue.

As you iterate through the software development cycle it is important to measure application performance on both the server side and the client side and understand the impact of every release. Here are some tools you can use to test the performance of the entire end user experience:

Google PageSpeed Insights

Google PageSpeed Insights provides actionable advice for improving the performance of client-side web applications. PageSpeed Insights analyzes the content of a web page, then generates suggestions to make that page faster. Reducing page load times can reduce bounce rates and increase conversion rates. The service is available as part of the Google Chrome Developer Tools, as a browser extension, as a web service, and as extensions for the Apache and Nginx web servers.

Use the Google PageSpeed Insights API to integrate client-side optimizations into your continuous integration setup:

curl "https://www.googleapis.com/pagespeedonline/v1/runPagespeed?url=http://dustinwhittle.com/&key=xxx"
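To act on the response in CI, you can parse the returned JSON and enforce a score budget. The response below is trimmed and simplified for illustration (the real payload contains many more fields):

```python
import json

# A trimmed response in the shape the v1 runPagespeed endpoint returns;
# fields and values here are simplified for illustration.
response = json.loads("""{
  "id": "http://dustinwhittle.com/",
  "score": 78,
  "formattedResults": {
    "ruleResults": {
      "EnableGzipCompression": {"localizedRuleName": "Enable compression", "ruleImpact": 6.5},
      "OptimizeImages": {"localizedRuleName": "Optimize images", "ruleImpact": 2.0}
    }
  }
}""")

PERFORMANCE_BUDGET = 80  # fail the build below this PageSpeed score

rules = response["formattedResults"]["ruleResults"].values()
worst = max(rules, key=lambda r: r["ruleImpact"])

print(f"score={response['score']} worst offender: {worst['localizedRuleName']}")
print("PASS" if response["score"] >= PERFORMANCE_BUDGET else "FAIL")
```

Sorting rule results by impact gives the team an ordered to-do list alongside the pass/fail signal.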

WBench

WBench is a tool that uses the HTML5 navigation timing API to benchmark end user load times for websites.

1) Install WBench:

gem install wbench

2) Run WBench:

wbench http://dustinwhittle.com/
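The Navigation Timing metrics WBench collects are simple deltas between browser timestamps. A sketch with illustrative values:

```python
# Millisecond timestamps in the shape of the HTML5 Navigation Timing API
# (the same data WBench collects); the values are illustrative.
timing = {
    "navigationStart": 0,
    "responseStart": 180,
    "domContentLoadedEventEnd": 900,
    "loadEventEnd": 1450,
}

# Time to first byte, DOM-ready, and full page load, all relative
# to the start of navigation.
ttfb = timing["responseStart"] - timing["navigationStart"]
dom_ready = timing["domContentLoadedEventEnd"] - timing["navigationStart"]
page_load = timing["loadEventEnd"] - timing["navigationStart"]

print(f"ttfb={ttfb}ms dom_ready={dom_ready}ms load={page_load}ms")
```

Tracking these three deltas per release is a cheap way to spot client-side regressions before users do.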

WebPageTest.org

WebPageTest.org enables anyone to test the client-side performance on a range of browsers from anywhere in the world for free. This service is great and worth every penny I didn’t pay for it. Not only does it provide a range of mobile/desktop browsers and locations, but it also shows a waterfall timeline and a video of the rendering.


AppDynamics

With AppDynamics Pro you get in-depth performance metrics to evaluate the scalability and performance of your application. Use the AppDynamics Pro Metrics Browser to track end user experience times and errors over the duration of the load tests:

With AppDynamics Pro End User Experience Dashboard you get visibility into both the server-side and the client-side:

Use AppDynamics Pro to compare multiple application releases to see the change in performance and stability:


In my next post in this series I will cover load testing tools for native mobile applications. Get started with AppDynamics Pro today for in-depth application performance management.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Performance testing tools explained: the server side

The performance of your application affects your business more than you might think. Top engineering organizations think of performance not as a nice-to-have, but as a crucial feature of their product. Those organizations understand that performance has a direct impact on user experience and, ultimately, their bottom line. Unfortunately, most engineering teams do not regularly test the performance and scalability of their infrastructure. In my last post on performance testing I highlighted a long list of tools that can be used for load testing. In this post we will walk through three of them: Siege, Multi-Mechanize, and Bees with Machine Guns. I will show simple examples to get started performance testing your web applications regardless of language.

Why does performance matter?

Statistics from major internet companies consistently show the business impact of performance: faster pages drive higher engagement, conversion, and revenue.

As you iterate through the software development cycle it is important to measure application performance and understand the impact of every release. As your production infrastructure evolves you should also track the impact of package and operating system upgrades. Here are some tools you can use to load test your production applications:

Apache Bench

Apache Bench is a simple tool for load testing applications, provided by default with the Apache httpd server. Here is a simple example that load tests example.com with 10 concurrent users for 10 seconds.

Install Apache Bench:

apt-get install apache2-utils

Benchmark the web server with 10 concurrent connections for 10 seconds:

ab -c 10 -t 10 -k http://example.com/

Benchmarking example.com (be patient)
Finished 286 requests

Server Software:        nginx
Server Hostname:        example.com
Server Port:            80

Document Path:          /
Document Length:        6642 bytes

Concurrency Level:      10
Time taken for tests:   10.042 seconds
Complete requests:      286
Failed requests:        0
Write errors:           0
Keep-Alive requests:    0
Total transferred:      2080364 bytes
HTML transferred:       1899612 bytes
Requests per second:    28.48 [#/sec] (mean)
Time per request:       351.133 [ms] (mean)
Time per request:       35.113 [ms] (mean, across all concurrent requests)
Transfer rate:          202.30 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        9   20  14.2     17     148
Processing:   117  325  47.8    323     574
Waiting:      112  317  46.3    314     561
Total:        140  346  46.0    341     589

Percentage of the requests served within a certain time (ms)
  50%    341
  66%    356
  75%    366
  80%    372
  90%    388
  95%    408
  98%    463
  99%    507
 100%    589 (longest request)
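To track these numbers across builds, you can scrape ab’s plain-text report. A small sketch using the summary above (trimmed):

```python
import re

# The summary section of the ab run shown above (trimmed).
AB_OUTPUT = """\
Complete requests:      286
Failed requests:        0
Requests per second:    28.48 [#/sec] (mean)
Time per request:       351.133 [ms] (mean)
"""

def ab_metric(name, text):
    """Pull a numeric metric out of ab's plain-text report by its label."""
    match = re.search(rf"^{re.escape(name)}:\s+([\d.]+)", text, re.MULTILINE)
    return float(match.group(1)) if match else None

rps = ab_metric("Requests per second", AB_OUTPUT)
failed = ab_metric("Failed requests", AB_OUTPUT)
print(f"rps={rps} failed={failed}")
```

Feed the extracted numbers to whatever trends your builds (a CI plot, a spreadsheet, a time-series store) and regressions become obvious.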

 

Siege + Sproxy

Personally, I prefer Siege to Apache Bench for simple load testing as it is a bit more flexible.

Install Siege:

apt-get install siege

Siege a web server with 10 concurrent connections for 10 seconds:

siege -c 10 -b -t 10S http://example.com/

** SIEGE 2.72
** Preparing 10 concurrent users for battle.
The server is now under siege...
Lifting the server siege...      done.

Transactions:              263 hits
Availability:              100.00 %
Elapsed time:              9.36 secs
Data transferred:          0.35 MB
Response time:             0.35 secs
Transaction rate:          28.10 trans/sec
Throughput:                0.04 MB/sec
Concurrency:               9.82
Successful transactions:   263
Failed transactions:       0
Longest transaction:       0.54
Shortest transaction:      0.19

 

More often than not you want to load test an entire site and not just a single endpoint. A common approach is to crawl the entire application to discover all URLs and then load test a sample of them. The makers of Siege also make sproxy, which, in conjunction with wget, enables you to crawl an entire site through a proxy and record all of the URLs accessed. It makes for an easy way to compile a list of every possible URL in your application.

 

1) Start sproxy, directing it to write all of the URLs it sees to urls.txt:

sproxy -o ./urls.txt

2) Use wget through sproxy to crawl all the URLs of example.com:

wget -r -o verbose.txt -l 0 -t 1 --spider -w 1 -e robots=on -e "http_proxy = http://127.0.0.1:9001" "http://example.com/"

3) Sort and de-duplicate the list of URLs from our application:

sort -u -o urls.txt urls.txt

4) Siege the list of urls with 100 concurrent users for 3 minutes:

siege -v -c 100 -i -t 3M -f urls.txt

** SIEGE 2.72
** Preparing 100 concurrent users for battle.
The server is now under siege...
Lifting the server siege...      done.

Transactions:              2630 hits
Availability:              100.00 %
Elapsed time:              90.36 secs
Data transferred:          3.51 MB
Response time:             0.35 secs
Transaction rate:          88.10 trans/sec
Throughput:                0.28 MB/sec
Concurrency:               9.82
Successful transactions:   2630
Failed transactions:       0
Longest transaction:       0.54
Shortest transaction:      0.19
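To turn a run like this into a CI gate, compare the results against a performance budget. The thresholds below are illustrative; pick numbers that match your own SLAs:

```python
# Results as reported by siege above, plus a budget the build must meet.
results = {"availability": 100.00, "response_time": 0.35, "transaction_rate": 88.10}

budget = {
    "availability": 99.9,      # minimum percent of successful transactions
    "response_time": 0.50,     # maximum mean response time, in seconds
    "transaction_rate": 50.0,  # minimum sustained transactions per second
}

failures = []
if results["availability"] < budget["availability"]:
    failures.append("availability")
if results["response_time"] > budget["response_time"]:
    failures.append("response_time")
if results["transaction_rate"] < budget["transaction_rate"]:
    failures.append("transaction_rate")

print("PASS" if not failures else "FAIL: " + ", ".join(failures))
```

Printing which budget line failed, rather than a bare exit code, saves a round of debugging when the nightly run goes red.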

 

Multi-Mechanize

When testing web applications sometimes you need to write test scripts that simulate virtual user activity against a site/service/api. Multi-Mechanize is an open source framework for performance and load testing. It runs concurrent Python scripts to generate load (synthetic transactions) against a remote site or service. Multi-Mechanize is most commonly used for web performance and scalability testing, but can be used to generate workload against any remote API accessible from Python. Test output reports are saved as HTML or JMeter-compatible XML.

 

1) Install Multi-Mechanize:

pip install multi-mechanize

2) Bootstrap a new Multi-Mechanize project; it generates a stub virtual user script you can edit:

multimech-newproject demo

import mechanize
import time

class Transaction(object):
    def run(self):
        # Each virtual user drives its own mechanize browser.
        br = mechanize.Browser()
        br.set_handle_robots(False)

        # Time the homepage request and record it as a custom timer
        # so it shows up in the generated reports.
        start_timer = time.time()
        resp = br.open('http://www.example.com/')
        resp.read()
        latency = time.time() - start_timer

        self.custom_timers['homepage'] = latency

        # Fail the transaction on an unexpected status code or content.
        assert (resp.code == 200)
        assert ('Example' in resp.get_data())

3) Run the Multi-Mechanize project and review the generated reports:

multimech-run demo


 

Bees with Machine Guns

In the real world you need to test your production infrastructure with realistic traffic. To generate load that realistically represents production, you need more than one machine. The Chicago Tribune has invested in helping the world solve this problem by creating Bees with Machine Guns. Not only does it have an epic name, it is also incredibly useful for load testing from many cloud instances via Amazon Web Services. Bees with Machine Guns is a utility for arming (creating) many bees (micro EC2 instances) to attack (load test) targets (web applications).

1) Install Bees with Machine Guns:

pip install beeswithmachineguns

 

2) Configure Amazon Web Services credentials in ~/.boto:

[Credentials]

aws_access_key_id=xxx
aws_secret_access_key=xxx

[Boto]

ec2_region_name = us-west-2
ec2_region_endpoint = ec2.us-west-2.amazonaws.com

 

3) Create 2 EC2 instances in the default security group and the us-west-2b availability zone, using the ami-bc05898c image and logging in with the ec2-user user name:

bees up -s 2 -g default -z us-west-2b -i ami-bc05898c -k aws-us-west-2 -l ec2-user

Connecting to the hive.
Attempting to call up 2 bees.
Waiting for bees to load their machine guns...
.
.
.
.
Bee i-3828400c is ready for the attack.
Bee i-3928400d is ready for the attack.
The swarm has assembled 2 bees.

 

4) Check whether the EC2 instances are ready for battle:

bees report

Read 2 bees from the roster.
Bee i-3828400c: running @ 54.212.22.176
Bee i-3928400d: running @ 50.112.6.191

 

5) Attack a URL once the EC2 instances are ready for battle:

bees attack -n 100000 -c 1000 -u http://example.com/

Read 2 bees from the roster.
Connecting to the hive.
Assembling bees.
Each of 2 bees will fire 50000 rounds, 125 at a time.
Stinging URL so it will be cached for the attack.
Organizing the swarm.
Bee 0 is joining the swarm.
Bee 1 is joining the swarm.
Bee 0 is firing his machine gun. Bang bang!
Bee 1 is firing his machine gun. Bang bang!
Bee 1 is out of ammo.
Bee 0 is out of ammo.
Offensive complete.
     Complete requests:   100000
     Requests per second: 1067.110000 [#/sec] (mean)
     Time per request:    278.348000 [ms] (mean)
     50% response time:   47.500000 [ms] (mean)
     90% response time:   114.000000 [ms] (mean)
Mission Assessment: Target crushed bee offensive.
The swarm is awaiting new orders.

6) Spin down all the EC2 instances

bees down

 

AppDynamics

With AppDynamics Pro you get in-depth performance metrics to evaluate the scalability and performance of your application. Use the AppDynamics Pro Metrics Browser to track key response times and errors over the duration of the load tests:


Use the AppDynamics Scalability Analysis Report to evaluate the performance of your application against load tests:


Use AppDynamics Pro to compare multiple application releases to see the change in performance and stability:


Get started with AppDynamics Pro today for in-depth application performance management.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Tools of the trade for performance and load testing

Your application is fast and scalable, right? How do you know? How often do you run performance or load tests? In this post I will give an overview of the tools of the trade for performance and load testing web applications.

Open-source performance testing tools

These tools allow you to load test your application for free. My preferred tool is Bees with Machine Guns — not just because of the epic name, but primarily because it uses Amazon’s EC2 to generate high levels of concurrency with ease.

  • Bees with Machine Guns – A utility for arming (creating) many bees (micro EC2 instances) to attack (load test) targets (web applications).
  • MultiMechanize – Multi-Mechanize is an open source framework for performance and load testing. It runs concurrent Python scripts to generate load (synthetic transactions) against a remote site or service. Multi-Mechanize is most commonly used for web performance and scalability testing, but can be used to generate workload against any remote API accessible from Python.
  • Siege – Siege is an http load testing and benchmarking utility. It was designed to let web developers measure their code under duress, to see how it will stand up to load on the internet. Siege supports basic authentication, cookies, HTTP and HTTPS protocols. It lets its user hit a web server with a configurable number of simulated web browsers. Those browsers place the server “under siege.”
  • httperf – Httperf is a tool for measuring web server performance. It provides a flexible facility for generating various HTTP workloads and for measuring server performance. The focus of httperf is not on implementing one particular benchmark but on providing a robust, high-performance tool that facilitates the construction of both micro- and macro-level benchmarks. The three distinguishing characteristics of httperf are its robustness, including the ability to generate and sustain server overload; support for the HTTP/1.1 and SSL protocols; and its extensibility to new workload generators and performance measurements.
  • Apache Bench – AB is a tool for benchmarking your Apache HTTP server. It is designed to give you an impression of how Apache performs.
  • JMeter – Apache JMeter may be used to test performance both on static and dynamic resources (files, Servlets, Perl scripts, Java Objects, databases and queries, FTP servers and more). It can be used to simulate a heavy load on a server, network or object to test its strength or to analyze overall performance under different load types. You can use it to make a graphical analysis of performance or to test your server/script/object behavior under heavy concurrent load.
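To see what tools like ab and Siege do under the hood, here is a toy concurrent load tester built on nothing but the Python standard library. It is a sketch, not a replacement for the tools above; the URL and worker counts are placeholders:

```python
import threading
import time
import urllib.request
from statistics import mean, quantiles

def hit(url, timings, n):
    """Issue n sequential GETs and record each latency (seconds)."""
    for _ in range(n):
        start = time.perf_counter()
        try:
            urllib.request.urlopen(url, timeout=10).read()
        except OSError:
            continue  # count only successful requests
        timings.append(time.perf_counter() - start)

def load_test(url, concurrency=10, requests_per_worker=20):
    """Run `concurrency` worker threads and return collected latencies."""
    timings = []  # list.append is atomic, so threads can share it
    workers = [
        threading.Thread(target=hit, args=(url, timings, requests_per_worker))
        for _ in range(concurrency)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return timings

if __name__ == "__main__":
    # Placeholder target; point this at a test environment, never production.
    t = load_test("http://localhost:8000/", concurrency=5)
    if t:
        cuts = quantiles(t, n=10)  # deciles: cuts[4] = p50, cuts[8] = p90
        print(f"{len(t)} ok, mean {mean(t)*1000:.1f} ms, "
              f"p50 {cuts[4]*1000:.1f} ms, p90 {cuts[8]*1000:.1f} ms")
```

Report percentiles, not just the mean: a handful of slow outliers can drag the mean far above what most users actually experience.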

Performance testing tools as a service

Through these services you can build, execute, and analyze performance tests.

  • Apica Load Test – Cloud-based load testing for web and mobile applications
  • Blitz.io – Blitz allows you to continuously monitor your app 24×7 from around the world. You can emulate a single user or hundreds of users all day, every day, and be notified immediately if anything goes wrong.
  • Soasta – Build, execute, and analyze performance tests on a single, powerful, intuitive platform.
  • Blazemeter – BlazeMeter is a self-service performance & load testing cloud, 100% JMeter-compatible. Easily run tests of 30k, 50k, 80k or more concurrent users, on demand.

Performance testing on the client side

The best place to get started is at Google with the Web Performance best practices.

  • Google PageSpeed Insights – PageSpeed Insights analyzes the content of a web page, then generates suggestions to make that page faster. Reducing page load times can reduce bounce rates and increase conversion rates.
  • Google ngx_pagespeed – ngx_pagespeed speeds up your site and reduces page load time. This open-source nginx server module automatically applies web performance best practices to pages, and associated assets (CSS, JavaScript, images) without requiring that you modify your existing content or workflow.
  • Google mod_pagespeed – mod_pagespeed speeds up your site and reduces page load time. This open-source Apache HTTP server module automatically applies web performance best practices to pages, and associated assets (CSS, JavaScript, images) without requiring that you modify your existing content or workflow.
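As a rough illustration of how little configuration the pagespeed modules need, enabling ngx_pagespeed in an nginx server block looks approximately like this (the cache path is a placeholder; check the module's documentation for the handler locations your build requires):

```nginx
server {
    listen 80;

    # Turn the module on and give it a writable cache directory.
    pagespeed on;
    pagespeed FileCachePath /var/ngx_pagespeed_cache;

    # Ensure requests for pagespeed-optimized resources are handled
    # by the module rather than falling through to other locations.
    location ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+" {
        add_header "" "";
    }
    location ~ "^/pagespeed_static/" { }
    location ~ "^/ngx_pagespeed_beacon$" { }
}
```

The module then rewrites pages and assets (minification, image recompression, cache extension) on the fly, which is why no changes to your content or workflow are required.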

Web acceleration services

Through a simple DNS change, your website’s traffic is routed through these services and your content is optimized and cached globally for better performance. This is an easy way to improve performance with minimal effort.

  • Yottaa – All-in-one web optimization solution delivers speed, scale, security and actionable insight for any website.
  • Cloudflare – Offers free and commercial, cloud-based services to help secure and accelerate websites.
  • Torbit – Torbit helps you accurately measure your website’s performance and quantify how speed impacts your revenue.
  • Incapsula – Incapsula offers state-of-the-art security and performance to websites of all sizes.

Recommended reading

A great book to learn about capacity planning and preparing for production traffic is The Art of Capacity Planning by John Allspaw, who wrote it based on his experience scaling Flickr.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Cloud Migration Tips #3: Plan to Fail

Planning to deploy or migrate an application to a cloud environment is a big deal. In my last post we discussed the value of using real business and IT requirements to drive the justification of using a cloud architecture. We also explored the importance of using monitoring information to understand your before and after picture of application performance and overall success.

In this post I am going to dive deeper into the planning phase. You can’t expect to throw a half-baked plan in place and just deal with problems as they pop up during an application migration. That will almost certainly result in frustration for the end users, the IT staff, and the business that relies upon the application.

In reality, at least 90% of your total project time should be dedicated to planning and at most 10% to the actual implementation. The big question is: “What are the most important aspects of the planning phase?” Here’s my top 10 list for cloud migration planning:

  1. Application Portfolio Rationalization – Let’s face reality for a moment… If you’re in a large enterprise you have multiple apps that perform a very similar business function at some level. Application Portfolio Rationalization is a method of discovering the overlap between your applications and consolidating where it makes sense. It’s like spring cleaning for your IT department. You need to get your house in order before you decide to start moving applications or you will just waste a lot of time and money moving duplicate business functionality across your portfolio.
  2. Business Justification and Goal Identification – If there is one thing I try to make clear in every blog post it is the fact that you need to justify your activities using business logic. If there is no business driver for a change then why make the change? Even very techie-like activities can be related back to business drivers.
    Example – Techie activity: quarterly server patching. Business driver: failure to patch exposes the business to the risk of being hacked, which could cause brand damage and loss of revenue.
    I included goal identification with business justification because your goals should align with the business drivers responsible for the change.
  3. Current State Architecture Assessment (App and Infra) – This task sounds simple but is really difficult for most companies. A current state architecture assessment is all about documenting the actual deployed application components, infrastructure components, and application dependencies. Why is this so difficult? Most enterprises have implemented a CMDB to try to document this information, but the CMDB is typically manually populated and updated. What happens in reality is that over time the CMDB is neglected as application and infrastructure changes occur. To solve this problem some companies have turned to automated discovery and dependency mapping tools. These tools are usually agentless: they log in to each server at pre-defined intervals, scan for processes, network connections, etc., and create a very detailed mapping that includes all persistent connections to and from each host, regardless of whether or not they are application related. The periodic scans also miss short-lived service calls between applications unless a scan happens to run at approximately the same time as the transient application call. An agent-based APM tool covers the gaps associated with these other methods.

    [Screenshot: automatically discovered application dependency map]

    How well do you know the current architecture and dependencies of your application?

  4. Current State Performance Assessment – Traditional monitoring metrics (CPU, memory, disk I/O, network I/O, etc.) will help you size your cloud environment but tell you nothing about actual application performance. The performance metrics that matter are end user response time, business transaction response time, external service response time, error and exception rates, and transaction throughput, with baselines for each. This is also a good time to make sure there are no glaring performance issues that you are going to promote into your cloud environment. It’s better to fix any known issues before you migrate, as the extra complexity of the cloud can amplify your application problems.

    [Screenshot: performance issues surfaced during the current state assessment]

    You don’t want to carry application problems into your new environment.

  5. Architectural Change Impact Assessment – Now that you know what your real application and infrastructure components are, you need to assess the impact of the differences between traditional and cloud architectures. Are there components that won’t work well (or at all) in a cloud architecture? Are code changes required to take advantage of the dynamic features available in your cloud of choice? You need to have a very good understanding of how your application works today and how you want it to work after migration, and plan accordingly.
  6. Problem Resolution Planning – Problem resolution planning is about making a commitment to your monitoring tools and strategy as a core layer of your overall application architecture. The number of potential points of failure increases dramatically from traditional to cloud environments due to increased virtualization and dynamic scaling. In highly distributed applications you need monitoring tools that will tell you exactly where problems are occurring, or you will spend too much time isolating the problem location. Make monitoring a part of your application deployment and support culture!
  7. Process Re-alignment – Just hearing the word “process” makes me cringe and have flashbacks to the giant, bloated, slow-moving enterprise environments that I used to call my workplace. The unfortunate truth is that we really do need solid processes if we want to maintain order and have any chance of managing a large environment in a sustainable fashion. Many of the traditional IT development and operations processes need to be modified when you migrate to the cloud, so you can’t overlook this task.
  8. Application re-development – The fruits of your Architectural Change Impact Assessment work will probably precipitate some level of development work within your application. Maybe only minor tweaks are required, maybe significant portions of your code need to change, maybe this application should never have been selected as a cloud migration candidate. If you need to change the application code you need to test it all over again and measure the performance.
  9. Application Functional and Performance Testing – No surprises here, after the code has been modified to function as intended with your cloud deployment it needs to be tested. APM tools really speed up the testing process since they show you the root of your performance problems down to the line of code level. If you rely only upon the output of your application testing suite your developers will spend hours trying to figure out what code to change instead of minutes fixing the problematic code.

    [Screenshot: transaction call graph]

    Know exactly where the problem is whether it is in your code or an external service.

  10. Training (New Processes and Technology) – With all of those new and/or modified processes and the new software required to support your cloud application, training is imperative. Never forget the “people” part of “people, process, technology”.
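The agentless discovery scans described in step 3 boil down to a simple idea: read the host's established TCP connections and turn them into a dependency map. Here is a minimal sketch that parses canned `ss`/`netstat`-style output (the hosts and ports are made up for illustration); real tools do far more, but this is the core:

```python
def parse_connections(ss_output):
    """Map local endpoint -> set of remote hosts from `ss -tn`-style lines.

    Expects lines like: 'ESTAB 0 0 10.0.0.5:8080 10.0.0.9:54321'.
    Only established connections are counted; transient states are skipped.
    """
    deps = {}
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        local, remote = fields[3], fields[4]
        remote_host = remote.rsplit(":", 1)[0]
        deps.setdefault(local, set()).add(remote_host)
    return deps

# Canned sample (hypothetical hosts/ports) standing in for `ss -tn` output.
sample = """\
ESTAB 0 0 10.0.0.5:8080 10.0.0.9:54321
ESTAB 0 0 10.0.0.5:8080 10.0.0.12:44210
ESTAB 0 0 10.0.0.5:39114 10.0.1.20:3306
TIME-WAIT 0 0 10.0.0.5:39115 10.0.1.20:3306
"""

deps = parse_connections(sample)
print(deps["10.0.0.5:39114"])  # {'10.0.1.20'} -> a database dependency on port 3306
```

Note the built-in limitation the post calls out: a snapshot like this only sees connections open at scan time, so short-lived service calls between scans are invisible, which is exactly the gap an agent-based APM tool fills.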

There’s a lot more that really goes into planning a cloud migration but these major tasks are the big ones in my book. Give these 10 tasks the attention they deserve and the odds will shift in your favor for a successful cloud migration. Next week we’ll talk about important work that should happen after your application gets migrated.