Scaling Your Application Efficiently – Horizontal or Vertical?

Anyone deploying an application in production probably has some experience with scaling to meet increased demand. A generation ago, virtualization made scaling your application as simple as increasing your instance count or size. Now, with the advent of the cloud, you can scale toward theoretical infinity. Maybe you’ve even set up some auto-scaling based on underlying system metrics such as CPU, heap size, or thread count. The question changes from “Can I scale my environment to meet demand?” (if you add enough computing resources, you probably can) to “How can I efficiently scale my infrastructure to accommodate my traffic, and, if I’m lucky, maybe even scale down when needed?” This is a problem I run into almost every day working with DevOps organizations.

If your application environment looks like this (if so, I’d love to be you):

[Screenshot]

You can probably work your way through to the solution, eventually. Run a bunch of load tests, find a sweet spot of machine size based on the performance under the test parameters, and bake it into your production infrastructure. Add more instances to each tier when your CPU usage gets high. Easy. What if your application looks like this?

[Screenshot]

What about when your application code changes? What if adding more instances no longer fixes your problem? (Those do cost money, and the bill adds up quickly…)

The complexity of the problem is that CPU bounding is only one aspect — most applications encounter a variety of bounds as they scale and they vary at each tier. CPU, memory, heap size, thread count, database connection pool, queue depth, etc. come into play from an infrastructure perspective. Ultimately, the problem breaks down to response time: how do I make each transaction as performant as possible while minimizing overhead?

The holy grail here is the ability to determine dynamically how to size my app server instances (right size), how many to create at each level (right scale) and when to create them (right time). Other factors come into play as well such as supporting infrastructure, code issues, and the database — but let’s leave that for another day.

Let me offer a simple example. This came into play recently when working with a customer analyzing their production environment. Looking at the application tier under light-to-normal load, it was difficult to determine which factors to scale on; we ended up with this:

[Screenshot]

Response time actually decreases toward the beginning of the curve (possibly a caching effect?). But if you look at the application under heavier load, things get more interesting. All of a sudden you can start to see how performance is affected as demand on the application increases:

[Screenshot]

Looking at a period of heavy load in this specific application, hardware resources are actually still somewhat lightly utilized, even though response time starts to spike:

[Screenshots]

In this application, it appears that response time is actually more closely correlated with garbage collection than any specific hardware bound.

While there is clearly some future effort here to look at garbage collection optimization, in this case optimizing best fit actually comes down to determining desired response time, maximum load for a given instance size maintaining that response time, and cost for that instance size. In a cloud scenario, instance cost is typically fairly easy to determine. In this case, you can normalize this by calculating volume/(instance cost) at various instance sizes to determine a better sweet spot for vertical scale.
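One way to make that vertical-scaling sweet spot concrete is to normalize throughput by cost, as described above. Here is a minimal Python sketch; the instance sizes, sustainable volumes, and prices are invented for illustration, not measured benchmarks:

```python
# Hypothetical profiles: max requests/min sustained at the desired response
# time, and hourly cost, per instance size. Real numbers come from load tests.
INSTANCE_PROFILES = {
    "small":  (1200, 0.10),
    "medium": (2600, 0.20),
    "large":  (4800, 0.40),
    "xlarge": (8000, 0.80),
}

def throughput_per_dollar(profiles):
    """Rank instance sizes by requests/min per dollar/hour, best first."""
    scores = {size: volume / cost for size, (volume, cost) in profiles.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

ranked = throughput_per_dollar(INSTANCE_PROFILES)
```

With these made-up numbers the mid-size instance wins on volume per dollar, which is exactly the kind of non-linear result that makes this normalization worth doing before committing to an instance size.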

Horizontal scale will vary somewhat by environment, but this tends to be more linear — i.e. each additional instance adds incremental bandwidth to the application.

There’s still quite a bit more room for analysis of this problem, like resource cost for individual transactions, optimal response time vs. cost to achieve that response time, synchronous vs. asynchronous design trade-offs, etc. but these will vary based on the specific environment.

Using some of these performance indicators from the application itself (garbage collection, response time, connection pools, etc.) rather than infrastructure metrics, we were able to quickly and intelligently right-size the cloud instances under the current application release, as well as identify several areas for code optimization to improve their overall efficiency. While the code optimization is a forward-looking project, the scaling question was driven by an impending near-term event that needed to be addressed. Answering the question in this way allowed us both to meet the near-term deadline and to remain flexible enough to accommodate any forthcoming optimizations or application changes.

Interested to see how you can scale your environment? Check out a FREE trial now!

Cloud Auto Scaling using AppDynamics

Are your applications moving to an elastic cloud infrastructure? The question is no longer if, but when – whether that is a public cloud, a private cloud, or a hybrid cloud.

Classic computing capacity models clearly indicate that over-provisioning is essential to keep up with peak loads of traffic while the over-provisioned capacity is largely left under-utilized during non-peak periods. Such over-provisioning and under-utilization can be avoided by moving to an elastic cloud-computing capacity model where just-in-time provisioning and deprovisioning can be achieved by automatically scaling up and down on-demand.

(Source: http://blog.maartenballiauw.be)
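The arithmetic behind that claim is straightforward. A small Python sketch comparing the two capacity models, using an invented daily traffic profile and a hypothetical hourly price:

```python
PRICE_PER_INSTANCE_HOUR = 0.10  # hypothetical on-demand price

# Instances actually needed in each hour of a day: quiet overnight, 6-hour peak.
hourly_demand = [4] * 18 + [20] * 6

# Classic model: provision for the peak, around the clock.
static_cost = max(hourly_demand) * len(hourly_demand) * PRICE_PER_INSTANCE_HOUR

# Elastic model: provision just-in-time, hour by hour.
elastic_cost = sum(hourly_demand) * PRICE_PER_INSTANCE_HOUR

savings_fraction = 1 - elastic_cost / static_cost
```

With this (made-up) profile, elastic provisioning cuts the bill by 60%; your own savings depend entirely on how peaky your traffic is.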

Cloud auto-scaling decisions are often made based on infrastructure metrics such as CPU Utilization. However, in a cloud or virtualized environment, infrastructure metrics may not be reliable enough for making auto-scaling decisions. Auto-scaling decisions based on application metrics, such as request-queue depth or requests per minute, are much more useful since the application is intimately familiar with conditions such as:

  • When the existing number of compute instances cannot handle the incoming arrival rate of traffic and must elastically scale up additional instances based on a high-watermark threshold on a given application metric

  • When it’s time to scale back down based on a low-watermark threshold on the same application metric.
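Expressed as code, a watermark-based decision is a small pure function. This is an illustrative Python sketch (the thresholds are invented, and this is not AppDynamics’ internal logic):

```python
HIGH_WATERMARK = 3500  # e.g. calls/min above which the tier needs help
LOW_WATERMARK = 2000   # calls/min below which capacity can be released

def scaling_decision(metric_value, instances, min_instances=1):
    """Return +1 to scale up, -1 to scale down, 0 to hold steady."""
    if metric_value > HIGH_WATERMARK:
        return +1
    if metric_value < LOW_WATERMARK and instances > min_instances:
        return -1  # never scale below the configured floor
    return 0
```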

Every application service can be expressed as a statistical model of traffic, queues and resources as shown in the diagram below.

  • For a given arrival rate λ, we need to maximize the service rate μ with an optimum value of n resources. Monitoring either the arrival rate λ itself (for synchronous requests) or the queue depth q (for asynchronous requests) helps us tune the system and determine whether additional compute instances are needed to meet the current arrival rate.

  • Having visibility into this data allows us not only to find bottlenecks in the code but also possibly flaws in design and architecture. AppDynamics provides visibility into these application metrics.
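That λ/μ model also gives a useful back-of-the-envelope sizing rule: pick the smallest n that keeps per-instance utilization ρ = λ/(nμ) under a headroom target, since queues (and therefore response times) grow rapidly as ρ approaches 1. A Python sketch with illustrative numbers:

```python
import math

def required_instances(arrival_rate, service_rate, target_utilization=0.7):
    """Smallest n such that rho = arrival_rate / (n * service_rate) stays
    at or below target_utilization. The 0.7 default is a common headroom
    rule of thumb, not a universal constant."""
    return math.ceil(arrival_rate / (service_rate * target_utilization))

# 3600 calls/min arriving, each instance serving 1000 calls/min:
n = required_instances(3600, 1000)
```

At a 70% utilization target this sizes the tier at six instances rather than the four that raw division would suggest; the extra two are the headroom that keeps queueing delay in check.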

The basic flow for auto-scaling using AppDynamics is shown in the diagram below:

Let’s take an example to illustrate how this actually works in AppDynamics. ACME Corporation has a multi-tier distributed online bookstore application running on AWS EC2:

The front-end E-Commerce tier is experiencing a very heavy volume of requests resulting in the tier going into a Warning (Yellow) state.

Now we will walk through the 6 simple steps that the ACME Corporation will use to exploit the Cloud Auto Scaling features of AppDynamics.

 

Step 1: Enable display of Cloud Auto Scaling features

To do this, they first select “Setup -> My Preferences” and check the box to “Show Cloud Auto Scaling features” under “Advanced Features”:

Step 2: Define a Compute Cloud and an Image

Then they click on the Cloud Auto Scaling option at the bottom left of the screen:

 Next, they click on Compute Clouds and register a new Compute Cloud:

and fill in their AWS EC2 account info and credentials:

Next, they register a new image from which new instances of the E-Commerce tier nodes can be spawned:

 

and provide the details of that machine image:

By using the Launch Instance button, they can manually test whether it was successfully launched.

Step 3: Define a scale-up and a scale-down workflow

 Then, they define a scale-up workflow for the E-Commerce tier with a step to create a new compute instance from the AMI defined earlier:

Next, they define a scale-down workflow for the E-Commerce tier with a step to terminate a running compute instance from the same AMI:

Now, you may be wondering why these workflows are so simplistic and why there are no additional steps to rebalance the load-balancer after every new compute instance gets added or terminated. Well, the magic for that lies in the Ubuntu AMI that bootstraps the Tomcat JVM for the E-Commerce tier. It has the startup logic to automatically join the cluster and also has a shutdown-hook to automatically leave the cluster, by communicating directly with Apache load-balancer mod_proxy.

Step 4: Define an auto-scaling health rule

Now, they define an auto-scaling health rule for the E-Commerce tier and select the E-Commerce Server tier as the scope for the health rule:

 

and specify a Critical Condition of “Calls per Minute > 3500”, which in this case represents the arrival rate λ:

and a Warning Condition of “Calls per Minute > 3000”:

Note: Choose the threshold values for Calls per Minute in the Critical and Warning conditions carefully; thresholds that are too close together, or too close to normal traffic levels, can result in scaling thrash.
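A common way to guard against that thrash, beyond spacing the thresholds apart, is to add a cooldown window so the system refuses to act again until the last action has had time to take effect. A hedged Python sketch (the values are illustrative, and this is not how AppDynamics implements it internally):

```python
import time

class ThrashGuard:
    """Hysteresis (a gap between up/down thresholds) plus a cooldown window."""

    def __init__(self, up=3500, down=2000, cooldown_s=300, clock=time.monotonic):
        assert down < up, "the gap between thresholds provides hysteresis"
        self.up, self.down, self.cooldown_s = up, down, cooldown_s
        self.clock = clock
        self.last_action_at = float("-inf")

    def decide(self, calls_per_minute):
        now = self.clock()
        if now - self.last_action_at < self.cooldown_s:
            return "hold"  # still cooling down from the last action
        if calls_per_minute > self.up:
            self.last_action_at = now
            return "scale-up"
        if calls_per_minute < self.down:
            self.last_action_at = now
            return "scale-down"
        return "hold"
```

The injectable clock is just there to make the behavior easy to test; in production you would rely on the default monotonic clock.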

Step 5: Define a scale-up policy

Now, they define a Scale Up Policy which binds their newly defined Health Rule to a Cloud Auto-scaling action:



Step 6: Define a scale-down policy

Finally, they define another policy that will invoke the Scale-down workflow when the Health rule violation is resolved.

And they’re done!

After a period of time when the Calls per Minute exceeds the configured threshold, they actually witness that the Auto-scaling Health rule was violated, as it shows up under the Events list:

 

When they drill down into the event, they can see the details of the Health Rule violation:

 

And when they click on the Actions Executed for the Cloud Auto-Scaling Workflows, they see:

 

Also, under Workflow executions, they see:

and when they drill down into it, they see:

 

Finally, under the Machines item under Cloud Auto Scaling, they can see the actual compute instance that was started as a result of Auto Scaling:

Thus, without any manual intervention, whenever the E-Commerce tier needs additional capacity indicated by the threshold of Calls Per Minute in the Auto-Scaling Health rule, it is automatically provisioned. Also, these additional instances are automatically released when the Calls Per Minute goes below that threshold.

 

AppDynamics has cloud connectors for all the major cloud providers:

 

Contribute

If you have your own cloud platform, you can always develop your own Cloud Connector using the AppDynamics Cloud Connector API and SDKs that are available via the AppDynamics Community. Find out more in the AppDynamics Connector Development Guide. Our cloud connector code is all open-source and can be found on GitHub.

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics Pro today.

Cloud Migration Tips Part 4: Failure Breeds Success

Welcome back to my series on migration to the cloud. In my last post we discussed all of the effort you need to put into the planning phase of your migration. In this post we are going to focus on what should happen directly after the migration has been completed.

Regardless of how well you planned or if you just decided to dive right in without any forethought, there are steps that need to be taken after your migration to ensure your application is working properly and performing up to snuff. These steps need to be performed whether you chose to use a public, private or hybrid cloud implementation.

Step 1: Take Your New Cloud Based Application for a Test Drive

Go easy at first and just roll through the functionality as a user would. If it doesn’t work well for you, then you know it won’t work well when there are a bunch of users hitting it.

Assuming things went well with your functional test it’s time to go bigger. Lay down a load test and see step 2 below.

Step 2: Monitoring is Not the Job of Your Users

If you’re relying on the users of your application to let you know if there are performance or stability issues you are already a major step behind your competition. If you planned properly then you have a monitoring system in place. If you’re just winging it, put in a monitoring system now!!!

Here are the things your monitoring tool should help you understand:

  • Architecture and Flow: You design an application architecture to support the type of application you are building. How do you really know if you have deployed the architecture you designed in the first place? How do you know if your application flow changes over time and causes problems? Cloud computing environments are dynamic and can shift at any given time. You need a tool in place that lets you know exactly what happened, when, and whether it caused any impact.

[Diagram: E-Commerce Website Architecture]

What happens if you don’t have a flow map? Simple, when there’s a problem you waste a bunch of time trying to figure out what components were involved in the problematic transaction so that you can isolate the problem to the right component.

  • Response Times: Slow sucks! You moved to the cloud for many potential reasons, but one thing is certain: your users don’t want your application(s) to run slowly. It seems obvious to monitor the response time of your applications, but I’m constantly amazed by how many organizations still don’t have this type of monitoring in place. There are really only two options in this category: let your users tell you when (notice I didn’t say if) your application is slow, or have a monitoring tool alert you right away.

[Screenshot]

  • Resources: You need to keep an eye on the resources you are consuming in the cloud. New instances of your application can quickly add up to a large expense if your code is inefficient. You need to understand how well your application scales under load and fix the resource hogs so that you can drive better value out of your application as usage increases.

[Screenshot]
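One practical note on the response-time point above: alert on tail percentiles rather than averages, because a healthy-looking mean can hide a badly suffering slice of your users. A small Python sketch with invented sample data:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of response times (milliseconds)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Nine fast transactions and one terrible one.
response_times_ms = [120, 95, 110, 130, 105, 2400, 115, 125, 100, 118]

mean_ms = sum(response_times_ms) / len(response_times_ms)  # ~342 ms: looks fine
p95_ms = percentile(response_times_ms, 95)                 # 2400 ms: it isn't
```

The mean stays under a 500 ms threshold while the 95th percentile blows straight through it; that gap is the story an average-only dashboard never tells.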

Step 3: Elasticity

Elasticity is a key benefit of migrating your application to the cloud. Traditional application architectures accounted for periodic spikes in workload by permanently over-allocating resources. Put simply, we used to buy a bunch of servers so that we could handle the monthly or yearly spikes in activity. Most of these servers sat nearly idle the rest of the year and generated heat.

If you’re going to take advantage of the inherent elasticity within your cloud environment you need to understand exactly how your application will respond to being overloaded and how your infrastructure adapts to this condition. Cloud providers have tools to execute the dynamic shift in resources but ultimately you need a tool to detect the trigger conditions and then interface with the dynamic provisioning features of your cloud.

The combination of slow transactions AND resource exhaustion would be a great trigger to spin up new application instances. Each condition on its own does not justify adding a new resource.

[Screenshots]
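That compound trigger is easy to express in code. A Python sketch with hypothetical thresholds (tune them against your own baselines and SLAs):

```python
def should_scale_up(p95_response_ms, cpu_pct,
                    slow_threshold_ms=1000, cpu_threshold_pct=85):
    """Scale up only when transactions are slow AND a resource is exhausted.
    Slowness alone may be a downstream dependency or a code problem; high
    CPU alone may be a batch job that more web instances won't help."""
    slow = p95_response_ms > slow_threshold_ms
    exhausted = cpu_pct > cpu_threshold_pct
    return slow and exhausted
```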

The point here is that migrating to the cloud is not a magic bullet. You need to know how to use the features that are available and you need the right tools to help you understand exactly when to use those features. You need to stress your new cloud application to the point of failure and understand how to respond BEFORE you set users free on your application. Your users will certainly break your application and during an event is not the proper time to figure out how to manage your application in the cloud.

Let failure be your guide to success. Fail when it doesn’t matter so that you can succeed when the pressure is on. The cloud auto-scaling features shown in this post are part of AppDynamics Pro 3.7. Click here to start your free trial today.