Cloud Auto Scaling using AppDynamics

Are your applications moving to an elastic cloud infrastructure? The question is no longer if, but when – whether that is a public cloud, a private cloud, or a hybrid cloud.

Classic capacity models show that you must over-provision to keep up with peak traffic, and that the over-provisioned capacity then sits largely under-utilized during non-peak periods. An elastic cloud capacity model avoids both problems: capacity is provisioned and deprovisioned just in time by automatically scaling up and down on demand.


Cloud auto-scaling decisions are often made based on infrastructure metrics such as CPU Utilization. However, in a cloud or virtualized environment, infrastructure metrics may not be reliable enough for making auto-scaling decisions. Auto-scaling decisions based on application metrics, such as request-queue depth or requests per minute, are much more useful since the application is intimately familiar with conditions such as:

  • When the existing number of compute instances cannot handle the incoming arrival rate of traffic and additional instances must be added, based on a high-watermark threshold on a given application metric.

  • When it’s time to scale back down based on a low-watermark threshold on the same application metric.
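To make the two watermarks concrete, here is a minimal Python sketch of the decision logic. It is not AppDynamics code: the function name, the instance limits, and the threshold values in the usage line are all hypothetical, chosen only for illustration.

```python
from enum import Enum


class ScalingAction(Enum):
    SCALE_UP = "scale_up"
    SCALE_DOWN = "scale_down"
    HOLD = "hold"


def decide(metric_value: float,
           high_watermark: float,
           low_watermark: float,
           instances: int,
           min_instances: int = 1,
           max_instances: int = 10) -> ScalingAction:
    """Pick a scaling action from one sample of an application metric.

    The metric could be requests per minute or request-queue depth.
    min_instances and max_instances are assumed safety limits.
    """
    if low_watermark >= high_watermark:
        raise ValueError("low_watermark must be below high_watermark")
    # Above the high watermark: add capacity, unless already at the ceiling.
    if metric_value > high_watermark and instances < max_instances:
        return ScalingAction.SCALE_UP
    # Below the low watermark: release capacity, but keep the minimum running.
    if metric_value < low_watermark and instances > min_instances:
        return ScalingAction.SCALE_DOWN
    # Between the watermarks (or at a limit): leave the tier alone.
    return ScalingAction.HOLD


# Hypothetical values: 1200 requests/min against watermarks of 1000 and 400.
print(decide(1200, high_watermark=1000, low_watermark=400, instances=2))
```

Keeping a gap between the two watermarks means a metric that sits between them triggers no action at all, which matters later when choosing thresholds.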

Every application service can be expressed as a statistical model of traffic, queues and resources as shown in the diagram below.

  • For a given arrival rate λ, we need to maximize the service rate μ with an optimum value of n resources. Monitoring either the arrival rate λ itself for synchronous requests or the queue depth for asynchronous requests tells us whether we need additional compute instances to meet the demands of the current arrival rate.

  • Having visibility into this data allows us not only to find bottlenecks in the code but also possibly flaws in design and architecture. AppDynamics provides visibility into these application metrics.
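As a rough illustration of the model, the sketch below estimates n from λ. It rests on assumptions that are not part of the original model description: μ is treated here as the service rate of a single instance, and a target utilization below 100% is used to leave headroom. A queue is only stable when λ is less than n times that per-instance rate, so the estimate rounds up. The function name and the numbers are hypothetical.

```python
import math


def instances_needed(arrival_rate: float,
                     service_rate_per_instance: float,
                     target_utilization: float = 0.75) -> int:
    """Estimate the number of instances n for a given arrival rate.

    arrival_rate and service_rate_per_instance must use the same unit,
    for example calls per minute. target_utilization is an assumed
    headroom factor, not a value taken from AppDynamics.
    """
    if arrival_rate < 0:
        raise ValueError("arrival_rate must not be negative")
    if service_rate_per_instance <= 0:
        raise ValueError("service_rate_per_instance must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    # Capacity each instance should carry once headroom is reserved.
    usable_rate = service_rate_per_instance * target_utilization
    # Round up, and never go below one running instance.
    return max(1, math.ceil(arrival_rate / usable_rate))


# Hypothetical: 3600 calls/min, each instance serves 1000 calls/min.
print(instances_needed(3600, 1000))  # 3600 / 750 = 4.8, rounded up to 5
```

In practice the per-instance service rate is something you would measure under load rather than assume.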

The basic flow for auto-scaling using AppDynamics is shown in the diagram below:

Let’s take an example to illustrate how this actually works in AppDynamics. ACME Corporation has a multi-tier distributed online bookstore application running on AWS EC2:

The front-end E-Commerce tier is experiencing a very heavy volume of requests resulting in the tier going into a Warning (Yellow) state.

Now we will walk through the six simple steps that the ACME Corporation will use to exploit the Cloud Auto Scaling features of AppDynamics.


Step 1: Enable display of Cloud Auto Scaling features

To do this, they first select “Setup -> My Preferences” and check the box to “Show Cloud Auto Scaling features” under “Advanced Features”:

Step 2: Define a Compute Cloud and an Image

Then they click on the Cloud Auto Scaling option at the bottom left of the screen:

Next, they click on Compute Clouds and register a new Compute Cloud:

and fill in their AWS EC2 account info and credentials:

Next, they register a new image from which new instances of the E-Commerce tier nodes can be spawned:


and provide the details of that machine image:

By using the Launch Instance button, they can manually test whether an instance launches successfully from the image.

Step 3: Define a scale-up and a scale-down workflow

Then, they define a scale-up workflow for the E-Commerce tier with a step to create a new compute instance from the AMI defined earlier:

Next, they define a scale-down workflow for the E-Commerce tier with a step to terminate a running compute instance from the same AMI:

Now, you may be wondering why these workflows are so simple and why there are no additional steps to rebalance the load balancer after a compute instance is added or terminated. The magic for that lies in the Ubuntu AMI that bootstraps the Tomcat JVM for the E-Commerce tier. It has startup logic to automatically join the cluster and a shutdown hook to automatically leave the cluster, both by communicating directly with the Apache mod_proxy load balancer.
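The join-on-startup, leave-on-shutdown pattern can be sketched in a few lines. This is only an analogy in Python, not the actual bootstrap logic baked into the AMI, which runs in the Tomcat JVM and talks to mod_proxy. The InMemoryBalancer class is a hypothetical stand-in for the load balancer's member list, and the address is made up.

```python
from contextlib import contextmanager


class InMemoryBalancer:
    """Hypothetical stand-in for a load balancer's list of members."""

    def __init__(self):
        self.members = set()

    def join(self, address: str) -> None:
        self.members.add(address)

    def leave(self, address: str) -> None:
        self.members.discard(address)


@contextmanager
def cluster_membership(balancer: InMemoryBalancer, address: str):
    """Register a node on startup and always deregister it on shutdown."""
    balancer.join(address)
    try:
        yield
    finally:
        # Runs on normal exit and on errors, like a shutdown hook would.
        balancer.leave(address)


balancer = InMemoryBalancer()
with cluster_membership(balancer, "10.0.0.5:8080"):
    # The node serves traffic here while it is a cluster member.
    print(sorted(balancer.members))
print(sorted(balancer.members))
```

Because each node manages its own membership, the scale-up and scale-down workflows only need to create or terminate instances.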

Step 4: Define an auto-scaling health rule

Now, they define an auto-scaling health rule for the E-Commerce tier:

and select the E-Commerce Server tier as the scope for the health rule:


and specify a Critical Condition as “Calls per Minute > 3500”, which in this case represents the arrival rate λ:

and a Warning Condition of “Calls per Minute > 3000”:

Note: Choose the Calls per Minute thresholds for the Critical and Warning conditions carefully. Poorly chosen values can cause scaling thrash, where instances are repeatedly launched and terminated as the metric hovers around a threshold.
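To see why the choice matters, here is a small Python simulation of thrash. It is a simplified model, not how AppDynamics evaluates health rules: it assumes one scaling action each time the metric crosses the scale-up value and one each time it drops back below the scale-down value. The traffic samples are invented.

```python
def count_scaling_actions(calls_per_minute, scale_up_above, scale_down_below):
    """Count scale-up and scale-down actions over a series of samples."""
    scaled_up = False
    actions = 0
    for value in calls_per_minute:
        if not scaled_up and value > scale_up_above:
            scaled_up = True   # threshold breached: launch an instance
            actions += 1
        elif scaled_up and value < scale_down_below:
            scaled_up = False  # metric recovered: terminate the instance
            actions += 1
    return actions


# Invented samples that hover around 3500 calls per minute.
traffic = [3400, 3550, 3450, 3600, 3480, 3520, 3490, 3560]

# Scaling up and down on the same value reacts to every wobble.
print(count_scaling_actions(traffic, 3500, 3500))  # 7 actions

# Scaling down only below 3000 leaves a gap that absorbs the wobble.
print(count_scaling_actions(traffic, 3500, 3000))  # 1 action
```

Under this model, the same traffic causes seven launches and terminations with a single trigger point but only one launch when the scale-down point sits well below the scale-up point.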

Step 5: Define a scale-up policy

Now, they define a Scale Up Policy which will bind their newly defined Health Rule with a Cloud Auto-scaling action:

Step 6: Define a scale-down policy

Finally, they define another policy that will invoke the scale-down workflow when the Health Rule violation is resolved:

And they’re done!

Once Calls per Minute exceeds the configured threshold, the Auto-scaling Health Rule is violated, and the violation shows up in the Events list:


When they drill down into the event, they can see the details of the Health Rule violation:


And when they click on the Actions Executed for the Cloud Auto-Scaling Workflows, they see:


Also, under Workflow executions, they see:

and when they drill down into it, they see:


Finally, under the Machines item under Cloud Auto Scaling, they can see the actual compute instance that was started as a result of Auto Scaling:

Thus, without any manual intervention, whenever the E-Commerce tier needs additional capacity, as indicated by the Calls per Minute threshold in the Auto-scaling Health Rule, it is automatically provisioned. These additional instances are automatically released when Calls per Minute falls back below that threshold.


AppDynamics has cloud connectors for all the major cloud providers:



If you have your own cloud platform, you can always develop your own Cloud Connector using the AppDynamics Cloud Connector API and SDKs that are available via the AppDynamics Community. Find out more in the AppDynamics Connector Development Guide. Our cloud connector code is all open-source and can be found on GitHub.

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics Pro today.

Don’t Deploy Your Cloud Application Without Reading This – Part 1

Public cloud, private cloud, hybrid cloud, cloud bursting, cloud storming, elastic compute, IaaS, PaaS, SaaS: the list of terms goes on and on ad nauseam. Like it or not, cloud computing has taken hold as an important design consideration in companies ranging from small startups to large established enterprises. The concepts and technologies behind cloud computing have been around for quite a long time now, so why is it taking so long for so many companies to move their applications and realize the benefits that cloud computing offers?

Getting beyond the ridiculous fear of the unknown, security concerns are a major inhibitor to cloud adoption, but between private cloud and a slew of security technologies and methods, that should only affect a small portion of applications. The real problem, in my opinion, is that nobody wants to fail and suffer damage to their personal and/or corporate brands. I’ve seen so many companies make a poor transition to cloud computing, and it impacts their revenue and customer retention.


Companies like Netflix, Orbitz, and Family Search have been tremendously successful with their cloud computing initiatives. Do they have better technologists than other companies? Are their processes better than others? Do they have special tools that nobody else has? Or have they made a commitment that it is okay to fail as long as they fail fast and don’t repeat their mistakes? The answer might be a combination of all of the above, depending upon which organizations we are talking about.

There is a wealth of information published on the internet about deploying applications to the cloud; there are companies that exist solely to help you move applications to the cloud; there are even companies that exist to help you figure out IF you should move your application(s) to the cloud. I used to work for one of those companies, and what we saw over and over again was that our clients really didn’t know how to get started down the path of moving their existing applications to a cloud environment. Even worse were the companies that thought they knew what it took to successfully migrate their application(s) but didn’t. All of these companies were missing crucial bits of information that would make the difference between a smooth and painless migration and a rough, frustrating migration.

The tools, processes, and information you use in the planning, execution, and ongoing management of your cloud applications will make all of the difference between success and failure.

In this blog series I’ll discuss some of the key considerations related to planning and execution of migrating your applications to the cloud. I’ll cover a few important aspects of deciding IF you should move your applications to the cloud and then focus mostly on what happens after you’ve decided to go for it. Everything I discuss will be directly from my experience moving and monitoring cloud applications within an enterprise and as a consultant.

In my opinion, it’s much harder to move an existing application than it is to set up a new application in the cloud. The good news is that there are common considerations for each of these scenarios, so next week I’ll discuss the following:

Should we move or deploy to the cloud?
What can I monitor to ensure my users are not impacted in a negative way?

In future posts I’ll discuss the planning and migration phases, how to take advantage of cloud elasticity, and good ongoing management practices. I might even preview some awesome new features we’re cooking up to make management of your applications faster and easier (shhhhh, it’ll be our little secret).

Applications were failing long before Cloud came along

I’m fed up of reading about Cloud outages, largely because all applications are created and managed by the most dangerous species on the planet – the human being. Failure is inevitable in everything the human being creates or touches, and for this reason alone I see nothing newsworthy in the word “outage” appearing in IT articles, with or without Cloud mentioned.

What gets me the most is that applications, infrastructure and data centers were slowing down and blowing up long before “Clouds” became fashionable. They just didn’t make the news every other week when applications resided in “data-centers”–ah, the good old days. Just ask anyone who works in operations or help desk/app support whether they’ve worked a 38-hour week; I guess the vast majority will either laugh or slap you. If everything worked according to plan, IT would be a really dull place to work, help desk would be replaced with OK desk, and we’d have nothing to talk about in the office or pub.