Scaling Your Application Efficiently – Horizontal or Vertical?

Anyone deploying an application in production probably has some experience with scaling to meet increased demand. A generation ago, virtualization made scaling your application as simple as increasing your instance count or size. However, now with the advent of cloud, you can scale to theoretical infinity. Maybe you’ve even set up some auto-scaling based on underlying system metrics such as CPU, heap size, thread count, etc. Now the question changes from “Can I scale my environment to meet demand?” (if you add enough computing resources you probably can), to “How can I efficiently scale my infrastructure to accommodate my traffic, and if I’m lucky maybe even scale down when needed?” This is a problem I run into almost every day dealing with DevOps organizations.

If your application environment looks like this (if so, I’d love to be you):

Screen Shot 2015-03-17 at 10.57.36 AM

You can probably work your way through to the solution, eventually. Run a bunch of load tests, find a sweet spot of machine size based on the performance under the test parameters, and bake it into your production infrastructure. Add more instances to each tier when your CPU usage gets high. Easy. What if your application looks like this?

Screen Shot 2015-03-17 at 10.57.45 AM

What about when your application code changes? What if adding more instances no longer fixes your problem? (Those do cost money, and the bill adds up quickly…)

The complexity of the problem is that CPU bounding is only one aspect — most applications encounter a variety of bounds as they scale and they vary at each tier. CPU, memory, heap size, thread count, database connection pool, queue depth, etc. come into play from an infrastructure perspective. Ultimately, the problem breaks down to response time: how do I make each transaction as performant as possible while minimizing overhead?

The holy grail here is the ability to determine dynamically how to size my app server instances (right size), how many to create at each level (right scale) and when to create them (right time). Other factors come into play as well such as supporting infrastructure, code issues, and the database — but let’s leave that for another day.

Let me offer a simple example. This came into play recently when working with a customer analyzing their production environment. Looking at the application tier under light/normal load, it was difficult to determine what factors to scale, we ended up with this:

Screen Shot 2015-03-17 at 10.57.54 AM

Response time actually decreases toward the beginning of the curve (possibly a caching effect?). But if you look at the application under heavier load, things get more interesting. All of a sudden you can start to see how performance is affected as demand on the application increases:

Screen Shot 2015-03-17 at 10.58.02 AM

Looking at a period of heavy load in this specific application, hardware resources are actually still somewhat lightly utilized, even though response time starts to spike:

Screen Shot 2015-03-17 at 10.58.12 AM Screen Shot 2015-03-17 at 10.58.24 AM

In this application, it appears that response time is actually more closely correlated with garbage collection than any specific hardware bound.

While there is clearly some future effort here to look at garbage collection optimization, in this case optimizing best fit actually comes down to determining desired response time, maximum load for a given instance size maintaining that response time, and cost for that instance size. In a cloud scenario, instance cost is typically fairly easy to determine. In this case, you can normalize this by calculating volume/(instance cost) at various instance sizes to determine a better sweet spot for vertical scale.

Horizontal scale will vary somewhat by environment, but this tends to be more linear — i.e. each additional instance adds incremental bandwidth to the application.

There’s still quite a bit more room for analysis of this problem, like resource cost for individual transactions, optimal response time vs. cost to achieve that response time, synchronous vs. asynchronous design trade-offs, etc. but these will vary based on the specific environment.

Using some of these performance indicators from the application itself (garbage collection, response time, connection pools, etc.) rather than infrastructure metrics, we were able to quickly and intelligently right size the cloud instances under the current application release as well as determine several areas for code optimization to help improve their overall efficiency. While the code optimization is a forward looking project, the scaling question was in response to a near term impending event that needed to be addressed. Answering the question in this way allowed us to meet both the near term impending deadline, but also remain flexible enough to accommodate any forthcoming optimizations or application changes.

Interested to see how you can scale your environment? Check out a FREE trial now!