Managing the Performance of Cloud Based Applications

In the last post I covered several architectural techniques you can use to build a highly scalable, failure-resistant application in the cloud. However, these architectural changes – along with the inherent unreliability of the cloud – introduce some new problems for application performance management. Many organizations rely on logging, profilers, and legacy application performance monitoring (APM) solutions to monitor and manage performance in the data center, but these strategies and solutions simply aren’t enough when you move into the cloud. Here are a few important considerations for choosing an APM solution that works in the cloud.

Business Transactions

Many monitoring solutions check for server availability and alert users when a server goes down. In the cloud, however, servers can come and go all the time, so alerting on availability will result in a lot of false positives. In addition, many of the server-level metrics that APM tools and server monitoring tools report are no longer as relevant as they were on a vertically scaled system. For example, what does 90% CPU utilization mean to the behavior of your cloud application? Does it mean there is an impending performance problem that needs to be addressed? Or does it mean that more servers need to be added into that tier? This goes for other metrics, too, like physical memory usage, JVM memory usage, thread usage, database connection pool usage, and so on. These are all good indicators of the performance of a single server, but when servers can come and go they’re no longer the best approximation of the performance of your application as a whole.

Instead, it’s best to understand performance in terms of Business Transactions. A business transaction is essentially a user request – for an eCommerce application, “Check out” or “Add to Cart” may be two important business transactions. Each business transaction includes all of the downstream activities until the end user receives a response (and perhaps more, if your application uses asynchronous communication). For example, an application may define a service that performs request validation, stores data in a database, and then publishes a request to a topic. A JMS listener might receive that message from the topic, make a call to an external service, and then store the data in a Hadoop cluster. All of these activities need to be grouped together into a single Business Transaction so that you can understand how every part of your system affects your end users.
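
To make this concrete, here is a minimal sketch (hypothetical names, JMS 2.0 API) of the producer side of that flow; the listener side would pick up the same GUID and continue the transaction. A correlation ID like this is what allows an APM agent to stitch the asynchronous hops into one Business Transaction:

import java.util.UUID;
import javax.jms.ConnectionFactory;
import javax.jms.JMSContext;
import javax.jms.JMSException;
import javax.jms.TextMessage;
import javax.jms.Topic;

public class OrderService {
    private final ConnectionFactory factory; // injected by the container (hypothetical wiring)
    private final Topic ordersTopic;

    public OrderService(ConnectionFactory factory, Topic ordersTopic) {
        this.factory = factory;
        this.ordersTopic = ordersTopic;
    }

    public void submitOrder(String orderJson) throws JMSException {
        validate(orderJson);       // 1. request validation
        saveToDatabase(orderJson); // 2. store the data in a database
        String btGuid = UUID.randomUUID().toString();
        try (JMSContext ctx = factory.createContext()) {
            TextMessage msg = ctx.createTextMessage(orderJson);
            // The listener reads this property and tags its own work with it,
            // so the external service call and the Hadoop write downstream
            // all roll up into the same Business Transaction.
            msg.setStringProperty("BT_GUID", btGuid);
            ctx.createProducer().send(ordersTopic, msg); // 3. publish to the topic
        }
    }

    private void validate(String orderJson) { /* field/schema checks elided */ }
    private void saveToDatabase(String orderJson) { /* JDBC/JPA persistence elided */ }
}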

Tiers

With these various tiers tracked at the Business Transaction level, the next step is to measure performance at the tier level. While it is important to know when a Business Transaction is behaving abnormally, it is equally important to detect performance anomalies at the tier level. If the response time of a Business Transaction as a whole is slow by one standard deviation (which is acceptable) but one of its tiers is slower by three standard deviations, you may have a problem developing, even though it hasn’t affected your end users yet. Chances are the tier’s problem will evolve into a systemic problem that causes multiple Business Transactions to suffer.

Returning to our earlier example, let’s say the web service behaves well, but the topic listener is significantly slower than usual. The topic listener has not yet caused a problem in the Business Transaction itself, but it has slowed down enough to indicate an issue that needs to be addressed. Business Transactions, therefore, need to be evaluated both as a whole and at the tier level in order to identify performance issues before they affect your end users. The only way to effectively monitor the performance of an application in a dynamic environment is to capture metrics at both the Business Transaction level and the tier level.

Baselines

One of the most important reasons that many organizations move to the cloud is to be able to scale applications up and down rapidly as load changes. If the load on your application fluctuates dramatically over the day, week or year, the cloud will allow you to scale your application infrastructure efficiently to meet that load. However, most application monitoring tools are not equipped to handle such dramatic shifts in load or performance. Application monitoring tools that rely on static thresholds for alerting and data collection will create alert storms when load increases and miss potential problems when it decreases. You need to be able to understand what normal performance is for a given time of day, day of the week or time of the year, which is best done by baselining the performance of your application over time.

Baselining your application essentially means collecting data on how your application performs (or how a specific Business Transaction performs) at any given time. Having this data allows you (or your APM solution) to determine whether your application’s current performance is normal or may indicate a problem. Baselines can be defined on a per-hour basis over a period of time – for example, for the past 30 days, how has Checkout performed from 9:00am to 10:00am? In this configuration, the response time of a specific Business Transaction is compared to the average response time for that Business Transaction over the past 30 days, between the hours of 9:00am and 10:00am. If the response time is greater than some measurable threshold, such as two standard deviations above that average, then the monitoring system should raise an alert. Figure 4 shows this graphically.

The average response time for this Business Transaction, captured over the past 30 days, is about 1.75 seconds, with the two-standard-deviation band running from 1.5 seconds to 2 seconds. All incoming occurrences of this Business Transaction during this hour (9:00am to 10:00am in this example) are compared to the 1.75-second average, and if a response time exceeds two standard deviations above that norm (2 seconds), an alert is raised.
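
For illustration only, here is a toy version of that logic in Java (not how any particular APM product implements it): compute the mean and standard deviation of the historical samples for the hour, then flag anything more than two standard deviations above the mean.

import java.util.List;

public class HourlyBaseline {
    private final double mean;
    private final double stdDev;

    // Historical samples: e.g. 9:00-10:00am response times from the past 30 days.
    public HourlyBaseline(List<Double> samples) {
        double sum = 0;
        for (double t : samples) sum += t;
        this.mean = sum / samples.size();
        double sqDiff = 0;
        for (double t : samples) sqDiff += (t - mean) * (t - mean);
        this.stdDev = Math.sqrt(sqDiff / samples.size());
    }

    // Alert when a new measurement is more than two standard deviations
    // above the baseline mean, per the example above.
    public boolean shouldAlert(double responseTimeSeconds) {
        return responseTimeSeconds > mean + 2 * stdDev;
    }
}

With the figure’s numbers (a 1.75-second mean and a 0.125-second standard deviation), a 2.1-second response would raise an alert, while a 1.9-second response would not.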

What happens if the behavior of your users differs from day to day or month to month? Your monitoring solution should be configurable enough to handle this. Banking applications probably have spikes in load twice a month when most people get paid, and eCommerce applications are inundated on Black Friday. By baselining the performance of these applications over the year, an APM tool could anticipate this load and expect slower performance during these times. Make sure your APM tool is configurable or intelligent enough that it can understand what’s “normal” behavior for your app.

Dynamic Application Mapping

Many monitoring solutions today require manual configuration to instrument and monitor a new server. If new servers are disappearing and appearing all the time, however, this will result in blind spots as you update the tool to reflect the new environment. This will quickly become untenable as your application scales. A cloud-ready monitoring tool must automatically detect and map the application in real time, so you always have an up-to-date idea of what your application looks like. For agent-based monitoring solutions, this can be accomplished by deploying your agent along with your application so that new nodes are automatically instrumented by your APM solution of choice.

 

Take five minutes to get complete visibility into the performance of your cloud applications with AppDynamics today.

Architecting for the Cloud

The biggest difference between cloud-based applications and the applications running in your data center is scalability. The cloud offers scalability on demand, allowing you to expand and contract your application as load fluctuates. This scalability is what makes the cloud appealing, but it can’t be achieved by simply lifting your existing application to the cloud. In order to take advantage of what the cloud has to offer, you need to re-architect your application around scalability. The other business benefit comes in terms of price: in the cloud, costs scale linearly with demand.

Sample Architecture of a Cloud-Based Application

Designing an application for the cloud often requires re-architecting your application around scalability. The figure below shows what the architecture of a highly scalable cloud-based application might look like.

The Client Tier: The client tier contains user interfaces for your target platforms, which may include a web-based user interface, a mobile user interface, or even a thick client user interface. There will typically be a web application that performs actions such as user management, session management, and page construction, but for the rest of its interactions the client makes RESTful service calls to the server.

Services: The server side is composed of two kinds of services: caching services, from which the clients read data and which host the most recently known good state of each system of record, and aggregate services, which interact directly with the systems of record for destructive operations (operations that change the state of the systems of record).

Systems of Record: The systems of record are your domain-specific servers that drive your business functions. These may include user management CRM systems, purchasing systems, reservation systems, and so forth. While these can be new systems in the application you’re building, they are most likely legacy systems with which your application needs to interact. The aggregate services are responsible for abstracting your application from the peculiarities of the systems of record and providing a consistent front-end for your application.

ESB: When systems of record change data, such as by creating a new purchase order, a user “liking” an item, or a user purchasing an airline ticket, the system of record raises an event to a topic. This is where the idea of an event-driven architecture (EDA) comes to the forefront of your application: when the system of record makes a change that other systems may be interested in, it raises an event, and any system interested in that system of record listens for changes and responds accordingly. This is also the reason for using topics rather than using queues: queues support point-to-point messaging whereas topics support publish-subscribe messaging/eventing. If you don’t know who all of your subscribers are when building your application (which you shouldn’t, according to EDA) then publishing to a topic means that anyone can later integrate with your application by subscribing to your topic.

Whenever interfacing with legacy systems, it is desirable to shield the legacy system from load. Therefore, we implement a caching system that maintains the currently known good state of all of the systems of record. This caching system uses the EDA paradigm to listen for changes in the systems of record and update the versions of the data it hosts to match. This is a powerful strategy, but it also changes the consistency model from consistent to eventually consistent. To illustrate what this means, consider posting an update on your favorite social media site: you may see it immediately, but it may take a few seconds or even a couple of minutes before your friends see it. The data will eventually be consistent, but there will be times when the data you see and the data your friends see don’t match. If you can tolerate this type of consistency, you can reap huge scalability benefits.
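
A minimal sketch of such a cache-updating listener (illustrative names, with an in-memory map standing in for a real distributed cache) might look like this:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

public class RecordCacheListener implements MessageListener {
    // Last-known-good state of the system of record, keyed by record ID.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    @Override
    public void onMessage(Message message) {
        try {
            TextMessage event = (TextMessage) message; // assumes text events
            String recordId = event.getStringProperty("recordId");
            // The cache is updated only after the system of record changes
            // and the event propagates: that window is exactly the
            // eventual-consistency trade-off described above.
            cache.put(recordId, event.getText());
        } catch (JMSException e) {
            // Production code would log and possibly retry; elided here.
        }
    }

    public String read(String recordId) {
        return cache.get(recordId); // clients read here, not from the system of record
    }
}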

NoSQL: Finally, there are many storage options available, but if your application needs to store a huge amount of data, it is far easier to scale by using a NoSQL document store. There are various NoSQL stores, and the one you choose should match the nature of your data. For example, MongoDB is good for storing searchable documents, Neo4J is good at storing highly inter-related data, and Cassandra is good at storing key/value pairs. I typically also recommend some form of search index, such as Solr, to accelerate queries to frequently accessed data.

Let’s begin our deep-dive investigation into this architecture by reviewing service-oriented architectures and REST.

REpresentational State Transfer (REST)

The best pattern for dividing an application into tiers is to use a service-oriented architecture (SOA). There are two main options for this, SOAP and REST. There are many arguments for each that I won’t go into here, but for our purposes REST is the better choice because it is more scalable.

REST was defined in 2000 by Roy Fielding in his doctoral dissertation and is an architectural style that models elements as a distributed hypermedia system that rides on top of HTTP. Rather than thinking about services and service interfaces, REST defines its interface in terms of resources, and services define how we interact with these resources. HTTP serves as the foundation for RESTful interactions and RESTful services use the HTTP verbs to interact with resources, which are summarized as follows:

  • GET: retrieve a resource

  • POST: create a resource

  • PUT: update a resource

  • PATCH: partially update a resource

  • DELETE: delete a resource

  • HEAD: does this resource exist OR has it changed?

  • OPTIONS: what HTTP verbs can I use with this resource?

For example, I might create an Order using a POST, retrieve an Order using a GET, change the product type of the Order using a PATCH, replace the entire Order using a PUT, delete an Order using a DELETE, send a version (passing the version as an Entity Tag or eTag) to see if an Order has changed using a HEAD, and discover permissible Order operations using OPTIONS. The point is that the Order resource is well defined and then the HTTP verbs are used to manipulate that resource.
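
As a sketch, here is what those interactions could look like using Java 11’s HttpClient against a hypothetical /orders resource (the URL, payloads, and eTag are made up for illustration):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OrderClient {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();
    private static final String BASE = "https://api.example.com/orders";

    public static void main(String[] args) throws Exception {
        // POST: create a new Order
        send(HttpRequest.newBuilder(URI.create(BASE))
                .POST(HttpRequest.BodyPublishers.ofString("{\"product\":\"book\"}")).build());
        // GET: retrieve Order 42
        send(HttpRequest.newBuilder(URI.create(BASE + "/42")).GET().build());
        // PATCH: partially update Order 42 (no PATCH() convenience method, so use method())
        send(HttpRequest.newBuilder(URI.create(BASE + "/42"))
                .method("PATCH", HttpRequest.BodyPublishers.ofString("{\"product\":\"ebook\"}")).build());
        // PUT: replace Order 42 entirely
        send(HttpRequest.newBuilder(URI.create(BASE + "/42"))
                .PUT(HttpRequest.BodyPublishers.ofString("{\"product\":\"ebook\",\"qty\":2}")).build());
        // HEAD with an entity tag: has Order 42 changed since version "v7"?
        send(HttpRequest.newBuilder(URI.create(BASE + "/42"))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .header("If-None-Match", "\"v7\"").build());
        // OPTIONS: which verbs does this resource support?
        send(HttpRequest.newBuilder(URI.create(BASE + "/42"))
                .method("OPTIONS", HttpRequest.BodyPublishers.noBody()).build());
        // DELETE: delete Order 42
        send(HttpRequest.newBuilder(URI.create(BASE + "/42")).DELETE().build());
    }

    private static void send(HttpRequest request) throws Exception {
        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(request.method() + " -> " + response.statusCode());
    }
}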

In addition to keeping application resources and interactions clean, using the HTTP verbs can greatly enhance performance. Specifically, if you define a time-to-live (TTL) on your resources, then HTTP GETs can be cached by the client or by an HTTP cache, which offloads the server from constantly rebuilding the same resource.
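
To sketch the server side of that idea with the JDK’s built-in HTTP server (endpoint and payload are illustrative), the Cache-Control header below grants a 60-second TTL, so clients and intermediary caches may reuse the response instead of asking the server to rebuild it:

import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class CacheableResource {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/orders/42", exchange -> {
            byte[] body = "{\"id\":42,\"product\":\"book\"}".getBytes(StandardCharsets.UTF_8);
            // 60-second TTL: GETs within this window can be served from an
            // HTTP cache, offloading the server from rebuilding the resource.
            exchange.getResponseHeaders().set("Cache-Control", "max-age=60");
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) { os.write(body); }
        });
        server.start();
    }
}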

REST usage is commonly described in terms of three maturity levels, affectionately known as the Richardson Maturity Model (after Leonard Richardson, who developed it):

  1. Define resources

  2. Properly use the HTTP verbs

  3. Hypermedia Controls

Thus far we have reviewed levels 1 and 2, but what really makes REST powerful is level 3. Hypermedia controls allow resources to define business-specific operations or “next states” for resources. So, as a consumer of a service, you can automatically discover what you can do with the resources. Making resources self-documenting enables you to more easily partition your application into reusable components (and hence makes it easier to divide your application into tiers).

Sideline: you may have heard the acronym HATEOAS, which stands for Hypermedia as the Engine of Application State. HATEOAS is the principle that clients can interact with an application entirely through the hypermedia links that the application provides. This is essentially the formalization of level 3 of the Richardson Maturity Model.
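
For illustration, a level-3 response for a hypothetical Order resource might embed its permissible next states as links (shown here using the HAL-style _links convention; the fields are made up):

{
  "id": 42,
  "status": "PLACED",
  "_links": {
    "self":   { "href": "/orders/42" },
    "cancel": { "href": "/orders/42/cancel" },
    "pay":    { "href": "/orders/42/payment" }
  }
}

A client that understands these links can discover that this Order may be cancelled or paid without any out-of-band documentation.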

RESTful resources maintain their own state, so RESTful web services (the operations that manipulate RESTful resources) can remain stateless. Statelessness is a core requirement of scalability because it means that any service instance can respond to any request. Thus, if you need more capacity on a service tier, you can add additional virtual machines to that tier to distribute the load. To illustrate why this is important, let’s consider a counter-example: the behavior of stateful servers. When a server is stateful, it maintains some client state, which means that subsequent requests from a client need to be sent to that specific server instance. If that tier becomes overloaded, adding new server instances may help new client requests, but it will not help existing client requests because the load cannot be easily redistributed.
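
The contrast is easy to see in code. In this sketch (hypothetical types), the stateful handler keeps the cart in instance memory, pinning each client to one server, while the stateless handler keeps all state in an external store so any instance can serve any request:

import java.util.HashMap;
import java.util.Map;

class StatefulCartHandler {
    // Per-client state held in this process: requests from a client must
    // keep returning to THIS instance, so load cannot be redistributed.
    private final Map<String, String> cartsByClient = new HashMap<>();

    String addItem(String clientId, String item) {
        return cartsByClient.merge(clientId, item, (cart, i) -> cart + "," + i);
    }
}

interface SharedSessionStore { // stand-in for a distributed cache or database
    String get(String key);
    void put(String key, String value);
}

class StatelessCartHandler {
    // No instance state: any instance behind the load balancer can answer.
    private final SharedSessionStore store;

    StatelessCartHandler(SharedSessionStore store) { this.store = store; }

    String addItem(String clientId, String item) {
        String cart = store.get(clientId);
        String updated = (cart == null) ? item : cart + "," + item;
        store.put(clientId, updated);
        return updated;
    }
}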

Furthermore, the resiliency requirements of stateful servers hinder scalability because of fail-over. What happens if the server to which your client is connected goes down? As an application architect, you want to ensure that client state is not lost, so how do we gracefully fail over to another server instance? The answer is that we need to replicate client state across multiple server instances (or at least one other instance) and then define a fail-over strategy so that the application automatically redirects client traffic to the surviving server. The replication overhead and network chatter between replicated servers mean that, no matter how optimal the implementation, scalability can never be linear with this approach.

Stateless servers do not suffer from this limitation, which is another benefit to embracing a RESTful architecture. REST is the first step in defining a cloud-based scalable architecture. The next step is creating an event-driven architecture.

Deploying to the Cloud

This paper has presented an overview of a cloud-based architecture and provided a cursory look at REST and EDA. Now let’s review how such an application can be deployed to and leverage the power of the cloud.

Deploying RESTful Services

RESTful web services, or the operations that manage RESTful resources, are deployed to a web container and should be placed in front of the data store that contains their data. These web services are themselves stateless and only reflect the state of the underlying data they expose, so you are able to use as many instances of these servers as you need. In a cloud-based deployment, start enough server instances to handle your normal load and then configure the elasticity of those services so that new server instances are added as these services become saturated and the number of server instances is reduced when load returns to normal. The best indicator of saturation is the response time of the services, although system resources such as CPU, physical memory, and VM memory are good indicators to monitor as well. As you are scaling these services, always be cognizant of the performance of the underlying data stores that the services are calling and do not bring those data stores to their knees.

The graphic above shows that the services that interact with Document Store 1 can be deployed separately, and thus scaled independently, from the services that interact with Document Store 2. If Service Tier 1 needs more capacity, add more server instances to Service Tier 1 and distribute load across them.
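
As a deliberately simplified sketch of that per-tier elasticity policy (CloudProvider is a hypothetical interface standing in for your IaaS API; real deployments would typically lean on the provider’s native auto-scaling features):

public class TierAutoscaler {
    interface CloudProvider { // hypothetical IaaS abstraction
        int instanceCount(String tier);
        void addInstance(String tier);
        void removeInstance(String tier);
    }

    private final CloudProvider cloud;
    private final String tier;
    private final double saturatedMillis; // e.g. well above the tier's baseline response time
    private final double healthyMillis;   // e.g. back at or below the baseline
    private final int minInstances;

    public TierAutoscaler(CloudProvider cloud, String tier, double saturatedMillis,
                          double healthyMillis, int minInstances) {
        this.cloud = cloud;
        this.tier = tier;
        this.saturatedMillis = saturatedMillis;
        this.healthyMillis = healthyMillis;
        this.minInstances = minInstances;
    }

    // Called periodically with the tier's current average response time,
    // the saturation indicator recommended above.
    public void evaluate(double avgResponseMillis) {
        if (avgResponseMillis > saturatedMillis) {
            cloud.addInstance(tier);      // scale out while the tier is saturated
        } else if (avgResponseMillis < healthyMillis && cloud.instanceCount(tier) > minInstances) {
            cloud.removeInstance(tier);   // scale back in when load returns to normal
        }
    }
}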

Deploying an ESB

The choice of whether or not to use an ESB will dictate the EDA requirements for your cloud-based deployment. If you do opt for an ESB, consider partitioning the ESB based on function so that excessive load on one segment does not take down other segments.

The point of segmentation is to isolate the load generated by System 1 from the load generated by System 2. Stated another way: if System 1 generates enough load to slow down the ESB, it will slow down its own segment, but not System 2’s segment, which runs on its own hardware. In our initial deployment we had all of our systems publishing to a single segment, which exhibited exactly this behavior! Additionally, with segmentation you can scale each segment independently by adding multiple servers to that segment (if your ESB vendor supports this).

Cloud-based applications are different from traditional applications because they have different scalability requirements. Namely, cloud-based applications must be resilient enough to handle servers coming and going at will, must be loosely coupled, must be as stateless as possible, must expect and plan for failure, and must be able to scale from a handful of servers to tens of thousands of servers.

There is no single correct architecture for cloud-based applications, but this paper presented an architecture that has proven successful in practice making use of RESTful services and an event-driven architecture. While there is much, much more you can do with the architecture of your cloud application, REST and EDA are the basic tools you’ll need to build a scalable application in the cloud.

 


Monitoring Apps on the Cloud Foundry PaaS

At AppDynamics, we pride ourselves on making it easier to monitor complex applications. This is why we are excited to announce our partnership with Pivotal to make it easier to deploy built-in application performance monitoring to the cloud.

 

Getting started with Pivotal’s Cloud Foundry Web Services

Cloud Foundry is an open platform as a service (PaaS) developed and operated by Pivotal. You can deploy applications to the hosted Pivotal Web Services (much as you host apps on Heroku), or you can run your own Cloud Foundry PaaS on premise using Pivotal CF. As an open platform, Cloud Foundry is also used and operated by many other companies and service providers.

1) Sign up for a Pivotal CF account and AppDynamics Pro SaaS account

In the future, Pivotal Web Services will include the AppDynamics SaaS APM services, so you’ll only need to sign up for Pivotal Web Services and it will automatically create an AppDynamics account.

2) Download the Cloud Foundry CLI (Command Line Interface)

Pivotal Web Services has both a web-based GUI and a full-featured, Linux-style command-line interface (CLI). Once you have a PWS account, you can download the CLI from the “Tools” tab in the PWS dashboard, with builds available for OS X, Linux, and Windows.


3) Sign in with your Pivotal credentials

Using the CLI, log in to your Pivotal Web Services account. Remember to preface all commands given to Cloud Foundry with “cf”. Individual Cloud Foundry PaaS clouds are identified by their API endpoint; for PWS, the endpoint is api.run.pivotal.io. The system will automatically target your default org (you can change this later) and ask you to select a space (a space is similar to a project or folder where you can keep a collection of apps).

$ cf login


Monitoring Cloud Foundry apps on Pivotal Web Services

Cloud Foundry uses a flexible approach called buildpacks to dynamically assemble and configure a complete runtime environment for executing a particular class of applications. Rather than specifying how to run applications, your developers can rely on buildpacks to detect, download, and configure the appropriate runtimes, containers, and libraries. The AppDynamics agent is built into the Java buildpack for easy instrumentation, so if an AppDynamics monitoring service is present, the Cloud Foundry DEA will auto-detect it and enable the agent in the buildpack. If you add AppDynamics monitoring to an app that is already running, just restart the app and the DEA will auto-detect the new service.

1) Clone the Spring Trader demo application

The sample Spring Trader app is provided by Pivotal as a demonstration. We’ll use it to show how monitoring works. First, clone the app from its GitHub repository.

$ git clone https://github.com/cloudfoundry-samples/rabbitmq-cloudfoundry-samples

2) Create a user provided service to auto-discover the AppDynamics agent

$ cf create-user-provided-service demo-app-dynamics-agent -p "host-name,port,ssl-enabled,account-name,account-access-key"


Find out more about deploying on PWS in the Java buildpack docs.

3) Use the Pivotal Web Services add-on marketplace to add cloud-based PostgreSQL and AMQP service instances

$ cf create-service elephantsql turtle demo-db

$ cf create-service cloudamqp lemur demo-amqp


4) Bind the PostgreSQL, AMQP, and AppDynamics services to the app

$ git clone https://github.com/cloudfoundry-samples/rabbitmq-cloudfoundry-samples

$ cd rabbitmq-cloudfoundry-samples/spring

$ mvn package

$ cf bind-service demo-app demo-app-dynamics-agent

$ cf bind-service demo-app demo-amqp

$ cf bind-service demo-app demo-db


5) Push the app to production using the Cloud Foundry CLI (Command Line Interface)

$ cf push demo-app -i 1 -m 512M -n demo-app -p target/rabbitmq-spring-1.0-SNAPSHOT.war


[Screenshots: the Spring Trader demo app running, and the deployed app in the Pivotal Web Services console]

Production monitoring with AppDynamics Pro

Monitor your critical cloud-based applications with AppDynamics Pro for code level visibility into application performance problems.

[Screenshot: AppDynamics Pro dashboard]

Pivotal is the proud sponsor of Spring and the related open-source JVM technologies Groovy and Grails. Spring helps development teams build simple, portable, fast, and flexible JVM-based systems and applications, and it is the most popular application development framework for enterprise Java. The AppDynamics Java agent supports the latest Spring framework and Groovy natively, and can monitor the entire Pivotal stack, including tc Server, Web Server, Greenplum, RabbitMQ, and the Spring framework.


 

Taking Advantage of What the Cloud Has to Offer

Moving your application to the cloud isn’t as simple as porting your code and configurations over to someone else’s infrastructure – nor should it be. Cloud computing represents a shift in the world of application architecture from vertical scalability to horizontal scalability. This new paradigm offers organizations the opportunity to build highly scalable and dynamic applications. However, if you’re not careful or purposeful in how you prepare for the cloud, your application could suffer.

Horizontal vs. Vertical Scalability

The biggest fundamental difference between the cloud and your data center is that the cloud typically runs on commodity hardware rather than the powerful machines in your data center. This means writing an application that’s horizontally scalable instead of vertically scalable. Google probably best described what this means for your application architecture, as quoted on highscalability.com:

A 1,000-fold computer power increase can be had for a 33 times lower cost if you use a failure-prone infrastructure rather than an infrastructure built on highly reliable components. You must build reliability on top of unreliability for this strategy to work.

In other words, it can be cost effective to run an application on cheaper, less reliable commodity servers instead of more expensive and powerful machines. But in order to be successful, the software component needs to be highly scalable – even infinitely scalable – and resistant to failure. These two requirements guide many of the architectural decisions of cloud pioneers like Netflix, and give a good indicator of what’s required to be successful in the cloud from a performance standpoint.

Architecting for the Cloud

Very few organizations have the same requirements of their applications that Netflix does. With tens of thousands of nodes in Amazon EC2, Netflix is undoubtedly a pioneer in cloud architecture. Even though most organizations will never need the scale that Netflix does, its architectural practices and strategies are relevant to anyone building in or migrating to the cloud.

Here are a few of the ways Netflix takes advantage of the cloud, from a presentation by Netflix’s Director of Cloud Solutions, Ariel Tseitlin.

Service Oriented Architecture

The easiest way to accomplish horizontal scalability is with a service-oriented architecture. This is already pretty commonplace, but it’s especially important in the cloud, where you pay for the resources you consume – as your application scales, you can scale out only the services you need, which is more efficient (and cheaper) than scaling everything across the board. In addition, service-oriented architectures help manage concurrency. The following two diagrams demonstrate this point.

Fig. 1: The environment under normal load

Fig. 2: The environment after load changes

Auto-scaling

In the data center, scaling your application is expensive and time-consuming. In the cloud, it’s easy and (relatively) cheap. Netflix takes advantage of this, scaling their application up during the evenings when load increases and back down when peak viewing hours are over. Anyone with very dynamic load can take advantage of auto-scaling, which allows you to be cost-effective in the cloud without sacrificing performance.

Planning for Failure

One of the techniques Netflix is most famous for is simulating failure with its Simian Army. While this might not be a feasible approach for everyone, planning for failure is important for any cloud-based application – this is what Google is referring to when it talks about building “reliability on top of unreliability.” Your application needs to be able to survive failure at multiple levels – an individual node, a cluster, or perhaps even more.

Server instances can come and go at the drop of a hat, so they cannot store any state. Instead, Netflix groups server instances together into “clusters” and considers the behavior of the cluster as a whole.

These are just a few of the best practices for building a highly scalable cloud application. In order to take advantage of what the cloud has to offer, you need to rethink the architecture of your entire application. This also means you need to rethink your approach to managing the performance of your application in order to ensure the availability and performance of your application in the cloud.


 

Black Friday and Cyber Monday through the eyes of an APM solution

A week has passed since Black Friday, so I thought it would be a good idea to summarize what we saw at AppDynamics from monitoring one of several e-commerce applications in production.

Firstly, things went pretty well for our customers, who saw between a 300% and 500% increase in transaction volume on their applications over the holiday period. That’s a pretty big spike in traffic for any application, so it’s always good to look at those spikes and see what impact they had on application performance.

Here’s a screenshot showing the load (top) and response time (bottom) of a major e-commerce production application during the Thanksgiving period. The dotted line in both charts represents the dynamic baseline of normal activity. You can see that on Black Friday (the 23rd) and Cyber Monday (the 26th), transaction throughput peaked between 24,000 and 31,000 tpm, spiking between 150% and 200% over the normal load the application experiences throughout the rest of the year.

Application response time during the period had one blip, during the first minutes of Black Friday (9pm PST/midnight EST), with no major performance issues following through into Cyber Monday. The blip related to the web container thread pool becoming exhausted at peak load when the Black Friday promotions went live. Below you can see throughput was hitting 23,000 tpm.

Two business transactions, “Product Display” and “Checkout”, were breaching their performance baselines during that period. Looking at the average response times of 516ms and 733ms tells one story; looking at the maximum response times and the number of slow/very slow transactions (calculated using standard deviations) tells a completely different story.

Let’s take a look at the execution of one individual “Product Display” business transaction that was classified as very slow with a 66 second response time.

When we drill into the code execution and SQL activity, we can see a simple SELECT query with a response time of 588ms. The problem in this transaction was that the query was invoked 102 times, resulting in a whopping 59.9 seconds of latency. It’s therefore no surprise that thread concurrency inside the JVM was high, with threads waiting for transactions like these to complete. If these queries are simply pulling back product data, there is no reason why a distributed cache can’t be used to store the data instead of making expensive calls to a remote database like DB2.
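
As a sketch of that kind of fix (ProductDao and the in-memory map are hypothetical stand-ins for the real data access layer and a distributed cache), a read-through lookup turns repeated identical SELECTs into a single database round-trip:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CachedProductLookup {
    interface ProductDao {
        String selectProductById(String id); // the 588ms SELECT against DB2
    }

    private final ProductDao dao;
    // Stand-in for a distributed cache shared across the cluster.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public CachedProductLookup(ProductDao dao) { this.dao = dao; }

    public String productById(String id) {
        // One database hit per product; the other 101 invocations in a
        // transaction like the one above would be served from memory.
        return cache.computeIfAbsent(id, dao::selectProductById);
    }
}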

Let’s look at the other transaction, “Checkout”, which was also breaching during the performance spike. Here is a checkout that took 9.1 seconds and deviated significantly from its performance baseline. You can see from the screenshot below that the latency, or bottleneck, is again coming from the DB2 database:

Hardly surprising, given that most application scalability issues these days still relate to data persistence between the JVM and the database. So let’s drill down into the JVM for this transaction and understand exactly what is being invoked in the DB2 database:

Above is the code execution of that transaction; you can immediately see that 8.5 seconds of latency is spent in an EJB call that performs an update. Let’s take a look at the queries invoked as part of that update:

Nice: a simple update query was taking 8.4 seconds. Notice, too, all the other SQL queries associated with a single execution of the “Checkout” transaction. The application during this performance spike was clearly database-bound, and as a result a few code changes were made overnight that reduced the number of database calls the application was making. We had one retail e-commerce customer last year who found a similar bottleneck; a fix was applied that reduced the number of database calls per minute from 500,000 to a little under 150,000. While the problem may at first have appeared to be a database issue (one for the DBA), it was actually in the application logic, and it was the developers who were responsible for resolving it.

You can see in the first screenshot that application response time was stable throughout the rest of the Thanksgiving period; no spikes or outages occurred for this customer and all was well. While every customer will do their best to catch performance defects in pre-production and test, sometimes it’s not possible to reproduce or simulate real application usage patterns, especially in large-scale, high-throughput production environments. This is where Application Performance Management (APM) solutions like AppDynamics can help – by monitoring your application in production so you can see what’s happening. Get started today with a free 30-day trial.
