Why Application Intelligence + Network Intelligence Equals Better Business Outcomes

As more enterprises distribute applications not only within data centers but also across data centers and multiple clouds, the application footprint is growing in size and complexity. And with companies increasingly relying on end-to-end performance as a key requirement for business success, the performance implications for these highly distributed, scalable applications are greater than ever.

Indeed, application performance in today’s hyper-connected social world directly impacts a business’s brand, revenues and customer stickiness. Application performance monitoring (APM) is critically important, of course, but APM can provide far better results when application and business performance metrics are leveraged to program the underlying network policy. The end result can be application-driven, end-to-end control that’s highly effective regardless of the underlying network/cloud infrastructure.

In this blog—the first in a series—we’ll examine the pain points associated with the lack of application and network correlation, and discuss the benefits of APM when correlated with underlying network visibility and monitoring. We’ll explore how business and application performance metrics and policy, when correlated with underlying network information, can provide the fastest root cause analysis (RCA). We’ll also look at how this integration between application and network performance can reduce the risk of unexpected application outages, simplify application deployment, and boost trust and understanding across teams. These benefits ultimately will lead to better customer experiences and business outcomes for critical application and business transactions.

The Benefits of Modern Apps

Most applications developed in recent years are highly distributed from the ground up. Traditional client-server models have given way to containerized, virtualized, distributed apps built using state-of-the-art frameworks, technologies and specialized third-party services. A modern app may even be written as a wrapper/enclosure for a legacy application in application-modernization projects. And the use of agile DevOps methods to develop and operate these apps can mean frequent rollouts and changes to production environments.

Modern apps are growing in complexity and scale. They’re capable of running in multiple environments and are accessible via myriad devices, including PCs, mobile gadgets and IoT endpoints. These apps traverse a variety of networks, from traditional data centers to multiple WAN links to the cloud. Within the datacenter (DC)—whether a private DC or a public cloud colo facility—the size and complexity of the underlying network is growing to support modern application deployment models and to scale as needed. All of this is driving the need for faster root-cause identification of problems.

But Modern Apps Can Bring Pain, Too

In contrast to the growing complexity of modern-app deployment, the end-user experience requires great simplicity. Complicating matters is the fact that users demand flawless app performance 24/7. Unsurprisingly, many pain points are associated with achieving this goal.

Let’s examine the traditional network issues that adversely affect application performance; understanding them is critical to finding root cause faster. As you’re aware, application slowdowns or failures lead to a poor end-user experience. These incidents can be caused by a number of network-related issues, including:

  • Incorrect network configuration for the application’s needs; something as simple as the duplex or speed of a switch port can cause big problems.

  • Firewall or load balancer misconfiguration—not allowing traffic for a particular application component.

  • Improper permissions that block good traffic from accessing an application service or, conversely, allow bad traffic to access an app component or service.

  • Packet loss due to overwhelming load on a network device, insufficient bandwidth, or other factors.

  • Packet loops or extra inefficient hops in the network.

  • Network policies that inadvertently impact application performance, such as those discussed below.

A large portion of modern enterprise application traffic can be classified as east-west—in a datacenter environment, that’s traffic moving between application servers, databases, firewalls, load balancers and enterprise storage devices. Some network issues are unique to modern data centers and can adversely impact both application performance and the end-user experience. Examples include:

  • Wrong mapping of application requirement (policy) to underlying switch fabric/ports.

  • Incorrect switch configuration, causing fabric loops for data between systems, or incorrect drops.

  • Wrong or outdated storage access policy or configuration.

  • Inefficient virtual machine-to-physical port configuration, i.e., wrong virtual-to-physical (v-to-p) or physical-to-virtual (p-to-v) mappings.

  • Cabling issues on top-of-rack (TOR) or end-of-row switches (EOR).

  • Inefficient or wrong power budget, and other factors.

Cloud-related network issues can also impact app performance, including incorrect configuration of virtual private gateways, security groups, virtual router capacity, and traditional and cloud DC gateway settings.

The Problem with IT Silos

Application outages and slowdowns are often technological in nature, but many are exacerbated by organizational issues. Most IT organizations evolve from silo-based org structures and skill sets: app ops, datacenter network, wide area network, security, desktops, cloud, and so on. In many cases, these siloed organizations don’t communicate or work well together.

Furthermore, these silos often use their own sets of tools for performance monitoring and troubleshooting—different tools for network monitoring of routers, switches, firewalls and load balancers, for instance. And while these tools may do a decent job of detecting problems, they solve problems only within their respective domains.

Another issue is that these tools don’t provide cross-domain correlation, nor can they map application slowdowns to specific network issues. Some tools attempt to do this, but they can’t trace business transactions (the way an end user interacts with the application, all the way through the network) without extensive war-room involvement.

In production environments (where there is tremendous pressure from the business), these balkanized orgs and tools focus on silo-specific, “not-my-problem” outcomes that fail to resolve end-user or customer problems. This phenomenon, known as mean-time-to-innocence (MTTI), saps time, effort and energy from companies, resulting in a loss of productivity and customer stickiness.

How the Integration of Network, APM and Troubleshooting Brings Value to Ops Teams

The ability to see application performance issues in near-real time, correlated to underlying network performance, is exceptionally valuable. Mapping application changes and policies to underlying data center policy can go a long way toward driving efficiencies inside an organization, as more than three-fourths of data center traffic is east-west, according to Cisco’s Global Cloud Index.

The ability to dynamically discover application topology, as well as proactively identify application performance bottlenecks all the way down to a specific data center or network segment, can prove very beneficial to an organization.

This integration of network, APM and troubleshooting offers many benefits. Some key ones are:

  • Fastest app-to-network root cause analysis: Fast and flexible mapping of application changes to the underlying DC network. By mapping application policy to underlying network policy, network ops teams can receive application-driven information quickly. This increases productivity by avoiding war-room scenarios, and is by far the biggest benefit in modern networks and data centers where enterprise apps are deployed.

  • Reduced risk of unexpected application outages: When app ops can provide proactive alerts to network ops on specific network or data center slowdowns involving an application or business transaction, network ops can focus on the root cause to prevent further performance degradation and/or outages.

  • Simplified application deployment: the ability to generate network policy based on application topology (the whitelist model) helps simplify app deployment.

Finally, from an organizational perspective, correlated views can reduce mean-time-to-innocence. This helps app ops work better with network ops when reporting slowdowns to the business. A common dashboard with important KPIs makes this effort a lot easier. This cooperation not only promotes trust between app ops/DevOps and network ops teams, it also provides a better operational view for the business.

A Major Win for App Ops, Network Ops, and the Business

The correlation of application performance metrics—from business transaction and end-user experience all the way through the underlying network—is critical for business and operational excellence. This shared view of application and network performance delivers key benefits such as reduced mean-time-to-innocence, better cross-team collaboration, and a simplified operational business model. Having an app-centric and business-level view of underlying network performance bottlenecks leads to greater customer satisfaction overall.

Schedule a demo to learn how AppDynamics and Cisco are working together to bring this visibility to enterprises everywhere.

Healthcare Reform and Application Performance Monitoring

Regardless of your political views, healthcare reform is truly (no pun intended) reforming healthcare in the United States. Everyone is probably familiar with the Affordable Care Act (ACA) of 2010, or “Obamacare,” which was enacted to increase the quality and affordability of healthcare in the United States. Another piece of legislation affecting the healthcare industry was enacted in 2009 and is commonly known as the “Stimulus.” Among the many provisions of the “Stimulus,” or The American Recovery and Reinvestment Act (ARRA), are new regulations around Healthcare IT (HIT), chief among them Meaningful Use (MU).

Broken into three stages, the MU programs provide financial incentives for the “meaningful use” of certified Electronic Health Record (EHR) technology. To receive an EHR incentive payment, providers have to show that they are using their certified EHR technology by meeting certain measurement thresholds, which range from recording patient information as structured data to exchanging summary care records.


The HIT Industry is Slow to Change

While the ARRA provides financial incentives to hospitals and eligible professionals to automate medical records (let’s call this the carrot), it also penalizes hospitals and eligible professionals that do not demonstrate and attest to MU by reducing Medicare and Medicaid reimbursements over time (let’s call this the stick).

MU will require change and, based on my years of experience as an HIT consultant and application provider, the healthcare industry is slow to adopt change.

Prior to joining AppDynamics, I participated in a major system upgrade for a large hospital system. The upgrade was necessary for Meaningful Use attestation. The hospital was three major releases behind the current release of its EHR software, and the features required for MU attestation were only available on the latest release. By the way, this upgrade latency is not uncommon in HIT.

Because of the magnitude of the change and the complexity of the environment, contingency plans were put in place to ensure the hospital could continue to care for patients should any of the EHR components fail to upgrade properly. However, nothing could prepare the team for what happened next.

A Problem Arises

Two days after the final upgrade outage, and just as everyone was ready to head home after a number of sleepless nights, a frantic call came into the upgrade command center. The call came from the nurse shift supervisor of the emergency department (ED). If you have ever met an ED nurse, you will understand when I say that an ED nurse is not someone you want to upset.

The vast array of people caring for patients in an emergency department can be overwhelming. In order to provide visibility and bring order into a very intense operation, the ED relies on a number of critical tools. One such tool is the Tracking Board application. The Tracking Board application provides visibility into length of stay, staff assignment, room assignment, lab order tracking, patient criticality and many more vital data points. All of which can have a major effect on patient safety – the top priority for any healthcare professional.


The ED tracking board was unusable and the ED was operating in the dark. Without the visibility provided by the Tracking Board application, the ED was at a standstill. In such situations the ED would normally shut down, far from ideal, but because this particular hospital is the only Level I trauma center in the region, that wasn’t an option. And because the software comprised other modules that were working properly, with technical dependencies among them, a downgrade wasn’t an option either.

The command center became a war room: clinical analysts from the hospital, project managers from both sides of the implementation, a large ensemble of high-level clinical and executive staff from the hospital, the entire infrastructure team, DBAs, interface engine administrators, and developers from three different continents were all locked in and given clear instruction: “Don’t leave until this issue is resolved.”


Minutes turned into hours, then into days. The situation in the emergency room was ironically turning into an emergency itself. The Chief Medical Information Officer (CMIO) of the hospital brought the vendor project manager to tears, and the ED nurses were gathering their torches and pitchforks and marching against the IT department. All appeared lost; after days of outage and close to 1,000 man-hours spent trying to find the root cause of the problem, everyone was ready to walk out.

The patients however, could not walk away. Many of them had life threatening conditions, and the queue outside the ED was only growing longer. Patient safety is job #1 for everyone in healthcare, and the unavailability of the ED Tracking Board application was affecting every patient’s safety!

AppDynamics to the Rescue! 

Clearly, it was time for an intervention. Unlike the TV reality series, this intervention didn’t come in the form of over-emotional family members, but in the form of an APM solution. AppDynamics was deployed and quickly generated a flow map of the entire application environment. Within minutes, business transactions (BT) from within the software itself and from all adjacent systems that interface with the Tracking Board application began to pour in.

A business transaction (BT) is a key feature of AppDynamics that, in simple terms, allows users to map the application based on how end users experience it.

A key BT captured by AppDynamics was one happily named “UpdateCycle”. As reported by the Tracking Board application vendor, the “UpdateCycle” BT was responsible for querying its own database, the interface engine, and a variety of disparate data sources, then updating an operational dashboard displayed via digital signage throughout the ED.

As the team monitored the application via AppDynamics, looking for clues as to why it was failing, we noticed that the UpdateCycle transaction volume was 100x what was expected. In general, the Tracking Board dashboards update every minute for each viewer. Considering there were ~10 viewers at any given time, the system was designed to support tens of transactions per minute, yet it was receiving thousands of transactions per minute.

A faulty client-side configuration was overloading the server, causing it to return slow responses to the clients. The listener was working overtime: it received a response every few seconds and tried to update the ED tracking board each time, resulting in constant updates to the signage stations and webpages and making the system inoperable.
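The post doesn’t share the actual client code, but the arithmetic behind the overload is easy to sketch. Here’s a minimal, hypothetical illustration (the function and values are mine, not the hospital’s) of how a broken refresh interval multiplies load a hundredfold:

```javascript
// Hypothetical sketch: each tracking-board viewer is meant to poll once
// per minute, but a broken refresh interval makes the client re-poll as
// soon as each (slow) response arrives, every few hundred milliseconds.
function requestsPerWindow(intervalMs, viewers, windowMs) {
  // Total requests all viewers generate during the observation window.
  return Math.floor(windowMs / intervalMs) * viewers;
}

const DESIGNED = requestsPerWindow(60 * 1000, 10, 60 * 1000); // 10/min as designed
const FAULTY   = requestsPerWindow(600, 10, 60 * 1000);       // 1,000/min with a broken interval

console.log(`designed: ${DESIGNED}/min, faulty: ${FAULTY}/min`);
```

With ten viewers and a once-a-minute refresh, the system sees about ten transactions per minute; drop the effective interval to well under a second and the same ten viewers generate the thousands per minute the team observed.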

Using AppDynamics, the team located the root cause within one hour of deployment, and the fix itself took less than five minutes. The web server was restarted and all was calm in the kingdom of the ED.


From the moment yours truly recommended the use of AppDynamics until everyone left the war room for good, less than four hours had elapsed. Let me say this again: four hours was all it took to download the software, install it, capture traffic and resolve the issue!

Take a few minutes to get complete visibility into the performance of your production applications with AppDynamics today.

Introducing Nodetime for Node.js Monitoring

Node.js is rapidly becoming one of the most popular platforms for building fast, scalable web applications. According to a W3Techs survey, adoption of Node.js doubled in the last year alone, and Node.js is currently the 14th most popular application server in the world. Today a range of organizations use Node.js to power their mobile applications, including LinkedIn, Walmart and Klout. With Nodetime you can monitor, troubleshoot and diagnose performance issues in your Node.js applications.

Nodetime reveals the internals of your application and infrastructure through profiling and proactive monitoring, enabling detailed analysis, fast troubleshooting, and performance and capacity optimization. Monitor the real-time and historical state of your application by following multiple application metrics. Over three thousand organizations use Nodetime to monitor their Node.js applications, including Condé Nast, Kabam and Change.org.

“We’re very excited to see AppDynamics pursuing the latest and most innovative technologies with this acquisition,” said Nic Johnson, Web Architect at FamilySearch, a customer of both AppDynamics and Nodetime. “Both Nodetime and AppDynamics are essential parts of our toolset, and we’re very excited to see them united in the same product and the same great company. This will mean great things both for AppDynamics customers and for the Node.js community as a whole.”

Nodetime Dashboard for at-a-glance metrics of application health:

Nodetime Dashboard

CPU profiler showing a backtrace to find the root cause of a performance problem:



Nodetime metrics cover operating system state, garbage collection activity, application capacity, transactions and database calls for supported libraries such as HTTP, File System, Socket.io, Redis, MongoDB, MySQL, PostgreSQL, Memcached and Cassandra.

Explore any application or OS metric via the Nodetime metric browser:



Get started for free with Node.js performance monitoring with Nodetime.


Nodetime installation is extremely easy:

npm install nodetime

Once the Nodetime module is available, simply add the following to the top of your Node application:

require('nodetime').profile({
  accountKey: 'xxx',
  appName: 'MyApp'
});

Adding application performance monitoring to a Node.js application has never been easier. Enjoy!

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Thoughts? Let us know on Twitter @AppDynamics!

Intelligent Alerting for Complex Applications – PagerDuty & AppDynamics

Today AppDynamics announced integration with PagerDuty, a SaaS-based provider of IT alerting and incident management software that is changing the way IT teams are notified and how they manage incidents in their mission-critical applications. By combining AppDynamics’ granular visibility into applications with PagerDuty’s reliable alerting capabilities, customers can make sure the right people are proactively notified when business impact occurs, so IT teams can get their apps back up and running as quickly as possible.

You’ll need a PagerDuty and AppDynamics license to get started – if you don’t already have one, you can sign up for free trials of PagerDuty and AppDynamics online.  Once you complete this simple installation, you’ll start receiving incidents in PagerDuty created by AppDynamics out-of-the-box policies.

Once an incident is filed it will have the following list view:


When the ‘Details’ link is clicked, you’ll see the details for this particular incident including the Incident Log:


If you are interested in learning more about the event itself, simply click ‘View message’ and all of the AppDynamics event details are displayed showing which policy was breached, violation value, severity, etc. :


Let’s walk through some examples of how our customers are using this integration today.

Say Goodbye to Irrelevant Notifications

Is your work email address included in some sort of group alias, leaving you with several, maybe even dozens, of notifications a day that aren’t relevant to your responsibilities or are intended for other people on your team? Mine is. Imagine a world where your team only receives messages relevant to each individual’s role, and alerts only go to people who are actually on call. With AppDynamics and PagerDuty you can now build alerting logic that routes specific alerts to specific teams and only pages the people on call. App response time way above the normal value? Send an alert to the app support engineer on call, not all of his colleagues. Not having to sift through a bunch of irrelevant alerts means that when one does come through, you can be sure it requires YOUR attention right away.


Automatic Escalations

If you only send a notification and assign an incident to one person, what happens if that person is out of the office or doesn’t have internet or phone access to respond to the alert? The good thing about PagerDuty is that you can build in automatic escalations. So if you have a trigger in AppDynamics that fires a PagerDuty alert when a node is down, and the infrastructure manager isn’t available, you can automatically escalate and reassign the alert to a backup employee or admin.


The Sky is Falling!  Oh Wait – We’re Just Conducting Maintenance…

Another potentially annoying situation for IT teams is the flood of alerts fired off during a maintenance window. PagerDuty has the concept of a maintenance window, so your team doesn’t get a bunch of doomsday messages during maintenance. You can even set up a maintenance window with one click if you prefer to go that route.


Either way, no new incidents will be created during this time period, meaning your team will be spared having to open, read, and file the alerts and update or close out the newly created incidents in the system.

We’re confident this integration of the leading application performance management solution with the leading IT incident management solution will save your team time and make them more productive.  Check out the AppDynamics and PagerDuty integration today!

Introducing AppDynamics for PHP

PHP Logo

It’s been about 12 years since I last scripted in PHP. I pretty much paid my way through college building PHP websites for small companies that wanted a web presence. Back then PHP was the perfect choice, because nearly all the internet service providers offered free PHP support if you registered domain names with them. Java and .NET weren’t options for a poor smelly student like me, so I just wrote standard HTML with embedded scriptlets of PHP code and bingo: I had dynamic web pages.

Today, 244 million websites run on PHP, which is almost 75% of the web. That’s a pretty scary statistic. If only I’d kept coding PHP back when I was 21, I’d be a billionaire by now! PHP is a pretty good example of how open-source technology can go viral and infect millions of developers and organizations worldwide.

Turnkey APMaaS by AppDynamics

Since we launched our Managed Service Provider program late last year, we’ve signed up many MSPs that were interested in adding Application Performance Management-as-a-Service (APMaaS) to their service catalogs.  Wouldn’t you be excited to add a service that’s easy to manage but more importantly easy to sell to your existing customer base?

Service providers like Scicom definitely were (check out the case study), because they are being held responsible for the performance of their customers’ complex, distributed applications but oftentimes don’t have visibility inside the actual application. That’s like being asked to officiate an NFL game with your eyes closed.


The sad truth is that many MSPs still think that high visibility in app environments equates to high configuration, high cost, and high overhead.

Thankfully this is 2013.  People send emails instead of snail mail, play Call of Duty instead of Pac-Man, listen to Pandora instead of cassettes, and can have high visibility in app environments with low configuration, low cost, and low overhead with AppDynamics.

Not only do we have a great APM service to help MSPs increase their Monthly Recurring Revenue (MRR), we make it extremely easy for them to deploy this service in their own environments, which, to be candid, is half the battle.  MSPs can’t spend countless hours deploying a new service.  It takes focus and attention away from their core business, which in turn could endanger the SLAs they have with their customers.  Plus, it’s just really annoying.

Introducing: APMaaS in a Box

Here at AppDynamics, we take pride in delivering value quickly.  Most of our customers go from nothing to full-fledged production performance monitoring across their entire environment in a matter of hours in both on-premise and SaaS deployments.  MSPs are now leveraging that same rapid SaaS deployment model in their own environments with something that we like to call ‘APMaaS in a Box’.

At a high level, APMaaS in a Box is large cardboard box with air holes and a fragile sticker wherein we pack a support engineer, a few management servers, an instruction manual, and a return label…just kidding…sorry, couldn’t resist.


Simply put, APMaaS in a Box is a set of files and scripts that allows MSPs to provision multi-tenant controllers in their own data center or private cloud and to provision AppDynamics licenses for customers themselves. Basically, it’s the ultimate turnkey APMaaS.

By utilizing AppDynamics’ APMaaS in a Box, MSPs across the world are leveraging our quick deployment, self-service license provisioning, and flexibility in the way we do business to differentiate themselves and gain net new revenue.

Quick Deployment

Within six hours, MSPs like NTT Europe who use our APMaaS in a Box capabilities will have all the pieces in place to start monitoring the performance of their customers’ apps. Now that’s some rapid time to value!

Self-Service License Provisioning

MSPs can provision licenses directly through the AppDynamics partner portal.  This gives you complete control over who gets licenses and makes it very easy to manage this process across your customer base.


An MSP can get started on a month-to-month basis with no commitment. Paying only for what you sell eliminates the cost of shelfware. MSPs can also sell AppDynamics however they would like to position it and can float licenses across customers. NTT Europe uses a three-tier service offering so customers can pick and choose the APM services they’d like to pay for. Feel free to get creative when packaging this service for customers!


As more and more MSPs move up the stack from infrastructure management to monitoring the performance of their customers’ distributed applications, choosing an APM partner that understands the managed services business is of utmost importance. AppDynamics’ APMaaS in a Box capabilities align well with internal MSP infrastructures, and our pricing model aligns with the business needs of managed service providers. We’re a perfect fit.

MSPs who continue to evolve their service offerings to keep pace with customer demands will be well positioned to reap the benefits and future revenue that comes along with staying ahead of the market.  To paraphrase The Great One, MSPs need to “skate where the puck is going to be, not where it has been.”  I encourage all you MSPs out there to contact us today to see how we can help you skate ahead of the curve and take advantage of the growing APM market with our easy to use, easy to deploy APMaaS in a Box.  If you don’t, your competition will…

AppDynamics & Splunk – Better Together

A few months ago I saw an interesting partnership announcement from Foursquare and OpenTable. Users can now make OpenTable reservations at participating restaurants from directly within the Foursquare mobile app. My first thought was, “What the hell took you guys so long?” That integration makes sense on so many levels, I’m surprised it hadn’t already been done.

So when AppDynamics recently announced a partnership with Splunk, I viewed that as another no-brainer.  Two companies with complementary solutions making it easier for customers to use their products together – makes sense right?  It does to me, and I’m not alone.

I’ve been demoing a prototype of the integration for a few months now at different events across the country, and at the conclusion of each walk-through I’d get some variation of the same question, “How do I get my hands on this?”  Well, I’m glad to say the wait is over – the integration is available today as an App download on Splunkbase.  You’ll need a Splunk and AppDynamics license to get started – if you don’t already have one, you can sign up for free trials of Splunk and AppDynamics online.

The Top 5 Advantages of SaaS-based Application Performance Management

Software-as-a-Service (SaaS) has seen a lot of success and adoption in the past five years, though less in the field of application performance management (APM) than in other markets. With cloud computing gaining momentum, you’re likely to see SaaS APM adoption increase significantly as more applications are deployed to the cloud. Gartner also recently made SaaS a mandatory requirement for APM vendors to be included in its 2012 APM Magic Quadrant, so SaaS-based APM is definitely becoming hot right now!

Here are the top 5 advantages that SaaS-based APM can offer:


SaaS-based APM can be deployed within your organization in the time it takes you to read this article. Think about that for a second – you get to experience the full benefits of APM in just a few minutes with no interaction from sales people or technical consultants. All you need to do is sign up for an account, take a free trial, and evaluate whether APM can meet your needs or solve your problems.

Many cloud providers are now actively partnering with APM vendors to embed agents within the servers they provision for customer applications. I personally know of a company that solved a 6 month production issue within an hour of deploying SaaS-based APM. How about that for ROI and time to value!


Simply put, subscription-based licenses are cheaper, more flexible and less risky than owning perpetual licenses. Annual maintenance is included in the subscription, as is the cost of managing and supporting the APM infrastructure required to monitor your applications. You don’t need to buy hardware to run your APM management server, and you don’t need to pay someone to manage it either; you simply deploy your agents and you’re done. There’s no need to sign a multi-million dollar, three-year APM enterprise license agreement (ELA) with a vendor; rather, you pay as you go. If the APM software rocks, you renew your subscription. If the APM software sucks, you go elsewhere.

3. Better Usability
When a customer signs up for a SaaS account and evaluates APM for the first time, there is no pre-sales or technical consultant sitting next to them to configure or demo the solution. The experience from account registration to application monitoring is a journey taken alone by the customer.

First impressions are everything with SaaS. Therefore, the learning curve of APM in this context must be faster and easier, so the APM solution can sell itself to the customer.

SaaS-based APM solutions are also much younger than traditional on-premise software, meaning the technology, UI design principles and concepts applied are more modern and interactive for the user. Compare the UI of an iPhone with a Nokia phone from five years ago and you'll see my point.

First generation APM solutions were typically written for developers by developers. Today the value of APM touches many different user skill sets. It is therefore no surprise that SaaS-based APM can appeal to and be adopted by development, operations and business users.

4. Painless Upgrades
When an APM vendor announces a new release of its software with lots of cool features, it’s normally down to the customers themselves to migrate to the new release. If things go well, they might spend several days or perhaps a few weeks performing the migration. If things go badly, they might end up spending several weeks working hand in hand with the vendor to complete the migration.

With SaaS-based APM, the vendors themselves are responsible for the migration. Customers simply login and they get the latest version and features automatically. They get to harness APM innovation as soon as it’s ready, rather than having to wait weeks or months to find the time to migrate by themselves. If anything goes wrong, then it’s the vendor who spends the time and money to fix it, rather than the customer.

Customers today will typically upgrade their APM software once a year because of the time and effort. With SaaS-based APM, they can receive multiple upgrades and always be on the latest version.

5. Easier Scalability
Enterprises and Cloud providers can manage lots of applications, which can span several thousand servers. It is one thing for a customer to deploy APM across two applications and a hundred servers in their organization. It is another thing to deploy it across fifty applications and a thousand servers.

Scaling APM has never been easy. The more agents you deploy, the more management servers you need to collect, process, and manage the data. How quickly can you purchase, provision, and maintain the APM management infrastructure when you’ve got hundreds of applications you want to monitor?

With SaaS-based APM, you let the vendor take care of that for you. I know of a SaaS-based APM user that monitors over 6,000 servers in their organization. Compare that with the largest APM on-premise deployment you know of and you can see why SaaS-based APM is a better scalability option.

So there you have it–five compelling reasons why you should consider SaaS-based APM in your organization. SaaS-based APM isn’t for everyone, though. I typically see less adoption in financial services customers where data privacy and security controls are much tighter.


Gartner positions AppDynamics as a Leader in 2012 APM Magic Quadrant

Application Performance Monitoring (APM) has been my life and world for almost a decade. I used APM as a developer, sold it as a sales engineer, built it as a product manager and now I’m evangelizing it as a superhero. In that time, I’ve seen APM evolve from being a pure JavaEE monitoring tool in 2002 that a few developers might use, to a full blown IT monitoring platform in 2012 that aligns development, operations and the business.

Today, the APM market has advanced tenfold, with help from analysts like Gartner, who research APM and take hundreds of inquiry calls a year from buyers. As industry and technology trends like SOA, agile, web 2.0, cloud computing, DevOps and big data evolve, so do the market requirements for APM. For APM to deliver the promised benefits, it must enable users to monitor and manage modern applications. If modern buyers commonly require X, Y and Z from APM, then APM vendors must offer X, Y and Z to be considered relevant in the market, and to be recognized by analyst research and reports such as the Gartner Magic Quadrant.

For example, let’s take a look at the inclusion criteria from 2012 for a vendor to be included in the Gartner Application Performance Monitoring Magic Quadrant (and if you’d like to get a complimentary copy, be our guest):

  • The vendor’s APM product must include all five dimensions of APM: end-user experience monitoring; runtime application architecture discovery and modeling; deep-dive monitoring of one or more key application component types (e.g., database, application server); user-defined transaction profiling; and analytics applied to metric aggregation, trending and pattern discovery techniques.
  • The APM product must provide compiled Java or .NET code instrumentation in a production environment.
  • The vendor should have at least 50 customers that use its APM products actively in a production environment.
  • The APM offering must include part of or the entire solution as a service. This includes managed service provider hosting, regardless of other commercial arrangements, or SaaS delivery through its own distribution channels.
  • Total revenue (including new licenses, updates, maintenance, subscriptions, SaaS, hosting and technical support) must have exceeded $5 million in 2011.
  • Customer references must be located in at least three of the following geographic locations: North America, South America, EMEA, the Asia/Pacific region and/or Japan.
  • The vendor references must monitor more than 200 production application server instances in a production environment.

Raising the APM bar:

The rationale for Gartner’s 2012 APM MQ inclusion criteria is available here. A vendor must provide a broad set of APM functionality, supporting all five dimensions of APM rather than just a few.

Other inclusion criteria I liked from the above were that APM vendors must provide compiled Java or .NET code instrumentation in a production environment, that the offering must include part of or the entire solution as a service, and that vendor references must now monitor over 200 application server instances in a production environment. These items pretty much hit the sweet spots of AppDynamics: we monitor some of the largest production Java and .NET applications in the world, and we offer all five dimensions of APM in a single product that can be deployed either on-premise or via SaaS. Our largest Java deployment is over 6,000 nodes and our largest .NET deployment is now over 5,000 nodes, which shows how easy our APM solution is to deploy and scale.

The adoption of public cloud, combined with the fact that APM buyers are looking to simplify their APM purchases, implementations and maintenance, means that AppDynamics is well positioned to capitalize on these opportunities.

Love it or hate it, every vendor wants to be part of the Gartner Magic Quadrant, because everyone wants to be known as a leader in their field. To get there, vendors must meet or exceed Gartner’s inclusion criteria as well as its very detailed requirements matrix, which puts pressure on each vendor to constantly innovate, execute and demonstrate a compelling vision.

AppDynamics named a Leader in 2012:

What I’ve witnessed at AppDynamics since I joined back in 2011 has been nothing short of amazing. We’ve kicked a lot of ass in the last year and have had a lot of fun doing it. You could say AppDynamics being positioned as a Leader in the 2012 MQ was the recognition we deserved for breaking the rules of traditional APM. We believe our MQ position is a clear testament to our technology, tremendous customer success and disruption in the marketplace. We’re enormously proud and privileged at AppDynamics to be recognized as a Leader, but we know our job isn’t done yet. We want to make APM easy to deploy, easy to use and affordable for everyone. If we do this, there will be more organizations in the world leveraging the benefits of APM than ever before, which translates to faster applications for everyone. Not a bad thing at all.

You can sign up for a free 30-day trial of AppDynamics Pro right here, and see for yourself why we’ve become a leader in just two years.

App Man.

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

How Monitoring Analytics can make DevOps more Agile

The word “analytics” is an interesting and often abused term in the world of application monitoring. For the sake of correctness, I’m going to reference Wikipedia in how I define analytics:

Analytics is the discovery and communication of meaningful patterns in data.

Simply put, analytics should make IT’s life easier. Analytics should point out the bleeding obvious from all the monitoring data available, and guide IT so they can effectively manage the performance and availability of their application(s). Think of analytics as “doing the hard work” or “making sense” of the data being collected, so IT doesn’t have to spend hours figuring out for themselves what is being impacted and why.

This is about how effectively a monitoring solution can self-learn the environment it’s deployed in, so it’s able to baseline what is normal and abnormal for the environment. This is really important, as every application and business transaction is different. A key reason why many monitoring solutions fail today is that they rely on users to manually define what is normal and abnormal using static or simplistic global thresholds. Think of the classic “alert me if server CPU > 90%” and “alert me if response times are > 2 seconds,” both of which normally result in a full inbox (which everyone loves) or an alert storm for IT to manage.
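To make the contrast concrete, here is a minimal sketch of a static threshold versus a self-learned baseline. The numbers are made up, and the baseline logic is a generic statistical illustration, not any vendor's actual implementation:

```python
from statistics import mean, stdev

def static_alert(cpu_pct, threshold=90.0):
    """Classic static threshold: fires on every spike above a fixed number."""
    return cpu_pct > threshold

def baseline_alert(history, latest, sigma=3.0):
    """Self-learned baseline: alert only when the latest sample deviates
    more than `sigma` standard deviations from what is normal for THIS metric."""
    mu, sd = mean(history), stdev(history)
    return abs(latest - mu) > sigma * sd

# A server that normally runs hot, hovering around 92% CPU (hypothetical data):
history = [91, 93, 92, 94, 92, 91, 93, 92, 94, 92]

print(static_alert(93))             # True  -- static rule cries wolf on normal load
print(baseline_alert(history, 93))  # False -- 93% is normal for this server
print(baseline_alert(history, 99))  # True  -- a genuine deviation from baseline
```

The static rule floods the inbox on a server that is routinely busy, while the baseline rule only fires when behavior actually changes.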

The communication bit of analytics is equally as important as the discovery bit. How well can IT interpret and understand what the monitoring solution is telling them? Is the data shown actionable–or does it require manual analysis, knowledge or expertise to arrive at a conclusion? Does the user have to look for problems on their own or does the monitoring solution present problems by itself? A monitoring solution should provide answers rather than questions.

One thing we did at AppDynamics was make analytics central to our product architecture. We’re about delivering maximum visibility through minimal effort, which means our product has to do the hard work for our users. Our customers today are solving issues in minutes versus days thanks to the way we collect, analyze and present monitoring data. If your applications are agile, complex, distributed and virtual then you probably don’t want to spend time telling a monitoring solution what is normal, abnormal, relevant or interesting. Let’s take a look at a few ways AppDynamics Pro is leveraging analytics:

Seeing The Big Picture
Seeing the bigger picture of application performance allows IT to quickly prioritize whether a problem is impacting an entire application or just a few users or transactions. For example, in the screenshot to the right we can see that in the last day the application processed 19.2 million business transactions (user requests), of which 0.1% experienced an error, 0.4% were classified as slow (> 2 SD), 0.3% were classified as very slow (> 3 SD) and 94 transactions stalled. The interesting thing here is that AppDynamics used analytics to automatically discover, learn and baseline what normal performance is for the application. No static, global or user-defined thresholds were used – the performance baselines are dynamic and relative to each type of business transaction and user request. So if a credit card payment transaction normally takes 7 seconds, it shouldn’t be classified as slow relative to other transactions that may only take 1 or 2 seconds.

The big picture here is that application performance generally looks OK, with 99.3% of business transactions having a normal end user experience with an average response time of 123 milliseconds. However, if you look at the data shown, 0.7% of user requests were either slow or very slow, which is almost 140,000 transactions. This is not good! The application in this example is an e-commerce website, so it’s important we understand exactly which business transactions were impacted out of those 140,000 that were classified as slow or very slow. For example, a slow search transaction isn’t the same as a slow checkout or order transaction – different transactions, different business impact.
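The per-transaction classification described above can be sketched as follows. The > 2 SD and > 3 SD bands mirror the text; the latency figures are hypothetical, and this is an illustration rather than the product's actual code:

```python
from statistics import mean, stdev

def classify(latency_ms, baseline):
    """Label a request relative to its own transaction's learned baseline,
    using the slow (> 2 SD) / very slow (> 3 SD) bands described above."""
    mu, sd = mean(baseline), stdev(baseline)
    if latency_ms > mu + 3 * sd:
        return "very slow"
    if latency_ms > mu + 2 * sd:
        return "slow"
    return "normal"

# Different transactions, different baselines (hypothetical samples, in ms):
search   = [110, 120, 130, 125, 115]       # normally ~120 ms
checkout = [6900, 7100, 7000, 6950, 7050]  # normally ~7 s

print(classify(300, search))     # "very slow" for a search...
print(classify(7100, checkout))  # ..."normal" for a checkout at 7.1 s
```

A 300 ms search is a serious outlier against its own baseline, while a 7.1-second checkout is business as usual: the same absolute latency means completely different things for different transactions.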

Understanding the real Business Impact
The below screenshot shows business transaction health for the e-commerce application sorted by number of very slow requests. Analytics is used in this view by AppDynamics so it can automatically classify and present to the user which business transactions are erroneous, slow, very slow and stalling relative to their individual performance baseline (which is self-learned). At a quick glance, you can see two business transactions–“Order Calculate” and “OrderItemDisplayView”–are breaching their performance baseline.

This information helps IT determine the true business impact of a performance issue so they can prioritize where and what to troubleshoot. You can also see that the “Order Calculate” transaction had 15,717 errors. Clicking on this number would reveal the stack traces of those errors, thus allowing the APM user to easily find the root cause. In addition, we can see the average response time of the “Order Calculate” transaction was 576 milliseconds and the maximum response time is just over 64 seconds, along with 10,393 very slow requests. If AppDynamics didn’t show how many requests were erroneous, slow or very slow, then the user could spend hours figuring out the true business impact of such an incident. Let’s take a look at those very slow requests by clicking on the 10,393 link in the user interface.

Seeing individual slow user business transactions
As you can probably imagine, using average response times to troubleshoot business impact is like putting a blindfold over your eyes. If your end users are experiencing slow transactions, then you need to see those transactions to effectively troubleshoot them. For example, AppDynamics uses real-time analytics to detect when business transactions breach their performance baseline, so it’s able to collect a complete blueprint of how those transactions executed across and inside the application infrastructure. This enables IT to identify root cause rapidly.

In the screenshot above you can see all “OrderCalculate” transactions sorted in descending order by response time, making it really easy for the user to drill into any of the slow user requests. You can also see from the summary column that AppDynamics continuously monitors the response time of business transactions using moving averages and standard deviations to identify real business impact. Given the results our customers are seeing, we’d say this is a pretty proven way to troubleshoot business impact and application performance. Let’s drill into one of those slow transactions…
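One common way to maintain such moving averages and standard deviations incrementally is Welford's online algorithm, sketched below. This is a generic streaming-statistics illustration under my own naming, not AppDynamics' actual implementation:

```python
import math

class RollingBaseline:
    """Incrementally tracked mean and variance (Welford's online algorithm),
    so each new response time updates the baseline in O(1) with no history kept."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def stdev(self):
        # Sample standard deviation; undefined until we have two samples.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_slow(self, x, sigma=2.0):
        """Breach test against the learned baseline (> sigma SDs above mean)."""
        return self.n > 1 and x > self.mean + sigma * self.stdev

b = RollingBaseline()
for ms in [100, 105, 98, 102, 101, 99, 103]:  # hypothetical response times
    b.add(ms)
print(round(b.mean, 1))  # 101.1
print(b.is_slow(250))    # True
```

The appeal of the online form is that the monitor never has to store or re-scan raw samples, which matters when you are baselining millions of transactions a day.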

Visualizing the flow of a slow transaction
Sometimes a picture says a thousand words, and that’s exactly what visualizing the flow of a business transaction can do for IT. IT shouldn’t have to look through pages of metrics or GBs of log files to correlate and guess why a transaction may be slow. AppDynamics does all that for you! Look at the screenshot below, which shows the flow of an “OrderCalculate” transaction that takes 63 seconds to execute across three different application tiers. You can see the majority of time is spent calling the DB2 database and an external third-party HTTP web service. Let’s drill down to see what is causing that high amount of latency.

Automating Root Cause Analysis
Finding the root cause of a slow transaction isn’t trivial, because a single transaction can invoke several thousand lines of code – kind of like finding a needle in a haystack. Call graphs of transaction code execution are useful, but it’s much faster and easier if the user can shortcut to hotspots. AppDynamics uses analytics to do just that, presenting code hotspots to the user automatically so they can pinpoint the root cause in seconds. You can see in the below screenshot that almost 30 seconds (18.8 + 6.4 + 4.1 + 0.6) was spent in a web service call “calculateTaxes” (which was called four times), with another 13 seconds spent in a single JDBC database call (the user can click to view the SQL query). Root cause analysis with analytics can be a powerful asset for any IT team.
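Conceptually, surfacing hotspots is just aggregating self-time per method and ranking the totals. The sketch below uses hypothetical method names and the timings echoed from the example above; it is an illustration of the idea, not the product's algorithm:

```python
from collections import defaultdict

# Hypothetical flattened call-graph samples: (method, self_time_seconds).
calls = [
    ("WebService.calculateTaxes", 18.8),
    ("WebService.calculateTaxes", 6.4),
    ("WebService.calculateTaxes", 4.1),
    ("WebService.calculateTaxes", 0.6),
    ("JDBC.executeQuery", 13.0),
    ("OrderCalculate.validate", 0.2),
]

def hotspots(calls, top=2):
    """Aggregate self-time per method and return the biggest contributors,
    so a user can jump straight to root cause instead of reading the full graph."""
    totals = defaultdict(float)
    for method, secs in calls:
        totals[method] += secs
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]

print(hotspots(calls))
```

Running this surfaces `WebService.calculateTaxes` (~29.9 s across four invocations) and the single 13-second JDBC call at the top, matching the "needle in a haystack" shortcut the text describes.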

Verifying Server Resource or Capacity
It’s true that application performance can be impacted by server capacity or resource constraints. When a transaction or user request is slow, it’s always a good idea to check what impact OS and JVM resources are having. For example, was the server maxed out on CPU? Was garbage collection (GC) running? If so, how long did GC run for? Was the database connection pool maxed out? All these questions require a user to manually look at different OS and JVM metrics to understand whether resource spikes or exhaustion occurred during the slowdown. This is pretty much what most sysadmins do today to triage and troubleshoot servers that underpin a slow-running application. Wouldn’t it be great if a monitoring solution could answer these questions in a single view, showing IT which OS and JVM resources were deviating from their baselines during the slowdown? With analytics it can.

AppDynamics introduced a new set of analytics in version 3.4.2 called “Node Problems” to do just this. The screenshot above shows this view, whereby node metrics (e.g. OS, JVM and JMX metrics) are analyzed to determine whether any breached their baseline and contributed to the slow performance of the “OrderCalculate” transaction. Here, % CPU idle, % memory used and MB memory used deviated only slightly from their baselines (denoted by the blue dotted lines in the charts), so server capacity on this occasion was not a contributing factor to the slow application performance. Hardware metrics that did not deviate from their baseline are not shown, reducing the amount of data and noise the user has to look at in this view.
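Filtering node metrics down to only those breaching their learned baseline might look like the sketch below. The metric names and the (mean, stdev) baseline pairs are hypothetical, and the sigma test is a generic stand-in for whatever baselining the product actually applies:

```python
def deviating_metrics(snapshot, baselines, sigma=2.0):
    """Return only the node metrics that breach their learned baseline,
    hiding everything that behaved normally -- less noise for the user."""
    problems = {}
    for name, value in snapshot.items():
        mu, sd = baselines[name]
        if abs(value - mu) > sigma * sd:
            problems[name] = value
    return problems

# Hypothetical learned baselines per metric: (mean, stdev).
baselines = {
    "cpu_idle_pct": (60.0, 5.0),
    "mem_used_pct": (55.0, 4.0),
    "gc_time_ms":   (40.0, 10.0),
}
# Snapshot taken during the slowdown (hypothetical values).
snapshot = {"cpu_idle_pct": 58.0, "mem_used_pct": 56.0, "gc_time_ms": 900.0}

print(deviating_metrics(snapshot, baselines))  # only gc_time_ms is abnormal
```

CPU and memory sit within their normal bands and are suppressed, while the 900 ms GC pause stands out: the single-view "which resource deviated?" answer the paragraph asks for.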

Analytics makes IT more Agile
If a monitoring solution is able to discover abnormal patterns and communicate these effectively to a user, then this significantly reduces the amount of time IT has to spend managing application performance, thus making IT more agile and productive. Without analytics, IT can become a slave to data overload, big data, alert storming and silos of information that must be manually stitched together and analyzed by teams of people. In today’s world, “manually” isn’t cool or clever. If you want to be agile then you need to automate the way you manage application performance, or you’ll end up with the monitoring solution managing you.

If your current monitoring solution requires you to manually tell it what to monitor, then maybe you should be evaluating a next generation monitoring solution like AppDynamics.

App Man.