The End of my Affair with Apdex

A decade ago, when I first learned of Apdex, it was thanks to a wonderful technology partner, Coradiant. At the time, I was running IT operations and web operations, and brought Coradiant into the fold. Coradiant was ahead of its time, providing end-user experience monitoring capabilities via packet analysis. The network-based approach was effective in a day when the web was less rich. Coradiant was one of the first companies to embed Apdex in its products.

As a user of APM tools, I was looking for the ultimate KPI, and the concept of Apdex resonated with me and my senior management. A single magical number gave us an idea of how well development, QA, and operations were doing in terms of user experience and performance. Had I found the metric to rule all metrics? I thought I had, and I was a fan of Apdex for many years leading up to 2012, when I started to dig into the true calculations behind this magical number.

As my colleague Jim Hirschauer pointed out in a 2013 blog post, the Apdex index is calculated by putting the number of satisfied versus tolerating requests into a formula. The definition of a user being “satisfied” or “tolerating” has to do with a lot more than just performance, but the applied use cases for Apdex are unfortunately focused on performance only. Performance is still a critical criterion, but the definition of satisfied or tolerating is situational.
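For reference, the standard Apdex formula counts satisfied samples in full and tolerating samples at half weight, then divides by the total number of samples. By convention the tolerating band runs from the target threshold T up to 4T. Here is a minimal Python sketch; the function name, the sample response times, and the 0.5-second target are illustrative assumptions:

```python
def apdex(response_times, t):
    """Return the Apdex score for response times (seconds) against target t.

    Satisfied:  response time <= t
    Tolerating: t < response time <= 4t (counted at half weight)
    Frustrated: response time > 4t (counted as zero)
    """
    if not response_times:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for rt in response_times if rt <= t)
    tolerating = sum(1 for rt in response_times if t < rt <= 4 * t)
    return (satisfied + tolerating / 2) / len(response_times)


# Hypothetical sample: 6 satisfied, 3 tolerating, 1 frustrated at t = 0.5 s
samples = [0.1, 0.2, 0.3, 0.4, 0.4, 0.5, 0.9, 1.2, 2.0, 3.5]
score = apdex(samples, t=0.5)
print(round(score, 2))
```

The same ten requests would score differently under a different T, which is exactly why the choice of threshold is situational.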

I’m currently writing this from 28,000 feet above northern Florida, over barely usable in-flight internet that makes me wish I had a 56k modem. I am tolerating the latency and bandwidth, but not the $32 I paid for this horrible experience; still, at least Twitter and email work. I self-classify as an “un-tolerating” user, but I am happy to have some connectivity. People who know me will tell you I have a bandwidth and network problem, so my idea of a tolerable network connection is abnormal. My Apdex score would be far different from the average user’s because of my personal perspective, just as a business user’s would differ from a consumer’s, depending on the situation in which each uses an application. Other criteria that affect satisfaction include the type of device in use and that device’s connection type.

The thing that is missing from Apdex is the notion of a service level. There are two ways to manage service level agreements. First, a service level may be calculated, as we do at AppDynamics with our baselines. Second, it may be a static threshold that the customer expects; we support this use case in our analytics product. Between them, these two ways of defining an SLA cover the right ways to measure and score performance.
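The two approaches can be sketched as follows. This is an illustration only, not AppDynamics’ actual baseline algorithm: the mean-plus-three-standard-deviations rule, the function names, and the sample data are all assumptions.

```python
from statistics import mean, stdev


def violates_static_sla(response_time, threshold):
    """Static SLA: the threshold is a fixed number the customer expects."""
    return response_time > threshold


def baseline_limit(history, k=3.0):
    """Calculated SLA: derive the limit from recent behaviour.

    Assumption: limit = mean + k * sample standard deviation.
    """
    return mean(history) + k * stdev(history)


# Hypothetical response times (seconds) observed for one transaction
history = [0.8, 1.0, 1.2, 1.0, 0.9, 1.1, 1.0]
limit = baseline_limit(history)

new_request = 1.8
print(violates_static_sla(new_request, threshold=2.0))  # static check against 2.0 s
print(new_request > limit)                              # check against the calculated limit
```

With this made-up data, the 1.8-second request passes a 2-second static SLA yet falls well outside what is normal for the transaction, so the two checks can disagree about the same request.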

This is AppDynamics’ Transaction Analytics Breakdown for users who had errors or poor user experience over the last week, and their SLA class:

 

 

Simple SLAs are also part of the core APM product. Here is a view of requests that were below the calculated baseline, showing which of them were in SLA violation.

Combining an SLA with Apdex results in a meaningful number. Unfortunately, I cannot take credit for this idea. Alain Cohen, one of the brightest minds in performance analysis, was the co-founder and CTO (almost co-CEO) of OPNET. Alain discussed with me his ideas for a new performance index concept called OpDex, which fixes many of the Apdex flaws by applying an SLA. Unfortunately, Alain is no longer solving performance problems for customers; he’s decided to take his skills and talents elsewhere after a nice payout.

Alain shared his OpDex plan with me in 2011; thankfully, all of the details are outlined in this patent, which was granted in 2013. OPNET’s great run of innovation has ended, and Riverbed has failed to pick up where they left off, but at least they have patents to show for these good ideas and concepts.

The other issue with Apdex is that users are being ignored by the formula. CoScale outlined this issue in a detailed blog post. They explain that histograms are a far better way to analyze a varied population. This is no different from looking at performance metrics coming from the infrastructure layer, but the use of histograms and heat charts tends to provide much better visual analysis.
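One way the formula can ignore users: every frustrated request counts as zero, however long it actually took. Here is a small sketch, with made-up response times and an assumed 1-second target, in which two populations get the same Apdex score even though one contains far worse outliers; a bucketed histogram shows the difference immediately. The `apdex` and `histogram` helpers are illustrative and are not taken from CoScale’s post.

```python
from collections import Counter


def apdex(times, t):
    """Standard Apdex: satisfied + half of tolerating, over all samples."""
    satisfied = sum(1 for x in times if x <= t)
    tolerating = sum(1 for x in times if t < x <= 4 * t)
    return (satisfied + tolerating / 2) / len(times)


def histogram(times, edges):
    """Count samples per bucket; the last bucket is open-ended."""
    counts = Counter()
    for x in times:
        # index of the first edge the sample does not exceed, else overflow
        idx = next((i for i, e in enumerate(edges) if x <= e), len(edges))
        counts[idx] += 1
    return [counts[i] for i in range(len(edges) + 1)]


# Hypothetical response times in seconds, target t = 1.0
site_a = [0.5] * 8 + [5.0, 6.0]      # frustrated users wait about 5 s
site_b = [0.5] * 8 + [30.0, 60.0]    # frustrated users wait 30 to 60 s

edges = [1.0, 4.0, 10.0]             # buckets: <=1, <=4, <=10, >10
for name, times in (("A", site_a), ("B", site_b)):
    print(name, apdex(times, 1.0), histogram(times, edges))
```

Both sites score 0.8, but the histograms place the slow requests in different buckets, which is the detail a single index hides.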

AppDynamics employs automated baselines for every metric collected, and measures based on deviations out of the box. We also support static SLA thresholds as needed. Visually, AppDynamics has a lot of options including viewing data in histograms, looking at percentiles, and providing an advanced analytics platform for whatever use cases our users come up with. We believe these are valid ways to address the downsides of relying heavily on Apdex in a product.

 

 

What APM Vendors can learn from building Supercars

McLaren this year will launch their P1 Supercar, which will turn the average driver into a track day hero. What’s significant about this particular car is that it relies on modern-day technology and innovation to transform a driver’s ability to accelerate, corner and stop faster than any other car on the planet–because it has:

  1. 903bhp on tap derived from a combined V8 Twin Turbo and KERS setup, meaning it has a better power/weight ratio than a Bugatti Veyron
  2. Active aerodynamics & DRS to control the airflow so it remains stable under acceleration and braking without incurring drag
  3. Traction control and brake steer to minimize slip and increase traction in and out of corners
  4. 600Kg of downforce at 150mph so it can corner on rails up to 2G
  5. Lightness–everything exists for a purpose so there is less weight to transfer under braking and acceleration

You don’t have to be Lewis Hamilton or Michael Schumacher to drive it fast. The P1 creates enormous amounts of mechanical grip, traction, acceleration and feedback so the driver feels “confident” in their ability to accelerate, corner and stop, without losing control and killing themselves. I’ve been lucky enough to sit in the driver’s seat of a McLaren MP4-12C and it’s a special experience – you have a steering wheel, some dials and some pedals – that’s really it, with none of the bells or whistles that you normally get in a Mercedes or Porsche. It’s “Focused” and “Pure” so the driver has complete visibility to drive as fast as possible, which is ultimately the whole purpose of the car.

How does this relate to Application Performance Monitoring (APM)?

Well, how many APM solutions today allow a novice user to solve complex application performance problems? Erm, not many. You need to be an uber geek with most because they’ve been written for developers by developers. Death by drill-down is a common symptom because novice APM users have no idea how to interpret metrics or what to look for. It would be like McLaren putting their F1 wheel with a thousand buttons in the new P1 road car for us novice drivers to play with.

It’s actually a lot worse than that though, because many APM vendors sell these things called “suites” that are enormously complex to install, configure and use. Imagine if you paid $1.4m and McLaren delivered you a P1 in 5 pieces and you had to assemble the engine, gearbox, chassis, suspension and brakes yourself? You’d have no choice but to pay McLaren for engineers to assemble it for you with your own configuration. This is pretty much how most vendors have sold APM over the past decade–which is why they have hundreds of consultants. The majority of customers have spent more time and effort maintaining APM than using it to solve performance issues in their business. It’s kinda like buying a supercar and not driving it.

Fortunately, a few vendors like AppDynamics have succeeded in delivering APM through a single product that combines End User Monitoring, Application Discovery and Mapping, Transaction Profiling, Deep Diagnostics and Analytics. You download it, install it and you solve your performance issues in minutes–it just works out-of-the-box. What’s even better is that you can lease the APM solution through annual subscriptions instead of buying it outright with expensive perpetual licenses and annual maintenance.

If you want an APM solution that lets you manage application performance, then make sure it does just that for you. If you don’t get value from an APM solution in the first 20 minutes, then put it in the trash can because that’s 20 minutes of your time you’ve wasted not managing application performance. Sign up for a free trial of AppDynamics and find out how easy APM can be. If these vendors built their solutions like car manufacturers build supercars, then the world would be a faster place (no pun intended).

Appman.

The APPrentice

In this week’s episode, Donald Trump enlists Team ROI and Team Overhead to solve a Severity 1 incident on the “Trump Towers Website”. Team Overhead used “Dynoscope” and took 3 weeks to solve the incident, while Team ROI took 15 minutes by using AppDynamics.

 

Glassdoor proves AppDynamics is a Great Place to Work!

It’s been almost two years since I joined AppDynamics and it’s been one of the best career moves I’ve ever made. I used to work at a competitor, and quickly realized I was working for the wrong company. Sometimes you just have to trust your gut feeling when it comes to technology–you’ve either got a product that’s special or you don’t, and I know what it’s like to experience both feelings.

At AppDynamics the technology is definitely special, but I also joined a group of like-minded people who shared the same passion as I did for application monitoring. The no-compromise approach to figuring out new ways of doing things that couldn’t be done previously, along with a laser-focus on solving real world problems for customers, is pretty inspiring. Things are never perfect at any company but the passion to make our customers successful, and the will to win business professionally, is unique at AppDynamics. We really believe that enterprise software doesn’t have to suck, it should never be shelfware, and it should be affordable by everyone–which is one of the reasons why we created a free product, AppDynamics Lite, that now has over 100,000 users, and why our commercial product, AppDynamics Pro, is reasonably priced.

In just two years we’ve disrupted an application monitoring market that was previously dominated by expensive complex solutions that quite frankly sucked. This disruption was one of the reasons why Gartner recognized AppDynamics as a Leader in their 2012 APM Magic Quadrant, and we’ve only been selling our product for two years! This speaks volumes for what we’ve achieved in such a short period of time. What’s also great is that our customers are very vocal about their success; our case study page is packed with customer success stories, with several customers willing to publish actual ROI results from their AppDynamics deployments. How many real customer ROI stories have you read recently from any vendor? My guess is not many.

One online community that provides an accurate inside look at companies is Glassdoor.com. It basically lets employees rate different aspects of the company they work for, from compensation all the way through to culture and leadership. If you search Glassdoor.com for all the APM companies that are currently recognized in Gartner’s APM Magic Quadrant, here is what the top 10 looks like:

Glassdoor APM ratings

*Glassdoor ratings correct as of 1/10/2013

I’m pretty proud to work for a company where employees are very satisfied and give their CEO 100% approval. That says a lot about the success and leadership of the company–happy employees also mean a happy place to work and trust me, this is pretty important when you spend most of your life at work!

One company that didn’t score well was Compuware. Only 38% of employees would recommend the company to a friend and only 68% approve of their CEO. Not particularly encouraging when you need your employees to innovate, run through walls, and beat the competition. A hedge fund recently put an offer on the table to take Compuware private–let’s hope those guys can get the employees jazzed.

If you’re looking for the next challenge, cool technology and a great place to work, you should consider joining AppDynamics. We’ve got 21 positions currently open and we need great people to help scale the great company we’re building!

With customers like Netflix, Orbitz, Fox News, Vodafone and Yahoo you’ll experience the ins and outs of monitoring some of the largest applications in the world.

Oh, and you get to work with a superhero like me!

Appman.

Finding the Root Cause of Application Performance Issues in Production

The most enjoyable part of my job at AppDynamics is to witness and evangelize customer success. What’s slightly strange is that for this to happen, an application has to slow down or crash.

It’s a bittersweet feeling when End Users, Operations, Developers and many Businesses suffer application performance pain. Outages cost the business money, but sometimes they cost people their jobs–which is truly unfortunate. However, when people solve performance issues, they become overnight heroes with a great sense of achievement, pride, and obviously relief.

To explain the complexity of managing application performance, imagine your application is 100 haystacks that represent tiers, and somewhere a needle is hurting your end user experience. It’s your job to find the needle as quickly as possible! The problem is, each haystack has over half a million pieces of hay, and they each represent lines of code in your application. It’s therefore no surprise that organizations can take days or weeks to find the root cause of performance issues in large, complex, distributed production environments.

End User Experience Monitoring, Application Mapping and Transaction Profiling will help you identify unhappy users, slow business transactions, and problematic haystacks (tiers) in your application, but they won’t find needles. To do this, you’ll need X-ray visibility inside haystacks to see which pieces of hay (lines of code) are holding the needle (root cause) that is hurting your end users. This X-ray visibility is known as “Deep Diagnostics” in application monitoring terms, and it represents the difference between isolating performance issues and resolving them.

For example, AppDynamics has great End User Monitoring, Business Transaction Monitoring, Application Flow Maps and very cool analytics all integrated into a single product. They all look and sound great (honestly they do), but they only identify and isolate performance issues to an application tier. This is largely what Business Transaction Management (BTM) and Network Performance Management (NPM) solutions do today. They’ll tell you which business transaction slows down and where, but they won’t tell you the root cause so you can resolve the issue.

Why Deep Diagnostics for Production Monitoring Matters

A key reason why AppDynamics has become very successful in just a few years is because our Deep Diagnostics, behavioral learning, and analytics technology is 18 months ahead of the nearest vendor. A bold claim? Perhaps, but it’s backed up by bold customer case studies such as Edmunds.com and Karavel, who compared us against some of the top vendors in the application performance management (APM) market in 2011. Yes, End User Monitoring, Application Mapping and Transaction Profiling are important–but these capabilities will only help you isolate performance pain, not resolve it.

AppDynamics has the ability to instantly show the complete code execution and timing of slow user requests or business transactions for any Java or .NET application, in production, with incredibly small overhead and no configuration. We basically give customers a metal detector and X-Ray vision to help them find needles in haystacks. Locating the exact line of code responsible for a performance issue means Operations and Developers solve business pain faster, and this is a key reason why AppDynamics technology is disrupting the market.

Below is a small collection of needles that customers found using AppDynamics in production. The simple fact is that complete code visibility allows customers to troubleshoot in minutes as opposed to days and weeks. Monitoring with blind spots and configuring instrumentation are a thing of the past with AppDynamics.

Needle #1 – Slow SQL Statement

Industry: Education
Pain: Key Business Transaction with 5 sec response times
Root Cause: Slow JDBC query with full-table scan

Needle #2 – Slice of Death in Cassandra

Industry: SaaS Provider
Pain: Key Business Transaction with 2.5 sec response times
Root Cause: Slow Thrift query in Cassandra

Needle #3 – Slow & Chatty Web Service Calls

Industry: Media
Pain: Several Business Transactions with 2.5 min response times
Root Cause: Excessive Web Service Invocation (5+ per trx)

Needle #4 – Extreme XML Processing

Industry: Retail/E-Commerce
Pain: Key Business Transaction with 17 sec response times
Root Cause: XML serialization over the wire.

Needle #5 – Mail Server Connectivity

Industry: Retail/E-Commerce
Pain: Key Business Transaction with 20 sec response times
Root Cause: Slow Mail Server Connectivity

Needle #6 – Slow ResultSet Iteration

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 30+ sec response times
Root Cause: Querying too much data

Needle #7 – Slow Security 3rd Party Framework

Industry: Education
Pain: All Business Transactions with > 3 sec response times
Root Cause: Slow 3rd party code

Needle #8 – Excessive SQL Queries

Industry: Education
Pain: Key Business Transactions with 2 min response times
Root Cause: Thousands of SQL queries per transaction
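
A common shape of this problem is the “N+1” pattern, where code issues one query per row instead of a single joined query. The sketch below uses Python’s built-in sqlite3 module with a made-up schema; it illustrates the pattern and is not the customer’s actual code.

```python
import sqlite3

# Hypothetical in-memory schema: 1,000 students with one grade each
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE grades (student_id INTEGER, grade INTEGER);
""")
conn.executemany("INSERT INTO students VALUES (?, ?)",
                 [(i, f"student{i}") for i in range(1000)])
conn.executemany("INSERT INTO grades VALUES (?, ?)",
                 [(i, 50 + i % 50) for i in range(1000)])


def grades_n_plus_one(conn):
    """One query for the students, then one more query per student."""
    queries = 1
    result = {}
    for sid, name in conn.execute("SELECT id, name FROM students").fetchall():
        row = conn.execute(
            "SELECT grade FROM grades WHERE student_id = ?", (sid,)).fetchone()
        queries += 1
        result[name] = row[0]
    return result, queries


def grades_joined(conn):
    """The same data fetched with a single JOIN."""
    rows = conn.execute(
        "SELECT s.name, g.grade FROM students s "
        "JOIN grades g ON g.student_id = s.id").fetchall()
    return dict(rows), 1


slow, slow_queries = grades_n_plus_one(conn)
fast, fast_queries = grades_joined(conn)
print(slow_queries, fast_queries, slow == fast)
```

Both functions return the same data; the difference is the number of round trips, which is what dominates response time once the database sits across a network.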

Needle #9 – Commit Happy

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 25+ sec response times
Root Cause: Unnecessary use of commits and transaction management.

Needle #10 – Locking under Concurrency

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 5+ sec response times
Root Cause: Non-Thread safe cache forces locking for read/write consistency

Needle #11 – Slow 3rd Party Search Service

Industry: SaaS Provider
Pain: Key Business Transaction with 2+ min response times
Root Cause: Slow 3rd Party code

Needle #12 – Connection Pool Exhaustion

Industry: Financial Services
Pain: Several Business Transactions with 7+ sec response times
Root Cause: DB Connection Pool Exhaustion caused by excessive connection pool invocation & queries

Needle #13 – Excessive Cache Usage

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 50+ sec response times
Root Cause: Cache Sizing & Configuration

If you want to manage and troubleshoot application performance in production, you should seriously consider AppDynamics. We’re the fastest growing on-premise and SaaS based APM vendor in the market right now. You can download our free product AppDynamics Lite or take a free 30-day trial of AppDynamics Pro – our commercial product.

Now go find those needles that are hurting your end users!

App Man.

AppDynamics recognized by Forrester in APM market overview

Interest in the Application Performance Management (APM) category is very high right now.   To stay one step ahead of their clients, the Industry Analysts who cover the category and write research to advise their clients have been very busy.  In December alone, there were six different analyst reports being researched by the major analyst firms.

Forrester published the results of their research in the 2nd week of December with the report: Market Overview: Application Performance Management, Q4 2011.  Forrester clients can access the report at www.forrester.com. In this report, Forrester provides very sound advice on why APM exists and what it should do for clients. Forrester has created their own “Reference Model” for APM and evaluated the vendor landscape against those criteria.

Raison d’etre for APM

Forrester VP and Principal Analyst, JP Garbani, gives readers very pragmatic advice on the raison d’etre for APM.  Simply put, APM’s job is to:

1) Alert IT to application performance and availability issues before a full-scale outage occurs

2) Isolate or pinpoint the problem source

3) Provide deep-diagnostics to enable IT to determine the root cause

For several years now, JP Garbani has been at the forefront of proclaiming that modern APM solutions should enable IT organizations to manage apps not by gauging the health of their servers or servlets, but instead by assessing what the customer or end-user cares about most – whether their Business Transaction completes quickly and doesn’t make them wait. He states that this has become even more critical as applications have gotten more distributed and complex.

France’s #1 Travel Site Karavel Selects AppDynamics for APM over Compuware Dynatrace & CA Wily

2011 was an amazing year for AppDynamics. We experienced tremendous growth and success, largely down to the many customers around the world who believed in our vision, technology, and ability to help Dev and Ops teams better manage application performance in production. The Application Performance Management (APM) market isn’t an easy market to succeed in, with well over 30 vendors competing against each other. In just three years we’ve managed to take on the big players like Compuware DynaTrace, CA Wily, HP and IBM to change the industry perception that APM is expensive to own and difficult to deploy/use.

We feel APM should be for everyone. It should be affordable, it should be easy to deploy, and easy to use. APM should not be a luxury that only an elite group of enterprises can afford. Today, we have customers who monitor applications with 5 nodes, 50 nodes, 500 nodes and 5,000 nodes. Application performance impacts organizations of all sizes; that’s why we wanted our APM solution to be accessible to the masses over the web via our free download and SaaS trial. We wanted to be transparent with our buyers and demonstrate that they can evaluate and use our solution all by themselves with no account manager or technical consultant by their side. We really wanted prospects to see for themselves that APM can be simple to deploy and easy to use.

A major validation of this market disruption was when a customer called Karavel in France was looking for an APM solution and evaluated CA Wily, Compuware dynaTrace and AppDynamics. Karavel requested a trial, downloaded our software and we sent them a trial license key for 30 days. The whole AppDynamics install, deployment and evaluation was solely conducted by the customer on their own. This might not sound that impressive, but this is what the software buying experience should be all about: the customer and the solution. If the customer can’t install, deploy and evaluate an APM solution on their own, how will they manage this process when it comes to a production deployment? Software should sell itself these days–if it requires an army of people to sell it, it probably requires an army of people to implement it as well.

You can read the full Karavel press release here:
http://www.appdynamics.com/press/press-release-01-03-12.php

Full case study is available here also:
http://www.appdynamics.com/documents/roi_studies/AppDynamics_ROI_Karavel.pdf

Remember, software like APM doesn’t have to be complex and expensive. With the internet these days, there is no reason why a prospect can’t download or evaluate solutions online in just a few hours.

App Man.

Storm Clouds in 2012? – Results of AppDynamics APM Survey

We recently finished conducting our annual Application Performance Management survey. Over 250 IT professionals participated, and they shared insights such as:
– Many Ops and Dev teams are anticipating growth in their applications by 20% or more
– Over 50% are planning to move to the cloud, and are architecting brand-new applications to be cloud-ready
– Most teams are using log files to monitor application performance, rather than an Application Performance Management (APM) tool.

We’ll release the full report soon, but here’s an infographic that summarizes some of the main findings:

AppDynamics Infographic - Storm Clouds in 2012

What I found personally surprising was the heavy reliance on log files. When you’re troubleshooting distributed architectures, time is of the essence–and there’s no way to cut your MTTR down when you’re relying on log files to identify root cause.

In fact, there’s only one guy who ever made using a log file look cool:

And I think we can all agree that’s a pretty unique use case.

We’ll have the full survey results available soon.

 

 

Not Everyone is an Application Expert

The majority of us in IT are specialists, with the exception of a few VPs of engineering who are “special” in their own “special” world of being “special.” What I mean by this is that no single person has the skills or experience to do everything well in IT. IT is too big for me to explain or summarize in a few words, other than it requires a lot of different people with different skills to make it tick along. Despite applications being the living breathing entities of the business, a large portion of folk in IT have little understanding of how applications are built, how they execute, and how they consume resources across the IT infrastructure. Many people simply don’t care as their responsibilities are completely devoid of anything application related. That’s fine–but the reality is that everyone in IT should have one eye on the business. The whole reason IT exists is so the business can be more competitive and make more money. If this happens, IT gets more budget and is allowed to innovate more. IT and the business need each other to survive, which is why when applications slow down or break, both parties bitch at each other.

Operations need better visibility

Unfortunately for both the business and IT, the people (Operations) who manage the performance and availability of applications in production aren’t application experts. They are not stupid, either; their skill sets are broad, spanning the many technologies and platforms that underpin applications. They manage a lot of things that application developers take for granted, like networks, databases, storage and virtualization. While Operations monitor the health of these infrastructure components, they often get bombarded with crap from the business when end users and business transactions are being impacted by slow performance, despite all system monitoring showing everything is fine. This lack of understanding between the Business and Operations arises because both parties see things from different perspectives.