Fat or Fit – You Choose

There was a time when I thought more was better. More french fries are better; more ice cream is better; more of everything is always better. I followed this principle as a way of life for many years until one day I woke up and realized that I was obese and slow. I realized that more is not always better and that moderation and balance are an important key to a happy and healthy life. The exact same rule applies to IT monitoring, and in this blog I’ll explain how to identify high-fat, high-calorie, low-nutrition monitoring solutions.

Fat, Dumb, and Happy

The worst offenders in the battle against data center obesity are those companies that claim to be “always-on”. Always gathering mountains of data when there are no problems to solve is equivalent to entering a hotdog eating contest every day of the year. Why would you do that to yourself? Your IT bloating will reach epic proportions in no time and your CIO/CTO will eventually start asking why you are spending so much money on all of that storage space and all of those monitoring servers.

Let’s use an example to explore this scenario. A user of your application logs in, searches for some new running shoes, adds their favorite ones to their cart, checks out and happily disappears into the ether awaiting their shoe delivery so they can get ready for their local charity fun run. This same pattern is repeated for many users of your application.

Scenario 1: “Always on” bloated consumerism – Your monitoring software:

  • Tracked the response time of each function the user performed (small amount of data)
  • Tracked the execution details of many of the method calls involved in each and every function (lots and lots of data)
  • Sent all of this across the network to be compiled and stored
  • Did all of this for every single function that every single user executed, regardless of whether there was a problem or not.

Smart and Fit

Scenario 2: The intelligent fitness pro – Your monitoring software:

  • Tracked the response time of each function the user performed (small amount of data)
  • Periodically tracked the execution details of all the method calls involved in each function so that you have a reference point for “good” transactions (small amount of data)
  • Tracked the execution details of all method calls for every bad (or slow) function (business transaction) so that you have the information you need to solve the problem (small – medium data)
  • Used its built-in analytics to decide when slow business transactions were impacting your users and automatically collected all the appropriate details.
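To make the difference between the two scenarios concrete, here is a minimal sketch of the Scenario 2 collection policy. It is only an illustration, not how AppDynamics is implemented: the class name `SelectiveCollector`, the fixed 500 ms cutoff, and the 1-in-100 reference sampling are all assumptions I made up for the example, and a real tool would classify "slow" against a learned baseline rather than a hard-coded number.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SelectiveCollector:
    """Toy collector: always keeps response times, rarely keeps call details."""

    slow_threshold_ms: float = 500.0  # assumed fixed cutoff, not a learned baseline
    sample_every: int = 100           # assumed: keep 1 in N transactions as a "good" reference
    response_times: list = field(default_factory=list)
    snapshots: list = field(default_factory=list)
    seen: int = 0

    def record(self, name: str, response_ms: float,
               capture_details: Callable[[], dict]) -> None:
        self.seen += 1
        # Cheap: one number per transaction, always stored.
        self.response_times.append((name, response_ms))

        is_slow = response_ms > self.slow_threshold_ms
        is_reference = self.seen % self.sample_every == 0
        if is_slow or is_reference:
            # Expensive: method-level details, gathered only when needed.
            self.snapshots.append({
                "name": name,
                "response_ms": response_ms,
                "reason": "slow" if is_slow else "reference",
                "details": capture_details(),
            })
```

The "always on" approach from Scenario 1 would amount to calling `capture_details()` on every transaction. Here the expensive call is skipped unless the transaction is slow or is due to be kept as a periodic reference point, so the stored detail grows with the number of problems rather than with the number of users.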

How often do you look at deep granular monitoring details when there are no application issues to resolve? I was an application monitoring expert at a major investment bank and I never looked at those details when there were no problems. AppDynamics is a new breed of monitoring tool that is based upon intelligent analytics to keep your data center fast and fit. I think John Martin from Edmunds.com said it best in his case study “AppDynamics intelligence just says, ‘Hey something interesting is going on, I’m going to collect more data for you’.”

Smart people choose smart tools. You owe it to yourself to take a free trial of AppDynamics today and make us prove our value to you.

The Most Important Lesson I Ever Learned About Solving Performance Problems

I’m an operations guy. I’ve been one for over 15 years. Ever since I was a Systems Administrator I have been intrigued by application performance, and I have jumped at every opportunity to try and figure out a performance problem. All of that experience has taught me that there is one aspect of troubleshooting that makes the biggest difference in most cases.

My Charts Will Save The Day

Before I jump right in with that single most important lesson learned, I want to tell the story that set me on my path to learning it. I was sitting at my desk one day when I got called into a P1 issue (also called Sev 1, meaning customers were impacted by application problems) for an application that had components on some of my servers. This application had many distributed components, like most of the applications at this particular company. I knew I was prepared for this moment since I had installed OS monitoring that gave me charts on every metric I was interested in, and I had a long history of these charts (daily, dating back for months).

Simply put, I was confident I had the data I needed to solve the problem. So I joined the 20+ other people on the conference call, listened to hear what the problem was and what had already been done, and began digging through my mountains of graphs. Within the first 30 minutes of poring over my never-ending streams of data I realized that I had no clue where any of the data points should be for each metric at any given time. I had no reference point to distinguish good data points from bad data points. “No problem!” I thought to myself. I have months of this data just waiting for me to look at and determine what’s gone wrong.

Now I don’t know if you’ve ever tried to manually compare graphs to each other but I can tell you that comparing 2 charts that represent 2 metrics on 2 different days is pretty easy. Comparing ~50 daily charts to multiple days or weeks in history is a nightmare that consumes a tremendous amount of time. This was the Hell I had resigned myself to when I made that fateful statement in my head “No problem!”.

Skip ahead a few hours. I’ve been flipping between multiple workbooks in Excel to try and visually identify where the charts are different. I’ve been doing this for hours. Click-flip, click-flip, click-flip, click-flip… My eyes are strained and my head is throbbing. I want the pain to end but I’m a performance geek who doesn’t give up. I’ve looked at so many charts by now that I can no longer remember why I was zeroing in on a particular metric in the first place. I’m starting to think my initial confidence was a bit misguided. I slowly start banging my head on my desk in frustration.

From Hours To Seconds

“What changed?” Isn’t this one of the most commonly asked questions in any troubleshooting scenario? It’s also one of the toughest questions to answer in a short amount of time. If you want to resolve problems in minutes you need to know the answer to this question immediately. So that leads me to the most important lesson I ever learned about solving performance problems. I need something that will tell me exactly what has changed at any given moment in time.

I need a system that tracks my metrics, automatically baselines their normal behavior, and can tell me when these metrics have deviated from their baselines and by how much. Ideally I want all of this in the context of the problem that has been identified, either by an alert or by an end user calling in a trouble ticket (I’d rather know about the problem before a customer calls, though).
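Here is a rough sketch of what "automatically baseline a metric and tell me how far it has deviated" can look like in its simplest form. It assumes a rolling window of recent samples and measures deviation in standard deviations from the window's mean. The class name `Baseline`, the window size, and the minimum sample count are hypothetical choices for illustration; commercial tools typically use more sophisticated baselines (time-of-day and day-of-week aware, for example).

```python
from collections import deque
from statistics import mean, stdev
from typing import Optional


class Baseline:
    """Rolling baseline for one metric (illustrative only)."""

    def __init__(self, window: int = 288, min_samples: int = 30):
        # Assumed defaults: 288 samples is one day at 5-minute intervals.
        # min_samples must be at least 2 for a standard deviation to exist.
        self.history = deque(maxlen=window)
        self.min_samples = min_samples

    def observe(self, value: float) -> Optional[float]:
        """Score a new sample against the baseline, then add it to the history.

        Returns the deviation in standard deviations, or None when there is
        not enough history yet or the history is perfectly flat.
        """
        score = None
        values = list(self.history)
        if len(values) >= self.min_samples:
            sigma = stdev(values)
            if sigma > 0:
                score = (value - mean(values)) / sigma
        self.history.append(value)
        return score
```

With one `Baseline` per metric, an alert rule becomes something like "tell me when the score is above 3", and nobody has to flip through Excel workbooks to learn what normal looks like.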

Thankfully, today this type of system does exist. Within AppDynamics Pro, every metric is automatically baselined and is a candidate for alerting based upon deviation from that baseline. By default, all business transactions are classified as slow or very slow based upon how much they deviate from their historic baselines, but this is only the tip of the iceberg. The really cool feature is available after you drill down into a business transaction. Take a look at the screen grab below. This grab was taken from a single “Product Search” business transaction that was slow. Notice we are in the “Node Problems” area. I’ve requested that the software automatically find any JVM metrics that deviated above their baseline during the time of this slow transaction. The charts on the right side of the screen are the resulting data set, ordered from most to least deviated.

[Screenshot: “Node Problems” view for a slow “Product Search” business transaction]
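The idea behind that view, finding the metrics that rose above their baseline during the slow transaction and sorting them by how far they moved, can be approximated in a few lines. This is my own simplified sketch, not the product's algorithm: the function name `rank_deviated_metrics` and the metric names in the usage note are hypothetical, and deviation is measured simply as the incident-window mean minus the baseline mean, in baseline standard deviations.

```python
from statistics import mean, stdev


def rank_deviated_metrics(baseline: dict, during_incident: dict) -> list:
    """Return (metric, score) pairs for metrics above baseline, highest first.

    baseline:        metric name -> list of historical samples
    during_incident: metric name -> list of samples from the slow period
    """
    ranked = []
    for name, history in baseline.items():
        current = during_incident.get(name)
        if not current or len(history) < 2:
            continue  # nothing to compare, or too little history
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat history: cannot express deviation in sigmas
        score = (mean(current) - mean(history)) / sigma
        if score > 0:  # keep only metrics that rose above their baseline
            ranked.append((name, score))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

Fed with, say, heap usage, garbage collection time, and thread count for one node, the function returns only the metrics that went up, with the biggest mover first. That is the manual click-flip comparison from earlier, done by the machine.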

Whoa… we just answered the “What changed?” question in 30 seconds instead of manually doing hours of analysis. I wish I had this functionality years ago. It would have saved me countless hours and countless forehead bruises. We veterans of the performance wars now have a bigger gun in the battle to restore performance faster. Leave the manual analysis and correlation to the rookies and click here to start your free trial of AppDynamics Pro right now so you can test this out for yourself.