My Top 3 Automated Tasks for Finding and Fixing Problems

July 26, 2013
 


Automation sets organizations at the top of their game apart from the rest of the pack. The limiting factor in most organizations is that they are too busy putting out fires and keeping up with their other obligations to spend the effort required to envision and build out an automation strategy. With that in mind, I have created a short list of the automation tasks that I feel provide the most value to an organization. Along with each task, I explain the type of effort involved and the reward associated with it. All of the information presented is based upon my 15 years of troubleshooting within enterprise operations and applications environments.

IT Operations

1. Collect troubleshooting metrics – To me this is the no-brainer of automation tasks. For each particular type of problem you always want certain information to help resolve that issue. This is also the easiest of my top three to implement but may provide less value than the other two. Here are some examples…

  • Hung/Unresponsive JVM/CLR – Initiate and store thread dump, restart application on offending node.
  • Slow transactions plus high server CPU utilization – Collect process listing to determine CPU contributors and spin up extra instance of application.
  • Transactions throwing excessive errors – Search log files for errors and send the list to the appropriate personnel; based upon error type, possibly probe individual components deeper (see #2 below).
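The last example, searching logs for errors and grouping them by type, can be sketched in a few lines of Python. This is only a minimal illustration: the log format, the `ERROR_PATTERN` expression, and the `group_errors` function are assumptions made for this sketch, not part of any particular product.

```python
import re
from collections import defaultdict

# Assumed (hypothetical) log format: "<timestamp> ERROR <ErrorType>: <message>"
ERROR_PATTERN = re.compile(r"\bERROR\s+(?P<kind>\w+):\s*(?P<message>.*)")


def group_errors(log_lines):
    """Return {error type: [messages]} for every line matching ERROR_PATTERN."""
    grouped = defaultdict(list)
    for line in log_lines:
        match = ERROR_PATTERN.search(line)
        if match:
            grouped[match.group("kind")].append(match.group("message"))
    return dict(grouped)


# Made-up sample data, for illustration only.
sample = [
    "2013-07-26 10:00:01 INFO request served in 120 ms",
    "2013-07-26 10:00:02 ERROR DatabaseTimeout: query exceeded 30 s",
    "2013-07-26 10:00:03 ERROR DatabaseTimeout: connection pool exhausted",
    "2013-07-26 10:00:04 ERROR PaymentDeclined: gateway returned 502",
]
report = group_errors(sample)
for kind, messages in report.items():
    print(f"{kind}: {len(messages)} occurrence(s)")
```

Grouping by error type is what makes the routing step possible: each type can be mapped to the team that owns it, or to a deeper probe of the related component.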

Application Operations

2. Probe application components – This one is really useful for figuring out difficult application problems but requires more effort to set up than #1. The basic concept is that you need to find out from the application support team what steps they would take manually to troubleshoot their application if it were slow or broken. The usual responses are things like “Check this log file for this word”, “Run this query against this database and if the output is -3 then I know what the problem is”, “Hit this URL to see what the response looks like. If the page does not return properly I know there is a problem with this component”, etc…

It may seem like a lot of work to set up this type of automated probing at first, but once it is in place it becomes an invaluable troubleshooting tool. Imagine that application response times get slow, these troubleshooting measures are automatically invoked, and you get an email with the exact root cause within minutes of the performance degradation. With this type of automation there is usually a known resolution based upon the probing results, so that could be automated too. How handy is that?
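Here is a rough sketch of how those manual checks might be captured as code. Everything in it is hypothetical: the probe names, the helper functions, and the sample data are assumptions for illustration, and a real setup would read an actual log file and query an actual database.

```python
from typing import NamedTuple


class ProbeResult(NamedTuple):
    name: str
    healthy: bool
    detail: str


def check_log_for_word(lines, word):
    """'Check this log file for this word': unhealthy if the word appears."""
    hits = [line for line in lines if word in line]
    return not hits, f"{len(hits)} log line(s) contain {word!r}"


def check_query_output(output, bad_value=-3):
    """'If the output is -3 then I know what the problem is.'"""
    return output != bad_value, f"query returned {output}"


def run_probes(probes):
    """Run each (name, callable) probe; a probe that raises counts as unhealthy."""
    results = []
    for name, probe in probes:
        try:
            healthy, detail = probe()
        except Exception as exc:
            healthy, detail = False, f"probe raised {exc!r}"
        results.append(ProbeResult(name, healthy, detail))
    return results


# Stand-in data; real probes would fetch these values themselves.
log_lines = ["INFO started", "WARN OutOfMemory in worker 4"]
probes = [
    ("app log", lambda: check_log_for_word(log_lines, "OutOfMemory")),
    ("orders db", lambda: check_query_output(-3)),
    ("cache db", lambda: check_query_output(0)),
]
results = run_probes(probes)
failing = [r.name for r in results if not r.healthy]
print("Failing probes:", failing)
```

Keeping each probe as a small function that returns a healthy flag plus a detail string makes it easy to add the support team's next check, and the list of failing probes is what would go into the email or drive an automated fix.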

Business Operations

3. Alert the business to changing conditions – This is where the IT staff has the ability to really make an impression and impact on the business. One of the most overlooked aspects of monitoring and automation is the ability to gather, baseline, alert, and act based upon pure business metrics. Here’s an example scenario…

Bob's business has an e-commerce website that sells many different products. They use AppDynamics Pro to track the quantity of each item sold along with the total revenue of all sales throughout the business day (using the information point functionality). One day Bob gets an alert that the latest and greatest widgets are selling well below their typical volume. In response, the automation engine searched the prices of major competitors and found that one competitor had lowered prices and was undercutting the business. With this actionable information Bob is able to immediately match the pricing of the competition, and sales rates return to normal before too much damage is done.
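To make the idea of baselining a business metric concrete, here is a minimal sketch that flags a sales figure falling well below its recent history. It uses a simple mean-and-standard-deviation rule chosen for illustration; it is not how AppDynamics Pro computes its baselines, and the function name, threshold, and sample numbers are all assumptions.

```python
from statistics import mean, stdev


def is_below_baseline(history, current, num_deviations=2.0):
    """Return True if `current` is more than `num_deviations` standard
    deviations below the mean of `history` (needs at least two data points)."""
    baseline = mean(history)
    spread = stdev(history)
    return current < baseline - num_deviations * spread


# Made-up hourly widget sales for the same hour on previous days.
hourly_widget_sales = [102, 98, 110, 95, 105, 99, 101]

for todays_sales in (97, 60):
    if is_below_baseline(hourly_widget_sales, todays_sales):
        print(f"ALERT: {todays_sales} widgets sold, well below the baseline")
    else:
        print(f"OK: {todays_sales} widgets sold, within the normal range")
```

In Bob's scenario, an alert like this would be the trigger for the follow-up automation, such as the competitor price check.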

Obviously business operations automation can be the most complex but also the most rewarding. Reaching out to the business and having a conversation about their activities and processes can seem daunting but I can tell you from personal experience that the business is more than willing to participate in initiatives that save them time and money. This type of conversation also normally leads to the business asking about other ways to collaborate to make them more effective so it is a great way to improve the overall level of communication with the business.

AppDynamics Pro enables a new level of automation because it knows exactly what is going on inside of your applications from both a business and IT perspective. Your level of automation is limited only by your imagination. I recommend that you start out small with a single automation use case and build outward from there. You can use AppDynamics Pro for free to try out our application runbook automation functionality on your specific use case.

Jim Hirschauer
Jim Hirschauer is a Technology Evangelist for AppDynamics. He has an extensive background working in highly available, business critical, large enterprise IT operations environments. Jim has been interested in application performance testing and monitoring since he was a Systems Administrator working in a retail bank. His passion for performance analysis led him down a path where he would design, implement and manage the cloud computing monitoring architecture for a top 10 investment bank. During his tenure at the investment bank, Jim created new processes and procedures that increased overall code release quality and dramatically improved end user experience.
