6 Strategic Steps to Rock Solid IT Service Assurance

IT Service AssuranceIn most organizations, managing the service level of critical applications is still a challenge. For some there is a lack of strategic planning, for others it’s simply not applying the proper tools and methodology to their everyday work. Regardless of the reason there are steps that need to be taken in all organizations to avoid costly and damaging service disruption.

We’ve Stopped Making Money

One day, while working at an investment bank, I got a phone call requesting my help (it was really more like a plea and an order at the same time) with troubleshooting a business-critical application. I had even heard  chatter in the office about how this application had been unusable for days before I was asked to participate. My role at the bank was as a monitoring architect who tested, reviewed, purchased, and on-boarded new tools among  other responsibilities. As a result, I was one of the people who would get a phone call when difficult problems went unsolved for too long so  I could apply my tools and expertise.

This was a time of great instability in the stock market and our traders were very active. This was also the time when the traders needed this particular application the most and when the bank should have made a small transaction fee for every completed transaction through this application. Simply put, the bank was loosing millions of dollars while this application was performing so poorly.

I started my work with the development team by getting a breakdown of the problem, the conditions leading up to the problem, an overview of the technology, and a demonstration of them recreating the problem in a test environment. Next, I deployed some application monitoring tools into their test environment — since they only had basic OS monitoring and the data that was coming from their load test tool — and watched as they ran more load tests. I could see certain parts of their code degrading as load ramped up and this led me to ask a lot of questions about the logic associated with these parts of the code.

Developer IdeasI worked together with the development team for 2 days asking questions, seeing the mental light bulbs explode through the look in their eyes, and testing the new code they feverishly created after each bottleneck was removed and a new one discovered. After all was said and done the application was upgraded in production at the end my 2 days of involvement. Capable of handling 5 times the throughput helped the traders do their jobs, and most importantly, the bank was ringing the cash register again for each transaction.

Strategic Planning

The worst part is the situation could have been completely avoided. By following a few key rules, the application team could have detected this problem in it’s infancy and minimized or avoided the lengthy and embarrassing production impact.

  1. Where the rubber meets the road – Application performance monitoring IN PRODUCTION is a requirement for any business, mission, or revenue critical applications.
  2. Dev and QA monitoring – Using application monitoring tools in PRE-PRODUCTION will dramatically improve quality of production application releases.
  3. Feedback loop – Constantly apply the information gained in production to your pre-production environment. Use production loading and performance patterns shown by your monitoring tools to prioritize development work and the create more realistic load tests.
  4. Collaboration is king – Development AND Operations personnel should have access to and use the same monitoring tools during load tests to gain the most benefit.
  5. Think strategic instead of tactical – Implement a well thought out monitoring and management strategy starting with your most critical revenue generating applications and working down from there (after rigorous testing of course).
  6. Identify and fix small problems before they turn into big problems – Alerting should be based off of deviation from normal (baseline) behavior in most situations. Minimize the number of static thresholds you use to trigger alerts and make an investment in analytics-driven monitoring platforms. Static thresholds should mostly be used to identify service level breaches.

The reality of the 6 points outlined above is that it takes some initial effort to make the required organization and process changes as well as getting the right tools in place. However, the fact remains the investment is well worth it for business-critical applications. I’ve seen so many groups think they don’t have enough time to invest in strategic initiatives and then they constantly run around firefighting the next battle which should have been avoided in the first place. It’s a vicious cycle that needs to be end. Consider the tips listed above and break the cycle starting right now.

IT holds more business influence than they realise

A ‘well oiled’ organization is one where IT and the rest of the business are working together and on the same page. In order to achieve this there needs to be good communication, and for good communication there needs to be a common language.

In most organizations, while IT are striving to achieve their goal of 99.999% availability, the rest of the business is looking to drive additional revenue, increase user satisfaction, and reduce customer churn.

Ultimately everyone should be working towards a common goal: SUCCESS. Unfortunately different teams define their success in different ways and this lack of alignment often results in a mistrust between IT departments and the rest of the business.

Measuring success

Let’s look at how various teams within a typical organization define success today:

Operations:
IT ops teams are responsible for reducing risk, ensuring the application is available and the ‘lights are green’. The number ‘99.9’ can either be IT Ops best friend or its worst enemy. Availability targets such as these are often the only measure of ops success or failure, meaning many of the other things you are doing often go unnoticed.

Availability targets don’t show business insight, or the positive impact you’re having on the business. For instance, how much did performance improve after you implemented that change last week? Has the average order size increased? How many additional orders can the application process since re-platforming? Is anyone measuring what the performance improvement gains were for that change you implemented last week?

Development:
Dev teams are focussed on change. The Business demands they release more frequently, with more features, less defects, less resources and often less sleep! Dev teams are often targeted according to the number of updates and changes they can release. But nobody is measuring the success of these changes. Can anyone in your dev team demonstrate what the impact of your last code release was? Did revenues climb? Were users more satisfied? Were there an increased number of orders placed?

‘The Business’:
The business is focussed on targets; last month’s achievements and end of year goals. This means they concentrate on the past and the future, but have little or no idea what impact IT is having on the business in the present. Consulting a data warehouse to gather ‘Business Intelligence’ at the end of the month does not allow you to keep your finger on the pulse of the business.

With everyone focussing on different targets there is no real alignment to the overall business goals between different parts of an organization. One reason for this disconnect is due to the lack of meaningful shared metrics. More specifically, it’s access to these metrics in real-time that is the missing link.

If I asked how much revenue has passed through your application since reading this blogpost, or what impact your last code release had on customer adoption, how quickly could you find the answers? How quickly could anyone in your organization find the answers?

What if answers to these questions only took seconds?

Monitoring the Business in Real-time

In a previous post, I introduced AppDynamics Real-time Business Metrics which enables you to easily collect, auto-baseline, alert, and report on the Business data that is flowing through your applications… as it’s really happening.

This post demonstrates how to configure AppDynamics to extract all checkout revenue values from every business transaction and make this available as a new metric “Checkout Revenue” which can be reported in real-time just like any other AppDynamics metric.

With IT Ops, Dev and Business Owners all supporting business critical applications that are responsible for generating revenue, it is a great example of a business metric that could be used by every team to measure success.

Let’s look at a few examples of how this could change the way you do business, if everyone was jointly focussed on the same business metric.

Outage cost
The below example shows the revenue per minute vs. the response time per minute of an application. This application has obviously suffered an outage that lasted approximately 50 mins and it’s clear to see the impact it has had on the business in lost revenue. The short spike/increase in revenue seen after the outage indicates users who returned to complete their transaction, but this is not enough to recover the lost revenue for the period.

RtBM - outage

Impact of agile releases
This example shows the result of a performance improvement program that has taken place. The overall response time has improved by over a second across three code releases and you can clearly see the additional revenue that has been generated as a result of the code releases.

RtBM - agile releases

Here a 1 second improvement in response time has increased the revenue being generated by the online booking system by more than 30%. The value a development team is delivering back to the business is clearly visible with each new release, allowing developers to focus on the areas that drive the most return and quantify the value they are delivering.

Marketing campaign
This example is a little more complex. At midday there is a massive increase in the number of people visiting this eCommerce website due to an expensive TV advertising campaign. The increased load on the system has resulted in a small increase in the overall response time but nothing too significant. However, despite the increased traffic to the site, the revenue has not improved. If we take a look at the Number of Checkouts, which is a second Business Metric that has been configured, it’s clear the advertising campaign has driven additional users to the site, but these users have not generated additional revenue.

RtBM - marketing

Common metrics for common success

With traditional methods of measuring success in different ways it’s impossible to to align towards a common goal. This creates silo’d working environments that make it impossible for teams to collaborate and prioritise.

By enabling all parts of the business to focus on the business metrics that really matter, organizations benefit from being able to proactively prioritise and resolve issues when they occur. It helps IT truly align with the priorities and needs of the business, allowing them to speak the same language and manage the bottom line. For example, after implementing AppDynamics Real-time Business Metrics Doug Strick, who is the Internet Application Admin at Garmin, said the following:

“We can now understand how the application is growing over time. This data will prove invaluable in guiding future decisions in IT.”
-Doug Strick, Internet Application Admin

AppDynamics Real-time Business Metrics enable you to identify business challenges and react to them immediately, instead of waiting hours, days or even weeks for answers. Correlating performance, user experience, and Business metrics together in real-time and in one place.

If you want to capture the business performance and measure your success against it in real-time , you can get started today with Real-time Business Metrics by signing up and taking a free trial of AppDynamics Pro here.

Introducing Nodetime for Node.js Monitoring

Node.js is rapidly becoming one of the most popular platforms for building fast, scalable web applications. According to a W3C tech survey, adoption of Node.js doubled in the last year alone, and the Node.js application server is currently the 14th most popular in the world. Today a range of organizations use Node.js to power their mobile applications, including LinkedIn, Walmart and Klout. With Nodetime you can monitor, troubleshoot and diagnose performance issues in your Node.js applications.

Nodetime reveals the internals of your application and infrastructure through profiling and proactive monitoring enabling detailed analysis, fast troubleshooting, performance and capacity optimization. Monitor realtime and historical state of the application by following multiple application metrics. Over three thousand organizations use Nodetime to monitor their Node.js applications, including Condé Nast, Kabam and Change.org.

“We’re very excited to see AppDynamics pursuing the latest and most innovative technologies with this acquisition,” said Nic Johnson, Web Architect at FamilySearch, a customer of both AppDynamics and Nodetime. “Both Nodetime and AppDynamics are essential parts of our toolset, and we’re very excited to see them united in the same product and the same great company. This will mean great things both for AppDynamics customers and for the Node.js community as a whole.”

Nodetime Dashboard for at a glance metrics of application health:

Nodetime Dashboard

CPU profiler showing a backtrace to find the root cause of a performance problem:

Nodetime

NodeTime

Nodetime metrics cover operating system state, garbage collection activity, application capacity, transactions and database calls for supported libraries, such as such as HTTP, File System, Socket.io, Redis, MongoDB, MySQL, PostgreSQL, Memcached and Cassandra.

Explore any application or OS metric via the Nodetime metric browser:

Nodetime

NodeTime

Get started for free with Node.js performance monitor with Nodetime.

Nodetime

Nodetime installation is exteremly easy:

npm install nodetime

Once the Nodetime module is available simply add the following to your node application:


require('nodetime').profile({
accountKey: 'xxx',
appName: 'MyApp'
});

Adding application performance monitoring for Node.js application has never been easier. Enjoy!

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Thoughts? Let us know on Twitter @AppDynamics!

Intelligent Alerting for Complex Applications – PagerDuty & AppDynamics

Screen Shot 2013-04-16 at 2.39.00 PMToday AppDynamics announced integration with PagerDuty, a SaaS-based provider of IT alerting and incident management software that is changing the way IT teams are notified, and how they manage incidents in their mission-critical applications.  By combining AppDynamics’ granular visibility of applications with PagerDuty’s reliable alerting capabilities, customers can make sure the right people are proactively notified when business impact occurs, so IT teams can get their apps back up and running as quickly as possible.

You’ll need a PagerDuty and AppDynamics license to get started – if you don’t already have one, you can sign up for free trials of PagerDuty and AppDynamics online.  Once you complete this simple installation, you’ll start receiving incidents in PagerDuty created by AppDynamics out-of-the-box policies.

Once an incident is filed it will have the following list view:

incident

When the ‘Details’ link is clicked, you’ll see the details for this particular incident including the Incident Log:

incident_details

If you are interested in learning more about the event itself, simply click ‘View message’ and all of the AppDynamics event details are displayed showing which policy was breached, violation value, severity, etc. :

incident_message

Let’s walk through some examples of how our customers are using this integration today.

Say Goodbye to Irrelevant Notifications

Is your work email address included in some sort of group email alias at work and you get several, maybe even dozens, of notifications a day that aren’t particularly relevant to your responsibilities or are intended for other people on your team?  I know I do.  Imagine a world where your team only receives messages when the notifications have to do with their individual role and only get sent to people that are actually on call.  With AppDynamics & PagerDuty you can now build in alerting logic that routes specific alerts to specific teams and only sends messages to the people that are actually on-call.  App response time way above the normal value?  Send an alert to the app support engineer that is on call, not all of his colleagues.  Not having to sift through a bunch of irrelevant alerts means that when one does come through you can be sure it requires YOUR attention right away.

on_call_schedules

Automatic Escalations

If you are only sending a notification and assigning an incident to one person, what happens if that person is out of the office or doesn’t have access to the internet / phone to respond to the alert?  Well, the good thing about the power of PagerDuty is that you can build in automatic escalations.  So, if you have a trigger in AppDynamics to fire off a PagerDuty alert when a node is down, and the infrastructure manager isn’t available, you can automatically escalate and re-assign / alert a backup employee or admin.

escalation_policy

The Sky is Falling!  Oh Wait – We’re Just Conducting Maintenance…

Another potentially annoying situation for IT teams are all of the alerts that get fired off during a maintenance window.  PagerDuty has the concept of a maintenance window so your team doesn’t get a bunch of doomsday messages during maintenance.  You can even setup a maintenance window with one click if you prefer to go that route.

maintenance_window

Either way, no new incidents will be created during this time period… meaning your team will be spared having to open, read, and file the alerts and update / close out the newly-created incidents in the system.

We’re confident this integration of the leading application performance management solution with the leading IT incident management solution will save your team time and make them more productive.  Check out the AppDynamics and PagerDuty integration today!

Introducing AppDynamics for PHP

PHP Logo

It’s been about 12 years since I last scripted in PHP. I pretty much paid my way through college building PHP websites for small companies that wanted a web presence. Back then PHP was the perfect choice, because nearly all the internet service providers had PHP support for free if you registered domain names with them. Java and .NET wasn’t an option for a poor smelly student like me, so I just wrote standard HTML with embedded scriplets of PHP code and bingo–I had dynamic web pages.

Today, 244 million websites run on PHP which is almost 75% of the web. That’s a pretty scary statistic. If only I’d kept coding PHP back when I was 21, I’d be a billionaire by now! PHP is a pretty good example of how open-source technology can go viral and infect millions of developers and organizations world-wide.

Turnkey APMaaS by AppDynamics

Since we launched our Managed Service Provider program late last year, we’ve signed up many MSPs that were interested in adding Application Performance Management-as-a-Service (APMaaS) to their service catalogs.  Wouldn’t you be excited to add a service that’s easy to manage but more importantly easy to sell to your existing customer base?

Service providers like Scicom definitely were (check out the case study), because they are being held responsible for the performance of their customer’s complex, distributed applications, but oftentimes don’t have visibility inside the actual application.  That’s like being asked to officiate an NFL game with your eyes closed.

ref

The sad truth is that many MSPs still think that high visibility in app environments equates to high configuration, high cost, and high overhead.

Thankfully this is 2013.  People send emails instead of snail mail, play Call of Duty instead of Pac-Man, listen to Pandora instead of cassettes, and can have high visibility in app environments with low configuration, low cost, and low overhead with AppDynamics.

Not only do we have a great APM service to help MSPs increase their Monthly Recurring Revenue (MRR), we make it extremely easy for them to deploy this service in their own environments, which, to be candid, is half the battle.  MSPs can’t spend countless hours deploying a new service.  It takes focus and attention away from their core business, which in turn could endanger the SLAs they have with their customers.  Plus, it’s just really annoying.

Introducing: APMaaS in a Box

Here at AppDynamics, we take pride in delivering value quickly.  Most of our customers go from nothing to full-fledged production performance monitoring across their entire environment in a matter of hours in both on-premise and SaaS deployments.  MSPs are now leveraging that same rapid SaaS deployment model in their own environments with something that we like to call ‘APMaaS in a Box’.

At a high level, APMaaS in a Box is large cardboard box with air holes and a fragile sticker wherein we pack a support engineer, a few management servers, an instruction manual, and a return label…just kidding…sorry, couldn’t resist.

man in box w sticker

Simply put, APMaaS in a Box is a set of files and scripts that allows MSPs to provision multi-tenant controllers in their own data center or private cloud and provision AppDynamics licenses for customers themselves…basically it’s the ultimate turnkey APMaaS.

By utilizing AppDynamics’ APMaaS in a Box, MSPs across the world are leveraging our quick deployment, self-service license provisioning, and flexibility in the way we do business to differentiate themselves and gain net new revenue.

Quick Deployment

Within 6 hours, MSPs like NTT Europe who use our APMaaS in a Box capabilities will have all the pieces they need in place to start monitoring the performance of their customer’s apps.  Now that’s some rapid time to value!

Self-Service License Provisioning

MSPs can provision licenses directly through the AppDynamics partner portal.  This gives you complete control over who gets licenses and makes it very easy to manage this process across your customer base.

Flexibility

A MSP can get started on a month-to-month basis with no commitment.  Only paying for what you sell eliminates the cost of shelfware.  MSPs can also sell AppDynamics however they would like to position it and can float licenses across customers.  NTT Europe uses a 3-tier service offering so customers can pick and choose the APM services they’d like to pay for.  Feel free to get creative when packaging this service for customers!

Conclusion

As more and more MSPs move up the stack from infrastructure management to monitoring the performance of their customer’s distributed applications, choosing an APM partner that understands the Managed Services business is of utmost importance.  AppDynamics’ APMaaS in a box capabilities align well with internal MSP infrastructures, and our pricing model aligns with the business needs of Managed Service Providers – we’re a perfect fit.

MSPs who continue to evolve their service offerings to keep pace with customer demands will be well positioned to reap the benefits and future revenue that comes along with staying ahead of the market.  To paraphrase The Great One, MSPs need to “skate where the puck is going to be, not where it has been.”  I encourage all you MSPs out there to contact us today to see how we can help you skate ahead of the curve and take advantage of the growing APM market with our easy to use, easy to deploy APMaaS in a Box.  If you don’t, your competition will…

4 Lessons that Operations Can Learn From Batman

In honor of the summer blockbuster The Dark Knight Rises, let’s take a look at one of the most celebrated heroes in history—Batman. (Okay, granted, he’s not real. But that doesn’t mean he can’t help us understand some best practices regarding application uptime and availability, right? Look, just bear with me here.)

1. Always have the right tools.

Batman knows his limitations: he doesn’t have superpowers (unlike my esteemed colleague, App Man). He’s just an ordinary guy with a weird-looking suit and five tons of cash. So what does he do? He buys batarangs, batcopters, bat anti-shark repellant, etc. He knows that in order to give himself an edge against the forces of darkness, he needs serious gadgets.

Similarly, Application Operations teams need to recognize that they shouldn’t wade into a production outage or firefight without some kind of application management solution. It’s amazing how many times we talk to teams that are simply using log files to manage a production application. Log files are not effective tools in a true firefight, and they’re no way to get to root cause quickly when your end users experience poor performance. Get an APM solution that equips you with 10x visibility and code-level detail in production, at less than 2% overhead. Otherwise, your office may start to feel like the rough & tumble streets of Gotham.

2. The bad guys always come back.

How many times has Batman stuck the Joker in Arkham Asylum, only see him escape and start inventing new death traps? But the Dark Knight doesn’t grumble—well, actually, he does, but let’s not worry about that right now—he just suits up and goes back on patrol.

Similarly, poor performing business transactions, memory leaks, and slow SQL calls are never going to disappear entirely, no matter how many iterations your developers perform. Agile release cycles and complex code mean that no environment will be completely free of evil. It’s important to have the right web application monitoring solution to put your rogues gallery on ice, time and time again.

3. War Room sessions are a waste of time.

“It’s the network.” “No, it’s the database.” “No, it’s the application.” How much time does your team spend pointing fingers and establishing innocence? And in the meantime, how many more issues are bound to terrorize your end users?

Batman doesn’t sit in a fire circle and hold hands with his colleagues. He doesn’t parley with the police and he doesn’t make long speeches. He gets the job done. The right application performance management tool can do the same for your own attempts at application heroism: less on the fingerpointing, more on the Mean-Time-to-Resolution.

4. Never stop until you find root cause.

One of the things that people forget about Batman is that he’s not just a superhero: he’s also the World’s Greatest Detective. He uses his brains as often as his fists. And when it comes to a mystery, he’s not content to just a get a vague sense of the story, or poke around the edges of a crime scene. Rather, he wants to find the killer and the smoking gun.

In the world of application performance, that means getting to code-level detail in a production environment. It does not mean using a tool to monitor infrastructure or a network monitor that skims the surface of the application layer.  Those tools can get close, but “close” only works in horseshoes and hand grenades. What’s necessary is to find the true culprit, the method and class that’s causing a production outage or performance issue. And if you have the right tool for the job, you won’t be patrolling all night in order to do it. Rather, you can solve the mystery in literally minutes.

One lesson you might not want to take from the Caped Crusader is to cough up a huge P.O. for your APM solution. Not everyone can be a bazillionaire like Bruce Wayne. And it’s possible to arm yourself for battle without making your IT budget as big and bloated as the Penguin.

Don’t believe me? Check out the pricing for AppDynamics Pro for yourself—then send up a signal to take our 30-Day Trial. When you get a taste of what we can do, you may find yourself swinging from the rooftops.

Finding the Root Cause of Application Performance Issues in Production

The most enjoyable part of my job at AppDynamics is to witness and evangelize customer success. What’s slightly strange is that for this to happen, an application has to slow down or crash.

It’s a bittersweet feeling when End Users, Operations, Developers and many Businesses suffer application performance pain. Outages cost the business money, but sometimes they cost people their jobs–which is truly unfortunate. However, when people solve performance issues, they become overnight heroes with a great sense of achievement, pride, and obviously relief.

To explain the complexity of managing application performance, imagine your application is 100 haystacks that represent tiers, and somewhere a needle is hurting your end user experience. It’s your job to find the needle as quickly as possible! The problem is, each haystack has over half a million pieces of hay, and they each represent lines of code in your application. It’s therefore no surprise that organizations can take days or weeks to find the root cause of performance issues in large, complex, distributed production environments.

End User Experience Monitoring, Application Mapping and Transaction profiling will help you identify unhappy users, slow business transactions, and problematic haystacks (tiers) in your application, but they won’t find needles. To do this, you’ll need x-ray visibility inside haystacks to see which pieces of hay (lines of code) are holding the needle (root cause) that is hurting your end users. This X-Ray visibility is known as “Deep Diagnostics” in application monitoring terms, and it represents the difference between isolating performance issues and resolving them.

For example, AppDynamics has great End User Monitoring, Business Transaction Monitoring, Application Flow Maps and very cool analytics all integrated into a single product. They all look and sound great (honestly they do), but they only identify and isolate performance issues to an application tier. This is largely what Business Transaction Management (BTM) and Network Performance Management (NPM) solutions do today. They’ll tell you what and where a business transaction slows down, but they won’t tell you the root cause so you can resolve the issues.

Why Deep Diagnostics for Production Monitoring Matters

A key reason why AppDynamics has become very successful in just a few years is because our Deep Diagnostics, behavioral learning, and analytics technology is 18 months ahead of the nearest vendor. A bold claim? Perhaps, but it’s backed up by bold customer case studies such as Edmunds.com and Karavel, who compared us against some of the top vendors in the application performance management (APM) market in 2011. Yes, End User Monitoring, Application Mapping and Transaction Profiling are important–but these capabilities will only help you isolate performance pain, not resolve it.

AppDynamics has the ability to instantly show the complete code execution and timing of slow user requests or business transactions for any Java or .NET application, in production, with incredibly small overhead and no configuration. We basically give customers a metal detector and X-Ray vision to help them find needles in haystacks. Locating the exact line of code responsible for a performance issue means Operations and Developers solve business pain faster, and this is a key reason why AppDynamics technology is disrupting the market.

Below is a small collection of needles that customers found using AppDynamics in production. The simple fact is that complete code visibility allows customers to troubleshoot in minutes as opposed to days and weeks. Monitoring with blind spots and configuring instrumentation are a thing of the past with AppDynamics.

Needle #1 – Slow SQL Statement

Industry: Education
Pain: Key Business Transaction with 5 sec response times
Root Cause: Slow JDBC query with full-table scan

Needle #2 – Slice of Death in Cassandra

Industry: SaaS Provider
Pain: Key Business Transaction with 2.5 sec response times
Root Cause: Slow Thrift query in Cassandra

Needle #3 – Slow & Chatty Web Service Calls

Industry: Media
Pain: Several Business Transactions with 2.5 min response times
Root Cause: Excessive Web Service Invocation (5+ per trx)

Needle #4 -Extreme XML processing

Industry: Retail/E-Commerce
Pain: Key Business Transaction with 17 sec response times
Root Cause: XML serialization over the wire.

Needle #5 – Mail Server Connectivity

Industry: Retail/E-Commerce
Pain: Key Business Transaction with 20 sec response times
Root Cause: Slow Mail Server Connectivity

 Needle #6 – Slow ResultSet Iteration

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 30+ sec response times
Root Cause: Querying too much data

Needle #7 – Slow Security 3rd Party Framework

Industry: Education
Pain: All Business Transactions with > 3 sec response times
Root Cause: Slow 3rd party code

Needle #8 – Excessive SQL Queries

Industry: Education
Pain: Key Business Transactions with 2 min response times
Root Cause: Thousands of SQL queries per transaction

Needle #9 – Commit Happy

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 25+ sec response times
Root Cause: Unnecessary use of commits and transaction management.

Needle #10 – Locking under Concurrency

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 5+ sec response times
Root Cause: Non-Thread safe cache forces locking for read/write consistency

 Needle #11 – Slow 3rd Party Search Service

Industry: SaaS Provider
Pain: Key Business Transaction with 2+ min response times
Root Cause: Slow 3rd Party code

 Needle #12 – Connection Pool Exhaustion

Industry: Financial Services
Pain: Several Business Transactions with 7+ sec response times
Root Cause: DB Connection Pool Exhaustion caused by excessive connection pool invocation & queries

Needle #13 – Excessive Cache Usage

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 50+ sec response times
Root Cause: Cache Sizing & Configuration

If you want to manage and troubleshoot application performance in production, you should seriously consider AppDynamics. We’re the fastest growing on-premise and SaaS based APM vendor in the market right now. You can download our free product AppDynamics Lite or take a free 30-day trial of AppDynamics Pro – our commercial product.

Now go find those needles that are hurting your end users!

App Man.

Why Testing in Production isn’t as stupid as it sounds

One of my colleagues this week was consolidating the results from our recent Application Performance Management survey, and one interesting finding was that 40% of customers have at least one release cycle a month. Out of those respondents, one third experience a Severity-1 incident each month as well. That’s a pretty compelling pair of statistics, and they might explain the continued frustration and conflict between development and operations teams. It’s also perhaps the reason why this DevOps underground movement can no longer be ignored (even by Gartner). There is no doubt development organizations have become agile, but does deploying this frequent change make the business more or less agile? For example, if one in three releases creates a Severity-1 incident, then surely agile development becomes a risk to the business. We’re at the point where Operations either has to start managing change better or simply restrict the amount of change that can occur.

So why are Sev 1 incidents so common? Based on my experiences and customer interaction, I’d strongly argue that testing in development isn’t enough. At the very least, it’s certainly not an insurance policy for deploying an application in production. When a Formula 1 team designs a car in a wind tunnel and tests it on a simulator pre-season, they don’t assume that the performance they see in test will mirror the results they see in a race. Yet, that is pretty much what happens today in the application development lifecycle. Development teams build and test their apps in pre-production before handing it off to operations for deployment in production, and they assume everything will work just fine. This is probably the worst assumption IT has made over the last decade, because development and production environments differ significantly. It’s also a lame excuse for any development team to use when a production issue occurs: “Well it ran fine in test so you must have deployed it wrong.” Yes people make mistakes occasionally, but if one in every three releases has an issue, deployment error may not be the sole reason. If development never get to see how their baby runs in production, they’ll never learn how to build robust, scalable, and high-performance applications.

Online media company gets proactive with application monitoring in production

On Wednesday I delivered a keynote at WJAX in Munich. Everything went really well, but I was a little shocked at the response I got when I asked the audience “How many of you monitor the performance of your apps in production?” As I scanned the audience, I counted 9 out of ~950 developers had put their hands up, meaning about 1% had visibility of how their applications actually performed in production. I know what you’re thinking: “But isn’t application performance in production the responsibility of Operations?”  Well, it is and it isn’t. Most organizations think that when an application has an issue, it’s related to the infrastructure it runs on. That’s like saying when a car crashes, it’s because a part failed on the car whereas in actual fact most accidents are caused by the driver. Yes, hardware fails occasionally, but application logic and configuration drives how infrastructure resource is used, which is why most issues today occur when new code is deployed in production.