How APM Can Support Service Assurance

I’m a New Zealand-based Senior Sales Engineer for AppDynamics with more than 12 years of experience in the IT sector. I’ve had a lot of interesting work-related experiences over the years—living in Thai hospitals, traveling by armed convoy to remote gold mining sites, and even getting down and dirty with calculating the yield of offal and tallow within the primary industries. And while the organizations I’ve worked with spanned a variety of industries, each shared a common objective: the need for service assurance.

The idea of assurance came to me from an AppDynamics customer whose company was trying to move all of its IT expenditures from CapEx to OpEx, which meant a major shift to managed services. The company relied heavily on a few key partners that provided Level 1 support, infrastructure and, in some cases, both application-level support and software-as-a-service (SaaS).

Rather than buy an application performance monitoring (APM) solution directly from an APM provider like AppDynamics, the company wanted to consume APM as a service through a third party, a managed service provider (MSP). Why take this approach? The customer, like many organizations, was looking for the service assurance that an MSP could provide. A laudable goal, certainly, but one that carries considerable risk as well.

Retaining Visibility

When an organization outsources numerous managed services, it starts to lose visibility into its own operations and grows unsure of what's going on behind the scenes. The organization may be very happy with the managed services it's receiving, yet have no view of what's coming down the line. If, for instance, it's 10 minutes away from a critical failure, it may not see the event coming.

In this scenario, the company becomes uncomfortably reliant on its partners, who may not be willing to readily accept blame for a problem, largely because the admission of fault may reflect badly upon them and trigger contractual obligations. And if the company uses multiple MSPs, the situation will be worse, with each MSP pointing the finger away from themselves and saying, “It’s that guy’s fault.”

How APM Can Help

By providing advanced visibility into an application, APM enables the end customer to quickly find the fault, work with the appropriate MSP, and not waste support hours diagnosing a problem with the wrong vendors.

APM offers comfort—or assurity—that the end customer can very quickly identify the area at fault, thereby reducing its mean time to identify (MTTI). And because the customer is working with a single APM provider, it can reduce its mean time to recovery (MTTR) as well.
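To make the two metrics concrete, here is a minimal Python sketch that computes MTTI and MTTR from a list of incident records. It assumes one common definition, with both intervals measured from the moment the fault began; organizations define the start point differently. The `Incident` fields and the sample timestamps are hypothetical and are not taken from any AppDynamics API or real incident data.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical incident record; field names are illustrative only.
    started: datetime     # when the fault began affecting users
    identified: datetime  # when the faulty tier or vendor was pinpointed
    resolved: datetime    # when service was restored


def mean_minutes(durations: list[timedelta]) -> float:
    """Average a list of durations, expressed in minutes."""
    return mean(d.total_seconds() / 60 for d in durations)


def mtti(incidents: list[Incident]) -> float:
    """Mean time to identify: average of (identified - started)."""
    return mean_minutes([i.identified - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> float:
    """Mean time to recovery: average of (resolved - started)."""
    return mean_minutes([i.resolved - i.started for i in incidents])


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 9, 0)
    # Two made-up incidents: one found slowly, one found quickly.
    history = [
        Incident(t0, t0 + timedelta(minutes=90), t0 + timedelta(minutes=150)),
        Incident(t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=40)),
    ]
    print(f"MTTI: {mtti(history):.0f} min")  # MTTI: 50 min
    print(f"MTTR: {mttr(history):.0f} min")  # MTTR: 95 min
```

Tracking the two numbers separately shows where APM is expected to help: pinpointing the faulty tier or vendor faster shortens MTTI directly, and since recovery usually cannot start until the fault is identified, a shorter MTTI tends to pull MTTR down with it.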

MSPs benefit, too. Giving service providers access to the same view of faults helps them reduce MTTR. And when multiple vendors are involved, all parties can use the single-pane-of-glass view of an APM dashboard for diagnosis and repair.

Keeping Them ‘Honest’

As an end customer, what level of assurance do you have that your MSP is operating as expected? And how do you know if an application, which you invested heavily in, is behaving properly? If your MSP is only telling you that the app is “available,” that’s not good enough.

It’s not always easy, though, to determine whether an application is performing well, or which metrics best gauge its overall business impact. In addition to technical metrics, you’ll need to identify the business transactions (BTs) that are most valuable to you and check whether they’re delivering their expected value.

You should encourage your MSP to commit to a service-level agreement (SLA) that covers not only technical metrics and BTs but also valuable business metrics such as sales conversion and digital service adoption.

APM delivers these insights by providing real-time visibility into the performance of your business. Instead of relying on daily or monthly business intelligence reports to see if you’re meeting your SLAs from a technical and business standpoint, you get this information in real time.
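As a rough illustration of checking an SLA continuously rather than monthly, the Python sketch below evaluates one short window of data for a single business transaction against both technical and business targets. The `SlaTarget` fields, the thresholds, and the sample numbers are all hypothetical; in practice the inputs would come from your APM tool and the targets from your contract with the MSP.

```python
import math
from dataclasses import dataclass


@dataclass
class SlaTarget:
    # Hypothetical targets for one business transaction (BT); real values
    # would come from the SLA negotiated with the MSP.
    max_p95_ms: float        # technical: 95th-percentile response time
    min_success_rate: float  # technical: fraction of calls without errors
    min_conversion: float    # business: completed orders / checkout sessions


def p95(values: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = max(1, math.ceil(95 * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate_window(durations_ms: list[float], errors: int, sessions: int,
                    orders: int, target: SlaTarget) -> dict[str, bool]:
    """Check one short time window (e.g. the last minute) against the SLA.

    Returns True for each target that is met and False for each breach.
    """
    success_rate = 1 - errors / len(durations_ms)
    conversion = orders / sessions if sessions else 0.0
    return {
        "response_time": p95(durations_ms) <= target.max_p95_ms,
        "success_rate": success_rate >= target.min_success_rate,
        "conversion": conversion >= target.min_conversion,
    }


if __name__ == "__main__":
    target = SlaTarget(max_p95_ms=800, min_success_rate=0.99, min_conversion=0.03)
    window = [120.0] * 19 + [2500.0]  # 20 checkout calls, one very slow
    print(evaluate_window(window, errors=0, sessions=400, orders=10, target=target))
    # {'response_time': True, 'success_rate': True, 'conversion': False}
```

With these sample numbers the transaction is fast and error-free, yet the conversion target is missed: exactly the gap that an "available" status report would hide. Running the same check on every window, rather than once a month, is what turns a periodic report into a near-real-time view.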

In addition to helping you get the best return from your MSP, these insights enable you to make informed, moment-by-moment decisions, such as driving users to a particular channel as part of a digital transformation, e.g., steering customers away from an overloaded call center and toward online support.

The MSP also benefits from APM, which becomes a value-added service it can offer customers, one that provides complete transparency and helps forge a true strategic partnership.

A True Strategic Partnership

Outsourcing has its advantages but can also lead to a loss of control, creating risk for organizations and individuals alike. Service assurance provides a level of comfort, giving an organization control and insight into the performance of its application, as well as confidence that its MSP is providing value.

From the perspective of the MSP, this also demonstrates a willingness and openness to a true strategic partnership with its customer. This strong, trusted partnership is critical to ensuring success for the customer and the MSP and, most importantly, good service for the end user.

CGI, a leading IT and business process services firm, had a major infrastructure contract that required end-to-end service delivery, but the nature of the environment made this difficult to achieve. To comply with its SLA, CGI needed an efficient way to measure business transactions end-to-end. CGI integrated AppDynamics into its existing platform and immediately began getting insights into system performance, enabling it to demonstrate SLA compliance, get complete end-to-end visibility of business transactions, and build a more robust process between its development, testing and production environments.

Schedule a demo to learn how AppDynamics can help assure your own service success!

6 Strategic Steps to Rock Solid IT Service Assurance

In most organizations, managing the service level of critical applications is still a challenge. For some, there is a lack of strategic planning; for others, it’s simply a matter of not applying the proper tools and methodology to their everyday work. Regardless of the reason, there are steps that every organization needs to take to avoid costly and damaging service disruptions.

We’ve Stopped Making Money

One day, while working at an investment bank, I got a phone call requesting my help (it was really more like a plea and an order at the same time) with troubleshooting a business-critical application. I had already heard chatter in the office about how this application had been unusable for days before I was asked to participate. My role at the bank was monitoring architect: I tested, reviewed, purchased, and on-boarded new tools, among other responsibilities. As a result, I was one of the people who got a phone call when difficult problems went unsolved for too long, so that I could apply my tools and expertise.

This was a time of great instability in the stock market, and our traders were very active. It was also when the traders needed this particular application the most, and when the bank should have been earning a small transaction fee on every transaction completed through it. Simply put, the bank was losing millions of dollars while this application performed so poorly.

I started my work with the development team by getting a breakdown of the problem, the conditions leading up to it, an overview of the technology, and a demonstration of the team recreating the problem in a test environment. Next, I deployed some application monitoring tools into their test environment (they only had basic OS monitoring and the data coming from their load test tool) and watched as they ran more load tests. I could see certain parts of their code degrading as load ramped up, which led me to ask a lot of questions about the logic in those parts of the code.

I worked with the development team for 2 days, asking questions, watching the mental light bulbs go on in their eyes, and testing the new code they feverishly wrote each time one bottleneck was removed and the next one discovered. After all was said and done, the upgraded application went into production at the end of my 2 days of involvement. It could now handle 5 times the throughput, which helped the traders do their jobs and, most importantly, had the bank ringing the cash register again for each transaction.

Strategic Planning

The worst part is that the situation could have been completely avoided. By following a few key rules, the application team could have detected this problem in its infancy and minimized or avoided the lengthy and embarrassing production impact.

  1. Where the rubber meets the road – Application performance monitoring IN PRODUCTION is a requirement for any business-, mission-, or revenue-critical application.
  2. Dev and QA monitoring – Using application monitoring tools in PRE-PRODUCTION will dramatically improve the quality of production releases.
  3. Feedback loop – Constantly apply the information gained in production to your pre-production environment. Use the production load and performance patterns shown by your monitoring tools to prioritize development work and to create more realistic load tests.
  4. Collaboration is king – Development AND Operations personnel should have access to, and use, the same monitoring tools during load tests to gain the most benefit.
  5. Think strategic instead of tactical – Implement a well-thought-out monitoring and management strategy, starting with your most critical revenue-generating applications and working down from there (after rigorous testing, of course).
  6. Identify and fix small problems before they turn into big problems – In most situations, alerting should be based on deviation from normal (baseline) behavior. Minimize the number of static thresholds you use to trigger alerts, and invest in analytics-driven monitoring platforms. Static thresholds should mostly be used to identify service-level breaches.
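The idea behind step 6 can be sketched in a few lines of Python: keep a rolling baseline of recent response times, raise a deviation alert when a new value lands more than `k` standard deviations from that baseline, and reserve a single static threshold for service-level breaches. The class name, window size, `k`, and the 2000 ms threshold are illustrative assumptions; production monitoring platforms typically use more sophisticated baselining than a simple rolling mean.

```python
from collections import deque
from statistics import mean, stdev


class BaselineMonitor:
    """Hypothetical rolling-baseline alerting sketch.

    Deviation alerts fire when a value is more than k standard deviations
    from the mean of recent samples; a static threshold is kept only for
    service-level breaches. All parameter values are illustrative.
    """

    def __init__(self, window: int = 30, k: float = 3.0,
                 sla_ms: float = 2000.0, min_samples: int = 5):
        self.samples: deque[float] = deque(maxlen=window)  # oldest drop off
        self.k = k
        self.sla_ms = sla_ms
        self.min_samples = min_samples  # must be >= 2 for stdev()

    def observe(self, value_ms: float) -> list[str]:
        """Record one response time and return any alerts it triggers."""
        alerts = []
        # Only judge deviations once there is enough history for a baseline.
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            if abs(value_ms - mu) > self.k * sigma:
                alerts.append("deviation")
        # The static threshold is reserved for SLA breaches.
        if value_ms > self.sla_ms:
            alerts.append("sla_breach")
        self.samples.append(value_ms)
        return alerts


if __name__ == "__main__":
    monitor = BaselineMonitor()
    for v in [200, 210, 190, 205, 195, 200, 215, 185]:
        monitor.observe(v)         # normal traffic builds the baseline
    print(monitor.observe(650))   # ['deviation']: under the SLA, but abnormal
    print(monitor.observe(2500))  # ['deviation', 'sla_breach']
```

The 650 ms sample is the "small problem" the step describes: a static 2000 ms threshold alone would stay silent, while the baseline flags it early. In this sketch every sample, including anomalous ones, joins the baseline, so a sustained level shift eventually becomes the new normal; whether that is desirable is a design choice each team has to make.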

The reality of the 6 points outlined above is that it takes some initial effort to make the required organizational and process changes and to get the right tools in place. However, the fact remains that the investment is well worth it for business-critical applications. I’ve seen so many groups decide they don’t have enough time to invest in strategic initiatives, then spend their days firefighting battles that could have been avoided in the first place. It’s a vicious cycle that needs to end. Consider the tips listed above and break the cycle starting right now.