Managing Software Reliability Metrics: How to Build SRE Dashboards That Drive Positive Business Outcomes

Customers expect your business application to perform consistently and reliably at all times—and for good reason. Many have built their own business systems based on the reliability of your application. This reliability target is your service level objective (SLO), the measurable characteristics of a service level agreement (SLA) between a service provider and its customer.

The SLO sets target values and expectations on how your service(s) will perform over time. It includes service level indicators (SLIs)—quantitative measures of key aspects of the level of service—which may include measurements of availability, frequency, response time, quality, throughput and so on.

If your application goes down for longer than the SLO dictates, fair warning: All hell may break loose, and you may experience frantic pages from customers trying to figure out what’s going on. Furthermore, a breach to your SLO error budget—the rate at which service level objectives can be missed—could have serious financial implications as defined in the SLA.

Why an Error Budget?

Developers are always eager to release new features and functionality. But these upgrades don’t always turn out as expected, and this can result in an SLO violation. With that being said, your SRE team should be able to do deployments and system upgrades as needed, but anytime you make changes to applications, you introduce the potential for instability.

An error budget states the numeric expectations of SLA availability. Without one, your customer may expect 100% reliability at all times. The benefit of an error budget is that it allows your product development and site reliability engineering (SRE) teams to strike a balance between innovation and reliability. If you frequently violate your SLO, the teams will need to decide whether it’s best to pull back on deployment and spend more time investigating the cause of the SLO breach.

For example, imagine that an SLO requires a service to successfully serve 99.999% of all queries per quarter. This means the service’s error budget has a failure rate of 0.001% for a given quarter. If a problem causes a 0.0002% failure rate, it will consume 20% of the service’s quarterly error budget.
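The arithmetic in this example can be sketched in a few lines of Python. The function names are illustrative only and are not part of any AppDynamics API.

```python
def error_budget_percent(slo_percent: float) -> float:
    """Return the allowed failure rate (in percent) implied by an SLO target."""
    return 100.0 - slo_percent


def budget_consumed(slo_percent: float, failure_percent: float) -> float:
    """Return the fraction of the error budget used by an observed failure rate."""
    return failure_percent / error_budget_percent(slo_percent)


# Figures from the example above: 99.999% SLO, 0.0002% observed failures.
share = budget_consumed(99.999, 0.0002)
print(f"{share:.0%} of the quarterly error budget consumed")
```

Expressing consumption as a fraction of the budget, rather than as a raw failure rate, makes it easier to compare services whose SLO targets differ.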

Don’t Aim for Perfection

Developing a workable SLO isn’t easy. You need to set realistic goals, as aiming for perfection (e.g., 100% availability) can prove very expensive and nearly impossible to achieve. Your SRE team, which is responsible for the daily operation of an application in production, must work with interested parties (e.g., product owners) to find the correct transactions to monitor for your SLO.

To begin, you must define your SLIs to determine healthy levels of service, and then use metrics that expose a negative user experience. Your engineering and application teams must decide which metric(s) to monitor, since they know the application best. A typical approach is to find a key metric that represents your SLO. For instance, Netflix uses its starts-per-second metric as an indicator of overall system health, because its baselining has led the company to expect X number of starts within any given timeframe.

Once you’ve found the right metrics, make them visible on a dashboard. Of course, not all metrics are useful. Some won’t need alerts or dashboard visibility, and you’ll want to avoid cluttering your dashboard with too many widgets. Treat this as an iterative process. Start with just a few metrics as you gain a better understanding of your system’s performance. You also can implement alerting—email, Slack, ticketing and so on—to encourage a quick response to outages and other problems.

People often ask, “What happens when SLOs aren’t met?”

Because an SLA establishes that service availability will meet certain thresholds over time, there may be serious consequences for your business—including the risk of harming your reputation and, of course, financial loss resulting from an SLO breach and a depletion of your error budget. Since the penalty for an SLA violation can be severe, your SRE team should be empowered to fix problems within the application stack. Depending on the team’s composition, it’s possible they’ll either release a fix to the feature code, make changes to the underlying platform architecture or, in a severe case, ask the feature team to halt all new development until your service returns to an acceptable level of stability as defined by the error budget.

How AppDynamics Helps You

AppDynamics enables you to track numerous metrics for your SLI.

But you may be wondering, “Which metrics should I use?”

AppD users are often excited—maybe even a bit overwhelmed—by all the data collected, and they assume everything is important. But your team shouldn’t constantly monitor every metric on a dashboard. While our core APM product provides many valuable metrics, AppDynamics includes many additional tools that deliver deep insights as well, including End User Monitoring (EUM), Business iQ and Browser Synthetic Monitoring.

 Let’s break down which AppDynamics components your SRE team should use to achieve faster MTTR:

  • APM: Say your application relies heavily on APIs and automation. Start with a few APIs you want to monitor and ask, “Which one of these APIs, if it fails, will impact my application or affect revenue?” These calls usually have a very demanding SLO.

  • End User Monitoring: EUM is the best way to truly understand the customer experience because it automatically captures key metrics, including end-user response time, network requests, crashes, errors, page load details and so on.

  • Business iQ: Monitoring your application is not just about reviewing performance data.  Biz iQ helps expose application performance from a business perspective, whether your app is generating revenue as forecasted or experiencing a high abandon rate due to degraded performance.

  • Browser Synthetic Monitoring: While EUM shows the full user experience, sometimes it’s hard to know if an issue is caused by the application or the user. Generating synthetic traffic will allow you to differentiate between the two.

So how does AppDynamics help monitor your error budget?

After determining the SLI, SLO and error budget for your application, you can display your error budget on a dashboard. First, convert your SLA to minutes. For example, a 99.99% SLO allows a 0.01% error budget, which is only about 52.6 minutes (0.88 hours) of downtime per year; a 99.9% SLO allows about 8.77 hours (526 minutes). You can create a custom metric to count the duration of SLO violations and display it in a graph. Of course, you’ll need to take maintenance and planned downtime into consideration as well.
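The conversion from an SLO percentage to a downtime allowance is simple enough to sketch. The snippet below is a minimal illustration, not AppDynamics functionality; it assumes a 365.25-day year, and the function names are hypothetical.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes, averaging in leap years


def allowed_downtime_minutes(slo_percent: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Downtime the error budget permits over the period, in minutes."""
    return period_minutes * (100.0 - slo_percent) / 100.0


def budget_remaining_minutes(slo_percent: float, violation_minutes: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Budget left after subtracting the measured duration of SLO violations."""
    return allowed_downtime_minutes(slo_percent, period_minutes) - violation_minutes


for target in (99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} minutes per year")
```

A dashboard widget could plot the remaining budget over time, with planned maintenance excluded from `violation_minutes` if the SLA exempts it.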

With AppDynamics you can use key metrics such as response time, HTTP error count, and timeout errors. Try to avoid using system metrics like CPU and memory because they tell you very little about the user experience. In addition, you can configure Slow Transaction Percentile to show which transactions are healthy.
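To make these user-facing metrics concrete, here is a small sketch that derives a 99th-percentile response time and an error rate from raw request samples. It assumes a nearest-rank percentile and treats HTTP 5xx responses as errors; both are choices made for this illustration, and an APM product may calculate them differently.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def error_rate(status_codes):
    """Fraction of requests that returned a server error (assumed here to mean 5xx)."""
    if not status_codes:
        raise ValueError("no samples")
    errors = sum(1 for code in status_codes if code >= 500)
    return errors / len(status_codes)


response_times_ms = [120, 95, 410, 130, 101, 98, 2200, 115, 109, 99]
print("p99 response time (ms):", percentile(response_times_ms, 99))
print("error rate:", error_rate([200, 200, 503, 200]))
```

A high percentile exposes the slow tail that an average would hide, which is why it tends to track user experience better than CPU or memory figures.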

Availability is another great metric to measure, but keep in mind that even if your application availability is 100%, that doesn’t mean it’s healthy. It’s best to start building your dashboard in the pre-prod environment, as you’ll need time to tweak thresholds and determine which metric to use with each business transaction. The sooner AppDynamics is introduced to your application SDLC, the more time your developers and engineers will have to get acclimated to it.

 What does the ideal SRE dashboard look like? Make sure it has these KPIs:

  • SLO violation duration graph, response time (99th percentile) and load for your critical API calls

  • Error rate

  • Database response time

  • End-user response time (99th percentile)

  • Requests per minute

  • Availability

  • Session duration

Providing Value to Customers with Software Reliability Metric Monitoring

SLI, SLO, SLA and error budget aren’t just fancy terms. They’re critical to determining if your system is reliable, available or even useful to your users. You should be able to measure these metrics and tie them to your business objectives, as the ultimate goal of your application is to provide value to your customers.

Learn how AppDynamics can help measure your business success.

How to Ensure Your Applications Meet Business Goals

It makes sense that every application you use should provide tangible business value. But IT has long been stymied in proving this is true.

Traditional monitoring tools let you know how your systems were performing but offered little insight into whether a delayed response from a database—or any other performance issue—was having a cascading effect on business goals. Conversely, web analytics tools revealed how website interactions affected revenue but failed to connect adverse user behavior—like a spike in abandoned shopping carts—with a root cause.

For years, this lack of visibility has been accepted as the status quo. No longer. CIOs are increasingly taking action to capture business value. In this blog post, I’ll explain how you too can monitor your applications to ensure they are contributing to business goals as well as meeting more traditional IT performance benchmarks.

To begin, you’ll want to put together a cross-functional team that includes someone familiar with the application or applications in question from a technical perspective, someone from operations who is familiar with production support of the application, and someone who has a high-level understanding of the value the application provides to the business. This team will be responsible for defining the value of the application in measurable terms and identifying the relevant use case(s).

Defining Business Value

Applications that have a significant impact on business results usually fall into three buckets. The first bucket is obvious: Externally facing applications that bring in revenue like eCommerce or Bill Pay immediately affect the bottom line if they slow down or become unavailable. What needs to be determined by the team is how many dollars are at stake under different scenarios. In the second bucket are externally facing applications that mediate customer interactions and are generally not directly linked to revenue. However, these apps, which allow a customer to check an account balance, look up product information, or request help from a service rep, can be closely tied to other important metrics like new signups, adoption of new products and services, and customer churn. Performance failures affect those metrics, which in turn affects revenue. Over time, the poor performance of these apps will erode the value of a brand. The third bucket includes internally facing applications that deliver core business functionality like underwriting, policy quoting, and order fulfillment. The apps in this bucket drive revenue, but it is more common for them to affect metrics related to employee productivity. If these applications go down, employees are not able to get work done.

Other apps may not have a noticeable impact on the outward-facing business if they go down, but their effect on productivity and employee morale makes monitoring necessary. These include internally facing tertiary applications that handle human resource functions like onboarding, employee wellness, or expense management.

Establishing a Use Case

Once the business value of an application is determined, the next step is to understand your use case. Common use cases include business health monitoring, user journey monitoring, business journey monitoring, customer segment monitoring, and release validation. I’ll explain more about each use case below. While it is typically easier to brainstorm your own use case after learning what others have done, I expect many companies will end up creating their own unique use cases or blending common use cases together.

Business Health Monitoring: This applies to business owners who want to understand the impact of application performance on key business drivers. An example is an e-commerce site managing sales for different brands. The business owner is interested in conversion rates, the number of orders processed, total sales, and the percentage of customers moving into a loyalty program. He or she uses business health monitoring to determine the root cause behind a change in a KPI. For example, a decline in sales may be caused by an application error or a business problem.

User Journey Monitoring: User journeys are the pathways that users take through an application from start to finish. Applications that benefit from user journey monitoring are those in which the business has a vested interest in the user finishing a journey. These can be as simple as a conversion funnel for a user attempting to buy a book or a complex combination of searching, quoting, and purchasing an insurance policy. We want to understand where our users are falling out (or choosing not to continue) during their journeys so we can easily determine if their behavior is caused by application issues or something external to the application.
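One way to see where users fall out of a journey is to compute the drop-off rate between consecutive steps. The sketch below uses made-up step names and counts for an insurance journey; it illustrates the calculation only and is not how any particular product implements funnels.

```python
def funnel_dropoff(step_counts):
    """Given (step_name, users_reaching_step) pairs in journey order,
    return (from_step, to_step, drop_rate) for each transition."""
    drops = []
    for (prev_name, prev_n), (name, n) in zip(step_counts, step_counts[1:]):
        rate = (prev_n - n) / prev_n if prev_n else 0.0
        drops.append((prev_name, name, rate))
    return drops


# Hypothetical counts for a search -> quote -> purchase journey.
journey = [("search", 1000), ("quote", 400), ("purchase", 100)]
for src, dst, rate in funnel_dropoff(journey):
    print(f"{src} -> {dst}: {rate:.0%} of users dropped out")
```

A sudden rise in the drop rate for one transition, lined up against errors or slow response times on that step, helps separate application issues from external causes.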

Business Journey Monitoring:  Rather than analyzing performance through the lens of a user, business journey monitoring is a way of evaluating the success of a holistic business process. The challenge is that a business process is likely to span applications, services, and events and involve multistep workflows. Looking at whether a complex process is meeting business objectives typically involves breaking it up into milestones and events that comprise those milestones. For a loan application, these would include submission, documents verification, credit approval, insurance underwriting, and final approval.

Customer Segment Monitoring: Identifying the most valuable users of an application and protecting them from performance problems is one way of ensuring better business outcomes. Take the case of an insurance company that manages a portal for financial professionals who want to purchase annuities on behalf of their clients. The professionals with the largest books of business are likely to generate the most revenue for the insurance company. By segmenting customers into tiers based on their number of clients, the insurance company can create health rules around their most valuable users and prioritize their issues.
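The tiering idea can be sketched as follows. The tier names and client-count cutoffs are hypothetical values chosen for illustration; a real deployment would derive them from its own book-of-business data.

```python
# Hypothetical tier ordering: lower number means higher priority.
TIER_PRIORITY = {"gold": 0, "silver": 1, "bronze": 2}


def tier_for(client_count: int, gold_min: int = 100, silver_min: int = 25) -> str:
    """Assign a professional to a tier based on number of clients (assumed cutoffs)."""
    if client_count >= gold_min:
        return "gold"
    if client_count >= silver_min:
        return "silver"
    return "bronze"


def prioritize(open_issues):
    """Sort (user_id, client_count) issues so the most valuable users come first."""
    return sorted(
        open_issues,
        key=lambda issue: (TIER_PRIORITY[tier_for(issue[1])], -issue[1]),
    )


issues = [("advisor-a", 10), ("advisor-b", 300), ("advisor-c", 40)]
print(prioritize(issues))
```

The same tier label could be attached to each transaction as a custom field so that health rules and alerts can be scoped to the top tier.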

Release Validation: Whether you are updating an existing application in your data center or migrating to the cloud, you need to be able to compare the before and after state of the application in reference to business KPIs. If you operate a hotel chain, your KPIs will likely include total revenue, average daily rate, and number of bookings. If KPIs fall after a new release or migration, you will want to determine the root cause and prioritize the resolution based on its impact on revenue. You may also want to measure how well a new release improved those KPIs to help validate the decision to invest the time and resources to improve that application.
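A before-and-after KPI comparison can be as simple as the sketch below. The KPI names follow the hotel example, while the figures and the 5% tolerance are invented for illustration.

```python
def kpi_changes(before, after):
    """Percent change for each KPI between the pre-release and post-release windows."""
    return {
        name: (after[name] - before[name]) * 100 / before[name]
        for name in before
    }


def regressions(before, after, tolerance_pct=5.0):
    """KPIs that fell by more than the tolerance (an assumed threshold)."""
    changes = kpi_changes(before, after)
    return sorted(name for name, change in changes.items() if change < -tolerance_pct)


# Hypothetical daily figures for a hotel chain.
before = {"total_revenue": 1000.0, "average_daily_rate": 200.0, "bookings": 50}
after = {"total_revenue": 900.0, "average_daily_rate": 202.0, "bookings": 49}
print(kpi_changes(before, after))
print("Investigate:", regressions(before, after))
```

Comparing equivalent time windows (for example, the same weekdays before and after the release) keeps normal seasonality from being mistaken for a regression.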

Executing a Use Case

After you have determined the use case that is appropriate to your business, you will need to identify key metrics. Metrics can be as simple as the dollar value in a cart for an eCommerce application or the size or type of an internal transaction for an underwriting application. As you identify your metrics, you need to decide how you want to tell the story. It’s typically best to start with a whiteboard session where you outline what you want your dashboard(s) to look like.

This helps ensure that a layperson will be able to quickly and easily understand if the business is healthy or not.

The final step is to define your data collectors and build your dashboard. You will also want to create alerts if trends begin to deviate from established baselines. You are now ready to let the rubber of business value, use cases, and key metrics meet the road of real-time data. As you learn how application performance is affecting your business, you can now act on those results!

Learn more about how AppDynamics helps you drive business performance through application performance with Business iQ!

SAP Hybris: Achieving Omni-channel with Application Performance

The retail industry is transforming in today’s digital age. Consumer behavior keeps evolving and online spending is skyrocketing. In addition, retail customers are looking for a blended online and in-store experience and expecting brick-and-mortar stores and online channels to be integrated through an omni-channel strategy. This requires retailers to become customer obsessed and strive to deliver personalized, convenient and seamless shopping experiences online and in the stores.

SAP Hybris omni-channel commerce is well positioned to help these customer-focused retailers in their journey of digital transformation by offering a single system for managing product content, commerce operations and channels. This helps retailers, manufacturers and others create a unified and seamless cross-channel experience for their customers, from online to in-store to mobile and beyond.

Performance monitoring of SAP Hybris E-Commerce environment is key to delivering exceptional customer experience  

Today, e-commerce, mobile applications, and an integrated omni-channel strategy are key to success for retailers. However, poorly performing e-commerce and mobile applications that take over 3 seconds to respond are fatal to a retailer’s reputation, brand and revenue. There are no second chances in this digital world: when you’re not available, someone else is, and your competitors are just a click away. This means that ensuring flawless performance and optimizing customer experience is critical for retail success.

Although the SAP Hybris E-Commerce solution is very well positioned to help retailers by offering a single system for managing product content, commerce operations and channels, a sophisticated end-to-end monitoring solution for SAP Hybris becomes necessary to quickly isolate and resolve performance issues in order to ensure exceptional end-user experience.

Retailers deploying SAP Hybris based E-Commerce and mobile apps face many challenges as they try to effectively manage the end-to-end customer experience, including:

  • Avoiding application performance problems impacting the consumer. Technical glitches, especially during peak periods (e.g., Black Friday and Cyber Monday), impact revenue immediately, as many consumers would be deterred from using a retailer again after a negative experience.

  • Promoting agility in software management processes. To stay ahead of the competition, retailers need to move to an agile operating model. Combined with the right management solutions, this allows for a fast Mean Time To Resolution (MTTR) of application performance issues and enables teams to work together when developing or enhancing application offerings.

  • Securing 5-star rated mobile apps. The retail application landscape is as competitive as the industry as a whole. The number of apps in use is growing by the day, meaning highly responsive, convenient and usable apps are a must to secure 5-star app ratings.

  • Correlating application and customer experience data. Applications are the primary channel for customer engagement. Unfortunately, without substantial investment into building a custom analytics solution, retailers can’t get actionable insights.

SAP Hybris: Open and extensible Java based architecture

The execution environment for the Hybris platform is a Java EE Servlet Container, for example Tomcat or VMware vFabric Server, which is also based on Tomcat but provides commercial support.

The platform and all extensions to it run within the Spring environment, which allows easy configuration of each component and provides generic logic such as security, caching, clustering and persistence.  

 


 

Figure 1: Architectural overview of the hybris Commerce Suite

An extension may simply provide additional business logic without exposing a visible UI, or it may also contain a Frontend Java Web Application. A natural framework choice for building the frontend web application in the hybris Spring environment is the Spring MVC Framework, but any Java Web Framework such as JSF or Struts may be used.

AppDynamics Application Intelligence Platform ensures flawless SAP Hybris Commerce performance

As mentioned above, the execution environment of the Hybris platform is a Java EE Servlet Container, and most of the extensions and front-end applications are developed using Java or a Java framework.

AppDynamics Application Performance Management (APM), a module of AppDynamics Application Intelligence Platform, is one of the leading Java APM solutions in the industry. With AppDynamics APM, retailers get complete visibility out of the box into even the most complex Java-powered, Hybris-based retail and omni-channel commerce applications. With its APM, end-user monitoring, infrastructure visibility and application analytics modules, AppDynamics Application Intelligence Platform integrates monitoring, troubleshooting, and analytics capabilities to provide actionable IT operational and business insights into Hybris-based application performance, user experience, and business outcomes, all in real time and all in production.


Figure 2: AppDynamics Flowmap of Hybris based retail commerce application

AppDynamics delivers a comprehensive solution to help retailers maximize their business performance. The platform embraces three key principles:

  • See faster with Unified Monitoring: Identify customer-impacting issues quickly with end-to-end business transaction monitoring.

  • Act sooner with Unified Troubleshooting: Minimize business impact with rapid problem resolution.

  • Know more with Unified Analytics: Correlate application performance to business impact.

All of this happens in real time, in production, giving retailers more visibility, understanding, and control across applications, infrastructure, and user experience. The platform offers the added flexibility of SaaS or on-premises deployment, in order to match and flex with business requirements and data ownership.

Key AppDynamics Features  

  • Automatically visualize and map Java based Hybris solution dependencies

  • Monitor JVM health and performance

  • Automatically baseline performance to alert and address emerging issues in context of Business Transactions

  • Quickly isolate and resolve production Java application performance issues at code-level depth with minimal overhead

  • Enhance Dev & Ops collaboration with role-based views and Virtual War Room

  • End to end visibility into application environment with End-user Monitoring, APM and Infrastructure visibility modules

  • Actionable insights into application performance, user experience, and business outcomes

Enterprises of every description are pursuing digital transformation to satisfy user expectations for always-on, always effective engagement, and to realize the competitive efficiencies and advantages of digital delivery. Digital transformation is not a choice for retailers; it’s a business imperative. This means that ensuring flawless performance and optimizing customer experience is critical to retail success.

AppDynamics Application Intelligence helps retailers, including those leveraging Hybris to power their applications, take their digital strategies from good to great by ensuring mobile and eCommerce performance, allowing business, dev and ops teams to collaborate easily, and automatically correlating technical performance with business outcomes. Take a walk through the platform and see how your enterprise can leverage application performance to get the most out of its applications.

Top 10 Application Performance Problems

A transaction is defined as one or multiple threads running in or across server boundaries on multiple runtime environments. With today’s rapid enterprise growth, the demand for high-performance transactions through software-run applications is higher than ever before, which means a greater need for these threads to run on their respective platforms efficiently and without delay.

Principal Sales Engineer, Hugh Brien, shares some of his findings on the common application problems discovered in transaction threads and the core concepts to think about while investigating issues to ultimately optimize the end user’s experience. Some of the main findings include server configuration in thread pools, identifying a correlation between load and response times to find request overloads, I/O bottlenecks, and improper memory configurations for transaction volumes. Hugh also walks us through strategies and measures specific to Java, .NET and more to further identify potential problem areas, and how APM exposes them beforehand.

Brien also spoke on what it takes to start constructing a defined process to understand a customer’s pain points, and further clarify the best practices on how to best utilize application intelligence and avoid future troubleshooting.

Browse through the deck today!


Speed Kills: Every (milli)second counts [INFOGRAPHIC]

Application performance tends to be one of those things everyone knows is important, but it’s hard to put a specific number to just how important.

Earlier this year, Gartner published findings (subscription required) from Google, Amazon, Yahoo, Microsoft, and many others showing that as performance, most notably speed, degrades, revenue suffers as well. That is not inherently surprising; what is surprising is by how much. These companies would lose millions of dollars in revenue for every fraction of a second their application slowed down. Running at anything but optimal speed was taking a chunk out of their bottom line.

We took some of these stats and created an infographic to show how important every millisecond is to your application performance and your bottom line.


Don’t lose millions off your bottom line. Try AppDynamics for FREE today!

Insights from an Investment Banking Monitoring Architect

To put it very simply, Financial Services companies have a unique set of challenges that they have to deal with every day. They are a high-priority target for hackers, they are highly regulated by federal and state governments, they deal with and employ some of the most demanding people on the planet, and problems with their applications can have an impact on every other industry across the globe. I know this from firsthand experience; I was an Architect at a major investment bank for over 5 years.

In this blog post I’m going to show you what’s really important when Financial Services companies consider application monitoring solutions and warn you about the hidden realities that only expose themselves after you’ve installed a large enough monitoring footprint.

1 – Product Architecture Plays a Major Role in Long Term Success or Failure

Every monitoring tool has a different core architecture. On the surface these architectures may look similar but it is imperative to dive deeper into the details of how all monitoring products work. We’ll use two real product architectures as examples.

“APM Solution A” is an agent-based solution. This means that a piece of vendor code is deployed to gather monitoring information from your running applications. This agent is intelligent and knows exactly what to monitor, how to monitor, and when to dial itself back to do no harm. The agent sends data back to a central collector (called a controller) where this data is correlated, analyzed, and categorized automatically to provide actionable intelligence to the user. With this architecture the agent and the controller are very loosely coupled, which lends itself well to the highly distributed, virtualized environments you see in modern application architectures.

“APM Solution B” is also agent-based. It has a three-tiered architecture consisting of agents, collectors, and servers. On the surface this architecture seems reasonable, but when we look at the details a different story emerges. The agent is not intelligent, so it does not know how to instrument an application. This means that every time an application is restarted, the agent must send all of the methods to the collector so that the collector can tell the agent how and what to instrument. This places a large load on the network, delays application startup time, and adds to the amount of hardware required to run your monitoring tool. After the collector has told the agent what to monitor, the collector’s job is to gather the monitoring data from the agent and pass it back to the server, where it is stored and viewed. For a single application this architecture may seem acceptable, but you must consider the implications across a larger deployment.

Choosing a solution with the wrong product architecture will impact your ability to monitor and manage your applications in production. Production monitoring is a requirement for rapid identification, isolation and repair of problems.

2 – Monitoring Philosophy

Monitoring isn’t as straightforward as collecting, storing, and showing data. You could use that approach, but it would not provide much value. When looking at monitoring tools it’s really important to understand the impact of monitoring philosophy on your overall project and goals. When I was looking at monitoring tools I needed to be able to solve problems fast, and I didn’t want to spend all of my time managing the monitoring tools. Let’s use examples to illustrate again.

“APM Solution A” monitors every business transaction flowing through whatever application it is monitoring. Whenever any business transaction has a problem (slow or error) it automatically collects all of the data (deep diagnostics) you need to figure out what caused the problem. This, combined with periodic deep diagnostic sessions at regular intervals, allows you to solve problems while keeping network, storage, and CPU overhead low. It also keeps clutter down (as compared to collecting everything all the time) so that you solve problems as fast as possible.

“APM Solution B” also monitors every transaction for each monitored application but collects deep diagnostic data for all transactions all the time. This monitoring philosophy greatly increases network, storage, and CPU overhead while providing massive amounts of data to work with regardless of whether or not there are application problems.

When I was actively using monitoring tools in the Investment Bank I never looked at deep diagnostic data unless I was working on resolving a problem.

3 – Analytics Approach

Analytics comes in many shapes and sizes these days. Regardless of the business or technical application, analytics does what humans could never do. It creates actionable intelligence from massive amounts of data and allows us to solve problems much faster than ever before. Part of my process for evaluating monitoring solutions has always been determining just how much extra help each tool would provide in identifying and isolating (root cause) application problems using analytics. Back to my example…

“APM Solution A” is an analytics product at its core. Every business transaction is analyzed to create a picture of “normal” response time (a baseline). When new business transactions deviate from this baseline they are automatically classified as either slow or very slow, and deep diagnostic information is collected, stored, and analyzed to help identify and isolate the root cause. Static thresholds can be set for alerting, but by default alerts are based upon deviation from normal so that you can proactively identify service degradation instead of waiting for small problems to have a major business impact.
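To illustrate the idea of classifying transactions by deviation from a baseline, here is a minimal sketch. It assumes the baseline is the mean of recent response times and that “slow” and “very slow” mean more than 3 and 4 standard deviations above it; those cutoffs and the function name are assumptions for this example, not a description of any vendor’s actual algorithm.

```python
from statistics import mean, stdev


def classify(response_ms, baseline_samples, slow_sigmas=3.0, very_slow_sigmas=4.0):
    """Label a response time relative to a baseline built from recent samples.

    The sigma cutoffs are illustrative assumptions, not product defaults.
    """
    avg = mean(baseline_samples)
    spread = stdev(baseline_samples)  # sample standard deviation; needs 2+ samples
    if response_ms > avg + very_slow_sigmas * spread:
        return "very slow"
    if response_ms > avg + slow_sigmas * spread:
        return "slow"
    return "normal"


baseline = [100, 110, 90, 105, 95]  # recent response times in milliseconds
for observed in (120, 125, 140):
    print(observed, "->", classify(observed, baseline))
```

Because the cutoffs move with the observed baseline, the same code adapts to a transaction that normally takes 100 ms and one that normally takes 2 seconds, which a single static threshold cannot do.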

“APM Solution B” only provides baselines for the business transactions you have specified. You have to manually configure the business transactions for each application. Again, on a small scale this methodology is usable, but it quickly becomes a problem when managing the configuration of tens, hundreds, or thousands of applications that keep changing as development continues. Searching through a large set of data for a problem is much slower without the assistance of analytics.
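
The baseline idea described for “APM Solution A” can be sketched in a few lines. This is only an illustration: it assumes the baseline is a simple mean of recent response times and that "slow" and "very slow" mean more than 3 and 4 standard deviations above it. Those cutoffs and the `classify` function are my assumptions, not the product's actual algorithm.

```python
from statistics import mean, stdev


def classify(response_ms, history, slow_sigmas=3.0, very_slow_sigmas=4.0):
    """Classify one response time against a baseline built from history.

    history must hold at least two samples so a standard deviation exists.
    The sigma cutoffs are illustrative assumptions.
    """
    baseline = mean(history)
    spread = stdev(history)
    deviation = response_ms - baseline
    if deviation > very_slow_sigmas * spread:
        return "very slow"
    if deviation > slow_sigmas * spread:
        return "slow"
    return "normal"


# Recent response times (ms) for one business transaction.
history = [100, 110, 90, 105, 95, 100, 102, 98]

for sample in (104, 120, 150):
    print(sample, "ms ->", classify(sample, history))
```

Because the cutoffs move with the data, nobody has to maintain a static threshold for each transaction, which is what makes this approach workable across many applications.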

4 – Vendor Focus

When you purchase software from a vendor you are also committing to working with that vendor. I always evaluated how responsive every vendor was during the pre-sales phase but it was hard to get a good measure of what the relationship would be like after the sale. No matter how good the developers are, there are going to be issues with software products. What matters the most is the response you get from the vendor after you have made the purchase.

5 – Ease of Use

This might seem obvious, but ease of use is a major factor in whether software delivers a solid return on investment or becomes shelfware. Modern APM software is powerful AND easy to use at the same time. One of the worst mistakes I made as an Architect was not paying enough attention to ease of use during product evaluation and selection. If only a few people in a company are capable of using a product, it will never reach its full potential, and that is exactly what happened with one of the products I selected. Two weeks after training a team on product usage, almost nobody remembered how to use it. That is a major issue with legacy products.

Enterprise software is undergoing a major disruption. If you already have monitoring tools in place, now is the right time to explore the marketplace and see how your environment can benefit from modern tools. If you don’t have any APM software in place yet, you need to catch up to your competition, since most of them are already evaluating or have already implemented APM for their critical applications. Either way, you can get started today with a free trial of AppDynamics.

AppDynamics goes to OSCON

The AppDynamics team is in Portland, Oregon this week to showcase AppDynamics at the O’Reilly Open Source Convention. This event is a great opportunity to meet with the community and connect with thought leaders in the open source world. Stop by our booth for office hours if you’re in the area this week!

If you are attending OSCON, join my birds of a feather sessions to talk about scaling PHP in the real world and quantifying the value of DevOps.

Find out more about AppDynamics and get started with a free 15 day trial.

Intelligent Alerting for Complex Applications – PagerDuty & AppDynamics

Today AppDynamics announced integration with PagerDuty, a SaaS-based provider of IT alerting and incident management software that is changing the way IT teams are notified, and how they manage incidents in their mission-critical applications.  By combining AppDynamics’ granular visibility of applications with PagerDuty’s reliable alerting capabilities, customers can make sure the right people are proactively notified when business impact occurs, so IT teams can get their apps back up and running as quickly as possible.

You’ll need PagerDuty and AppDynamics licenses to get started. If you don’t already have them, you can sign up for free trials of PagerDuty and AppDynamics online.  Once you complete the simple installation, you’ll start receiving incidents in PagerDuty created by AppDynamics out-of-the-box policies.

Once an incident is filed it will have the following list view:

[Screenshot: PagerDuty incident list view]

When the ‘Details’ link is clicked, you’ll see the details for this particular incident including the Incident Log:

[Screenshot: incident details, including the Incident Log]

If you are interested in learning more about the event itself, simply click ‘View message’ and all of the AppDynamics event details are displayed, showing which policy was breached, the violation value, the severity, and so on:

[Screenshot: AppDynamics event details shown under ‘View message’]

Let’s walk through some examples of how our customers are using this integration today.

Say Goodbye to Irrelevant Notifications

Is your work email address on some group alias, and do you get several, maybe even dozens, of notifications a day that aren’t particularly relevant to your responsibilities or are intended for other people on your team?  I know I do.  Imagine a world where your team only receives notifications that relate to each person’s role, and where those notifications only go to people who are actually on call.  With AppDynamics & PagerDuty you can now build in alerting logic that routes specific alerts to specific teams and only sends messages to the people who are actually on call.  App response time way above the normal value?  Send an alert to the app support engineer who is on call, not to all of their colleagues.  Not having to sift through a bunch of irrelevant alerts means that when one does come through you can be sure it requires YOUR attention right away.
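
Here is a rough sketch of the routing idea, contrasting a group alias with on-call routing. The team names, people, and both functions are hypothetical; in practice the schedules live in the alerting tool, and this is not the PagerDuty or AppDynamics API.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    team: str      # team responsible for the affected component
    summary: str


# Hypothetical teams and on-call assignments, invented for this example.
TEAMS = {
    "app-support": {"members": ["alice", "carol", "dan"], "on_call": "carol"},
    "infrastructure": {"members": ["bob", "erin"], "on_call": "bob"},
}


def group_alias_recipients(alert, teams=TEAMS):
    """Group-alias approach: everyone on every team gets every alert."""
    return sorted(m for team in teams.values() for m in team["members"])


def on_call_recipients(alert, teams=TEAMS):
    """Routed approach: only the on-call person for the alert's team."""
    team = teams.get(alert.team)
    return [team["on_call"]] if team else []


alert = Alert(team="app-support", summary="Response time far above normal")
print("Group alias:", group_alias_recipients(alert))
print("On-call routing:", on_call_recipients(alert))
```

With routing, one person is interrupted instead of five, and that person is the one expected to act.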

Automatic Escalations

If you are only sending a notification and assigning an incident to one person, what happens if that person is out of the office or doesn’t have access to the internet / phone to respond to the alert?  Well, the good thing about the power of PagerDuty is that you can build in automatic escalations.  So, if you have a trigger in AppDynamics to fire off a PagerDuty alert when a node is down, and the infrastructure manager isn’t available, you can automatically escalate and re-assign / alert a backup employee or admin.
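
As a sketch of how a time-based escalation chain behaves, assume each level gets a fixed number of minutes to acknowledge before the incident moves on. The responder names, timeouts, and `current_responder` function are illustrative assumptions, not PagerDuty's configuration format.

```python
def current_responder(policy, minutes_unacknowledged):
    """Return who is being notified after the given unacknowledged time.

    policy is an ordered list of (responder, minutes_before_escalating).
    Once every level has timed out, the last level stays responsible.
    """
    elapsed = 0
    for responder, timeout in policy:
        elapsed += timeout
        if minutes_unacknowledged < elapsed:
            return responder
    return policy[-1][0]


# Hypothetical escalation chain for a "node down" alert.
policy = [
    ("infrastructure-manager", 15),
    ("backup-engineer", 15),
    ("admin", 30),
]

for minutes in (5, 20, 45):
    print(minutes, "min unacknowledged ->", current_responder(policy, minutes))
```

If the infrastructure manager never acknowledges the alert, it moves to the backup engineer after 15 minutes and to the admin after 30.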

The Sky is Falling!  Oh Wait – We’re Just Conducting Maintenance…

Another potentially annoying situation for IT teams is the flood of alerts that get fired off during a maintenance window.  PagerDuty has the concept of a maintenance window so your team doesn’t get a bunch of doomsday messages during maintenance.  You can even set up a maintenance window with one click if you prefer to go that route.

Either way, no new incidents will be created during this time period… meaning your team will be spared having to open, read, and file the alerts and update / close out the newly-created incidents in the system.
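
The suppression logic amounts to a time-range check before an incident is created. The sketch below assumes windows are simple (start, end) pairs; the function names and sample times are made up for illustration and do not reflect PagerDuty's implementation.

```python
from datetime import datetime


def in_maintenance(event_time, windows):
    """True if event_time falls inside any (start, end) window; end is exclusive."""
    return any(start <= event_time < end for start, end in windows)


def create_incident(event_time, summary, windows):
    """Return an incident dict, or None when the event is suppressed."""
    if in_maintenance(event_time, windows):
        return None
    return {"summary": summary, "created_at": event_time}


# Hypothetical two-hour maintenance window.
windows = [(datetime(2013, 4, 20, 1, 0), datetime(2013, 4, 20, 3, 0))]

during = create_incident(datetime(2013, 4, 20, 2, 15), "Node down", windows)
after = create_incident(datetime(2013, 4, 20, 3, 30), "Node down", windows)

print("During maintenance:", during)
print("After maintenance:", after)
```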

We’re confident this integration of the leading application performance management solution with the leading IT incident management solution will save your team time and make them more productive.  Check out the AppDynamics and PagerDuty integration today!

Introducing AppDynamics for PHP

It’s been about 12 years since I last scripted in PHP. I pretty much paid my way through college building PHP websites for small companies that wanted a web presence. Back then PHP was the perfect choice, because nearly all the internet service providers had PHP support for free if you registered domain names with them. Java and .NET weren’t options for a poor smelly student like me, so I just wrote standard HTML with embedded scriptlets of PHP code and, bingo, I had dynamic web pages.

Today, 244 million websites run on PHP, which is almost 75% of the web. That’s a pretty scary statistic. If only I’d kept coding PHP back when I was 21, I’d be a billionaire by now! PHP is a pretty good example of how open-source technology can go viral and infect millions of developers and organizations worldwide.

Turnkey APMaaS by AppDynamics

Since we launched our Managed Service Provider program late last year, we’ve signed up many MSPs that were interested in adding Application Performance Management-as-a-Service (APMaaS) to their service catalogs.  Wouldn’t you be excited to add a service that’s easy to manage but more importantly easy to sell to your existing customer base?

Service providers like Scicom definitely were (check out the case study), because they are being held responsible for the performance of their customers’ complex, distributed applications but often don’t have visibility inside the actual application.  That’s like being asked to officiate an NFL game with your eyes closed.

The sad truth is that many MSPs still think that high visibility in app environments equates to high configuration, high cost, and high overhead.

Thankfully this is 2013.  People send emails instead of snail mail, play Call of Duty instead of Pac-Man, listen to Pandora instead of cassettes, and can have high visibility in app environments with low configuration, low cost, and low overhead with AppDynamics.

Not only do we have a great APM service to help MSPs increase their Monthly Recurring Revenue (MRR), we make it extremely easy for them to deploy this service in their own environments, which, to be candid, is half the battle.  MSPs can’t spend countless hours deploying a new service.  It takes focus and attention away from their core business, which in turn could endanger the SLAs they have with their customers.  Plus, it’s just really annoying.

Introducing: APMaaS in a Box

Here at AppDynamics, we take pride in delivering value quickly.  Most of our customers go from nothing to full-fledged production performance monitoring across their entire environment in a matter of hours in both on-premise and SaaS deployments.  MSPs are now leveraging that same rapid SaaS deployment model in their own environments with something that we like to call ‘APMaaS in a Box’.

At a high level, APMaaS in a Box is a large cardboard box with air holes and a fragile sticker wherein we pack a support engineer, a few management servers, an instruction manual, and a return label…just kidding…sorry, couldn’t resist.

Simply put, APMaaS in a Box is a set of files and scripts that allows MSPs to provision multi-tenant controllers in their own data center or private cloud and provision AppDynamics licenses for customers themselves…basically it’s the ultimate turnkey APMaaS.

By utilizing AppDynamics’ APMaaS in a Box, MSPs across the world are leveraging our quick deployment, self-service license provisioning, and flexibility in the way we do business to differentiate themselves and gain net new revenue.

Quick Deployment

Within 6 hours, MSPs like NTT Europe who use our APMaaS in a Box capabilities have all the pieces they need in place to start monitoring the performance of their customers’ apps.  Now that’s some rapid time to value!

Self-Service License Provisioning

MSPs can provision licenses directly through the AppDynamics partner portal.  This gives you complete control over who gets licenses and makes it very easy to manage this process across your customer base.

Flexibility

An MSP can get started on a month-to-month basis with no commitment.  Paying only for what you sell eliminates the cost of shelfware.  MSPs can also position and sell AppDynamics however they like and can float licenses across customers.  NTT Europe uses a 3-tier service offering so customers can pick and choose the APM services they’d like to pay for.  Feel free to get creative when packaging this service for customers!

Conclusion

As more and more MSPs move up the stack from infrastructure management to monitoring the performance of their customers’ distributed applications, choosing an APM partner that understands the Managed Services business is of utmost importance.  AppDynamics’ APMaaS in a Box capabilities align well with internal MSP infrastructures, and our pricing model aligns with the business needs of Managed Service Providers.  We’re a perfect fit.

MSPs who continue to evolve their service offerings to keep pace with customer demands will be well positioned to reap the benefits and future revenue that come along with staying ahead of the market.  To paraphrase The Great One, MSPs need to “skate where the puck is going to be, not where it has been.”  I encourage all you MSPs out there to contact us today to see how we can help you skate ahead of the curve and take advantage of the growing APM market with our easy-to-use, easy-to-deploy APMaaS in a Box.  If you don’t, your competition will…