How to Identify Impactful Business Transactions in AppDynamics

New users of APM software often believe their company has hundreds of critical business transactions that must be monitored. But that’s not the case. In my role as Professional Services Consultant (EMEA) at AppDynamics, I’ve worked at dozens of customer sites, and the question of “What to monitor?” is always foremost in new users’ minds.

AppDynamics’ Business Transactions (BTs) reflect the core value of your applications. Since our inception a decade ago, we’ve built our APM solution around this concept. Given the critical importance of Business Transactions, you’ll want to configure them the right way. While AppDynamics will automatically create BTs for you, you’ll benefit greatly by taking a few extra steps to optimize your monitoring environment.

APM users often think of a BT as a technical transaction in their system, but it’s much more than that. The BT is a key component of effective application monitoring. It consists of all the required services within your environment—things like login, search and checkout—that are used to fulfill and respond to a user-initiated request. These transactions reflect the logical way users interact with your applications. Activities such as adding an item to a shopping cart or checking out will summon various applications, databases, third-party APIs and web services.

If you’re new to APM, you may find yourself asking “Where should I begin?” By applying essential best practices, BT configuration can be a smooth and orderly process.

Start by asking yourself two key questions:

  1. What are my business goals for monitoring?
  2. What pain points am I trying to address by using APM?

You may already know the answers. Perhaps you want to resolve major problems that consume a lot of your time and resources, or ensure that your most critical business operations are performing optimally. From there, you can drill down to more specific goals and operations to focus on. A retail website, for instance, may choose to focus on its checkout or catalog operation. Financial services firms may focus on the most-used APIs provided for their mobile clients. By prioritizing your business goals early in the process, you’ll find BTs much easier to configure.

AppDynamics automatically discovers and maps Business Transactions for you. Actions like Add to Cart are tagged and traced across every component of your application and visualized on a topology map, helping you to better understand performance across an entire application.

It’s tempting to think configuration is complete once you’ve instrumented your application with an agent and started seeing traffic come in. But that’s just the technical side of things. You’ll also need to align with the business, asking questions like, “Do we have SLAs on this?” and “What’s the performance requirement?” And you’ll need to establish health rules and work with the business to determine, for instance, what action to take if a particular rule is violated.

Choose Your BTs Wisely

At a high level, a Business Transaction is more like a use case, even though users often think of it as a technical transaction. Sometimes I must remind users: “No, this activity you want to monitor is not a business transaction. It’s just a technical functionality of the system, but it’s not being used by a customer or an API user.” These cross-cutting metrics may be better served by monitoring through views like Service Endpoints or specific technical metrics.

Be very selective when choosing your Business Transactions. Here’s a rule of thumb: Configure up to 20 to 30 BTs per business application. This may not seem like a lot, but really it is. One of AppDynamics’ largest banking customers identified that 90% of its business activity was reflected in just 25 or so business transactions.

It’s not uncommon for new users to balk at this. They may say, “But we have many more important processes to track!” Fear not: the recommended number of BTs isn’t set in stone, although our 20-to-30 guideline is a good starting point. You may have 20 key Business Transactions and another 20 that are less critical, but you really want to monitor all 40. You can do this, of course, but you’ll need to prioritize these transactions. Capturing too many BTs can lead users to miss the transactions that are truly important to the business.

Best Practices

During APM setup, you’ll have many questions. Should you work exclusively with your own technical team? With the application owner? The business that’s using the application?

Start with these three key steps:

  1. Get to know your business.
  2. Identify the major flows.
  3. Talk to the application owner.

 

Whenever I’m onsite with a customer, the first thing I advise is that we log in as an end user to see how they use the system. For example, we’ll order a product or renew a subscription, and then track these transactions end-to-end through the system. This very important step will help you identify the transactions you want to monitor.

It’s also critical to check the current major incidents you have, or at least the P1s and P2s. Find out what problems you’re experiencing right now. What are the major complaints involving the application?

Focus on the low-hanging fruit—your most troublesome applications—which you’ll find by instrumenting systems and talking to application owners. This will deliver value in the early setup stage, providing information you can take to the business to make them more receptive to working with you.

Prioritize Your Operations

Business Transactions are key to configuring APM. Before starting configuration, ask yourself these critical questions:

  1. What are my business goals for monitoring?
  2. What pain points am I trying to solve with AppDynamics?
  3. What are the typical problems that take up my time and resources?
  4. What are the most critical business operations that need to perform optimally?

 

Then take a closer look at your application. Decide which operations you must focus on to achieve your goals.

These key steps will help you prioritize operations and make it easier to configure them as Business Transactions. Go here to learn more!

Pivotal Cloud Foundry: Two Tiles Are Better Than One

In an earlier blog in our series on monitoring applications deployed to the Pivotal Cloud Foundry (PCF) platform, my colleague Jeff Holmes described how AppDynamics provides an intuitive and user-friendly dashboard for a single view of all your key performance indicators for system health and availability. This broke new ground and has been warmly welcomed by our many customers who rely on PCF to run their business applications. One enhancement request we heard from several customers was that they wanted to separate the task of configuring their APM agents from that of setting up the infrastructure monitoring. That made sense, and so we now offer two distinct Service Broker tiles on the Pivotal Network:

  1. AppDynamics Application Monitoring for PCF provides a single, convenient way to configure APM agents in all the various buildpacks that might deploy to PCF. It allows you to define different configurations for multiple environments (e.g., dev/test/stage) and also configure AppDynamics Transaction Analytics to take advantage of our unique Business iQ platform.

  2. AppDynamics Platform Monitoring for PCF is where you go to set up infrastructure monitoring with the KPI dashboards, health rules and alerting that Jeff describes in his blog. For advanced users, it also allows you to configure exactly how AppDynamics will fetch those KPI metrics from the Loggregator system.

There are at least a couple of reasons why this is an ideal duo. The first is that now you can choose exactly which Service Broker functionality you need for a given environment. If you only need to deploy our APM agents into a dev/test environment and don’t need platform monitoring capabilities, you can deploy only the first tile. In production environments, you would deploy both.

The second reason, although less obvious, is also important. By separating out the tiles in this way, we can make them much more lightweight—a significant benefit, particularly for those customers who routinely deploy multiple Service Broker tiles. In fact, the two new tiles together are now an order of magnitude smaller than the original, single tile.

There are some important changes under the hood as well. We no longer need to deploy an instance of the AppDynamics Machine Agent to host the custom nozzle implementation that we use to gather KPI metrics from the Loggregator. Instead, we have a small, lightweight agent written in Go (to match the Loggregator libraries from Pivotal that are part of the core platform). This agent uses our libagent shared library to connect to the AppDynamics controller. The result is a small, highly performant nozzle that is closely aligned with Cloud Foundry’s metrics and logging systems.

We’ve already received great feedback from early adopters, and we believe this is something that our many customers who use AppDynamics to monitor apps on PCF will want to check out. Upgrading from previous versions of the AppDynamics Service Broker Tile is simple, too. The new tiles are available from the Pivotal Network and there is full documentation on our Pivotal Partner Docs site. This definitely is a case where two tiles are better than one!

AppDynamics Application Monitoring for PCF
Download: https://network.pivotal.io/products/p-appdynamics
Docs: https://docs.pivotal.io/partners/appdynamics/index.html

AppDynamics Platform Monitoring for PCF
Download: https://network.pivotal.io/products/appdynamics-platform
Docs: https://docs.pivotal.io/partners/appdynamics-platform/index.html

 

Monitoring Kubernetes and OpenShift with AppDynamics

Here at AppDynamics, we build applications for both external and internal consumption. We’re always innovating to make our development and deployment process more efficient. We refactor apps to get the benefits of a microservices architecture, to develop and test faster without stepping on each other, and to fully leverage containerization.

Like many other organizations, we are embracing Kubernetes as a deployment platform. We use both upstream Kubernetes and OpenShift, an enterprise Kubernetes distribution on steroids. The Kubernetes framework is very powerful. It allows massive deployments at scale, simplifies new version rollouts and multi-variant testing, and offers many levers to fine-tune the development and deployment process.

At the same time, this flexibility makes Kubernetes complex in terms of setup, monitoring and maintenance at scale. Each of the Kubernetes core components (api-server, kube-controller-manager, kubelet, kube-scheduler) has quite a few flags that govern how the cluster behaves and performs. The default values may be OK initially for smaller clusters, but as deployments scale up, some adjustments must be made. We have learned to keep these values in mind when monitoring OpenShift clusters—both from our own pain and from published accounts of other community members who have experienced their own hair-pulling discoveries.

It should come as no surprise that we use our own tools to monitor our apps, including those deployed to OpenShift clusters. Kubernetes is just another layer of infrastructure. Along with the server and network visibility data, we are now incorporating Kubernetes and OpenShift metrics into the bigger monitoring picture.

In this blog, we will share what we monitor in OpenShift clusters and give suggestions as to how our strategy might be relevant to your own environments. (For more hands-on advice, read my blog Deploying AppDynamics Agents to OpenShift Using Init Containers.)

OpenShift Cluster Monitoring

For OpenShift cluster monitoring, we use two plug-ins that can be deployed with our standalone machine agent. AppDynamics’ Kubernetes Events Extension, described in our blog on monitoring Kubernetes events, tracks every event in the cluster. Kubernetes Snapshot Extension captures attributes of various cluster resources and publishes them to the AppDynamics Events API. The snapshot extension collects data on all deployments, pods, replica sets, daemon sets and service endpoints. It captures the full extent of the available attributes, including metadata, spec details, metrics and state. Both extensions use the Kubernetes API to retrieve the data, and can be configured to run at desired intervals.

The data these plug-ins provide ends up in our analytics data repository and instantly becomes available for mining, reporting, baselining and visualization. The data retention period is at least 90 days, which offers ample time to go back and perform an exhaustive root cause analysis (RCA). It also allows you to reduce the retention interval of events in the cluster itself. (By default, this is set to one hour.)
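For reference, that in-cluster retention is governed by a kube-apiserver flag; a minimal sketch, assuming you manage the api-server flags yourself (the value shown is illustrative, not a recommendation):

# kube-apiserver flag: keep cluster events for 30 minutes instead of the default 1 hour
--event-ttl=30m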

We use the collected data to build dynamic baselines, set up health rules and create alerts. The health rules, baselines and aggregate data points can then be displayed on custom dashboards where operators can see the norms and easily spot any deviations.

An example of a customizable Kubernetes dashboard.

What We Monitor and Why

Cluster Nodes

At the foundational level, we want monitoring operators to keep an eye on the health of the nodes where the cluster is deployed. Typically, you would have a cluster of masters, where core Kubernetes components (api-server, controller-manager, kube-scheduler, etc.) are deployed, as well as a highly available etcd cluster and a number of worker nodes for guest applications. To paint a complete picture, we combine infrastructure health metrics with the relevant cluster data gathered by our Kubernetes data collectors.

From an infrastructure point of view, we track CPU, memory and disk utilization on all the nodes, and also zoom into the network traffic on etcd. In order to spot bottlenecks, we look at various aspects of the traffic at a granular level (e.g., reads/writes and throughput). Kubernetes and OpenShift clusters may suffer from memory starvation, disks overfilled with logs or spikes in consumption of the API server and, consequently, the etcd. Ironically, it is often monitoring solutions that are known for bringing clusters down by pulling excessive amounts of information from the Kubernetes APIs. It is always a good idea to establish how much monitoring is enough and dial it up when necessary to diagnose issues further. If a high level of monitoring is warranted, you may need to add more masters and etcd nodes. Another useful technique, especially with large-scale implementations, is to have a separate etcd cluster just for storing Kubernetes events. This way, the spikes in event creation and event retrieval for monitoring purposes won’t affect performance of the main etcd instances. This can be accomplished by setting the --etcd-servers-overrides flag of the api-server, for example:

--etcd-servers-overrides=/events#https://etcd1.cluster.com:2379;https://etcd2.cluster.com:2379;https://etcd3.cluster.com:2379

From the cluster perspective we monitor resource utilization across the nodes that allow pod scheduling. We also keep track of the pod counts and visualize how many pods are deployed to each node and how many of them are bad (failed/evicted).

A dashboard widget with infrastructure and cluster metrics combined.

Why is this important? Kubelet, the component responsible for managing pods on a given node, has a setting, --max-pods, which determines the maximum number of pods that can be orchestrated. In Kubernetes the default is 110. In OpenShift it is 250. The value can be changed up or down depending on need. We like to visualize the remaining headroom on each node, which helps with proactive resource planning and to prevent sudden overflows (which could mean an outage). Another data point we add there is the number of evicted pods per node.
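If you do need to change that ceiling, it is a kubelet setting; a minimal sketch, assuming you pass flags to the kubelet directly (OpenShift typically sets the same value through its node configuration):

# kubelet flag: maximum number of pods this node will run
# (defaults: 110 in Kubernetes, 250 in OpenShift)
--max-pods=200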

Pod Evictions

Evictions are caused by disk or memory starvation. We recently had an issue with the disk space on one of our worker nodes due to a runaway log. As a result, the kubelet produced massive evictions of pods from that node. Evictions are bad for many reasons. They will typically affect the quality of service or may even cause an outage. If the evicted pods have an exclusive affinity with the node experiencing disk pressure, and as a result cannot be re-orchestrated elsewhere in the cluster, the evictions will result in an outage. Evictions of core component pods may lead to the meltdown of the cluster.

Long after the incident where pods were evicted, we saw the evicted pods were still lingering. Why was that? Garbage collection of evictions is controlled by a setting in kube-controller-manager called --terminated-pod-gc-threshold. The default value is set to 12,500, which means that garbage collection won’t occur until you have that many evicted pods. Even in a large implementation it may be a good idea to dial this threshold down to a smaller number.
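As an illustration, a hedged sketch of dialing the threshold down, assuming you control the kube-controller-manager flags; the value is an example, not a recommendation:

# kube-controller-manager flag: start garbage-collecting terminated/evicted pods
# once 100 of them accumulate (the default is 12,500)
--terminated-pod-gc-threshold=100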

If you experience a lot of evictions, you may also want to check if kube-scheduler has a custom --policy-config-file defined with no CheckNodeMemoryPressure or CheckNodeDiskPressure predicates.

Following our recent incident, we set up a new dashboard widget that tracks a metric of any threats that may cause a cluster meltdown (e.g., massive evictions). We also associated a health rule with this metric and set up an alert. Specifically, we’re now looking for warning events that tell us when a node is about to experience memory or disk pressure, or when a pod cannot be reallocated (e.g., NodeHasDiskPressure, NodeHasMemoryPressure, ErrorReconciliationRetryTimeout, ExceededGracePeriod, EvictionThresholdMet).

We also look for daemon pod failures (FailedDaemonPod), as they are often associated with cluster health rather than issues with the daemon set app itself.

Pod Issues

Pod crashes are an obvious target for monitoring, but we are also interested in tracking pod kills. Why would someone be killing a pod? There may be good reasons for it, but it may also signal a problem with the application. For similar reasons, we track deployment scale-downs, which we do by inspecting ScalingReplicaSet events. We also like to visualize the scale-down trend along with the app health state. Scale-downs, for example, may happen by design through auto-scaling when the app load subsides. They may also be issued manually or in error, and can expose the application to an excessive load.

Pending state is supposed to be a relatively short stage in the lifecycle of a pod, but sometimes it isn’t. It may be a good idea to track pods with a pending time that exceeds a certain, reasonable threshold—one minute, for example. In AppDynamics, we also have the luxury of baselining any metric and then tracking any configurable deviation from the baseline. If you catch a spike in pending state duration, the first thing to check is the size of your images and the speed of image download. One big image may clog the pipe and affect other containers. The kubelet has a flag, --serialize-image-pulls, which is set to “true” by default. It means that images will be loaded one at a time. Change the flag to “false” if you want to load images in parallel and avoid the potential clogging by a monster-sized image. Keep in mind, however, that you have to use Docker’s overlay2 storage driver to make this work. In newer Docker versions this setting is the default. In addition to the kubelet setting, you may also need to tweak the max-concurrent-downloads flag of the Docker daemon to ensure the desired parallelism.
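A minimal sketch of the two settings involved, assuming the node runs Docker with the overlay2 storage driver; file locations and values will vary by distribution:

# kubelet flag: pull images in parallel instead of one at a time (default is true)
--serialize-image-pulls=false

# /etc/docker/daemon.json: raise the Docker daemon's parallel-pull limit (default is 3)
{
  "max-concurrent-downloads": 10
}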

Large images that take a long time to download may also cause a different type of issue that results in a failed deployment. The kubelet flag --image-pull-progress-deadline determines the point in time when the image will be deemed “too long to pull or extract.” If you deal with big images, make sure you dial up the value of the flag to fit your needs.
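For example, a hedged sketch of raising the deadline on the kubelet; pick a value that matches your largest images:

# kubelet flag: cancel an image pull only after 5 minutes without progress (default is 1m)
--image-pull-progress-deadline=5m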

User Errors

Many big issues in the cluster stem from small user errors (human mistakes). A typo in a spec—for example, in the image name—may bring down the entire deployment. Similar effects may occur due to a missing image or insufficient rights to the registry. With that in mind, we track image errors closely and pay attention to excessive image-pulling. Unless it is truly needed, image-pulling is something you want to avoid in order to conserve bandwidth and speed up deployments.

Storage issues also tend to arise due to spec errors, lack of permissions or policy conflicts. We monitor storage issues (e.g., mounting problems) because they may cause crashes. We also pay close attention to resource quota violations because they do not trigger pod failures. They will, however, prevent new deployments from starting and existing deployments from scaling up.

Speaking of quota violations, are you setting resource limits in your deployment specs?

Policing the Cluster

On our OpenShift dashboards, we display a list of potential red flags that are not necessarily a problem yet but may cause serious issues down the road. Among these are pods without resource limits or health probes in the deployment specs.

Resource limits can be enforced by resource quotas across the entire cluster or at a more granular level. Violation of these limits will prevent the deployment. In the absence of a quota, pods can be deployed without defined resource limits. Having no resource limits is bad for multiple reasons. It makes cluster capacity planning challenging. It may also cause an outage. If you create or change a resource quota when there are active pods without limits, any subsequent scale-up or redeployment of these pods will result in failures.
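As an illustration, a minimal container fragment from a deployment spec with requests and limits defined; the names and numbers are hypothetical:

# fragment of a Deployment's pod template (spec.template.spec)
containers:
  - name: web                  # hypothetical container name
    image: example/web:1.2.3   # hypothetical image
    resources:
      requests:                # what the scheduler reserves for the container
        cpu: 250m
        memory: 256Mi
      limits:                  # hard ceilings; exceeding the memory limit OOM-kills the container
        cpu: 500m
        memory: 512Mi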

The health probes (readiness and liveness) are not enforceable, but it is a best practice to have them defined in the specs. They are the primary mechanism for the pods to tell the kubelet whether the application is ready to accept traffic and is still functioning. If the readiness probe is not defined and the pod takes a long time to initialize (based on the kubelet’s default), the pod will be restarted. This loop may continue for some time, taking up cluster resources for no reason and effectively causing a poor user experience or outage.

The absence of the liveness probe may cause a similar effect if the application is performing a lengthy operation and the pod appears unresponsive to the kubelet.
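A minimal sketch of both probes in a container spec; the endpoints, ports and timings are hypothetical and should be tuned to your application’s startup behavior:

# fragment of a Deployment's pod template (spec.template.spec)
containers:
  - name: web                        # hypothetical container name
    image: example/web:1.2.3         # hypothetical image
    readinessProbe:                  # gates whether the pod receives traffic
      httpGet:
        path: /healthz/ready
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5
    livenessProbe:                   # failing this probe restarts the container
      httpGet:
        path: /healthz/live
        port: 8080
      initialDelaySeconds: 30
      periodSeconds: 10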

We provide easy access to the list of pods with incomplete specs, allowing cluster admins to have a targeted conversation with development teams about corrective action.

Routing and Endpoint Tracking

As part of our OpenShift monitoring, we provide visibility into potential routing and service endpoint issues. We track unused services, including those created by someone in error and those without any pods behind them because the pods failed or were removed.

We also monitor bad endpoints pointing at old (deleted) pods, which effectively cause downtime. This issue may occur during rolling updates when the cluster is under increased load and API request-throttling is lower than it needs to be. To resolve the issue, you may need to increase the --kube-api-burst and --kube-api-qps config values of kube-controller-manager.
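A sketch of the relevant kube-controller-manager flags; the values are illustrative and should be raised cautiously, since they increase load on the API server:

# kube-controller-manager flags: allow a higher request rate to the API server
# (defaults are 20 QPS with a burst of 30)
--kube-api-qps=40
--kube-api-burst=60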

Every metric we expose on the dashboard can be viewed and analyzed in the list and further refined with ADQL, the AppDynamics query language. After spotting an anomaly on the dashboard, the operator can drill into the raw data to get to the root cause of the problem.

Application Monitoring

Context plays a significant role in our monitoring philosophy. We always look at application performance through the lens of the end-user experience and desired business outcomes. Unlike specialized cluster-monitoring tools, we are not only interested in cluster health and uptime per se. We’re equally concerned with the impact the cluster may have on application health and, subsequently, on the business objectives of the app.

In addition to having a cluster-level dashboard, we also build specialized dashboards with a more application-centric point of view. There we correlate cluster events and anomalies with application or component availability, end-user experience as reported by real-user monitoring, and business metrics (e.g., conversion of specific user segments).

Leveraging K8s Metadata

Kubernetes makes it super easy to run canary deployments, blue-green deployments, and A/B or multivariate testing. We leverage these conveniences by pulling deployment metadata and using labels to analyze performance of different versions side by side.

Monitoring Kubernetes or OpenShift is just a part of what AppDynamics does for our internal needs and for our clients. AppDynamics covers the entire spectrum of end-to-end monitoring, from the foundational infrastructure to business intelligence. Inherently, AppDynamics is used by many different groups of operators who may have very different skills. For example, we look at the platform as a collaboration tool that helps translate the language of APM to the language of Kubernetes and vice versa.

By bringing these different datasets together under one umbrella, AppDynamics establishes a common ground for diverse groups of operators. On the one hand you have cluster admins, who are experts in Kubernetes but may not know the guest applications in detail. On the other hand, you have DevOps in charge of APM or managers looking at business metrics, both of whom may not be intimately familiar with Kubernetes. These groups can now have a productive monitoring conversation, using terms that are well understood by everyone and a single tool to examine data points on a shared dashboard.

Learn more about how AppDynamics can help you monitor your applications on Kubernetes and OpenShift.

Automation Framework in Analytics – Part 1

This blog series highlights how we use our own products to test our events service, which currently ingests more than three trillion events per month.

With fast iterations and frequent deliverables, testing has always been a continuously evolving machine, and a reason why AppDynamics is aligning toward microservices-based architectures. While there are multiple ways to prudently handle the problem of testing, we’d like to share some of the lessons and key requirements that have shaped our elastic testing framework, powered by Docker and AWS.

Applying this framework helped us deliver stellar results:

  • The ability to bring up complex test environments on the fly, based on testing needs.
  • An 80% increase in test-run speed, which helps us find bugs earlier in the release cycle.
  • The flexibility to simulate environment instabilities that can occur in any production (or production-like) environment.
  • Support for our plans to move towards continuous integration (CI).
  • Predictable testing time.
  • A robust environment that allows us to run pre-check-in as well as nightly build tests.
  • The ease of running tests more frequently for small changes instead of a full cycle.

Below we will share some of the challenges we faced while end-to-end testing the AppDynamics Events Service, the data store for on-premises Application Analytics, End User Monitoring (EUM), and Database Monitoring deployments. We’ll describe our approach to solving these challenges, discuss best practices for integration with a continuous development cycle, and share ways to reduce the cost of the testing infrastructure.

By sharing our experience, we hope to provide a case study that will help you and your team avoid similar challenges.

What is Application Analytics?

Application Analytics refers to the real-time analysis and visualization of automatically collected and correlated data. In our case, analytics reveal insights into IT operations, customer experience, and business outcomes. With this next-generation IT operations analytics platform, IT and business users are empowered to quickly answer more meaningful questions than ever before, all in real time. Analytics is backed by a very powerful events service that stores ingested events so the data can be queried back. This service is highly scalable, handling more than three trillion events per month.

Deployment Background

Our Unified Analytics product can be deployed in two ways:

  • on-premises deployment
  • SaaS deployment

Events Service

The AppDynamics events service is architected to cater to customers based on the deployment chosen. For on-premises deployments, the events service offers a lightweight footprint with minimal components, which eases the handling of operational data. For SaaS deployments, the events service is built to handle the scalability and data volume typical of any SaaS-based service.

The SaaS events service has:

  1. API Layer: Entry point service
  2. Kafka queue
  3. Indexer Layer, which consumes the data from the Kafka queue and writes to an event store
  4. Event Store – Elasticsearch

The on-premises events service has:

  1. API Interface / REST Endpoint for the service
  2. Event Store

Architecture of the events platform

Operation/Environment Matrix

The operation bypasses a few layers when it comes to on-premises deployments. In SaaS, the ingestion layer prevents data loss through a Kafka layer that helps coordinate the ingestion. In an on-premises environment, however, ingestion happens directly to Elasticsearch through the API interface.

Objectives for testing the Events Service:

  • CI tests can run in build systems consistently.
  • The tests are easily pluggable and can run based on the deployment type.
  • Ease of running tests in different environment types (either locally or in the cloud) to save time and to ensure that the tests are environment agnostic.
  • The framework should be scalable and usable for functionality, performance, and scalability tests.

These objectives are mandatory for taking us towards continuous deployment, where production deployment is just one click away from committing the code.

Building the Test Framework

To build our testing framework, we analyzed the various solutions available. Below are the options we went through:

  1. Bring the whole SaaS environment into a local environment via individual processes such as Elasticsearch, Kafka, and web servers, and test them in a local box.
  2. Have separate VMs/bare-metal hosts allocated for these tests so that we deploy these components there and run.
  3. Use AWS for deploying these components and use them for testing.
  4. Use Docker containers to create a secluded environment, deploy, and test.

We reviewed each option listed above and conducted a detailed analysis to understand the pros and cons of each one. The outcome of this exercise enabled us to pick the right choice for the testing environment.

Stay Tuned

We will publish a follow-up blog to shed more light on:

  1. The pros and cons of every option we had
  2. Which option we chose and why
  3. The architecture of our framework
  4. Test flow
  5. The performance of our infrastructure setup time and infrastructure-based test run times

Swamy Sambamurthy works as a Principal Engineer at AppDynamics and has 11+ years of experience building scalable automation frameworks. Both in previous roles and at AppDynamics, Swamy has helped build automation frameworks for distributed systems and big-data environments that scale to handle huge numbers of ingestion and querying requests.

The APPrentice

In this week’s episode, Donald Trump enlists Team ROI and Team Overhead to solve a Severity 1 incident on the “Trump Towers Website”. Team Overhead used “Dynoscope” and took 3 weeks to solve the incident, while Team ROI took 15 minutes by using AppDynamics.

 

Intelligent Alerting for Complex Applications – PagerDuty & AppDynamics

Today AppDynamics announced integration with PagerDuty, a SaaS-based provider of IT alerting and incident management software that is changing the way IT teams are notified, and how they manage incidents in their mission-critical applications. By combining AppDynamics’ granular visibility of applications with PagerDuty’s reliable alerting capabilities, customers can make sure the right people are proactively notified when business impact occurs, so IT teams can get their apps back up and running as quickly as possible.

You’ll need a PagerDuty and AppDynamics license to get started – if you don’t already have one, you can sign up for free trials of PagerDuty and AppDynamics online.  Once you complete this simple installation, you’ll start receiving incidents in PagerDuty created by AppDynamics out-of-the-box policies.

Once an incident is filed it will have the following list view:


When the ‘Details’ link is clicked, you’ll see the details for this particular incident including the Incident Log:


If you are interested in learning more about the event itself, simply click ‘View message’ and all of the AppDynamics event details are displayed showing which policy was breached, violation value, severity, etc. :


Let’s walk through some examples of how our customers are using this integration today.

Say Goodbye to Irrelevant Notifications

Is your work email address included in some sort of group email alias at work, so you get several, maybe even dozens, of notifications a day that aren’t particularly relevant to your responsibilities or are intended for other people on your team? I know mine is. Imagine a world where your team only receives messages when the notifications have to do with their individual role, and messages only get sent to people who are actually on call. With AppDynamics & PagerDuty you can now build in alerting logic that routes specific alerts to specific teams and only sends messages to the people who are actually on call. App response time way above the normal value? Send an alert to the app support engineer who is on call, not all of his colleagues. Not having to sift through a bunch of irrelevant alerts means that when one does come through you can be sure it requires YOUR attention right away.


Automatic Escalations

If you are only sending a notification and assigning an incident to one person, what happens if that person is out of the office or doesn’t have access to the internet / phone to respond to the alert?  Well, the good thing about the power of PagerDuty is that you can build in automatic escalations.  So, if you have a trigger in AppDynamics to fire off a PagerDuty alert when a node is down, and the infrastructure manager isn’t available, you can automatically escalate and re-assign / alert a backup employee or admin.


The Sky is Falling!  Oh Wait – We’re Just Conducting Maintenance…

Another potentially annoying situation for IT teams is all of the alerts that get fired off during a maintenance window. PagerDuty has the concept of a maintenance window, so your team doesn’t get a bunch of doomsday messages during maintenance. You can even set up a maintenance window with one click if you prefer to go that route.


Either way, no new incidents will be created during this time period… meaning your team will be spared having to open, read, and file the alerts and update / close out the newly-created incidents in the system.

We’re confident this integration of the leading application performance management solution with the leading IT incident management solution will save your team time and make them more productive.  Check out the AppDynamics and PagerDuty integration today!

Introducing AppDynamics for PHP


It’s been about 12 years since I last scripted in PHP. I pretty much paid my way through college building PHP websites for small companies that wanted a web presence. Back then PHP was the perfect choice, because nearly all the internet service providers had PHP support for free if you registered domain names with them. Java and .NET weren’t an option for a poor smelly student like me, so I just wrote standard HTML with embedded scriptlets of PHP code and bingo, I had dynamic web pages.

Today, 244 million websites run on PHP, which is almost 75% of the web. That’s a pretty scary statistic. If only I’d kept coding PHP back when I was 21, I’d be a billionaire by now! PHP is a pretty good example of how open-source technology can go viral and infect millions of developers and organizations worldwide.

Turnkey APMaaS by AppDynamics

Since we launched our Managed Service Provider program late last year, we’ve signed up many MSPs that were interested in adding Application Performance Management-as-a-Service (APMaaS) to their service catalogs.  Wouldn’t you be excited to add a service that’s easy to manage but more importantly easy to sell to your existing customer base?

Service providers like Scicom definitely were (check out the case study), because they are being held responsible for the performance of their customers’ complex, distributed applications, but oftentimes don’t have visibility inside the actual application. That’s like being asked to officiate an NFL game with your eyes closed.


The sad truth is that many MSPs still think that high visibility in app environments equates to high configuration, high cost, and high overhead.

Thankfully this is 2013.  People send emails instead of snail mail, play Call of Duty instead of Pac-Man, listen to Pandora instead of cassettes, and can have high visibility in app environments with low configuration, low cost, and low overhead with AppDynamics.

Not only do we have a great APM service to help MSPs increase their Monthly Recurring Revenue (MRR), we make it extremely easy for them to deploy this service in their own environments, which, to be candid, is half the battle.  MSPs can’t spend countless hours deploying a new service.  It takes focus and attention away from their core business, which in turn could endanger the SLAs they have with their customers.  Plus, it’s just really annoying.

Introducing: APMaaS in a Box

Here at AppDynamics, we take pride in delivering value quickly. Most of our customers go from nothing to full-fledged production performance monitoring across their entire environment in a matter of hours, in both on-premises and SaaS deployments. MSPs are now leveraging that same rapid SaaS deployment model in their own environments with something that we like to call ‘APMaaS in a Box’.

At a high level, APMaaS in a Box is large cardboard box with air holes and a fragile sticker wherein we pack a support engineer, a few management servers, an instruction manual, and a return label…just kidding…sorry, couldn’t resist.


Simply put, APMaaS in a Box is a set of files and scripts that allows MSPs to provision multi-tenant controllers in their own data center or private cloud and provision AppDynamics licenses for customers themselves…basically it’s the ultimate turnkey APMaaS.

By utilizing AppDynamics’ APMaaS in a Box, MSPs across the world are leveraging our quick deployment, self-service license provisioning, and flexibility in the way we do business to differentiate themselves and gain net new revenue.

Quick Deployment

Within 6 hours, MSPs like NTT Europe who use our APMaaS in a Box capabilities will have all the pieces they need in place to start monitoring the performance of their customers’ apps. Now that’s some rapid time to value!

Self-Service License Provisioning

MSPs can provision licenses directly through the AppDynamics partner portal.  This gives you complete control over who gets licenses and makes it very easy to manage this process across your customer base.

Flexibility

An MSP can get started on a month-to-month basis with no commitment. Only paying for what you sell eliminates the cost of shelfware. MSPs can also sell AppDynamics however they would like to position it and can float licenses across customers. NTT Europe uses a 3-tier service offering so customers can pick and choose the APM services they’d like to pay for. Feel free to get creative when packaging this service for customers!

Conclusion

As more and more MSPs move up the stack from infrastructure management to monitoring the performance of their customers’ distributed applications, choosing an APM partner that understands the Managed Services business is of utmost importance. AppDynamics’ APMaaS in a Box capabilities align well with internal MSP infrastructures, and our pricing model aligns with the business needs of Managed Service Providers – we’re a perfect fit.

MSPs who continue to evolve their service offerings to keep pace with customer demands will be well positioned to reap the benefits and future revenue that comes along with staying ahead of the market.  To paraphrase The Great One, MSPs need to “skate where the puck is going to be, not where it has been.”  I encourage all you MSPs out there to contact us today to see how we can help you skate ahead of the curve and take advantage of the growing APM market with our easy to use, easy to deploy APMaaS in a Box.  If you don’t, your competition will…

AppDynamics & Splunk – Better Together

A few months ago I saw an interesting partnership announcement from Foursquare and OpenTable. Users can now make OpenTable reservations at participating restaurants from directly within the Foursquare mobile app. My first thought was, “What the hell took you guys so long?” That integration makes sense on so many levels, I’m surprised it hadn’t already been done.

So when AppDynamics recently announced a partnership with Splunk, I viewed that as another no-brainer.  Two companies with complementary solutions making it easier for customers to use their products together – makes sense right?  It does to me, and I’m not alone.

I’ve been demoing a prototype of the integration for a few months now at different events across the country, and at the conclusion of each walk-through I’d get some variation of the same question, “How do I get my hands on this?”  Well, I’m glad to say the wait is over – the integration is available today as an App download on Splunkbase.  You’ll need a Splunk and AppDynamics license to get started – if you don’t already have one, you can sign up for free trials of Splunk and AppDynamics online.

Deploying APM in the Enterprise Part 4: The Path of the Rockstar

Welcome to Part 4 of my series Deploying APM in the Enterprise. In the last installment we covered how you find, test, and justify purchasing an APM solution. This blog will focus on what to do after you’ve made a purchase and started down the path of deploying your coveted APM tool (ahem, ahem, AppDynamics, ahem). Just clearing my throat, let’s jump right in…

It’s time for a celebration, time to break out the champagne, time to spike the football and do your end zone dance (easy there Michael Jackson, don’t hurt yourself). All of the hours you spent turning data into meaningful information, dealing with software vendors, writing requirements, testing solutions, documenting your findings, writing business justifications, and generally bending over backwards to ensure that no objection would stand in your way has culminated in management approving your purchase of APM software. Now the real work begins…