Automation Framework in Analytics – Part 1

This blog series highlights how we use our own products to test our events service, which currently ingests more than three trillion events per month.

With fast iterations and deliverables, testing has always been a continuously evolving machine, and it is one reason why AppDynamics is aligning toward microservices-based architectures. While there are multiple ways to handle the problem of testing prudently, we’d like to share some of the lessons and key requirements that have shaped our elastic-testing framework, powered by Docker and AWS.

Applying this framework helped us deliver stellar results:

  • Ability to bring up complex test environments on the fly, based on testing needs.
  • An 80% increase in the speed of running tests, with bugs found earlier in the release cycle.
  • The flexibility to simulate environment instabilities that can occur in any production (or production-like) environment.
  • Support for our plans to move toward continuous integration (CI).
  • Predictable testing time.
  • A robust environment to allow us to run pre-checkin as well as nightly build tests.
  • Ease of running tests more frequently for small changes vs. full cycle.

Below we will share some of the challenges we faced while end-to-end testing the AppDynamics Events Service, the data store for on-premises Application Analytics, End User Monitoring (EUM), and Database Monitoring deployments. We’ll describe our approach to solving these challenges, discuss best practices for integration with a continuous development cycle, and share ways to reduce the cost of testing infrastructure.

By sharing our experience, we hope to provide a case study that will help you and your team avoid similar challenges.

What is Application Analytics?

Application Analytics refers to the real-time analysis and visualization of automatically collected and correlated data. In our case, analytics reveal insights into IT operations, customer experience, and business outcomes. With this next-generation IT operations analytics platform, IT and business users can quickly answer more meaningful questions than ever before, all in real time. Analytics is backed by a powerful events service that stores the ingested events so that the data can be queried back. This service is highly scalable, handling more than three trillion events per month.

Deployment Background

Our Unified Analytics product can be deployed in two ways:

  • on-premises deployment
  • SaaS deployment

Events Service

The AppDynamics events service is architected to cater to customers based on the deployment chosen. For on-premises installations, the events service offers a lightweight deployment with minimal components, which eases the handling of operating data. The SaaS deployment has additional components so that the events service can cater to the scalability and data volume typical of any SaaS-based service.

The SaaS events service has:

  1. API Layer: Entry point service
  2. Kafka queue
  3. Indexer Layer, which consumes data from the Kafka queue and writes it to an event store
  4. Event Store – Elasticsearch

The on-premises events service has:

  1. API Interface / REST Endpoint for the service
  2. Event Store

 Architecture of events platform

Operation/Environment Matrix

Ingestion bypasses a few layers in on-premises deployments. The SaaS ingestion path prevents data loss through a Kafka layer that helps coordinate ingestion. In an on-premises environment, however, events are written directly to Elasticsearch through the API interface.
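
To make the difference concrete, here is a minimal sketch of the two ingestion paths. The layer names follow the component lists above; the `Deployment` enum, the `INGESTION_PATH` mapping, and the `bypassed_layers` helper are illustrative names we made up for this post, not part of the events service itself.

```python
from enum import Enum
from typing import List


class Deployment(Enum):
    SAAS = "saas"
    ON_PREMISES = "on-premises"


# Layers an event passes through, in order, for each deployment type.
# Hypothetical mapping derived from the component lists in this post.
INGESTION_PATH = {
    Deployment.SAAS: ["api", "kafka", "indexer", "elasticsearch"],
    Deployment.ON_PREMISES: ["api", "elasticsearch"],
}


def bypassed_layers(deployment: Deployment) -> List[str]:
    """Return the SaaS layers that the given deployment skips."""
    path = INGESTION_PATH[deployment]
    return [layer for layer in INGESTION_PATH[Deployment.SAAS] if layer not in path]


if __name__ == "__main__":
    for deployment in Deployment:
        print(deployment.value, "->", " -> ".join(INGESTION_PATH[deployment]))
    print("on-premises bypasses:", bypassed_layers(Deployment.ON_PREMISES))
```

A test that asserts on queue behavior only makes sense where the Kafka and indexer layers exist, which is why the deployment type has to be a first-class input to the test framework.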

Objectives for testing the Events Service:

  • CI tests can run in build systems consistently.
  • The tests are easily pluggable and can run based on the deployment type.
  • Ease of running tests in different environment types (either locally or in cloud) for the benefit of time and to ensure that the tests are environment agnostic.
  • The framework should be scalable and usable for functional, performance, and scalability tests.

These objectives are mandatory to take us toward continuous deployment, where production deployment is just one click away from committing the code.
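
One way to picture the "pluggable by deployment type" objective is a small registry that tags each test with the deployments it applies to. This is only a sketch under our own assumptions: the `applies_to` decorator, the `select_tests` function, and the two test names are hypothetical and are not taken from the actual framework.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical registry: test name -> (deployment types it applies to, test function).
_REGISTRY: Dict[str, Tuple[Set[str], Callable[[], None]]] = {}


def applies_to(*deployments: str) -> Callable[[Callable[[], None]], Callable[[], None]]:
    """Register a test function for one or more deployment types."""
    def decorator(func: Callable[[], None]) -> Callable[[], None]:
        _REGISTRY[func.__name__] = (set(deployments), func)
        return func
    return decorator


@applies_to("saas", "on-premises")
def test_publish_and_query_event() -> None:
    pass  # placeholder: would publish an event through the API and query it back


@applies_to("saas")
def test_indexer_drains_kafka_queue() -> None:
    pass  # placeholder: only meaningful where the Kafka and indexer layers exist


def select_tests(deployment: str) -> List[str]:
    """Return the names of the tests that apply to the given deployment type."""
    return sorted(name for name, (targets, _) in _REGISTRY.items() if deployment in targets)


if __name__ == "__main__":
    print(select_tests("on-premises"))
    print(select_tests("saas"))
```

With a scheme like this, the same suite can run locally or in the cloud, and the build system only needs to pass in the deployment type.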

Building the Test Framework

To build our testing framework, we analyzed the various solutions available. Below are the options we went through:

  1. Bring the whole SaaS environment into a local environment as individual processes (Elasticsearch, Kafka, and web servers), and test them on a local box.
  2. Allocate separate VMs or bare-metal hosts for these tests, deploy the components there, and run the tests.
  3. Use AWS to deploy these components and test against them.
  4. Use Docker containers to create an isolated environment, deploy, and test.

We reviewed each option listed above and conducted a detailed analysis to understand its pros and cons. The outcome of this exercise enabled us to pick the right choice for the testing environment.

Stay Tuned

We will publish a follow-up blog to shed more light on:

  1. The pros and cons of every option we had
  2. The choice we made and why
  3. Architecture of our framework
  4. Test flow
  5. Infrastructure setup time and the running time of tests on that infrastructure

Swamy Sambamurthy works as a Principal Engineer at AppDynamics and has 11+ years of experience in building scalable automation frameworks. In past roles and currently at AppDynamics, Swamy has helped build automation frameworks for distributed systems and big-data environments that can scale to a huge number of ingestion and querying requests.

The APPrentice

In this week’s episode, Donald Trump enlists Team ROI and Team Overhead to solve a Severity 1 incident on the “Trump Towers Website”. Team Overhead used “Dynoscope” and took 3 weeks to solve the incident, while Team ROI took 15 minutes by using AppDynamics.

 

Intelligent Alerting for Complex Applications – PagerDuty & AppDynamics

Today AppDynamics announced integration with PagerDuty, a SaaS-based provider of IT alerting and incident management software that is changing the way IT teams are notified, and how they manage incidents in their mission-critical applications. By combining AppDynamics’ granular visibility of applications with PagerDuty’s reliable alerting capabilities, customers can make sure the right people are proactively notified when business impact occurs, so IT teams can get their apps back up and running as quickly as possible.

You’ll need a PagerDuty and AppDynamics license to get started – if you don’t already have one, you can sign up for free trials of PagerDuty and AppDynamics online.  Once you complete this simple installation, you’ll start receiving incidents in PagerDuty created by AppDynamics out-of-the-box policies.

Once an incident is filed it will have the following list view:

[Screenshot: incident list view]

When the ‘Details’ link is clicked, you’ll see the details for this particular incident including the Incident Log:

[Screenshot: incident details and Incident Log]

If you are interested in learning more about the event itself, simply click ‘View message’ and all of the AppDynamics event details are displayed, showing which policy was breached, the violation value, the severity, and so on:

[Screenshot: incident message]

Let’s walk through some examples of how our customers are using this integration today.

Say Goodbye to Irrelevant Notifications

Is your work email address on some group alias, so that you get several, maybe even dozens, of notifications a day that aren’t relevant to your responsibilities or are intended for other people on your team? I know mine is. Imagine a world where your team members only receive notifications that relate to their individual roles, and only the people who are actually on call receive them. With AppDynamics & PagerDuty you can now build in alerting logic that routes specific alerts to specific teams and only sends messages to the people who are on call. App response time way above the normal value? Send an alert to the app support engineer who is on call, not all of their colleagues. Not having to sift through a bunch of irrelevant alerts means that when one does come through, you can be sure it requires YOUR attention right away.

[Screenshot: on-call schedules]

Automatic Escalations

If you are only sending a notification and assigning an incident to one person, what happens if that person is out of the office or doesn’t have access to the internet / phone to respond to the alert?  Well, the good thing about the power of PagerDuty is that you can build in automatic escalations.  So, if you have a trigger in AppDynamics to fire off a PagerDuty alert when a node is down, and the infrastructure manager isn’t available, you can automatically escalate and re-assign / alert a backup employee or admin.

[Screenshot: escalation policy]
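
To show the idea behind escalation, here is a minimal sketch of the logic. It is not PagerDuty's API or implementation; escalation policies are configured in PagerDuty itself. The function name, the single shared timeout, and the example chain are assumptions made purely for illustration.

```python
from typing import List


def assignee_at(chain: List[str], timeout_minutes: int, minutes_unacknowledged: int) -> str:
    """Return who holds the incident after it has gone unacknowledged for the given time.

    The incident starts with the first person in the chain and moves one step
    down the chain each time the timeout elapses without an acknowledgement.
    It stays with the last person once the chain is exhausted.
    """
    if not chain:
        raise ValueError("escalation chain must not be empty")
    if timeout_minutes <= 0:
        raise ValueError("timeout must be positive")
    level = min(minutes_unacknowledged // timeout_minutes, len(chain) - 1)
    return chain[level]


if __name__ == "__main__":
    chain = ["infrastructure manager", "backup admin"]
    for elapsed in (0, 29, 30, 500):
        print(elapsed, "min ->", assignee_at(chain, 30, elapsed))
```

In this example, a node-down alert that the infrastructure manager has not acknowledged after 30 minutes moves to the backup admin.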

The Sky is Falling!  Oh Wait – We’re Just Conducting Maintenance…

Another potentially annoying situation for IT teams is the flood of alerts that get fired off during a maintenance window. PagerDuty has the concept of a maintenance window so your team doesn’t get a bunch of doomsday messages during maintenance. You can even set up a maintenance window with one click if you prefer to go that route.

[Screenshot: maintenance window]

Either way, no new incidents will be created during this time period… meaning your team will be spared having to open, read, and file the alerts and update / close out the newly-created incidents in the system.
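
Conceptually, a maintenance window is a time filter in front of incident creation. The sketch below illustrates that idea only; it does not use or describe PagerDuty's API, and the `Window` and `Alert` types and both function names are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

Window = Tuple[datetime, datetime]  # (start, end); the end is exclusive
Alert = Tuple[datetime, str]        # (time the alert fired, description)


def in_maintenance(fired_at: datetime, windows: List[Window]) -> bool:
    """Return True if the alert time falls inside any maintenance window."""
    return any(start <= fired_at < end for start, end in windows)


def incidents_to_create(alerts: List[Alert], windows: List[Window]) -> List[str]:
    """Keep only the alerts that fired outside every maintenance window."""
    return [text for fired_at, text in alerts if not in_maintenance(fired_at, windows)]


if __name__ == "__main__":
    windows = [(datetime(2013, 4, 16, 2, 0), datetime(2013, 4, 16, 4, 0))]
    alerts = [
        (datetime(2013, 4, 16, 1, 59), "node down"),
        (datetime(2013, 4, 16, 3, 0), "node down during patching"),
        (datetime(2013, 4, 16, 4, 0), "slow response"),
    ]
    print(incidents_to_create(alerts, windows))
```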

We’re confident this integration of the leading application performance management solution with the leading IT incident management solution will save your team time and make them more productive.  Check out the AppDynamics and PagerDuty integration today!

Introducing AppDynamics for PHP

It’s been about 12 years since I last scripted in PHP. I pretty much paid my way through college building PHP websites for small companies that wanted a web presence. Back then PHP was the perfect choice, because nearly all the internet service providers had PHP support for free if you registered domain names with them. Java and .NET weren’t options for a poor smelly student like me, so I just wrote standard HTML with embedded scriptlets of PHP code and bingo, I had dynamic web pages.

Today, 244 million websites run on PHP, which is almost 75% of the web. That’s a pretty scary statistic. If only I’d kept coding PHP back when I was 21, I’d be a billionaire by now! PHP is a pretty good example of how open-source technology can go viral and infect millions of developers and organizations worldwide.

Turnkey APMaaS by AppDynamics

Since we launched our Managed Service Provider program late last year, we’ve signed up many MSPs that were interested in adding Application Performance Management-as-a-Service (APMaaS) to their service catalogs.  Wouldn’t you be excited to add a service that’s easy to manage but more importantly easy to sell to your existing customer base?

Service providers like Scicom definitely were (check out the case study), because they are being held responsible for the performance of their customers’ complex, distributed applications, but oftentimes don’t have visibility inside the actual application. That’s like being asked to officiate an NFL game with your eyes closed.

The sad truth is that many MSPs still think that high visibility in app environments equates to high configuration, high cost, and high overhead.

Thankfully this is 2013.  People send emails instead of snail mail, play Call of Duty instead of Pac-Man, listen to Pandora instead of cassettes, and can have high visibility in app environments with low configuration, low cost, and low overhead with AppDynamics.

Not only do we have a great APM service to help MSPs increase their Monthly Recurring Revenue (MRR), we make it extremely easy for them to deploy this service in their own environments, which, to be candid, is half the battle.  MSPs can’t spend countless hours deploying a new service.  It takes focus and attention away from their core business, which in turn could endanger the SLAs they have with their customers.  Plus, it’s just really annoying.

Introducing: APMaaS in a Box

Here at AppDynamics, we take pride in delivering value quickly. Most of our customers go from nothing to full-fledged production performance monitoring across their entire environment in a matter of hours, in both on-premises and SaaS deployments. MSPs are now leveraging that same rapid SaaS deployment model in their own environments with something that we like to call ‘APMaaS in a Box’.

At a high level, APMaaS in a Box is a large cardboard box with air holes and a fragile sticker wherein we pack a support engineer, a few management servers, an instruction manual, and a return label…just kidding…sorry, couldn’t resist.

Simply put, APMaaS in a Box is a set of files and scripts that allows MSPs to provision multi-tenant controllers in their own data center or private cloud and provision AppDynamics licenses for customers themselves…basically it’s the ultimate turnkey APMaaS.

By utilizing AppDynamics’ APMaaS in a Box, MSPs across the world are leveraging our quick deployment, self-service license provisioning, and flexibility in the way we do business to differentiate themselves and gain net new revenue.

Quick Deployment

Within 6 hours, MSPs like NTT Europe who use our APMaaS in a Box capabilities will have all the pieces they need in place to start monitoring the performance of their customers’ apps. Now that’s some rapid time to value!

Self-Service License Provisioning

MSPs can provision licenses directly through the AppDynamics partner portal.  This gives you complete control over who gets licenses and makes it very easy to manage this process across your customer base.

Flexibility

An MSP can get started on a month-to-month basis with no commitment. Only paying for what you sell eliminates the cost of shelfware. MSPs can also position and sell AppDynamics however they like, and can float licenses across customers. NTT Europe uses a 3-tier service offering so customers can pick and choose the APM services they’d like to pay for. Feel free to get creative when packaging this service for customers!

Conclusion

As more and more MSPs move up the stack from infrastructure management to monitoring the performance of their customers’ distributed applications, choosing an APM partner that understands the Managed Services business is of utmost importance. AppDynamics’ APMaaS in a Box capabilities align well with internal MSP infrastructures, and our pricing model aligns with the business needs of Managed Service Providers – we’re a perfect fit.

MSPs who continue to evolve their service offerings to keep pace with customer demands will be well positioned to reap the benefits and future revenue that come with staying ahead of the market. To paraphrase The Great One, MSPs need to “skate where the puck is going to be, not where it has been.” I encourage all you MSPs out there to contact us today to see how we can help you skate ahead of the curve and take advantage of the growing APM market with our easy-to-use, easy-to-deploy APMaaS in a Box. If you don’t, your competition will…

AppDynamics & Splunk – Better Together

A few months ago I saw an interesting partnership announcement from Foursquare and OpenTable. Users can now make OpenTable reservations at participating restaurants from directly within the Foursquare mobile app. My first thought was, “What the hell took you guys so long?” That integration makes sense on so many levels, I’m surprised it hadn’t already been done.

So when AppDynamics recently announced a partnership with Splunk, I viewed that as another no-brainer. Two companies with complementary solutions making it easier for customers to use their products together – makes sense, right? It does to me, and I’m not alone.

I’ve been demoing a prototype of the integration for a few months now at different events across the country, and at the conclusion of each walk-through I’d get some variation of the same question, “How do I get my hands on this?”  Well, I’m glad to say the wait is over – the integration is available today as an App download on Splunkbase.  You’ll need a Splunk and AppDynamics license to get started – if you don’t already have one, you can sign up for free trials of Splunk and AppDynamics online.

Deploying APM in the Enterprise Part 4: The Path of the Rockstar

Welcome to Part 4 of my series Deploying APM in the Enterprise. In the last installment we covered how you find, test, and justify purchasing an APM solution. This blog will focus on what to do after you’ve made a purchase and started down the path of deploying your coveted APM tool (ahem, ahem, AppDynamics, ahem). Just clearing my throat, let’s jump right in…

It’s time for a celebration, time to break out the champagne, time to spike the football and do your end zone dance (easy there Michael Jackson, don’t hurt yourself). All of the hours you spent turning data into meaningful information, dealing with software vendors, writing requirements, testing solutions, documenting your findings, writing business justifications, and generally bending over backwards to ensure that no objection would stand in your way has culminated in management approving your purchase of APM software. Now the real work begins…

Covis Software GmbH Speeds up CRM Application by 33x with AppDynamics

I received an X-Ray from Covis Software GmbH, a German provider of CRM solutions. They’ve been managing their .NET application performance with AppDynamics Pro in production for several months now. In the X-Ray below (as documented by the customer), Covis was able to improve the response time of a mission-critical business transaction from over 10 seconds to around 300 milliseconds, a 33x improvement. The business impact of this change was that average call time for some CRM agents dropped by almost 10 seconds.

If you would like to get started with AppDynamics you can download AppDynamics Lite (our free version) or you can take a free 30-day trial of AppDynamics Pro.

App Man.

2 Fast 2 Furious: When Organizations become too Agile

Taken from JAX Conference Keynote 2012 in Mainz:

Declaring yourself “Agile” no longer means you’re automatically cool or competitive. It might have in the olden days, when teams would be considered agile if they did 5 releases a year–but the word today has assumed a completely different meaning. Now, many organizations will happily admit to doing multiple releases a day. The problem is this: the majority of self-styled agile teams focus on speed, innovation and change, but very few focus on results.

This session takes a look at what happens when organizations become too agile, and how this addiction can become terminal for the business. We’ll cover real-life examples outlining the challenges and pain points of organizations striving to be agile. It will also offer top tips for dev teams to do agile the “right” way, helping them better manage change and understand the real impact that frequent releases have upon their business.

Slides available here on Slideshare.

How Fast are your Web Services?

Every day in our lives we rely on services provided by other people: making a phone call, getting a car fixed, or ordering a pizza. We want those things to happen as quickly as possible, because time often means money. If you take your car to a Mercedes or BMW dealer, you will understand this point better than anyone. Our productivity (and often happiness) is therefore controlled, every day, by different organizations and people. When things slow down or don’t happen we get upset, frustrated, and sometimes rant on Twitter like these folk:

If your application today follows SOA design principles, is heavily distributed, and relies on 3rd party service providers, then you’ve probably become frustrated at some point when your application slows down or crashes. The problem is this: your end user experience and quality of service (QoS) is only as good as the QoS of your service providers. So, unless you monitor QoS you can’t measure QoS, and if you can’t measure QoS, you can’t manage your service providers and your end user experience. For example, take a look at this customer e-commerce application which has 7 JVMs, 1 database and 7 external web service providers:

This customer recently had a slowdown with their e-commerce production application. After a few minutes browsing AppDynamics, they successfully identified that one of their web service providers was having latency issues (AppDynamics automatically baselines performance and flags deviations for each web service provider as shown in the above screenshot). The customer called their service provider, and sure enough the service provider admitted to having issues. A few hours later the service provider called back and said “we fixed the problem, everything should be back to normal”–yet the customer could clearly see latency issues still occurring in AppDynamics. So they sent their service provider a screenshot showing the evidence. The service provider then checked again, and called back a few minutes later saying “Yes, sorry a few customers are still being impacted.” Without this level of visibility, many organizations are simply blind to how external service providers impact their end user experience and business.
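
To illustrate what "baseline and flag deviations" can mean in practice, here is a deliberately simple sketch. It is not how AppDynamics computes its baselines, which this post does not describe. As an assumption for illustration, it treats the mean of recent response times as the baseline and flags a measurement that sits more than a chosen number of standard deviations above it.

```python
from statistics import mean, pstdev
from typing import List


def deviates_from_baseline(history_ms: List[float], current_ms: float,
                           threshold: float = 3.0) -> bool:
    """Flag a response time that is unusually slow compared with recent history.

    history_ms: recent response times for one web service provider, in milliseconds.
    threshold: how many standard deviations above the mean counts as a deviation
               (an illustrative default, not a recommended value).
    """
    if not history_ms:
        raise ValueError("need at least one historical measurement")
    baseline = mean(history_ms)
    spread = pstdev(history_ms)
    if spread == 0:
        # All historical values are identical, so anything slower is a deviation.
        return current_ms > baseline
    return (current_ms - baseline) / spread > threshold


if __name__ == "__main__":
    history = [100.0, 110.0, 90.0, 105.0, 95.0]
    print(deviates_from_baseline(history, 105.0))
    print(deviates_from_baseline(history, 400.0))
```

The point of a per-provider baseline is that "slow" is relative: a response time that is normal for one provider can be a clear deviation for another.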

Being able to troubleshoot slow performance in minutes is helpful, but what about being able to report the exact service level you receive–say, from each of your service providers over a period of time? Did your service improve over time or did it regress? How many outages or severity 1 incidents did your service providers cause this week for your application?

Take the below screenshot from AppDynamics, which plots the maximum response time for five different web services consumed by an application over the last week. You can see that three out of the five web services (denoted by pink, blue and turquoise lines) consistently deliver sub-second response times and provide a great service level. However, the other two web services (red and green lines) show performance spikes with response times of between 14 and 22 seconds. The green web service in particular is very inconsistent and shows several performance spikes in two days.

Below is the response time of another web service (PayPal) for a customer application over the last 3 months. Notice the spikes in response time and look at the deviation between average and maximum response time over the time period. What’s impressive is that despite the occasional service blip, the PayPal service has slowly improved by 14%, from 450 milliseconds to around 385 milliseconds. It’s also been very stable the last few weeks and has delivered a consistent service (a small deviation between average and maximum response time).
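
The 14% figure is straightforward arithmetic on the two response times quoted above: (450 − 385) / 450 ≈ 0.144. A small helper (the function name is ours, for illustration) makes the calculation reusable for any provider:

```python
def percent_improvement(before_ms: float, after_ms: float) -> float:
    """Return the reduction in response time as a percentage of the original value."""
    if before_ms <= 0:
        raise ValueError("the original response time must be positive")
    return (before_ms - after_ms) / before_ms * 100.0


if __name__ == "__main__":
    # Response times quoted in this post: 450 ms before, about 385 ms after.
    print(round(percent_improvement(450.0, 385.0), 1))
```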

If your application relies on one or more 3rd party web services, you should periodically check and report what level of service you are receiving each week. That way, you can truly understand your service provider QoS and its impact on your end user experience and application performance. You can also keep your service providers honest, with complete visibility of whether QoS is improving or degrading over time as service outages occur and are fixed.
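
A weekly service-level report can be as simple as grouping response-time samples by provider and summarizing them. The sketch below assumes you already have the samples as (service name, response time in milliseconds) pairs; the function name, the report fields, and the example service names are hypothetical, and how you export the measurements from your monitoring tool is outside its scope.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def weekly_report(samples: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Summarize average and maximum response time (ms) per web service."""
    by_service: Dict[str, List[float]] = defaultdict(list)
    for service, response_ms in samples:
        by_service[service].append(response_ms)
    return {
        service: {"avg_ms": sum(values) / len(values), "max_ms": max(values)}
        for service, values in by_service.items()
    }


if __name__ == "__main__":
    samples = [("payments", 400.0), ("payments", 500.0), ("shipping", 14000.0)]
    for service, stats in weekly_report(samples).items():
        print(service, stats)
```

Comparing the average with the maximum, week over week, shows both whether a provider is improving and how consistent its service is.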

The next time you experience a slowdown or outage in your application, you should first check external web services before you start to troubleshoot your own. The last thing you want to be doing is debugging your own code when it could be someone else’s service and code that is causing the issue. Using AppDynamics it’s possible to monitor, measure, and manage the QoS from each of your web service providers. You can get started right now by downloading AppDynamics Lite (our free edition) for a single JVM or IIS web server, or you can request a 30-day trial of AppDynamics Pro (our commercial edition) for Java or .NET applications with multiple JVMs and IIS web servers.