Proactively manage your user experience with EUM

As companies embark on digital transformations to better serve their customers, managing the performance of, and satisfaction with, each user interaction becomes ever more critical to the success of the business. Looking at breakout companies like Uber, Airbnb, and Slack, it's evident that software is at the core of their success in each industry. Consumers have gone from visiting a bank branch to make a transfer to fulfilling it instantaneously from a desktop or mobile device. If the web application is not responding, the effect is the equivalent of walking into a long line at the branch and walking right back out, leaving a negative impression of the brand. For a digital business, making every customer interaction with the digital storefront successful should be a core business objective.

So, how do we ensure that every digital interaction succeeds and performs quickly? The answer lies in using multiple approaches to manage the end-user experience. In our tool arsenal we have real-user monitoring and synthetic monitoring, which, combined with APM capabilities, help us quickly identify poorly performing transactions and triage the root cause to minimize end-user impact. Each tool covers a core area of the web application and contributes visibility into the whole experience. Real-user monitoring captures the performance and sequence of every step in the customer path, which is critical to identifying opportunities in the funnel and increasing conversions; it provides a measurement breadth impossible to achieve with synthetic tools alone and puts back-end performance into a user context. Synthetic monitoring contributes a repeatable, reproducible set of measurements focused on visual timings, which is valuable for baselining the user experience between releases, proactively capturing errors before users are impacted, benchmarking against competitors, and holding third-party content providers accountable for the performance of their content. Synthetic measurements also allow for script assertions that validate the expected content of a page is delivered in a timely way, and alert accurately when errors occur or performance deviates from the baseline.

In a recent survey sponsored by AppDynamics, over 50% of people who manage web and mobile web applications identified third-party content as a factor in delivering a good end-user experience. Modern web and mobile sites almost always contain some kind of third-party resource, from analytics tracking to social media integration to authentication functionality. Understanding how each of these components affects the end-user experience is critical to maintaining a healthy, well-performing site. Using a real-user monitoring tool like the AppDynamics Browser EUM solution, you can visualize slow content that may be affecting a page load and identify the provider. The challenge then becomes: how do you see whether this provider is living up to its performance claims, and how do you hold it accountable?

Third-party benchmarking is a capability that a synthetic monitoring solution provides best. With a synthetic transaction you can control many variables that are impossible to control in a real-user measurement: a synthetic measurement always uses the same browser and browser version, connectivity profile, and hardware configuration, and is free of spyware, viruses, and adware. Using this clean-room environment, you can measure the consistent performance of a page, and manage and track every resource downloaded, from multiple synthetic locations worldwide. When your monitoring system picks up an unusually high number of slow transactions, you can drill down and isolate the cause to either a core site slowdown or a third-party slowdown, and compare performance across synthetic locations to determine whether it is a user- or geography-specific issue or something happening across the board.

In managing the user experience, having all pertinent data in real time on a consolidated system can be the difference between a five-minute performance degradation and a five-hour site outage spent compiling and rationalizing multiple sources of discordant information. The intersection of real-user and synthetic monitoring data brings context to performance events by correlating user session information, such as engagement and conversion, with changes in third-party content performance or end-user error rates. A 360-degree view of the customer experience will help ensure a positive experience for your customers.

Interested in learning more? Make sure to watch our Winter '16 Release webinar.


Why web analytics aren’t enough!

Every production web application should use web analytics. There are many great free tools for web analytics, the most popular of which is Google Analytics. Google Analytics helps you analyze visitor traffic and paint a complete picture of your audience and their needs. Web analytics solutions provide insight into how people discover your site, what content is most popular, and who your users are. Modern web analytics also provide insight into user behavior, social engagement, client-side page speed, and the effectiveness of ad campaigns. Any responsible business owner is data-driven and should leverage web analytics solutions to learn more about their end users.

Web Analytics Landscape

Google Analytics

While Google Analytics is the most popular and the de facto standard in the industry, there are quite a few quality web analytics solutions available in the marketplace:

The Forrester Wave Report provides a good guide to choosing an analytics solution.

Forrester Wave

There are also many solutions focused on specialized web analytics that I think are worth mentioning. They are either geared towards mobile applications or getting better analytics on your customers’ interactions:

Once you understand your user demographics, it's valuable to get additional information about how performance affects your users. Web analytics tells you only one side of the story: the client side. If you are integrating web analytics, check out Segment.io, which provides analytics.js for easy integration of multiple analytics providers.

It’s all good – until it isn’t

Using Google Analytics on its own is fine and dandy – until you're having performance problems in production and need visibility into what's going on. This is where application performance management solutions come in. APM tools like AppDynamics provide the added benefit of understanding both the server side and the client side. Not only can you see application performance and user demographics in real time, but when you have problems you can use code-level visibility to understand the root cause. Application performance management is the perfect complement to web analytics: you understand not only your user demographics, but also how performance affects your customers and business. It's important to be able to see, from a business perspective, how well your application is performing in production:



Since AppDynamics is built on an extensible platform, it’s easy to track custom metrics directly from Google Analytics via the machine agent.

The end user experience dashboard in AppDynamics Pro gives you real-time visibility into where your users are suffering the most:

Profile-PageView

Capturing web analytics is a good start, but it’s not enough to get an end-to-end perspective on the performance of your web and mobile applications. The reality is that understanding user demographics and application experience are two completely separate problems that require two complementary solutions. O’Reilly has a stellar article on why real user monitoring is essential for production applications.

Get started with AppDynamics Pro today for in-depth application performance management.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Scaling our End User Monitoring Cloud

Why End User Monitoring?

In a previous post, my colleague Tom Levey explained the value of Monitoring the Real End User Experience. In this post, we will dive into how we built a service to scale to billions of users.

The “new normal” for enterprise web applications involves multiple application tiers communicating via a service-oriented architecture and interacting with several databases and third-party web services. The modern application has multiple clients, from browser-based desktops to native applications on mobile. At AppDynamics, we believe that application performance monitoring should cover all aspects of your application, from the client side to the server side all the way back to the database. The goal of end user monitoring is to provide insight into client-side performance and capture errors from modern JavaScript-intensive applications. The challenge of building an end user monitoring service is that every single request needs to be instrumented: for every request your application processes, we process a beacon. With clients like FamilySearch, Fox News, BackCountry, ManPower, and Wowcher, we have to handle millions of concurrent requests.

1geo

AppDynamics End User Monitoring enables application owners to:

  • Monitor their global audience and track end user experience across the world to pinpoint which geo-locations may be impacted by poor application performance
  • Capture end-to-end performance metrics for all business transactions, including page render time in the browser, network time, and processing time in the application infrastructure
  • Identify bottlenecks anywhere in the end-to-end business transaction flow to help operations and development teams triage problems and troubleshoot quickly
  • Compare performance across all browser types, such as Internet Explorer, Firefox, Google Chrome, Safari, iOS, and Android
  • Track JavaScript errors

“Fox News already depends upon AppDynamics for ease-of-use and rapid troubleshooting capability in our production environment,” said Ryan Jairam, Internet Operations Lead at Fox News. “What we’ve seen with AppDynamics’ End-User Monitoring release is an even greater ability to understand application performance, from what’s happening on the browser level to the network all the way down to the code in the application. Getting this level of insight and visibility for an application as complex and agile as ours has been a tremendous benefit, and we’re extremely happy with this powerful new addition to the AppDynamics Pro solution.”

EUM Cloud Service

The End User Monitoring cloud is our super-scalable platform for processing and analyzing end user requests. In this post we will discuss some of the design challenges of building a cloud service capable of supporting billions of requests, along with the underlying architecture. Once End User Experience monitoring is enabled in the controller, your application's pages are automatically instrumented with a very small piece of JavaScript that allows AppDynamics to capture critical performance metrics.


The JavaScript agent leverages the Web Episodes JavaScript timing library and the W3C Navigation Timing specification to capture end user experience metrics. Once the metrics are collected, they are pushed to the End User Monitoring cloud via a beacon for processing.
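Conceptually, the agent reads the browser's Navigation Timing marks once the page has loaded and ships a compact payload to a collector. The sketch below illustrates that flow; the payload field names and the `/eum/beacon` endpoint are illustrative assumptions, not AppDynamics' actual wire format.

```javascript
// Illustrative sketch: derive a few end-user metrics from W3C Navigation
// Timing marks. Field names and endpoint are hypothetical, not the
// AppDynamics beacon format.
function buildBeacon(t) {
  return {
    // All values in milliseconds, relative to navigationStart.
    firstByte: t.responseStart - t.navigationStart,
    domReady: t.domContentLoadedEventEnd - t.navigationStart,
    pageLoad: t.loadEventEnd - t.navigationStart,
  };
}

// In a browser, this would run after the load event, when the timing
// marks are fully populated:
// window.addEventListener('load', function () {
//   setTimeout(function () {
//     var payload = JSON.stringify(buildBeacon(performance.timing));
//     navigator.sendBeacon('/eum/beacon', payload); // or an Image() GET fallback
//   }, 0);
// });
```

The `setTimeout` matters because `loadEventEnd` is only set after the load handler itself returns; sampling it inside the handler would read zero.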

The EUM (End User Monitoring) Cloud Service is our on-demand, cloud-based, multi-tenant SaaS infrastructure that acts as an aggregator for all EUM metric traffic. EUM metrics from end user browsers across different customers are reported to the EUM Cloud Service, where the raw browser information is verified, aggregated, and rolled up. All AppDynamics Controllers (SaaS or on-premise) connect to the EUM Cloud Service to download metrics every minute, for each application.

Design Challenges

On-demand, highly available

End users access customer web applications from anywhere in the world, at any time of day, across time zones; whenever an AppDynamics-instrumented web page is accessed, EUM metrics are reported from the browser to the EUM Cloud Service. This requires a highly available, on-demand system accessible from different geographic locations and time zones.

Extremely concurrent usage

End users of all AppDynamics customers using the EUM solution continuously report browser information to the same EUM Cloud Service, which processes all of it concurrently, generating metrics and collecting snapshot samples continuously.

High Scalability

Usage patterns differ across applications throughout the day, so the number of records to be processed by the EUM Cloud varies by application and time. The EUM Cloud Service automatically scales up to handle any surge in incoming records, and scales back down under lower load.

Multi-tenancy support

The EUM Cloud Service processes EUM metrics reported from different applications across different customers, so the service must provide multi-tenancy. The reported browser information is partitioned by customer and application, and the service provides a mechanism for each customer's controller to download aggregated metrics and snapshots based on customer and application identification.

Cost

The EUM Cloud Service needs to scale dynamically based on demand. The problem with supporting massive scale on our own hardware is that we would have to pay for it upfront and over-provision to handle huge spikes; one of the motivating factors in choosing Amazon Web Services is that costs scale linearly with demand.

Architecture

The EUM Cloud Service is hosted on Amazon Web Services infrastructure for horizontal scaling. The service has two functional components, collectors and aggregators; multiple instances of each work in parallel to collect and aggregate the EUM metrics received from end user browsers and devices. Transient metric data is stored in Amazon S3 buckets, while metadata about applications and other configuration is stored in Amazon DynamoDB tables.

A single page load sends one or more beacons: one for the base page, one for each iframe onload, and one per Ajax request. JavaScript errors occurring after page load are also sent as error beacons.

The collector nodes receive the metric data from the browser and process it for the controller:

  • Resolve the geographic information (country/region/city of the request) and add it to the metric using an in-process MaxMind geo-resolver
  • Parse the User-Agent header and add browser, device, and OS information to the metrics
  • Validate the incoming browser-reported metrics and discard invalid ones
  • Mark metrics/snapshots as SLOW or VERY SLOW based on a dynamic standard deviation algorithm or a static threshold
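The last step above, tagging outliers, can be sketched as a simple dynamic threshold: compare each load time against the mean and standard deviation of recent measurements. The 2x and 3x multipliers below are illustrative assumptions, not the service's actual tuning.

```javascript
// Sketch of a dynamic threshold classifier: a page view is SLOW when it
// exceeds mean + 2 standard deviations of recent load times, VERY SLOW
// beyond mean + 3. Multipliers are illustrative, not AppDynamics' tuning.
function classify(loadTimeMs, recentLoadTimesMs) {
  const n = recentLoadTimesMs.length;
  const mean = recentLoadTimesMs.reduce((a, b) => a + b, 0) / n;
  const variance =
    recentLoadTimesMs.reduce((a, b) => a + (b - mean) * (b - mean), 0) / n;
  const sd = Math.sqrt(variance);
  if (loadTimeMs > mean + 3 * sd) return 'VERY SLOW';
  if (loadTimeMs > mean + 2 * sd) return 'SLOW';
  return 'NORMAL';
}
```

A production version would of course maintain the mean and variance incrementally per page and per application rather than rescanning a window, but the classification idea is the same.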

Load Testing

For maximum scalability, we leverage Amazon Web Services' global presence for optimal performance in every region (Virginia, Oregon, Ireland, Tokyo, Singapore, Sao Paulo). In our most recent load test, we drove about 6.5 billion requests per day through the system without breaking a sweat, and the system is designed to scale up further as needed.

Check out your end user experience data in AppDynamics

4breakdown

Find out more about AppDynamics Pro and get started monitoring your application with a free 15 day trial.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Monitoring the Real End User Experience

Web application performance is fundamentally associated, in the mind of the end user, with a brand's reliability, stability, and credibility. Slow or unstable performance is simply not an option when your users are only a click away from taking their business elsewhere. Understanding your users, the locations they come from, and the devices and browsers they use is crucial to ensuring customer loyalty and growth.

Today’s modern web applications are architected to be highly interactive, often executing complex client side logic in order to provide a rich and engaging experience to the user. This added complexity means it is no longer good enough to simply measure the effects users have on the back-end. It is also necessary to measure and optimize the client-side performance to ensure the best possible experience for your users.

Determining the root cause of poor user experience is a costly and time-consuming activity that requires visibility into page composition, JavaScript error diagnostics, network latency metrics, and AJAX/iframe performance.

Let’s take a look at a few of the key features available in AppDynamics 3.7 which simplify troubleshooting these problems.

End User Experience dashboard:

The first view we will look at reports EUM data by geographic location showing which regions have the highest loads, the longest response times, the most errors, etc.

1geo
The dashboard is split into three main panels:

  • A main panel in the upper left that displays geographic distribution on a map or a grid
  • A panel on the right displaying summary information: total end user response time, page render time, network time and server time
  • Trend graphs in the lower part of the dashboard that dynamically display data based on the level of information displayed in the other two panels

The geographic region for which the data is displayed throughout the dashboard is based on the region currently shown on the map or in the summary panel. For example, if you zoom down from global view to France in the map, the summary panel and the graphs display data only for France.

This view is key to understanding the geographical impact of any network or CDN latency issues. You can also see which geographies are busiest, driving the highest throughput of your application.

Browsers and Devices:

From the Browsers and Devices tabs you can see the distribution of devices, browsers, and browser versions, giving you an understanding of the most popular access mechanisms for your application's users and how they split by region. From here we can determine whether a particular browser or device is delivering a reduced experience to the end user, and plan the best areas for optimisation.

2browserdevice

Troubleshooting End User Experience:

The user response breakdown shown below is the first place we look to troubleshoot why a user is experiencing slow response times. It provides a full breakdown of where the overall time is spent during the various stages of a page render, highlighting issues such as network latency, poor page design, and too much time spent parsing HTML or downloading and executing JavaScript.


Response time metric breakdown

First Byte Time is the interval between the time that a user initiates a request and the time that the browser receives the first response byte.

Server Connection Time is the interval between the time that a user initiates a request and the start of fetching the response document from the server. This includes time spent on redirects, domain lookups, TCP connects and SSL handshakes.

Response Available Time is the interval between the beginning of the processing of the request on the browser to the time that the browser receives the response. This includes time in the network from the user’s browser to the server.

Front End Time is the interval between the arrival of the first byte of text response and the completion of the response page rendering by the browser.

Document Ready Time is the time to make the complete HTML document (DOM) available.

Document Download Time is the time for the browser to download the complete HTML document content.

Document Processing Time is the time for the browser to build the Document Object Model (DOM).

Page Render Time is the time for the browser to complete the download of remaining resources, including images, and finish rendering the page.
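These intervals map naturally onto the W3C Navigation Timing marks exposed by the browser. The sketch below is one plausible reading of the definitions above, not AppDynamics' published formulas, so treat each mapping as an assumption.

```javascript
// One plausible mapping of the response time breakdown onto W3C
// Navigation Timing marks (all values in milliseconds). These formulas
// interpret the definitions in the text; they are not AppDynamics'
// exact calculations.
function responseBreakdown(t) {
  return {
    firstByteTime: t.responseStart - t.navigationStart,
    // Redirects, DNS lookup, TCP connect, and SSL handshake all happen
    // before requestStart.
    serverConnectionTime: t.requestStart - t.navigationStart,
    frontEndTime: t.loadEventEnd - t.responseStart,
    documentReadyTime: t.domContentLoadedEventEnd - t.navigationStart,
    documentDownloadTime: t.responseEnd - t.responseStart,
    documentProcessingTime: t.domComplete - t.domLoading,
    // Remaining resources (images, etc.) after the DOM is ready.
    pageRenderTime: t.loadEventEnd - t.domContentLoadedEventEnd,
  };
}
```

Calling `responseBreakdown(performance.timing)` after the load event yields the same categories shown in the snapshot view.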

AppDynamics EUM reports on three different kinds of pages:

  • A base page represents what an end user sees in a single browser window. 
  • An iframe is a page embedded in another page.
  • An Ajax request is a request for data sent from a page asynchronously.

Notifications can be configured to trigger on any of these.

JavaScript error detection

JavaScript error detection provides alerting and identification of the root cause of JavaScript errors in minutes, highlighting the JavaScript file, line number, and exception message for every error seen by your real users.
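In the browser, this kind of capture typically hangs off `window.onerror`, which hands the agent exactly those pieces of information. A minimal sketch follows; the payload shape and `/eum/error` endpoint are illustrative assumptions, not the actual agent internals.

```javascript
// Sketch of browser-side JavaScript error capture: window.onerror
// supplies the message, script URL, and line/column, which we package
// into an error beacon. Payload shape and endpoint are illustrative.
function buildErrorBeacon(message, file, line, col) {
  return {
    type: 'js-error',
    message: String(message),
    file: file || '(inline)',
    line: line || 0,
    column: col || 0,
    when: Date.now(),
  };
}

// In a browser:
// window.onerror = function (message, file, line, col) {
//   var payload = JSON.stringify(buildErrorBeacon(message, file, line, col));
//   navigator.sendBeacon('/eum/error', payload);
//   return false; // let the error reach the console as usual
// };
```

Note that scripts served from other origins report only "Script error." to `onerror` unless they are loaded with CORS headers and a `crossorigin` attribute, which is why third-party errors sometimes arrive without file and line details.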


Server-side correlation

If the above isn't enough and you want to look into execution within the datacentre, you can drill in context from any detailed browser snapshot directly into the corresponding call stack trace in the application servers behind it. This provides end-to-end visibility of a user's interaction with your web application, from the browser all the way through the datacentre and deep into the database.

4breakdown

Deployment and scalability:

Deployment is simple – all you have to do is add a few lines of JavaScript to the web pages you want to monitor. We’ll even auto-inject this JavaScript on certain platforms at runtime. With its elastic public cloud architecture, AppDynamics EUM is designed to support billions of devices and user sessions per day, making it a perfect fit for enterprise web applications.

See Everything:

With AppDynamics you’ll get visibility into the performance of pages, AJAX requests and iframes, and you can see how performance varies by geographic region, device and browser type. In addition, you’ll get a highly granular browser response time breakdown (using the Navigation Timing API) for each snapshot, allowing you to see exactly how much time is spent in the network and in rendering the page. And if that’s not enough, you’ll see all JavaScript errors occurring at the end user’s browser down to the line number.

If you don't currently know exactly what experience your users are getting when they access your applications, or your users are complaining and you don't know why, then check out AppDynamics End User Monitoring for free at appdynamics.com.

Manpower Group Sees Real Results from End User Monitoring

Some companies talk about monitoring their end user experience and other companies take the bull by the horns and get it done. For those who have successfully implemented EUM (RUM, EUEM, or whatever your favorite acronym is) the technology is rewarding for both the company and the end user alike. I recently had the opportunity to discuss AppDynamics EUM with one of our customers and the information shared with me was exciting and gratifying.

The Environment

ManpowerGroup monitors their intranet and internet applications with AppDynamics. These applications are used for internal operations as well as customer-facing websites in support of their global business, and are accessed from around the world, 24×7. We're talking about business-critical, revenue-generating applications!

I asked Fred Graichen, Manager of Enterprise Application Support, why he thought ManpowerGroup needed EUM.

“One of the key components for EUM is to shed light on what is happening in the “last mile”. Our business involves supporting branch locations. Having an EUM tool allows us to compare performance across all of our branches. This also helps us determine whether any performance issues are localized. Having the insight into the difference in performance by location allows us to make more targeted investments in local hardware and network infrastructure.”

Meaningful Results

Turning on a monitoring tool doesn’t mean you’ll automagically get the results you want. You also need to make sure your tool is integrated with your people, processes, and technologies. That’s exactly what ManpowerGroup has done with AppDynamics EUM. They have alerts based upon EUM metrics that get routed to the proper people. They are then able to correlate the EUM information with data from other (Network) monitoring tools in their root cause analysis. Below is an EUM screen shot from ManpowerGroup’s environment.

MPG EUM

By implementing AppDynamics EUM, ManpowerGroup has been able to:

  • Identify locations that are experiencing the worst performance.
  • Successfully illustrate the difference in performance globally as well. (This is key when studying the impact of latency on applications that are accessed from other countries but located in a central datacenter.)
  • Quickly identify when a certain location is seeing performance issues and correlate that with data from other monitoring solutions.

But what does all of this mean to the business? It means that ManpowerGroup has been able to find and resolve problems faster for their customers and employees. Faster application response time combined with happier customers and more productive employees all contribute to a healthier bottom line for ManpowerGroup.

ManpowerGroup is using AppDynamics EUM to bring a higher level of performance to its employees, customers, and shareholders. Sign up for a free trial today and begin your journey to a healthier bottom line.

Synthetic vs Real-User Monitoring: A Response to Gartner

Recently Jonah Kowall of Gartner released a research note titled “Use Synthetic Monitoring to Measure Availability and Real-User Monitoring for Performance”. After reading this paper I had some thoughts I wanted to share, based upon my experience as a Monitoring Architect (and certifiable performance geek) working within large enterprise organizations. I highly recommend reading the research note, as the information and findings within are spot on and highlight important differences between synthetic and real-user monitoring as applied to availability and performance.

My Apps Are Not All 24×7

During my time working at a top-10 investment bank I came across many different applications with varying service level requirements. I say requirements because there were rarely any agreements or contracts in place, usually just an organizational understanding of how important each application was to the business and the expected service level. Many of the applications in the investment bank's portfolio were only used during the trading hours of the exchanges they interfaced with. These applications had to be available right as the exchanges opened and performing well for the entire duration of trading activity. Having no real user activity outside those hours meant that the only way to gain insight into availability and performance was to use synthetically generated transactions.

Was this an ideal situation? No, but it was all we had to work with in the absence of real user activity. If the synthetic transactions were slow or throwing errors at least we could attempt to repair the platform before the opening bell. Once the trading day got started we measured real user activity to see the true picture of performance and made adjustments based upon that information.


Can’t Script It All

Having to rely upon synthetic transactions as a measure of availability and performance is definitely suboptimal. The problem gets amplified in environments where you shouldn't be testing certain application functionality due to regulatory and other restrictions. Do you really want to be trading securities, derivatives, currencies, etc. with your synthetic transaction monitoring tool? Methinks not!

So if you rely upon synthetic transactions alone, there is a gaping hole in your monitoring strategy. You can't test all of your business-critical functionality even if you wanted to spend the long hours scripting and testing your synthetics. The scripting and testing investment gets amplified when your application code changes: if a code update changes the application's responses, you will need to re-script for the new responses. It's an evil cycle that doesn't happen when you use the right kind of real user monitoring.

Real User Monitoring: Accurate and Meaningful

When you monitor real user transactions you will get more accurate and relevant information. Here is a list (what would a good blog post be without a list?) of some of the benefits:

  • Understand exactly how your application is being used.
  • See the performance of each application function as the end user does, not just within your data center.
  • Avoid scripting, which can take a significant amount of time and resources.
  • Ensure full visibility of application usage and performance, not just what was scripted.
  • Understand the real geographic distribution of your users and its impact on end user experience.
  • Track the performance of your most important users (particularly useful in trading environments).

Conclusion

Synthetic transaction monitoring and real user monitoring can definitely co-exist within the same application environment. Every business is different and has its own unique requirements that can impact the type of monitoring you choose to implement. If you've not yet read the Gartner research note, I suggest you go check it out now. It provides a solid analysis of synthetic and real user monitoring tools, companies, and usage scenarios, which is completely different from what I have covered here.

Has synthetic or real transaction monitoring saved the day for your company? I'd love to hear about it in the comments below.

UX – Monitor the Application or the Network?

Last week I flew into Las Vegas for #Interop fully suited and booted in my big blue costume (no joke). I’d been invited to speak in a vendor debate on User eXperience (UX): Monitor the Application or the Network? NetScout represented the Network, AppDynamics (and me) represented the Application, and “Compuware dynaTrace Gomez” sat on the fence representing both. Moderating was Jim Frey from EMA, who did a great job introducing the subject, asking the questions and keeping the debate flowing.

At the start each vendor gave their usual intro and company pitch, followed by their own definition on what User Experience is.

Defining User Experience

So at this point you'd probably expect me to blabber on about how application code and agents are critical for monitoring the UX? Wrong. For me, users experience “Business Transactions”; they don't experience applications, infrastructure, or networks. When a user complains, they normally say something like “I can't log in” or “My checkout timed out.” I can honestly say I've never heard one say “The CPU utilization on your machine is too high” or “I don't think you have enough memory allocated.”

Now think about that from a monitoring perspective. Do most organizations today monitor business transactions? Or do they monitor application infrastructure and networks? The truth is the latter, normally with several toolsets. So the question “Monitor the Application or the Network?” is really the wrong question for me. Unless you monitor business transactions, you are never going to understand what your end users actually experience.

Monitoring Business Transactions

So how do you monitor business transactions? The reality is that both application and network monitoring tools are capable of it, but most solutions have been designed not to, providing instead a more technical view for application developers and network engineers. This is wrong, very wrong, and a primary reason why IT never sees what the end user sees or complains about. Today, SOA means applications are more complex and distributed: a single business transaction may traverse multiple applications that potentially share services and infrastructure. If your monitoring solution doesn't have business transaction context, you're basically blind to how application infrastructure is impacting your UX.

The debate then switched to how monitoring the UX differs from an application and network perspective. Simply put, application monitoring relies on agents, while network monitoring relies on passively sniffing network traffic. My point here was that you can monitor user experience with the network, but you can only manage it with the application. For example, with network monitoring you only see business transactions and the application infrastructure, because you’re monitoring at the network layer. In contrast, with application monitoring you see business transactions, application infrastructure, and the application logic (hence why it’s called application monitoring).

Monitor or Manage the UX?

Both application and network monitoring can identify and isolate UX degradation, because they see how a business transaction executes across the application infrastructure. However, you can only manage UX if you understand what’s causing the degradation. To do this you need deep visibility into the application run-time and logic (code). Operations telling a Development team that their JVM is responsible for a user experience issue is a bit like FedEx telling a customer their package is lost somewhere in Alaska. Identifying and isolating pain is useful, but one could argue it’s pointless without being able to manage and resolve the pain by finding the root cause.

NetScout made the point that with network monitoring you can identify common bottlenecks in the network that are responsible for degrading the UX. I have no doubt you could, but if you look at the most common reason for UX issues, it’s related to change–and if you look at what changes the most, it’s application logic. Why? Because Development and Operations teams want to be agile so their applications and business remain competitive in the marketplace. Agile release cycles mean application logic (code) constantly changes. It’s therefore not unusual for an application to change several times a week, and that’s before you count hotfixes and patches. So if applications change more than the network, one could argue the application is the more effective place to monitor and manage the end user experience.

UX and Web Applications

We then debated which monitoring concept was better for web-based applications. Obviously, network monitoring is able to monitor the UX by sniffing HTTP packets passively, so it’s possible to get granular visibility on QoS in the network and application. However, the recent adoption of Web 2.0 technologies (Ajax, GWT, Dojo) means application logic is now moving from the application server to the user’s browser, so browser processing time becomes a critical part of the UX. Unfortunately, network monitoring solutions can’t monitor browser processing latency (because they monitor the network), unlike application monitoring solutions, which can use techniques like client-side instrumentation or web-page injection to obtain browser latency for the UX.

The C Word

We then got to the Cloud and which made more sense for monitoring UX. Well, network monitoring solutions are normally hardware appliances which plug directly into a network tap or span port. I’ve never asked, but I’d imagine the guys in Seattle (Amazon) and Redmond (Windows Azure) probably wouldn’t let you wheel a network monitoring appliance into their data-centre. More importantly, why would you need to if you’re already paying someone else to manage your infrastructure and network for you? Moving to the Cloud is about agility, and letting someone else deal with the hardware and pipes so you can focus on making your application and business competitive. It’s actually very easy for application monitoring solutions to monitor UX in the cloud. Agents can piggyback with application code libraries when they’re deployed to the cloud, or cloud providers can embed and provision vendor agents as part of their server builds and provisioning process.

What’s also interesting is that Cloud is highlighting a trend towards DevOps (or NoOps for a few organizations) where Operations becomes more focused on applications than infrastructure. As the network and infrastructure become abstracted in the Public Cloud, the focus naturally shifts to the application and deployment of code. For private clouds you’ll still have network Ops and Engineering teams that build and support the Cloud platform, but they wouldn’t be the people who care about user experience. Those people would be the Line of Business or application owners whom the UX impacts.

In reality most organizations today already monitor the application infrastructure and network. However, if you want to start monitoring the true UX, you should monitor what your users experience, and that is business transactions. If you can’t see your users’ business transactions, you can’t manage their experience.

What are your thoughts on this?

AppDynamics is an application monitoring solution that helps you monitor business transactions and manage the true user experience. To get started, sign up for a 30-day free trial here.

I did have an hour spare at #Interop after my debate to meet and greet our competitors, before flying back to AppDynamics HQ. It was nice to see many of them meet and greet the APM Caped Crusader.

App Man.

Finding the Root Cause of Application Performance Issues in Production

The most enjoyable part of my job at AppDynamics is to witness and evangelize customer success. What’s slightly strange is that for this to happen, an application has to slow down or crash.

It’s a bittersweet feeling when End Users, Operations, Developers and many Businesses suffer application performance pain. Outages cost the business money, but sometimes they cost people their jobs–which is truly unfortunate. However, when people solve performance issues, they become overnight heroes with a great sense of achievement, pride, and obviously relief.

To explain the complexity of managing application performance, imagine your application is 100 haystacks that represent tiers, and somewhere a needle is hurting your end user experience. It’s your job to find the needle as quickly as possible! The problem is, each haystack has over half a million pieces of hay, and they each represent lines of code in your application. It’s therefore no surprise that organizations can take days or weeks to find the root cause of performance issues in large, complex, distributed production environments.

End User Experience Monitoring, Application Mapping and Transaction Profiling will help you identify unhappy users, slow business transactions, and problematic haystacks (tiers) in your application, but they won’t find needles. To do this, you’ll need x-ray visibility inside haystacks to see which pieces of hay (lines of code) are holding the needle (root cause) that is hurting your end users. This x-ray visibility is known as “Deep Diagnostics” in application monitoring terms, and it represents the difference between isolating performance issues and resolving them.

For example, AppDynamics has great End User Monitoring, Business Transaction Monitoring, Application Flow Maps and very cool analytics all integrated into a single product. They all look and sound great (honestly they do), but they only identify and isolate performance issues to an application tier. This is largely what Business Transaction Management (BTM) and Network Performance Management (NPM) solutions do today. They’ll tell you what and where a business transaction slows down, but they won’t tell you the root cause so you can resolve the issues.

Why Deep Diagnostics for Production Monitoring Matters

A key reason why AppDynamics has become very successful in just a few years is because our Deep Diagnostics, behavioral learning, and analytics technology is 18 months ahead of the nearest vendor. A bold claim? Perhaps, but it’s backed up by bold customer case studies such as Edmunds.com and Karavel, who compared us against some of the top vendors in the application performance management (APM) market in 2011. Yes, End User Monitoring, Application Mapping and Transaction Profiling are important–but these capabilities will only help you isolate performance pain, not resolve it.

AppDynamics has the ability to instantly show the complete code execution and timing of slow user requests or business transactions for any Java or .NET application, in production, with incredibly small overhead and no configuration. We basically give customers a metal detector and X-Ray vision to help them find needles in haystacks. Locating the exact line of code responsible for a performance issue means Operations and Developers solve business pain faster, and this is a key reason why AppDynamics technology is disrupting the market.

Below is a small collection of needles that customers found using AppDynamics in production. The simple fact is that complete code visibility allows customers to troubleshoot in minutes as opposed to days and weeks. Monitoring with blind spots and configuring instrumentation are a thing of the past with AppDynamics.

Needle #1 – Slow SQL Statement

Industry: Education
Pain: Key Business Transaction with 5 sec response times
Root Cause: Slow JDBC query with full-table scan
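
To make the full-table-scan case concrete, here’s a minimal sketch using SQLite. The customer’s actual schema and database aren’t shown above, so the table and index names below are invented purely for illustration; the point is how the query plan flips from a scan to an index search once a suitable index exists:

```python
import sqlite3

# Illustrative only: invented schema showing how an unindexed predicate
# forces a full-table scan, and how adding an index changes the plan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollments (id INTEGER PRIMARY KEY, student_id INTEGER, course TEXT)")
conn.executemany("INSERT INTO enrollments (student_id, course) VALUES (?, ?)",
                 [(i % 1000, "math") for i in range(10000)])

def plan(sql):
    # EXPLAIN QUERY PLAN returns rows whose last column describes each step
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

slow = plan("SELECT * FROM enrollments WHERE student_id = 42")   # SCAN: reads every row
conn.execute("CREATE INDEX idx_student ON enrollments (student_id)")
fast = plan("SELECT * FROM enrollments WHERE student_id = 42")   # SEARCH ... USING INDEX

print(slow)
print(fast)
```

On a production engine the same diagnosis comes from the database’s EXPLAIN output, and the fix is usually an index matching the WHERE clause, not more hardware.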

Needle #2 – Slice of Death in Cassandra

Industry: SaaS Provider
Pain: Key Business Transaction with 2.5 sec response times
Root Cause: Slow Thrift query in Cassandra

Needle #3 – Slow & Chatty Web Service Calls

Industry: Media
Pain: Several Business Transactions with 2.5 min response times
Root Cause: Excessive Web Service Invocation (5+ per trx)

Needle #4 – Extreme XML Processing

Industry: Retail/E-Commerce
Pain: Key Business Transaction with 17 sec response times
Root Cause: XML serialization over the wire.

Needle #5 – Mail Server Connectivity

Industry: Retail/E-Commerce
Pain: Key Business Transaction with 20 sec response times
Root Cause: Slow Mail Server Connectivity

Needle #6 – Slow ResultSet Iteration

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 30+ sec response times
Root Cause: Querying too much data

Needle #7 – Slow Security 3rd Party Framework

Industry: Education
Pain: All Business Transactions with > 3 sec response times
Root Cause: Slow 3rd party code

Needle #8 – Excessive SQL Queries

Industry: Education
Pain: Key Business Transactions with 2 min response times
Root Cause: Thousands of SQL queries per transaction
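
“Thousands of SQL queries per transaction” is almost always the classic N+1 pattern: one query for a list, then one more query per row. A small sketch (invented schema, not the customer’s) shows the round-trip count collapsing when the per-row lookup is folded into a single JOIN:

```python
import sqlite3

# Invented schema to illustrate the N+1 anti-pattern.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE grades (student_id INTEGER, score INTEGER);
""")
conn.executemany("INSERT INTO students VALUES (?, ?)", [(i, "s%d" % i) for i in range(500)])
conn.executemany("INSERT INTO grades VALUES (?, ?)", [(i, 90) for i in range(500)])

queries = 0
def run(sql, args=()):
    global queries
    queries += 1                     # count round trips to the database
    return conn.execute(sql, args).fetchall()

# N+1: one query for the list, then one per row = 501 round trips.
for sid, _name in run("SELECT id, name FROM students"):
    run("SELECT score FROM grades WHERE student_id = ?", (sid,))
n_plus_one = queries

# The same result set via a single JOIN = 1 round trip.
queries = 0
rows = run("SELECT s.name, g.score FROM students s JOIN grades g ON g.student_id = s.id")
print(n_plus_one, queries)  # 501 1
```

With an in-process database the N+1 version merely looks silly; over a network, each of those 501 round trips adds latency, which is how a transaction ends up taking minutes.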

Needle #9 – Commit Happy

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 25+ sec response times
Root Cause: Unnecessary use of commits and transaction management.
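
“Commit happy” usually means a commit after every statement, so one logical unit of work becomes thousands of tiny transactions, each a disk sync on a real database. A hedged sketch with an invented schema (the counter just makes the transaction boundaries visible):

```python
import sqlite3

# Invented schema; the point is the transaction pattern, not the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")

commits = 0
def commit():
    global commits
    conn.commit()
    commits += 1                     # each commit is a transaction boundary

# Commit-happy: 1,000 rows, 1,000 commits (1,000 syncs on a real disk).
for i in range(1000):
    conn.execute("INSERT INTO orders VALUES (?)", (i,))
    commit()
commit_happy = commits

# Batched: the same 1,000 rows inside one transaction, one commit.
commits = 0
for i in range(1000, 2000):
    conn.execute("INSERT INTO orders VALUES (?)", (i,))
commit()
print(commit_happy, commits)  # 1000 1
```

The fix is to commit per unit of work, not per statement, and to let the framework’s transaction management own the boundary instead of sprinkling commits through the code.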

Needle #10 – Locking under Concurrency

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 5+ sec response times
Root Cause: Non-Thread safe cache forces locking for read/write consistency
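
The customer’s cache implementation isn’t shown, so here is an invented Python sketch of the pattern: a cache that holds one global lock for the entire load serializes every concurrent reader, while holding the lock only around the dictionary access lets the expensive work run in parallel:

```python
import threading

class CoarseCache:
    """Anti-pattern: the lock is held for the whole (slow) load."""
    def __init__(self, loader):
        self._lock = threading.Lock()
        self._data = {}
        self._loader = loader

    def get(self, key):
        with self._lock:             # every reader queues behind the load
            if key not in self._data:
                self._data[key] = self._loader(key)
            return self._data[key]

class FineCache:
    """Lock only guards the dict; the expensive load runs outside it."""
    def __init__(self, loader):
        self._lock = threading.Lock()
        self._data = {}
        self._loader = loader

    def get(self, key):
        with self._lock:             # quick check under the lock
            if key in self._data:
                return self._data[key]
        value = self._loader(key)    # expensive work, no lock held
        with self._lock:             # publish; first writer wins
            self._data.setdefault(key, value)
            return self._data[key]
```

The trade-off: FineCache may load a key twice under a race (harmless if the loader is idempotent), but readers of already-cached keys never block behind a slow load.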

Needle #11 – Slow 3rd Party Search Service

Industry: SaaS Provider
Pain: Key Business Transaction with 2+ min response times
Root Cause: Slow 3rd Party code

Needle #12 – Connection Pool Exhaustion

Industry: Financial Services
Pain: Several Business Transactions with 7+ sec response times
Root Cause: DB Connection Pool Exhaustion caused by excessive connection pool invocation & queries
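
Connection pool exhaustion is usually a leak: code paths that acquire a connection and never return it, often on the error path. A toy sketch of the failure mode and the context-manager fix (all names are invented for illustration, nothing here is an AppDynamics API):

```python
import queue
import contextlib

class Pool:
    """Toy bounded connection pool backed by a queue."""
    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put("conn-%d" % i)

    def acquire(self, timeout=0.1):
        try:
            return self._free.get(timeout=timeout)
        except queue.Empty:
            raise RuntimeError("pool exhausted")

    def release(self, conn):
        self._free.put(conn)

    @contextlib.contextmanager
    def connection(self):
        conn = self.acquire()
        try:
            yield conn
        finally:
            self.release(conn)       # always returned, even on error

# Leak: three callers take connections and never release them,
# so the fourth caller starves even though the app is "idle".
leaky = Pool(size=3)
for _ in range(3):
    leaky.acquire()
try:
    leaky.acquire()
    exhausted = False
except RuntimeError:
    exhausted = True

# Fix: scoped acquisition; ten requests happily share three connections.
safe = Pool(size=3)
for _ in range(10):
    with safe.connection():
        pass                         # do work with the connection
```

In a real application the same discipline comes from try/finally or try-with-resources around every checkout, plus keeping the number of queries per checkout small.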

Needle #13 – Excessive Cache Usage

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 50+ sec response times
Root Cause: Cache Sizing & Configuration

If you want to manage and troubleshoot application performance in production, you should seriously consider AppDynamics. We’re the fastest growing on-premise and SaaS based APM vendor in the market right now. You can download our free product AppDynamics Lite or take a free 30-day trial of AppDynamics Pro – our commercial product.

Now go find those needles that are hurting your end users!

App Man.

Spot the Difference in AppDynamics 3.4

Okay, admit it. The title Spot the Difference in AppDynamics 3.4 caught your attention, especially after we recently announced our free End User Monitoring features, and now you’re here to find out what other insanely cool stuff we’ve been working on to enhance the experience for APM aficionados, customers and people like you! I’m proud to say that AppDynamics continues to innovate by leaps and bounds, which enables our customers to be more successful in how they manage application performance and availability.

Here’s an example of us staying ahead of the curve. I was scouring Twitter feeds several weeks back and found this tweet from a Java applications guy:


Hmmmm, a proverbial question indeed.

So what exactly changes with an application release?

Flashback to 2008…I ran into a painfully similar situation back at LG. We had a team of consultants working on Java performance optimizations for eight months, executing test cases, refactoring application code, disabling trigger objects and even redesigning some of the use case workflows. Well, it turns out the build and deployment team pushed the entire application onto a completely different set of infrastructure, and when the new test results came back we almost choked when we saw 20-30% regressions in application performance. That night the performance team drank itself into oblivion, and you can imagine what happened next.

For the next two weeks, everyone on the project was heads-down trying to identify what changes had caused such a massive slowdown to the system. Remember, we didn’t have an APM solution like AppDynamics, and had to resort to using four different performance monitoring tools, since no single solution provided us with complete end-to-end visibility. Those were tumultuous times when Ops, Dev and IT teams engaged in heated arguments about who was at fault.

All of this could have been easily avoided if we had something like AppDynamics 3.4 Agile Release Analysis to compare application releases visually. This type of comparative analysis capability comes in handy when you’re trying to understand why a user login transaction, for instance, is slower with your latest agile release than it was a week ago. You could always run code diffs on your releases, but sometimes it’s incredibly valuable to be able to visualize performance and infrastructure changes at a glance rather than spend hours or even days manually verifying what differences actually occurred.

Agile Release Analysis

AppDynamics release analysis rolls up application performance metrics so you can compare KPIs of not only the application, but also individual business transactions, as well as their flow and execution. The application might have gotten slower, but being able to identify which piece of the overall application or specific transaction is affecting performance is much more useful to operations and IT.

Speaking of comparing Application Flow Maps, for this feature to be truly tenable we needed to address the visibility challenges customers were facing with our original flow map. We had the right idea, but it was too rigid. Monitoring several thousand tiers became blinding for some customers, leaving App Ops with a cluttered forest and no trees. So, to stay true to our company mission of offering customers maximum visibility with minimal effort, we improved the viewing experience so you can now navigate your entire application just like you’d navigate the world with Google Maps. The next time you’re notified there’s an issue with a transaction traversing a particular node in a cluster, you can feel confident it’ll be easy to spot and zoom in on. The application flow maps now change color in real time depending on the SLA and performance of application tiers and the flows that connect them. For example, here is a screenshot of the new flow map with baseline comparison enabled:

Zoom in and out of your App Flow Map


Introducing Role-Based Perspectives

Let’s assume the role of a database administrator for a moment. Sure, they might look at Java or .NET code once in a blue moon, but their technical domain is managing the backend of the application. So in 3.4 we decided to streamline the troubleshooting process from a role-based perspective to help our respective backend admins “get to the point” and view information pertinent to their role.

“Show me the top SQL calls for my database.” Done.

“Okay, now which business transactions do my SQL queries impact?” You can think of it as a “Where Used” lookup that allows the DBA to analyze what they’re interested in from a role-based perspective and see how queries are contributing to business transaction performance. The Backend Specialists – DBA, Message Broker, ESB and Security Administrators – can now say with conviction, “I optimized my backend service that was impacting the user experience of several mission-critical business transactions.” That’s what I call getting bang-for-the-buck performance optimizations!

Reporting on Application AND Business Activity

AppDynamics 3.4 also includes a new PDF reporting engine so you can generate, save and share reports detailing application health metrics. We’ve introduced several prepackaged reports as well as the ability to create your own custom reports. All of the aforementioned cool 3.4 features offer some amazing technical benefits that speak to those in DevOps, but at the end of the day, someone is going to ask how this is impacting your business’s bottom line. So in 3.4, we now allow you to build a dashboard that monitors business activity for your most revenue-critical apps. Granted, our technology is already pretty slick at auto-discovering and baselining your application’s performance without the need for manual instrumentation. However, there may be users or scenarios where you want to monitor the activity and revenue of your application rather than, say, its performance or throughput. No problem–you can exploit “Information Points” in AppDynamics to track any business metric or value that is part of a business transaction. Our intuitive dashboard builder lets you expose any metric with drag-and-drop widgets, meaning you can create powerful business views in minutes.

There are a number of other powerful features and product enhancements in AppDynamics 3.4 you can start exploring today by registering here for a 30-day free trial. If you’re already an AppDynamics customer, we look forward to hearing more feedback on how we can improve the experience even further and want to say thanks again! We hope you’re as delighted as we are about our latest release.

APM vs NPM. 2nd Round K.O.

Round Two – Last time I wrote a blog comparing APM versus network-based APM tools, which I still consider NPM at its core, regardless of what some critics and competitors claim. Let me make one thing clear though: NPM is great for equipping IT network administrators to see how fast or slow data is traveling through the pipes of their application. Unfortunately, network-based APM tools simply cannot give App Ops granular visibility into the application runtime when isolating bottlenecks goes beyond the system level to the final destination–the end user’s browser.

I find several of the blogs and YouTube clips from such NPM vendors quite comical as they try to throw punches at APM companies. Their arguments center primarily on agent-based approaches being inadequate APM solutions for today’s fickle and distributed application architectures. It’s not like I haven’t heard it before.

The amusing thing about it…they’re completely right! In fact, we couldn’t agree more, and that’s why Jyoti Bansal founded AppDynamics: to address the perennial shortcomings legacy APM vendors have been ignoring. From the smallest businesses to the largest enterprises, complex applications have outpaced App Ops teams’ current set of monitoring tools. That’s why AppDynamics is reinventing and reigniting the application performance management space by enabling IT operations to monitor complex, modern applications running in the cloud or the data center. So let me respond to the claims they’ve made.

The Claims

“Agents have high deployment and ongoing maintenance burden.”
Legacy APM: TRUE
AppDynamics: FALSE. No manual instrumentation required. It’s automatic.

“Agents are invasive which can perturb the systems being monitored.”
Legacy APM: TRUE
AppDynamics: FALSE. Our customers see less than 1-2% overhead in production. 

“Performance management vendors have over promised and under delivered for decades.”
Legacy APM: TRUE
AppDynamics: FALSE. Things are going well thanks. Check our customer list and 400% growth.

All AppDynamics. The next-gen of APM.

Example App with application performance issues

I drew a parallel in my previous post that using NPM concepts to monitor application performance is like inspecting UPS packages en route to figure out why operations at a hub came to a screeching halt. Remember, even if the package contents are visible from afar, that doesn’t explain why the hub conveyors, which electronically guide packages to their appropriate destination chutes, are broken, nor can it identify why cargo operations have stalled. In other words, good luck trying to gather anything beyond the scope of the application’s infrastructure. Using network monitoring tools to collect even the most basic system health metrics such as CPU utilization, memory usage, thread pool consumption and thrashing? Time to throw in the towel.

And what about End User Monitoring?

What’s becoming just as important as being able to monitor server-side processing and network time is the ability to monitor end-user performance. When NPM tools can only see the last packet sent from the server, how does that help you understand the browser’s performance? It doesn’t, since once again this kind of analysis is only feasible higher up the stack at the Application Layer. And just to clarify, when I say Application Layer, I mean application execution time, not “network process time to application” as defined by OSI Layer 7.

On the other hand, injected agents residing in that layer can insert JavaScript into the Web page to determine the execution time spent in the browser. This is becoming more of a concern for App Ops and DevOps now that 80-90% of the end-user response time is spent on the frontend executing JavaScript and rendering markup and stylesheets. As business logic continues its migration to the browser while increasing its processing burden, the client is looking more and more like the new server. Network monitoring tools must move to an agent-based approach if they are to truly deliver the monitoring visibility needed for the application and end user experience; otherwise their visibility will remain between a rock and a hard place.

On top of that, what about those customers running their applications in a public cloud? Are you going to convince your cloud provider to install a network appliance into their infrastructure? I highly doubt it. With AppDynamics, we have partnerships with cloud providers such as Amazon EC2, Azure, RightScale and Opsource allowing developers and operations to easily deploy AppDynamics with a flick of a switch and monitor their applications in production 24/7.

Once again, next-gen APM triumphs over NPM based application performance on not just the server side, but also the browser. AppDynamics is embracing this and fully aware of the technical and business significance of monitoring end user performance. We’re delighted to offer this kind of end-to-end visibility to our customers who will now be able to monitor application performance from the end users’ browser to the backend application tiers (databases, mainframes), all through a single pane of glass view.