Focusing on Business Transactions is a Proven Best Practice

In today’s software-defined business era, uptime and availability are key to business survival. The application is the business. However, ensuring proper application performance in production environments remains a daunting task. Where do you start?

Enter Business Transactions.

By focusing on the end-user experience and measuring application performance based on end users’ interactions, we can correctly gauge how the entire environment is performing. We follow each individual user request as it flows through the application architecture, comparing the response time to its optimal performance. This inside-out strategy allows AppDynamics to instantly identify performance bottlenecks and allows application owners to get to the root cause of issues that much faster.

By becoming business transaction-centric, application owners can ensure uptime and availability even within a challenging application environment. Business transactions give them the insight required to react to quickly changing conditions and respond accordingly.

So, what exactly is a Business Transaction?

Simply: any and every online user request.

Consider a business transaction to be a user-generated action within your system. The best practice for determining the performance of your application isn’t to measure CPU usage, but to track the flow of a transaction that your customer, the end user, has requested.

It can be requests such as:

  • logging in
  • adding an item to a cart
  • checking out
  • searching for an item
  • navigating different tabs

Shifting your focus to business transactions completely changes the game in terms of your ability to support application performance.

    Business Transactions and APM

    Business Transactions equip application owners with three important advantages.

    Knowledge of User Experience

    If a business transaction is a “user-generated action,” then it’s pretty clear how monitoring business transactions can have a tremendous effect on your ability to understand the experience of your end user.

    If your end user adds a book to a shopping cart, is the transaction performing as expected or is it taking 3 seconds longer? (And what kind of impact will that have on end users? Will they decide to surf away and buy books somewhere else, thus depriving your business of not just the immediate purchase but the potential loss of lifetime customer revenue?)

    Monitoring business transactions gives you a powerful insight into the experience of your end user.

    Service Assurance – the ability to track baseline performance metrics

    AppDynamics hears from our clients all the time that it’s difficult to know what “normal” actually is. This is particularly true in an ever-changing application environment. If you try to determine normal performance by correlating code-level metrics – while at the same time reacting to frequent code drops – you will never get there.

    Business transactions offer a Service Assurance constant that you can use for ongoing monitoring. The size of your environment may change and the number of nodes may come and go, but by focusing on business transactions as your ongoing metric, you can begin to create baseline performance for your application. Understanding this baseline performance is exactly what you need in order to understand whether your application is running as expected and desired, or whether it’s completely gone off the rails.

    For example, you may have a sense of how your application is supposed to perform. But do you really know how it performs every Sunday at 6 p.m.? Or the last week of December? And if you don’t, how will you know when the application is deviating from acceptable performance? It’s figuring out “normal” in terms of days, weeks, and even seasons that you need to truly understand your application’s baseline performance.
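One way to picture a baseline that knows about "every Sunday at 6 p.m." is to bucket response times by day of week and hour. The sketch below is only an illustration under stated assumptions: the function name `build_baseline`, the sample data, and the choice of a plain mean per bucket are all hypothetical, and it does not describe how AppDynamics computes its baselines.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def build_baseline(samples):
    """Average response times per (weekday, hour) bucket.

    `samples` is an iterable of (timestamp, response_ms) pairs for a single
    business transaction. weekday() returns 0 for Monday and 6 for Sunday.
    """
    buckets = defaultdict(list)
    for timestamp, response_ms in samples:
        buckets[(timestamp.weekday(), timestamp.hour)].append(response_ms)
    return {slot: mean(values) for slot, values in buckets.items()}


# Hypothetical history for one business transaction (for example, "checkout").
history = [
    (datetime(2024, 1, 7, 18, 5), 900.0),     # Sunday, 6 p.m.
    (datetime(2024, 1, 14, 18, 20), 1100.0),  # Sunday, 6 p.m.
    (datetime(2024, 1, 8, 10, 0), 400.0),     # Monday, 10 a.m.
    (datetime(2024, 1, 15, 10, 30), 600.0),   # Monday, 10 a.m.
]

baseline = build_baseline(history)
sunday_6pm = baseline[(6, 18)]    # mean of 900 and 1100
monday_10am = baseline[(0, 10)]   # mean of 400 and 600
```

With separate buckets, a 1,000 ms checkout on Sunday evening can count as normal while the same response time on Monday morning stands out. Longer seasons, such as the last week of December, would need an additional bucket key.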

    Triage & Diagnosis – always knowing where to look to solve problems

    Finally, when problems occur, business transactions prevent you from hunting through logs and swimming through lines of code. The transaction’s poor performance immediately shines a spotlight on the problem – and your ability to get to root cause quickly is dramatically improved.

    If you’re tracking code-level metrics in a large environment instead of monitoring business transactions, the chances are that the fire you’re troubleshooting is going to roar out of hand before you’re able to douse it.

    Summary

    Application owners are under extraordinary pressure to incorporate frequent code changes while still being held responsible for 100% application uptime and performance. In a distributed and rapidly changing environment, meeting these high expectations becomes tremendously challenging.

    A strong focus on business transactions becomes absolutely essential for maintaining application performance. Transaction-centric monitoring provides the basis for a stable performance assurance metric, it delivers powerful insights into user experience, and it ensures the ability to know where to hunt during troubleshooting.

    The right APM solution can automate much of this work. It can help application owners identify and bucket their business transactions, as well as assist with triage, troubleshooting, and root cause diagnosis when transactions violate their performance baselines. In this way, business transactions are essential to ensuring the success of Developers, Operations, and Architects – anyone with a stake in application performance.

    Holographic Haystacks — Capturing Business Transactions

    Why Archive Every Business Transaction?

    The other day I found myself in a discussion with a prospect regarding the merits of capturing complete stack traces for every transaction that flowed through their production system. You see, AppDynamics doesn’t do that; we capture problematic business transactions, and this bloke seemed concerned that someone would one day ask him for the details of a specific transaction and that our method would leave him unprepared to answer. As the conversation played on, we explored the cost – both literal, in the sense of hardware required for data warehousing, and virtual, like the compounding impact additional polling during bad times has on performance. Ultimately we arrived at the conclusion that understanding your haystack is better than having the haystack. This point can be subtle, so let’s explore how we got there…

    Since AppDynamics automatically baselines business transactions, we can very quickly tell the difference between a normal transaction and an outlier, or in the context of this analogy: hay vs. needles. When we encounter a piece of hay we record the metadata of the transaction. You know: how long the strand is, where it resides, who put it in the stack, etc. In fact, the part of me that humbly struggles with homonyms likes to think of this as a piece of “hey.” I imagine someone (a business transaction) flying past me on the highway and waving “heeeyy!” as I feverishly jot down some key details about their ride. For the most part I already know enough about them, they’re speeding along unabated, just like the plethora of friends to follow, no need to dig too deep, just note the event so we can better understand our “heystack.”

    In contrast, when a particular transaction performs several standard deviations outside of the norm, or perhaps worse than a given threshold, we need to know more than “it’s slow.” This is where the context before content idea really starts to come into light. Because we know what a good piece of hay looks like, and because we understand how our virtual haystack looks at any given moment, we have the context to both appreciate and spot the needles – the need for content. Needles are special. They’re shiny. And you probably want to trade them for hay. So we need to know all their specs. We need content.
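The hay-versus-needle test described above can be sketched as a comparison against a baseline mean plus a number of standard deviations, with an optional fixed threshold. Everything here is an assumption made for illustration: the name `is_needle`, the default of three deviations, and the sample numbers are hypothetical and are not taken from AppDynamics.

```python
from statistics import mean, stdev


def is_needle(response_ms, baseline_ms, max_deviations=3.0, hard_limit_ms=None):
    """Return True when a response time should be treated as an outlier.

    A request is a "needle" if it is slower than a fixed limit (when one is
    given) or slower than the baseline mean plus `max_deviations` sample
    standard deviations.
    """
    if hard_limit_ms is not None and response_ms > hard_limit_ms:
        return True
    mu = mean(baseline_ms)
    sigma = stdev(baseline_ms)
    return response_ms > mu + max_deviations * sigma


# Hypothetical baseline of recent "hay" response times, in milliseconds.
hay = [200, 210, 190, 205, 195]

ordinary = is_needle(215, hay)                        # within three deviations
outlier = is_needle(900, hay)                         # far outside the norm
over_limit = is_needle(220, hay, hard_limit_ms=218)   # breaks the fixed limit
```

Only the requests flagged this way would need the full "content"; for everything else, the metadata is enough to keep the picture of the haystack current.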

     

    No worries, that’s why AppDynamics captures complete transaction flows for these items. But what if – returning to the highway analogy for a moment – a funeral procession (a bout of bad performance) drives by? The act of gathering information during this kind of needle storm could exacerbate the issue and result in exaggerated wait times, inaccurate information, and an unnecessarily pitiful user experience. Well, we have the context to understand when that’s happening, so AppDynamics will automatically throttle back needle gathering to ensure our application never negatively impacts yours. Remember, the ultimate goal here is to ensure you have all the tools you need to deliver an exceptional user experience, in the short and long term.

    Thinking about the long term for a moment … we should probably use this information to improve the performance of the application. So let’s pass the data about the problems along, shall we? But … if all you ever gave your team were needles, they might forget what hay looks like. This lack of context could hinder their ability to remediate issues. To mitigate these risks, AppDynamics will snag the occasional piece of hay… just for comparison’s sake and ensure your team, your developers, have all the context and content they need to improve the performance of your applications.
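The two behaviors just described, throttling needle capture during a storm and keeping the occasional piece of hay for comparison, can be sketched as a small capture policy. This is a toy model under loud assumptions: the class `SnapshotPolicy`, its parameters, and the fixed-window cap are invented for illustration, and the actual throttling logic inside AppDynamics is not described in this post.

```python
class SnapshotPolicy:
    """Decide which transactions get a full (expensive) snapshot."""

    def __init__(self, max_needles_per_window=5, hay_every=100):
        self.max_needles_per_window = max_needles_per_window  # cap per window
        self.hay_every = hay_every          # keep every Nth normal transaction
        self.needles_in_window = 0
        self.hay_seen = 0

    def start_window(self):
        """Reset the needle budget, for example once per minute."""
        self.needles_in_window = 0

    def should_capture(self, is_needle):
        if is_needle:
            if self.needles_in_window >= self.max_needles_per_window:
                return False                # throttled: budget is spent
            self.needles_in_window += 1
            return True
        self.hay_seen += 1
        return self.hay_seen % self.hay_every == 0   # occasional hay sample


policy = SnapshotPolicy(max_needles_per_window=2, hay_every=3)
needle_decisions = [policy.should_capture(True) for _ in range(4)]
hay_decisions = [policy.should_capture(False) for _ in range(6)]
policy.start_window()
after_reset = policy.should_capture(True)
```

In this run the first two needles are captured and the next two are skipped until the window resets, while every third normal transaction is kept as a reference sample.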

    So as you can see, you don’t need to house an entire haystack to effectively identify and remediate problems. With a clear understanding of what the haystack would look like – a holographic haystack, if you will – a few pieces of hay for reference, and a spotlight on all your needles, you’ll have everything you need to see, act, and know more quickly than ever before.

    Also check out how Citrix uses AppDynamics to capture business transactions, increase visibility, and lower their MTTR for their complex applications.

     

    Take five minutes to get complete visibility into the performance of your production applications, and start finding those needles, with AppDynamics today.

    AppDynamics & Splunk – Better Together

    A few months ago I saw an interesting partnership announcement from Foursquare and OpenTable. Users can now make OpenTable reservations at participating restaurants from directly within the Foursquare mobile app. My first thought was, “What the hell took you guys so long?” That integration makes sense on so many levels, I’m surprised it hadn’t already been done.

    So when AppDynamics recently announced a partnership with Splunk, I viewed that as another no-brainer.  Two companies with complementary solutions making it easier for customers to use their products together – makes sense right?  It does to me, and I’m not alone.

    I’ve been demoing a prototype of the integration for a few months now at different events across the country, and at the conclusion of each walk-through I’d get some variation of the same question, “How do I get my hands on this?”  Well, I’m glad to say the wait is over – the integration is available today as an App download on Splunkbase.  You’ll need a Splunk and AppDynamics license to get started – if you don’t already have one, you can sign up for free trials of Splunk and AppDynamics online.

    The Power of the Business Transaction

    In our last post, we talked about the importance of business transactions for applications in the cloud. They’re also crucial for managing highly distributed applications. But what is a business transaction?
    Consider a business transaction to be a user-generated action within your system. The best practice for determining the performance of your application isn’t to measure CPU usage, but to track the flow of a transaction that your customer, the end user, has requested.

    A Holiday Greeting in Verse

    ‘Twas nearly 2011, a brand-new year
    But app owners stared sadly at their beer.

    They had a problem, and it seemed just enormous
    They craved a surefire solution to application performance.

    Getting to the Root of Swisscom Application Performance

    Here’s a guest blog from one of AppDynamics’ international partners, Stefan Zoltai from sysPerform. Stefan wanted to write about how he used AppDynamics to solve a performance problem for a major telecom company in Switzerland—and we said, sure!  Take it away, Stefan…

    I’d like to talk about how we used AppDynamics for a major production troubleshooting exercise—and how AppDynamics passed with flying colors.

    Swisscom is the leading telecommunications company in Switzerland with about 5.7 million mobile customers and 1.8 million broadband connections. Swisscom is present on the Swiss market with a full portfolio of wireless, wire- and IP-based data and voice-based communication services.

    Swisscom’s (Internet) Messaging had engaged sysPerform to assist with the analysis of their Tomcat 6 / Java 1.6 based WebMail application. WebMail has been under scrutiny for about a year now—ever since it manifested both performance and stability problems. Prior analysis efforts, conducted with a number of available tools, did not lead to the determination of the actual root cause(s) since the aforementioned problems only occurred in production under load and could not be reproduced in other environments. WebMail is rated at a throughput of 300 tx/sec.

    We realized immediately that without a deep, detailed view into the application’s runtime, in production and under load, we would not be able to determine the actual root cause.

    To analyze the application, we selected AppDynamics’ application performance management solution.  Since this solution has been developed specifically for high throughput, distributed production environments, we were able to obtain a high-level overview of the application as well as conduct a deep root cause analysis down to code-level execution without generating measurable overhead. Again, we did all of this at 300 transactions per second of throughput.

    Thanks to AppDynamics’ ability to create a dynamic baseline of application performance, we were able to isolate the major bottlenecks on the first day and discuss a solution with the developers at Swisscom.  We were able to quickly learn the application’s performance and stability characteristics — and after only 5 days of development, we deployed a specific, major fix to address the main issue and massively improve performance.  At the moment, we are continuing our analysis efforts since stability and performance are the focus of an ongoing quality process.

    [UPDATE: For Swisscom’s perspective on the use of AppDynamics, check out Mika Borner’s blog]

    This example clearly demonstrates that operating a modern, distributed application without an adequate monitoring solution is effectively the same as “flying blind.” 60%-80% of all performance problems are caused by the application itself, and need to be analyzed from the inside out. We can confirm these numbers from many other engagements with similar customers. External causes like hardware or network issues have become increasingly rare; it’s the problems deep inside the application that truly matter.

    Intelligent application performance management, however, is not an end in itself; it must also be evaluated in economic terms. Our experience indicates that an APM solution shows an ROI within just a few months. Among the reasons for such a quick ROI is the aforementioned extremely fast root cause analysis.

    If you’re reading this in Switzerland, feel free to contact me with questions!

    — Stefan Zoltai, Founder, SysPerform GmbH

    Email: sz@sysperform.ch

    Twitter: http://twitter.com/SysPerform

    Less is More When Monitoring Business Transactions

    In my last post, I wrote about how using Business Transactions as a management unit is critical for managing modern-day applications efficiently. Sticking to this train of thought, I will focus on how this applies to various aspects of application management. The first area I want to cover is monitoring and troubleshooting.

    In a highly distributed system, there are 100s of CPUs, 100s of JVMs/CLRs, and millions of lines of code running. Now, if you want to attach service levels to every component of that system, you would either be eyeballing dashboards most of the time or trying to maintain the configuration associated with alerting.

    As I mentioned in the last post, if the business grows, it will require more capacity and more infrastructure—which means newer pieces (and ones that are moving around rapidly). Add the configuration attached with lines of code and you can pick either “daunting” or “impossible” to describe the task at hand. Long testing and staging cycles have become a thing of the past.

    At the same time, the DevOps tribe is adapting to and embracing the new application landscape rapidly. Their biggest need is the need for speed, which translates into efficiency driven by intelligence in every aspect of application management in production. This is where using Business Transactions can make monitoring more efficient and easier to accomplish.

    But wait a minute – am I really suggesting that you watch Business Transaction service levels (which are your business’ bottom line) instead of getting overwhelmed with alerts based on 1000s of key metrics? Doesn’t that mean you are ignoring things that can be going wrong by not giving them enough attention?

    Au contraire! In fact, you can actually focus on more with this approach versus traditional monitoring techniques. Let me explain.

    Traditional monitoring has been all about averages. You look at some service or some method and its average over time, then set up alerts associated with it. Doing so is great for catching slow performance degradation or systemic outages. But it won’t help you catch outliers where there is no pattern associated with errors or slow requests.

    Let’s look at a couple of examples to understand this better. For brevity I will talk only about slow requests (but the same argument can also be applied to errors).

    1) A frequent cause of slow requests, resulting in a poor user experience, pertains to a transaction associated with user input.  For example, in a shopping cart application, the user might add a particular item to his or her cart—which results in the application slowing down.

    2) Here’s another one. Sometimes one particular node, which is part of a big cluster of say 50 nodes, has an issue in servicing requests but might be doing ok on CPU and memory usage.

    In both of these cases, averages would hide the problem. Here’s an example.

    Scenario
    Over a period of time, an online checkout application experienced 5,000 Total Checkouts. Of those checkouts, 4,800 were Normal Checkouts and 200 were Slow Checkouts.

    In cases like these, it is very likely that the response times of the normal transactions, combined with those of the bad transactions, average out to a perfectly normal-looking number. But the bottom line is that 200 users had a bad checkout, and that needs to be fixed. By focusing on business transactions only and reducing the points of monitoring, you are actually able to identify and address more performance concerns than you might have otherwise.
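To make the arithmetic concrete, here is a small worked example using the checkout counts from the scenario. The response times are assumptions chosen purely for illustration (1 second for a normal checkout, 8 seconds for a slow one); the post does not give actual timings.

```python
# Counts from the scenario above.
normal_count, slow_count = 4800, 200

# Assumed response times in seconds (hypothetical, for illustration only).
normal_seconds, slow_seconds = 1.0, 8.0

total_checkouts = normal_count + slow_count
average_seconds = (
    normal_count * normal_seconds + slow_count * slow_seconds
) / total_checkouts
slow_share = slow_count / total_checkouts

print(f"average response time: {average_seconds:.2f} s")
print(f"share of slow checkouts: {slow_share:.0%}")
```

Under these assumptions the average is 1.28 seconds, which looks unremarkable on a dashboard, even though 4% of users waited eight times longer than everyone else.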

    So what exactly are we watching here?

    a) Response time for a transaction – this is averaged over all requests for this transaction. We don’t need to watch the key methods being executed in the request since the overall response time for the request is the single indicator. A method-level drill down is needed only when the response time is slow.

    b) Number of slow requests over time – If we set up thresholds over the transaction and watch every request, we are able to identify exactly how many requests were outliers. Of course, for this to be useful, the system needs to be able to collect diagnostic information as slow requests happen and not afterwards. This also ensures that when the request is being serviced by a problematic node, it is represented appropriately.
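The second measure, counting slow requests individually instead of averaging them, can be sketched as follows. The node names, the 1,000 ms threshold, and the function `slow_requests_by_node` are hypothetical; the point is only that a per-request check also exposes the single misbehaving node from the earlier cluster example.

```python
from collections import Counter


def slow_requests_by_node(requests, threshold_ms):
    """Count requests slower than `threshold_ms`, grouped by serving node.

    `requests` is an iterable of (node_name, response_ms) pairs for one
    business transaction.
    """
    slow = Counter()
    for node, response_ms in requests:
        if response_ms > threshold_ms:
            slow[node] += 1
    return slow


# Hypothetical sample: node-17 is healthy on CPU and memory but slow to respond.
requests = [
    ("node-01", 120),
    ("node-02", 130),
    ("node-17", 4200),
    ("node-01", 110),
    ("node-17", 3900),
    ("node-02", 125),
]

slow = slow_requests_by_node(requests, threshold_ms=1000)
worst_node, worst_count = slow.most_common(1)[0]
```

Here every slow request lands on one node, a pattern that a cluster-wide average response time would smooth away.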

    We’re not the only ones who believe that monitoring Business Transactions is critical to better performance in production. The customers and prospects we speak to are serious about having a system in place that can watch all requests instead of just watching averages in their system. It’s becoming a real-world demand with real-world benefits.

    The last thing I want to mention before signing off is that AppDynamics does all of the above. We follow all requests to monitor SLAs, not just averages but actual numbers, and we do so at an extremely low overhead, enabling us to jump in and get diagnostic data as bad requests happen!

    That wasn’t too much of a plug, right? Until next time…