How to Identify Impactful Business Transactions in AppDynamics

New users of APM software often believe their company has hundreds of critical business transactions that must be monitored. But that’s not the case. In my role as Professional Services Consultant (EMEA) at AppDynamics, I’ve worked at dozens of customer sites, and the question of “What to monitor?” is always foremost in new users’ minds.

AppDynamics’ Business Transactions (BTs) reflect the core value of your applications. Since our inception a decade ago, we’ve built our APM solution around this concept. Given the critical importance of Business Transactions, you’ll want to configure them the right way. While AppDynamics will automatically create BTs for you, you’ll benefit greatly by taking a few extra steps to optimize your monitoring environment.

APM users often think of a BT as a technical transaction in their system, but it’s much more than that. The BT is a key component of effective application monitoring. It consists of all the required services within your environment (things like login, search and checkout) that are used to fulfill and respond to a user-initiated request. These transactions reflect the logical way users interact with your applications. Activities such as adding an item to a shopping cart or checking out will summon various applications, databases, third-party APIs and web services.

If you’re new to APM, you may find yourself asking “Where should I begin?” By applying essential best practices, BT configuration can be a smooth and orderly process.

Start by asking yourself two key questions:

  1. What are my business goals for monitoring?
  2. What pain points am I trying to address by using APM?

You may already know the answers. Perhaps you want to resolve major problems that consume a lot of your time and resources, or ensure that your most critical business operations are performing optimally. From there, you can drill down to more specific goals and operations to focus on. A retail website, for instance, may choose to focus on its checkout or catalog operation. Financial services firms may focus on the most-used APIs provided for their mobile clients. By prioritizing your business goals early in the process, you’ll find BTs much easier to configure.

AppDynamics automatically discovers and maps Business Transactions for you. Actions like Add to Cart are tagged and traced across every component of your application and visualized on a topology map, helping you to better understand performance across an entire application.

It’s tempting to think configuration is complete once you’ve instrumented with an agent and start seeing traffic coming in. But that’s just the technical side of things. You’ll also need to align with the business, asking questions like, “Do we have SLAs on this?” and “What’s the performance requirement?” You’ll also need to establish health rules and work with the business to determine, for instance, what action to take if a particular rule is violated.
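To make the SLA conversation concrete, here is a minimal Python sketch of how an agreed limit and its follow-up action might be written down together. It is not AppDynamics configuration or API: the `HealthRule` class, its fields, and the thresholds are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthRule:
    """Hypothetical health rule: a metric limit agreed with the business."""
    transaction: str       # Business Transaction name, e.g. "Checkout"
    metric: str            # e.g. "avg_response_ms"
    warning_above: float   # value at which the team starts watching
    critical_above: float  # value at which the SLA counts as breached
    action: str            # what was agreed to happen on a critical breach


def evaluate(rule: HealthRule, observed: float) -> str:
    """Return 'ok', 'warning', or 'critical' for one observed metric value."""
    if observed > rule.critical_above:
        return "critical"
    if observed > rule.warning_above:
        return "warning"
    return "ok"


# Example values are made up for illustration.
checkout_rule = HealthRule("Checkout", "avg_response_ms", 800.0, 2000.0,
                           "page the on-call engineer")
```

The point of keeping `action` next to the thresholds is that the business answers both questions at once: what counts as too slow, and what happens then.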

Choose Your BTs Wisely

At a high level, a Business Transaction is more like a use case, even though users often think of it as a technical transaction. Sometimes I must remind users: “No, this activity you want to monitor is not a business transaction. It’s just a technical functionality of the system, but it’s not being used by a customer or an API user.” Such cross-cutting activity may be better monitored through views like Service Endpoints or specific technical metrics.

Be very selective when choosing your Business Transactions. Here’s a rule of thumb: Configure no more than 20 to 30 BTs per business application. This may not seem like a lot, but really it is. One of AppDynamics’ largest banking customers identified that 90% of its business activity was reflected in just 25 or so business transactions.

It’s not uncommon for new users to balk at this. They may say, “But we have many more important processes to track!” Fear not: the recommended number of BTs isn’t set in stone, although our 20-to-30 guideline is a good starting point. You may have 20 key Business Transactions and another 20 that are less critical, but you really want to monitor all 40. You can do this, of course, but you’ll need to prioritize these transactions. Capturing too many BTs can lead users to miss the transactions that are truly important to the business.
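One way to see how few transactions carry most of the activity is to rank them and stop once a coverage target is reached. The sketch below is a rough illustration in Python, assuming you can export call counts per transaction; the function name, the 90% target, and the sample numbers are assumptions, and call volume is only a proxy for business importance.

```python
def select_business_transactions(call_counts, coverage=0.90, max_bts=30):
    """Pick the busiest transactions until `coverage` of all calls is reached."""
    total = sum(call_counts.values())
    if total == 0:
        return []
    selected, covered = [], 0
    # Walk transactions from most to least called.
    ranked = sorted(call_counts.items(), key=lambda kv: kv[1], reverse=True)
    for name, calls in ranked:
        if covered / total >= coverage or len(selected) >= max_bts:
            break
        selected.append(name)
        covered += calls
    return selected


# Made-up call counts for illustration only.
sample = {"Checkout": 500, "Search": 300, "Login": 120,
          "Add to Cart": 60, "Health Check": 15, "Static Ping": 5}
```

With the sample data, three of the six transactions already account for 92% of calls, so the rest can be deprioritized or left to other views.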

Best Practices

During APM setup, you’ll have many questions. Should you work exclusively with your own technical team? With the application owner? The business that’s using the application?

Start with these three key steps:

  1. Get to know your business.
  2. Identify the major flows.
  3. Talk to the application owner.

 

Whenever I’m onsite with a customer, the first thing I advise is that we log in as an end user to see how they use the system. For example, we’ll order a product or renew a subscription, and then track these transactions end-to-end through the system. This very important step will help you identify the transactions you want to monitor.

It’s also critical to check the current major incidents you have, or at least the P1s and P2s. Find out what problems you’re experiencing right now. What are the major complaints involving the application?

Focus on the low-hanging fruit (your most troublesome applications), which you’ll find by instrumenting systems and talking to application owners. This will deliver value in the early setup stage, providing information you can take to the business to make them more receptive to working with you.

Prioritize Your Operations

Business Transactions are key to configuring APM. Before starting configuration, ask yourself these critical questions:

  1. What are my business goals for monitoring?
  2. What pain points am I trying to solve with AppDynamics?
  3. What are the typical problems that take up my time and resources?
  4. What are the most critical business operations that need to perform optimally?

 

Then take a closer look at your application. Decide which operations you must focus on to achieve your goals.

These key steps will help you prioritize operations and make it easier to configure them as Business Transactions.

Top 10 Reasons Why eCommerce Apps Will Fail This Black Friday

My wife is a shopaholic and serial checkout killer. Every week she spends several hours browsing and carefully adding items to shopping carts. Every now and then she’ll interrupt my sports program with an important announcement: “My checkout just failed.” Take this example Mrs Appman sent me during the month of September:

Checkout Fail

The fact that I work in the APM industry means it is my responsibility to fix these problems immediately (for my wife). As Black Friday is coming up, I thought I’d share with you what typically goes wrong under the covers when our customers’ e-commerce applications go bang during a critical time.

It is worth mentioning that nearly all our customers perform some form of performance or load testing prior to the Black Friday period. Many actually simulate the load from the previous year on test environments designed to reproduce Black Friday activity. While 75% of bottlenecks can be avoided in testing, unfortunately a few surface in production because the applications are too big and complex to test under real-world scenarios. For example, most e-commerce applications these days span more than 500 servers distributed across many networks and service providers.

Here are the top 10 reasons why eCommerce applications will fail this Black Friday:

1. Database Connection Pool

Nearly every checkout transaction will interact with one or more databases. Connections to this resource are therefore sacred and can often be deadly when transaction concurrency is high. Most application servers come with default connection pool sizes of between 10 and 20 connections. When you consider that transaction throughput for e-commerce applications can easily exceed 100,000 trx/min, you soon realize that default pool configurations aren’t going to cut it. When a database connection pool becomes exhausted, incoming checkout requests simply wait until a connection becomes available, or time out. Take this screenshot for example:

Connection Pool Issue
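A back-of-the-envelope check shows why a default pool of 10 to 20 falls short. The Python sketch below applies Little's law (connections in use = arrival rate x time each connection is held). The function name, the 20 ms hold time, and the 25% headroom are assumptions for illustration, not measured values.

```python
import math


def connections_needed(trx_per_min, db_calls_per_trx, avg_hold_ms, headroom=1.25):
    """Estimate concurrent connections: arrival rate times hold time, plus headroom."""
    calls_per_sec = trx_per_min / 60 * db_calls_per_trx
    in_use = calls_per_sec * (avg_hold_ms / 1000)
    return math.ceil(in_use * headroom)


# 100,000 trx/min, one database call each, 20 ms per call (assumed figures).
estimate = connections_needed(100_000, 1, 20)
```

Under these assumptions the estimate is 42 connections, already more than double a 20-connection default, and it grows linearly if queries slow down or each transaction makes more than one call.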

2. Missing Database Indexes

This root cause is closely related to exhausted connection pools. Simply put, slow-running SQL statements hold onto a database connection for longer, so connections are returned to the pool less often than they should be. The number 1 root cause of slow SQL statements is missing indexes on database tables, which is often caused by miscommunication between the developers who write the SQL and the DBAs who configure and maintain the database schemas that hold the data. The result is the classic “full table scan,” where a database operation must scan through all the data in a table before a result is returned. Here is an example of what this looks like in AppDynamics:

Missing Index
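You can reproduce the effect on a small scale with SQLite, which ships with Python. This is only a stand-in for whatever database your application uses; the `orders` table and index name are made up. The sketch asks the query planner how it would run the same lookup before and after an index exists.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

lookup = "SELECT * FROM orders WHERE customer_id = ?"
plan_before = query_plan(conn, lookup, (42,))  # planner reports a table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup, (42,))   # planner now searches the index
```

The exact wording of the plan varies between SQLite versions, but the first plan is a scan of `orders` and the second names `idx_orders_customer`.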

3. Code Deadlock

High transaction concurrency often means application server threads have to contend more for application resources and objects. Most e-commerce applications have some form of atomicity built into their transactions, so that order and stock volumes are kept in check as thousands of users fight over special offers and low prices. If access to application resources is not properly managed, some threads can end up in deadlock, which can often cause an application server and all its user transactions to hang and time out. One example of this was last year, when an e-commerce customer was using a non-thread-safe cache. Three threads tried to perform a get, set and remove on the same cache at the same time, causing code deadlock to occur and impacting roughly 2,500 checkout transactions, as the screenshot below shows.

Code Deadlock
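The customer's cache was presumably Java code that I can't show here, so the following Python sketch only illustrates the general remedy: put every cache operation behind the same single lock, and never hold it while taking another. `LockedCache` and `hammer` are hypothetical names.

```python
import threading


class LockedCache:
    """Tiny cache whose get/set/remove all take the same single lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def remove(self, key):
        with self._lock:
            self._data.pop(key, None)


def hammer(cache, rounds=1000):
    """Run get, set and remove concurrently; return True if all threads finish."""
    def worker(op):
        for i in range(rounds):
            op(i % 10)

    ops = [cache.get, lambda k: cache.set(k, k), cache.remove]
    threads = [threading.Thread(target=worker, args=(op,)) for op in ops]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=10)
    return all(not t.is_alive() for t in threads)
```

Because there is only one lock and it is released at the end of each method, no thread can wait on a lock held by a thread that is in turn waiting on it, which is the cycle a deadlock needs.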

4. Socket Timeout Exceptions

Server connectivity is an obvious root cause. If you check your server logs using a tool like Sumo Logic or Splunk, you’ll probably see hundreds of these events. They represent network problems or routing failures where a checkout transaction is attempting to contact one or more servers in the application infrastructure. Most of the time the services you are connecting to aren’t your own, for example a shipping provider, credit card processor, or fraud detector. On high-traffic days like Black Friday it isn’t just your site experiencing a surge in traffic; oftentimes entire networks are saturated due to intense demand. After a period of time (often 30-45 secs) the checkout transaction will just give up, time out and return an error to the user. No Connectivity = No Revenue. Here is an example of what it looks like:

Socket Timeout Exception
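Waiting 30-45 seconds before giving up is a choice, not a law of nature. As a rough sketch, assuming a plain TCP service and a caller that can fall back when no answer arrives, the Python below sets an explicit short timeout and returns `None` instead of hanging. The function name and the 2-second default are assumptions.

```python
import socket


def fetch_with_timeout(host, port, payload, timeout_s=2.0):
    """Send payload and return the reply, or None on timeout or connection failure."""
    try:
        # The timeout applies to the connect and to every later send/recv.
        with socket.create_connection((host, port), timeout=timeout_s) as conn:
            conn.sendall(payload)
            return conn.recv(4096)
    except OSError:  # includes socket.timeout and refused connections
        return None
```

A caller might treat `None` as "provider unavailable" and show the shopper a retry option right away, rather than leaving the checkout page spinning.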

5. Garbage Collection

Caches are an easy way to speed up applications. The closer data is to application logic (in memory), the faster it executes. It is therefore no surprise that as memory has gotten bigger and cheaper, most companies have adopted some form of in-memory caching to eliminate database access for frequently used results. The days of 64GB and 128GB heaps are now upon us, which means the impact of things like garbage collection is more deadly to end users. Maintaining cache data and efficiently creating and persisting user objects in memory becomes paramount for eliminating frequent garbage collection cycles. Just because you have GBs of memory to play with doesn’t mean you can be lazy in how you create, maintain and destroy objects. Here are a few screenshots that show how garbage collection can kill your e-commerce application:

Garbage Collection

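One habit that keeps memory pressure down is giving every cache a hard size limit. Python manages memory differently from a JVM, so the sketch below only illustrates the idea: a least-recently-used cache that evicts old entries instead of growing forever. `BoundedCache` is a hypothetical name.

```python
from collections import OrderedDict


class BoundedCache:
    """LRU cache with a hard entry limit, so it cannot grow without bound."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used

    def __len__(self):
        return len(self._data)
```

The right limit depends on object sizes and hit rates in your own application; the useful part is that a limit exists at all.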

6. Transactions with High CPU Burn

It’s no secret that inefficient application logic will require more CPU cycles than efficient logic. Unfortunately, the number 1 solution to slow performance in the past was for eCommerce vendors to buy more servers. More servers = More Capacity = More Transaction Throughput. While this calculation sounds good, the reality is that not all e-commerce transactions are CPU bound. Adding more capacity just masks inefficient code in the short term, and can cost you significant amounts of money in the long term. If you have specific transactions in your eCommerce application that hog or burn CPU, then you might want to consider tuning those before you whip out your checkbook with Oracle or Dell. For example:

High CPU Burn
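Before buying hardware, it helps to check whether a slow call is burning CPU or just waiting. A minimal Python sketch of that check compares elapsed time with CPU time for a single call. The helper names and the 50% threshold are assumptions, and `time.process_time` counts CPU for the whole process, so run it in isolation.

```python
import time


def cpu_profile(fn, *args):
    """Run fn and return (wall_seconds, cpu_seconds) for this process."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    fn(*args)
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def is_cpu_bound(wall_s, cpu_s, threshold=0.5):
    """Treat a call as CPU-bound when most of its elapsed time was CPU time."""
    return wall_s > 0 and cpu_s / wall_s >= threshold


def busy(n):
    """Burns CPU: more servers or better code would help here."""
    return sum(i * i for i in range(n))


def waiting(seconds):
    """Stand-in for a remote call: extra CPU capacity would not help."""
    time.sleep(seconds)
```

Calling `cpu_profile(waiting, 0.1)` reports about 0.1 s of wall time and almost no CPU time, which is the signature of a transaction that more servers will not speed up.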

7. 3rd Party Web Services

If your e-commerce application is built around a distributed service-oriented architecture (SOA), then you’ll have multiple points of failure, especially if several of those services are provided by a 3rd party where you have no visibility. For example, most payment and credit card authorization services are provided by 3rd party vendors like PayPal, Stripe, or Braintree. If these services slow down or fail, then it’s impossible for checkout transactions to complete. You therefore need to monitor these services religiously so that when problems occur you can rapidly identify whether it is your code, your connectivity or someone else’s outage. Here is an example of how AppDynamics can help you monitor your 3rd party web services:

Transaction Flow

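If you have no monitoring on a 3rd party call at all, even a crude timer is better than nothing. The Python sketch below is not how AppDynamics instruments calls; it is a hand-rolled illustration that wraps each outbound call and tallies latency and errors per service. `track`, `stats`, and `authorize_payment` are hypothetical names.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Per-service tallies: number of calls, number of errors, total elapsed seconds.
stats = defaultdict(lambda: {"calls": 0, "errors": 0, "total_s": 0.0})


@contextmanager
def track(service):
    """Time one call to an external service and record whether it raised."""
    start = time.perf_counter()
    try:
        yield
    except Exception:
        stats[service]["errors"] += 1
        raise
    finally:
        stats[service]["calls"] += 1
        stats[service]["total_s"] += time.perf_counter() - start


def authorize_payment(amount):
    """Hypothetical stand-in for a payment provider call."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    return "approved"
```

Usage is one line around the call, for example `with track("payments"): authorize_payment(25.0)`. Comparing `errors` and average latency per service then shows whether a slowdown sits on your side or the provider's.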

8. Crap Recursive Code

This is similar to #6 but burns time instead of resources. For example, many e-commerce transactions will request data from multiple sources (caches, databases, web services) at the same time. Every one of these round trips could be expensive and may involve network time along the way. I’ve seen a single eCommerce search transaction call the same database multiple times instead of performing a single operation using a stored procedure on the database. Recursive remote calls may only take 10-50 milliseconds each, but if they are invoked multiple times per transaction they can add seconds to your end user experience. For example, here is that search transaction that took x seconds and made 13,000 database calls.

Search Transaction Database Calls
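The difference between many small calls and one batched call is easy to demonstrate. The Python sketch below uses an in-memory SQLite database as a stand-in and a single batched `IN` query in place of the stored procedure mentioned above, since SQLite has no stored procedures. The `products` table and all function names are made up.

```python
import sqlite3


class CountingConnection:
    """Wraps a sqlite3 connection and counts executed statements."""

    def __init__(self, conn):
        self.conn, self.calls = conn, 0

    def execute(self, sql, params=()):
        self.calls += 1
        return self.conn.execute(sql, params)


def prices_one_by_one(db, product_ids):
    """One query per product: N round trips."""
    return {
        pid: db.execute("SELECT price FROM products WHERE id = ?", (pid,)).fetchone()[0]
        for pid in product_ids
    }


def prices_batched(db, product_ids):
    """One query for all products: a single round trip."""
    marks = ",".join("?" for _ in product_ids)
    rows = db.execute(
        f"SELECT id, price FROM products WHERE id IN ({marks})", tuple(product_ids)
    )
    return dict(rows.fetchall())


raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
raw.executemany("INSERT INTO products VALUES (?, ?)", [(i, i * 2.0) for i in range(1, 51)])
```

For 50 products the first function issues 50 statements and the second issues one. At an assumed 20 ms per round trip, that is a full second versus 20 ms for the same answer.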

9. Configuration Change

As much as we’d like to think that production environments are “locked down” with change control processes, they are not. Accidents happen, humans make mistakes and hotfixes occasionally get applied in a hurry at 2am 😉 Application server configuration can be sensitive, just like networks or any other piece of the infrastructure. Being able to audit, report and compare configuration changes across your application gives you instant intelligence that a change may have caused your eCommerce application to break. For example, AppDynamics can record any application server change and show you the time and values that were updated to help you correlate change with slowdowns and outages, as the screenshot below shows.

Configuration Change
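The comparison itself is simple once you have two snapshots. As a sketch, assuming configuration can be exported as flat key/value pairs, the Python below reports every setting that was added, removed, or changed. The setting names and values are invented, and this is not how AppDynamics stores or compares configuration.

```python
def diff_config(before, after):
    """Return {key: (old, new)} for every added, removed, or changed setting.

    A missing key is reported as None, so this sketch cannot tell a removed
    setting apart from one explicitly set to None.
    """
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            changes[key] = (old, new)
    return changes


# Hypothetical snapshots of one application server's settings.
monday = {"jdbc.pool.max": 50, "jvm.heap": "8g", "http.timeout_s": 30}
tuesday = {"jdbc.pool.max": 10, "jvm.heap": "8g", "cache.enabled": False}
changes = diff_config(monday, tuesday)
```

Here the diff flags the pool size dropping from 50 to 10, the removed HTTP timeout, and the new cache flag, which is exactly the kind of 2am change worth lining up against a slowdown.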

10. Out of Stock Exception

“I’m sorry, the product you requested is no longer in stock”. This basically means you were too slow and you’ll need to wait until 2014 for the same offer. Remember to set an alarm next year for Black Friday 😉

Out of Stock

In addition, AppDynamics can also monitor the revenue and performance of your checkout transactions over time, which helps Dev and Ops teams monitor and alert on the health of the business:

Correlating Revenue and Performance

The good news is that AppDynamics Pro can identify all of the above defects in minutes. You can take a free trial here and be deployed in production in under 30 minutes! If you send us a few screenshots of your findings in production like the above we’ll send you a $250 Amazon gift certificate for your hard work!

Steve.