Common Application Problems and How to Fix Them: The Select N + 1 Problem

At AppDynamics we get the opportunity to see the inner workings of a lot of applications. While these applications all seem pretty different to their end users, what’s under the hood usually doesn’t vary that much (sorry, your app isn’t a unique snowflake after all). They all have similar service-oriented architectures using a variety of databases, caches, and queues. That similarity extends to the performance issues these apps experience and the antipatterns that cause them. The same problems show up in high-speed trading applications, e-commerce sites, mobile apps and online games – so we thought we’d put together some of the most common performance problems in a blog series to show you how to find, fix and prevent them. In this blog post we’ll take a look at a pretty common problem that can be tricky to detect in a large application: the Select N + 1 problem.

What is it?

The N + 1 problem is a performance antipattern in which an application makes N + 1 database calls to fetch N objects: one query for the list, then one more query for each object in it. Like most antipatterns, this isn’t necessarily a problem in itself, but under certain circumstances (where N is large, for example) it will cause performance to degrade by making hundreds or even thousands of database calls for a single business transaction.

In plain English: You’re spamming your database with really small, fast queries instead of using one or two more complex ones.

Here’s how it usually goes down: You have two database tables with a parent/child relationship (like blogs and posts, or products and line items), and you want to iterate through all of them. So you start by selecting the parents:

SELECT id FROM Parent

and then execute a query for each record returned:

SELECT * FROM Child WHERE parent_id = ?

This isn’t necessarily a bad way of doing it, especially if there aren’t many parents or children. But what if you’re a giant e-commerce company with thousands of products and line items? Suddenly each transaction is calling the database thousands of times. Even if each database call is super fast, the cumulative response time of that transaction will be seconds (if not longer), which is not an ideal situation. And the problem only gets worse as traffic increases.
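To make the arithmetic concrete, here is a minimal sketch of the pattern using Python’s built-in sqlite3 module and an in-memory database. The table layout, the helper names (build_db, load_n_plus_one) and the row counts are assumptions made up for illustration; only the two SQL statements come from the description above.

```python
import sqlite3


def build_db(num_parents=100, children_per_parent=3):
    """Create an in-memory database with hypothetical Parent/Child tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Parent (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE Child (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
    )
    for p in range(1, num_parents + 1):
        conn.execute("INSERT INTO Parent (id, name) VALUES (?, ?)", (p, f"parent-{p}"))
        for c in range(children_per_parent):
            conn.execute(
                "INSERT INTO Child (parent_id, name) VALUES (?, ?)",
                (p, f"child-{p}-{c}"),
            )
    conn.commit()
    return conn


def load_n_plus_one(conn):
    """Load every parent's children the N + 1 way and count the queries issued."""
    queries = 0

    # The "1": a single query for the list of parents.
    parent_ids = [row[0] for row in conn.execute("SELECT id FROM Parent")]
    queries += 1

    # The "N": one extra query per parent returned above.
    children = {}
    for parent_id in parent_ids:
        children[parent_id] = conn.execute(
            "SELECT * FROM Child WHERE parent_id = ?", (parent_id,)
        ).fetchall()
        queries += 1

    return children, queries


conn = build_db()
children, queries = load_n_plus_one(conn)
print(f"{len(children)} parents loaded with {queries} queries")
```

With 100 parents the loop issues 101 queries. Against an in-memory database that is barely noticeable, but every one of those queries is a network round trip when the database lives on another machine.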

How to find it with AppDynamics

Like I said, this isn’t always a problem, but when it is it can be hard to find. The database will probably look normal to the DBA – they won’t see any long-running queries, and CPU may look fine under normal load. The best way to find a problem like this is using a performance monitoring solution that allows you to drill into a particular Business Transaction (user request) and see what code is executed and how it is accessing the database.

Here’s an example from an AppDynamics customer:

The best part about AppDynamics is its request snapshots. They not only allow us to troubleshoot performance problems faster – they also allow us to perfect his code. “AppDynamics allows us to see what is going on and identify the issues that could be refactored and made faster,” he said. “It lets us bridge the gap between anecdotes from users and actual, actionable information.” – Cornell University


How to fix it

This problem happens most often when you’re using a persistence engine or an object/relational mapper (ORM) like Hibernate with lazy loading: the ORM fetches the parents first and then quietly issues a separate query for each parent’s children the first time you touch them. Be sure to understand the default fetching behavior of any ORM before you begin using it.

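If you’re just writing raw SQL, you may want to fetch all your data at once by joining the related tables in a single query, like this:

SELECT p.id, p.name, c.*, g.*
FROM Parent p
LEFT OUTER JOIN Child c ON c.parent_id = p.id
LEFT OUTER JOIN Grandchild g ON g.parent_id = c.id

Note that every join is an outer join, so parents without children (and children without grandchildren) still show up in the result.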
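As a rough sketch of the single-query approach, the following uses Python’s built-in sqlite3 module to fetch parents and children in one round trip and group the rows in application code. The sample rows and the load_with_join helper are hypothetical, and only the two-table Parent/Child case is shown.

```python
import sqlite3

# Hypothetical sample data: three parents, one of which has no children.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Child (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO Parent (id, name) VALUES (1, 'blog-a'), (2, 'blog-b'), (3, 'blog-c');
    INSERT INTO Child (parent_id, name) VALUES (1, 'post-1'), (1, 'post-2'), (2, 'post-3');
    """
)


def load_with_join(conn):
    """Fetch all parents and their children with one query, then group the rows."""
    rows = conn.execute(
        """
        SELECT p.id, c.name
        FROM Parent p
        LEFT OUTER JOIN Child c ON c.parent_id = p.id
        ORDER BY p.id, c.id
        """
    ).fetchall()

    children = {}
    for parent_id, child_name in rows:
        # The outer join yields a row with a NULL child for childless parents,
        # so every parent gets an entry even if its list stays empty.
        names = children.setdefault(parent_id, [])
        if child_name is not None:
            names.append(child_name)
    return children


result = load_with_join(conn)
print(result)  # {1: ['post-1', 'post-2'], 2: ['post-3'], 3: []}
```

The trade-off is that the join repeats the parent columns on every child row, so for very wide parents or very large result sets a second query of the form WHERE parent_id IN (...) may be the better fit.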

Optimizing database access

How an application accesses the database is often a focus of optimization. Stay tuned for the next blog post in this series, where we will discuss the importance of caching database access. If you enjoyed this blog post, check out our ebook on Java performance problems.

Find out more about AppDynamics Pro and get started optimizing your application with a free 15-day trial.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Thoughts? Let us know on Twitter @AppDynamics!

Intelligent Alerting for Complex Applications – PagerDuty & AppDynamics

Today AppDynamics announced integration with PagerDuty, a SaaS-based provider of IT alerting and incident management software that is changing the way IT teams are notified, and how they manage incidents in their mission-critical applications. By combining AppDynamics’ granular visibility of applications with PagerDuty’s reliable alerting capabilities, customers can make sure the right people are proactively notified when business impact occurs, so IT teams can get their apps back up and running as quickly as possible.

You’ll need both a PagerDuty and an AppDynamics license to get started – if you don’t already have them, you can sign up for free trials of PagerDuty and AppDynamics online. Once you complete this simple installation, you’ll start receiving incidents in PagerDuty created by AppDynamics out-of-the-box policies.

Once an incident is filed it will have the following list view:

incident

When the ‘Details’ link is clicked, you’ll see the details for this particular incident including the Incident Log:

incident_details

If you are interested in learning more about the event itself, simply click ‘View message’ and all of the AppDynamics event details are displayed showing which policy was breached, violation value, severity, etc. :

incident_message

Let’s walk through some examples of how our customers are using this integration today.

Say Goodbye to Irrelevant Notifications

Is your work email address on some sort of group alias, so that you get several, maybe even dozens, of notifications a day that aren’t particularly relevant to your responsibilities or are intended for other people on your team? I know mine is. Imagine a world where your team members only receive notifications that have to do with their individual roles, and only when they are actually on call. With AppDynamics & PagerDuty you can now build in alerting logic that routes specific alerts to specific teams and only sends messages to the people who are actually on call. App response time way above the normal value? Send an alert to the app support engineer who is on call, not to all of their colleagues. Not having to sift through a bunch of irrelevant alerts means that when one does come through you can be sure it requires YOUR attention right away.

on_call_schedules

Automatic Escalations

If you are only sending a notification and assigning an incident to one person, what happens if that person is out of the office or doesn’t have access to the internet / phone to respond to the alert? Well, the good thing about PagerDuty is that you can build in automatic escalations. So, if you have a trigger in AppDynamics to fire off a PagerDuty alert when a node is down, and the infrastructure manager isn’t available, you can automatically escalate and re-assign / alert a backup employee or admin.

escalation_policy

The Sky is Falling!  Oh Wait – We’re Just Conducting Maintenance…

Another potentially annoying situation for IT teams is the flood of alerts that get fired off during a maintenance window. PagerDuty has the concept of a maintenance window so your team doesn’t get a bunch of doomsday messages during maintenance. You can even set up a maintenance window with one click if you prefer to go that route.

maintenance_window

Either way, no new incidents will be created during this time period… meaning your team will be spared having to open, read, and file the alerts and update / close out the newly-created incidents in the system.

We’re confident this integration of the leading application performance management solution with the leading IT incident management solution will save your team time and make them more productive.  Check out the AppDynamics and PagerDuty integration today!

Introducing AppDynamics for PHP


It’s been about 12 years since I last scripted in PHP. I pretty much paid my way through college building PHP websites for small companies that wanted a web presence. Back then PHP was the perfect choice, because nearly all the internet service providers had PHP support for free if you registered domain names with them. Java and .NET weren’t an option for a poor smelly student like me, so I just wrote standard HTML with embedded scriptlets of PHP code and bingo–I had dynamic web pages.

Today, 244 million websites run on PHP which is almost 75% of the web. That’s a pretty scary statistic. If only I’d kept coding PHP back when I was 21, I’d be a billionaire by now! PHP is a pretty good example of how open-source technology can go viral and infect millions of developers and organizations world-wide.

AppDynamics & Splunk – Better Together

A few months ago I saw an interesting partnership announcement from Foursquare and OpenTable. Users can now make OpenTable reservations at participating restaurants from directly within the Foursquare mobile app. My first thought was, “What the hell took you guys so long?” That integration makes sense on so many levels, I’m surprised it hadn’t already been done.

So when AppDynamics recently announced a partnership with Splunk, I viewed that as another no-brainer.  Two companies with complementary solutions making it easier for customers to use their products together – makes sense right?  It does to me, and I’m not alone.

I’ve been demoing a prototype of the integration for a few months now at different events across the country, and at the conclusion of each walk-through I’d get some variation of the same question, “How do I get my hands on this?”  Well, I’m glad to say the wait is over – the integration is available today as an App download on Splunkbase.  You’ll need a Splunk and AppDynamics license to get started – if you don’t already have one, you can sign up for free trials of Splunk and AppDynamics online.

Covis Software GmbH isolates problematic Web Service in 4 minutes with AppDynamics Pro

I received another X-Ray from Covis Software GmbH in Germany, which provides CRM solutions. They’ve been managing their .NET application performance with AppDynamics Pro for several months now in production. In the X-Ray below (as documented by the customer), Covis were able to rapidly identify a poorly performing remote web service call which was impacting their application, its business transactions and the end user experience of their customers.

If you would like to get started with AppDynamics you can download AppDynamics Lite (our free version) or you can take a free 30-day trial of AppDynamics Pro.

Covis Software GmbH Speeds up CRM Application by 33x with AppDynamics

I received an X-Ray from Covis Software GmbH in Germany, which provides CRM solutions. They’ve been managing their .NET application performance with AppDynamics Pro for several months now in production. In the X-Ray below (as documented by the customer), Covis were able to improve the performance of a mission-critical business transaction from over 10 seconds to around 300 milliseconds, representing a 33x improvement. The business impact of this change was that average call time for some CRM agents dropped by almost 10 seconds.

If you would like to get started with AppDynamics you can download AppDynamics Lite (our free version) or you can take a free 30-day trial of AppDynamics Pro.

App Man.

Finding the Root Cause of Application Performance Issues in Production

The most enjoyable part of my job at AppDynamics is to witness and evangelize customer success. What’s slightly strange is that for this to happen, an application has to slow down or crash.

It’s a bittersweet feeling when End Users, Operations, Developers and many Businesses suffer application performance pain. Outages cost the business money, but sometimes they cost people their jobs–which is truly unfortunate. However, when people solve performance issues, they become overnight heroes with a great sense of achievement, pride, and obviously relief.

To explain the complexity of managing application performance, imagine your application is 100 haystacks that represent tiers, and somewhere a needle is hurting your end user experience. It’s your job to find the needle as quickly as possible! The problem is, each haystack has over half a million pieces of hay, each representing a line of code in your application. It’s therefore no surprise that organizations can take days or weeks to find the root cause of performance issues in large, complex, distributed production environments.

End User Experience Monitoring, Application Mapping and Transaction profiling will help you identify unhappy users, slow business transactions, and problematic haystacks (tiers) in your application, but they won’t find needles. To do this, you’ll need x-ray visibility inside haystacks to see which pieces of hay (lines of code) are holding the needle (root cause) that is hurting your end users. This X-Ray visibility is known as “Deep Diagnostics” in application monitoring terms, and it represents the difference between isolating performance issues and resolving them.

For example, AppDynamics has great End User Monitoring, Business Transaction Monitoring, Application Flow Maps and very cool analytics all integrated into a single product. They all look and sound great (honestly they do), but they only identify and isolate performance issues to an application tier. This is largely what Business Transaction Management (BTM) and Network Performance Management (NPM) solutions do today. They’ll tell you which business transaction slows down and where, but they won’t tell you the root cause so you can resolve the issue.

Why Deep Diagnostics for Production Monitoring Matters

A key reason AppDynamics has become very successful in just a few years is that our Deep Diagnostics, behavioral learning, and analytics technology is 18 months ahead of the nearest vendor. A bold claim? Perhaps, but it’s backed up by bold customer case studies such as Edmunds.com and Karavel, who compared us against some of the top vendors in the application performance management (APM) market in 2011. Yes, End User Monitoring, Application Mapping and Transaction Profiling are important–but these capabilities will only help you isolate performance pain, not resolve it.

AppDynamics has the ability to instantly show the complete code execution and timing of slow user requests or business transactions for any Java or .NET application, in production, with incredibly small overhead and no configuration. We basically give customers a metal detector and X-Ray vision to help them find needles in haystacks. Locating the exact line of code responsible for a performance issue means Operations and Developers solve business pain faster, and this is a key reason why AppDynamics technology is disrupting the market.

Below is a small collection of needles that customers found using AppDynamics in production. The simple fact is that complete code visibility allows customers to troubleshoot in minutes as opposed to days and weeks. Monitoring with blind spots and configuring instrumentation are a thing of the past with AppDynamics.

Needle #1 – Slow SQL Statement

Industry: Education
Pain: Key Business Transaction with 5 sec response times
Root Cause: Slow JDBC query with full-table scan

Needle #2 – Slice of Death in Cassandra

Industry: SaaS Provider
Pain: Key Business Transaction with 2.5 sec response times
Root Cause: Slow Thrift query in Cassandra

Needle #3 – Slow & Chatty Web Service Calls

Industry: Media
Pain: Several Business Transactions with 2.5 min response times
Root Cause: Excessive Web Service Invocation (5+ per trx)

Needle #4 – Extreme XML processing

Industry: Retail/E-Commerce
Pain: Key Business Transaction with 17 sec response times
Root Cause: XML serialization over the wire.

Needle #5 – Mail Server Connectivity

Industry: Retail/E-Commerce
Pain: Key Business Transaction with 20 sec response times
Root Cause: Slow Mail Server Connectivity

Needle #6 – Slow ResultSet Iteration

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 30+ sec response times
Root Cause: Querying too much data

Needle #7 – Slow Security 3rd Party Framework

Industry: Education
Pain: All Business Transactions with > 3 sec response times
Root Cause: Slow 3rd party code

Needle #8 – Excessive SQL Queries

Industry: Education
Pain: Key Business Transactions with 2 min response times
Root Cause: Thousands of SQL queries per transaction

Needle #9 – Commit Happy

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 25+ sec response times
Root Cause: Unnecessary use of commits and transaction management.

Needle #10 – Locking under Concurrency

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 5+ sec response times
Root Cause: Non-Thread safe cache forces locking for read/write consistency

Needle #11 – Slow 3rd Party Search Service

Industry: SaaS Provider
Pain: Key Business Transaction with 2+ min response times
Root Cause: Slow 3rd Party code

Needle #12 – Connection Pool Exhaustion

Industry: Financial Services
Pain: Several Business Transactions with 7+ sec response times
Root Cause: DB Connection Pool Exhaustion caused by excessive connection pool invocation & queries

Needle #13 – Excessive Cache Usage

Industry: Retail/E-Commerce
Pain: Several Business Transactions with 50+ sec response times
Root Cause: Cache Sizing & Configuration

If you want to manage and troubleshoot application performance in production, you should seriously consider AppDynamics. We’re the fastest growing on-premise and SaaS based APM vendor in the market right now. You can download our free product AppDynamics Lite or take a free 30-day trial of AppDynamics Pro – our commercial product.

Now go find those needles that are hurting your end users!

App Man.

Online media company gets proactive with application monitoring in production

On Wednesday I delivered a keynote at WJAX in Munich. Everything went really well, but I was a little shocked at the response I got when I asked the audience “How many of you monitor the performance of your apps in production?” As I scanned the audience, I counted 9 raised hands out of ~950 developers, meaning about 1% had visibility of how their applications actually performed in production. I know what you’re thinking: “But isn’t application performance in production the responsibility of Operations?” Well, it is and it isn’t. Most organizations think that when an application has an issue, it’s related to the infrastructure it runs on. That’s like saying that when a car crashes, it’s because a part failed on the car, whereas in actual fact most accidents are caused by the driver. Yes, hardware fails occasionally, but application logic and configuration drive how infrastructure resources are used, which is why most issues today occur when new code is deployed in production.