Slow SQL – A Usual Suspect

Ask any developer to name a few common performance bottlenecks and 99.9% of them will say “the database” as their first answer. Ask a DBA the same question and 99.9% of them will respond “crappy application code.” If we put aside the mutual love between each party, it’s actually worth looking at why they arrive at different conclusions.

First of all, it’s very easy to make any application or SQL query perform fast in a dev environment, especially on a developer’s desktop. Why? The two things that impact query performance are data volume and user/transaction concurrency, both of which typically don’t exist in dev environments when unit tests pass. Computing power these days is also silly fast and cheap with desktop cpu and memory helping any dev environment mask potential performance bottlenecks.

What’s Under Your Hood? APM for the Non Java Guru: ORMs & Slow SQL

It’s time for another update to our series on Top Application Performance Challenges. Last time I looked at Java’s synchronization mechanism as a source for performance problems. This time around I take on what is likely the Performance Engineer’s bread and butter … slow database access!

Behind this small statement lies a tricky and multifaceted discussion. For now, I’m going to focus on just one particular aspect – the Object Relational Mapper (ORM).

The ORM has become a method of choice for bringing together the two foundational technologies that we base business applications on today – object-oriented applications (Java, .NET) and relational databases (Oracle, mySQL, PostgreSQL, etc.). For many developers, this technology can seem like a godsend, eliminating the need for them to drill-down into the intricacies of how these two technologies interact. But at the same time, ORMs can place an additional burden on applications, significantly impacting performance while everything looks fine on the surface.

Here’s my two cents on ORMs and why developers should take a longer look under the hood:

In the majority of cases, the time and resources taken to retrieve data are orders of magnitude greater than what’s required to process it. It is no surprise that performance considerations should always include the means and ways of accessing and storing data.

I already mentioned the two major technology foundations on which we build business applications today, object oriented applications, used to model and execute the business logic, and relational databases, used to manage and store the data. Object oriented programs unite data and logic into object instances, relational databases on the other hand isolate data into columns and tables, joined by keys and indexes.

This leaves a pretty big gap to bridge, and it falls upon the application to do the legwork. Since bridging this gap is something many applications must do, enter the ORM as a convenient, reusable framework. Hibernate, for instance, is quite likely the most popular ORM out there.

While intuitive for an application developer to use (ORMs do hide the translation complexities) an ORM can also be a significant weight on an application’s performance.

Let me explain.

Take a Call Graph from AppDynamics and follow the execution path of a transaction, method by method, from the moment a user request hits the application until calls to the database are issued to retrieve the requested information. Then size up the layers of code this path has to go through to get to the data. If your application has implemented an ORM like Hibernate, I assure you’ll be surprised how much stuff is actually going on in there.

No finger-pointing intended. A developer will emphasize that the benefits of just using the ORM component (without having to understand how its working) greatly increases his productivity. Point taken. It does pay, however, to review an ORM’s data access strategy.

I recently worked with a company and saw transactions (single user requests) that each bombard the database with 3000+ distinct calls. Seems a little excessive? You would be right. More over, nobody knew that this was happening and certainly nobody intended for it to be so.

In many cases simple configuration settings or using a different ‘fetch’ method offered by the ORM itself can affect performance significantly. Whether for instance the ORM accesses each row of the customer table individually to fill an array of customers in your code or is actually constructing a query that encompasses all of the expected result-set and gets them in one fell swoop does make a big difference.

Seems obvious, right? But do you actually know what your ORM really does when retrieving the information?

You might be surprised.