A Real Example of the Database to Storage Performance Relationship

Most enterprise databases today run on shared storage volumes (SAN, NAS, etc…) that are accessed over the network or via Fibre Channel connection. The shared storage concept is great for helping to keep storage infrastructure and management costs relatively low but creates cross silo finger pointing when there are performance issues. In this blog post we will explore a real world example of how to avoid finger pointing and get right down to figuring out how to fix the problem.

One Rotten Apple Can Ruin The Whole Bunch

This story dates back to June of 2012 but I just came across it so it is new to me. One of our customers had an event which impacted the performance of multiple databases. All of these databases were connected to the same NetApp storage array. Often when there is an issue with database performance the DBAs will point the finger at the storage team and the storage team will tell the DBA team that everything looks good on their side. This finger pointing between silo’s is a common occurrence between various groups (network, storage, database, application support, etc…) within enterprise organizations.

In the chart below (screen grab taken from AppDynamics for Databases) you can see that there was a significant increase in I/O activity on dw_logvol. This issue impacted the performance of the entire NetApp storage array.

NetApp Storage Issue

As it turns out dw_logvol was used as a temporary storage location for web logs. There was a process that would copy log files to this location, decompress them, and insert them into an Oracle data warehouse for long term storage. This process normally would not impact the performance of anything else connected to the same storage array but in this case there happened to be corrupted log files that could not be properly decompressed. This resulted in multiple attempts to retransmit and decompress the same files.

Context and Collaboration to the Rescue

Storage teams normally don’t have access to application context and application teams normally don’t have access to storage metrics. In this case though, both teams were able to collaborate and quickly realize what the problem was as a result of having a monitoring solution that was available to everyone. The fix for this problem was really easy, just remove the corrupted files and replace them with versions without any corruption. You can see activity return to normal in the chart below.

NetApp Storage Issue After

Modern application architectures require collaboration across all silo’s in order to identify and fix issues in a timely manner. One of the key enablers of cross-silo collaboration is intelligent monitoring at each layer of the application and the infrastructure components that provide the underlying resources. AppDynamics provides end-to-end visibility in an analytics based solution that help you identify, isolate and remediate issues. Try AppDynamics for Databases and Storage for free today and bring a new level of collaboration to your organization.

Storage is Killing Your Database Performance

The other day I had the opportunity to speak with a good friend of mine who also happens to be a DBA at a global Financial Services company. We were discussing database performance and I was surprised when he told me that the most common cause of database performance issues (from his experience) was a direct result of contention on shared storage arrays.

After recovering from my initial surprise I had an opportunity to really think things through and realized that this makes a lot of sense. Storage requirements in most companies are growing at an ever increasing pace (big data anyone?). Storage teams have to rack, stack, allocate, and configure new storage quickly to meet demand and don’t have the time to do a detailed analysis on the anticipated workload of every application that will connect to and use the storage. And therein lies the problem.

Workloads can be really unpredictable and can change considerably over time within a given application. Databases that once played nicely together on the same spindles can become the worst of enemies and sink the performance of multiple applications at the same time. So what can you do about it? How can you know for sure if your storage array is the cause of your application/database performance issues? Well, if you use NetApp storage then you’re in luck!

AppDynamics for Databases remotely connects (i.e. no agent required) to your NetApp controllers and collects the performance and configuration information that you need to identify the root cause of performance issues. Before we take a look at the features, let’s look at how it gets set up.

The Config

Step 1: Prepare the remote user ID and privileges on the NetApp controller. The following commands are used for the configuration.

useradmin role add AppD_Role -a api-*,login-http-admin
useradmin group add AppD_Group -r AppD_Role
useradmin user add appd -g AppD_Group

Note: Make sure you set a password for the appd user.

Step 2: Configure AppDynamics to monitor the NetApp controller. Notice that we configure AppDynamics with the the username and password created in step 1.

Screen Shot 2013-04-11 at 4.14.33 PM

 Step 3: Enjoy your awesome new monitoring (yep, it’s that easy).

The Result

After an incredibly difficult 2 minutes of configuration work we are ready for the payoff. In the AppDynamics for Databases main menu you will see a section for all of your NetApp agents.

Screen Shot 2013-04-11 at 4.09.13 PM

Let’s do a “drill-up” from the NetApp controller to our impacted database. Clicking into our monitored instance we see the following activity screen.

Screen Shot 2013-04-11 at 4.29.28 PM

By clicking on the purple latency line inside of the red box in the image above we can drill into the volume that has the highest response time. Notice in the scree grab below that we have a link at the bottom of the page where we can drill-up into the database that is attached to this storage volume. This relationship is built automatically by AppDynamics for Databases.

Screen Shot 2013-04-11 at 4.31.56 PM

Clicking on the “Launch In Context” link we are immediately transfered to the Oracle instance activity page shown below.

Screen Shot 2013-04-11 at 4.34.21 PM

In just the same manner as we can drill-up from storage to database, we can also drill-down from database to storage. Notice the screen grab below from an Oracle instance activity screen. Clicking on the “View NetApp Volume Activity” link will launch the NetApp activity screen shown earlier for the volumes associated with this Oracle instance. It’s that easy to switch between the views you need to solve your applications performance issues.

Imagine being able to detect an end user problem, drill down through the code execution, identify the slow SQL query, and isolate the storage volume that is causing the poor performance. That’s exactly what you can do with AppDynamics.

natappdrilldown

Storage monitoring in AppDynamics for Databases is another powerful feature that enables application support, database support, and storage support to get on the same page and restore service as quickly as possible. If you have databases connected to NetApp storage you need to take a free trial of AppDynamics for Databases today.

Here’s a short video demonstration of AppDynamics for NetApp…