APM vs NPM. Round One.

Another three-letter acronym I see frequently mixed in with APM is NPM which stands for Network Performance Management. At first glance they look very similar. The distinction appears very subtle with just a one letter difference, but it speaks volumes because their core technologies and approaches to monitoring application performance are fundamentally different.

Application Performance Management tools typically use agents that live in the application run-time which capture performance details of how application logic computes across the application infrastructure.

In contrast, Network Performance Monitoring tools are agent-less appliances that sit on the network analyzing content and traffic by capturing packets flowing through the network. They can measure response times and find errors for your applications by understanding the wire protocols across the application tiers. Like agent based APM solutions, they can automatically discover your application topology, but NPM tools lack deep code-level diagnostics which is paramount to helping App Ops and Dev teams solve problems. Here’s why.

Imagine your application’s architecture is FedEx providing package delivery services to it’s customers in a timely manner. If you were to apply the concept of network monitoring to this analogy, using an NPM tool would help track how long it takes for packages to travel in transit from one hub to the next, examining it’s contents and the expected arrival time to it’s final destination.

Now let’s say that a package was being sent from Paris to Los Angeles but hit a snag at the London hub along the way. Operations at London came to a screeching hault for some inexplicable reason and your task is to figure out why and how to fix it ASAP so operations are running smooth again. In this case, APM is the favorable solution over NPM here. APM tools not only empower you with the ability to see how long it takes for your package to travel to different locations, but also how the package is processed at the facility and why operations came to a standstill. At this juncture, you probably wouldn’t even care to analyze the package’s contents since you already know what’s inside and it’s still in London.

When applying this concept to the world of business applications, NPM tools won’t provide you with visibility into application logic inefficiencies, especially when experiencing a performance issue or service level breach to key business transactions. NPM performance metrics are derived from captured packets and the network protocols the servers support, but if you’re making inefficient, iterative business logic calls in your code, how will capturing and analyzing packets help you triage the problem? Its the same story for identifying the root cause of memory leaks or thread synchronization issues; execution and visibility like this occurs at the application layer, not at the network layer. Thus, I would argue that NPM compliments APM but doesn’t serve as a viable replacement.

Now before I go enumerating my theories as to why I see this trend, let me just say while I respect their hustle, nice try but not quite. While scouring the web I found many NPM companies trying to crash the popular APM party by marketing their network performance monitoring utilities as an application monitoring and management solution. It can be confusing for a first time buyer trying to evaluate APM tools when you see NPM companies using phrases like, “network-based APM” or “gather data already on your network” in their product messaging.

One of the reasons for this messaging is because NPM is an immensely crowded market. Here’s a list of NPM tools compiled by some IT folks at Stanford. There’s over 300 network monitoring tools, and that’s only going back five years to 2007!! Take your pick. Second, applications and the tools to monitor their performance and availability speaks to customers at a higher level from a technical and business perspective. If I’m in charge of some transaction intensive application, my customers are directly interfacing with the Application Layer in order to complete business transactions which takes place at the highest layer on the stack. Customers aren’t 5-6 levels deep at the Network Layer interacting with packets or the routers and switches they pass through.

By the way, have you seen Gartner’s APM spend analysis? It’s a $2 billion market and growing. This most certainly is a valid reason why NPM solutions are now positioning themselves as an APM solution to customers.

If you have absolutely nothing to monitor any part of your applications infrastructure, then something is better than nothing. However, at some point you’ll need better visibility at the application layer regardless how fast packets are traveling through the pipes. NPM won’t necessarily provide the range of visibility your App Ops and Dev teams need to diagnose, troubleshoot, and idenfity problem root causes.

Those who live day in and day out the networking world will naturally gravitate to what they’re familiar with, but if you haven’t ventured out into the evolving APM market, I highly recommend it. Even though you can achieve some level of performance monitoring with NPM, you’ll find yourself running into limitations in application visibility pretty quick. But as the old saying goes, “If all you have is a hammer, then everything looks like a nail.”

I’m curious to know what are your thoughts are. Which class of tools would you choose? Why?


Troubleshooting OutOfMemory Exceptions and Memory Leaks in Production

Many root causes ago I was working with a customer who suspected they had a memory leak in production. Their JVM console event logs were showing the famous OutOfMemory exception and these were being thrown periodically every three to four days causing production outages. To stop these exceptions, the operations team would restart all JVMs at midnight every night in order to prevent system wide impact to customers during business hours. And if Ops forgot to restart the JVMs (which they did on several occasions), production went bang.

It’s worth pointing out at this stage that “OutOfMemory” exceptions in log files doesn’t automatically mean your application has a memory leak. It simply means your application is using or needs more memory than you’ve allocated to it at run-time. A leak is just one candidate of several potential candidates that cause memory to grow over time until all resource is exhausted.

A Holiday Greeting in Verse

‘Twas nearly 2011, a brand-new year
But app owners stared sadly at their beer.

They had a problem, and it seemed just enormous
They craved a surefire solution to application performance.

APM for the Non-Java Guru: What leak?

Memory, Memory, Memory…


Memory is a critical part of Java, in particular, the management of memory. As a developer, memory management is not something you want to be doing on a regular basis, nor is it something you want to do manually. One of the great benefits of Java is its ability to take care of the memory model for you. When objects aren’t in use, Java helps you out by doing the clean up.