Insights from an Investment Banking Monitoring Architect

To put it very simply, Financial Services companies have a unique set of challenges that they have to deal with every day. They are a high priority target for hackers, they are highly regulated by federal and state governments, they deal with and employ some of the most demanding people on the planet, problems with their applications can have an impact on every other industry across the globe. I know this from first hand experience; I was an Architect at a major investment bank for over 5 years.

In this blog post I’m going to show you what’s really important when Financial Services companies consider application monitoring solutions and warn you about the hidden realities that only expose themselves after you’ve installed a large enough monitoring footprint.

1 – Product Architecture Plays a Major Role in Long Term Success or Failure

Every monitoring tool has a different core architecture. On the surface these architectures may look similar but it is imperative to dive deeper into the details of how all monitoring products work. We’ll use two real product architectures as examples.

Monitoring Architecture“APM Solution A” is an agent based solution. This means that a piece of vendor code is deployed to gather monitoring information from your running applications. This agent is intelligent and knows exactly what to monitor, how to monitor, and when to dial itself back to do no harm. The agent sends data back to central collector (called a controller) where this data is correlated, analyzed, and categorized automatically to provide actionable intelligence to the user. With this architecture the agent and the controller are very loosely coupled which lends itself well to highly distributed, virtualized environments like you see in modern application architectures.

“APM Solution B” is also agent based. They have a 3 tiered architecture which consists of agents, collectors, and servers. On the surface this architecture seems reasonable but when we look at the details a different story emerges. The agent is not intelligent therefore it does not know how to instrument an application. This means that every time an application is re-started, the agent must send all of the methods to the collector so that the collector can tell the agent how and what to instrument. This places a large load on the network, delays application startup time, and adds to the amount of hardware required to run your monitoring tool. After the collector has told the agent what to monitor the collectors job is to gather the monitoring data from the agent and pass it back to the server where it is stored and viewed. For a single application this architecture may seem acceptable but you must consider the implications across a larger deployment.

Choosing a solution with the wrong product architecture will impact your ability to monitor and manage your applications in production. Production monitoring is a requirement for rapid identification, isolation and repair of problems.

2 – Monitoring Philosophy

Monitoring isn’t as straight forward as collecting, storing, and showing data. You could use that approach but it would not provide much value. When looking at monitoring tools it’s really important to understand the impact of monitoring philosophy on your overall project and goals. When I was looking at monitoring tools I needed to be able to solve problems fast and I didn’t want to spend all of my time managing the monitoring tools. Let’s use examples to illustrate again.

Application Monitoring Philosophy“APM Solution A” monitors every business transaction flowing through whatever application it is monitoring. Whenever any business transaction has a problem (slow or error) it automatically collects all of the data (deep diagnostics) you need to figure out what caused the problem. This, combined with periodic deep diagnostic sessions at regular intervals, allows you to solve problems while keeping network, storage, and CPU overhead low. It also keeps clutter down (as compared to collecting everything all the time) so that you solve problems as fast as possible.

“APM Solution B” also monitors every transaction for each monitored application but collects deep diagnostic data for all transactions all the time. This monitoring philosophy greatly increases network, storage, and CPU overhead while providing massive amounts of data to work with regardless of whether or not there are application problems.

When I was actively using monitoring tools in the Investment Bank I never looked at deep diagnostic data unless I was working on resolving a problem.

3 – Analytics Approach

Analytics comes in many shapes and sizes these days. Regardless of the business or technical application, analytics does what humans could never do. It creates actionable intelligence from massive amounts of data and allows us to solve problems much faster than ever before. Part of my process for evaluating monitoring solutions has always been determining just how much extra help each tool would provide in identifying and isolating (root cause) application problems using analytics. Back to my example…

“APM Solution A” is an analytics product at it’s core. Every business transaction is analyzed to create a picture of “normal” response time (a baseline). When new business transactions deviate from this baseline they are automatically classified as either slow or very slow and deep diagnostic information is collected, stored, and analyzed to help identify and isolate the root cause. Static thresholds can be set for alerting but by default, alerts are based upon deviation from normal so that you can proactively identify service degradation instead of waiting for small problems to become major business impact.

“APM Solution B” only provides baselines for the business transactions you have specified. You have to manually configure the business transactions for each application. Again, on small scale this methodology is usable but quickly becomes a problem when managing the configuration of 10’s, 100’s, or 1000’s of applications that keep changing as development continues. Searching through a large set of data for a problem is much slower without the assistance of analytics.

Monitoring Analytics

4 – Vendor Focus

When you purchase software from a vendor you are also committing to working with that vendor. I always evaluated how responsive every vendor was during the pre-sales phase but it was hard to get a good measure of what the relationship would be like after the sale. No matter how good the developers are, there are going to be issues with software products. What matters the most is the response you get from the vendor after you have made the purchase.

5 – Ease of Use

This might seem obvious but ease of use is a major factor in software delivering a solid return on investment or becoming shelf-ware. Modern APM software is powerful AND easy to use at the same time. One of the worst mistakes I made as an Architect was not paying enough attention to ease of use during product evaluation and selection. If only a few people in a company are capable of using a product then it will never reach it’s full potential and that is exactly what happened with one of the products I selected. Two weeks after training a team on product usage, almost nobody remembered how to use it. That is a major issue with legacy products.

Enterprise software is undergoing a major disruption. If you already have monitoring tools in place, now is the right time to explore the marketplace and see how your environment can benefit from modern tools. If you don’t have any APM software in place yet you need catch up to your competition since most of them are already looking or have already implemented APM for their critical applications. Either way, you can get started today with a free trial of AppDynamics.