A Newbie's Guide to APM

Today's blog post is headed back to the basics. I've been using and talking about APM tools for so many years that sometimes it's hard to remember the feeling of not knowing the associated terms and concepts. So for anyone who is looking to learn about APM, this blog is for you.

What does the term APM stand for?

APM is an acronym for Application Performance Management. You'll also hear the term Application Performance Monitoring used interchangeably, and that is just fine. Some will debate the details of monitoring versus management, and in reality there is an important difference, but from a terminology perspective the distinction is a bit nit-picky.

What’s the difference between monitoring and management?

Monitoring is a term used when you are collecting data and presenting it to the end user. Management is when you have the ability to take action on your monitored systems. Management tasks can include restarting components, making configuration changes, collecting more information through the execution of scripts, etc… If you want to read more about the management functionality in APM tools click here.

What is APM?

There is a lot of confusion about the term APM. Most of this confusion is caused by software vendors trying to convince people that their software is useful for monitoring applications. In an effort to create a standard definition for grouping software products, Gartner introduced a definition that we will review here.

Gartner lists five key dimensions of APM in their terms glossary found here… http://www.gartner.com/it-glossary/application-performance-monitoring-apm

End user experience monitoring – EUM and RUM are the common acronyms for this dimension of monitoring. This type of monitoring provides information about the response times and errors end users are seeing on their device (mobile, browser, etc…). This information is very useful for identifying compatibility issues (the website doesn't work properly with IE8), regional issues (users in northern California are seeing slow response times), and issues with certain pages and functions (the JavaScript is throwing an error on the search page).


Runtime application architecture discovery modeling and display – This is a graphical representation of the components in an application or group of applications that communicate with each other to deliver business functionality. APM tools should automatically discover these relationships and update the graphical representation as soon as anything changes. This graphical view is a great starting point for understanding how applications have been deployed and for identifying and troubleshooting problems.


User-defined transaction profiling – This is functionality that tracks user activity within your applications across all of the components that service those transactions. A common term associated with transaction profiling is business transactions (BTs). A BT is very different from a web page. Here's an example… As a user of a website I go to the login page, type in my username and password, then hit the submit button. As soon as I hit submit, a BT is started on the application servers. The app servers may communicate with many different components (LDAP, database, message queue, etc…) in order to authenticate my credentials. All of this activity is tracked, measured, and associated with a single "login" BT. This is a very important concept in APM, and the sketch below illustrates the idea.
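To make the idea concrete, here is a minimal, hand-written sketch (purely illustrative, not how AppDynamics or any other APM product actually implements it) of grouping the work done by several components under one "login" business transaction. All class, method, and component names here are hypothetical:

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

// Illustrative only: a toy tracker that groups the work done by several
// components (LDAP, database, message queue) under one business transaction.
public class BusinessTransaction {
    private final String name;                                          // e.g. "login"
    private final String correlationId = UUID.randomUUID().toString();  // would be propagated to downstream calls
    private final Map<String, Long> componentTimings = new LinkedHashMap<>();

    public BusinessTransaction(String name) { this.name = name; }

    // Time a call to a downstream component and record it against this BT.
    public <T> T call(String component, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            componentTimings.merge(component, elapsedMs, Long::sum);
        }
    }

    public void report() {
        System.out.println("BT '" + name + "' (" + correlationId + "): " + componentTimings);
    }

    public static void main(String[] args) {
        BusinessTransaction login = new BusinessTransaction("login");
        login.call("LDAP", () -> authenticate("user", "secret"));       // hypothetical helper
        login.call("Database", () -> loadUserProfile("user"));          // hypothetical helper
        login.report();                                                 // one "login" BT, several components
    }

    private static boolean authenticate(String user, String pass) { return true; }
    private static String loadUserProfile(String user) { return "user profile"; }
}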


Component deep-dive monitoring in application context – Deep dive monitoring is when you record and measure the internal workings of application components. For application servers, this would entail recording the call stack of code execution and the timing associated with each method. For a database server this would entail recording all of the SQL queries, stored procedure executions, and database statistics. This information is used to troubleshoot complex code issues that are responsible for poor performance or errors.


Analytics – This term leaves a lot to be desired since it can be, and often is, very liberally interpreted. To me, analytics (in the context of APM) means baselining and correlating data to provide actionable information. To others, analytics can be as basic as providing reporting capabilities that simply format the raw data in a more consumable manner. I think analytics should help identify and solve problems and be more than just reporting, but that is my personal opinion.


Do I need APM?

APM tools have many use cases. If you provide support for application components or the infrastructure components that service the applications, then APM is an invaluable tool for your job. If you are a developer, then absolutely yes, APM fits right in with the entire software development lifecycle. If your company is adopting a DevOps philosophy, APM is a tool that is collaborative at its core and enables developers and operations staff to work more effectively. Companies that are using APM tools consider them a competitive advantage because they resolve problems faster, solve more issues over time, and gain meaningful business insight.

How can I get started with APM?

First off you need an application to monitor. Assuming you have access to one, you can try AppDynamics for free. If you want to understand more about the process used in most companies to purchase APM tools you can read about it by clicking here.

Hopefully this introduction has provided you with a foundation for starting an APM journey. If there are more related topics that you want me to write about please let me know in the comments section below.

Check out our complimentary ebook, Top 10 Java Performance Problems!

Taking Advantage of What the Cloud Has to Offer

Moving your application to the cloud isn't as simple as porting your code and configurations over to someone else's infrastructure – nor should it be. Cloud computing represents a shift in the world of application architecture from vertical scalability to horizontal scalability. This new paradigm offers organizations the opportunity to build highly scalable and dynamic applications. However, if you're not careful or purposeful in how you prepare for the cloud, your application could suffer.

Horizontal vs. Vertical Scalability

The biggest fundamental difference between the cloud and your data center is that the cloud typically runs on commodity hardware rather than the powerful machines you'd find in a traditional data center. This means having to write an application that's horizontally scalable instead of vertically scalable. Google probably best described what this means for your application architecture in their blog on highscalability.com:

A 1,000-fold computer power increase can be had for a 33 times lower cost if you use a failure-prone infrastructure rather than an infrastructure built on highly reliable components. You must build reliability on top of unreliability for this strategy to work.

In other words, it can be cost effective to run an application on cheaper, less reliable commodity servers instead of more expensive and powerful machines. But in order to be successful, the software component needs to be highly scalable – even infinitely scalable – and resistant to failure. These two requirements guide many of the architectural decisions of cloud pioneers like Netflix, and give a good indicator of what’s required to be successful in the cloud from a performance standpoint.

Architecting for the Cloud

Very few organizations have the same requirements of their applications that Netflix does. With tens of thousands of nodes in Amazon EC2, Netflix is undoubtedly a pioneer in cloud architecture. Even though most organizations will never need the scale that Netflix does, its architectural practices and strategies are relevant to anyone building in or migrating to the cloud.

Here are a few of the ways Netflix takes advantage of the cloud, from a presentation by Netflix’s Director of Cloud Solutions, Ariel Tseitlin.

Service Oriented Architecture

The easiest way to accomplish horizontal scalability is with a service-oriented architecture. This is already pretty commonplace, but it’s especially important in the cloud, where you pay for the resources you consume – as your application scales, you can scale out only the services you need, which is more efficient (and cheaper) than scaling everything across the board. In addition, service-oriented architectures help manage concurrency. The following two diagrams demonstrate this point.

Fig. 1: The environment under normal load

Fig. 2: The environment after load changes

Auto-scaling

In the data center, scaling your application is expensive and time-consuming. In the cloud, it’s easy and (relatively) cheap. Netflix takes advantage of this, scaling their application up during the evenings when load increases and back down when peak viewing hours are over. Anyone with very dynamic load can take advantage of auto-scaling, which allows you to be cost-effective in the cloud without sacrificing performance.
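To make this concrete, here is a rough sketch of what time-based scaling rules can look like, using AWS Auto Scaling scheduled actions as one example; the group name, times, and capacities below are made up for illustration:

# Scale out ahead of the evening peak (group name and numbers are hypothetical)
aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name web-frontend \
    --scheduled-action-name evening-scale-out \
    --recurrence "0 17 * * *" \
    --desired-capacity 40

# Scale back in once peak viewing hours are over
aws autoscaling put-scheduled-update-group-action \
    --auto-scaling-group-name web-frontend \
    --scheduled-action-name overnight-scale-in \
    --recurrence "0 1 * * *" \
    --desired-capacity 10

Demand-based policies (scaling on CPU or request rate, for example) work the same way; the point is that capacity follows load instead of being provisioned for the worst case.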

Planning for Failure

One of the techniques Netflix is most famous for is simulating failure with its Simian Army. While this might not be a feasible approach for everyone, planning for failure is important for any cloud-based application – this is what Google is referring to when it talks about building “reliability on top of unreliability.” Your application needs to be able to survive failure at multiple levels – an individual node, a cluster, or perhaps even more.

Server instances can come and go at the drop of a hat, so they cannot store any state. Instead, Netflix groups server instances together into “clusters” and considers the behavior of the cluster as a whole.
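As a small illustration of what "no state on the instance" means in code (the SharedStateStore interface here is hypothetical, standing in for something like Redis, memcached, or a database): each request reads and writes user state through a shared external store, so any instance in the cluster can serve any request and a lost instance loses nothing.

// Illustrative only: state lives in a shared store, not on the instance.
interface SharedStateStore {              // hypothetical, e.g. backed by Redis or a database
    void put(String key, String value);
    String get(String key);
}

class CartHandler {
    private final SharedStateStore store;

    CartHandler(SharedStateStore store) { this.store = store; }

    // Nothing is remembered in local fields between calls, so this instance
    // can disappear at any time without losing the user's cart.
    String addItem(String sessionId, String itemId) {
        String cart = store.get("cart:" + sessionId);
        String updated = (cart == null || cart.isEmpty()) ? itemId : cart + "," + itemId;
        store.put("cart:" + sessionId, updated);
        return updated;
    }
}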

These are just a few of the best practices for building a highly scalable cloud application. In order to take advantage of what the cloud has to offer, you need to rethink the architecture of your entire application. This also means rethinking how you manage application performance in order to ensure availability and performance in the cloud.

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics Pro today.


APM vs aaNPM – Cutting Through the Marketing BS

Marketing: mixed feelings! I'm really looking forward to the new Super Bowl ads (some are pure marketing genius) in a few weeks, but I really dislike all of the confusion that marketing tends to create in the technology world. In today's blog post I'm going to attempt to cut through all of the marketing BS and clearly articulate the differences between APM (Application Performance Management) and aaNPM (Application Aware Network Performance Management). I'm not going to try to convince you that you need one instead of the other. This isn't a sales pitch, it's an education session.

Definitions

APM – Gartner has been hard at work trying to clearly define the APM space. According to Gartner, APM tools should contain the following five dimensions:

  • End-user experience monitoring (EUM) – (How is the application performing according to what the end user sees)
  • Runtime application architecture discovery modeling and display (A view of the logical application flow at any given point in time)
  • User-defined transaction profiling (Tracking user activity across the entire application)
  • Component deep-dive monitoring in application context (Application code execution, sql query execution, message queue behavior, etc…)
  • Analytics

aaNPM – Again, Gartner has created a definition for this market segment which you can read about here

“These solutions allow passive packet capture of network traffic and must include the following features, in addition to packet capture technology:

  • Receive and process one or more of these flow-based data sources: NetFlow, sFlow and Internet Protocol Flow Information Export (IPFIX).
  • Provide roll-ups and dashboards of collected data into business-relevant views, consisting of application-centric performance displays.
  • Monitor performance in an always-on state, and generate alarms based on manual or automatically generated thresholds.
  • Offer protocol analysis capabilities to decode and understand multiple applications, including voice, video, HTTP and database protocols. The tool must provide end-user experience information for these applications.
  • Have the ability to decrypt encrypted traffic if the proper keys are provided to the solution.”

Reality

So what do these definitions mean in real life?

APM tools are typically used by application support, operations, and development teams to rapidly identify, isolate, and repair application issues. These teams usually have a high-level understanding of how networks operate but not nearly the detailed knowledge required to resolve network-related issues. They live and breathe application code, integration points, and component (server, OS, VM, JVM, etc…) metrics. They call in the network team when they think there is a network issue (hopefully based upon some high-level network indicators). These teams usually have no control over the network and must follow the network team's process to get a physical device connected to the network. APM tools do not help you solve network problems.

aaNPM tools are network packet sniffers by definition. The majority of these tools (NetScout, Cisco, Fluke Networks, Network Instruments, etc.) are hardware appliances that you connect to your network. They need to be connected to each segment of the network where you want to collect packets or they must be fed filtered and aggregated packet streams by NPB (Network Packet Broker) devices. aaNPM tools contain a wealth of network level details in addition to some valuable application data (EUM metrics, transaction details, and application flows). aaNPM tools help network engineers solve network problems that are manifesting themselves as application problems. aaNPM tools are not capable of solving code issues as they have no visibility into application code.

If I were on a network support team I would want to understand if applications were being impacted by network issues so I could prioritize remediation properly and have a point of reference if I was called out by an application support team.

Network and Application Team Convergence

I've been asked if I see network and application teams converging in a similar way to how dev and ops teams are converging as a result of the DevOps movement. I have not seen this with any of the companies I get to interact with. Based on my experience working in operations at various companies in the past, network teams and application support teams think in very different ways. Is it impossible for these teams to work in unison? No, but I see it as unlikely over the next few years at least.

But what about SDN (Software Defined Networking)? I think most network engineers see SDN as a threat to their livelihood, whether that is true or not. No matter the case, SDN will take a long time to make its way into operational use, and in the meantime network and application teams will remain separate.

I hope this was helpful in cutting through the marketing spin that many vendors are using to expand their reach. When it comes right down to it, your use case may require APM, aaNPM, a combination of both, or some other technology not discussed today. Technology is made to solve problems. I highly recommend starting out by defining your problems and then exploring the best solutions available to help solve your problem.

If you’ve decided to explore your APM tool options you can try AppDynamics for free by clicking here.

The Digital Enterprise – Problems and Solutions

According to a recent article featured in Wall Street and Technology, Financial Services (FS) companies have a problem. The article explains that FS companies built more data center capacity than they needed when profits were up and demand was rising. Now that profits are lower and demand has not risen as expected, the data centers are partially empty and very costly to operate.

FS companies are starting to outsource their IT infrastructure and this brings a new problem to light…

“It will take a decade to complete the move to a digital enterprise, especially in financial services, because of the complexity of software and existing IT architecture. “Legacy data and applications are hard to move” to a third party, Bishop says, adding that a single application may touch and interact with numerous other applications. Removing one system from a datacenter may disrupt the entire ecosystem.”

Serious Problems

The article calls out a significant problem that FS companies are facing now and will be for the next decade but doesn’t mention a solution.

The problem is that you can’t just pick up an application and move it without impacting other applications. Based upon my experience working with FS applications I see multiple related problems:

  1. Disruption of other applications
  2. Baselining performance and availability before the move
  3. Ensuring performance and availability after the move

All of these problems increase risk and the chance that users will be impacted.

Solutions

1. Disruption of other applications – The solution to this problem is easy in theory and traditionally difficult in practice. The theory is that you need to understand all of the external interactions with the application you want to move.

One solution is to use ADDM (Application Discovery and Dependency Mapping) tools that scan your infrastructure looking for application components and the various communications to and from them. This method works okay (I have used it in the past) but typically requires a lot of manual data manipulation after the fact to improve the accuracy of the discovered information.


ADDM product view of application dependencies.

Another solution is to use an APM (Application Performance Management) tool to gather the information from within the running application. The right APM tool will automatically see all application instances (even in a dynamically scaled environment) as well as all of the communications into and out of the monitored application.


APM visualization of an application and its components with remote service calls.


APM tool list of remote application calls with response times, throughput and errors.


A combination of these two types of tools would provide the ultimate in accurate and easy-to-consume information (an APM strength) along with the flexibility to cover all of the one-off custom application processes that might not be supported by an APM tool (an ADDM strength).

2. Baselining performance and availability before the move – It's critically important to understand the performance characteristics of your application before you move. This will provide the baseline required for comparison after you make the move. The last thing you want to do is degrade application performance and user satisfaction by moving an application. The solution here is leveraging the APM tool referenced in solution #1. This is a core strength of APM and should be leveraged from multiple perspectives:

  1. Overall application throughput, response times, and availability
  2. Individual business transaction throughput and response times
  3. External dependency throughput and response times
  4. Application error rate and type

Application overview with baseline information.


Business transaction overview and baseline information.

3. Ensuring performance and availability after the move – Now that your application has moved to an outsourcer, it's more important than ever to understand performance and availability. Invariably your application performance will degrade, and the finger pointing between you and your outsourcer will begin. That is, unless you are using an APM tool to monitor your application. The whole point of APM tools is to end finger pointing and to reduce mean time to restore service (MTRS) as much as possible. By using APM after the application move you will provide the highest possible level of service to your customers.


Comparison of two application releases. Granular comparison to understand before and after states. – Key Performance Indicators


Comparison of two application releases. Granular comparison to understand before and after states. – Load, Response Time, Errors

If you’re considering or in the process of transitioning to a digital enterprise you should seriously consider using APM to solve a multitude of problems. You can click here to sign up for a free trial of AppDynamics and get started today.

Instrumenting .NET applications with AppDynamics using NuGet

Introduction

One of the coolest things to come out of the .NET stable at AppD this week was the NuGet package for Azure Cloud Services. NuGet makes it a breeze to deploy our .NET agent along with your web and worker roles from inside Visual Studio. For those unfamiliar with NuGet, more information can be found here.

Our NuGet package ensures that the .NET agent is deployed at the same time the role is published to the cloud. After adding it to the project you'll never have to worry about deploying the agent when you swap your hosting environment from staging to production in Azure, or when Azure changes the machine out from under your instance. For the remainder of the post I'll use a web role to demonstrate how to quickly install our NuGet package, the changes it makes to your solution, and how to edit the configuration by hand if needed. Even though I'll use a web role, things work exactly the same way for a worker role.

Installation

So, without further ado, let's take a look at how to quickly instrument .NET code in Azure using AppD's NuGet package for Windows Azure Cloud Services. NuGet packages can be added via the command line or the GUI. In order to use the command line, we need to bring up the Package Manager Console in Visual Studio.


In the console, type ‘install-package AppDynamics.WindowsAzure.CloudServices’ to install the package. This will bring up a UI where you can enter the information needed by the agent to talk to the controller and upload metrics. You should have received this information in the welcome email from AppDynamics.
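For reference, the Package Manager Console command looks like this:

PM> Install-Package AppDynamics.WindowsAzure.CloudServices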


The ‘Application Name’ is the name of the application in the controller under which the metrics reported by this agent will be stored. When ‘Test Connection’ is checked, we will verify the information entered by trying to connect to the controller. An error message will be displayed if the test connection is unsuccessful. That's it: enter the information, click Apply, and we're done. Easy peasy. No more adding files one by one or modifying scripts by hand. Once deployed, instances of this web role will start reporting metrics as soon as they experience any traffic. Oh, and by the way, if you prefer to use a GUI instead of typing commands on the console, the same thing can be done by right-clicking on the solution in Visual Studio and choosing ‘Manage NuGet Packages’.

Anatomy of the package

If you look closely you'll notice that a new folder called ‘AppDynamics’ has been created in the solution explorer. On expanding the folder you'll find the following two files:

  • The installer for the latest and greatest .NET agent
  • Startup.cmd

The startup script makes sure that the agent gets installed as part of the deployment process on Azure. In addition to adding these files, we also change the ServiceDefinition.csdef file to add a startup task, as sketched below.


In case you need to change the controller information you entered in the GUI while installing the package, you can do so by editing the startup section of the csdef file. Application name, controller URL, port, account key, etc. can all be changed. On re-deploying the role to Azure, the new values will take effect.
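For orientation, a startup task in ServiceDefinition.csdef generally looks something like the sketch below. This is a hand-written example, not the exact markup the package generates; the role name, command line, and any additional settings the AppDynamics package writes may differ:

<ServiceDefinition name="MyCloudService" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WebRole name="MyWebRole">
    <Startup>
      <!-- Runs the script added by the NuGet package so the .NET agent is installed on the instance -->
      <Task commandLine="AppDynamics\Startup.cmd" executionContext="elevated" taskType="simple" />
    </Startup>
  </WebRole>
</ServiceDefinition>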

Next Steps

Microsoft Developer Evangelist Bruno Terkaly blogged about monitoring the performance of multi-tiered Windows Azure based web applications. Find out more on the Microsoft Developer Network.

Find out more in our step-by-step guide on instrumenting .NET applications using AppDynamics Pro. Take five minutes to get complete visibility into the performance of your production applications with AppDynamics Pro today.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

A Private Equity Buy Out Horror Story

This is a cautionary story from my personal work history about one private equity company’s buy-out of an enterprise software vendor.

A major news story broke today about Elliott Management Corporation's bid to take Riverbed Technology private for a premium of $19.00 per share in cash. One of Riverbed's competitors, Compuware, has also been a target of buy-out offers and continues to have an uncertain future as a public company. Compuware announced today a partial divestiture of some of its business units.

In this blog post I’m going to share with you my personal experience dealing with the aftermath of a software vendor being bought by a private equity firm. All names are being withheld to protect the identity of the guilty parties.

2 Years of Good Value

The story begins several years ago when I was working as a Monitoring Architect for a major investment bank. We had purchased a two-year, two-million-dollar, "all you can eat" site license from a major software vendor. This entitled us to deploy as many licenses of their product as we wanted within a two-year period. At the end of those 2 years our licenses would turn perpetual, and we would true up the total number and pay 15% maintenance on the value of those licenses.

Towards the end of this two-year period the major software vendor spun off their monitoring business by selling it to a private equity firm. This is when things started to go very wrong. We were near the end of our two-year contract, so we were beginning the negotiation process on a new contract. We were happy with the technology, support, and relationship with our major vendor and intended to continue using their software.

You want us to pay how much?

Under new leadership, we were told by our account manager that we owed four million dollars in licensing fees since we had deployed too many licenses under our "all you can eat" contract. The contract had no wording to support this and an argument ensued. After a very long negotiation period we eventually agreed to pay about four hundred thousand dollars in maintenance fees, and we immediately started looking for a replacement vendor.

Get Out and Don’t Come Back

At this point the relationship was completely broken. There was no possible way this vendor could stop what they had set in motion. It was my job to find suitable replacements for their products being used by our company. Within 2 years all of their software was ripped out and replaced by competing products. Every time I think about this situation I am amazed by the greed and the gall of this company, all under the direction of a private equity firm.

The guilty party is still in business today somehow, but looking at their product portfolio I don't see much progress in the past 5 years. It seems like the private equity company is just riding out the technology they bought and trying to squeeze as much profit as possible from it until it eventually dies off. It's sad to see good technology being left to grow antiquated and rot.

So that is my cautionary story. Be very wary when there is talk of a buy-out by private equity. Be proactive and seek options in case things go down like they did for me at the investment bank. Have you ever been in a similar situation? Do you have a happy or sad tale to tell? Let me know in the comments section.

If you’re interested in trying AppDynamics you can click here to start a free 15 day trial.

Bootstrapping DropWizard apps with AppDynamics on OpenShift by Red Hat

Getting started with DropWizard, OpenShift, and AppDynamics

In this blog post, I’ll show you how to deploy a Dropwizard-based application on OpenShift by Red Hat and monitor it with AppDynamics.

DropWizard is a high-performance Java framework for building RESTful web services. It is built by the smart folks at Yammer and is available as an open-source project on GitHub. The easiest way to get started with DropWizard is with the example application. The DropWizard example application was developed to, as its name implies, provide examples of some of the features present in DropWizard.


OpenShift can be used to deploy any kind of application with the DIY (do it yourself) cartridge. To get started, log in to OpenShift and create an application using the DIY cartridge.

With the official OpenShift quick start guide to AppDynamics, getting started with AppDynamics on OpenShift couldn't be easier.

1) Sign up for an account on OpenShift by Red Hat

2) Set up the Red Hat client tools on your local machine

$ gem install rhc
$ rhc setup

3) Create a Do It Yourself application on OpenShift

$ rhc app create appdynamicsdemo diy-0.1 \
    --from-code https://github.com/Appdynamics/dropwizard-sample-app.git

Getting started is as easy as creating an application from an existing git repository: https://github.com/Appdynamics/dropwizard-sample-app.git



% rhc app create appdynamicsdemo diy-0.1 --from-code https://github.com/Appdynamics/dropwizard-sample-app.git

Application Options
-------------------
Domain: appddemo
Cartridges: diy-0.1
Source Code: https://github.com/Appdynamics/dropwizard-sample-app.git
Gear Size: default
Scaling: no

Creating application ‘appdynamicsdemo’ … done
Waiting for your DNS name to be available … done

Cloning into ‘appdynamicsdemo’…
Your application ‘appdynamicsdemo’ is now available.

URL: http://appdynamicsdemo-appddemo.rhcloud.com/
SSH to: 52b8adc15973ca7e46000077@appdynamicsdemo-appddemo.rhcloud.com
Git remote: ssh://52b8adc15973ca7e46000077@appdynamicsdemo-appddemo.rhcloud.com/~/git/appdynamicsdemo.git/

Run ‘rhc show-app appdynamicsdemo’ for more details about your app.

With the OpenShift Do-It-Yourself container you can easily run any application by adding a few action hooks to your application. In order to make DropWizard work on OpenShift we need to create three action hooks for building, deploying, and starting the application. Action hooks are simply scripts that are run at different points during deployment. To get started simply create a .openshift/action_hooks directory:

mkdir -p .openshift/action_hooks

Here are the action hooks for the sample application above:

When the repository is checked out, the build hook uses Maven to download the project dependencies and package the project for production from source code:

.openshift/action_hooks/build

cd $OPENSHIFT_REPO_DIR

mvn -s $OPENSHIFT_REPO_DIR/.openshift/settings.xml -q package

When deploying the code you need to replace the IP address and port for the DIY container. The properties are made available as environment variables:

.openshift/action_hooks/deploy

cd $OPENSHIFT_REPO_DIR

sed -i 's/@OPENSHIFT_DIY_IP@/'"$OPENSHIFT_DIY_IP"'/g' example.yml
sed -i 's/@OPENSHIFT_DIY_PORT@/'"$OPENSHIFT_DIY_PORT"'/g' example.yml

Let’s recap some of the smart decisions we have made so far:

  • Leverage OpenShift platform as a service (PaaS) for managing the infrastructure
  • Use DropWizard as a solid foundation for our Java application
  • Monitor the application performance with AppDynamics Pro

With a solid Java foundation we are prepared to build our new application. Next, try adding another machine or dive into the DropWizard documentation.

Combining DropWizard, OpenShift, and AppDynamics

AppDynamics allows you to instrument any Java application by simply adding the AppDynamics agent to the JVM. Sign up for an AppDynamics Pro self-service account. Log in using the account details in your email titled "Welcome to your AppDynamics Pro SaaS Trial" or the account details you entered during an on-premise installation.

The last step to combine the power of OpenShift and DropWizard is to instrument the app with AppDynamics. Simply update your AppDynamics credentials in the Java agent’s AppServerAgent/conf/controller-info.xml configuration file.
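As a rough sketch of what that file contains (element names from a typical Java agent install; check the file shipped with your agent version for the exact set, and note that the host, key, and names below are placeholders):

<controller-info>
  <controller-host>yourcompany.saas.appdynamics.com</controller-host>
  <controller-port>443</controller-port>
  <controller-ssl-enabled>true</controller-ssl-enabled>
  <account-name>yourcompany</account-name>
  <account-access-key>your-access-key</account-access-key>
  <application-name>DropWizardSample</application-name>
  <tier-name>WebTier</tier-name>
  <node-name>node1</node-name>
</controller-info>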

Finally, to start the application we need to run any database migrations and add the AppDynamics Java agent to the startup command:

.openshift/action_hooks/deploy

cd $OPENSHIFT_REPO_DIR

java -jar target/dropwizard-example-0.7.0-SNAPSHOT.jar db migrate example.yml

java -javaagent:${OPENSHIFT_REPO_DIR}AppServerAgent/javaagent.jar \
     -jar ${OPENSHIFT_REPO_DIR}target/dropwizard-example-0.7.0-SNAPSHOT.jar \
     server example.yml > ${OPENSHIFT_DIY_LOG_DIR}/helloworld.log &


Additional resources on running DropWizard on OpenShift:

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics Pro today.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

Insights from an Investment Banking Monitoring Architect

To put it very simply, Financial Services companies have a unique set of challenges that they have to deal with every day. They are a high-priority target for hackers, they are highly regulated by federal and state governments, they deal with and employ some of the most demanding people on the planet, and problems with their applications can have an impact on every other industry across the globe. I know this from first-hand experience; I was an Architect at a major investment bank for over 5 years.

In this blog post I'm going to show you what's really important when Financial Services companies consider application monitoring solutions, and warn you about the hidden realities that only become apparent after you've installed a large enough monitoring footprint.

1 – Product Architecture Plays a Major Role in Long Term Success or Failure

Every monitoring tool has a different core architecture. On the surface these architectures may look similar, but it is imperative to dive deeper into the details of how each monitoring product works. We'll use two real product architectures as examples.

"APM Solution A" is an agent-based solution. This means that a piece of vendor code is deployed to gather monitoring information from your running applications. This agent is intelligent and knows exactly what to monitor, how to monitor, and when to dial itself back to do no harm. The agent sends data back to a central collector (called a controller) where this data is correlated, analyzed, and categorized automatically to provide actionable intelligence to the user. With this architecture the agent and the controller are very loosely coupled, which lends itself well to highly distributed, virtualized environments like you see in modern application architectures.

"APM Solution B" is also agent-based. It has a three-tiered architecture which consists of agents, collectors, and servers. On the surface this architecture seems reasonable, but when we look at the details a different story emerges. The agent is not intelligent, therefore it does not know how to instrument an application. This means that every time an application is restarted, the agent must send all of the methods to the collector so that the collector can tell the agent how and what to instrument. This places a large load on the network, delays application startup time, and adds to the amount of hardware required to run your monitoring tool. After the collector has told the agent what to monitor, the collector's job is to gather the monitoring data from the agent and pass it back to the server, where it is stored and viewed. For a single application this architecture may seem acceptable, but you must consider the implications across a larger deployment.

Choosing a solution with the wrong product architecture will impact your ability to monitor and manage your applications in production. Production monitoring is a requirement for rapid identification, isolation and repair of problems.

2 – Monitoring Philosophy

Monitoring isn't as straightforward as collecting, storing, and showing data. You could use that approach but it would not provide much value. When looking at monitoring tools it's really important to understand the impact of monitoring philosophy on your overall project and goals. When I was looking at monitoring tools I needed to be able to solve problems fast, and I didn't want to spend all of my time managing the monitoring tools. Let's use examples to illustrate again.

"APM Solution A" monitors every business transaction flowing through whatever application it is monitoring. Whenever any business transaction has a problem (slow or error), it automatically collects all of the data (deep diagnostics) you need to figure out what caused the problem. This, combined with periodic deep diagnostic sessions at regular intervals, allows you to solve problems while keeping network, storage, and CPU overhead low. It also keeps clutter down (as compared to collecting everything all the time) so that you solve problems as fast as possible.

“APM Solution B” also monitors every transaction for each monitored application but collects deep diagnostic data for all transactions all the time. This monitoring philosophy greatly increases network, storage, and CPU overhead while providing massive amounts of data to work with regardless of whether or not there are application problems.

When I was actively using monitoring tools in the Investment Bank I never looked at deep diagnostic data unless I was working on resolving a problem.

3 – Analytics Approach

Analytics comes in many shapes and sizes these days. Regardless of the business or technical application, analytics does what humans could never do. It creates actionable intelligence from massive amounts of data and allows us to solve problems much faster than ever before. Part of my process for evaluating monitoring solutions has always been determining just how much extra help each tool would provide in identifying and isolating (root cause) application problems using analytics. Back to my example…

"APM Solution A" is an analytics product at its core. Every business transaction is analyzed to create a picture of "normal" response time (a baseline). When new business transactions deviate from this baseline they are automatically classified as either slow or very slow, and deep diagnostic information is collected, stored, and analyzed to help identify and isolate the root cause. Static thresholds can be set for alerting, but by default alerts are based upon deviation from normal so that you can proactively identify service degradation instead of waiting for small problems to become major business impact.

"APM Solution B" only provides baselines for the business transactions you have specified. You have to manually configure the business transactions for each application. Again, on a small scale this methodology is usable, but it quickly becomes a problem when managing the configuration of tens, hundreds, or thousands of applications that keep changing as development continues. Searching through a large set of data for a problem is much slower without the assistance of analytics.
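To illustrate what baseline-driven classification means in practice, here is a minimal, hand-rolled sketch (not any vendor's actual algorithm): keep a running mean and standard deviation of response times per business transaction, and flag new measurements that deviate too far from "normal":

// Illustrative only: a rolling response-time baseline using Welford's online algorithm.
class ResponseTimeBaseline {
    private long count;
    private double mean;
    private double m2;   // sum of squared differences from the mean

    void record(double responseTimeMs) {
        count++;
        double delta = responseTimeMs - mean;
        mean += delta / count;
        m2 += delta * (responseTimeMs - mean);
    }

    double stdDev() {
        return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0;
    }

    // Classify a new measurement relative to the learned baseline.
    String classify(double responseTimeMs) {
        if (count < 30) return "LEARNING";                               // not enough history yet
        double sigma = stdDev();
        if (sigma == 0) return responseTimeMs > mean ? "SLOW" : "NORMAL";
        if (responseTimeMs > mean + 4 * sigma) return "VERY SLOW";
        if (responseTimeMs > mean + 3 * sigma) return "SLOW";
        return "NORMAL";
    }
}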


4 – Vendor Focus

When you purchase software from a vendor you are also committing to working with that vendor. I always evaluated how responsive every vendor was during the pre-sales phase but it was hard to get a good measure of what the relationship would be like after the sale. No matter how good the developers are, there are going to be issues with software products. What matters the most is the response you get from the vendor after you have made the purchase.

5 – Ease of Use

This might seem obvious, but ease of use is a major factor in software delivering a solid return on investment or becoming shelfware. Modern APM software is powerful AND easy to use at the same time. One of the worst mistakes I made as an Architect was not paying enough attention to ease of use during product evaluation and selection. If only a few people in a company are capable of using a product then it will never reach its full potential, and that is exactly what happened with one of the products I selected. Two weeks after training a team on product usage, almost nobody remembered how to use it. That is a major issue with legacy products.

Enterprise software is undergoing a major disruption. If you already have monitoring tools in place, now is the right time to explore the marketplace and see how your environment can benefit from modern tools. If you don't have any APM software in place yet, you need to catch up to your competition, since most of them are already evaluating or have already implemented APM for their critical applications. Either way, you can get started today with a free trial of AppDynamics.