How Do You Monitor Your Logging?

Applications typically log additional data, such as exceptions, to different data sources. Windows event logs, local files, and SQL databases are the most commonly used in production. New applications can take advantage of big data stores instead of individual files or SQL.

One of the most surprising experiences when we start monitoring applications is discovering that logging is not configured properly in production environments. There are two types of misconfiguration errors we’ve often seen in the field:

  1. The logging configuration was copied from staging settings

  2. During deployment to the production environment, logging wasn’t fully configured and failed to log any data

To take a closer look, I have a couple of sample applications that show how these problems can manifest themselves. The samples were implemented using MVC5, run in Windows Azure, and use the Microsoft Enterprise Library Exception Handling and Logging blocks to log exceptions to a SQL database. I have no specific preference regarding logging framework or storage; I just wanted to demonstrate problems similar to what we’ve seen with different customers.
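
For context, here is roughly how such an application hands exceptions to the Enterprise Library blocks. This is a sketch, not the actual sample code; the policy name and method names are illustrative:

```csharp
using System;
using Microsoft.Practices.EnterpriseLibrary.ExceptionHandling;

public class CheckoutService
{
    public void Checkout(int orderId)
    {
        try
        {
            ProcessPayment(orderId); // the real business operation
        }
        catch (Exception ex)
        {
            // "LogPolicy" is configured to hand the exception to the
            // Logging block, which writes it to the SQL logging database.
            bool rethrow = ExceptionPolicy.HandleException(ex, "LogPolicy");
            if (rethrow) throw;
        }
    }

    private static void ProcessPayment(int orderId)
    {
        // Stand-in for the sample app's processing.
        throw new InvalidOperationException("Payment gateway unavailable");
    }
}
```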

Situation #1: Logging configuration was copied from staging to production and points to the staging SQL database

When we installed AppDynamics and it automatically detected the application flowmap, I noticed the application talks to the production UserData database and… a staging database for logging.

The other issue was the extremely slow response time when calling the logging database. The following snapshot explains the slow performance; as you can see, an exception occurs while trying to run an ADO.NET query:

Exception details confirm the application was not able to connect to the database, which is expected: the production environment is located in a DMZ and usually can’t reach the staging network.

To restate what we see above: this is a failure while trying to log the original exception, which could be anything from a user not being able to log into the website to a failed checkout.

At the same time, the impact is even higher because the application spends 15 seconds trying to connect to the logging database before timing out, all while the user is waiting.

Situation #2: During deployment the service account wasn’t granted permissions to write to the logging database

This looks similar to the example above, but when we drill inside the error we can see the request has an internal exception that occurred during processing:

The exception says the service account didn’t have permission to run the stored procedure “WriteLog”, which logs entries to the logging database. From a performance perspective, the overhead of a security failure is smaller than that of the timeouts in the example above, but the result is the same: we won’t be able to see the originating exception.

Such problems are usually caused by not fully documenting or automating the application deployment and configuration process.
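
One common safeguard is to keep environment-specific settings in build transforms, so a production deployment cannot silently inherit staging values. A minimal Web.Release.config sketch (the connection string name and server are made up for illustration):

```xml
<?xml version="1.0"?>
<!-- Applied when publishing the Release build; the name and server
     below are illustrative. -->
<configuration xmlns:xdt="http://schemas.microsoft.com/XML-Document-Transform">
  <connectionStrings>
    <add name="LoggingDb"
         connectionString="Server=prod-sql;Database=Logging;Integrated Security=True"
         xdt:Transform="SetAttributes" xdt:Locator="Match(name)" />
  </connectionStrings>
</configuration>
```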

These are one-time issues in the sense that once you fix them, that machine will keep working. However, the next time you deploy the application to a new server or VM, the same failures will recur until you fix the deployment process itself.

Let’s check the EntLibLogging database: it has no rows.

Here’s some analysis to explain why this happened:

  1. We found exceptions thrown while the application was logging an error

  2. This means there was an original error, and the application was trying to report it using logging

  3. Logging failed, which means the original error was never reported!

  4. And… logging doesn’t record its own failures anywhere, which means that from a logging perspective the application has no problems!!

This is logically correct: if you can’t log data to the storage database, you can’t log anything. Typically, loggers are implemented similar to the following example:
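
(The original snippet is not reproduced here; below is a minimal sketch of the pattern with illustrative names, using the Enterprise Library Logger as the last-resort call.)

```csharp
using System;
using Microsoft.Practices.EnterpriseLibrary.Logging;

public static class OrderProcessor
{
    public static void PlaceOrder(int cartId)
    {
        try
        {
            CheckoutOrder(cartId); // the original business operation fails here
        }
        catch (Exception ex)
        {
            try
            {
                Logger.Write(ex); // last-resort attempt to record the failure
            }
            catch
            {
                // If logging itself fails, there is nowhere left to report to:
                // the original exception is silently lost.
            }
        }
    }

    private static void CheckoutOrder(int cartId)
    {
        throw new InvalidOperationException("Checkout failed"); // simulated failure
    }
}
```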

Logging is the last resort in this case, and when it fails, nothing else happens, as you can see in the code above.

Just to clarify, AppDynamics was able to report these exceptions because the agent instruments common methods like ADO.NET calls, HTTP calls, and other exit calls as well as error handlers, which helped in identifying the problem.

Going back to our examples: what if the deployment and configuration process is now fixed and fully automated so there can’t be a manual mistake? Do you still need to worry? Unfortunately, these issues happen more often than you’d expect. Here is another real example.

Situation #3: What happens when the logging database fills up?

Everything is configured correctly, but at some point the logging database fills up. In the screenshot above you can see this happened around 10:15pm. As a result, response times and error rates spiked.

Here is one of the snapshots collected at that time:

You can see that in this situation it took over 32 seconds trying to log data. Here are the exception details:

The worst part: at 10:15pm the application was not able to report its own problems because the database was completely full, which may be incorrectly interpreted to mean the application is healthy, since it is “not failing” and there are no new log entries.

We’ve seen many times that the logging database isn’t treated as a critical piece of the application, so it gets pushed down the priority list and is often overlooked. Logging is part of your application logic, and it should fall into the same category as the application itself. It’s essential to document, test, properly deploy, and monitor your logging.
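
One practical approach is a logging canary: periodically write a heartbeat through the normal log path and raise an alert through an independent channel when the write fails. A sketch, assuming the Enterprise Library Logger and an already-registered “MyApp” event source:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using Microsoft.Practices.EnterpriseLibrary.Logging;

public static class LoggingCanary
{
    private static Timer _timer; // keep a reference so the timer isn't collected

    public static void Start(TimeSpan interval)
    {
        _timer = new Timer(_ =>
        {
            try
            {
                Logger.Write("logging-canary heartbeat"); // exercise the normal log path
            }
            catch (Exception ex)
            {
                // Report through a channel that does not depend on the logging
                // database, so the failure itself remains visible.
                EventLog.WriteEntry("MyApp", "Logging failed: " + ex,
                                    EventLogEntryType.Error);
            }
        }, null, TimeSpan.Zero, interval);
    }
}
```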

This problem can be avoided almost entirely; the exception is when your application receives an unexpected surge of traffic due to a sales event, new release, marketing campaign, etc. Other than the rare Slashdotting effect, your database should never reach full capacity and stop logging. Without sufficient room in your database, your application’s performance is in jeopardy and you won’t know it, because your monitoring framework isn’t notifying you. Because these issues are still possible, albeit most likely during a large load surge, it’s important to continuously monitor your logging; you wouldn’t want an issue to occur during an important event.

Key points:

  • Logging adds a new dependency to the application

  • Logging can fail to log the data – there could be several reasons why

  • When this happens you won’t be notified about the original problem or the logging failure, and the performance issues will compound

This would never happen to your application, would it?

If you’d like to try AppDynamics, check out our free trial and start monitoring your apps today! Also, be sure to check out my previous post, The Real Cost of Logging.

4 Reasons Why You Should Use APM When You Load Test Your Website

I wouldn’t do website load/performance testing any more without having an APM tool in place. Period. Full stop. End of story.

I’ve been involved in website load testing for over 10 years: as an “end-user” when I was web operations manager for an online job board, as a team leader for a company providing cloud load-testing services, and as a consultant on web performance with my own company, DevOpsGuys. The difference in the value you get from load/performance testing with and without APM tools is enormous.

We’ve probably all seen those testing reports that are full of graphs of response time versus req/sec, CPU utilisation curves, disk IO throughput, error rates ad nauseam. I, to my eternal shame, have even written them… and whilst they are useful for answering the (very simplistic) question of “how many simulated requests/users can my website support before it falls over?”, generating any real application insight from what are essentially infrastructure metrics is difficult. This type of test report rarely results in any corrective action other than (1) “let’s throw more hardware at it” or (2) “let’s shout at the devs that they have to fix something because the application is slow”. Quite often the report gets circular-filed because no one knows how to derive application insight, and hence generate meaningful corrective actions, at the code, application stack configuration, or infrastructure level. All that effort and expense is wasted.

So how are things different when using APM tools (like my preferred tool, AppDynamics)? Here are my top 4 reasons:

1. See the Big Picture (Systems Thinking)

“Systems thinking is a framework for seeing interrelationships rather than things, for seeing patterns rather than static snapshots.”  – Peter Senge, “The Fifth Discipline”.

The “first way of DevOps” is systems thinking, and APM tools reinforce the systems-thinking perspective by helping you see the big picture very clearly. You can see the interrelationships between the web tier, application tier, database servers, message queues, external cloud services, etc. in real time while you’re testing, rather than focusing on the metrics for each tier individually. You can instantly see where the bottlenecks in your application are: in the example below, the 4306ms calls to Navision stand out!

FlowMap

2. Drill Down to the Code Level

One of my favourite things when load testing with APM tools is being able to drill down to the stack-trace level and identify the calls that are the most problematic. Suddenly, instead of talking about infrastructure metrics like CPU, RAM, and disk, we are talking about application metrics: this business transaction (e.g. web page or API request) generates this flow across the application; 75% of the time is spent in this method call, which makes 3 database calls and 2 web service calls; it’s this database call that’s slow, and here’s the exact SQL statement that was executed. The difference in the response you get from the developers when you give them this level of detail, compared to “your application is slow when we hit 200 users”, is fantastic: now you are giving them real, pinpoint, actionable intelligence on how their application responds under load.

DrillDown

3. Iterate Faster

“the application was made 56x faster during a 12hr testing window”

Because you can move quickly to the code level in real time while you test, and because this facilitates better communication with the development team, your load testing suddenly becomes a lot more collaborative, even if the load testing is being performed by an external third party.

We generally have all the relevant parties on a conference call or HipChat session while we test, constantly exchanging information, screenshots, and links to APM snapshots, and the developers are often able to code fixes there and then because we can rapidly pinpoint the pain points.

If you’ve got a customer with an Agile mindset and a continuous delivery capability, this enables rapid test-and-fix cycles during testing, often multiple times in a day. In one notable example, the application was made 56x faster during a 12hr testing window thanks to 4 application releases during that period.

56xFaster

4. Stop the “Blame Game”

“make the enemy poor performance, not each other…”

Traditionally, in the old-school (pre-APM tools) days, load tests were often conducted by external load-testing consultancies who would come in, do the testing, and then deliver a big report on how things went.

The customer would assemble their team together in a conference room to go through the report, which often triggered the “blame game” – Ops blaming Dev, Dev blaming QA, QA blaming Ops, Ops blaming the hosting provider, the hosting provider blaming the customer’s code and around and around it would go.

But with the right APM tools in place we’ve found this negative team dynamic can be avoided.

As mentioned earlier, testing tends to become more collaborative because it’s easier to share the performance data in real time via the APM tool, and discussions become more evidence-based. It’s more about “what are we going to do about this problem we can see here in the APM tool” and less about trying to allocate blame when no one really knows where the problem actually resides and nobody wants to be left holding the can. The systems-thinking, holistic view of the application’s performance promulgated by the APM tool makes poor performance the enemy, not each other. And that means performance issues are likely to be fixed faster, not ignored due to politics and infighting.

There are probably loads more reasons you can come up with for why load testing with APM tools is awesome (and I’d love to hear your thoughts in the comments), but I will leave you with one more bonus reason: because it’s fun. For me, using AppDynamics when I’m doing load testing and performance tuning has really brought the fun factor back into the work. It’s fun to see the load being applied to the system and to see (via AppDynamics) the effect across the entire application. It’s fun to work more closely with the Dev & Ops teams (dare I say, “DevOps”!) and to share meaningful, actionable insights on where the problems lie, and it’s fun to be able to rapidly iterate and show the performance improvements in real time.

The Real Cost of Logging

In order to manage today’s highly dynamic application environments, many organizations turn to their logging system for answers – but reliance on these systems may be having an undesired impact on the applications themselves.

The vast majority of organisations use some sort of logging system — it could log errors, traces, information messages or debugging information. Applications can write logs to a file, database, Windows event log, or big data store and there are many logging frameworks and practices being used.

Logging brings good insight into application behavior, especially failures. However, by being part of the application, logging also participates in the execution chain, which can have its disadvantages. While working with customers, we often see the negative consequences when logging alone introduces an adverse impact on the application.

Most of the time the overhead of logging is negligible. It only matters when the application is under significant load — but these are the times when it matters the most. Think about Walmart or Best Buy during Black Friday and Cyber Monday. Online sales are particularly crucial for these retail companies during this period and this is the time when their applications are under most stress.

To better explain the logging overhead, I created a lightweight .NET application that:

  1. is implemented using ASP.NET
  2. performs lightweight processing
  3. has an exception built in
  4. always handles exceptions within a try…catch statement
  5. either logs exceptions using log4net or ignores them, depending on the test

In my example I used log4net, as I recently diagnosed a similar problem with a customer who was using log4net; however, it could be replaced with any other framework that you use.
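
Here is roughly what such a test action could look like. This is a sketch, not the original sample code; the controller, toggle, and helper names are illustrative, and log4net is assumed to be configured with a FileAppender:

```csharp
using System;
using System.Web.Mvc;
using log4net;

public class LoadTestController : Controller
{
    private static readonly ILog Log =
        LogManager.GetLogger(typeof(LoadTestController));

    // Toggled per test run: false for Test #1 (ignore), true for Test #2 (log).
    private const bool LogExceptions = true;

    public ActionResult Process()
    {
        try
        {
            DoLightweightWork();
            throw new InvalidOperationException("Built-in test exception");
        }
        catch (Exception ex)
        {
            if (LogExceptions)
            {
                Log.Error("Request failed", ex); // appends to a local file
            }
            // Otherwise the exception is swallowed, giving the baseline.
        }
        return new EmptyResult();
    }

    private static void DoLightweightWork()
    {
        // Stand-in for the sample app's lightweight processing.
    }
}
```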

Test #1

First, we set up a baseline by running the application with exceptions not being logged from the catch statement.


Test #2

Next, I enabled logging of exceptions to a local file and ran the same load test.

As you can see, not only is the average response time significantly higher now, but the throughput of the application is also lower.

The AppDynamics snapshot is collected automatically when there is a performance problem or failure, and it includes a full call graph with timings for each executed method.

By investigating the call graph AppDynamics produces, we can see that the log4net FileAppender renders error information to a file using a FileStream. On the right you can see the duration of each call; the most time was spent in WriteFileNative, as it was competing with similar requests trying to append error details to the log file.

Test #3

I often come across attempts to make exception logging asynchronous by using the ThreadPool. Below is how performance looks in this setup under exactly the same load.

This is a clever concept and works adequately for low-throughput applications, but as you can see the average response time is still in a similar range to the non-asynchronous version, while slightly lower throughput is achieved.

Why is this? Having logging run on separate threads means the resources are still consumed: there are fewer threads available, and the number of context switches will be greater.
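
For reference, the fire-and-forget pattern described above looks roughly like this (a sketch, not the original test code):

```csharp
using System;
using System.Threading;
using log4net;

public static class AsyncLogger
{
    private static readonly ILog Log = LogManager.GetLogger(typeof(AsyncLogger));

    public static void ErrorAsync(string message, Exception ex)
    {
        // Hand the write off to a ThreadPool thread so the request thread
        // does not block on file I/O...
        ThreadPool.QueueUserWorkItem(_ => Log.Error(message, ex));
        // ...but the work is only deferred: it still consumes threads and adds
        // context switches, which is why throughput drops under load.
    }
}
```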

In my experience, logging to a file is exceptionally common. Other types of storage might perform better, but they always need further testing, and logging to a file is the easier solution.

Summary

While logging is important and helps with application support and troubleshooting, it should be treated as part of the application logic. This means logging has to be designed, implemented, tested, monitored, and managed. In short, it should become part of full application lifecycle management.

How well do you understand your logging framework and its performance cost?

 Take five minutes to get complete visibility into the performance of your production applications with AppDynamics today.

Tracking PHP Application Events with AppDynamics

Event Tracking

All too often PHP engineers find themselves repeating the same tasks to triage their application problems. Issues can range from poorly written code to database bottlenecks, slow remote service API calls, or machine issues including I/O bottlenecks — whether hardware or network related.

In certain cases, these issues are nearly impossible to discover because there is no mechanism for tracking and reporting events that may impact the performance of your application when those events are not directly related to the application code itself.

For example, imagine the frustration when a recent PHP upgrade causes a fatal error. What if routine configuration changes to your maintenance scripts also impact your ability to read from your database?

Perhaps switching database table engines from MyISAM to InnoDB is causing application slowdown. Numerous types of events outside the normal development workflow can compromise the integrity of your application’s user experience while creating unwanted frustration.

Types of Events

Event tracking is an integral part of maintaining true and transparent insight into the various events that revolve around the performance of your application.  One of my favorite core APM features is Event Tracking: the ability to track a change in the state of your application that is of potential interest. Some examples of the various actions you can track are:

  • Upgrading your PHP framework

  • Application deployments AND rollbacks

  • Switching database table engines (e.g. MyISAM to InnoDB)

  • Changes/upgrades to hardware

  • Upgrades to your OS, MySQL, web server, etc.

  • PHP.ini changes

  • Installing/upgrading PHP extensions

I think you get the idea – you want to track anything that could potentially impact the performance of your application.

AppDynamics Event Tracking

The AppDynamics Event Tracking feature can be accessed by clicking the Events link in the main navigation menu.

Once you click it, you’re presented with a view of all the events that have occurred in your application during the selected time range. In this example, we’re presented with Health Rule Violations and an instance of a server being restarted. To narrow down what you’re looking for, you can use an advanced search filter: select ‘Show Filters’ and you will see a list of choices to the left of the event list.

Compare Releases

‘Compare Releases’ shows the real power of AppDynamics and is the reason it remains one of my favorite core APM features. Under the Analyze menu item, click ‘Compare Releases’ and you’ll see a screen comparing your application between two different time periods. A unique column here is the ‘Events’ column, which displays any events registered during the specified time range to give you further insight into what may previously have been overlooked. In this example, we’re comparing the application’s KPIs between two different weeks. You can see that our error rate decreased in the later week, with no health rule violations registered as events. The screenshot shows a definitive performance improvement between the two time periods.

We encourage you to explore the Events feature further. You will see how you can combine the power of Compare Releases with our Alert & Respond feature to execute custom scripts based upon triggered events. As an added bonus, the Events feature is also accessible via a RESTful API that allows you to register a change event from anywhere at any time.
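
For instance, a deployment script could register a change event with a single HTTP POST. Here is a sketch in C#; the controller URL, credentials, application name, and custom event type are placeholders, and the endpoint and parameter names should be verified against the AppDynamics REST documentation for your controller version:

```csharp
using System;
using System.Net;

class RegisterEvent
{
    static void Main()
    {
        // Event details travel in the query string; the body is empty.
        const string url =
            "https://mycontroller.example.com/controller/rest/" +
            "applications/MyPHPApp/events" +
            "?eventtype=CUSTOM&customeventtype=deployment&severity=INFO" +
            "&summary=Switched+orders+table+from+MyISAM+to+InnoDB";

        using (var client = new WebClient())
        {
            // Controller credentials (user@account form on multi-tenant controllers).
            client.Credentials = new NetworkCredential("user@account", "password");
            client.UploadString(url, string.Empty); // UploadString issues a POST
        }
    }
}
```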

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics Pro today.


Instrumenting .NET applications with AppDynamics using NuGet

Introduction

One of the coolest things to come out of the .NET stable at AppD this week was the NuGet package for Azure Cloud Services. NuGet makes it a breeze to deploy our .NET agent along with your web and worker roles from inside Visual Studio. For those unfamiliar with NuGet, more information can be found here.

Our NuGet package ensures that the .NET agent is deployed at the same time the role is published to the cloud. After adding it to the project, you’ll never have to worry about deploying the agent when you swap your hosting environment from staging to production in Azure, or when Azure changes the machine under your instance. For the remainder of the post I’ll use a web role to demonstrate how to quickly install our NuGet package, the changes it makes to your solution, and how to edit the configuration by hand if needed. Even though I’ll use a web role, things work exactly the same way for a worker role.

Installation

So, without further ado, let’s take a look at how to quickly instrument .NET code in Azure using AppD’s NuGet package for Windows Azure Cloud Services. NuGet packages can be added via the command line or the GUI. In order to use the command line, we need to bring up the Package Manager Console in Visual Studio, as shown below.

PackageManager

In the console, type ‘install-package AppDynamics.WindowsAzure.CloudServices’ to install the package. This will bring up the following UI, where you can enter the information needed by the agent to talk to the controller and upload metrics. You should have received this information in the welcome email from AppDynamics.

Azure

The ‘Application Name’ is the name of the application in the controller under which the metrics reported by this agent will be stored. When ‘Test Connection’ is checked, we will verify the information entered by trying to connect to the controller; an error message will be displayed if the test connection is unsuccessful. That’s it: enter the information, click Apply, and we’re done. Easy peasy. No more adding files one by one or modifying scripts by hand. Once deployed, instances of this web role will start reporting metrics as soon as they experience any traffic. Oh, and by the way, if you prefer to use a GUI instead of typing commands in the console, the same thing can be done by right-clicking on the solution in Visual Studio and choosing ‘Manage NuGet Packages’.

Anatomy of the package

If you look closely at the solution explorer you’ll notice that a new folder called ‘AppDynamics’ has been created. On expanding the folder you’ll find the following two files:

  • The installer for the latest and greatest .NET agent
  • Startup.cmd

The startup script makes sure that the agent gets installed as part of the deployment process on Azure. Besides adding these files, we also change the ServiceDefinition.csdef file to add a startup task, as shown below.

Screen Shot 2013-11-27 at 8.11.27 PM
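
The startup section the package adds typically looks something like the sketch below; the element layout follows the standard Azure csdef schema, but the variable names here are illustrative, so check the file generated in your own solution:

```xml
<WebRole name="MyWebRole">
  <Startup>
    <!-- Runs Startup.cmd with elevated rights before the role starts. -->
    <Task commandLine="AppDynamics\Startup.cmd" executionContext="elevated" taskType="simple">
      <Environment>
        <Variable name="ControllerHost" value="mycontroller.saas.appdynamics.com" />
        <Variable name="ControllerPort" value="443" />
        <Variable name="ApplicationName" value="MyAzureApp" />
        <Variable name="AccountKey" value="your-account-key" />
      </Environment>
    </Task>
  </Startup>
</WebRole>
```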

In case you need to change the controller information you entered in the GUI while installing the package, you can do so by editing the startup section of the csdef file shown above. The application name, controller URL, port, account key, etc. can all be changed. On re-deploying the role to Azure, the new values will take effect.

Next Steps

Microsoft Developer Evangelist Bruno Terkaly blogged about monitoring the performance of multi-tiered Windows Azure based web applications. Find out more on the Microsoft Developer Network.

Find out more in our step-by-step guide on instrumenting .NET applications using AppDynamics Pro. Take five minutes to get complete visibility into the performance of your production applications with AppDynamics Pro today.

As always, please feel free to comment if you think I have missed something or if you have a request for content in an upcoming post.

AppDynamics Pro on the Windows Azure Store

Over a year ago, AppDynamics announced a partnership with Microsoft and launched AppDynamics Lite on the Windows Azure Store. With AppDynamics Lite, Windows Azure users were able to easily monitor their applications at the code level, allowing them to identify and diagnose performance bottlenecks in real time. Today we’re happy to announce that AppDynamics Pro is now available as an add-on in the Windows Azure Store, making it even easier for developers to get complete visibility into their mission-critical applications running on Windows Azure. Highlights include:

  • Easier/simplified buying experience in Windows Azure Store
  • Tiered pricing based on number of agents and VM size
  • Easy deployment from Visual Studio with NuGet
  • Out-of-the-box support for more Windows Azure services

“AppDynamics is one of only a handful of application monitoring solutions that works on Windows Azure, and the only one that provides the level of visibility required in our distributed and complex application environments,” said James Graham, project manager at MacMillan Publishers. “The AppDynamics monitoring solution provides insight into how our .NET applications perform at a code level, which is invaluable in the creation of a dynamic, fulfilling user experience for our students.”

Easy buying experience

Purchasing the AppDynamics Pro add-on in the Windows Azure Store takes only a couple of minutes. In the Azure portal, click NEW at the bottom left of the screen and then select STORE. Search for AppDynamics, then choose your plan, add-on name, and region.

4-choose-appdynamics

Tiered pricing

AppDynamics Pro for Windows Azure features new tiered pricing based on the size of your VM (extra small, small or medium, large, or extra large) and the number of agents required (1, 5 or 10). This new pricing allows organizations with smaller applications to pay less to store their monitoring data than those with larger, more heavily trafficked apps. The cost is added to your monthly Windows Azure bill, and you can cancel or change your plan at any time.

AppDynamics on Windows Azure Pricing

Deploying with NuGet

Use the AppDynamics NuGet package to deploy AppDynamics Pro with your solution from Visual Studio. For detailed instructions check out the how-to guide.

2-vs-package-search

Monitoring with AppDynamics

  • Monitor the health of Windows Azure applications
  • Troubleshoot performance problems in real time
  • Rapidly diagnose root cause of performance problems
  • Dynamically scale your Windows Azure application up and down based on performance metrics

AppDynamics .Net App

Additional platform support

AppDynamics Pro automatically detects and monitors most Azure services out of the box, including web and worker roles, SQL, Azure Blob, Azure Queue, and Windows Azure Service Bus. In addition, AppDynamics Pro now supports MVC 4. Find out more in our getting started guide for Windows Azure.

Get started monitoring your Windows Azure app by adding the AppDynamics Pro add-on in the Windows Azure Store.

Top Tips for Managing .NET Application Performance

There are many technical articles/blogs on the web that jump straight into areas of .NET code you can instantly optimize and tune. Before we get to some of those areas, it’s good to take a step back and ask yourself, “Why am I here?” Are you interested in tuning your app, which is slow and keeps breaking, or are you looking to prevent these things from happening in the future? When you start down the path of Application Performance Management (APM), it is worth asking yourself another important question – what is success? This is especially important if you’re looking to tune or optimize your application. Knowing when to stop is as important as knowing when to start.

A single code or configuration change can have a dramatic impact on your application’s performance. It’s therefore important that you only change or tune what you need to – less is often more when it comes to improving application performance. I’ve been working with customers in APM for over a decade and it always amazes me how dev teams will browse through packages of code and rewrite several classes/methods at the same time with no real evidence that what they are changing will actually make an impact. For me, I learned the most about writing efficient code in code reviews with peers, despite how humbling it was. What I lacked the most as a developer, though, was visibility into how my code actually ran in a live production environment. Tuning in development and test is not enough if the application still runs slow in production. When manufacturers design and build cars they don’t just rely on simulation tests – they actually monitor their cars in the real world. They drive them for hundreds of thousands of miles to see how their cars will cope in all conditions they’ll encounter. It should be the same with application performance. You can’t simulate every use case or condition in dev and test, so you must understand your application performance in the real world.

.NET Application Performance delivered with AppDynamics 3.3

It’s official: AppDynamics support for Microsoft .NET and Windows Azure is finally here! We’ve got the same Kick Ass Product with the same Secret Sauce, but now it sports a shiny new CLR agent. So whether your apps are Java, .NET, or hybrid, with AppDynamics you have the best of both worlds when it comes to managing application performance.

We thought it was only fair to share our secret sauce and love with the Microsoft community, given that 40,000+ members of the Java community have been enjoying them for over 18 months. Our mission is to simplify the way organizations manage their agile, complex, and distributed applications. For .NET, this means that AppDynamics supports the latest and greatest technologies from Microsoft, including their new PaaS platform, Windows Azure.

So, what does this mean for organizations with .NET or Azure apps? Let me summarize:

#1 You get to visualize in real time what your .NET application actually looks like, along with its health and performance, across any distributed production environment. It’s the 50,000-foot view that shows how your application and your business perform in your data center or on Windows Azure.

#2 The ability to track all business transactions that flow through your .NET application. This gives you insight into business activity, health, and impact in the event that a slowdown or problem occurs in your production environment. This unique context and visibility helps you troubleshoot through the eyes of the business, so you can see their pain instantly in production and resolve it in minutes. We discover and map every business transaction automatically, so don’t worry about configuration. We’ve got that taken care of.


#3 Deep diagnostic information on how your business transactions actually execute through your CLRs and/or JVMs (if you’ve got a hybrid app). This means complete call stacks of code execution, with latency breakdowns across all the namespaces, classes, and methods your business transactions invoke. You get maximum visibility in production with zero configuration, allowing you to perform root cause analysis in minutes.

#4 The ability to plot, correlate, and trend any CLR or OS metric over time, whether it’s logical thread counts, garbage collection time, or simply how much CPU your application CLR is burning. We let you report and analyze all this so you understand your CLR run-time and OS system resources.

Don’t believe us? Sign up for our free 30-day trial and we’ll provision you a SaaS login. You can then download and install our lightweight agents and see for yourself just how easy it can be!

As well as our .NET support, we’ve also crammed some great innovation into the 3.3 release.

Real-Time JVM MBean Viewer:

In addition to trending standard JMX metrics from the JVM, users can now discover and trend any MBean attribute on the fly for short-term analysis in real time. Our new UI dialogue allows the user to browse through hundreds of available metrics, which are automatically discovered and reported at the touch of a button. If the user wishes to convert any MBean attribute into a standard JMX metric, they can just click “Create Metric” and AppDynamics will collect and report that metric as standard in the JMX Metrics viewer.

Search Business Transactions by their content/payload:

You might, for example, have launched a new product on your application or website and need to understand its performance by looking at all business transactions that interact with that product. With AppDynamics v3.3, users can now search business transactions by any transaction payload. The screenshot below shows how a user can search for all business transactions that relate to the book “Harry Potter”.

Additional Platform Support:

  • Auto-discovery and mapping of LDAP, SAP and JavaMail tiers to business transaction flows for increased visibility.
  • MongoDB support allowing users to see BSON queries and associated latency for calls made from Java applications.
  • Enhanced support for WebSphere on z/OS, with automatic JVM naming pools to help customers identify and manage short-lived and dynamic JVM run-times.

All in all, another great release packed full of innovation from the AppDynamics team. Stay tuned over the next few weeks for more information on specific 3.3 features.
App Man.