APM vs aaNPM – Cutting Through the Marketing BS

Marketing; mixed feelings! I’m really looking forward to the new Super Bowl ads (some are pure marketing genius) in a few weeks, but I really dislike all of the confusion that marketing tends to create in the technology world. In today’s blog post I’m going to attempt to cut through all of the marketing BS and clearly articulate the differences between APM (Application Performance Management) and aaNPM (Application Aware Network Performance Management). I’m not going to try to convince you that you need one instead of the other. This isn’t a sales pitch, it’s an education session.

Definitions

APM – Gartner has been hard at work trying to clearly define the APM space. According to Gartner, APM tools should contain the following 5 dimensions:

  • End-user experience monitoring (EUM) – (How is the application performing according to what the end user sees)
  • Runtime application architecture discovery, modeling and display (A view of the logical application flow at any given point in time)
  • User-defined transaction profiling (Tracking user activity across the entire application)
  • Component deep-dive monitoring in application context (Application code execution, SQL query execution, message queue behavior, etc…)
  • Analytics

aaNPM – Again, Gartner has created a definition for this market segment which you can read about here

“These solutions allow passive packet capture of network traffic and must include the following features, in addition to packet capture technology:

  • Receive and process one or more of these flow-based data sources: NetFlow, sFlow and Internet Protocol Flow Information Export (IPFIX).
  • Provide roll-ups and dashboards of collected data into business-relevant views, consisting of application-centric performance displays.
  • Monitor performance in an always-on state, and generate alarms based on manual or automatically generated thresholds.
  • Offer protocol analysis capabilities to decode and understand multiple applications, including voice, video, HTTP and database protocols. The tool must provide end-user experience information for these applications.
  • Have the ability to decrypt encrypted traffic if the proper keys are provided to the solution.”

Reality

So what do these definitions mean in real life?

APM tools are typically used by application support, operations, and development teams to rapidly identify, isolate, and repair application issues. These teams usually have a high-level understanding of how networks operate but not nearly the detailed knowledge required to resolve network-related issues. They live and breathe application code, integration points, and component (server, OS, VM, JVM, etc…) metrics. They call in the network team when they think there is a network issue (hopefully based upon some high-level network indicators). These teams usually have no control over the network and must follow the network team’s process to get a physical device connected to the network. APM tools do not help you solve network problems.

aaNPM tools are network packet sniffers by definition. The majority of these tools (NetScout, Cisco, Fluke Networks, Network Instruments, etc.) are hardware appliances that you connect to your network. They need to be connected to each segment of the network where you want to collect packets or they must be fed filtered and aggregated packet streams by NPB (Network Packet Broker) devices. aaNPM tools contain a wealth of network level details in addition to some valuable application data (EUM metrics, transaction details, and application flows). aaNPM tools help network engineers solve network problems that are manifesting themselves as application problems. aaNPM tools are not capable of solving code issues as they have no visibility into application code.

If I were on a network support team I would want to understand if applications were being impacted by network issues so I could prioritize remediation properly and have a point of reference if I was called out by an application support team.

Network and Application Team Convergence

I’ve been asked if I see network and application teams converging in a similar way that dev and ops teams are converging as a result of the DevOps movement. I have not seen this with any of the companies I get to interact with. Based on my experience working in operations at various companies in the past, network teams and application support teams think in very different ways. Is it impossible for these teams to work in unison? No, but I see it as unlikely over the next few years at least.

But what about SDN (Software Defined Networking)? I think most network engineers see SDN as a threat to their livelihood whether that is true or not. No matter the case, SDN will take a long time to make its way into operational use and in the meantime network and application teams will remain separate.

I hope this was helpful in cutting through the marketing spin that many vendors are using to expand their reach. When it comes right down to it, your use case may require APM, aaNPM, a combination of both, or some other technology not discussed today. Technology is made to solve problems. I highly recommend starting out by defining your problems and then exploring the best solutions available to help solve your problem.

If you’ve decided to explore your APM tool options you can try AppDynamics for free by clicking here.

The Digital Enterprise – Problems and Solutions

According to a recent article featured in Wall Street and Technology, Financial Services (FS) companies have a problem. The article explains that FS companies built more datacenter capacity than they needed when profits were up and demand was rising. Now that profits are lower and demand has not risen as expected the data centers are partially empty and very costly to operate.

FS companies are starting to outsource their IT infrastructure and this brings a new problem to light…

“It will take a decade to complete the move to a digital enterprise, especially in financial services, because of the complexity of software and existing IT architecture. “Legacy data and applications are hard to move” to a third party, Bishop says, adding that a single application may touch and interact with numerous other applications. Removing one system from a datacenter may disrupt the entire ecosystem.”

Serious Problems

The article calls out a significant problem that FS companies are facing now and will be for the next decade but doesn’t mention a solution.

The problem is that you can’t just pick up an application and move it without impacting other applications. Based upon my experience working with FS applications I see multiple related problems:

  1. Disruption of other applications
  2. Baselining performance and availability before the move
  3. Ensuring performance and availability after the move

All of these problems increase risk and the chance that users will be impacted.

Solutions

1. Disruption of other applications – The solution to this problem is easy in theory and traditionally difficult in practice. The theory is that you need to understand all of the external interactions with the application you want to move.

One solution is to use ADDM (Application Discovery and Dependency Mapping) tools that scan your infrastructure looking for application components and the various communications to and from them. This method works okay (I have used it in the past) but typically requires a lot of manual data manipulation after the fact to improve the accuracy of the discovered information.

ADDM product view of application dependencies.

Another solution is to use an APM (Application Performance Management) tool to gather the information from within the running application. The right APM tool will automatically see all application instances (even in a dynamically scaled environment) as well as all of the communications into and out of the monitored application.

APM visualization of an application and its components with remote service calls.

APM tool list of remote application calls with response times, throughput and errors.

A combination of these two types of tools would provide the ultimate in accurate and easy to consume information (APM strength) along with flexibility to cover all of the one off custom application processes that might not be supported by an APM tool (ADDM strength).
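
To make the idea concrete, here is a minimal sketch of what both tool types boil down to: given a list of observed calls between applications, list everything that calls into, or is called by, the application you plan to move. The class name, the `Call` shape, and the application names are all made up for this illustration; real ADDM and APM tools collect and model this data in their own ways.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

final class DependencySketch {

    /** One observed call from one application to another (hypothetical shape). */
    static final class Call {
        final String caller;
        final String callee;

        Call(String caller, String callee) {
            this.caller = caller;
            this.callee = callee;
        }
    }

    /** Applications that call into the given application (self-calls are ignored). */
    static SortedSet<String> inbound(List<Call> calls, String app) {
        SortedSet<String> result = new TreeSet<>();
        for (Call c : calls) {
            if (c.callee.equals(app) && !c.caller.equals(app)) {
                result.add(c.caller);
            }
        }
        return result;
    }

    /** Applications that the given application calls (self-calls are ignored). */
    static SortedSet<String> outbound(List<Call> calls, String app) {
        SortedSet<String> result = new TreeSet<>();
        for (Call c : calls) {
            if (c.caller.equals(app) && !c.callee.equals(app)) {
                result.add(c.callee);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample data: who talks to whom.
        List<Call> observed = Arrays.asList(
                new Call("web-portal", "payments"),
                new Call("payments", "ledger-db"),
                new Call("payments", "fraud-check"),
                new Call("reporting", "ledger-db"));

        System.out.println("Calls into payments:   " + inbound(observed, "payments"));
        System.out.println("Calls out of payments: " + outbound(observed, "payments"));
    }
}
```

Everything that shows up in either set is something that could break when the application moves, so both directions need to be checked before the move.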

2. Baselining performance and availability before the move – It’s critically important to understand the performance characteristics of your application before you move. This will provide the baseline required for comparison sake after you make the move. The last thing you want to do is degrade application performance and user satisfaction by moving an application. The solution here is leveraging the APM tool referenced in solution #1. This is a core strength of APM and should be leveraged from multiple perspectives:

  1. Overall application throughput, response times, and availability
  2. Individual business transaction throughput and response times
  3. External dependency throughput and response times
  4. Application error rate and type

Application overview with baseline information.

Business transaction overview and baseline information.
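
As a rough sketch of what a baseline comparison can look like, the code below computes the mean and standard deviation of response times gathered before the move and flags a later measurement that sits more than `k` standard deviations above the mean. The "mean plus k standard deviations" rule, the class name, and the sample numbers are my own assumptions for illustration, not a description of how any particular APM product computes its baselines.

```java
final class BaselineSketch {

    /** Arithmetic mean of the samples. */
    static double mean(double[] samples) {
        double sum = 0.0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Population standard deviation of the samples. */
    static double stdDev(double[] samples) {
        double m = mean(samples);
        double sumOfSquares = 0.0;
        for (double s : samples) {
            sumOfSquares += (s - m) * (s - m);
        }
        return Math.sqrt(sumOfSquares / samples.length);
    }

    /** True when the observed value is more than k standard deviations above the baseline mean. */
    static boolean isDegraded(double[] baselineMs, double observedMs, double k) {
        return observedMs > mean(baselineMs) + k * stdDev(baselineMs);
    }

    public static void main(String[] args) {
        // Invented response times (ms) recorded before the move.
        double[] beforeMove = {100, 110, 90, 105, 95};

        System.out.println("Baseline mean (ms): " + mean(beforeMove));
        System.out.println("120 ms degraded? " + isDegraded(beforeMove, 120, 3.0));
        System.out.println("130 ms degraded? " + isDegraded(beforeMove, 130, 3.0));
    }
}
```

The same check can be applied to each of the perspectives listed above (overall application, individual business transactions, external dependencies), as long as the baseline was captured before the move.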

3. Ensuring performance and availability after the move – Now that your application has moved to an outsourcer it’s more important than ever to understand performance and availability. Invariably your application performance will degrade and the finger pointing between you and your outsourcer will begin. That is, unless you are using an APM tool to monitor your application. The whole point of APM tools is to end finger pointing and to reduce mean time to restore service (MTRS) as much as possible. By using APM after the application move you will provide the highest possible level of service to your customers.

Comparison of two application releases. Granular comparison to understand before and after states. – Key Performance Indicators

Comparison of two application releases. Granular comparison to understand before and after states. – Load, Response Time, Errors

If you’re considering or in the process of transitioning to a digital enterprise you should seriously consider using APM to solve a multitude of problems. You can click here to sign up for a free trial of AppDynamics and get started today.

DevOps Scares Me – Part 3

Hey, who invited the developer to our operations conversation? In DevOps Scares Me – Part 2 my colleague Dustin Whittle shared his developer point of view on DevOps. I found his viewpoint quite interesting and it made me realize that I take for granted the knowledge I have about what it takes to get an application into production in a large enterprise. As Dustin called out, there are many considerations including but not limited to code management, infrastructure management, configuration management, event management, log management, performance management, and general monitoring. In his blog post Dustin went on to cover some of the many tools available to help automate and manage all of the considerations previously mentioned. In my post I plan to explore if DevOps is only for loosely managed e-commerce providers or if it can really be applied to more traditional and highly regulated enterprises.

Out With the Old

In the operations environments I have worked in there were always strict controls on who could access production environments, who could make changes, when changes could be made, who could physically touch hardware, who could access what data centers, etc… In these highly regulated and process-oriented enterprises the thought of blurring the lines between development and operations seems like a non-starter. There is so much process and tradition standing in the way of using a DevOps approach that it seems nearly impossible. Let’s break it down into small pieces and see if it could be feasible.

Here are the basic steps to getting a new application built and deployed from scratch (from an operations perspective) in a stodgy Financial Services environment. If you’ve never worked in this type of environment some of the timing of these steps might surprise you (or be very familiar to you). We are going to assume this new application project has already been approved by management and we have the green light to proceed.

  1. Place order for dev, test, uat, prod, dr, etc… infrastructure. (~8 weeks lead time, all hardware ordered up front)
  2. Development team does dev stuff while us ops personnel are filling out miles of virtual paperwork to get the infrastructure in place. Much discussion occurs about failover, redundancy, disaster recovery, data center locations, storage requirements, etc… None of this discussion includes developers, just operations and architects…oops.
  3. New application is added to CMDB (or similar) to include new infrastructure components, application components, and dependencies.
  4. Operations is hopeful that the developers are making good progress in the 8 weeks lead time provided by the operational request process (actually the ops teams don’t usually even think about what dev might be working on). Servers have landed and are being racked and stacked. Hopefully we guessed right when we estimated the number of users, efficiency of code, storage requirements, etc… that were used to size this hardware. In reality we will have to see what happens during load testing and make adjustments (i.e. tell the developers to make it use fewer resources or order more hardware).
  5. We’re closing in on one week until the scheduled go-live date but the application isn’t ready for testing yet. It’s not the developers’ fault that the functional requirements keep changing but it is going to squeeze the testing and deployment phases.
  6. The monitoring team has installed their standard monitoring agents (usually just traditional server monitoring) and marked off that checkbox from the deployment checklist.
  7. It’s 2 days before go-live and we have an application to test. The load test team has coded some form of synthetic load to be applied to the servers. Functional testing showed that the application worked. Load testing shows slow response times and lots of errors. Another test is scheduled for tomorrow while the development team works frantically to figure out what went wrong with this test.
  8. One day until go-live, load test session 2, still some slow response time and a few errors but nothing that will stop this application from going into production. We call the load test a “success” and give the green light to deploy the application onto the production servers. The app is deployed, functional testing looks good, and we wait until tomorrow for the real test…production users!
  9. Go-Live … Users hit the application, the application pukes and falls over, the operations team checks the infrastructure and gets the developers the log files to look at. Management is upset. Everyone is asking if we have any monitoring tools that can show what is happening in the application.
  10. Week one is a mess with the application working, crashing, restarting, working again, and new emergency code releases going into production to fix the problems. Week 2 and each subsequent week will get better until new functionality gets released in the next major change window.

Nobody wins with a “toss it over the wall” mentality.

In With the New

Part of the problem with the scenario above is that the development and operations teams are so far removed from each other that there is little to no communication during the build and test phases of the development lifecycle. What if we took a small step towards a more collaborative approach as recommended by DevOps? How would this process change? Let’s explore (modified process steps are highlighted using bold font)…

  1. Place order for dev, test, uat, prod, dr, etc… infrastructure. (~8 weeks lead time, all hardware ordered up front)
  2. Development and operations personnel fill out virtual paperwork together which creates a much more accurate picture of infrastructure requirements. Discussions about failover, redundancy, disaster recovery, data center locations, storage requirements, etc… progress more quickly with better estimations of sizing and understanding of overall environment.
  3. New application is added to CMDB (or similar) to include new infrastructure components, application components, and dependencies.
  4. Operations is fully aware of the progress the developers are making. This gives the operations staff an opportunity to discuss monitoring requirements from both a business and IT perspective with the developers. Operations starts designing the monitoring architecture while the servers arrive and are being racked and stacked. Both the development and operations teams are comfortable with the hardware requirement estimates but understand that they will have to see what happens during load testing and make adjustments (i.e. tell the developers to make it use fewer resources or order more hardware). Developers start using the monitoring tools in their dev environment to identify issues before the application ever makes it to test.
  5. We’re closing in on one week until the scheduled go-live date but the application isn’t ready for testing yet. It’s not the developers’ fault that the functional requirements keep changing but it is going to squeeze the testing and deployment phases.
  6. The monitoring team has installed their standard monitoring agents (usually just traditional server monitoring) as well as the more advanced application performance monitoring (APM) agents across all environments. This provides the foundation for rapid triage during development, load testing, and production.
  7. It’s 2 days before go-live and we have an application to test. The load test team has coded a robust set of synthetic load based upon application monitoring data gathered during development. This load is applied to the application which reveals some slow response times and some errors. The developers and operations staff use the APM tool together during the load test to immediately identify the problematic code and have a new release available by the end of the original load test. This process is repeated until the slow response times and errors are resolved.
  8. One day until go-live, we were able to stress test overnight and everything looks good. We have the green light to deploy the application onto the production servers. The app is deployed, functional testing looks good, business and IT metric dashboard looks good, and we wait until tomorrow for the real test…production users!
  9. Go-Live … Users hit the application, the application works well for the most part. The APM tool is showing some slow response time and a couple of errors to the developers and the operations staff. The team agrees to implement a fix after business hours as the business dashboard shows that things are generally going well. After hours the development and operations team collaborate on the build, test, and deploy of the new code to fix the issues identified that day. Management is happy.
  10. Week one is highly successful with issues being rapidly identified and dealt with as they come up. Week 2 and each subsequent week are business as usual and the development team is actively focused on releasing new functionality while operations adapts monitoring and dashboards when needed.

Developers and operations personnel living together in harmony!

So which scenario sounds better to you? Have you ever been in a situation where increased collaboration caused more problems than it solved? In this example the overall process was kept mostly intact to ensure compliance with regulatory audit procedures. Developers were never granted access to production (regulatory issue for Financial Services companies) but by being tightly coupled with operations they had access to all of the information they needed to solve the issues.

It seems to me that you can make a big impact across the lifecycle of an application by implementing parts of the DevOps philosophy in even a minor way. In this example we didn’t even touch the automation aspects of DevOps. That’s where all of those fun and useful tools come into play so that is where we will pick up next time.

If you’re interested in adding an APM tool to your DevOps, development, or operations toolbox you can take a free self-guided trial by clicking here and following the prompts.

Click here for DevOps Scares Me Part 4.

Intelligent Alerting for Complex Applications – PagerDuty & AppDynamics

Today AppDynamics announced integration with PagerDuty, a SaaS-based provider of IT alerting and incident management software that is changing the way IT teams are notified, and how they manage incidents in their mission-critical applications.  By combining AppDynamics’ granular visibility of applications with PagerDuty’s reliable alerting capabilities, customers can make sure the right people are proactively notified when business impact occurs, so IT teams can get their apps back up and running as quickly as possible.

You’ll need a PagerDuty and AppDynamics license to get started – if you don’t already have one, you can sign up for free trials of PagerDuty and AppDynamics online.  Once you complete this simple installation, you’ll start receiving incidents in PagerDuty created by AppDynamics out-of-the-box policies.

Once an incident is filed it will have the following list view:

incident

When the ‘Details’ link is clicked, you’ll see the details for this particular incident including the Incident Log:

incident_details

If you are interested in learning more about the event itself, simply click ‘View message’ and all of the AppDynamics event details are displayed showing which policy was breached, violation value, severity, etc. :

incident_message

Let’s walk through some examples of how our customers are using this integration today.

Say Goodbye to Irrelevant Notifications

Is your work email address included in some sort of group email alias at work and you get several, maybe even dozens, of notifications a day that aren’t particularly relevant to your responsibilities or are intended for other people on your team?  I know I do.  Imagine a world where your team only receives messages when the notifications have to do with their individual role and only get sent to people that are actually on call.  With AppDynamics & PagerDuty you can now build in alerting logic that routes specific alerts to specific teams and only sends messages to the people that are actually on-call.  App response time way above the normal value?  Send an alert to the app support engineer that is on call, not all of his colleagues.  Not having to sift through a bunch of irrelevant alerts means that when one does come through you can be sure it requires YOUR attention right away.
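
The routing idea can be sketched in a few lines. This is only an illustration of the logic and does not use the AppDynamics or PagerDuty APIs; the alert categories, team names, and the `recipient` helper are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class AlertRoutingSketch {

    /** Which team owns which kind of alert (hypothetical categories and team names). */
    static final Map<String, String> TEAM_FOR_CATEGORY = new HashMap<>();

    static {
        TEAM_FOR_CATEGORY.put("app-response-time", "app-support");
        TEAM_FOR_CATEGORY.put("node-down", "infrastructure");
    }

    /**
     * Returns the one person who should be notified: the engineer currently
     * on call for the team that owns the alert category. Empty when the
     * category is unknown or nobody is on call for that team.
     */
    static Optional<String> recipient(String category, Map<String, String> onCallByTeam) {
        String team = TEAM_FOR_CATEGORY.get(category);
        if (team == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(onCallByTeam.get(team));
    }

    public static void main(String[] args) {
        // Invented on-call roster: team -> engineer on call right now.
        Map<String, String> onCall = new HashMap<>();
        onCall.put("app-support", "alice");
        onCall.put("infrastructure", "bob");

        System.out.println(recipient("app-response-time", onCall));
        System.out.println(recipient("node-down", onCall));
    }
}
```

The point of the sketch is that the alert goes to exactly one person chosen by role and schedule, rather than to a group alias.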

Automatic Escalations

If you are only sending a notification and assigning an incident to one person, what happens if that person is out of the office or doesn’t have access to the internet / phone to respond to the alert?  Well, the good thing about the power of PagerDuty is that you can build in automatic escalations.  So, if you have a trigger in AppDynamics to fire off a PagerDuty alert when a node is down, and the infrastructure manager isn’t available, you can automatically escalate and re-assign / alert a backup employee or admin.
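
Here is a simplified sketch of the escalation idea: walk an ordered list of responders and pick the first one who can take the incident. It models "unavailable" as a known set of names purely for illustration; the class, method, and names are hypothetical and this is not PagerDuty’s actual implementation or API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class EscalationSketch {

    /**
     * Walks the escalation order and returns the first person who is not
     * marked unavailable. Empty when nobody in the list can respond.
     */
    static Optional<String> firstAvailable(List<String> escalationOrder, Set<String> unavailable) {
        for (String person : escalationOrder) {
            if (!unavailable.contains(person)) {
                return Optional.of(person);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Invented escalation order for a "node down" alert.
        List<String> order = Arrays.asList("infrastructure-manager", "backup-engineer", "admin");
        Set<String> outOfOffice = new HashSet<>(Arrays.asList("infrastructure-manager"));

        System.out.println(firstAvailable(order, outOfOffice));
    }
}
```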

The Sky is Falling!  Oh Wait – We’re Just Conducting Maintenance…

Another potentially annoying situation for IT teams is the flood of alerts that get fired off during a maintenance window.  PagerDuty has the concept of a maintenance window so your team doesn’t get a bunch of doomsday messages during maintenance.  You can even set up a maintenance window with one click if you prefer to go that route.

Either way, no new incidents will be created during this time period… meaning your team will be spared having to open, read, and file the alerts and update / close out the newly-created incidents in the system.
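
The suppression logic itself is simple, and a minimal sketch looks like this. The class and method names are hypothetical and the window is treated as start-inclusive, end-exclusive by my own choice; this is not how PagerDuty necessarily implements it.

```java
import java.time.Instant;

final class MaintenanceWindowSketch {

    /** True when t falls inside [start, end). */
    static boolean inWindow(Instant t, Instant start, Instant end) {
        return !t.isBefore(start) && t.isBefore(end);
    }

    /** An alert only becomes an incident when it arrives outside the maintenance window. */
    static boolean shouldCreateIncident(Instant alertTime, Instant windowStart, Instant windowEnd) {
        return !inWindow(alertTime, windowStart, windowEnd);
    }

    public static void main(String[] args) {
        // Invented two-hour maintenance window.
        Instant start = Instant.parse("2013-04-16T02:00:00Z");
        Instant end = Instant.parse("2013-04-16T04:00:00Z");

        System.out.println(shouldCreateIncident(Instant.parse("2013-04-16T03:00:00Z"), start, end));
        System.out.println(shouldCreateIncident(Instant.parse("2013-04-16T05:00:00Z"), start, end));
    }
}
```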

We’re confident this integration of the leading application performance management solution with the leading IT incident management solution will save your team time and make them more productive.  Check out the AppDynamics and PagerDuty integration today!

AppDynamics & Splunk – Better Together

A few months ago I saw an interesting partnership announcement from Foursquare and OpenTable.  Users can now make OpenTable reservations at participating restaurants from directly within the Foursquare mobile app.  My first thought was, “What the hell took you guys so long?” That integration makes sense on so many levels, I’m surprised it hadn’t already been done.

So when AppDynamics recently announced a partnership with Splunk, I viewed that as another no-brainer.  Two companies with complementary solutions making it easier for customers to use their products together – makes sense right?  It does to me, and I’m not alone.

I’ve been demoing a prototype of the integration for a few months now at different events across the country, and at the conclusion of each walk-through I’d get some variation of the same question, “How do I get my hands on this?”  Well, I’m glad to say the wait is over – the integration is available today as an App download on Splunkbase.  You’ll need a Splunk and AppDynamics license to get started – if you don’t already have one, you can sign up for free trials of Splunk and AppDynamics online.

Exceptional Performance Improvement

This post is part of our series on the Top Application Performance Challenges.

Over the past couple of weeks we helped a few DevOps groups troubleshoot performance problems in their production Java-based applications. While the applications themselves bore little resemblance to one another, they all shared one common problem. They were full of runtime exceptions.

When I brought the exceptions to the attention of some of the teams I got a variety of different answers.

  • “Oh yeah, we know all about that one, just some old debug info … I think”
  • “Can you just tell the APM tool to just ignore that exception because it happens way too much”
  • “I have no idea where that exception comes from, I think it’s the vendor/contractor/off shore team’s/someone else’s code”
  • “Sometimes that exception indicates an error, sometimes we use it for flow control”

The response was the same – because the exception did not affect the functionality of (read: did not break) the application it was deemed unimportant. But what about performance?

In one high-volume production application (hundreds of calls per second) we observed an NPE (Null Pointer Exception) rate of more than 1,000 exceptions per minute (17,800 in 15 minutes). This particular NPE occurred in the exact same line of code and propagated some 20 calls up the stack before a catch clause handled it. The pseudo code for the line read: “if the string equals null and the length of the trimmed string is zero, then do x; otherwise just exit.” When we read the line out loud the problem was immediately obvious – you cannot trim a null string. While the team agreed that they needed to fix the code, they remained skeptical that simply changing ‘=’ to ‘!’ would really improve performance.
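
In Java the faulty test and its one-character fix look roughly like this. The actual line from the customer’s code is not reproduced in this post, so the class and method names here are hypothetical and the sketch is only a reconstruction from the pseudo code.

```java
final class BlankCheckSketch {

    /**
     * Buggy version: when s is null the first test is true, so the second
     * test runs and s.trim() throws a NullPointerException. When s is not
     * null the first test is false and the method always returns false.
     */
    static boolean isBlankBuggy(String s) {
        return s == null && s.trim().length() == 0;
    }

    /**
     * Fixed version: '==' became '!='. A null string now short-circuits
     * to false without ever calling trim().
     */
    static boolean isBlankFixed(String s) {
        return s != null && s.trim().length() == 0;
    }

    public static void main(String[] args) {
        System.out.println("fixed(null)  = " + isBlankFixed(null));
        System.out.println("fixed(\"  \") = " + isBlankFixed("  "));
        try {
            isBlankBuggy(null);
        } catch (NullPointerException e) {
            System.out.println("buggy(null) threw NullPointerException");
        }
    }
}
```

Note that the buggy version never does what it was written to do: it either throws or returns false, so the exception is the only observable effect of that line.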

Using basic simulations we estimate this error rate imposes a 3-4% slow down on basic response time at a minimum. You heard me, 3-4% slow down at a minimum.

And these response issues get compounded in multi-threaded application server environments. When multiple pooled threads slow down to cope with error handling then the application has fewer resources available to process new requests. At the JVM level the errors cause more IO, more objects created in the heap, more garbage collection. In a multi-threaded environment the error threatens more than response times, it threatens load capacity.

And if the application is distributed across multiple JVMs … well you get the picture.

The web contains many articles and discussions on the performance impact of Java’s exception handling. The examples in these posts provide sufficient data to show that a real performance penalty exists for throwing and catching exceptions. Doing the fix really does improve the performance!

Runtime Exceptions happen. When they occur frequently, they do appreciably slow down your application. The slowness becomes contagious to all transactions being served by the application. Don’t mute them. Don’t ignore them. Don’t dismiss them. Don’t convince yourself they are harmless. If you want a simple way to improve your application’s performance, start by fixing up these regularly occurring exceptions. Who knows, you just might help everyone’s code run a little faster.

Application Virtualization & A Free iPad!

Interested in winning an iPad? Take our Application Virtualization survey, and we’ll give you a more-than-decent shot at winning your very own!

In talking to our customers, AppDynamics recognizes that virtualizing mission-critical applications is at the top of everyone’s mind, even though some companies are at different stages of the process than others.

Many IT departments have already finished virtualizing their less critical apps and they realize that there are many benefits to be had in virtualizing the rest. But, in many cases, application owners are concerned about performance impacts to their mission-critical apps, and they’re saying, “Hands off my app.”

What’s it like at your organization? Is the path to app virtualization a smooth one, or are you experiencing some bumps along the way? We’d like to find out more, and we’d like your help to do it.

Answer a few short questions on your app virtualization strategy and be automatically entered to win an iPad. In addition, we’ll send you a copy of the results, so you can discover what your peers at other organizations are doing.

Find out:

– What percentage of Tier 1 apps have been virtualized by other companies
– Whether the virtualization project is a stepping stone towards the cloud
– What challenges are preventing people from finishing (or even starting) virtualization of Tier 1 apps

The survey is only 10 questions long–and again, entering gives you the possibility of winning an iPad, as well as a guaranteed copy of the survey results.

Take the survey now!

We look forward to reviewing your responses!

It’s All About The Business

I’ve never written a blog post before, but Jyoti tells me, from experience, that if you write about your passion, it’s a piece of cake. So here goes. My passion is to create technology to make application management easier and more productive so that companies can focus on delivering value instead of fretting about just being able to run apps 24×7. Since I intend to keep up this newly acquired habit of mine, let me start by establishing the fundamental tenet on which this technology we build at AppDynamics is based – Business Transactions.

You may think you’ve heard this discussion before, but I’d wager a guess that you’ve actually heard more about transactions than about business. I want to tell you why you actually need to put the focus on the business.

In today’s world where more and more businesses move to the web, the application is the face of the company. It is the business, it is the revenue stream. From DVD rentals to talent management, everything is an online application. Someone suggested to me recently that the only business they cannot use the internet for was getting a haircut. But I am sure they are working on it too!!

From the CIO to the ops team pushing the latest application release out, there needs to be a common context that drives the business unit towards a robust, efficient and highly competitive application. That context is the business transaction.

Here are a few reasons why you should focus on the ‘business’ side of the business transaction:

1. When the application is the business, competitive advantage means having a better application. Now in an ideal world, when someone builds an application they would like it to be better than anything out there. But having an edge is not a one time thing. It’s constant evolution, which means rapid change and faster adaptation (Flickr does 10+ deployments a day!!). Dynamic changes to an application’s features and functions directly impact the business transaction and the overall user experience. At the end of the day, what you care about most is how your users are being serviced.

2. When the application is the business, managing an application is no longer just about monitoring CPU, JVM memory or timing key methods. It’s really about understanding the user experience, managing the business operations and creating service levels to ensure optimum performance. Business transactions are the binding factor for attaching SLAs to business operations and for creating a common ground between dev teams and ops teams. Focusing on the business transaction is a great way to validate the whole cycle, whether that is a rollout or the current state of the app at any given time. Also, as the business grows it needs more capacity to serve users, which means more servers. So, instead of multiplying management complexity with more resources, a focus on user-centric SLAs is the only scalable way to address performance.

3. When the application is the business, the responsibility of managing applications (and hence the revenue stream) falls on IT Operations. But development is responsible for building the app and for the innovation in it. This creates an interdependence (and what we see as the DevOps movement). By creating a common language to categorize user requests – “Are the checkouts doing ok?” “Is the sales order processing running faster than before?” – these teams are better able to communicate and effectively manage application performance over time.
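To make the idea of a common language a little more concrete, here is a minimal Python sketch of grouping raw requests into named business transactions and checking each one against a service level. It is only an illustration under my own assumptions, not a description of how any product does it: the URL prefixes, the transaction names, the SLA numbers and the helper names (classify, sla_report) are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from URL prefix to business transaction name.
TRANSACTION_RULES = {
    "/cart/checkout": "Checkout",
    "/orders": "Sales Order Processing",
}

# Hypothetical SLA targets (average response time in milliseconds).
SLA_MS = {"Checkout": 800, "Sales Order Processing": 1500}


def classify(path):
    """Return the business transaction for a request path, or None if unmapped."""
    for prefix, name in TRANSACTION_RULES.items():
        if path.startswith(prefix):
            return name
    return None


def sla_report(requests):
    """Group (path, response_ms) pairs by business transaction and compare
    each group's average response time with its SLA target."""
    timings = defaultdict(list)
    for path, response_ms in requests:
        name = classify(path)
        if name is not None:  # ignore requests that map to no business transaction
            timings[name].append(response_ms)
    report = {}
    for name, values in timings.items():
        avg = mean(values)
        report[name] = {"avg_ms": avg, "within_sla": avg <= SLA_MS[name]}
    return report


# Made-up sample data, for illustration only.
sample = [
    ("/cart/checkout/confirm", 600),
    ("/cart/checkout/pay", 1200),
    ("/orders/42", 900),
    ("/static/logo.png", 5),
]
report = sla_report(sample)
for name, stats in report.items():
    print(name, stats)
```

With a report like this, “Are the checkouts doing ok?” becomes a lookup on one named transaction that both dev and ops can read, rather than a hunt through per-server metrics.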

Using the business transaction as a unit of management for the application makes your company more agile, competitive, scalable and high-performing. In future posts, I’ll review the various ways to approach overall application management and take a deeper dive on why a business transaction focus is critical to success.

At the end of the day, it should be all about the business!

Who owns the application?

The debate over who owns an application in production continues to unfurl.  What I’ve found interesting is, after a period of time where writers and bloggers were looking towards IT Operations to be application owners, I’ve started to detect a bit of a backlash.

The arguments for IT Operations owning the application are simple:

  • Development should stay focused on creating new applications and adding features, not maintaining what’s already in production
  • If you ask a developer, they’ll always say they’d rather spend their time on innovation than on production support or bug fixes
  • IT Operations generally oversees the production infrastructure (servers, storage, networks etc), so they are the natural caretakers of production applications as well

Analysts often point out that this is easier said than done: what is sometimes called the “required cooperation” between operations and development is often difficult to obtain. I’ve also seen it suggested that putting production applications in the hands of operations is a Utopian dream, leading to performance issues and SLA violations.

If I were to describe my own view of “natural selection” with regard to managing application performance, it would be more along the lines of collaboration. The development team is likely to help support applications as long as IT Operations is able to provide them with the information and data they need to fix root-cause issues quickly. Because much development is now done in agile environments, this sort of teamwork is becoming less of a philosophical choice and more of a business necessity.

If you look through blogs and Twitter, you’ll find some interesting grassroots movements such as #devops and #agileoperations. These communities are forming around the need to break down the traditional walls that exist between Dev and Ops, and to radically restructure those relationships so that they are focused on shared goals and outcomes.

One devops proponent, James Turnbull at Kartar.net, explains the problem:

“So … why should we merge or bring together the two realms?  Well there are lots of reasons but first and foremost because what we’re doing now is broken.  Really, really broken.  In many shops the relationship between development (or engineering) and operations is dysfunctional to the point of occasional toxicity.”

(I love the phrase “occasional toxicity”…)

He goes on to add:

“DevOps is all about trying to avoid that epic failure and working smarter and more efficiently at the same time. It is a framework of ideas and principles designed to foster cooperation, learning and coordination between development and operational groups. In a DevOps environment, developers and sysadmins build relationships, processes, and tools that allow them to better interact and ultimately better service the customer.

“DevOps is also more than just software deployment – it’s a whole new way of thinking about cooperation and coordination between the people who make the software and the people who run it.  Areas like automation, monitoring, capacity planning & performance, backup & recovery, security, networking and provisioning can all benefit from using a DevOps model to enhance the nature and quality of interactions between development and operations teams.”

I believe that the question that comes naturally out of these conversations is this: does IT Operations have the tools it needs to facilitate this collaboration with its peers in development? Traditionally, it hasn’t. The tools either didn’t provide much deep visibility into the application or, when they did, were too complicated for IT Operations to understand and use. But creating those tools, and encouraging that collaboration, is one of my own company’s guiding principles.

Applications are becoming more complex and distributed, and development is increasingly taking place in the context of agile release cycles.  So really, the question isn’t “who owns the app”–but how best to foster the collaborative process that enables dev and ops to both build out applications and resolve their performance problems, and to do so in record time.

Is this thing on?

I haven’t written a blog post before, but I’m told that writing is easy if you write about your passions. So let me discuss why I founded AppDynamics.

I began the company because I saw a gap that I wanted to fill. The world is host to countless thousands of applications, and an almost equal number of people wanting to help IT professionals manage those applications. But somehow, no one ever made managing application performance easy. It was always difficult, complex, and costly.

In addition, most application performance companies were quickly falling behind the times. You may have heard the expression that “nothing dates like science fiction”–it’s easy to look at a science fiction film and instantly know when it was made. Similarly, application performance solutions quickly lose their luster if they’re not constantly adapting to the changing IT environment. What I knew was that distributed applications were becoming more, not less, common–and that SOA, virtualization, and the cloud were becoming ever more prevalent. Traditional monitoring companies couldn’t keep up with the demands of these new environments.

I wanted to start a company that brought relief to IT professionals who were still on the front lines, managing application performance, but no longer had tools equipped to do the job. I wanted to make managing a complex application as easy as reading a Google traffic map. I wanted to give them the ability to not only monitor application performance, but also find and fix root-cause problems. I wanted to combine traditional monitoring functions with the ability to orchestrate capacity in the cloud.

I won’t use this blog space to only talk about what my company does–I expect to discuss trends in the industry, issues facing IT operations and developers, and other topics that come to mind. But I will always circle back to the bottom line–simplifying the lives of IT professionals by making application performance easy.

Because if you’re supposed to write about your passion, then that is my passion.