The Incredible Extensible Machine Agent

Our users tell us all the time: The AppDynamics platform is amazing right out of the box. But everybody has something special they want to do, whether it’s to add some functionality, set up a unique monitoring scenario, whatever. That’s what makes AppDynamics’ emphasis on open architecture so important and useful. The functionality of the AppDynamics machine agent can be customized and extended to perform specific tasks to meet specific user needs, either through existing extensions from the AppDynamics Exchange or through user customizations.

It helps to understand what the machine agent is and how it works. The machine agent is a stand-alone Java application that can be run in conjunction with application agents or separately from them. This means monitoring can be extended to environments outside the realm of the application being monitored. It can be deployed to application servers, database servers, web servers, and really anything running Linux, UNIX, Windows, or Mac.

The real elegance of the machine agent is its tremendous extensibility. For non-Windows environments, there are three ways to extend the machine agent: through a script, with Java, or by sending metrics to the agent’s HTTP listener. If you have a .NET environment, you can also add hardware metrics over and above these three methods.

Let’s look at a real-life example. Say I want to create an extension using cURL that would give the HTTP status of certain websites. My first step is to look for one in the AppDynamics Exchange, our library of all the extensions and integrations currently available. It’s also the place where you can request extensions that you need or submit extensions you have built.

Sure enough, there’s one already available (community.appdynamics.com/t5/AppDynamics-eXchange/idbp/extensions) called Site Monitor, written by Kunal Gupta. I decided to use it, and followed these steps to create my HTTP status collection functionality.

1. Download the extension to the machine agent on a test machine.
2. Edit the Site Monitor configuration file (site-config.xml) to ping the sites that I wanted (in this case www.appdynamics.com). The sites can also be HTTPS sites if needed.
3. Restart the machine agent.

That’s it. It started pulling in the status code right away and, as a bonus, also the response time for requesting the status code of the URL that I wanted.

It’s great that I can now see the status code (200 in this case), but now I can truly use its power. I can quickly build dashboards displaying the information.

You can also hook the status code into custom health rules, which provide alerts when performance becomes unacceptable.

So there it is. In just a matter of minutes, the extension was up and running, giving me valuable data about the ongoing status of my application. If the extension that I wanted didn’t exist, it would have been just as easy to use the cURL command (curl -sL -w "%{http_code}\n" www.appdynamics.com -o /dev/null).
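For readers who would rather script the check than shell out to cURL, here is a minimal Python sketch of the same idea: request a URL, record the status code and the response time, and print one line per metric. The `name=...,value=...` line format and the `Custom Metrics|Site Monitor|...` metric paths are assumptions for illustration, as are the function names; check the machine agent documentation for the exact format your agent version expects.

```python
import time
import urllib.error
import urllib.request
from typing import List, Tuple


def format_metric(metric_path: str, value: int) -> str:
    """Format one metric line (assumed name=...,value=... style)."""
    return f"name={metric_path},value={value}"


def probe(url: str, timeout: float = 10.0) -> Tuple[int, int]:
    """Return (HTTP status code, response time in milliseconds) for url.

    Redirects are followed, similar to curl's -L flag.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise HTTPError but still carry a status code.
        status = err.code
    elapsed_ms = int((time.monotonic() - start) * 1000)
    return status, elapsed_ms


def report(url: str, label: str) -> List[str]:
    """Probe url and return the metric lines to print (hypothetical paths)."""
    status, elapsed_ms = probe(url)
    prefix = f"Custom Metrics|Site Monitor|{label}"
    return [
        format_metric(f"{prefix}|Status Code", status),
        format_metric(f"{prefix}|Response Time (ms)", elapsed_ms),
    ]
```

Calling `print("\n".join(report("https://www.appdynamics.com", "AppDynamics")))` would emit the two metric lines. Connection failures (DNS errors, timeouts) are deliberately left to raise so that a broken probe is not mistaken for a healthy site.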

Either way, the machine agent can be extended to support your specific needs and solve specific challenges. Check out the AppDynamics Exchange to see what kinds of extensions are already available, and experiment with the machine agent to see how easily you can expand its capabilities.

If you’d like to try AppDynamics, check out our free trial and start monitoring your apps today!

The Intelligent Approach to Production Monitoring

We get a lot of questions about our analytics-driven Application Performance Management (APM) collection and analysis technology. Specifically, people want to know how we capture so much detailed information while maintaining such low overhead levels. The short answer is, our agents are intelligent and know when to capture every gory detail (the full call stack) and when to only collect the basics for every transaction. Using an analytics-driven approach, AppDynamics is able to provide the highest level of detail to solve performance issues during peak application traffic times.

AppDynamics, An Efficient Doctor

AppDynamics’ APM solution monitors, baselines and reports on the performance of every single transaction flowing through your application. However, unlike other APM solutions that got their start in development environments, ours was built for production, which requires a more agile approach to capturing transaction details.

I’d like to share with you a story which illustrates AppDynamics’ analytics-based methodology and compares it with many of our competitors’ “capture as much detail as possible whether there are problems or not” (aka, our agents are too old to have intelligence built in) approach.

You visit Dr. AppDynamics for your regular health checkups. She takes your vital signs, records weight, measures reflexes and compares every metric taken against known good baselines. When your statistics are close to the baselines the doctor sends you home and sees the next patient without delay. When your health vitals deviate too far from the pre-established baselines the smart doctor orders more relevant tests to diagnose your problem. This methodology minimizes the burden on the available resources and efficiently and effectively diagnoses any issues you have.

In contrast, you visit Dr. Legacy for your regular health checkups. She takes your vital signs, records weight, measures reflexes and immediately orders a battery of diagnostic tests even though you are perfectly healthy. She does this for every single patient she sees. The medical system is now overburdened with extra work that was not required in the first place. This burden is slowing down the entire system, so in order to ensure things move faster Dr. Legacy decides to reduce the number of diagnostic tests being run on every single patient (even the ones with actual problems). Now the patients who have legitimate problems are going undiagnosed in the waiting room during the time when they need the most attention. In addition, due to the large amount of diagnostic testing and data being generated, the cost of care is driven up needlessly and excessively.

Does Dr. Legacy’s methodology make any sense to you when better methods exist?

AppDynamics’ intelligent approach to collecting data and inducing diagnostics makes it easier to spot outliers and, because deep diagnostic data is provided for only the transactions that require this level of detail, there is less impact on system resources and very little monitoring overhead.

Monitoring 100% of Your Business Transactions All the Time

AppDynamics monitors every single business transaction (BT) that flows through your applications. There is no exception to this rule. We automatically learn and develop a dynamic baseline for end-to-end response time as well as the response time of every component along the transaction flow, and also for all critical business metrics within your application.

We score each transaction by comparing the actual response time to the self-learned baseline. When we determine that a BT has deviated too far from normal behavior (using a tunable algorithm), our agent knows to automatically collect full call stack details for your troubleshooting pleasure. This analytics-based methodology allows AppDynamics to detect and alert on problems right from the start so they can be fixed before they cause a major impact.
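To make the idea concrete, here is a minimal Python sketch of baseline-driven capture. It is not AppDynamics’ actual algorithm, which this post does not describe; the class name, the standard-deviation rule, and the default thresholds are all assumptions chosen for illustration. It learns a baseline from observed response times and flags a transaction for deep capture only when it deviates too far from that baseline.

```python
from statistics import mean, stdev
from typing import List


class BaselineScorer:
    """Toy self-learning baseline (hypothetical, for illustration only)."""

    def __init__(self, threshold_sigmas: float = 3.0, min_samples: int = 5):
        self.threshold_sigmas = threshold_sigmas  # the "tunable" part
        self.min_samples = min_samples            # warm-up before scoring
        self.samples: List[float] = []

    def observe(self, response_ms: float) -> bool:
        """Record a response time; return True if it warrants deep capture."""
        deviates = False
        if len(self.samples) >= self.min_samples:
            baseline = mean(self.samples)
            spread = stdev(self.samples)
            deviates = response_ms > baseline + self.threshold_sigmas * spread
        if not deviates:
            # Only normal transactions feed the baseline, so a burst of
            # slow requests does not quietly become the new "normal".
            self.samples.append(response_ms)
        return deviates
```

With response times of 100, 102, 98, 101 and 99 ms as warm-up, a 500 ms request is flagged while a 101 ms request is not. The point of the design is that the cheap comparison runs on every transaction and the expensive collection runs only on the outliers.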

Of course, there are times when deep data capture of every transaction is advantageous, such as during development, and the AppDynamics APM solution has another intelligent feature to address this need. We’ve built a simple, one-click button to enable full data recording system-wide. Developer mode captures a transaction snapshot for every single request, which makes it ideal for pre-production environments where engineers are profiling and load testing the application. In production this would be overkill and wasteful, so developer mode is smart enough to shut itself off automatically if it is unintentionally left on; your system won’t get bogged down if transaction volume increases.

Who Looks at Production Call Stacks When There are No Problems?

One of the worst qualities about legacy APM solutions is the fact that they collect as much data as they can, all the time. Usually this originates from the APM tool starting as a profiling tool for developers that has been molded to work in production. While this methodology is fine for development environments (we support this with dev-mode as described above), it fails miserably in any high volume scenario like load testing and production. Why does it fail? I’m glad you asked 😉

Any halfway decent APM tool has built-in overhead limiters to keep itself from causing harm and introducing too much overhead within a running application. When you are collecting as much deep-dive data as possible with no intelligent way of focusing your data collection, you are inducing the maximum allowed overhead basically all the time (assuming reasonable load). The problem is that your problems are most likely to surface as your application load gets higher, and this is exactly when legacy APM overhead is skyrocketing (due to massive amounts of code execution and deep collection being “always on”). The overhead limiters then kick in and reduce the amount of data being collected or kill off data collection altogether. In plain English, this means that legacy APM tools can’t tell good transactions from bad and will provide you with the least amount of data at the time you need the most data. Isn’t it funny how marketing and sales teams try to turn this methodology into the best thing ever?

I have personally used many different APM tools in production and I never needed to look at a full call stack when there was no problem. I was too busy getting my job accomplished to poke around in mostly meaningless data just for the fun of it.

Distributed Intelligence for Massive Scalability

All of the intelligent data collection mentioned above requires a very small amount of extra processing to determine when to go deep and what to save. This is a place where the implementation details really make a difference.

At AppDynamics, we put the smarts where they are best suited to be – at the agent level. It’s a simple paradigm shift that distributes the workload across your install base (where it’s not even noticed) rather than concentrating it at a single point. This important architectural design makes it so that as the load on the application goes up, the load on the management server remains low.

Contrasting this with legacy APM solutions, restricting whatever intelligence you have to the central monitoring server(s) causes higher resource requirements and therefore a monitoring infrastructure that requires more servers and greater levels of care and feeding.

Collecting, transmitting, storing, and analyzing large amounts of unneeded data comes with a high total cost of ownership (TCO). It takes a lot of people, servers, and storage to properly manage those legacy APM tools in an enterprise environment. Most APM vendors even want to sell you their expensive full time consultancy services just to manage their complex solutions. Intelligent APM tools ease your burden instead of increasing it like the legacy APM tools do.

All software tools go through transition periods where improvements are made and generational gaps are recognized. What was once cutting edge becomes hopelessly outdated unless you invest heavily in modernization. Hopefully this detailed look at APM methodologies helps you cut through the giant pile of sales and marketing propaganda that developers and IT ops folks are constantly exposed to. It’s important to understand what software vendors really do, but it’s most important to understand how they do it, as it will have a major impact on real-life usage.

Understanding Performance of PayPal as a Service (PPaaS)

In a previous post – Agile Performance Testing – Proactively Managing Performance – I discussed some of the challenges faced in managing a successful performance engineering practice in an Agile development model.  Let’s continue this with a real world example, highlighting how AppDynamics simplifies the collection and comparison of Key Performance Indicators (KPIs) to give visibility into an Agile development team’s integration with PayPal as a Service (PPaaS).

Our dev team is tasked with building a new shopping cart and checkout capability for an online merchant. They have designed a simple Java Enterprise architecture with a web front-end, built on Apache TomEE, a set of mid-tier services, on JBoss AS 7, and have chosen to integrate with PayPal as the backend payment processor. With PayPal’s Mobile, REST and Classic SDKs, integrating secure payments into their app is a snap and our team knows this is a good choice.

However, the merchant has tight Service Level Agreements (SLAs) and it’s critical the team proactively analyze, and resolve, performance issues in pre-production as part of their Agile process. In order to prepare for meeting these SLAs, they plan to use AppDynamics as part of development and performance testing for end-to-end visibility, and to collect and compare KPIs across sprints.

The dev team is agile and is continuously integrating into their QA test and performance environment. During one of the first sprints they created a basic checkout flow, which is shown below:

[Screenshot: basic checkout flow]

For this sprint they stubbed several of the service calls to PayPal, but coded the first step in authenticating — getting an OAuth Access Token, used to validate payments.

Enabling AppDynamics on their application was trivial, and the dev team got immediate end-to-end visibility into their application flow, performance timings across all tiers, as well as the initial call to PayPal. Based on some initial performance testing everything looks great!

NOTE: in our example AppDynamics is configured to identify backend HTTP Requests (REST Service Invocations) using the first 3 segments of the target URL. This is an easy change and the updated configuration is automatically pushed to the AppDynamics agent without any need to change config files, or restart the application.
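As a rough illustration of what “the first 3 segments of the target URL” means as a naming rule, the Python sketch below groups URLs by scheme, host, and the first three path segments. The function name is hypothetical, the example URLs are made up for illustration, and treating “segments” as path segments is an assumption; in AppDynamics itself this is a configuration setting rather than code you write.

```python
from urllib.parse import urlsplit


def backend_name(url: str, segments: int = 3) -> str:
    """Identify a backend by host plus the first `segments` path segments."""
    parts = urlsplit(url)
    # Drop empty strings produced by leading, trailing, or doubled slashes.
    path_segments = [s for s in parts.path.split("/") if s]
    prefix = "/".join(path_segments[:segments])
    return f"{parts.scheme}://{parts.netloc}/{prefix}"
```

Under this rule, two calls such as `https://api.example.com/v1/payments/payment/PAY-1/execute` and `https://api.example.com/v1/payments/payment/PAY-2/execute` collapse into the single backend `https://api.example.com/v1/payments/payment`, so per-request identifiers do not create a new backend for every call.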

In a later sprint, our dev team finished integrating the full payments process flow. They’re using PayPal’s SDK and while it’s a seamless integration, they’re unclear exactly what calls to PayPal are happening under the covers.

Because AppDynamics automatically discovers, maps, and scores all incoming transactions end-to-end, our dev team was able to get immediate and full visibility into two new REST invocations, authorization and payment.

The dynamic discovery of AppDynamics is extremely important in an Agile, continuous integration, or continuous release model where code is constantly changing. Having to manually configure what methods to monitor is a burden that degrades a team’s efficiency.

Needing to understand performance across the two sprints, the team leverages AppDynamics’ Compare Releases functionality to quickly understand the difference between performance runs across the sprints.

AppDynamics’ flow map visualizes the difference in transaction flow between the sprints, highlighting the additional REST calls required to fully process the payment. Also, the KPI comparison gives the dev team an easy way to quickly measure the differences in performance.

Performance has changed, as expected, when implementing the full payment processing flow. During a performance test AppDynamics automatically identifies and takes diagnostics on the abnormal transactions.

Transaction Snapshots capture source line of code call graphs, end-to-end across the Web and Service tiers. Drilling down across the call graphs, the dev team clearly identifies the payment service as the long running call.

AppDynamics provides full context on the REST invocation, and highlights that the SDK was configured to talk to PayPal’s sandbox environment, explaining the occasional high response times.

To recap, our Agile dev team leveraged AppDynamics to get deep end-to-end visibility across their pre-production application environment. AppDynamics’ release comparison provided the means to understand differences in the checkout flows across sprints, and the dynamic discovery, application mapping, and automatic detection allowed the team to quickly understand and quantify their interactions with PayPal. When transactions deviated from normal, AppDynamics automatically identified and captured the slowness to provide end-to-end source line of code root-cause analysis.

Take five minutes to get complete visibility into the performance of your production applications with AppDynamics today.

Agile Performance Testing – Proactively Managing Performance

Just in case you haven’t heard, Waterfall is out and Agile is in.  For organizations that thrive on innovation, successful agile development and continuous deployment processes are paramount to reducing go-to-market time, fast-tracking product enhancements and quickly resolving defects.

Executed successfully, with the right team in place, Agile practices should result in higher functional product quality.  Operating in small, focused teams that work well-defined sprints with clearly groomed stories is ideal for early QA involvement, parallel test planning and execution.

But how do you manage non-functional performance quality in an Agile model?  The reality is that traditional performance engineering, and testing, is often best performed over longer periods of time; workload characterizations, capacity planning, script development, test user creation, test data development, multi-day soak tests and more… are not always easily adaptable into 2-week, or shorter, sprints.  And the high velocity of development change often causes continuous, and sometimes large, ripples that disrupt a team’s ability to keep up with these activities; anyone ever had a data model change break their test dataset?

Before joining AppDynamics I faced this exact scenario as the Lead Performance Engineer for PayPal’s Java Middleware team.  PayPal was undergoing an Agile transformation and our small team of historically matrix aligned, specialty engineers, was challenged to adapt.

Here are my best practices and lessons learned, sometimes the hard way, of how to adapt performance-engineering practices into an agile development model:

  1. Fully integrate yourself into the Sprint team, immediately.  My first big success at PayPal was the day I had my desk moved to sit in the middle of the Dev team.  I joined the water cooler talk, attended every standup, shot nerf missiles across the room, wrote and groomed stories as a core part of the scrum team.  Performance awareness, practices, and results organically increased because it was a well represented function within the team as opposed to an afterthought farmed out to a remote organization.
  2. Build multiple performance and stress test scenarios with distinct goals and execution schedules.  Plan for longer soak and stress tests as part of the release process, but have one or more per-sprint, and even nightly, performance tests that can be continually executed to proactively measure performance, and identify defects as they are introduced.  Consider it your mission to quantify the performance impact of a code change.
  3. Extend your Continuous Integration (CI) pipelines to include performance testing.  At PayPal, I built custom integrations between Jenkins and JMeter to automate test execution and report generation.  Our pipelines triggered automated nightly regressions on development branches and within a well-understood platform where QA and development could parameterize workload, kick-off a performance test and interpret a test report.  Unless you like working 18-hour days, I can’t overstate the importance of building integrations into tools that are already or easily adopted by the broader team.  If you’re using Jenkins, you might take a look at the Jenkins Performance Plugin.
  4. Define Key Performance Indicators (KPIs).  In an Agile model you should expect smaller scoped tests, executed at a higher frequency.  It’s critical to have a set of KPIs the group understands, and buys into, so you can quickly look at a test and interpret if a) things look good, or b) something funky happened and additional investigation is needed. Some organizations have clearly defined non-functional criteria, or SLAs, and many don’t. Be Agile with your KPIs, and refine them over time. Here are some of the KPIs we commonly evaluated:
  • Percentile Response-Time – 90th, 95th, 99th – Summary and Per-Transaction
  • Throughput – Summary and Per-Transaction
  • Garbage Collector (GC) Performance – % non-paused time, number of collections (major and minor), and collection times.
  • Heap Utilization – Young Generation and Tenured Space
  • Resource Pools – Connection Pools and Thread Pools

  5. Invest in best of breed tooling.  With higher velocity code change and release schedules, it’s essential to have deep visibility into your performance environment. Embrace tooling, but consider these factors impacted by Agile development:

  • Can your toolset automatically, and continuously discover, map and diagnose failures in a distributed system without asking you to configure what methods should be monitored?  In an Agile team the code base is constantly shifting.  If you have to configure method-level monitoring, you’ll spend significant time maintaining tooling, rather than solving problems.
  • Can the solution be enabled out of the box under heavy loads?  If the overhead of your tooling degrades performance under high loads, it’s ineffective in a performance environment.  Don’t let your performance monitoring become your performance problem.

When a vendor recommends you reduce monitoring coverage to support load testing, consider a) the effectiveness of a tool which won’t provide 100% visibility, and b) how much time will be spent consistently reconfiguring monitoring for optimal overhead.
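The KPIs from item 4 are easy to compute from raw test output, which is what makes them practical to check on every nightly run. The Python sketch below is one way to do it, assuming you have a list of response times in milliseconds and the test duration in seconds; the function names, the nearest-rank percentile method, and the 10% tolerance are illustrative choices, not something prescribed by JMeter, Jenkins, or AppDynamics.

```python
import math
from typing import Dict, List, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of samples (pct between 0 and 100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(response_times_ms: Sequence[float], duration_s: float) -> Dict[str, float]:
    """Summary KPIs for one test run."""
    return {
        "p90_ms": percentile(response_times_ms, 90),
        "p95_ms": percentile(response_times_ms, 95),
        "p99_ms": percentile(response_times_ms, 99),
        "throughput_per_s": len(response_times_ms) / duration_s,
    }


def regressed(current: Dict[str, float], baseline: Dict[str, float],
              tolerance: float = 0.10) -> List[str]:
    """Response-time KPIs that got worse than baseline by more than tolerance."""
    keys = ("p90_ms", "p95_ms", "p99_ms")
    return [k for k in keys if current[k] > baseline[k] * (1 + tolerance)]
```

A CI job could call `summarize` on each nightly run and fail the build when `regressed` returns a non-empty list against the previous sprint’s numbers, which gives the team a quick “looks good” versus “needs investigation” signal.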

Performance testing within an Agile organization challenges us as engineers to adapt to a high velocity of change.  Applying best practices gives us the opportunity to work as part of the development team to proactively identify and diagnose performance defects as code changes are introduced.  Because the fastest way to resolve a defect in production is to fix it before it gets there.

Quantifying the value of DevOps

In my experience when you work in IT the executive team rarely focuses on your team until you experience a catastrophic failure – once you do you are the center of attention until services are back to normal. It is easy to ignore the background work that IT teams spend most of their days on just to keep everything running smoothly. In this post I will discuss how to quantify the value of DevOps to organizations. The notion of DevOps is simple: Developers working together with Operations to get things done faster in an automated and repeatable way. If the process is working the cycle looks like:

[Figure: DevOps cycle]

DevOps consists of tools, processes, and the cultural change to apply both across an organization. In my experience in large companies this is usually driven from the top down, and in smaller companies this comes organically from the bottom up.

When I started in IT I worked as a NOC engineer for a datacenter. Most of my days were spent helping colocation customers install or upgrade their servers. If one of our managed servers failed it was my responsibility to fix it as fast as possible. Other days were spent as a consultant helping companies manage their applications. This is when most web applications were simple with only two servers – a database and an app server:

[Figure: monolithic application]

As I grew in my career I moved to the engineering side and worked on developing very large web applications. The applications I worked on were much more complex than what I was used to in my datacenter days. It is not just the architecture and code that are more complex; the operational overhead of managing such a large infrastructure requires an evolved attitude and better tools.

[Figure: distributed application]

When I built and deployed applications we had to build our servers from the ground up. In the age of the cloud you get to choose which problems you want to spend time solving. If you choose an infrastructure-as-a-service provider, you own not only your application and data, but the middleware and operating system as well. If you pick a platform as a service, you just have to support your application and data. The traditional on-premise option, while giving you the most freedom, also carries the responsibility for managing the hardware, network, and power. Pick your battles wisely:

[Figure: what you manage with on-premise, infrastructure as a service, and platform as a service]

As an application owner on a large team you find out quickly how well a team works together. In the pre-DevOps days the typical process to resolve an operational issue looked like this:

1)     Support creates a ticket and assigns a relative priority
2)     Operations begins to investigate and blames developers
3)     Developers say it’s not possible as it works in development and bounce the ticket back to operations
4)     Operations team escalates the issue to management until operations and developers are working side by side to find the root cause
5)     Both argue that the issue isn’t as severe as being stated so they reprioritize
6)     Management hears about the ticket and assigns it Severity or Priority 1
7)     Operations and Developers find the root cause together and fix the issue
8)     Support closes the ticket

Many times we wasted a lot of time investigating support tickets that weren’t actually issues. We investigated them because we couldn’t rely on the health checks and monitoring tools to determine if the issue was valid. Either the ticket couldn’t be reproduced or the issues were with a third-party. Either way we had to invest the time required to figure it out. Never once did we calculate how much money the false positives cost the company in man-hours.
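The calculation we never did is simple arithmetic. Here is a minimal Python sketch of it; the function name and every input are placeholders to be replaced with your own team’s numbers, not figures from my experience.

```python
def false_positive_cost(tickets: int, false_positive_rate: float,
                        hours_per_ticket: float, engineers_per_ticket: int,
                        hourly_rate: float) -> float:
    """Estimated cost of investigating tickets that were not real issues."""
    false_positives = tickets * false_positive_rate
    return false_positives * hours_per_ticket * engineers_per_ticket * hourly_rate
```

For example, with purely illustrative inputs of 100 tickets, a 25% false positive rate, 4 hours per investigation, 2 engineers per ticket and a loaded rate of 50 per hour, the result is 25 × 4 × 2 × 50 = 10,000 spent on issues that never existed.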

With better application monitoring tools we are able to reduce the number of false positives and the money the company wastes on them.

How much revenue did the business lose?

I never once was able to articulate how much money our team saved the company by adding tools and improving processes. In the age of DevOps there are a lot of tools in the DevOps toolchain.

By adopting infrastructure automation with tools like Chef, Puppet, and Ansible you can treat your infrastructure as code so that it is automated, versioned, testable, and most importantly repeatable. The next time a server goes down it takes seconds to spin up an identical instance. How much time have you saved the company by having a consistent way to manage configuration changes?

By adopting deployment automation with tools like Jenkins, Fabric, and Capistrano you can confidently and consistently deploy applications across your environments. How much time have you saved the company by reducing build and deployment issues?

By adopting log automation using tools such as Logstash, Splunk, SumoLogic and Loggly you can aggregate and index all of your logs across every service. How much time have you saved the company by retrieving the associated logs in a single click instead of manually finding the machine causing the problem?

By adopting application performance management tools like AppDynamics you can easily get code level visibility into production problems and understand exactly what nodes are causing problems. How much time have you saved the company by adopting APM to decrease the mean time to resolution?

By adopting runbook automation through tools like AppDynamics you can automate responses to common application problems and auto-scale up and down in the cloud. How much time have you saved the company by automatically fixing common application failures without even clicking a button?

Understanding the value these tools and processes have on your organization is straightforward:

[Figure: DevOps tasks]

DevOps = Automation & Collaboration = Time = Money 

When applying DevOps across your organization the most valuable advice I can give is to automate everything and always plan to fail. A survey from RebelLabs/ZeroTurnaround shows that:

1)     DevOps teams spend more time improving things and less time fixing things
2)     DevOps teams recover from failures faster
3)     DevOps teams release apps more than twice as fast

 

How much does an outage cost in your company?

This post was inspired by a tech talk I have given in the past: https://speakerdeck.com/dustinwhittle/devops-pay-raise-devnexus

 

 

The New Generation of Enterprise Java: Designing for the Next Big Thing

https://www.youtube.com/watch?v=ytmZ4SF3hKI

There’s been a generational shift in how Java enterprise applications are created: they have been broken down from a monolithic architecture into multiple services, and they’re highly interconnected and distributed. How can Java developers and Operations teams adapt to these changes?

This keynote will discuss the 4 Big Things that Java professionals need to design for now:

  • Cloud: Most applications built will have some part of their services in the cloud
  • Big Data: With the advent of NoSQL, Hadoop, and distributed caches, how should we now approach the data layer?
  • Agile Development & Operations: Developers won’t just be responsible for the code, but how it’s deployed. How does that affect the DevOps relationships?
  • Failure is an option: Distributed systems won’t just invite but demand failure, so how can failure become part of the initial design?

This talk will present recommended strategies and approaches for these new design imperatives.

You can watch the keynote at the link above.

 

Why I Joined The Leading APM Provider AppDynamics

A new year, a new iPhone and a new quarter. What else is new? How about a new company?

Last month I was fortunate enough to join a stellar marketing team at one of the fastest growing enterprise software startups in the Bay Area. The company, you ask? AppDynamics, and did I mention we’re also the leading next generation Application Performance Management (APM) provider for modern architectures in distributed, cloud, virtualized and on-premise environments? We exceeded our targets for 2011, achieving an astonishing 400% growth in bookings. Not too shabby for being the new kid on the block in a competitive market already inundated with vendors. You have old school APM tools from megavendors like CA, HP and Compuware (was dynaTrace). Then you have the new school breed such as New Relic and AppDynamics. In fact, Gartner’s MQ lists over twenty vendors. So with such a crowded market why did I even consider such a move?

Well there’s a laundry list of reasons, but here are the top ones that come to mind.

1. Business Innovation. This is another kind of BI, not just Business Intelligence. It’s really a breath of fresh air to be working with an organization that is not only obsessed with pumping out insanely great technology every few quarters or so, but also open to embracing innovative approaches to every discipline of the business including creative marketing and sales strategies. Oftentimes enterprise software companies unabashedly attempt to cloak themselves in slideware selling a “vision” or an enterprise solution poles apart from reality. Unfortunately when it comes down to an actual evaluation, you end up having to attend a dozen meetings just to see an applicable demo, then sit through a one week to two month proof-of-concept, followed by throwing millions of dollars at consulting and implementation services, which segues to my next point.

2. Ease-of-Use. This simple yet powerful concept has been repeatedly neglected or intentionally ignored by many enterprise software companies. Luckily, the Leaders of the New School such as Apple, Salesforce, Box, etc. (not Busta Rhymes’ group) have changed the way end users value an intuitive user interface and design. At AppDynamics, we’ve adopted a similar mindset. “Easy” is the new world order in this industry because the managers, engineers and folks in IT operations are encountering enough complexity as it is with these modern architectures. The last thing they want is another tool that further complicates their lives and causes more frustration on the job. At the end of the day everyone is a consumer (the least common denominator) who wants to use software that helps us demystify our lives and makes us successful at our jobs (unless you’re a sadist).

Software that is easy to install, implement and use can have a tremendous impact on the bottom line of a business. Suppose you roll out a new system but end up having to spend a chunk of company change on implementation and training costs. What impact does that have on your productivity and ultimately your company’s bottom line? Here’s an example from Avon’s Q3 2011 earnings transcript:

“Despite extensive pre-implementation testing, we had greater than anticipated implementation challenges in the go-live. Significantly higher business complexity in this market contributed to a greater than expected level of disruption, as I said, when we went to the go-live environment.”

Many vendors make enterprise deployments akin to embarking on an IT version of Manifest Destiny. I’m sure you can think of a few applications in your own IT toolbox that fit the bill, where at some point you ended up asking yourself, “Why can’t this be as easy as [fill in the blank with some consumer app]?” Fig. 2. See empathetic frustrated user to your left.

That was compelling enough for me to join AppDynamics. We truly understand the business significance of why software ought to be easy 360 degrees around, especially in production. I’m not saying that the work designers and developers have to do to achieve this “Easy” goal is easy in itself. I have an unrequited love for the folks in engineering who possess the talent and perseverance to code applications, but that doesn’t excuse a vendor from selling you a dream and then leaving you stranded to implement a nightmare, all because there wasn’t enough emphasis on ease-of-use.

3. Application Performance. This one is near and dear to my heart and arguably the main reason I joined AppDynamics. It takes me back to the challenging days and sleepless nights I endured while working on a massive global PDM implementation at LG Electronics jointly with Dassault Systemes. The year was 2008. Skynet hadn’t become self-aware yet. App Man was just A Man in the throes and woes of IT operations, and halfway around the world in Seoul, Korea, I was juggling recurring performance issues on a weekly basis, with our PMO at the beck and call of the LGE CIO. The project’s launch date had been delayed due to various complications with the implementation (that’s a whole other story). Any ideas what one of those might have entailed? If you guessed “performance”, congratulations! You’ve won! Download your free copy of AppDynamics Lite.

Every week new customizations were being released from R&D back in the States, PS in Korea and SIs sitting on the other side of the room. You could call it Agile development’s nemesis, frAgile development. The dynamic nature of our Java-based environment only introduced more challenges for the performance team, who were heads-down trying to reverse engineer someone else’s code and refactor it using APM tools (JenniferSoft, in our case) that just didn’t provide the full visibility we needed to comprehensively profile and diagnose application performance issues. In fact, one of the consultants on our team ended up creating his own profiler to expose these blind spots, but what we really needed was a next-generation APM tool that would visually map and connect the dots for us like the one below.

Then we ran into another stumbling block after we finished migrating legacy data to a new “production” environment. When the time came to retest the entire set of performance use cases in this new environment, we experienced all kinds of performance regressions. Since everyone had been collaborating so well with each other for the past two years, we all cheerfully marched forward without any finger pointing as to what the root cause was. Ok, so it wasn’t that utopian. Fortunately, because of everyone’s undying commitment and personal sacrifices, the project went live successfully in mid-2010 with over 2,000 users visiting the system per day. In hindsight, we could have easily saved a month had we used a better tool, thereby eliminating the usual suspects.

From that experience I’ve come to appreciate and understand how business-critical managing application performance is for any company. Now I am on a mission to spread the word of AppDynamics to help companies manage rapidly evolving, distributed environments.

Buckle up 2012, we’re just getting started.