If you can’t see it, you can’t manage it – ITOA use case #1

“There was 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing…,” – Eric Schmidt, Former CEO, Google.

If IT leaders hadn’t already heard Schmidt’s famous quotation, they are certainly facing the challenge he describes today. Gone are the days when IT leaders were tasked with just keeping an organization running; now IT teams are charged with driving innovation. As businesses become defined by the software that runs them, IT leaders must not only collect and make sense of the increasing amount of information these systems generate, but leverage this data as a competitive advantage in the marketplace. That advantage may come in many forms, but generally speaking, the more IT leaders know about their environments and the ways end users interact with them, the better off they (and the business) will be.

Gleaning this type of insight from IT environments is what analysts refer to as IT Operations Analytics (ITOA). ITOA solutions collect the structured and unstructured data generated by IT environments, process that data, and display the information in an actionable way so operations teams can make better-informed decisions in real time. In this series I’d like to discuss five common ITOA use cases we see across our customer base, starting with visualizing your environment, and describe how a solution like the Application Intelligence Platform can address each one and in turn provide value for operations teams.

The five common ITOA use cases I’ll delve into are:

  • Visualize the environment
  • Rapid troubleshooting
  • Prioritize issues and opportunities
  • Analyze business impact
  • Create action plans

Visualizing the environment

The first use case refers to the ability of an ITOA system to model the infrastructure and/or application stack being monitored. These models vary in nature but are often topological representations of the environment. Being able to visualize the application environment and see its dependencies is an important foundation for the rest of the use cases on this list.

In the Summer ‘14 release announcement blog, we highlighted the enhancements we’ve made to our flow maps, the visual representation of the application environment, including application servers, databases, web services, and more.

What’s great about the AppDynamics approach is that this flow map is discovered automatically, out of the box, unlike legacy monitoring solutions that require significant manual configuration to get the same kind of view. We also adjust the flow map automatically, on the fly, when your application changes (a re-architected app, a code release, etc.). Because we know all the common entry and exit points of each node, we simply tag and trace the paths that different user requests take, painting a picture of request flow and all the interactions between components inside the application. Most customers see something like the flow map below within minutes of installing AppDynamics in their environment.
[Screenshot: automatically discovered application flow map]
Now a flow map like this is obviously very valuable, but what happens when the application environment is very large and complex? How does this kind of view scale for the kinds of enterprise applications many AppDynamics customers have deployed? Environments with thousands of nodes and potentially hundreds of tiers? Luckily for our customers, the Application Intelligence Platform was built from the ground up to handle these kinds of environments with ease. Two characteristics of our flow maps enable operations teams to manage large-scale application performance management deployments: self-aggregation and self-organizing layouts.

Self-aggregation refers to the powerful algorithms that make complex environments more manageable by condensing and expanding the visualization, enabling intelligent zooming in and out of the application topology. This allows us to automatically deliver the right level of application health indicators to match the zoom level.

For example, this is what a complex application could look like when zoomed all the way out:
[Screenshot: complex application fully zoomed out]
As you zoom in, relevant metric information becomes visible:
[Screenshot: flow map at an intermediate zoom level, with metrics visible]
Until you are zoomed all the way in on a particular tier and can see all of the associated metrics you’d care about:
[Screenshot: a single tier fully zoomed in, showing all associated metrics]
The ability to iterate back and forth between a macro-level view of the application and a close-up of a particular part of the environment gives operations teams the visibility they need to understand exactly how an application functions and how the different components interact with each other.

Self-organizing layouts refers to our ability to automatically format service and tier dependencies, using auto-grouping heuristics to dynamically determine tier and node weightings. By leveraging static data (like application tier profiles) and dynamic KPIs (like transaction response times), we organize the business-critical tiers in a way that brings the most important parts of the application to the forefront, depending on the type of layout you prefer.
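As a rough illustration of the idea (my toy sketch in Python, not AppDynamics’ actual heuristics), a layout engine might score each tier by combining a static profile weight with dynamic KPIs such as load and response time, then bring the highest-scoring tiers to the forefront:

    # Toy weighting heuristic: illustrative only, not the product's algorithm.
    def tier_weight(profile_weight, avg_response_ms, calls_per_min):
        # A static profile weight scaled by dynamic KPIs, so busy,
        # slow tiers float to the front of the layout.
        return profile_weight * (1 + calls_per_min / 1000.0) * (1 + avg_response_ms / 100.0)

    tiers = {
        "checkout": tier_weight(2.0, avg_response_ms=450, calls_per_min=1200),
        "inventory": tier_weight(1.0, avg_response_ms=80, calls_per_min=300),
    }
    for name, weight in sorted(tiers.items(), key=lambda kv: -kv[1]):
        print(name, round(weight, 1))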

You can automatically group the flow map into a circular view:
[Screenshot: flow map auto-grouped into a circular layout]
You can let AppDynamics suggest a layout:
[Screenshot: an AppDynamics-suggested layout]
You can create a custom layout just by dragging and dropping individual components:
[Screenshot: custom layout created by dragging and dropping components]
And you can auto-fit your layout to the screen for efficient zooming in / out:
[Screenshot: layout auto-fit to the screen]
You’ve seen how AppDynamics can visualize individual applications, but what if, like many of our large enterprise customers, you have many different complex applications that have dependencies on one or more other applications? How does one obtain a data-center view to understand, at a high level, what application health looks like across all applications?

With the cross-app business flow feature, customers can do just that. AppDynamics even supports role-based access control (RBAC), so administrators can limit user access to a particular application. We allow customers to group, define, and limit access to applications in whatever way makes the most sense for their individual environments and their business.

[Screenshot: cross-app business flow view across multiple applications]

As you can see, AppDynamics provides a great way for IT operations teams to discover and visualize their application environment. We automatically map the application out of the box, we provide flexible layout options so customers can customize the view to their liking, and we offer a way for Ops teams to understand how different applications interact with each other.

In the next post in this series, we’ll discuss how the Application Intelligence platform can address the second common ITOA use case, rapid troubleshooting. In the meantime, I encourage you to sign up for free and try AppDynamics for yourself.

Focusing on Business Transactions is a Proven Best Practice

In today’s software-defined business era, uptime and availability are key to business survival. The application is the business. However, ensuring proper application performance in production remains a daunting task. Where do you start?

Enter Business Transactions.

By focusing on the end-user experience and measuring application performance based on user interactions, we can correctly gauge how the entire environment is performing. We follow each individual user request as it flows through the application architecture, comparing its response time to optimal performance. This inside-out strategy allows AppDynamics to instantly identify performance bottlenecks, and allows application owners to get to the root cause of issues that much faster.

By becoming business transaction-centric, application owners can ensure uptime and availability even within a challenging application environment. Business transactions give them the insight required to react to quickly changing conditions and respond accordingly.

So, what exactly is a Business Transaction?

Simply: any and every online user request.

Consider a business transaction to be a user-generated action within your system. The best practice for determining the performance of your application isn’t to measure CPU usage, but to track the flow of a transaction that your customer, the end user, has requested.

It can be requests such as:

  • logging in
  • adding an item to a cart
  • checking out
  • searching for an item
  • navigating different tabs

Shifting your focus to business transactions completely changes the game in terms of your ability to support application performance.

    Business Transactions and APM

    Business Transactions equip application owners with three important advantages.

    Knowledge of User Experience

    If a business transaction is a “user-generated action,” then it’s pretty clear how monitoring business transactions can have a tremendous effect on your ability to understand the experience of your end user.

If your end user adds a book to a shopping cart, is the transaction performing as expected, or is it taking 3 seconds longer? (And what kind of impact will that have on end users? Will they decide to surf away and buy books somewhere else, depriving your business of not just the immediate purchase but also potential lifetime customer revenue?)

    Monitoring business transactions gives you a powerful insight into the experience of your end user.

    Service Assurance – the ability to track baseline performance metrics

We hear from our clients all the time that it’s difficult to know what “normal” actually is. This is particularly true in an ever-changing application environment. If you try to determine normal performance by correlating code-level metrics, while at the same time reacting to frequent code drops, you will never get there.

Business transactions offer a Service Assurance constant that you can use for ongoing monitoring. The size of your environment may change and nodes may come and go, but by focusing on business transactions as your ongoing metric, you can begin to establish baseline performance for your application. Understanding this baseline is exactly what you need in order to know whether your application is running as expected and desired, or whether it’s completely gone off the rails.

For example, you may have a sense of how your application is supposed to perform. But do you really know how it performs every Sunday at 6 p.m.? Or the last week of December? And if you don’t, how will you know when the application is deviating from acceptable performance? It’s by figuring out “normal” in terms of days, weeks, and even seasons that you come to truly understand your application’s baseline performance.
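To make that concrete, here is a minimal sketch (mine, in Python, not AppDynamics’ actual algorithm) of a seasonal baseline keyed by day of week and hour, flagging response times that stray too far from the learned norm:

    from collections import defaultdict
    from statistics import mean, stdev

    # Learned baseline per (weekday, hour) bucket: "Sunday 6 p.m." gets its own normal.
    history = defaultdict(list)

    def record(weekday, hour, response_ms):
        history[(weekday, hour)].append(response_ms)

    def is_abnormal(weekday, hour, response_ms, tolerance=3.0):
        samples = history[(weekday, hour)]
        if len(samples) < 30:  # not enough data yet to call anything abnormal
            return False
        mu, sigma = mean(samples), stdev(samples)
        return response_ms > mu + tolerance * sigma

A real implementation would also need to age out old samples and handle seasonal events like the last week of December, but the principle is the same: normal is relative to time, not a single static threshold.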

    Triage & Diagnosis – always knowing where to look to solve problems

    Finally, when problems occur, business transactions prevent you from hunting through logs and swimming through lines of code. The transaction’s poor performance immediately shines a spotlight on the problem – and your ability to get to root cause quickly is dramatically improved.

    If you’re tracking code-level metrics in a large environment instead of monitoring business transactions, the chances are that the fire you’re troubleshooting is going to roar out of hand before you’re able to douse it.

    Summary

    Application owners are under extraordinary pressure to incorporate frequent code changes while still being held responsible for 100% application uptime and performance. In a distributed and rapidly changing environment, meeting these high expectations becomes tremendously challenging.

A strong focus on business transactions is absolutely essential for maintaining application performance. Transaction-centric monitoring provides the basis for a stable performance assurance metric, delivers powerful insights into user experience, and ensures you always know where to hunt during troubleshooting.

    The right APM solution can automate much of this work. It can help application owners identify and bucket their business transactions, as well as assist with triage, troubleshooting, and root cause diagnosis when transactions violate their performance baselines. In this way, business transactions are essential to ensuring the success of Developers, Operations, and Architects – anyone with a stake in application performance.

    The Incredible Extensible Machine Agent

    Our users tell us all the time: The AppDynamics platform is amazing right out of the box. But everybody has something special they want to do, whether it’s to add some functionality, set up a unique monitoring scenario, whatever. That’s what makes AppDynamics’ emphasis on open architecture so important and useful. The functionality of the AppDynamics machine agent can be customized and extended to perform specific tasks to meet specific user needs, either through existing extensions from the AppDynamics Exchange or through user customizations.

It helps to understand what the machine agent is and how it works. The machine agent is a stand-alone Java application that can run in conjunction with application agents or separately from them. This means monitoring can be extended to environments outside the realm of the application being monitored. It can be deployed to application servers, database servers, web servers — really anything running Linux, UNIX, Windows, or Mac OS.

[Screenshot: the machine agent deployed across the monitored environment]

The real elegance of the machine agent is its tremendous extensibility. For non-Windows environments, there are three ways to extend the machine agent: through a script, with Java, or by sending metrics to the agent’s HTTP listener. In a .NET environment, you can also add extra hardware metrics over and above these three methods.
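For instance, the HTTP listener route takes only a few lines of Python. The port and query parameters below follow the machine agent’s documented listener convention, but treat them as illustrative and verify them against your agent’s configuration:

    import requests  # pip install requests

    # Report a custom metric to the machine agent's HTTP listener.
    # Assumes the listener is enabled and using its default port;
    # adjust the URL and metric path to match your setup.
    requests.get(
        "http://localhost:8293/machineagent/metrics",
        params={"name": "Custom Metrics|Site|HTTP Status", "value": 200},
    )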

Let’s look at a real-life example. Say I want to create an extension using cURL that reports the HTTP status of certain websites. My first step is to look for one in the AppDynamics Exchange, our library of all the extensions and integrations currently available. It’s also the place to request extensions you need or to submit extensions you have built.

    Sure enough, there’s one already available (community.appdynamics.com/t5/AppDynamics-eXchange/idbp/extensions) called Site Monitor, written by Kunal Gupta. I decided to use it, and followed these steps to create my HTTP status collection functionality.

    1. Download the extension to the machine agent on a test machine.
    2. Edit the Site Monitor configuration file (site-config.xml) to ping the sites that I wanted (in this case www.appdynamics.com). The sites can also be HTTPS sites if needed.
    3. Restart the machine agent.

    That’s it. It started pulling in the status code right away and, as a bonus, also the response time for requesting the status code of the URL that I wanted.
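Conceptually, what the extension collects per site is as simple as this sketch (my illustration, not the extension’s actual code): request the URL, record the status code, and time the round trip:

    import time
    import urllib.request

    # Fetch a site, capturing the HTTP status code and the response time,
    # the two values Site Monitor reports for each configured URL.
    start = time.time()
    with urllib.request.urlopen("https://www.appdynamics.com") as resp:
        status = resp.status
    elapsed_ms = int((time.time() - start) * 1000)
    print(f"status={status} response_time_ms={elapsed_ms}")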

[Screenshot: metric browser showing the collected HTTP status code]

    It’s great that I can now see the status code (200 in this case), but now I can truly use its power. I can quickly build dashboards displaying the information.

[Screenshot: custom dashboard displaying the HTTP status metric]

There’s also the ability to hook the status code into custom health rules, which provide alerts when performance becomes unacceptable.

[Screenshots: a custom health rule built on the HTTP status metric, and the resulting alert configuration]

So there it is. In just a matter of minutes, the extension was up and running, giving me valuable data about the ongoing status of my application. And if the extension I wanted hadn’t existed, it would have been almost as easy to use the cURL command directly (curl -sL -w "%{http_code}\n" www.appdynamics.com -o /dev/null).
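Incidentally, if you wrap a check like that in your own script extension, the machine agent picks up custom metrics from the script’s standard output. The one-line format below follows the script-extension convention AppDynamics documents, but verify it against the current docs before relying on it:

    #!/usr/bin/env python3
    # A script the machine agent could invoke on a schedule; it prints
    # one metric line per sample to stdout in the "name=...,value=..."
    # convention used by script extensions (verify against current docs).
    import urllib.request

    with urllib.request.urlopen("https://www.appdynamics.com") as resp:
        print(f"name=Custom Metrics|Site Monitor|HTTP Status,value={resp.status}")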

    Either way, the machine agent can be extended to support your specific needs and solve specific challenges. Check out the AppDynamics Exchange to see what kinds of extensions are already available, and experiment with the machine agent to see how easily you can expand its capabilities.

    If you’d like to try AppDynamics check out our free trial and start monitoring your apps today!

Transforming IT: Building a business-driven infrastructure for the software-defined business

    Executives charged with building business-driven applications have an extremely challenging task ahead of them. However, the cavalry has arrived with useful tools and strategies built specifically to keep modern applications working efficiently.

We partnered with Gigaom Research to carefully grasp and articulate how these modern methodologies are improving the lives of IT professionals in today’s software-driven businesses. Typically, this knowledge has been so fragmented that it has been hard to find in one cohesive place. Several blogs and research reports touch on various aspects, but what we learned from our research was astounding.

    We carefully identified these challenges as the major hurdles facing IT today:

    • Customers are digital and connected
    • Business demand is growing
    • Apps are complex, distributed, and changing rapidly
• Traditional app performance management isn’t keeping up

Clearly these have become major issues affecting companies everywhere. More importantly, they affect end users and, in turn, companies’ bottom lines. Customers have grown accustomed to getting things instantly, and when apps aren’t performing adequately, they will quickly take their business elsewhere.

    Here are some key takeaways we noticed:

    • Customer experience is driving business performance
    • Proactively managing this experience requires new methods and tools
    • Modernize your infrastructure and approaches, but don’t forget the humans
    • Analytics is rapidly changing, fueled by the growth of big data

    This report highlights the value of proactively managing the customer experience with new methods and tools built for modern, complex applications in order to help drive business performance.

Interested in next-gen IT strategy and trends? Check out the report!

    AppDynamics Brings Big Data Science to APM in Summer ’14 Release

Today I am pleased to announce the availability of the AppDynamics Summer ‘14 Release. With this release, AppDynamics brings to the APM industry the first event store that captures and processes big data streams in real time. Large and complex applications generate data at extremely high velocity, requiring a monitoring platform that scales along with them. Many critical application and business insights are hidden in the data these applications generate. This unified platform delivers a central, massively scalable way to manage all tiers of the application infrastructure.

    This release has major enhancements for each of the three layers of the Application Intelligence platform:

    Clear, meaningful data visualization

AppDynamics was first to market with transaction-based topology views of applications, which make managing and scaling service-oriented architectures easier than ever. In our latest release, we’ve raised the bar again by offering clear, meaningful data visualization powered by self-learning algorithms for today’s leading enterprise companies.

    Advanced flow map visualizations

    Self-aggregating flow maps

AppDynamics introduces advanced flow maps powered by sophisticated algorithms that make complex architectures more manageable by condensing and expanding information, enabling intelligent zooming in and out of the topology. These visualization techniques also deliver the right granularity of application health indicators and traffic reports to match the zoom level.

[Screenshot: self-aggregating flow map]

    Self-organizing layouts

In the Summer ‘14 release, the dashboards self-organize complex graphs of service and tier dependencies by using auto-grouping heuristics to dynamically determine tier and node weightings. These heuristics rely on patterns detected across static data, such as application tier profiles, and dynamic KPIs, such as transaction response times and business data complexity. The algorithms then surface the business-critical nodes and tiers to application owners and administrators for appropriate attention.

[Screenshot: self-organizing layout of service and tier dependencies]

    Self-learning transaction engine

Application owners benefit greatly when armed with smart engines that automatically identify and group transactions, taking the guesswork out of the exercise. This grouping is based on a combination of historical and statistical analysis of large volumes of live execution data. AppDynamics uniquely introspects live traffic, creating business transaction groupings from millions of live requests to improve business manageability.

[Screenshot: self-learning transaction engine grouping business transactions]

    Smart dashboards

Managing and monitoring large deployments with thousands of nodes and tiers can be overwhelming for the APM professional, and creating a separate dashboard for each of those thousands of nodes and tiers individually is a near-impossible task. In this release, AppDynamics introduces powerful dashboard templates that auto-generate dashboards based on configurable, parameterized characteristics of the nodes or tiers. This new feature enhances monitoring productivity by making dashboards reusable across all nodes without duplicated effort.

[Screenshot: smart dashboard template]

     

    Platform to capture and process Big Data streams in real-time

[Screenshot: the new event service capturing real-time event streams]

The release introduces a new, infinitely scalable event service that captures real-time events generated by an application. With this event service, organizations can flexibly define structured and unstructured events and start capturing them through a public API. The service has been certified for up to 10 trillion events, and event archives can be retained indefinitely for historical analysis.

It also includes a new Hadoop-powered metrics service that crunches massive volumes of time-series data to deliver key application and business metrics in real time. With its new enhancements, organizations can easily roll up metrics at the tier, application, or time-series level with no loss of granularity. Leveraging new, complex algorithms that can crunch billions of metrics, this metrics service generates self-learning baselines that are continuously refreshed to reflect up-to-the-minute application and business performance.

We’ve also improved the real-time percentile metrics that put application and business performance in context. Metrics without statistical context often don’t reveal the real picture: SLA metrics are more meaningful when presented with percentiles, and when outliers are automatically identified and surfaced with alerts for immediate attention or automated remediation. Percentile functionality in AppDynamics is now configurable, allowing teams to define which percentiles they want collected.
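As a quick illustration of why percentiles matter more than averages, consider this small Python sketch (illustrative only):

    import statistics

    response_times_ms = [120, 130, 125, 118, 122, 135, 128, 2400]  # one outlier

    # The mean smooths the outlier away; a high percentile exposes it.
    avg = statistics.mean(response_times_ms)
    p95 = statistics.quantiles(response_times_ms, n=100)[94]  # 95th percentile
    print(f"mean={avg:.0f}ms p95={p95:.0f}ms")

An SLA stated against the 95th or 99th percentile catches exactly the slow outliers that an average-based SLA hides.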

[Screenshot: configurable percentile metrics]

This unified platform delivers a central, linearly scalable way to manage all tiers of the application infrastructure. Through a single pane of glass, IT organizations can break down application tier silos and monitor their business with comprehensive end-to-end visibility. It lowers the total cost of ownership of the application infrastructure while shortening time to issue resolution.

    Industry’s most comprehensive monitoring and data collection offering

    The AppDynamics Summer ‘14 Release includes several new and enhanced features related to data collection and monitoring, including distributed transaction correlation among all of the languages we support.

[Screenshot: distributed transaction correlation across supported languages]

With the industry’s first Node.js distributed transaction monitoring, users can now monitor distributed Node.js transactions across all application tiers, including Java, .NET, and PHP. AppDynamics automatically correlates downstream Node.js calls to quickly and efficiently isolate and troubleshoot performance bottlenecks.

[Screenshot: Node.js distributed transaction monitoring]

    AppDynamics adds support for instrumenting native C++ applications with the beta release of the AppDynamics C++ SDK, which provides visibility into C++ applications and tiers. We’ve also added support for Java 8, which makes it easier for businesses to deploy and integrate AppDynamics into the latest generation of Java and Scala applications.

    Finally, we’ve announced support for monitoring .NET asynchronous transactions. AppDynamics gives customers the ability to automatically identify asynchronous transactions in dashboards, troubleshoot asynchronous calls in transaction snapshots and analyze async activity in the metric browser.

[Screenshot: .NET asynchronous transaction monitoring]

    For a detailed look at these advancements, check out our webinar recording.

    If you’d like to try these new capabilities out for yourself, start your free trial of AppDynamics today.

     

    The future of Ops, part 2

    In my first post, I discussed how software and various tools are dramatically changing the Ops department. This post centers on the automation process.

When I was younger, you actually had to build a server from scratch, buy power and connectivity in a data center, and manually plug the machine into the network. After wearing the operations hat for a few years, I have learned that many operations tasks are mundane, manual, and often have to be done at two in the morning after something has gone wrong. DevOps is predicated on the idea that all elements of technology infrastructure can be controlled through code and automated. With the rise of the cloud, it can all be done in real time via a web service.

Infrastructure automation plus virtualization solves the problem of having to be physically present in a data center to provision hardware and make network changes. Also, by automating the mundane tasks you can remove unnecessary personnel. The benefits of using cloud services are that costs scale linearly with demand and that you can provision automatically as needed without having to pay for hardware up front.
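To make “infrastructure controlled through code” concrete, here is a minimal Python sketch using AWS’s boto3 library (the AMI ID and instance type are placeholders, and this is an illustration rather than a production-ready setup):

    import boto3  # pip install boto3; assumes AWS credentials are configured

    ec2 = boto3.resource("ec2")

    # Provision a server in code instead of racking hardware at two in the morning.
    instances = ec2.create_instances(
        ImageId="ami-12345678",  # placeholder AMI ID
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
    )
    print("launched:", instances[0].id)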

     

    Platform Overload

The various platforms you’re likely to encounter in this new world can be divided into three main groups:

• IaaS services like Amazon Web Services & Windows Azure — These allow you to quickly create servers and storage as needed. With IaaS you are responsible for provisioning compute + storage and own everything from the operating system up.
    • PaaS services like Pivotal Web Services, Heroku, and EngineYard — These are platforms built on top of IaaS providers that allow you to deploy a specific stack with ease. With PaaS you are responsible for provisioning apps and own only your app + data.
• SaaS services — These are platforms usually built on top of PaaS providers, designed to deliver a specific app (like a hosted ecommerce shop or blog).

All of these are clouds — IaaS, PaaS, and SaaS — so pick which problems you want to spend time solving. However, the most complex environments often can’t be managed by a third party.

[Screenshot: comparing responsibilities across IaaS, PaaS, and SaaS]

     

    Monitoring Complex Environments

Modern monitoring is not focused on infrastructure and availability; rather, it takes the perspective from the user down through the application. The simple reality is that perceived user experience is the only metric that matters: either applications are working or they are not. The complexity of monitoring applications is compounded by their availability across many platforms, such as web, mobile, and desktop.

[Screenshot: monitoring applications across web, mobile, and desktop platforms]

By leveraging monitoring tools and strategic product integrations, the future of Ops can be focused on efficiency, optimization, and providing a seamless user experience. At AppDynamics we have a robust list of extensions aimed at leveraging existing technology to help Ops (and Dev) departments in this modern era. You can check out our list of extensions at the AppDynamics Exchange.

    Ops people, don’t just take my word for it, bring your department into the modern age and try AppDynamics for FREE today!

    The future of Ops, part 1

    The disruption of industries through software

Marc Andreessen famously stated in 2011 that “software is eating the world.” The world now runs on software-defined businesses, and these businesses realize that in order to be efficient and stay ahead of the competition they must innovate or die. Technology is no longer secondary to your business; it is now the actual business.

Nowadays there is an app for nearly everything, and consumers expect most processes to be automated. Access to these apps is ubiquitous, from the web and mobile. Every disruptive billion-dollar company of the last decade has innovated through applications, fundamentally changing the market and user experience. Companies like Netflix, Uber, Square, Tesla, Nest, Instacart, and many others have capitalized on this new user experience, catering to elevated expectations. The disruption stems from an improved user experience, enabled through technology.

    The evolution of application complexity

Gone are the days when applications were this simple:

[Screenshot: a simple application architecture]

The reality nowadays is that applications are extremely complex and distributed across several platforms. Most application architectures we come across use several languages, such as Java, .NET, PHP, and Node.js. Operations becomes even more complex with virtualization and cloud environments, deployment to containers, and managing applications made up of many microservices.

[Screenshot: a modern, distributed application architecture]

It’s not DevOps; it’s the next generation of Ops

Most people and companies abuse the term DevOps to no end. It is a bit embarrassing, but buzzwords run rampant on the expo floor of any technology convention. The reality is quite simply that the tools operations engineers use to build and manage complex applications have evolved to match that complexity. I believe operations complexity breaks down into a few main categories: infrastructure automation, configuration management, deployment automation, log management, performance management, and monitoring.

    The evolution of the Ops problem

The modern operations reality is that the cloud is the standard platform, operations are automated through code, testing and quality assurance are automated through code, deployments are automated through code, and monitoring and instrumentation are critical to success.


    The DevOps Report from Puppet Labs surveyed the DevOps community and found some interesting results, most notably: “companies with high-performing IT organizations are twice as likely to exceed their profitability, market share and productivity goals.”

    The report also found successful DevOps teams tended to share these characteristics:

    • use continuous delivery to ensure consistent and stable deployments
• leverage version control not just for code, but also for infrastructure and configuration, to track and manage the state of all environments
    • automate testing to have confidence about the quality of every release
    • invest in monitoring and logging to be proactive about problems
    • correlate IT performance with organizational performance

    Download the entire report from Puppet Labs

    The enterprise catch up game

Most enterprises are not able to adopt cutting-edge technology at a rapid pace, so they are in a constant state of migration and catching up. Furthermore, their challenges are exacerbated in hybrid environments that combine on-premise legacy systems with new public and private cloud environments. Larger, less flexible legacy companies are just starting to invest in the latest generation of programming languages, such as Scala, Node.js, and Go, and NoSQL datastores like Cassandra and Redis.

Though enterprises may find it challenging to adapt to the latest operations trends, there are several tools out there that will help ease the transition. A good APM solution helps foster DevOps best practices and increases collaboration between the traditionally separated Dev and Ops teams.

    Don’t believe me? Try AppDynamics for FREE today!

    The Intelligent Approach to Production Monitoring

We get a lot of questions about our analytics-driven Application Performance Management (APM) collection and analysis technology. Specifically, people want to know how we capture so much detailed information while maintaining such low overhead. The short answer is that our agents are intelligent: they know when to capture every gory detail (the full call stack) and when to collect only the basics for each transaction. Using an analytics-driven approach, AppDynamics is able to provide the highest level of detail needed to solve performance issues during peak application traffic.

    AppDynamics, An Efficient Doctor

    AppDynamics’ APM solution monitors, baselines and reports on the performance of every single transaction flowing through your application. However, unlike other APM solutions that got their start in development environments, ours was built for production, which requires a more agile approach to capturing transaction details.

I’d like to share a story that illustrates AppDynamics’ analytics-based methodology and compares it with many of our competitors’ “capture as much detail as possible whether there are problems or not” (read: our agents are too old to have intelligence built in) approach.

You visit Dr. AppDynamics for your regular health checkups. She takes your vital signs, records your weight, measures your reflexes, and compares every metric against known good baselines. When your statistics are close to the baselines, the doctor sends you home and sees the next patient without delay. When your health vitals deviate too far from the pre-established baselines, the smart doctor orders more relevant tests to diagnose your problem. This methodology minimizes the burden on available resources and efficiently and effectively diagnoses any issues you have.

In contrast, you visit Dr. Legacy for your regular health checkups. She takes your vital signs, records your weight, measures your reflexes, and immediately orders a battery of diagnostic tests even though you are perfectly healthy. She does this for every single patient she sees. The medical system is now overburdened with extra work that was never required in the first place. This burden slows down the entire system, so to keep things moving Dr. Legacy decides to reduce the number of diagnostic tests being run on every single patient (even the ones with actual problems). Now the patients who have legitimate problems go undiagnosed in the waiting room at the very time they need the most attention. In addition, due to the large amount of diagnostic testing and data being generated, the cost of care is driven up needlessly and excessively.

    Does Dr. Legacy’s methodology make any sense to you when better methods exist?

AppDynamics’ intelligent approach to collecting data and triggering diagnostics makes it easier to spot outliers, and because deep diagnostic data is collected only for the transactions that require this level of detail, there is less impact on system resources and very little monitoring overhead.

    Monitoring 100% of Your Business Transactions All the Time

    AppDynamics monitors every single business transaction (BT) that flows through your applications. There is no exception to this rule. We automatically learn and develop a dynamic baseline for end-to-end response time as well as the response time of every component along the transaction flow, and also for all critical business metrics within your application.

    We score each transaction by comparing the actual response time to the self-learned baseline. When we determine that a BT has deviated too far from normal behavior (using a tunable algorithm), our agent knows to automatically collect full call stack details for your troubleshooting pleasure. This analytics-based methodology allows AppDynamics to detect and alert on problems right from the start so they can be fixed before they cause a major impact.
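Conceptually, the per-transaction decision looks something like this sketch (my simplification in Python; the product’s actual algorithm is tunable and more sophisticated):

    def should_collect_call_stack(response_ms, baseline_ms, deviation_ms, k=3.0):
        # Collect deep diagnostics only when a transaction strays too far
        # from its self-learned baseline; otherwise record just the basics.
        return response_ms > baseline_ms + k * deviation_ms

    # A 900 ms checkout against a 300 ms baseline with 50 ms of normal
    # variation triggers full call stack capture:
    print(should_collect_call_stack(900, baseline_ms=300, deviation_ms=50))  # True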

    Of course, there are times when deep data capture of every transaction is advantageous—such as during development—and the AppDynamics APM solution has another intelligent feature to address this need. We’ve built a simple, one-click button to enable full data recording system-wide. Developer mode is ideal for pre-production environments when engineers are profiling and load testing the application. Developer mode will capture a transaction snapshot for every single request. In production this would be overkill and wasteful. It’s even smart enough to know when you’re done using it and will automatically shut off when it is unintentionally left on, so your system won’t get bogged down if transaction volume increases.

    Who Looks at Production Call Stacks When There are No Problems?

    One of the worst qualities about legacy APM solutions is the fact that they collect as much data as they can, all the time. Usually this originates from the APM tool starting as a profiling tool for developers that has been molded to work in production. While this methodology is fine for development environments (we support this with dev-mode as described above), it fails miserably in any high volume scenario like load testing and production. Why does it fail? I’m glad you asked 😉

Any halfway decent APM tool has built-in overhead limiters to keep itself from causing harm by introducing too much overhead into a running application. When you are collecting as much deep-dive data as possible, with no intelligent way of focusing your data collection, you are inducing the maximum allowed overhead basically all the time (assuming reasonable load). The problem is that higher application load is exactly when your problems are most likely to surface, and it is also when legacy APM overhead is skyrocketing (due to massive amounts of code execution and deep collection being “always on”), so the overhead limiters kick in and reduce the amount of data being collected, or kill off data collection altogether. In plain English, this means that legacy APM tools can’t tell good transactions from bad and will provide you with the least data at the time you need the most. Isn’t it funny how marketing and sales teams try to turn this methodology into the best thing ever?

    I have personally used many different APM tools in production and I never needed to look at a full call stack when there was no problem. I was too busy getting my job accomplished to poke around in mostly meaningless data just for the fun of it.

    Distributed Intelligence for Massive Scalability

    All of the intelligent data collection mentioned above requires a very small amount of extra processing to determine when to go deep and what to save. This is a place where the implementation details really make a difference.

At AppDynamics, we put the smarts where they are best suited to be: at the agent level. It’s a simple paradigm shift that distributes the workload across your install base (where it’s not even noticed) rather than concentrating it at a single point. This important architectural design means that as the load on the application goes up, the load on the management server remains low.

Contrast this with legacy APM solutions: restricting whatever intelligence exists to the central monitoring server(s) drives up resource requirements, and therefore produces a monitoring infrastructure that requires more servers and greater levels of care and feeding.

    Collecting, transmitting, storing, and analyzing large amounts of unneeded data comes with a high total cost of ownership (TCO). It takes a lot of people, servers, and storage to properly manage those legacy APM tools in an enterprise environment. Most APM vendors even want to sell you their expensive full time consultancy services just to manage their complex solutions. Intelligent APM tools ease your burden instead of increasing it like the legacy APM tools do.

All software tools go through transition periods where improvements are made and generational gaps are recognized. What was once cutting edge becomes hopelessly outdated unless you invest heavily in modernization. Hopefully this detailed look at APM methodologies helps you cut through the giant pile of sales and marketing propaganda that developers and IT ops folks are constantly exposed to. It’s important to understand what software vendors really do, but it’s most important to understand how they do it, as that will have a major impact on real-life usage.

Understanding Performance of PayPal as a Service (PPaaS)

In a previous post – Agile Performance Testing – Proactively Managing Performance – I discussed some of the challenges of managing a successful performance engineering practice in an Agile development model. Let’s continue with a real-world example, highlighting how AppDynamics simplifies the collection and comparison of Key Performance Indicators (KPIs) to give visibility into an Agile development team’s integration with PayPal as a Service (PPaaS).

Our dev team is tasked with building a new shopping cart and checkout capability for an online merchant. They have designed a simple Java Enterprise architecture with a web front end built on Apache TomEE and a set of mid-tier services on JBoss AS 7, and have chosen to integrate with PayPal as the backend payment processor. With PayPal’s Mobile, REST, and Classic SDKs, integrating secure payments into their app is a snap, and our team knows this is a good choice.

However, the merchant has tight Service Level Agreements (SLAs), and it’s critical that the team proactively analyze and resolve performance issues in pre-production as part of their Agile process. To prepare for meeting these SLAs, they plan to use AppDynamics throughout development and performance testing for end-to-end visibility, and to collect and compare KPIs across sprints.

The dev team is agile and continuously integrates into their QA test and performance environment. During one of the first sprints they created a basic checkout flow, shown below:

[Screenshot: the basic checkout flow]

For this sprint they stubbed out several of the service calls to PayPal, but coded the first step in authenticating: getting an OAuth access token, which is used to validate payments.

Enabling AppDynamics on their application was trivial, and the dev team got immediate end-to-end visibility into their application flow and performance timings across all tiers, as well as the initial call to PayPal. Based on some initial performance testing, everything looks great!

[Screenshot: end-to-end flow map with the initial PayPal call]

NOTE: In our example, AppDynamics is configured to identify backend HTTP requests (REST service invocations) using the first 3 segments of the target URL. This is an easy change, and the updated configuration is automatically pushed to the AppDynamics agent without any need to change config files or restart the application.

[Screenshot: backend naming configuration]
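To illustrate what “first 3 segments” means here, the naming scheme works roughly like this sketch (my illustration in Python, not AppDynamics code; whether the host counts as a segment is a configuration detail):

    from urllib.parse import urlparse

    def backend_key(url, segments=3):
        # Identify a backend by the first N path segments, so varying
        # suffixes (payment IDs, actions) roll up into one logical backend.
        parts = urlparse(url).path.strip("/").split("/")
        return "/".join(parts[:segments])

    print(backend_key("https://api.sandbox.paypal.com/v1/payments/payment/PAY-123/execute"))
    # -> v1/payments/payment

The payment ID "PAY-123" is a made-up example; the point is that every such call maps to the same backend name instead of creating a new backend per URL.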

In a later sprint, our dev team finished integrating the full payments process flow. They’re using PayPal’s SDK, and while it’s a seamless integration, they’re unclear exactly what calls to PayPal are happening under the covers.

    Because AppDynamics automatically discovers, maps, and scores all incoming transactions end-to-end, our dev team was able to get immediate and full visibility into two new REST invocations, authorization and payment.

[Screenshot: flow map showing the two new REST invocations, authorization and payment]

The dynamic discovery of AppDynamics is extremely important in Agile, continuous integration, and continuous release models where code is constantly changing. Having to manually configure which methods to monitor is a burden that degrades a team’s efficiency.

Needing to understand performance across the two sprints, the team leverages AppDynamics’ Compare Releases functionality to quickly see how performance differs between runs from each sprint.

[Screenshot: Compare Releases view across the two sprints]

AppDynamics flow maps visualize the difference in transaction flow between the sprints, highlighting the additional REST calls required to fully process the payment. The KPI comparison also gives the dev team an easy way to quickly measure the differences in performance.

[Screenshot: KPI comparison between the two sprints]

Performance has changed, as expected, now that the full payment processing flow is implemented. During a performance test, AppDynamics automatically identifies abnormal transactions and captures diagnostics on them.

[Screenshot: automatically captured diagnostics for abnormal transactions]

Transaction snapshots capture source-line-of-code call graphs, end-to-end across the web and service tiers. Drilling down through the call graphs, the dev team clearly identifies the payment service as the long-running call.

[Screenshot: call graph drill-down isolating the payment service]

AppDynamics provides full context on the REST invocation and highlights that the SDK was configured to talk to PayPal’s sandbox environment, explaining the occasionally high response times.

To recap, our Agile dev team leveraged AppDynamics to get deep end-to-end visibility across their pre-production application environment. AppDynamics’ release comparison provided the means to understand differences in the checkout flows across sprints, and the dynamic discovery, application mapping, and automatic detection allowed the team to quickly understand and quantify their interactions with PayPal. When transactions deviated from normal, AppDynamics automatically identified and captured the slowness to provide end-to-end, source-line-of-code root-cause analysis.

    Take five minutes to get complete visibility into the performance of your production applications with AppDynamics today.

    4 Reasons Why You Should Use APM When You Load Test Your Website

    I wouldn’t do website load/performance testing any more without having an APM tool in place. Period. Full stop. End of story.

I’ve been involved in website load testing for over 10 years: as an end user when I was web operations manager for an online job board, as a team leader for a company providing cloud load testing services, and as a consultant on web performance with my own company, DevOpsGuys. The difference in the value you get from load/performance testing with and without APM tools is enormous.

We’ve probably all seen those testing reports that are full of graphs of response time versus req/sec, CPU utilisation curves, disk IO throughput, and error rates ad nauseam. I, to my eternal shame, have even written them… and whilst they are useful for answering the (very simplistic) question of “how many simulated requests/users can my website support before it falls over?”, generating any real application insight from what are essentially infrastructure metrics is difficult. This type of test report rarely results in any corrective actions other than (1) “let’s throw more hardware at it” or (2) “let’s shout at the devs that they have to fix something because the application is slow”. Quite often the report gets circular-filed because no one knows how to derive application insight, and hence generate meaningful corrective actions, at the code, application stack configuration, or infrastructure level. All that effort and expense is wasted.

    So how are things different when using APM tools (like my preferred tool, AppDynamics)? Here are my top 4 reasons:

    1. See the Big Picture (Systems Thinking)

    “Systems thinking is a framework for seeing interrelationships rather than things, for seeing patterns rather than static snapshots.”  – Peter Senge, “The Fifth Discipline”.

The “first way of DevOps” is systems thinking, and APM tools reinforce the systems-thinking perspective by helping you see the big picture very clearly. You can see the interrelationships between the web tier, application tier, database servers, message queues, external cloud services, etc. in real time while you’re testing, rather than focusing on the metrics for each tier individually. You can instantly see where the bottlenecks in your application are: in the example below, the 4306ms calls to Navision stand out!

[Screenshot: flow map highlighting 4306ms calls to Navision]

    2. Drill Down to the Code Level

One of my favourite things when load testing with APM tools is being able to drill down to the stack trace level and identify the calls that are the most problematic. Suddenly, instead of talking about infrastructure metrics like CPU, RAM, and disk, we are talking about application metrics: this business transaction (e.g. a web page or API request) generates this flow across the application; 75% of the time is spent in this method call, which makes 3 database calls and 2 web service calls; it’s this database call that’s slow, and here’s the exact SQL statement that was executed. The difference in the response you get from the developers when you give them this level of detail, compared to “your application is slow when we hit 200 users”, is fantastic: now you are giving them real, pinpoint, actionable intelligence on how their application responds under load.

[Screenshot: drill-down to the code level showing the problematic calls]

     3. Iterate Faster

    “the application was made 56x faster during a 12hr testing window”

Because you can move quickly to the code level in real time while you test, and because this facilitates better communication with the development team, your load testing suddenly becomes a lot more collaborative, even if the load testing is being performed by an external 3rd party.

    We generally have all the relevant parties on a conference call or HipChat chat session while we test and we are constantly exchanging information, screenshots, links to APM snapshots and the developers are often able to code fixes there and then because we can rapidly pinpoint the pain points.

If you’ve got a customer with an Agile mindset and continuous delivery capability, it can enable you to do rapid test-and-fix cycles, often multiple times in a day. In one notable example, the application was made 56x faster during a 12hr testing window thanks to 4 application releases during that period.

[Screenshot: the application made 56x faster across 4 releases]

    4. Stop the “Blame Game”

    “make the enemy poor performance, not each other…”

    Traditionally in the old school (pre-APM tools) days, load tests were often conducted by external load testing consultancies who would come in, do the testing, and then deliver some big report on how things went.

    The customer would assemble their team together in a conference room to go through the report, which often triggered the “blame game” – Ops blaming Dev, Dev blaming QA, QA blaming Ops, Ops blaming the hosting provider, the hosting provider blaming the customer’s code and around and around it would go.

    But with the right APM tools in place we’ve found this negative team dynamic can be avoided.

As mentioned earlier, testing tends to become more collaborative because it’s easier to share the performance data in real time via the APM tool, and discussions become more evidence-based. It’s more about “what are we going to do about this problem we can see here in the APM tool” and less about trying to allocate blame when no one really knows where the problem actually resides and no one wants to be left holding the can. The systems-thinking, holistic view of the application’s performance promulgated by the APM tool makes performance the enemy, not each other. And that means performance issues are likely to be fixed faster, not ignored due to politics and infighting.

There are probably loads more reasons you can come up with for why load testing with APM tools is awesome (and I’d love to hear your thoughts in the comments), but I will leave you with one more bonus reason: because it’s fun. For me, using AppDynamics when I’m doing load testing and performance tuning has really brought the fun factor back into the work. It’s fun to see the load being applied to the system and to see (via AppDynamics) the effect across the entire application. It’s fun to work more closely with the Dev & Ops teams (dare I say, “DevOps”!) and to share meaningful, actionable insights on where the problems lie, and it’s fun to be able to rapidly iterate and show the performance improvements in real time.