Diagnose Network Problems with Integrated Network Visibility

More and more distributed apps are being deployed in the private, hybrid, and public clouds, and the performance of these apps is becoming increasingly critical for enterprises.

In fact, the AppDynamics 2017 App Attention Index highlights the modern-day consumer demand for speed and consistency, with 62 percent of respondents expressing increased expectations for how well digital services should perform. What’s more, when apps don’t perform correctly, 80 percent of users will delete the app. Needless to say, the bar for application performance is extremely high.

AppDynamics APM is well-equipped to monitor the performance of these apps, pinpointing app flows that degrade the end-user experience through the lens of the Business Transaction (BT). However, operations teams triaging problems are always challenged with the question of whether the underlying network is the cause of the degradation.

Enterprises typically have dedicated teams to manage the infrastructure (including the network) and apps, but these teams don’t necessarily speak the same language, which creates a communication barrier. AppDynamics Integrated Network Visibility attempts to facilitate collaboration between teams and bring down mean time to repair (MTTR). It’s a solution designed to enable AppOps to identify network-level problems during the “First Call” and escalate them to the right network team with actionable information. It also seamlessly integrates with application flow maps and directly correlates network performance metrics with application performance metrics, all within the context of business transactions.

Dynamic Dashboard for Network Visibility

One of the standout features of Network Visibility is the Dynamic Dashboard – a set of widgets showcasing trends of Transmission Control Protocol (TCP) connection metrics and host-level TCP socket metrics for selected time ranges. It also includes native metrics like Throughput, Loss, Data and SACK Retransmissions, TCP Resets, Connection Information, and uber metrics (single representation of a bunch of related metrics) like Network Errors and Performance Impacting Events (PIE). For example:

  • Network Errors bundles FIN Errors, Syn Black-holes, Syn Resets and RST on Established, which together capture errors that can occur during setup or teardown of TCP connections.

  • PIE coalesces Client Zero Window, Client Limited, RTOs, Server Zero Window, and Server Limited, which help highlight symptoms of a problem on the client node, the server node, or the path between them.

A full list of dashboard metrics can be found here.
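
To make the idea of an uber metric concrete, here is a minimal sketch of rolling several related counters into one number. The component names come from the bullets above, but the snake_case keys, the `uber_metric` helper, and the choice of a plain sum are assumptions made for illustration; the product may well weight or derive these values differently.

```python
# Component names follow the dashboard description above. Treating each
# uber metric as a plain sum of its components is an assumption.
NETWORK_ERROR_COMPONENTS = (
    "fin_errors",
    "syn_black_holes",
    "syn_resets",
    "rst_on_established",
)
PIE_COMPONENTS = (
    "client_zero_window",
    "client_limited",
    "rtos",
    "server_zero_window",
    "server_limited",
)


def uber_metric(sample: dict, components: tuple) -> int:
    """Sum the named component counters, treating missing ones as zero."""
    return sum(sample.get(name, 0) for name in components)


# Hypothetical counters for one TCP connection over one time bucket.
sample = {"fin_errors": 2, "syn_resets": 1, "rtos": 4, "server_zero_window": 3}

network_errors = uber_metric(sample, NETWORK_ERROR_COMPONENTS)
pie = uber_metric(sample, PIE_COMPONENTS)
print(f"Network Errors={network_errors}, PIE={pie}")
```

The point of the roll-up is that a single trend line can tell you whether to dig further, and the individual counters then tell you where.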

With this data, you can now identify the contribution of the underlying network infrastructure. For example, consider a stalled transaction on your application flow map. With Network Visibility, users can launch this dashboard for the affected Tier / Node / Link and gain insightful network information, including:

  • A spike in the Latency trend, which could indicate a sluggish TCP connection between two services.
  • An uptick in Retransmissions, which could indicate network congestion.
  • High values of Client / Server Limited, Client / Server Zero Window or PIE, which could imply inadequate TCP window sizing (Back Pressure) and a need for TCP optimization.
  • The “Network Impact on Transactions” widget, which juxtaposes PIE and Network Errors against Transactions so that the network’s contribution to afflicted transactions can be identified.
  • Network Errors and Connection Information widgets, which help identify issues with TCP connections and their lifetimes.
  • Host Stack KPIs widget, which includes metrics like Interface Collisions and Wait Sockets that can help unearth issues in NIC or duplex configurations.
  • Throughput, Loss and Latency widgets, which highlight the network health of the selected entity.
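
The reading of these signals can be sketched as a simple rule of thumb. The snippet below is only an illustration, not how the dashboard works: the metric keys, the `triage_hints` helper, and the "more than twice the baseline" threshold are all assumptions, and the hint text simply paraphrases the bullets above.

```python
def triage_hints(current: dict, baseline: dict, factor: float = 2.0) -> list:
    """Return a hint for each metric whose current value exceeds factor x baseline."""
    # Hint text paraphrases the dashboard bullets; the keys are hypothetical.
    hints_by_metric = {
        "latency_ms": "possible sluggish TCP connection between services",
        "retransmissions": "possible network congestion",
        "pie": "possible inadequate TCP window sizing (back pressure)",
    }
    hints = []
    for metric, hint in hints_by_metric.items():
        base = baseline.get(metric, 0.0)
        value = current.get(metric, 0.0)
        # Skip metrics with no usable baseline rather than guess.
        if base > 0 and value > factor * base:
            hints.append(f"{metric}: {hint}")
    return hints


hints = triage_hints(
    current={"latency_ms": 180.0, "retransmissions": 12.0, "pie": 3.0},
    baseline={"latency_ms": 40.0, "retransmissions": 10.0, "pie": 2.0},
)
for line in hints:
    print(line)
```

In this made-up sample only latency is flagged, because retransmissions and PIE stay within twice their baseline values.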

Snapshot Correlation

As the name implies, Transaction Snapshots is a popular feature in which AppDynamics retains a snapshot of certain transaction instances. A snapshot can be triggered by automatic detection of slow transactions or by a user-driven diagnostic session. A transaction snapshot gives you a cross-tier view of the processing flow for that particular transaction.

Transaction Snapshot drill-downs will come with a Network tab for the Dynamic Dashboard, which will allow you to correlate network metrics captured at the time of snapshot collection. Each chart has the snapshot time range highlighted. You can then look for correlations in these charts and drill down to the root cause.
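
If you wanted to do the same comparison by hand on exported data, it might look like the sketch below. Everything here is assumed for illustration: the `(timestamp, value)` tuple format, the `compare_to_snapshot_window` helper, and the sample numbers. It simply compares the average of a metric inside the snapshot time range with its average outside that range.

```python
from statistics import mean


def compare_to_snapshot_window(series, start, end):
    """Return (mean inside window, mean outside window) for (timestamp, value) points."""
    inside = [value for ts, value in series if start <= ts <= end]
    outside = [value for ts, value in series if ts < start or ts > end]
    if not inside or not outside:
        raise ValueError("need points both inside and outside the snapshot window")
    return mean(inside), mean(outside)


# Hypothetical retransmission counts, one point per minute (timestamps in seconds).
retransmissions = [(0, 1), (60, 2), (120, 9), (180, 11), (240, 1)]

in_window, baseline = compare_to_snapshot_window(retransmissions, start=120, end=180)
print(f"during snapshot: {in_window}, otherwise: {baseline:.2f}")
```

A metric that sits well above its usual level only during the snapshot window is a reasonable place to start looking.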

With integrated network visibility now running alongside the APM metrics you rely on to run your business critical applications, you can easily switch to a view of critical network performance indicators for your tiers, nodes and the flows between them.

Learn more about network visibility or start a free trial today.

APM vs NPM. 2nd Round K.O.

Round Two: Last time I wrote a blog comparing APM with network-based APM tools, which I still consider NPM at its core regardless of what some critics and competitors claim. Let me make one thing clear, though: NPM is great for equipping IT network administrators to see how fast or slow data is traveling through the pipes of their application. Unfortunately, network-based APM tools simply cannot give App Ops granular visibility into the application runtime when isolating bottlenecks means going beyond the system level, or into the application’s final destination: the end user’s browser.

I find several of the blogs and YouTube clips from such NPM vendors quite comical as they try to throw punches at APM companies. Their arguments are centered primarily against agent-based approaches being an inadequate APM solution due to today’s fickle and distributed application architectures. It’s not like I haven’t heard it before.

The amusing thing about it…they’re completely right! In fact, we couldn’t agree more, and that’s why Jyoti Bansal founded AppDynamics: to address the perennial shortcomings that legacy APM vendors have been ignoring. From the smallest businesses to the largest enterprises, companies have complex applications that have outpaced their App Ops teams’ current set of monitoring tools. That’s why AppDynamics is reinventing and reigniting the application performance management space by enabling IT operations to monitor complex, modern applications running in the cloud or the data center. So let me respond to the claims they’ve made.

The Claims

“Agents have high deployment and ongoing maintenance burden.”
Legacy APM: TRUE
AppDynamics: FALSE. No manual instrumentation required. It’s automatic.

“Agents are invasive which can perturb the systems being monitored.”
Legacy APM: TRUE
AppDynamics: FALSE. Our customers see less than 1-2% overhead in production. 

“Performance management vendors have over promised and under delivered for decades.”
Legacy APM: TRUE
AppDynamics: FALSE. Things are going well, thanks. Check our customer list and 400% growth.

All AppDynamics. The next-gen of APM.

Example App with application performance issues

I drew a parallel in my previous post: using NPM concepts to monitor application performance is like inspecting UPS packages en route to figure out why operations at a hub came to a screeching halt. Remember, even if the package contents are visible from afar, that doesn’t explain why the hub conveyors, which electronically guide packages to their appropriate destination chutes, are broken, nor can it identify why cargo operations have stalled. In other words, good luck trying to gather anything beyond the scope of the application’s infrastructure. Using network monitoring tools to collect even the most basic system health metrics such as CPU utilization, memory usage, thread pool consumption and thrashing? Time to throw in the towel.

And what about End User Monitoring?

What’s becoming just as important as being able to monitor server-side processing and network time is the ability to monitor end user performance. When NPM tools are only able to see the last packet sent from the server, how does that help you understand the browser’s performance? It doesn’t, since once again this kind of analysis is only feasible higher up the stack at the Application Layer. And just to clarify, when I say Application Layer, I mean application execution time, not “network process time to application” as defined by OSI Layer 7.

On the other hand, injected agents residing in that layer can insert JavaScript into the Web page to determine the execution time spent in the browser. This is becoming more of a concern for App Ops and Dev Ops now that 80-90% of the end-user response time is spent on the frontend executing JavaScript and rendering markup and stylesheets. As business logic continues its migration to the browser and increases its processing burden, the client is looking more and more like the new server. Network monitoring tools must move to an agent-based approach if they are to truly deliver the monitoring visibility needed for the application and end user experience; otherwise their visibility will remain stuck between a rock and a hard place.

On top of that, what about those customers running their applications in a public cloud? Are you going to convince your cloud provider to install a network appliance into their infrastructure? I highly doubt it. AppDynamics has partnerships with cloud providers such as Amazon EC2, Azure, RightScale and Opsource, allowing developers and operations to easily deploy AppDynamics with the flick of a switch and monitor their applications in production 24/7.

Once again, next-gen APM triumphs over NPM-based application performance monitoring on not just the server side, but also the browser. AppDynamics is embracing this and is fully aware of the technical and business significance of monitoring end user performance. We’re delighted to offer this kind of end-to-end visibility to our customers, who will now be able to monitor application performance from the end user’s browser to the backend application tiers (databases, mainframes), all through a single pane of glass.

APM vs NPM. Round One.

Another three-letter acronym I see frequently mixed in with APM is NPM, which stands for Network Performance Management. At first glance they look very similar. The distinction appears very subtle with just a one-letter difference, but it speaks volumes because their core technologies and approaches to monitoring application performance are fundamentally different.

Application Performance Management tools typically use agents that live in the application run-time and capture performance details of how application logic executes across the application infrastructure.

In contrast, Network Performance Monitoring tools are agent-less appliances that sit on the network, analyzing content and traffic by capturing packets flowing through the network. They can measure response times and find errors for your applications by understanding the wire protocols across the application tiers. Like agent-based APM solutions, they can automatically discover your application topology, but NPM tools lack deep code-level diagnostics, which are paramount to helping App Ops and Dev teams solve problems. Here’s why.

Imagine your application’s architecture is FedEx providing package delivery services to its customers in a timely manner. If you were to apply the concept of network monitoring to this analogy, using an NPM tool would help track how long it takes for packages to travel in transit from one hub to the next, examining each package’s contents and its expected arrival time at its final destination.

Now let’s say that a package was being sent from Paris to Los Angeles but hit a snag at the London hub along the way. Operations at London came to a screeching halt for some inexplicable reason, and your task is to figure out why and how to fix it ASAP so operations run smoothly again. In this case, APM is preferable to NPM. APM tools empower you to see not only how long it takes for your package to travel to different locations, but also how the package is processed at the facility and why operations came to a standstill. At this juncture, you probably wouldn’t even care to analyze the package’s contents since you already know what’s inside and it’s still in London.

When applying this concept to the world of business applications, NPM tools won’t provide you with visibility into application logic inefficiencies, especially when you are experiencing a performance issue or a service level breach on key business transactions. NPM performance metrics are derived from captured packets and the network protocols the servers support, but if you’re making inefficient, iterative business logic calls in your code, how will capturing and analyzing packets help you triage the problem? It’s the same story for identifying the root cause of memory leaks or thread synchronization issues; execution and visibility like this occur at the application layer, not at the network layer. Thus, I would argue that NPM complements APM but doesn’t serve as a viable replacement.
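
To picture what “inefficient, iterative business logic calls” look like, here is a small sketch. The `CountingStore` class is a hypothetical stand-in for a remote service or database, invented for this example. On the wire, the iterative version just looks like many fast, healthy round trips; only the code shows that a single batched call would have produced the same answer.

```python
class CountingStore:
    """Hypothetical stand-in for a remote service that counts round trips."""

    def __init__(self, prices):
        self._prices = prices
        self.round_trips = 0

    def get_price(self, item_id):
        # One round trip per item.
        self.round_trips += 1
        return self._prices[item_id]

    def get_prices(self, item_ids):
        # One round trip for the whole batch.
        self.round_trips += 1
        return [self._prices[item_id] for item_id in item_ids]


def total_iterative(store, item_ids):
    """Inefficient: one call per item inside the loop."""
    return sum(store.get_price(item_id) for item_id in item_ids)


def total_batched(store, item_ids):
    """Same result with a single call."""
    return sum(store.get_prices(item_ids))


prices = {1: 5, 2: 7, 3: 8}
slow, fast = CountingStore(prices), CountingStore(prices)
slow_total = total_iterative(slow, [1, 2, 3])
fast_total = total_batched(fast, [1, 2, 3])
print(slow_total, fast_total, slow.round_trips, fast.round_trips)
```

Both functions return the same total, but the iterative one makes one round trip per item. Each of those packets looks perfectly normal to a network tool, which is exactly why the inefficiency is hard to spot from the wire.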

Now before I go enumerating my theories as to why I see this trend, let me just say that while I respect their hustle, nice try but not quite. While scouring the web I found many NPM companies trying to crash the popular APM party by marketing their network performance monitoring utilities as application monitoring and management solutions. It can be confusing for a first-time buyer trying to evaluate APM tools to see NPM companies using phrases like “network-based APM” or “gather data already on your network” in their product messaging.

One of the reasons for this messaging is that NPM is an immensely crowded market. Here’s a list of NPM tools compiled by some IT folks at Stanford. There are over 300 network monitoring tools, and that’s only going back five years to 2007!! Take your pick. Second, applications and the tools to monitor their performance and availability speak to customers at a higher level from a technical and business perspective. If I’m in charge of some transaction-intensive application, my customers are directly interfacing with the Application Layer in order to complete business transactions, which take place at the highest layer of the stack. Customers aren’t 5-6 levels deep at the Network Layer interacting with packets or the routers and switches they pass through.

By the way, have you seen Gartner’s APM spend analysis? It’s a $2 billion market and growing. This is most certainly another reason why NPM vendors are now positioning their products as APM solutions to customers.

If you have absolutely nothing to monitor any part of your application’s infrastructure, then something is better than nothing. However, at some point you’ll need better visibility at the application layer regardless of how fast packets are traveling through the pipes. NPM won’t necessarily provide the range of visibility your App Ops and Dev teams need to diagnose, troubleshoot, and identify problem root causes.

Those who live day in and day out in the networking world will naturally gravitate to what they’re familiar with, but if you haven’t ventured out into the evolving APM market, I highly recommend it. Even though you can achieve some level of performance monitoring with NPM, you’ll find yourself running into limitations in application visibility pretty quickly. As the old saying goes, “If all you have is a hammer, then everything looks like a nail.”

I’m curious to know what your thoughts are. Which class of tools would you choose? Why?