Diagnose Network Problems with Integrated Network Visibility

More and more distributed apps are being deployed in private, hybrid, and public clouds, and the performance of these apps is becoming increasingly critical for enterprises.

In fact, the AppDynamics 2017 App Attention Index highlights the modern-day consumer demand for speed and consistency, with 62 percent of respondents expressing increased expectations for how well digital services should perform. What’s more, when apps don’t perform correctly, 80 percent of users will delete the app. Needless to say, the bar for application performance is extremely high.

AppDynamics APM is well-equipped to monitor the performance of these apps, pinpointing app flows that degrade the end-user experience through the lens of the Business Transaction (BT). However, operations teams triaging problems are always challenged with the question of whether the underlying network is the cause of the degradation.

Enterprises typically have dedicated teams to manage the infrastructure (including the network) and the apps, but these teams don’t necessarily speak the same language, which creates a communication barrier. AppDynamics Integrated Network Visibility attempts to facilitate collaboration between teams and bring down mean time to repair (MTTR). It’s a solution designed to enable AppOps to identify network-level problems during the “First Call” and escalate them to the right network team with actionable information. It also integrates with application flow maps and directly correlates network performance metrics with application performance metrics, all within the context of business transactions.

Dynamic Dashboard for Network Visibility

One of the standout features of Network Visibility is the Dynamic Dashboard – a set of widgets showing trends of Transmission Control Protocol (TCP) connection metrics and host-level TCP socket metrics for selected time ranges. It includes native metrics like Throughput, Loss, Data and SACK Retransmissions, TCP Resets, and Connection Information, as well as uber metrics (a single representation of a group of related metrics) like Network Errors and Performance Impacting Events (PIE). For example:

  • Network Errors bundles FIN Errors, Syn Black-holes, Syn Resets, and RST on Established, which together capture errors that can occur during setup or teardown of TCP connections.

  • PIE coalesces Client Zero Window, Client Limited, RTOs, Server Zero Window, and Server Limited, which help highlight symptoms of a problem on the client node, the server node, or the path between them. A full list of dashboard metrics can be found here.
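As a rough illustration of how such a bundled metric could be derived, the sketch below rolls component counts up into the two uber metrics. The component names come from the list above; the aggregation rule (a plain sum of per-interval counts) and the `rollup` helper are assumptions made for this example, not a description of how the product computes them.

```python
# Component metric names are taken from the text above. The aggregation
# rule (a plain sum of counts for one interval) is an assumption.
NETWORK_ERRORS = ("FIN Errors", "Syn Black-holes", "Syn Resets", "RST on Established")
PIE = ("Client Zero Window", "Client Limited", "RTOs",
       "Server Zero Window", "Server Limited")


def rollup(sample: dict[str, int], components: tuple[str, ...]) -> int:
    """Sum the component counts found in one sample; missing ones count as 0."""
    return sum(sample.get(name, 0) for name in components)


# Hypothetical counts observed for one link during one interval.
sample = {"FIN Errors": 2, "Syn Resets": 1, "RTOs": 4, "Server Zero Window": 3}

network_errors = rollup(sample, NETWORK_ERRORS)
pie = rollup(sample, PIE)
print(f"Network Errors={network_errors}, PIE={pie}")
```

A single rolled-up number is convenient for spotting a trend at a glance; the individual components are still needed to tell, say, a client-side window problem from a server-side one.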

With this data, you can now identify the contribution of the underlying network infrastructure. For example, consider a stalled transaction on your application flow map. With Network Visibility, you can launch this dashboard for the affected Tier / Node / Link and see relevant network information, including:

  • A spike in the Latency trend, which could indicate a sluggish TCP connection between two services.
  • An uptick in Retransmissions, which could indicate network congestion.
  • High values of Client / Server Limited, Client / Server Zero Window or PIE, which could imply inadequate TCP window sizing (Back Pressure) and a need for TCP optimization.
  • The “Network Impact on Transactions” widget, which juxtaposes PIE and Network Errors against Transactions so that the network’s contribution to afflicted transactions can be identified.
  • Network Errors and Connection Information widgets, which help identify issues with TCP connections and their lifetimes.
  • The Host Stack KPIs widget, which includes metrics such as Interface collisions and Wait Sockets that can help unearth issues in NIC or duplex configurations.
  • Throughput, Loss and Latency widgets, which highlight the network health of the selected entity.
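The reasoning in the list above can be sketched as a small triage helper. This is only an illustration: the metric keys, the `triage` function, and the "twice the baseline" threshold are hypothetical, and the hints simply paraphrase the bullets rather than reproduce any product logic.

```python
# Hints paraphrase the bullets above. The metric keys and the 2x-baseline
# threshold are hypothetical choices for this sketch.
HINTS = {
    "latency_ms": "possible sluggish TCP connection between the two services",
    "retransmissions": "possible network congestion",
    "pie": "possible inadequate TCP window sizing (back pressure)",
}


def triage(current: dict[str, float], baseline: dict[str, float],
           factor: float = 2.0) -> list[str]:
    """Return a hint for every metric that exceeds `factor` times its baseline."""
    findings = []
    for metric, hint in HINTS.items():
        value = current.get(metric)
        base = baseline.get(metric)
        if value is None or base is None:
            continue  # metric not collected; nothing to compare
        if value > factor * base:
            findings.append(f"{metric}: {hint}")
    return findings


# Hypothetical readings for one link versus its usual values.
baseline = {"latency_ms": 30, "retransmissions": 4, "pie": 1}
current = {"latency_ms": 90, "retransmissions": 5, "pie": 1}
for finding in triage(current, baseline):
    print(finding)
```

Each hint is a starting point for the conversation with the network team, not a diagnosis.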


Snapshot Correlation

As the name implies, Transaction Snapshots is a popular feature in which AppDynamics retains a snapshot of certain transaction instances. This could be triggered by an automatic detection of slow transactions or a user-driven diagnostic session. A transaction snapshot gives you a cross-tier view of the processing flow for that particular transaction.

Transaction Snapshot drill-downs come with a Network tab for the Dynamic Dashboard, which lets you correlate network metrics captured at the time of snapshot collection. Each chart has the snapshot time range highlighted. You can then look for correlations in these charts and drill down to the root cause.
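To make the idea of correlating against the highlighted time range concrete, here is a minimal sketch that compares a metric inside the snapshot window with the same metric outside it. The sample data, the `window_vs_rest` helper, and the use of a simple mean are all assumptions for illustration; the dashboard itself presents this visually.

```python
from statistics import mean


def window_vs_rest(samples, start, end):
    """Split (timestamp, value) samples by the snapshot window [start, end].

    Returns (mean inside the window, mean outside it); a side with no
    samples yields None.
    """
    inside = [v for t, v in samples if start <= t <= end]
    outside = [v for t, v in samples if not (start <= t <= end)]
    return (mean(inside) if inside else None,
            mean(outside) if outside else None)


# Hypothetical latency samples: (seconds since start of chart, milliseconds).
latency = [(0, 20), (60, 22), (120, 80), (180, 90), (240, 21)]

# Hypothetical snapshot window covering seconds 120 through 180.
inside, outside = window_vs_rest(latency, start=120, end=180)
print(f"latency inside snapshot window: {inside} ms, outside: {outside} ms")
```

A metric that is markedly worse inside the window than outside it is a candidate for closer inspection; one that looks the same on both sides is less likely to explain the slow transaction.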


With integrated network visibility now running alongside the APM metrics you rely on to run your business-critical applications, you can easily switch to a view of critical network performance indicators for your tiers, nodes, and the flows between them.

Learn more about network visibility or start a free trial today.

UX – Monitor the Application or the Network?

Last week I flew into Las Vegas for #Interop fully suited and booted in my big blue costume (no joke). I’d been invited to speak in a vendor debate on User eXperience (UX): Monitor the Application or the Network? NetScout represented the Network, AppDynamics (and me) represented the Application, and “Compuware dynaTrace Gomez” sat on the fence representing both. Moderating was Jim Frey from EMA, who did a great job introducing the subject, asking the questions and keeping the debate flowing.

At the start, each vendor gave their usual intro and company pitch, followed by their own definition of what User Experience is.

Defining User Experience

So at this point you’d probably expect me to blabber on about how application code and agents are critical for monitoring the UX? Wrong. For me, users experience “Business Transactions”–they don’t experience applications, infrastructure, or networks. When a user complains, they normally say something like “I can’t log in” or “My checkout timed out.” I can honestly say I’ve never heard them say, “The CPU utilization on your machine is too high” or “I don’t think you have enough memory allocated.”

Now think about that from a monitoring perspective. Do most organizations today monitor business transactions? Or do they monitor application infrastructure and networks? The truth is the latter, normally with several toolsets. So the question “Monitor the Application or the Network?” is really the wrong question for me. Unless you monitor business transactions, you are never going to understand what your end users actually experience.

Monitoring Business Transactions

So how do you monitor business transactions? The reality is that both Application and Network monitoring tools are capable, but most solutions have been designed not to–just so they provide a more technical view for application developers and network engineers. This is wrong, very wrong and a primary reason why IT never sees what the end user sees or complains about. Today, SOA means applications are more complex and distributed, meaning a single business transaction could traverse multiple applications that potentially share services and infrastructure. If your monitoring solution doesn’t have business transaction context, you’re basically blind to how application infrastructure is impacting your UX.

The debate then switched to how monitoring the UX differs from an application and network perspective. Simply put, application monitoring relies on agents, while network monitoring relies on sniffing network traffic passively. My point here was that you can either monitor user experience with the network or you can manage it with the application. For example, with network monitoring you only see business transactions and the application infrastructure, because you’re monitoring at the network layer. In contrast, with application monitoring you see business transactions, application infrastructure, and the application logic (hence why it’s called application monitoring).

Monitor or Manage the UX?

Both application and network monitoring can identify and isolate UX degradation, because they see how a business transaction executes across the application infrastructure. However, you can only manage UX if you can understand what’s causing the degradation. To do this you need deep visibility into the application run-time and logic (code). Operations telling a Development team that their JVM is responsible for a user experience issue is a bit like FedEx telling a customer their package is lost somewhere in Alaska. Identifying and isolating pain is useful, but one could argue it’s pointless without being able to manage and resolve the pain (through finding the root cause).

NetScout made the point that with network monitoring you can identify common bottlenecks in the network that are responsible for degrading the UX. I have no doubt you could, but if you look at the most common reason for UX issues, it’s related to change–and if you look at what changes the most, it’s application logic. Why? Because Development and Operations teams want to be agile, so their applications and business remain competitive in the marketplace. Agile release cycles mean application logic (code) constantly changes. It’s therefore not unusual for an application to change several times a week, and that’s before you count hotfixes and patches. So if applications change more than the network, then one could argue that application monitoring is the more effective way to monitor and manage the end user experience.

UX and Web Applications

We then debated which monitoring concept was better for web-based applications. Obviously, network monitoring is able to monitor the UX by sniffing HTTP packets passively, so it’s possible to get granular visibility on QoS in the network and application. However, the recent adoption of Web 2.0 technologies (Ajax, GWT, Dojo) means application logic is now moving from the application server to the user’s browser. This means browser processing time becomes a critical part of the UX. Unfortunately, network monitoring solutions can’t monitor browser processing latency (because they monitor the network), unlike application monitoring solutions, which can use techniques like client-side instrumentation or web-page injection to obtain browser latency for the UX.

The C Word

We then got to the Cloud and which approach made more sense for monitoring UX there. Well, network monitoring solutions are normally hardware appliances which plug directly into a network tap or span port. I’ve never asked, but I’d imagine the guys in Seattle (Amazon) and Redmond (Windows Azure) probably wouldn’t let you wheel a network monitoring appliance into their data-centre. More importantly, why would you need to if you’re already paying someone else to manage your infrastructure and network for you? Moving to the Cloud is about agility, and letting someone else deal with the hardware and pipes so you can focus on making your application and business competitive. It’s actually very easy for application monitoring solutions to monitor UX in the cloud. Agents can piggyback on application code libraries when they’re deployed to the cloud, or cloud providers can embed and provision vendor agents as part of their server builds and provisioning process.

What’s also interesting is that the Cloud is highlighting a trend towards DevOps (or NoOps for a few organizations), where Operations becomes more focused on applications than infrastructure. As the network and infrastructure become abstracted in the Public Cloud, the focus naturally shifts to the application and deployment of code. For private clouds you’ll still have network Ops and Engineering teams that build and support the Cloud platform, but they wouldn’t be the people who care about user experience. Those people would be the Line of Business or application owners whom the UX impacts.

In reality most organizations today already monitor the application infrastructure and network. However, if you want to start monitoring the true UX, you should monitor what your users experience, and that is business transactions. If you can’t see your users’ business transactions, you can’t manage their experience.

What are your thoughts on this?

AppDynamics is an application monitoring solution that helps you monitor business transactions and manage the true user experience. To get started, sign up for a 30-day free trial here.

I did have an hour spare at #Interop after my debate to meet and greet our competitors, before flying back to AppDynamics HQ. It was nice to see many of them meet and greet the APM Caped Crusader.

App Man.