Visualizing and tracking your microservices

There is no question that microservices architectures are the current rage in software design. The IT professionals and developers I speak to are migrating to this pattern fairly consistently. In meetings with dozens of prospects and customers over the last ten weeks at AppDynamics, I’ve regularly asked the same question: are you using, or thinking about moving to, microservices? Most often the answer is “yes.” I typically follow up by asking about container technology (such as Docker), and the answer is “maybe.” I suspect that answer will change as containers mature and become cross-platform.

I’m not going to explain the basics of microservices, as that’s handled elsewhere. The pattern of using APIs, initially built to cross application boundaries within a single enterprise or organization, is now being leveraged within a single application architecture to deliver functionality. Microservices adoption is being driven by two forces: the need for agility and speed, and the need to recompose applications, which enables experimentation and supports new delivery platforms such as web, mobile web, native apps, and partners. Defining these service boundaries allows each microservice to be developed independently.

Early adopters such as Netflix have identified several design criteria. In a great interview published by NGINX, Adrian Cockcroft, formerly of Netflix and now with Battery Ventures (which also happens to be one of our major investors), discusses the architecture. One item that specifically concerned me, thinking with my IT operations hat on, was the separate back-end storage for each microservice. Disparate storage requires a complex master data management solution or strategy to keep data in sync. It also complicates recovery should a disaster arise. The complexity of managing all of these separate back ends seems like a recipe for technical debt: the buildup of old, and possibly short-term, decisions that make systems rigid. I reached out to Adrian Cockcroft on this specific topic, and he sent back the following:

“Replication across data centers is handled using Cassandra or Riak for each data store; it’s an orthogonal problem.

Keeping foreign keys in sync across data stores can be done on an exception basis when inconsistencies are detected (like read-repair) or as a nightly batch job using a throttled full table scan.

Each data store is extremely simple and will be maintained independently. In practice, this is vastly easier than a single central database with ‘kitchen sink’ schema.”
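To make the nightly batch job idea concrete, here is a minimal Python sketch, assuming two hypothetical stores modeled as in-memory dictionaries: an orders store whose rows carry a customer_id foreign key, and a customers store. The scan walks the orders table in batches and sleeps between batches so it stays under a hypothetical max_rows_per_sec budget, then reports orders whose customer no longer exists. A real job would page through Cassandra or Riak rather than a dict; the names are illustrative only and this is not Adrian’s implementation.

```python
import time
from typing import Dict, Iterator, List, Tuple


def throttled_scan(
    rows: Dict[str, dict], batch_size: int, max_rows_per_sec: float
) -> Iterator[Tuple[str, dict]]:
    """Yield (key, row) pairs in fixed-size batches, sleeping after each
    batch so the overall rate stays at or below max_rows_per_sec."""
    keys = sorted(rows)  # stable key order, so a restarted job could resume by key
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        began = time.monotonic()
        for key in batch:
            yield key, rows[key]
        # Each batch is allowed len(batch) / max_rows_per_sec seconds.
        budget = len(batch) / max_rows_per_sec
        elapsed = time.monotonic() - began
        if elapsed < budget:
            time.sleep(budget - elapsed)


def find_dangling_foreign_keys(
    orders: Dict[str, dict],
    customers: Dict[str, dict],
    fk_field: str = "customer_id",  # hypothetical foreign-key column
    batch_size: int = 100,
    max_rows_per_sec: float = 10_000.0,
) -> List[str]:
    """Return IDs of orders whose foreign key points at a missing customer."""
    dangling = []
    for order_id, order in throttled_scan(orders, batch_size, max_rows_per_sec):
        if order.get(fk_field) not in customers:
            dangling.append(order_id)
    return dangling


if __name__ == "__main__":
    customers = {"c1": {"name": "Ada"}, "c2": {"name": "Grace"}}
    orders = {
        "o1": {"customer_id": "c1"},
        "o2": {"customer_id": "c9"},  # c9 was deleted from the customers store
        "o3": {"customer_id": "c2"},
    }
    print(find_dangling_foreign_keys(orders, customers))  # ['o2']
```

Throttling trades scan duration for a bounded load on the production store, which is what makes a full table scan tolerable as a nightly job; dangling keys found this way can then be repaired one at a time, much like read-repair.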

This insight provided guidance that other posts didn’t. Adrian’s answer implies that development should standardize on a common data storage technology; in his use cases, that would be Cassandra or Riak, which keeps things consistent from a support perspective. How many enterprises wish they could limit data storage to two specific platforms? These architectures were pioneered by web-scale innovators to meet service demands and sustain agile release velocity. I found these stats compelling:

  • Calls to between 100 and 150 services (APIs) to build a single page.
  • Netflix’s microservices architecture serves 5 billion API calls per day, 99.7% of which are internal. (Netflix, 2014)
  • API traffic makes up over 60% of the traffic serviced by our application tier overall. (October 2013)
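A quick back-of-the-envelope calculation puts the Netflix numbers in perspective. The sketch below simply divides the daily total by the 86,400 seconds in a day and splits it by the 99.7% internal share: roughly 57,870 calls per second on average, about 15 million external calls per day, and roughly 332 internal calls for every external one. These are averages derived from the quoted figures, not measured peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def api_call_breakdown(calls_per_day: float, internal_fraction: float) -> dict:
    """Split a daily API call volume into an average rate and internal/external shares."""
    internal = calls_per_day * internal_fraction
    external = calls_per_day - internal
    return {
        "avg_calls_per_second": calls_per_day / SECONDS_PER_DAY,
        "internal_per_day": internal,
        "external_per_day": external,
        "internal_per_external": internal / external,
    }


if __name__ == "__main__":
    # Figures quoted above: 5 billion calls/day, 99.7% internal (Netflix, 2014).
    for name, value in api_call_breakdown(5e9, 0.997).items():
        print(f"{name}: {value:,.0f}")
```

Real traffic is bursty, so peak rates will be well above this daily average.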

DevOps engineers and teams responsible for operating microservices are realizing a few things, aside from the level of complexity and scale created by these new architectures:

  • Determining which services are being called to deliver application functionality to a specific user is difficult.
  • Documenting and/or visualizing the fluid application topology is something few have been able to do.
  • Building an architecture map or blueprint of the services design is nearly impossible.

Today’s monitoring approaches consist of the following broken strategies:

In the first approach, status codes are logged and examined for each individual microservice. There is no way to determine the health of the application being delivered to a user, which is a major issue. The design then couples this with a separate tool that visualizes basic metrics about each microservice. This helps determine the health of each component, but the approach is flawed in the same way that today’s server monitoring is completely broken: views that exist in silos do not provide the visibility required to understand the end-to-end transaction path. If one service failure cascades into other service failures, determining the root cause is virtually impossible because microservices interact asynchronously. Services often call additional services, which means there is an n-to-n relationship between services.
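Root-cause analysis in a cascade depends on knowing the call graph itself. As a minimal sketch, assuming the caller-to-callee graph is already known (exactly the information siloed tools lack), a simple heuristic flags failing services whose own dependencies are all healthy. The service names and graph are hypothetical, and this illustrates a general heuristic, not how any particular product performs root-cause analysis.

```python
from typing import Dict, List, Set


def root_cause_candidates(calls: Dict[str, List[str]], failing: Set[str]) -> Set[str]:
    """Return failing services none of whose downstream dependencies are failing.

    Heuristic only: it assumes failures propagate from callee to caller. Services
    in a dependency cycle that fail together will all be filtered out.
    """
    return {
        service
        for service in failing
        if not any(dep in failing for dep in calls.get(service, []))
    }


if __name__ == "__main__":
    # Hypothetical call graph: caller -> services it calls.
    calls = {
        "web": ["cart", "catalog"],
        "cart": ["pricing", "inventory"],
        "catalog": ["inventory"],
        "pricing": [],
        "inventory": [],
    }
    # A siloed status-code view would show four red services with no ordering.
    failing = {"web", "cart", "catalog", "inventory"}
    print(root_cause_candidates(calls, failing))  # {'inventory'}
```

Without the graph, the same four failures look like four unrelated incidents; with it, three of them are explained by the fourth.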


A second approach uses seven different tools, once again visualizing each component individually; this is the root issue behind #monitoringsucks.

The last example is a common pattern I’ve seen, but it once again consists of a component-level view built from several monitoring tools that collect data independently. In this case, the architecture uses CloudWatch for the infrastructure, Zabbix for the servers, and statsd and collectd for metrics (which feed into Graphite). The result is three consoles, three tools, and three separate views of each component. Each console sees only its own slice of the infrastructure, and none of them touches application performance data.
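To see why these stacks stay siloed, it helps to look at what actually travels over the wire. The minimal sketch below formats a metric in the statsd line protocol and a datapoint in Graphite’s plaintext protocol; the metric names, values, and timestamp are hypothetical. In the basic protocols, each record is just a name, a value, and (for Graphite) a timestamp, with no field tying it to a particular user transaction.

```python
import socket


def statsd_line(name: str, value: float, metric_type: str) -> str:
    """Format one metric in the statsd line protocol: <name>:<value>|<type>.

    Common types are 'c' (counter), 'g' (gauge), and 'ms' (timer)."""
    return f"{name}:{value:g}|{metric_type}"


def graphite_line(path: str, value: float, timestamp: int) -> str:
    """Format one datapoint in Graphite's plaintext protocol: <path> <value> <timestamp>."""
    return f"{path} {value:g} {timestamp}\n"


def send_statsd(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send to a statsd daemon (8125 is statsd's usual default port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Note what is missing: neither line identifies the user request it belongs to.
    print(statsd_line("cart.checkout.latency", 12.5, "ms"))
    print(graphite_line("servers.web01.cpu.user", 37.0, 1420070400), end="")
```

Because nothing in these records links a latency spike to the request that caused it, correlation has to be reconstructed by a person flipping between consoles.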

Clearly, what’s needed is a visualization of the services path, with end-to-end traceability for each interaction: from the user, through all of the service calls, down to the infrastructure. AppDynamics delivers this capability. Here is an example of a microservices architecture running on Docker and monitored with AppDynamics. This is actually our demo environment, where we run load generators on Docker alongside each microservice for our demo application instances. We don’t publish all of this, but some of it is available on our Docker repository.
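The core idea behind end-to-end traceability can be shown without any product: assign each user request a trace ID at the edge, pass it (plus the caller’s span ID) along on every downstream call, and reassemble the recorded spans afterward. The minimal Python sketch below assumes hypothetical X-Trace-Id and X-Parent-Span-Id headers and simulates HTTP calls with in-process recursion; it illustrates the general technique, not how AppDynamics instruments services.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical header names used to carry trace context between services.
TRACE_HEADER = "X-Trace-Id"
PARENT_HEADER = "X-Parent-Span-Id"


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    service: str


COLLECTED: List[Span] = []  # stands in for a tracing backend


def handle(service: str, headers: Dict[str, str], downstream: Dict[str, List[str]]) -> None:
    """Record a span for this service, then 'call' each downstream service,
    forwarding the trace ID and this span's ID as the parent."""
    trace_id = headers.get(TRACE_HEADER) or uuid.uuid4().hex  # new trace at the edge
    span = Span(trace_id, uuid.uuid4().hex, headers.get(PARENT_HEADER), service)
    COLLECTED.append(span)
    for callee in downstream.get(service, []):
        handle(callee, {TRACE_HEADER: trace_id, PARENT_HEADER: span.span_id}, downstream)


def render_path(spans: List[Span], trace_id: str) -> List[str]:
    """Rebuild the call tree for one trace, indenting each service under its caller."""
    children: Dict[Optional[str], List[Span]] = {}
    for s in spans:
        if s.trace_id == trace_id:
            children.setdefault(s.parent_id, []).append(s)
    lines: List[str] = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for s in children.get(parent_id, []):
            lines.append("  " * depth + s.service)
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    downstream = {"web": ["cart", "catalog"], "cart": ["inventory"], "catalog": ["inventory"]}
    handle("web", {}, downstream)  # one user request entering at the edge
    print("\n".join(render_path(COLLECTED, COLLECTED[0].trace_id)))
```

The rendered path shows inventory twice, once under each caller: the trace preserves the n-to-n call relationships that per-component views lose.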



We hope to present more details at DockerCon if our talks are selected.

So what about those who don’t want to pay for software? You’ll likely pay anyway, either in people and time or in actual money. If you evaluate or select AppDynamics, it can be deployed on premises or as SaaS (and you can switch between the two deployment models). Adrian Cockcroft is working on a cool new open source project called Spigo, which visualizes topologies (it does not handle instrumentation, which is actually the hardest part). Spigo’s visualizations are built with the d3 JavaScript library, and early examples and source code are available for download. Today the tool doesn’t have real-time capabilities, but those are expected to come over time. AppDynamics views are also pure HTML5 and JavaScript, including our rich topology map pictured above. We also animate and show detailed usage and performance data across the communication paths. Expect additional visibility as we add new data sources to enhance the topology maps.
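Whatever tool draws the map, a topology view ultimately comes from aggregating observed service-to-service calls into a graph. As a minimal sketch, assuming call records of the hypothetical form (caller, callee, latency_ms) have already been collected (the instrumentation step Spigo leaves out), the function below counts calls and averages latency per edge, then emits a nodes/links structure of the kind often fed to d3 force layouts. It is not Spigo’s code or data format.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_topology(calls: Iterable[Tuple[str, str, float]]) -> dict:
    """Aggregate observed (caller, callee, latency_ms) records into a
    nodes/links graph, a shape commonly used as input to d3 force layouts."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    total_ms: Dict[Tuple[str, str], float] = defaultdict(float)
    services = set()
    for caller, callee, latency_ms in calls:
        edge = (caller, callee)
        counts[edge] += 1
        total_ms[edge] += latency_ms
        services.update(edge)
    return {
        "nodes": [{"id": s} for s in sorted(services)],
        "links": [
            {
                "source": caller,
                "target": callee,
                "calls": counts[(caller, callee)],
                "avg_latency_ms": total_ms[(caller, callee)] / counts[(caller, callee)],
            }
            for caller, callee in sorted(counts)
        ],
    }


if __name__ == "__main__":
    # Hypothetical call records, e.g. derived from collected trace spans.
    observed = [("web", "cart", 12.0), ("web", "cart", 18.0), ("cart", "inventory", 5.0)]
    print(json.dumps(build_topology(observed), indent=2))
```

Keeping per-edge call counts and latency alongside the graph is what lets a map show usage and performance on each communication path, not just which services exist.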

Topology and application paths are key to managing complex architectures, and with the addition of microservices and Docker, everyone will need these capabilities. AppDynamics offers the most advanced topology visualization on the market for managing these new and increasingly popular complex architectures, but open source projects such as Spigo will also improve visualization.

Try it for yourself: download a FREE trial of AppDynamics today!