Why APM Matters in AWS Lambda and Serverless


Summary
Serverless compute services such as AWS Lambda offer many advantages to their customers, most notably the ability to reduce costs and minimise infrastructure. But serverless can also bring trade-offs that give rise to application performance issues.

In the last few years, we’ve heard a lot of buzz about cloud strategies. Although “cloud” has become something of a buzzword in broader circles, from a technical perspective its purpose boils down to two objectives:

  • Actively manage as little infrastructure as possible
  • Facilitate change within applications as much as possible

From a business perspective, these goals translate to minimising cost and maximising the ability to address new opportunities.

Minimising infrastructure is possible because most servers are heavily underutilised. This observation has led the industry down a path: from physical servers dedicated to particular workloads, through virtualisation to consolidate workloads onto fewer servers, to moving virtualised workloads onto someone else’s servers (i.e. Cloud v1.0), and finally to serverless compute, which removes any visibility whatsoever of the infrastructure underlying the applications.

AWS Lambda currently rules the serverless space with 70% market share, according to a study last year by the Cloud Native Computing Foundation (CNCF). Competing serverless platforms include Apache OpenWhisk, Azure Functions and Google Cloud Functions.

The Need for Application Performance Monitoring

During this change in infrastructure approach, one thing has remained paramount: The level of service delivered to end users of applications must be maintained—or even improved—to meet ever-increasing customer expectations, particularly those set by the “dial tone” quality of ubiquitous internet services such as Google Search, Twitter, Netflix and so on.

From an application-quality perspective, this has been a positive development, since monitoring focused on infrastructure health—which has little or no correlation to service quality—has given way to application performance monitoring (APM).

APM focuses on surfacing health issues, whether at the application or infrastructure level, in the context of the business transaction—the outcome the affected system is trying to deliver. APM solutions achieve this feat by tracing transactions end-to-end through the system architecture, and using transaction health as the primary KPI.

This has also produced a fortunate side effect: tracing transactions end-to-end helps with cloud migrations, since intersystem dependencies are revealed automatically and application-oriented service assurance can be provided to de-risk the migrations.

The Challenges of Serverless

Whilst serverless is great from a cost-cutting, infrastructure-minimising perspective, it poses interesting challenges for APM. On one hand, it’s very important to measure and score every single application transaction to assure good levels of service to application consumers. On the other hand, sending detailed diagnostic data about every single transaction to a central management hub is costly and inefficient. (Note that most first-generation serverless transaction monitors, such as Amazon’s own X-Ray, use a sampling approach to measure and report only a subset of transactions.)
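To make the sampling trade-off concrete, here is a back-of-the-envelope sketch. It assumes a monitor that samples each transaction independently at a fixed rate, which is a simplification and not a description of how X-Ray or any other product actually samples. The function name and the 5% rate are illustrative only.

```python
def chance_of_capturing_any(failures: int, sample_rate: float) -> float:
    """Probability that at least one failed transaction is sampled.

    Assumes every transaction is sampled independently with the same
    probability `sample_rate` (an illustrative model, not a vendor's rule).
    """
    if failures < 0:
        raise ValueError("failures must be non-negative")
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    # P(at least one sampled) = 1 - P(none sampled)
    return 1.0 - (1.0 - sample_rate) ** failures


if __name__ == "__main__":
    # Hypothetical 5% sample rate, varying numbers of failing transactions.
    for n in (1, 10, 50):
        print(n, round(chance_of_capturing_any(n, 0.05), 3))
```

Under these assumptions, ten failing transactions at a 5% sample rate leave roughly a 40% chance that any one of them is traced at all, which is why sampling alone can miss rare but important problems.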

The serverless compute infrastructure itself also brings trade-offs that can give rise to application performance issues of their own. Transactions may be subject to serverless runtime startup overhead (a “cold start”) at unexpected moments, and concurrency limits imposed by serverless platforms can also cause performance headaches in high-load situations.
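The startup overhead described above is commonly called a cold start. One widely used way to spot it from inside a function is a module-level flag, since module-level code runs once per fresh runtime. The sketch below is a minimal illustration of that pattern; `handle_order` and the log field names are hypothetical, and the handler follows the usual `(event, context)` shape of a Python Lambda function.

```python
import json
import time

# Module-level code runs once per fresh runtime, so this flag is True
# only for the first invocation that the runtime serves.
_is_cold_start = True


def handle_order(event):
    """Hypothetical business logic standing in for the real transaction."""
    return {"status": "ok", "items": len(event.get("items", []))}


def handler(event, context):
    global _is_cold_start
    cold = _is_cold_start
    _is_cold_start = False

    started = time.monotonic()
    result = handle_order(event)
    duration_ms = (time.monotonic() - started) * 1000.0

    # Emit one structured log line per transaction so cold and warm
    # invocations can be compared later. Field names are assumptions.
    print(json.dumps({
        "request_id": getattr(context, "aws_request_id", None),
        "cold_start": cold,
        "duration_ms": round(duration_ms, 3),
    }))
    return {**result, "cold_start": cold}
```

Note that the measured duration covers only the handler body; the time the platform spends starting the runtime happens before this code runs, so the flag marks affected transactions without quantifying the startup cost itself.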

AppDynamics’ conventional agents find the sweet spot on the cost-versus-coverage spectrum: they measure and score every transaction, report only aggregate statistics that are rolled up locally within the agent to avoid excessive chatter, and collect detailed diagnostic snapshots only for problem transactions. This behaviour remains a requirement in the serverless world, but it is harder to deliver there, since runtimes are stateless and can shut down without warning, taking any locally held aggregates with them. We’re currently working on an innovative approach to bring the intelligence of our conventional agents into this newer serverless paradigm and look forward to sharing more information soon.
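The general idea of scoring every transaction, rolling statistics up locally, and keeping detail only for problem transactions can be sketched as follows. This is an illustrative toy and not AppDynamics’ agent: the class name, the fixed slowness threshold, and the payload shape are all assumptions made for the example.

```python
from dataclasses import asdict, dataclass


@dataclass
class Rollup:
    count: int = 0
    errors: int = 0
    slow: int = 0
    total_ms: float = 0.0
    max_ms: float = 0.0


class LocalAggregator:
    """Toy local roll-up: every transaction is scored, few are kept in detail."""

    def __init__(self, slow_threshold_ms=500.0, max_snapshots=10):
        self.slow_threshold_ms = slow_threshold_ms  # assumed static threshold
        self.max_snapshots = max_snapshots          # cap on retained detail
        self.rollups = {}
        self.snapshots = []

    def record(self, name, duration_ms, error=False, detail=None):
        rollup = self.rollups.setdefault(name, Rollup())
        rollup.count += 1
        rollup.total_ms += duration_ms
        rollup.max_ms = max(rollup.max_ms, duration_ms)

        slow = duration_ms > self.slow_threshold_ms
        if error:
            rollup.errors += 1
        if slow:
            rollup.slow += 1

        # Detailed snapshots are kept only for problem transactions.
        if (error or slow) and len(self.snapshots) < self.max_snapshots:
            self.snapshots.append({
                "name": name,
                "duration_ms": duration_ms,
                "error": error,
                "detail": detail,
            })

    def flush(self):
        """Return one compact payload and reset local state."""
        payload = {
            "rollups": {name: asdict(r) for name, r in self.rollups.items()},
            "snapshots": self.snapshots,
        }
        self.rollups = {}
        self.snapshots = []
        return payload
```

In a long-lived process, `flush()` could run on a timer. In a serverless runtime there is no guarantee the process survives until the next tick, so a sketch like this would have to flush before each invocation returns, which gives up much of the benefit of rolling up locally. That tension is the difficulty described above.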

Conclusion

Serverless compute brings new cost-optimisation opportunities for customers, who are finding great value in serverless offerings such as AWS Lambda. And while high-fidelity transaction monitoring in the serverless world has introduced new technical hurdles for APM vendors, these challenges provide a great backdrop for another wave of innovation in the APM space.

Indeed, APM in the serverless world is more valuable than ever. See how AppDynamics can deliver a new wave of value to your existing and future cloud environments, including AWS Lambda.