Why Kubelet TLS Bootstrap in Kubernetes 1.12 is a Very Big Deal

Kubelet TLS Bootstrap, an exciting and highly-anticipated feature in Kubernetes 1.12, is graduating to general availability. As you know, the Kubernetes orchestration system provides such key benefits as service discovery, load balancing, rolling restarts, and the ability to maintain container counts by replacing failed containers. And by using Kubernetes-compliant extensions, you can seamlessly enhance system functionality. This is similar to how Istio (with Kubernetes) provides added benefits such as robust tracing/monitoring, traffic management, and so on.

Until now, however, Kubernetes did not provide similar automation for security best practices, such as mutually-authenticated TLS connections (mutual TLS, or mTLS). These connections enable developers to use simple certificate directives that limit nodes to communicating only with predetermined services—all without writing a single line of additional code. Even though the use of TLS 1.2 certificates for service-to-service communication is a known best practice, very few companies deploy their systems with mutual TLS. This lack of adoption is due mostly to the difficulty of creating and managing a public key infrastructure (PKI). This is why the new TLS Bootstrap module in Kubernetes 1.12 is so exciting: It provides features for adding authentication and authorization to each service at the application level.

The Power of mTLS

Mutual-TLS mandates that both the client and server must authenticate themselves by exchanging identities (certificates). mTLS is made possible by provisioning a TLS certificate to each Kubelet. The client and server use the TLS handshake protocol to negotiate and set up a secure encryption channel. As part of this negotiation, each party checks the validity of the other party’s certificate. Optionally, they can add more verification, such as authorization (the principle of least privilege). Hence, mTLS will provide added security to your application and data. Even if malicious software has taken over a container or host, it cannot connect to any service without providing a valid identity/authorization.

In addition, the Kubelet certificate rotation feature (currently in beta) provides an automated way to get a signed certificate from the cluster API server. The kubelet accepts an argument, --rotate-certificates, which controls whether it will automatically request a new certificate as the current one nears expiration. The kube-controller-manager process accepts the argument --experimental-cluster-signing-duration, which controls the length of time each certificate will be in use.
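For example, a hypothetical pair of invocations (flag names as of Kubernetes 1.12; the signing duration shown is illustrative):

kubelet --rotate-certificates=true
kube-controller-manager --experimental-cluster-signing-duration=1h0m0s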

When a kubelet starts up, it uses its initial certificate to connect to the Kubernetes API and issue a certificate-signing request. Upon approval (which can be automated with a few checks), the controller manager signs a certificate issued for a time period specified by the duration parameter. This certificate is then attached to the Certificate Signing Request. The kubelet uses an API call to retrieve the signed certificate, which it uses to connect to the Kubernetes API. As the current certificate nears expiration, the kubelet will use the same process described above to get a new certificate.
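If you approve requests by hand rather than automating the checks, a minimal sketch with standard kubectl commands (the CSR name is a placeholder):

kubectl get csr
kubectl certificate approve <csr-name>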

Since this process is fully automated, certificates can be created with a very short expiry time. For example, if the expiration time is one hour, even if a malicious agent gets hold of the certificate, the compromised certificate will still expire in an hour.

Robust Security and the Strength of APM

Mutual-TLS and automated certificate rotation give organizations robust security without having to spend heavily on firewalls or intrusion-detection services. mTLS is also the first step towards eliminating the distinction of trusted and non-trusted connections. In this new paradigm, connections coming from inside the firewall or corporate network are treated exactly the same as those from the internet. Every client must identify itself and receive authorization to access a resource, regardless of the originating host’s location. This approach safeguards resources, even if a host inside the corporate firewall is compromised.

AppDynamics fully supports mutually-authenticated TLS connections between its agents and the controller. Our agents running inside a container can communicate with the controller in much the same way as microservices connect to each other. In hybrid environments, where server authentication is available only for some agents and mutual authentication for others, it’s possible to set up and configure multiple HTTP listeners in GlassFish—one for server authentication only, another for both server and client authentication. The agent and controller connections can be configured to use the TLS 1.2 protocol as well.

See how AppDynamics can provide end-to-end, unified visibility into your Kubernetes environment!

Deploying AppDynamics Agents to OpenShift Using Init Containers

There are several ways to instrument an application on OpenShift with an AppDynamics application agent. The most straightforward way is to embed the agent into the main application image. (For more on this topic, read my blog Monitoring Kubernetes and OpenShift with AppDynamics.)

Let’s consider a Node.js app. All you need to do is add a require reference to the agent libraries and pass the necessary information about the controller. The reference itself becomes part of the app and will be embedded in the image. The list of variables the agent needs to communicate with the controller (e.g., controller host name, app/tier name, license key) can be embedded, though it is best practice to pass them into the app on initialization as configurable environment variables.

In the world of Kubernetes (K8s) and OpenShift, this task is accomplished with config maps and secrets. Config maps are reusable key-value stores that can be made accessible to one or more applications. Secrets are very similar to config maps, with the additional capability of obfuscating key values. When you create a secret, K8s automatically encodes the value of the key as a base64 string. Now the actual value is not visible, and you are protected from people looking over your shoulder. When the key is requested by the app, Kubernetes automatically decodes the value. Secrets can be used to store any sensitive data, such as license keys, passwords, and so on. In our example below, we use a secret to store the license key.

Here is an example of AppD instrumentation where the agent is embedded, and the configurable values are passed as environment variables by means of a configMap, a secret and the pod spec.

var appDobj = {
  controllerHostName: process.env['CONTROLLER_HOST'],
  controllerPort: process.env['CONTROLLER_PORT'],
  controllerSslEnabled: true,
  accountName: process.env['ACCOUNT_NAME'],
  accountAccessKey: process.env['ACCOUNT_ACCESS_KEY'],
  applicationName: process.env['APPLICATION_NAME'],
  tierName: process.env['TIER_NAME'],
  nodeName: 'process'
};
require("appdynamics").profile(appDobj);

Pod Spec
- env:
  - name: TIER_NAME
    value: MyAppTier
  - name: ACCOUNT_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        key: appd-key
        name: appd-secret
  envFrom:
  - configMapRef:
      name: controller-config

A ConfigMap with AppD variables.
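For illustration, such a ConfigMap might look like this minimal sketch (all values are hypothetical):

apiVersion: v1
kind: ConfigMap
metadata:
  name: controller-config
data:
  CONTROLLER_HOST: appd-controller.example.com
  CONTROLLER_PORT: "8090"
  CONTROLLER_SSL_ENABLED: "false"
  ACCOUNT_NAME: customer1
  APPLICATION_NAME: MyApp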

AppD license key stored as secret.
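Likewise, a minimal sketch of the secret (the base64 string below encodes a made-up key, not a real license):

apiVersion: v1
kind: Secret
metadata:
  name: appd-secret
type: Opaque
data:
  appd-key: c2VjcmV0LWxpY2Vuc2Uta2V5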

The Init Container Route: Best Practice

The straightforward way is not always the best. Application developers may want to avoid embedding a “foreign object” into the app images for a number of good reasons—for example, image size, granularity of testing, or encapsulation. Being developers ourselves, we respect that and offer an alternative, a less intrusive way of instrumentation. The Kubernetes way.

An init container is a design feature in Kubernetes that allows decoupling of app logic from any type of initialization routine, such as monitoring, in our case. While the main app container lives for the entire duration of the pod, the lifespan of the init container is much shorter. The init container does the required prep work before orchestration of the main container begins. Once the initialization is complete, the init container exits and the main container is started. This way, the init container does not run in parallel to the main container as, for example, a sidecar container would. However, like a sidecar container, the init container, while still active, has access to the ephemeral storage of the pod.

We use this ability to share storage between the init container and the main container to inject the AppDynamics agent into the app. Our init container image, in its simplest form, can be described with this Dockerfile:

FROM openjdk:8-jdk-alpine
RUN apk add --no-cache bash gawk sed grep bc coreutils
RUN mkdir -p /sharedFiles/AppServerAgent
ADD AppServerAgent.zip /sharedFiles/
RUN unzip /sharedFiles/AppServerAgent.zip -d /sharedFiles/AppServerAgent/
CMD ["tail", "-f", "/dev/null"]

The above example assumes you have already downloaded the archive with the AppDynamics app agent binaries locally. When the image is built, the binaries are unzipped into a new directory. To the pod spec, we then add a directive that copies the directory with the agent binaries to a shared volume on the pod:

spec:
  initContainers:
  - name: agent-repo
    image: agent-repo:x.x.x
    imagePullPolicy: IfNotPresent
    command: ["cp", "-r", "/sharedFiles/AppServerAgent", "/mountPath/AppServerAgent"]
    volumeMounts:
    - mountPath: /mountPath
      name: shared-files
  volumes:
  - name: shared-files
    emptyDir: {}
  serviceAccountName: my-account

After the init container exits, the AppDynamics agent binaries are waiting on the pod’s shared volume, ready to be picked up by the application.

Let’s assume we are deploying a Java app, one normally initialized via a script that calls the java command with Java options. The script, startup.sh, may look like this:

# startup.sh
JAVA_OPTS="$JAVA_OPTS -Dappdynamics.agent.tierName=$TIER_NAME"
JAVA_OPTS="$JAVA_OPTS -Dappdynamics.agent.reuse.nodeName=true -Dappdynamics.agent.reuse.nodeName.prefix=$TIER_NAME"
JAVA_OPTS="$JAVA_OPTS -javaagent:/sharedFiles/AppServerAgent/javaagent.jar"
JAVA_OPTS="$JAVA_OPTS -Dappdynamics.controller.hostName=$CONTROLLER_HOST -Dappdynamics.controller.port=$CONTROLLER_PORT -Dappdynamics.controller.ssl.enabled=$CONTROLLER_SSL_ENABLED"
JAVA_OPTS="$JAVA_OPTS -Dappdynamics.agent.accountName=$ACCOUNT_NAME -Dappdynamics.agent.accountAccessKey=$ACCOUNT_ACCESS_KEY -Dappdynamics.agent.applicationName=$APPLICATION_NAME"
JAVA_OPTS="$JAVA_OPTS -Dappdynamics.socket.collection.bci.enable=true"
JAVA_OPTS="$JAVA_OPTS -Xms64m -Xmx512m -XX:MaxPermSize=256m -Djava.net.preferIPv4Stack=true"
JAVA_OPTS="$JAVA_OPTS -Djava.security.egd=file:/dev/./urandom"

java $JAVA_OPTS -jar myapp.jar

It is embedded into the image and invoked via Docker’s ENTRYPOINT directive when the container starts.

FROM openjdk:8-jdk-alpine
COPY startup.sh startup.sh
RUN chmod +x startup.sh
ADD myapp.jar /usr/src/myapp.jar
EXPOSE 8080
ENTRYPOINT ["/bin/sh", "startup.sh"]

To make the consumption of startup.sh more flexible and Kubernetes-friendly, we can trim it down to this:

#a more flexible startup.sh
java $JAVA_OPTS -jar myapp.jar

And declare all the necessary Java options in the spec as a single environment variable.

containers:
- name: my-app
  image: my-app-image:x.x.x
  imagePullPolicy: IfNotPresent
  securityContext:
    privileged: true
  envFrom:
  - configMapRef:
      name: controller-config
  env:
  - name: ACCOUNT_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        key: appd-key
        name: appd-secret
  - name: JAVA_OPTS
    value: "-javaagent:/sharedFiles/AppServerAgent/javaagent.jar
      -Dappdynamics.agent.accountName=$(ACCOUNT_NAME)
      -Dappdynamics.agent.accountAccessKey=$(ACCOUNT_ACCESS_KEY)
      -Dappdynamics.controller.hostName=$(CONTROLLER_HOST)
      -Xms64m -Xmx512m -XX:MaxPermSize=256m
      -Djava.net.preferIPv4Stack=true
      …"
  ports:
  - containerPort: 8080
  volumeMounts:
  - mountPath: /sharedFiles
    name: shared-files

The dynamic values for the Java options are populated from the ConfigMap. First, we reference the entire configMap, where all shared values are defined:

envFrom:
- configMapRef:
    name: controller-config

We also reference our secret as a separate environment variable. Then, using the $() notation, we reference the individual variables in order to concatenate the value of the JAVA_OPTS variable.

Thanks to these Kubernetes features (init containers, configMaps, secrets), we can add AppDynamics monitoring into an existing app in a noninvasive way, without the need to rebuild the image.

This approach has multiple benefits. The app image remains unchanged in terms of size and encapsulation. From a Kubernetes perspective, no extra processing is added, as the init container exits before the main container starts. There is added flexibility in what can be passed into the application initialization routine without the need to modify the image.

Note that OpenShift does not allow running Docker containers as user root by default. If you must (for whatever good reason), add the service account you use for deployments to the anyuid SCC. Assuming your service account is my-account, as in the provided examples, run this command:

oc adm policy add-scc-to-user anyuid -z my-account

Here’s an example of a complete app spec with AppD instrumentation:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: my-app
    spec:
      initContainers:
      - name: agent-repo
        image: agent-repo:x.x.x
        imagePullPolicy: IfNotPresent
        command: ["cp", "-r", "/sharedFiles/AppServerAgent", "/mountPath/AppServerAgent"]
        volumeMounts:
        - mountPath: /mountPath
          name: shared-files
      volumes:
      - name: shared-files
        emptyDir: {}
      serviceAccountName: my-account
      containers:
      - name: my-app
        image: my-service
        imagePullPolicy: IfNotPresent
        envFrom:
        - configMapRef:
            name: controller-config
        env:
        - name: TIER_NAME
          value: WebTier
        - name: ACCOUNT_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: appd-key
              name: appd-key-secret
        - name: JAVA_OPTS
          value: "-javaagent:/sharedFiles/AppServerAgent/javaagent.jar
            -Dappdynamics.agent.accountName=$(ACCOUNT_NAME)
            -Dappdynamics.agent.accountAccessKey=$(ACCOUNT_ACCESS_KEY)
            -Dappdynamics.controller.hostName=$(CONTROLLER_HOST)
            -Xms64m -Xmx512m -XX:MaxPermSize=256m
            -Djava.net.preferIPv4Stack=true
            …"
        ports:
        - containerPort: 8080
        volumeMounts:
        - mountPath: /sharedFiles
          name: shared-files
      restartPolicy: Always

Learn more about how AppDynamics can help monitor your applications on Kubernetes and OpenShift.

Monitoring Kubernetes and OpenShift with AppDynamics

Here at AppDynamics, we build applications for both external and internal consumption. We’re always innovating to make our development and deployment process more efficient. We refactor apps to get the benefits of a microservices architecture, to develop and test faster without stepping on each other, and to fully leverage containerization.

Like many other organizations, we are embracing Kubernetes as a deployment platform. We use both upstream Kubernetes and OpenShift, an enterprise Kubernetes distribution on steroids. The Kubernetes framework is very powerful. It allows massive deployments at scale, simplifies new version rollouts and multi-variant testing, and offers many levers to fine-tune the development and deployment process.

At the same time, this flexibility makes Kubernetes complex in terms of setup, monitoring and maintenance at scale. Each of the Kubernetes core components (api-server, kube-controller-manager, kubelet, kube-scheduler) has quite a few flags that govern how the cluster behaves and performs. The default values may be OK initially for smaller clusters, but as deployments scale up, some adjustments must be made. We have learned to keep these values in mind when monitoring OpenShift clusters—both from our own pain and from published accounts of other community members who have experienced their own hair-pulling discoveries.

It should come as no surprise that we use our own tools to monitor our apps, including those deployed to OpenShift clusters. Kubernetes is just another layer of infrastructure. Along with the server and network visibility data, we are now incorporating Kubernetes and OpenShift metrics into the bigger monitoring picture.

In this blog, we will share what we monitor in OpenShift clusters and give suggestions as to how our strategy might be relevant to your own environments. (For more hands-on advice, read my blog Deploying AppDynamics Agents to OpenShift Using Init Containers.)

OpenShift Cluster Monitoring

For OpenShift cluster monitoring, we use two plug-ins that can be deployed with our standalone machine agent. AppDynamics’ Kubernetes Events Extension, described in our blog on monitoring Kubernetes events, tracks every event in the cluster. Kubernetes Snapshot Extension captures attributes of various cluster resources and publishes them to the AppDynamics Events API. The snapshot extension collects data on all deployments, pods, replica sets, daemon sets and service endpoints. It captures the full extent of the available attributes, including metadata, spec details, metrics and state. Both extensions use the Kubernetes API to retrieve the data, and can be configured to run at desired intervals.

The data these plug-ins provide ends up in our analytics data repository and instantly becomes available for mining, reporting, baselining and visualization. The data retention period is at least 90 days, which offers ample time to go back and perform an exhaustive root cause analysis (RCA). It also allows you to reduce the retention interval of events in the cluster itself. (By default, this is set to one hour.)
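The event retention period in the cluster is an API server setting; as a sketch, the upstream flag with its default value looks like this:

kube-apiserver --event-ttl=1h0m0s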

We use the collected data to build dynamic baselines, set up health rules and create alerts. The health rules, baselines and aggregate data points can then be displayed on custom dashboards where operators can see the norms and easily spot any deviations.

An example of a customizable Kubernetes dashboard.

What We Monitor and Why

Cluster Nodes

At the foundational level, we want monitoring operators to keep an eye on the health of the nodes where the cluster is deployed. Typically, you would have a cluster of masters, where core Kubernetes components (api-server, controller-manager, kube-scheduler, etc.) are deployed, as well as a highly available etcd cluster and a number of worker nodes for guest applications. To paint a complete picture, we combine infrastructure health metrics with the relevant cluster data gathered by our Kubernetes data collectors.

From an infrastructure point of view, we track CPU, memory and disk utilization on all the nodes, and also zoom into the network traffic on etcd. In order to spot bottlenecks, we look at various aspects of the traffic at a granular level (e.g., reads/writes and throughput). Kubernetes and OpenShift clusters may suffer from memory starvation, disks overfilled with logs or spikes in consumption of the API server and, consequently, the etcd. Ironically, it is often monitoring solutions that are known for bringing clusters down by pulling excessive amounts of information from the Kubernetes APIs. It is always a good idea to establish how much monitoring is enough and dial it up when necessary to diagnose issues further. If a high level of monitoring is warranted, you may need to add more masters and etcd nodes. Another useful technique, especially with large-scale implementations, is to have a separate etcd cluster just for storing Kubernetes events. This way, the spikes in event creation and event retrieval for monitoring purposes won’t affect performance of the main etcd instances. This can be accomplished by setting the --etcd-servers-overrides flag of the api-server, for example:

--etcd-servers-overrides=/events#https://etcd1.cluster.com:2379;https://etcd2.cluster.com:2379;https://etcd3.cluster.com:2379

From the cluster perspective we monitor resource utilization across the nodes that allow pod scheduling. We also keep track of the pod counts and visualize how many pods are deployed to each node and how many of them are bad (failed/evicted).

A dashboard widget with infrastructure and cluster metrics combined.

Why is this important? Kubelet, the component responsible for managing pods on a given node, has a setting, --max-pods, which determines the maximum number of pods that can be orchestrated. In Kubernetes the default is 110. In OpenShift it is 250. The value can be changed up or down depending on need. We like to visualize the remaining headroom on each node, which helps with proactive resource planning and to prevent sudden overflows (which could mean an outage). Another data point we add there is the number of evicted pods per node.
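A hypothetical invocation raising the cap (the value shown is illustrative):

kubelet --max-pods=250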

Pod Evictions

Evictions are caused by space or memory starvation. We recently had an issue with the disk space on one of our worker nodes due to a runaway log. As a result, the kubelet produced massive evictions of pods from that node. Evictions are bad for many reasons. They will typically affect the quality of service or may even cause an outage. If the evicted pods have an exclusive affinity with the node experiencing disk pressure, and as a result cannot be re-orchestrated elsewhere in the cluster, the evictions will result in an outage. Evictions of core component pods may lead to the meltdown of the cluster.

Long after the incident where pods were evicted, we saw the evicted pods were still lingering. Why was that? Garbage collection of evictions is controlled by a setting in kube-controller-manager called --terminated-pod-gc-threshold. The default value is set to 12,500, which means that garbage collection won’t occur until you have that many evicted pods. Even in a large implementation it may be a good idea to dial this threshold down to a smaller number.
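For example, to start cleaning up after 100 terminated pods (an illustrative value, not a recommendation):

kube-controller-manager --terminated-pod-gc-threshold=100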

If you experience a lot of evictions, you may also want to check if kube-scheduler has a custom --policy-config-file defined with no CheckNodeMemoryPressure or CheckNodeDiskPressure predicates.

Following our recent incident, we set up a new dashboard widget that tracks a metric of any threats that may cause a cluster meltdown (e.g., massive evictions). We also associated a health rule with this metric and set up an alert. Specifically, we’re now looking for warning events that tell us when a node is about to experience memory or disk pressure, or when a pod cannot be reallocated (e.g., NodeHasDiskPressure, NodeHasMemoryPressure, ErrorReconciliationRetryTimeout, ExceededGracePeriod, EvictionThresholdMet).

We also look for daemon pod failures (FailedDaemonPod), as they are often associated with cluster health rather than issues with the daemon set app itself.

Pod Issues

Pod crashes are an obvious target for monitoring, but we are also interested in tracking pod kills. Why would someone be killing a pod? There may be good reasons for it, but it may also signal a problem with the application. For similar reasons, we track deployment scale-downs, which we do by inspecting ScalingReplicaSet events. We also like to visualize the scale-down trend along with the app health state. Scale-downs, for example, may happen by design through auto-scaling when the app load subsides. They may also be issued manually or in error, and can expose the application to an excessive load.

Pending state is supposed to be a relatively short stage in the lifecycle of a pod, but sometimes it isn’t. It may be a good idea to track pods with a pending time that exceeds a certain reasonable threshold—one minute, for example. In AppDynamics, we also have the luxury of baselining any metric and then tracking any configurable deviation from the baseline. If you catch a spike in pending state duration, the first thing to check is the size of your images and the speed of image download. One big image may clog the pipe and affect other containers. Kubelet has a flag, --serialize-image-pulls, which is set to "true" by default, meaning images will be loaded one at a time. Change the flag to "false" if you want to load images in parallel and avoid the potential clogging by a monster-sized image. Keep in mind, however, that you have to use Docker’s overlay2 storage driver to make this work. In newer Docker versions this setting is the default. In addition to the kubelet setting, you may also need to tweak the --max-concurrent-downloads flag of the Docker daemon to ensure the desired parallelism.

Large images that take a long time to download may also cause a different type of issue that results in a failed deployment. The Kubelet flag --image-pull-progress-deadline determines the point in time when the image will be deemed “too long to pull or extract.” If you deal with big images, make sure you dial up the value of the flag to fit your needs.
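Put together, a hypothetical configuration for parallel pulls and a longer pull deadline (all values illustrative):

kubelet --serialize-image-pulls=false --image-pull-progress-deadline=5m
dockerd --max-concurrent-downloads=10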

User Errors

Many big issues in the cluster stem from small user errors (human mistakes). A typo in a spec—for example, in the image name—may bring down the entire deployment. Similar effects may occur due to a missing image or insufficient rights to the registry. With that in mind, we track image errors closely and pay attention to excessive image-pulling. Unless it is truly needed, image-pulling is something you want to avoid in order to conserve bandwidth and speed up deployments.

Storage issues also tend to arise due to spec errors, lack of permissions or policy conflicts. We monitor storage issues (e.g., mounting problems) because they may cause crashes. We also pay close attention to resource quota violations because they do not trigger pod failures. They will, however, prevent new deployments from starting and existing deployments from scaling up.

Speaking of quota violations, are you setting resource limits in your deployment specs?

Policing the Cluster

On our OpenShift dashboards, we display a list of potential red flags that are not necessarily a problem yet but may cause serious issues down the road. Among these are pods without resource limits or health probes in the deployment specs.

Resource limits can be enforced by resource quotas across the entire cluster or at a more granular level. Violation of these limits will prevent the deployment. In the absence of a quota, pods can be deployed without defined resource limits. Having no resource limits is bad for multiple reasons. It makes cluster capacity planning challenging. It may also cause an outage. If you create or change a resource quota when there are active pods without limits, any subsequent scale-up or redeployment of these pods will result in failures.
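As a reference point, a minimal resources block in a container spec might look like this sketch (the numbers are illustrative):

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi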

The health probes, readiness and liveness, are not enforceable, but it is a best practice to have them defined in the specs. They are the primary mechanism for the pods to tell the kubelet whether the application is ready to accept traffic and is still functioning. If the readiness probe is not defined and the pod takes a long time to initialize (based on the kubelet’s default), the pod will be restarted. This loop may continue for some time, taking up cluster resources for no reason and effectively causing a poor user experience or outage.

The absence of the liveness probe may cause a similar effect if the application is performing a lengthy operation and the pod appears to Kubelet as unresponsive.
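A minimal sketch of both probes for a hypothetical HTTP service listening on port 8080 (paths and timings are illustrative):

readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10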

We provide easy access to the list of pods with incomplete specs, allowing cluster admins to have a targeted conversation with development teams about corrective action.

Routing and Endpoint Tracking

As part of our OpenShift monitoring, we provide visibility into potential routing and service endpoint issues. We track unused services, including those created by someone in error and those without any pods behind them because the pods failed or were removed.

We also monitor bad endpoints pointing at old (deleted) pods, which effectively cause downtime. This issue may occur during rolling updates when the cluster is under increased load and API request-throttling is lower than it needs to be. To resolve the issue, you may need to increase the --kube-api-burst and --kube-api-qps config values of kube-controller-manager.
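For example (illustrative values; the upstream defaults are 30 and 20, respectively):

kube-controller-manager --kube-api-burst=60 --kube-api-qps=40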

Every metric we expose on the dashboard can be viewed and analyzed in list form and further refined with ADQL, the AppDynamics query language. After spotting an anomaly on the dashboard, the operator can drill into the raw data to get to the root cause of the problem.

Application Monitoring

Context plays a significant role in our monitoring philosophy. We always look at application performance through the lens of the end-user experience and desired business outcomes. Unlike specialized cluster-monitoring tools, we are not only interested in cluster health and uptime per se. We’re equally concerned with the impact the cluster may have on application health and, subsequently, on the business objectives of the app.

In addition to having a cluster-level dashboard, we also build specialized dashboards with a more application-centric point of view. There we correlate cluster events and anomalies with application or component availability, end-user experience as reported by real-user monitoring, and business metrics (e.g., conversion of specific user segments).

Leveraging K8s Metadata

Kubernetes makes it super easy to run canary deployments, blue-green deployments, and A/B or multivariate testing. We leverage these conveniences by pulling deployment metadata and using labels to analyze performance of different versions side by side.

Monitoring Kubernetes or OpenShift is just a part of what AppDynamics does for our internal needs and for our clients. AppDynamics covers the entire spectrum of end-to-end monitoring, from the foundational infrastructure to business intelligence. Inherently, AppDynamics is used by many different groups of operators who may have very different skills. For example, we look at the platform as a collaboration tool that helps translate the language of APM to the language of Kubernetes and vice versa.

By bringing these different datasets together under one umbrella, AppDynamics establishes a common ground for diverse groups of operators. On the one hand you have cluster admins, who are experts in Kubernetes but may not know the guest applications in detail. On the other hand, you have DevOps in charge of APM or managers looking at business metrics, both of whom may not be intimately familiar with Kubernetes. These groups can now have a productive monitoring conversation, using terms that are well understood by everyone and a single tool to examine data points on a shared dashboard.

Learn more about how AppDynamics can help you monitor your applications on Kubernetes and OpenShift.

The Serverless Revolution: Why and How The Movement Will Allow Teams to Deploy With More Velocity and Confidence

The serverless movement received another jolt of momentum with Google’s recent release of KNative. Leveraging some of the hottest technologies, including Kubernetes and Istio, KNative raises the bar of serverless by supporting a wide set of platforms, including Pivotal Cloud Foundry and Red Hat OpenShift. With such a set of heavy hitters, if you are not onboard with serverless, you’ll be left behind, right?

The Rise of Serverless

When building a new application, adding features, or preparing for a spike in usage, capacity planning is an important part of the application lifecycle. Part science and part art, determining infrastructure needs while negotiating with stakeholders is not a frivolous activity. From a development standpoint, software engineers strive to make their solutions as efficient as possible, using code reviews, code-coverage tools and APM solutions to profile their applications so they consume the fewest resources.

Building the algorithms/functions is usually just half of the equation for a software engineer. The other half is navigating the application infrastructure. In the Java world, tuning the application server/Java runtime is not uncommon. Having an understanding of what your application is doing is just as important as where/what is running the application.

The trouble with the where/what half is that running your application takes time away from the core innovation work that software engineers strive for. Imagine a paradigm where you can just focus on the function or business logic without having to worry too much about where your application is running, or even about scaling the function. If that sounds appealing, you are not alone; thus, the serverless boom is underway.

The Mighty Lambda

For those beginning their serverless journey, a popular starting point is AWS Lambda. Amazon introduced its Lambda service in November 2014. The premise is that a function (in this case, a Lambda function) is triggered by an event; this is an event-driven architecture. In Amazon’s case, the underlying compute is managed by an AWS service.

Your First Lambda

Amazon has good documentation on creating your first Lambda function. Another source I used to get up to speed is FreeCodeCamp, which has a detailed step-by-step on making a NodeJS Lambda to determine if a string is a palindrome.

The ecosystem pieces of a Lambda can be broken down as follows:

  1. Trigger/invoke the Lambda: An event needs to trigger a Lambda. AWS provides an SDK in several languages to invoke a Lambda. Infrastructure can be placed in front of the Lambda—for example, a message broker to queue up events to be processed.
  2. Scale/react to Lambda demand: As demand for the function increases, the Lambda has the ability to scale elastically. Understanding how a Lambda runs concurrently, as well as the pricing implications, is an important consideration.
  3. Output—what your Lambda is all about: Like any event-driven architecture, a response or another event is expected. Output can be written to a downstream system such as ElastiCache, or returned as a simple response. Logging becomes even more important with Lambdas, because the application infrastructure logging that engineers are used to on-prem is not there. In the case of a Java Lambda, configuring Log4j from an engineering perspective is not terribly different than in the non-serverless world.

So Why Aren’t Our Servers at the Bottom of a Lake?

Because serverless, like every other technology architecture, has its pros and cons. Using a cloud vendor’s function service can lead to rough edges around portability and observability. The old adage, “The cloud is just someone else’s computer,” is sage advice, especially when it comes to serverless. After all, your function still has to run somewhere.

Portability

After the success of Amazon’s Lambda, other major cloud vendors have rolled out their renditions of function services, including Google Cloud and Azure. One major drawback of investing too heavily into one cloud service SDK is the increased difficulty of running the service somewhere else (i.e., lock-in). One might argue that “Java is Java” and “I did not import anything from com.amazonaws, so pound sand, author.” Salient and good points, but don’t forget the surrounding ecosystem to trigger, scale, and monitor the Lambda. To achieve greater portability, imagine having three sets of deployment scripts, one per major public cloud vendor, with tests/mocks for each and every time there is change.

Observability

One of the biggest objections to serverless focuses on observability. What is observability? Given an output, how well can you determine how the system performed? Going back to FreeCodeCamp’s palindrome example: if a string is a palindrome, how efficient was the system in determining that? As systems become more distributed, we start to run into challenges with observability. Serverless.com has a pretty good roundup of observability tools for the public cloud (or private cloud, in some cases).

The ability to instrument/profile your application infrastructure with the same level of control provided by an on-prem deployment might not be possible on a cloud vendor with multiple tenants. Taking the observability point a step further, there can be inherent challenges with observability; for instance, trying to paint a picture with just logs. We’ve written extensively on AppDynamics’ latest thinking on serverless and Lambda monitoring as well.

Not Only the Public Cloud

Functions are not limited to cloud providers. The enterprise can host its own function-as-a-service infrastructure to give its internal clients the ability to leverage functions. Apache OpenWhisk and OpenFaaS are popular alternatives to cloud vendor implementations. OpenWhisk can be deployed via Kubernetes and Mesos, too.

KNative, the Holy Grail

There certainly has been a lot of buzz around KNative. (Read Google’s initial blog to see how they believe KNative will change the face of serverless computing.) Compared to other serverless implementations, Google is exposing more of how the sausage is made. By using Kubernetes as the orchestrator and the Istio service mesh as the underpinnings of KNative, Google is addressing key concerns involving portability and observability in serverless environments.

How AppDynamics Intersects with KNative

AppDynamics already provides strong integration with Kubernetes-based workloads. And with PaaS providers starting to embrace KNative, AppDynamics has existing integrations with PaaS vendors, too.

All of the potential investment in KNative will allow development teams to deploy with increased velocity and confidence. The ability to have a clear understanding of business/application performance will be crucial for the continued growth of serverless.

(Above: The AppDynamics platform comparing a canary or dark feature release to a Kubernetes-orchestrated service.)

Serverless Revolution Continues

With KNative, the needle of serverless moves further towards enterprise legitimacy. As the project starts to garner more attention, the enterprise will take a closer look at serverless and how to address its biggest hurdles around portability and observability.

I’m a big fan of the “Awesome” lists on GitHub. Check out the Awesome Serverless list, where you can see the advances in the serverless revolution.

As technology marches toward application nirvana, where organizations don’t have to worry about the complexities of scaling applications, serverless will be an important part of the equation. Look to AppDynamics to help you navigate this exciting new world!

Advances In Mesh Technology Make It Easier for the Enterprise to Embrace Containers and Microservices

More enterprises are embracing containers and microservices, which bring along additional networking complexities. So it’s no surprise that service meshes are in the spotlight now. There have been substantial advances recently in service mesh technologies—including Istio 1.0, HashiCorp’s Consul 1.2.1, and Buoyant merging Conduit into Linkerd—and for good reason.

Some background: service meshes are pieces of infrastructure that facilitate service-to-service communication—the backbone of all modern applications. A service mesh allows for codifying more complex networking rules and behaviors such as a circuit breaker pattern. AppDev teams can start to rely on service mesh facilities, and rest assured their applications will perform in a consistent, code-defined manner.

Endpoint Bloom

The more services and replicas you have, the more endpoints you have. And with the container and microservices boom, the number of endpoints is exploding. With the rise of Platform-as-a-Service offerings and container orchestrators, new terms like ingress and egress are becoming part of the AppDev team vernacular. As you go through your containerization journey, multiple questions will arise around the topic of connectivity. Application owners will have to define how and where their services are exposed.

The days of providing the networking team with a context/VIP to add to web infrastructure—such as services.acme.com/shoppingCart over port 443—are fading. Today, AppDev teams are more likely to hand over a Kubernetes YAML to add services.acme.com/shoppingCart to the Ingress controller, and then describe a behavior. Example: the shopping cart Pod needs to talk to the shopping cart validation Pod, which can only be accessed by the shopping cart because the inventory is kept on another set of Redis Pods, which can’t be exposed to the outside world.

You’re juggling all of this while navigating constraints set by defined and deployed Kubernetes networking. At this point, don’t be alarmed if you’re thinking, “Wow, I thought I was in AppDev—didn’t know I needed a CCNA to get my application deployed!”

The Rise of the Service Mesh

When navigating the “fog of system development,” it’s tricky to know all the moving pieces and connectivity options. With AppDev teams focusing mostly on feature development rather than connectivity, it’s very important to make sure all the services are discoverable to them. Investments in API management are the norm now, with teams registering and representing their services in an API gateway or documenting them in Swagger, for example.

But what about the underlying networking stack? Services might be discoverable, but are they available? Imagine a Venn diagram of AppDev vs. Sys Engineer vs. SRE: Who’s responsible for which task? And with multiple pieces of infrastructure to traverse, what would be a consistent way to describe networking patterns between services?

Service Mesh to the Rescue

Going back to the endpoint bloom, consistency and predictability are king. Over the past few years, service meshes have been maturing and gaining popularity. Here are some great places to learn more about them:

Service Mesh 101

In the Istio model, applications participate in a service mesh. Istio acts as the mesh, and then applications can participate in the mesh via a sidecar proxy—Envoy, in Istio’s case.

Your First Mesh

DZone has a very well-written article about standing up your first Java application in Kubernetes to participate in an Istio-powered service mesh. The article goes into detail about deploying Istio itself in Kubernetes (in this case, Minikube). For an AppDev team, the new piece would be creating the all-important routing rules, which are deployed to Istio.

Which One of these Meshes?

The New Stack has a very good article comparing the pros and cons of the major service mesh providers. The post lays out the problem in granular format, and discusses which factors you should consider to determine if your organization is even ready for a service mesh.

Increasing Importance of AppDynamics

With the advent of the service mesh, barriers are falling, enabling services to communicate more consistently, especially in production environments.

If tweaks are needed on the routing rules—for example, a time out—it’s best to have the ability to pinpoint which remote calls would make the most sense for this task. AppDynamics has the ability to examine service endpoints, which can provide much-needed data for these tweaks.

For the service mesh itself, AppDynamics in Kubernetes can even monitor the health of your applications deployed on a Kubernetes cluster.

With the rising velocity of new applications being created or broken into smaller pieces, AppDynamics can help make sure all of these components are humming at their optimal frequency.

Monitor Amazon EKS with AppDynamics

On the heels of announcing the general availability of AppDynamics for Kubernetes at KubeCon Europe, we’ve partnered with Amazon Web Services (AWS) to bring Amazon EKS to the broader Kubernetes community. AppDynamics provides enterprise-grade, end-to-end performance monitoring for applications orchestrated by Kubernetes.

Amazon EKS, AWS’s managed Kubernetes service, shoulders the heavy lifting of installing and operating your own Kubernetes clusters. Beyond the operational agility and simplicity in managing Kubernetes clusters, Amazon EKS brings additional value to enterprises, including the following:

1. Choice and Portability: Built on open source and upstream Kubernetes, EKS passes CNCF’s conformance tests, enabling enterprises to run applications confidently on EKS without having to make changes to the app or learn new Kubernetes tooling. You can choose where to run applications on various Kubernetes deployment venues—on-premises, AWS clusters managed with kops, Amazon EKS, or any other cloud provider.

2. High Availability: EKS deploys the control plane in at least two availability zones, monitors the health of the master nodes, and re-instantiates the master nodes, if needed, automatically. Additionally, it patches and updates Kubernetes versions.

3. Network Isolation and Performance: Worker nodes run in the subnets within your VPC, giving you control over network isolation via security groups.

Amazon EKS brings VPC networking to Kubernetes pods and removes the burden of running and managing an overlay networking fabric. The CNI plugin runs as a DaemonSet on every node, and allocates an IP address to every pod from the pool of secondary IP addresses attached to the elastic network interface (ENI) of the worker node instance. Communication between the control plane and worker nodes occurs over the AWS networking backbone, resulting in better performance and security.

Monitoring Amazon EKS with AppDynamics

EKS makes it easier to operate Kubernetes clusters; however, performance monitoring remains one of the top challenges in Kubernetes adoption. In fact, according to a recent CNCF survey, 46% of enterprises reported monitoring as their biggest challenge. Specifically, organizations deploying containers on the public cloud cite monitoring as a big challenge, perhaps because cloud providers’ monitoring tools may not play well with the existing tools organizations use to monitor on-premises resources.

We are therefore excited that AppDynamics and AWS have teamed up to accelerate your EKS adoption.

How Does it Work?

AppDynamics seamlessly integrates into EKS environments. The machine agent runs as a DaemonSet on EKS worker nodes, and application agents are deployed alongside your application binaries within the application pods. Out-of-the-box integration gives you the deepest visibility into EKS cluster health, AWS resources and Docker containers, and provides insights into the performance of every microservice deployed—all through a single pane of glass.

 

Unified, end-to-end monitoring helps AppDynamics’ customers expedite root-cause analysis, reduce MTTR, and confidently adopt modern application architectures such as microservices. AppDynamics provides a consistent approach to monitoring applications orchestrated by Kubernetes regardless of where the clusters are deployed (on Amazon EKS or on-premises), enabling enterprises to leverage their existing people, processes and tools.

Correlate Kubernetes performance with business metrics: For deeper visibility into business performance, organizations can create tagged metrics, such as customer conversion rate or revenue per channel correlated with the performance of applications on the Kubernetes platform. Health rules and alerts based on business metrics provide intelligent validation so that every code release can drive business outcomes.

Get Started Today

To get started with enterprise-grade monitoring of EKS, follow these easy steps:

1. Sign up for a free AppDynamics trial and configure the environment’s ConfigMap. Sample configuration and instructions are available on our GitHub page.

2. Create the EKS cluster and worker nodes, and configure kubectl with the EKS control plane endpoint. Deploy your Kubernetes services and deployments.

3. Start end-to-end performance monitoring with AppDynamics!

The AppD Approach: Monitoring Kubernetes Events

Just recently we launched AppDynamics for Kubernetes, giving enterprises end-to-end, unified visibility into their entire Kubernetes stack and Kubernetes-orchestrated applications for both on-prem and public cloud environments. Our industry-leading APM provides visibility into Kubernetes by leveraging labels such as Namespace, Pod or ReplicaSet. And AppDynamics customers can organize, group, query or filter Kubernetes objects or performance metrics based on labels.

Of course, we’re always finding ways to make things better. As a preview of what’s to come, we’re now offering the AppDynamics Kubernetes Events Monitor Extension, which we plan to incorporate into future builds of our core solution.

Events Monitoring

In our 4.4.3 release, built-in Kubernetes capabilities focus on monitoring the containers and applications that run on top of Kubernetes. This new extension adds the additional capability of monitoring metrics provided by the Kubernetes Events API.

By monitoring these events, our extension enables enterprises to troubleshoot everything that goes wrong in the Kubernetes orchestration platform—scaling up or down, new deployments, deleting applications, creating new applications, and so on. If an event goes to a warning state, users can drill down into the warning to see where it occurred, making troubleshooting easier.

How It Works

Kubernetes usually stores events for a certain amount of time, which by default isn’t very long. After an hour, in fact, the events get purged. The AppDynamics Machine Agent, in addition to reporting on basic hardware metrics (CPU, memory, disk, etc.), is the hook for custom extensions, including our new Kubernetes Events Monitor Extension.

It’s fairly easy to install our new extension. You’ll find detailed instructions here, but here’s a quick overview:

  • Deploy the AppDynamics Machine Agent as you normally would, and then add the Kubernetes Events Monitor Extension to it. If you’re deploying the Machine Agent using Docker, as a Kubernetes daemonset, simply add the extension to the container.

  • Configuration is simple. The extension just needs to know how to connect to the Kubernetes Cluster (from your kubectl client config), as well as your credentials for logging into the AppDynamics platform.

  • Once setup is complete, you’ll be able to push Kubernetes events to AppDynamics.

Once configured, the Kubernetes Events Monitor Extension will query Kubernetes events every minute.
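What the extension retrieves is essentially the same event stream you would see with a standard kubectl query, shown here for illustration:

kubectl get events --all-namespaces --sort-by=.metadata.creationTimestamp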

The extension tracks all events happening in Kubernetes, including time-stamping information and messages. Below is a sample dashboard:

Here’s a closer view:

The Event Details view provides more information. Below, the Message field shows that Kubernetes tried to attach a volume to a running deployment, but couldn’t mount it due to a timeout.

The Events Monitor Extension also shows the Kubernetes Namespace—important information for locating where the event occurred (e.g., the specific host and component) in the Kubernetes cluster.

Like all AppDynamics extensions, the Kubernetes Events Monitor Extension is user-configurable. For example, with a few simple edits to the extension configuration file, you can change the default one-minute query time for Kubernetes events, the duration of timeouts, and other settings.

Seamless Integration with Business iQ

The Kubernetes extension takes full advantage of the Business iQ real-time performance monitoring toolkit, allowing you to create metrics, visualizations and alarms. You can also use Business iQ to analyze a transaction in conjunction with Kubernetes events.

Below are some sample visualizations:

Our new extension adds the powerful capability of monitoring Kubernetes events to our industry-leading AppDynamics for Kubernetes solution. Get started today!

Future features and functionality are subject to change at the sole discretion of AppDynamics, and AppDynamics will have no liability for delay in the delivery or failure to deliver any of the features and functionality set forth in this document.

Migrating from Docker Compose to Kubernetes

The AppDynamics Demo Platform never sleeps. It is a cloud-based system that hosts a number of applications designed to help our global sales team demonstrate the many value propositions of AppDynamics.

Last fall, we added several new, larger applications to our demo platform. With these additions, our team started to see some performance challenges with our standard Docker Compose application deployment model on a single host. Specifically, we wanted to support multiple host machines as opposed to being limited to a single host machine like Docker Compose. We had been talking about migrating to Kubernetes for several months before this and so we knew it was time to take the leap.

Before this I had extensive experience with dockerized applications and even with some Kubernetes-managed applications. However, I had never taken part in the actual migration of an application from Docker Compose to Kubernetes.

For our first attempt at migrating to Kubernetes, we chose an application that was relatively small, but which contained a variety of different elements—Java, NodeJS, GoLang, MySQL and MongoDB. The application used Docker Compose for container deployment and “orchestration.” I use the term orchestration loosely, because Docker Compose is pretty light when compared to Kubernetes.

Docker Compose

For those who have never used Docker Compose, it’s a framework that allows developers to define container-based applications in a single YAML file. This definition includes the Docker images used, exposed ports, dependencies, networking, etc. Looking at the snippet below, each block of 5 to 20 lines represents a separate service. Docker Compose is a very useful tool and makes application deployment fairly simple and easy.

Figure 1.1 – docker-compose.yaml Snippet

Preparing for the Migration

The first hurdle to converting the project was learning how Kubernetes is different from Docker Compose. One of the most dramatic ways it differs is in container-to-container communication.

In a Docker Compose environment, the containers all run on a single host machine. Docker Compose creates a local network that the containers are all part of. Take this snippet, for example:
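A minimal sketch of such a block (the image name is hypothetical):

services:
  quoteServices:
    image: quote-services:latest
    hostname: quote-services
    ports:
      - "8080:8080"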

This block will create a container called quoteServices with a hostname of quote-services and port 8080. With this definition, any container within the local Docker Compose network can access it using http://quote-services:8080. Anything outside of the local network would have to know the IP address of the container.

By comparison, Kubernetes usually runs on multiple servers called nodes, so it can’t simply create a local network for all the containers. Before we started, I was very concerned that this might lead to many code changes, but those worries would prove to be unfounded.

Creating Kubernetes YAML Files

The best way to understand the conversion from Docker Compose to Kubernetes is to see a real example of what the conversion looks like. Let’s take the above snippet of quoteServices and convert it to a form that Kubernetes can understand.

The first thing to understand is that the above Docker Compose block will get converted into two separate sections, a Deployment and a Service.

As its name implies, the deployment tells Kubernetes most of what it needs to know about how to deploy the containers. This information includes things like what to name the containers, where to pull the images from, how many containers to create, etc. The deployment for quoteServices is shown here:
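A minimal sketch of what that deployment might look like, following the conventions used elsewhere in this post (the image name and replica count are hypothetical):

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: quote-services
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: quoteServices
    spec:
      containers:
      - name: quote-services
        image: quote-services:latest
        ports:
        - containerPort: 8080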

As we mentioned earlier, networking is done differently in Kubernetes than in Docker Compose. The Service is what enables communication between containers. Here is the service definition for quoteServices:
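A minimal sketch, consistent with the explanation that follows:

apiVersion: v1
kind: Service
metadata:
  name: quote-services
spec:
  selector:
    name: quoteServices
  ports:
  - port: 8080
    targetPort: 8080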

This service definition tells Kubernetes to take the containers that have a name = quoteServices, as defined under selector, and to make them reachable using quote-services as hostname and port 8080. So again, this service can be reached at http://quote-services:8080 from within the Kubernetes application. The flexibility to define services this way allows us to keep our URLs intact within our application, so no changes are needed due to networking concerns.

By the end, we had taken a single Docker Compose file with about 24 blocks and converted it into about 20 different files, most of which contained a deployment and a service. This conversion was a big part of the migration effort. Initially, to “save” time, we used a tool called Kompose to generate deployment and service files automatically. However, we ended up rewriting all of the files anyway. Using Kompose is sort of like using Word to create webpages: sure, it works, but it adds a lot of extra tags you don’t really want, so you’ll probably redo most of it once you know what you’re doing.

Instrumenting AppDynamics

This was the easy part. Most of our applications are dockerized, and we have always monitored these and our underlying Docker infrastructure with AppDynamics. Because our Docker images already had application agents baked in, there was nothing we had to change. If we had wanted, we could have left them the way they were, and they would have worked just fine. However, we decided to take advantage of something that is fairly common in the Kubernetes world: sidecar injection.

We used the sidecar model to “inject” the AppDynamics agents into the containers. The advantage is that we can now update our agents without having to rebuild our application images and redeploy them. It also aligns better with best practice: the application image and the monitoring agent stay separate. To update the agent, all we have to do is update our sidecar image, then change the tag used by the application container. Just like that, our application is running with a new agent!
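
We won't reproduce our exact manifests here, but a common way to implement this kind of injection is an init container that copies the agent into a shared volume before the application starts. The sketch below assumes hypothetical image names and paths:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: quote-services
    spec:
      replicas: 1
      selector:
        matchLabels:
          name: quoteServices
      template:
        metadata:
          labels:
            name: quoteServices
        spec:
          initContainers:
          - name: appd-agent
            image: demo/appd-java-agent:4.5       # bump this tag to upgrade the agent
            command: ["cp", "-r", "/opt/appdynamics/.", "/agent"]
            volumeMounts:
            - name: agent
              mountPath: /agent
          containers:
          - name: quote-services
            image: demo/quote-services:latest     # app image no longer bakes in the agent
            volumeMounts:
            - name: agent
              mountPath: /opt/appdynamics         # app picks up the agent from here
          volumes:
          - name: agent
            emptyDir: {}                          # shared, pod-lifetime scratch volume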

Server Visibility Agent

Incorporating the Server Visibility (SVM) agent was also fairly simple. One difference to note is that Docker Compose runs on a single host, whereas Kubernetes typically uses multiple nodes, which can be added or removed dynamically.

In our Docker Compose model, our SVM agent was deployed to a single container, which monitored both the host machine and the individual containers. With Kubernetes, we would have to run one such container on each node in the cluster. The best way to do this is with a structure called a DaemonSet.

You can see from the snippet below that a DaemonSet looks a lot like a Deployment. In fact, the two are virtually identical. The main difference is how they behave: a Deployment typically doesn’t say anything about where in the cluster to run its containers, only how many to create, whereas a DaemonSet runs one container on each node in the cluster. This is important, because the number of nodes in a cluster can increase or decrease at any time.

Figure: DaemonSet definition
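
(A hypothetical version of such a DaemonSet; the image name is illustrative, and the controller connection settings are omitted.)

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: appd-machine-agent
    spec:
      selector:
        matchLabels:
          name: appd-machine-agent
      template:
        metadata:
          labels:
            name: appd-machine-agent
        spec:
          containers:
          - name: appd-machine-agent
            image: demo/appd-machine-agent:latest  # illustrative image name
            # controller host, account, and access-key settings would go here

Note there is no replicas field: Kubernetes runs one copy per node, adding and removing pods as nodes join or leave the cluster.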

What Works Great

From development and operations perspectives, migrating to Kubernetes involves some extra overhead, but there are definite advantages. I’m not going to list all the advantages here, but I will tell you about my two favorites.

First of all, I love the Kubernetes Dashboard. It shows information on running containers, deployments, services, etc. It also allows you to update/add/delete any of your definitions from the UI. So when I make a change and build a new image, all I have to do is update the image tag in the deployment definition. Kubernetes will then delete the old containers and create new ones using the updated tag. It also gives easy access to log files or a shell to any of the containers.

Figure: Kubernetes Dashboard

Another thing that worked well for us is that we no longer need to keep and maintain the host machines that ran our Docker Compose applications. Part of the idea behind containerizing applications is to treat servers more like cattle than pets. While this is true to an extent, the Docker Compose host machines had become the new pets: they developed problems, needed maintenance, ran out of disk space, and so on. With Kubernetes, there are no dedicated host machines to babysit, and the nodes in the cluster can be spun up and down at any time.

Conclusion

Before starting our Kubernetes journey, I was a little apprehensive about intra-application networking, deployment procedures, and adding extra layers to all of our processes. It is true that we have added a lot of extra configuration, going from a 300-line docker-compose.yaml file to about 1,000 lines spread over 20 files. This is mostly a one-time cost, though. We also had to rewrite some code, but that needed to be rewritten anyway.

In return, we gained all the advantages of a real orchestration tool: scalability, increased visibility of containers, easier server management, and many others. When it comes time to migrate our next application, which won’t be too far away, the process will be much easier and quicker.

Other Resources

The Illustrated Children’s Guide to Kubernetes

Getting Started with Docker

Kubernetes at GitHub

Migrating a Spring Boot service


Introducing AppDynamics for Kubernetes

Today we’re excited to announce AppDynamics for Kubernetes, which will give enterprises end-to-end, unified visibility into their entire Kubernetes stack and Kubernetes-orchestrated applications for both on-premises and public cloud environments. Enterprises use Kubernetes to fundamentally transform how they deploy and run applications in distributed, multicloud environments. With AppDynamics for Kubernetes, they will have a production-grade monitoring solution to deliver a flawless end-user experience.

Why is Kubernetes so popular? Because it delivers on the promise of doing more with less. By leveraging the portability, isolation, and immutability provided by containers and Kubernetes, development teams can ship more features faster by simplifying application packaging and deployment—all while keeping the application highly available without downtime. And Kubernetes’ self-healing properties not only enable operations teams to ensure application reliability and hyper-scalability, but also boost efficiency through increased resource utilization.

According to the latest survey by the Cloud Native Computing Foundation (CNCF), 69% of respondents said Kubernetes was their top choice for container orchestration. And Gartner recently proclaimed that “Kubernetes has emerged as the de facto standard for container orchestration.” The rapid expansion of Kubernetes is also due to its vibrant community: with over 35,000 GitHub stars and some 1,600 unique contributors spanning every time zone, Kubernetes is the most engaged community on GitHub.

Challenges Emerge

However, Kubernetes brings new operational workflows and complexities, many involving application performance management. As enterprises expand their use of Kubernetes beyond dev/test and into production environments, these challenges become even more profound.

The CNCF survey reveals that 38% of respondents identified monitoring as one of their biggest Kubernetes-adoption challenges—a figure that grows to 46% as the size of the enterprise increases.

Shortcomings of Current Monitoring Approaches

When experimenting with Kubernetes in dev/test environments, organizations typically either start with the monitoring tools that come with Kubernetes or use those that are developed, maintained and supported by the community. Examples include the Kubernetes dashboard, kube-state-metrics, cAdvisor or Heapster. While these tools provide information about the current health of Kubernetes, they lack data storage capabilities. So either InfluxDB or Prometheus (two popular time-series databases) is added to provide persistence. For data visualization, open-source tools such as Grafana or Kibana are tacked on. The system still lacks log collection, though, so log collectors are added as well. Quickly, organizations realize that monitoring Kubernetes is much more involved than capturing metrics.

But wait: additional third-party integrations may be needed to achieve reliability. By default, monitoring data is stored on local disk, which is susceptible to loss during node outages. And to secure access to their data, organizations must develop or integrate additional tools for authentication and role-based access control (RBAC). Bottom line: while this approach may work well for small development or DevOps teams, a production-grade solution is needed, especially as enterprises start to adopt Kubernetes for their mission-critical applications.

Unfortunately, traditional APM tools often aren’t up to the task here, as they fail to address the dynamic nature of application provisioning in Kubernetes, as well as the complexities of microservices architecture.

Introducing AppDynamics for Kubernetes

The all-new AppDynamics for Kubernetes will give organizations the deepest visibility into application and business performance. With it, companies will have unparalleled insights into containerized applications, Kubernetes clusters, Docker containers, and underlying infrastructure metrics—all through a single pane of glass.

To effectively monitor the performance of applications deployed in Kubernetes, organizations must reimagine their monitoring strategies. In Kubernetes, containerized applications are deployed in pods, which are created dynamically and organized into virtual groupings called namespaces. Since Kubernetes decouples developers and operations from deploying to specific machines, it significantly simplifies day-to-day operations by abstracting the underlying infrastructure. However, it also means limited control over which physical machine the pods land on, as shown in Fig. 1 below:


Fig. 1: Dynamic deployments of applications across a Kubernetes cluster.

To gather performance metrics for any resource, AppDynamics leverages labels, the identifying metadata and foundation for grouping, searching, filtering and managing Kubernetes objects. This enables organizations to gather performance insights and set intelligent thresholds and alerts on the performance of Pods, Namespaces, ReplicaSets, Services, Deployments and other Kubernetes objects.
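
For context, labels are just key/value pairs attached to an object's metadata. A hypothetical pod labeled this way might look like:

    apiVersion: v1
    kind: Pod
    metadata:
      name: quote-services-7d4b9
      labels:
        app: quote-services    # arbitrary key/value pairs (values are illustrative)
        tier: backend
        env: production
    spec:
      containers:
      - name: quote-services
        image: demo/quote-services:latest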

With AppDynamics for Kubernetes, enterprises can:

  1. Achieve end-to-end visibility: From end-user touch points such as a browser, mobile app, or IoT device, all the way to the Kubernetes platform, AppDynamics provides line-of-code-level detail for every application deployed (whether a traditional app or a microservice), granular metrics on Docker container resources, infrastructure metrics, log analytics, and the performance of every database query—all correlated and within the context of Business Transactions, a logical representation of end-user interaction with applications. AppDynamics for Kubernetes will help enterprises avoid silos, and it enables them to leverage existing skill sets and processes to monitor Kubernetes and non-Kubernetes applications from a unified monitoring solution across multiple, hybrid clouds.
  2. Expedite root cause analysis: Cascading failures from microservices can cause alert storms. Triaging the root cause via traditional monitoring tools is often time-consuming, and can lead to finger-pointing in war-room scenarios. By leveraging unique machine learning capabilities, AppDynamics makes it simple to identify the root cause of failure.
  3. Correlate Kubernetes performance with business metrics: For deeper visibility into business performance, organizations can create tagged metrics, such as customer conversion rate or end-user experience correlated with the performance of applications on the Kubernetes platform. Health rules and alerts based on business metrics provide intelligent validation so that every code release can drive business outcomes.
  4. Get a seamless, out-of-the-box experience: AppDynamics’ Machine agent is deployed by Kubernetes as a DaemonSet on all the worker nodes, thereby leveraging Kubernetes’ capability to ensure that the AppDynamics agent is always running and reporting performance data.
  5. Accelerate ‘Shift-Left’: AppDynamics is integrated with Cisco CloudCenter, which creates immutable application profiles with built-in AppDynamics agents. Leveraging this capability, customers can dramatically streamline Day 2 operations of application deployment in various Kubernetes environments, such as dev, test and pre-production. And proactive monitoring enables customers to catch performance-related issues before they impact the user experience. Go here to learn more about Cisco CloudCenter.

AppDynamics at KubeCon Europe

We are excited to be a sponsor of KubeCon + CloudNativeCon Europe 2018, a premier Kubernetes and cloud-native event. Our team will be there in full force to help you get started with production-grade monitoring of your Kubernetes deployments. And don’t forget to load up on cool new AppD schwag at the event.

Stop by AppD booth S-C36 in the expo hall. Additionally, I will be presenting the following sessions at Cisco Lounge in the expo hall:

  1. Introduction to Application Performance Monitoring—Wed-Fri, May 2-4, 12:30 PM
  2. Enterprise-grade Application Performance Monitoring for Kubernetes—Wed-Thu, May 2-3, 3:30 PM, Friday 3:00 PM

We are looking forward to engaging with all of our fellow Kubernauts. See you in Copenhagen!