4 Cluster Management Tools to Compare

Cloud-based infrastructure, containers, microservices, and new programming platforms are dominating the media and sweeping across IT departments around the world. For example, the use of Docker containers has exploded in the last few months. In our Introduction to Docker blog, we noted that Docker has delivered 2 billion image “pulls”; in November of 2015, that total stood at 1.2 billion. This is a clear indication of the growth of container technology in organizations ranging from large international companies to smaller start-ups.

Overview of Cluster Management Tools

Clearly, containers are an exciting new advancement in creating and delivering applications. However, controlling a vast deployment of containers presents some complications. Containers must be matched with resources. Failures have to be resolved quickly. These challenges have led to a concurrent demand for cluster management and orchestration tools.

A cluster management tool is a software program that helps you manage a cluster of nodes through a graphical user interface or from the command line. With this tool, you can monitor nodes in the cluster, configure services and administer the entire cluster. Cluster management can vary from low-involvement activities such as sending work to a cluster to high-involvement work such as load balancing and high availability. In this article, we’re going to look at Swarm and three other popular cluster management tools and discuss their strengths and challenges.

1. Swarm – Docker

Docker Swarm lets you cluster a number of Docker engines into one virtual engine. In a distributed application environment, the compute elements must also be distributed, and Swarm allows you to cluster Docker engines natively. With a single virtual engine, applications can be scaled out faster and more effectively. Swarm can scale up to 50,000 containers and 1,000 nodes with no effect on performance as new containers are added to the cluster.

In addition, Swarm exposes the standard Docker API. Any tool that can operate with the Docker daemon can tap the power of Docker Swarm to scale across many hosts. These include disparate tools like Flynn, Compose, Jenkins, and Drone.

Swarm can also be used as a frontend Docker client while running Mesos or Kubernetes in the backend. Swarm is a simple system at its heart: every host runs a Swarm agent, and one host runs the Swarm manager, which handles the operation and scheduling of containers. You can run it in high-availability configurations – it uses Consul, ZooKeeper or etcd to send fail-over events to a backup manager.

One of the advantages of Docker Swarm is that it is a native solution – you can implement Docker networking, plugins and volumes using Docker commands. The Swarm manager creates several masters with specific rules for leader election, which come into play in the event of a primary master failure. The Swarm scheduler features a variety of filters, including affinity and node tags. Filters can pin containers to particular underlying nodes for better resource utilization and enhanced performance.
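The filtering idea can be sketched in a few lines. This is an illustrative Python model, not Swarm’s actual code: hypothetical node records carry tags and running containers, and the scheduler keeps only the nodes that satisfy a constraint and an affinity rule.

```python
# Illustrative sketch of filter-based scheduling (not Swarm's real code).
# Node records, tags, and container names here are hypothetical.

def filter_nodes(nodes, constraints=None, affinity_container=None):
    """Return nodes satisfying tag constraints and container affinity."""
    candidates = list(nodes)
    # Constraint filter: node tags must match every requested key/value.
    for key, value in (constraints or {}).items():
        candidates = [n for n in candidates if n["tags"].get(key) == value]
    # Affinity filter: schedule next to a named container if requested.
    if affinity_container:
        candidates = [n for n in candidates
                      if affinity_container in n["containers"]]
    return candidates

nodes = [
    {"name": "node-1", "tags": {"storage": "ssd"}, "containers": ["redis"]},
    {"name": "node-2", "tags": {"storage": "disk"}, "containers": []},
]

# Place a new container on an SSD node that already runs "redis".
chosen = filter_nodes(nodes, constraints={"storage": "ssd"},
                      affinity_container="redis")
print(chosen[0]["name"])  # node-1
```

Each filter narrows the candidate list before the scheduler picks a node, which is how tag and affinity rules compose without the filters knowing about each other.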


2. Fleet – CoreOS

CoreOS was created to allow you to scale and manage compute capacity. Rather than installing a package through apt or yum, CoreOS leverages Linux containers to handle services at a higher level of abstraction, providing advantages similar to virtual machines but with a focus on applications rather than complete virtualized hosts.

 


 

Fleet lets you treat a CoreOS cluster as though it shared a single init system. With fleet, every machine runs an agent and an engine; only one engine is active in the cluster at any time, while every agent stays active. Fleet also can handle socket activation – containers can be activated to take care of a connection on a specific port. This allows the system to create processes when they are needed instead of keeping them running idle in anticipation of demand.
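A fleet unit is an ordinary systemd unit with an extra [X-Fleet] section that tells the cluster scheduler where it may run. The template below is a hypothetical example (the image name and ports are illustrative, not from the original post):

```ini
# Hypothetical fleet template unit, web@.service.
# Everything above [X-Fleet] is plain systemd; %i is the instance name.
[Unit]
Description=Example web container %i
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-/usr/bin/docker rm -f web-%i
ExecStart=/usr/bin/docker run --name web-%i -p 80:80 nginx
ExecStop=/usr/bin/docker stop web-%i

[X-Fleet]
# Never co-locate two instances of this template on one machine.
Conflicts=web@*.service
```

Starting web@1.service and web@2.service with fleetctl places the two instances on different machines, and fleet reschedules them elsewhere if a machine dies.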

Your DevOps personnel spend their time focusing on managing containers that are the building blocks of a service without worrying about potential problems that could crop up on single machines. Fleet makes sure containers stay in operation on a cluster. In the event of a machine failure, the containers are automatically moved to healthy machines.

3. Kubernetes – Google

Developed by Google, Kubernetes allows you to manage containerized applications across many different hosts. It gives you the tools to deploy, scale and maintain applications. The developers of Kubernetes focused on keeping it accessible, lightweight and easy to use, and it can run in a number of cloud environments: private, public, multi-cloud and hybrid. Designed to repair itself on the fly, it features auto-replication, auto-restart, and auto-placement. Endlessly extensible, it was built to be hookable, pluggable and modular. Completely open source, Kubernetes was first announced by Google in 2014, and version 1.0 was released in the summer of 2015. Despite its recent vintage, Kubernetes builds on Google’s many years of experience running containers.


Kubernetes uses pods, groups of containers that are scheduled and deployed at the same time. Pods are the basic unit of scheduling; in contrast, other systems treat a single container as the base unit. Most pods have up to five containers that together make up a service. Pods are created and eliminated in real time as demand and requirements change.
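The pod concept is easiest to see in a manifest. A minimal sketch (the names and images are illustrative): two containers declared in one pod are always placed on the same node and started together.

```yaml
# Hypothetical minimal pod manifest with two co-scheduled containers.
apiVersion: v1
kind: Pod
metadata:
  name: web-pod
spec:
  containers:
  - name: web              # serves traffic
    image: nginx
    ports:
    - containerPort: 80
  - name: log-collector    # ships the web container's logs
    image: fluentd
```

Because both containers live in one pod, they share the pod’s network identity and lifecycle, which is why a sidecar like a log collector is packaged this way rather than as a separate deployment.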

Kubernetes is a set of loosely coupled primitives that can operate under many different workloads. It relies heavily on the Kubernetes API for extensibility; the API is used internally, and also externally by containers and extensions running on top of the system. Organizations that have implemented Kubernetes include the Wikimedia Foundation, which moved from a homegrown set-up to Kubernetes; eBay, which runs Kubernetes and containers on top of OpenStack; and Viacom, which is building an advanced containerization infrastructure using Kubernetes.

4. Mesos – Apache

Conceived and developed at the University of California, Berkeley, Apache Mesos is a cluster manager that focuses on efficient resource isolation and sharing across distributed applications or frameworks. An open source system, it gives managers the ability to share resources and improve cluster utilization. Companies currently using Apache Mesos include Apple, Airbnb, and Twitter.

Apache Mesos is an abstraction layer for computing resources such as CPU, disk, and RAM. It runs on every machine, with one machine designated as the master managing all the others. Any Linux program can run on Mesos. One of the advantages of Mesos is that it provides an extra layer of safeguards against failure.


 

Mesos was designed to handle thousands of hosts. It supports workloads from a wide variety of tenants. In a Mesos configuration, you might find Docker running side-by-side with Hadoop. Mesos gained visibility when it became the system supporting the rapid expansion of Twitter several years ago.

Mesos uses a system of agent nodes to run tasks. The agents send a list of available resources to a master. At any one time, there can be hundreds to thousands of agent nodes in operation. In turn, the master distributes tasks to the agents.
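The offer cycle described above can be sketched as a toy model. This is illustrative Python, not the Mesos API: agents report free resources, the master turns them into offers, and a framework accepts the first offer that covers its task’s needs.

```python
# Toy sketch of Mesos-style two-level scheduling (illustrative only).
# Agent names and resource figures are hypothetical.

class Agent:
    def __init__(self, name, cpus, mem):
        self.name, self.cpus, self.mem = name, cpus, mem

def make_offers(agents):
    """Master collects each agent's free resources into offers."""
    return [{"agent": a.name, "cpus": a.cpus, "mem": a.mem} for a in agents]

def accept_offer(offers, need_cpus, need_mem):
    """A framework accepts the first offer that covers the task's needs."""
    for offer in offers:
        if offer["cpus"] >= need_cpus and offer["mem"] >= need_mem:
            return offer["agent"]
    return None  # no offer fits; wait for the next offer round

agents = [Agent("agent-1", cpus=2, mem=4096),
          Agent("agent-2", cpus=8, mem=16384)]
offers = make_offers(agents)

# A task needing 4 CPUs and 8 GB lands on the agent with room for it.
print(accept_offer(offers, need_cpus=4, need_mem=8192))  # agent-2
```

The split of responsibilities is the point: the master only brokers resources, while each framework keeps its own scheduling logic, which is how Docker and Hadoop can share one cluster.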

Comparing Different Container Orchestration Tools

Kubernetes is a full-bore container management platform with scheduling, upgrades on-the-fly, auto-scaling and constant health monitoring. In comparison, Docker Swarm concentrates on providing a system-wide view of a cluster from a single Docker engine.

Mesos and Kubernetes are similar because they were developed to solve the problems of running applications in clustered environments. Mesos does not concentrate as much as Kubernetes on running clusters, focusing instead on features like its strong scheduling capabilities and its ability to plug in a wide variety of schedulers. This is partly because Mesos was developed before the recent rise in popularity of containers — it was modified in certain areas to support containers.

Fleet utilizes etcd, a distributed key-value store that ships with CoreOS, and systemd, a system and service manager for Linux. Systemd is designed for a single machine, but Fleet expands its capabilities to a cluster of machines. Fleet helps protect against failure by allowing you to run several instances of a service. It can deploy an individual container to any location, run containers on one machine or several, and deploy multiple instances of the same container.

On the other hand, Fleet is not as adept at handling some situations that arise in a distributed microservices environment such as service registration, scheduling based on utilization, service discovery or communications between containers. Fleet positions itself among these four tools as a low-profile cluster engine, so it is best situated as a layer where other solutions like Kubernetes can operate on top.

Orchestration Tools Meeting Increasing Demand

Today’s enterprises need redundant systems that can meet their computing needs without fail. In addition, big data and data mining require massive resources to sift through mountains of information. Unless companies adapt and modify their approach to information systems, they will quickly lose ground to speedier and more flexible competitors.

In this era of high-speed, web-scale computing, fixing individual machines is not an effective approach. Distributed systems allow you to quickly dispatch broken machines to the dustbin and reallocate resources to healthy nodes in a cluster. This is why it is important to manage clusters of Docker and other containers.

In this blog, we’ve looked at several powerful cluster management and orchestration tools which can effectively maintain, configure and scale containers in a distributed environment. Choosing the best one is a function of which one best meets the challenges of your computing environment. Use this discussion as a starting point to find the solution that will help position your organization for success in the fast-developing world of containerization and microservices.

A Newbie Guide to APM

Today’s blog post is headed back to the basics. I’ve been using and talking about APM tools for so many years sometimes it’s hard to remember that feeling of not knowing the associated terms and concepts. So for anyone who is looking to learn about APM, this blog is for you.

What does the term APM stand for?

APM is an acronym for Application Performance Management. You’ll also hear the term Application Performance Monitoring used interchangeably, and that is just fine. Some will debate the details of monitoring versus management, and in reality there is an important difference, but from a terminology perspective it’s a bit nit-picky.

What’s the difference between monitoring and management?

Monitoring is a term used when you are collecting data and presenting it to the end user. Management is when you have the ability to take action on your monitored systems. Management tasks can include restarting components, making configuration changes, collecting more information through the execution of scripts, etc… If you want to read more about the management functionality in APM tools click here.

What is APM?

There is a lot of confusion about the term APM. Most of this confusion is caused by software vendors trying to convince people that their software is useful for monitoring applications. In an effort to create a standard definition for grouping software products, Gartner introduced a definition that we will review here.

Gartner lists five key dimensions of APM in their terms glossary found here… http://www.gartner.com/it-glossary/application-performance-monitoring-apm

End user experience monitoring – EUM and RUM are the common acronyms for this dimension of monitoring. This type of monitoring provides information about the response times and errors end users are seeing on their devices (mobile, browser, etc…). This information is very useful for identifying compatibility issues (the website doesn’t work properly with IE8), regional issues (users in northern California are seeing slow response times), and issues with certain pages and functions (the JavaScript is throwing an error on the search page).


Runtime application architecture discovery, modeling and display – This is a graphical representation of the components in an application or group of applications that communicate with each other to deliver business functionality. APM tools should automatically discover these relationships and update the graphical representation as soon as anything changes. This graphical view is a great starting point for understanding how applications have been deployed and for identifying and troubleshooting problems.


User-defined transaction profiling – This is functionality that tracks user activity within your applications across all of the components that service those transactions. A common term associated with transaction profiling is business transactions (BTs). A BT is very different from a web page. Here’s an example… As a user of a website, I go to the login page, type in my username and password, then hit the submit button. As soon as I hit submit, a BT is started on the application servers. The app servers may communicate with many different components (LDAP, database, message queue, etc…) in order to authenticate my credentials. All of this activity is tracked, measured, and associated with a single “login” BT. This is a very important concept in APM and is shown in the screenshots below.
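As a rough illustration of the idea, the sketch below (hypothetical Python, not any vendor’s API) times each backend call made while servicing a request and attributes the segments to a single named business transaction.

```python
# Illustrative sketch of transaction profiling: every component call made
# while servicing a request is timed and attributed to one named BT.
# The component names and backend functions are hypothetical.
import time

class BusinessTransaction:
    def __init__(self, name):
        self.name = name
        self.segments = []   # list of (component, elapsed seconds)

    def record(self, component, fn, *args):
        """Run a backend call and attribute its timing to this BT."""
        start = time.perf_counter()
        result = fn(*args)
        self.segments.append((component, time.perf_counter() - start))
        return result

    def total_time(self):
        return sum(elapsed for _, elapsed in self.segments)

# Hypothetical backend calls made during a login.
def ldap_lookup(user):      return f"dn={user}"
def db_load_profile(user):  return {"user": user}

bt = BusinessTransaction("login")
bt.record("LDAP", ldap_lookup, "alice")
bt.record("Database", db_load_profile, "alice")
print([component for component, _ in bt.segments])  # ['LDAP', 'Database']
```

The key property is the association: both the LDAP and database timings roll up into the one “login” BT rather than being reported as unrelated component metrics.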


Component deep-dive monitoring in application context – Deep dive monitoring is when you record and measure the internal workings of application components. For application servers, this would entail recording the call stack of code execution and the timing associated with each method. For a database server this would entail recording all of the SQL queries, stored procedure executions, and database statistics. This information is used to troubleshoot complex code issues that are responsible for poor performance or errors.


Analytics – This term leaves a lot to be desired since it can be, and often is, very liberally interpreted. To me, analytics (in the context of APM) means baselining and correlating data to provide actionable information. To others, analytics can be as basic as providing reporting capabilities that simply format the raw data in a more consumable manner. I think analytics should help identify and solve problems and be more than just reporting, but that is my personal opinion.


Do I need APM?

APM tools have many use cases. If you provide support for application components or the infrastructure components that service the applications, then APM is an invaluable tool for your job. If you are a developer, then absolutely yes: APM fits right in with the entire software development lifecycle. If your company is adopting a DevOps philosophy, APM is a tool that is collaborative at its core and enables developers and operations staff to work more effectively. Companies that are using APM tools consider them a competitive advantage because they resolve problems faster, solve more issues over time, and gain meaningful business insight.

How can I get started with APM?

First off you need an application to monitor. Assuming you have access to one, you can try AppDynamics for free. If you want to understand more about the process used in most companies to purchase APM tools you can read about it by clicking here.

Hopefully this introduction has provided you with a foundation for starting an APM journey. If there are more related topics that you want me to write about please let me know in the comments section below.

Check out our complimentary ebook, Top 10 Java Performance Problems!

Don’t be an “Also-Ran” – Application Runbook Automation for World Class IT

Your applications are the lifeblood of your business–the culmination of countless hours of development, testing, bug fixes, more testing, and finally deployment to production. No matter how good your coding and testing practices are, though, eventually your applications will slow down, start to throw errors, and crash. How quickly and effectively you remediate these business-impacting problems is the difference between world class IT organizations and the “also-rans.”

Application Runbook Automation from AppDynamics is a game-changing technology that can remediate business impact in a matter of seconds. It can represent a savings of hundreds of thousands of dollars per incident. Read more to find out how…


This table shows the business impact in USD of reducing MTTR from hours to minutes to seconds by taking advantage of AppDynamics and application runbook automation.
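The arithmetic behind a table like this is simple: revenue at risk scales linearly with MTTR. A quick sketch with an assumed revenue rate (the $100,000/hour figure is hypothetical, not from the table):

```python
# Back-of-envelope arithmetic: impact scales linearly with MTTR.
revenue_per_hour = 100_000  # USD; assumed figure for illustration only

def incident_cost(mttr_minutes, rate_per_hour=revenue_per_hour):
    """Revenue at risk for one outage lasting mttr_minutes."""
    return rate_per_hour * mttr_minutes / 60

print(incident_cost(120))  # 2 hours of impact -> 200000.0 USD at risk
print(incident_cost(2))    # 2 minutes of impact -> about 3333.33 USD
```

Cutting MTTR from hours to seconds shrinks the exposure by the same factor, which is where the per-incident savings claim comes from.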

 

So What’s a Runbook Really?

In case you’ve been locked in an IT dungeon for your whole career, let’s take a minute to talk about runbooks. Runbooks are the cure for insomnia, the crappy part of application support where you document how to stop and restart your awesome application that will never, ever fail. But just in case, some of you are also forced by the evil management regime to document troubleshooting procedures along with remediation steps for the common problems that could cause your application to misbehave.

All of this documentation (the runbook) is handed off to your friendly operations center staff so that it can be stuffed away in a giant repository never to be heard from again. After all, the job of the NOC (network operations center) is just to receive and route alerts to the proper people–so why even write out those runbooks in the first place?

You know how to restart your application when you get paged at 3 AM, no documentation required. You can recover your application within 30-120 minutes of initial impact half asleep with one hand tied behind your back. Unfortunately, during that time your company just lost revenue, customers, and tarnished their reputation. Tough break, but that’s just the way it is today. But there’s a better way!

Hasn’t Runbook Automation (RBA) Been Around for a While?

Yep, RBA is that expensive software sitting on your corporate shelf right now. It can’t help you with the problem described above. It’s become niche software for infrastructure support personnel who know nothing about your applications (and are offended they consume so much of their resources, kidding of course). Traditional RBA is a colossal failure that knows nothing about your users, business transactions, and applications. It’s this lack of application awareness that has relegated RBA to being used by a handful of infrastructure support personnel in your company.

Application Runbook Automation is a Business Differentiator

The time has come to fulfill the promise of automated remediation of business impact. Consider this:

  • AppDynamics understands exactly what components are being used at any given time by your applications.
  • It knows exactly what your end users are doing all the time and tracks and measures that activity through your application components.
  • It knows what transactions are slow or failing and exactly what code, configuration, and nodes are causing the problems.

Using this granular level of problem detection and isolation, AppDynamics application RBA understands business impact and takes the appropriate action to remediate within seconds of detection. Many problems are detected automatically, and many remediations are already available out of the box, so you don’t have to code your own. You can also re-use your existing RBA processes and scripts to take advantage of all the hard work that has been done in the past. You can even integrate your Puppet and Chef tools directly with AppDynamics. Why re-invent the wheel?!

Application RBA is powerful and flexible. You have the choice to take action automatically or to be prompted for authorization if you’re not ready to trust in the judgement of machines just yet. Your troubleshooting and remediation actions can be executed on only the impacted nodes, on all associated nodes in a tier, or at the overall application level. Application awareness and flexibility empower you to manage your applications the way you need to for your unique business.
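The authorization choice described above boils down to a small policy. The sketch below is illustrative Python, not AppDynamics’ actual engine: a breached health rule either runs its remediation action immediately or queues it for operator approval.

```python
# Illustrative sketch of a health-rule-driven remediation policy.
# The thresholds, action names, and flow are hypothetical.

def evaluate(metric_value, threshold, auto_approve, remediate, approvals):
    """Return what happened: 'ok', 'remediated', or 'pending-approval'."""
    if metric_value <= threshold:
        return "ok"                       # no breach, nothing to do
    if auto_approve:
        remediate()                       # trusted rule: act immediately
        return "remediated"
    approvals.append("restart-app-node")  # wait for an operator to confirm
    return "pending-approval"

actions_run = []
pending = []
restart = lambda: actions_run.append("restart")

# Error rate of 12% against a 5% threshold, with auto-approval enabled.
print(evaluate(0.12, 0.05, True, restart, pending))   # remediated
# The same breach, but a human must authorize the action first.
print(evaluate(0.12, 0.05, False, restart, pending))  # pending-approval
```

The same policy can be scoped differently – run the action on one impacted node, a whole tier, or the full application – without changing the detection logic.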

Monitoring, Meet Management

Management has long been the missing dimension from the promise of APM. The lack of focus on this problem by the monitoring industry needs to change and AppDynamics is leading the way. When most applications were run on a handful of servers it was acceptable to overlook the management aspect and just focus on pointing out where the problems were. With modern applications scaling to hundreds and thousands of nodes it’s becoming impossible for humans to keep ahead of the management challenges without using machine automation.

If you need to remediate business impact caused by slow or crashed applications as fast as possible, Application Runbook Automation from AppDynamics is just what you need.

Try AppDynamics with Application Runbook Automation for free today.