Deploying at Scale: Chef, Puppet, Ansible, Fabric and SaltStack

The manageability, reliability and power of cloud computing allow IT managers to deploy hundreds, even thousands, of machines. At the same time, the cloud creates a new challenge for sysadmins and ops teams: how to maintain and configure all these machines. How do you apply patches, keep software up to date and close security gaps?

The answer is to use tools like Chef, Puppet, Ansible, Fabric or SaltStack to automate Infrastructure as Code (IaC). IaC means deploying and managing computing infrastructure, including virtual and bare-metal servers, through definition files rather than by managing physical hardware by hand. Here is a bit of the history and background, along with the advantages and disadvantages, of each of these infrastructure configuration management tools.

Puppet

Puppet was founded in 2005 by Luke Kanies, making it one of the earliest infrastructure configuration management tools. It is free software written in Ruby and made available under the Apache Software License 2.0, although it was released under the GNU General Public License up to version 2.7.0. It runs on Microsoft Windows and on UNIX-based systems like AIX, Solaris and Mac OS X. Puppet uses a declarative language to define system configuration. To begin, you describe system resources and their desired state in files called Puppet manifests. A resource abstraction layer then lets you define configuration in higher-level terms such as packages and services.
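The declarative idea can be sketched in a few lines of Python. This is not Puppet's manifest language or its implementation; it is a minimal illustration, with hypothetical resource names and an in-memory stand-in for the machine, of how a tool compares desired state with observed state and changes only what differs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A desired-state declaration, loosely analogous to one manifest entry."""
    kind: str   # e.g. "package" or "service"
    name: str   # e.g. "ntp"
    state: str  # e.g. "installed" or "running"


def plan(desired, current):
    """Return the resources whose observed state differs from the desired state.

    `current` maps (kind, name) to the state observed on the machine.
    """
    return [r for r in desired if current.get((r.kind, r.name)) != r.state]


def converge(desired, current):
    """Bring `current` in line with `desired` and return the changes made."""
    changes = plan(desired, current)
    for r in changes:
        # A real tool would call the platform's package or service manager here.
        current[(r.kind, r.name)] = r.state
    return changes


# Hypothetical desired state: the ntp package installed and its service running.
manifest = [
    Resource("package", "ntp", "installed"),
    Resource("service", "ntpd", "running"),
]

# Hypothetical observed state: package already present, service stopped.
machine = {("package", "ntp"): "installed", ("service", "ntpd"): "stopped"}

first_run = converge(manifest, machine)   # only the service needs a change
second_run = converge(manifest, machine)  # nothing left to do
```

Running the same declaration twice changes nothing the second time, which is what makes a declarative description safe to apply repeatedly.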

Because Puppet is model driven, you don’t need an extensive programming background to use it. In a model-driven approach, you describe how you want the infrastructure and applications to operate. With the model in place, you can then test and evaluate changes you want to deploy across the system. Constant reporting and feedback allow you to improve processes, show compliance and tweak the results as you go. Puppet is perhaps the most popular infrastructure configuration and management tool among those described here, used by a variety of organizations including:

  • Mozilla
  • PayPal
  • Spotify
  • Oracle
  • Rackspace
  • Wikimedia Foundation

Chef

Chef is a configuration management tool Adam Jacob developed to use in his consulting company. Seeing a broader use for managing Amazon Web Services operations, he joined with Nathan Haneysmith, Barry Steinglass and Joshua Timberman to found a firm called Chef to manage the tool.

Chef is based on “recipes” that describe how the software will configure and manage utilities and server apps like MySQL or Hadoop. These recipes can be combined to form a “cookbook.” Each recipe defines resources and the state they should be in, such as which services should be running, which packages need to be installed and which files need to be created. The resources can be modified to make sure programs are installed in a specific order based on dependencies. Industry commenters often suggest that DevOps teams and developers usually choose Chef while sysadmins prefer Puppet.
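Ordering resources by their dependencies is a topological sort. The sketch below is Python rather than Chef's Ruby-based recipe syntax, and it does not reproduce Chef's own scheduling. The resource names are hypothetical and only show how a dependency map yields a safe install order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical resources, each mapped to the resources it depends on.
dependencies = {
    "package:mysql-server": set(),
    "file:/etc/mysql/my.cnf": {"package:mysql-server"},
    "service:mysql": {"package:mysql-server", "file:/etc/mysql/my.cnf"},
}


def run_order(deps):
    """Return resource names ordered so every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
for step in order:
    print(step)
```

The package comes first, then its configuration file, then the service that needs both.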

There are two versions of Chef: an open-source basic version and a premium enterprise edition. The enterprise offering has both on-premise and hosted versions. Open-source Chef is available at no charge but lacks many of the add-ons in the enterprise edition as well as ongoing support.

Chef began as a Linux product but later added support for Microsoft Windows. It runs on major platforms including:

  • Solaris
  • Ubuntu
  • Microsoft Windows
  • FreeBSD

It is used by companies and websites such as:

  • Facebook
  • Airbnb
  • Expedia
  • Citi
  • Disney

Chef and Puppet are two of the largest infrastructure management tools available to you. They both continue to respond to the needs of enterprise companies by providing new features, and they are also busy creating partnerships with major vendors like Microsoft to better integrate with their platforms. Puppet has also aligned with software defined networking (SDN) vendors to stay in the forefront of that technology. Choosing between the two is a matter of determining the core advantages of each and figuring out which align with your requirements.

Ansible

Ansible is an open-source software framework for managing and configuring infrastructure. It offers configuration management, software deployment for multiple nodes and ad hoc task execution. It reaches the machines it manages through Secure Shell (SSH) or PowerShell. This software framework was developed by Michael DeHaan, who was also one of the original developers of the Func framework used for administering systems remotely. Ansible is included in distributions of Fedora, and is also available if you use CentOS, Red Hat Enterprise Linux, Scientific Linux and other operating systems. A company of the same name was created to support the software product and help it grow in business markets. Red Hat acquired the company in 2015.

The name Ansible is derived from a communications system in “Ender’s Game,” a 1985 novel by Orson Scott Card. The fictional system was first invented for the 1966 novel “Rocannon’s World” by Ursula K. Le Guin.

Ansible distinguishes two kinds of servers: controlling machines and nodes. The system is based on a single controlling machine, which configures and manages nodes using SSH. Modules are deployed over SSH to orchestrate nodes, which then report back to the controlling machine using a JSON protocol. Ansible is light on resources because when it is not managing nodes, it does not run any programs or daemons that sit waiting for work.

Unlike Puppet and Chef, Ansible has an agentless architecture: nodes do not need to install and run a daemon in the background to talk to the controlling machine. This set-up significantly reduces network overhead because nodes are not constantly polling the controlling machine.
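The push model described above can be sketched as follows. This is not Ansible's code or its module format. It is a minimal illustration in which a controlling machine sends a short script to an interpreter, lets it run once and parses the JSON it prints. The `run_module` helper and the module script are hypothetical, and a local Python interpreter stands in for the SSH connection so the sketch runs without a remote host.

```python
import json
import subprocess
import sys

# Hypothetical "module": a short script that does its work and prints JSON.
MODULE_SOURCE = """
import json, platform
print(json.dumps({"changed": False, "system": platform.system()}))
"""


def run_module(transport, module_source):
    """Send a module to an interpreter over `transport` and parse its JSON reply.

    `transport` is the argv prefix that reaches a Python interpreter reading
    its program from stdin. For a remote node this could be
    ["ssh", "user@host", "python3", "-"]; nothing stays running on the node
    once the command returns.
    """
    completed = subprocess.run(
        transport,
        input=module_source,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


# Local stand-in for SSH: the current interpreter, reading the script from stdin.
local_transport = [sys.executable, "-"]
result = run_module(local_transport, MODULE_SOURCE)
print(result["changed"])
```

Because the node only runs code while a command is in flight, there is no agent to install, upgrade or monitor.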

Ansible was designed with a minimalist approach, with a focus on making sure that managing the system does not create additional dependencies on the system itself. It relies on OpenSSH for transport, so its connections are secured by a widely used, well-tested tool. In addition, Ansible playbooks are written in an easy-to-learn, descriptive language. It is used in a variety of private and public clouds including:

  • Google Cloud Platform
  • OpenStack
  • SoftLayer
  • Amazon Web Services
  • XenServer

Ansible works well with Aerospike, Riak and Hadoop, monitoring resource consumption by every node while using few CPU and memory resources. Organizations and companies deploying Ansible include:

  • NASA
  • Weight Watchers
  • Juniper
  • Apple

Its agentless model makes it a popular choice for government divisions such as NASA because it is very secure, a quality highly valued in federal and state governments.

Fabric

Fabric is an open-source command line tool and Python library used to smooth out SSH utilization for system administration and application deployment. It consists of a suite of operations for launching shell commands, either locally or remotely, via sudo or normally; downloading and uploading files; and asking for input from users, stopping execution and other auxiliary functions. While products like Puppet and Chef focus on organizing and handling system libraries and servers, Fabric is more concerned with deployment and other application-level functions.
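The workflow Fabric supports, plain Python functions selected by name from the command line that launch shell commands, can be approximated with the standard library alone. The sketch below does not use Fabric and does not mirror its API. The `task` decorator, the `run_local` helper and the `greet` task are hypothetical names chosen for illustration.

```python
import argparse
import subprocess

TASKS = {}


def task(func):
    """Register a plain function so it can be selected by name."""
    TASKS[func.__name__] = func
    return func


def run_local(command):
    """Run a shell command on this machine and return its trimmed output."""
    completed = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=True
    )
    return completed.stdout.strip()


@task
def greet():
    """Hypothetical task; a real one might restart a service or ship a build."""
    return run_local("echo deployed")


def main(argv):
    """Look up the task named in `argv` and run it."""
    parser = argparse.ArgumentParser(description="Run a named task.")
    parser.add_argument("task", choices=sorted(TASKS))
    args = parser.parse_args(argv)
    return TASKS[args.task]()


# Equivalent to typing the task name on the command line.
output = main(["greet"])
print(output)
```

Adding a new job is a matter of writing one more decorated function, which is the kind of low ceremony developers tend to like in this style of tool.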

Developers like Fabric because it is simple and easy to maintain, and because you can add any type of job quickly. You can execute Python functions from the command line, and launching shell commands over SSH is simplified by the extensive library of subroutines. Companies using Fabric include:

  • Snap
  • Coursera
  • Instagram
  • Sosh
  • FlightAware
  • The Orchard

Fabric development is managed by Jeff Forcier. He is assisted by open-source developers who add suggestions and patches through the Fabric mailing list, on IRC chats or via GitHub.

SaltStack

SaltStack is an open-source platform written in Python and used for managing and configuring cloud infrastructure. Thomas S. Hatch developed it on top of ZeroMQ to create a better tool for high-speed data collection and remote execution. It was initially released in 2011, and support for the Reliable Asynchronous Event Transport (RAET) protocol was added in 2014. The project has since been developed through a partnership that includes several large enterprises. SaltStack was built from the ground up to be highly modular and flexible, and to adapt to diverse applications. It is made up of Python modules that each manage a different part of the Salt system. You can detach and modify the modules to fit the needs of your project. Each module is designed to handle a specific action. The six types of modules are:

  • Execution modules offer functions that the remote execution engine runs directly, and help manage portability and core API functions.
  • Grains detect static system information and keep it in RAM for fast access.
  • State modules represent the back end, executing code to configure or change a target system.
  • Renderer modules pass information to the state system.
  • Returner modules manage the return locations associated with remote execution calls.
  • Runners are convenience apps.
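A few of the module roles listed above can be illustrated with a small Python sketch. It does not use Salt and does not reflect how Salt loads its modules. The registry, the `demo.*` function names and the in-memory return store are all hypothetical. It shows execution functions dispatched by dotted name, grains collected once and served from memory, and a returner that decides where results go.

```python
import platform

# Hypothetical registry: "module.function" names mapped to callables.
EXECUTION_FUNCTIONS = {}
_grains_cache = None


def execution(name):
    """Register a function under a dotted name such as "demo.ping"."""
    def register(func):
        EXECUTION_FUNCTIONS[name] = func
        return func
    return register


def grains():
    """Collect static system facts once, then serve them from memory."""
    global _grains_cache
    if _grains_cache is None:
        _grains_cache = {"os": platform.system(), "arch": platform.machine()}
    return _grains_cache


@execution("demo.ping")
def ping():
    """Smallest possible execution function: confirm the target responds."""
    return True


@execution("demo.describe")
def describe():
    """Use the cached grains inside an execution function."""
    return "{os}/{arch}".format(**grains())


def returner(results, destination):
    """Send results to a return location; here simply an in-memory list."""
    destination.append(results)


def call(name, *args):
    """Dispatch a dotted function name, as a remote execution call would."""
    return EXECUTION_FUNCTIONS[name](*args)


store = []
returner({"demo.ping": call("demo.ping")}, store)
```

Keeping each role behind its own small interface is what lets modules be swapped or modified independently.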

SaltStack created a buzz early on by capturing the 2014 InfoWorld Technology of the Year Award as well as the 2013 TechCrunch Award for Most Exciting Project. Organizations and companies using SaltStack include Adobe, Jobspring Partners, Dealertrack Holdings, JumpCloud and International Game Technology.

This article covered five of the top infrastructure configuration and management tools available. It’s a highly dynamic area of enterprise computing, with new tools constantly evolving to solve various challenges. Each of these solutions gives you many ways to configure your infrastructure, allowing you to manage digital transformation at scale easily and efficiently.

Gain insight into your infrastructure with Server Visibility

To succeed in today’s hyper-competitive and fast-changing marketplace, enterprises must pursue digital transformation, using software to deliver and support their products and services, with the goal of creating an ideal user experience and maximizing business agility and efficiency. To deliver that experience, they need to manage their applications end to end, including their dependencies on the underlying server infrastructure.

AppDynamics Server Visibility, a new module of the AppDynamics Application Intelligence platform, provides an application-centric view of servers in the context of business transactions. This helps IT Ops teams proactively isolate and resolve application performance issues faster with actionable, correlated application-server metrics. It complements end-user monitoring, application performance management, and database visibility modules to provide a comprehensive, end-to-end view of the entire application ecosystem.

Server Visibility provides a complete view of CPU, memory, disk, networking and running-process metrics for Linux and Windows servers. In my blog announcing the beta of this solution in June 2015, I reviewed some of the key features.

In this blog, I will review a couple of use cases where application support personnel or a developer may use the new Server Visibility module in conjunction with the AppDynamics APM solution to quickly isolate and resolve an application performance issue.

Drill down from application flow map to Server Visibility dashboard

Customers can drill down from the application flow map to the server dashboard and see detailed server metrics when server issues are affecting performance between two application tiers.

Let me review a scenario where the application flow map shows calls between two tiers are taking longer than normal. In Fig 1 below, we can see that the E-Commerce tier is having some problems since half of the tier has turned red. Pretty obvious, right? We can also see on the right-hand side that the server health indicator shows critical response time.

Fig 1: Application Flow Map with Server health indicator showing issues

Clicking on the E-Commerce Services tier, you will notice a new tab called “Servers,” which shows the list of servers (as you can see in Fig 2) on which the nodes under this tier are running. You will notice that the health indicators for both servers have turned red.

Fig 2: List of servers supporting an application shown in the flow map

Clicking on any server takes you to a server dashboard (Fig 3) with an indicator of server health, key performance indicators, server properties and the top 10 processes consuming CPU and memory.

Fig 3: Server Visibility Dashboard

Here you can clearly see that CPU consumption has often been reaching 100% and may be contributing to the application performance issue at the E-Commerce tier. From the server dashboard, customers can also go to other tabs with detailed information on attached storage, network and processes for further troubleshooting.

Troubleshoot server-related performance issues by drilling down from a transaction snapshot

AppDynamics enables enterprises to proactively track lagging application performance, troubleshoot the root cause and resolve issues before they impact customers. Let’s look at the use case of troubleshooting a server-related application performance issue by drilling down from a business transaction snapshot, following the same E-Commerce application scenario (Fig 1) discussed in the previous use case.

As you can see in the transaction scorecard shown in Fig 1, 26 of the transactions are very slow. To troubleshoot the root cause of these very slow transactions, you can simply click on the very slow transactions in the transaction scorecard to go to the list of transaction snapshots.

Fig 4: Transaction Snapshot

Looking at the transaction snapshot taken at 4:12 PM on 01/20/16, you’ll see the snapshot details as shown in Fig 4 above. You can now drill down further into the calls at the E-Commerce server to troubleshoot the root cause. Clicking again will bring up a more granular view and take you to the snapshot details showing a call graph and another tab for server details as shown below in Fig 5.

Fig 5: Transaction Snapshot drill down

Fig 6: Transaction Snapshot drill down

By clicking on the server tab, you will reach the server dashboard (as shown in Fig 6) with an indicator of server health, key performance indicators, server properties and the top 10 processes consuming CPU and memory. Here you can clearly see that CPU consumption has been maxing out, which may be contributing to the very slow transaction.

You can drill down further to see which processes are consuming CPU cycles by scrolling down to the top 10 processes consuming CPU. As you can see in Fig 7 below, the antivirus process is consuming approximately 70% of CPU cycles, contributing to the slowness of the business transaction and ultimately affecting the end user.

Fig 7: Transaction Snapshot
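Ranking processes by CPU share, as the dashboard's top 10 list does, comes down to a sort over per-process samples. The sketch below is not how Server Visibility collects or ranks its data. The process names and percentages are hypothetical values made up for illustration, and `top_cpu` is an assumed helper name.

```python
import heapq

# Hypothetical samples: (process name, percent of CPU). Not real measurements.
samples = [
    ("antivirus", 70.0),
    ("java", 18.5),
    ("sshd", 0.4),
    ("cron", 0.1),
]


def top_cpu(processes, limit=10):
    """Return up to `limit` processes, highest CPU share first."""
    return heapq.nlargest(limit, processes, key=lambda p: p[1])


top = top_cpu(samples, limit=2)
for name, percent in top:
    print(name, percent)
```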

Hopefully, this gives you an overview of how the new Server Visibility module can be used in conjunction with the AppDynamics APM solution to quickly isolate and resolve an application performance issue.

Interested in learning more about Server Visibility? Attend our free webinar here.