Approaches to Configuration Management: Chef, Ansible, and Kubernetes

New container orchestration tools like Kubernetes are changing the DevOps approach to configuration management and deployment at scale. In this post, we’ll take a look at how earlier solutions such as Chef and Ansible approach configuration management and why you should consider the capabilities of the Kubernetes approach.

Chef is a complex, feature-rich framework built around one or more central master servers that store your:

  • Cookbooks — recipes with declarative “target state” descriptions of OS packages, files, settings, and anything else that can be applied to a target server
  • Templates — usually config files with variables that are replaced during a chef-client run
  • Data bags — sensitive information like database passwords

The master server controls all your managed nodes through an installed agent, and you can use push or pull methods against the agent. For example, the node agent will “ask” the master server for updates at regular intervals, or the master will push (trigger) an action on the agent only when a configuration change is applied.

With Chef you have the option to assign “environment” membership (basically a set of values and options) to your nodes, so that all recipes and templates applied on a node fetch their variables (“attributes” in Chef) from that particular environment.

Here is a short illustrative example in the style of the official New Relic agent installation cookbook (a simplified sketch, not the verbatim cookbook):
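
```ruby
# Illustrative sketch only; package, file paths, and attribute names are assumptions.
# Installs an agent package and renders its config file from node attributes.
package 'newrelic-infra' do
  action :install
end

template '/etc/newrelic-infra.yml' do
  source 'newrelic-infra.yml.erb'
  owner 'root'
  group 'root'
  mode '0640'
  variables(license_key: node['newrelic']['license_key'])
  notifies :restart, 'service[newrelic-infra]'
end

service 'newrelic-infra' do
  action [:enable, :start]
end
```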

Ansible is a great tool for both small and large environments. You can use Ansible to conveniently keep the configuration state of a few machines and, at the other extreme, you can manage thousands of servers using Ansible and inventory plugins that will discover all your Amazon Web Services (AWS) machines and allow Ansible to apply configurations on them.

Ansible also helps you avoid a “snowflake” state of your fleet of servers. A snowflake state occurs when you’ve configured most of the packages and services manually, resulting in each of your machines having a unique final state with many inconsistent config files and settings amongst servers.

Conceptually, Ansible is very similar to Chef, except that Ansible “playbooks” are typically used without a single master server because they can be run from anywhere. Ansible uses ssh to log in to a target server and configure it to match the desired state described declaratively in the playbook. All you need is the correct ssh key to log in to the target machine, so your laptop can serve as the master server if needed.

Common practice is to keep all playbooks in a central git repository. This ensures your “infrastructure as code” is always backed up and kept in sync across all team members, so everyone knows which configuration change was applied to a particular server or group of servers.

Ansible uses YAML for all resource definitions: “playbooks”, “roles”, “tasks”, and “handlers”. These roughly correspond to Chef “cookbooks”, “recipes”, “attributes”, and so on. Ansible uses Jinja templating instead of the ERB (Embedded Ruby) templates used in Chef. Jinja templates provide almost the same flexibility as ERB templates, allowing you to add loops and conditionals in your templates and to use text manipulation and temporary variables for your convenience.

Here is a short example of an Ansible playbook task, modeled on the official Elasticsearch role (a simplified sketch, not the verbatim playbook):
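
```yaml
# Illustrative sketch only; the es_version variable and the
# "restart elasticsearch" handler are assumed to be defined elsewhere.
- name: Install Elasticsearch package
  apt:
    name: "elasticsearch={{ es_version }}"
    state: present
    update_cache: yes
  when: ansible_os_family == "Debian"
  notify: restart elasticsearch

- name: Render elasticsearch.yml from a Jinja template
  template:
    src: elasticsearch.yml.j2
    dest: /etc/elasticsearch/elasticsearch.yml
    owner: root
    group: elasticsearch
    mode: "0660"
  notify: restart elasticsearch
```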


Limitations with Chef and Ansible

The goal of using a solution like Chef or Ansible is to automate the target state configuration of a particular machine, VM, or even container. While Chef and Ansible can configure containers, we’ll provide more details later on why this is not a best practice.

Solutions such as Chef and Ansible are not built to handle interactions between different machines and microservices. For example, you cannot tell Chef to run only 5 database servers at any given time, scale them up when CPU usage reaches 90%, and scale them back down to 5 when CPU usage has stayed below 20% for the last 30 minutes. Similarly, Ansible is designed to install packages, copy configuration files, and provision cloud instances and services through APIs. The “overall cluster state” of many machines and their interaction with each other is out of scope for Ansible. Ansible will not run healthchecks every 10 seconds to see if your database is online or reachable by other services, and it will not autoscale containers or instances based on incoming HTTP requests or the latency of your web services. Neither Chef nor Ansible will attempt to recover or replace, in real time, a container or instance that has shut down or malfunctioned.

Kubernetes handles all of these tasks — and much more.


Using Kubernetes to manage your fleet of services frees you from the periodic re-configuration of the target state for each machine and server. Instead, you build container images and deploy them (with Docker or rkt). In addition, support for any “Open Container Initiative”-conformant runtime is under active development to allow Kubernetes to manage other types of containers that are not based on Docker or rkt.

After you initially deploy container images, Kubernetes will check their health and run status. The underlying worker nodes are monitored in real time for compatibility with the desired workload (for example, ensuring enough CPU/RAM/disk capacity for a particular service), and the containers are monitored for status and resource utilization. A deployment (a single container or a set of many different containers) can be auto-scaled based on a very flexible set of parameters and metrics, and you can plug in your own custom application metrics (for example, cache read latency) if the built-in options of the autoscaler do not cover your use case.

With Docker and Kubernetes, you still keep all your infrastructure as code. However, as opposed to Chef and Ansible, you can describe not just the state of a single “target” (in our case, a container, described with a Dockerfile and a “Pod definition”), but also its dependencies on other services, its scale-up and scale-down rules (an “HPA”, or horizontal pod autoscaler, definition), its healthchecks, its recovery policy, and many more attributes that help run a complex set of containers in a reliable and consistent manner.
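
For instance, a horizontal pod autoscaler is itself just a short declarative document. Here is a minimal sketch, assuming a Deployment named “web-backend” already exists and CPU-based scaling is sufficient:

```yaml
# Minimal HPA sketch: keep between 3 and 20 replicas of "web-backend",
# targeting roughly 80% average CPU utilization across its pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-backend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-backend
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```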

For those migrating their instance-based (or server-based) infrastructure to containers for the first time, it might seem appropriate to just use the same Chef cookbooks or Ansible playbooks to manage the containers from “inside” at run time. An example of this is installing chef-client inside a container so that it connects to the Chef master, fetches all configurations to perform the setup, and then polls for changes periodically or gets notified about needed changes by the master. However, this is not a good idea, because containers are built to scale up fast, initialize fast, and be ready to serve their workloads within seconds. If you add unnecessary agents and initialization steps into a container, you lose many of the benefits of using containers.

All Kubernetes resources can be described with YAML or JSON. The spec format is intuitive and easy to understand once you become familiar with a few basic resources like the “pod template” and the “service”. The pod template is used as a sub-section “inside” many other resources like “deployment”, “replica set”, “stateful set”, and “daemon set”. These resources describe “how” the pod template will run, how many replicas are needed, and extra parameters specific to each resource.

Here are a few simplified examples of Deployment and Service definitions in Kubernetes (illustrative sketches, not production-ready manifests).

Redis deployment:
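
```yaml
# Simplified sketch: a single-replica Redis Deployment.
# The image tag and resource requests are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  labels:
    app: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:6.2
          ports:
            - containerPort: 6379
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
```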


Elasticsearch “Service” resource (like an internal load balancer for containers):
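
```yaml
# Simplified sketch: a ClusterIP Service that load-balances traffic
# across all pods labeled app=elasticsearch inside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: elasticsearch
spec:
  type: ClusterIP
  selector:
    app: elasticsearch
  ports:
    - name: http
      port: 9200
      targetPort: 9200
```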


If you still want to use the old cookbooks (for example, because they are large and difficult to rewrite as a clean, new Dockerfile), you can use chef-client during the docker build step.

When you build the container image, you can run any playbooks or cookbooks to reach the target state. The image gets pushed to your image repository and is ready to run, with all initialization already completed during the build stages so as not to delay the startup of the container when it is deployed.
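
A rough sketch of what this can look like in a Dockerfile is shown below; the base image, cookbook layout, run-list, and client.rb (which is assumed to point cookbook_path at the copied cookbooks) are illustrative and will vary with your setup:

```dockerfile
# Rough sketch: converge legacy cookbooks once, at image build time,
# so nothing Chef-related has to run when the container starts.
FROM ubuntu:20.04

# Install the Chef client (any recent installation method works).
RUN apt-get update && apt-get install -y curl ca-certificates && \
    curl -L https://omnitruck.chef.io/install.sh | bash

# Copy the cookbooks into the image and converge in local mode during the build.
# client.rb is assumed to set cookbook_path to /chef/cookbooks.
COPY cookbooks /chef/cookbooks
COPY client.rb /chef/client.rb
RUN chef-client --local-mode -c /chef/client.rb -o "recipe[myapp]"

# The application itself is the only thing that runs at container startup.
CMD ["/usr/local/bin/myapp"]
```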

If you are simplifying your first-time migration to containers by running chef-client in a Dockerfile, consider optimizing your docker images later by removing any legacy pieces that your old cookbooks pull into the container but that are not actually necessary in the new containerized environment. There was an attempt by the Chef authors to popularize the approach of running classic cookbooks to set up Docker containers, but it failed, as we can see from the related repositories that were deprecated and abandoned 3 years ago. (Here are a few of the legacy projects from the “Chef-boneyard”: chef-container itself, knife-container for local testing, and the chef-init Rubygem, which was intended to be used as PID 1 inside newly built containers.) This approach took advantage of the layered filesystems that Docker can use (AUFS, Device Mapper, OverlayFS) to allow faster builds of base images that include only base recipes. Then, on top of each base image, separate recipe runs were added to specialize the image for the particular application and release version being built. Incremental container builds are a good approach, but Chef and Ansible do not provide any additional value here over the simple Dockerfiles we use today to build containers. So despite using a great feature of layered filesystems, those sub-projects were abandoned as no longer relevant.

You should consider writing clean Dockerfiles (with the minimal steps needed to run the microservice) and breaking up the older “cookbook”-based server that ran 10 applications (all managed by Chef/Ansible/etc.) into 10 different, smaller, specialized containers. Each of these smaller, specialized containers has its own tiny Dockerfile for the build step, and each is independent in terms of the base Linux distribution used, the rpm or deb packages installed, and the config files included.

For example, if you had a Chef-managed bare-metal server that was running MySQL (database), Redis (caching), an Apache (PHP) or NodeJS (JavaScript) backend, Nginx (serving static content), and RabbitMQ (communication between processes), you would separate this massive Chef cookbook into smaller Dockerfiles, one per service. Each Dockerfile would build and configure only one service that can be reused in different scenarios and environments within your production Kubernetes cluster. Each container would know its role and purpose and its credentials for other services, with access policies based on environment variables passed to the container at deploy time. These variables can be set in the “Pod definition” or even generated dynamically by Jenkins for a particular situation: if you run the deploy step through Jenkins, it can create your Kubernetes pod, deployment, or service definitions based on the parameters it receives.


When migrating to containers and Kubernetes, you must change your approach to both the build and deploy procedures. With Docker, it is common practice to pass environment variables to containers during the deploy phase (and because the Docker image is already built, there should be no agents doing initialization or setup during container startup), so the application that runs inside the container adjusts its behavior according to these variables. Kubernetes is especially beneficial for environments that run massive stateless microservice-based workloads, because it helps you keep all configurations, secrets, access policies, and declarative deployment manifests as simple YAML- or JSON-formatted documents in your git or other SCM repository.
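
As a minimal sketch of this pattern (the pod name, image, secret name, and variable names below are all hypothetical), configuration is injected at deploy time through environment variables and a Kubernetes Secret:

```yaml
# Hypothetical example: configuration injected at deploy time, not baked into the image.
apiVersion: v1
kind: Pod
metadata:
  name: web-backend
  labels:
    app: web-backend
spec:
  containers:
    - name: app
      image: registry.example.com/web-backend:1.4.2
      env:
        - name: ENVIRONMENT
          value: staging
        - name: DB_HOST
          value: my-new-db1
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
```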

Containers also simplify configuration management when upgrading or replacing some of the running microservices in a live environment, because they eliminate downtime and maintenance windows. Imagine a major migration of one of your components on a standalone bare-metal server that forces you to completely export your database and import it into another. How would a Chef cookbook or an Ansible playbook manage such a migration? Using Chef or Ansible, you would install the new database on the same server, configure its listening ports so it can run side by side with the still-active old installation, and try to ensure the two configurations do not overwrite each other's data, log, and cache folders. When you use containers instead, you simply spin up the new version of the database with all needed configs alongside the existing running infrastructure, with no chance of affecting existing running services. You can import the needed data and test by spinning up a few clones of the backend application containers pointed at the new database URL. (Environment variables pass the URLs and passwords of the needed database or cache containers. With Kubernetes service discovery, such a URL follows the pattern service.namespace.svc.cluster.local, for example “my-new-db1.staging.svc.cluster.local”. Your application can easily assemble this itself when the only values passed in are “my-new-db1” and “staging”; if it interacts with 10–15 other services, this saves you a lot of configuration. Simply pass an “environment” value like staging/prod/test plus an “env_id” or “cluster_name” identifier, and the application can derive all needed URLs from them. For this to work, you need a strict, consistent naming convention for the DNS names of all your services.)

The complete runtime separation you get with containers helps you avoid risky changes and migrations, because new containers always run side by side with the existing applications that serve production traffic. New containers do not interfere with any of the existing production containers until you connect them together under the same load-balancing service; then you can start decommissioning old containers one by one to safely complete the migration. The flexibility to easily clone existing microservices with slightly different parameters applied (like connection strings and URLs of other services and databases) allows for quick testing and canary release deployments. It is very simple to direct a small portion of your traffic to the new microservice by just adding a new label to your “new pod” definition, and the production “service” resource will include this new pod in its routing. A Kubernetes “service” is analogous to an internal load balancer: it routes traffic to particular pods based on pod labels. When you are testing a new backend pod and are ready to introduce real traffic, just add your production label to the pod definition, and the “service” will include it as one of the targets for real traffic.
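
A minimal sketch of this label-based canary pattern (all names, labels, and images below are hypothetical): the Service selects app: backend, so any pod carrying that label starts receiving a share of the traffic.

```yaml
# Hypothetical canary setup: the Service routes to every pod labeled app=backend.
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  selector:
    app: backend            # the production label
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: v1
kind: Pod
metadata:
  name: backend-canary
  labels:
    app: backend            # add this label only when the canary should get real traffic
    version: v2
spec:
  containers:
    - name: backend
      image: registry.example.com/backend:2.0.0-rc1
      ports:
        - containerPort: 8080
```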

Using Ansible playbooks and modules for container management (like “ansible-container”) in the build and local development stages is a workable solution. Similarly, using Chef cookbooks and recipes to build docker images or spin up a few containers at small scale is also a workable solution. But it’s important to understand the benefits of alternative approaches to containerized CI/CD workflows and learn new best practices related to orchestration and cloud-native application deployments and architecture. Here are some specific best practices and benefits to be aware of:

  • Utilize the benefits of layered filesystem and remember that once a base image is downloaded, any container based on that image will “start” within seconds or less. (For a very lightweight container and application, it will take less than a second to start and become operational because the container is just a process running in its unique environment defined by the image.)
  • When planning your Dockerfiles, keep in mind that you’ll get faster builds if you divide the initial environment setup and the application setup into two separate stages (see the sketch after this list). Keep “base images” for every service type with the full environment installed (Ruby/Python/Go/NodeJS dependencies and the system packages relevant for build steps), but leave the final application copy step for the “on demand” build stage (triggered by a developer, for example on a git commit hook). This way, when it’s time to build and test another commit or minor version of the application, developers will not wait for all the basic package installation during the “docker build” phase. Instead, an appropriate base image is pulled from your docker repository and only the tiny application layer itself is written during that stage. This allows for faster builds and happier developers. These containers are quick and disposable, and you won’t end up with hundreds of gigabytes of “artifacts” in Artifactory, S3, or other artifact storage, because docker image repositories use the same “layered” technology to store the images (usually also compressed). If you have a base image with 1 GB of dependencies installed once, every other image “version” of your software uploaded to that repository will store only the delta on top of the base image.
  • Beyond the build time and consistency benefits of docker images, a containerized infrastructure allows you to reach higher compute resource utilization, faster scale up and down times (react to traffic spikes within seconds, not minutes), easier maintenance (disposable stateless containers), and other advantages that easily outweigh first-time migration issues or difficulties.
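
Here is a rough sketch of that base-image/application split (all image names, tags, and paths are hypothetical):

```dockerfile
# Base image, rebuilt only when dependencies change (shown here as comments):
#
#   FROM python:3.11-slim
#   COPY requirements.txt /app/requirements.txt
#   RUN pip install --no-cache-dir -r /app/requirements.txt
#   # pushed as registry.example.com/myteam/python-base:3.11
#
# Application image, rebuilt on every commit; it only adds a thin layer
# with the application code on top of the prebuilt base image.
FROM registry.example.com/myteam/python-base:3.11
WORKDIR /app
COPY . /app
CMD ["python", "main.py"]
```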

You’ll find it well worthwhile to invest time to learn to operate an orchestrator like Kubernetes to simplify and automate your infrastructure management.

New-generation tools support the entire application lifecycle

If you are looking for a more comprehensive solution, consider Kublr, a new-generation tool that supports your entire application lifecycle. With everything you need out-of-the-box to run your application in production, Kublr reduces the complexity of managing self-hosted software by automating and maintaining application availability while scaling cloud capacity.

