Once you get your head around the concept of containers, and then the need to manage and orchestrate them with tools like Kubernetes, what started as a weekend project suddenly raises more questions than answers.
Kubernetes removes much of the complexity of managing the interaction between applications and the underlying infrastructure. It is designed to let developers focus on their applications and solutions rather than on the complexity of the hosting platform. When configured correctly, resources can be optimized and will scale in or out as the application requires.
This works well for simple containers that act like functions, taking in data, processing it, and returning a result. When your application needs to persist state, however, deployment scaling may not behave the way you want. Where Kubernetes and most other orchestration platforms fall down is in provisioning a resilient persistent data layer that works natively with applications.
Persistent Volumes and Claims
Containers are ephemeral objects. They are created and destroyed, and each time a container is destroyed, any data that is persisted within the container is lost. This is clearly a problem for applications that need to persist state.
There are many reasons we need to persist state: storing a user's data, caching frequently requested data, and logging, to say nothing of the specialized requirements of databases, queue systems, and so on. The impact of starting, stopping, restarting, and restoring individual containers on an application, and on the overall solution each container is part of, should be a paramount concern when planning a Kubernetes deployment.
The solution to native data persistence in Kubernetes involves two key components: persistent volumes (PVs) and persistent volume claims (PVCs).
A PV is a storage resource created and managed separately from the Kubernetes system itself and from any pods that consume it. An Amazon EBS volume, an Azure Files share, or even a network-attached storage volume, for example, can be set up as a PV.
PVs can be provisioned manually by system administrators, in which case they are accessible to the Kubernetes environment but their lifecycle must be managed separately from the Kubernetes cluster: admins set up the storage, register it as a PV within Kubernetes, later remove the PV from Kubernetes, and finally deprovision the storage by hand.
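As a minimal sketch of the manual approach, a PV backed by an NFS share might be defined like this. The name, capacity, server address, and export path are all illustrative, not taken from the article:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv                     # illustrative name
spec:
  capacity:
    storage: 10Gi                  # illustrative size
  accessModes:
    - ReadWriteMany                # NFS supports shared read/write access
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.0.10              # hypothetical NFS server address
    path: /exports/data            # hypothetical export path
```

The admin creates this object with `kubectl apply`, and the volume becomes available for claims until the admin deletes it and tears down the underlying share.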
Kubernetes now supports dynamic volume provisioning, which enables Kubernetes to automatically provision PV storage resources through predefined StorageClass objects. In this scenario, Kubernetes has the information to create the PV when it’s demanded.
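A StorageClass for dynamic provisioning can be as small as the sketch below, which assumes an AWS cluster using the in-tree EBS provisioner; the class name and volume type are illustrative:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                   # illustrative name, referenced later by PVCs
provisioner: kubernetes.io/aws-ebs # in-tree AWS EBS provisioner
parameters:
  type: gp2                        # EBS general-purpose SSD volume type
```

With this object in place, Kubernetes has everything it needs to create an EBS-backed PV on demand.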
When a pod needs to access a PV, a PVC is created: a request for a storage binding to the PV. When using dynamic volume provisioning, creating a PVC can trigger the creation of the PV and its underlying storage according to the StorageClass specification. One of the important things to remember when setting up a PV is its location. In general, depending on whether reading or writing is the priority, the nodes that host the containers making the claim should be located as close as possible to the shared resource (the PV).
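A hedged sketch of that flow: the PVC below requests storage from a hypothetical `fast-ssd` StorageClass (triggering dynamic provisioning), and the pod mounts the claim. All names, the image, and the sizes are illustrative:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim                 # illustrative name
spec:
  accessModes:
    - ReadWriteOnce                # mountable read-write by a single node
  resources:
    requests:
      storage: 5Gi
  storageClassName: fast-ssd       # hypothetical StorageClass; triggers provisioning
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx                 # placeholder image
      volumeMounts:
        - name: data
          mountPath: /var/data     # where the volume appears in the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-claim      # binds the pod to the claim above
```

If the claim is unbound when the pod is scheduled, Kubernetes provisions a matching PV, binds it, and mounts it into the container.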
Another item of note is reclaim policy, which is a property of the StorageClass object underlying a dynamically provisioned PV. The reclaim policy specifies what happens to the volume represented by a PV once a PVC has been deleted, at which point the PV is technically no longer claimed.
There are several reclaim policies that can be specified, including:
- Retain, where the volume is in a released state but the data is retained and can be recovered.
- Delete, where the volume is deleted.
Delete is the default for dynamically provisioned volumes if no policy is specified for the StorageClass. Data on a retained volume can be recovered manually, and once an administrator has cleaned the volume up, it can be made available to another claim.
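The reclaim policy is set on the StorageClass for dynamically provisioned volumes. A minimal sketch, again assuming the in-tree AWS EBS provisioner and an illustrative class name:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retained-storage           # illustrative name
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Retain              # PVs survive deletion of their PVCs; default is Delete
```

Any PV provisioned through this class keeps its data after its claim is deleted, entering the Released state instead of being destroyed.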
Other Storage Persistence Options
There are projects underway in the open source community to provide native solutions for data storage and persistence connectivity, aligned with the dynamic-provisioning philosophy of Kubernetes. Two of the more prominent projects in this space are Ceph and Rook.
Ceph is a “dynamically managed, horizontally scaled, distributed storage cluster.” It is abstracted logically over storage resources and is designed to have no single point of failure. The key promise of Ceph is that it offers a unified view of storage with discrete access to data on multiple levels, including object, block, and file. Ceph is not trivial to set up, however, and this is where Rook comes in.
Rook is a storage orchestrator for Kubernetes that automates deployment, management and scaling of storage services. Currently Rook supports several storage services including Ceph, CockroachDB, Cassandra, EdgeFS, Minio, and NFS. Storage resources to be deployed are configured from a YAML file, in the same spirit as Kubernetes. Rook operates in a similar way to Kubernetes, observing, managing, and guaranteeing state of the underlying persistence architecture.
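As a sketch of what that YAML-driven configuration looks like, the custom resource below describes a Ceph cluster for Rook to deploy and manage. The exact fields vary by Rook version, and the image tag and paths here are illustrative assumptions, not values from the article:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph             # namespace where the Rook operator runs
spec:
  cephVersion:
    image: ceph/ceph:v14           # Ceph container image; version is illustrative
  dataDirHostPath: /var/lib/rook   # host path for cluster configuration and state
  mon:
    count: 3                       # three monitors for quorum
  storage:
    useAllNodes: true              # let Rook use every node in the cluster
    useAllDevices: true            # and every unused raw device on those nodes
```

Once applied, the Rook operator observes this desired state and continuously drives the Ceph cluster toward it, in the same way Kubernetes reconciles pods.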
Container deployment and the resulting orchestration with Kubernetes is what generally hits the headlines, but other critical parts of the infrastructure, such as container network management and persistence integration, play equally central roles in a successful cloud-native solution.
One of the key draws of building a solution with microservices and containers is the promise of not being locked into a single cloud vendor. While this is great, in practice it adds another layer of complexity for enterprises that need to operate across multiple clouds, bare metal, and on-premises infrastructure. For these situations, using persistence solutions like Ceph with an orchestrator such as Rook makes a lot of sense, especially when combined with the central control plane provided by Kubernetes and Kublr.
For additional information, our team of SMEs is available to consult and answer questions. Contact us at Kublr.com.