13 January 2020

Yet Another Kubernetes Intro - Part 3 - ReplicaSets and Labels

In my last post, I covered pods. However, I also mentioned that I was a little bit torn about covering pods, and especially the creation of pods. The main reason for this is that we generally do not create pods on their own. There isn’t really anything wrong with creating pods manually as we did in the previous post. However, in doing so, we miss out on a bit of the functionality that Kubernetes offers us.

If we create pods manually, as in the last post, and we want more than one instance of a pod, we have to deploy each one of these instances by hand. Not only would we need multiple, almost identical YAML files, one for each pod, since each pod needs a unique name, but any changes or maintenance would also involve a lot of repetition and potential for mistakes.

Seeing that one of the main reasons for running a Kubernetes cluster is the ability to have a resilient system with multiple load-balanced instances of our pods, it makes a lot of sense that K8s would have a better solution built in. And it does! It’s called a ReplicaSet (RS). A ReplicaSet is an abstraction put in place specifically to maintain multiple instances of a pod. It continuously monitors the cluster to make sure that the desired number of pods are up and running at any given time. If it can’t see the correct number of pods running, it corrects this by adding or removing pods. And if we ever want to scale out, all we have to do is reconfigure the ReplicaSet, and it will automatically adjust the cluster to meet the new requirements.

Labels

However, before we can start looking at ReplicaSets, we need to understand another Kubernetes feature called labels. Labels are key/value pairs that can be added as metadata to pretty much any resource in the Kubernetes cluster. These labels can then be used to find specific instances or groups of resources in the cluster for different reasons.

Why is this important? Well, a ReplicaSet doesn’t keep a hard-coded list of the pods it manages. Instead, the RS uses the labels attached to pods to figure out what pods should be part of its “set”. Kubernetes does record the relationship in each pod’s metadata.ownerReferences field, but the RS sets that reference itself when it claims pods that match its label selector.

This allows for a very flexible configuration that doesn’t tie the user into any particular setup. Instead, it is up to the implementer of the system to figure out what labels and label values make sense for the current system. And to be honest, the label structure can vary wildly between systems, as most systems have very different requirements and needs when it comes to grouping resources.

Labels can be used for a lot of things, not just ReplicaSets. For example, they can also be used to label nodes, allowing us to target specific nodes when scheduling pods through a nodeSelector in the pod spec. Some pods might need lots of memory but very little disk, while others might need a lot of fast disk but very little memory. By having differently configured nodes, with labels describing the types of resources available on each, we can make sure that we utilize the resources optimally by scheduling the right types of pods on the right kind of hardware.
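To make the node idea a bit more concrete, here is a minimal Python sketch of the label-matching part of a nodeSelector: a node is only eligible if it carries every label the selector asks for. The node names, the disktype and memory label keys, and the nodes_matching function are made up for the example. The real scheduler also looks at available resources, taints and more, which this leaves out.

```python
from __future__ import annotations


def nodes_matching(nodes: dict[str, dict[str, str]],
                   node_selector: dict[str, str]) -> list[str]:
    """Return the sorted names of nodes that carry every key/value pair
    in node_selector (an empty selector matches every node)."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in node_selector.items())
    )


# Hypothetical nodes, labelled by the kind of hardware they offer.
nodes = {
    "node-a": {"disktype": "ssd", "memory": "low"},
    "node-b": {"disktype": "hdd", "memory": "high"},
    "node-c": {"disktype": "ssd", "memory": "high"},
}

print(nodes_matching(nodes, {"disktype": "ssd"}))                    # ['node-a', 'node-c']
print(nodes_matching(nodes, {"disktype": "ssd", "memory": "high"}))  # ['node-c']
```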

Let’s have a look at how we can use labels with our pods. Imagine that you have the following two pod definitions

apiVersion: v1
kind: Pod
metadata:
  name: my-pod-v1
  labels:
    app: hello-world
    version: "1.0"
spec:
  containers:
  - name: hello-world
    image: zerokoll/helloworld:1.0
---
apiVersion: v1
kind: Pod
metadata:
  name: my-pod-v2
  labels:
    app: hello-world
    version: "2.0"
spec:
  containers:
  - name: hello-world
    image: zerokoll/helloworld:2.0

As you can see, this defines two pods. Both have the label app with the value hello-world. However, they differ in the second label, version, since the two pods are running two different versions of the image.

Note: Label values are always strings. A value can be at most 63 characters long, must begin and end with an alphanumeric character ([a-z0-9A-Z]), and can contain dashes (-), underscores (_) and dots (.) in between. Since we are using 1.0 and 2.0 as values in this case, they need to be wrapped in quotes so that YAML treats them as strings instead of numbers. Otherwise the API complains about the values not being strings.
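The rules in the note above can be expressed as a small Python check. The regular expression is modelled on the documented rules for label values, so treat this as an illustration rather than the API server’s actual validation code. The is_valid_label_value name is my own, and the isinstance check mirrors the quoting issue: an unquoted 1.0 in a YAML file is loaded as a float, not a string.

```python
import re

# Modelled on the documented rules for label values: at most 63 characters,
# either empty or beginning and ending with an alphanumeric character, with
# dashes, underscores and dots allowed in between.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
MAX_LABEL_VALUE_LENGTH = 63


def is_valid_label_value(value: object) -> bool:
    """Return True if value satisfies the label value rules above."""
    if not isinstance(value, str):
        # An unquoted 1.0 in YAML is loaded as a float, not a string.
        return False
    return (len(value) <= MAX_LABEL_VALUE_LENGTH
            and LABEL_VALUE_RE.fullmatch(value) is not None)


print(is_valid_label_value("1.0"))          # True
print(is_valid_label_value(1.0))            # False
print(is_valid_label_value("hello-world"))  # True
print(is_valid_label_value("-beta"))        # False
```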

With these two pods up and running, we can use the labels to filter the results from kubectl by passing the -l (or --selector) flag to our commands. For example, running

kubectl get pods -l app=hello-world,version=1.0

returns only the version 1.0 pod

NAME        READY   STATUS    RESTARTS   AGE
my-pod-v1   1/1     Running   0          7m8s

With the -l flag, we can either use the format used above, which is called equality-based, or a more expressive format called set-based. The equality-based format is basically just a comma-separated list of key/value pairs that are combined with AND, returning only pods that have all the supplied labels with the supplied values. The set-based format, on the other hand, allows us to use more complex queries like

kubectl get pods -l 'app in (hello-world), version in (1.0,2.0)'

which returns all pods with the app label set to the value hello-world and the version label set to either 1.0 or 2.0.
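To show how the two formats combine, here is a simplified Python model of evaluating a selector string against a pod’s labels. It only supports =, ==, !=, in and notin (not existence checks like a bare key or !key), and the matches and split_requirements helpers are hypothetical, not part of any Kubernetes client library. As with kubectl, all requirements are ANDed together, and != and notin also match pods that don’t have the label at all.

```python
from __future__ import annotations

import re

# A simplified model of kubectl's -l selector syntax. It only supports:
#   key=value, key==value, key!=value, key in (v1,v2), key notin (v1,v2)
SET_RE = re.compile(r"^\s*([\w./-]+)\s+(in|notin)\s+\(([^)]*)\)\s*$")
EQ_RE = re.compile(r"^\s*([\w./-]+)\s*(==|!=|=)\s*([\w.-]*)\s*$")


def split_requirements(selector: str) -> list[str]:
    """Split a selector on commas that are not inside parentheses."""
    parts, depth, current = [], 0, ""
    for ch in selector:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return [part for part in parts if part.strip()]


def matches(selector: str, labels: dict[str, str]) -> bool:
    """Return True if labels satisfy every requirement (they are ANDed)."""
    for requirement in split_requirements(selector):
        set_match = SET_RE.match(requirement)
        if set_match:
            key, op, raw_values = set_match.groups()
            values = {value.strip() for value in raw_values.split(",")}
            found = labels.get(key) in values  # False if the key is missing
            if (op == "in") != found:
                return False
            continue
        eq_match = EQ_RE.match(requirement)
        if eq_match:
            key, op, value = eq_match.groups()
            equal = labels.get(key) == value  # False if the key is missing
            if (op == "!=") == equal:
                return False
            continue
        raise ValueError(f"unsupported requirement: {requirement!r}")
    return True


v1 = {"app": "hello-world", "version": "1.0"}
v2 = {"app": "hello-world", "version": "2.0"}
print(matches("app=hello-world,version=1.0", v1))                  # True
print(matches("app=hello-world,version=1.0", v2))                  # False
print(matches("app in (hello-world), version in (1.0,2.0)", v2))   # True
```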

Note: The -l flag can be used with pretty much all kubectl commands that work on sets of resources. For example, you could use it when deleting pods by running kubectl delete pods -l app=hello-world,version=1.0

If you ever want to see what labels are set on resources in the cluster, you can include --show-labels in the get command like this

kubectl get pods --show-labels

NAME             READY   STATUS    RESTARTS   AGE    LABELS
hello-world-v1   1/1     Running   0          7m8s   app=hello-world,version=1.0

ReplicaSet

Ok, with that little sidetrack into the world of labels completed, it is time to get back to our ReplicaSets, to see how they use labels to do their job!

Let’s have a look at a basic ReplicaSet spec

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: hello-world-v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello-world
      version: '1.0'
  template:
    metadata:
      labels:
        app: hello-world
        version: '1.0'
    spec:
      containers:
      - name: hello-world
        image: zerokoll/helloworld

As you can see, it uses the apps/v1 API version, the kind is ReplicaSet, and in this case the name is set to hello-world-v1. Besides that, it has a spec that says there should always be 3 replicas of pods that match the specified label selector in the cluster. In this case, the selector matches pods based on the app and version labels, with the values hello-world and 1.0. For the cases where fewer than 3 matching pods are up and running, the template element tells the RS how to create new pods. This template contains the same metadata and spec that you would normally put in a pod specification file, just without the apiVersion and kind.

Just remember that the labels in the pod template metadata have to satisfy the label selector of the ReplicaSet. If they didn’t, pods created from the template would never be counted, so the RS would keep creating more and more pods without ever reaching the desired count. With the apps/v1 API, the API server protects you from this by validating the spec and rejecting a ReplicaSet whose selector doesn’t match its template labels.
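The selector check can be sketched in Python, using the ReplicaSet spec from above as a plain dict. This is a simplified model that only handles matchLabels (ReplicaSets also support matchExpressions), the function names are hypothetical, and the error text is only meant to resemble what the API server reports.

```python
from __future__ import annotations


def selector_matches_template(match_labels: dict[str, str],
                              template_labels: dict[str, str]) -> bool:
    """True if every matchLabels pair also appears in the template labels."""
    return all(template_labels.get(k) == v for k, v in match_labels.items())


def validate_replicaset_spec(spec: dict) -> None:
    """Raise ValueError if pods created from the template would not be
    selected by the ReplicaSet's own selector."""
    match_labels = spec["selector"]["matchLabels"]
    template_labels = spec["template"]["metadata"].get("labels", {})
    if not selector_matches_template(match_labels, template_labels):
        raise ValueError("`selector` does not match template `labels`")


# The spec from the YAML above, as a Python dict.
spec = {
    "replicas": 3,
    "selector": {"matchLabels": {"app": "hello-world", "version": "1.0"}},
    "template": {
        "metadata": {"labels": {"app": "hello-world", "version": "1.0"}},
        "spec": {"containers": [{"name": "hello-world",
                                 "image": "zerokoll/helloworld"}]},
    },
}
validate_replicaset_spec(spec)  # passes: the template satisfies the selector
```

Note that the template may carry extra labels beyond the ones in the selector; it just has to include all of them.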

After deploying this RS to our cluster using the following command

kubectl apply -f .\rs-hello-world.yml

we can go ahead and get all resources in our cluster to see what has actually happened

kubectl get all

NAME                       READY   STATUS    RESTARTS   AGE
pod/hello-world-v1-8kh7r   1/1     Running   0          12s
pod/hello-world-v1-gwcdn   1/1     Running   0          12s
pod/hello-world-v1-ht56v   1/1     Running   0          12s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45d

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-world-v1   3         3         3       12s

As you can see in the above output, creating the hello-world-v1 RS causes 4 resources to be created in the cluster. First of all, we get the ReplicaSet itself. And since this RS can’t find 3 pods that match its label selector, it automatically creates 3 new pods in the cluster, to make sure that the replica count matches the desired state.

To see the RS in action, we can try and execute the following command

kubectl delete pod hello-world-v1-8kh7r

This deletes one of the pods that the ReplicaSet created for us. However, if we run

kubectl get all

NAME                       READY   STATUS    RESTARTS   AGE
pod/hello-world-v1-fcp8p   1/1     Running   0          37s
pod/hello-world-v1-gwcdn   1/1     Running   0          2m34s
pod/hello-world-v1-ht56v   1/1     Running   0          2m34s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45d

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-world-v1   3         3         3       2m34s

we can see that there are still 3 pods up and running. But…if you look at the “AGE” column, you can see that one of the pods is much newer than the other ones.

Because the RS continuously monitors the cluster to make sure that the desired state is met, it notices that a pod has been deleted, and immediately creates a replacement to get back to the desired state.

We can also try to create a new pod using the following spec

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  labels:
    app: hello-world
    version: '1.0'
spec:
  containers:
  - name: my-container
    image: zerokoll/helloworld

kubectl apply -f hello-world-pod.yml

As you can see, this pod specification sets the same set of labels that the RS is monitoring. So, if we run

kubectl get all

NAME                       READY   STATUS    RESTARTS   AGE
pod/hello-world-v1-fcp8p   1/1     Running   0          7m28s
pod/hello-world-v1-gwcdn   1/1     Running   0          9m25s
pod/hello-world-v1-ht56v   1/1     Running   0          9m25s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45d

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-world-v1   3         3         3       9m25s

we can see that there is no pod called my-pod in the list of pods. The reason for this is that the ReplicaSet once again notices the change. As it now sees more than 3 pods matching its label selector, it deletes one of them straight away. When picking which pod to delete, the RS prefers pods that aren’t ready yet and newer pods, so the freshly created my-pod is the one that goes.

If you run the apply and get commands fast enough after each other, you might get lucky and see the following output

kubectl get all

NAME                       READY   STATUS        RESTARTS   AGE
pod/hello-world-v1-fcp8p   1/1     Running       0          9m2s
pod/hello-world-v1-gwcdn   1/1     Running       0          10m
pod/hello-world-v1-ht56v   1/1     Running       0          10m
pod/my-pod                 0/1     Terminating   0          2s

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45d

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-world-v1   3         3         3       10m

As you can see in the output above, the my-pod pod is being terminated as soon as it starts.

A pretty cool effect of having ReplicaSets use label selectors, instead of a fixed list of pods, is that they can adopt existing pods. Imagine that you are working on some new containers, and you schedule some pods to make sure that they work as they should. Once you are confident that the pods are behaving as they should, you can deploy an RS with a label selector that matches the pods you just created. The new RS will then automatically adopt the existing pods (as long as they aren’t already controlled by something else) and start managing them, without you having to first remove your pods and then have them re-created by the newly created ReplicaSet.

It also means that if you for example have a misbehaving pod, you can just update the pod’s labels to make sure it doesn’t match the RS label selector. This will cause the RS to schedule a new pod to replace the misbehaving one that was “removed”, while leaving the misbehaving pod up and running for you to debug.
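Adopting and releasing pods can be modelled as a “claim” step. This Python sketch is loosely based on how Kubernetes controllers use a controller ownerReference, represented here by a simple owner field: matching pods without a controller get adopted, owned pods whose labels stop matching get released, and pods controlled by something else are left alone. The Pod class and claim_pods are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    labels: dict[str, str]
    owner: str | None = None  # stands in for a controller ownerReference


def selects(match_labels: dict[str, str], pod: Pod) -> bool:
    return all(pod.labels.get(k) == v for k, v in match_labels.items())


def claim_pods(rs_name: str, match_labels: dict[str, str],
               pods: list[Pod]) -> list[Pod]:
    """Adopt matching orphans, release owned pods that stopped matching,
    and return the pods this ReplicaSet now counts as its own."""
    claimed = []
    for pod in pods:
        matching = selects(match_labels, pod)
        if pod.owner == rs_name and not matching:
            pod.owner = None      # release: labels no longer match
        elif pod.owner is None and matching:
            pod.owner = rs_name   # adopt an orphan that matches
        if pod.owner == rs_name:
            claimed.append(pod)
    return claimed


labels = {"app": "hello-world", "version": "1.0"}
pods = [
    Pod("test-pod-1", dict(labels)),                      # orphan, matches
    Pod("test-pod-2", dict(labels)),                      # orphan, matches
    Pod("someone-elses", dict(labels), owner="other-rs"),  # controlled elsewhere
]
print([p.name for p in claim_pods("hello-world-v1", labels, pods)])
# ['test-pod-1', 'test-pod-2']

pods[0].labels["app"] = "hello-world-removed"  # like kubectl label --overwrite
print([p.name for p in claim_pods("hello-world-v1", labels, pods)])
# ['test-pod-2']
print(pods[0].owner)  # None: released, but still running
```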

To change a label to make sure it doesn’t match the required label selector, you can run

kubectl label pod hello-world-v1-ht56v --overwrite app=hello-world-removed

If we fetch all resources after running that command

kubectl get all

NAME                       READY   STATUS    RESTARTS   AGE
pod/hello-world-v1-8p2g2   1/1     Running   0          85s
pod/hello-world-v1-fcp8p   1/1     Running   0          28m
pod/hello-world-v1-gwcdn   1/1     Running   0          30m
pod/hello-world-v1-ht56v   1/1     Running   0          30m

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45d

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-world-v1   3         3         3       30m

we can see that a new pod, hello-world-v1-8p2g2, has been created to keep the cluster in the desired state, while the hello-world-v1-ht56v pod is left up and running, but no longer managed by the RS.

Note: You can also delete a label by adding a dash (-) after the label name. For example, to remove a label called mylabel from a pod called hello-world, you can run kubectl label pods hello-world mylabel-.
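The two forms of the label command can be sketched as a small Python function that applies label arguments to a dict. The apply_label_args name is hypothetical and this is not kubectl’s code. It models the two forms described above: key=value sets a label, refusing to change an existing one unless overwrite is set (like --overwrite), and key- removes it.

```python
from __future__ import annotations


def apply_label_args(labels: dict[str, str], args: list[str],
                     overwrite: bool = False) -> dict[str, str]:
    """Return a new label dict after applying kubectl-label-style arguments:
    'key=value' sets a label, 'key-' removes it."""
    result = dict(labels)  # never mutate the caller's labels
    for arg in args:
        if "=" not in arg:
            if not arg.endswith("-"):
                raise ValueError(f"invalid label argument: {arg!r}")
            result.pop(arg[:-1], None)  # 'mylabel-' removes 'mylabel'
            continue
        key, _, value = arg.partition("=")
        if key in result and not overwrite:
            raise ValueError(f"'{key}' already has a value; pass overwrite=True")
        result[key] = value
    return result


labels = {"app": "hello-world", "version": "1.0", "mylabel": "x"}
print(apply_label_args(labels, ["mylabel-"]))
# {'app': 'hello-world', 'version': '1.0'}
print(apply_label_args(labels, ["app=hello-world-removed"], overwrite=True)["app"])
# hello-world-removed
```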

Finally, if you delete a ReplicaSet, all of its dependent pods are deleted as well by default. However, if you want to delete an RS without deleting its pods, you can add --cascade=false (newer versions of kubectl deprecate this in favor of --cascade=orphan) like this

kubectl delete rs hello-world-v1 --cascade=false

This removes the RS, but leaves all the pods it was managing up and running
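The difference between a cascading delete and an orphaning one can be sketched in Python, again using a simple owner field as a stand-in for the ownerReferences on each pod. The delete_owner function is hypothetical, and in a real cluster the dependents of a deleted object are cleaned up asynchronously by the garbage collector.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    owner: str | None = None  # stands in for a controller ownerReference


def delete_owner(owner: str, pods: list[Pod], orphan: bool = False) -> list[Pod]:
    """Return the pods left after deleting `owner`.

    orphan=False models the default cascading delete: dependents go too.
    orphan=True models --cascade=false / --cascade=orphan: dependents stay,
    but their reference to the deleted owner is removed.
    """
    if orphan:
        for pod in pods:
            if pod.owner == owner:
                pod.owner = None
        return pods
    return [pod for pod in pods if pod.owner != owner]


def sample_pods() -> list[Pod]:
    return [Pod("hello-world-v1-8p2g2", owner="hello-world-v1"),
            Pod("hello-world-v1-fcp8p", owner="hello-world-v1"),
            Pod("hello-world-v1-ht56v")]  # already released by relabelling


print([p.name for p in delete_owner("hello-world-v1", sample_pods())])
# ['hello-world-v1-ht56v']
print(len(delete_owner("hello-world-v1", sample_pods(), orphan=True)))
# 3
```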

kubectl get all

NAME                       READY   STATUS    RESTARTS   AGE
pod/hello-world-v1-8p2g2   1/1     Running   0          52s
pod/hello-world-v1-fcp8p   1/1     Running   0          52s
pod/hello-world-v1-gwcdn   1/1     Running   0          52s
pod/hello-world-v1-ht56v   1/1     Running   0          30m

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45d

Scaling pods

Once you have a ReplicaSet up and running, you can scale the number of replicas in 2 ways. The fastest way is to run kubectl scale --replicas=<replica count> rs/<RS name>. For example

kubectl scale --replicas=1 rs/hello-world-v1

In this case, that scales the replica count down to 1, leaving us with

kubectl get all

NAME                       READY   STATUS    RESTARTS   AGE
pod/hello-world-v1-gwcdn   1/1     Running   0          37m
pod/hello-world-v1-ht56v   1/1     Running   0          37m

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45d

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-world-v1   1         1         1       37m

As you can see, the RS removed all the extra pods, leaving us with 1 pod (except for the hello-world-v1-ht56v pod, which is the one that has mismatching labels).

A word of caution though! Scaling manually like this is not recommended in most cases. However, it is very fast! So if you are in a pinch, you can run this command to scale up (or down) quickly. Just make sure that you update the corresponding YAML file afterwards.

In most cases, you want to make sure that the YAML files you store in source control represent the current state of the cluster. Otherwise, someone (or maybe a CD pipeline) might re-deploy the RS spec from source control and overwrite the change you just made.

So a better way to handle scaling changes is to update the spec.replicas entry in the YAML file, and re-apply the file.

Automatic horizontal scaling

You can also do automatic horizontal scaling, using something called a HorizontalPodAutoscaler. This allows you to get your pods automatically scaled horizontally based on CPU load. However, this is somewhat out of scope for this part of my introduction. But in short, you can run the following command to create a HorizontalPodAutoscaler

kubectl autoscale rs hello-world-v1 --max=5 --cpu-percent=70

kubectl get all

NAME                       READY   STATUS    RESTARTS   AGE
pod/hello-world-v1-gwcdn   1/1     Running   0          49m
pod/hello-world-v1-ht56v   1/1     Running   0          49m

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   45d

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/hello-world-v1   1         1         1       49m

NAME                                                 REFERENCE                   TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/hello-world-v1   ReplicaSet/hello-world-v1   <unknown>/70%   1         5         0          12s

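As you can see, this creates a HorizontalPodAutoscaler (horizontalpodautoscaler.autoscaling) that scales the hello-world-v1 RS between 1 and 5 replicas, trying to keep the average CPU utilization of its pods around 70%. The utilization is measured relative to the CPU requested by the pods’ containers, so CPU requests need to be set, and a metrics source such as metrics-server needs to be running in the cluster. Until metrics are available, the target is shown as <unknown>, as in the output above.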
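The Kubernetes documentation describes the core scaling rule as desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]. Here is a Python sketch of that rule for the CPU case, including the default 10% tolerance and clamping to the 1 to 5 replica range from the command above. The desired_replicas name is hypothetical, and the real controller also handles stabilization windows, missing metrics and pods that aren’t ready yet, which this leaves out.

```python
import math


def desired_replicas(current_replicas: int, current_cpu: float,
                     target_cpu: float, min_replicas: int = 1,
                     max_replicas: int = 5, tolerance: float = 0.1) -> int:
    """Sketch of the HPA scaling rule for a single CPU utilization metric.

    current_cpu and target_cpu are average utilization percentages
    (relative to the pods' CPU requests), e.g. 140.0 and 70.0.
    """
    ratio = current_cpu / target_cpu
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # within tolerance: don't scale
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the MINPODS/MAXPODS bounds shown in the output above.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(1, 140.0, 70.0))  # 2
print(desired_replicas(2, 90.0, 70.0))   # 3
print(desired_replicas(1, 700.0, 70.0))  # 5 (capped by max_replicas)
```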

If you are curious about this topic, you can read more about it here

Other sets

Kubernetes also supports 2 other types of “sets”. And why would we need different kinds of sets? Well, they solve slightly different problems.

The ReplicaSet that we have looked at in this post will always try to make sure that the desired number of pods are running in the cluster, scheduling them around the cluster based on node selectors and available resources, as we have seen.

But we also have something called a DaemonSet. This type of set makes sure that a copy of a defined pod is always running on each one of the nodes in the cluster. This can be useful for a few different reasons. For example, it can be used when you need to interact with the actual node, doing things like gathering node-level resource usage. It can also be used to make sure that supporting pods, required by other pods in the cluster, are always available locally on the node those pods are running on. This means that you don’t have to leave the current node when making requests to these supporting pods.
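The DaemonSet rule, exactly one pod on every eligible node, can be sketched as a planning function in Python. The daemonset_plan name, node names and labels are hypothetical, and the optional node_selector reflects that a DaemonSet can be restricted to a subset of nodes. The real controller also handles things like taints, tolerations and rolling updates, which this leaves out.

```python
from __future__ import annotations


def daemonset_plan(nodes: dict[str, dict[str, str]],
                   pods_per_node: dict[str, int],
                   node_selector: dict[str, str] | None = None
                   ) -> tuple[list[str], list[str]]:
    """Return (nodes that need a pod, nodes with pods to remove)."""
    selector = node_selector or {}
    eligible = {name for name, labels in nodes.items()
                if all(labels.get(k) == v for k, v in selector.items())}
    # Every eligible node should run exactly one pod...
    to_create = sorted(n for n in eligible if pods_per_node.get(n, 0) == 0)
    # ...so extra pods, or pods on ineligible or removed nodes, should go.
    to_remove = sorted(n for n, running in pods_per_node.items()
                       if running > 1 or (running > 0 and n not in eligible))
    return to_create, to_remove


nodes = {"node-a": {"disktype": "ssd"},
         "node-b": {"disktype": "hdd"},
         "node-c": {"disktype": "ssd"}}
print(daemonset_plan(nodes, {"node-a": 1, "node-b": 2}))
# (['node-c'], ['node-b'])
print(daemonset_plan(nodes, {"node-a": 1, "node-b": 1}, {"disktype": "ssd"}))
# (['node-c'], ['node-b'])
```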

Finally, there is a type called StatefulSet. Pods in a StatefulSet are treated a bit differently when it comes to scheduling, deletion and scaling. It is used specifically for pods that need to run in a stateful manner. However, as we try to run stateless workloads as much as we possibly can, StatefulSets are not used nearly as much as ReplicaSets. Because of this, I won’t cover them in this intro post. But it is worth noting that they exist, and that they can be used for running stateful things like databases in the cluster.

That’s it for part 3. I think I have covered everything I planned on covering in this post…

The fourth part of this series is available here.