
A Tutorial Introduction to Kubernetes


Kubernetes is the hottest kid on the block among container orchestration tools right now. I started writing this post when we decided to go with Kubernetes at Twyla a year ago, and since then, the developments in the ecosystem have been simply overwhelming. In my opinion, the attention Kubernetes gets is completely deserved, for the following reasons:

  • It is a complete solution that is based on a fundamental set of ideas. These ideas are explained in the Borg, Omega and Kubernetes article that compares the consecutive orchestration solutions developed at Google, and the lessons learned.

  • While it is container-native, Kubernetes is not limited to a single container platform, and it extends that platform with networking, storage, and other features.

  • It offers an open and well-designed API, in addition to various patterns that suit differing workflows. The wonderful thing is that there is a very well-governed community process whereby the API is constantly developed further. You have to spend effort keeping up, but regularly receive goodies in return.

In this tutorial, I want to document my journey of learning Kubernetes, clear up some points that tripped me up as a beginner, and try to explain the most important concepts behind how it works. There is absolutely no claim of completeness; Kubernetes is way too big for a blog tutorial like this.

Starting off

The easiest way to start using Kubernetes is Minikube. If you have an account with a cloud provider, and would like to first figure out the details of running a cluster on their platform, this tutorial will still work for you, as the commands work for any recent version of Kubernetes. See here for details on how to get Minikube running on your computer. In order to manipulate the Kubernetes mini-cluster minikube runs, you need the official CLI client named kubectl, which can be installed following the instructions on this page. You will also need Docker to create and push container images. Install Docker on your computer following the instructions here.

Once you have installed everything, make sure they are all available with the following commands:

kubectl version
docker version
minikube version

You can check whether Minikube is running using the following command, which also tells you whether there is an update available:

minikube status

If minikube is not already running, you can start it with minikube start. Normally, when you install minikube, it automatically configures kubectl to access it. You can check whether this is the case with kubectl cluster-info. Its output should be something like the following:

Kubernetes master is running at https://192.168.99.100:8443

If the IP is not in the 192.168.*.* range, or kubectl complains that configuration is invalid or the cluster cannot be contacted, you need to run minikube update-context to have minikube fix your configuration for you.

How is kubectl configured?

I think it is a good idea to shortly mention how kubectl is configured. Which API endpoints and clusters kubectl accesses are defined in the ~/.kube/config file by default. The file that is accessed can be changed with the KUBECONFIG environment variable, which should specify a list of paths, so if kubectl displays weird behavior which you suspect might be due to the configuration, don’t forget to check whether this environment variable is set. The kubectl configuration file is in the YAML format, like many other things in Kubernetes. It has two top-level keys that are of immediate relevance: contexts and clusters. The clusters list contains endpoint and certificate information for the different clusters to which the user has access. A context combines one such cluster with the user and namespace values for accessing it. One of these contexts is the currently active one; you can find out which by either looking at the config file, or running kubectl config current-context. You can also run the kubectl config view command to show the complete configuration. You can limit the data shown to the current context with this command using the --minify option.
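
To make this structure concrete, here is a trimmed-down sketch of what such a configuration file might look like for a Minikube setup (the certificate paths and the IP will differ on your machine):

apiVersion: v1
kind: Config
current-context: minikube
clusters:
- name: minikube
  cluster:
    server: https://192.168.99.100:8443
    certificate-authority: /home/user/.minikube/ca.crt
contexts:
- name: minikube
  context:
    cluster: minikube      # refers to the entry in the clusters list above
    user: minikube
    namespace: default
users:
- name: minikube
  user:
    client-certificate: /home/user/.minikube/client.crt
    client-key: /home/user/.minikube/client.key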

Nodes and namespaces

Two basic concepts that are relatively straightforward and can be explained without a lot of context are nodes and namespaces. Nodes are the individual units of a Kubernetes cluster, be it a VM or an actual computer. What makes such a unit a node is the kubelet process that runs on it. This process is responsible for communicating with the Kubernetes master, and running the right containers in the right way. You can get a list of the nodes with kubectl get nodes. If you are using Minikube, and didn’t do anything fancy with the configuration, there will be a single node. Nodes are not particularly interesting. You as a Kubernetes user will not be doing anything fancy with them, and cloud providers all have means of automatically or manually scaling the nodes in a Kubernetes cluster.

Namespaces provide a means to separate subclusters conceptually from each other. If you are running different application stacks on the same cluster, for example, you can organize the resources per app by putting them in the same namespace. A resource created without a namespace specified is created in the default namespace. It’s not necessary to use namespaces, but they make certain things much easier, by helping you avoid name clashes, limit resource allocation, or manage permissions. In case you start working with namespaces, and get annoyed by having to provide the --namespace switch to every command, here is a handy command that will set the default namespace for the current context:

kubectl config set-context $(kubectl config current-context) --namespace=my-namespace
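
Namespaces are themselves resources, by the way, so you can also create one declaratively. A minimal manifest for the hypothetical my-namespace from the command above would look like this (what to do with such manifest files is explained further down):

apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace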

Kubernetes dashboard

Kubernetes comes with a built-in dashboard in which you can click around and discover things. You can find out whether it is running by listing the system pods with the following command:

kubectl get pods -n kube-system

If there is an entry beginning with kubernetes-dashboard, it’s running. In order to view the dashboard, first run the command kubectl proxy to proxy to the Kubernetes API. The Kubernetes API should now be available at http://localhost:8001, and the dashboard at this rather complicated URL. It used to be reachable at http://localhost:8001/ui, but this has been changed due to what I gather are security reasons.

Using a locally built image with Minikube

In the following tutorial, we will be deploying various container images in order to demonstrate Kubernetes features. Kubernetes uses Docker to retrieve and run container images, meaning that the usual rules of Docker image pull logic apply. That is, for an image that is not available locally, Docker contacts the Docker Hub if only a name and a tag are provided; otherwise, it contacts the registry specified in the image name. The aim of this tutorial is to get you playing around with services running within a Kubernetes cluster as quickly as possible. Hence, the method I would recommend for accessing the container images from minikube is directing your Docker client to the daemon running inside minikube, instead of the local one. Configuring Docker to do so is straightforward with eval $(minikube docker-env). Now, any image that you create and tag will be available inside minikube. You can make sure that this is the case by running docker ps. If the output contains a list of containers running images from gcr.io/google_containers, you are doing it right. This redirection to the docker service in minikube will be valid only in the current shell; you will be back to using the local docker service when you switch to another shell.

If you are not interested in modifying and building the sample services yourself, you can also pull the sample images from my Docker.io profile. It should be enough to replace the kubetutorial prefix in the image tags with afroisalreadyin.

Running a service

Let’s start off by running our first command to tell us whether there is anything running on the cluster. We will use the above-mentioned kubectl client to do so, running the command kubectl get pods. What pods are will be explained in a second. As long as the client is configured correctly, as explained above, you should see only the message No resources found. What kubectl did was to access the Kubernetes cluster running within minikube, as specified by the currently active context configuration, and present the resulting information. kubectl is just one among many API clients; there are others, such as this Python client, which is the other officially supported one. You can view the API requests kubectl is making by increasing the verbosity of the logging with the --v=7 argument, but be careful, as this will lead to a lot of textual output.

Kubernetes will not figure out for itself what we need to run, so let’s go ahead and tell it to run a very simple application, namely the simple Python application from the Kubernetes demos repository. In order to do so, you need to first clone the repo, navigate to the subfolder simple-python-app, and create a container image by running the following command:

docker build -t kubetutorial/simple-python-app:v0.0.1 .

Once the build finishes, you should be able to see the image in the list printed by docker images. After making sure this is the case, we are finally ready to run our first Kubernetes command, which is the following:

kubectl run simple-python-app \
     --image=kubetutorial/simple-python-app:v0.0.1 \
     --image-pull-policy=IfNotPresent \
     --port=8080

It should be obvious that this command somehow runs the container that we just created, since the tag of the image is passed in with the --image argument. The --image-pull-policy=IfNotPresent argument tells Kubernetes to use an existing local image instead of attempting to pull it. We are also specifying the port 8080 here as the port this deployment is exposing. This has to be the same port the application is binding to. Unless we provide this bit of information, Kubernetes has no way of knowing on which port to contact the application. Small side note: The demo service has to bind to this port on all interfaces (0.0.0.0), and not on localhost or 127.0.0.1.

How do we reach into Kubernetes to contact our service? This is the perfect time to introduce the most important abstraction in Kubernetes: The Pod. As with the other abstractions, pods are resources on the Kubernetes API, and we can list and query them using kubectl. Let’s see which pods are now running, with the same command that we ran earlier, kubectl get pods. The output should closely resemble the following:

NAME                               READY     STATUS    RESTARTS   AGE
simple-python-app-68543294-vhj7g   1/1       Running   0          21s

Great, we have a pod running. But what is a pod, actually? A pod is the fundamental application unit in Kubernetes. It is a collection of containers that belong together, and whose lifetimes are managed together. These containers are deployed on the same node, and they share operating system namespaces, volumes, and an IP address. They can contact each other on localhost and use OS-level IPC mechanisms such as shared memory. The decision of what to include in a pod hinges on what serves as a single unit across the dimensions of deployment, horizontal scaling, and replication. For example, it would not make sense to put the data store and the application containers of a service into the same pod, because these scale and are replicated independently of each other. What does belong together with the application container is a container that hosts the log aggregation process, for example.
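
To illustrate, here is a sketch of a pod manifest with exactly such a pairing: our application container plus a log aggregation sidecar sharing a volume. The log-aggregator name and image are hypothetical, and manifest files are explained in detail further down:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  containers:
  - name: simple-python-app
    image: kubetutorial/simple-python-app:v0.0.1
    ports:
    - containerPort: 8080
    volumeMounts:
    - name: logs                # the application writes its logs here
      mountPath: /var/log/app
  - name: log-aggregator        # hypothetical sidecar that ships the logs
    image: example.io/log-aggregator:v0.0.1
    volumeMounts:
    - name: logs
      mountPath: /var/log/app
  volumes:
  - name: logs
    emptyDir: {}                # scratch volume shared by both containers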

Now that we know what a pod is, and can figure out the name of our single running pod, we can query it using the kubectl proxy feature we already used above. Once the proxy is running, you can access the simple-python-app container on the port we specified in the previous command by querying the special URL that Kubernetes makes available for this purpose (don’t forget to change the name of the pod at the end of the URL):

curl http://localhost:8001/api/v1/proxy/namespaces/default/pods/simple-python-app-68543294-vhj7g

We can also see the logs of our brand new pod with kubectl logs simple-python-app-68543294-vhj7g, which should show the stdout of our application. It is also possible to execute a command within the container, similar to the docker exec command, with kubectl exec -ti simple-python-app-68543294-vhj7g CMD. As with Docker, the -ti bit signals that a tty should be allocated, and the command should run interactively. The kubectl exec command allows you to pick which container to run the command in using the -c switch. When omitted, the command runs in the pod’s only container, if there is just one.

Who created the pod?

It’s nice that Kubernetes is running our container inside a pod, but we would still like to know where the pod actually comes from. We didn’t tell Kubernetes to create any pods. In fact, pods are rarely created manually in Kubernetes. If that were the case, Kubernetes would not be offering anything new; the user would still be responsible for orchestrating the individual application units, and ensuring their availability. What the above kubectl run command did was to create a Deployment. This can be seen by listing the deployments:

$ kubectl get deployments
NAME                DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
simple-python-app   1         1         1            1           1s

Deployments are one of the special kinds of resources in the Kubernetes world, in that they are responsible for managing the lifetime of application containers. These kinds of resources are called controllers, and they are central to the Kubernetes puzzle. You can get more detailed info about the new deployment with kubectl describe deployments simple-python-app. The describe subcommand is a very useful tool for getting detailed information on all resources. It also lists related resources, and events that concern the described resource. For this deployment, you can see a couple of things in the output of kubectl describe. First of all, there is talk of something called a pod template. This is what is used to create the pods when the deployment is scaled, i.e. when new pods are created to meet the target replica count.

What happens when we delete the pod? In order to view what is happening in real time, I would advise you to open a second terminal, and run the command kubectl get pods -w in it. The -w switch watches the pod list, printing updates as changes happen. Now, delete the existing pod with kubectl delete pod simple-python-app-68543294-vhj7g. In the output of the pod listing terminal, you should temporarily see a state like the following:

NAME                                 READY     STATUS        RESTARTS   AGE
simple-python-app-5c9ccf7f5d-8lbb2   1/1       Running       0          4s
simple-python-app-5c9ccf7f5d-kl77s   1/1       Terminating   0          43s

So as one pod is being deleted, another one has already been created (the status might also be ContainerCreating instead of Running). The responsibility for this recreation lies with Replica Sets. You can see the replica sets that belong to a deployment using the above-mentioned kubectl describe command; the Replica Sets will be listed at the bottom, before the events. You can see that there are two lists: OldReplicaSets and NewReplicaSets. The difference between the two will be explained later in the context of rollouts. You can also list the replica sets with the kubectl get replicasets command.

Looking at the replica set created by our deployment with kubectl describe replicaset $REPLICA_SET_NAME, we can see a number of relevant rows at a glance:

# ... snip
Replicas:       1 current / 1 desired
Pods Status:    1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       pod-template-hash=4035281104
                run=simple-python-app-2
  Containers:
   simple-python-app:
    Image:              kubetutorial/simple-python-app:v0.0.1
    Port:               8080/TCP
    Environment:        <none>
    Mounts:             <none>
  Volumes:              <none>
Events:                 <none>

This Replica Set is responsible for keeping one Pod with our simple-python-app container running, and it is doing that successfully, judging from the 1 current / 1 desired row. But as with pods, replica sets are intended to be created by Deployments, so you shouldn’t have to create or manipulate them manually.

Short excursion on networking

As nice and useful as replica sets are, they are not much help in terms of high availability. When a Pod goes down, another one is started, and it has a different name, a different IP address, and is possibly running on a completely different node. Also, what if we want to load balance these replicas? If Kubernetes were to offer service discovery only based on pod names, the clients of this service would need to do client-side load balancing, and keep an internal list of pods that needs to be updated on every pod lifetime event. What about routing incoming traffic to services (ingress)? These are all pesky issues that need simplification. Kubernetes offers much easier mechanisms to achieve HA, load balancing and ingress. The basis for all this is the networking requirements Kubernetes imposes on the nodes and pods. These are the following:

  • All containers can communicate with all other containers without NAT (Network Address Translation).

  • All nodes can communicate with all containers (and vice-versa) without NAT.

  • The IP that a container sees itself as is the same IP that others see it as.

It is possible to use any one of various networking options that fit this model, with kubenet being the default. The above requirements sound relatively straightforward. One would think that each application container gets its own IP. That is not the case, however, as it is not the application containers, but the Pods that get the IP addresses. Or in the words of the documentation:

Until now this document has talked about containers. In reality, Kubernetes applies IP addresses at the Pod scope - containers within a Pod share their network namespaces - including their IP address.

You can also verify that pods can be reached by IP on the exposed port by getting the private network IP address of the pod with kubectl get pods -o wide. Afterwards, log on to the Minikube node with the command minikube ssh. From within this node, you can query the application with curl $IP_ADDRESS:8080, which should return the response we have already seen.

How are pods that belong to the same replica set organized, in order to provide high availability, load balancing and discovery? The answer to this question requires introducing another Kubernetes concept.

Services

I have been calling the tiny web application we have been using for demo purposes a service, but service has a totally different meaning in the Kubernetes world. A Kubernetes Service is an abstraction that allows loose coupling of pods to enable load balancing, discovery and routing. Through services, pods can be replaced and rotated without impacting the availability of an application. Let’s start with a very simple example where we turn our simple Python application into a Service, which can be achieved with the following command:

kubectl expose deploy simple-python-app --port 8080

If you now run kubectl get services, you should see a list consisting of two entries: kubernetes and simple-python-app. The kubernetes service is a part of the infrastructure, and you shouldn’t meddle with it. The other service is what we are looking for, especially the IP address, which is listed under the column CLUSTER-IP. We are interested in this IP address because it is something special. It’s a virtual IP Kubernetes has reserved for the new service. In the same output, you can also see that the port 8080 is exposed. We can now log on to the minikube VM (which is a Kubernetes node) with minikube ssh, and query what is now truly a service with curl $IP_ADDRESS:8080, which once more returns Hello from the simple python app. The network requirements mentioned above ensure the reachability of the service IP from the node.
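
Incidentally, kubectl expose is just shorthand for creating a Service resource. A manifest roughly equivalent to the command above might look like the following sketch, which relies on the run=simple-python-app label that kubectl run attaches to the pods it creates (manifests and selectors are both covered in detail below):

apiVersion: v1
kind: Service
metadata:
  name: simple-python-app
spec:
  selector:
    run: simple-python-app   # match the pods created by kubectl run
  ports:
  - port: 8080               # port exposed on the service IP
    targetPort: 8080         # port the application binds to in the pod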

Things get much more interesting when there are multiple pods in a replica set. In order to see the effect, let’s use another service that provides more information in its response. This service is in the kubernetes-repository as env-printer-app. When the base path is called, it returns a dump of the environment variables. Just like with the previous application, you can go ahead and create a container image with the following command:

docker build -t kubetutorial/env-printer-app:v0.0.1 .

We will start the Deployment with a replica count of 3, which will cause Kubernetes to start 3 pods right away. To do so, use the following command:

kubectl run env-printer-app \
     --image=kubetutorial/env-printer-app:v0.0.1 \
     --image-pull-policy=Never \
     --replicas=3 \
     --port=8080

Now let’s create a Service by exposing this Deployment with the following command, which is a slight modification of the expose command we used earlier:

kubectl expose deploy env-printer-app --port 8080

A new service env-printer-app should pop up in the output of kubectl get services. Note the IP address for this service under CLUSTER-IP as $IP_ADDRESS, and log on to minikube via ssh again. Afterwards, run the following command a couple of times:

curl -s $IP_ADDRESS:8080 | grep HOSTNAME

This command makes a request to the service endpoint, and filters the HOSTNAME environment variable out of it. You should observe that the hostname alternates between the various pod names. Kubernetes is distributing the requests among the replica pods for us, giving us load balancing out of the box.

This very short demo of services leads to more questions than answers. How does the service know which pods to hit when a request comes in, for example? Why can we contact our service only from within the cluster? How can we enable external access to it? Before we can answer these questions, however, we need to have a look at a better way of specifying deployments, services and other resources.

Using the command line versus manifest files

Until now, we have been using the command line interface to Kubernetes via kubectl. It is possible to get quite far with kubectl, as it is pretty complete, but long command invocations can become difficult to read, share with others, and organize in a repository. A much better method for organizing Kubernetes resources, one which adheres to the infrastructure as code mantra, is using manifest files. These are either YAML or JSON files (although YAML is preferred) that specify in a more structured format the resources to be created and actions to be undertaken. A manifest file takes the form of a list of resources of different kinds, together with metadata and a spec. It is also common and recommended practice to specify the version of the API that is targeted with each entry. The different entries must be separated with a triple dash separator, which signifies the start of a new document in YAML, as the following example illustrates. This separator is mandatory; if you leave it out, only the first item in a list will be processed.
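
Here is a single file containing two throwaway resources, separated by the mandatory triple dash (the namespace names are just placeholders):

apiVersion: v1
kind: Namespace
metadata:
  name: app-one
---
apiVersion: v1
kind: Namespace
metadata:
  name: app-two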

The resource specifications are documented in great detail in the Kubernetes API documentation. What’s even better, however, is that the kubectl command is self-documenting. To get documentation on pods, you can use the kubectl explain pods command. This command will print, prefixed by a short description, the various fields a pod manifest can contain. In order to go deeper into this tree, you can run commands such as kubectl explain pod.metadata.labels, which will give more detailed information on individual fields.

If you have a look at the entry for deployment in either the online or command line documentation, you will see that the metadata field is the same across all resources, and the name field is required. This field enables us to refer to resources in commands when we want to get detailed information or delete them, or to cross-reference them from other manifest files. The spec field is required to adhere to the DeploymentSpec configuration, which should have a template field that describes the pod to be deployed. This template, in turn, must have a metadata field itself, and a spec that should contain a list of containers. As per this specification, here is how to create the above deployment example for the env-printer-app, in YAML format:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: env-printer-app
spec:
  replicas: 3
  template:
    metadata:
      labels:
        app: env-printer-app
    spec:
      containers:
      - image: kubetutorial/env-printer-app:v0.0.1
        imagePullPolicy: IfNotPresent
        name: env-printer-app

It is possible to see a common pattern of nested resources that all have metadata which is used to refer to each other, templates that tell Kubernetes what kind of resources to create, and various other kinds of auxiliary information, such as the replicas field. You can now go ahead and use this YAML file, saved into deploy.yaml in the kubernetes-repository/env-printer-app directory, to create a deployment by running kubectl apply -f deploy.yaml. It is also possible to create all resources in a directory by running kubectl apply -f with the directory path.

You can also use kubectl get KIND NAME -o yaml to get a detailed description of a resource in YAML format. This YAML document might include much more than the information you supplied when creating the resource, since the defaults you omitted, and the values calculated or set by Kubernetes, are also included. Another really great feature that relies on the YAML representation capabilities of Kubernetes (one of my favorite features) is editing a resource with the command kubectl edit KIND NAME. This command will fetch the resource description in YAML, and load it in the editor defined by the EDITOR (or KUBE_EDITOR, if it’s defined) environment variable. Once you save your changes and exit, the new resource description will be applied to the resource. This is a great way to try things out quickly without having to keep multiple versions of resource definitions.

Services, continued

Alright, where were we? So we have a bunch of containers running in Pods, provisioned and kept alive through Deployments, bundled into a Service that puts them behind a common IP. And we can put all of these into one or more YAML files to recreate them arbitrarily. This is a good point to explain one very interesting and versatile feature of Kubernetes: Selectors. If you go ahead and get the details of the env-printer-app service we have created above with kubectl describe service env-printer-app, you should see a row that begins with Selector:. This selector configuration tells you how Kubernetes finds the pods it should collect behind the virtual IP of the service. If you didn’t do anything funky in the meanwhile, the value of the selector row should be run=env-printer-app. If you describe the deployment targeted by this service with kubectl describe deploy env-printer-app, you will see exactly the same selector line. Services and deployments use the same mechanism to match the pods that they hit or control. Which pods are these? This question can be answered by filtering a search by label, as in the following command:

kubectl get pods -l run=env-printer-app

Not surprisingly, these are the three pods created by the original deployment. This selector-based mechanism is used by many components in Kubernetes, and it is very versatile in that it allows custom labels. This opens up a whole lot of possibilities for different patterns, such as A/B deployments, rolling updates (which we will see later) and similar things.

What is happening, then, is that a collection of pods, as picked by the spec.selector attribute, is exposed as a service on an IP. This is not the only way to expose a service, however: There are different kinds of Services, based on how this exposing happens. The default is the ClusterIP kind, which is what we have now. Other kinds are NodePort, where a service is exposed on the same port on every node, LoadBalancer, which uses a platform-native load balancer to expose a service to the outside world, and ExternalName, which enables you to provide an external service on the local cluster as if it were an internal one.
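
The kind of a Service is selected with the type field in its spec. As a sketch, a NodePort variant of our service might look like this (ClusterIP is what you get when the field is omitted):

apiVersion: v1
kind: Service
metadata:
  name: env-printer-app
spec:
  type: NodePort             # omit this field to get a ClusterIP service
  selector:
    run: env-printer-app
  ports:
  - port: 8080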

All of these kinds have their use cases, but the ClusterIP service is the one that covers the most common ones, so we will concentrate on it here. Having multiple pods behind a single IP solves many problems, since Kubernetes also takes care of things like load balancing (done by randomly routing requests; a new proxy mode will introduce more options) or managing modifications in the target pod set. One thing it does not solve, however, is the problem of figuring out this IP in the first place. This is another point in which Kubernetes shines: Matching a name to an IP address is done using DNS on the internet, and Kubernetes builds on this common protocol by providing an internal DNS service itself. By default, a DNS A record is created, pointing to the service IP, for each ClusterIP service. Hence, we should be able to refer to our env-printer-app by this exact name. To see that this is the case, run the following command to run bash in a container:

kubectl run my-shell --rm -ti --image cfmanteiga/alpine-bash-curl-jq bash

There are quite a few arguments to this command, which need some explaining. The --rm switch tells kubectl to delete the deployment and the pod once the command is run, while -ti asks it to attach a tty to the container, and make it connect to the stdin of the container process. The --image argument specifies a lightweight alpine-based image with some debugging utilities, and the last argument is the command to use instead of the entry point of the container. In the shell that starts, you can now run curl http://env-printer-app, and enjoy the environment variable list delivered by the service.

Ingress

Our service is now humming along in the cluster, accepting requests when we hit it at http://env-printer-app. In order to make it available to the outer world, we need to do one last thing: Tell Kubernetes to route external HTTP requests for a certain location to this service. This process is called ingress, and Kubernetes offers a complete system to handle it. There are two things you need in order to route requests to the env-printer-app from the outside:

  • An Ingress controller, essentially a reverse proxy running within Kubernetes that can be configured using Kubernetes-native resources. The two built-in solutions are GCE- and Nginx-based. In order to use the Nginx-based ingress controller on Minikube, you have to enable the addon with minikube addons enable ingress.

  • Ingress specifications. These are resources just like Pods and Deployments, and contain information on how to map incoming requests to services, serving as configuration for the aforementioned ingress controller.

An Ingress specification for the env-printer-app is included in the sample project repo as ingress.yml. After activating the minikube ingress addon, you can run kubectl apply -f ingress.yml to create an ingress that maps requests to http://env-printer to the env-printer-app service. In order to test the ingress, you need to first figure out the IP of the minikube VM with minikube ip, and then edit /etc/hosts on your computer, adding the line $IP_ADDRESS env-printer. You should now be able to navigate to http://env-printer in your browser, and see the output of the env-printer-app service.
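
In case you are curious what such a specification contains, here is a sketch along the lines of the ingress.yml in the repo (the actual file may differ in its details):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: env-printer
spec:
  rules:
  - host: env-printer                  # requests with this Host header...
    http:
      paths:
      - path: /
        backend:
          serviceName: env-printer-app # ...are routed to this service
          servicePort: 8080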

Rolling updates

Once you have a deployment managing a set of pods, there are a couple of things you can do with it to adapt to new conditions. The first of these is scaling the set of containers to meet load conditions. One way of achieving this is using the kubectl scale command, as follows:

kubectl scale deploy env-printer-app --replicas=4

Alternatively, you can use the kubectl edit deploy env-printer-app command to bring up an editor, and change the spec.replicas field to the required number. If you now run kubectl describe deploy env-printer-app, there should be a new scaling event in the Events section. When the number of replicas is changed, Kubernetes simply creates new pods, or terminates existing ones, without any further complications. It’s a different situation when the container spec of a deployment is changed, however. Kubernetes, based on the strategy specified by the user, replaces the pods progressively, to enable a smooth transition from one set of pods to the other. These are called rolling updates.
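
The strategy in question lives under the deployment’s spec. A sketch of the relevant stanza, with values picked purely for illustration, would look like the following:

spec:
  replicas: 3
  strategy:
    type: RollingUpdate      # the default; the alternative is Recreate
    rollingUpdate:
      maxUnavailable: 1      # at most one pod below the desired count during an update
      maxSurge: 1            # at most one pod above the desired count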

In order to demo rolling updates, I added another project to the sample Kubernetes services repository, the rollout-app. You can go ahead and create the service by running kubectl apply -f deploy.yml --record in the app’s directory, which will create the deployment, the service, and the ingress. The reason for the --record switch will be explained in a couple of paragraphs. If you edit your /etc/hosts file to map rollout-app to the minikube IP, you should be able to navigate to http://rollout-app and see a big display of the pod’s hostname.
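
Before we look inside the application, here is a sketch of the container part of that deploy.yml; the probe timing values are my assumptions, so consult the file in the repo for the exact ones:

containers:
- name: rollout-app
  image: kubetutorial/rollout-app:v0.0.1
  ports:
  - containerPort: 8080
  readinessProbe:
    httpGet:
      path: /healthz         # the endpoint defined in application.py
      port: 8080
    initialDelaySeconds: 5   # wait before the first probe
    periodSeconds: 3         # probe interval afterwards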

If you open rollout-app/application.py, you can see two peculiar things there. One is the /healthz endpoint that returns a simple OK message and nothing else, and the other is a time.sleep(5) before the app starts. The purpose of the /healthz endpoint might become clearer if you also look at the deploy.yml in the same directory; this endpoint is registered as a readinessProbe on the deployment. The readiness probe is a part of the pod lifecycle system of Kubernetes. Until this probe succeeds (for HTTP probes, the endpoint must return a status code between 200 and 400), the new pod is not marked as “ready”, and requests will not be routed to it. Due to the sleep of 5 seconds before our application is started, the pods of the rollout-app will not be ready for at least five seconds. Now let’s have a look at how this delay interacts with the rolling updates feature of Kubernetes. Once you have deployed the application, change application.py in some minor way, such as adding a newline. Afterwards, create a new Docker image with a new tag with docker build -t kubetutorial/rollout-app:v0.0.2 .. Then go ahead and change the Docker image for the rollout-app deployment to the new version with the following command (again with the --record switch which will be explained later):

kubectl set image deploy rollout-app rollout-app=kubetutorial/rollout-app:v0.0.2 --record

Kubernetes gets to work right away, creating new pods and terminating the ones these are supposed to replace. You can see that this is the case by running kubectl get pods. One peculiar (or actually nice) thing is that Kubernetes does not just pull down the running pods, starting their replacements at the same time. A rollout process is applied, whereby new pods are created as old ones are taken down. You can follow this process by running the command kubectl rollout status deploy rollout-app. This command will hang with a message like Waiting for rollout to finish: 2 of 3 updated replicas are available…. So now the deployment is in the middle of a rollout process. We will see where these numbers come from later. A rollout is actually the process of moving from one replica set to another. You can see that this is the case by running the command kubectl get replicaset (or replace replicaset with rs to make the command shorter). You should see two replica sets whose names begin with rollout-app, one belonging to the old state, and the other belonging to the new state. The DESIRED, CURRENT and READY values of one should decrease, while those of the other should increase toward the required values.

One thing you can do is pause this rollout while it is in progress with kubectl rollout pause deploy rollout-app. This will leave the pod counts the way they are when you run the command, and give you the chance to run checks, to make sure everything is OK. Let’s say that you start a rollout, pause it to run some checks, and discover that you made a mistake, and would like to revert to the previous version to fix the issue. This can be achieved by rolling back the rollout with kubectl rollout undo deploy rollout-app. But let’s say that you want to move back even further in the deployment history. This is where the --record switch to the kubectl apply command comes into play. Thanks to this switch, we can now see the commands that caused a rollout on this deployment, and a version number that we can use to refer to that rollout. After you deploy version 0.0.2 of rollout-app, the output of kubectl rollout history deploy rollout-app should be similar to the following:

REVISION        CHANGE-CAUSE
1               kubectl apply --filename=deploy.yml --record=true
2               kubectl set image deploy rollout-app rollout-app=kubetutorial/rollout-app:v0.0.2 --record=true

You can roll back to e.g. revision 1 with the following command:

kubectl rollout undo deploy rollout-app --to-revision=1

The rollout feature of Kubernetes is very well-designed and feature-rich. Other things you can do are precisely controlling the number or percentage of pods that are replaced, or setting conditions on failing rollouts so that they can be rolled back automatically by other tools.

Going further

Until now, I have been singing Kubernetes’ praises, but not everything about it is perfect, unfortunately. We have run into a couple of issues building a Kubernetes cluster. Kubernetes, being a relatively young project, is under heavy development, and keeping up with it is not a simple job. The development process is very well-managed, but nevertheless it is a full-time responsibility to keep up with the changes. This situation is mirrored on the provider side of things, as cloud vendors are racing to provide the best hosted Kubernetes solution possible, which also leads to considerable trial-and-error. Azure, for example, started off with a feature called ACS, which was supposed to be a generic container management solution, but quickly recognized how popular Kubernetes was becoming, and deprecated ACS in favor of AKS, which is directed solely towards Kubernetes, and has extra features such as redundant master nodes. Unfortunately, we are on ACS, and need to make the move to AKS at some point.

Another thing you have to keep in mind when running Kubernetes is that it has significant platform-dependent parts, and these are not uniform in terms of correctness and reliability. A short time after moving to Kubernetes on Azure, we found out that there was a serious bug with Kubernetes on ACS that made the storage mounting feature of Kubernetes nearly unusable. Our solution is to rely as much as possible on the cloud offerings of Azure such as CosmosDB and managed PostgreSQL, but we will need to use local storage in a service at some point. Fortunately, the bug appears to be fixed in Kubernetes 1.10.

As Kubernetes increases in feature set and complexity, tools built on Kubernetes to simplify workloads and provide more integrated workflows have also started popping up. Kubernetes was never meant to be the final application layer; there will be tools that build on it for specific developer workflows, which is already happening. It looks like Helm is the most popular choice on this front, but there are other alternatives such as OpenShift. So be prepared to learn another tool that runs on top of Kubernetes in the near future.

Bonus: Shell Helpers

There are a couple of motions you repeat over and over when you are working on a Kubernetes cluster. One of these is getting the name of a pod. As the pod name is derived from the name of the deployment, you end up running kubectl get pods and either grepping it or searching it visually. In the case of single-pod deployments, fetching the name of the pod is very easy with the following bash function:

function podname {
    kubectl get pods | grep "$1" | awk '{print $1}';
}

If you want the name of the simple-python-app pod, for example, you can run something as simple as podname simple. You can also use this function as an argument to other kubectl commands, e.g. to print the logs with kubectl logs `podname simple`.

Another handy snippet (written by my Bash Jedi Master friend Matthias Krull) is the following, which lets you switch between Kubernetes configurations like between Python virtual environments:

function kubeon {
    if [ "${1}" ]; then
        local config_file="${1}"
    else
        echo "Usage: kubeon <config|config_file>"
        return 1
    fi

    # If the argument is not a file, look it up in ~/.kube
    if [ ! -f "${1}" ]; then
        config_file="${HOME}/.kube/${1}"
    fi

    if [ ! -f "${config_file}" ]; then
        echo "No config file found. Tried ${1} and ${config_file}"
        return 1
    fi

    export KUBECONFIG="${config_file}"
    export KUBEON_PROMPT="${1}"
    export KUBE_MASTER=$(kubectl config view|grep server:|cut -d/ -f3)
}

Using this function, you can set any one of the configuration files in your ~/.kube directory as the current configuration with kubeon filename. Among the variables set are KUBEON_PROMPT, which you can use in your PS1 to visualize the active Kubernetes configuration, and the KUBE_MASTER URL, which might come in handy if you want to SSH to the master.