I had the chance to attend KubeCon Europe in Copenhagen last week, and it was a total blast. The attendance was huge, with developers from all over the world, and I had great conversations with many different people. There were countless talks at all levels, many of them (especially the keynotes) given by core committers to projects from the CNCF (Cloud Native Computing Foundation). In this post, I would like to gather my impressions of what I think were the main themes, and some tendencies and future directions that I think the CNCF, Kubernetes and the other projects will take. In case you wonder who I am, by the way, I’m the guy who walked around the whole conference in a Kramer hairdo, because his hair gel got confiscated at the airport.
The short name of the conference and the Twitter tag was KubeCon, but it was in fact an umbrella conference by the Cloud Native Computing Foundation. There will apparently now be one in Europe, one in the US and another in China each year. I think this is a great idea, because there are a great many people who either can’t afford the trip or don’t want to go through the hassle of obtaining a visa, such as myself. The future of Kubernetes was of course a major topic; Aparna Sinha gave a keynote on the state of Kubernetes, especially regarding how it is hosted on GKE. Most of her talk was oriented around how enterprises are adopting Kubernetes, and what kind of developments they expect. Security was a huge topic, with enhancements to authorization, RBAC and pod permissions on the list. A new project from Google named gVisor was released just recently, bringing very simple sandboxed containers to Kubernetes (there was another talk later just on gVisor). On the application front, better support for stateful applications in the form of application operators was mentioned, but I didn’t quite get what was new about this. There is already the Operator Framework by CoreOS, and it sounded like Sinha was talking about the exact same thing, with common features such as application lifecycle operations, backup, restore, monitoring etc. But maybe I missed something; do let me know in the comments if this is a new feature.
How the enterprise is discovering Kubernetes (or has discovered it and is now getting involved more deeply), and how Kubernetes is also developing in that direction, was a topic that came up frequently in talks and in chats with attendees. There was a very interesting presentation by two developers from a consultancy in China who talked about a project they did for the central Chinese banking authority (the Visa and MasterCard of China, as one presenter put it). As one would expect from an organization of that size, they had to come up with a rather complicated setup for security and reliability; there were multiple checks for who could do what, and what could be deployed by whom. Security is obviously one of the things that early adopters may ignore but enterprises like these care a lot about, and as this talk showed, Kubernetes has made huge advances in this area.
All the big cloud vendors were at KubeCon, as one would expect, either advertising or actually revealing their hosted Kubernetes solutions. DigitalOcean announced a hosted Kubernetes solution on the second day of KubeCon; it is still in the early-access stage, but will be available soon. The common thread among all these hosted solutions was that they promised to handle the major pains of hosting Kubernetes, such as updates. While the big cloud vendors were targeting the difficulties of running a Kubernetes cluster, other vendors were advertising super easy ways of running an application in a Kubernetes cluster. The presenters of one talk I attended demoed a service called hasura.io where it was possible to simply push code to a Git repo, and have it deployed to a pod in a Kubernetes cluster. The description of the cluster is included as YAML files in the repo, and it is possible to attach these descriptions to a cluster using a CLI client. Once that is done, every git push deploys to the cluster.
Which brings me to what I think is another trend that was very obvious at this KubeCon: GitOps. Alexis Richardson mentioned this in his keynote, and he came up with the name, as far as I could understand. He also went into more depth in a separate talk on how to implement it on Istio, which I missed and had to watch separately later. One half of GitOps is, method-wise, the same as the “infrastructure as code” part of DevOps, in that the system is described in declarative terms and stored in a shared repository. What’s new is a much tighter connection between new code in a Git repository and its availability in the cluster. The aforementioned hasura.io is a platform for achieving this connection. Weaveworks implemented their own internal version using operators, which were mentioned above. These operators listen to Git repositories, update services and deployments based on changes, and report the current state to observability tools. The originator of the push-is-deploy kind of flow is of course Heroku, which was mentioned every time the topic came up. It looks like GitOps will be the Kubernetes-based, more generic method of achieving the same workflow. The way I have explained it here is a bit of an oversimplification; I would advise you to have a look at the presentation. I would also expect more tooling support to appear, and to be standardized, in the near future.
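To make the declarative half a bit more concrete, here is a minimal sketch of the kind of manifest that would live in such a repo (the names and image are made up for illustration). The file fully describes the desired state; a CI job or an operator applies it to the cluster on every push, so the repo is the single source of truth:

```yaml
# deploy/web.yaml — desired state, version-controlled alongside the code.
# A GitOps pipeline or operator reconciles the cluster toward this on push.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: example/web:1.0.0  # bumped by the pipeline on each release
```

In the simplest variant, the “operator” is just a CI step running `kubectl apply -f deploy/`; the Weaveworks-style approach moves this logic into a controller inside the cluster that watches the repo instead.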
Kubernetes offers a very straightforward pattern of component integration. Components can be deployed as pods managed by Kubernetes itself, accessing data from application pods, and changing cluster state based on specifications stored in the etcd data store. An interesting example of this pattern could be witnessed in a demo of Fluent Bit, where pods could be annotated according to the kind of log they output, and the output would be parsed accordingly. A core part of this integration pattern is Prometheus as the main source of observability. All pods make data available in the format Prometheus understands, from where it is usually fed into Grafana for dashboards and alerts. There is now also a slew of new applications that are, as per the name of the foundation, cloud-native and first-class citizens of Kubernetes. This means that they play well with the pod lifecycle elements of Kubernetes, are Prometheus-observable, and can cluster easily in a container network. Another common feature is that they are relatively simple, just like Prometheus itself, and concentrate on doing one job well. This point was very prominent in one talk I attended on NATS, a new message queue whose developers refrained from implementing many features that are standard in other message queues (message headers, complex routing logic etc.), opting instead for performance and reliability.
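As a sketch of how the annotation-driven approach looks in practice, Fluent Bit’s Kubernetes filter lets a pod declare which parser should handle its logs via an annotation (this assumes the filter is deployed with its `K8S-Logging.Parser` option enabled; the pod name and image here are just illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
  annotations:
    # Tell the Fluent Bit Kubernetes filter to run this pod's log
    # lines through its built-in "apache" parser before shipping them.
    fluentbit.io/parser: apache
spec:
  containers:
  - name: web
    image: httpd:2.4
```

The log collector never needs to be reconfigured centrally; each pod carries its own parsing hint, which is very much in the spirit of the integration pattern described above.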
These various components make life easier and enable continuous scaling and growth of the cluster. Their interaction in a living and changing cluster can get rather complex, however, and minor mismatches can lead to serious issues. This point was driven home in one excellent keynote by Oliver Beattie, Head of Engineering at Monzo, an online bank. He explained an outage which took 1.5 hours to fix. The post mortem is pretty good reading, and shows how the interplay of various pieces of complex software can have unexpected error cases. In this case, one of the root causes was an incompatibility between specific versions of Linkerd and Kubernetes, stemming from the representation of an empty service changing from an empty list to null in Kubernetes. On the one hand, this hits one of my pet peeves, namely returning null or none instead of the zero value of a type (an empty list in this case). On the other hand, more generally speaking, this is an issue that I think will become more and more acute in the near future. As the number of components used in a cluster and the frequency of updates to them grow, the chances that some of those components interact in ways that cause issues will also increase.
The solution to this combinatorial explosion of components might be another practice, on which there were two talks by Sylvain Hellegouarch: chaos engineering. The second of these talks went into more detail on what I think will become a more and more accepted means of improving the understanding and reliability of complex clusters. Chaos engineering is “the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production”. Sylvain explained the usage of the Chaos Toolkit, which can run pre-planned tests in which load conditions are created, the cluster is “mutilated”, and then the reaction of the cluster is tested against the given criteria. The toolkit then creates a report, replete with graphs and detailed information on whether and how the cluster recovered. A couple of points stressed by Sylvain were that chaos engineering is not an effort to simply break a cluster, but to probe it with knowledge of what can actually go wrong, and that the probing is done with a certain aim, such as the cluster repairing itself or alarms going off. The concrete aim is to unearth weaknesses, which are definitely there, to know what to do in critical situations, and to instill more trust in the system by being prepared for difficult situations. Chaos engineering is a practice I definitely intend to introduce into our development team. I think it should be done instead of simple load testing, where optimum working conditions are usually taken as given. Instead, for such an in-depth test to deliver useful and relevant information, proper load conditions need to be combined with changes to system and service pods, and failure conditions in various places.
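To give an idea of how such a pre-planned test looks, here is a rough sketch of a Chaos Toolkit experiment file, based on my understanding of its format and its Kubernetes extension (the service name, URL and selector are invented for the example): a steady-state hypothesis states what “healthy” means, the method injects the failure, and the hypothesis is re-checked afterwards to see whether the cluster recovered.

```yaml
# experiment.yaml — illustrative Chaos Toolkit experiment
title: Service survives the loss of one pod
description: Kill a random pod behind the web service and verify availability.
steady-state-hypothesis:
  title: web responds
  probes:
  - type: probe
    name: web-is-up
    tolerance: 200            # expect HTTP 200 before and after the action
    provider:
      type: http
      url: http://web.default.svc/healthz
method:
- type: action
  name: terminate-one-pod
  provider:
    type: python
    module: chaosk8s.pod.actions
    func: terminate_pods
    arguments:
      label_selector: app=web
      rand: true              # pick one matching pod at random
```

If the hypothesis fails after the action, the experiment has unearthed a real weakness: either the deployment does not self-heal fast enough, or the alarms that should have fired did not.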
Not all was fine and dandy at KubeCon, though. A major point of disagreement between presenters was how to pronounce “kubectl”. To my horror, the majority pronounced it “kube-cuttle”, which is just wrong. kubectl doesn’t have anything to do with cuddling or cuttlefish; it’s for controlling Kubernetes, ergo “kube control”. I guess I will have to wait until the next KubeCon to settle this point with a talk of my own.
One last note: I’m nearly done with an introductory Kubernetes tutorial, which should be published in a couple of days. Follow me on Twitter to be informed when it’s online.