The mess and success of building open leadership (notes from Kubernetes Leadership Summit)

TL;DR: The Kubernetes community is working on building open governance that is both inclusive and able to make hard decisions.

Three weeks ago, Kubernetes leaders met for a very busy day to reflect on and plan how the community is growing.  I was humbled to be part of the Kubernetes Leadership Summit due to my work as the Cluster Ops SIG co-chair.  Please join us every other Thursday at 1 PT to share stories about running, or planning to run, Kubernetes.

This event had to strike a delicate balance for an open project: we needed to limit attendance to focus discussions while ensuring that the community was represented.  Our notes (captured in Google Docs) are being transcribed to markdown here.

Here are some key topics that shaped the day from my perspective:

  • A consensus that the core project needs to focus on paying down debt and getting smaller.  The core project is seen as a bottleneck to growth: the bottleneck comes from the number of people trying to interact in the core repo and from having too much technical debt.  As a group, we agreed that paying down this debt was very important; however, we did not define or authorize specific actions to address it.  I felt that just acknowledging this focus by a show of hands was a positive step.
  • Moving forward on formation of a Steering Committee.  The bootstrapping committee reviewed their Steering Committee proposal.  The concept here is to design a governing body that intentionally delegates its authority.  I think it’s an interesting approach that will help empower more people in the project.  This design is different from a corporate board that’s focused on supervision.  Here’s the draft document we reviewed as input into the next-phase proposal.
  • Continuing to use SIGs to divide work.  A consequence of the governance design is that we are (ab)using special interest groups (SIGs) to organize the coding and feature work for Kubernetes.  They also carry the load for releases, product management and operations.  The push from the meeting was for all SIGs to have specific deliverables.  I think that works well for some SIGs, but more user/operator-focused groups (like Cluster Ops) will find it harder to identify the right engagement models.

Overall, the event was very positive with lively group discussions.  This group is focused on building Kubernetes, so there was very little vendor, marketing, user or operator focus.  As the project grows, I believe these other focus areas will be important to manage.  Those concerns likely cannot be addressed until the Steering Committee is formed.

RackN is committed to helping make Kubernetes operable and to improving the operator experience.  I’m interested in hearing your remote or local impressions of this event.  What items should have gotten more discussion?  What is the project missing?

Cloudcast.net gem about Cluster Ops Gap

Podcast juxtaposition can be magical.  In this case, I heard back-to-back sessions: first a pragmatic case for cluster operations, then a look at how developers are rebelling against infrastructure.

Last week, I was listening to Brian Gracely’s “Automatic DevOps” discussion with John Troyer (CEO at TechReckoning, a community for IT pros), followed by his confusingly titled “operators” talk with Brandon Phillips (CTO at CoreOS).

John’s mid-recording comments really resonated with me:

At 16 minutes: “IT is going to be the master of many environments… If you have an environment that is hybrid & multi-cloud, then you still need to care about infrastructure… we are going to be living with that for at least 10 years.”

At 18 minutes: “We need a layer that is cloud-like, devops-like and agile-like that can still be deployed in multiple places.  This middle layer, Cluster Ops, is really important because it’s the layer between the infrastructure and the app.”

The conversation with Brandon felt very different: there, the goal was to package everything “operator” into Kubernetes semantics, including Kubernetes running itself.  This inception approach to running the cluster is irresistible within the community because the goal of the community is to stop having to worry about infrastructure.  [Brian – call me if you want to do a podcast of the counterpoint to self-hosted].
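To make the “operator” idea concrete, here is a minimal sketch of the reconcile-loop pattern it relies on: declare desired state, observe actual state, and converge the difference.  The names here (ClusterSpec, observe_nodes, the print placeholders) are illustrative assumptions for this post, not CoreOS or Kubernetes APIs.

```python
# Hypothetical sketch of the "operator" reconcile pattern: declare desired
# state, observe actual state, and act to converge the two.  All names are
# placeholders, not real Kubernetes or CoreOS APIs.
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterSpec:
    """Desired state, as an operator would read it from a custom resource."""
    replicas: int
    version: str


def observe_nodes() -> List[str]:
    """Stand-in for querying the actual cluster state."""
    return ["node-a", "node-b"]


def reconcile(spec: ClusterSpec) -> None:
    """One pass of the control loop: compare desired vs. actual and act."""
    nodes = observe_nodes()
    if len(nodes) < spec.replicas:
        for _ in range(spec.replicas - len(nodes)):
            print(f"add a node running {spec.version}")  # placeholder action
    elif len(nodes) > spec.replicas:
        for node in nodes[spec.replicas:]:
            print(f"remove {node}")                      # placeholder action
    else:
        print("cluster matches desired state")


if __name__ == "__main__":
    reconcile(ClusterSpec(replicas=3, version="v1.7.0"))
```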

Infrastructure is hard and complex.  There’s good reason to limit how many people have to deal with that, but someone still has to deal with it.

I’m a big fan of container workloads generally and Kubernetes specifically as a way to help isolate application developers from infrastructure.  By design, Kubernetes is not meant to handle the messy infrastructure requirements that make Cluster Ops a challenge.  This is a good thing because complexity explodes when platforms expose infrastructure details.

For Kubernetes and similar platforms, I believe that injecting too much of the infrastructure mess undermines their simplicity.

There’s a different type of platform needed for infrastructure-aware cluster operations, one where automation addresses complexity via composability.  That’s what RackN is building with open Digital Rebar: a hybrid management layer that can consistently automate around infrastructure variation.
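To illustrate what addressing complexity via composability could look like, here is a hypothetical sketch in which a cluster bring-up workflow is assembled from small, swappable steps, and only the infrastructure-specific steps vary.  The function and provider names are invented for this example and are not Digital Rebar’s actual API.

```python
# Hypothetical sketch of composable cluster-ops automation: one workflow is
# built from small steps, and only the infrastructure-specific step changes.
# Names are illustrative only, not Digital Rebar's API.
from typing import Callable, List

Step = Callable[[dict], None]


def provision_bare_metal(ctx: dict) -> None:
    print(f"PXE-boot and image {ctx['node_count']} physical machines")


def provision_cloud(ctx: dict) -> None:
    print(f"request {ctx['node_count']} instances from {ctx['provider']}")


def install_kubernetes(ctx: dict) -> None:
    print(f"install Kubernetes {ctx['k8s_version']} on all nodes")


def join_cluster(ctx: dict) -> None:
    print("join nodes into a single cluster")


def compose(steps: List[Step]) -> Step:
    """Chain independent steps into a single workflow."""
    def workflow(ctx: dict) -> None:
        for step in steps:
            step(ctx)
    return workflow


# The same higher-level steps reused over different underlays.
metal_workflow = compose([provision_bare_metal, install_kubernetes, join_cluster])
cloud_workflow = compose([provision_cloud, install_kubernetes, join_cluster])

if __name__ == "__main__":
    metal_workflow({"node_count": 3, "k8s_version": "v1.7.0"})
    cloud_workflow({"node_count": 3, "k8s_version": "v1.7.0", "provider": "aws"})
```

The point of the sketch is the shape, not the specifics: variation lives in small interchangeable pieces instead of leaking into the platform above them.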

If you want to work with us to create system-focused, infrastructure-agnostic automation, then take a look at the work we’ve been doing on underlay and cluster operations.