OpenStack’s Big Pivot: our suggestion to drop everything and focus on being a Kubernetes VM management workload

Posted on May 31, 2017 by Rob H

TL;DR: Sometimes paradigm changes demand a rapid response, and I believe unifying OpenStack services under Kubernetes has become such an urgent priority that we must freeze all other work until this effort has been completed.

See Also Rob’s VMblog.com post How is OpenStack so dead AND yet so very alive to SREs?

By design, OpenStack chose to be unopinionated about operations.

That made sense for a multi-vendor project that was deeply integrated with the physical infrastructure and virtualization technologies. The cost of that decision has been high for everyone because we did not converge on shared practices that would drive ease of operations, upgrade or tuning. We ended up with waves of vendors vying to have the fastest, simplest and openest version.

Tragically, install became an area of competition instead of an area of collaboration.

Containers and microservice architecture (as required for Kubernetes and other container schedulers) are providing an opportunity to correct this course. The community is already moving towards containerized services with significant interest in using Kubernetes as the underlay manager for those services. I’ve laid out the arguments for and challenges ahead of this approach in other places.

These technical challenges involve tuning the services for cloud native configuration and immutable designs. They include making sure the project configurations can be injected into containers securely and the infra-service communication can handle container life-cycles. Adjacent concerns like networking and storage also have to be considered. These are all solvable problems that can be more quickly resolved if the community acts together to target just one open underlay.

The critical fact is that the changes are manageable and unifying the solution makes the project stronger.

Using Kubernetes for OpenStack service management does not eliminate or even solve the challenges of deep integration. OpenStack already has abstractions that manage vendor heterogeneity and those abstractions are a key value for the project. Kubernetes solves a different problem: it manages the application services that run OpenStack with a proven, understood pattern. By adopting this pattern fully, we finally give operators consistent, shared and open upgrade, availability and management tooling.

Having a shared, open operational model would help drive OpenStack faster.

There is a risk to this approach: driving Kubernetes as the underlay for OpenStack will force OpenStack services into a more narrow scope as an infrastructure service (aka IaaS). This is a good thing in my opinion. We need multiple abstractions when we build effective IT systems.

The idea that we can build a universal single abstraction for all uses is a dangerous distraction; instead, we need to build platform layers collaboratively.

While initially resistant, I have become enthusiastic about this approach. RackN has been working hard on the upgradable & highly available Kubernetes on Metal prerequisite. We’ve also created prototypes of the fully integrated stack. We believe strongly that this work should be done as a community effort and not within a distro.

My call for a Kubernetes underlay pivot embraces that collaborative approach. If we can keep these platforms focused on their core value then we can build bridges between what we have and our next innovation. What do you think? Is this a good approach? Contact us if you’d like to work together on making this happen.

See Also Rob’s VMblog.com post How is OpenStack so dead AND yet so very alive to SREs?

May 19 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Posted on May 19, 2017 by Rob H

Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo)

SRE Items of the Week

Kargo Ansible Playbooks foster Collaborative Kubernetes Ops
http://blog.kubernetes.io/2017/05/kargo-ansible-collaborative-kubernetes-ops.html

Making Kubernetes operationally strong is a widely held priority and I track many deployment efforts around the project. The incubated Kargo project is of particular interest for me because it uses the popular Ansible toolset to build robust, upgradable clusters on both cloud and physical targets. I believe using tools familiar to operators grows our community.

We’re excited to see the breadth of platforms enabled by Kargo and how well it handles a wide range of options like integrating Ceph for StatefulSet persistence and Helm for easier application uploads. Those additions have allowed us to fully integrate the OpenStack Helm charts (demo video). READ MORE
___________

Cybercrime for Profit? Five reasons why we need to start driving much more dynamic IT Operations
https://rackn.com/2017/05/16/cybercrime-for-profit-five-reasons-why-we-need-to-starting-driving-much-more-dynamic-it-operations/

There’s a frustrating cyberattack-driven security awareness cycle in IT Operations. Exploits and vulnerabilities are neither new nor unexpected; however, there is a new element taking shape that should raise additional alarm.

Cyberattacks are increasingly profit-generating and automated. READ MORE
_____________

Building the SRE Culture at LinkedIn
https://engineering.linkedin.com/blog/2017/05/building-the-sre-culture-at-linkedin

Being a Site Reliability Engineer (SRE) means having to talk about hard problems. Site outages, complex failure scenarios, and other technical emergencies are the things we have to be prepared to deal with every day. When we’re not dealing with problems, we’re discussing them. We regularly perform post-mortems and root cause analyses, and we generally dig into complex technical problems in an unflinching way. READ MORE
_____________
Virtual Panel: OpenStack Summit Boston 2017 Debriefing

_____________

SRE vs. DevOps — a False Distinction?
https://devops.com/sre-vs-devops-false-distinction/

Just a few days before he died at the beginning of the 1990s, a wise man taught us that “the show must go on.” Freddie Mercury’s parting words have long provided the guiding light for many, if not all, ops teams. In their eyes, the production environment should be exposed to minimum risk, even at the expense of new features and problem resolution.

About 10 years ago, Google decided to change its approach to production management. It took the company only a few years to realize that while R&D focused on creating new features and pushing them to production, the Operations group was trying to keep production as stable as possible—the two teams were pulling in opposite directions. This tension arose due to the groups’ different backgrounds, skill sets, incentives and metrics by which they were measured. READ MORE
_____________

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com.

Gluecon : May 24 – 25, 2017 in Denver, CO

Surviving Day 2 in Open Source Hybrid Automation – May 23, 2017 : Rob Hirschfeld and Greg Althaus

OTHER NEWSLETTERS

SRE Weekly (@SREWeekly) – Issue #72

May 12 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Posted on May 12, 2017 by Rob H

SRE Items of the Week

OpenStack on Kubernetes: Will it blend? (OpenStack Summit Session) w/ Rob Hirschfeld

OpenStack on Kubernetes (BOS Summit / May 2017 update) from rhirschfeld

OpenStack and Kubernetes: Combining the Best of Both Worlds (OpenStack Summit Session) w/ Rob Hirschfeld

OpenStack Summit Boston Day 1 Notes by Rob Hirschfeld
https://robhirschfeld.com/2017/05/09/openstack-boston-day-1-notes/

Contrary to pundit expectations, OpenStack did not roll over and die during the keynotes yesterday.

In fact, I saw the signs of a maturing project seeing real use and adoption. More critically, OpenStack leadership started the event with an acknowledgement of being part of, not owning, the vibrant open infrastructure community. READ MORE

_______
Immutable Infrastructure Webinar

Attendees:

Greg Althaus, Co-Founder and CTO, RackN
Erica Windisch, Founder and CEO, Piston
Christopher MacGown, Advisor, IOpipe
Riyaz Faizullabhoy, Security Engineer, Docker
Sheng Liang, Founder and CEO Rancher Labs
Moderated by Stephen Spector, HPE, Cloud Evangelist

_______
SREies Part1: Configuration Management by Krishelle Hardson-Hurley

SREies is a series on topics related to my job as a Site Reliability Engineer (SRE). About a month ago, I wrote an article about what it means to be an SRE which included a compatibility quiz and resource list to those who were intrigued by the role. If you are unfamiliar with SRE, I would suggest starting there before moving on.

In this series, I will extend my description to include more specific summaries of concepts that I have learned during my first six months at Dropbox. In this edition, I will be discussing Configuration Management. READ MORE

UPCOMING EVENTS

Interop ITX : May 15 – 19, 2017 in Las Vegas, NV

Open Source IT Summit – Tuesday, May 16, 9:00 – 5:00pm : Rob Hirschfeld to speak

Gluecon : May 24 – 25, 2017 in Denver, CO

Surviving Day 2 in Open Source Hybrid Automation – May 23, 2017 : Rob Hirschfeld and Greg Althaus

OTHER NEWSLETTERS

SRE Weekly (@SREWeekly) – Issue #71

OpenStack Boston Day 1 Notes

Posted on May 9, 2017 by Rob H

Contrary to pundit expectations, OpenStack did not roll over and die during the keynotes yesterday.

In my 2011 Boston Summit shirt.

In fact, I saw the signs of a maturing project seeing real use and adoption. More critically, OpenStack leadership started the event with an acknowledgement of being part of, not owning, the vibrant open infrastructure community.

Continued Growth in Core Areas

Practical reasons for running dedicated infrastructure (compliance, control and cost) make OpenStack relevant for companies and governments with significant budgets. There is also a healthy shared infrastructure (aka public cloud) market living in the shadow of the big 3 players. It’s still unclear how this ecosystem will make money for the vendors.

What do customers buy? Should the Core be free?

My personal experience is that most customers are reluctant to (but grudgingly do) buy distros for the core open technology. They are much more willing to pay for adjacencies like security, storage and networking.

Emerging Challenges from Adjacent Technologies

Containers and Kubernetes are making a significant impact on the OpenStack community. At points, the OpenStack keynote was more about Kubernetes than OpenStack. It’s also clear that customers want to use containers as an abstraction layer to make infrastructure less visible or locked-in. That opens the market for using servers directly (bare metal) or other clouds. That portability is likely to help OpenStack more than hurt it because customers can exit workloads from the Big 3 players.

Friction for adoption remains a critical hurdle.

Containers, which are cloud first platforms, have much less friction than IaaS platforms. IaaS platforms, even managed ones, require physical infrastructure with the matching complexity and investment.

OpenStack: an open infrastructure software community

Overall, the summit remains an amazing community space for open infrastructure software and cloud alternatives to the Big 3 players. The Foundation’s pivot to embrace Kubernetes and foster several other open technologies helps maintain the central enthusiasm for open source infrastructure that gave birth to the platform in the first place.

A healthy pragmatic vibe

The summit may not have the same heady taking-on-the-world feeling as the early days; instead, it has a healthy pragmatic vibe. Considering how frothy this space remains, that may be a welcome relief.

What are your impressions? I’m looking forward to hearing from you!

Cloud Native PHYSICAL PROVISIONING? Come on! Really?!

Posted on May 4, 2017 by Rob H

We believe Cloud Native development disciplines are essential regardless of the infrastructure.

Today, RackN announced very low entry-level support for Digital Rebar Provisioning – the RESTful Cobbler PXE/DHCP replacement. Having a company actually standing behind this core data center function with support is a big deal; however…

We’re making two BIG claims with Provision: breaking DevOps bottlenecks and cloud native physical provisioning. We think both points are critical to SRE and Ops success because our current approaches are not keeping pace with developer productivity and hardware complexity.

I’m going to post more about how Provision can help address the political struggles of SRE and DevOps that I’ve been watching in our industry. A hint is in the release, but the Cloud Native comment needs to be addressed.

First, Cloud Native is an architecture, not an infrastructure statement.

There is no requirement that we use VMs or AWS in Cloud Native. From that perspective, “Cloud” is a useful but deceptive adjective. Cloud Native is born from applications that had to succeed in hands-off, lower SLA infrastructure with fast delivery cycles on untrusted systems. These are very hostile environments compared to “legacy” IT.

What makes Digital Rebar Provision Cloud Native? A lot!

The following is a list of key attributes I consider essential for Cloud Native design.

Micro-services Enabled: The larger Digital Rebar project is a micro-services design. Provision reflects a stand-alone bundling of two services: DHCP and Provision. The new Provision service is designed to both stand alone (with embedded UX) and be part of a larger system.

Swagger RESTful API: We designed the APIs first based on years of experience. We spent a lot of time making sure that the API conformed to spec and that includes maintaining the Swagger spec so integration is easy.

Remote CLI: We build and test our CLI extensively. In fact, we expect that to be the primary user interface.

Security Designed In: We are serious about security even in challenging environments like PXE where options are limited by 20-year-old protocols. HTTPS is required, and so is user or bearer token authentication. That means that even API calls from machines can be secured.

12 Factor & API Config: There is no file configuration for Provision. The system starts with command line flags or environment variables. Deeper configuration is done via API/CLI. That ensures that the system can be fully managed remotely and configured securely because credentials are required for configuration.

Fast Start / Golang: Provision is a totally self-contained golang app including the UX. Even so, it’s very small. You can run it on a laptop from nothing in about 2 minutes including download.

CI/CD Coverage: We committed to deep test coverage for Provision and have consistently increased coverage with every commit. It ensures quality and prevents regressions.

Documentation In-project Auto-generated: On-boarding is important since we’re talking about small, API-driven units. A lot of Provision documentation is generated directly from the code into the actual project documentation. Also, the written documentation is in reStructuredText in the project with good indexes and cross-references. We regenerate the documentation with every commit.
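To make the "12 Factor & API Config" and "Security Designed In" items a bit more concrete, here is a minimal Python sketch of the pattern, not Provision's actual code (which is Golang). It assumes a service that reads its startup settings only from command line flags or environment variables, refuses non-HTTPS endpoints, and attaches a bearer token to every API call. The flag names, the environment variable names (EXAMPLE_API_URL, EXAMPLE_API_TOKEN) and the request path are all hypothetical.

```python
import argparse
import urllib.request


def load_settings(argv, environ):
    """Resolve startup settings from flags first, then environment variables.

    No configuration file is read. All names here are illustrative only.
    """
    parser = argparse.ArgumentParser(prog="example-service")
    parser.add_argument(
        "--api-url",
        default=environ.get("EXAMPLE_API_URL", "https://127.0.0.1:8443"),
    )
    parser.add_argument("--token", default=environ.get("EXAMPLE_API_TOKEN", ""))
    args = parser.parse_args(argv)
    if not args.api_url.startswith("https://"):
        raise ValueError("HTTPS is required for the API endpoint")
    return {"api_url": args.api_url.rstrip("/"), "token": args.token}


def build_request(settings, path):
    """Build (but do not send) an authenticated request for deeper configuration."""
    if not settings["token"]:
        raise ValueError("a bearer token is required")
    return urllib.request.Request(
        settings["api_url"] + path,
        headers={"Authorization": "Bearer " + settings["token"]},
    )


if __name__ == "__main__":
    # Hypothetical values: the flag supplies the token, the environment the URL.
    settings = load_settings(
        ["--token", "demo-token"],
        {"EXAMPLE_API_URL": "https://provision.example.test"},
    )
    request = build_request(settings, "/hypothetical/settings")
    print(request.full_url)
    print(request.get_header("Authorization"))
```

The point of the sketch is that nothing on disk needs to be edited to bring the service up, and nothing can be reconfigured without presenting credentials over an encrypted channel.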

We believe these development disciplines are essential regardless of the infrastructure. That’s why we made sure the v3 Provision (and ultimately every component of Digital Rebar as we iterate to v3) was built to these standards.

What do you think? Is this Cloud Native? What did we miss?

How about a CaaPuccino? Krish and Rob discuss containers, platforms, hybrid issues around Kubernetes and OpenStack.

Posted on April 24, 2017 by Rob H

CaaPuccino: A frothy mix of containers and platforms.

Check out Krish Subramanian’s (@krishnan) Modern Enterprise podcast (audio here) today for a surprisingly deep and thoughtful discussion about how frothy new technologies are impacting Modern Enterprise IT. Of course, we also take some time to throw some fire bombs at the end. You can use my notes below to jump to your favorite topics.

The key takeaways are that portability is hard and we’re still working out the impact of container architecture.

The benefit of the longer interview is that we really dig into the reasons why portability is hard and discuss ways to improve it. My personal SRE posts and those on the RackN blog describe operational processes that improve portability. These are real concerns for all IT organizations because mixed and hybrid models are a fact of life.

If you are not actively making automation that works against multiple infrastructures then you are building technical debt.

Of course, if you just want the snark, then jump forward to 24:00 minutes in where we talk about the future of Kubernetes, OpenStack and the inverted intersection of the projects.

Krish, thanks for the great discussion!

Rob’s Podcast Notes (39 minutes)

2:37: Rob intros about Digital Rebar & RackN

4:50: Why our Kubernetes is JUST UPSTREAM

5:35: Where are we going in 5 years > why Rob believes in Hybrid

Should not be 1 vendor who owns everything
That’s why we work for portability
Public cloud vision: you should stop caring about infrastructure
Coming to an age when infrastructure can be completely automated
Developer rebellion against infrastructure

8:36: Krish believes that Public cloud will be more decentralized

Public cloud should be part of everyone’s IT plan
It should not be the ONLY thing

9:25: Docker helps create portability, what else creates portability? Will there be a standard?

Containers are a huge change, but it’s not just packaging
Smaller units of work is important for portability
Container schedulers & PaaS are very opinionated, that’s what creates portability
Deeper into infrastructure loses portability (RackN helps)
Rob predicts that Lambda and Serverless creates portability too

11:38: Are new standards emerging?

Some APIs become dominant and create de facto APIs
Embedded assumptions break portability – that’s what makes automation fragile
Rob explains why we inject configuration to abstract infrastructure
RackN works to inject attributes instead of allowing scripts to assume settings
For example, networking assumptions break portability
Platforms force people to give up configuration in ways that break portability
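The "inject attributes instead of allowing scripts to assume settings" idea from the notes above can be sketched in a few lines of Python. This is only an illustration of the principle, not how Digital Rebar implements it. The template, the attribute names and the two example machines are all made up.

```python
from string import Template

# A script template that assumes nothing about the target: every
# infrastructure-specific value is a named attribute injected at render time.
NETWORK_SCRIPT = Template("ip addr add $address/$prefix dev $interface\n")


def render(template, attributes):
    """Render with injected attributes and fail loudly if one is missing."""
    try:
        return template.substitute(attributes)
    except KeyError as missing:
        raise ValueError(f"missing attribute: {missing.args[0]}") from None


# Hypothetical attribute sets for two very different targets.
MACHINES = {
    "cloud-vm": {"address": "10.0.0.5", "prefix": 24, "interface": "eth0"},
    "metal": {"address": "192.168.1.20", "prefix": 16, "interface": "eno1"},
}

if __name__ == "__main__":
    for name, attributes in MACHINES.items():
        print(name, "->", render(NETWORK_SCRIPT, attributes), end="")
```

The same template works for both targets because the networking assumptions live in the injected data, not in the script. A missing attribute is an error instead of a silent default, which is what keeps the automation from quietly becoming fragile.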

14:50: Why did Platform as a Service not take off?

Rob defends PaaS – thinks that it has accomplished a lot
Challenge of PaaS is that it’s very restrictive by design
Calls out Andrew Clay Shafer’s “don’t call it a PaaS” position
Containers provide a less restrictive approach with more options.

17:00: What’s the impact on Enterprise? How are developers being impacted?

Service Orientation is a very important thing to consider
Encapsulation from services is very valuable
Companies don’t own all their IT services any more – it’s not monolithic
IT Service Orientation aligns with Business Processes
Rob says the API economy is a big deal
In machine learning, a business’ data may be more valuable than their product

19:30: Services impact?

Services have a business imperative
We’re not ready for all the impacts of a service orientation
Challenge is to mix configuration and services
Magic of Digital Rebar is that it can mix orchestration of both

22:00: We are having issues with simple, how are we going to scale up?

Barriers are very low right now

22:30: Will Kubernetes help us solve governance issues?

Kubernetes is doing a good job building an ecosystem
Smart to focus on just being Kubernetes
It will be chaotic as the core is worked out

24:00: Do you think Kubernetes is going in the right direction?

Rob is bullish for Kubernetes to be the dominant platform because it’s narrow and specific
Google has the right balance of control
Kubernetes really is not that complex for what it does
Mesos is also good but harder to understand for users
Swarm is simple but harder to extend for an ecosystem
Kubernetes is a threat to Amazon because it creates portability and ecosystem outside of their platform
Rob thinks that Kubernetes could create platform services that compete with AWS services like RDS.
It’s likely to level the field, not create a Google advantage

27:00: How does Kubernetes fit into the Digital Rebar picture?

We think of Kubernetes as a great infrastructure abstraction that creates portability
We believe there’s a missing underlay that cannot abstract the infrastructure – that’s what we do.
OpenStack deployments are broken because every data center is custom and different – vendors create a lot of consulting without solving the problem
RackN is creating composability UNDER Kubernetes so that those infrastructure differences do not break operation automation
Kubernetes does not have the constructs in the abstraction to solve the infrastructure problem, that’s a different problem that should not be added into the APIs
Digital Rebar can also then use the Kubernetes abstractions?

30:20: Can OpenStack really be managed/run on top of Kubernetes? That seems complex!

There is a MESS in the message of Kubernetes under OpenStack because it sends the message that Kubernetes is better at managing applications than OpenStack
Since OpenStack is just an application and Kubernetes is a good way to manage applications
When OpenStack is already in containers, we can use Kubernetes to do that in a logical way
“I’m super impressed with how it’s working” using OpenStack Helm Packs (still needs work)
Physical environment still has to be injected into the OpenStack on Kubernetes environment

35:05: Does OpenStack have a future?

Yes! But it’s not the big “data center operating system” future that we expected in 2010. Rob thinks it is a good VM management platform.
Rob provides the same caution for Kubernetes. It will work where the abstractions add value but data centers are complex hybrid beasts
Don’t “square peg a data center round hole” – find the best fit
OpenStack should have focused on the things it does well – it has a huge appetite for solving too many problems.

LinuxKit and Three Concerns with Physical Provisioning of Immutable Images

Posted on April 21, 2017 by Rob H

At Dockercon this week, Docker announced an immutable operating system called LinuxKit which is powered by a Packer-like utility called Moby that RackN CTO, Greg Althaus, explains in the video below.

For additional conference notes, check out Rob Hirschfeld’s Dockercon retro blog post.

Three Concerns with Immutable O/S on Physical

With a mix of excitement and apprehension, the RackN team has been watching physical deployment of immutable operating systems like CoreOS Container Linux and RancherOS. Overall, we like the idea of a small locked (aka immutable) in-memory image for servers; however, the concept does not map perfectly to hardware.

Note: if you want to provision these operating systems in a production way, we can help you!

These operating systems work on a “less is more” approach that strips everything out of the images to make them small and secure.

This is great for cloud-first approaches where VM size has a material impact in cost. It’s particularly matched for container platforms where VMs are constantly being created and destroyed. In these cases, the immutable image is easy to update and saves money.

So, why does that not work as well on physical?

First: HA DHCP?! It’s not as great a map for physical systems where operating system overhead is pretty minimal. The model requires orchestrated rebooting of your hardware. It also means that you need a highly available (HA) PXE Provisioning infrastructure (like we’re building with Digital Rebar).

Second: Configuration. Because everything is stripped out of these images, they must rely on cloud-init injected configuration. In a physical environment, there is no way to create cloud-init like injections without integrating with the kickstart systems (a feature of Digital Rebar Provision). Further, hardware has a lot more configuration options (like hard drives and network interfaces) than VMs. That means that we need a robust and system-by-system way to manage these configurations.
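As a rough illustration of what "system-by-system" configuration can look like, the Python sketch below layers per-machine overrides (disks, network interfaces) on top of shared defaults and emits one document per machine that an injection mechanism could hand to the booting image. It is an assumption-laden sketch: the key names, machine names and device paths are invented, and it does not reproduce the cloud-init schema or Digital Rebar's data model.

```python
import json


def merge(base, override):
    """Recursively overlay per-machine values on shared defaults."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


# Hypothetical shared defaults and per-machine hardware differences.
DEFAULTS = {
    "ntp": "pool.example.test",
    "disks": {"root": "/dev/sda"},
    "nics": {"admin": "eth0"},
}
MACHINES = {
    "node-01": {"nics": {"admin": "eno1"}},
    "node-02": {"disks": {"root": "/dev/nvme0n1"}, "nics": {"admin": "enp3s0"}},
}


def render_config(name):
    """Build the injected configuration document for one machine."""
    config = merge(DEFAULTS, MACHINES[name])
    config["hostname"] = name
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    for machine in MACHINES:
        print(render_config(machine))
```

Keeping the hardware differences in data means the immutable image itself never has to change from one server to the next.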

Third: No SSH. Yet another problem with these minimal images is that they are supposed to eliminate SSH. Ideally, their image and configuration provides everything required to run the image without additional administration. Unfortunately, many applications assume post-boot configuration. That means that people often re-enable SSH to use tools like Ansible. If it did not conflict with the very nature of the “do-not-configure-the-server” immutable model, I would suggest that SSH is a perfectly reasonable requirement for operators running physical infrastructure.

In Summary, even with those issues, we are excited about the positive impact this immutable approach can have on data center operations.

With tooling like Digital Rebar, it’s possible to manage the issues above. If this appeals to you, let us know!

Hey Dockercon, let’s get Physical!

Posted on April 21, 2017 by Rob H

Overall, Dockercon did a good job connecting Docker users with information. In some ways, it was a very “let’s get down to business” conference without the open source collaboration feel of previous events. For enterprise customers and partners, that may be a welcome change.

Unlike past Dockercons, the event did not have major announcements or a lot of non-Docker ecosystem buzz. I missed both.

One item that got me excited was an immutable operating system called LinuxKit which is powered by a Packer-like utility called Moby (ok, I know it does more but that’s still fuzzy to me).

RackN CTO, Greg Althaus, was able to turn around a working LinuxKit Kubernetes demo (VIDEO) overnight. This short video explains Moby & LinuxKit plus uses the new Digital Rebar Provision in an amazing integration.

Want to hear more about immutable operating systems? Check out our post on RackN’s site about three challenges of running things like LinuxKit, CoreOS Container Linux and RancherOS on metal.

Oh, and YES, that was my 15-year-old daughter giving a presentation at Dockercon about workplace diversity. I’ll link the video when they’ve posted them.

https://www.slideshare.net/KateHirschfeld/slideshelf

April 21 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Posted on April 21, 2017 by Rob H

SRE Items of the Week

DigitalRebar Provision deploy Docker’s LinuxKit Kubernetes

_____________

Install Digital Rebar PXE Provision on a Mac OSX System and Test Boot using Virtual Box

_____________

Packet Pushers 333 Automation & Orchestration in Networking
http://packetpushers.net/podcast/podcasts/show-333-orchestration-vs-automation/

While the discussion is all about NETWORK DevOps, they do a good job of explaining WHY the current state of system orchestration is so sad – in a word: heterogeneity. It’s not going away because the alternative is lock-in. They also do a good job of describing the difference between automation and orchestration; however, I think there’s a middle tier of resource “scheduling” that better describes OpenStack and Kubernetes.

Around 5:00 minutes into the podcast, they effectively describe the composable design of Digital Rebar and the rationale for the way that we’ve abstracted interfaces for automation. If you guys really do want to cash in by consulting with it (at 10 minutes), just contact Rob H.
_____________

Digital Magazine Launch: Increment On-Call
https://increment.com/on-call/

Increment is dedicated to covering how teams build and operate software systems at scale, one issue at a time. In this, our inaugural issue, we focus on industry best practices around on-call and incident response.
_____________

Need PXE? Try out this Cobbler Replacement
https://robhirschfeld.com/2017/04/11/provision-preview/

INTRO
We wanted to make open basic provisioning API-driven, secure, scalable and fast. So we carved out the Provision & DHCP services as a stand alone unit from the larger open Digital Rebar project. While this Golang service lacks orchestration, this complete service is part of Digital Rebar infrastructure and supports the discovery boot process, templating, security and extensive image library (Linux, ESX, Windows, … ) from the main project.

TL;DR: FIVE MINUTES TO REPLACE COBBLER? YES.

The project APIs and CLIs are complete for all provisioning functions with good Swagger definitions and docs. After all, it’s third generation capability from the Digital Rebar project. The integrated UX is still evolving.
_____________

UPCOMING EVENTS

DevOpsDays Austin : May 4-5, 2017 in Austin TX

CloudNative vs SRE vs DevOps: The Ultimate Server Cage Match
Not Actually a DevOps Talk with Michael Cote (May 4 at 4:50pm)

OpenStack Summit : May 8 – 11, 2017 in Boston, MA

OpenStack and Kubernetes. Combining the best of both worlds – Kubernetes Day

Interop ITX : May 15 – 19, 2017 in Las Vegas, NV

Open Source IT Summit – Tuesday, May 16, 9:00 – 5:00pm : Rob Hirschfeld to speak

Gluecon : May 24 – 25, 2017 in Denver, CO

Surviving Day 2 in Open Source Hybrid Automation – May 23, 2017 : Rob Hirschfeld and Greg Althaus

OTHER NEWSLETTERS

SRE Weekly (@SREWeekly) – Issue #68

Open Source Collaboration: The Power of No

Posted on April 11, 2017 by Rob H

TL;DR: The days of using open software passively from vendors are past; users need to have a voice and opinion about project governance. This post is a joint effort with Rob Hirschfeld, RackN, and Chris Ferris, IBM, based on their IBM Interconnect 2017 “Open Cloud Architecture: Think You Can Out-Innovate the Best of the Rest?” presentation.

It’s a common misconception that open source collaboration means saying YES to all ideas; however, the reality of successful projects is the opposite.

Permissive open source licenses drive a delicate balance for projects. On one hand, projects that adopt permissive licenses should be accepting of contributions to build community and user base. On the other, maintainers need to adopt a narrow focus to ensure project utility and simplicity. If the project’s maintainers are too permissive, the project bloats and wanders without a clear purpose. If they are too restrictive then the project fails to build community.

It is human nature to say yes to all collaborators, but that can frustrate core developers and users.

For that reason, stronger open source projects have a clear, focused, shared vision. Historically, that vision was enforced by a benevolent dictator for life (BDFL); however, recent large projects have used a consensus of project elders to make the task more sustainable. These roles serve a critical need: they say “no” to work that does not align with the project’s mission and vision. The challenge of defining that vision can be a big one, but without a clear vision, it’s impossible for the community to sustain growth because new contributors can dilute the utility of projects. [author’s note: This is especially true of celebrity projects like OpenStack or Kubernetes that attract “shared glory” contributors]

There is tremendous social and commercial pressure driving this vision vs. implementation balance.

The most critical one is the threat of “forking.” Forking is what happens when the code/collaborator base of a project splits into multiple factions and stops working together on a single deliverable. The result is incompatible products with a shared history. While small forks are required to support releases and foster development, diverging community forks can have unpredictable impacts on a project.

Forks are not always bad: they provide a control mechanism for communities.

Permissive licenses are what allow forks to become the primary governance tool: anyone can create a new line of development that differs from the original. Forks can allow special interests in a code base to focus on their needs. That could be new features or simply stabilization. Many times, a major release version of a project evolves into forks where both old and newer versions have independent communities because of deployment inertia. Forking can also allow new leadership or governance without having to directly displace an entrenched “owner”.

But forking is expensive because it makes it harder for communities to collaborate.

To us, the antidote for forking is not simply vision but a strong focus on interoperability. Interoperability (or interop) means ensuring that different implementations remain compatible for users. A simplified example would be having automation that works on one OpenStack cloud also work on all the others without modification. Strong interop creates an ecosystem for a project by making users confident that their downstream efforts will not be disrupted by implementation variance or version changes.

Good interop relieves the pressure of forking.

Interop can only work when a project defines what is expected behavior and creates tests that enforce those standards. That activity forces project contributors to agree on project priorities and scope. Projects that refuse to define interop expectations end up disrupting their user and collaborator base in frustrating ways that lead to forking (Rob’s commentary on the potential Docker fork of 2016).
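As a rough illustration of “define expected behavior and enforce it with tests,” here is a minimal sketch. The capability names, the `Implementation` class and both helper functions are hypothetical and invented for this example; they are not taken from any real project's interop program.

```python
from dataclasses import dataclass, field

# Hypothetical list of behaviors a project declares as required for interop.
REQUIRED_CAPABILITIES = {"create_server", "delete_server", "list_servers"}


@dataclass
class Implementation:
    """One vendor's or fork's implementation and the interop tests it passes."""
    name: str
    passing_tests: set = field(default_factory=set)


def missing_capabilities(impl, required=REQUIRED_CAPABILITIES):
    """Return the required capabilities this implementation does not pass, sorted."""
    return sorted(required - impl.passing_tests)


def is_interoperable(impl, required=REQUIRED_CAPABILITIES):
    """An implementation is interoperable only if it passes every required test."""
    return not missing_capabilities(impl, required)
```

The point of the sketch is that the required set is agreed on once, by the project, and every implementation is measured against the same list. Extra features beyond the list do not hurt, but a gap in the list is immediately visible to users.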

Unfortunately, interop is not generally a developer priority.

In the end, interoperability is a user feature that competes with other features. Sadly, it is often seen as hurting feature development because new features must work to maintain existing interop standards. For that reason, new contributors may see interop demands as an impediment to forward progress; however, it’s a strong driver for user adoption and growth.

The challenge is that those users are typically more focused on their own implementation and less visible to the project leadership. Vendors have similar disincentives to do work that benefits other vendors in the community. These tensions will undermine the health of communities that do not have strong BDFL or Elders leadership. So, who then provides the adult supervision?

Ultimately, users must demand interop and provide commercial preference for vendors that invest in interop.

Open source has had an enormous impact on the software industry, and generally that change has been for the better. But the change comes at a cost: it requires involvement not just from vendors and individual developers but, ultimately, from consumers/users.

Interop isn’t naturally a vendor priority because it levels the playing field for all vendors; however, vendors do prioritize what their customers want.

Ideally, customer needs translate into new features that have a broad base of consumer interest. Interop ensures that features can be used broadly. Thus interop is an attribute that consumers should expect not only from vendors but also from the open source communities building the software. This alignment then serves as the foundation upon which (increasingly) that vendor software is based.

Customers should be actively and publicly supportive of interop efforts of projects on which their vendor’s offerings depend. If there isn’t such an initiative in those projects, then they should demand one be started through their vendor partners and in the public forums for the project.

Further, if consumers of an open source project sense that it lacks a strong, focused vision and is wandering off course, they need to get involved and say so, directly and/or through their vendor partners.

While open source has changed the IT industry, it also has a cost. The days of using software passively from vendors are past; users need to have a voice and an opinion. They need to ensure that their chosen vendors are also supporting the health of the community.

What do you think? Reach out to Rob (@zehicle) and Chris (@christo4ferris) and let us know!

Note: Cross posted on IBM OpenTech site.
