Open Source Collaboration: The Power of No

TL;DR: The days of passively consuming open software from vendors are past; users need to have a voice and an opinion about project governance. This post is a joint effort with Rob Hirschfeld, RackN, and Chris Ferris, IBM, based on their IBM Interconnect 2017 “Open Cloud Architecture: Think You Can Out-Innovate the Best of the Rest?” presentation.

It’s a common misconception that open source collaboration means saying YES to all ideas; however, the reality of successful projects is the opposite.

Permissive open source licenses create a delicate balance for projects. On one hand, projects that adopt permissive licenses should accept contributions in order to build community and user base. On the other, maintainers need to keep a narrow focus to ensure the project stays useful and simple. If the project’s maintainers are too permissive, the project bloats and wanders without a clear purpose. If they are too restrictive, the project fails to build community.

It is human nature to say yes to all collaborators, but that can frustrate core developers and users.

For that reason, stronger open source projects have a clear, focused, shared vision. Historically, that vision was enforced by a benevolent dictator for life (BDFL); however, recent large projects have used a consensus of project elders to make the task more sustainable. These roles serve a critical need: they say “no” to work that does not align with the project’s mission and vision. The challenge of defining that vision can be a big one, but without a clear vision, it’s impossible for the community to sustain growth because new contributors can dilute the utility of projects. [author’s note: This is especially true of celebrity projects like OpenStack or Kubernetes that attract “shared glory” contributors]

There is tremendous social and commercial pressure driving this vision vs. implementation balance.

The most critical one is the threat of “forking.” Forking happens when the code/collaborator base of a project splits into multiple factions and stops working together on a single deliverable. The result is incompatible products with a shared history. While small forks are required to support releases and foster development, diverging community forks can have unpredictable impacts on a project.

Forks are not always bad: they provide a control mechanism for communities.

The fundamental nature of open source projects that adopt a permissive license is what allows forks to become the primary governance tool. Permissive licenses allow anyone to create a new line of development that differs from the original. Forks let special interests in a code base focus on their own needs, whether that’s new features or simply stabilization. Many times, a major release of a project evolves into forks where both the old and the newer versions keep independent communities because of deployment inertia. A fork can also bring in new leadership or governance without having to directly displace an entrenched “owner.”

But forking is expensive because it makes it harder for communities to collaborate.

To us, the antidote for forking is not simply vision but a strong focus on interoperability. Interoperability (or interop) means ensuring that different implementations remain compatible for users. A simplified example would be having automation that works on one OpenStack cloud also work on all the others without modification. Strong interop creates an ecosystem for a project by making users confident that their downstream efforts will not be disrupted by implementation variance or version changes.

Good Interop relieves the pressure of forking.

Interop can only work when a project defines expected behavior and creates tests that enforce those standards. That activity forces project contributors to agree on project priorities and scope. Projects that refuse to define interop expectations end up disrupting their user and collaborator base in frustrating ways that lead to forking (see Rob’s commentary on the potential Docker fork of 2016).
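To make that concrete, here is a minimal sketch of what an interop check can look like: probe the same API route on several implementations and require the behavior the community agreed to guarantee. The route, field names and URLs below are hypothetical illustrations for this post, not part of any real project’s conformance suite.

```go
// Minimal sketch of an interop check: probe the same API route on several
// implementations and require the fields the spec guarantees. All names
// and URLs here are hypothetical, not from any real conformance suite.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// requiredFields is the behavior the community agreed to guarantee.
var requiredFields = []string{"id", "status", "created_at"}

func conforms(baseURL string) error {
	resp, err := http.Get(baseURL + "/v1/servers") // hypothetical spec route
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("expected 200, got %d", resp.StatusCode)
	}
	var servers []map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&servers); err != nil {
		return fmt.Errorf("response is not the agreed JSON shape: %w", err)
	}
	for _, s := range servers {
		for _, f := range requiredFields {
			if _, ok := s[f]; !ok {
				return fmt.Errorf("missing guaranteed field %q", f)
			}
		}
	}
	return nil
}

func main() {
	// Each base URL is a different vendor implementation of the same spec.
	for _, impl := range []string{"https://cloud-a.example", "https://cloud-b.example"} {
		if err := conforms(impl); err != nil {
			fmt.Printf("FAIL %s: %v\n", impl, err)
		} else {
			fmt.Printf("PASS %s\n", impl)
		}
	}
}
```

Real suites (like the Tempest-based tests behind OpenStack’s DefCore interop guidelines) are far larger, but the mechanism is the same: the tests are the enforceable definition of project scope.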

Unfortunately, interop is generally not a developer priority.

In the end, interoperability is a user feature that competes with other features. Sadly, it is often seen as hurting feature development because new features must work to maintain existing interop standards. For that reason, new contributors may see interop demands as an impediment to forward progress; however, interop is a strong driver for user adoption and growth.

The challenge is that those users are typically more focused on their own implementations and less visible to the project leadership. Vendors have similar disincentives to do work that benefits other vendors in the community. These tensions will undermine the health of communities that lack strong BDFL or elder leadership. So, who then provides the adult supervision?

Ultimately, users must demand interop and provide commercial preference for vendors that invest in interop.

Open source has had an enormous impact on the software industry, generally a change for the better. But that change comes at a cost: the need for involvement not just from vendors and individual developers but, ultimately, from consumers and users.

Interop isn’t naturally a vendor priority because it levels the playing field for all vendors; however, vendors do prioritize what their customers want.

Ideally, customer needs translate into new features that have a broad base of consumer interest. Interop ensures that those features can be used broadly. Thus interop is an attribute that consumers should value not only from vendors, but also from the open source communities building the software, since those communities increasingly serve as the foundation on which vendor software is based.

Customers should be actively and publicly supportive of the interop efforts of the projects on which their vendors’ offerings depend. If there isn’t such an initiative in those projects, then they should demand that one be started, both through their vendor partners and in the public forums for the project.

Further, if consumers of an open source project sense that it lacks a strong, focused vision and is wandering off course, they need to get involved and say so, either directly or through their vendor partners.

While open source has changed the IT industry, that change comes at a cost. The days of passively consuming software from vendors are past; users need to have a voice and an opinion. They also need to ensure that their chosen vendors are supporting the health of the community.

What do you think? Reach out to Rob (@zehicle) and Chris (@christo4ferris) and let us know!

Note: Cross posted on IBM OpenTech site.

10x Faster Today but 10x Harder to Maintain Tomorrow: the Cul-De-Sac problem

I’ve been digging into what it means to be a site reliability engineer (SRE) and thinking about my experience trying to automate infrastructure in a way that scales dramatically better. I’m not thinking about scale in number of nodes, but in operator efficiency. The primary way to create that efficiency is to limit site customization and improve reuse. Those changes need to start before the first install.

As an industry, we must address the “day 2” problem in collaboratively developed open software before users’ first install.

Recently, RackN asked the question “Shouldn’t we have Shared Automation for Commodity Infrastructure?”, which talked about the fact that we, as an industry, keep writing custom automation for what should be commodity servers. This “snowflaking” happens because there’s enough variation at the data center system level that it’s very difficult to share and reuse automation on an ongoing basis.

Since variation enables innovation, we need to solve this problem without limiting diversity of choice.


Happily, platforms like Kubernetes are designed to hide these infrastructure variations from developers. That means we can expect a productivity explosion for the huge number of applications that can narrowly target platforms. Unfortunately, that does nothing for the platforms themselves or for infrastructure-bound applications. For this lower-level software, we need to accept that operations environments are heterogeneous.


I realized that we’re looking at a multidimensional problem after watching communities like OpenStack struggle to evolve operations practice.

It’s multidimensional because we are building the operations practice simultaneously with the software itself. To make things even harder, the infrastructure and dependencies are also constantly changing. Since this degree of rapid multi-factor innovation is the new normal, we have to plan for our operations automation itself to be just as upgradable.

Unless we can upgrade both the software AND the related deployment automation, each deployment becomes a cul-de-sac after day 1.

For open communities, that cul-de-sac challenge limits projects’ ability to feed operational improvements back into the user base and makes it harder for early users to stay current.  These challenges limit the virtuous feedback cycles that help communities grow.  

The solution is to approach shared project deployment automation as also being continuously deployed.

This is a deceptively hard problem.

This is a hard problem because each deployment is unique, and those differences make it hard to absorb community advances without being constantly broken. That is one of the reasons why companies opt out of the community and into vendor distributions. While vendors are critical to the ecosystem, that practice ultimately limits the growth and health of the community.

Our approach at RackN, as reflected in open Digital Rebar, is to create management abstractions that isolate deployment variables based on system-level concerns. Unlike project-generated templates, this approach absorbs heterogeneity and brings in the external information that often complicates project deployment automation.
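As a rough illustration of what isolating deployment variables means, consider layered parameter resolution: the shared automation asks for one logical key, and progressively more specific layers supply the value. This is a simplified sketch of the concept, not the actual Digital Rebar schema or API; the layer names and keys are invented for the example.

```go
// Sketch of layered parameter resolution: shared automation reads one
// logical key while site-, cluster- and machine-specific layers supply
// the value. Layer names and keys are illustrative only.
package main

import "fmt"

// Layer maps parameter names to values.
type Layer map[string]string

// Resolve walks layers from most to least specific and returns the first match.
func Resolve(key string, layers ...Layer) (string, bool) {
	for _, l := range layers {
		if v, ok := l[key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	global := Layer{"ntp/server": "pool.ntp.org"}  // community default
	site := Layer{"ntp/server": "ntp.dc1.example"} // site-level override
	machine := Layer{}                             // nothing machine-specific

	// Shared automation only ever asks for "ntp/server"; the site layer
	// answers here, so the community template needs no local edits.
	if v, ok := Resolve("ntp/server", machine, site, global); ok {
		fmt.Println("ntp server:", v)
	}
}
```

The design point is that community templates only ever reference logical keys, so site-specific values stay in their own layer instead of leaking into the shared automation.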

We believe that this is a general way to solve the broader problem and invite you to participate in helping us solve the Day 2 problems that limit our open communities.

What does it take to Operate Open Platforms? Answers in Datanauts 72

Did I just let OpenStack ops off the hook….?  Kubernetes production challenges…?  

I had a lot of fun in this wide-ranging Datanauts discussion with unicorn herders Chris Wahl and Ethan Banks. I like the three-section format because it gives us a chance to deep dive into distinct topics and includes some out-of-band analysis by the hosts; however, that means you need to keep listening through the commercial breaks to hear the full podcast.

Three parts?  Yes, Chris and Ethan like to save the best questions for last.

In Part 1, we went deep into the industry operational and business challenges uncovered by the OpenStack project. Particularly, Chris and I go into “platform underlay” issues which I laid out in my “please stop the turtles” post. This was part of the build-up to my SRE series.

In Part 2, we explore my operations-focused view of the latest developments in container schedulers with a focus on Kubernetes. Part of the operational discussion goes into architecture “conceits” (or compromises) that allow developers to get the most from cloud native design patterns. I also make a pitch for using proven tools to run the underlay.

In Part 3, we go deep into the DevOps automation topics of configuration and orchestration. We talk about the design principles that help drive “day 2” automation and why making in-place upgrades work should be an industry priority. Of course, we cover some Digital Rebar design too.

Take a listen and let me know what you think!

On Twitter, we’ve already started a discussion about how much developers should care about infrastructure. My opinion (posted here) is that the DevOps idea of developers “owning” infrastructure caused a partial rebellion toward containers.

Why is RackN advancing OpenStack on Kubernetes?

Yesterday, RackN CEO, Rob Hirschfeld, described the remarkable progress in OpenStack on Kubernetes using Helm (article link).  Until now, RackN had not been willing to officially support OpenStack deployments; however, we now believe that this approach is a game changer for OpenStack operators even if they are not actively looking at Kubernetes.

We are looking for companies that want to join in this work and fast-track it into production. If this is interesting, please contact us at sre@rackn.com.

Why should you sponsor? Current OpenStack operators facing “fork-lift upgrades” should want to find a path like this one that ensures future upgrades are baked into the plan. This approach provides a fast track to a general purpose, enterprise grade, upgradable Kubernetes infrastructure.

Here is Rob’s Demo

Rob’s Original Blog Post

RackN revisits OpenStack deployments with an eye on ongoing operations. I’ve been an outspoken skeptic of a Joint OpenStack Kubernetes Environment because I felt that the technical hurdles of cloud native architecture would prove challenging.

I was wrong: I underestimated how fast these issues could be addressed.

… read the rest at Beyond Expectations: OpenStack via Kubernetes Helm (Fully Automated with Digital Rebar) — Rob Hirschfeld

Beyond Expectations: OpenStack via Kubernetes Helm (Fully Automated with Digital Rebar)

RackN revisits OpenStack deployments with an eye on ongoing operations.

I’ve been an outspoken skeptic of a Joint OpenStack Kubernetes Environment (my OpenStack BCN preso, Super User follow-up and BOS Proposal) because I felt that the technical hurdles of cloud native architecture would prove challenging. Issues like stable service positioning and persistent data are requirements for OpenStack and hard problems in Kubernetes.

I was wrong: I underestimated how fast these issues could be addressed.

The Kubernetes Helm work out of the AT&T Comm Dev lab takes on the integration with a “do it the K8s native way” approach that the RackN team finds very effective. In fact, we’ve created a fully integrated Digital Rebar deployment that lays down Kubernetes using Kargo and then adds OpenStack via Helm. The provisioning automation includes a Ceph cluster to provide stateful sets for data persistence.

This joint approach dramatically reduces operational challenges associated with running OpenStack without taking over a general purpose Kubernetes infrastructure for a single task.

Given the rise of SRE thinking, the RackN team believes that this approach changes the field for OpenStack deployments and will ultimately dominate it (the field is already mainly containerized). There is still work to be completed: some complex configuration is required to allow Kubernetes CNI and Neutron to collaborate so that containers and VMs can cross-communicate.

We are looking for companies that want to join in this work and fast-track it into production.  If this is interesting, please contact us at sre@rackn.com.

Why should you sponsor? Current OpenStack operators facing “fork-lift upgrades” should want to find a path like this one that ensures future upgrades are baked into the plan. This approach provides a fast track to a general purpose, enterprise grade, upgradable Kubernetes infrastructure.

Closing note from my past presentations: We’re making progress on the technical aspects of this integration; however, my concerns about market positioning remain.

“Why SRE?” Discussion with Eric @Discoposse Wright

My SRE series continues… At RackN, we see a coming infrastructure explosion in both complexity and scale. Unless our industry radically rethinks operational processes, current backlogs will escalate, and stability, security and sharing will suffer.

I was a guest of Eric “@discoposse” Wright on the Green Circle Community Podcast #42 (my previous appearance).

LISTEN NOW: Podcast #42 (transcript)

In this action-packed 30 minute conversation, we discuss the industry forces putting pressure on operations teams. These pressures require operators to invest much more heavily in reusable automation.

That leads us toward why Kubernetes is interesting and what went wrong with OpenStack (I actually use the phrase “dumpster fire”). We ultimately talk about how those lessons are embedded in the Digital Rebar architecture.

(re)OpenStack for 2017 – board voting week starts this Monday

[1/19 Update: I placed 9th in the results (or 6th if you go only by popular vote instead of total votes).  There are 8 seats, so I was not elected.]

The OpenStack Project needs a course correction and I’m asking for your community vote to put me back on the 2017 Board to help drive it.  As a start-up CEO, I’m neutral, yet I also have the right technical, commercial and community influence to make this a reality.

Vote Now! Your support is critical because OpenStack fills a very real need and should have a solid future; however, it needs to adapt to market realities to achieve that.

I want the Board to acknowledge and adapt to stumbles in ecosystem success, including being dropped or re-prioritized by key sponsors. This should include tightening the mission so the project can collaborate more freely with both open and proprietary platforms. In 2016, I was deeply involved in OpenStack alternatives, including Kubernetes and hybrid cloud automation with Amazon and Google.

OpenStack must adjust to being one of several alternatives including AWS, Google and container platforms like Kubernetes.

That means focusing on our IaaS strengths and being unambiguous about core functions like SDN and storage integration. It also means ensuring that commercial members of the ecosystem can both profit and compete. The Board has both the responsibility and the authority to make these changes if the members are willing to act.

What’s my background? I’ve been an active and vocal member of the OpenStack community since the very beginning of the project, especially around Operator and Product Management issues. I was elected to the board four times and played critical roles, including launching the DefCore efforts and pushing for more definition of the Big Tent concept (which I believe has hurt the project).

It’s a great field of candidates! Like other years, there are many very strong candidates whom I have worked with in a number of roles. I always recommend distributing your eight votes to multiple people and limiting “affinity voting” for your own company or geography. While all the candidates would serve the board well, this year I’d like to call attention specifically to Shamail Tahir as a candidate who has invested significant time in helping with Product Management and Enterprise Readiness for OpenStack.

Can we control Hype & Over-Vendoring?

Q: Is over-vendoring when you’ve had too much to drink?
A: Yes, too much Kool-Aid.

There’s a lot of information here – skip to the bottom if you want to see my recommendation.

Last week on TheNewStack, I offered eight ways to keep Kubernetes on the right track (abridged list here) and felt that item #6 needed more explanation and some concrete solutions.

  1. DO: Focus on a Tight Core
  2. DO: Build a Diverse Community
  3. DO: Multi-cloud and Hybrid
  4. DO: Be Humble and Honest
  5. AVOID: “The One Ring” Universal Solution Hubris
  6. AVOID: Over-Vendoring (discussed here)
  7. AVOID: Coupling Installers, Brokers and Providers to the core
  8. AVOID: Fast Release Cycles without LTS Releases

What is Over-Vendoring? It’s when vendors drive their companies’ brands ahead of the health of the project, generally by fueling an aggressive hype cycle in which vendors try to jump on the bandwagon.

Hype can be very dangerous for projects (see David Cassel’s TNS article) because it makes it easy to bypass user needs and the boring scale/stabilization processes in favor of vendor differentiation. Unfortunately, common use-cases do not drive differentiation and are invisible when it comes to company marketing budgets. That boring common core has the effect of creating a tragedy of the commons, which undermines collaboration on shared code bases.

The solution is to aggressively keep the project core small so that vendors have specific and limited areas of coopetition.  

A small core means we do not compel collaboration in many areas of the project. That drives competition and diversity, which can be confusing. The temptation to endorse or nominate companion projects is risky due to the hype cycle. Endorsements can create a bias that actually hurts innovation, because the earliest or loudest vendors do not generally create the best long-term approaches. I’ve heard this described as “people doing the real work don’t necessarily have time to brag about it.”

Keeping to a small-core mantra drives a healthy plug-in model where vendors can differentiate. It also ensures that projects can succeed with a bounded set of core contributors and support infrastructure. That means we should not measure success by commits, committers or lines of code, because these will drop as projects successfully modularize. My recommendation for a key success metric is the ratio of committers to ecosystem members and users.

A steadily improving ratio of core contributors to ecosystem shows improving efficiency of investment: for example, a project whose user base grows from 10,000 to 50,000 while its committer count holds near 100 is getting five times the leverage from the same core. That’s a better sign of health than raw project growth.

It’s important to note that there is also a serious risk of under-vendoring too!  

We must recognize and support vendors in open source communities because they sustain the project via direct contributions and by bringing users. For a healthy ecosystem, we need to ensure that vendors can fairly profit. That means they must be able to use their brand in combination with the project’s brand. The Apache project model is the anti-pattern here because its very strict “no vendor” trademark marketing guidelines can strand projects without good corporate support.

I’ve come to believe that it’s important to allow vendors to market open source project brands; however, they also need some limits on how they position the project.

How should this co-branding work?  My thinking is that vendor claims about a project should be managed in a consistent and common way.  Since we’re keeping the project core small, that should help limit the scope of the claims.  Vendors that want to make ecosystem claims should be given clear spaces for marketing their own brand in participation with the project brand.

I don’t pretend that this is easy! Vendor marketing is planned quarters ahead of when open source projects are ready for it: that’s part of what feeds the hype cycle. It also means that projects will be saying no to some free marketing from their ecosystem. Ideally, we’re saying yes to the right parts at the same time.

Ultimately, hype control means saying no to free marketing.  For an open source project, that’s a hard but essential decision.


Will OpenStack Go Supernova? It’s Time to Refocus on Core.

There’s no gentle way to put this but everyone (and I mean everyone) I’ve talked with thinks that this position should be heard.

OpenStack is bleeding off development resources (Networkworld) and that may be a good thing if the community responds by refocusing.

[Photo: the #AfterStack crowd]

I spent a fantastic week in Barcelona catching up with many old and new friends at the OpenStack summit. The community continues to grow and welcome new participants. As one of the “project elders,” I was on the hallway track checking in on both public and private plans around the project.

One trend was common: companies are scaling back or redirecting resources away from the project. While there are many reasons for this, the negative impact on development and test velocity is very clear.

When a sun goes nova, it blows off excess mass and is left with a dense energetic core. That would be better than going supernova in which the star burns intensely and then dies.

For OpenStack, a similar process would involve clearly redirecting technical efforts to the integrated Core from an increasingly frothy list of “big tent” extensions. This would both help focus resources and improve ecosystem collaboration.  I believe OpenStack is facing a choice between going nova (core focus) and supernova (burning out).

I am highly in favor of a strong and diverse ecosystem around OpenStack, as demonstrated by my personal investments in OpenStack Interoperability (aka DefCore). However, when I moved out of the OpenStack echo chamber, I heard clearly that users have a much broader desire for interoperability. They need tools that are both hybrid and multi-cloud because their businesses are not limited to single infrastructures.

The community needs to embrace multi-cloud tools because that is the reality for its users.

Building an OpenStack-specific ecosystem (as per “big tent”) undermines an essential need of OpenStack users. Now is the time for OpenStack to course correct to a narrower mission that focuses on the integrated functional platform that is already widely adopted. Now is the time for OpenStack to live up to its original name and go “Nova.”

Breaking Up is Hard To Do – Why I Believe in Ops Decomposition (pt 1)

Over the summer, the RackN team took a radical step with our previous Ansible Kubernetes workload install: we broke it into pieces.  Why?  We wanted to eliminate all “magic happens here” steps in the deployment.

The result, DR Kompos8, is a faster, leaner, transparent and parallelized installation that allows for pluggable extensions and upgrades (video tour). We also made the operationally simplest configuration choice: Golang binaries managed by SystemD.

Why decompose and simplify? Let’s talk about the hard-earned ops automation battle scars that led to composability as a core value:

Back in the early OpenStack days, when the project was actually much simpler, we were part of a community writing Chef cookbooks to install it. These scripts are just a sequence of programmable steps (roles in Ops-speak) that drive the configuration of services on each node in the cluster. There is an ability to find cross-cluster information and look up local inventory, so we were able to inject specific details before the process began. However, once the process started, it was pretty much like starting a domino chain. If anything went wrong anywhere in the installation, we had to reset all the dominoes and start over.

Like a domino train, it is really fun to watch when it works. Also like dominoes, it is frustrating to set up and fix. Often we were literally holding our breath during installation, hoping that we’d anticipated every variation in the software, hardware and environment. It is no surprise that the first and most critical feature we created was a redeploy command.

It turned out that the ability to successfully redeploy was the critical measure of success. We would not consider a deployment complete until we could wipe the systems and rebuild them automatically at least twice.
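That acceptance bar is easy to state in code. Below is a minimal sketch of the idea; Wipe, Deploy and Verify are hypothetical stand-ins for real provisioning calls, not actual Crowbar or Digital Rebar APIs.

```go
// Sketch of the "wipe and rebuild twice" acceptance test described above.
// Wipe/Deploy/Verify are hypothetical stand-ins for real provisioning
// calls, not actual Digital Rebar APIs.
package main

import (
	"errors"
	"fmt"
)

type Cluster struct{ healthy bool }

func (c *Cluster) Wipe()   { c.healthy = false } // reset all the dominoes
func (c *Cluster) Deploy() { c.healthy = true }  // run the automation end to end

func (c *Cluster) Verify() error { // post-install checks
	if !c.healthy {
		return errors.New("cluster failed verification")
	}
	return nil
}

// Accept only counts a deployment as complete when it can be wiped and
// rebuilt automatically the required number of times.
func Accept(c *Cluster, rebuilds int) error {
	for i := 0; i <= rebuilds; i++ { // initial deploy + N rebuilds
		c.Wipe()
		c.Deploy()
		if err := c.Verify(); err != nil {
			return fmt.Errorf("attempt %d: %w", i+1, err)
		}
	}
	return nil
}

func main() {
	if err := Accept(&Cluster{}, 2); err != nil {
		fmt.Println("not done:", err)
		return
	}
	fmt.Println("deployment accepted: rebuilt automatically twice")
}
```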

What made cluster construction so hard? There were three key things: cross-node dependencies (linking), a lack of service configuration (services) and isolating attribute chains (configuration).

We’ll explore these three reasons in detail for part 2 of this post tomorrow.

Even without the details, it’s easy to understand that we want to avoid all magic in a deployment.

For scale operations, there should never be a “push and pray” step where we are counting on timing or unknown configuration for success. Likewise, we need to eliminate “it worked from my desktop” automation too. Those systems are impossible to maintain, share and scale. Composed cluster operations address this problem by making work modular, predictable and transparent, as in the sketch below.
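As a hedged illustration of that composed approach (the Step shape, names and context keys are invented for this example, not Digital Rebar’s actual model), each unit of work declares the inputs it needs, so nothing depends on hidden state:

```go
// Sketch of composed, transparent operations: each step declares what it
// needs, so there is no hidden "magic happens here" state. Step names and
// the context shape are illustrative only.
package main

import "fmt"

// Context carries the explicit state that steps share.
type Context map[string]string

// Step is one small, replayable unit of work with declared inputs.
type Step struct {
	Name     string
	Requires []string
	Run      func(Context) error
}

// Execute runs steps in order, refusing to start any step whose declared
// inputs are missing; failures stop the chain with a precise location.
func Execute(ctx Context, steps []Step) error {
	for _, s := range steps {
		for _, r := range s.Requires {
			if _, ok := ctx[r]; !ok {
				return fmt.Errorf("%s: missing required input %q", s.Name, r)
			}
		}
		if err := s.Run(ctx); err != nil {
			return fmt.Errorf("%s: %w", s.Name, err)
		}
	}
	return nil
}

func main() {
	ctx := Context{"node/ip": "10.0.0.5"}
	steps := []Step{
		{Name: "install-runtime", Requires: []string{"node/ip"},
			Run: func(c Context) error { c["runtime"] = "ready"; return nil }},
		{Name: "join-cluster", Requires: []string{"runtime"},
			Run: func(c Context) error { fmt.Println("joined", c["node/ip"]); return nil }},
	}
	if err := Execute(ctx, steps); err != nil {
		fmt.Println("deploy halted:", err)
	}
}
```

Because every step’s inputs are explicit, a failed run halts at a named step with a named missing input instead of silently knocking over the rest of the dominoes.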