Open Source Collaboration: The Power of No

TL;DR: The days of using open software passively from vendors are past, users need to have a voice and opinion about project governance. This post is a joint effort with Rob Hirschfeld, RackN, and Chris Ferris, IBM, based on their IBM Interconnect 2017 “Open Cloud Architecture: Think You Can Out-Innovate the Best of the Rest?” presentation.

nullIt’s a common misconception that open source collaboration means saying YES to all ideas; however, the reality of successful projects is the opposite.

Permissive open source licenses drive a delicate balance for projects. On one hand, projects that adopt permissive licenses should be accepting of contributions to build community and user base. On the other, maintainers need to adopt a narrow focus to ensure project utility and simplicity. If the project’s maintainers are too permissive, the project bloats and wanders without a clear purpose. If they are too restrictive then the project fails to build community.

It is human nature to say yes to all collaborators, but that can frustrate core developers and users.

For that reason, stronger open source projects have a clear, focused, shared vision. Historically, that vision was enforced by a benevolent dictator for life (BDFL); however, recent large projects have used a consensus of project elders to make the task more sustainable. These roles serve a critical need: they say “no” to work that does not align with the project’s mission and vision. The challenge of defining that vision can be a big one, but without a clear vision, it’s impossible for the community to sustain growth because new contributors can dilute the utility of projects. [author’s note: This is especially true of celebrity projects like OpenStack or Kubernetes that attract “shared glory” contributors]

There is tremendous social and commercial pressure driving this vision vs. implementation balance.

The most critical one is the threat of “forking.” Forking is what happens when the code/collaborator base of a project splits into multiple factions and stops working together on a single deliverable. The result is incompatible products with a shared history. While small forks are required to support releases, and foster development; diverging community forks can have unpredictable impacts for a project.

Forks are not always bad: they provide a control mechanism for communities.

The fundamental nature of open source projects that adopt a permissive license is what allows forks to become the primary governance tool. The nature of permissive licenses allows anyone to create a new line of development that’s different than the original line. Forks can allow special interests in a code base to focus on their needs. That could be new features or simply stabilization. Many times, a major release version of a project evolves into forks where both old and newer versions have independent communities because of deployment inertia. It can also allow new leadership or governance without having to directly displace an entrenched “owner”.

But forking is expensive because it makes it harder for communities to collaborate.

To us, the antidote for forking is not simply vision but a strong focus on interoperability. Interoperability (or interop) means ensuring that different implementations remain compatible for users. A simplified example would be having automation that works on one OpenStack cloud also work on all the others without modification. Strong interop creates an ecosystem for a project by making users confident that their downstream efforts will not be disrupted by implementation variance or version changes.

Good Interop relieves the pressure of forking.

Interop can only work when a project defines what is expected behavior and creates tests that enforce those standards. That activity forces project contributors to agree on project priorities and scope. Projects that refuse to define interop expectations end up disrupting their user and collaborator base in frustrating ways that lead to forking (Rob’s commentary on the potential Docker fork of 2016).

nullUnfortunately, Interop is not a generally a developer priority.

In the end, interoperability is a user feature that competes with other features. Sadly, it is often seen as hurting feature development because new features must work to maintain existing interop standards. For that reason, new contributors may see interop demands as a impediment to forward progress; however, it’s a strong driver for user adoption and growth.

The challenge is that those users are typically more focused on their own implementation and less visible to the project leadership. Vendors have similar disincentives to do work that benefits other vendors in the community. These tensions will undermine the health of communities that do not have strong BDFL or Elders leadership. So, who then provides the adult supervision?

Ultimately, users must demand interop and provide commercial preference for vendors that invest in interop.

Open source has definitely had an enormous impact on the software industry; generally, a change for the better. But, that change comes at a cost – the need for involvement, not just of vendors and individual developers, but, ultimately it demands the participation of consumers/users.

Interop isn’t naturally a vendor priority because it levels the playing field for all vendors; however, vendors do prioritize what their customers want.

Ideally, customer needs translate into new features that have a broad base of consumer interest. Interop ensure that features can be used broadly. Thus interop is an important attribute to consumers not only for vendors, but by the open source communities building the software. This alignment then serves as the foundation upon which (increasingly) that vendor software is based.

Customers should be actively and publicly supportive of interop efforts of projects on which their vendor’s offerings depend. If there isn’t such an initiative in those projects, then they should demand one be started through their vendor partners and in the public forums for the project.

Further, if consumers of an open source project sense that it lacks a strong, focused, vision and is wandering off course, they need to get involved and say so, either directly and/or through their vendor partners.

While open source has changing the IT industry, it also has a cost. The days of using software passively from vendors are past, users need to have a voice and opinion. The need to ensure that their chosen vendors are also supporting the health of the community.

What do you think? Reach out to Rob (@zehicle) and Chris (@christo4ferris) and let us know!

Note: Cross posted on IBM OpenTech site.

Open Source as Reality TV and Burning Data Centers [gcOnDemand podcast notes]

During the OpenStack summit, Eric Wright (@discoposse) and I talked about a wide range of topics from scoring success of OpenStack early goals to burning down traditional data centers.

Why burn down your data center (and move to public cloud)? Because your ops process are too hard to change. Rob talks about how hybrid provides a path if we can made ops more composable.

Here are my notes from the audio podcast (source):

1:30 Why “zehicle” as a handle? Portmanteau from electrics cars… zero + vehicle

Let’s talk about OpenStack & Cloud…

  • OpenStack History
    • 2:15 Rob’s OpenStack history from Dell and Hyperscale
    • 3:20 Early thoughts of a Cloud API that could be reused
    • 3:40 The practical danger of Vendor lock-in
    • 4:30 How we implemented “no main corporate owner” by choice
  • About the Open in OpenStack
    • 5:20 Rob decomposes what “open” means because there are multiple meanings
    • 6:10 Price of having all open tools for “always open” choice and process
    • 7:10 Observation that OpenStack values having open over delivering product
    • 8:15 Community is great but a trade off. We prioritize it over implementation.
  • Q: 9:10 What if we started later? Would Docker make an impact?
    • Part of challenge for OpenStack was teaching vendors & corporate consumers “how to open source”
  • Q: 10:40 Did we accomplish what we wanted from the first summit?
    • Mixed results – some things we exceeded (like growing community) while some are behind (product adoption & interoperability).
  • 13:30 Interop, Refstack and Defcore Challenges. Rob is disappointed on interop based on implementations.
  • Q: 15:00 Who completes with OpenStack?
    • There are real alternatives. APIs do not matter as much as we thought.
    • 15:50 OpenStack vendor support is powerful
  • Q: 16:20 What makes OpenStack successful?
    • Big tent confuses the ecosystem & push the goal posts out
    • “Big community” is not a good definition of success for the project.
  • 18:10 Reality TV of open source – people like watching train wrecks
  • 18:45 Hybrid is the reality for IT users
  • 20:10 We have a need to define core and focus on composability. Rob has been focused on the link between hybrid and composability.
  • 22:10 Rob’s preference is that OpenStack would be smaller. Big tent is really ecosystem projects and we want that ecosystem to be multi-cloud.

Now, about RackN, bare metal, Crowbar and Digital Rebar….

  • 23:30 (re)Intro
  • 24:30 VC market is not metal friendly even though everything runs on metal!
  • 25:00 Lack of consistency translates into lack of shared ops
  • 25:30 Crowbar was an MVP – the key is to understand what we learned from it
  • 26:00 Digital Rebar started with composability and focus on operations
  • 27:00 What is hybrid now? Not just private to public.
  • 30:00 How do we make infrastructure not matter? Multi-dimensional hybrid.
  • 31:00 Digital Rebar is orchestration for composable infrastructure.
  • Q: 31:40 Do people get it?
    • Yes. Automation is moving to hybrid devops – “ops is ops” and it should not matter if it’s cloud or metal.
  • 32:15 “I don’t want to burn down my data center” – can you bring cloud ops to my private data center?

my 8 steps that would improve OpenStack Interop w/ AWS

I’ve been talking with a lot of OpenStack people about frustrating my attempted hybrid work on seven OpenStack clouds [OpenStack Session Wed 2:40].  This post documents the behavior Digital Rebar expects from the multiple clouds that we have integrated with so far.  At RackN, we use this pattern for both cloud and physical automation.

Sunday, I found myself back in front of the the Board talking about the challenge that implementation variation creates for users.  Ultimately, the question “does this harm users?” is answered by “no, they just leave for Amazon.”

I can’t stress this enough: it’s not about APIs!  The challenge is twofold: implementation variance between OpenStack clouds and variance between OpenStack and AWS.

The obvious and simplest answer is that OpenStack implementers need to conform more closely to AWS patterns (once again, NOT the APIs).

Here are the eight Digital Rebar node allocation steps [and my notes about general availability on OpenStack clouds]:

  1. Add node specific SSH key [YES]
  2. Get Metadata on Networks, Flavors and Images [YES]
  3. Pick correct network, flavors and images [NO, each site is distinct]
  4. Request node [YES]
  5. Get node PUBLIC address for node [NO, most OpenStack clouds do not have external access by default]
  6. Login into system using node SSH key [PARTIAL, the account name varies]
  7. Add root account with Rebar SSH key(s) and remove password login [PARTIAL, does not work on some systems]
  8. Remove node specific SSH key [YES]

These steps work on every other cloud infrastructure that we’ve used.  And they are achievable on OpenStack – DreamHost delivered this experience on their new DreamCompute infrastructure.

I think that this is very achievable for OpenStack, but we’re doing to have to drive conformance and figure out an alternative to the Floating IP (FIP) pattern (IPv6, port forwarding, or adding FIPs by default) would all work as part of the solution.

For Digital Rebar, the quick answer is to simply allocate a FIP for every node.  We can easily make this a configuration option; however, it feels like a pattern fail to me.  It’s certainly not a requirement from other clouds.

I hope this post provides specifics about delivering a more portable hybrid experience.  What critical items do you want as part of your cloud ops process?

OpenStack is caught in a snowstorm – it’s status quo for ops implementations to be snowflakes

OpenStack got into exactly the place we expected: operations started with fragmented and divergent data centers (aka snowflaked) and OpenStack did nothing to change that. Can we fix that? Yes, but the answer involves relying on Amazon as our benchmark.

In advance of my OpenStack Summit Demo/Presentation (video!) [slides], I’ve spent the last few weeks mapping seven (and counting) OpenStack implementations into the cloud provider subsystem of the Digital Rebar provisioning platform. Before I started working on adding OpenStack integration, RackN already created a hybrid DevOps baseline. We are able to run the same Kubernetes and Docker Swarm provisioning extensions on multiple targets including Amazon, Google, Packet and directly on physical systems (aka metal).

Before we talk about OpenStack challenges, it’s important to understand that data centers and clouds are messy, heterogeneous environments.

These variations are so significant and operationally challenging that they are the fundamental design driver for Digital Rebar. The platform uses a composable operational approach to isolate and then chain automation tasks together. That allows configurations, like networking, from infrastructure specific functions to be passed into common building blocks without user intervention.

Composability is critical because it allows operators to isolate variations into modular pieces and the expose common configuration elements. Since the pattern works successfully for crossing other clouds and metal, I anticipated success with OpenStack.

The challenge is that there is not “one standard OpenStack” implementation.  This issue is well documented under OpenStack as Project Shade.

If you only plan to operate a mono-cloud then these are not concerns; however, everyone I’ve met is using at least AWS and one other cloud. This operational fact means that AWS provides the common service behavior baseline. This is not an API statement – it’s about being able to operate on the systems delivered by the API.

While the OpenStack API worked consistently on each tested cloud (win for DefCore!), it frequently delivered systems that could not be deployed or were unusable for later steps.

While these are not directly OpenStack API concerns, I do believe that additional metadata in the API could help expose material configuration choices. The challenge becomes defining those choices in a reference architecture way. The OpenStack principle of leaving implementation choices open makes it challenging to drive these options to a narrow set of choices. Unfortunately, it means it is difficult to create an intra-OpenStack hybrid automation without hard-coded vendor identities or exploding configuration flags.

As series of individually reasonable options dominoes together to make to these challenges.  These are real issues that I made the integration difficult.

  • No default of externally accessible systems. I have to assign floating IPs (an anti-pattern for individual VMs) or be on the internal networks. No consistent naming pattern for networks, types (flavors) or starting images.  In several cases, the “private” network is the publicly accessible one and the “external” network is visible but unusable.
  • No consistent naming for access user accounts.  If I want to ssh to a system, I have to fail my first login before I learn the right user name.
  • No data to determine which networks provide which functions.  And there’s no metadata about which networks are public or private.  
  • Incomplete post-provisioning processes because they are left open to user customization.

There is a defensible and logical reason for each example above; sadly, those reasons do nothing to make OpenStack more operationally accessible.  While intra-OpenStack interoperability is helpful, I believe that ecosystems and users benefit from Amazon-like behavior.

What should you do?  Help broaden the OpenStack discussions to seek interoperability with the whole cloud ecosystem.

 

At RackN, we will continue to refine and adapt to these variations.  Creating a consistent experience that copes with variability is the raison d’etre for our efforts with Digital Rebar. That means that we ultimately use AWS as the yardstick for configuration of any infrastructure from physical, OpenStack and even Amazon!

 

One Cloud, Many Providers: The OpenStack Interop Challenge

We want OpenStack to work as a universal cloud API but it’s hard!  What’s the problem? 

Clouds DownThis post, written before the Tokyo Summit but not published, talks about how we got here without a common standard and offers some pointers.  At the Austin Summit, I’ve got a talk on hybrid Open Infrastructure Wednesday @ 2:40 where I talk specifically about solutions.  I’ve been working on multi-infrastructure hybrid – that means making ops portable between OpenStack, Google, Amazon, Physical and other options.

The Problem: At a fundamental level, OpenStack has yet to decide if it’s an infrastructure (IaaS) product or a open software movement.

Is there a path to be both? There are many vendors who are eager to sell you their distinct flavor of OpenStack; however, lack of consistency has frustrated users and operators. OpenStack faces internal and external competition if we do not address this fragmentation. Over the next few paragraphs, we’ll explore the path the Foundation has planned to offer users a consistent core product while fostering its innovative community.

How did we get down this path?  Here’s some background how how we got here.

Before we can discuss interoperability (interop), we need to define success for OpenStack because interop is the means, not the end. My mark for success is when OpenStack has created a sustainable market for products that rely on the platform. In business plan speak, we’d call that a serviceable available market (SAM). In practical terms, OpenStack is successful when businesses targets the platform as the first priority when building integrations over cloud behemoths: Amazon and VMware.

The apparent dominance of OpenStack in terms of corporate contribution and brand position does not translate into automatic long term success.  

While apparently united under a single brand, intentional technical diversity in OpenStack has lead to incompatibilities between different public and private implementations. While some of these issues are accidents of miscommunication, others were created by structural choices inherent to the project’s formation. No matter the causes, they frustrate users and limit the network effect of widespread adoption.

Technical diversity was both a business imperative and design objective for OpenStack during formation.

In order to quickly achieve critical mass, the project needed to welcome a diverse and competitive set of corporate sponsors. The commitment to support operating systems, multiple hypervisors, storage platforms and networking providers has been essential to the project’s growth and popularity. Unfortunately, it also creates combinatorial complexity and political headaches.

With all those variables, it’s best to think of interop as a spectrum.  

At the top of that spectrum is basic API compatibility and the boom is fully integrated operation where an application could run site unaware in multiple clouds simultaneously.  Experience shows that basic API compatibility is not sufficient: there are significant behavioral impacts due to implementation details just below the API layer that must also be part of any interop requirement.  Variations like how IPs are assigned and machines are initialized matter to both users and tools.  Any effort to ensure consistency must go beyond simple API checks to validate that these behaviors are consistent.

OpenStack enforces interop using a process known as DefCore which vendors are required to follow in order to use the trademark “OpenStack” in their product name.

The process is test driven – vendors are required to pass a suite of capability tests defined in DefCore Guidelines to get Foundation approval. Guidelines are published on a 6 month cadence and cover only a “core” part of OpenStack that DefCore has defined as the required minimum set. Vendors are encouraged to add and extend beyond that base which then leads for DefCore to expand the core based on seeing widespread adoption.

What is DefCore?  Here’s some background about that too!

By design, DefCore started with a very small set of OpenStack functionality.  So small in fact, that there were critical missing pieces like networking APIs from the initial guideline.  The goal for DefCore is to work through the coabout mmunity process to expand based identified best practices and needed capabilities.  Since OpenStack encourages variation, there will be times when we have to either accept or limit variation.  Like any shared requirements guideline, DefCore becomes a multi-vendor contract between the project and its users.

Can this work?  The reality is that Foundation enforcement of the Brand using DefCore is really a very weak lever. The real power of DefCore comes when customers use it to select vendors.

Your responsibility in this process is to demand compliance from your vendors. OpenStack interoperability efforts will fail if we rely on the Foundation to enforce compliance because it’s simply too late at that point. Think of the healthy multi-vendor WiFi environment: vendors often introduce products on preliminary specifications to get ahead of market. For success, OpenStack vendors also need to be racing to comply with the upcoming guidelines. Only customers can create that type of pressure.

From that perspective, OpenStack will only be as interoperable as the market demands.

That creates a difficult conundrum: without common standards, there’s no market and OpenStack will simply become vertical vendor islands with limited reach. Success requires putting shared interests ahead of product.

That brings us full circle: does OpenStack need to be both a product and a community?  Yes, it clearly does.  

The significant distinction for interop is that we are talking about a focus on the user community voice over the vendor and developer community.  For that, OpenStack needs to focus on product definition to grow users.

I want to thank Egle Sigler and Shamail Tahir for their early review of this post.  Even beyond the specific content, they have helped shape my views on this topics.  Now, I’d like to hear your thoughts about this!  We need to work together to address Interoperability – it’s a community thing.

OpenStack Brief Histories: Austin 2011 and DefCore

These two short items are sidebars for my “One Cloud, Many Providers: The OpenStack Interp Challenge” post.  They provide additional context for the more focused question in the post: “At a fundamental level, OpenStack has yet to decide if it’s an infrastructure product or a open software movement. Is there a path to be both?” 

Background 1: OpenStack, The Early Days

How did we get here?  It’s worth noting that 2011 OpenStack was structured as a heterogenous vendor playground.  At the inaugural OpenStack summit in Austin when the project was just forming around NASA’s Nova and Rackspace’s Swift projects, monolithic cloud stacks were a very real threat.  VMware and Amazon were the de facto standards but closed and proprietary.  The open alternatives, CloudStack (Cloud.com), Eucalyptus and OpenNebula were too tied to single vendors or lacking in scale.  Having a multi-vendor, multi-contributor project without a dictatorial owner was a critical imperative for the community and it continues to be one of the most distinctive OpenStack traits.

Background 2:  DefCore, The Community Interoperability Process

What is DefCore?  The name DefCore is a portmanteau of the committee’s job to “define core” functions of OpenStack.  The official explanation says “DefCore sets base requirements by defining 1) capabilities, 2) code and 3) must-pass tests for all OpenStack products. This definition uses community resources and involvement to drive interoperability by creating the minimum standards for products labeled OpenStack.”  Fundamentally, it’s an OpenStack Board committee with membership open to the community.  In very practical terms, DefCore picks which features and implementation details of OpenStack are required by the vendors; consequently, we’ve designed a governance process to ensure transparency and, hopefully, prevent individual vendors from exerting too much influence.

Post-OpenStack DefCore, I’m Chasing “open infrastructure” via cross-platform Interop

Like my previous DefCore interop windmill tilting, this is not something that can be done alone. Open infrastructure is a collaborative effort and I’m looking for your help and support. I believe solving this problem benefits us as an industry and individually as IT professionals.

2013-09-13_18-56-39_197So, what is open infrastructure?   It’s not about running on open source software. It’s about creating platform choice and control. In my experience, that’s what defines open for users (and developers are not users).

I’ve spent several years helping lead OpenStack interoperability (aka DefCore) efforts to ensure that OpenStack cloud APIs are consistent between vendors. I strongly believe that effort is essential to build an ecosystem around the project; however, in talking to enterprise users, I’ve learned that that their  real  interoperability gap is between that many platforms, AWS, Google, VMware, OpenStack and Metal, that they use everyday.

Instead of focusing inward to one platform, I believe the bigger enterprise need is to address automation across platforms. It is something I’m starting to call hybrid DevOps because it allows users to mix platforms, service APIs and tools.

Open infrastructure in that context is being able to work across platforms without being tied into one platform choice even when that platform is based on open source software. API duplication is not sufficient: the operational characteristics of each platform are different enough that we need a different abstraction approach.

We have to be able to compose automation in a way that tolerates substitution based on infrastructure characteristics. This is required for metal because of variation between hardware vendors and data center networking and services. It is equally essential for cloud because of variation between IaaS capabilities and service delivery models. Basically, those  minor  differences between clouds create significant challenges in interoperability at the operational level.

Rationalizing APIs does little to address these more structural differences.

The problem is compounded because the differences are not nicely segmented behind abstraction layers. If you work to build and sustain a fully integrated application, you must account for site specific needs throughout your application stack including networking, storage, access and security. I’ve described this as all deployments have 80% of the work common but the remaining 20% is mixed in with the 80% instead of being nicely layers. So, ops is cookie dough not vinaigrette.

Getting past this problem for initial provisioning on a single platform is a false victory. The real need is portable and upgrade-ready automation that can be reused and shared. Critically, we also need to build upon the existing foundations instead of requiring a blank slate. There is openness value in heterogeneous infrastructure so we need to embrace variation and design accordingly.

This is the vision the RackN team has been working towards with open source Digital Rebar project. We now able to showcase workload deployments (Docker, Kubernetes, Ceph, etc) on multiple cloud platforms that also translate to full bare metal deployments. Unlike previous generations of this tooling (some will remember Crowbar), we’ve been careful to avoid injecting external dependencies into the DevOps scripts.

While we’re able to demonstrate a high degree of portability (or fidelity) across multiple platforms, this is just the beginning. We are looking for users and collaborators who want to want to build open infrastructure from an operational perspective.

You are invited to join us in making open cross-platform operations a reality.