One Cloud, Many Providers: The OpenStack Interop Challenge

We want OpenStack to work as a universal cloud API but it’s hard!  What’s the problem? 

Clouds DownThis post, written before the Tokyo Summit but not published, talks about how we got here without a common standard and offers some pointers.  At the Austin Summit, I’ve got a talk on hybrid Open Infrastructure Wednesday @ 2:40 where I talk specifically about solutions.  I’ve been working on multi-infrastructure hybrid – that means making ops portable between OpenStack, Google, Amazon, Physical and other options.

The Problem: At a fundamental level, OpenStack has yet to decide if it’s an infrastructure (IaaS) product or a open software movement.

Is there a path to be both? There are many vendors who are eager to sell you their distinct flavor of OpenStack; however, lack of consistency has frustrated users and operators. OpenStack faces internal and external competition if we do not address this fragmentation. Over the next few paragraphs, we’ll explore the path the Foundation has planned to offer users a consistent core product while fostering its innovative community.

How did we get down this path?  Here’s some background how how we got here.

Before we can discuss interoperability (interop), we need to define success for OpenStack because interop is the means, not the end. My mark for success is when OpenStack has created a sustainable market for products that rely on the platform. In business plan speak, we’d call that a serviceable available market (SAM). In practical terms, OpenStack is successful when businesses targets the platform as the first priority when building integrations over cloud behemoths: Amazon and VMware.

The apparent dominance of OpenStack in terms of corporate contribution and brand position does not translate into automatic long term success.  

While apparently united under a single brand, intentional technical diversity in OpenStack has lead to incompatibilities between different public and private implementations. While some of these issues are accidents of miscommunication, others were created by structural choices inherent to the project’s formation. No matter the causes, they frustrate users and limit the network effect of widespread adoption.

Technical diversity was both a business imperative and design objective for OpenStack during formation.

In order to quickly achieve critical mass, the project needed to welcome a diverse and competitive set of corporate sponsors. The commitment to support operating systems, multiple hypervisors, storage platforms and networking providers has been essential to the project’s growth and popularity. Unfortunately, it also creates combinatorial complexity and political headaches.

With all those variables, it’s best to think of interop as a spectrum.  

At the top of that spectrum is basic API compatibility and the boom is fully integrated operation where an application could run site unaware in multiple clouds simultaneously.  Experience shows that basic API compatibility is not sufficient: there are significant behavioral impacts due to implementation details just below the API layer that must also be part of any interop requirement.  Variations like how IPs are assigned and machines are initialized matter to both users and tools.  Any effort to ensure consistency must go beyond simple API checks to validate that these behaviors are consistent.

OpenStack enforces interop using a process known as DefCore which vendors are required to follow in order to use the trademark “OpenStack” in their product name.

The process is test driven – vendors are required to pass a suite of capability tests defined in DefCore Guidelines to get Foundation approval. Guidelines are published on a 6 month cadence and cover only a “core” part of OpenStack that DefCore has defined as the required minimum set. Vendors are encouraged to add and extend beyond that base which then leads for DefCore to expand the core based on seeing widespread adoption.

What is DefCore?  Here’s some background about that too!

By design, DefCore started with a very small set of OpenStack functionality.  So small in fact, that there were critical missing pieces like networking APIs from the initial guideline.  The goal for DefCore is to work through the coabout mmunity process to expand based identified best practices and needed capabilities.  Since OpenStack encourages variation, there will be times when we have to either accept or limit variation.  Like any shared requirements guideline, DefCore becomes a multi-vendor contract between the project and its users.

Can this work?  The reality is that Foundation enforcement of the Brand using DefCore is really a very weak lever. The real power of DefCore comes when customers use it to select vendors.

Your responsibility in this process is to demand compliance from your vendors. OpenStack interoperability efforts will fail if we rely on the Foundation to enforce compliance because it’s simply too late at that point. Think of the healthy multi-vendor WiFi environment: vendors often introduce products on preliminary specifications to get ahead of market. For success, OpenStack vendors also need to be racing to comply with the upcoming guidelines. Only customers can create that type of pressure.

From that perspective, OpenStack will only be as interoperable as the market demands.

That creates a difficult conundrum: without common standards, there’s no market and OpenStack will simply become vertical vendor islands with limited reach. Success requires putting shared interests ahead of product.

That brings us full circle: does OpenStack need to be both a product and a community?  Yes, it clearly does.  

The significant distinction for interop is that we are talking about a focus on the user community voice over the vendor and developer community.  For that, OpenStack needs to focus on product definition to grow users.

I want to thank Egle Sigler and Shamail Tahir for their early review of this post.  Even beyond the specific content, they have helped shape my views on this topics.  Now, I’d like to hear your thoughts about this!  We need to work together to address Interoperability – it’s a community thing.

OpenStack Brief Histories: Austin 2011 and DefCore

These two short items are sidebars for my “One Cloud, Many Providers: The OpenStack Interp Challenge” post.  They provide additional context for the more focused question in the post: “At a fundamental level, OpenStack has yet to decide if it’s an infrastructure product or a open software movement. Is there a path to be both?” 

Background 1: OpenStack, The Early Days

How did we get here?  It’s worth noting that 2011 OpenStack was structured as a heterogenous vendor playground.  At the inaugural OpenStack summit in Austin when the project was just forming around NASA’s Nova and Rackspace’s Swift projects, monolithic cloud stacks were a very real threat.  VMware and Amazon were the de facto standards but closed and proprietary.  The open alternatives, CloudStack (Cloud.com), Eucalyptus and OpenNebula were too tied to single vendors or lacking in scale.  Having a multi-vendor, multi-contributor project without a dictatorial owner was a critical imperative for the community and it continues to be one of the most distinctive OpenStack traits.

Background 2:  DefCore, The Community Interoperability Process

What is DefCore?  The name DefCore is a portmanteau of the committee’s job to “define core” functions of OpenStack.  The official explanation says “DefCore sets base requirements by defining 1) capabilities, 2) code and 3) must-pass tests for all OpenStack products. This definition uses community resources and involvement to drive interoperability by creating the minimum standards for products labeled OpenStack.”  Fundamentally, it’s an OpenStack Board committee with membership open to the community.  In very practical terms, DefCore picks which features and implementation details of OpenStack are required by the vendors; consequently, we’ve designed a governance process to ensure transparency and, hopefully, prevent individual vendors from exerting too much influence.

APIs and Implementations collide at OpenStack Interop: The Oracle Zones vs VMs Debate

I strive to stay neutral as OpenStack DefCore co-chair; however, as someone asking for another Board term, it’s important to review my thinking so that you can make an informed voting decision.

DefCore, while always on the edge of controversy, recently became ground zero for the “what is OpenStack” debate [discussion write up]. My preferred small core “it’s an IaaS product” answer is only one side. Others favor “it’s an open cloud community” while another faction champions an “open cloud platform.” I’m struggling to find a way that it can be all together at the same time.

The TL;DR is that, today, OpenStack vendors are required to implement a system that can run Linux guests. This is an example of an implementation over API bias because there’s nothing in the API that drives that specific requirement.

From a pragmatic “get it done” perspective, OpenStack needs to remain implementation driven for now. That means that we care that “OpenStack” clouds run VMs.

While there are pragmatic reasons for this, I think that long term success will require OpenStack to become an API specification. So today’s “right answer” actually undermines the long term community value. This has been a long standing paradox in OpenStack.

Breaking the API to implementation link allows an ecosystem to grow with truly alternate implementations (not just plug-ins). This is a threat to the community “upstream first” mentality.  OpenStack needs to be confident enough in the quality and utility of the shared code base that it can allow competitive implementations. Open communities should not need walls to win but they do need clear API definition.

What is my posture for this specific issue?  It’s complicated.

First, I think that the user and ecosystem expectations are being largely ignored in these discussions. Many of the controversial items here are vendor initiatives, not user needs. Right now, I’ve heard clearly that those expectations are for OpenStack to be an IaaS the runs VMs. OpenStack really needs to focus on delivering a reliably operable VM based IaaS experience. Until that’s solid, the other efforts are vendor noise.

Second, I think that there are serious test gaps that jeopardize the standard. The fundamental premise of DefCore is that we can use the development tests for API and behavior validation. We chose this path instead of creating an independent test suite. We either need to address tests for interop within the current body of tests or discuss splitting the efforts. Both require more investment than we’ve been willing to make.

We have mechanisms in place to collects data from test results and expand the test base.  Instead of creating new rules or guidelines, I think we can work within the current framework.

The simple answer would be to block non-VM implementations; however, I trust that cloud consumers will make good decisions when given sufficient information.  I think we need to fix the tests and accept non-VM clouds if they pass the corrected tests.

For this and other reasons, I want OpenStack vendors to be specific about the configurations that they test and support. We took steps to address this in DefCore last year but pulled back from being specific about requirements.  In this particular case, I believe we should require the official OpenStack vendor to state clear details about their supported implementation.  Customers will continue vote with their wallet about which configuration details are important.

This is a complex issue and we need community input.  That means that we need to hear from you!  Here’s the TC Position and the DefCore Patch.

My OpenStack 2016 Analysis: Continue Core, Stop Confusing Ecosystem, Change Hybrid Approach

Note: I’ve served on the OpenStack Foundation board since its formation.  There I’ve led the “define the core” DefCore efforts.  I’m on the 2016 ballot for another term.

I love using end-of-year posts to reflect (2015, I got 6 of 7!) and try to set direction (OpenStack needed to prioritize).  This year, I wanted to use a simple “Continue, Stop, Change” format that I’ve used for employee reviews in the past.  These three items reflect how I think OpenStack needs to respond to the industry in 206.

Continue: Focus on Core

OpenStack adoption continues around the legacy projects that traditionally define it for most users.  A lot of work and focus is needed around those projects including better representation of user, operator and product interests.

Towards that end, we’ve made amazing progress on DefCore implementation and I’m excited about the discussions that it’s been generating.  It’s driving pragmatic decisions about what is required (running a vm?) and how to verify compliance.  It’s also driving conceptual thinking around OpenStack principles and ecosystem priorities.

DefCore’s focus on using community tests to define OpenStack creates a very concrete and defensible standard.  Ultimately, it comes back to users and operators demanding compliance for the work to remain meaningful.

Overall, To focus on core function, OpenStack needs to empower new groups within the community.  Expanding the role of the Product Group, Operators, and User Committee are key to giving a voice to these constituents.

OpenStack core must transition into a consistent platform or it risks becoming irrelevant.

Stop: Confusing The Ecosystem

I’m concerned about the “big tent” governance change puts OpenStack into conflict with both community vendors and the larger cloud market.  I believe we’re creating an echo chamber of OpenStack on OpenStack focus that forces adjacent efforts (like software defined network, storage and container orchestration) to be either inside or outside the community circle.  While that artificially grows the apparent contributor base, it creates artificial walls between OpenStack and the dominate cloud platforms.

Let me illustrate using my own company, RackN.  We create cross-platform devops orchestration based on an open source project, Digital Rebar.  We consider ourselves to be part of the OpenStack community and have supported deploying the core.  We also provision bare metal and deploy Kubernetes, Docker Swarm and Cloud Foundry.  That has apparent conflicts with big tent Ironic and Magnum projects.  Does that make RackN competitive with OpenStack or not?

It hurts OpenStack when competitive alignment is unclear because vendors, users and operators are uncertain about where to make investments.  In the end, users will choose simpler alternatives.

I believe the Board needs to define the OpenStack ecosystem strategy in a clear and actionable way.  If re-elected, that will be my Board priority for 2016.

Change: Hybrid Approach

My top 2016 prediction (post coming) is that we accept “hybrid IT as the new normal.”  That means that we stop driving towards an IT mono-culture and start working towards tools that embrace heterogeneity.  Along those lines, OpenStack needs to evaluate our relative position and strengths in a hybrid cloud landscape.

Interoperability between OpenStack implementations is important because it reduces friction; however, we need to expand our thinking to ensure interoperability with other platforms.  That does not mean simply cloning the AWS APIs!  It means that we need to consider users and operator needs against a spectrum of private and public infrastructures.

A broader hybrid approach also suggests that duplicating cloud-locked adjacent services (e.g. Cloud Formation vs. Heat) does not address user needs.

I am advocating that OpenStack encourage a cloud-neutral ecosystem, outside of the OpenStack tent, that work across a wide range of platforms.  That leads to user choice and creates a truly open platform. 

And, of course, more Community Discussion!

I want to thank the many people who participated in a heated twitter discussion in advance of this post.  There are many great ideas and counter-points covered in that lengthy dialog.

Do you have an opinion about what to OpenStack should stop, accelerate or change?  I’d love to hear it!

DefCore Update – slowly taming the Interop hydra.

Last month, the OpenStack board charged the DefCore committee to tighten the specification. That means adding more required capabilities to the guidelines and reducing the number of exceptions (“flags”).  Read the official report by Chris Hoge.

Cartography by Dave McAlister is licensed under a. Creative Commons Attribution 4.0 International License.

It turns out interoperability is really, really hard in heterogenous environments because it’s not just about API – implementation choices change behavior.

I see this in both the cloud and physical layers. Since OpenStack is setup as a multi-vendor and multi-implementation (private/public) ecosystem, getting us back to a shared least common denominator is a monumental challenge. I also see a similar legacy in physical ops with OpenCrowbar where each environment is a snowflake and operators constantly reinvent the same tooling instead of sharing expertise.

Lack of commonality means the industry wastes significant effort recreating operational knowledge for marginal return. Increasing interop means reducing variations which, in turn, increases the stakes for vendors seeking differentiation.

We’ve been working on DefCore for years so that we could get to this point. Our first real Guideline, 2015.03, was an intentionally low bar with nearly half of the expected tests flagged as non-required. While the latest guidelines do not add new capabilities, they substantially reduce the number of exceptions granted. Further, we are in process of adding networking capabilities for the planned 2016.01 guideline (ready for community review at the Tokyo summit).

Even though these changes take a long time to become fully required for vendors, we can start testing interoperability of clouds using them immediately.

While, the DefCore guidelines via Foundation licensing policy does have teeth, vendors can take up to three years [1] to comply. That may sounds slow, but the real authority of the program comes from customer and vendor participation not enforcement [2].

For that reason, I’m proud that DefCore has become a truly diverse and broad initiative.

I’m further delighted by the leadership demonstrated by Egle Sigler, my co-chair, and Chris Hoge, the Foundation staff leading DefCore implementation.  Happily, their enthusiasm is also shared by many other people with long term DefCore investments including mid-cycle attendees Mark Volker (VMware), Catherine Deip (IBM) who is also a RefStack PTL, Shamail Tahir (EMC), Carol Barrett (Intel), Rocky Grober (Huawei), Van Lindberg (Rackspace), Mark Atwood (HP), Todd Moore (IBM), Vince Brunssen (IBM). We also had four DefCore related project PTLs join our mid-cycle: Kyle Mestery (Neutron), Nikhil Komawar (Glance),  John Dickinson (Swift), and Matthew Treinish (Tempest).

Thank you all for helping keep DefCore rolling and working together to tame the interoperability hydra!

[1] On the current schedule – changes will now take 1 year to become required – vendors have a three year tail! Three years? Since the last two Guideline are active, the fastest networking capabilities will be a required option is after 2016.01 is superseded in January 2017. Vendors who (re)license just before that can use the mark for 12 months (until January 2018!)

[2] How can we make this faster? Simple, consumers need to demand that their vendor pass the latest guidelines. DefCore provides Guidelines, but consumers checkbooks are the real power in the ecosystem.

OpenStack Vancouver six observations: partners, metal, tents, defore, brands & breakage

As always, OpenStack conferences/summits are packed with talks and discussions.  Any one of these six points could be a full post; however, I would rather post now and start discussions.  Let me know what you think!

1. Partnering Everywhere – it’s froth, not milk

Everyone is partnering with everyone! It’s a good way to appear to cover more around and appear more open. Right now, I believe these partnerships are for show and very shallow. There will be blood when money is flowing and both partners want the lion’s share.

2. Metal is Hot! attention on Ironic & MaaS

Metal is very hot topic. No surprise, but I do not think that either MaaS or Ironic have the right architecture to deal with the real complexity of automating metal in a generalized way. The consequence is that they are limited and hard to operate.

Container talks were also very hot and I believe are ultimately disruptive.  The very fact that all the container talks were overflowing is an indication of the challenges facing virtualization.

3. DefCore – Just in the Nick of Time

I think that the press and analysts were ready to proclaim that OpenStack was fragmenting and being unable to deliver the “one cloud, multiple vendors” vision. DefCore (presented as Interopability by Jonathan Bryce, DefCore shout out!) came in on the buzzer to buy us more time.

4. Big Tent Concerns – what is ecosystem & release?

Big Tent is shorthand for project governance changes that make it easier for new projects to become OpenStack projects and removes the concept of integrated releases.  The exact definition is still a work in progress.

The top concerns I have are:

  1. We cannot tell difference between community & ecosystem. We’re back to anointed projects because we’re now telling projects they have to join OpenStack to work with OpenStack.
  2. We’re changing the definition of the release but have not defined how it will change. I acknowledge that continuous release is ideal but we’re confusing people again.

5. Brands are battling – will they destroy the city?

OpenStack is hard for startups – read the full post here.  The short version is that big companies are taking up all the air.

While some are leading, others they are learning how to collaborate.  Those new to open source are slow to trust and uncertain about where to invest.  Unfortunately, we’ve created a visible contributions economy that does not reward doing the scut work so it’s no surprise that there are concerns that some of the bigger companies are free riding.

6. OpenStack is broken talks – could we reboot?  no.

It’s a sign of OpenStack’s age that Bias, Termie and others suggested we need clean slate.  Frankly, I think that OpenStack would be irrelevant by the time a rewrite was completed and it not helpful to suggest it.

What would I suggest?  I’d promote a strong core (doing!), ensure big companies collaborate on roadmap (doing!) and stop having a single node install as gate and dev reference (I’d happily help use OCB for this with partners)

PS: Apparently Neutron is not broken.

I’m very excited about the “just give me a network” work to make Neutron duplicate Nova-Net functionality.  Finally.

OpenSource.com Interview on DefCore, project management, and the future of OpenStack

Reposted from My Interview with RedHat’s OpenSource.com Jason Baker

Rob Hirschfeld has been involved with OpenStack since before the project was even officially formed, and so he brings a rich perspective as to the project’s history, its organization, and where it may be headed next. Recently, he has focused primarily on the physical infrastructure automation space, working with an an enterprise version of OpenCrowbar, an “API-driven metal” project which started as an OpenStack installer and moved to a generic workload underlay.

Rob is speaking on two panels at the upcoming OpenStack Summit in Vancouver, including DefCore 2015 and the State of OpenStack Project Management. We caught up with Rob to get updates about these two topics and what else lies ahead for OpenStack.

We asked you to help walk us through DefCore as it was being developed last year; just as a reminder, what is DefCore and why should people care about it?

DefCore creates a minimal definition for OpenStack vendors to help ensure interoperability and stability for the user community. While DefCore definitions apply only to vendors asking to use the OpenStack trademark, there are technical impacts on the tests and APIs that we select as required. We’ve worked hard to make sure the that selection process for picking “core” is transparent and fair.

What did the changes approved by the OpenStack Foundation membership earlier this year mean for DefCore?

The by-laws changes approved by the community were important to allow us to use DefCore more granular definition of Core. The previous by-laws were much more project focused. The changes allow us to select specific APIs and code components from a project as required instead of picking everything blindly. That allows projects to have both stable and new innovative components.

What can we expect from OpenStack’s structure and organization as we move forward towards the next release?

There are a lot of changes still to come. The technical leadership is making it easier to become part of the OpenStack code base. I’ve written about this change having potentially both positive and negative impacts on OpenStack to make it appear more like a suite of projects than a tightly integrate product. In many ways, DefCore helps vendors define OpenStack as a product as the community is expanding to include more capabilities. In my discussions, this is a good balance.

Switching gears a bit, you’ve also been heavily involved in the OpenStack project management working group. How has that group been progressing since they convened at the Paris Summit?

This group has made a lot of progress. We’ve seen non-board leadership step in and lead the group. That leadership is more organic and based in the companies that are directly contributing. I think that’s resulted in a lot of good ideas and documentation from the group. We’ll see some excellent results in Vancouver from them. It’s going to come back to the community and technical leadership to leverage that work. I think that’s the real test: we have to share ownership of direction between multiple perspectives. The first step in doing that is writing it down (which is what they have been doing).

Aside from the organization, let’s talk about the software itself. What are you hoping to see from the Liberty release?

I’m hoping to see adoption of Neutron accelerate. Having two network approaches makes it impossible to really have an interoperability story. That means Neutron has to be working technically, but also for operators and users. To be brutally honest, it also has to overcome its own reputation. If Neutron does not become the dominate choice, we are going to effectively have two major flavors of OpenStack. From the DefCore, vendor, or user perspective, that’s a very challenging position.

Anything else you’d like to add?

We’ve accomplished a lot together. In some ways, chasing too many targets is our biggest threat. I think that container workloads and orchestration are already being very disruptive for OpenStack. I’m hoping that we focus on delivering a stable core infrastructure. That’s why I’ve been working so hard on DefCore. Looking forward, there’s an increasing risk of trying to chase too many targets and losing the core of what users want.

This article is part of the Speaker Interview Series for OpenStack Summit Vancouver, a five-day conference for developers, users, and administrators of OpenStack Cloud Software.