APIs and Implementations collide at OpenStack Interop: The Oracle Zones vs VMs Debate

I strive to stay neutral as OpenStack DefCore co-chair; however, as someone asking for another Board term, it’s important to review my thinking so that you can make an informed voting decision.

DefCore, while always on the edge of controversy, recently became ground zero for the “what is OpenStack” debate [discussion write up]. My preferred small-core “it’s an IaaS product” answer is only one side. Others favor “it’s an open cloud community,” while another faction champions an “open cloud platform.” I’m struggling to find a way for it to be all three at the same time.

The TL;DR is that, today, OpenStack vendors are required to implement a system that can run Linux guests. This is an example of an implementation-over-API bias, because nothing in the API itself drives that specific requirement.
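To make that concrete, here’s a minimal sketch (with a hypothetical endpoint and placeholder token and IDs) of the Nova “create server” call: the request names an image and a flavor, but nothing in it says the resulting server has to be a Linux VM rather than some other isolation technology that honors the same contract.

    import requests

    # Hypothetical values; a real client would discover these via Keystone.
    NOVA_ENDPOINT = "https://cloud.example.com:8774/v2.1"
    HEADERS = {"X-Auth-Token": "PLACEHOLDER_TOKEN"}

    # The compute API asks for a name, an image, and a flavor. Nothing here
    # specifies a hypervisor or a guest OS -- the "must run Linux VMs"
    # requirement comes from the implementation, not from the API contract.
    body = {
        "server": {
            "name": "interop-demo",
            "imageRef": "IMAGE_UUID",   # placeholder image ID
            "flavorRef": "FLAVOR_ID",   # placeholder flavor ID
        }
    }

    resp = requests.post(f"{NOVA_ENDPOINT}/servers", json=body, headers=HEADERS)
    resp.raise_for_status()
    print("created server", resp.json()["server"]["id"])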

From a pragmatic “get it done” perspective, OpenStack needs to remain implementation-driven for now. That means we care that “OpenStack” clouds run VMs.

While there are pragmatic reasons for this, I think that long-term success will require OpenStack to become an API specification. So today’s “right answer” actually undermines long-term community value. This has been a long-standing paradox in OpenStack.

Breaking the API-to-implementation link allows an ecosystem to grow with truly alternate implementations (not just plug-ins). This is a threat to the community’s “upstream first” mentality. OpenStack needs to be confident enough in the quality and utility of the shared code base that it can allow competitive implementations. Open communities should not need walls to win, but they do need clear API definitions.

What is my posture on this specific issue?  It’s complicated.

First, I think that user and ecosystem expectations are being largely ignored in these discussions. Many of the controversial items here are vendor initiatives, not user needs. Right now, I’ve heard clearly that those expectations are for OpenStack to be an IaaS that runs VMs. OpenStack really needs to focus on delivering a reliably operable VM-based IaaS experience. Until that’s solid, the other efforts are vendor noise.

Second, I think that there are serious test gaps that jeopardize the standard. The fundamental premise of DefCore is that we can use the development tests for API and behavior validation. We chose this path instead of creating an independent test suite. We either need to address tests for interop within the current body of tests or discuss splitting the efforts. Both require more investment than we’ve been willing to make.
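To illustrate the kind of gap I mean, here’s a rough sketch (same hypothetical endpoint and placeholder token as above) of an interop-style behavior check: it only asserts what is observable through the compute API, whereas a development test is free to go further and, say, SSH into the guest and assume a Linux image.

    import time
    import requests

    NOVA_ENDPOINT = "https://cloud.example.com:8774/v2.1"   # hypothetical endpoint
    HEADERS = {"X-Auth-Token": "PLACEHOLDER_TOKEN"}          # placeholder token

    def wait_for_status(server_id, wanted="ACTIVE", timeout=300, poll=5):
        """Poll the compute API until the server reaches the wanted status."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            resp = requests.get(f"{NOVA_ENDPOINT}/servers/{server_id}", headers=HEADERS)
            resp.raise_for_status()
            status = resp.json()["server"]["status"]
            if status == wanted:
                return True
            if status == "ERROR":
                return False
            time.sleep(poll)
        return False

    def check_boot_and_delete(server_id):
        # This check validates API behavior only: the server reaches ACTIVE
        # and can be deleted. It deliberately says nothing about whether the
        # "server" is a KVM guest, a Zone, or anything else.
        ok = wait_for_status(server_id)
        requests.delete(f"{NOVA_ENDPOINT}/servers/{server_id}", headers=HEADERS)
        return ok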

We have mechanisms in place to collect data from test results and expand the test base. Instead of creating new rules or guidelines, I think we can work within the current framework.

The simple answer would be to block non-VM implementations; however, I trust that cloud consumers will make good decisions when given sufficient information.  I think we need to fix the tests and accept non-VM clouds if they pass the corrected tests.

For this and other reasons, I want OpenStack vendors to be specific about the configurations that they test and support. We took steps to address this in DefCore last year but pulled back from being specific about requirements. In this particular case, I believe we should require the official OpenStack vendor to state clear details about their supported implementation. Customers will continue to vote with their wallets about which configuration details are important.

This is a complex issue and we need community input. That means we need to hear from you! Here are the TC Position and the DefCore Patch.

8 thoughts on “APIs and Implementations collide at OpenStack Interop: The Oracle Zones vs VMs Debate”

  1. I’ll weigh in a bit. As a developer of software that I want to be able to give out to others to run on OpenStack clouds, I do need some stability/commonness across clouds.

    At minimum, this means they provide a set of OpenStack Compatible services in their cloud, the user can see whether all the required services are there (see the sketch at the end of this thread), and if so, they can run the application on it. This means testing that cloud services match up against service-level standards. This may mean some services won’t be there, and that might be ok for some apps. It’s burdensome on users and app developers, though.

    A definition that is more useful to developers/users, but more work for ops, would be to mandate particular services. That makes it easier to assume that, say, neutron will be there, or heat will be there, so I can write software much more easily, and users only have to look for one OpenStack seal rather than many.

    The big problem is that, for the second case to be successful, OpenStack MUST be easy to deploy for all of the common pieces. As we add more services, it shouldn’t be overly burdensome for an operator to add the required pieces. Unfortunately, this is very much not the case today, so that’s why there is such a big push for the former definition. If the latter can be made very easy to manage, then it’s worth assembling the services into a required set, much as the Linux kernel no longer lets you easily drop whole subsystems. Your Linux programs can now assume shmem or unix sockets will always be there, rather than being optional, making for a better experience.


    • Kevin, thanks. You bring up something I ran out of space to discuss. There are services inside the project that assume Linux VMs. If we are going to count on those services, then we’re going to have to require certain base capabilities.

      RE: Install complexity. That’s been an ongoing complaint that’s hard to solve in a multi-vendor way.

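    (To make the service-check idea above concrete: a minimal sketch, assuming a hypothetical Keystone endpoint and placeholder credentials, of reading the service catalog returned with a Keystone v3 project-scoped token and comparing it against the services an application requires.)

        import requests

        KEYSTONE = "https://cloud.example.com:5000/v3"   # hypothetical endpoint
        AUTH = {
            "auth": {
                "identity": {
                    "methods": ["password"],
                    "password": {
                        "user": {
                            "name": "demo",                 # placeholder user
                            "domain": {"name": "Default"},
                            "password": "PLACEHOLDER",
                        }
                    },
                },
                # A project-scoped token is needed so Keystone returns the catalog.
                "scope": {"project": {"name": "demo", "domain": {"name": "Default"}}},
            }
        }

        # Services the application needs before it can be deployed.
        REQUIRED = {"compute", "network", "orchestration"}   # nova, neutron, heat

        resp = requests.post(f"{KEYSTONE}/auth/tokens", json=AUTH)
        resp.raise_for_status()

        # Keystone returns the service catalog alongside the token; each entry
        # carries a service "type" we can match against the required set.
        available = {svc["type"] for svc in resp.json()["token"]["catalog"]}

        missing = REQUIRED - available
        print("cloud looks compatible" if not missing else f"missing services: {missing}")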

  2. This is a particularly interesting challenge. At what point does the market drive the change in approach, and can we (the royal we, as in community/TC/DefCore team) safely branch out to alternate workloads as “certified,” given the right set of tests? Creating the right criteria to do so will be as important as any potential non-VM workload.

    It would be a shame to dampen any potential innovation, but it also needs to be carefully implemented. This is not unlike the need for financial regulations in the markets. While we want limited government, it is necessary to maintain some vision/strategy and create the right way to ensure success within the bounds of the goals of the overall project ecosystem.

    I’ve been really interested in helping more in this sort of area. I’m wondering how the community can be more helpful in showing the real need and measuring the risk versus reward on some of this. I fear that many of the folks who think OpenStack should only be a path to the PaaS are all (or nearly all) PaaS vendors. Is the active OpenStack consumer community really driving that hard to open up other workloads?


    • Eric, thanks for the comments. As part of DefCore, we defined criteria to help make decisions and even a way to score them. I’ve had discussions with people who are looking for an even higher set of “guiding principles” to help answer directional questions. While I think it would help, the diversity of the community makes it a challenge to define them.


  3. Pingback: What is an OpenStack Powered Compute? | | ][ stefano maffulli

  4. Pingback: What is an OpenStack Powered Compute? | GREENSTACK

  5. Pingback: OpenStack Shared Community Values? Here’s my seven, let’s compare | Rob Hirschfeld

  6. Pingback: OpenStack Shared Community Values? Here’s my seven, let’s compare | GREENSTACK
