APIs and Implementations collide at OpenStack Interop: The Oracle Zones vs VMs Debate

I strive to stay neutral as OpenStack DefCore co-chair; however, as someone asking for another Board term, it’s important to review my thinking so that you can make an informed voting decision.

DefCore, while always on the edge of controversy, recently became ground zero for the “what is OpenStack” debate [discussion write up]. My preferred small core “it’s an IaaS product” answer is only one side. Others favor “it’s an open cloud community” while another faction champions an “open cloud platform.” I’m struggling to find a way that it can be all together at the same time.

The TL;DR is that, today, OpenStack vendors are required to implement a system that can run Linux guests. This is an example of an implementation over API bias because there’s nothing in the API that drives that specific requirement.

From a pragmatic “get it done” perspective, OpenStack needs to remain implementation driven for now. That means that we care that “OpenStack” clouds run VMs.

While there are pragmatic reasons for this, I think that long term success will require OpenStack to become an API specification. So today’s “right answer” actually undermines the long term community value. This has been a long standing paradox in OpenStack.

Breaking the API to implementation link allows an ecosystem to grow with truly alternate implementations (not just plug-ins). This is a threat to the community “upstream first” mentality.  OpenStack needs to be confident enough in the quality and utility of the shared code base that it can allow competitive implementations. Open communities should not need walls to win but they do need clear API definition.

What is my posture for this specific issue?  It’s complicated.

First, I think that the user and ecosystem expectations are being largely ignored in these discussions. Many of the controversial items here are vendor initiatives, not user needs. Right now, I’ve heard clearly that those expectations are for OpenStack to be an IaaS the runs VMs. OpenStack really needs to focus on delivering a reliably operable VM based IaaS experience. Until that’s solid, the other efforts are vendor noise.

Second, I think that there are serious test gaps that jeopardize the standard. The fundamental premise of DefCore is that we can use the development tests for API and behavior validation. We chose this path instead of creating an independent test suite. We either need to address tests for interop within the current body of tests or discuss splitting the efforts. Both require more investment than we’ve been willing to make.

We have mechanisms in place to collects data from test results and expand the test base.  Instead of creating new rules or guidelines, I think we can work within the current framework.

The simple answer would be to block non-VM implementations; however, I trust that cloud consumers will make good decisions when given sufficient information.  I think we need to fix the tests and accept non-VM clouds if they pass the corrected tests.

For this and other reasons, I want OpenStack vendors to be specific about the configurations that they test and support. We took steps to address this in DefCore last year but pulled back from being specific about requirements.  In this particular case, I believe we should require the official OpenStack vendor to state clear details about their supported implementation.  Customers will continue vote with their wallet about which configuration details are important.

This is a complex issue and we need community input.  That means that we need to hear from you!  Here’s the TC Position and the DefCore Patch.

Understanding OpenStack Designated Code Sections – Three critical questions

A collaboration with Michael Still (TC Member from Rackspace) & Joshua McKenty and Cross posted by Rackspace.

After nearly a year of discussion, the OpenStack board launched the DefCore process with 10 principles that set us on path towards a validated interoperability standard.   We created the concept of “designated sections” to address concerns that using API tests to determine core would undermine commercial and community investment in a working, shared upstream implementation.

Designated SectionsDesignated sections provides the “you must include this” part of the core definition.  Having common code as part of core is a central part of how DefCore is driving OpenStack operability.

So, why do we need this?

From our very formation, OpenStack has valued implementation over specification; consequently, there is a fairly strong community bias to ensure contributions are upstreamed. This bias is codified into the very structure of the GNU General Public License (GPL) but intentionally missing in the Apache Public License (APL v2) that OpenStack follows.  The choice of Apache2 was important for OpenStack to attract commercial interests, who often consider GPL a “poison pill” because of the upstream requirements.

Nothing in the Apache license requires consumers of the code to share their changes; however, the OpenStack foundation does have control of how the OpenStack™ brand is used.   Thus it’s possible for someone to fork and reuse OpenStack code without permission, but they cannot called it “OpenStack” code.  This restriction only has strength if the OpenStack brand has value (protecting that value is the primary duty of the Foundation).

This intersection between License and Brand is the essence of why the Board has created the DefCore process.

Ok, how are we going to pick the designated code?

Figuring out which code should be designated is highly project specific and ultimately subjective; however, it’s also important to the community that we have a consistent and predictable strategy.  While the work falls to the project technical leads (with ratification by the Technical Committee), the DefCore and Technical committees worked together to define a set of principles to guide the selection.

This Technical Committee resolution formally approves the general selection principles for “designated sections” of code, as part of the DefCore effort.  We’ve taken the liberty to create a graphical representation (above) that visualizes this table using white for designated and black for non-designated sections.  We’ve also included the DefCore principle of having an official “reference implementation.”

Here is the text from the resolution presented as a table:

Should be DESIGNATED: Should NOT be DESIGNATED:
  • code provides the project external REST API, or
  • code is shared and provides common functionality for all options, or
  • code implements logic that is critical for cross-platform operation
  • code interfaces to vendor-specific functions, or
  • project design explicitly intended this section to be replaceable, or
  • code extends the project external REST API in a new or different way, or
  • code is being deprecated

The resolution includes the expectation that “code that is not clearly designated is assumed to be designated unless determined otherwise. The default assumption will be to consider code designated.”

This definition is a starting point.  Our next step is to apply these rules to projects and make sure that they provide meaningful results.

Wow, isn’t that a lot of code?

Not really.  Its important to remember that designated sections alone do not define core: the must-pass tests are also a critical component.   Consequently, designated code in projects that do not have must-pass tests is not actually required for OpenStack licensed implementation.

Visualizing the OpenStack Core discussion points

THIS POST IS #6 IN A SERIES ABOUT “WHAT IS CORE.”

As we take the OpenStack Core discussion to a larger audience, I was asked to create the summary version the discussion points.  We needed a quick visual way to understand how these consensus statements interconnect and help provide context.  To address this need, I based it on a refined 10 core positions to create the following OpenStack Core flowchart.

core flow

The flow diagram below is grouped into three main areas: core definition (green), technical requirements (blue), and testing impacts (orange).

  1. Core Definition (green) walks through the fundamental scope and premise of the “what is core” discussion.  We are looking for the essential OpenStack: the parts that everyone needs and nothing more.  While OpenStack can be something much larger, core lives at the heart of the use-case venn diagram.  It’s the magical ice cream flavor that everyone loves like Triple Unicorn Rainbow Crunch.
  2. Technical Requirements (blue) covers some of the most contentious parts of the dialog.  This section states the expectation that OpenStack™ implementations must use parts the OpenStack code (you can’t just provide a compatible API).  It goes further to expect that we will maintain an open reference implementation and also identify places where parts of the code can be substituted with alternate implementations.  Examples of alternate implementations are plug-ins, API extensions, different hypervisors, and alternate libraries.
  3. Testing Impacts (orange) reviews some of the important new thinking around Core.  These points focus on the use of OpenStack community tests (e.g.: Tempest) to validate the total code base.  We expect users to be able to self-administer these tests or rely on an external validation.  Either way, we do not expect all tests to pass for all configurations; instead, the Foundation will identify a subset of the tests as required or must-pass.  The current thinking is that these must-pass tests will become the effective definition of OpenStack™ Core.

I hope this helps connect the dots on the core discussions so far.

I’d like to clean-up the positions to match the flow chart and cross reference.  Stay tuned!  This flowchart is a work in process – updates and suggestions are welcome!

READ POST IS #7: WHERE IS THIS GOING?

Community dialogue around “What is Core” positions

THIS POST IS #5 IN A SERIES ABOUT “WHAT IS CORE.”

by Rob Hirschfeld (cc) w/ attribution

The OpenStack Foundation Board has been having a broadening conservation about this topic.  Feeling left out?  Please don’t be!  Now is the time to start getting involved: we had to start very narrowly focused to avoid having the discussion continue to go in circles.  As we’ve expanding the dialog, we have incorporated significant feedback to drive consensus.

No matter where I go, people are passionate about the subject of OpenStack Core.

Overall, there is confusion of scope covered by “what is core” because people bring in their perspective from public, private solution, ecosystem or internal deployment objectives.  In discussion, everyone sees that we have to deal with issues around the OpenStack mark and projects first, but they are impatient to get into the deep issues.  Personally, we can get consensus on core and will always have a degree of healthy tension between user types.

The following are my notes, not my opinions.  I strive to faithfully represent a wide range of positions here.  Clarifications, comments and feedback are welcome!

Consensus Topics:

  • Reference/Alternate Implementation (not plug-in): Not using “plug-ins” to describe the idea that OpenStack projects should have a shared API with required code and clearly specified areas where code is replaceable.  It is the Technical Committee (TC) that makes these decisions.  The most meaningful language around this point is to say that OpenStack will have an open reference implementation with allowable alternate implementations.
  • Alternate implementations are useful:  We want to ensure upstream contribution and collaboration on the code base.  Reference implementations ensure that there’s a reason to keep open source OpenStack strong.  Alternate Implementations are important to innovation.
  • Small vs. Large Core: This is an ongoing debate about if OpenStack should have a lot of projects as part of core.  We don’t have an answer but people feel like we’re heading in a direction that resolves this question.
  • Everyone likes tests: We’re heading towards a definition of core that relies heavily on tests.  Everyone expresses concerns that this will place a lot of stress on Tempest (or another framework) and that needs to be addressed as we move forward.

Open Topics:

  • Monolithic vs. Granular Trademark:  We did not discuss if vendors will be able to claim OpenStack trademarks on subcomponents of the whole.  This is related to core but wide considered secondary.
  • API vs. implementation tension:  We accept that OpenStack will lead with implementation.   There’s no official policy that “we are not a standards body” but we may also have to state that tests are not a specification.  There’s a danger that tests will be considered more than they are.  What are they?  “They are an implementation and a source of information.  They are not the definition.”   We expect to have a working model that drives the API not vice versa.
  • Brouhaha about EC2 APIs:  It’s not clear if defining core helps address the OpenStack API discussion.  I hope it will but have not tested it.
  • Usability as core: I had many people insist that usability and ease of use should be as requirements for core because it supports adoption.  Our current positions do not have any statements to support this view.
  • Toxic neighbors: We have not discussed if use of the mark and criteria could be limited by what else you put in your product.  Are there implementation options that we’d consider toxic and automatically violate the mark?  Right now, the positions are worded that if you pass then you play even if you otherwise stink.
  • Which tests are required?  It appears that we’re moving towards using must-pass tests to define the core.  Moving towards tests determining core, we want actual field data to drive which tests are required. That will allow actual user experience to shape which tests are important rather than having it be a theoretical decision.  There’s some interest in asking the User Committee (UC) to recommend which tests are required.  This would be an added responsibility for the UC and needs more discussion.
  • Need visualization:  With 12 positions so far, it’s getting hard to keep it all together.  I’ve taken on an action item to create a diagram that shows which statements apply to which projects against the roles of ownership.

I’ve had some great discussions about core and am looking forward to many more.  I hope these notes help bring you up to speed.   As always, comments and discussion are welcome!

READ POST #6: VISUALIZING CORE

Ballistic Release Cycles: Tracking the Trajectory of OpenStack Milestones

I’ve been watching a pattern emerge on the semiannual OpenStack release cycles for a while now. There is a hidden but crucial development phase that accelerates projects faster than many observers realize. In fact, I believe that substantial work is happening outside of the “normal” design cycle during what I call “free fall” development.

Understanding when the cool, innovative stuff happens is essential to getting (and giving) the most from OpenStack.

The published release cycle looms like a 6 stage ballistic trajectory. Launching at the design summit, the release features change and progress the most in the first 3 milestones. At the apogee of the release, maximum velocity is reached just as we start having to decide which features are complete enough to include in the release. Since many are not ready, we have to jettison (really, defer) partial work to ensure that we can land the release on schedule.

I think of the period where we lose potential features as free fall because thing can go in any direction. The release literally reverses course: instead of expanding, it is contracting. This process is very healthy for OpenStack. It favors code stability and “long” hardening times. For operators, this means that the code stops changing early enough that we have more time to test and operationalize the release.

But what happens to the jettisoned work? In free fall, objects in motion stay in motion. The code does not just disappear! It continues on its original upward trajectory.

The developers who invested time in the code do not simply take a 3 month sabbatical, nor do they stop their work and start testing the code that was kept. No, after the short in/out sorting pause, the free fall work continues onward with rockets blasting. The challenge is that it is now getting outside of the orbit of the release plan and beyond the radar of many people who are tracking the release.

The consequence of this ongoing development is that developers (and the features they are working on) show up at the summit with 3 extra months of work completed. It also means that OpenStack starts each release cycle with a bucket of operationally ready code. Wow, that’s a huge advantage for the project in terms of delivered work, feature velocity and innovation. Even better, it means that the design summit can focus on practical discussions of real prototypes and functional features.

Unfortunately, this free fall work has hidden costs:

  • It is relatively hidden because it is outside of the normal release cycle.
  • It makes true design discussions less productive because the implemented code is more likely to make the next release cycle
  • Integration for the work is postponed because it continues before branching
  • Teams that are busy hardening a core feature can be left out of work on the next iteration of the same feature
  • Forking can make it hard to capture bugs caught during hardening

I think OpenStack greatly benefits from free fall development; consequently, I think we need to acknowledge and embrace it to reduce its costs. A more explicit mid-release design synchronization when or before we fork may help make this hidden work more transparent.