Tweaking DefCore to subdivide the OpenStack platform (proposal for review)

The following material will be a major part of the discussion for the OpenStack Board meeting on Monday 10/20.  Comments and suggestions welcome!

OpenStack in Parts

For nearly two years, the OpenStack Board has been moving towards creating a common platform definition that can help drive interoperability.  At the last meeting, the Board paused to further review one of the core tenets of the DefCore process (Item #3: Core definition can be applied equally to all usage models).

Outside of my role as DefCore chair, I see the OpenStack community asking itself an existential question: “are we one platform or a suite of projects?”  I’m having trouble believing “we are both” is an acceptable answer.

During the post-meeting review, Mark Collier drafted a Foundation-supported recommendation that basically creates an additional core tier without changing the fundamental capabilities & designated code concepts.  This proposal has been reviewed by the DefCore committee (but not formally approved in a meeting).

The original DefCore proposed capabilities set becomes the “platform” level while capability subsets are called “components.”  We are considering two initial components, Compute & Object, and both are included in the platform (see illustration below).  The approach leaves the door open for new core components to exist both under and outside of the platform umbrella.

In the proposal, OpenStack vendors who meet either component or platform requirements can qualify for the “OpenStack Powered” logo; however, vendors using only a component (instead of the full platform) will have more restrictive marks and limitations about how they can use the term OpenStack.

This approach addresses the “is Swift required?” question.  For platform, Swift capabilities will be required; however, vendors will be able to implement the Compute component without Swift and implement the Object component without Nova/Glance/Cinder.
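To make the qualification rule concrete, here is a minimal Python sketch. It assumes each component is simply a set of capabilities and that a product qualifies for a component (or the platform) when it passes every capability in that set. All capability names, including the platform-only one, are hypothetical placeholders rather than the DefCore-approved lists.

```python
# Hypothetical capability names; NOT the DefCore-approved lists.
COMPONENTS: dict[str, set[str]] = {
    "compute": {"compute-servers", "images-list", "volumes-attach"},  # Nova/Glance/Cinder
    "object": {"objects-put", "objects-get", "containers-list"},      # Swift
}

# Assumption: the platform holds every component plus some platform-only capabilities.
PLATFORM_ONLY: set[str] = {"identity-tokens"}
PLATFORM: set[str] = set().union(*COMPONENTS.values()) | PLATFORM_ONLY


def qualifications(passed: set[str]) -> dict[str, bool]:
    """Report which components, and whether the full platform, a product qualifies for."""
    result = {name: caps <= passed for name, caps in COMPONENTS.items()}
    result["platform"] = PLATFORM <= passed
    return result


# A Compute-only cloud without Swift still qualifies for the Compute component.
compute_only = COMPONENTS["compute"] | PLATFORM_ONLY
print(qualifications(compute_only))
# {'compute': True, 'object': False, 'platform': False}
```

Note that the sketch keeps a single yardstick: component and platform checks reuse the same capability sets, only grouped differently.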

It’s important to note that there is only one yardstick for components or the platform: the capabilities groups and designated code defined by the DefCore process.  From that perspective, OpenStack is one consistent thing.  This change allows vendors to choose sub-components if that serves their business objectives.

It’s up to the community to prove the platform value of all those sub-components working together.

OpenStack Goldilocks’ Syndrome: three questions to help us find our bearings

Goldilocks Atlas

Action: Please join Stefano, Allison, Sean and me in Paris on Monday, November 3rd, in the afternoon (schedule link)

If wishes were fishes, OpenStack’s rapid rise in developers and users would come with graceful process and commercial transitions too.  As a Foundation board member, it’s my responsibility to help ensure that we’re building a sustainable ecosystem for the project.  That’s a Goldilocks challenge because adding either too much or too little control and process will harm the project.

In discussions with the community, that challenge seems to break down into three key questions:

After last summit, a few of us started a dialog around Hidden Influencers that helps to frame these questions in an actionable way.  Now, it’s time for us to come together and talk in Paris in the hallways and specifically on Monday, November 3rd, in the afternoon (schedule link).   From there, we’ll figure out next steps using these three questions as a baseline.

If you’ve got opinions about these questions, don’t wait for Paris!  I’d love to start the discussion here in the comments, on Twitter (@zehicle), by phone, by email or via carrier pigeons.

To improve flow, we must view the OpenStack community as a Software Factory

This post was sparked by a conversation at OpenStack Atlanta between OpenStack Foundation board members Todd Moore (IBM) and Rob Hirschfeld (Dell/Community).  We share a background in industrial and software process and felt that lessons from lean manufacturing translate directly to facing OpenStack’s challenges.

While OpenStack has done an amazing job of growing contributors, scale has caused our code flow processes to be bottlenecked at the review stage.  This blocks flow throughout the entire system and presents a significant risk to both stability and feature addition.  Flow failures can ultimately lead to vendor forking.

Fundamentally, Todd and I felt that OpenStack needs to address system flows to build an integrated product.  The post expands on the “hidden influencers” issue and adds an additional challenge, because improving flow requires that community influencers better understand the need to optimize work across projects in a more systematic way.

Let’s start by visualizing the “OpenStack Factory”


Factory Floor from Alpha Industries Wikipedia page

Imagine all of OpenStack’s 1000s of developers working together in a single giant start-up warehouse.  Each project has its own floor area with appropriate foosball tables, break areas and coffee bars.  It’s easy to visualize clusters of intent developers talking around tables or coding in dark corners while PTLs and TC members dash between groups coordinating work.

Expand the visualization so that we can actually see the code flowing between teams as little colored boxes.  Giving each project a unique color allows us to quickly see dependencies between teams.  Some features are piled up waiting for review inside teams while others sit on pallets between projects, waiting on cross-project features that have not been completed.  At release time, we’d be able to see PTLs sorting through stacks of completed boxes to pick which ones were ready to ship.

Watching a factory floor from above is a humbling experience and a key feature of systems thinking enlightenment in both The Phoenix Project and The Goal.  It’s very easy to be caught up in a single project (local optimization) and miss the broader system implications of local choices.

There is a large body of work about Lean Process for Manufacturing

You’ve already visualized OpenStack code creation as a manufacturing floor: it’s a small step to accept that we can use the same proven processes for software and physical manufacturing.

As features move between teams (work centers), it becomes obvious that we’ve created a very highly interlocked sequence of component steps needed to deliver product; unfortunately, we have minimal coordination between the owners of the work centers.  If a feature needs a critical resource (think programmer) to progress, then we rely on that resource to allocate time to the work.  Since that person’s manager may not agree to the priority, we have a conflict between system flow and individual optimization.

That conflict destroys flow in the system.

The #1 lesson from lean manufacturing is that putting individual optimization over system optimization reduces throughput.  Since our product and people managers are often competitors, we need to work doubly hard to address system concerns.  Worse yet, our inventory of work in process and the interdependencies between projects are harder to discern.  Unlike the manufacturing floor, our developers and project leads cannot look down upon it and see the physical work as it progresses from station to station in one single holistic view.  The bottlenecks that throttle the OpenStack workflow are harder to see but we can find them, as can be demonstrated later in this post.

Until we can engage the resource owners in balancing system flow, OpenStack’s throughput will decline as we add resources.  This same principle is at play in the famous aphorism known as Brooks’s law: adding developers makes a late project later.

Is there a solution?

There are lessons from Lean Manufacturing that can be applied:

  1. Make quality a priority (expand tests from function to integration)
  2. Ensure integration from station to station (prioritize working together over features)
  3. Make sure that owners of work are coordinating (expose hidden influencers)
  4. Find and manage from the bottleneck (classic Lean says find the bottleneck and improve that)
  5. Create and monitor a system view
  6. Have everyone value finished product, not workstation output
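The bottleneck lesson (item 4) can be shown with a toy model. This Python sketch assumes features flow through a fixed sequence of work centers and that a serial line can finish no more than its slowest station allows; the station names and weekly capacities are made up for illustration, not measured OpenStack data. It shows why adding developers upstream of a review bottleneck only grows the pile of work waiting for review.

```python
# Hypothetical stations and capacities (features per week), listed in flow order.
line: dict[str, float] = {
    "code": 40.0,
    "review": 12.0,
    "integration-test": 25.0,
    "release": 30.0,
}


def throughput(capacity: dict[str, float]) -> float:
    """A serial line cannot finish work faster than its slowest station."""
    return min(capacity.values())


def bottleneck(capacity: dict[str, float]) -> str:
    """Name of the slowest station."""
    return min(capacity, key=capacity.get)


def backlog_growth(capacity: dict[str, float], station: str) -> float:
    """Weekly growth of the queue in front of `station`, given what upstream can deliver."""
    names = list(capacity)
    upstream = names[: names.index(station)]
    if not upstream:
        return 0.0  # assume the first station sets its own pace
    arriving = min(capacity[s] for s in upstream)
    return max(0.0, arriving - capacity[station])


print(bottleneck(line), throughput(line), backlog_growth(line, "review"))
# review 12.0 28.0

# Doubling coders (a non-bottleneck station) leaves output flat and grows the review pile.
more_coders = {**line, "code": 80.0}
print(throughput(more_coders), backlog_growth(more_coders, "review"))
# 12.0 68.0

# Relieving the bottleneck raises output until the next-slowest station binds.
more_reviewers = {**line, "review": 30.0}
print(bottleneck(more_reviewers), throughput(more_reviewers))
# integration-test 25.0
```

The model only shows that extra upstream capacity buys nothing; it does not model the coordination costs behind the “late project later” effect.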

Added Postscript: I highly recommend reading Daniel Berrange’s email about this.

VMware Integrated OpenStack (VIO) is a smart move, it’s like using a Volvo to tow your ski boat

I’m impressed with VMware’s VIO (beta) play and believe it will have a meaningful positive impact in the OpenStack ecosystem.  In the short-term, it paradoxically both helps enterprises stay on VMware and accelerates adoption of OpenStack.  The long term benefit to VMware is less clear.

From VWVortex

Sure, you can use a Volvo to tow a boat

Why do I think it’s good tactics?  Let’s explore an analogy….

My kids think owning a boat will be super fun with images of ski parties and lazy days drifting at anchor with PG13 umbrella drinks; however, I’ve got concerns about maintenance, cost and how much we’d really use it.  The problem is not the boat: it’s all of the stuff that goes along with ownership.  In addition to the boat, I’d need a trailer, a new car to pull the boat and driveway upgrades for parking.  Looking at that, the boat’s the easiest part of the story.

The smart move for me is to rent a boat and trailer for a few months to test my kids’ interest.  In that case, I’m going to be towing the boat using my Volvo instead of going “all in” and buying that new Ferd 15000 (you know you want it).  As a compromise, I’ll install a hitch in my trusty sedan and use it gently to tow the boat.  It’s not ideal and causes extra wear to the transmission but it’s a very low risk way to explore the boat owning lifestyle.

Enterprise IT already has the Volvo (VMware vCenter) and likely sees calls for OpenStack as the illusion of cool ski parties without regard for the realities of owning the boat.  Pulling the boat for a while (using OpenStack on VMware) makes a lot of sense to these users.  If the boat gets used then they will buy the truck and accessories (move off VMware).  Until then, they’re still learning about the open source boating lifestyle.

Putting open source concerns aside, this helps VMware lead the OpenStack play for enterprises but may ultimately backfire if they have not set up their long game to keep the customers.

OpenStack DefCore Process Flow: Community Feedback Cycles for Core [6 points + chart]

If you’ve been following my DefCore posts, then you already know that DefCore is an OpenStack Foundation Board managed process “that sets base requirements by defining 1) capabilities, 2) code and 3) must-pass tests for all OpenStack™ products. This definition uses community resources and involvement to drive interoperability by creating the minimum standards for products labeled OpenStack™.”

In this post, I’m going to be very specific about what we think “community resources and involvement” entails.

The draft process flow chart was provided to the Board at our OSCON meeting without additional review.  It boils down to a few key points:

  1. We are using the documents in the Gerrit review process to ensure that we work within the community processes.
  2. Going forward, we want to rely on the technical leadership to create, cluster and describe capabilities.  DefCore bootstrapped this process for Havana.  Further, capabilities are defined by tests in Tempest, so test coverage gaps (like Keystone v2) translate into Core gaps.
  3. We are investing in data driven and community involved feedback (via Refstack) to engage the largest possible base for core decisions.
  4. There is a “safety valve” for vendors to deal with test scenarios that are difficult to recreate in the field.
  5. The Board is responsible for approving the final artifacts based on the recommendations.  By having a transparent process, community input is expected in advance of that approval.
  6. The process is time sensitive.  There’s a need for the Board to produce Core definition in a timely way after each release and then feed that into the next one.  Ideally, the definitions will be approved at the Board meeting immediately following the release.
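Points 2 and 4 describe a mechanism that is easy to misread, so here is a minimal Python sketch of it. It assumes a capability is defined by a set of must-pass tests, that a capability with no tests cannot be required for Core (the coverage-gap point), and that flagged tests (the safety valve) are simply excused. Capability and test names are hypothetical, not real Tempest identifiers, and the sketch is not Refstack’s scoring logic.

```python
# Hypothetical capabilities, each mapped to the tests that define it.
CAPABILITY_TESTS: dict[str, set[str]] = {
    "compute-servers-create": {"test_create_server", "test_list_servers"},
    "compute-servers-resize": {"test_resize_server"},
    # No test coverage: the capability cannot be measured, so it cannot be Core.
    "identity-v2-tokens": set(),
}

# Hypothetical safety valve: tests excused because they are hard to recreate in the field.
FLAGGED_TESTS: set[str] = {"test_resize_server"}


def core_eligible(capability: str) -> bool:
    """A capability can only be required for Core if tests define it."""
    return bool(CAPABILITY_TESTS[capability])


def capability_passes(capability: str, passed_tests: set[str]) -> bool:
    """Passes if the capability is testable and every non-flagged defining test passed."""
    if not core_eligible(capability):
        return False
    required = CAPABILITY_TESTS[capability] - FLAGGED_TESTS
    return required <= passed_tests


results = {"test_create_server", "test_list_servers"}
for cap in CAPABILITY_TESTS:
    print(cap, core_eligible(cap), capability_passes(cap, results))
# compute-servers-create True True
# compute-servers-resize True True
# identity-v2-tokens False False
```

The resize capability passes here only because its single test is flagged; without the flag, a product unable to run that test would fail the capability.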

DefCore Process Draft

The process shows how the key components (designated sections and capabilities) start from the previous release’s version and how the DefCore committee manages the update process.  Community input is a vital part of the cycle.  This is especially true for identifying actual use of the capabilities through the Refstack data collection site.

  • Blue is for Board activities
  • Yellow is for user/vendor community activities
  • Green is for technical community activities
  • White is for process artifacts

This process is very much in draft form and any input or discussion is welcome!  I expect DefCore to take up formal review of the process in October.

Your baby is ugly! Picking which code is required for Commercial Core.

There’s no point in sugar-coating this: selecting API and code sections for core requires making hard choices and saying no.  DefCore makes this fair by 1) defining principles for selection, 2) going slooooowly to limit surprises and 3) being transparent in operation.  When you’re telling someone that their baby is not handsome enough, you’d better be able to explain why.

The truth is that from DefCore’s perspective, all babies are ugly.  If we are seeking stability and interoperability, then we’re looking for adults not babies or adolescents.

Explaining why is exactly what DefCore does by defining criteria and principles for our decisions.  When we do it right, it also drives a positive feedback loop in the community because the purpose of designated sections is to give commercial contributors clear guidance about where we expect them to contribute upstream.  By making this code required for Core, we are incenting OpenStack vendors to collaborate on the features and quality of these sections.

This does not lessen the undesignated sections!  Contributions in those areas are vital to innovation; however, they are, by design, more dynamic, specialized or single vendor than the designated areas.

Designated Sections

The seven principles of designated sections (see my post with TC member Michael Still) as defined by the Technical Committee are:

Should be DESIGNATED:

  1. code provides the project external REST API, or
  2. code is shared and provides common functionality for all options, or
  3. code implements logic that is critical for cross-platform operation

Should NOT be DESIGNATED:

  1. code interfaces to vendor-specific functions, or
  2. project design explicitly intended this section to be replaceable, or
  3. code extends the project external REST API in a new or different way, or
  4. code is being deprecated

While the seven principles inform our choices, DefCore needs some clarifications to ensure we can complete the work in a timely, fair and practical way.  Here are our additions:

8. UNdesignated by Default

  • Unless code is designated, it is assumed to be undesignated.
  • This aligns with the Apache license.
  • We have a preference for smaller core.

9. Designated by Consensus

  • If the community cannot reach a consensus about designation then it is considered undesignated.
  • Time to reach consensus will be short: days, not months.
  • Except for obvious trolling, this prevents endless wrangling.
  • If there’s a difference of opinion then the safe choice is undesignated.

10. Designated is Guidance

  • Loose descriptions of designated sections are acceptable.
  • The goal is guidance on where we want upstream contributions, not a code inspection police state.
  • Guidance will be revised per release as part of the DefCore process.
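One way to check that the ten principles fit together is to write them as a decision rule. This Python sketch is only an illustration: the field names are hypothetical, it assumes that any “should NOT” reason overrides the “should” reasons (the principles don’t spell out precedence), and real designation is a community judgment revised per release (principle 10), not an automated check.

```python
from dataclasses import dataclass


@dataclass
class CodeSection:
    """Hypothetical description of one code section; field names are illustrative."""
    name: str
    # Principles 1-3: reasons to designate.
    provides_external_rest_api: bool = False
    shared_common_functionality: bool = False
    critical_cross_platform_logic: bool = False
    # Principles 4-7: reasons not to designate.
    vendor_specific_interface: bool = False
    intended_replaceable: bool = False
    extends_rest_api_differently: bool = False
    being_deprecated: bool = False
    # Principle 9: did the community reach (quick) consensus to designate?
    consensus_to_designate: bool = False


def is_designated(section: CodeSection) -> bool:
    reasons_for = (
        section.provides_external_rest_api
        or section.shared_common_functionality
        or section.critical_cross_platform_logic
    )
    reasons_against = (
        section.vendor_specific_interface
        or section.intended_replaceable
        or section.extends_rest_api_differently
        or section.being_deprecated
    )
    # Principle 8: undesignated unless a reason to designate applies.
    # Principle 9: without consensus, fall back to the safe choice (undesignated).
    # Assumption: any "should NOT" reason overrides the "should" reasons.
    return reasons_for and not reasons_against and section.consensus_to_designate


api = CodeSection("compute-rest-api", provides_external_rest_api=True, consensus_to_designate=True)
driver = CodeSection("vendor-driver", vendor_specific_interface=True, consensus_to_designate=True)
disputed = CodeSection("scheduler", critical_cross_platform_logic=True)
print(is_designated(api), is_designated(driver), is_designated(disputed))
# True False False
```

Every path that lacks a clear reason or a consensus ends at “undesignated,” which is the safe default the additions 8 and 9 ask for.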

In my next DefCore post, I’ll review how these 10 principles are applied to the Havana release that is going through community review before Board approval.

Patchwork Onion delivers stability & innovation: the graphic that explains how we determine OpenStack Core

This post was coauthored by the DefCore chairs, Rob Hirschfeld & Joshua McKenty.

The OpenStack board, through the DefCore committee, has been working to define “core” for commercial users using a combination of minimum required capabilities (APIs) and code (Designated Sections).  These minimums are decided on a per project basis, so it can be difficult to visualize their overall effect on the Integrated Release.

Patchwork Onion

We’ve created the patchwork onion graphic to help illustrate how core relates to the integrated release.  While this graphic is pretty complex, it was important to find a visual way to show how DefCore identifies distinct subsets of APIs and code from each project.  The graphic also shows that some projects have no core APIs and/or code.

For OpenStack to grow, we need to have BOTH stability and innovation.  We need to give the community clear guidance about what is a stable foundation and what is an exciting sandbox.  Without that guidance, OpenStack is perceived as risky and unstable by users and vendors. The purpose of defining “Core” is to be specific in addressing that need so we can move towards interoperability.

Interoperability enables an ecosystem with multiple commercial vendors which is one of the primary goals of the OpenStack Foundation.

Ecosystem Onion

Originally, we thought OpenStack would have “core” and “non-core” projects and we baked that expectation into the bylaws.  As we’ve progressed, it’s clear that we need a less binary definition.  Projects themselves have a maturity cycle (ecosystem -> incubated -> integrated) and within the project some APIs are robust and stable while others are innovative and fluctuating.

Encouraging this mix of stabilization and innovation has been an important factor in our discussions about DefCore.  Growing the user base requires encouraging stability and growing the developer base requires enabling innovation within the same projects.

The consequence is that we are required to clearly define subsets of capabilities (APIs) and implementation (code) that are required within each project.  Designating 100% of the API or code as Core stifles innovation because stability dictates limiting changes while designating 0% of the code (being API only) lessens the need to upstream.  Core reflects the stability and foundational nature of the code; unfortunately, many people incorrectly equate “being core” with the importance of the code, and politics ensues.

To combat the politics, DefCore has taken a transparent, principles-based approach to selecting core.   You can read about it in Rob’s upcoming “Ugly Babies” post (check back on 8/14).