OpenStack DefCore Process Draft Posted for Review [major milestone]

OpenStack DefCore Committee is looking for community feedback about the proposed DefCore Process.

March has been a month for OpenStack DefCore milestones.  At the March Board meeting, we approved the first official DefCore Guideline (called DefCore 2015.03), and we are poised to commit the first DefCore Process draft.

Once this initial commit is approved by the DefCore Committee (expected at the DefCore Scale.8 Meeting on 3/25 @ 9 PT), we’ll be ready for broader input from the community using the standard OpenStack Gerrit review process.  If you are not comfortable with Gerrit, we’ll take your input any way that you want to give it except via telepathy (we’ve already got a lot on our minds).

Note: We’re also looking for input on the 2015.next Guideline targeted for 2015.04.

The DefCore Process documents the rules (who, what, when and where) that will govern how we create the DefCore Guidelines.  By design, it has to be detailed and specific without adding complexity and confusion.  The why of DefCore is all the work we did on principles; those principles shape the process.

This process reflects nearly a year of gestation starting from the June 2014 DefCore face-to-face.  One of the notable recent refinements was to organize the material into time phases and to be more specific about who is responsible for which actions.

To make review easier, I’ve reposted the draft.  Comments are welcome here and on the patch (and here after it lands).

DRAFT: OpenStack DefCore Process 2015A (reposted from OpenStack/DefCore)

This document describes the DefCore process required by the OpenStack bylaws and approved by the OpenStack Technical Committee and Board.

Expected Timeline:

Time Frame   Milestone   Activities                            Led By
-3 months    S-3         “preliminary” draft (from current)    DefCore
-2 months    S-2         ID new Capabilities                   Community
-1 month     S-1         Score capabilities                    DefCore
Summit       S           “solid” draft                         Community
Summit       S           Advisory items selected               DefCore
+1 month     S+1         Self-testing                          Vendors
+2 months    S+2         Test Flagging                         DefCore
+3 months    S+3         Approve Guidance                      Board

Note: DefCore may accelerate the process to correct errors and omissions.

Process Definition

To improve flow, we must view the OpenStack community as a Software Factory

This post was sparked by a conversation at OpenStack Atlanta between OpenStack Foundation board members Todd Moore (IBM) and Rob Hirschfeld (Dell/Community).  We share a background in industrial and software process and felt that lean manufacturing lessons translate directly to the challenges OpenStack faces.

While OpenStack has done an amazing job of growing contributors, scale has caused our code flow processes to be bottlenecked at the review stage.  This blocks flow throughout the entire system and presents a significant risk to both stability and feature addition.  Flow failures can ultimately lead to vendor forking.

Fundamentally, Todd and I felt that OpenStack needs to address system flows to build an integrated product.  This post expands on the “hidden influencers” issue and adds an additional challenge: improving flow requires that the community’s influencers better understand the need to optimize work across projects in a more systematic way.

Let’s start by visualizing the “OpenStack Factory”

Factory Floor (image from the Alpha Industries Wikipedia page)

Imagine all of OpenStack’s 1000s of developers working together in a single giant start-up warehouse.  Each project has its own floor area with appropriate foosball tables, break areas and coffee bars.  It’s easy to visualize clusters of developers intently talking around tables or coding in dark corners while PTLs and TC members dash between groups coordinating work.

Expand the visualization so that we can actually see the code flowing between teams as little colored boxes.  Giving each project a unique color allows us to quickly see dependencies between teams.  Some features are piled up waiting for review inside teams while others sit on pallets between projects, waiting on needed cross-project features that have not been completed.  At release time, we’d be able to see PTLs sorting through stacks of completed boxes to pick which ones were ready to ship.

Watching a factory floor from above is a humbling experience and a key feature of systems thinking enlightenment in both The Phoenix Project and The Goal.  It’s very easy to be caught up in a single project (local optimization) and miss the broader system implications of local choices.

There is a large body of work about Lean Process for Manufacturing

You’ve already visualized OpenStack code creation as a manufacturing floor: it’s a small step to accept that we can use the same proven processes for software and physical manufacturing.

As features move between teams (work centers), it becomes obvious that we’ve created a very highly interlocked sequence of component steps needed to deliver product; unfortunately, we have minimal coordination between the owners of the work centers.  If a feature needs a critical resource (think: a programmer) to progress, then we rely on that person to allocate time to the work.  Since that person’s manager may not agree with the priority, we have a conflict between system flow and individual optimization.

That conflict destroys flow in the system.

The #1 lesson from lean manufacturing is that putting individual optimization over system optimization reduces throughput.  Since our product and people managers are often competitors, we need to work doubly hard to address system concerns.  Worse yet, our inventory of work in process and the interdependencies between projects are harder to discern.  Unlike the manufacturing floor, our developers and project leads cannot look down upon it and see the physical work as it progresses from station to station in one single holistic view.  The bottlenecks that throttle the OpenStack workflow are harder to see, but we can find them, as demonstrated later in this post.

Until we can engage the resource owners in balancing system flow, OpenStack’s throughput will decline as we add resources.  This same principle is at play in the famous aphorism: adding developers makes a late project later.

Is there a solution?

There are lessons from Lean Manufacturing that can be applied

  1. Make quality a priority (expand tests from function to integration)
  2. Ensure integration from station to station (prioritize working together over features)
  3. Make sure that owners of work are coordinating (expose hidden influencers)
  4. Find and manage from the bottleneck (classic Lean says find the bottleneck and improve that; see the sketch after this list)
  5. Create and monitor a system view
  6. Have everyone value finished product, not workstation output
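
To make the bottleneck lesson concrete, here is a minimal Python sketch (purely illustrative, not OpenStack tooling; the work-center names and capacities are invented): system throughput is capped by the slowest work center, so speeding up any other station leaves overall flow unchanged, while adding capacity at the constraint lifts the whole system.

```python
# Illustrative only: hypothetical work centers and capacities, not measured data.

def throughput(stations):
    """Features per cycle that actually leave the system: the slowest station sets the pace."""
    return min(stations.values())

pipeline = {
    "write code": 50,   # features each work center can handle per cycle
    "review": 12,       # the constrained station
    "gate tests": 30,
    "release": 40,
}

print(throughput(pipeline))   # 12 -- review throttles the whole flow

pipeline["write code"] = 100  # local optimization at a non-bottleneck...
print(throughput(pipeline))   # ...still 12, nothing ships faster

pipeline["review"] = 25       # system optimization at the constraint...
print(throughput(pipeline))   # ...25, the entire pipeline speeds up
```

The point of the toy model is not the numbers; it is that only changes at the constraint move the system, which is why items 3 and 4 above focus on coordination and bottleneck management.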

Added postscript: I highly recommend reading Daniel Berrange’s email about this.

OpenStack ATL Recap to the 11s: the danger of drama + 5 challenges & 5 successes

I’ve come to accept that the “Hallway Track” is my primary session at OpenStack events.  I want to thank the many people in the community who make that the best track.  It’s not only full of deep technical content; there are also healthy doses of intrigue, politics and “let’s fix that” in the halls.

I think honest reflection is critical to OpenStack’s growth (see my reflections from last year).  My role as a Board member must not turn me into a pom-pom-waving robot cheerleader.

 

What I heard that’s working:

  1. The Foundation event team did a great job on the logistics, and many appreciated the user and operator focus.  There is no doubt that OpenStack is being deployed at scale and helping transform cloud infrastructure.  I think that’s a great message.
  2. DefCore criteria were approved by the Board.  The overall process and impact were talked about positively at the summit.  To accelerate, we need +1s and feedback because “crickets” means we need to go slower.  I’ll have to dedicate a future post to next steps and “designated sections.”
  3. Marketplace!  Great turnout by vendors of all types, but I’m not hearing about them making a lot of money from OpenStack (which is needed for them to survive).  I like the diversity of the marketplace: consulting, aaServices, installers, networking, more networking, new distros, and ecosystem tools.
  4. There’s some real growth in aaS services for OpenStack (database, load balancer, DNS, etc.).  This is the ecosystem that many want OpenStack to drive because it helps displace Amazon cloud.  I also heard concerns about making sure they are pluggable so that companies can compete on implementation.
  5. Lots of process changes to adapt to growing pains.  People felt that the community is adapting (yeah!) but were concerned about having to re-invent tooling (meh).

There are also challenges that people brought to me:

  1. Our #1 danger is drama.  Users and operators want collaboration and friendly competition.  They are turned off by vendor conflict or strong-arming in the community (e.g.: the WSJ Red Hat article and fallout).  I’d encourage everyone to breathe more and react less.
  2. Lack of product management is risking a tragedy of the commons.  Helping companies work together and across projects is needed for our collaboration processes to work.  I’ll be exploring this with Sean Roberts in future posts.
  3. Making sure there’s profit being generated from shared code.  We need to remember that most of the development is corporate funded so we need to make sure that companies generate revenue.  The trend of everyone creating unique distros may indicate a problem.
  4. We need to be more operator friendly.  I know we’re trying but we create distance with operators when we insist on creating new tools instead of using the existing ecosystem.  That also slows down dealing with upgrades, resilient architecture and other operational concerns.
  5. Anointed projects concerns have expanded since Hong Kong.  There’s a perception that Heat (orchestration), TripleO (provisioning) and Solum (platform) are considered THE only way OpenStack solves those problems and that other approaches are not welcome.  While that encourages collaboration, it also chills competition and discussion.
  6. There’s a lot of whispering about the status of challenged projects: Neutron (works with proprietary backends but not open ones, may not stay integrated) and OpenStack bootstrapping (the state of the TripleO/Ironic/Heat mix).  The issue here is NOT whether they are challenged but finding ways to discuss concerns openly (see the anointed projects concern).

I’d enjoy hearing more about success and deeper discussion around concerns.  I use community feedback to influence my work in the community and on the board.  If you think I’ve got it right or wrong then please let me know.

Forward-looking Reviews: Feedback loops essential for Agile success

To keep pace with cloud innovations, my team at Dell drives aggressively forward.  Agile is essential to our success because it provides critical organization, control and feedback for our projects.  One repeating challenge I’ve had with the Agile decorations (aka meetings) is confusion between the name of the meeting and the process objectives.

The Agile process is very simple:  get feedback -> decide -> act -> repeat

People miss the intent of our process because of their predisposition about what’s supposed to happen in a meeting based on its name.

Some examples of names I avoid:

  • Demo – implies a one-way communication instead of a feedback loop
  • Post-mortem – implies it’s too late to fix problems
  • Retrospective – implies we are talking about the past instead of looking forward
  • Schedule – assumes that we can make promises about the future (not bad, but limits flexibility)
  • Person-Weeks – focuses on time frame, not on the use cases we want to accomplish

Names that work well with Agile

  • Planning – we’re working together to figure out what we’re going to do.
  • Review – talking over work that’s been done with input expected.
  • Roadmap – implies a journey in which we have to achieve certain landmarks before we reach our destination.
  • Story Points – avoids time references in favor of relative weights and something that can be traded.
  • Velocity – conveys working quickly and making progress.  Works well with roadmaps.

We have to recognize the powerful influence of semantics on people participating in any process.  If people arrive with the wrong mindset, we face significant danger (IMHO, soul-numbing meetings are murder) of completely missing critical opportunities to get feedback and drive decisions.  We rarely review WHY we are meeting, so it’s easy to have people not engage or make poor assumptions based on nothing more than our word choice.

The most powerful mitigation for semantic confusion is to constantly seek feedback.  Ask for feedback specifically.  Ask for feedback using the word feedback.

Does this make sense?  I’d like your feedback.