To improve flow, we must view OpenStack community as a Software Factory

This post was sparked by a conversation at OpenStack Atlanta between OpenStack Foundation board members Todd Moore (IBM) and Rob Hirschfeld (Dell/Community).  We share a background in industrial and software process and felt that the lessons of lean manufacturing translate directly to the challenges OpenStack faces.

While OpenStack has done an amazing job of growing contributors, scale has caused our code flow processes to be bottlenecked at the review stage.  This blocks flow throughout the entire system and presents a significant risk to both stability and feature addition.  Flow failures can ultimately lead to vendor forking.

Fundamentally, Todd and I felt that OpenStack needs to address system flows to build an integrated product.  This post expands on the “hidden influencers” issue and adds a further challenge: improving flow requires that the community’s influencers better understand the need to optimize inter-project work in a more systematic way.

Let’s start by visualizing the “OpenStack Factory”

Factory Floor from Alpha Industries Wikipedia page

Imagine all of OpenStack’s thousands of developers working together in a single giant start-up warehouse.  Each project has its own floor area with appropriate foosball tables, break areas and coffee bars.  It’s easy to visualize clusters of intent developers talking around tables or coding in dark corners while PTLs and TC members dash between groups coordinating work.

Expand the visualization so that we can actually see the code flowing between teams as little colored boxes.  Giving each project a unique color allows us to quickly see dependencies between teams.  Some features are piled up waiting for review inside teams, while others sit on pallets between projects, waiting on needed cross-project features that have not been completed.  At release time, we’d be able to see PTLs sorting through stacks of completed boxes to pick which ones were ready to ship.
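
The colored-box picture can be made concrete with a small model. The sketch below is a hypothetical illustration, not an existing OpenStack tool: each feature records its project (its color) and the features it depends on, and a feature counts as sitting "on a pallet" when it is waiting on unfinished work owned by a different project. The feature names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One 'colored box': a unit of work owned by a single project (hypothetical model)."""
    name: str
    project: str                      # the box's color
    done: bool = False
    depends_on: list[str] = field(default_factory=list)


def blocked_features(features: dict[str, Feature]) -> dict[str, list[str]]:
    """Map each unfinished feature to the unfinished features it is still waiting on."""
    blocked = {}
    for f in features.values():
        if f.done:
            continue
        waiting = [d for d in f.depends_on if not features[d].done]
        if waiting:
            blocked[f.name] = waiting
    return blocked


def on_pallets(features: dict[str, Feature]) -> dict[str, list[str]]:
    """Keep only blockers owned by another project: boxes stuck between teams."""
    pallets = {}
    for name, waiting in blocked_features(features).items():
        foreign = [d for d in waiting if features[d].project != features[name].project]
        if foreign:
            pallets[name] = foreign
    return pallets


# Hypothetical features, invented purely for illustration.
features = {f.name: f for f in [
    Feature("nova-port-binding", "nova"),
    Feature("neutron-routing", "neutron", depends_on=["nova-port-binding"]),
    Feature("nova-volume-api", "nova", done=True),
    Feature("cinder-attach", "cinder", depends_on=["nova-volume-api"]),
    Feature("nova-scheduler-hint", "nova", depends_on=["nova-port-binding"]),
]}

print(on_pallets(features))  # {'neutron-routing': ['nova-port-binding']}
```

Here `neutron-routing` is a box on a pallet between projects, while `nova-scheduler-hint` is piled up inside its own team; both are held up by the same unfinished Nova feature, which is the kind of dependency a view from above would reveal.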

Watching a factory floor from above is a humbling experience and a key feature of systems thinking enlightenment in both The Phoenix Project and The Goal.  It’s very easy to be caught up in a single project (local optimization) and miss the broader system implications of local choices.

There is a large body of work about Lean Process for Manufacturing

You’ve already visualized OpenStack code creation as a manufacturing floor: it’s a small step to accept that we can use the same proven processes for software and physical manufacturing.

As features move between teams (work centers), it becomes obvious that we’ve created a highly interlocked sequence of component steps needed to deliver product; unfortunately, we have minimal coordination between the owners of the work centers.  If a feature needs a critical resource (think programmer) to progress, then we rely on that resource to allocate time to the work.  Since that person’s manager may not agree with the priority, we have a conflict between system flow and individual optimization.

That conflict destroys flow in the system.

The #1 lesson from lean manufacturing is that putting individual optimization over system optimization reduces throughput.  Since our product and people managers are often competitors, we need to work doubly hard to address system concerns.  Worse yet, our inventory of work in process and the interdependencies between projects are harder to discern.  Unlike on a manufacturing floor, our developers and project leads cannot look down and see the work as it progresses from station to station in a single holistic view.  The bottlenecks that throttle the OpenStack workflow are harder to see, but we can find them, as we discuss later in this post.
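
The point that local optimization does not raise system output can be checked with a toy simulation. This is a minimal sketch under stated assumptions, not a model of OpenStack’s real pipeline: three stations in series (hypothetically labeled writing code, review and gate), fixed per-tick capacities, and a steady arrival of new work. Every number is invented for illustration.

```python
def simulate_line(capacities: list[int], arrivals_per_tick: int, ticks: int) -> tuple[int, list[int]]:
    """Simulate a serial line of work stations.

    capacities[i] is how many items station i can finish per tick.
    Returns (items finished, backlog waiting in front of each station).
    """
    queues = [0] * len(capacities)
    finished = 0
    for _ in range(ticks):
        queues[0] += arrivals_per_tick          # new work enters at the first station
        # Walk stations last-to-first so an item advances at most one station per tick.
        for i in reversed(range(len(capacities))):
            moved = min(queues[i], capacities[i])
            queues[i] -= moved
            if i + 1 < len(capacities):
                queues[i + 1] += moved          # hand off to the next station's queue
            else:
                finished += moved               # left the last station: shipped
    return finished, queues


# Hypothetical stations: write code, review, gate. Review is the slowest (2 per tick).
baseline = simulate_line([4, 2, 4], arrivals_per_tick=8, ticks=10)
local_opt = simulate_line([8, 2, 4], arrivals_per_tick=8, ticks=10)   # speed up coding only
fix_review = simulate_line([4, 4, 4], arrivals_per_tick=8, ticks=10)  # speed up the bottleneck

print(baseline)    # (16, [40, 22, 2])
print(local_opt)   # (16, [0, 62, 2])
print(fix_review)  # (32, [40, 4, 4])
```

Doubling the coding station’s capacity ships nothing extra; it only moves the pile of work in process in front of review. Doubling review capacity doubles output, and the backlog now builds in front of coding, which has become the new bottleneck. That is the argument of The Goal in miniature.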

Until we can engage the resource owners in balancing system flow, OpenStack’s throughput will decline as we add resources.  This same principle is at play in the famous aphorism: adding developers makes a late project later.

Is there a solution?

There are lessons from Lean Manufacturing that can be applied

  1. Make quality a priority (expand tests from function to integration)
  2. Ensure integration from station to station (prioritize working together over features)
  3. Make sure that owners of work are coordinating (expose hidden influencers)
  4. Find and manage from the bottleneck (classic Lean says find the bottleneck and improve that)
  5. Create and monitor a system view
  6. Have everyone value finished product, not workstation output
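
Items 4 and 5 can be made concrete with a few flow metrics per project: work in process, throughput, average time in queue and rework rate. The sketch below assumes you can export one record per change from your review system with a project, a submission time, a merge time (or none if still open) and a revision count; those field names and the sample data are hypothetical. It flags the bottleneck with Little’s law (expected wait ≈ work in process ÷ throughput), which is one reasonable heuristic rather than the only one.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean
from typing import Optional


@dataclass
class Change:
    """One unit of work in a project's queue (hypothetical record format)."""
    project: str
    submitted: datetime
    merged: Optional[datetime]   # None means the change is still waiting
    revisions: int               # revisions uploaded; more than 1 is counted as rework


def flow_metrics(changes: list[Change], window_days: float) -> dict[str, dict]:
    """Per-project WIP, throughput, average days in queue and rework rate."""
    by_project = defaultdict(list)
    for c in changes:
        by_project[c.project].append(c)
    report = {}
    for project, items in by_project.items():
        done = [c for c in items if c.merged is not None]
        waits = [(c.merged - c.submitted).total_seconds() / 86400 for c in done]
        report[project] = {
            "wip": len(items) - len(done),
            "throughput_per_day": len(done) / window_days,
            "avg_days_in_queue": mean(waits) if waits else None,
            "rework_rate": sum(c.revisions > 1 for c in done) / len(done) if done else None,
        }
    return report


def bottleneck(report: dict[str, dict]) -> str:
    """Project with the longest expected wait, using Little's law: wait ≈ WIP / throughput."""
    def expected_wait(m: dict) -> float:
        if m["throughput_per_day"] == 0:
            return float("inf")          # nothing is leaving this queue at all
        return m["wip"] / m["throughput_per_day"]
    return max(report, key=lambda p: expected_wait(report[p]))


def day(d: int) -> datetime:
    return datetime(2014, 9, d)


# Hypothetical sample: ten days of changes in two projects.
changes = [
    Change("nova", day(1), day(5), 3),
    Change("nova", day(2), None, 1),
    Change("nova", day(3), None, 2),
    Change("nova", day(4), None, 1),
    Change("neutron", day(1), day(2), 1),
    Change("neutron", day(2), day(4), 2),
    Change("neutron", day(5), None, 1),
]
report = flow_metrics(changes, window_days=10)
print(bottleneck(report))  # nova
```

In this made-up sample Nova merges fewer changes while holding more open ones, so its expected wait is longest and it is flagged; a real system view would compute the same numbers continuously from live project data.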

Added Postscript: I highly recommend reading Daniel Berrange’s email about this.

5 thoughts on “To improve flow, we must view OpenStack community as a Software Factory”

  1. Great post, but I can’t help but visualize the software manufacturing floor filled with cats wandering about, a few people with food dishes and/or snacks coaxing the cats closer, some mice hiding beneath desks and occasionally scurrying out to be chased by the nearest cats, and a few raccoons trundling about, blending in, but every so often taking out a less than observant cat nearby. What can I say? This *is* open source after all 😉 But, yes, optimizing the community will improve the quality and speed the delivery. It’s also really hard to shift even a little bit of the focus away from the developers so that people who advertise, coordinate and “influence” cross-project interactions can be effective. And that is what needs to happen on a code project and community effort the size that OpenStack has reached.


    • Thanks! Managing open projects is not that different from closed ones. It’s a mistake to think that we have to re-invent everything and/or shun corporate practices just because it’s open.


      • I tend to agree with Rob: the fact that OpenStack (generalizations are too hard) is developed collaboratively across multiple corporate boundaries with different roadmaps and interests doesn’t make it defy the laws of gravity or general business practices. There is too much literature from the old days that may have left the impression that ‘open source’ is some weird place where things happen magically, while instead it requires as much discipline and focus as any other human production.
        To pile on Rob’s proposal, what should that monitoring solution look like? You’ve seen the quarterly report I am producing for the Board: what would you like to see there or regularly on http://activity.openstack.org/dash or http://stackalytics.com fwiw?


      • Thanks!
        RE your question: I like the Activity dashboard better since it’s more system view. I’d like to see more manufacturing control type metrics like avg time in queue, throughput (in the gate), rework rates, bottleneck identification. Each project is a queue and we should visualize the work that’s waiting in that queue. I’d also like to see upstream/downstream dependencies – like a Nova feature that’s needed in Neutron and blocking.


  2. Pingback: OpenStack Community Weekly Newsletter Sept. 12-19 - OpenStack Superuser
