My Dilemma with Folsom – why I want to jump to G

When your ship sailsThese views are my own.  Based on 1×1 discussions I’ve had in the OpenStack community, I am not alone.

If you’ve read my blog then you know I am a vocal and active supporter of OpenStack who is seeking re-election to the OpenStack Board.  I’m personally and professionally committed to the project’s success. And, I’m confident that OpenStack’s collaborative community approach is out innovating other clouds.

A vibrant project requires that we reflect honestly: we have an equal measure of challenges: shadow free fall Dev, API vs implementation, forking risk and others.  As someone helping users deploy OpenStack today, I find my self straddling between a solid release (Essex) and a innovative one (Grizzly). Frankly, I’m finding it very difficult to focus on Folsom.

Grizzly excites me and clearly I’m not alone.  Based on pace of development, I believe we saw a significant developer migration during feature freeze free fall.

In Grizzly, both Cinder and Quantum will have progressed to a point where they are ready for mainstream consumption. That means that OpenStack will have achieved the cloud API trifecta of compute-store-network.

  • Cinder will get beyond the “replace Nova Volume” feature set and expands the list of connectors.
  • Quantum will get to parity with Nova Network, addresses overlapping VM IPs and goes beyond L2 with L3 feature enablement like  load balancing aaS.
  • We are having a real dialog about upgrades while the code is still in progress
  • And new projects like Celio and Heat are poised to address real use problems in billing and application formation.

Everything I hear about Folsom deployment is positive with stable code and significant improvements; however, we’re too late to really influence operability at the code level because the Folsom release is done.  This is not a new dilemma.  As operators, we seem to be forever chasing the tail of the release.

The perpetual cycle of implementing deployment after release is futile, exhausting and demoralizing because we finish just in time for the spotlight to shift to the next release.

I don’t want to slow the pace of releases.  In Agile/Lean, we believe that if something is hard then we do should it more.  Instead, I am looking at Grizzly and seeing an opportunity to break the cycle.  I am looking at Folsom and thinking that most people will be OK with Essex for a little longer.

Maybe I’m a dreamer, but if we can close the deployment time gap then we accelerate adoption, innovation and happy hour.  If that means jilting Folsom at the release altar to elope with Grizzly then I can live with that.

Open Community Access to Crowbar 2 Efforts

We’re moving along on the Crowbar2 refactoring work (`/release/rails3anddb/master` branch for now) and it’s time to start making it easier for you to participate if you are interested.

We are planning to start having TWO weekly community sprint meetings.

NOTE: Times can still shift depending on community input! We are trying to a truly global community so we need your input.

The weekly design discussions will be on Tuesdays @ 10am Central (GMT -6). The topics will be relevant to the coming sprint and we expect dialog. The topics will be determined by Greg Althaus based on progress on the refactor. You’re welcome to contact him or the list with suggestions.

The purpose of these meetings is to

  • discuss/resolve design related to the refactor
  • Document use-cases
  • identify issues that need to be addressed in the next sprint

The weekly coordination meetings will be on Thursdays @ 8am Central (GMT -6). We want to respect everyone’s time and will strictly limit these to 1 hour. This meeting is in between our internal sprint review and planning so we have flexibility to adjust our plans based on your input. It is important to us that we make it possible for you to contribute and we need your input to make sure that you are not blocked!

The coordination meetings will be structured as follows (times approximate)

  • Voice & Screencast:
  • 25 minutes for review of current progress
  • 10 minutes for feedback/adjustments on process and workflow
  • 25 minutes for planning of next sprint
  • Online discussion & notes on the etherpads
  • Identification of working groups for further discussion and coding collaboration

These meetings are the primary way that we will be making sure the community is not blocked by our development efforts.

The purpose of these meetings is

  • to synchronize quickly so that we can connect the people who should be collaborating
  • eliminate blocking items for Crowbar contributors

Of course, we will attempt to record and post all of these meetings.

Stay tuned! We will likely announce additional meetings for community collaboration.

PS: These are design meetings, they are NOT Crowbar training meetings. Please consult for links and videos about learning Crowbar.

Quick turn OpenStack Essex on Crowbar (BOOM, now we’re at v1.4!)

Don’t blink if you’ve been watching the Crowbar release roadmap!

My team at Dell is about to turn another release of Crowbar. Version 1.3 released 5/14 (focused on Cloudera Apache Hadoop) and our original schedule showed several sprints of work on OpenStack Essex. Upon evaluation, we believe that the current community developed Essex barclamps are ready now.

The healthy state of the OpenStack Essex deployment is a reflection of 1) the quality of Essex and 2) our early community activity in creating deployments based Essex RC1 and Ubuntu Beta1.

We are planning many improvements to our OpenStack Essex and Crowbar Framework; however, most deployments can proceed without these enhancements.  This also enables participants in the 5/31 OpenStack Essex Deploy Day.

By releasing a core stable Essex reference deployment, we are accelerating field deployments and enabling the OpenStack ecosystem. In terms of previous posts, we are eliminating release interlocks to enable more downstream development. Ultimately, we hope that we are also creating a baseline OpenStack deployment.

We are also reducing the pressure to rush more disruptive Crowbar changes (like enabling high availability, adding multiple operating systems, moving to Rails 3, fewer crowbarisms in cookbooks and streamlining networking). With this foundational Essex release behind us (we call it an MVP), we can work on more depth and breadth of capability in OpenStack.

One small challenge, some of the changes that we’d expected to drop have been postponed slightly. Specifically, markdown based documentation (/docs) and some new UI pages (/network/nodes, /nodes/families). All are already in the product under but not wired into the default UI (basically, a split test).

On the bright side, we did manage to expose 10g networking awareness for barclamps; however, we have not yet refactored to barclamps to leverage the change.

Four alternatives to Process Interlock

Note: This is the third and final part of 3 part series about the “process interlock dilemma.”

In post 1, I’ve spelled out how evil Process Interlock causes well intentioned managers to add schedule risk and opportunity cost even as they appear to be doing the right thing. In post 2, I offered some alternative outcomes when process interlock is avoided. In this post, I attempt to provide alternatives to the allure of process interlock. We must have substitute interlocks types to replace our de facto standard because there are strong behavioral and traditional reasons to keep broken processes. In other words, process Interlock feels good because it gives you the illusion that your solution is needed and vital to other projects.

If your product is vital to another team then they should be able to leverage what you have, not what you’re planning to have.

We should focus on delivered code instead of future promises. I am not saying that roadmaps and projections are bad – I think they are essential. I am saying that roadmaps should be viewed as potential not as promises.

  1. No future commits (No interlock)

    The simplest way to operate without any process interlock is to never depend on other groups for future deliveries. This approach is best for projects that need to move quickly and have no tolerance for schedule risk. This means that your project is constrained to use the “as delivered” work product from all external groups. Depending on needs, you may further refine this as only rely on stable and released work.

    For example, OpenStack Cactus relied on features that were available in the interim 10.10 Ubuntu version. This allowed the project to advance faster, but also limited support because the OS this version was not a long term support (LTS) release.

  2. Smaller delivery steps (MVP interlock)

    Sometimes a new project really needs emerging capabilities from another project. In those cases, the best strategy is to identify a minimum viable feature set (or “product”) that needs to be delivered from the other project. The MVP needs to be a true minimum feature set – one that’s just enough to prove that the integration will work. Once the MVP has been proven, a much clearer understanding of the requirements will help determine the required amount of interlock. My objective with an MVP interlock is to find the true requirements because IMHO many integrations are significantly over specified.

    For example, the OpenStack Quantum project (really, any incubated OpenStack projects) focuses on delivering the core functionality first so that the ecosystem and other projects can start using it as soon as possible.

  3. Collaborative development (Shared interlock)

    A collaborative interlock is very productive when the need for integration is truly deep and complex. In this scenario, the teams share membership or code bases so that the needs of each team is represented in real time. This type of transparency exposes real requirements and schedule risk very quickly. It also allows dependent teams to contribute resources that accelerate delivery.

    For example, our Crowbar OpenStack team used this type of interlock with the Rackspace OpenStack team to ensure that we could get Diablo code delivered and deployed as fast as possible.

  4. Collaborative requirements (Fractal interlock)

    If you can’t collaborate or negotiate an MVP then you’re forced into working at the requirements level instead of development collaboration. You can think of this as a sprint-roadmap fast follow strategy because the interlocked teams are mutually evolving design requirements.

    I call this approach Fractal because you start at big concepts (road maps) and drill down to more and more detail (sprints) as the monitored project progresses. In this model, you interlock on a general capability initially and then work to refine the delivery as you learn more. The goal is to avoid starting delays or injecting false requirements that slow delivery.

    For example, if you had a product that required power from hamsters running in wheels then you’d start saying that you needed a small fast running animal. Over the next few sprints, you’d likely refine that down to four legged mammals and then to short tailed high energy rodents. Issues like nocturnal or bites operators could be addressed by the Hamster team or by the Wheel team as the issues arose. It could turn out that the right target (a red bull sipping gecko) surfaces during short tail rodent design review. My point is that you can avoid interlocks by allowing scope to evolve.

Breaking Process Interlocks delivers significant ROI

I have been trying to untangle both the cause and solution of process interlock for a long time. My team at Dell has an interlock-averse culture and it accelerates our work delivery. I write about this topic because I have real world experience that eliminating process interlocks increases

  1. team velocity
  2. collaboration
  3. quality
  4. return on investment

These are significant values that justify adoption of these non-interlock approachs; however, I have a more selfish motivation.

We want to work with other teams that are interlock-averse because the impacts multiply. Our team is slowed when others attempt to process interlock and accelerated when we are approached in the ways I list above.

I suspect that this topic deserves a book rather than a three part blog series and, perhaps, I will ultimately create one. Until then, I welcome your comments, suggestions and war stories.

How Good beats Great and avoids Process Interlock failure

Note: This is part 2 of a 3 part series about the “process interlock dilemma.”

This post addresses how to solve the Process Interlock dilemma I identified in part 1. It is critical to understand the failure of Process Interlock comes because the interlocks turn assumptions into facts. We must accept that any forward looking schedule is a guess. If your guesses are accurate then your schedule should be accurate. That type of insight and $5 will get you a Venti Carmel Frappuccino.

The problem of predicting the future and promising to deliver on that schedule results in one of two poor outcomes.

  1. The better poor outcome is that you are accurate and committed to a schedule.

    To keep on the schedule, you must focus on the committed deliverables. While this sounds ideal, there an opportunity cost to staying focused. Opportunity cost means that while your team is busy delivering on schedule, it is not doing work to pursue other opportunities. In a perfect world, your team picked the most profitable option before it committed the schedule. If you don’t live in a perfect world then it’s likely that while you are working on deliver you’ve learned about another opportunity. You may make your schedule but miss a more lucrative opportunity.

  2. The worse poor outcome is that you are not accurate and committed to a schedule.

    In that case, you miss both the opportunity you thought you had and the ones that you could not pursue while staying dedicated to your planning assumptions.

Let’s go back to our G.Mordler example and look at some better outcomes:

The “we’re going to try outcome.”

The Trans Ma’am team, Alpha, Omega and the supplier all get together and realize that the current design is not shippable; however, they realize that each team’s roadmaps converge within target time. To reduce interlocks, Omega takes Alpha in the low power form and begins integration. During integration, Omega identifies that Alpha can produce sufficient power for short periods of time travel but causes the exhaust vent of the power module to melt. Alpha determines that a change to the cooling system will address the problem. In consulting with their supplier, Alpha asks them to stop design on the new supply and adjust the current design as needed. The resulting time drive does not meet GM’s initial design for 4 hour time jumps, but is sufficient for lead footed mommies to retroactively avoid speeding tickets. GM decides it can still market the limited design.

The “we’re not ready outcome.

The Trans Ma’am team, Alpha and Omega all get together and realize that the current design is not shippable in their current state. While they cannot commit each realizes that there is a different market for their products: Alpha pursues dog poop power generation for high rises condo towers (aka brown energy) and Omega finds military applications for time travel nuclear submarines. In the experience gained from delivering products to these markets, Alpha improves power delivery by 20% and Omega improves efficiency by 20%. These modest mutual improvements allow Alpha to meet Omega’s requirement. While the combined product is too late for the target date, GM is able to incorporate the design into next design cycle.

While neither outcome delivers the desired feature at the original schedule, both provide better ROI for the company. One of the most common problems with process interlock is that we lost sight of ROI in our desire to meet an impractical objective.

Process interlock is a classic case of point optimization driving down system-wide performance.

If you’re interested in this effect, I recommend reading Eli Goldratt’s The Goal.

In the part, I’ve discussed some ways to escape from Process Interlock. I’ll talk about four alternative approaches in part 3 (to be published 3/16).

The Process Interlock Dilemma – where Roadmaps get lost and why Waterfalls suck

Note: This is part 1 of a 3 part series. I have been working on this series for nearly six months in an attempt to make this subtle but extremely expensive problem understandable. Rather than continue to polish the posts, I will post series for your enjoyment. I hope that it is enlightening, humorous or (ideally) both. Comments are welcome!

I’ve been struggling to explain a subtle process fail that occurs every day at my company (Dell) and also at every company I’ve ever worked with or for. I call this demon “Process Interlock” and it is the invisible bane of projects big and small. It manifests by forcing well-meaning product managers and engineering directors to make trade-offs that they know are wrong because of schedule commitments. It means that product quality consistently drops to the bottom of the list in favor of getting in that one promised feature. It shows up when customers get products late because of prospect who decided not to buy demanded a feature a year ago. These are the symptoms of the process interlock dilemma.

Process Interlock occurs when another team depends on your team for a future feature.

That sounds pretty innocuous right? It makes sense that other teams, customers and partners should be able to ask you about your roadmap and then build your delivery schedule into their plans. That is the perfectly logical request that happens inside my group every single day. Unfortunately, that exact commitment is what creates the problem because it locks your team’s velocity into the future and eliminates agility.

Note: I was reading chapter 11 in Eric Ries’ Lean Startup as was surprised to find him making very similar arguments but from a different perspective.

To hopefully help explain, I’m inventing a hypothetical project from the car division of the G.Mordler company. GM plans to add time travel as an option for their 2016 product line. They believe that there is a big market in minivan’s that can solve the proverbial “are we there yet problem” by simply skipping over the boring part of the trip. The trans-dimensional mommy mobile (or Trans Ma’am) will be part of a refresh of their 2014 model. The addition of a time circuit and power generator developed two internal divisions, Alpha and Omega, support a critical marketing event for the company so timing is important.

Let’s examine four outcomes of how these two divisions turn their assumed schedules into rigidly locked conundrum.

Scenario 0: Ideal Case.

Alpha makes the fusion power supply and Omega is making the time circuits. Based on experimental data, Omega’s design calls for 3.14 Gigawatts to operate their time capacitor; however, Alpha’s available design is limited to 0.73 Gigawatts. Alpha expects to reach 3.5 Gigawatts in 9 months when their supplier releases an updated nitrogen cooled super conductor. Based on that commitment, Omega has enough information to make an informed decision about their timeline. Since Alpha commits to deliver in 12 months (9 for the new part + 3 for development), Omega expects to deliver a working time circuit in 20 months (12 for the supply + 8 for development). In this example, there are 3 levels of Process Interlock: Alpha interlocks with the supplier and then Omega interlocks with Alpha. From a PERT schedule perspective, the world is now under control! It’s a brand new day and the birds are singing…

Scenario 1: Meet Schedule w/ Added Cost

Unfortunately, we now have a highly interlocked schedule. In the best case scenario (the one where we meet the schedule), Alpha has just signed up to meet an aggressive delivery timeframe. They have to put heavy pressure on the supplier to deliver their part which causes the supplier to increase the price for the cooler component. When their product manager identifies available alternative markets (such as power generating pet waste incineration), they are not able to purse the opportunities because they cannot risk the schedule impact of redirecting engineers. Meanwhile, Omega understands that a critical part is missing for 12 months and decides to reduce staffing while waiting for the needed part. In the process, they lose a key engineer who could have optimized the manufacturing process to half the production defect rate. Overall, the project meets schedule but at added cost, reduced quality and missed opportunities. This happened because the interlocks eliminated flexibility in the schedule for upstream and downstream participants. GM meets the launch window for the Trans Ma’am but high costs for the upgrade limit sales.

Scenario 2: Meet Schedule w/ Lost Features

A more likely “on schedule” alternative is that Alpha’s supplier cuts some corners to meet the aggressive deadline; consequently, power generation for Alpha is not reliable. This issue is not revealed by load testing in Alpha’s labs or short time travel testing by Omega. Instead, the faulty generators fail in integration field testing accidentally sending a DOT test driver home during rush hour traffic. Fixing the problem requires a redesign of the power plant. The new design does not fit into space allowed by the Trans Ma’am design team causing the entire program, while delivered “on time,” to be considered a failure and not shipped. GM misses the launch window for the Trans Ma’am.

Scenario 3:
Miss Schedule

In the most likely scenario the project is late. The schedule for Alpha slips because supplier requires an extra three months to meet the Alpha’s specs. In a common turn of fate, the supplier’s specs would be sufficient for Alpha to proceed; however, Alpha’s risk manager bumped up the cooling requirements by 20% in order to ensure they had wiggle room in their own design. Because of the supplier contract requiring delivery per spec, the supplier could not ship a workable but contractually unacceptable product. Since the part is delayed, Alpha has to slip the schedule to Omega. Compounding the problem, Alpha’s manager is optimistic that it will work out and does not alert Omega until 2 weeks before the deadline. Omega, who has been testing their circuits using liquid sodium cooled nuclear fission power plants, attempts to make up the schedule delay by imposing 20 hour Mountain Dew fueled work days. The aggressive schedule results in quality issues for the time circuits so that they can only be used during Mountain-time rebroadcasts of Seinfeld. After an unsuccessful bid to purchase the Denver cable TV station KDEV, GM misses the launch window for the Trans Ma’am.

I realize these examples are complicated, but I hope they humorously illuminate the problem.

In part 2, I’ll show an alternate approach for GM that addresses the process interlock.

Post Script

Of course, for this example, the entire project plan is a moot point since we’re talking about time machines! I’m offering two likely endings for the scenarios above:

The Pragmatists’ Ending: Once the project is finally complete, the manager simply drives the car back to the beginning of the project. Over white Russian martinis and sushi, her future self explains how the painful delivery schedule cost her the best years of her life causing her to quit. Her replacement cannot maintain funding for the project so it is eventually scraped by G.Mordler six months before the working pieces can be assembled.

The Realists’ Ending: Once the project is finally complete, the manager simply drives the car back to the beginning of the project. Over lemonade vodka tonics and tapas, her future self provides a USB stick with the critical design data needed to complete the project on time and budget. When she examines the data, the resulting time paradox creates a rift in the Einstein-Jacob space-time fabric thus ending the universe.