Your baby is ugly! Picking which code is required for Commercial Core.

There’s no point in sugar-coating this: selecting API and code sections for core requires making hard choices and saying no.  DefCore makes this fair by 1) defining principles for selection, 2) going slooooowly to limit surprises and 3) being transparent in operation.  When you’re telling someone that their baby is not handsome enough, you’d better be able to explain why.

The truth is that from DefCore’s perspective, all babies are ugly.  If we are seeking stability and interoperability, then we’re looking for adults not babies or adolescents.

Explaining why is exactly what DefCore does by defining criteria and principles for our decisions.  When we do it right, it also drives a positive feedback loop in the community because the purpose of designated sections is to give clear guidance to commercial contributors where we expect them to be contributing upstream.  By making this code required for Core, we are incenting OpenStack vendors to collaborate on the features and quality of these sections.

This does not lessen the undesignated sections!  Contributions in those areas are vital to innovation; however, they are, by design, more dynamic, specialized or single vendor than the designated areas.

The seven principles of designated sections (see my post with TC member Michael Still) as defined by the Technical Committee are:

Should be DESIGNATED:

  1. code provides the project external REST API, or
  2. code is shared and provides common functionality for all options, or
  3. code implements logic that is critical for cross-platform operation

Should NOT be DESIGNATED:

  1. code interfaces to vendor-specific functions, or
  2. project design explicitly intended this section to be replaceable, or
  3. code extends the project external REST API in a new or different way, or
  4. code is being deprecated

While the seven principles inform our choices, DefCore needs some clarifications to ensure we can complete the work in a timely, fair and practical way.  Here are our additions:

8.     UNdesignated by Default

  • Unless code is designated, it is assumed to be undesignated.
  • This aligns with the Apache license.
  • We have a preference for smaller core.

9.      Designated by Consensus

  • If the community cannot reach a consensus about designation then it is considered undesignated.
  • Time to reach consensus will be short: days, not months.
  • Barring obvious trolling, this prevents endless wrangling.
  • If there’s a difference of opinion then the safe choice is undesignated.

10.      Designated is Guidance

  • Loose descriptions of designated sections are acceptable.
  • The goal is guidance on where we want upstream contributions not a code inspection police state.
  • Guidance will be revised per release as part of the DefCore process.

In my next DefCore post, I’ll review how these 10 principles are applied to the Havana release that is going through community review before Board approval.

Supply Chain Transparency drives Open Source adoption, 6 reasons besides cost

Author’s note: If you don’t believe that software is manufactured then go directly to your TRS-80, do not collect $200.

I’m becoming increasingly impatient with people stating that “open source is about free software” because it’s blatantly untrue as a primary driver for corporate adoption.  Adopting open source often requires companies (and individuals) to trade off one cost (license expense) for another (building expertise).  It is exactly the same balance we make between insourcing, partnering and outsourcing.

Full Speed Ahead

When I probe companies about what motivates their use of open source, they universally talk about transparency of delivery, non-single-vendor ownership of the source and their ability to influence as critical selection factors.  They are generally willing to invest more to build expertise if it translates into these benefits.  Viewed in this light, licensed software or closed services both cost more and introduce significant business risks where open alternatives exist.

This is not new: it’s basic manufacturing applied to IT

We had this same conversation in the 90s around manufacturing as that industry joltingly shifted from batch to just-in-time (aka Lean) manufacturing.  The key driver for that transformation was improved integration and management of supply chains.  We could review witty doctoral dissertations about inventory, drum-buffer-rope flow and economic order quantity; however, trust my summary: it all comes down to companies needing supply chain transparency.

As technology becomes more and more integral to delivering any type of product, companies must extend their need for supply chain transparency into their IT systems too.   That does not mean that companies expect to self-generate (insource) all of their technology.  The goal is to manage the supply chain, not to own every step.   Smart companies find a balance between control of owning their supply (making it themselves) and finding a reliable supply (multi-source is preferred).  If you cannot trust your suppliers then you must create inventory buffers and rigid contracts.  Both of these defenses limit agility and drive systemic dysfunction.  This was the lesson learned from Lean Just-In-Time manufacturing.

What does this look like for IT supply chains?

A healthy supply chain allows companies to address these issues.  They can:

  1. Change vendors / suppliers and get equivalent supply
  2. Check the status of deliveries (features)
  3. Review and impact quality
  4. Take deliverables in small frequent batches
  5. Collaborate with suppliers to manage & control the process
  6. Get visibility into the pipeline

None of these items are specific to software; instead, they are general attributes of a strong supply chain.  In a closed system, companies lose these critical supply chain values.  While tightly integrated partnerships can provide these benefits, they carry a cost premium and inherently limit vendor choice.

This sounds great!  What’s the cost?

You need to consider the level of supply chain transparency that’s right for you.  Most companies are no more likely to refine their own metal than to build from pure open source repositories.  There are transparency benefits from open source even from a single supplier.  Yet in some cases, like the OpenStack community, systems are so essential that they warrant investment as core competencies and joining the contributing community.  Even in those cases, most rely on vendors to package and extend their chosen open source software.

But that misses the point: contributing to an open source project is not required to manage your IT supply chain.  Instead, you need to build the operational infrastructure and processes that are open source ready.  That may require investing in skills and capabilities related to underlying technologies like the operating system, database or configuration management.  For cloud, it is likely to require more investment in fault-tolerant architecture and API-driven deployment.  Companies that are strong in these skills are better able to manage an open source IT supply chain.  In fact, they are better able to manage any IT supply chain because they have more control.

So, it’s not about cost…

When considering motivations for open source adoption, cost (or technology sizzle) should not be the primary factor.  In my experience, the most successful implementations focus first on operational readiness, project stability and program transparency.  These priorities indicate companies are thinking with an IT supply chain focus.

PS: If you found this interesting, you’ll also like my upstream imperative post.

OpenStack’s Test Driven Core > it’s where I think “what is core” discussions are heading

THIS POST IS #7 IN A SERIES ABOUT “WHAT IS CORE.”

In helping drive OpenStack’s “what is core” dialog, I’ve had the privilege of listening to a lot of viewpoints about what we are and should be.  Throughout the process, I’ve tried to put aside my positions and be an objective listener.  In this post, I’m expressing where I think this effort will lead us.

If OpenStack culture values implementation over API then our core definition should too.

How do we make a core definition that values implementation over API?  I think that our definition should be based on what is working in the field rather than on qualitative definitional statements.  The challenge in defining core is to find a way to reinforce this culture in a quantifiable way.

The path forward lies in concrete decomposition (and not because you were talked to death on the sidewalk).

Concrete decomposition means breaking Core into small units for discussion like “is provisioning a single server critical?”  More importantly, we can use tests as the unit for decomposition.  Tests are gold when it comes to defining expected OpenStack behaviors.  In our tests, we have a description of which use-cases have been implemented.  Discussing those use-cases is much more finite than arguing over stable versus innovative development methodologies.
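
To make this concrete, here is a rough sketch (in the spirit of a Tempest test, but not actual Tempest code) of how the “provision a single server” use-case decomposes into one small, discussable check.  The endpoint, token, image and flavor references are hypothetical placeholders.

```python
# Sketch only: the "boot one server" use-case expressed as an API-level check.
# Endpoint URL, token, image and flavor IDs are hypothetical placeholders.
import time
import requests

COMPUTE = "http://example-cloud:8774/v2.1"   # placeholder Nova endpoint
HEADERS = {"X-Auth-Token": "PLACEHOLDER_TOKEN", "Content-Type": "application/json"}

def test_provision_single_server():
    # Ask the cloud to boot one server.
    resp = requests.post(f"{COMPUTE}/servers", headers=HEADERS, json={
        "server": {"name": "core-check",
                   "imageRef": "IMAGE_ID",
                   "flavorRef": "1"}})
    assert resp.status_code in (200, 202)
    server_id = resp.json()["server"]["id"]

    # The observable promise of the use-case: the server reaches ACTIVE,
    # regardless of how the cloud implements provisioning internally.
    for _ in range(60):
        status = requests.get(f"{COMPUTE}/servers/{server_id}",
                              headers=HEADERS).json()["server"]["status"]
        if status == "ACTIVE":
            break
        time.sleep(5)
    assert status == "ACTIVE"

    # Clean up so the check can be re-run.
    requests.delete(f"{COMPUTE}/servers/{server_id}", headers=HEADERS)
```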

I believe that we are moving toward community tests playing an essential role in OpenStack.  As a believer in the value of BDD and CI, I think that placing high value on tests improves the project in fundamental ways beyond defining Core.  It creates a commercial motivation for contributors to add tests, inches us toward interoperability, and helps drive stability for users.  In these ways, using tests to measure OpenStack drives the right behaviors for the project.

Another consequence I anticipate is a new role for the User Committee (UC).  With a growing body of tests, the OpenStack Foundation needs a way to figure out the subset of tests which are required.  While the Technical Committee (TC) should demand a comprehensive suite of tests for all projects, they lack the perspective to figure out which use-cases are being implemented by our user base.  Gathering that data is already the domain of the UC so asking them to match implemented use-cases to tests seems like a natural extension of their role.

By having data supporting the elevation of tests to must-pass status, I envision a definition of Core that is based on how OpenStack is implemented.  That, in turn, will help drive our broader interoperability objectives.
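
As a toy illustration of what “data supporting the elevation of tests” could look like: given survey numbers for how widely each tested use-case is deployed, the must-pass subset falls out of an agreed threshold.  The test names, percentages and cut line below are all invented.

```python
# Toy illustration: choose "must-pass" tests from hypothetical deployment data.
usage = {  # fraction of surveyed clouds implementing each tested use-case
    "test_boot_server": 0.98,
    "test_attach_volume": 0.87,
    "test_object_storage_acl": 0.41,
    "test_baremetal_enroll": 0.12,
}
THRESHOLD = 0.80  # the cut line is set by the community, not by this script

must_pass = sorted(t for t, share in usage.items() if share >= THRESHOLD)
print(must_pass)  # ['test_attach_volume', 'test_boot_server']
```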

OpenStack steps toward Interoperability with Tempest, RAs & RefStack.org

I’m a cautious supporter of OpenStack leading with implementation (over API specification); however, it clearly has risks. OpenStack has the benefit of many live sites operating at significant scale. The short term cost is that those sites were not fully interoperable (progress is being made!). Even if they were, we lack the means to validate that they are.

The interoperability challenge was a major theme of the Havana Summit in Portland last week (panel I moderated).  Solving it creates significant benefits for the OpenStack community and significant financial opportunities for the OpenStack ecosystem.

This is a journey that we are on together – it’s not a deliverable from a single company or a release that we will complete and move on.

There were several themes that Monty and I presented during Heat for Reference Architectures (slides).  It’s pretty obvious that interop is valuable (I discuss why you should care in this earlier post) and that running a cloud means dealing with hardware, software and ops in equal measures.  We also identified lots of important items like Open Operations, Upstreaming, Reference Architecture/Implementation and Testing.

During the session, I think we did a good job stating how we can use Heat for an RA to make incremental steps.  I also had a session about upgrades (slides).

Even with all this progress, testing for interoperability was one of the largest gaps.

The challenge is not whether we should test, but how to create a set of tests that everyone will accept as adequate.  Approaching that goal with a standardization or specification objective is likely impossible.

Joshua McKenty & Monty Taylor found a starting point for interoperability FITS testing: “let’s use the Tempest tests we’ve got.”

We should question the assumption that faithful implementation test specifications (FITS) for interoperability are only useful with a matching specification and significant API coverage.  Any level of coverage provides useful information and, more importantly, visibility accelerates contributions to the test base.

I can speak from experience that this approach has merit.  The Crowbar team at Dell has been including OpenStack Tempest as part of our reference deployment since Essex and it runs as part of our automated test infrastructure against every build.  This process does not catch every issue, but passing Tempest is a very good indication that you’ve got a workable OpenStack deployment.
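
For anyone wiring something similar into their own pipeline, here is a minimal sketch of gating a build on a Tempest smoke run.  It assumes the modern `tempest run --smoke` CLI (older releases drove Tempest through nose/testr instead), and the workspace path is a placeholder.

```python
# Minimal sketch of failing a build when the interoperability smoke tests fail.
# Assumes a configured Tempest workspace; path and CLI usage are illustrative.
import subprocess
import sys

def gate_build_on_tempest(workspace="/opt/tempest"):  # hypothetical path
    result = subprocess.run(["tempest", "run", "--smoke"], cwd=workspace)
    if result.returncode != 0:
        # Reject the build: the deployment did not demonstrate baseline interop.
        sys.exit("Tempest smoke tests failed; rejecting this build")

if __name__ == "__main__":
    gate_build_on_tempest()
```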

The Atlantic magazine explains why Lean process rocks (and saves companies $$)

I’m certain that the Atlantic‘s Charles Fishman was not thinking about software and DevOps when he wrote the excellent article “The Insourcing Boom.”  However, I strongly recommend this report for anyone interested in a practical, non-software example of lean process at work (if you are impatient, jump to page 2 and search for toaster).

It’s important to realize that this article is not about software! It’s an article about industrial manufacturing and the impact that lean process has when you are making stuff.  It’s about how US companies are using Lean to make domestic plants more profitable than Asian ones.  It turns out that how you make something really matters – you can’t really optimize the system if you treat major parts like a black box.

When I talk about Agile and Lean, I am talking about proven processes being applied broadly to companies that want to make a profit selling stuff. That’s what this article is about.

If you are making software then you are making stuff! Your install and deploy process is your assembly line. Your unreleased code is your inventory.

This article does a good job explaining the benefits of being close to your manufacturing (DevOps) and being flexible in deployment (Agile) and being connected to customers (Lean).  The software industry often acts like it’s inventing everything from scratch. When it comes to manufacturing processes, we can learn a lot from industry.

Unlike software, industry has real costs for scrap and lost inventory. Instead of thinking “old school” perhaps we should be thinking of it as the school of hard knocks.

Open Source is The Power of We (Blog Action Day)

This post is part of a worldwide “blog action day” where thousands of bloggers post their unique insights about a single theme. For 2012, the theme is “the power of we”: a celebration of people working together to make a positive difference in the world, either for their own communities or for people they will never meet halfway around the world.

I’ve chosen open source software because I think that we are establishing models for building ideas collaboratively that can be extended beyond technology into broader use. The way we solve open source challenges translates broadly because we are the tool makers of global interaction.

I started using open source¹ as a way to solve a problem; I did not understand community or how groups of loosely connected people came together to create something new. Frankly, the whole process of creating free software seemed to be some hybrid combination of ninja coders and hippy hackers. That changed when I got involved on the ground floor of the OpenStack project (of which I am now a Foundation board member).

I was not, could not have been, prepared for the power and reality of community and collaboration that fuels OpenStack and other projects. We have the same problems as any non-profit project except that we are technologists: we can make new tools to solve our teaming and process problems.

It is not just that open source projects solve problems that help people. The idea of OpenStack and Hadoop being used by medical researchers to find cures for cancer is important; however, learning how to build collaboratively is another critical dimension. Our world is getting more connected and interconnected by technology, but the actual tools for social media are only in their earliest stages.

Not only are the tools evolving, the people using the tools are changing too! We are training each other to work together in ways that were beyond our imagination even 10 years ago. It’s the combination of both new technology and new skills that is resetting the rules for collaboration.

Just a few years ago, open source technology was considered low quality, risky and fringe. Today, open source projects like OpenStack and Hadoop are seen as more innovative and equally secure and supportable compared to licensed products. This transformation represents a surprising alignment and collaboration between individuals and entities that would normally be competing. While the motivation for this behavior comes from many sources, we all share the desire to collaborate effectively.

I don’t think that we have figured out how to really do this the best way yet. We are making progress and getting better and better. We are building tools (like etherpad, wikis, irc, twitter, github, jenkins, etc.) that improve collaboration. We are also learning to build a culture of collaboration.

Right now, I’m on a train bound for the semi-annual OpenStack summit that brings a world wide audience together for 4½ days of community work. The discussions will require a new degree of openness from people and companies that are normally competitive and secretive about product development. During the summit, we’ll be doing more than designing OpenStack, we will be learning the new skills of working together. Perhaps those are the most important deliverables.

Open source projects’ combination of both new technologies and new skills creates the Power of We.

——————

PS¹: Open source software is a growing class of applications in which the authors publish the instructions for running the software publicly so that other people can use the software. Sometimes (but not always) this includes a usage license that allows other people to run the software without paying the author royalties. In many cases, the author’s motivation is that other users will help them test, modify and improve the software so that it improves more quickly than a single creator could do alone.

Continuous Release combats disruptions of “Free Fall” development

Since I posted the “Free Fall” development post, I’ve been thinking a bit about the pros and cons of this type of off-release development.

The OpenStack Swift project does not do free fall because they maintain a constant “ship ready” state for the project and only loosely follow the broader OpenStack release track.  My team at Dell also has minimal free fall development because we have a more frequent release clock and choose to have the team focus together through dev/integrate/harden cycles as much as possible.

From a Lean/Agile/CI perspective, I would work to avoid hidden development where possible.  New features are introduced by split test (they are in the code, but not active for most users) so that all changes are incremental.  That means that refactoring, re-architecture and new capabilities appear less disruptively.  While this approach may appear to take more effort in the short term, my experience is that it accelerates delivery because we are less likely to over-develop code.
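
A minimal sketch of the split-test idea, assuming a simple in-process flag store (the flag name, rollout percentage and scheduler functions are invented for illustration): both code paths ship in the release, but the new one stays dark for most users.

```python
# Sketch of shipping a feature "dark" behind a flag; all names are invented.
import hashlib

FLAGS = {"new_scheduler": {"enabled": True, "rollout": 0.05}}  # dark for ~95% of users

def flag_on(name, user_id):
    """Return True if user_id falls inside the flag's rollout bucket."""
    flag = FLAGS.get(name)
    if not flag or not flag["enabled"]:
        return False
    bucket = int(hashlib.sha1(f"{name}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < flag["rollout"] * 100

def legacy_scheduler(request):
    return f"legacy placement for {request}"

def new_scheduler(request):
    return f"new placement for {request}"

def schedule(request, user_id):
    # Both paths are in the released code; the flag decides which one runs.
    if flag_on("new_scheduler", user_id):
        return new_scheduler(request)
    return legacy_scheduler(request)

print(schedule("vm-42", user_id="alice"))
```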

Unfortunately, free fall development has the opposite effect.  Having code that appears in big blocks is contrary to best practices in my opinion.  Further, it rewards groups that work asynchronously and outside of the shared release rhythm.

While I think that OpenStack benefits from free fall work, it is ultimately counter-productive.

Ballistic Release Cycles: Tracking the Trajectory of OpenStack Milestones

I’ve been watching a pattern emerge on the semiannual OpenStack release cycles for a while now. There is a hidden but crucial development phase that accelerates projects faster than many observers realize. In fact, I believe that substantial work is happening outside of the “normal” design cycle during what I call “free fall” development.

Understanding when the cool, innovative stuff happens is essential to getting (and giving) the most from OpenStack.

The published release cycle looks like a 6-stage ballistic trajectory. Launching at the design summit, the release’s features change and progress the most in the first 3 milestones. At the apogee of the release, maximum velocity is reached just as we start having to decide which features are complete enough to include in the release. Since many are not ready, we have to jettison (really, defer) partial work to ensure that we can land the release on schedule.

I think of the period where we lose potential features as free fall because things can go in any direction. The release literally reverses course: instead of expanding, it is contracting. This process is very healthy for OpenStack. It favors code stability and “long” hardening times. For operators, this means that the code stops changing early enough that we have more time to test and operationalize the release.

But what happens to the jettisoned work? In free fall, objects in motion stay in motion. The code does not just disappear! It continues on its original upward trajectory.

The developers who invested time in the code do not simply take a 3 month sabbatical, nor do they stop their work and start testing the code that was kept. No, after the short in/out sorting pause, the free fall work continues onward with rockets blasting. The challenge is that it is now getting outside of the orbit of the release plan and beyond the radar of many people who are tracking the release.

The consequence of this ongoing development is that developers (and the features they are working on) show up at the summit with 3 extra months of work completed. It also means that OpenStack starts each release cycle with a bucket of operationally ready code. Wow, that’s a huge advantage for the project in terms of delivered work, feature velocity and innovation. Even better, it means that the design summit can focus on practical discussions of real prototypes and functional features.

Unfortunately, this free fall work has hidden costs:

  • It is relatively hidden because it is outside of the normal release cycle.
  • It makes true design discussions less productive because the implemented code is more likely to make the next release cycle
  • Integration for the work is postponed because it continues before branching
  • Teams that are busy hardening a core feature can be left out of work on the next iteration of the same feature
  • Forking can make it hard to capture bugs caught during hardening

I think OpenStack greatly benefits from free fall development; consequently, I think we need to acknowledge and embrace it to reduce its costs. A more explicit mid-release design synchronization when or before we fork may help make this hidden work more transparent.

Our Vision for Crowbar – taking steps towards closed loop operations

When Greg Althaus and I first proposed the project that would become Dell’s Crowbar, we had already learned first-hand that there was a significant gap in both the technologies and the processes for scale operations. Our team at Dell saw that the successful cloud data centers were treating their deployments as integrated systems (now called DevOps) in which the configuration of many components was coordinated and orchestrated; however, these approaches fell short of the mark in our opinion. We wanted to create a truly integrated operational environment from the bare metal through the networking up to the applications and out to the operations tooling.

Our ultimate technical nirvana is to achieve closed-loop continuous deployments. We want to see applications that constantly optimize new code, deployment changes, quality, revenue and cost of operations. We could find parts, but not a complete and adequate foundation, for this vision.

The business driver for Crowbar is system thinking around improved time to value and flexibility. While our technical vision is a long-term objective, we see very real short-term ROI. It does not matter if you are writing your own software or deploying applications; the faster you can move that code into production, the sooner you get value from innovation. It is clear to us that the most successful technology companies have reorganized around speed to market and adapting to the pace of change.

System flexibility & acceleration were key values when the lean manufacturing revolution gave Dell a competitive advantage, and they have proven even more critical in today’s dynamic technology innovation climate.

We hope that this post helps define a vision for Crowbar beyond the upcoming refactoring. We started the project with the idea that new tools meant we could take operations to a new level.

While that’s a great objective, we’re too pragmatic in delivery to rest on a broad objective. Let’s take a look at Crowbar’s concrete strengths and growth areas.

Key strength areas for Crowbar

  1. Late binding – hardware and network configuration is held until the software configuration is known (a toy sketch follows this list).  This is a huge system concept.
  2. Dynamic and Integrated Networking – means that we treat networking as a 1st class citizen for ops (sort of like software defined networking but integrated into the application)
  3. System Perspective – no Application is an island.  You can’t optimize just the deployment, you need to consider hardware, software, networking and operations all together.
  4. Bootstrapping (bare metal) – while not “rocket science” it takes a lot of careful effort to get this right in a way that is meaningful in a continuous operations environment.
  5. Open Source / Open Development / Modular Design – this problem is simply too complex to solve alone.  We need to get a much broader net of environments and thinking involved.
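
Here is the late-binding sketch promised above, assuming invented role names and attributes: a node’s hardware and network settings stay unresolved until a software role is assigned, and only then is the matching profile bound.

```python
# Toy sketch of late binding: hardware/network settings are kept as unresolved
# intents and materialized only once the software role is known.
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class NodeIntent:
    hostname: str
    raid: Optional[str] = None                      # intentionally undecided at discovery
    nics: Dict[str, str] = field(default_factory=dict)  # also decided late

# Hypothetical mapping from software role to the hardware/network it needs.
ROLE_PROFILES = {
    "swift-storage": {"raid": "jbod",  "nics": {"storage": "10g"}},
    "nova-compute":  {"raid": "raid1", "nics": {"tenant": "10g", "mgmt": "1g"}},
}

def bind(node: NodeIntent, role: str) -> NodeIntent:
    """Resolve hardware and network configuration only once the role is known."""
    profile = ROLE_PROFILES[role]
    node.raid = profile["raid"]
    node.nics = dict(profile["nics"])
    return node

node = bind(NodeIntent("node-001"), "swift-storage")
print(node)  # RAID layout and NIC wiring were not fixed until the role was assigned
```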

Continuing Areas of Leadership

  1. Open / Lean / Incremental Architecture – these are core aspects of our approach.  While we have a vision, we also are very open to ways that solve problems faster and more elegantly than we’d expected.
  2. Continuous deployment – we think the release cycles are getting faster and the only way to survive is the build change into the foundation of operations.
  3. Integrated networking – software defined networking is cool, but not enough.  We need to have semantics that link applications, networks and infrastructure together.
  4. Equivalent physical / virtual – we’re not saying that you won’t care if it’s physical or virtual (you should), we think that it should not impact your operations.
  5. Scale / Hybrid – the key element to hybrid is scale, and vice versa.  The missing connection is being able to close the loop.
  6. Closed loop deployment – seeking load management, code quality, profit, and cost of operations as factors in managed operations.

Crowbar 2.0 Objectives: Scalable, Heterogeneous, Flexible and Connected

The seeds for Crowbar 2.0 have been in the 1.x code base for a while and were recently accelerated by SuSE.  With the Dell | Cloudera 4 Hadoop and Essex OpenStack-powered releases behind us, we will now be totally focused on bringing these seeds to fruition in the next two months.

Getting the core Crowbar 2.0 changes working is not a major refactoring effort in calendar time; however, it will impact current Crowbar developers by changing and improving the programming APIs. The Dell Crowbar team decided to treat this as a focused refactoring effort because several important changes are tightly coupled. We cannot solve them independently without causing a larger disruption.

All of the Crowbar 2.0 changes address issues and concerns raised in the community and are needed to support the expansion of our OpenStack and Hadoop application deployments.

Our technical objective for Crowbar 2.0 is to simplify and streamline development efforts as the development and user community grows. We are seeking to:

  1. simplify our use of Chef and eliminate Crowbar requirements in our Opscode Chef recipes.
    1. reduce the initial effort required to leverage Crowbar
    2. open Crowbar to a broader audience (see Upstreaming)
  2. provide heterogeneous / multiple operating system deployments. This enables:
    1. multiple versions of the same OS running for upgrades
    2. different operating systems operating simultaneously (and deal with heterogeneous packaging issues)
    3. accommodation of no-agent systems like locked systems (e.g.: virtualization hosts) and switches (aka external entities)
    4. UEFI booting in Sledgehammer
  3. strengthen networking abstractions
    1. allow networking configurations to be created dynamically (so that users are not locked into choices made before Crowbar deployment)
    2. better manage connected operations
    3. enable pull-from-source deployments that are ahead of (or forked from) available packages.
  4. improvements in Crowbar’s core database and state machine to enable
    1. larger scale concerns
    2. controlled production migrations and upgrades
  5. other important items
    1. make documentation more coupled to current features and easier to maintain
    2. upgrade to Rails 3 to simplify code base, security and performance
    3. deepen automated test coverage and capabilities

Beyond these great technical targets, we want Crowbar 2.0 to address barriers to adoption that have been raised by our community, customers and partners. We have been tracking concerns about the learning curve for adding barclamps, the complexity of networking configuration and packaging into a single ISO.

We will kick off to community part of this effort with an online review on 7/16 (details).

PS: why a refactoring?

My team at Dell does not take on any refactoring changes lightly because they are disruptive to our community; however, a convergence of requirements has made it necessary to update several core components simultaneously. Specifically, we found that desired changes in networking, operating systems, packaging, configuration management, scale and hardware support all required interlocked changes. We have been bringing many of these changes into the code base in preparation and have reached a point where the next steps require changing Crowbar 1.0 semantics.

We are first and foremost an incremental architecture & lean development team – Crowbar 2.0 will have the smallest footprint needed to begin the transformations that are currently blocking us. There is significant room during and after the refactor for the community to shape Crowbar.