Patchwork Onion delivers stability & innovation: the graphic that explains how we determine OpenStack Core

This post was coauthored by the DefCore chairs, Rob Hirschfeld & Joshua McKenty.

The OpenStack board, through the DefCore committee, has been working to define “core” for commercial users using a combination of minimum required capabilities (APIs) and code (Designated Sections).  These minimums are decided on a per-project basis, so it can be difficult to visualize their overall effect on the Integrated Release.

We’ve created the patchwork onion graphic to help illustrate how core relates to the integrated release.  While this graphic is pretty complex, it was important to find a visual way to show how DefCore identifies distinct subsets of APIs and code from each project.  The graphic also shows that some projects have no core APIs and/or code.

For OpenStack to grow, we need BOTH stability and innovation.  We need to give the community clear guidance about what is stable foundation and what is exciting sandbox.  Without that guidance, OpenStack is perceived as risky and unstable by users and vendors.  The purpose of defining “Core” is to address that need specifically so we can move towards interoperability.

Interoperability enables an ecosystem with multiple commercial vendors, which is one of the primary goals of the OpenStack Foundation.

Originally, we thought OpenStack would have “core” and “non-core” projects and we baked that expectation into the bylaws.  As we’ve progressed, it’s clear that we need a less binary definition.  Projects themselves have a maturity cycle (ecosystem -> incubated -> integrated), and within each project some APIs are robust and stable while others are innovative and fluctuating.

Encouraging this mix of stabilization and innovation has been an important factor in our discussions about DefCore.  Growing the user base requires encouraging stability and growing the developer base requires enabling innovation within the same projects.

The consequence is that we must clearly define the subsets of capabilities (APIs) and implementation (code) that are required within each project.  Designating 100% of the API or code as Core stifles innovation because stability dictates limiting changes, while designating 0% of the code (being API-only) lessens the need to upstream.  Core reflects the stability and foundational nature of the code; unfortunately, many people incorrectly equate “being core” with the importance of the code, and politics ensues.

To combat the politics, DefCore has taken a transparent, principles-based approach to selecting core.  You can read about it in Rob’s upcoming “Ugly Babies” post (check back on 8/14).

7 Open Source lessons from your English Composition class

We often act as if coding, and especially open source coding, is a unique activity, and that’s hubris.  Most human activities follow common social patterns that should inform how we organize open source projects.  For example, research papers are deeply social, community-connected activities.  Especially when published, written compositions are highly interconnected.  Even the most basic writing builds off other people’s work with due credit and tries to create something worth being used by later authors.

Here are seven principles to good writing that translate directly to good open source development:

  1. Research before writing – take some time to understand the background and goals of the project; otherwise you will re-invent existing work or draw bad conclusions.
  2. Give credit where due – your work has more credibility when you acknowledge and cross-reference the work you are building on. It also shows readers that you are not re-inventing.
  3. Follow the top authors – many topics have widely known authors who act as “super nodes” in the relationship graph. Recognizing these people will help guide your work, lead to better research and build community.
  4. Find proofreaders – all writers need someone with perspective to review their work before it’s finished. Since we all need reviewers, we all also need to do reviews.
  5. Rework to get clarity – Simplicity and clarity take extra effort but they pay huge dividends for your audience.
  6. Don’t surprise your reader – Readers expect patterns and are distracted when you don’t follow them.
  7. Socialize your ideas – the purpose of writing/code is to make ideas durable. If it’s worth writing then it’s worth sharing.  Your artifact does not announce itself – you need to invest time in explaining it to people and making it accessible.

Thanks to Sean Roberts (a Hidden Influences collaborator) for his contributions to this post.  At OSCON, Sean said “companies should count open source as research [and development investment]” and I thought he’d said “…as research [papers].”  The misunderstanding was quickly resolved, and we were happy to discover that both interpretations were useful.

The Upstream Imperative: paving the way for content creators is required for platform success

Since content is king, platform companies (like Google, Microsoft, Twitter, Facebook and Amazon) win by attracting developers to build on their services.  Open source tooling and frameworks are the critical interfaces for these adopters; consequently, platform companies must invest in building communities around those tools, even if it means open sourcing previously internal-only projects.

This post expands on one of my OSCON observations: companies who write lots of code have discovered an imperative to upstream their internal projects.   For background, review my thoughts about open source and supply chain management.

Huh?  What is an “upstream imperative?”  If it sounds like what salmon do during spawning, then read the post-script!

Historically, companies with a lot of internal development tools had no incentive to open those projects.  In fact, the “collaboration tax” of open source discouraged companies from sharing code for essential operations.  For a long time, open source was considered less featured and slower than commercial or internal projects; however, that perception has been totally shattered.  So companies now face a balance between the overhead of supporting external needs (aka collaboration) and the innovation those users bring into the effort.

Until recently, this balance usually tipped towards opening a project but under-investing in the community to keep collaboration costs low.  The change I saw at OSCON is that companies now understand that making open projects successful brings communities closer to their products and services.

That’s a huge boon to the overall technology community.

Being able to leverage and extend tools that have been proven by these internal teams strengthens and accelerates everyone. These communities act as free laboratories that breed new platforms and build deep relationships with critical influencers.  The upstream savvy companies see returns from both innovation around their tools and more content that’s well matched to their platforms.

Oh, and companies that fail to upstream will find it increasingly hard to attract critical mind share.  Thinking through the alternatives gives us a Windows into how open source impacts past incumbents.

That leads to a future post about XaaS dog fooding and “pure-play” aaS projects like OpenStack and CloudFoundry.

Post Script about Upstreaming:


4-item OSCON report: no buzz winner, OpenStack is DownStack?, Free vs Open & the upstream imperative

Now that my PDX Trimet pass expired, it’s time to reflect on OSCON 2014.   Unfortunately, I did not ride my unicorn home on a rainbow; this year’s event seemed to be about raising red flags.

My four key observations:

  1. No superstar. Past OSCONs had at least one buzzy community superstar: 2013 was Docker and 2011 was OpenStack.  This was not just my hallway-track perception; I asked around about this specifically.  There was no buzz winner in 2014.
  2. People were down on OpenStack (“DownStack”). Yes, we did have a dedicated “Open Cloud Day” event, but something was missing.  OpenStack did not sponsor, there were no major parties or releases (compared to previous years), and there was little OpenStack buzz.  Many people I talked to were worried about the direction of the community, fragmentation of the project and operational readiness.  I would be more concerned about “DownStack” except that no open infrastructure project was a superstar either (e.g., Mesos, Kubernetes and CoreOS).  Perhaps OSCON is simply not a good venue for open infrastructure projects compared to GlueCon or Velocity?  Considering the rapid rise of container-friendly OpenStack alternatives, I think the answer may be that the battle lines for open infrastructure are being redrawn.
  3. Free vs. Open. Perhaps my perspective is more nuanced now (many in open source communities don’t distinguish between Free and Open source), but there’s a real tension between Free (do what you want) and Open (shared but governed) source.  Now that open source is a commercial darling, there is a lot of grumbling in the Free community about corporate influence and heavy-handedness.  I suspect this will get louder as companies try to find ways to maintain control of their projects.
  4. Corporate upstreaming becomes Imperative. There’s an accelerating trend for companies that write lots of code to support their primary business (Google is a primary example) to upstream that code so it gets used outside the company.  They open their code and start efforts to ensure its adoption.  This requires a dedicated post to really explain.

There’s a clear theme here: Open source is going mainstream corporate.

We’ve been building amazing software in the open that creates real value for companies.  Much of that value has been created organically by well-intentioned individuals; unfortunately, that model will not scale with the arrival of corporate interests.

Open source is thriving, not dying: these companies value the transparency, collaboration and innovation of open development.  It is, however, transforming to fit within corporate investment and governance needs.  It’s our job to help with that metamorphosis.

DefCore Advances at the Core > My take on the OSCON’14 OpenStack Board Meeting

Last week’s day-long Board Meeting (Jonathan’s summary) focused on three major topics: DefCore, Contribution Licenses (CLA/DCO) and the “Win the Enterprise” initiative. In some ways, these three topics are three views into OpenStack’s top issue: commercial vs. individual interests.

But first, let’s talk about DefCore!

DefCore took a major step with the passing of the advisory Havana Capabilities (the green items are required). That means that vendors in the community now have Board-approved minimum requirements.  These are not enforced for Havana so that the community has time to review and evaluate.

For all that progress, we only have half of the Havana core definition complete. Designated Sections, the other component of Core, will be defined by the DefCore committee for Board approval in September. Originally, we expected the TC to own this part of the process; however, they felt it was related to commercial interests (not technical ones) and asked the Board to manage it.

The coming meetings will resolve the “is Swift code required” question, and that topic will require a dedicated post.  In many ways, this question has been the challenge for core definition from the start.  If you want to join the discussion, please subscribe to the DefCore list.

The majority of the board meeting was spent discussing other weighty topics that are worth a brief review.

Contribution Licenses revolve around the developer vs. broader community challenge. This issue is surprisingly high stakes for many in the community. I see two primary issues:

  1. Tension between corporate (CLA) vs. individual (DCO) control and approval
  2. Concern over barriers to contribution (sadly, there are many, but this one is in the board’s control)

Win the Enterprise was born from product management frustration and a fragmented user base. My read on this topic is that we’re pushing on the donkey. I’m hearing serious rumbling about OpenStack operability, upgrades and scale.  This group is doing a surprisingly good job of documenting these requirements so that we will have an official “we need this” statement. It’s not clear how we are going to turn that statement into either carrots or sticks for the donkey.

Overall, there was a very strong existential theme for OpenStack at this meeting: are we companies collaborating or individuals contributing?  Clearly, OpenStack is both, but the proportions remain unclear.

Answering this question is ultimately at the heart of all three primary topics. I expect DefCore will be on the front line of this discussion over the next few weeks (meeting 1, 2, and 3). Now is the time to get involved if you want to play along.

Boot me up! out-of-band IPMI rocks then shuts up and waits

It’s hard to get excited about re-implementing functionality from v1 unless the v2 happens to also be freaking awesome.  It’s awesome because the OpenCrowbar architecture allows us to do it “the right way” with real out-of-band controls against the open WSMAN APIs.

With out-of-band control, we can easily turn systems on and off using OpenCrowbar orchestration.  This means that it’s now standard practice to power off nodes after discovery & inventory until they are ready for OS installation.  This is especially interesting because many servers’ RAID and BIOS can be configured out-of-band without powering on at all.

Frankly, Crowbar 1 (cutting edge in 2011) was a bit hacky.  All of the WSMAN control was done in-band but looped through a gateway on the admin server so we could access the out-of-band API.  We also used the vendor (Dell) tools instead of open API sets.

Using open APIs instead means that OpenCrowbar hardware configuration is truly multi-vendor.  I’ve got Dell & SuperMicro servers booting and out-of-band managed.  Want more vendors?  I’ll give you my shipping address.

OpenCrowbar does this out of the box and in the open so that everyone can participate.  That’s how we solve this problem as an industry and start to cope with hardware snowflaking.

And this out-of-band management gets even more interesting…

Since we’re talking to servers out-of-band (without the server being “on”), we can configure systems before they are ever booted for provisioning.  Because OpenCrowbar does not require a discovery boot, you could pre-populate all your configurations via the API and have the disk and BIOS settings ready before the nodes are even powered up (for models like the Dell iDRAC, where the BMC starts immediately on power connect).
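To make that concrete, here’s a minimal sketch of pre-populating a node’s settings through a REST API before first boot.  The endpoint paths, attribute names and credentials are hypothetical placeholders for illustration, not OpenCrowbar’s documented API:

```python
import requests

CROWBAR = "http://admin.example.com:3000"  # hypothetical admin endpoint
AUTH = ("crowbar", "crowbar")              # hypothetical credentials

# Register a node that has never booted (names and fields are illustrative).
node = requests.post(f"{CROWBAR}/api/v2/nodes",
                     json={"name": "rack1-u04", "mac": "52:54:00:12:34:56"},
                     auth=AUTH).json()

# Stage BIOS and RAID settings; the BMC can apply them without a power-on.
requests.put(f"{CROWBAR}/api/v2/nodes/{node['id']}/attribs/bios-settings",
             json={"value": {"boot_mode": "uefi", "hyperthreading": True}},
             auth=AUTH)
requests.put(f"{CROWBAR}/api/v2/nodes/{node['id']}/attribs/raid-settings",
             json={"value": {"level": "raid10", "spares": 1}},
             auth=AUTH)
```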

Those are my favorite features, but there’s more to love:

  • the new design does not require a network gateway between the admin and BMC networks (v1 did, and it was a security issue)
  • the configuration detects and preserves existing assigned IPs.  This is a big deal in lab configurations where you are reusing the same machines and have scripted remote consoles.
  • OpenCrowbar offers an API to turn machines on/off using the out-of-band BMC network.
  • The system detects if nodes have IPMI (VMs & containers do not) and skips configuration BUT still manages power control using SSH (and could use VM APIs in the future) – see the sketch after this list
  • Of course, we automatically set up the BMC network based on your desired configuration
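Here’s that IPMI-or-SSH power-control fallback as a rough sketch.  The node fields and the SSH fallback command are illustrative assumptions, not OpenCrowbar’s actual code; the ipmitool invocation is the standard out-of-band power command:

```python
import subprocess

def power_off(node):
    """Power a node down: out-of-band via its BMC if present, else over SSH."""
    if node.get("bmc_address"):
        # Out-of-band: the server itself may be "off"; only the BMC answers.
        subprocess.run(["ipmitool", "-I", "lanplus",
                        "-H", node["bmc_address"],
                        "-U", node["bmc_user"], "-P", node["bmc_password"],
                        "chassis", "power", "off"], check=True)
    else:
        # VMs & containers have no IPMI; fall back to an in-band SSH shutdown.
        subprocess.run(["ssh", f"root@{node['address']}", "poweroff"],
                       check=True)

power_off({"bmc_address": "10.1.0.44", "bmc_user": "admin",
           "bmc_password": "secret", "address": "10.0.0.44"})
```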


Ops Bridges > Building a Sharable Ops Infrastructure with Composable Tool Chain Orchestration

This post started from a discussion with Judd Maltin that he documented in a post about “wanting a composable run deck.”

I’ve had several conversations comparing OpenCrowbar with other “bare metal provisioning” tools that do things like serve golden images to PXE or iPXE servers to help bootstrap deployments.  While those are handy tools, they do nothing to really help operators drive system-wide operations; consequently, they have limited system impact/utility.

In building the new architecture of OpenCrowbar (aka Crowbar v2), we heard very clearly that the system should have “less magic.”  We took that advice very seriously to make sure that Crowbar is a layer that works with, not a replacement for, standard operations tools.

Specifically, node boot & kickstart alone is just not that exciting.  It’s a combination of DHCP, PXE, HTTP and TFTP, or DHCP and an iPXE HTTP server.  It’s a pain to set this up, but I don’t really get excited about it anymore.  In fact, you can pretty much use open ops scripts (Chef) to set up these services because it’s cut-and-dried operational work.
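As a toy illustration of how little magic the iPXE path involves, here’s a sketch that writes a minimal iPXE boot script and serves it over HTTP.  The kernel name, arguments and addresses are made up; a real deployment would template these per node and still needs DHCP to point nodes at the script:

```python
import http.server
import socketserver
from pathlib import Path

# A minimal iPXE script: fetch kernel + initrd over HTTP and boot.
BOOT_SCRIPT = """#!ipxe
kernel http://10.0.0.1:8000/vmlinuz console=tty0 ks=http://10.0.0.1:8000/ks.cfg
initrd http://10.0.0.1:8000/initrd.img
boot
"""

Path("boot.ipxe").write_text(BOOT_SCRIPT)

# Any static HTTP server will do; DHCP option 67 points nodes at boot.ipxe.
with socketserver.TCPServer(("", 8000),
                            http.server.SimpleHTTPRequestHandler) as httpd:
    httpd.serve_forever()
```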

Note: Setting up the networking to make it all work is perhaps a different question and one that few platforms bother talking about.

So, if doing node provisioning is not a big deal then why is OpenCrowbar important?  Because sustaining operations is about ongoing system orchestration (we’d say an “operations model”) that starts with provisioning.

It’s not the individual services that are critical; it’s doing them in a system-wide sequence that’s vital.

Crowbar does NOT REPLACE the services.  In fact, we go out of our way to keep your proven operations tool chain.  We don’t want operators to troubleshoot our iPXE code!  We’d much rather use the standard stuff and orchestrate the configuration in a predictable way.

In that way, OpenCrowbar embraces and composes the existing operations tool chain into an integrated system of tools.  We always avoid replacing tools.  That’s why we use Chef for our DSL instead of adding something new.

What does that leave for Crowbar?  Crowbar provides physical-infrastructure-targeted orchestration (we call it “the Annealer”) that coordinates this tool chain to work as a system.  It’s the system perspective that’s critical because it allows all of the operational services to work together.

For example, when a node is added, we have to create v4 and v6 IP address entries for it.  This is required because secure infrastructure requires reverse DNS.  If you change the name of that node or add an alias, Crowbar again needs to update the DNS.  This has to happen in the right sequence.  If you create a new virtual interface for that node then, again, you need to update DNS.  This type of operational housekeeping is essential and must be performed in the correct sequence at the right time.
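Here’s a minimal sketch of that forward-then-reverse DNS housekeeping, assuming the dnspython package and an authoritative server that accepts dynamic updates.  The zone names, server address and node values are illustrative assumptions:

```python
import dns.query
import dns.reversename
import dns.update

DNS_SERVER = "10.0.0.53"  # assumed authoritative server allowing updates

def register_node(hostname, ipv4):
    """Create the forward A record, then the reverse PTR that depends on it."""
    # Step 1: forward entry in the forward zone.
    fwd = dns.update.Update("lab.example.com")
    fwd.replace(hostname, 300, "A", ipv4)
    dns.query.tcp(fwd, DNS_SERVER)

    # Step 2: reverse entry, sequenced after the forward record exists.
    ptr_name = dns.reversename.from_address(ipv4)  # 4.0.0.10.in-addr.arpa.
    rev = dns.update.Update("0.0.10.in-addr.arpa")
    rev.replace(ptr_name, 300, "PTR", f"{hostname}.lab.example.com.")
    dns.query.tcp(rev, DNS_SERVER)

# The same sequencing applies to AAAA records and to renames/aliases.
register_node("rack1-u04", "10.0.0.4")
```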

The critical insight is that Crowbar works transparently alongside your existing operational services with proven configuration management tools.  Crowbar connects links in your tool chain but keeps you in the driver’s seat.