Your baby is ugly! Picking which code is required for Commercial Core.

babyThere’s no point in sugar-coating this: selecting API and code sections for core requires making hard choices and saying no.  DefCore makes this fair by 1) defining principles for selection, 2) going slooooowly to limit surprises and 3) being transparent in operation.  When you’re telling someone who their baby is not handsome enough you’d better be able to explain why.

The truth is that from DefCore’s perspective, all babies are ugly.  If we are seeking stability and interoperability, then we’re looking for adults not babies or adolescents.

Explaining why is exactly what DefCore does by defining criteria and principles for our decisions.  When we do it right, it also drives a positive feedback loop in the community because the purpose of designated sections is to give clear guidance to commercial contributors where we expect them to be contributing upstream.  By making this code required for Core, we are incenting OpenStack vendors to collaborate on the features and quality of these sections.

This does not lessen the undesignated sections!  Contributions in those areas are vital to innovation; however, they are, by design, more dynamic, specialized or single vendor than the designated areas.

Designated SectionsThe seven principles of designated sections (see my post with TC member Michael Still) as defined by the Technical Committee are:

Should be DESIGNATED:

  1. code provides the project external REST API, or
  2. code is shared and provides common functionality for all options, or
  3. code implements logic that is critical for cross-platform operation

Should NOT be DESIGNATED:

  1. code interfaces to vendor-specific functions, or
  2. project design explicitly intended this section to be replaceable, or
  3. code extends the project external REST API in a new or different way, or
  4. code is being deprecated

While the seven principles inform our choices, DefCore needs some clarifications to ensure we can complete the work in a timely, fair and practical way.  Here are our additions:

8.     UNdesignated by Default

  • Unless code is designated, it is assumed to be undesignated.
  • This aligns with the Apache license.
  • We have a preference for smaller core.

9.      Designated by Consensus

  • If the community cannot reach a consensus about designation then it is considered undesignated.
  • Time to reach consensus will be short: days, not months
  • Except obvious trolling, this prevents endless wrangling.
  • If there’s a difference of opinion then the safe choice is undesignated.

10.      Designated is Guidance

  • Loose descriptions of designated sections are acceptable.
  • The goal is guidance on where we want upstream contributions not a code inspection police state.
  • Guidance will be revised per release as part of the DefCore process.

In my next DefCore post, I’ll review how these 10 principles are applied to the Havana release that is going through community review before Board approval.

7 Open Source lessons from your English Composition class

We often act as if coding, and especially open source coding, is a unique activity and that’s hubris.   Most human activities follow common social patterns that should inform how we organize open source projects.  For example, research papers are very social and community connected activities.  Especially when published, written compositions are highly interconnected activities.  Even the most basic writing builds off other people’s work with due credit and tries create something worth being used by later authors.

Here are seven principles to good writing that translate directly to good open source development:

  1. Research before writing – take some time to understand the background and goals of the project otherwise you re-invent or draw bad conclusions.
  2. Give credit where due – your work has more credibility when you acknowledge and cross-reference the work you are building on. It also shows readers that you are not re-inventing.
  3. Follow the top authors – many topics have widely known authors who act as “super nodes” in the relationship graph. Recognizing these people will help guide your work, leads to better research and builds community.
  4. Find proof readers – All writers need someone with perspective to review their work before it’s finished. Since we all need reviewers, we all also need to do reviews.
  5. Rework to get clarity – Simplicity and clarity take extra effort but they pay huge dividends for your audience.
  6. Don’t surprise your reader – Readers expect patterns and are distracted when you don’t follow them.
  7. Socialize your ideas – the purpose of writing/code is to make ideas durable. If it’s worth writing then it’s worth sharing.  Your artifact does not announce itself – you need to invest time in explaining it to people and making it accessible.

Thanks to Sean Roberts (a Hidden Influences collaborator) for his contributions to this post.  At OSCON, Sean Roberts said “companies should count open source as research [and development investment]” and I thought he’s said “…as research [papers].”  The misunderstanding was quickly resolved and we were happy to discover that both interpretations were useful.

Back of the Napkin to Presentation in 30 seconds

I wanted to share a handy new process for creating presentations that I’ve been using lately that involves using cocktail napkins, smart phones and Google presentations.

Here’s the Process:

  1. sketch an idea out with my colleagues on a napkin, whiteboard or notebook during our discussion.
  2. snap a picture and upload it to my Google drive from my phone,
  3. import the picture into my presentation using my phone,
  4. tell my team that I’ve updated the presentation using Slack on my phone.

Clearly, this is not a finished presentation; however, it does serve to quickly capture critical content from a discussion without disrupting the flow of ideas.  It also alerts everyone that we’re adding content and helps frame what that content will be as we polish it.  When we immediately position the napkin into a deck, it creates clear action items and reference points for the team.

While blindingly simple, having a quick feedback loop and visual placeholders translates into improved team communication.

The Upstream Imperative: paving the way for content creators is required for platform success

Since content is king, platform companies (like Google, Microsoft, Twitter, Facebook and Amazon) win by attracting developers to build on their services.  Open source tooling and frameworks are the critical interfaces for these adopters; consequently, they must invest in building communities around those platforms even if it means open sourcing previously internal only tools.

This post expands on one of my OSCON observations: companies who write lots of code have discovered an imperative to upstream their internal projects.   For background, review my thoughts about open source and supply chain management.

Huh?  What is an “upstream imperative?”  It sounds like what salmon do during spawning then read the post-script!

Historically, companies with a lot of internal development tools had no inventive to open those projects.  In fact, the “collaboration tax” of open source discouraged companies from sharing code for essential operations.   Historically, open source was considered less featured and slower than commercial or internal projects; however, this perception has been totally shattered.  So companies are faced with a balance between the overhead of supporting external needs (aka collaboration) and the innovation those users bring into the effort.

Until recently, this balance usually tipped towards opening a project but under-investing in the community to keep the collaboration costs low.  The change I saw at OSCON is that companies understand that making open projects successful bring communities closer to their products and services.

That’s a huge boon to the overall technology community.

Being able to leverage and extend tools that have been proven by these internal teams strengthens and accelerates everyone. These communities act as free laboratories that breed new platforms and build deep relationships with critical influencers.  The upstream savvy companies see returns from both innovation around their tools and more content that’s well matched to their platforms.

Oh, and companies that fail to upstream will find it increasingly hard to attract critical mind share.  Thinking the alternatives gives us a Windows into how open source impacts past incumbents.

That leads to a future post about how XaaS dog fooding and “pure-play” aaS projects like OpenStack and CloudFoundry.

Post Script about Upstreaming:

Continue reading

Boot me up! out-of-band IPMI rocks then shuts up and waits

It’s hard to get excited about re-implementing functionality from v1 unless the v2 happens to also be freaking awesome.   It’s awesome because the OpenCrowbar architecture allows us to it “the right way” with real out-of-band controls against the open WSMAN APIs.

gangnam styleWith out-of-band control, we can easily turn systems on and off using OpenCrowbar orchestration.  This means that it’s now standard practice to power off nodes after discovery & inventory until they are ready for OS installation.  This is especially interesting because many servers RAID and BIOS can be configured out-of-band without powering on at all.

Frankly, Crowbar 1 (cutting edge in 2011) was a bit hacky.  All of the WSMAN control was done in-band but looped through a gateway on the admin server so we could access the out-of-band API.  We also used the vendor (Dell) tools instead of open API sets.

That means that OpenCrowbar hardware configuration is truly multi-vendor.  I’ve got Dell & SuperMicro servers booting and out-of-band managed.  Want more vendors?  I’ll give you my shipping address.

OpenCrowbar does this out of the box and in the open so that everyone can participate.  That’s how we solve this problem as an industry and start to cope with hardware snowflaking.

And this out-of-band management gets even more interesting…

Since we’re talking to servers out-of-band (without the server being “on”) we can configure systems before they are even booted for provisioning.  Since OpenCrowbar does not require a discovery boot, you could pre-populate all your configurations via the API and have the Disk and BIOS settings ready before they are even booted (for models like the Dell iDRAC where the BMCs start immediately on power connect).

Those are my favorite features, but there’s more to love:

  • the new design does not require network gateway (v1 did) between admin and bmc networks (which was a security issue)
  • the configuration will detect and preserves existing assigned IPs.  This is a big deal in lab configurations where you are reusing the same machines and have scripted remote consoles.
  • OpenCrowbar offers an API to turn machines on/off using the out-of-band BMC network.
  • The system detects if nodes have IPMI (VMs & containers do not) and skip configuration BUT still manage to have power control using SSH (and could use VM APIs in the future)
  • Of course, we automatically setup BMC network based on your desired configuration

 

a Ready State analogy: “roughed in” brings it Home for non-ops-nerds

I’ve been seeing great acceptance on the concept of ops Ready State.  Technologists from both ops and dev immediately understand the need to “draw a line in the sand” between system prep and installation.  We also admit that getting physical infrastructure to Ready State is largely taken for granted; however, it often takes multiple attempts to get it right and even small application changes can require a full system rebuild.

Since even small changes can redefine the ready state requirements, changing Ready State can feel like being told to tear down your house so you remodel the kitchen.

Foundation RawA friend asked me to explain “Ready State” in non-technical terms.  So far, the best analogy that I’ve found is when a house is “Roughed In.”  It’s helpful if you’ve ever been part of house construction but may not be universally accessible so I’ll explain.

Foundation PouredGetting to Rough In means that all of the basic infrastructure of the house is in place but nothing is finished.  The foundation is poured, the plumbing lines are placed, the electrical mains are ready, the roof on and the walls are up.  The house is being built according to architectural plans and major decisions like how many rooms there are and the function of the rooms (bathroom, kitchen, great room, etc).  For Ready State, that’s like having the servers racked and setup with Disk, BIOS, and network configured.

Framed OutWhile we’ve built a lot, rough in is a relatively early milestone in construction.  Even major items like type of roof, siding and windows can still be changed.  Speaking of windows, this is like installing an operating system in Ready State.  We want to consider this as a distinct milestone because there’s still room to make changes.  Once the roof and exteriors are added, it becomes much more disruptive and expensive to make.

Roughed InOnce the house is roughed in, the finishing work begins.  Almost nothing from roughed in will be visible to the people living in the house.  Like a Ready State setup, the users interact with what gets laid on top of the infrastructure.  For homes it’s the walls, counters, fixtures and following.  For operators, its applications like Hadoop, OpenStack or CloudFoundry.

Taking this analogy back to where we started, what if we could make rebuilding an entire house take just a day?!  In construction, that’s simply not practical; however, we’re getting to a place in Ops where automation makes it possible to reconstruct the infrastructure configuration much faster.

While we can’t re-pour the foundation (aka swap out physical gear) instantly, we should be able to build up from there to ready state in a much more repeatable way.

Ops Bridges > Building a Sharable Ops Infrastructure with Composable Tool Chain Orchestration

This posted started from a discussion with Judd Maltin that he documented in a post about “wanting a composable run deck.”

Fitz and Trantrums: Breaking the Chains of LoveI’ve had several conversations comparing OpenCrowbar with other “bare metal provisioning” tools that do thing like serve golden images to PXE or IPXE server to help bootstrap deployments.  It’s those are handy tools, they do nothing to really help operators drive system-wide operations; consequently, they have a limited system impact/utility.

In building the new architecture of OpenCrowbar (aka Crowbar v2), we heard very clearly to have “less magic” in the system.  We took that advice very seriously to make sure that Crowbar was a system layer with, not a replacement to, standard operations tools.

Specifically, node boot & kickstart alone is just not that exciting.  It’s a combination of DHCP, PXE, HTTP and TFTP or DHCP and an IPXE HTTP Server.   It’s a pain to set this up, but I don’t really get excited about it anymore.   In fact, you can pretty much use open ops scripts (Chef) to setup these services because it’s cut and dry operational work.

Note: Setting up the networking to make it all work is perhaps a different question and one that few platforms bother talking about.

So, if doing node provisioning is not a big deal then why is OpenCrowbar important?  Because sustaining operations is about ongoing system orchestration (we’d say an “operations model“) that starts with provisioning.

It’s not the individual services that’s critical; it’s doing them in a system wide sequence that’s vital.

Crowbar does NOT REPLACE the services.  In fact, we go out of our way to keep your proven operations tool chain.  We don’t want operators to troubleshoot our IPXE code!  We’d much rather use the standard stuff and orchestrate the configuration in a predicable way.

In that way, OpenCrowbar embraces and composes the existing operations tool chain into an integrated system of tools.  We always avoid replacing tools.  That’s why we use Chef for our DSL instead of adding something new.

What does that leave for Crowbar?  Crowbar is providing a physical infratsucture targeted orchestration (we call it “the Annealer”) that coordinates this tool chain to work as a system.  It’s the system perspective that’s critical because it allows all of the operational services to work together.

For example, when a node is added then we have to create v4 and v6 IP address entries for it.  This is required because secure infrastructure requires reverse DNS.  If you change the name of that node or add an alias, Crowbar again needs to update the DNS.  This had to happen in the right sequence.  If you create a new virtual interface for that node then, again, you need to update DNS.   This type of operational housekeeping is essential and must be performed in the correct sequence at the right time.

The critical insight is that Crowbar works transparently alongside your existing operational services with proven configuration management tools.  Crowbar connects links in your tool chain but keeps you in the driver’s seat.