Unicorn captured! Unpacking multi-node OpenStack Juno from ready state.

The OpenCrowbar Packstack install demonstrates that abstracting hardware to ready state smooths the install process.  It’s a working balance: Crowbar gets the hardware, O/S & networking right while Packstack takes care of OpenStack.

The Crowbar team produced the first open OpenStack installer back in 2011 and it’s been frustrating to watch the community fragment around building a consistent operational model.  This is not an OpenStack specific problem, but I think it’s exacerbated in a crowded ecosystem.

When I step back from that experience, I see an industry-wide pattern of struggling to create scale deployment patterns that can be reused.  Trying to make hardware uniform is unicorn hunting, so we need to create software abstractions.  That’s exactly why IaaS is powerful and the critical realization behind the OpenCrowbar approach to physical ready state.

So what has our team created?  It’s not another OpenStack installer – we just made the existing one easier to use.

We build up a ready state infrastructure that makes it fast and repeatable to use Packstack, one of the leading open OpenStack installers.  OpenCrowbar can do the same for the OpenStack Chef cookbooks or Salt Formula.  It can even use SaltStack, Chef and Puppet together (which we do for the Packstack work)!  Plus we can do it on multiple vendors’ hardware and with different operating systems.  Plus we build the correct networks!

For now, the integration is available as a private beta (inquiries welcome!) because our team is not in the OpenStack support business – we are in the “get scale systems to ready state and integrate” business.  We are very excited to work with people who want to take this type of functionality to the next level and build truly repeatable, robust and upgradable application deployments.

OpenCrowbar bootstrap positions SSH Keys for hand-offs

I was reading a ComputerWorld article about how Google and Amazon achieve scale.  The theme: you must do better than linear cost scale and the only way to achieve that is to automate and commoditize hardware.  I find interesting parallels in the Crowbar physical devops effort.

As the OpenCrowbar team continues to explore the concepts around “ready state,” I discover more and more small ops nuisances that need to be included in the build up before installing software.  These small items quickly add up at scale, breaking the rule above.

I’ve already posted about the performance benefit of building a Squid Proxy fabric as part of the underlying ops environment.  As we work on Chef Metal, SaltStack and Packstack integrations (private beta), we’ve rediscovered the importance of management/population of SSH public keys.

In cloud infrastructure, key injection is taken for granted; however, it’s not an automatic behavior in physical ops.  OpenCrowbar handles keys by default, but other tools (like Cobbler or Razor) expect that you will use kickstart to inject your SSH keys when you install the operating system.

Including keys in hand-generated kickstart scripts (I’m using “kickstart” generically instead of preseed, auto-yast, jumpstart, etc.) is a potentially dangerous security practice since it makes it difficult to propagate and manage your keys.  It also means that every time a new operating system update is released, you may have to update and retest your kickstarts.  OpenCrowbar has the same challenge, but our approach allows everyone to share in the work because our bootstrapping files are scripted and generic.
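To make the “scripted and generic” idea concrete, here is a minimal sketch of an idempotent key merge.  It is not OpenCrowbar’s actual bootstrap code; the function name and the sample keys are hypothetical, and it assumes plain authorized_keys lines with no leading options field.

```python
def merge_authorized_keys(existing: str, new_keys: list[str]) -> str:
    """Return authorized_keys content with new_keys appended, skipping duplicates.

    Assumption: each key line is "<type> <base64> [comment]" with no options
    prefix. A key is identified by its type and base64 body, so a changed
    comment does not create a duplicate entry.
    """
    def identity(line: str) -> tuple:
        return tuple(line.split()[:2])

    # Keep existing non-blank lines (including comments) in their original order.
    lines = [line.strip() for line in existing.splitlines() if line.strip()]
    seen = {identity(line) for line in lines if not line.startswith("#")}

    for key in new_keys:
        key = key.strip()
        if not key or key.startswith("#"):
            continue  # ignore blanks and comments in the input
        if identity(key) not in seen:
            lines.append(key)
            seen.add(identity(key))

    return "\n".join(lines) + ("\n" if lines else "")


# Hypothetical keys, shortened for readability.
current = "ssh-ed25519 AAAAC3example1 admin@old\n"
merged = merge_authorized_keys(
    current,
    ["ssh-ed25519 AAAAC3example1 admin@renamed", "ssh-rsa AAAAB3example2 ops@new"],
)
print(merged, end="")
```

Because the merge is idempotent, the same script can be re-run on every node after each operating system refresh without hand-editing install templates.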

OpenCrowbar takes care of these ready state configurations in our integrations with these DevOps platforms.  Our experience has been that little items like SSH keys and proxy configurations can make a disproportionate difference when running scale ops or during iterative development.

Tweaking DefCore to subdivide OpenStack platform (proposal for review)

The following material will be a major part of the discussion for the OpenStack Board meeting on Monday 10/20.  Comments and suggestions welcome!

For nearly two years, the OpenStack Board has been moving towards creating a common platform definition that can help drive interoperability.  At the last meeting, the Board paused to further review one of the core tenets of the DefCore process (Item #3: Core definition can be applied equally to all usage models).

Outside of my role as DefCore chair, I see the OpenStack community asking itself an existential question: “are we one platform or a suite of projects?”  I’m having trouble believing “we are both” is an acceptable answer.

During the post-meeting review, Mark Collier drafted a Foundation supported recommendation that basically creates an additional core tier without changing the fundamental capabilities & designated code concepts.  This proposal has been reviewed by the DefCore committee (but not formally approved in a meeting).

The original DefCore proposed capabilities set becomes the “platform” level while capability subsets are called “components.”  We are considering two initial components, Compute & Object, and both are included in the platform (see illustration below).  The approach leaves the door open for new core components to exist both under and outside of the platform umbrella.

In the proposal, OpenStack vendors who meet either component or platform requirements can qualify for the “OpenStack Powered” logo; however, vendors using only a component (instead of the full platform) will have more restrictive marks and limitations about how they can use the term OpenStack.

This approach addresses the “is Swift required?” question.  For platform, Swift capabilities will be required; however, vendors will be able to implement the Compute component without Swift and implement the Object component without Nova/Glance/Cinder.
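One way to picture the tiers is as sets of capabilities, with each component a subset of the platform.  The sketch below is only an illustration: the capability names are hypothetical placeholders (not the DefCore capability list), and the extra platform-only capability is an assumption added to show that the platform can be larger than the two components combined.

```python
# Hypothetical capability names; the real lists come from the DefCore process.
COMPONENTS = {
    "compute": {"compute-servers", "compute-images", "compute-volumes"},
    "object": {"object-store"},
}

# Assumption: the platform is every component capability plus anything
# that is only required at the platform level.
PLATFORM = set().union(*COMPONENTS.values()) | {"identity-auth"}


def qualifying_levels(passed):
    """Return the component/platform levels whose capabilities are all in `passed`."""
    passed = set(passed)
    levels = [name for name, caps in COMPONENTS.items() if caps <= passed]
    if PLATFORM <= passed:
        levels.append("platform")
    return levels


# A vendor shipping only object storage meets the Object component alone.
print(qualifying_levels({"object-store"}))
# A vendor passing everything meets both components and the platform.
print(qualifying_levels(PLATFORM))
```

The same yardstick (the capability sets) is applied at every level; only the subset being checked changes.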

It’s important to note that there is only one yardstick for components or the platform: the capabilities groups and designated code defined by the DefCore process.  From that perspective, OpenStack is one consistent thing.  This change allows vendors to choose sub-components if that serves their business objectives.

It’s up to the community to prove the platform value of all those sub-components working together.

OpenStack Goldilocks’ Syndrome: three questions to help us find our bearings

Goldilocks Atlas

Action: Please join Stefano, Allison, Sean and me in Paris on Monday, November 3rd, in the afternoon (schedule link)

If wishes were fishes, OpenStack’s rapid developer and user rise would include graceful process and commercial transitions too.  As a Foundation board member, it’s my responsibility to help ensure that we’re building a sustainable ecosystem for the project.  That’s a Goldilocks challenge because adding either too much or too little control and process will harm the project.

In discussions with the community, that challenge seems to break down into three key questions:

After the last summit, a few of us started a dialog around Hidden Influencers that helps to frame these questions in an actionable way.  Now, it’s time for us to come together and talk in Paris in the hallways and specifically on Monday, November 3rd, in the afternoon (schedule link).  From there, we’ll figure out next steps using these three questions as a baseline.

If you’ve got opinions about these questions, don’t wait for Paris!  I’d love to start the discussion here in the comments, on Twitter (@zehicle), by phone, with email or via carrier pigeons.

To improve flow, we must view OpenStack community as a Software Factory

This post was sparked by a conversation at OpenStack Atlanta between OpenStack Foundation board members Todd Moore (IBM) and Rob Hirschfeld (Dell/Community).  We share a background in industrial and software process and felt that lean manufacturing lessons translate directly to helping OpenStack face its challenges.

While OpenStack has done an amazing job of growing contributors, scale has caused our code flow processes to be bottlenecked at the review stage.  This blocks flow throughout the entire system and presents a significant risk to both stability and feature addition.  Flow failures can ultimately lead to vendor forking.

Fundamentally, Todd and I felt that OpenStack needs to address system flows to build an integrated product.  The post expands on the “hidden influencers” issue and adds an additional challenge: improving flow requires that the community’s influencers better understand the need to optimize inter-project work in a more systematic way.

Let’s start by visualizing the “OpenStack Factory”

Factory Floor from Alpha Industries Wikipedia page

Imagine all of OpenStack’s 1000s of developers working together in a single giant start-up warehouse.  Each project is in its own floor area with appropriate foosball tables, break areas and coffee bars.  It’s easy to visualize clusters of intent developers talking around tables or coding in dark corners while PTLs and TC members dash between groups coordinating work.

Expand the visualization so that we can actually see the code flowing between teams as little colored boxes.  Giving each project a unique color allows us to quickly see dependencies between teams.  Some features are piled up waiting for review inside teams while others sit on pallets between projects, waiting on needed cross-project features that have not been completed.  At release time, we’d be able to see PTLs sorting through stacks of completed boxes to pick which ones were ready to ship.

Watching a factory floor from above is a humbling experience and a key feature of systems thinking enlightenment in both The Phoenix Project and The Goal.  It’s very easy to be caught up in a single project (local optimization) and miss the broader system implications of local choices.

There is a large body of work about Lean Process for Manufacturing

You’ve already visualized OpenStack code creation as a manufacturing floor: it’s a small step to accept that we can use the same proven processes for software and physical manufacturing.

As features move between teams (work centers), it becomes obvious that we’ve created a very highly interlocked sequence of component steps needed to deliver product; unfortunately, we have minimal coordination between the owners of the work centers.  If a feature needs a critical resource (think programmer) to progress, then we rely on the resource to allocate time to the work.  Since that person’s manager may not agree to the priority, we have a conflict between system flow and individual optimization.

That conflict destroys flow in the system.

The #1 lesson from lean manufacturing is that putting individual optimization over system optimization reduces throughput.  Since our product and people managers are often competitors, we need to work doubly hard to address system concerns.  Worse yet, our inventory of work in process and the interdependencies between projects are harder to discern.  Unlike the manufacturing floor, our developers and project leads cannot look down upon it and see the physical work as it progresses from station to station in one single holistic view.  The bottlenecks that throttle the OpenStack workflow are harder to see but we can find them, as can be demonstrated later in this post.

Until we can engage the resource owners in balancing system flow, OpenStack’s throughput will decline as we add resources.  This same principle is at play in the famous aphorism: adding developers makes a late project later.

Is there a solution?

There are lessons from Lean Manufacturing that can be applied:

  1. Make quality a priority (expand tests from function to integration)
  2. Ensure integration from station to station (prioritize working together over features)
  3. Make sure that owners of work are coordinating (expose hidden influencers)
  4. Find and manage from the bottleneck (classic Lean says find the bottleneck and improve that)
  5. Create and monitor a system view
  6. Have everyone value finished product, not workstation output
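To illustrate the bottleneck lesson, here is a small sketch that treats the factory as a serial pipeline whose throughput is capped by its slowest station.  The station names and weekly capacities are invented for the example and are not measurements of OpenStack; the strictly serial pipeline is also a simplifying assumption.

```python
def bottleneck(capacities: dict[str, float]) -> tuple[str, float]:
    """Return the slowest station and the system throughput it imposes.

    Assumption: stations run in series, so the whole line can move no
    faster than its lowest-capacity station.
    """
    name = min(capacities, key=capacities.get)
    return name, capacities[name]


# Hypothetical capacities, in features per week.
factory = {"code": 120, "review": 40, "gate": 90, "release": 100}
print(bottleneck(factory))  # ('review', 40)

# Local optimization: more coding capacity does not change system output.
print(bottleneck({**factory, "code": 200}))  # ('review', 40)

# Improving the bottleneck raises output until the next constraint appears.
print(bottleneck({**factory, "review": 100}))  # ('gate', 90)
```

The second call is the point of the exercise: effort spent away from the bottleneck only increases the pile of work waiting in front of it.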

Added Postscript: I highly recommend reading Daniel Berrange’s email about this.

VMware Integrated OpenStack (VIO) is a smart move, it’s like using a Volvo to tow your ski boat

I’m impressed with VMware’s VIO (beta) play and believe it will have a meaningful positive impact in the OpenStack ecosystem.  In the short-term, it paradoxically both helps enterprises stay on VMware and accelerates adoption of OpenStack.  The long term benefit to VMware is less clear.

From VWVortex

Sure, you can use a Volvo to tow a boat

Why do I think it’s good tactics?  Let’s explore an analogy….

My kids think owning a boat will be super fun with images of ski parties and lazy days drifting at anchor with PG13 umbrella drinks; however, I’ve got concerns about maintenance, cost and how much we’d really use it.  The problem is not the boat: it’s all of the stuff that goes along with ownership.  In addition to the boat, I’d need a trailer, a new car to pull the boat and driveway upgrades for parking.  Looking at that, the boat’s the easiest part of the story.

The smart move for me is to rent a boat and trailer for a few months to test my kids’ interest.  In that case, I’m going to be towing the boat using my Volvo instead of going “all in” and buying that new Ferd 15000 (you know you want it).  As a compromise, I’ll install a hitch in my trusty sedan and use it gently to tow the boat.  It’s not ideal and causes extra wear to the transmission but it’s a very low risk way to explore the boat owning life style.

Enterprise IT already has the Volvo (VMware vCenter) and likely sees calls for OpenStack as the illusion of cool ski parties without regard for the realities of owning the boat.  Pulling the boat for a while (using OpenStack on VMware) makes a lot of sense to these users.  If the boat gets used then they will buy the truck and accessories (move off VMware).  Until then, they’re still learning about the open source boating life style.

Putting open source concerns aside, this helps VMware lead the OpenStack play for enterprises but may ultimately backfire if they have not set up their long game to keep the customers.

OpenStack DefCore Process Flow: Community Feedback Cycles for Core [6 points + chart]

If you’ve been following my DefCore posts, then you already know that DefCore is an OpenStack Foundation Board managed process “that sets base requirements by defining 1) capabilities, 2) code and 3) must-pass tests for all OpenStack™ products. This definition uses community resources and involvement to drive interoperability by creating the minimum standards for products labeled OpenStack™.”

In this post, I’m going to be very specific about what we think “community resources and involvement” entails.

The draft process flow chart was provided to the Board at our OSCON meeting without additional review.  It boils down to a few key points:

  1. We are using the documents in the Gerrit review process to ensure that we work within the community processes.
  2. Going forward, we want to rely on the technical leadership to create, cluster and describe capabilities.  DefCore bootstrapped this process for Havana.  Further, Capabilities are defined by tests in Tempest so test coverage gaps (like Keystone v2) translate into Core gaps.
  3. We are investing in data driven and community involved feedback (via Refstack) to engage the largest possible base for core decisions.
  4. There is a “safety valve” for vendors to deal with test scenarios that are difficult to recreate in the field.
  5. The Board is responsible for approving the final artifacts based on the recommendations.  By having a transparent process, community input is expected in advance of that approval.
  6. The process is time sensitive.  There’s a need for the Board to produce Core definition in a timely way after each release and then feed that into the next one.  Ideally, the definitions will be approved at the Board meeting immediately following the release.
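Points 2 and 4 can be read together as a simple rule: a capability is met when its tests pass, with a way to set aside tests that are hard to recreate in the field.  The sketch below is one possible reading, not the DefCore scoring code.  The test names are hypothetical, it does not call Tempest or Refstack, and treating the “safety valve” as a set of flagged tests excluded from the check is an assumption.

```python
def capability_met(tests, results, flagged=frozenset()):
    """Return True if every non-flagged test for a capability passed.

    tests   -- names of the tests that define the capability
    results -- mapping of test name to pass (True) / fail (False);
               a test with no recorded result counts as a failure
    flagged -- assumed safety valve: tests excluded from the check
    """
    required = [t for t in tests if t not in flagged]
    return all(results.get(t, False) for t in required)


# Hypothetical test names and results for one capability.
tests = ["test_create_server", "test_delete_server", "test_resize_server"]
results = {"test_create_server": True, "test_delete_server": True}

print(capability_met(tests, results))                                   # False
print(capability_met(tests, results, flagged={"test_resize_server"}))   # True
```

Note that a capability with no tests at all trivially passes in this sketch, which mirrors the point above that test coverage gaps translate into Core gaps.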

DefCore Process Draft

The process chart shows how the key components (designated sections and capabilities) start from the previous release’s version, with the DefCore committee managing the update process.  Community input is a vital part of the cycle.  This is especially true for identifying actual use of the capabilities through the Refstack data collection site.

  • Blue is for Board activities
  • Yellow is for user/vendor community activities
  • Green is for technical community activities
  • White is for process artifacts

This process is very much in draft form and any input or discussion is welcome!  I expect DefCore to take up formal review of the process in October.