Starting RackN – Delivering open ops by pulling an OpenCrowbar Bunny out of our hat

When Dell pulled out from OpenCrowbar last April, I made a commitment to our community to find a way to keep it going.  Since my exit from Dell early in October 2014, that commitment has taken the form of RackN.

Rack N BlackToday, we’re ready to help people run and expand OpenCrowbar (days away from v2.1!). We’re also seeking investment to make the project more “enterprise-ready” and build integrations that extend ready state.

RackN focuses on maintenance and support of OpenCrowbar for ready state physical provisioning.  We will build the community around Crowbar as an open operations core and extend it with a larger set of hardware support and extensions.  We are building partnerships to build application integration (using Chef, Puppet, Salt, etc) and platform workloads (like OpenStack, Hadoop, Ceph, CloudFoundry and Mesos) above ready state.

I’ve talked with hundreds of people about the state of physical data center operations at scale. Frankly, it’s a scary state of affairs: complexity is increasing for physical infrastructure and we’re blurring the lines by adding commodity networking with local agents into the mix.

Making this jumble of stuff work together is not sexy cloud work – I describe it as internet plumbing to non-technical friends.  It’s unforgiving, complex and full of sharp edge conditions; however, people are excited to hear about our hardware abstraction mission because it solves a real pain for operators.

I hope you’ll stay tuned, or even play along, as we continue the Open Ops journey.

Unicorn captured! Unpacking multi-node OpenStack Juno from ready state.

OpenCrowbar Packstack install demonstrates that abstracting hardware to ready state smooths install process.  It’s a working balance: Crowbar gets the hardware, O/S & networking right while Packstack takes care of OpenStack.

LAYERSThe Crowbar team produced the first open OpenStack installer back in 2011 and it’s been frustrating to watch the community fragment around building a consistent operational model.  This is not an OpenStack specific problem, but I think it’s exaggerated in a crowded ecosystem.

When I step back from that experience, I see an industry wide pattern of struggle to create scale deployments patterns that can be reused.  Trying to make hardware uniform is unicorn hunting, so we need to create software abstractions.  That’s exactly why IaaS is powerful and the critical realization behind the OpenCrowbar approach to physical ready state.

So what has our team created?  It’s not another OpenStack installer – we just made the existing one easier to use.

We build up a ready state infrastructure that makes it fast and repeatable to use Packstack, one of the leading open OpenStack installers.  OpenCrowbar can do the same for the OpenStack Chef cookbooks or Salt Formula.   It can even use Saltstack, Chef and Puppet together (which we do for the Packstack work)!  Plus we can do it on multiple vendors hardware and with different operating systems.   Plus we build the correct networks!

For now, the integration is available as a private beta (inquiries welcome!) because our team is not in the OpenStack support business – we are in the “get scale systems to ready state and integrate” business.  We are very excited to work with people who want to take this type of functionality to the next level and build truly repeatable, robust and upgradable application deployments.

OpenCrowbar bootstrap positions SSH Keys for hand-offs

I was reading a ComputerWorld article about how Google and Amazon achieve scale.  The theme: you must do better than linear cost scale and the only way to achieve that is to automate and commoditize hardware.  I find interesting parallels in the Crowbar physical devops effort.

KeysAs the OpenCrowbar team continues to explore the concepts around “ready state,” I discover more and more small ops nuisances that need to be included in the build up before installing software.  These small items quickly add up at scale breaking the rule above.

I’ve already posted about the performance benefit of building a Squid Proxy fabric as part of the underlying ops environment.  As we work on Chef Metal, SaltStack and Packstack integrations (private beta), we’ve rediscovered the importance of management/population of SSH public keys.

In cloud infrastructure, key injection is taken for granted; however, it’s not an automatic behavior in the physical ops.  Since OpenCrowbar handles keys by default but other tools (like Cobbler or Razor) expect that you will use kickstart to inject your SSH keys when you install the Operating System..

Including keys in kickstart (which I’m using generically instead of preseed, auto-yast, jumpstart, etc) hand generated scripts is a potentially dangerous security practice since it makes it difficult to propagate and manage your keys.  It also means that every time a new operating system update is released that you may have to update and retest your kickstarts.  OpenCrowbar has the same challenge but our approach allows everyone can share in the work because our bootstrapping files are scripted and generic.

OpenCrowbar takes care of these ready state configurations in our integrations with these DevOps platforms.  Our experience has been that little items like SSH keys and proxy configurations can make a disproportionate advantage in running scale ops or during iterative development.

Tweaking DefCore to subdivide OpenStack platform (proposal for review)

The following material will be a major part of the discussion for The OpenStack Board meeting on Monday 10/20.  Comments and suggest welcome!

OpenStack in PartsFor nearly two years, the OpenStack Board has been moving towards creating a common platform definition that can help drive interoperability.  At the last meeting, the Board paused to further review one of the core tenants of the DefCore process (Item #3: Core definition can be applied equally to all usage models).

Outside of my role as DefCore chair, I see the OpenStack community asking itself an existential question: “are we one platform or a suite of projects?”  I’m having trouble believing “we are both” is an acceptable answer.

During the post-meeting review, Mark Collier drafted a Foundation supported recommendation that basically creates an additional core tier without changing the fundamental capabilities & designated code concepts.  This proposal has been reviewed by the DefCore committee (but not formally approved in a meeting).

The original DefCore proposed capabilities set becomes the “platform” level while capability subsets are called “components.”  We are considering two initial components, Compute & Object, and both are included in the platform (see illustration below).  The approach leaves the door open for new core component to exist both under and outside of the platform umbrella.

In the proposal, OpenStack vendors who meet either component or platform requirements can qualify for the “OpenStack Powered” logo; however, vendors using the only a component (instead of the full platform) will have more restrictive marks and limitations about how they can use the term OpenStack.

This approach addresses the “is Swift required?” question.  For platform, Swift capabilities will be required; however, vendors will be able to implement the Compute component without Swift and implement the Object component without Nova/Glance/Cinder.

It’s important to note that there is only one yard stick for components or the platform: the capabilities groups and designed code defined by the DefCore process.  From that perspective, OpenStack is one consistent thing.  This change allows vendors to choose sub-components if that serves their business objectives.

It’s up to the community to prove the platform value of all those sub-components working together.

OpenStack Goldilocks’ Syndrome: three questions to help us find our bearings

Goldilocks Atlas

Action: Please join Stefano. Allison, Sean and me in Paris on Monday, November 3rd, in the afternoon (schedule link)

If wishes were fishes, OpenStack’s rapid developer and user rise would include graceful process and commercial transitions too.  As a Foundation board member, it’s my responsibility to help ensure that we’re building a sustainable ecosystem for the project.  That’s a Goldilock’s challenge because adding either too much or too little controls and process will harm the project.

In discussions with the community, that challenge seems to breaks down into three key questions:

After last summit, a few of us started a dialog around Hidden Influencers that helps to frame these questions in an actionable way.  Now, it’s time for us to come together and talk in Paris in the hallways and specifically on Monday, November 3rd, in the afternoon (schedule link).   From there, we’ll figure out about next steps using these three questions as a baseline.

If you’ve got opinions about these questions, don’t wait for Paris!  I’d love to start the discussion here in the comments, on twitter (@zehicle), by phone, with email or via carrier pidgins.

Need a physical ops baseline? Crowbar continues to uniquely fill gap

Robots Everywhere!I’ve been watching to see if other open “bare metal” projects would morph to match the system-level capabilities that we proved in Crowbar v1 and honed in the re-architecture of OpenCrowbar.  The answer appears to be that Crowbar simply takes a broader approach to solving the physical ops repeatably problem.

Crowbar Architect Victor Lowther says “What makes Crowbar a better tool than Cobbler, Razor, or Foreman is that Crowbar has an orchestration engine that can be used to safely and repeatably deploy complex workloads across large numbers of machines. This is different from (and better than, IMO) just being able to hand responsibility off to Chef/Puppet/Salt, because we can manage the entire lifecycle of a machine where Cobbler, Razor and Chef cannot, we can describe how we want workloads configured at a more abstract level than Foreman can, and we do it all using the same API and UI.”

Since we started with a vision of an integrated system to address the “apply-rinse-repeat” cycle; it’s no surprise that Crowbar remains the only open platform that’s managed to crack the complete physical deployment life-cycle.

The Crowbar team realized that it’s not just about automation setting values: physical ops requires orchestration to make sure the values are set in the correct sequence on the appropriate control surface including DNS, DHCP, PXE, Monitoring, et cetera.  Unlike architectures for aaS platforms, the heterogeneous nature of the physical control planes requires a different approach.

We’ve seen that making more and more complex kickstart scripts or golden images is not a sustainable solution.  There is simply too much hardware variation and dependency thrash for operators to collaborate with those tools.  Instead, we’ve found that decomposing the provisioning operations into functional layers with orchestration is much more multi-site repeatable.

Accepting that physical ops (discovered infrastructure) is fundamentally different from cloud ops (created infrastructure) has been critical to architecting platforms that were resilient enough for the heterogeneous infrastructure of data centers.

If we want to start cleaning up physical ops, we need to stop looking at operating system provisioning in isolation and start looking at the full server bring up as just a part of a broader system operation that includes networking, management and operational integration.

OpenCrowbar 2.B to deliver multiple hardware vendor support and advanced integrations

I’ve stayed quiet on the subject of Crowbar for a few months, but that does not mean that Crowbar has been.  Activity has been picking up, after Dell pulled resources off, to complete hardware configuration.

[Disclosure: As of 10/3/2014, I am no longer a Dell employee]

With the re-addition of hardware configuration, OpenCrowbar delivers the essential requirements for Ready State and we’ve piloted integration that shows how to drive Crowbar via the API.

From BuildersKnowledge

There has been substantial burn-down on the Broom release theme of hardware workload deliverable which mainly focus on the IPMI/BMC, RAID and BIOS functions working in the framework.  It has required us to add additional out-of-band abstractions (“hammers”) and node abstractions (“quirks”).

We’ve also had a chance to work ahead on the Camshaft release theme of tools integration components like:

  •         SaltStack Integration – Crowbar sets up a Salt master and minions on discovered metal (pull request)
  •         Chef Metal Integration – a Chef Metal driver talks to the Crowbar API to claim discovered servers from an allocation pool (Judd’s repo).
  •         Puppet Integration – Crowbar is able to use the stand-alone mode to execute Puppet manifests on the nodes (as a replacement for Foreman) (puppet sa client).
  •         Chef Integration – not new, but worth including in the list so it’s not overlooked! (chef-client install)
  • We also added some essential operational configurations including Squid proxy setup and auto configuration and preparing a Consul foundation for future integration with HashiCorp tools

These initial integration are key to being able to bring in OpenStack via Packstack, Chef Cookbooks, or Salt formulas.  Since Crowbar is agnostic about OS, Hardware and Configuration Management tools (Chef, Puppet, Salt), I am seeing interest from several fronts in parallel.  There seems to be substantial interest in RDO + Centos 7 using Packstack or Chef.  Happily, OpenCrowbar.Broom is ready to sweep in those workloads.

There is significant need for Crowbar to deliver ready state under these deployers.  For example, preparing the os, disk, monitoring, cache, networking and SDN infrastructure (OVS, Contrails) are outside the scripts but essential to a sustainable deployment.

These ready state configurations are places where Crowbar creates repeatable cross-platform base that spans the operational choices.