API Driven Metal = OpenCrowbar + Chef Provisioning

The OpenCrowbar community created a Chef-Provisioning driver that allows you to quickly build hardware clusters using Chef cookbooks.

2012-08-05_14-13-18_310When we started using Chef in 2011, there was a distinct gap around bootstrapping systems.  The platform did a great job of automation and even connecting services together (via the Search anti-pattern, see below) but lacked a way to build the initial clusters automatically.

The current answer to this problem from Chef is refreshingly simply: a cookbook API extension called Chef Provisioning.  This approach uses the regular Chef DSL in recipes to create request and bind a cluster into Chef.  Basically, the code simply builds an array of nodes using an API that creates the nodes if they are missing from the array in the code.  Specifically, when a node is missing from the array, Chef calls out to create the node in an external system.

For clouds, that means using the API to request a server and then inject credentials for Chef management.  It’s trickier for physical gear because you cannot just make a server in the configuration you need it in.  Physical systems must first be discovered and profiled to ready state: the system must know how many NICs and disk drives are available to correctly configure the hardware prior to laying down the Operating System.

Consequently, Chef Provisioning automation is more about reallocation of existing discovered physical assets to Chef.  That’s exactly the approach the OpenCrowbar team took for our Chef Provisioning driver.

OpenCrowbar interacts with Chef Provisioning by pulling nodes from the System deployment into a Chef Provisioning deployment.  That action then allows the API client to request specific configurations like Operating System or network that need to be setup for Chef to execute.  Once these requests are made, Crowbar will simply run its normal annealing processes to ready state and then injects the Chef credentials.  Chef waits until the work queue is empty and then takes over management of the asset.  When Chef is finished, Crowbar can be instructed to reconfigure the node back to a base state.

Does that sound simple?  It is simple because the Crowbar APIs match the Chef needs very cleanly.

It’s worth noting that this integration is a great test of the OpenCrowbar API design.  Over the last two years, we’ve evolved the API to make it more final result focused.  Late binding is a critical concept for the project and the APIs reflect that objective.  For Chef Provisioning, we allow the integration to focus on simple requests like “give me a node then put this O/S on the node and go.”  Crowbar has the logic needed to figure out how to accomplish those objectives without much additional instruction.

Bonus Side Note: Why Search can become an anti-pattern?  

Search is an incredibly powerful feature in Chef that allows cross-role and cross-node integration; unfortunately, it’s also very difficult to maintain as complexity and contributor counts grow.  The reason is that search creates “forward dependencies” in the scripts that require operators creating data to be aware of downstream, hidden consumers.  High Availability (HA) is a clear example.  If I add a new “cluster database” role to the system then it is very likely to return multiple results for database searches.  That’s excellent until I learn that my scripts have coded search to assume that we only return one result for database lookups.  It’s very hard to find these errors since the searches are decoupled and downstream of the database cookbook.  Ultimately, the community had to advise against embedded search for shared cookbooks

Starting RackN – Delivering open ops by pulling an OpenCrowbar Bunny out of our hat

When Dell pulled out from OpenCrowbar last April, I made a commitment to our community to find a way to keep it going.  Since my exit from Dell early in October 2014, that commitment has taken the form of RackN.

Rack N BlackToday, we’re ready to help people run and expand OpenCrowbar (days away from v2.1!). We’re also seeking investment to make the project more “enterprise-ready” and build integrations that extend ready state.

RackN focuses on maintenance and support of OpenCrowbar for ready state physical provisioning.  We will build the community around Crowbar as an open operations core and extend it with a larger set of hardware support and extensions.  We are building partnerships to build application integration (using Chef, Puppet, Salt, etc) and platform workloads (like OpenStack, Hadoop, Ceph, CloudFoundry and Mesos) above ready state.

I’ve talked with hundreds of people about the state of physical data center operations at scale. Frankly, it’s a scary state of affairs: complexity is increasing for physical infrastructure and we’re blurring the lines by adding commodity networking with local agents into the mix.

Making this jumble of stuff work together is not sexy cloud work – I describe it as internet plumbing to non-technical friends.  It’s unforgiving, complex and full of sharp edge conditions; however, people are excited to hear about our hardware abstraction mission because it solves a real pain for operators.

I hope you’ll stay tuned, or even play along, as we continue the Open Ops journey.

Need a physical ops baseline? Crowbar continues to uniquely fill gap

Robots Everywhere!I’ve been watching to see if other open “bare metal” projects would morph to match the system-level capabilities that we proved in Crowbar v1 and honed in the re-architecture of OpenCrowbar.  The answer appears to be that Crowbar simply takes a broader approach to solving the physical ops repeatably problem.

Crowbar Architect Victor Lowther says “What makes Crowbar a better tool than Cobbler, Razor, or Foreman is that Crowbar has an orchestration engine that can be used to safely and repeatably deploy complex workloads across large numbers of machines. This is different from (and better than, IMO) just being able to hand responsibility off to Chef/Puppet/Salt, because we can manage the entire lifecycle of a machine where Cobbler, Razor and Chef cannot, we can describe how we want workloads configured at a more abstract level than Foreman can, and we do it all using the same API and UI.”

Since we started with a vision of an integrated system to address the “apply-rinse-repeat” cycle; it’s no surprise that Crowbar remains the only open platform that’s managed to crack the complete physical deployment life-cycle.

The Crowbar team realized that it’s not just about automation setting values: physical ops requires orchestration to make sure the values are set in the correct sequence on the appropriate control surface including DNS, DHCP, PXE, Monitoring, et cetera.  Unlike architectures for aaS platforms, the heterogeneous nature of the physical control planes requires a different approach.

We’ve seen that making more and more complex kickstart scripts or golden images is not a sustainable solution.  There is simply too much hardware variation and dependency thrash for operators to collaborate with those tools.  Instead, we’ve found that decomposing the provisioning operations into functional layers with orchestration is much more multi-site repeatable.

Accepting that physical ops (discovered infrastructure) is fundamentally different from cloud ops (created infrastructure) has been critical to architecting platforms that were resilient enough for the heterogeneous infrastructure of data centers.

If we want to start cleaning up physical ops, we need to stop looking at operating system provisioning in isolation and start looking at the full server bring up as just a part of a broader system operation that includes networking, management and operational integration.

OpenCrowbar 2.B to deliver multiple hardware vendor support and advanced integrations

I’ve stayed quiet on the subject of Crowbar for a few months, but that does not mean that Crowbar has been.  Activity has been picking up, after Dell pulled resources off, to complete hardware configuration.

[Disclosure: As of 10/3/2014, I am no longer a Dell employee]

With the re-addition of hardware configuration, OpenCrowbar delivers the essential requirements for Ready State and we’ve piloted integration that shows how to drive Crowbar via the API.

From BuildersKnowledge

There has been substantial burn-down on the Broom release theme of hardware workload deliverable which mainly focus on the IPMI/BMC, RAID and BIOS functions working in the framework.  It has required us to add additional out-of-band abstractions (“hammers”) and node abstractions (“quirks”).

We’ve also had a chance to work ahead on the Camshaft release theme of tools integration components like:

  •         SaltStack Integration – Crowbar sets up a Salt master and minions on discovered metal (pull request)
  •         Chef Metal Integration – a Chef Metal driver talks to the Crowbar API to claim discovered servers from an allocation pool (Judd’s repo).
  •         Puppet Integration – Crowbar is able to use the stand-alone mode to execute Puppet manifests on the nodes (as a replacement for Foreman) (puppet sa client).
  •         Chef Integration – not new, but worth including in the list so it’s not overlooked! (chef-client install)
  • We also added some essential operational configurations including Squid proxy setup and auto configuration and preparing a Consul foundation for future integration with HashiCorp tools

These initial integration are key to being able to bring in OpenStack via Packstack, Chef Cookbooks, or Salt formulas.  Since Crowbar is agnostic about OS, Hardware and Configuration Management tools (Chef, Puppet, Salt), I am seeing interest from several fronts in parallel.  There seems to be substantial interest in RDO + Centos 7 using Packstack or Chef.  Happily, OpenCrowbar.Broom is ready to sweep in those workloads.

There is significant need for Crowbar to deliver ready state under these deployers.  For example, preparing the os, disk, monitoring, cache, networking and SDN infrastructure (OVS, Contrails) are outside the scripts but essential to a sustainable deployment.

These ready state configurations are places where Crowbar creates repeatable cross-platform base that spans the operational choices.

5 things keeping DevOps from playing well with others (Chef, Crowbar and Upstream Patterns)

Sharing can be hardSince my earliest days on the OpenStack project, I’ve wanted to break the cycle on black box operations with open ops. With the rise of community driven DevOps platforms like Opscode Chef and Puppetlabs, we’ve reached a point where it’s both practical and imperative to share operational practices in the form of code and tooling.

Being open and collaborating are not the same thing.

It’s a huge win that we can compare OpenStack cookbooks. The real victory comes when multiple deployments use the same trunk instead of forking.

This has been an objective I’ve helped drive for OpenStack (with Matt Ray) and it has been the Crowbar objective from the start and is the keystone of our Crowbar 2 work.

This has proven to be a formidable challenge for several reasons:

  1. diverging DevOps patterns that can be used between private, public, large, small, and other deployments -> solution: attribute injection pattern is promising
  2. tooling gaps prevent operators from leveraging shared deployments -> solution: this is part of Crowbar’s mission
  3. under investing in community supporting features because they are seen as taking away from getting into production -> solution: need leadership and others with join
  4. drift between target versions creates the need for forking even if the cookbooks are fundamentally the same -> solution: pull from source approaches help create distro independent baselines
  5. missing reference architectures interfere with having a stable baseline to deploy against -> solution: agree to a standard, machine consumable RA format like OpenStack Heat.

Unfortunately, these five challenges are tightly coupled and we have to progress on them simultaneously. The tooling and community requires patterns and RAs.

The good news is that we are making real progress.

Judd Maltin (@newgoliath), a Crowbar team member, has documented the emerging Attribute Injection practice that Crowbar has been leading. That practice has been refined in the open by ATT and Rackspace. It is forming the foundation of the OpenStack cookbooks.

Understanding, discussing and supporting that pattern is an important step toward accelerating open operations. Please engage with us as we make the investments for open operations and help us implement the pattern.

Crowbar 2.0 Design Summit Notes (+ open weekly meetings starting)

I could not be happier with the results Crowbar collaborators and my team at Dell achieved around the 1st Crowbar design summit. We had great discussions and even better participation.

The attendees represented major operating system vendors, configuration management companies, OpenStack hosting companies, OpenStack cloud software providers, OpenStack consultants, OpenStack private cloud users, and (of course) a major infrastructure provider. That’s a very complete cross-section of the cloud community.

I knew from the start that we had too little time and, thankfully, people were tolerant of my need to stop the discussions. In the end, we were able to cover all the planned topics. This was important because all these features are interlocked so discussions were iterative. I was impressed with the level of knowledge at the table and it drove deep discussion. Even so, there are still parts of Crowbar that are confusing (networking, late binding, orchestration, chef coupling) even to collaborators.

In typing up these notes, it becomes even more blindingly obvious that the core features for Crowbar 2 are highly interconnected. That’s no surprise technically; however, it will make the notes harder to follow because of knowledge bootstrapping. You need take time and grok the gestalt and surf the zeitgeist.

Collaboration Invitation: I wanted to remind readers that this summit was just the kick-off for a series of open weekly design (Tuesdays 10am CDT) and coordination (Thursdays 8am CDT) meetings. Everyone is welcome to join in those meetings – information is posted, recorded, folded, spindled and mutilated on the Crowbar 2 wiki page.

These notes are my reflection of the online etherpad notes that were made live during the meeting. I’ve grouped them by design topic.

Introduction

  • Contributors need to sign CLAs
  • We are refactoring Crowbar at this time because we have a collection of interconnected features that could not be decoupled
  • Some items (Database use, Rails3, documentation, process) are not for debate. They are core needs but require little design.
  • There are 5 key topics for the refactor: online mode, networking flexibility, OpenStack pull from source, heterogeneous/multi operating systems, being CDMB agnostic
  • Due to time limits, we have to stop discussions and continue them online.
  • We are hoping to align Crowbar 2 beta and OpenStack Folsom release.

Online / Connected Mode

  • Online mode is more than simply internet connectivity. It is the foundation of how Crowbar stages dependencies and components for deploy. It’s required for heterogeneous O/S, pull from source and it has dependencies on how we model networking so nodes can access resources.
  • We are thinking to use caching proxies to stage resources. This would allow isolated production environments and preserves the run everything from ISO without a connection (that is still a key requirement to us).
  • Suse’s Crowbar fork does not build an ISO, instead it relies on RPM packages for barclamps and their dependencies.
  • Pulling packages directly from the Internet has proven to be unreliable, this method cannot rely on that alone.

Install From Source

  • This feature is mainly focused on OpenStack, it could be applied more generally. The principals that we are looking at could be applied to any application were the source code is changing quickly (all of them?!). Hadoop is an obvious second candidate.
  • We spent some time reviewing the use-cases for this feature. While this appears to be very dev and pre-release focused, there are important applications for production. Specifically, we expect that scale customers will need to run ahead of or slightly adjacent to trunk due to patches or proprietary code. In both cases, it is important that users can deploy from their repository.
  • We discussed briefly our objective to pull configuration from upstream (not just OpenStack, but potentially any common cookbooks/modules). This topic is central to the CMDB agnostic discussion below.
  • The overall sentiment is that this could be a very powerful capability if we can manage to make it work. There is a substantial challenge in tracking dependencies – current RPMs and Debs do a good job of this and other configuration steps beyond just the bits. Replicating that functionality is the real obstacle.

CMDB agnostic (decoupling Chef)

  • This feature is confusing because we are not eliminating the need for a configuration management database (CMDB) tool like Chef, instead we are decoupling Crowbar from the a single CMDB to a pluggable model using an abstraction layer.
  • It was stressed that Crowbar does orchestration – we do not rely on convergence over multiple passes to get the configuration correct.
  • We had strong agreement that the modules should not be tightly coupled but did need a consistent way (API? Consistent namespace? Pixie dust?) to share data between each other. Our priority is to maintain loose coupling and follow integration by convention and best practices rather than rigid structures.
  • The abstraction layer needs to have both import and export functions
  • Crowbar will use attribute injection so that Cookbooks can leverage Crowbar but will not require Crowbar to operate. Crowbar’s database will provide the links between the nodes instead of having to wedge it into the CMDB.
  • In 1.x, the networking was the most coupled into Chef. This is a major part of the refactor and modeling for Crowbar’s database.
  • There are a lot of notes captured about this on the etherpad – I recommend reviewing them

Heterogeneous OS (bare metal provisioning and beyond)

  • This topic was the most divergent of all our topics because most of the participants were using some variant of their own bare metal provisioning project (check the etherpad for the list).
  • Since we can’t pack an unlimited set of stuff on the ISO, this feature requires online mode.
  • Most of these projects do nothing beyond OS provisioning; however, their simplicity is beneficial. Crowbar needs to consider users who just want a stream-lined OS provisioning experience.
  • We discussed Crowbar’s late binding capability, but did not resolve how to reconcile that with these other projects.
  • Critical use cases to consider:
    • an API for provisioning (not sure if it needs to be more than the current one)
    • pick which Operating Systems go on which nodes (potentially with a rules engine?)
    • inventory capabilities of available nodes (like ohai and factor) into a database
    • inventory available operating systems

Join us 5/31 for a OpenStack Deploy Hack-a-thon (all-day, world-wide online & multi-city)

An OpenStack Deploy Hack-a-thon is like 3-liter bottle of distilled open source community love.  Do you want direct access to my Dell team of OpenStack/Crowbar/Hadoop engineers?  Are you just getting started and want training about OpenStack and DevOps?  This is the event for you!

Here’s the official overview:

The OpenStack Deploy hack-a-thon focuses on automation for deploying OpenStack Essex with Dell Crowbar and Opscode Chef. This is a day-long, world-wide event bringing together developers, operators, users, ecosystem vendors and the open source cloud curious. (read below: We are looking for global sites and leaders to extend the event hours!)

OpenStack is the fastest growing open source cloud infrastructure project with broad market adoption from major hardware and software vendors. Crowbar is an Apache 2 licensed, open infrastructure deployment tool and is one of the leading multi-node deployers for OpenStack and Hadoop.

Learn first-hand how OpenStack and Crowbar can make it easy to deploy and operate your own cloud environments.

The Deploy day will offer two individual parallel tracks with something for both experts and beginners:

  • Newbies n00bs will learn the basics of OpenStack, Crowbar and DevOps and how they can benefit your organization. We’ll also have time for ecosystem vendors to discuss how they are leveraging OpenStack.
  • Experts l33ts will take a deep dive into new features of OpenStack Essex and Crowbar, and learn how Crowbar works under the hood, which will enable them to extend the product using Crowbar Barclamps.
Note: If you’re a n00b but want l33t content, we’ll be offering online training materials and videos to help get you up to speed.

Why now? We’ve validated our OpenStack Essex deployment against the latest release bits from Ubuntu. Now it’s time to reach out to the OpenStack and Crowbar communities for training, testing and collaborative development.

Join the event!  We’re organizing information on the Crowbar wiki.  (I highly recommend you join the Crowbar list to get access to support for prep materials).  You can also reach out to me via the @DellCrowbar handle.

We’d love to get you up to speed on the basics and dive deep into the core.