OpenCrowbar.Anvil released – hammering out a gold standard in open bare metal provisioning

OpenCrowbarI’m excited to be announcing OpenCrowbar’s first release, Anvil, for the community.  Looking back on our original design from June 2012, we’ve accomplished all of our original objectives and more.
Now that we’ve got the foundation ready, our next release (OpenCrowbar Broom) focuses on workload development on top of the stable Anvil base.  This means that we’re ready to start working on OpenStack, Ceph and Hadoop.  So far, we’ve limited engagement on workloads to ensure that those developers would not also be trying to keep up with core changes.  We follow emergent design so I’m certain we’ll continue to evolve the core; however, we believe the Anvil release represents a solid foundation for workload development.
There is no more comprehensive open bare metal provisioning framework than OpenCrowbar.  The project’s focus on a complete operations model that comprehends hardware and network configuration with just enough orchestration delivers on a system vision that sets it apart from any other tool.  Yet, Crowbar also plays nicely with others by embracing, not replacing, DevOps tools like Chef and Puppet.
Now that the core is proven, we’re porting the Crowbar v1 RAID and BIOS configuration into OpenCrowbar.  By design, we’ve kept hardware support separate from the core because we’ve learned that hardware generation cycles need to be independent from the operations control infrastructure.  Decoupling them eliminates release disruptions that we experienced in Crowbar v1 and­ makes it much easier to use to incorporate hardware from a broad range of vendors.
Here are some key components of Anvil
  • UI, CLI and API stable and functional
  • Boot and discovery process working PLUS ability to handle pre-populating and configuration
  • Chef and Puppet capabilities including Birk Shelf v3 support to pull in community upstream DevOps scripts
  • Docker, VMs and Physical Servers
  • Crowbar’s famous “late-bound” approach to configuration and, critically, networking setup
  • IPv6 native, Ruby 2, Rails 4, preliminary scale tuning
  • Remarkably flexible and transparent orchestration (the Annealer)
  • Multi-OS Deployment capability, Ubuntu, CentOS, or Different versions of the same OS
Getting the workloads ported is still a tremendous amount of work but the rewards are tremendous.  With OpenCrowbar, the community has a new way to collaborate and integration this work.  It’s important to understand that while our goal is to start a quarterly release cycle for OpenCrowbar, the workload release cycles (including hardware) are NOT tied to OpenCrowbar.  The workloads choose which OpenCrowbar release they target.  From Crowbar v1, we’ve learned that Crowbar needed to be independent of the workload releases and so we want OpenCrowbar to focus on maintaining a strong ops platform.
This release marks four years of hard-earned Crowbar v1 deployment experience and two years of v2 design, redesign and implementation.  I’ve talked with DevOps teams from all over the world and listened to their pains and needs.  We have a long way to go before we’re deploying 1000 node OpenStack and Hadoop clusters, OpenCrowbar Anvil significantly moves the needle in that direction.
Thanks to the Crowbar community (Dell and SUSE especially) for nurturing the project, and congratulations to the OpenCrowbar team getting us this to this amazing place.

 

7 takeaways from DevOps Days Austin

Block Tables

I spent Tuesday and Wednesday at DevOpsDays Austin and continue to be impressed with the enthusiasm and collaborative nature of the DOD events.  We also managed to have a very robust and engaged twitter backchannel thanks to an impressive pace set by Gene Kim!

I’ve still got a 5+ post backlog from the OpenStack summit, but wanted to do a quick post while it’s top of mind.

My takeaways from DevOpsDays Austin:

  1. DevOpsDays spends a lot of time talking about culture.  I’m a huge believer on the importance of culture as the foundation for the type of fundamental changes that we’re making in the IT industry; however, it’s also a sign that we’re still in the minority if we have to talk about culture evangelism.
  2. Process and DevOps are tightly coupled.  It’s very clear that Lean/Agile/Kanban are essential for DevOps success (nice job by Dominica DeGrandis).  No one even suggested DevOps+Waterfall as a joke (but Patrick Debois had a picture of a xeroxed butt in his preso which is pretty close).
  3. Still need more Devs people to show up!  My feeling is that we’ve got a lot of operators who are engaging with developers and fewer developers who are engaging with operators (the “opsdev” people).
  4. Chef Omnibus installer is very compelling.  This approach addresses issues with packaging that were created because we did not have configuration management.  Now that we have good tooling we separate the concerns between bits, configuration, services and dependencies.  This is one thing to watch and something I expect to see in Crowbar.
  5. The old mantra still holds: If something is hard, do it more often.
  6. Eli Goldratt’s The Goal is alive again thanks to Gene Kims’s smart new novel, The Phoenix project, about DevOps and IT  (I highly recommend both, start with Kim).
  7. Not DevOps, but 3D printing is awesome.  This is clearly a game changing technology; however, it takes some effort to get right.  Dell brought a Solidoodle 3D printer to the event to try and print OpenStack & Crowbar logos (watch for this in the future).

I’d be interested in hearing what other people found interesting!  Please comment here and let me know.

5 things keeping DevOps from playing well with others (Chef, Crowbar and Upstream Patterns)

Sharing can be hardSince my earliest days on the OpenStack project, I’ve wanted to break the cycle on black box operations with open ops. With the rise of community driven DevOps platforms like Opscode Chef and Puppetlabs, we’ve reached a point where it’s both practical and imperative to share operational practices in the form of code and tooling.

Being open and collaborating are not the same thing.

It’s a huge win that we can compare OpenStack cookbooks. The real victory comes when multiple deployments use the same trunk instead of forking.

This has been an objective I’ve helped drive for OpenStack (with Matt Ray) and it has been the Crowbar objective from the start and is the keystone of our Crowbar 2 work.

This has proven to be a formidable challenge for several reasons:

  1. diverging DevOps patterns that can be used between private, public, large, small, and other deployments -> solution: attribute injection pattern is promising
  2. tooling gaps prevent operators from leveraging shared deployments -> solution: this is part of Crowbar’s mission
  3. under investing in community supporting features because they are seen as taking away from getting into production -> solution: need leadership and others with join
  4. drift between target versions creates the need for forking even if the cookbooks are fundamentally the same -> solution: pull from source approaches help create distro independent baselines
  5. missing reference architectures interfere with having a stable baseline to deploy against -> solution: agree to a standard, machine consumable RA format like OpenStack Heat.

Unfortunately, these five challenges are tightly coupled and we have to progress on them simultaneously. The tooling and community requires patterns and RAs.

The good news is that we are making real progress.

Judd Maltin (@newgoliath), a Crowbar team member, has documented the emerging Attribute Injection practice that Crowbar has been leading. That practice has been refined in the open by ATT and Rackspace. It is forming the foundation of the OpenStack cookbooks.

Understanding, discussing and supporting that pattern is an important step toward accelerating open operations. Please engage with us as we make the investments for open operations and help us implement the pattern.

OpenStack Summit: Let’s talk DevOps, Fog, Upgrades, Crowbar & Dell

If you are coming to the OpenStack summit in San Diego next week then please find me at the show! I want to hear from you about the Foundation, community, OpenStack deployments, Crowbar and anything else.  Oh, and I just ordered a handful of Crowbar stickers if you wanted some CB bling.

Matt Ray (Opscode), Jason Cannavale (Rackspace) and I were Ops track co-chairs. If you have suggestions, we want to hear. We managed to get great speakers and also some interesting sessions like DevOps panel and up streaming deploy working sessions. It’s only on Monday and Tuesday, so don’t snooze or you’ll miss it.

My team from Dell has a lot going on, so there are lots of chances to connect with us:

At the Dell booth, Randy Perryman will be sharing field experience about hardware choices. We’ve got a lot of OpenStack battle experience and we want to compare notes with you.

I’m on the board meeting on Monday so likely occupied until the Mirantis party.

See you in San Diego!

PS: My team is hiring for Dev, QA and Marketing. Let me know if you want details.

Crowbar 2.0 Design Summit Notes (+ open weekly meetings starting)

I could not be happier with the results Crowbar collaborators and my team at Dell achieved around the 1st Crowbar design summit. We had great discussions and even better participation.

The attendees represented major operating system vendors, configuration management companies, OpenStack hosting companies, OpenStack cloud software providers, OpenStack consultants, OpenStack private cloud users, and (of course) a major infrastructure provider. That’s a very complete cross-section of the cloud community.

I knew from the start that we had too little time and, thankfully, people were tolerant of my need to stop the discussions. In the end, we were able to cover all the planned topics. This was important because all these features are interlocked so discussions were iterative. I was impressed with the level of knowledge at the table and it drove deep discussion. Even so, there are still parts of Crowbar that are confusing (networking, late binding, orchestration, chef coupling) even to collaborators.

In typing up these notes, it becomes even more blindingly obvious that the core features for Crowbar 2 are highly interconnected. That’s no surprise technically; however, it will make the notes harder to follow because of knowledge bootstrapping. You need take time and grok the gestalt and surf the zeitgeist.

Collaboration Invitation: I wanted to remind readers that this summit was just the kick-off for a series of open weekly design (Tuesdays 10am CDT) and coordination (Thursdays 8am CDT) meetings. Everyone is welcome to join in those meetings – information is posted, recorded, folded, spindled and mutilated on the Crowbar 2 wiki page.

These notes are my reflection of the online etherpad notes that were made live during the meeting. I’ve grouped them by design topic.

Introduction

  • Contributors need to sign CLAs
  • We are refactoring Crowbar at this time because we have a collection of interconnected features that could not be decoupled
  • Some items (Database use, Rails3, documentation, process) are not for debate. They are core needs but require little design.
  • There are 5 key topics for the refactor: online mode, networking flexibility, OpenStack pull from source, heterogeneous/multi operating systems, being CDMB agnostic
  • Due to time limits, we have to stop discussions and continue them online.
  • We are hoping to align Crowbar 2 beta and OpenStack Folsom release.

Online / Connected Mode

  • Online mode is more than simply internet connectivity. It is the foundation of how Crowbar stages dependencies and components for deploy. It’s required for heterogeneous O/S, pull from source and it has dependencies on how we model networking so nodes can access resources.
  • We are thinking to use caching proxies to stage resources. This would allow isolated production environments and preserves the run everything from ISO without a connection (that is still a key requirement to us).
  • Suse’s Crowbar fork does not build an ISO, instead it relies on RPM packages for barclamps and their dependencies.
  • Pulling packages directly from the Internet has proven to be unreliable, this method cannot rely on that alone.

Install From Source

  • This feature is mainly focused on OpenStack, it could be applied more generally. The principals that we are looking at could be applied to any application were the source code is changing quickly (all of them?!). Hadoop is an obvious second candidate.
  • We spent some time reviewing the use-cases for this feature. While this appears to be very dev and pre-release focused, there are important applications for production. Specifically, we expect that scale customers will need to run ahead of or slightly adjacent to trunk due to patches or proprietary code. In both cases, it is important that users can deploy from their repository.
  • We discussed briefly our objective to pull configuration from upstream (not just OpenStack, but potentially any common cookbooks/modules). This topic is central to the CMDB agnostic discussion below.
  • The overall sentiment is that this could be a very powerful capability if we can manage to make it work. There is a substantial challenge in tracking dependencies – current RPMs and Debs do a good job of this and other configuration steps beyond just the bits. Replicating that functionality is the real obstacle.

CMDB agnostic (decoupling Chef)

  • This feature is confusing because we are not eliminating the need for a configuration management database (CMDB) tool like Chef, instead we are decoupling Crowbar from the a single CMDB to a pluggable model using an abstraction layer.
  • It was stressed that Crowbar does orchestration – we do not rely on convergence over multiple passes to get the configuration correct.
  • We had strong agreement that the modules should not be tightly coupled but did need a consistent way (API? Consistent namespace? Pixie dust?) to share data between each other. Our priority is to maintain loose coupling and follow integration by convention and best practices rather than rigid structures.
  • The abstraction layer needs to have both import and export functions
  • Crowbar will use attribute injection so that Cookbooks can leverage Crowbar but will not require Crowbar to operate. Crowbar’s database will provide the links between the nodes instead of having to wedge it into the CMDB.
  • In 1.x, the networking was the most coupled into Chef. This is a major part of the refactor and modeling for Crowbar’s database.
  • There are a lot of notes captured about this on the etherpad – I recommend reviewing them

Heterogeneous OS (bare metal provisioning and beyond)

  • This topic was the most divergent of all our topics because most of the participants were using some variant of their own bare metal provisioning project (check the etherpad for the list).
  • Since we can’t pack an unlimited set of stuff on the ISO, this feature requires online mode.
  • Most of these projects do nothing beyond OS provisioning; however, their simplicity is beneficial. Crowbar needs to consider users who just want a stream-lined OS provisioning experience.
  • We discussed Crowbar’s late binding capability, but did not resolve how to reconcile that with these other projects.
  • Critical use cases to consider:
    • an API for provisioning (not sure if it needs to be more than the current one)
    • pick which Operating Systems go on which nodes (potentially with a rules engine?)
    • inventory capabilities of available nodes (like ohai and factor) into a database
    • inventory available operating systems

Crowbar 2.0 Objectives: Scalable, Heterogeneous, Flexible and Connected

The seeds for Crowbar 2.0 have been in the 1.x code base for a while and were recently accelerated by SuSE.  With the Dell | Cloudera 4 Hadoop and Essex OpenStack-powered releases behind us, we will now be totally focused bringing these seeds to fruition in the next two months.

Getting the core Crowbar 2.0 changes working is not a major refactoring effort in calendar time; however, it will impact current Crowbar developers by changing improving the programming APIs. The Dell Crowbar team decided to treat this as a focused refactoring effort because several important changes are tightly coupled. We cannot solve them independently without causing a larger disruption.

All of the Crowbar 2.0 changes address issues and concerns raised in the community and are needed to support expanding of our OpenStack and Hadoop application deployments.

Our technical objective for Crowbar 2.0 is to simplify and streamline development efforts as the development and user community grows. We are seeking to:

  1. simplify our use of Chef and eliminate Crowbar requirements in our Opscode Chef recipes.
    1. reduce the initial effort required to leverage Crowbar
    2. opens Crowbar to a broader audience (see Upstreaming)
  2. provide heterogeneous / multiple operating system deployments. This enables:
    1. multiple versions of the same OS running for upgrades
    2. different operating systems operating simultaneously (and deal with heterogeneous packaging issues)
    3. accommodation of no-agent systems like locked systems (e.g.: virtualization hosts) and switches (aka external entities)
    4. UEFI booting in Sledgehammer
  3. strengthen networking abstractions
    1. allow networking configurations to be created dynamically (so that users are not locked into choices made before Crowbar deployment)
    2. better manage connected operations
    3. enable pull-from-source deployments that are ahead of (or forked from) available packages.
  4. improvements in Crowbar’s core database and state machine to enable
    1. larger scale concerns
    2. controlled production migrations and upgrades
  5. other important items
    1. make documentation more coupled to current features and easier to maintain
    2. upgrade to Rails 3 to simplify code base, security and performance
    3. deepen automated test coverage and capabilities

Beyond these great technical targets, we want Crowbar 2.0 is to address barriers to adoption that have been raised by our community, customers and partners. We have been tracking concerns about the learning curve for adding barclamps, complexity of networking configuration and packaging into a single ISO.

We will kick off to community part of this effort with an online review on 7/16 (details).

PS: why a refactoring?

My team at Dell does not take on any refactoring changes lightly because they are disruptive to our community; however, a convergence of requirements has made it necessary to update several core components simultaneously. Specifically, we found that desired changes in networking, operating systems, packaging, configuration management, scale and hardware support all required interlocked changes. We have been bringing many of these changes into the code base in preparation and have reached a point where the next steps require changing Crowbar 1.0 semantics.

We are first and foremost an incremental architecture & lean development team – Crowbar 2.0 will have the smallest footprint needed to begin the transformations that are currently blocking us. There is significant room during and after the refactor for the community to shape Crowbar.

What does “enable upstream recipes” mean? Not just fishing for community goodness!

One of the major Crowbar 2.0 design targets is to allow you to “upstream” operations scripts more easily.  “Upstream code” means that parts of Crowbar’s source code could be maintained in other open source repositories.  This is beyond a simple dependency (like Rails, Curl, Java or Apache): Upstreaming allows Crowbar can use code managed in the other open source repositories for more general application.  This is important because Crowbar users can leverage DevOps logic that is more broadly targeted than just Crowbar.  Even more importantly, upstreaming means that we can contribute and take advantage of community efforts to improve the upstream source.

Specifically, Crowbar maintains a set of OpenStack cookbooks that make up the core of our OpenStack deployment.  These scripts have been widely cloned (not forked) and deCrowbarized for other deployments.  Unfortunately, that means that we do not benefit from downstream improvements and the cloners cannot easily track our updates.  This happened because Crowbar was not considered a valid upstream OpenStack repository because our deployment scripts required Crowbar.  The consequence of this cloning is that incompatible OpenStack recipes have propagated like cracks in a windshield.

While there are concrete benefits to upstreaming, there are risks too.  We have to evaluate if the upstream code has been adequately tested, operates effectively, implements best practices and leverages Crowbar capabilities.  I believe strongly that untested deployment code is worse than useless; consequently, the Dell Crowbar team provides significant value by validating that our deployments work as an integrated system.  Even more importantly, we will not upstream from unmoderated sources where changes are accepted without regard for downstream impacts.  There is a significant amount of trust required for upstreaming to work.

If upstreaming is so good, why did we not start out with upstream code?  It was simply not an option at the time – Crowbar was the first (and is still!) most complete set of DevOps deployment scripts for OpenStack in a public repository.
By design, Crowbar 1.0 was tightly coupled to Opscode Chef and required users to inject Crowbar dependencies into their Chef Recipes.  This approach allowed us to more quickly integrate capabilities between recipes and with nascent Crowbar features.  Our top design requirement was that our deployment was tightly integrated between hardware, networking, operating system, operations infrastructure and the application.  Figuring out the correct place to separate concerns was impractical; consequently, we injected dependencies into our Chef code.
We have reached a point with Crowbar development that we can correctly decouple Crowbar and Chef.
The benefits to upstreaming go far beyond enabling more collaboration on OpenStack deployments.  These same changes make it easier for Crowbar to leverage community deployment scripts without one-way modifications.  If you have a working Chef Recipe then making it work with Crowbar will no longer require changes that break it outside of Crowbar; therefore, you can leverage Crowbar capabilities without losing community input and without being locked into Crowbar.

OSCON preso graphic about Upstreaming added 7/23: