OpenCrowbar reaches critical milestone – boot, discover and forge on!

OpenCrowbarWe started the Crowbar project because we needed to make OpenStack deployments to be fast, repeatable and sharable.  We wanted a tool that looked at deployments as a system and integrated with our customers’ operations environment.  Crowbar was born as an MVP and quickly grew into a more dynamic tool that could deploy OpenStack, Hadoop, Ceph and other applications, but most critically we recognized that our knowledge gaps where substantial and we wanted to collaborate with others on the learning.  The result of that learning was a rearchitecture effort that we started at OSCON in 2012.

After nearly two years, I’m proud to show off the framework that we’ve built: OpenCrowbar addresses the limitations of Crowbar 1.x and adds critical new capabilities.

So what’s in OpenCrowbar?  Pretty much what we targeted at the launch and we’ve added some wonderful surprises too:

  • Heterogeneous Operating Systems – chose which operating system you want to install on the target servers.
  • CMDB Flexibility – don’t be locked in to a devops toolset.  Attribute injection allows clean abstraction boundaries so you can use multiple tools (Chef and Puppet, playing together).
  • Ops Annealer –the orchestration at Crowbar’s heart combines the best of directed graphs with late binding and parallel execution.  We believe annealing is the key ingredient for repeatable and OpenOps shared code upgrades
  • Upstream Friendly – infrastructure as code works best as a community practice and Crowbar use upstream code without injecting “crowbarisms” that were previously required.  So you can share your learning with the broader DevOps community even if they don’t use Crowbar.
  • Node Discovery (or not) – Crowbar maintains the same proven discovery image based approach that we used before, but we’ve streamlined and expanded it.  You can use Crowbar’s API outside of the PXE discovery system to accommodate Docker containers, existing systems and VMs.
  • Hardware Configuration – Crowbar maintains the same optional hardware neutral approach to RAID and BIOS configuration.  Configuring hardware with repeatability is difficult and requires much iterative testing.  While our approach is open and generic, my team at Dell works hard to validate a on specific set of gear: it’s impossible to make statements beyond that test matrix.
  • Network Abstraction – Crowbar dramatically extended our DevOps network abstraction.  We’ve learned that a networking is the key to success for deployment and upgrade so we’ve made Crowbar networking flexible and concise.  Crowbar networking works with attribute injection so that you can avoid hardwiring networking into DevOps scripts.
  • Out of band control – when the Annealer hands off work, Crowbar gives the worker implementation flexibility to do it on the node (using SSH) or remotely (using an API).  Making agents optional means allows operators and developers make the best choices for the actions that they need to take.
  • Technical Debt Paydown – We’ve also updated the Crowbar infrastructure to use the latest libraries like Ruby 2, Rails 4, Chef 11.  Even more importantly, we’re dramatically simplified the code structure including in repo documentation and a Docker based developer environment that makes building a working Crowbar environment fast and repeatable.

Why change to OpenCrowbar?  This new generation of Crowbar is structurally different from Crowbar 1 and we’ve investing substantially in refactoring the tooling, paying down technical debt and cleanup up documentation.  Since Crowbar 1 is still being actively developed, splitting the repositories allow both versions to progress with less confusion.  The majority of the principles and deployment code is very similar, I think of Crowbar as a single community.

Interested?  Our new Docker Admin node is quick to setup and can boot and manage both virtual and physical nodes.

OpenStack Summit: Let’s talk DevOps, Fog, Upgrades, Crowbar & Dell

If you are coming to the OpenStack summit in San Diego next week then please find me at the show! I want to hear from you about the Foundation, community, OpenStack deployments, Crowbar and anything else.  Oh, and I just ordered a handful of Crowbar stickers if you wanted some CB bling.

Matt Ray (Opscode), Jason Cannavale (Rackspace) and I were Ops track co-chairs. If you have suggestions, we want to hear. We managed to get great speakers and also some interesting sessions like DevOps panel and up streaming deploy working sessions. It’s only on Monday and Tuesday, so don’t snooze or you’ll miss it.

My team from Dell has a lot going on, so there are lots of chances to connect with us:

At the Dell booth, Randy Perryman will be sharing field experience about hardware choices. We’ve got a lot of OpenStack battle experience and we want to compare notes with you.

I’m on the board meeting on Monday so likely occupied until the Mirantis party.

See you in San Diego!

PS: My team is hiring for Dev, QA and Marketing. Let me know if you want details.

Why doesn’t Chef call them “bowls” instead of roles?

The extended Crowbar team (my employer Dell and community) recently had a bit of a controversy heated discussion over the renaming of “proposals” to “configurations.” It was pretty clear that the term “proposal” confused users because an “active proposal” seems like a bit of an oxymoron. Excepting Scott Jensen, our schedule-czar and director of engineering, we had relatively few die-hard “I love proposal” advocates; however, deciding on an alternative was not quite so easy.

Ultimately, the question came down to “do we use an invented term or an intuitive term.”  The answer lies in when to use or avoid the congruity theory as articulated by Roger Cauvin.

We considered many alternatives like calling them “fixtures” to go along with the Crowbar & Barclamp tool theme. Even “Chuck Norris” was considered until copyright issues were flagged. The top alternative, “configuration” seemed just too bland. Frankly and amazingly, we originally considered it “too descriptive!”

The crux of the argument really revolved around the users’ ability to intuitively grasp a concept or to force them learn a new term. For example, we specifically chose “barclamp” instead of “module” because we felt that there were more components to a barclamp than just being a Crowbar module. In many ways, module would be sufficiently descriptive; however, we saw that there was benefit to the user tax in introducing a new term. It also fit nicely within our tool theme.

Opscode Chef is an example of investing heavily in a naming theme. For example, the concept of “cookbooks” and “recipes” seems relatively intuitive for users but starts getting stretched for “knife” because it is not immediately clear to users what that component does (it executes instructions on nodes and the server). After learning Chef, I appreciate “knife” as the universal tool but still remember having to figure it out.

A good theme is awesome, but it can quickly encumber usability.

For example, what if Chef has used “bowl” instead of role. It’s logical: you put a group of ingredients to mix into a bowl that acts as a container. While it may be logical to the initiated, it mainly extends the learning curve for new users. A role is a commonly accepted term for an operational classification so it is a much better term for users. The same is true for “node” and “data bag.”

I love a good incongruent theme as much as any meme-enabled tech geek but themes must not hinder usability. After all, we all fight for the users.

Crowbar 2.0 Design Summit Notes (+ open weekly meetings starting)

I could not be happier with the results Crowbar collaborators and my team at Dell achieved around the 1st Crowbar design summit. We had great discussions and even better participation.

The attendees represented major operating system vendors, configuration management companies, OpenStack hosting companies, OpenStack cloud software providers, OpenStack consultants, OpenStack private cloud users, and (of course) a major infrastructure provider. That’s a very complete cross-section of the cloud community.

I knew from the start that we had too little time and, thankfully, people were tolerant of my need to stop the discussions. In the end, we were able to cover all the planned topics. This was important because all these features are interlocked so discussions were iterative. I was impressed with the level of knowledge at the table and it drove deep discussion. Even so, there are still parts of Crowbar that are confusing (networking, late binding, orchestration, chef coupling) even to collaborators.

In typing up these notes, it becomes even more blindingly obvious that the core features for Crowbar 2 are highly interconnected. That’s no surprise technically; however, it will make the notes harder to follow because of knowledge bootstrapping. You need take time and grok the gestalt and surf the zeitgeist.

Collaboration Invitation: I wanted to remind readers that this summit was just the kick-off for a series of open weekly design (Tuesdays 10am CDT) and coordination (Thursdays 8am CDT) meetings. Everyone is welcome to join in those meetings – information is posted, recorded, folded, spindled and mutilated on the Crowbar 2 wiki page.

These notes are my reflection of the online etherpad notes that were made live during the meeting. I’ve grouped them by design topic.

Introduction

  • Contributors need to sign CLAs
  • We are refactoring Crowbar at this time because we have a collection of interconnected features that could not be decoupled
  • Some items (Database use, Rails3, documentation, process) are not for debate. They are core needs but require little design.
  • There are 5 key topics for the refactor: online mode, networking flexibility, OpenStack pull from source, heterogeneous/multi operating systems, being CDMB agnostic
  • Due to time limits, we have to stop discussions and continue them online.
  • We are hoping to align Crowbar 2 beta and OpenStack Folsom release.

Online / Connected Mode

  • Online mode is more than simply internet connectivity. It is the foundation of how Crowbar stages dependencies and components for deploy. It’s required for heterogeneous O/S, pull from source and it has dependencies on how we model networking so nodes can access resources.
  • We are thinking to use caching proxies to stage resources. This would allow isolated production environments and preserves the run everything from ISO without a connection (that is still a key requirement to us).
  • Suse’s Crowbar fork does not build an ISO, instead it relies on RPM packages for barclamps and their dependencies.
  • Pulling packages directly from the Internet has proven to be unreliable, this method cannot rely on that alone.

Install From Source

  • This feature is mainly focused on OpenStack, it could be applied more generally. The principals that we are looking at could be applied to any application were the source code is changing quickly (all of them?!). Hadoop is an obvious second candidate.
  • We spent some time reviewing the use-cases for this feature. While this appears to be very dev and pre-release focused, there are important applications for production. Specifically, we expect that scale customers will need to run ahead of or slightly adjacent to trunk due to patches or proprietary code. In both cases, it is important that users can deploy from their repository.
  • We discussed briefly our objective to pull configuration from upstream (not just OpenStack, but potentially any common cookbooks/modules). This topic is central to the CMDB agnostic discussion below.
  • The overall sentiment is that this could be a very powerful capability if we can manage to make it work. There is a substantial challenge in tracking dependencies – current RPMs and Debs do a good job of this and other configuration steps beyond just the bits. Replicating that functionality is the real obstacle.

CMDB agnostic (decoupling Chef)

  • This feature is confusing because we are not eliminating the need for a configuration management database (CMDB) tool like Chef, instead we are decoupling Crowbar from the a single CMDB to a pluggable model using an abstraction layer.
  • It was stressed that Crowbar does orchestration – we do not rely on convergence over multiple passes to get the configuration correct.
  • We had strong agreement that the modules should not be tightly coupled but did need a consistent way (API? Consistent namespace? Pixie dust?) to share data between each other. Our priority is to maintain loose coupling and follow integration by convention and best practices rather than rigid structures.
  • The abstraction layer needs to have both import and export functions
  • Crowbar will use attribute injection so that Cookbooks can leverage Crowbar but will not require Crowbar to operate. Crowbar’s database will provide the links between the nodes instead of having to wedge it into the CMDB.
  • In 1.x, the networking was the most coupled into Chef. This is a major part of the refactor and modeling for Crowbar’s database.
  • There are a lot of notes captured about this on the etherpad – I recommend reviewing them

Heterogeneous OS (bare metal provisioning and beyond)

  • This topic was the most divergent of all our topics because most of the participants were using some variant of their own bare metal provisioning project (check the etherpad for the list).
  • Since we can’t pack an unlimited set of stuff on the ISO, this feature requires online mode.
  • Most of these projects do nothing beyond OS provisioning; however, their simplicity is beneficial. Crowbar needs to consider users who just want a stream-lined OS provisioning experience.
  • We discussed Crowbar’s late binding capability, but did not resolve how to reconcile that with these other projects.
  • Critical use cases to consider:
    • an API for provisioning (not sure if it needs to be more than the current one)
    • pick which Operating Systems go on which nodes (potentially with a rules engine?)
    • inventory capabilities of available nodes (like ohai and factor) into a database
    • inventory available operating systems

Open Community Access to Crowbar 2 Efforts

We’re moving along on the Crowbar2 refactoring work (`/release/rails3anddb/master` branch for now) and it’s time to start making it easier for you to participate if you are interested.

We are planning to start having TWO weekly community sprint meetings.

NOTE: Times can still shift depending on community input! We are trying to a truly global community so we need your input.

The weekly design discussions will be on Tuesdays @ 10am Central (GMT -6). The topics will be relevant to the coming sprint and we expect dialog. The topics will be determined by Greg Althaus based on progress on the refactor. You’re welcome to contact him or the list with suggestions.

The purpose of these meetings is to

  • discuss/resolve design related to the refactor
  • Document use-cases
  • identify issues that need to be addressed in the next sprint

The weekly coordination meetings will be on Thursdays @ 8am Central (GMT -6). We want to respect everyone’s time and will strictly limit these to 1 hour. This meeting is in between our internal sprint review and planning so we have flexibility to adjust our plans based on your input. It is important to us that we make it possible for you to contribute and we need your input to make sure that you are not blocked!

The coordination meetings will be structured as follows (times approximate)

  • Voice & Screencast: https://join.me/dellcrowbar
  • 25 minutes for review of current progress
  • 10 minutes for feedback/adjustments on process and workflow
  • 25 minutes for planning of next sprint
  • Online discussion & notes on the http://crowbar.sync.in/sprintMMDD etherpads
  • Identification of working groups for further discussion and coding collaboration

These meetings are the primary way that we will be making sure the community is not blocked by our development efforts.

The purpose of these meetings is

  • to synchronize quickly so that we can connect the people who should be collaborating
  • eliminate blocking items for Crowbar contributors

Of course, we will attempt to record and post all of these meetings.

Stay tuned! We will likely announce additional meetings for community collaboration.

PS: These are design meetings, they are NOT Crowbar training meetings. Please consult http://bit.ly/crowbarwiki for links and videos about learning Crowbar.

Ops “Late Binding” is critical best practice and key to Crowbar differentiation

From OSCON 2012: Portland's light rail understands good dev practice.Late binding is a programming term that I’ve commandeered for Crowbar’s DevOps design objectives.

We believe that late binding is a best practice for CloudOps.

Understanding this concept is turning out to be an important but confusing differentiation for Crowbar. We’ve effectively inverted the typical deploy pattern of building up a cloud from bare metal; instead, Crowbar allows you to build a cloud from the top down.  The difference is critical – we delay hardware decisions until we have the information needed to do the correct configuration.

If Late Binding is still confusing, the concept is really very simple: “we hold off all work until you’ve decided how you want to setup your cloud.”

Late binding arose from our design objectives. We started the project with a few critical operational design objectives:

  1. Treat the nodes and application layers as an interconnected system
  2. Realize that application choices should drive down the entire application stack including BIOS, RAID and networking
  3. Expect the entire system to be in a constantly changing so we must track state and avoid locked configurations.

We’d seen these objectives as core tenets in hyperscale operators who considered bare metal and network configuration to be an integral part of their application deployment. We know it is possible to build the system in layers that only (re)deploy once the application configuration is defined.

We have all this great interconnected automation! Why waste it by having to pre-stage the hardware or networking?

In cloud, late binding is known as “elastic computing” because you wait until you need resources to deploy. But running apps on cloud virtual machines is simple when compared to operating a physical infrastructure. In physical operations, RAID, BIOS and networking matter a lot because there are important and substantial variations between nodes. These differences are what drive late binding as a one of Crowbar’s core design principles.

Late Binding Visualized

The real workloads begin: Crowbar’s Sophomore Year

Given Crowbar‘s frenetic Freshman year, it’s impossible to predict everything that Crowbar could become. I certainly aspire to see the project gain a stronger developer community and the seeds of this transformation are sprouting. I also see that community driven work is positioning Crowbar to break beyond being platforms for OpenStack and Apache Hadoop solutions that pay the bills for my team at Dell to invest in Crowbar development.

I don’t have to look beyond the summer to see important development for Crowbar because of the substantial goals of the Crowbar 2.0 refactor.

Crowbar 2.0 is really just around the corner so I’d like to set some longer range goals for our next year.

  • Growing acceptance of Crowbar as an in data center extension for DevOps tools (what I call CloudOps)
  • Deeper integration into more operating environments beyond the core Linux flavors (like virtualization hosts, closed and special purpose operating systems.
  • Improvements in dynamic networking configuration
  • Enabling more online network connected operating modes
  • Taking on production ops challenges of scale, high availability and migration
  • Formalization of our community engagement with summits, user groups, and broader developer contributions.

For example, Crowbar 2.0 will be able to handle downloading packages and applications from the internet. Online content is not a major benefit without being able to stage and control how those new packages are deployed; consequently, our goals remains tightly focused improvements in orchestration.

These changes create a foundation that enables a more dynamic operating environment. Ultimately, I see Crowbar driving towards a vision of fully integrated continuous operations; however, Greg & Rob’s Crowbar vision is the topic for tomorrow’s post.