Crowbar Celebrates 1st Anniversary

Nearly a year ago at OSCON 2011, my team at Dell opened sourced “Crowbar, an OpenStack installer.” That first Github commit was a much more limited project than Crowbar today: there was no separation into barclamps, no distinct network configuration, one operating system option and the default passwords were all “openstack.” We simply did not know if our effort would create any interest.

The response to Crowbar has been exciting and humbling. I most appreciate those who looked at Crowbar and saw more than a bare metal installer. They are the ones who recognized that we are trying to solve a bigger problem: it has been too difficult to cope with change in IT operations.

During this year, we have made many changes. Many have been driven by customer, user and partner feedback while others support Dell product delivery needs. Happily, these inputs are well aligned in intent if not always in timing.

  • Introduction of barclamps as modular components
  • Expansion into multiple applications (most notably OpenStack and Apache Hadoop)
  • Multi-Operating System
  • Working in the open (with public commits)
  • Collaborative License Agreements

Dell‘s understanding of open source and open development has made a similar transformation. Crowbar was originally Apache 2 open sourced because we imagined it becoming part of the OpenStack project. While that ambition has faded, the practical benefits of open collaboration have proven to be substantial.

The results from this first year are compelling:

  • For OpenStack Diablo, coordination with the Rackspace Cloud Builder team enabled Crowbar to include the Keystone and Dashboard projects into Dell’s solution
  • For OpenStack Essex, the community focused work we did for the March Essex Hackday are directly linked to our ability to deliver Dell’s OpenStack-Powered Essex solution over two months earlier than originally planned.
  • For Apache Hadoop distributions for 3.x and 4.x with implementation of Cloudera Manager and eco system components.
  • We’ve amassed hundreds of mail subscribers and Github followers
  • Support for multiple releases of RHEL, Centos & Ubuntu including Ubuntu 12.04 while it was still in beta.
  • SuSE does their own port of Crowbar to SuSE with important advances in Crowbar’s install model (from ISO to package).

We stand on the edge of many exciting transformations for Crowbar’s second year. Based on the amount of change from this year, I’m hesitant to make long term predictions. Yet, just within next few months there are significant plans based on Crowbar 2.0 refactor. We have line of site to changes that expand our tool choices, improve networking, add operating systems and become more even production ops capable.

That’s quite a busy year!

OSED OMG: OpenStack Essex Deploy Day!! A day-long four-session two-track International Online Conference

Curious about OpenStack? Know it, but want to tune your Ops chops? JOIN US on Thursday 5/31 (or Friday 6/1 if you are in Asia)!

Already know the event logistics? Skip back to my OSED observations post.

Some important general notes:

  1. We are RECORDING everything and will link posts from the event page.
  2. There is HOMEWORK if you want to get ahead by installing OpenStack yourself.
  3. For last minute updates about the event, I recommend that you join the Crowbar Listserver.

Content Logistics work like this.

  1. Everything will be available ONLINE. We are also coordinating many physical sites as rally points.
  2. Introductory: FOUR 3-hour sessions for people who do not have OpenStack or Crowbar experience. These sessions will show how to install OpenStack using Crowbar, discuss DevOps and showcase companies that are in the OpenStack ecosystem. They are planned to have 2 European slots (afternoon & evening), 3 US slots (morning, afternoon & evening), and 1 Asian slot (morning).
  3. Expert: ON-GOING deep technical sessions for engineers who have OpenStack and/or Crowbar experience. There will be one main screen and voice channel in which we are planning to highlight and discuss these topics in blocks throughout the day. We have a long list of topics to discuss and will maintain an ongoing Google Hangout for each topic. Depending on interest, we will jump back and forth to different hangouts.

Intro/Overview Session Logistics work like this

We’re planning FOUR introductory sessions throughout the day (read ahead?). Each session should be approximately 3 hours. The first hour of the sessions will be about OpenStack Essex and installing it using Crowbar. After some Q&A, we’re going to highlight the OpenStack ecosystem. The schedule for the ecosystem is in flux and will likely shift even during the event.

The Session start times for Overview & Ecosystem content

Region EDT Session 1 Session 2 Session 3 Session 4
Europe (-5) -5 3pm 6pm * *
Americas Eastern 0 10am 1pm 4pm *
Americas Central +1 9am Noon 3pm *
Americas Mtn +2 * 11am 2pm 7pm
Americas West +3 * 10am 1pm 6pm
Asia (Toyko) +10 * * * 6/1 10 am

* There are no planned live venues at this time/region. You are always welcome to join online!

Experts Track Logistics

Note: we expect experts to have already installed OpenStack (see homework page). Ideally, an expert has already setup a build environment.

We have a list of topics (Essex, Quantum, Networking, Pull from Source, Documentation, etc) that we plan to cover on a 30-60 minute rotation.

We will cover the OpenStack Essex deploy at the start of each planned session (9am, Noon, 3pm & 8pm EDT). Before we cover the OpenStack deploy, we’ll spend 10 minutes setting (and posting) the agenda for the next three hours based on attendee input.

Even if we are not talking about a topic on the main channel, we will keep a dialog going on topic specific Google hangouts. The links to the hangouts will be posted with the Expert track agenda.

We need an OpenStack Reference Deployment (My objectives for Deploy Day)

I’m overwhelmed and humbled by the enthusiasm my team at Dell is seeing for the OpenStack Essex Deploy day on 5/31 (or 6/1 for Asia). What started as a day for our engineers to hack on Essex Cookbooks with a few fellow Crowbarians has morphed into an international OpenStack event spanning Europe, Americas & Asia.

If you want to read more about the event, check out my event logistics post (link pending).

I do not apologize for my promotion of the Dell-lead open source Crowbar as the deployment tool for the OpenStack Essex Deploy. For a community to focus on improving deployment tooling, there must be a stable reference infrastructure. Crowbar provides a fast and repeatable multi-node environment with scriptable networking and packaging.

I believe that OpenStack benefits from a repeatable multi-node reference deployment. I’ll go further and state that this requires DevOps tooling to ensure consistency both within and between deployments.

DevStack makes trunk development more canonical between different developers. I hope that Crowbar will help provide a similar experience for operators so that we can truly share deployment experience and troubleshooting. I think it’s already realistic for Crowbar deployments to a repeatable enough deployment that they provide a reference for defect documentation and reproduction.

Said more plainly, it’s a good thing if a lot of us use OpenStack in the same way so that we can help each out.

My team’s choice to accelerate releasing the Crowbar barclamps for OpenStack Essex makes perfect sense if you accept our rationale for creating a community baseline deployment.

Crowbar is Dell-lead, not Dell specific.

One of the reasons that Crowbar is open source and we do our work in the open (yes, you can see our daily development in github) is make it safe for everyone to invest in a shared deployment strategy. We encourage and welcome community participation.

PS: I believe the same is true for any large scale software project. Watch out for similar activity around Apache Hadoop as part of our collaboration with Cloudera!

Seven Cloud Success Criteria to consider before you pick a platform

From my desk at Dell, I have a unique perspective.   In addition to a constant stream of deep customer interactions about our many cloud solutions (even going back pre-OpenStack to Joyent & Eucalyptus), I have been an active advocate for OpenStack, involved in many discussions with and about CloudStack and regularly talk shop with Dell’s VIS Creator (our enterprise focused virtualization products) teams.  And, if you go back ten years to 2002, patented the concept of hybrid clouds with Dave McCrory.

Rather than offering opinions in the Cloud v. Cloud fray, I’m suggesting that cloud success means taking a system view.

Platform choice is only part of the decision: operational readiness, application types and organization culture are critical foundations before platform.

Over the last two years at Dell, I found seven points outweigh customers’ choice of platform.

  1. Running clouds requires building operational expertise both at the application and infrastructure layers.  CloudOps is real.
  2. Application architectures matter for cloud deployment because they can redefine the SLA requirements and API expectations
  3. Development community and collaboration is a significant value because sharing around open operations offers significant returns.
  4. We need to build an accelerating pace of innovation into our core operating principles
  5. There are still significant technology gaps to fill (networking & storage) and we will discover new gaps as we go
  6. We can no longer discuss public and private clouds as distinct concepts.   True hybrid clouds are not here yet, but everyone can already see their massive shadow.
  7. There is always more than one right technological answer.  Avoid analysis paralysis by making incrementally correct decisions (committing, moving forward, learning and then re-evaluating).

OpenStack Austin: What we’d like to see at the Design Summit

Last week, the OpenStack Austin user group discussed what we’d like to see at the upcoming OpenStack Design Summit. We had a strong turnout (48?!).

  1. To get the meeting started, Marc Padovani from HP (this month’s sponsor) provided some lessons learned from the HP OpenStack-Powered Cloud. While Marc noted that HP has not been able to share much of their development work on OpenStack; he was able to show performance metrics relating to a fix that HP contributed back to the OpenStack community. The defect related to the scheduler’s ability to handle load. The pre-fix data showed a climb and then a gap where the scheduler simply stopped responding. Post-fix, the performance curve is flat without any “dead zones.” (sharing data like this is what I call “open operations“)
  2. Next, I (Rob Hirschfeld) gave a brief overview of the OpenStack Essex Deploy Day (my summary) that Dell coordinated with world-wide participation. The Austin deploy day location was in the same room as the meetup so several of the OSEDD participants were still around.
  3. The meat of the meetup was a freeform discussion about what the group would like to see discussed at the Design Summit. My objective for the discussion was that the Austin OpenStack community could have a broader voice is we showed consensus for certain topics in advance of the meeting.

At Jim Plamondon‘s suggestion, we captured our brain storming on the OpenStack etherpad. The Etherpad is super cool – it allows simultaneous editing by multiple parties, so the notes below were crowd sourced during the meeting as we discussed topics that we’d like to see highlighted at the conference. The etherpad preserves editors, but I removed the highlights for clarity.

The next step is for me to consolidate the list into a voting page and ask the membership to rank the items (poll online!) below.

Brain storm results (unedited)

Stablity vs. Features

API vs. Code

  • What is the measurable feature set?
  • Is it an API, or an implementation?
  • Is the Foundation a formal-ish standards body?
  • Imagine the late end-game: can Azure/VMWare adopt OPenStack’s APIs and data formats to deliver interop, without running OpenStack’s code? Is this good? Are there conversations on displacing incumbents and spurring new adoption?
  • Logo issues

Documentation Standards

  • Dev docs vs user docs
  • Lag of update/fragmentation (10 blogs, 10 different methods, 2 “work”)
  • Per release getting started guide validated and available prior or at release.

Operations Focus

  • Error messages and codes vs python stack traces
  • Alternatively put, “how can we make error messages more ops-friendly, without making them less developer-friendly?”
  • Upgrade and operations of rolling updates and upgrades. Hot migrations?

If OpenStack was installable on Windows/Hyper-V as a simple MSI/Service installer – would you try it as a node?

  • Yes.

Is Nova too big?  How does it get fixed?

  • libraries?
  • sections?
  • make it smaller sub-projects
  • shorter release cycles?

nova-volume

  • volume split out?
  • volume expansion of backend storage systems
  • Is nova-volume the canonical control plane for storage provisioning?  Regardless of transport? It presently deals in block devices only… is the following blueprint correctly targeted to nova-volume?

https://blueprints.launchpad.net/nova/+spec/filedriver

Orchestration

  • Is the Donabe project dead?

Discussion about invitations to Summit

  • What is a contribution that warrants an invitation
  • Look at Launchpad’s Karma system, which confers karma for many different “contributory” acts, including bug fixes and doc fixes, in addition to code commitments

Summit Discussions

  • Is there a time for an operations summit?
  • How about an operators’ track?
  • Just a note: forums.openstack.org for users/operators to drive/show need and participation.

How can we capture the implicit knowledge (of mailing list and IRC content) in explicit content (documentation, forums, wiki, stackexchange, etc.)?

Hypervisors: room for discussion?

  • Do we want hypervisor featrure parity?
  • From the cloud-app developer’s perspective, I want to “write once, run anywhere,” and if hypervisor features preclude that (by having incompatible VM images, foe example)
  • (RobH: But “write once, run anywhere” [WORA] didn’t work for Java, right?)
  • (JimP: Yeah, but I was one of Microsoft’s anti-Java evangelists, when we were actively preventing it from working — so I know the dirty tricks vendors can use to hurt WORA in OpenStack, and how to prevent those trick from working.)

CDMI

Swift API is an evolving de facto open alternative to S3… CDMI is SNIA standards track.  Should Swift API become CDMI compliant?  Should CDMI exist as a shim… a la the S3 stuff.

CloudOps white paper explains “cloud is always ready, never finished”

I don’t usually call out my credentials, but knowing the I have a Masters in Industrial Engineering helps (partially) explain my passion for process as being essential to successful software delivery. One of my favorite authors, Mary Poppendiek, explains undeployed code as perishable inventory that you need to get to market before it loses value. The big lessons (low inventory, high quality, system perspective) from Lean manufacturing translate directly into software and, lately, into operation as DevOps.

What we have observed from delivering our own cloud products, and working with customers on thier’s, is that the operations process for deployment is as important as the software and hardware. It is simply not acceptable for us to market clouds without a compelling model for maintaining the solution into the future. Clouds are simply moving too fast to be delivered without a continuous delivery story.

This white paper [link here!] has been available since the OpenStack conference, but not linked to the rest of our OpenStack or Crowbar content.

Greg Althaus at 11/15 Austin Cloud User Group meeting (annotated 90 min audio)

Greg Althaus did a 90 minute Crowbar deep dive at this week’s Austin Cloud User Group.  Brad Knowles recorded audio and posted it so I thought I’d share the link and my annotations.  There are a lot more times to catch up with our team at Dell in Austin, Boston, and Seattle.

Video Annotations –  (##:## time stamp)

  • 00:00 Intros & Meeting Management
  • 12:00 Joseph George Introduction / Sponsorship
  • 16:00 Greg Starts – why Crowbar
  • 19:00 DevOps slides
  • 21:00 What does Crowbar do for DevOps
    • make it easier to manage
    • make it easier to repeat
  • 24:00 What’s included – how we grow / where to start
  • 27:20 Starting to show crowbar – what’s included as barclamps
    • pluggable / configuration
    • Barclamps!
  • 28:10 What is a barclamp
    • discussion about the barclamps in the base
  • 34:30: We ❤ Chef. Puppet vs Chef
  • 36:00 Why barclamps are more than cookbooks
  • 36:30 State machine & transitions
  • Q&A Section
    • 38:50 Reference Architectures
    • 43:00 Barclamps work outside of Crowbar?
    • 44:15 Hardware models supported
    • 47:30 Storage Queston
    • 49:00 HA progress
    • 53:00 Ceph as a distributed cloud on all nodes
    • 56:20 Deployer has a map of how to give out roles
  • 58:00 Demo Fails
  • 58:30 Crowbar Architecture
  • 62:00 How Crowbar can be extended
  • 63:00 Workflow & Proposals
  • 65:40 Meta Barclamps
  • 71:10 Chef Environments
  • 73:40 Taking OpenStack releases and Environments
  • 75:00 The case for remove recipes
  • 77:33 Git Hub Tour
  • 79:00 Network Barclamp deep dive
  • 84:00 Adding switch config (roadmap topic)
  • 86:30 Conduits
  • 87:40 Barclamp Extensions / Services
  • 89:00 Questions
    • 89:20 Proposal operations
    • 93:30 OpenStack Readiness & Crowbar Design Approach
    • 93:10 Network Teaming
    • 94:30 Which OS & Hypervisors
    • 96:30 Continuous Integration & Tools
    • 98:40 BDD (“cucumberesque”) & Testing
    • 99:40 Build approach & barclamp construction
  • 100:00 Wrap up by Joseph

Cote & Rob interview: Crowbar+OpenStack Summit/Conference Reflections (40 mins)

I’m working on a larger post about the OpenStack Summit around API Implementation vs. Specification. You can have a preview of that AND A LOT OF OTHER STUFF (OpenStack, Crowbar, lunch) in this 40 minute interview w/ Michael Cote.

Setting: Dell World
Interview w/ @Cote at the Hilton Hotel Lobby on 6th street in Austin.

I know that Cote’s post does not have a time marker for easy navigation; however, I added them to help guide your navigation in the interview (link for audio) if you want to jump around.

  • 0:00 Introductions
  • 1:00 OpenStack
    • 1:00 Essex Conference – what is it, naming conventions
    • 2:45 Diablo is adding projects from incubation (Keystone, Dashboard,Quantum,
    • 5:30 OpenStack vs. Amazon – “OpenStack has ambitions.” We see it as a “platform for innovation.”
    • 6:30 OpenStack is a competitor for Amazon. It implements the EC2 APIs.
    • 7:30 How are people managing the evolving nature?
    • 8:20 We’re going to see OpenStack in production for the next release based on what we see in our deal flow.
    • 9:00 Every user that comes on adds momentum
    • 9:30 Rackspace setting up the OpenStack foundation is a reflection of the speed of adoption
    • 11:15 Our message is “we’re doing it, we’re in the field.” We are very hands on
    • 11:15 We chose early on to focus on helping deployment to help drive adoption
    • 12:00 “Our first test for partners is: Are you contributing back to the community?”
    • 12:44 The community told us “if you are participating then you are going to open source.” Our commits for OpenStack are live and in the open on our github.
    • 13:40 Why Github? We’re happy with it.
    • 14:20 OpenStack is using Gerrit because they have a gated trunk. They are migrated to Github
    • 15:20 APIs have been a big topic for OpenStack
    • 16:00 Do you track who is forking and following? Yes. We also have a listserv. We are trying to do a better job managing the Crowbar community. We know we need to do a better job.
    • 17:30 OpenStack is defined by its Implementation. That’s “an effective way to move the project forward quickly;” however, we’re getting to a point where people want to use alternate implementations.
    • 19:20 Implementation vs Specification is like the SOAP vs REST debate
    • 20:05 This is something the community needs to wrestle with
    • 21:45 Specification would allow the efforts to scale. The more people consume the API, the more people care about how it operates
    • 22:30 “Bugs can become the API”
    • 23:10 Asia and Europe are very active. We are seeing a ton of activity overseas.
  • 23:30 Crowbar
    • 24:00 Crowbar arose out of our need to deploy cloud software regardless of customer infrastructure
    • 24:45 We would show up and the customer needed all this cloud infrastructure. We created Crowbar because we always needed this
    • 26:00 We extended Chef because we had to do the initial bring-up including BIOS and RAID
    • 26:45 We added a state machine and an orchestration layer
    • 27:45 Updating the system is a huge component. Every month you may be upgrading the infrastructure!
    • 28:30 In our lab, we build whole clouds multiple times a day
    • 29:45 Crowbar is the “cloud unboxer”
    • 30:00 We modularized Crowbar with barclamps. Hadoop and OpenStack are a series of barclamps. Over 5 for each
    • 31:00 Barclamps are applied as layers. We are using that as a term to define DevOps
    • 31:15 We are using Crowbar to help message that we understand DevOps
    • 31:45 Soup vs Sandwich analogy – Images are like soup while DevOps is like a sandwich.
    • 32:45 If you don’t want something in a 1000 server deployment, DevOps lets you make a small change. Gives you flexibility.
    • 33:45 We added Cloud Foundry
    • 34:00 We’ve made it so easy with barclamps that partners are coming to us with ideas for barclamps. It’s like “changing the meat for the sandwich.”
    • 34:30 Dreamhost Ceph team created a barclamp and was actually running a majority of the Crowbar demos at the OpenStack conference
  • 35:25 What’s the future for Crowbar?
    • 35:30 More aspects of the infrastructure as open source
    • 35:45 More Hardware
    • 36:00 Multiple operating systems at the same time (XenServer, ESX, etc)
    • 36:30 Larger scale
    • 36:50 More types of infrastructure: storage & network
    • 37:40 Scalr shout out
    • 38:00 We know we need to collaborate more with our community
    • 38:30 The first step is to download it and try. Read my blog and sign up for the list serve
    • 39:00 CROWBAR IS NOT DELL SPECIFIC – we are working with people who want to create support for other vendor’s hardware. This benefits Dell.
    • 39:40 We don’t pretend that our customers are single vendor


OpenStack Hardware Sessions Slides & Updates to “Bootstrapping Hyperscale Clouds”

Disclaimer: I work for Dell and we make very fine cloud optimized hardware.  This presentation is NOT slanted towards Dell hardware, it simply reflects lessons learned from listening to our customers.  Of course, people buying Dell servers, networking and storage products pays my salary and WE LOVE OUR CLOUD CUSTOMERS!

This presentation OpenStack Hardware Oct 2011 was the outline for our revisiting of the hyperscale white paper from last fall.  I’m in the process of updating this to reflect lessons learned.

Here are some highlights:

  1. Fault zones are not as critical as expected – customers are putting in more redundancy instead of striping apps
  2. 10 GbE is more popular than expected
  3. Logically segmented redundant networking (teamed) is more popular than physical isolate
  4. Customers are starting small (AnyScale instead of HyperScale)

Crowbar modularized: latest changes that make clouds even easier to create, update, and maintain

In the last week, my team at Dell completed a major refactoring of Crowbar that significantly improves our ability to bring in community contributions and field customizations.  Today, we merged it into Crowbar’s public repo(s).

From the very first versions, our objective for Crowbar was to create the fastest and most reliable cloud deployments. Along the way, we realized Crowbar’s true potential lay in embracing DevOps as an operational model for maintaining clouds. That meant building up cloud deployments in layers from pieces that we call barclamps (extensions of Chef cookbooks). Our first version, centered on OpenStack Cactus, leveraged barclamps but was still created as a single system. This unified system was a huge step forward in cloud deployments, but did not live up to our CloudOps vision of continuous delivery.

In this version, each Crowbar barclamp is an independent delivery unit that can be integrated before, while or after installing Crowbar.

The core of the change is each barclamp, including the most core ones, are stored in independent code repositories. Putting the code into distinct repos means that each barclamp can have its own life cycle, its own maintainer site and its own dependency tree. This modularization allows customers to manage their Crowbar deployments with a very fine brush: they may choose to customize parts of the system, they could lock components to specific tag and they can bring in barclamps from other vendors.

While the core barclamps are automatically integrated into the Crowbar build using git submodules; other barclamps are installed into the system as needed. This allows you to pull in the suite of OpenStack barclamps at build time or to wait until your Crowbar system is running before installing. Once you install a barclamp, you are able to retrieve an updated barclamp and reapply it to the system.

This feature gives you the ability to 1) choose exactly what you want to include and 2) perform field updates to a live Crowbar system.

Let’s look at some examples:

  1. The Cloud Foundry barclamp can be sourced Cloud Foundry instead of bundled into the Crowbar repository. This allows the team working on the cloud application to take ownership for their own deployment. As a continuous delivery proponent, I believe strongly that the development team should be responsible for ensuring that their code is deployable (refer to my OpenStack “Deployer API” blue print attempting to codify this).
  2. DreamHost, maintainers of Ceph Storage, can maintain their own local barclamp repos for OpenStack that are cloned from our community Swift barclamp. This allows them to innovate and customize OpenStack deployments for their business and choose which updates to merge back to the community.
  3. Rackspace Cloud Builders can work on the most leading edge OpenStack features and maintaining workable deployments on branches. As the code stabilizes, they simply merge in their changes.
  4. Dell BIOS and RAID barclamps only support the PowerEdge C line today. When we offer PowerEdge R support, you will be able to install or update the barclamps to add that capability. If another hardware vendor creates a barclamp for their hardware then you can install that into your existing system.

I believe that these changes to Crowbar are a huge step forwards on our journey of creating a community supportable Open Operations framework. I hope that you are as excited as I am about these changes.

I encourage you to take the first step by trying out Crowbar and, ultimately, writing your own barclamps.

Post Scripts:

  • In addition to the modularization, the updated code includes RHEL as a deployment platform. At present, you must choose to be either RHEL or Ubuntu at build time.
  • We have enhanced the network barclamp to describe connections as more abstract connections, called conduits, between nodes. This is a powerful change, but requires some understanding before you start making changes.
  • We have only begun testing the change as of 9/12, we expect the system to be fully stabilized by 10/3. If you are not willing to deal with bugs then I recommend building the Crowbar “v1.0” tag (or using the ISOs from our July launch).