Crowbar near-term features: increasing DevOps mojo and brewing Diablo

We’ve been so busy working on getting RHEL support ready to drop into the Crowbar repos that I have not had time to post about what’s coming next for Crowbar. The RHEL addition has required a substantial amount of work to accommodate different packaging models and capabilities. This change moves Crowbar closer to being able allow nodes’ operating systems (the allocated TFTP Boot Image) to be unique per node.

I will post more forward looking details soon but wanted to prime the pump and invite suggestions from our community.

We are tracking two major features for delivery by the OpenStack October Design Conference

  1. OpenStack Diablo Barclamps. Expect to see individual barclamps for various components like Keystone, Dashboard, Glace, Nova, Swift, etc)
  2. Barclamp versioning / connected imports. This feature will enable Crowbar to pull in the latest components for barclamps from remote repositories. I consider this a critical feature for Crowbar’s core DevOps/CloudOps capabilities and to support more community development for barclamps.

We are also working on some UI enhancements

  • Merging together the barclamps/proposals/active views into a single view
  • Enabling bulk actions for nodes (description, BIOS types, and allocate)
  • Allowing users to set node names and showing the names throughout the UI
  • More clarity on state of proposal application process (stretch goal)

I am planning to post more about our design ideas as work begins.

If you want to help with Diablo barclamps, these will be worked in the open and we’d be happy to collaborate. We’re also open to suggestion for what’s next.

Videos about Crowbar, CloudOps, and Dell OpenStack Cloud

I’m not usually a big fan of launch videos (too much markitecture); however, these turned out to be nice and meaty.  The meaty part explains why it looks like I’m about to eat a big sandwich in the last video.  yum!

  • What is Crowbar: Dell Crowbar Software Overview  

Continue reading

Crowbar source released, includes OpenStack Cloud install

I’m delighted to announce (official version) that my team at Dell has opened the Crowbar source under the Apache 2 license. This action is part of the broader Dell OpenStack Cloud Solution which includes OpenStack install packages, Crowbar, reference hardware architectures, and services/consulting to support deployments.

There are two important components to this news:

  1. Dell is officially offering our OpenStack Solution and helping advance the community’s ability to implement OpenStack quickly and consistently.
  2. Dell is releasing the Crowbar code (which is included in the solution) as open source.

Both are significant items; however, my focus here is on the Crowbar release.

Crowbar started as a Dell OpenStack installer project and then grew beyond that in scope.  Now it can be extended to work with other vendors’ kits and other solutions bits.

We are contributing Crowbar to the community because we believe that everyone benefits by sharing in the operational practices that Crowbar embodies. These are rooted in Opscsode Chef (which Crowbar tightly integrates with) and the cloud & hyper-scale proven DevOps practices that are reflected in our deployment model.

Where to get it?

What’s included?

  • A comprehensive set of barclamps to set up an OpenStack cloud.
  • Crowbar UI and Remote APIs to make it easy to set up your cloud
  • Automated testing scripts for community members doing continuous integration with OpenStack.
  • Build scripts so you can create your own Crowbar install ISO
  • Switch discovery so you can create Chef Cookbooks that are network aware.
  • Open source Chef server that powers much of Crowar’s functionality

What’s not included?

  • Non-open source license components (BIOS+RAID config) that we could not distribute under the Apache 2 license.  We are working to address this and include them in our release.  They are available in the Dell Licensed version of Crowbar.
  • Dell Branded Components (skin + overview page).   Crowbar has an OpenSource skin with identical functionality.
  • Pre-built ISOs with install images (you must download the open source components yourself, we cannot redistribute them to you as a package)

Important notes:

  • Crowbar uses Chef Server as its database and relies on cookbooks for node deployments.  It is installed (using Chef Solo) automatically as part of the Crowbar install.
  • Crowbar has a modular architecture so individual components can be removed, extended, and added. These components are known individually as barclamps.
  • Each barclamp has its own Chef configuration, UI sub-component, deployment configuration, and documentation.

On the project roadmap:

  • Hadoop support
  • Additional operating system support (specifically RHEL)
  • Barclamp version repository
  • Network configuration
  • We’d like suggestions!  Please comment!

Sites for more information: Joseph George, Barton George (launch day), Dell

Austin CloudCamp 7/20 – Lightening Talk

If you’re in Austin on 7/20 then come to the 2011 ATX CloudCamp @ 6pm (Downtown)

In addition to the normal great unconference format, I’ll be giving one of the lightning talks.  My topic will be about Cloud Operations for OpenStack.

Here’s a copy of my CloudCamp 07 2011 preso.  Unfortunately, the video was not complete so I can’t include it.

 

The unexpected openness of OpenStack: why it’s important to learn from others’ operations experience.

During the OpenStack Design Conference, Forrester’s James Staten (@Staten7) raved about OpenStack’s transparency compared to AWS.  Within the enclave of OpenStack fan boys supports (Dell alone sent >14 people to the summit), his post drew a considerable attention but did little to really further the value proposition.

“Open deployments” are a much more significant value to implementors than transparency from open source code.

For any technology solution, there are significant challenges that will only be understood when the system is under stress.  In some cases, these challenges are code defects; however, many will be related to configuration and deployment choices that are site specific.  It is correcting these issues that result in design patterns and practices that create a robust infrastructure; consequently, the process of hardening a solution is critical to its ultimate stability and success.

When a solution, like AWS, is deployed and managed by a single entity, it is extremely rare for operational lessons learned and best practices to make it to the larger community.  Amazon’s recent post mortem is a welcome exception.   This is not a bad thing (Roman Stanek’s contrasting point), it is just the reality of a proprietary cloud.  AWS operates as a black box and I don’t believe that Amazon’s operational experience would be relevant to others unless they were also operationally transparent.

While it makes business sense to remain operationally opaque, service providers lose the benefit of external lessons learned when there is no community working in parallel with them.

OpenStack’s community has an opportunity to iterate on CloudOps patterns and practices at a dramatically faster rate than any single provider.  This creates distinct value for OpenStack adopters because they can shorten or eliminate their own challenges because other adopters will have the same pains and benefit from the same fixes.

It is critical to understand that the benefit is conferred to both the party sharing the problem (they get advice and support) and the party lending assistance (they avoid the problem).  This is distinctly different from proprietary clouds where sharing is likely to cause embarrassment  unlikely to create helpful outcomes.

I am not advocating that all OpenStack deployments be the same or follow a prescriptive patterns. 

I believe that each installation will be unique in some way; however, there will  be enough commonalities and shared code to make sharing worthwhile.  This is especially true for adopters who start with tools like Crowbar that leverage community based Chef Recipes and automating scripts.  Tools that encourage automation and shared scripts help accelerate the establishment of robust deployment patterns and practices.

Ultimately, the ability to collaborate on cloud operation practice does more to strengthen OpenStack than developers, code reviews or corporate endorsements.

OpenStack “Operating Hyperscale Clouds” preso given at Design Summit on 4/27.

Today, I gave the a presentation that is the sequel to the “bootstrapping the hyperscale clouds.”  The topic addresses concerns raised by the BlackOps concept but brings up history and perspectives that drive Dell’s Solution focus for OpenStack.

You can view in slide share or here’s a PDF of the preso: Operating the Hyperscale Cloud 4-27

I tend to favor pictures over text, so you may get more when OpenStack posts the video [link pending].

The talk is about our journey to build an OpenStack solution starting from our drive to build a cloud installer (Crowbar).  The critical learning is that success for clouds is the adoption of an “operational model” (aka Cloud Ops) that creates cloud agility.

Ultimately, the presentation is the basis for another white paper.

Appending:

BlackOps: 7 tenants for infrastructure & operations in hyperscale clouds. #CloudOps #Hyperscale

Traditional IT Ops

In my work queue at Dell, the request for a “cloud taxonomy” keeps turning up on my priority list just behind world dominance peace.  Basically, a cloud taxonomy is layer cake picture that shows all the possible cloud components stacked together like gears in an antique Swiss watch.  Unfortunately, our clock-like layer cake has evolved to into a collaboration between the Swedish Chef and Rube Goldberg as we try to accommodate more and more technologies into the mix.

The key to de-spaghettifying our cloud taxomony was to realize that clouds have two distinct sides: an external well-known API and internal “black box” operations.  Each side has different objectives that together create an elastic, scalable cloud.

The objective of the API side is to provide the smallest usable “surface area” for cloud consumers.  Surface area describes the scope of the interface that is published to the users.  The smaller the area, the easier it is for users to comprehend and harder it is for them to break.  Amazon’s EC2 & S3 APIs set the standards for small surface area design and spawned a huge cloud ecosystem.

Hyperscale Cloud (APIs!)

To understand the cloud taxonomy, it is essential to digest the impact of the cloud ecosystem.  The cloud ecosystem exists primarily beyond the API of the cloud.  It provides users with flexible options that address their specific use cases.  Since the ecosystem provides the user experience on top of the APIs (e.g.: RightScale), it frees the cloud provider to focus on services and economies of scale inside the black box.

The objective of the internal side of clouds is to create a prefect black box to give API users the illusion of a perfectly performing, strictly partitioned and totally elastic resource pool.  To the consumer, it does should not matter how ugly, inefficient, or inelegant the cloud operations really are; except, of course, that it does matter a great deal to the cloud operator. 

Cloud operation cannot succeed at scale without mastering the discipline of operating the black box cloud (BlackOps). 

Cloud APIs spawn Ecosystems

The BlackOps challenge is that clouds cannot wait until all of the answers are known because issues (or solutions) to scale architecture are difficult to predict in advance.  Even worse, taking the time to solve them in advance likely means that you will miss the market.

Since new technologies and approaches are constantly emerging, there is no “design pattern” for hyperscale.  To cope with constant changes, BlackOps live by seven tenants that help manage their infrastructure efficiently in a dynamic environment.

  1. Operational ownership – don’t wait for all the king’s horses and consultants to put your back together again (but asking for help is OK).
  2. Simple APIs – reduce the ways that consumers can stress the system making the scale challenges more predictable.
  3. Efficiency based financial incentives – customers will dramatically modify their consumption if you offer rewards that better match your black box’s capabilities.
  4. Automated processes & verification – ensures that changes and fixes can propagate at scale while errors are self-correcting.
  5. Frequent incremental rolling adjustments – prevents the great from being the enemy of the good so that systems are constantly improving (learn more about “split testing”)
  6. Passion for operational simplicity – at hyperscale, technical debt compounds very quickly.  Debt translates into increased risk and reduced agility and can infect hardware, software, and process aspects of operations.
  7. Hunger for feedback & root-cause knowledge – if you’re building the airplane in flight, it’s worth taking extra time to check your work.  You must catch problems early before they infect your scale infrastructure.  The only thing more frustrating than fixing a problem at scale, if fixing the same problem multiple times.

It’s no surprise that these are exactly the Agile & Lean principles.  The pace of change of cloud is so fast and fluid, that BlackOps must use an operational model that embraces iterative and rolling deployment.

Compared to highly orchestrated traditional IT operations, this approach seems like sending a team of ninjas to battle on quicksand with objectives delivered in a fortune cookie.

I am not advocating fuzzy mysticism or by-the-seat-of-your-pants do-or-die strategies.  BlackOps is a highly disciplined process based on well understood principles from just-in-time (JIT) and lean manufacturing.  Best of all, they are fast to market, able to deliver high quality and capable of responding to change.

Post Script / Plug: My understanding of BlackOps is based on the operational model that Dell has introduced around our OpenStack Crowbar project.  I’m going to be presenting more about this specific topic at the OpenStack Design Conference next week.