Rackspace will balance control of OpenStack. It takes time & strong partners

Rick Clark’s post “Why I Left Rackspace and What About Openstack” (+ his softer postscript) is part of a longer conversation that started when Rackspace acquired Anso Labs and expanded with the resignation of Chris Kemp (NASA CTO & OpenStack #1 fanboy).

Building a community is a delicate balance: you need to show leadership while you cultivate leadership.

Putting aside the context (resigning from Rackspace to join Cisco) of his post, I think that Rick’s comments do resonate with parts of the community.  OpenStack governance became unbalanced when Anso became Rackspace.  The governance board formed at the Austin conference was dominated by a small number (2: NASA/Anso & Rackspace) of highly committed voices, but there was no single master.

Considering OpenStack’s momentum, we are in a very good position to fix the single master problem.  However, it takes time.  While companies like Dell (my employer), NTT, Citrix, Cisco (Rick’s employer), and Microsoft are clearly investing in OpenStack, none have yet achieved NASA or Rackspace’s level of technical commitment.

The challenge for Rackspace is to expand the OpenStack market and ecosystem so that partners are motivated to jump in more and more quickly.  If my experiences inside Dell are indicative of the broader community, Rackspace’s leadership makes it much easier for partners to increase their own commitment.  It is like teaching my daughter to ride her bike: she needed to know that I was running next to her before she would pedal hard enough to balance by herself.

Like teaching bike riding, you can’t lead a community with too heavy or too light a hand.

To build a community around OpenStack, we (the partners) need to stand up our own capability.  Until we have demonstrated more leadership, Rackspace must cultivate both a community and a market.  This is a challenging role to balance.  While the community wants distributed ownership, the market wants leadership.  Rick’s governance comments are evidence of this struggle and Rick’s move to Cisco is an indication of leadership diversification.

I believe that Rackspace is committed to distributed ownership – we, in the community, need to rise to the challenge!

OpenStack still needs strong leadership from Rackspace because the market needs someone to be accountable for releases and features.  That allows new partners to depend on someone to run beside them while they wobble their way along to independence.  As the community leaders stand up, we’ll see a balanced community emerge.  The challenge is on us to make that happen (and happen quickly).

How the OpenStack installer (Crowbar + chefops) works (video from 3/14 demo)

July 24th 2012 Update:

This page is very, very old and Crowbar has progressed significantly since this was posted.  For better information, please visit the Crowbar wiki and review my Crowbar 2 writeups.

August 5th 2011 Update:

While still relevant and accurate, the information on this page does not reflect the latest information about the now-released Apache 2 Crowbar code.  In the 4+ months following this post, we substantially refactored the code to make it more modular (see Barclamps), better looking, and multi-vendor/multi-application (Hadoop & RHEL).  If you want more information, I recommend that you try Crowbar for yourself.

Original March 14th 2011 Text:

I’ve been getting some “how does Crowbar work” inquiries and wanted to take a shot at adding some technical detail.   Before I launch into technical babble, there are some important things to note:

  1. Dell has committed to releasing the Crowbar code as open source (Apache 2)
  2. Crowbar is an extension of Chef Server – it does not function stand-alone and uses Chef’s APIs to store all its data.
  3. The OpenStack component installation is managed by Chef cookbooks & recipes jointly developed by Dell, Opscode, and Rackspace.
  4. Crowbar can be used to simply bootstrap your data center; however, we believe it is the start of a cloud operational model that I described in the hyperscale cloud white paper.

LIVE DEMO (video via Barton George): If you’re at SXSW on 3/14 @ 2pm in the Kung Fu Saloon, you can ask Greg Althaus to explain it – he does a better job than I do.

Here’s what you need to know to understand Crowbar:

Crowbar is a PXE state machine.

The primary function of Crowbar is to get new hardware into a state where it can be managed by Chef.   To get hardware into a “Chef Ready” state, there are several steps that must be performed.  We need to set up the BIOS and RAID, figure out where the server is racked, install an operating system, assign IP networking and names, synchronize clocks (NTP), and set up a Chef client linked to our server.  That’s a lot of steps!

In order to do these steps, we need to boot the server through a series of controlled images (stages) and track the progress through each state.  That means that each state corresponds to a PXE boot image.  The images have a simple script that uses WGET to update the Crowbar server (which stores its data in Chef) when the script completes.  When a state is finished, Crowbar will change the PXE server to provide the next image in the sequence.

During the Crowbar managed part of the install, the servers will reboot several times.  Once all of the hardware configuration is complete, Crowbar will use an operating system install image to create the base configuration.  For the first release, we are only planning to have a single Operating System (Ubuntu 10.10); however, we expect to be adding more operating system options.

The current architecture of Crowbar (and the Chef Server that it extends) uses a dedicated server in the system for administration.  Our default install adds PXE, DHCP, NTP, DNS, Nagios, & Ganglia to the admin server.  For small systems, you can use Chef to add other infrastructure capabilities to the admin server; unfortunately, adding components makes it harder to redeploy them.  For dynamic configurations where you may want to rehearse deployments while building Chef recipes, we recommend keeping other infrastructure services off the admin server.

Of course, the hardware configuration steps are vendor specific.  We had to make the state machine (stored in Chef data bags) configurable so that you can add or omit steps.  Since hardware config is slow, error prone and painful, we see this as a big value add.  Making it work for open source will depend on community participation.
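
To make the flow concrete, here is a purely illustrative sketch of how a state-to-image mapping could be expressed in a Chef data bag.  This is not the actual Crowbar schema (the state names and keys below are invented); it only shows the idea that each state maps to a PXE boot image and a successor state:

{
  "id": "pxe_state_machine",
  "states": {
    "discovering":         { "boot_image": "discovery",             "on_success": "hardware_installing" },
    "hardware_installing": { "boot_image": "hw_install",            "on_success": "os_installing" },
    "os_installing":       { "boot_image": "ubuntu-10.10-install",  "on_success": "ready" },
    "ready":               { "boot_image": "local_boot",            "on_success": null }
  }
}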

Once Chef has control of the servers, you can use Chef (on the local Chef Server) to complete the OpenStack installation.  From there, you can continue to use Chef to deploy VMs into the environment.  Because Chef encourages a DevOps automation mindset, I believe there is a significant ROI to your investment in learning how this tool operates if you want to manage hyperscale clouds.

Crowbar effectively extends the reach of Chef earlier into the cloud management life cycle.

3/21 Note: Updated graphic to show WGET.

Demo Redux: OpenStack installer SXSW demo of Chef + Crowbar

If you missed the OpenStack installer demo at the Cloud Connect event, then you’ll have another chance to see us go from bare iron to provisioning VMs in under 30 minutes at SXSW on Monday 3/14 from 2-4 pm at the Kung Fu Saloon.

Note: Rackspace rented out the Kung Fu Saloon all day Monday and is hosting various events – from live webinars to a Scoble tweetup to a happy hour and a VIP after-hours event.

The demo will be orchestrated by Greg Althaus from my team at Dell.  Greg is the primary architect for Crowbar and responsible for some of its amazing capabilities including the Chef integrations, network discovery, and rockin’ PXE state machine.  Dell Cloud Evangelist Barton George will also be on hand.

Of course, our friends from Opscode & Rackspace will be there too – this is Rackspace’s party (they are a Platinum SXSW sponsor).

For more information (outside of this blog, of course), check out http://www.Dell.com/OpenStack.

Dell to spin bare iron into OpenStack gold

I’m at the Cloud Connect conference today supporting my team’s initial OpenStack foray.   Our announcement is part of the Rackspace Cloud Builders announcement.

Tonight (3/8), we’re at the Rackspace Launch with a pony rack of servers (6 nodes) where we will run a LIVE DEMO of our cloud installer (codename “Crowbar”).  The initial offer includes my hyperscale white paper and our cloud foundation kit.

Interested in the details?  Here are background posts that talk about the Lean/Agile process we use, what is Crowbar, and my write up about hyperscale (“flat edge”) data centers.

Added 3/9: Links to articles about the release:

Here’s what Dell is saying about OpenStack on Dell.com/openstack:

Dell is one of the original partners in the OpenStack community, which has now grown to more than 50 companies and participants. To accelerate adoption of this powerful platform, Dell has worked to develop an effortless out-of-box OpenStack experience with:
  • Optimized PowerEdge™ C-based hardware configurations
  • A technical whitepaper that details the design of an OpenStack hyperscale cloud on PowerEdge C server technology
  • An OpenStack installer that allows bare metal deployment of OpenStack clouds in a few hours (vs. a manual installation period of several days)

Read more about the steps to design an OpenStack hyperscale cloud in a Dell technical whitepaper entitled “Bootstrapping OpenStack Clouds.”

Interested?  Contact OpenStack@Dell.com.

Unboxing OpenStack clouds with Crowbar and Chef [in just over 9,000 seconds!]

I love elegant actionable user requirements so it’s no wonder that I’m excited about how simply we have defined the deliverable for project Crowbar**, our OpenStack cloud installer.

On-site, go from 6+ servers in boxes to a fully working OpenStack cloud before lunch.

That’s pretty simple!  Our goal was to completely eliminate confusion, learning time, and risk in setting up an OpenStack cloud.  So if you want to try OpenStack, our installer will save you weeks of effort in figuring out what to order, how to set it up and, most critically, how to install the myriad pieces and parts required.

That means that the instructions + automation must be able to:

  • Start with servers in boxes and without external connectivity
  • Set up the BIOS and RAID on all systems
  • Identify the networking topology
  • Install the base operating systems
  • Discover the resources available
  • Select resources for deployment
  • Install the OpenStack infrastructure appropriately on those resources
  • Validate that the system is operating correctly
  • Deploy a reference application
  • Do all of the above in under 4 hours (14,400 seconds).

That’s a lot of important and (normally) painful work!

Crowbar does not do all this lifting alone.  It is really an extension of Opscode’s Chef Server – an already awesome deployment management product.  The OpenStack deployment scripts that we include are collaborations between Dell, Opscode (@MattRay), and Rackspace (@JordanRinke, Wayne Walls (@waynewalls), and Jason Cannavale).

There are two critical points to understand about our OpenStack installer:

  1. It’s an open source collaboration* using proven tools (centrally Chef)
  2. It delivers an operational model for cloud management (really a DevOps model)

One of my team’s significant lessons learned about installing clouds is that current clouds are more about effective operations than software features.  We believe that helping customers succeed with OpenStack should focus more heavily on helping you become operationally capable of running a hyperscale system than on adding lots of features to the current code base.

That is why our cloud installer delivers a complete operational environment.

I believe that the heart of this environment must be a strong automated deployment system.  This translates into a core operational model for hyperscale cloud success.  The operational model says that:

  1. Individual nodes are interchangeable (can be easily reimaged)
  2. Automation controls the configuration of each node
  3. Effort is invested to make the system deployment highly repeatable
  4. System selection favors general purpose (80% case)
  5. Exceptions should really be exceptions

Based on this model, I expect that cloud operators may rebuild their entire infrastructure on a weekly (even daily!) basis during the pre-production phase while their Ops teams work to get automation into a predictable and repeatable state.  This state provides a stable foundation for expansion.

My experience with Crowbar reinforces this attitude.  We started making choices that delivered a smooth out-of-box experience and then quickly learned that we had built something more powerful than an installer.  It was the concept that you could build and then rebuild your cloud in the time it takes to get a triple caramel mochachino.

Don’t believe me?  I’ve got a system with your name on it just waiting in the warehouse.

*Open source note: Dell has committed to releasing the Crowbar code base as open source (Apache 2) as part of our ongoing engagement in the OpenStack community.

**Crowbar naming history.  The original code name for this project was offered by Greg Althaus as “you can name it purple fuzzy bunny for all I care.”  While excellent as a mascot, it was cumbersome to say quickly.  Crowbar was picked up as a code name because it is 1) easy to say, 2) used for unboxing things, 3) a powerful and fast tool and 4) the item you start with in a popular FPS.  Once properly equipped, our bunny (I call him “Mesa”) was ready to hit IT.

Bootstrapping Hyperscale OpenStack Clouds – slides from 2/3 OpenStack SJC Meetup

The OpenStack meetup lightning talk is only 5 minutes, so the deck is mostly pictures that support points around a more detailed followup.

Here’s the deck: bootstrapping clouds preso, and my Hyperscale white paper (links through Dell.com).

The theme of the talk is that hyperscale systems require a fundamentally different management paradigm because, at hyperscale,

hardware faults are common, manual steps are impractical, and small costs add up quickly.

Included in the preso are concepts I introduced at Flatness at the Edge.

2/10 Update: Now you can watch it, thanks to @opnstk_com_mgr (Stephen Spector): lightning talk video of Rob Hirschfeld, Dell, at the Santa Clara, CA Meetup on Feb 3, 2011 – http://ow.ly/3U8OA

OpenStack Swift Retriever Demo Online (with JavaScript XMLHttpRequest image retrieval)

This is a follow-up to my earlier post with the addition of WORKING CODE and an ONLINE DEMO. Before you go all demo happy, you need to have your own credentials to either a local OpenStack Swift (object storage) system or Rackspace CloudFiles.

The demo is written entirely using client-side JavaScript. That is really important because it allows you to test Swift WITHOUT A WEB SERVER. All the other Swift/Rackspace libraries (there are several) are intended for your server application to connect and then pass the file back to the client. In addition, the API uses custom request headers that are not settable from the browser’s address bar, so you can’t just browse into your Swift repos.

Here’s what the demo does:

  1. Log in to your CloudFiles site – returns the URL & token for further requests.
  2. Get a list of your containers
  3. See the files in each container (click on the container)
  4. Retrieve the file (click on the file) to see a preview if it is an image file

The purpose of this demo is to be functional, not aesthetic. Little hacks like pumping the config JSON data to the bottom of the page are helpful for debugging and make the action more obvious. Comments and suggestions are welcome.
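
For instance, a debug dump of that sort can be as simple as the following (illustrative only; the element and config object here are hypothetical, not copied from the demo):

$('body').append('<pre>' + JSON.stringify(config) + '</pre>');  // dump the config JSON at the bottom of the page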

The demo code is 4 files:

  1. demo.html has all the component UI and javascript to update the UI
  2. demo.js has the Swift interfacing code (I’ll show a snippet below) to interact with Swift in a generic way
  3. demo.css is my lame attempt to make the page readable
  4. jQuery.js is some first-class code that I’m using to make my code shorter and more functional.

1/17 update: in testing, we are working out differences between Swift and Rackspace. Please expect updates.

HACK NOTE: This code does something unusual and interesting. It uses the JavaScript XMLHttpRequest object to retrieve and render a BINARY IMAGE file. Doing this required pulling together information from several sources. I have not seen anyone pull together a document for the whole process onto a single page! The key to making this work is overrideMimeType (line G), masking each 16-bit character code down to its low byte (& 0xFF in the encode routine), using Base64 encoding (line 8 and the encode routine), and then src="data:image/jpg;base64,[DATA GOES HERE]" in the tag (see the demo.html file).
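
For reference, here is a minimal sketch of what an encode routine along these lines could look like.  This is my own illustration, not the demo’s actual Swift.encode, and it assumes the browser provides window.btoa (Firefox and Safari do):

// Illustrative sketch only - the demo's real Swift.encode may differ.
// Masks each 16-bit character code from the x-user-defined response down to its
// low byte, then Base64 encodes the result for use in a data: URI.
Swift.encode = function (binaryString) {
   var bytes = '';
   for (var i = 0; i < binaryString.length; i++) {
      bytes += String.fromCharCode(binaryString.charCodeAt(i) & 0xFF);  // keep the low byte
   }
   return window.btoa(bytes);  // Base64 encode
};
// usage: img.src = 'data:image/jpg;base64,' + Swift.encode(xmlhttp.responseText);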

Here’s a snippet of the core JavaScript code (full code) to interact with Swift. Fundamentally, the API is very simple: inject the token into the request headers (lines E-F), request the /container/file that you want (line D), and wait for the results (lines H & 2). I made it a little more complex because the same function handles EITHER binary or JSON returns. Enjoy!

retrieve : function(config, path, status, binary, results) {
1   xmlhttp = new XMLHttpRequest();
2   xmlhttp.onreadystatechange = function()  // callback
3      {
4         if (xmlhttp.readyState==4 && xmlhttp.status==200) {
5            var out = xmlhttp.responseText;
6            var type = xmlhttp.getResponseHeader("content-type");
7            if (binary)
8               results(Swift.encode(out), type);
9            else
A               results(JSON.parse(out));
B         }
C      };
D   xmlhttp.open('GET', config.site+'/'+path+'?format=json', true);
E   xmlhttp.setRequestHeader('Host', config.host);
F   xmlhttp.setRequestHeader('X-Auth-Token', config.token);
G   if (binary) xmlhttp.overrideMimeType('text/plain; charset=x-user-defined');
H   xmlhttp.send();
}
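
For example, hypothetical calls to this function might look like the following (the container and file names, the #preview element, and passing null for the unused status argument are all my own illustration, not part of the demo):

// list the files in a container (JSON result)
Swift.retrieve(config, 'photos', null, false, function (files) {
   console.log(files);  // array of file descriptors returned by Swift
});

// fetch an image and preview it via a data: URI (binary result)
Swift.retrieve(config, 'photos/kitten.jpg', null, true, function (data, type) {
   $('#preview').attr('src', 'data:' + type + ';base64,' + data);
});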

OpenStack Swift Demo (in a browser)

I’m working on a mini-demo project for OpenStack Swift.  To keep things very simple and easy to understand, I decided that the whole demo would work in JavaScript in the browser.  I also chose to use Rackspace’s CloudFiles as a Swift target for testing since they have the same API and are universally available (unlike my lab systems).

One advantage of this approach is that Firebug makes it very nice to debug and check the activity of the code.  Unfortunately, Firebug also seems to eat the headers that I need.  *Let me phrase that in a Google-friendly way so that someone else will not lose the 2 hours I just lost*

“XmlHttpRequest setRequestHeader FireFox Not Respected when using FireBug”

It works great in Safari. So onward and upward.  So far, I’ve got step #1 ready – getting the authorization token back from the cloud site.

Here’s the HTML page (you need jQuery too).  Basically, it uses the username and key from the inputs to set the “X-Auth-User” and “X-Auth-Key” request headers.  These headers allow Swift to return a token that you can use on future requests when you want to do useful work.

<!DOCTYPE html>
<html>
<head>
<title>Dell Swift Demo [0.0]</title>
<script src="jquery.js" type="text/javascript"></script>
<script type="text/javascript" charset="utf-8">

var xmlhttp = null;

function swiftLogin() {
   var usr = $('input:text[name=usr]').val();
   var key = $('input:text[name=key]').val();
   // code for IE7+, Firefox, Chrome, Opera, Safari (UR SOL IE<7)
   xmlhttp = new XMLHttpRequest();
   xmlhttp.onreadystatechange = function()  // callback
   {
      if (xmlhttp.readyState == 2)
      {
         $('#status').replaceWith(xmlhttp.getResponseHeader("X-Auth-Token"));
      }
   };
   xmlhttp.open('GET', 'https://auth.api.rackspacecloud.com/v1.0', true);
   xmlhttp.setRequestHeader('Host', 'auth.api.rackspacecloud.com');
   xmlhttp.setRequestHeader('X-Auth-User', usr);
   xmlhttp.setRequestHeader('X-Auth-Key', key);
   xmlhttp.send();
}

</script>
</head>
<body>

<div id="credentials">
   <fieldset>
      <legend>Swift Login</legend>
      <label for="user">User: </label><input type="text" name="usr" value="user" id="user">
      <label for="key">Key: </label><input type="text" name="key" value="key" id="key">
      <input type="button" name="Login" value="login" id="Login" onclick="swiftLogin();">
   </fieldset>
</div>

<div id="status">[pending]</div>
<div id="footer">Time?</div>

<script type="text/javascript">
   $('#footer').replaceWith((new Date).toString());
   swiftLogin();
</script>

</body>
</html>

OpenStack videos peek into cloud shakers

Barton George (Dell’s cloud evangelist and cloud shouter) has posted videos from the OpenStack conference last week:

OpenStack Day 2 Aspiration: Dreaming & Breathing

Between partnering meetings, I bounced through biz and tech sessions during Day 2 of the OpenStack conference (day 1 notes).   After my impression summary, I’m including some succinct impressions, pictures, and copies of presentations by my Dell team-mates Greg Althaus & Brent Douglas.

Clouds on the road to Bexar
My overwhelming impression is a healthy tension between aspirational* and practical discussions.  The community appetite for big, broad, and bodacious features is understandably high: cloud seems on track as a solution for IT problems, but there is still an impedance mismatch between current apps and cloud capabilities.
As service providers ASPire to address these issues, some OpenStack blueprint discussions tended to digress towards more forward-looking or long-term designs.  However, watching the crowd, there was also a quietly heads-down and pragmatic audience ready to act and implement.  For this action-focused group, delivering a working cloud was the top priority.  The Rackers and Nebulizers have product to deploy and will not be distracted from the immediate concerns of living, breathing, shippable code.
I find the tension between dreaming aspiration (cloud futures) and breathing aspiration (cloud delivery) necessary to the vitality of OpenStack.
[Day 3 update: these coders are holding the floor.  People who are coding have moved into the front seats of the fishbowl and the process is working very nicely.]
Specific Comments (sorry, not linking everything):
  • Cloud networking is a mess and there is substantial opportunity for innovation here.  Nicira was making an impression talking about how Open vSwitch and OpenFlow could address this at the edge switches.  Interesting, but messy.
  • I was happy with our (Dell’s) presentations: real clouds today (Bexas111010DataCenterChanges) and what to deploy on (Bexar111010OpenStackOnDCS).
  • Sheepdog was presented as a way to handle block storage.  Not an iSCSI solution; it works directly w/ KVM.  Strikes me as too limiting – I’d rather just see iSCSI used.  We talked about GlusterFS or Ceph (NewDream).  This area needs a lot of work to catch up with Amazon EBS.  Unfortunately, persisting data on VM “local” disks is still the dominant paradigm.
  • Discussions about how to scale drifted towards aspirational.
  • Scalr did a side presentation about automating failover.
  • Discussion about migration from Eucalyptus to OpenStack got sidetracked with aspirations for a “hot” migration.  Ultimately, the differences between networking models were a problem.  The practical issue is discovering the metadata – host info is not entirely available from the API.
  • Talked about an API for cloud networking.  This blueprint was heavily attended and messy.  The possible network topologies present too many challenges to describe easily.  Fundamentally, there seems to be consensus that the API should have a very, very simple concept of connecting VM end points to a logical segment.  That approach leverages the accepted (but outdated) VLAN semantic, but implementation will have to be topology aware.  Ouch!
  • Day 3 topic, live migration: big crowd arguing with bated breath about this.  The summary: “show us how to do it without shared storage, THEN we’ll talk about the API.”
Executive Tweet:  #OpenStack getting down to business.  Big dreams.  Real problems.  Delivering Code.
 
Note: I nominate Aspirational for 2010 buzzword of the year.

[Photos: Greg presenting; big crowd on Day 1]