Crowbar lays it all out: RAID & BIOS configs officially open sourced

Today, Dell (my employer) announced a plethora of updates to our open source derived solutions (OpenStack and Hadoop). These solutions include the latest bits (Grizzly and Cloudera) for each project. And there’s another important notice for people tracking the Crowbar project: we’ve opened the remainder of its provisioning capability.

Yes, you can now build the open version of Crowbar and it has the code to configure a bare metal server.

Let me be very specific about this… my team at Dell tests Crowbar on a limited set of hardware configurations. Specifically, Dell server versions R720 + R720XD (using WSMAN and iDRAC) and C6220 + C8000 (using open tools). Even on those servers, we have a limited RAID and NIC matrix; consequently, we are not positioned to duplicate other field configurations in our lab. So, while we’re excited to work with the community, caveat emptor open source.

Another thing about RAID and BIOS is that it’s REALLY HARD to get right. I know this because our team spends a lot of time testing and tweaking these, now open, parts of Crowbar. I’ve learned that doing hard things creates value; however, it also means that contributors to these barclamps need to be prepared to get some silicon under their fingernails.

I’m proud that we’ve reached this critical milestone and I hope that it encourages you to play along.

PS: It’s worth noting that community activity on Crowbar has really increased. I’m excited to see all the excitement.

Update a pull request from git command line

Sometimes we just need to feed the SEO gods… in this case, I could not find the simple git command line to update a pull request that I had in flight.

I was looking for the following:

git push -f personal HEAD:[pull branch]

Github.com happily gave me instructions from the pull branch but not the CLI version of the command.  The trick is that you need to know your remote (git remote) for the command so it’s not perfectly generic.  In the example above, my personal repo is named “personal”.

Deconstructing the command: you are pushing the local HEAD commit to your personal clone, onto the branch created for the pull request.  That’s because the pull request creates a branch from your clone to be pulled into the upstream repo.  That’s why it’s a PULL request not a push.
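
For anyone who wants to try the command safely first, here is a minimal sketch that plays it out in a throwaway directory. A local bare repository stands in for the personal clone on GitHub, and the branch name pull-branch, the file name and the commit messages are placeholders I made up for illustration; only the remote name “personal” comes from the example above.

```shell
#!/bin/sh
set -e

# Throwaway sandbox: a bare repo plays the role of the personal clone (assumption).
work=$(mktemp -d)
git init -q --bare "$work/personal.git"
git init -q "$work/local"
cd "$work/local"
git config user.email "dev@example.com"
git config user.name "Dev"
git remote add personal "$work/personal.git"

# First version of the change, pushed to the (hypothetical) pull request branch.
echo "first draft" > change.txt
git add change.txt
git commit -q -m "Initial change"
git push -q personal HEAD:refs/heads/pull-branch

# Rework the commit locally, then update the in-flight pull request branch.
# The amended commit is not a fast-forward, which is why -f is needed.
echo "revised" > change.txt
git commit -q --amend -a -m "Revised change"
git push -q -f personal HEAD:pull-branch

# Compare what the remote branch points at with the local HEAD.
local_head=$(git rev-parse HEAD)
remote_head=$(git ls-remote personal refs/heads/pull-branch | cut -f1)
echo "local:  $local_head"
echo "remote: $remote_head"
```

If the two hashes printed at the end match, the pull request branch now carries the reworked commit.  Against a real repo you would skip the sandbox setup and run only the final push with your own remote and branch names.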

Ultimately, this is pretty basic git.  My experience with git is that the definition of “pretty basic” is a binary function.  Once you know how git works, everything in git is pretty basic.  Until then it’s completely opaque.

Side note: this is my 301st post on this blog!

8/1/2013 Post Script from Crowbar Contributor Adam Spiers

He noticed that I should include the -f in the git push instruction.  Read more about that on his blog.

In scale-out infrastructure, tools & automation matter

Scale-out platforms like Hadoop have different operating rules.  I heard an interesting story today in which the performance of the overall system improved threefold (a run went from 15 mins down to 5 mins) after a single node was removed.

In a distributed system that coordinates work between multiple nodes, it only takes one bad node to dramatically impact the overall performance of the entire system.
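
To make the arithmetic concrete, here is a rough sketch under a simplifying assumption: the job only finishes when its slowest node finishes.  The per-node timings are hypothetical numbers picked to match the story (15 mins with the bad node, 5 mins without); they are not measurements.

```shell
#!/bin/sh
# Assumption: the run completes when the slowest node completes,
# so the overall run time is the maximum of the per-node times.
job_time() {
  printf '%s\n' "$@" | sort -n | tail -n 1
}

# Hypothetical per-node times in minutes; the last node is the bad one.
with_bad_node=$(job_time 5 5 4 15)
without_bad_node=$(job_time 5 5 4)

echo "with bad node:    ${with_bad_node} min"
echo "without bad node: ${without_bad_node} min"
```

Under that assumption, the healthy nodes spend most of the run waiting on the straggler, so removing it helps more than adding capacity would.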

Finding and correcting this type of failure can be difficult.  While natural variability, hardware faults or bugs cause some issues, the human element is by far the most likely cause.   If you can turn down noise injected by human error then you’ve got a chance to find the real system related issues.

Consequently, I’ve found that management tooling and automation are essential for success.  Management tools help diagnose the cause of the issue and automation creates repeatable configurations that reduce the risk of human injected variability.

I’d also like to give a shout out to benchmarks as part of your tooling suite.  Without having a reasonable benchmark it would be impossible to actually know that your changes improved performance.

Teaming Related Post Script: In considering the concept of system performance, I realized that distributed human systems (aka teams) have a very similar characteristic.  A single person can have a disproportionate impact on overall team performance.

my lean & open source reading list – recommendations welcome!

I think it’s worth pulling together a list of essential books that I think should be required reading for people on Lean & open source teams (like mine):

  • Basis for the team values that we practice: The Five Dysfunctions of a Team: A Leadership Fable Patrick Lencioni (amazon)
  • This is a foundational classic for team building:  Peopleware: Productive Projects and Teams (Second Edition) Tom DeMarco (amazon)
  • This novel is a good primer for lean and devops: The Phoenix Project: A Novel About IT, DevOps, and Helping Your Business Win Gene Kim, Kevin Behr and George Spafford (amazon)
  • Business Focus on Lean: The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses Eric Ries (amazon)
  • Foundational (and easy) reading about Lean: The Goal: A Process of Ongoing Improvement Eliyahu M. Goldratt (amazon)
  • One of my favorites on Lean / Agile: Implementing Lean Software Development: From Concept to Cash Mary Poppendieck (amazon)
  • Should be required reading for open source (as close to “Open Source for Dummies” as you can get): The Cathedral & the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary Eric S. Raymond (amazon)
  • Culture Change: Liquid Leadership: From Woodstock to Wikipedia–Multigenerational Management Ideas That Are Changing the Way We Run Things Brad Szollose (amazon)
  • More Team Building – this one is INTERACTIVE! http://www.strengthsfinder.com/home.aspx

There are some notable additions, but I think this is enough for now.  I’m always looking for recommendations!  Please post your favorites in the comments!

Connecting the dots: Dell stays course on OpenStack private

When it comes to OpenStack, I don’t just work for Dell: I’m the technical lead for our OpenStack-powered private Cloud Solution and an elected director to the OpenStack Foundation board.

Frankly, the announcement of our change in public cloud strategy overshadowed our increasing level of investment in OpenStack-powered private cloud solutions (we are hiring!).  Sam Greenblatt, Dell Product Group VP and Chief Architect, is very specific that the recent announcements are about increasing investment where Dell is already successful plus accelerating with new features (such as leadership in Hyper-V enablement).

The fact that we focused on our decision to pivot away from Dell hosted public cloud distracted from the strategic choices that we’ve been making.  In the lean process that we use, pivots are a positive sign of listening and self-honesty.  Sadly, that distraction led to confusion, misleading comments, and implications that Dell was dropping OpenStack or questioning OpenStack sustainability and market success.

For the record, Dell was one of the first companies to support OpenStack with supporting quotes from Forrest Norrod (Dell GM for Servers and my direct boss) way back  in July 2010.  Our private OpenStack based cloud, built on open source Crowbar, was the first to market 2 years ago (deploying Cactus!).  We’ve been investing steadily in both fundamental improvements to OpenStack deployment and being early supporting the Grizzly release.

I am not implying that OpenStack’s future is certain (we have a lot of work to do) or that Dell OpenStack strategy will not change again; however, I know first-hand that both are on much firmer footing than some reports have implied.

Crowbar cuts OpenStack Grizzly (“pebbles”) branch & seeks community testing

The Crowbar team (I work for Dell) continues to drive towards “zero day” deployment readiness. Our Hadoop deployments are tracking Dell | Cloudera Hadoop-powered releases within a month and our OpenStack releases harden within three months.

During the OpenStack summit, we cut our Grizzly branch (aka “pebbles”) and switched over to the release packages. Just a reminder, we basically skipped Folsom. While we’re still tuning out issues on OpenStack Networking (OVS+GRE) setup, we’re also looking for community to start testing and tuning the Chef deployment recipes.

We’re just sprints from release; consequently, it’s time for the Crowbar/OpenStack community to come and play! You can learn Grizzly and help tune the open source Ops scripts.

While the Crowbar team has been generating a lot of noise around our Crowbar 2.0 work, we have not neglected progress on OpenStack Grizzly.  We’ve been building Grizzly deploys on the 1.x code base using pull-from-source to ensure that we’d be ready for the release. For continuity, these same cookbooks will be the foundation of our CB2 deployment.

Features of Crowbar’s OpenStack Grizzly Deployments

  • We’ve had Nova Compute, Glance Image, Keystone Identity, Horizon Dashboard, Swift Object and Tempest for a long time. Those, of course, have been updated to Grizzly.
  • Added Block Storage
    • importable Ceph Barclamp & OpenStack Block Plug-in
    • EqualLogic OpenStack Block Plug-in
  • Added Quantum OpenStack Network Barclamp
    • Uses OVS + GRE for deployment
  • 10 Gb networking configuration
  • Rabbit MQ as its own barclamp
  • Swift Object Barclamps made a lot of progress in Folsom that translates to Grizzly
    • Apache Web Service
    • Rack awareness
    • HA configuration
    • Distribution Report
  • “Under the covers” improvements for Crowbar 1.x
    • Substantial improvements in how we configure host networking
    • Numerous bug fixes and tweaks
  • Pull from Source via the Git barclamp
    • Grizzly branch was switched to use Ubuntu & SUSE packages

We’ve made substantial progress, but there are still gaps. We do not have upgrade paths from Essex or Folsom. While we’ve been adding fault-tolerance features, full automatic HA deployments are not included.

Please build your own Crowbar ISO or check our new SourceForge download site, then join the Crowbar List and IRC to collaborate with us on OpenStack (or Hadoop or Crowbar 2). Together, we will make this awesome.

Thanks! I’m enjoying my conversation with you

I write because I love to tell stories and to think about how actions we take today will impact tomorrow.  Ultimately, everything here is about a dialog with you because you are my sounding board and my critic.  I appreciate when people engage me about posts here and extend the conversation into other dimensions.  Feel free to call me on points and question my position – that’s what this is all about.

Thank you for being a part of my blog and joining in.  I’m looking forward to hearing more from you.

During the OpenStack Summit, I got to lead and participate in some excellent presentations and panels.  While my theme for this summit was interoperability, there are many other items discussed.

I hope you enjoy them.

Did one of these topics stand out?  Is there something I missed?  Please let me know!

Parable of Lions and Elephants

There was once a family with two children: Barney and Bailum.  Both wanted to start a circus and did exactly that when they came into their inheritance.  Being highly competitive, they each wanted to have the greatest show the world has ever seen.

Always ambitious, Barney wanted to start big and decided to start with elephants.  To have an elephant act, Barney had a lot of planning to do.  Even before acquiring the actual elephants, he had to get permits, hire handlers, arrange transport and arrange special feeding.  He really had to get busy and make some plans even before he could start on the tusks of selling tickets.

Bailum, more humble, decided to start by training some stray dogs into an animal act.  While not nearly as exciting as elephants, she was able to procure dogs immediately and start training them.  Instead of having to host her own shows, she was able to bring the dogs into other people’s shows.  That let her gain critical experience, get a reputation and even have positive cash flow.

Barney was merciless about Bailum’s flea bag circus.  Barney was 100% confident that his vision of a grand circus was the right plan because that’s what he saw from going to other shows.  Based on her behind the scenes experience, Bailum was starting to learn that running a circus was a lot more than the animal shows.  Some of those tasks, like booking venues, selling ads and clown discipline, made cleaning up after elephants look like a circus highlight.

As time went on, Bailum extended her expertise with dogs into lions, horses and even bipedal simians.  Her business was thriving as a specialist for other circuses to such an extent that she adjusted her original ringmaster vision and embraced a new plan as an animal specialist.  Based on her discussions with her circus partners, her limited scope as a lion trainer was more profitable than their lives in the spotlight.

Meanwhile, Barney was still working out the issues with his elephants.  It seemed like every time he turned around there was a new complication.  After spending every penny on getting his glorious African pachyderms he discovered that his cages were sized for Indian elephants (which are smaller).  Out of money and unable to operate, Barney had to abandon his vision and go back to clown school.

It’s hard to eat an elephant, but if you start with something you can handle then you can learn to tame lions.

OpenStack steps toward Interoperability with Tempest, RAs & RefStack.org

I’m a cautious supporter of OpenStack leading with implementation (over API specification); however, it clearly has risks. OpenStack has the benefit of many live sites operating at significant scale. The short term cost is that those sites were not fully interoperable (progress is being made!). Even if they were, we lack the means to validate that they are.

The interoperability challenge was a major theme of the Havana Summit in Portland last week (panel I moderated).  Solving it creates significant benefits for the OpenStack community.  These benefits have significant financial opportunities for the OpenStack ecosystem.

This is a journey that we are on together – it’s not a deliverable from a single company or a release that we will complete and move on.

There were several themes that Monty and I presented during Heat for Reference Architectures (slides).  It’s pretty obvious that interop is valuable (I discuss why you should care in this earlier post) and running a cloud means dealing with hardware, software and ops in equal measures.  We also identified lots of important items like Open Operations, Upstreaming, Reference Architecture/Implementation and Testing.

During the session, I think we did a good job stating how we can use Heat for an RA to make incremental steps.  I also co-presented a session about upgrades (slides).

Even with all this progress, Testing for interoperability was one of the largest gaps.

The challenge is not if we should test, but how to create a set of tests that everyone will accept as adequate.  Approaching that goal with a standardization or specification objective is likely an impossible challenge.

Joshua McKenty & Monty Taylor found a starting point for interoperability FITS testing: “let’s use the Tempest tests we’ve got.”

We should question the assumption that faithful implementation test specifications (FITS) for interoperability are only useful with a matching specification and significant API coverage.  Any level of coverage provides useful information and, more importantly, visibility accelerates contributions to the test base.

I can speak from experience that this approach has merit.  The Crowbar team at Dell has been including OpenStack Tempest as part of our reference deployment since Essex and it runs as part of our automated test infrastructure against every build.  This process does not catch every issue, but passing Tempest is a very good indication that you’ve got a workable OpenStack deployment.

Crowbar and our Pivot (or, how we slipped and shipped Grizzly)

My team at Dell uses Lean process because it forces us to be honest about making hard choices. Our recent decision to pivot back to Crowbar 1.x for the OpenStack Grizzly release is a great example of how the pivot process works.

4/24 note: I have a longer post and ISO for Grizzly on Crowbar waiting until we enter QA. The Crowbar community is already very active around this work and you’re encouraged to join.

Like any refactor, there was schedule risk when we started the Crowbar 2.x release. To mitigate this risk, we made two critical choices. First, we chose to advance the OpenStack barclamps on the 1.x code base in parallel with the 2.x work. Second, we set a pivot date for the team to choose between releasing Grizzly on the 1.x or 2.x trunks.

Choosing to jump back to 1.x was one of the hardest choices I’ve made in my career. I’m proud that we had the foresight to keep that as an option and prouder that our team rallied to make it happen.

I acknowledge that 1.x has gaps; however, getting Grizzly into the field for PoCs and pilots with 1.x provides substantial benefits to the community.  That said, there are barclamps for HA deployments and other production features that are under development on the 1.x branch and will be available in the community.

The 2.x code base provides important features but we are building on the 1.x deployment recipes. This means that development, testing and tuning applied to the Grizzly barclamps will translate directly into Crowbar 2.x field readiness. In fact, more completeness on OpenStack can dramatically simplify Crowbar 2.x testing efforts.  This is especially true on the OpenStack Networking (fka Quantum) barclamps because they are new work.

Delivering solutions is a balance between features, timing and field experience.  The Crowbar team’s preference is to collaborate with operators in the field and that means making workable software available quickly.

I hope that you’ll agree with our approach and help us make Grizzly the most deployable OpenStack yet.