Hadoop Crowbar released to open source! (plus AN HOUR of videos!)

I’m proud to announce that my team at Dell has open sourced our Apache Hadoop barclamps!  This release follows our Dell | Cloudera Hadoop Solution open source commitment from Hadoop World earlier this month.

As part of this release, we’ve created nearly AN HOUR of video content showing the Hadoop Barclamps in action, installing Crowbar (on CentOS), building Crowbar ISOs in the cloud and specialized developer focused builds.

If you want to talk to the Crowbar team.  We’re attending events in Boston 11/29, Seattle 11/30, and Austin 12/8.

Here are links to the videos:

More Hadoop perspectives from Dell:  Joseph George on what it means and  Barton George‘s backgrounder about barclamps.

Seattle meetup on 11/30 (will bring massive laptops for OpenStack, Hadoop & Crowbar demos)

After Greg Althaus and I are done attending the sold out Opcode Community Summit (11/29-30), Opscode has offered to let us have an informal meetup at Opscode HQ from 6:30 to 8pm on 11/30.  I’ve proposed this as an official Seattle OpenStack Meetup (waiting on confirmation from @heckj).

We’re not limiting the agenda to OpenStack!  We’ll happily talk about Hadoop, Crowbar, Opscode or any other cloud technology that’s on your mind.  For 90 minutes, we’re offering Cloud Geeking as a Service (CGaaS).

Not in Seattle?  Never fear!  You can hook up with other members of my team at Dell in Boston on 11/29 & Austin 12/8.

Greg Althaus at 11/15 Austin Cloud User Group meeting (annotated 90 min audio)

Greg Althaus did a 90 minute Crowbar deep dive at this week’s Austin Cloud User Group.  Brad Knowles recorded audio and posted it so I thought I’d share the link and my annotations.  There are a lot more times to catch up with our team at Dell in Austin, Boston, and Seattle.

Video Annotations –  (##:## time stamp)

  • 00:00 Intros & Meeting Management
  • 12:00 Joseph George Introduction / Sponsorship
  • 16:00 Greg Starts – why Crowbar
  • 19:00 DevOps slides
  • 21:00 What does Crowbar do for DevOps
    • make it easier to manage
    • make it easier to repeat
  • 24:00 What’s included – how we grow / where to start
  • 27:20 Starting to show crowbar – what’s included as barclamps
    • pluggable / configuration
    • Barclamps!
  • 28:10 What is a barclamp
    • discussion about the barclamps in the base
  • 34:30: We ❤ Chef. Puppet vs Chef
  • 36:00 Why barclamps are more than cookbooks
  • 36:30 State machine & transitions
  • Q&A Section
    • 38:50 Reference Architectures
    • 43:00 Barclamps work outside of Crowbar?
    • 44:15 Hardware models supported
    • 47:30 Storage Queston
    • 49:00 HA progress
    • 53:00 Ceph as a distributed cloud on all nodes
    • 56:20 Deployer has a map of how to give out roles
  • 58:00 Demo Fails
  • 58:30 Crowbar Architecture
  • 62:00 How Crowbar can be extended
  • 63:00 Workflow & Proposals
  • 65:40 Meta Barclamps
  • 71:10 Chef Environments
  • 73:40 Taking OpenStack releases and Environments
  • 75:00 The case for remove recipes
  • 77:33 Git Hub Tour
  • 79:00 Network Barclamp deep dive
  • 84:00 Adding switch config (roadmap topic)
  • 86:30 Conduits
  • 87:40 Barclamp Extensions / Services
  • 89:00 Questions
    • 89:20 Proposal operations
    • 93:30 OpenStack Readiness & Crowbar Design Approach
    • 93:10 Network Teaming
    • 94:30 Which OS & Hypervisors
    • 96:30 Continuous Integration & Tools
    • 98:40 BDD (“cucumberesque”) & Testing
    • 99:40 Build approach & barclamp construction
  • 100:00 Wrap up by Joseph

Crowbar community support and 111111 sprint plan

The Dell Crowbar team is working to improve road map transparency. In the last few weeks, the Crowbar community has become more active on our lists, testing builds, and helping with documentation.

We love the engagement and continue to make supporting the list a priority.

Participation in Crowbar, OpenStack and Hadoop has been exceeding our expectations and we’re working to implement more community support and process. Thank you!!!

Our next steps:

  1. I’ve committed to post sprint plans and summary pages (this is the first)
  2. New Crowbar Twitter account
  3. I’m going to setup feature voting on the Crowbar Facebook page (like to vote)
  4. Continue to work the listserv and videos. We need help converting those to documentation on the crowbar wiki.
  5. Formalize collaborator agreements – we’re working with legal on this
  6. Exploring the option of a barclamp certification program and Crowbar support
  7. Moving to a gated trunk model for internal commits to improve quality
  8. Implementing a continuous integration system that includes core and barclamps. This will be part of our open source components.

We are working towards the 1.2 release (Beta 1) . That release is focused on supporting OpenStack but includes enhancements for upgrades, Hadoop, and additional OS support.

Our Sprint 111111 plan.

Source: Crowbar Wiki: [[sprint 111111]]

  • Theme: OpenStack Diablo Final release candidate.
  • Core Work: Refine Deployment for Nova, Glance, Nova Dashboard (horizon), keystone, swift
  • New additions: mySQL barclamp, Nova HA networking, kong
  • Crowbar internals: expose error states for proposals, allow packages to be included with barclamps to make upgrades easier, barclamp group pages
  • Operating system: added CentOS
  • Documentation: we’ve split the user guides into distinct books so Crowbar, OpenStack, and Hadoop each have their own user guide.
  • Pending action: expose the Hadoop barclamps
  • OS note: OpenStack is being tested (at Dell) against Ubuntu 10.10 only. Hadoop was tested against RHEL 5.7 and we expect it to work against CentOS also.

Dell is open sourcing Crowbar Apache Hadoop barclamps!

I’m very excited to announce that my team at Dell will be open sourcing our Apache Hadoop Crowbar barclamps by the end of the month.

This release raises the bar on open Hadoop deployments by making them faster, scalable, more integrated and repeatable.

These barclamps were developed in conjunction with our licensed Dell | Cloudera Solution. The licensed solution is for customers seeking large scale and professionally supported big data solutions. The purpose of the open barclamps (which pull the open source parts from the Cloudera distro) is to help you get started with Hadoop and reduce your learning curve. Our team invested significant testing effort in ensuring that these barclamps work smoothly because they are the foundational layer of our for-pay Hadoop solution.

Included in the Hadoop barclamp suite are Hadoop Map Reduce, Hive, Pig, ZooKeeper and Sqoop running on RHEL 5.7. These barclamps cover the core parts of the Hadoop suite. Like other Crowbar deployments (see OpenStack), the barclamps automatically discover the service configurations and interoperate. One of our team members (call him Scott Jensen) said it very simply “I can deploy a fully an integrated Hadoop cluster in a few hours. That friggin’ rocks!” I just can’t put it more eloquently than that!

I’ll post again when we flip the “open” bit and invite our community to dig in and help us continue to set the standards on open Hadoop deployments.

For more perspectives on this release, check out posts by Barton George (just for devs), Joseph George (About Hadoop) and Aurelian Dumitru

Barton posted these two videos of me talking about the release too:

Hadoop & Crowbar:

Dev’s Only Short:

The 451 Group Cloudscape report strikes chord misses harmony (DevOps, Hybrid Cloud, Orchestration)

It’s impossible to resist posting about this month’s  451 Group Cloudscape report when it calls me out by name as a leading cloud innovator:

… ProTier founders Dave McCrory and Rob Hirschfeld. ProTier [note: now part of Quest] was, indeed, the first VMware ecosystem vendor to be tracked by The 451 Group. In the face of a skeptical world, these entrepreneurs argued that virtualization needed automation in order to realize its full potential, and that the test lab was the low-hanging fruit. Subsequent events have more than vindicated their view (pg. 33).

It’s even better when the report is worth reading and offers insights into forces shaping the industry.  It’s nice to be “more than vindicated” on an amazing journey we started over 10 years ago!

Rather than recite 451’s points (hybrid cloud = automation + orchestration + devops + pixie dust), I’d rather look at the problem different way as a counterpoint.

The problem is “how do we deal with applications that are scattered over multiple data centers?”

I do not think orchestration is the complete answer.  Current orchestration is too focused on moving around virtual machines (aka workloads).

Ultimately, the solution lies in application architecture; however, I feel that is also a misdirection because cloud is redefining what an “application architecture” means.

Applications are a dynamic mix of compute, storage, and connectivity.

We’re entering an age when all of these ingredients will be delivered as elastic services that will be managed by the applications themselves.  The concept of self management is an extension of DevOps principles that fuse application function and deployment. There are missing pieces, but I’m seeing the innovation moving to fill those gaps.

If you want to see the future of cloud applications then look at the network and storage services that are emerging.  They will tell you more about the future than orchestration.

 

OpenStack: Five Challenges & Conference Observations

I was part of the Dell contingent at the OpenStack design conference earlier in the month.  The conference opened a new chapter for the project because the number of contributing companies reached critical mass.  That means that the core committers are no longer employed by just one or two entities; instead, there are more moneyed interests rubbing elbows and figuring out how to collaborate.

From my perspective (from interview with @Cote ), this changed the tone of the conference from more future looking to pragmatic.

That does not mean that everything is sunshine and rainbows for OpenStack clouds, there are real issues to be resolved.  IMHO, the top issues for OpenStack are:

  1. API implementation vs specification
  2. Building up coverage on continuous integration
  3. Ensuring that we can deploy consistently in multi-node systems
  4. Getting contributions from new members
  5. Figuring out which projects are core, satellite, missing or junk.  [xref 2014 DefCore]

Of these issues, I’ve been reconsidering my position favoring API via Implementation over specification (past position).  This has been a subject of debate on my team at Dell and I like Greg Althaus’ succinct articulation of the problem with implementation driven API: “it is not fair.”  This also ends up being a branding issues for OpenStack because governance needs to figure out which is a “real” OpenStack cloud deployment that can use the brand.  Does it have to be 100% of the source?  What about extensions?  What if it uses the API with an alternate implementation?

Of the other issues, most are related to maturity.  I think #2 needs pressure by and commitment from the larger players (Dell very much included).  Crowbar and the deployment blueprint is our answer #3.  Shouting the “don’t fork it up” chorus from the roof tops addresses contributions while #5 will require some strong governance and inevitably create some hurt feelings.

Addressing OpenStack API equality: rethinking API via implementation over specification

This post is an extension of my post about OpenStack’s top 5 challenges from my perspective working on Dell’s OpenStack team.  This issue has been the subject of much public debate on the OpenStack lists, at the conference and online (Wilamuna & Shuttleworth).  So far, I have held back from the community discussion because I think both sides are being well represented by active contributors to trunk.

I have been, and remain, convinced that a key to OpenStack success as a cloud API is having a proven scale implementation; however, we need to find a compromise between implementation and specification.

The community needs code that specifies an API and offers an implementation backing that API.  OpenStack has been building an API by implementing the backing functionality (instead of vice versa).  This ordering is important because the best APIs are based on doing real work not by trying to anticipate potential needs.  I’m also a fan of implementation driven API because it emerges quickly.  That does not mean it’s all wild west and cowboy coding.  OpenStack has the concept of API extensions; consequently, its possible to offer new services in a safe way allowing new features to literally surface overnight.

Unfortunately, API solely via implementation risks being fragile, unpredictable, and unfair. 

It is fragile because the scope is reflected by what the implementors choose to support.  They often either expose internals or restrict capabilities in ways that a leading with design specification would not.  This same challenge makes them less predictable because the API may change as the implementation progresses or become locked into exposing internals when downstream consumers depend on exposed functions.

I am ready to forgive fragile and unpredictable, but unfair may prove to be expensive for the OpenStack community.

Here’s the problem: there is more than on right way to solve a problem.  The benefit of an API is that it allows multiple implementations to emerge and prove their benefits.  We’ve seen many cases where, even with a weak API, a later implementation is the one that carries the standard (I’m thinking browsers here).  When your API is too bound to implementation it locks out alternatives that expand your market.  It makes it unfair to innovators who come to the party in the second wave.

I think that it is possible to find a COMPROMISE between API implementation and specification; however, it takes a lot of maturity to make that work.  First, it requires the implementors to slow down and write definitions outside of their implementation.  Second, it requires implementors to emotionally decouple from their code and accept other API implementations.  Of course, a dose of test driven design (TDD) and continuous integration (CI) discipline does not hurt either.

I’m interested in hearing more opinions about this… come to the 10/27 OpenStack meetup to discuss and/or please comment!

Austin OpenStack Cloud Meetup: Thursday 10/27 6:30 PM at TechRanch Austin

OpenStack Enthusiasts, you are OFFICIALLY INVITED to Austin’s first post-Diablo OpenStack community event.

Dell is sponsoring an Austin OpenStack Meet Up help connect the Austin community around OpenStack and open source clouds!

Link: http://www.meetup.com/OpenStack-Austin/events/37908242/

We’ve got members of the Rackspace Cloud Builders Training team in town and Dell’s own Crowbar team attending.  We’re planning to do OpenStack demos and talk about the project in detail – and we’ll have plenty of pizza and sodas to keep the cloud juices flowing.

This is a great way to learn about the OpenStack cloud project and meet other people who are developing/deploying the hottest open source cloud around.

PLEASE SPREAD THE WORD – we’re trying to make this inaugural OpenStack meetup a big success!

See you there,

Joseph @jbgeorge George & Rob @Zehicle Hirschfeld

Dell Crowbar Project: Open Source Cloud Deployer expands into the Community

Note: Cross posted on Dell Tech Center Blogs.

Background: Crowbar is an open source cloud deployment framework originally developed by Dell to support our OpenStack and Hadoop powered solutions.  Recently, it’s scope has increased to include a DevOps operations model and other deployments for additional cloud applications.

It’s only been a matter of months since we open sourced the Dell Crowbar Project at OSCON in June 2011; however, the progress and response to the project has been over whelming.  Crowbar is transforming into a community tool that is hardware, operating system, and application agnostic.  With that in mind, it’s time for me to provide a recap of Crowbar for those just learning about the project.

Crowbar started out simply as an installer for the “Dell OpenStack™-Powered Cloud Solution” with the objective of deploying a cloud from unboxed servers to a completely functioning system in under four hours.  That meant doing all the BIOS, RAID, Operations services (DNS, NTP, DHCP, etc.), networking, O/S installs and system configuration required creating a complete cloud infrastructure.  It was a big job, but one that we’d been piecing together on earlier cloud installation projects.  A key part of the project involved collaborating with Opscode Chef Server on the many system configuration tasks.  Ultimately, we met and exceeded the target with a complete OpenStack install in less than two hours.

In the process of delivering Crowbar as an installer, we realized that Chef, and tools like it, were part of a larger cloud movement known as DevOps.

The DevOps approach to deployment builds up systems in a layered model rather than using packaged images.  This layered model means that parts of the system are relatively independent and highly flexible.  Users can choose which components of the system they want to deploy and where to place those components.  For example, Crowbar deploys Nagios by default, but users can disable that component in favor of their own monitoring system.  It also allows for new components to identify that Nagios is available and automatically register themselves as clients and setup application specific profiles.  In this way, Crowbar’s use of a DevOps layered deployment model provides flexibility for BOTH modularized and integrated cloud deployments.

We believe that operations that embrace layered deployments are essential for success because they allow our customers to respond to the accelerating pace of change.  We call this model for cloud data centers “CloudOps.”

Based on the flexibility of Crowbar, our team decided to use it as the deployment model for our Apache™ Hadoop™ project (“Dell | Apache Hadoop Solution”).  While a good fit, adding Hadoop required expanding Crowbar in several critical ways.

  1. We had to make major changes in our installation and build processes to accommodate multi-operating system support (RHEL 5.6 and Ubuntu 10.10 as of Oct 2011).
  2. We introduced a modularization concept that we call “barclamps” that package individual layers of the deployment infrastructure.  These barclamps reach from the lowest system levels (IPMI, BIOS, and RAID) to the highest (OpenStack and Hadoop).

Barclamps are a very significant architecture pattern for Crowbar:

  1. They allow other applications to plug into the framework and leverage other barclamps in the solution.  For example, VMware created a Cloud Foundry barclamp and Dream Host has created a Ceph barclamp.  Both barclamps are examples of applications that can leverage Crowbar for a repeatable and predictable cloud deployment.
  2. They are independent modules with their own life cycle.  Each one has its own code repository and can be imported into a live system after initial deployment.  This allows customers to expand and manage their system after initial deployment.
  3. They have many components such as Chef Cookbooks, custom UI for configuration, dependency graphs, and even localization support.
  4. They offer services that other barclamps can consume.  The Network barclamp delivers many essential services for bootstrapping clouds including IP allocation, NIC teaming, and node VLAN configuration.
  5. They can provide extensible logic to evaluate a system and make deployment recommendations.  So far, no barclamps have implemented more than the most basic proposals; however, they have the potential for much richer analysis.

Making these changes was a substantial investment by Dell, but it greatly expands the community’s ability to participate in Crowbar development.  We believe these changes were essential to our team’s core values of open and collaborative development.

Most recently, our team moved Crowbar development into the open.  This change was reflected in our work on OpenStack Diablo (+ Keystone and Dashboard) with contributions by Opscode and Rackspace Cloud Builders.  Rather than work internally and push updates at milestones, we are now coding directly from the Crowbar repositories on Github.  It is important to note that for licensing reasons, Dell has not open sourced the optional BIOS and RAID barclamps.  This level of openness better positions us to collaborate with the crowbar community.

For a young project, we’re very proud of the progress that we’ve made with Crowbar.  We are starting a new chapter that brings new challenges such as expanding community involvement, roadmap transparency, and growing Dell support capabilities.  You will also begin to see optional barclamps that interact with proprietary and licensed hardware and software.  All of these changes are part of growing Crowbar in framework that can support a vibrant and rich ecosystem.

We are doing everything we can to make it easy to become part of the Crowbar community.  Please join our mailing list, download the open source code or ISO, create a barclamp, and make your voice heard.  Since Dell is funding the core development on this project, contacting your Dell salesperson and telling them how much you appreciate our efforts goes a long way too.