We need an OpenStack Reference Deployment (My objectives for Deploy Day)

I’m overwhelmed and humbled by the enthusiasm my team at Dell is seeing for the OpenStack Essex Deploy day on 5/31 (or 6/1 for Asia). What started as a day for our engineers to hack on Essex cookbooks with a few fellow Crowbarians has morphed into an international OpenStack event spanning Europe, the Americas & Asia.

If you want to read more about the event, check out my event logistics post (link pending).

I do not apologize for my promotion of the Dell-led open source Crowbar as the deployment tool for the OpenStack Essex Deploy. For a community to focus on improving deployment tooling, there must be a stable reference infrastructure. Crowbar provides a fast and repeatable multi-node environment with scriptable networking and packaging.

I believe that OpenStack benefits from a repeatable multi-node reference deployment. I’ll go further and state that this requires DevOps tooling to ensure consistency both within and between deployments.

DevStack makes trunk development more canonical between different developers. I hope that Crowbar will help provide a similar experience for operators so that we can truly share deployment experience and troubleshooting. I think Crowbar deployments are already repeatable enough to serve as a reference for documenting and reproducing defects.

Said more plainly, it’s a good thing if a lot of us use OpenStack in the same way so that we can help each other out.

My team’s choice to accelerate releasing the Crowbar barclamps for OpenStack Essex makes perfect sense if you accept our rationale for creating a community baseline deployment.

Crowbar is Dell-led, not Dell-specific.

One of the reasons that Crowbar is open source and we do our work in the open (yes, you can see our daily development on GitHub) is to make it safe for everyone to invest in a shared deployment strategy. We encourage and welcome community participation.

PS: I believe the same is true for any large scale software project. Watch out for similar activity around Apache Hadoop as part of our collaboration with Cloudera!

Join us 5/31 for an OpenStack Deploy Hack-a-thon (all-day, world-wide online & multi-city)

An OpenStack Deploy Hack-a-thon is like a 3-liter bottle of distilled open source community love. Do you want direct access to my Dell team of OpenStack/Crowbar/Hadoop engineers? Are you just getting started and want training about OpenStack and DevOps? This is the event for you!

Here’s the official overview:

The OpenStack Deploy hack-a-thon focuses on automation for deploying OpenStack Essex with Dell Crowbar and Opscode Chef. This is a day-long, world-wide event bringing together developers, operators, users, ecosystem vendors and the open source cloud curious. (read below: We are looking for global sites and leaders to extend the event hours!)

OpenStack is the fastest growing open source cloud infrastructure project with broad market adoption from major hardware and software vendors. Crowbar is an Apache 2 licensed, open infrastructure deployment tool and is one of the leading multi-node deployers for OpenStack and Hadoop.

Learn first-hand how OpenStack and Crowbar can make it easy to deploy and operate your own cloud environments.

The Deploy day will offer two parallel tracks with something for both experts and beginners:

  • Newbies (n00bs) will learn the basics of OpenStack, Crowbar and DevOps and how they can benefit their organizations. We’ll also have time for ecosystem vendors to discuss how they are leveraging OpenStack.
  • Experts (l33ts) will take a deep dive into new features of OpenStack Essex and Crowbar and learn how Crowbar works under the hood, which will enable them to extend the product using Crowbar barclamps.
Note: If you’re a n00b but want l33t content, we’ll be offering online training materials and videos to help get you up to speed.

Why now? We’ve validated our OpenStack Essex deployment against the latest release bits from Ubuntu. Now it’s time to reach out to the OpenStack and Crowbar communities for training, testing and collaborative development.

Join the event!  We’re organizing information on the Crowbar wiki.  (I highly recommend you join the Crowbar list to get access to support for prep materials).  You can also reach out to me via the @DellCrowbar handle.

We’d love to get you up to speed on the basics and dive deep into the core.

Hungry for Operational Excellence? ChefConf 2012 satisfies!

Since my team at Dell sponsored the inaugural ChefConf, we had the good fortune to get a handful of passes and show up at the event in force. I was also tapped for a presentation (Chef+Crowbar gets Physical+OpenStack Cloud) and an Ignite session (Crowbar history).

I gave a live demo that used knife from a single command window to manage both physical and cloud infrastructure. That’s freaking cool! (Thanks to Matt Ray for helping to get this working.)

It’s no surprise that I’m already a DevOps advocate and Opscode enthusiast, but there were aspects of the conference that are worth reiterating:

  • Opscode is part of the cadre of leaders redefining how we operate infrastructure.  The energy is amazing.
  • The acknowledgement of the “snowflake” challenge where all Ops environments are alike, but no two are the same.
  • A tight integration between Operations and lean delivery, because waterfall deployments are not sustainable.
  • Opscode’s vision is rooted in utility.  You can be successful without design and then excel when you add it.  I find that refreshing.
  • There was a fun, friendly (“hug driven development?!”) and laid back vibe.  This group laughed A LOT.
  • For a first conference, Opscode did a good job with logistics and organization.
  • The back rooms and hallways were buzzing with activity. This means that people are making money with the technology.

Crowbar + Chef installs & manages OpenStack Essex (Live Demo, 45 minutes):

 

Ignite Talk about Dell Crowbar History (5 minutes)

Crowbar’s emergence as a DevOps enabled Cloud Provisioner

I’m going to be talking Crowbar & OpenStack at Chef Conf next week.  While I’m always excited to wave the Crowbar flag, it’s humbling to see our vision for an open source based cloud provisioner picking up momentum in the community.

Dieter Plaetinck

…I think this tool deserves more attention and should be added to your devops toolchain for the cloud (triple buzzword bonus!!!)…

Hosting News

…As an alternative to proprietary, licensed software models, Dell continues to see heavy customer interest in the OpenStack-Powered Cloud Solution, which integrates the OpenStack cloud operating system, cloud-optimized Dell PowerEdge C servers, the Dell-developed Crowbar software framework, and services…

Sys-Con Post by Mirantis

…Finally, there is Dell and Crowbar. Dell’s approach to riding the OpenStack wave is, perhaps, the most creative. Crowbar is neither a hardware appliance nor an enterprise version of OpenStack. It’s a configuration management tool built around OpSource’s Chef, designed specifically to deploy OpenStack on Dell servers. Crowbar effectively serves as the glue between the hardware and any distribution of OpenStack (and not only OpenStack)…

Robert Booth w/ Zenoss

Well if you care to only give them the best then introduce them to a set of tools that will drastically change the way they do business in a way they want to. Introduce them to Dell’s Crowbar and OpsCode Chef and you will make their job easier, faster and possibly put a stop to the finger pointing! No longer will they have to pull out the IT secret decoder ring to understand what the dev team put in the deployment docs.

Dell Team at the OpenStack Spring 2012 Summit

It’s OpenStack Summit time again for my team at Dell and there’s deployment in the air. It’s been an amazing journey from the first Austin summit to Folsom today. Since those first heady days, the party has gotten a lot more crowded, founding members have faded away, recruiters became enriched as employees changed email TLDs and buckets of code were delivered.

Throughout, Dell has stayed the course: our focus from day-one has been ensuring OpenStack can be deployed into production in a way that was true to the OpenStack mission of community collaboration and Apache-2-licensed open source.

We’ve delivered on the vision of making OpenStack deployable by collaborating broadly on the OpenStack components of the open source Crowbar project. I believe that our vision for sustainable open operations based on DevOps principles is the most complete strategy for production cloud deployments.

We are at the Folsom Summit in force and we’re looking forward to discussions with the OpenStack community. Here are some of the ways to engage with us:

  • Demos
    • During the summit (M-W), we’ll have our Crowbar OpenStack Essex deployments running. We kicked off Essex development with a world-wide event in early March and we want more people to come and join in.
    • During the conference (W-F), we’ll be showing off application deployments using enStratus and Chef against our field proven Diablo release.
  • Speakers
    • Thursday 1:00pm, OpenStack Gains Momentum: Customers are Speaking Up by Kamesh Pemmaraju (Dell)
    • Friday 9:50am, Deploy Apps on OpenStack using Dashboard, Chef and enStratus by Rob Hirschfeld (Dell), Matt Ray (Opscode) and Keith Hudgins (enStratus).
    • Friday 11:30am, Expanding the Community Panel, including Joseph George (Dell)
    • RoadStack RV: this fun round trip road trip from the Rackspace & Dell HQs in Austin to the summit and home again promises to be an odyssey of inclusion. Riders include Dell OpenStack/Crowbar engineer Andi Abes (@a_abes). Follow @RoadstackRV as they return home and share their thoughts about the summit!
  • Parties
    • Monday 6pm Mirantis Welcome Party, co-sponsored with Dell, at Sens Restaurant (RSVP)
    • Tuesday 5pm “Demos & Drinks” Happy Hour, co-hosted by Dell, Mirantis, Morphlabs, Canonical at the Hyatt Regency Hospitality Room off the Atrium

My team has been in the field talking to customers and doing OpenStack deployments. We are proud to talk about it and our approach.

Most importantly, we want to collaborate with you on our Essex deployments using Crowbar. Get on our list, download/build Crowbar, run the “essex-hack” branch and start banging on the deploy. Let’s work together to make this one rock-solid Essex deploy.

Seven Cloud Success Criteria to consider before you pick a platform

From my desk at Dell, I have a unique perspective. In addition to a constant stream of deep customer interactions about our many cloud solutions (even going back pre-OpenStack to Joyent & Eucalyptus), I have been an active advocate for OpenStack, been involved in many discussions with and about CloudStack, and regularly talk shop with the teams behind Dell’s VIS Creator (our enterprise-focused virtualization products). And, going back ten years to 2002, I patented the concept of hybrid clouds with Dave McCrory.

Rather than offering opinions in the Cloud v. Cloud fray, I’m suggesting that cloud success means taking a system view.

Platform choice is only part of the decision: operational readiness, application types and organizational culture are critical foundations that come before platform.

Over the last two years at Dell, I have found seven points that outweigh customers’ choice of platform.

  1. Running clouds requires building operational expertise both at the application and infrastructure layers.  CloudOps is real.
  2. Application architectures matter for cloud deployment because they can redefine the SLA requirements and API expectations.
  3. Development community and collaboration is a significant value because sharing around open operations offers significant returns.
  4. We need to build an accelerating pace of innovation into our core operating principles.
  5. There are still significant technology gaps to fill (networking & storage) and we will discover new gaps as we go.
  6. We can no longer discuss public and private clouds as distinct concepts.   True hybrid clouds are not here yet, but everyone can already see their massive shadow.
  7. There is always more than one right technological answer.  Avoid analysis paralysis by making incrementally correct decisions (committing, moving forward, learning and then re-evaluating).

Open Source Cloud Bootstrapping Revisited

At the last OpenStack design conference, Greg Althaus and I presented updates (presentation here) that we were making to a Nov 2010 cloud architecture white paper.

The revised “Bootstrapping Open Source Clouds” white paper has been out for a few months so I thought it was past time to throw out a link.

I’m really pleased about this update because it reflects real world experience my team has working with customers and partners on OpenStack (and Hadoop) deployments.

Executive Summary
Bringing a cloud infrastructure online can be a daunting bootstrapping challenge. Before hanging out a shingle as a private or public cloud service provider, you must select a platform, acquire hardware, configure your network, set up operations services, and integrate it to work well together. That is a lot of moving parts before you have even installed a sellable application. This white paper walks you through the decision process to get started with an open source cloud infrastructure based on OpenStack™ and Dell™ PowerEdge™ C servers. At the end, you’ll be ready to design your own trial system that will serve as the foundation of your hyperscale cloud.
2011 Revision Notes
In the year since the original publication of this white paper, we worked with many customers building OpenStack clouds. These clouds range in size from small six-node lab systems to larger production deployments. Based on these experiences, we updated this white paper to reflect lessons learned.

CloudOps white paper explains “cloud is always ready, never finished”

I don’t usually call out my credentials, but knowing that I have a Masters in Industrial Engineering helps (partially) explain my passion for process as essential to successful software delivery. One of my favorite authors, Mary Poppendieck, explains undeployed code as perishable inventory that you need to get to market before it loses value. The big lessons from Lean manufacturing (low inventory, high quality, system perspective) translate directly into software and, lately, into operations as DevOps.

What we have observed from delivering our own cloud products, and working with customers on theirs, is that the operations process for deployment is as important as the software and hardware. It is simply not acceptable for us to market clouds without a compelling model for maintaining the solution into the future. Clouds are simply moving too fast to be delivered without a continuous delivery story.

This white paper [link here!] has been available since the OpenStack conference, but it was not linked to the rest of our OpenStack or Crowbar content.

Extending Chef’s reach: “Managed Nodes” for External Entities.

Note: this post is very technical and relates to detailed Chef design patterns used by Crowbar. I apologize in advance for the post’s opacity. Just unleash your inner DevOps geek and read on. I promise you’ll find some gems.

At the Opscode Community Summit, Dell’s primary focus was creating an “External Entity” or “Managed Node” model. Matt Ray prefers the term “managed node” so I’ll defer to that name for now. This model is needed for Crowbar to manage system components that cannot run an agent such as a network switch, blade chassis, IP power distribution unit (PDU), and a SAN array. The concept for a managed node is that there is an instance of the chef-client agent that can act as a delegate for the external entity. We’ve been reluctant to call it a “proxy” because that term is so overloaded.

My Crowbar vision is to manage an end-to-end cloud application life-cycle. This starts from power and network connections to hardware RAID and BIOS then up to the services that are installed on the node and ultimately reaches up to applications installed in VMs on those nodes.

Our design goal is that you can control a managed node with the same Chef semantics that we already use. For example, adding a Network proposal role to the Switch managed node will force the agent to update its configuration during the next chef-client run. During the run, the managed node will see that the network proposal has several VLANs configured in its attributes. The node will then update the actual switch entity to match the attributes.
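To make the switch example concrete, here is a minimal Python sketch of what one chef-client run for a switch managed node might do, assuming the proposal role places a list of VLAN ids at a hypothetical `network.vlans` attribute path. `FakeSwitch` is a stand-in for whatever switch API or serial session the real agent would drive; nothing here is Crowbar code. The point is the convergence shape: compute the difference between the attributes and the device, then issue only the changes needed.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSwitch:
    """Hypothetical stand-in for a real switch management API."""
    vlans: set[int] = field(default_factory=set)
    log: list[str] = field(default_factory=list)

    def create_vlan(self, vlan_id: int) -> None:
        self.vlans.add(vlan_id)
        self.log.append(f"create vlan {vlan_id}")

    def delete_vlan(self, vlan_id: int) -> None:
        self.vlans.discard(vlan_id)
        self.log.append(f"delete vlan {vlan_id}")


def converge_switch(node_attrs: dict, switch: FakeSwitch) -> tuple[set[int], set[int]]:
    """One 'chef-client run' for the managed node: make the switch match its attributes.

    The attribute path node_attrs["network"]["vlans"] is an assumption for illustration.
    Returns (vlans_added, vlans_removed).
    """
    desired = set(node_attrs.get("network", {}).get("vlans", []))
    actual = set(switch.vlans)
    to_add = desired - actual
    to_remove = actual - desired
    for vlan_id in sorted(to_add):
        switch.create_vlan(vlan_id)
    for vlan_id in sorted(to_remove):
        switch.delete_vlan(vlan_id)
    return to_add, to_remove


if __name__ == "__main__":
    switch = FakeSwitch(vlans={100, 666})
    attrs = {"network": {"vlans": [100, 200, 300]}}
    print(converge_switch(attrs, switch))  # first run adds 200 and 300, removes 666
    print(converge_switch(attrs, switch))  # second run finds nothing to change
```

Running it a second time changes nothing, which is the idempotent behavior we expect from any Chef run.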

Design Considerations

There are five key aspects of our managed node design. They are configuration, discovery, location, relationships, and sequence. Let’s explore each in detail.

A managed node’s configuration is different from a service or actuator pattern. The core concept of a node in Chef is that the node owns the configuration. You make changes to the node’s configuration, and it’s the node’s job to manage its state to maintain that configuration. In a service pattern, the consumer manages specific requests directly. At the summit (with apologies to Bill Clinton), I described Chef configuration as telling a node what it “is” while a service provides verbs that change a node. The critical difference is that a node is expected to maintain configuration as its composition changes (e.g.: node is now connected to VLAN 666) while a service responds to specific change requests (node adds tag for VLAN 666). Our goal is to maintain Chef’s configuration management concept for the external entities.
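The “is” versus verb distinction can be sketched in a few lines of Python. Both classes below are hypothetical illustrations, not Chef or Crowbar APIs: `TagService` only knows verbs, while `ConfiguredPort` is told what the port *is* and issues whatever verbs are needed to get there. When a port moves from VLAN 666 to VLAN 100, a caller using the service directly has to remember to remove the old tag; the configured port handles that on its own.

```python
class TagService:
    """Service (verb) pattern: callers ask for specific changes."""

    def __init__(self) -> None:
        self.tags: set[str] = set()
        self.calls: list[str] = []

    def add_tag(self, tag: str) -> None:
        self.tags.add(tag)
        self.calls.append(f"add {tag}")

    def remove_tag(self, tag: str) -> None:
        self.tags.discard(tag)
        self.calls.append(f"remove {tag}")


class ConfiguredPort:
    """Configuration pattern: callers state what the port *is*; converge() issues the verbs."""

    def __init__(self, service: TagService) -> None:
        self.service = service
        self.config: set[str] = set()

    def declare(self, *tags: str) -> None:
        # Replace the whole desired state rather than appending to it.
        self.config = set(tags)

    def converge(self) -> None:
        # Idempotent: issue only the verbs needed to close the gap.
        for tag in sorted(self.config - self.service.tags):
            self.service.add_tag(tag)
        for tag in sorted(self.service.tags - self.config):
            self.service.remove_tag(tag)


if __name__ == "__main__":
    port = ConfiguredPort(TagService())
    port.declare("vlan-666")
    port.converge()
    port.declare("vlan-100")  # composition changed: the port now belongs on VLAN 100
    port.converge()
    print(port.service.calls)  # add vlan-666, add vlan-100, remove vlan-666
```

The configuration layer still uses verbs underneath; it simply owns the job of deciding which ones to issue.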

Managed nodes also have a resource discovery concept that must align with the current ohai discovery model. Like a regular node, the managed node’s data attributes reflect the state of the managed entity; consequently, we’d expect a blade chassis managed node to enumerate the blades that it contains. This creates an expectation that the managed node appears to be the “root” for the entity that it represents. We are also assuming that the Chef server can be trusted with the sharable discovered data. There may be cases where these assumptions do not hold, but we are making them for now.
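As an illustration of the “root” expectation, here is a small Python sketch of discovery for a blade chassis managed node. It assumes a hypothetical per-slot inventory (slot, present, serial, power) standing in for whatever the chassis management controller actually reports, and turns it into an ohai-style attribute tree with the chassis at the top and its blades underneath.

```python
from typing import Any


def discover_chassis(chassis_name: str, slots: list[dict[str, Any]]) -> dict[str, Any]:
    """Build ohai-style attributes with the chassis as the root of its own tree.

    `slots` stands in for whatever the chassis management controller reports;
    its shape (slot, present, serial, power) is a hypothetical example.
    """
    blades: dict[str, dict[str, Any]] = {}
    for slot in slots:
        if not slot.get("present", False):
            continue  # empty slot: nothing to enumerate
        blades[f"slot{slot['slot']}"] = {
            "serial": slot["serial"],
            "power": slot.get("power", "unknown"),
        }
    return {
        "name": chassis_name,
        "kind": "blade_chassis",
        "blade_count": len(blades),
        "blades": blades,
    }
```

Because these attributes would be saved to the Chef server like any other node’s, this is also where the assumption that the server can be trusted with discovered data comes into play.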

Another essential element of managed nodes is that their agent location matters because the external resource generally has restricted access. There are several examples of this requirement. Switch configuration may require a serial connection from a specific node. Blade chassis, SAN and PDU management ports are restricted to specific networks. This means that the managed node agents must run from a specific location. This location is not important to the Chef server or to the nodes’ actions against the managed node; however, it’s critical for the system when starting the managed node agent. While it’s possible for managed node agents to run on machines outside the overall Chef infrastructure, our use cases make it more likely that they will run as processes independent of regular nodes. This means that we’ll have to add some relationship information for managed nodes and perhaps a barclamp to install and manage managed nodes.
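Here is a hedged Python sketch of the placement decision this implies: given hosts and the networks or serial cables they can reach, pick a host that can actually talk to the external entity. The `Host` and `Entity` models, their field names and the example addresses are all hypothetical; the only real library used is Python’s standard `ipaddress` module for the subnet check.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Host:
    """A machine that could run a managed node agent (hypothetical model)."""
    name: str
    networks: list[str] = field(default_factory=list)    # CIDRs the host is attached to
    serial_links: set[str] = field(default_factory=set)  # entities cabled to its serial ports


@dataclass
class Entity:
    """An external entity that cannot run its own agent."""
    name: str
    mgmt_ip: Optional[str] = None  # reachable over a management network
    serial_only: bool = False      # or only over a specific serial cable


def place_agent(entity: Entity, hosts: list[Host]) -> str:
    """Return the first host from which the entity's management interface is reachable."""
    if not entity.serial_only and entity.mgmt_ip is None:
        raise ValueError(f"{entity.name} has no management path")
    for host in hosts:
        if entity.serial_only:
            if entity.name in host.serial_links:
                return host.name
        else:
            ip = ipaddress.ip_address(entity.mgmt_ip)
            if any(ip in ipaddress.ip_network(net) for net in host.networks):
                return host.name
    raise LookupError(f"no host can reach {entity.name}")
```

A real deployment would record this placement as relationship data (or in a barclamp) so the system knows where to start each agent.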

All of our use cases for managed nodes have a direct physical linkage between the managed node and server nodes. For a switch, it’s the ports connected. For a chassis, it’s the blades installed. For a SAN, it’s the LUNs exposed. These links imply a hierarchical graph that is not currently modeled in Chef data – in fact, it’s completely missing and difficult to maintain. At this time, it’s not clear how we or Opscode will address this. My current expectation is that we’ll use yet more roles to capture the relationships and add some hierarchical UI elements into Crowbar to help visualize it. We’ll also need to comprehend node types because “managed nodes” are too generic in our UI context.
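To show the kind of hierarchy that is missing from Chef data, here is a minimal Python sketch of a link graph between managed nodes and server nodes, recording the physical detail of each link (switch port, blade slot, LUN). The `Topology` class and the node names are hypothetical; whether this ends up stored as roles or something else is exactly the open question above.

```python
from collections import defaultdict


class Topology:
    """Hypothetical link graph between managed nodes and the server nodes they touch."""

    def __init__(self) -> None:
        # managed node -> {server node: link detail}, e.g. switch port, blade slot, LUN id
        self._down: dict[str, dict[str, str]] = defaultdict(dict)
        # server node -> set of managed nodes it depends on
        self._up: dict[str, set[str]] = defaultdict(set)

    def link(self, managed: str, server: str, detail: str) -> None:
        self._down[managed][server] = detail
        self._up[server].add(managed)

    def servers_of(self, managed: str) -> dict[str, str]:
        """Everything hanging off a managed node (ports, blades, LUN consumers)."""
        return dict(self._down.get(managed, {}))

    def managers_of(self, server: str) -> set[str]:
        """Every managed node a server depends on."""
        return set(self._up.get(server, set()))


if __name__ == "__main__":
    topo = Topology()
    topo.link("switch-1", "node-a", "port 1/0/7")
    topo.link("chassis-1", "node-a", "slot 3")
    topo.link("san-1", "node-a", "LUN 7")
    print(sorted(topo.managers_of("node-a")))  # chassis-1, san-1, switch-1
```

Keeping both directions makes the two questions a UI would ask (“what is on this switch?” and “what does this node depend on?”) cheap to answer.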

Finally, we have to consider the sequence of actions between managed nodes and nodes. In all of our use cases, the steps to bring up a node require orchestration with the managed node. Specifically, there needs to be a hand-off between the managed node and the node. For example, installing an application that uses VLANs does not work until the switch has created the VLAN. The same challenges apply to LUNs on SANs and to blades in chassis. Crowbar provides orchestration that we can leverage, assuming we can declare the linkages.
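Once the linkages are declared, the hand-off ordering is a dependency graph. The Python sketch below uses the standard library’s `graphlib.TopologicalSorter` (Python 3.9+) to group hypothetical bring-up steps into waves, where each wave only depends on earlier ones. This is an illustration of the idea, not how Crowbar’s orchestration is implemented.

```python
from graphlib import TopologicalSorter


def rollout_waves(steps: dict[str, set[str]]) -> list[set[str]]:
    """Group steps into waves; each wave only depends on earlier waves.

    `steps` maps each step to the set of steps that must finish first.
    """
    sorter = TopologicalSorter(steps)
    sorter.prepare()  # raises graphlib.CycleError if the hand-offs form a loop
    waves: list[set[str]] = []
    while sorter.is_active():
        ready = sorter.get_ready()
        waves.append(set(ready))
        sorter.done(*ready)  # in real life: only after the managed node confirms
    return waves


if __name__ == "__main__":
    plan = {
        "node-a: tag NIC with VLAN 100": {"switch-1: create VLAN 100"},
        "node-a: mount LUN 7": {"san-1: expose LUN 7"},
        "node-a: install app": {"node-a: tag NIC with VLAN 100", "node-a: mount LUN 7"},
    }
    for number, wave in enumerate(rollout_waves(plan), start=1):
        print(number, sorted(wave))
```

The switch and SAN work lands in the first wave, the node-side configuration in the second, and the application install last.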

For now, a hack to get started…

For now, we’ve started on a workable hack for managed nodes. This involves running multiple chef-clients on the admin server in their own paths & processes. We’ll also have to add yet more roles to comprehend the relationships between the managed nodes and the things that are connected to them. Watch the crowbar listserv for details!
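For readers wondering what “multiple chef-clients in their own paths” might look like, here is a hedged Python sketch that renders a separate `client.rb` and command line per managed node. `node_name`, `chef_server_url`, `client_key`, `file_cache_path` and `log_location` are standard client.rb settings and `-c` is chef-client’s config-file option, but every directory, URL and node name below is a hypothetical example, not Crowbar’s actual layout.

```python
from pathlib import PurePosixPath


def managed_node_client(name: str, chef_server_url: str, base_dir: str) -> tuple[str, list[str]]:
    """Render a client.rb and command line for one managed-node chef-client.

    Every path and URL here is a hypothetical example, not Crowbar's layout.
    Each managed node gets its own node_name, key, cache and log so that several
    chef-client processes can share one admin server without stepping on each other.
    """
    root = PurePosixPath(base_dir) / name
    config_path = root / "client.rb"
    client_rb = "\n".join([
        f'node_name "{name}"',
        f'chef_server_url "{chef_server_url}"',
        f'client_key "{root / "client.pem"}"',
        f'file_cache_path "{root / "cache"}"',
        f'log_location "{root / "client.log"}"',
    ]) + "\n"
    # -c points chef-client at the per-node config file.
    command = ["chef-client", "-c", str(config_path)]
    return client_rb, command


if __name__ == "__main__":
    for managed in ["switch-1", "chassis-1"]:
        config, cmd = managed_node_client(managed, "http://admin.example:4000", "/var/lib/managed-nodes")
        print(" ".join(cmd))
```

Separate keys and caches matter because each managed node registers with the Chef server as its own client.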

Extra Credit

Notes on the Opscode wiki from the Crowbar & Managed Node sessions

Opscode Summit Recap – taking Chef & DevOps to a whole new level

Opscode Summit Agenda created by open space

I have to say that last week’s Opscode Community Summit was one of the most productive summits that I have attended. Their use of the open-space meeting format proved to be highly effective for a team of motivated people to self-organize and talk about critical topics. I especially liked the agenda negotiations (see picture for an agenda snapshot) because everyone worked to adjust session times and locations based on what other sessions were being offered. Of course, it also helped to have an unbelievable level of Chef expertise on tap.

Overall

Overall, I found the summit to be a very valuable two days; consequently, I feel some need to pay it forward with a good summary. Part of the goal was for the community to document their sessions on the event wiki (which I have done).

The roadmap sessions were of particular interest to me. In short, Opscode is converging the code bases of its three Chef products (hosted, private and open). The primary changes will be moving from CouchDB to a SQL-based database and moving the API calls away from Merb/Ruby to Erlang. They are also improving search so that we can make more fine-tuned requests that perform better and return less extraneous data.

I had a lot of great conversations. Some of the companies represented included: Monster, Oracle, HP, DTO, Opscode (of course), InfoChimps, Reactor8, and Rackspace. There were many others – overall >100 people attended!

Crowbar & Chef

Greg Althaus and I attended for Dell with a Crowbar specific agenda so my notes reflect the fact that I spent 80% of my time on sessions related to features we need and explaining what we have done with Chef.

Observations related to Crowbar’s use of Chef

  1. There is a class of “orchestration” products that have similar objectives as Crowbar. Ones that I remember are ClusterChef, Rundeck and Domino.
  2. Crowbar uses Chef in a way that is different than users who have a single application to deploy. We use roles and databags to store configuration that other users inject into their recipes. This is due to the fact that we are trying to create generic recipes that can be applied to many installations.
  3. Our heavy use of roles enables something of a cookbook service pattern. We found that this was confusing to many chef users who rely on the UI and knife. It works for us because all of these interactions are automated by Crowbar.
  4. We picked up some smart security ideas that we’ll incorporate into future versions.
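The reason generic recipes work for us is that the installation-specific values come from roles layered on top of cookbook defaults. Here is a simplified Python sketch of that layering as a recursive dictionary merge, with hypothetical attribute names. Chef’s real precedence rules have more levels (default, normal and override, from attribute files, roles, environments and the node itself) and their own array-merge behavior, so treat this as the idea only.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], overlay: dict[str, Any]) -> dict[str, Any]:
    """Return base with overlay merged on top; nested dicts merge, everything else replaces."""
    merged = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def effective_attributes(*layers: dict[str, Any]) -> dict[str, Any]:
    """Merge layers from lowest to highest precedence (e.g. cookbook defaults, then role)."""
    result: dict[str, Any] = {}
    for layer in layers:
        result = deep_merge(result, layer)
    return result


if __name__ == "__main__":
    cookbook_defaults = {"nova": {"network": {"mode": "flat", "vlan_start": 100}, "db": "mysql"}}
    proposal_role = {"nova": {"network": {"mode": "vlan"}}}  # hypothetical proposal values
    print(effective_attributes(cookbook_defaults, proposal_role))
```

The recipe only ever reads the merged result, so the same cookbook can serve many installations while each proposal role carries only what differs.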

Managed Nodes / External Entities

Our primary focus was creating an “External Entity” or “Managed Node” model. Matt Ray prefers the term “managed node” so I’ll defer to that name for now. This model is needed for Crowbar to manage system components that cannot run an agent such as a network switch, blade chassis, IP power distribution unit (PDU), and a SAN array. The concept for a managed node is that there is an instance of the chef-client agent that can act as a delegate for the external entity. I had so much to say about that part of the session, I’m posting it as its own topic shortly.