OpenStack Essex Deploy Day 3/8 – Get involved and install with us

My team at Dell has been avidly tracking the upsdowns, and breakthroughs of the OpenStack Essex release.  While we still have a few milestones before the release is cut, we felt like the E4 release was a good time to begin the work on Essex deployment.  Of course, the final deployment scripts will need substantial baking time after the final release on April 5th; however, getting deployments working will help influence the quality efforts and expand the base of possible testers.

To rally behind Essex Deployments, we are hosting a public work day on Thursday March 8th.

For this work day, we’ll be hosting all-day community events online and physically in Austin and Boston.  We are getting commitments from other Dell teams, partners and customers around the world to collaborate.  The day is promising to deliver some real Essex excitement.

The purpose of these events is to deliver the core of a working OpenStack Essex deployment.  While my team is primarily focused on deploys via Crowbar/Chef, we are encouraging anyone interested in laying down OpenStack Essex to participate.  We will be actively engaged on the OpenStack IRC and mailing lists too.

We have experts in OpenStack, Chef, Crowbar and Operating Systems (Canonical, SUSE, and RHEL) engaged in these activities.

This is a great time to start learning about OpenStack (or Crowbar) with hands-on work.  We are investing substantial upfront time (checkout out the Crowbar wiki for details) to ensure that there is a working base OpenStack Essex deploy on Ubuntu 12.04 beta.  This deploy includes the Crowbar 1.3 beta with some new features specifically designed to make testing faster and easier than ever before.

In the next few days, I’ll cut a 12.04 ISO and OpenStack Barclamp TARs as the basis for the deploy day event.  I’ll also be creating videos that help you quickly get a test lab up and running.  Visit the wiki or meetup sites to register and stay tuned for details!

Austin OpenStack Meetup: Keystone & Knife (2/20 notes via Greg Althaus)

I could not make it to the recent Austin OpenStack Meetup, but Greg Althaus generously let me post his notes from the event.

Background

Matt Ray talks about Chef

Matt Ray from Opscode presented some of the work with Chef and OpenStack. He talked about the three main chef repos floating around. He called out Anso’s original cookbook set that is the basis for the Crowbar cookbooks (his second set), and his final set is the emerging set of cookbooks in OpenStack proper. The third one is interesting and what he plans to continue working on to make into his public openstack cookbooks. These are an amalgamation of smokestack, RCB, Anso improvements, and his (Crowbar’s).

He then demoed his knife plugin (slideshare) to build openstack virtual servers using the Openstack API. This is nice and works against TryStack.org (previously “Free Cloud”) and RCB’s demo cloud. All of that is on his github repo with instructions how to build and use. Matt and I talked about trying to get that into our Crowbar distro.

There were some questions about flow and choice of OpenStack API versus Amazon EC2 API because there was already an EC2 knife set of plugins.

Ziad Sawalha talks about Keystone

Ziad Sawalha is the PLT (Project Technical Lead) for Keystone. He works for Rackspace out of San Antonio. He drove up for the meeting.

He split his talk into two pieces, Incubation Process and Keystone Overview. He asked who was interested in what and focused his talk more towards overview than incubation.

Some key take-aways:

  • Keystone comes from Rackspace’s strong, flexible, and scalable API. It started as a known quantity from his perspective.
  • Community trusted nothing his team produced from an API perspective
  • Community is python or nothing
    • His team was ignored until they had a python prototype implementing the API
    • At this point, comments on API came in.
  • Churn in API caused problems with implementation and expectations around the close of Diablo.
    • Because comments were late, changes occurred.
    • Official implementation lagged and stalled into arriving.
  • API has been stable since Diablo final, but code is changing. that is good and shows strength of API.
  • Side note from Greg, Keystone represents to me the power of API over Code. You can have innovation around the implementation as long all the implementations have a fair ground work to plan under which is an API specification. The replacement of Keystone with the Keystone Light code base is an example of this. The only reason this is possible is that the API was sound and documented.  (Rob’s post on this)

Ziad spent the rest of his time talking about the work flow of Keystone and the API points. He covered the API points.

  • Client to Keystone, Keystone to Client for initial auth token
  • Client to Middleware API for the services to have a front.
  • Middleware to Keystone to verify and establish identity.
  • Middleware to Service to pass identity

Not many details other then flow and flexibility. He stressed the API design separated protocol from actions and data at all the layers. This allows for future variations and innovations while maintaining the APIs.

Ziad talked about the state of Essex.

  • Planned
    • RBAC (aka Role Based Access Control)
    • Stability
    • Many backends
  • Actual
    • Code replacement Keystone Light
    • Stability
    • LDAP backend
    • SQL backend

Folsum work:

  • RBAC
  • Stability
  • AD backend
  • Another backend
  • Federation was planned but will most likely be pushed to G
    • Federation is the ability for multiple independent Keystones to operate (bursting use case)
    • Dependent upon two other federation components (networking and billing/metering)

Superbowl Ad Bingo for either High IQ or Matrix Challenged

If you, like me, are going to a party to observe expensive ads interrupted by costumed men line dancing aggressively then you may enjoy the bingo card generator that I put together.

This spreadsheet creates TWO types of bingo cards,

  • Smart cards have TWO dimensions: the columns select the type of pitch while the rows select the item being pitched.
  • Dumb cards just present choices (there are more) for you to find

Note: to use, you need to press “F9” between each printing to generate new randomness.

Post Superbowl note: Apparently, I should have had “Factor Setting” instead of the “Sexy Model” column.  Disappointing!  Also, liquor & political ads are not shown during the Superbowl.  Suggestions for improvement are welcome.  If you got “Betty White” multiple times, I don’t want to hear about it.

Creative Commons licensed.  No warranty.

Crowbar+OpenStack Insights for the week: Food Fight Podcast & Boston Meetup 2/1

Please don’t confuse a lack of posts with a lack of activity!  I’ve been in the center of a whirlwind of Crowbar, OpenStack and Hadoop for my team at Dell.  I’ve also working on an interesting side project with Liquid Leadership author (and would-be star ship captain) Brad Szollose.

I just don’t have time to post all of the awesomeness.  I can tell you that my team is very focused on Hadoop (RHEL 6.2/CentOS 6.2 + open Cloudera Distro) barclamps as we get some Diablo deployments done.  Also the Crowbar list has been very active about Diablo.  If you’re looking for advanced information, there is  some inside scoop on the Crowbar FoodFight podcast I did with Bryan Berry & Matt Ray.

I’ll be in BOSTON THIS WEDNESDAY 2/1 for the OpenStack Meetup there.  We’re going to be talking about Quantum and the OpenStack Foundation.  I suspect that Keystone will come up too (but that’s the subject of another post).  Of course, it’s not just your humble blogger: the whole Dell CloudEdge OpenStack/Crowbar team will be on hand!  So put on your cloud geek hat and take a trip to Harvard for the meetup!

CloudOps white paper explains “cloud is always ready, never finished”

I don’t usually call out my credentials, but knowing the I have a Masters in Industrial Engineering helps (partially) explain my passion for process as being essential to successful software delivery. One of my favorite authors, Mary Poppendiek, explains undeployed code as perishable inventory that you need to get to market before it loses value. The big lessons (low inventory, high quality, system perspective) from Lean manufacturing translate directly into software and, lately, into operation as DevOps.

What we have observed from delivering our own cloud products, and working with customers on thier’s, is that the operations process for deployment is as important as the software and hardware. It is simply not acceptable for us to market clouds without a compelling model for maintaining the solution into the future. Clouds are simply moving too fast to be delivered without a continuous delivery story.

This white paper [link here!] has been available since the OpenStack conference, but not linked to the rest of our OpenStack or Crowbar content.

Early crop of Crowbar 1.3 features popping up

My team at Dell is still figuring out some big items for the 1.3 release; however, somethings were just added that is worth calling out.

  1. Ubuntu 11.04 support!   Thanks to Justin Shepherd from Rackspace Cloud Builders!
  2. Alias names for nodes in the UI
  3. User managed node groups in the UI
  4. Ability to pre-populate the alias, description and group for a node (not integrated with DNS yet)
  5. Hadoop is working again – we addressed the missing Ganglia repo issue.  Thanks to Victor Lowther.
For items 2 – 4, I made a short video tour: Node Alias & Group

Also, I’ve spun new open source ISOs with the new features.  User beware!

Crowbar 1.2 released includes OpenStack Diablo Final

With the holiday rush, I neglected to post about Monday’s Crowbar v1.2 release (ISO here)!

The core focus for this release was to support the OpenStack Diablo Final bits (which my employer, Dell, includes as part of the “Dell OpenStack Powered Cloud Solution“); however, we added a lot of other capability as we continue to iterate on Crowbar.

I’m proud of our team’s efforts on this release on both on features and quality.  I’m equally delighted about the Crowbar community engagement via the Crowbar list server.  Crowbar is not hardware or operating system specific so it’s encouraging to hear about deployments on other gear and see the community helping us port to new operating system versions.

We driving more and more content to Crowbar’s Github as we are working to improve community visibility for Crowbar.  As such, I’ve been regularly updating the Crowbar Roadmap.  I’m also trying to make videos for Crowbar training (suggestions welcome!).  Please check back for updates about upcoming plans and sprint activity.

Crowbar Added Features in v1.2:

  • Central feature was OpenStack Diablo Final barclamps (tag “openstack-os-build”)
  • Improved barclamp packaging
  • Added concepts for “meta” barclamps that are suites of other barclamps
  • Proposal queue and ordering
  • New UI states for nodes & barclamps (led spinner!)
  • Install includes self-testing
  • Service monitoring (bluepill)

Looking forward

Dell has a long list of pending Hadoop and OpenStack deployments using these bits so you can expect to see updates and patches matching our field experiences.  We are very sensitive to community input and want to make Crowbar the best way to deliver a sustainable repeatable reference deployment of OpenStack, Hadoop and other cloud technologies.

Extending Chef’s reach: “Managed Nodes” for External Entities.

Note: this post is very technical and relates to detailed Chef design patterns used by Crowbar. I apologize in advance for the post’s opacity. Just unleash your inner DevOps geek and read on. I promise you’ll find some gems.

At the Opscode Community Summit, Dell’s primary focus was creating an “External Entity” or “Managed Node” model. Matt Ray prefers the term “managed node” so I’ll defer to that name for now. This model is needed for Crowbar to manage system components that cannot run an agent such as a network switch, blade chassis, IP power distribution unit (PDU), and a SAN array. The concept for a managed node is that there is an instance of the chef-client agent that can act as a delegate for the external entity. We’ve been reluctant to call it a “proxy” because that term is so overloaded.

My Crowbar vision is to manage an end-to-end cloud application life-cycle. This starts from power and network connections to hardware RAID and BIOS then up to the services that are installed on the node and ultimately reaches up to applications installed in VMs on those nodes.

Our design goal is that you can control a managed node with the same Chef semantics that we already use. For example, adding a Network proposal role to the Switch managed node will force the agent to update its configuration during the next chef-client run. During the run, the managed node will see that the network proposal has several VLANs configured in its attributes. The node will then update the actual switch entity to match the attributes.

Design Considerations

There are five key aspects of our managed node design. They are configuration, discovery, location, relationships, and sequence. Let’s explore each in detail.

A managed node’s configuration is different than a service or actuator pattern. The core concept of a node in chef is that the node owns the configuration. You make changes to the nodes configuration and it’s the nodes job to manage its state to maintain that configuration. In a service pattern, the consumer manages specific requests directly. At the summit (with apologies to Bill Clinton), I described Chef configuration as telling a node what it “is” while a service provide verbs that change a node. The critical difference is that a node is expected to maintain configuration as its composition changes (e.g.: node is now connected for VLAN 666) while a service responds to specific change requests (node adds tag for VLAN 666). Our goal is the maintain Chef’s configuration management concept for the external entities.

Managed nodes also have a resource discovery concept that must align with the current ohai discovery model. Like a regular node, the manage node’s data attributes reflect the state of the managed entity; consequently we’d expect a blade chassis managed node to enumerate the blades that are included. This creates an expectation that the manage node appears to be “root” for the entity that it represents. We are also assuming that the Chef server can be trusted with the sharable discovered data. There may be cases where these assumptions do not have to be true, but we are making them for now.

Another essential element of managed nodes is that their agent location matters because the external resource generally has restricted access. There are several examples of this requirement. Switch configuration may require a serial connection from a specific node. Blade SANs and PDUs management ports are restricted to specific networks. This means that the manage node agents must run from a specific location. This location is not important to the Chef server or the nodes’ actions against the managed node; however, it’s critical for the system when starting the managed node agent. While it’s possible for managed nodes to run on nodes that are outside the overall Chef infrastructure, our use cases make it more likely that they will run as independent processes from regular nodes. This means that we’ll have to add some relationship information for managed nodes and perhaps a barclamp to install and manage managed nodes.

All of our use cases for managed nodes have a direct physical linkage between the managed node and server nodes. For a switch, it’s the ports connected. For a chassis, it’s the blades installed. For a SAN, it’s the LUNs exposed. These links imply a hierarchical graph that is not currently modeled in Chef data – in fact, it’s completely missing and difficult to maintain. At this time, it’s not clear how we or Opscode will address this. My current expectation is that we’ll use yet more roles to capture the relationships and add some hierarchical UI elements into Crowbar to help visualize it. We’ll also need to comprehend node types because “managed nodes” are too generic in our UI context.

Finally, we have to consider the sequence of action for actions between managed nodes and nodes.  In all of our uses cases, steps to bring up a node requires orchestration with the managed node.  Specifically, there needs to be a hand-off between the managed node and the node.  For example, installing an application that uses VLANs does not work until the switch has created the VLAN,  There are the same challenges on LUNs and SAN and blades and chassis.  Crowbar provides orchestration that we can leverage assuming we can declare the linkages.

For now, a hack to get started…

For now, we’ve started on a workable hack for managed nodes. This involves running multiple chef-clients on the admin server in their own paths & processes. We’ll also have to add yet more roles to comprehend the relationships between the managed nodes and the things that are connected to them. Watch the crowbar listserv for details!

Extra Credit

Notes on the Opscode wiki from the Crowbar & Managed Node sessions

Barclamps: now with added portability!

I had a question about moving barclamps between solutions.  Since Victor just changed the barclamp build to create a tar for each barclamp (with the debs/rpms), I thought it was the perfect time to explain the new feature.

You can find the barclamps on the Crowbar ISO under “/dell/barclamps” and you can install the TAR onto a Crowbar system using “./barclamp_install foo.tar.gz” where foo is the name of your barclamp.

Here’s a video of how to find and install barclamp tars:

Note: while you can install OpenStack into a Hadoop system, that combination is NOT tested.  We only test OpenStack on Ubuntu 10.10 and Hadoop on RHEL 5.7.   Community help in expanding support is always welcome!

Opscode Summit Recap – taking Chef & DevOps to a whole new level

Opscode Summit Agenda created by open space

I have to say that last week’s Opscode Community Summit was one of the most productive summits that I have attended. Their use of the open-space meeting format proved to be highly effective for a team of motivated people to self-organize and talk about critical topics. I especially like the agenda negations (see picture for an agenda snapshot) because everyone worked to adjust session times and locations based on what else other sessions being offered. Of course, is also helped to have an unbelievable level of Chef expertise on tap.

Overall

Overall, I found the summit to be a very valuable two days; consequently, I feel some need to pay it forward with some a good summary. Part of the goal was for the community to document their sessions on the event wiki (which I have done).

The roadmap sessions were of particular interest to me. In short, Chef is converging the code bases of their three products (hosted, private and open). The primary change on this will moving from CouchBD to a SQL based DB and moving away the API calls away from Merb/Ruby to Erlang. They are also improving search so that we can make more fine-tuned requests that perform better and return less extraneous data.

I had a lot of great conversations. Some of the companies represented included: Monster, Oracle, HP, DTO, Opscode (of course), InfoChimps, Reactor8, and Rackspace. There were many others – overall >100 people attended!

Crowbar & Chef

Greg Althaus and I attended for Dell with a Crowbar specific agenda so my notes reflect the fact that I spent 80% of my time on sessions related to features we need and explaining what we have done with Chef.

Observations related to Crowbar’s use of Chef

  1. There is a class of “orchestration” products that have similar objectives as Crowbar. Ones that I remember are Cluster Chef, Run Deck, Domino
  2. Crowbar uses Chef in a way that is different than users who have a single application to deploy. We use roles and databags to store configuration that other users inject into their recipes. This is dues to the fact that we are trying to create generic recipes that can be applied to many installations.
  3. Our heavy use of roles enables something of a cookbook service pattern. We found that this was confusing to many chef users who rely on the UI and knife. It works for us because all of these interactions are automated by Crowbar.
  4. We picked up some smart security ideas that we’ll incorporate into future versions.

Managed Nodes / External Entities

Our primary focus was creating an “External Entity” or “Managed Node” model. Matt Ray prefers the term “managed node” so I’ll defer to that name for now. This model is needed for Crowbar to manage system components that cannot run an agent such as a network switch, blade chassis, IP power distribution unit (PDU), and a SAN array. The concept for a managed node is that that there is an instance of the chef-client agent that can act as a delegate for the external entity. I had so much to say about that part of the session, I’m posting it as its own topic shortly.