OpenStack Summit: Let’s talk DevOps, Fog, Upgrades, Crowbar & Dell

If you are coming to the OpenStack summit in San Diego next week then please find me at the show! I want to hear from you about the Foundation, community, OpenStack deployments, Crowbar and anything else.  Oh, and I just ordered a handful of Crowbar stickers if you wanted some CB bling.

Matt Ray (Opscode), Jason Cannavale (Rackspace) and I were Ops track co-chairs. If you have suggestions, we want to hear. We managed to get great speakers and also some interesting sessions like DevOps panel and up streaming deploy working sessions. It’s only on Monday and Tuesday, so don’t snooze or you’ll miss it.

My team from Dell has a lot going on, so there are lots of chances to connect with us:

At the Dell booth, Randy Perryman will be sharing field experience about hardware choices. We’ve got a lot of OpenStack battle experience and we want to compare notes with you.

I’m on the board meeting on Monday so likely occupied until the Mirantis party.

See you in San Diego!

PS: My team is hiring for Dev, QA and Marketing. Let me know if you want details.

Big Data to tame Big Government? The answer is the Question.

Today my boss at Dell, John Igoe, is part of announcing of the report from the TechAmerica Federal Big Data Commission (direct pdf), I was fully expecting the report to be a real snoozer brimming with corporate synergies and win-win externalities. Instead, I found myself reading a practical guide to applying Big Data to government. Flipping past the short obligatory “what is…” section, the report drives right into a survey of practical applications for big data spanning nearly every governmental service. Over half of the report is dedicated to case studies with specific recommendations and buying criteria.

Ultimately, the report calls for agencies to treat data as an asset. An asset that can improve how government operates.

There are a few items that stand out in this report:

  1. Need for standards on privacy and governance. The report calls out a review and standardization of cross agency privacy policy (pg 35) and a Chief Data Officer position each agency (pg 37).
  2. Clear tables of case studies on page 16 and characteristics on page 11 that help pin point a path through the options.
  3. Definitive advice to focus on a single data vector (velocity, volume or variety) for initial success on page 28 (and elsewhere)

I strongly agree with one repeated point in the report: although there is more data available, our ability to comprehend this data is reduced. The sheer volume of examples the report cites is proof enough that agencies are, and will be continue to be, inundated with data.

One short coming of this report is that it does not flag the extreme storage of data scientists. Many of the cases discussed assume a ready army of engineers to implement these solutions; however, I’m uncertain how the government will fill positions in a very tight labor market. Ultimately, I think we will have to simply open the data for citizen & non-governmental analysis because, as the report clearly states, data is growing faster than capability to use it.

I commend the TechAmerica commission for their Big Data clarity: success comes from starting with a narrow scope. So the answer, ironically, is in knowing which questions we want to ask.

Do Be Dense! Dell C8000 unit merges best of bladed and rackable servers

“Double wide” is not a term I’ve commonly applied to servers, but that’s one of the cool things about this new class of servers that Dell, my employer, started shipping today.

My team has been itching for the chance to start cloud and big data reference architectures using this super dense and flexible chassis. You’ll see it included in our next Apache Hadoop release and we’ve already got customers who are making it the foundation of their deployments (Texas Adv Computing Center case study).

If you’re tracking the latest big data & cloud hardware then the Dell PowerEdge C8000 is worth some investigation.

Basically, the Dell C8000 is a chassis that holds a flexible configuration of compute or storage sleds. It’s not a blade frame because the sleds minimize shared infrastructure. In our experience, cloud customers like the dedicated i/o and independence of sleds (as per the Bootstrapping clouds white paper). Those attributes are especially well suited for Hadoop and OpenStack because they support a “flat edges” and scale out design. While i/o independence is valued, we also want shared power infrastructure and density for efficiency reasons. Using a chassis design seems to capture the best of both worlds.

The novelty for the Dell PowerEdge C8000 is that the chassis are scary flexible. You are not locked into a pre-loaded server mix.

There are a plethora of sled choices so that you can mix choices for power, compute density and spindle counts. That includes double-wide sleds positively brimming with drives and expanded GPU processers. Drive density is important for big data configurations that are disk i/o hungry; however, our experience is the customer deployments are highly varied based on the planned workload. There are also significant big data trends towards compute, network, and balanced hardware configurations. Using the C8000 as a foundation is powerful because it can cater to all of these use-case mixes.

That reminds me! Mike Pittaro (our team’s Hadoop lead architect) did an excellent Deploy Hadoop using Crowbar video.

Interested in more opinions about the C8000? Check out Barton George & David Meyer.

The power of a list with three things (reading recommendations)

Over breakfast with fellow Dell cloud thinker and general deep thinker Michael Cote, he suggested that we two options for something.  I was quick to add a third and Cote objected because “having three options is a marketing thing to make you happier with the choices.”   He’s right but didn’t have the sources to backup the comment.  Since I was looking them up for Cote, I figured they’d be worth posting too:

  1. Dan Ariely’s Predictably Irrational  (covers the theory and it funny)
  2. Rom Brafman’s Sway (very pragmatic)
  3. Leonard Mlodinow Drunkard’s Walk (if you like statistics)

Personally, I believe that there are always FIVE options.   People generally over look the extreme choices like “do nothing” or  “all in.”   For example, when discussing a product architecture the two options that IMHO should always be on the table include “scrap the project” and “live with the problem.”

We never take the extra options, but they help clarify the three moderate choices.  Of course, this makes more sense after you read the above books.  So read the books (there are 3), don’t read the books, or setup personal interviews with the authors to get the real story.   The choice is yours!

Why doesn’t Chef call them “bowls” instead of roles?

The extended Crowbar team (my employer Dell and community) recently had a bit of a controversy heated discussion over the renaming of “proposals” to “configurations.” It was pretty clear that the term “proposal” confused users because an “active proposal” seems like a bit of an oxymoron. Excepting Scott Jensen, our schedule-czar and director of engineering, we had relatively few die-hard “I love proposal” advocates; however, deciding on an alternative was not quite so easy.

Ultimately, the question came down to “do we use an invented term or an intuitive term.”  The answer lies in when to use or avoid the congruity theory as articulated by Roger Cauvin.

We considered many alternatives like calling them “fixtures” to go along with the Crowbar & Barclamp tool theme. Even “Chuck Norris” was considered until copyright issues were flagged. The top alternative, “configuration” seemed just too bland. Frankly and amazingly, we originally considered it “too descriptive!”

The crux of the argument really revolved around the users’ ability to intuitively grasp a concept or to force them learn a new term. For example, we specifically chose “barclamp” instead of “module” because we felt that there were more components to a barclamp than just being a Crowbar module. In many ways, module would be sufficiently descriptive; however, we saw that there was benefit to the user tax in introducing a new term. It also fit nicely within our tool theme.

Opscode Chef is an example of investing heavily in a naming theme. For example, the concept of “cookbooks” and “recipes” seems relatively intuitive for users but starts getting stretched for “knife” because it is not immediately clear to users what that component does (it executes instructions on nodes and the server). After learning Chef, I appreciate “knife” as the universal tool but still remember having to figure it out.

A good theme is awesome, but it can quickly encumber usability.

For example, what if Chef has used “bowl” instead of role. It’s logical: you put a group of ingredients to mix into a bowl that acts as a container. While it may be logical to the initiated, it mainly extends the learning curve for new users. A role is a commonly accepted term for an operational classification so it is a much better term for users. The same is true for “node” and “data bag.”

I love a good incongruent theme as much as any meme-enabled tech geek but themes must not hinder usability. After all, we all fight for the users.

I am seeking your vote(s) for the OpenStack Board

If registered, you have 8 votes to allocate as you wish.  You will get a link via email – you must use that link.

Joseph B George and I are cross-blogging this post because we are jointly seeking your vote(s) for individual member seats on the OpenStack Foundation board.  This is key point in the OpenStack journey and we strongly encourage eligible voters to participate no matter who you vote for!  As we have said before, success of the Foundation governance process matters just as much as the code because it ensures equal access and limits forking.

We think that OpenStack succeeds because it is collaboratively developed.  It is essential that we select board members who have a proven record of community development, a willingness to partner and have demonstrated investment in the project.

Our OpenStack vision favors production operations by being operator, user and ecosystem focused.  If elected, we will represent these interests by helping advance deployability, API specifications, open operations and both large and small scale cloud deployments.

Of the nominees, we best represent OpenStack users and operators (as opposed to developers).  We have the most diverse experience in real-world OpenStack deployments because our solution has been deployed broadly (both as Dell and through Crowbar.  We have a proven record of collaborating broadly with contributors, demonstrated skills at building the OpenStack community and doing real open source work to ensure that OpenStack is the most deployable cloud platform anywhere.

Let’s get specific about our leadership in the OpenStack project and community:

  • We have been active and vocal leaders in the OpenStack community
    • our team has established two very active user groups (Austin & Boston)
    • we have lead multiple world-wide deploy day events (March 2012  &  May 2012).
    • we have substantial experience in the field and know the challenges of running OpenStack for a wide variety of real-world deployments
    • our first solution came out on Cactus!  We’ve been delivering on Essex since OSCON 2012 (http://www.oscon.com/ ).
  • We represent a broad range of deployment scenarios ranging from hosting, government, healthcare, retail, education, media, financial and more!
  • We have broad engagements and partnerships at the infrastructure (SUSE, Canonical, Redhat), consulting (Canonical, Mirantis) and ecosystem layers (enStratus) and beyond!
  • We have a proven track record of collaboration instead of forking/disrupting – a critical skill for this project reflected by our consistent actions to preserve the integrity of the project.
  • We have led the “make OpenStack deployable” campaign with substantial investments (open source Crowbar, white papers, documentation & cookbooks.
  • We have very long and consistent history with the project starting even before the first OpenStack summit in Austin.

Of course, we’re asking for you to consider for both of us; however, if you want to focus on just one then here’s the balance between us.  Rob (bio) is a technologist with deep roots in cloud technology, data center operations and open source.  Joseph is a business professional with experience new product introduction and enterprise delivery.

Not sure if you can vote?  If you registered as an individual member then your name should be on the voting list.  In that case, you can vote between 8/20 and 8/24.

Crowbar 2.0 Design Summit Notes (+ open weekly meetings starting)

I could not be happier with the results Crowbar collaborators and my team at Dell achieved around the 1st Crowbar design summit. We had great discussions and even better participation.

The attendees represented major operating system vendors, configuration management companies, OpenStack hosting companies, OpenStack cloud software providers, OpenStack consultants, OpenStack private cloud users, and (of course) a major infrastructure provider. That’s a very complete cross-section of the cloud community.

I knew from the start that we had too little time and, thankfully, people were tolerant of my need to stop the discussions. In the end, we were able to cover all the planned topics. This was important because all these features are interlocked so discussions were iterative. I was impressed with the level of knowledge at the table and it drove deep discussion. Even so, there are still parts of Crowbar that are confusing (networking, late binding, orchestration, chef coupling) even to collaborators.

In typing up these notes, it becomes even more blindingly obvious that the core features for Crowbar 2 are highly interconnected. That’s no surprise technically; however, it will make the notes harder to follow because of knowledge bootstrapping. You need take time and grok the gestalt and surf the zeitgeist.

Collaboration Invitation: I wanted to remind readers that this summit was just the kick-off for a series of open weekly design (Tuesdays 10am CDT) and coordination (Thursdays 8am CDT) meetings. Everyone is welcome to join in those meetings – information is posted, recorded, folded, spindled and mutilated on the Crowbar 2 wiki page.

These notes are my reflection of the online etherpad notes that were made live during the meeting. I’ve grouped them by design topic.

Introduction

  • Contributors need to sign CLAs
  • We are refactoring Crowbar at this time because we have a collection of interconnected features that could not be decoupled
  • Some items (Database use, Rails3, documentation, process) are not for debate. They are core needs but require little design.
  • There are 5 key topics for the refactor: online mode, networking flexibility, OpenStack pull from source, heterogeneous/multi operating systems, being CDMB agnostic
  • Due to time limits, we have to stop discussions and continue them online.
  • We are hoping to align Crowbar 2 beta and OpenStack Folsom release.

Online / Connected Mode

  • Online mode is more than simply internet connectivity. It is the foundation of how Crowbar stages dependencies and components for deploy. It’s required for heterogeneous O/S, pull from source and it has dependencies on how we model networking so nodes can access resources.
  • We are thinking to use caching proxies to stage resources. This would allow isolated production environments and preserves the run everything from ISO without a connection (that is still a key requirement to us).
  • Suse’s Crowbar fork does not build an ISO, instead it relies on RPM packages for barclamps and their dependencies.
  • Pulling packages directly from the Internet has proven to be unreliable, this method cannot rely on that alone.

Install From Source

  • This feature is mainly focused on OpenStack, it could be applied more generally. The principals that we are looking at could be applied to any application were the source code is changing quickly (all of them?!). Hadoop is an obvious second candidate.
  • We spent some time reviewing the use-cases for this feature. While this appears to be very dev and pre-release focused, there are important applications for production. Specifically, we expect that scale customers will need to run ahead of or slightly adjacent to trunk due to patches or proprietary code. In both cases, it is important that users can deploy from their repository.
  • We discussed briefly our objective to pull configuration from upstream (not just OpenStack, but potentially any common cookbooks/modules). This topic is central to the CMDB agnostic discussion below.
  • The overall sentiment is that this could be a very powerful capability if we can manage to make it work. There is a substantial challenge in tracking dependencies – current RPMs and Debs do a good job of this and other configuration steps beyond just the bits. Replicating that functionality is the real obstacle.

CMDB agnostic (decoupling Chef)

  • This feature is confusing because we are not eliminating the need for a configuration management database (CMDB) tool like Chef, instead we are decoupling Crowbar from the a single CMDB to a pluggable model using an abstraction layer.
  • It was stressed that Crowbar does orchestration – we do not rely on convergence over multiple passes to get the configuration correct.
  • We had strong agreement that the modules should not be tightly coupled but did need a consistent way (API? Consistent namespace? Pixie dust?) to share data between each other. Our priority is to maintain loose coupling and follow integration by convention and best practices rather than rigid structures.
  • The abstraction layer needs to have both import and export functions
  • Crowbar will use attribute injection so that Cookbooks can leverage Crowbar but will not require Crowbar to operate. Crowbar’s database will provide the links between the nodes instead of having to wedge it into the CMDB.
  • In 1.x, the networking was the most coupled into Chef. This is a major part of the refactor and modeling for Crowbar’s database.
  • There are a lot of notes captured about this on the etherpad – I recommend reviewing them

Heterogeneous OS (bare metal provisioning and beyond)

  • This topic was the most divergent of all our topics because most of the participants were using some variant of their own bare metal provisioning project (check the etherpad for the list).
  • Since we can’t pack an unlimited set of stuff on the ISO, this feature requires online mode.
  • Most of these projects do nothing beyond OS provisioning; however, their simplicity is beneficial. Crowbar needs to consider users who just want a stream-lined OS provisioning experience.
  • We discussed Crowbar’s late binding capability, but did not resolve how to reconcile that with these other projects.
  • Critical use cases to consider:
    • an API for provisioning (not sure if it needs to be more than the current one)
    • pick which Operating Systems go on which nodes (potentially with a rules engine?)
    • inventory capabilities of available nodes (like ohai and factor) into a database
    • inventory available operating systems

Open Community Access to Crowbar 2 Efforts

We’re moving along on the Crowbar2 refactoring work (`/release/rails3anddb/master` branch for now) and it’s time to start making it easier for you to participate if you are interested.

We are planning to start having TWO weekly community sprint meetings.

NOTE: Times can still shift depending on community input! We are trying to a truly global community so we need your input.

The weekly design discussions will be on Tuesdays @ 10am Central (GMT -6). The topics will be relevant to the coming sprint and we expect dialog. The topics will be determined by Greg Althaus based on progress on the refactor. You’re welcome to contact him or the list with suggestions.

The purpose of these meetings is to

  • discuss/resolve design related to the refactor
  • Document use-cases
  • identify issues that need to be addressed in the next sprint

The weekly coordination meetings will be on Thursdays @ 8am Central (GMT -6). We want to respect everyone’s time and will strictly limit these to 1 hour. This meeting is in between our internal sprint review and planning so we have flexibility to adjust our plans based on your input. It is important to us that we make it possible for you to contribute and we need your input to make sure that you are not blocked!

The coordination meetings will be structured as follows (times approximate)

  • Voice & Screencast: https://join.me/dellcrowbar
  • 25 minutes for review of current progress
  • 10 minutes for feedback/adjustments on process and workflow
  • 25 minutes for planning of next sprint
  • Online discussion & notes on the http://crowbar.sync.in/sprintMMDD etherpads
  • Identification of working groups for further discussion and coding collaboration

These meetings are the primary way that we will be making sure the community is not blocked by our development efforts.

The purpose of these meetings is

  • to synchronize quickly so that we can connect the people who should be collaborating
  • eliminate blocking items for Crowbar contributors

Of course, we will attempt to record and post all of these meetings.

Stay tuned! We will likely announce additional meetings for community collaboration.

PS: These are design meetings, they are NOT Crowbar training meetings. Please consult http://bit.ly/crowbarwiki for links and videos about learning Crowbar.

Which side of the desk are the drawers on? (Dunkelisms)

Back in 2001, I had the pleasure to have some long conversations with Phil Dunkelberger.  The impact of those meetings still resonates with me today in a collection of “Dunkelisms” that are an invaluable part of my kick-ass-and-take-names tool box.  I can’t find any Internet source, so I’ll take it on myself to archive these jewels!

Which side of the desk are the drawers on?

I was sitting with Phil and complaining that our web site was not updated and the information was inaccurate.  He looked at me and asked me “which side of the desk are the drawers on?” and completely threw me for a loop.  He explained “when you’re the boss, you sit on the side of the desk with the drawers on it.  You have the power to make changes.”

I whined back that it was Marketing’s job to update the content.  I squirmed under his glare until he asked “as a programmer, do you have ability to update the web site?”  When I said, “yes, but…” his glare wilted my desk plant and the rest of my excuse died with it.

His reply was very crisp, “you have the power to fix it because you have access to the web servers.  If it’s a real problem then the drawers are on your side of the desk.  If it’s not a real problem then help Marketing solve it or move on.”

I’m not saying it would be the right move politically, but it was amazingly powerful to acknowledge that I had the power to fix the problem if I needed.

There are many situations where we voluntarily give up power even when the drawers are on our side of the desk.  For example, when my team is planning we expect Marketing to set priorities that drive development.  Engineering goes along with this for a lot of good reasons; however, the Engineering has the drawers on what really goes into the product.

This Dunkelism helps me align priorities and eliminate roadblocks.  As I dig deeper and deeper into community driven open source projects, I find that the idea behind this expression is a mantra that drives projects forward.

I find this expression very powerful in many situations.  I hope you find it helpful too!

Ops “Late Binding” is critical best practice and key to Crowbar differentiation

From OSCON 2012: Portland's light rail understands good dev practice.Late binding is a programming term that I’ve commandeered for Crowbar’s DevOps design objectives.

We believe that late binding is a best practice for CloudOps.

Understanding this concept is turning out to be an important but confusing differentiation for Crowbar. We’ve effectively inverted the typical deploy pattern of building up a cloud from bare metal; instead, Crowbar allows you to build a cloud from the top down.  The difference is critical – we delay hardware decisions until we have the information needed to do the correct configuration.

If Late Binding is still confusing, the concept is really very simple: “we hold off all work until you’ve decided how you want to setup your cloud.”

Late binding arose from our design objectives. We started the project with a few critical operational design objectives:

  1. Treat the nodes and application layers as an interconnected system
  2. Realize that application choices should drive down the entire application stack including BIOS, RAID and networking
  3. Expect the entire system to be in a constantly changing so we must track state and avoid locked configurations.

We’d seen these objectives as core tenets in hyperscale operators who considered bare metal and network configuration to be an integral part of their application deployment. We know it is possible to build the system in layers that only (re)deploy once the application configuration is defined.

We have all this great interconnected automation! Why waste it by having to pre-stage the hardware or networking?

In cloud, late binding is known as “elastic computing” because you wait until you need resources to deploy. But running apps on cloud virtual machines is simple when compared to operating a physical infrastructure. In physical operations, RAID, BIOS and networking matter a lot because there are important and substantial variations between nodes. These differences are what drive late binding as a one of Crowbar’s core design principles.

Late Binding Visualized