Crowbar near-term features: increasing DevOps mojo and brewing Diablo

We’ve been so busy working on getting RHEL support ready to drop into the Crowbar repos that I have not had time to post about what’s coming next for Crowbar. The RHEL addition has required a substantial amount of work to accommodate different packaging models and capabilities. This change moves Crowbar closer to being able to allow nodes’ operating systems (the allocated TFTP Boot Image) to be unique per node.

I will post more forward looking details soon but wanted to prime the pump and invite suggestions from our community.

We are tracking two major features for delivery by the OpenStack October Design Conference:

  1. OpenStack Diablo Barclamps. Expect to see individual barclamps for various components (Keystone, Dashboard, Glance, Nova, Swift, etc.).
  2. Barclamp versioning / connected imports. This feature will enable Crowbar to pull in the latest components for barclamps from remote repositories. I consider this a critical feature for Crowbar’s core DevOps/CloudOps capabilities and to support more community development for barclamps.

We are also working on some UI enhancements:

  • Merging together the barclamps/proposals/active views into a single view
  • Enabling bulk actions for nodes (description, BIOS types, and allocate)
  • Allowing users to set node names and showing the names throughout the UI
  • More clarity on state of proposal application process (stretch goal)

I am planning to post more about our design ideas as work begins.

If you want to help with Diablo barclamps, these will be worked in the open and we’d be happy to collaborate. We’re also open to suggestions for what’s next.

The Tao of Agile: focus on delivery while still dreaming BIG

This post is a continuation of the Agile Strategy post.

So, how do we get into the right frame of mind for roadmapping?

You must embrace the Tao of Planning.

There are two conflicting principles behind roadmapping: you must keep thinking out of the box while keeping work deliverable. Neither of these principles is difficult in isolation. The challenge is to keep them in balance and to make sure that the whole team is included.

For my team, we struggle to find group times when we can do some big thinking. The challenge is not the thinking – it’s the TEAM aspect of working on strategy together. Our sprint planning needs to focus on the “keeping work deliverable” objective; consequently, there is precious little time in planning to have big ideas. To make the meeting duration manageable, planning meetings should have a tactical focus. Unfortunately, that leaves a strategy gap.

So, where does a team go to dream?

I wish I had a clear answer to this problem. Ideally, sprint review meetings should extend into deep thinking about where things could go. Strategy during Review is a natural extension because a review mindset should be forward looking. Reviews help us think about how we’re going to use what we delivered and the audience should bring external perspectives. If we could do this then it would be very empowering and exciting during review.

That’s why it’s important to celebrate, play, reflect and pause. All work and no play leaves a team that makes very dull products.

Note: the Agile decorations that I use are: Sprint Planning (commits that plan) -> Stand-up (daily sync meeting) -> Review (demo/sprint close) -> Retrospective / Hats (team feedback, improvement).

Agile takes discipline: having a strategy means saying “no” more than saying “yes”

With the Crowbar release behind us, it’s time for my team at Dell to do some Capital “P” Planning. Planning for us includes both tactical (next release) and strategic (the releases beyond the one after next), but each type of planning looks very different. I’m going to call it “roadmapping” because planning means something specific and tactical in Agile.

I love roadmapping but I’m a pain to roadmap with because I’m a ruthless prioritizer.

When I sit down for roadmapping, I always do it from a 1 to N list without ties. That means that when marketing asks for a new feature (double the foo on the bar!) we put it on the list relative to other work that needs to get done. If you add something at the top then something else will fall off the bottom. Effectively, we’re using the list to say no to a lot of great ideas. This is essential because “the great is the enemy of the good (Voltaire).” It’s hard, but that’s the cold reality of delivering product.
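The 1 to N list without ties can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the `Roadmap` class, its `capacity` limit (how many items the team can realistically deliver), and the feature names are all hypothetical and not part of any tool we use.

```python
class Roadmap:
    """A strictly ordered 1-to-N list: no ties, fixed capacity (assumed)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []  # index 0 is priority 1

    def add(self, feature, rank):
        """Insert a feature at a 1-based rank; return what falls off the bottom."""
        if rank < 1:
            raise ValueError("rank starts at 1")
        # list.insert clamps positions past the end, so a low-priority
        # request simply lands at the bottom of the list.
        self.items.insert(rank - 1, feature)
        dropped = self.items[self.capacity:]
        self.items = self.items[:self.capacity]
        return dropped


roadmap = Roadmap(capacity=3)
for name in ["feature A", "feature B", "feature C"]:
    roadmap.add(name, rank=len(roadmap.items) + 1)

# Marketing's request goes in at the top, so the lowest item is cut.
dropped = roadmap.add("double the foo on the bar", rank=1)
print(roadmap.items)
print(dropped)
```

The point of returning `dropped` is that every "yes" makes the matching "no" explicit instead of leaving it implied.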

The most important part of strategy is figuring out what to push down to make room for the precious few yes items.

Successful roadmapping is negotiating the splitting of big ideas into smaller ones. Decomposition is a circular process because one compromise may require another, but one change may force a cascading assumption fault. If you get too emotionally committed to one feature or subset then you’re going to slow down the process. It’s vital to approach roadmapping in free fall.

As always, my advice is to not mix meeting objectives. If you need more strategy then you’ve got to make time for it.

Interested in more…stay tuned for Agile Tao: balancing tactics & strategy

How we use Rally for Agile: it’s about going off the reservation to Rob some Banks.

Dell’s corporate choice of Agile Planning tool is Rally (if you’re wondering, my recommendation on Agile planning tools is ThoughtWorks’ Mingle). This post is rather detailed about how we use Rally, but hopefully useful more broadly. I should mention that I’ve been using Rally since 2005, so I know the tool pretty well. Our objective is to not spend time maintaining Rally (or, as we call it “feeding the Rally Monkey”) while still getting usable burn downs for our releases.

We do NOT use Rally to plan more than 2 iterations in advance. Even if the tool made planning further out easy, I would still recommend against it. I feel strongly that it’s better to have generally defined stories (aka Features or Epics) with general estimates that we call “BANKS.” Our work process is to create a wiki page for each feature that contains information about the goals for the feature and holds documentation for it as the work progresses. The wiki becomes the persistent place for the story, not our planning tool. We even embed [[wiki names]] into the story names to simplify linking.

Our planning process works like this: we create a placeholder story for each feature that we want and attach it to the release that we are working on. These features get a “BANK” suffix because they are the placeholder and we put the story point estimate into these stories. You can ALWAYS see the remaining effort estimates by looking at the BANK stories remaining for the release. These banks are never assigned to a sprint – they are our backlog. We also maintain the priority order for these banks so we know which ones to work on first.

Before planning, marketing and engineering review the list together and make sure that our priorities are correct. If a story is finished, then we’ll accept the story. If an estimate changes, we may increase it. We NEVER lower the estimates unless the work scope changes! Reducing estimates creates graphing artifacts in Rally. If we finish early, then the story is accepted and we burn off the remaining points (which shows as a progress jump towards completing the release).

On planning day, we go to the backlog and pick out the highest priority bank story. We then create another story with the same [[wiki name]] feature in the title and without the BANK suffix. We estimate the story points for this effort and remove that amount from the BANK story. Doing this credit/debit entry ensures that the release estimate remains the same. REPEATING: by removing points from the BANK story when we create a story for work in the sprint we keep the release estimate the same. This is VERY IMPORTANT if you want to show a burn up without creating a lot of stories in advance. Creating detailed stories in advance is a huge waste of time (cue the sound of a giant time sucking vortex vacuum machine). If you are doing this, stop. Really, you can stop because it is a huge waste of time on the scale of passing budget legislation in Congress.
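The credit/debit entry is easier to see as arithmetic. Here is a minimal sketch in Python, not anything Rally provides: the `Story` class, the `carve_story` helper, the story names and the point values are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Story:
    name: str
    points: int
    sprint: Optional[str] = None  # BANK stories stay unassigned (backlog)


def carve_story(bank, name, points, sprint):
    """Create a sprint story and debit the same points from its BANK."""
    if points > bank.points:
        # Overdrawing a BANK is a separate case (the release scope grows),
        # so this sketch simply refuses it.
        raise ValueError("story is larger than the remaining BANK")
    bank.points -= points  # debit the BANK...
    return Story(name, points, sprint)  # ...credit the sprint story


bank = Story("[[example feature]] BANK", points=13)
release = [bank]
total_before = sum(s.points for s in release)

release.append(carve_story(bank, "[[example feature]] first slice", 5, "Sprint 1"))
total_after = sum(s.points for s in release)

print(bank.points, total_before, total_after)
```

Because the points only move from one story to another, the release total is the same before and after planning, which is what keeps the burn up chart honest.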

In Rally, we do ALL of our sprint planning from the Track…Releases page (filter set to “defined” stories). This allows us to quickly see and edit the BANK stories that are in our backlog. When we want to talk about requirements or acceptance criteria, we pop over to the feature wiki page. This makes sure that we collect information across sprints. It also allows us to cross reference easily. The new stories are assigned to the sprint and we assign tasks/people to the story. We’ll continue this until we’ve assigned 100% of our team’s velocity for the sprint. At that point, we review the story point estimates and make sure that our time estimate aligns with the points (for us, 1 point ≈ 4 days). If they don’t match then we’ll adjust BOTH the story and the bank so the total is maintained.
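The points-versus-time check at the end of planning can be sketched the same way. This is an illustration under my own assumptions: the `realign` function is hypothetical, the rounding rule is arbitrary, and only the 1 point ≈ 4 days ratio comes from our actual practice.

```python
DAYS_PER_POINT = 4  # from the post: 1 point is roughly 4 days


def realign(story_points, task_days, bank_points):
    """Re-estimate a story from its task days and offset the BANK.

    Returns (new_story_points, new_bank_points). The sum is unchanged
    unless the BANK would go negative, in which case it stops at zero.
    """
    # Assumption: round to the nearest whole point, never below 1.
    implied = max(1, round(task_days / DAYS_PER_POINT))
    delta = implied - story_points
    new_bank = max(0, bank_points - delta)
    return implied, new_bank


# A 3-point story whose tasks add up to 20 days is really a 5-point story,
# so 2 more points come out of the BANK.
story, bank = realign(story_points=3, task_days=20, bank_points=10)
print(story, bank, story + bank)
```

Adjusting both numbers in one step is what keeps the story plus the bank at the same total.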

If this sounds complicated then you’re reading it correctly. I’ve found this approach is much clearer, faster and simpler than the “right” way to do backlog planning with Rally. At the end of the sprint we accept stories and it shows a release burn up. If a BANK goes to zero then the release scope will show an increase every time we create a new story towards that feature. We do not delete BANKs, we only accept them. If your BANK is 0 and the feature is not complete then your estimate was wrong. That is good information to track and the increase in release scope is an accurate reflection of your backlog.
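To make the zero-BANK case concrete, here is a small Python sketch with made-up numbers. The `carve` function and the point values are hypothetical; the only rule taken from our process is that a BANK never goes below zero, so any overflow shows up as new release scope.

```python
def carve(bank_points, story_points):
    """Debit a new story from a BANK without letting the BANK go below zero.

    Returns (new_bank_points, scope_increase).
    """
    debit = min(bank_points, story_points)
    return bank_points - debit, story_points - debit


bank = 5
scope = 20  # release total, including the 5 points still in the BANK
for story in (3, 2, 4):
    bank, growth = carve(bank, story)
    scope += growth
    print(f"story={story} bank={bank} release scope={scope}")
```

The first two stories drain the BANK without changing the release scope. The third story has nothing left to draw from, so the scope grows by its full 4 points, which is the signal that the original estimate was too low.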

Wow – this post ended up with a lot of very technical Rallyisms. I’d be interested in hearing how you’re using the tool or what you think of these recommendations.

Videos about Crowbar, CloudOps, and Dell OpenStack Cloud

I’m not usually a big fan of launch videos (too much markitecture); however, these turned out to be nice and meaty.  The meaty part explains why it looks like I’m about to eat a big sandwich in the last video.  yum!

  • What is Crowbar: Dell Crowbar Software Overview  

Crowbar source released, includes OpenStack Cloud install

I’m delighted to announce (official version) that my team at Dell has opened the Crowbar source under the Apache 2 license. This action is part of the broader Dell OpenStack Cloud Solution which includes OpenStack install packages, Crowbar, reference hardware architectures, and services/consulting to support deployments.

There are two important components to this news:

  1. Dell is officially offering our OpenStack Solution and helping advance the community’s ability to implement OpenStack quickly and consistently.
  2. Dell is releasing the Crowbar code (which is included in the solution) as open source.

Both are significant items; however, my focus here is on the Crowbar release.

Crowbar started as a Dell OpenStack installer project and then grew beyond that in scope.  Now it can be extended to work with other vendors’ kits and other solutions bits.

We are contributing Crowbar to the community because we believe that everyone benefits by sharing in the operational practices that Crowbar embodies. These are rooted in Opscode Chef (which Crowbar tightly integrates with) and the cloud & hyper-scale proven DevOps practices that are reflected in our deployment model.

Where to get it?

What’s included?

  • A comprehensive set of barclamps to set up an OpenStack cloud.
  • Crowbar UI and Remote APIs to make it easy to set up your cloud
  • Automated testing scripts for community members doing continuous integration with OpenStack.
  • Build scripts so you can create your own Crowbar install ISO
  • Switch discovery so you can create Chef Cookbooks that are network aware.
  • Open source Chef server that powers much of Crowbar’s functionality

What’s not included?

  • Non-open source license components (BIOS+RAID config) that we could not distribute under the Apache 2 license.  We are working to address this and include them in our release.  They are available in the Dell Licensed version of Crowbar.
  • Dell Branded Components (skin + overview page).   Crowbar has an OpenSource skin with identical functionality.
  • Pre-built ISOs with install images (you must download the open source components yourself, we cannot redistribute them to you as a package)

Important notes:

  • Crowbar uses Chef Server as its database and relies on cookbooks for node deployments.  It is installed (using Chef Solo) automatically as part of the Crowbar install.
  • Crowbar has a modular architecture so individual components can be removed, extended, and added. These components are known individually as barclamps.
  • Each barclamp has its own Chef configuration, UI sub-component, deployment configuration, and documentation.

On the project roadmap:

  • Hadoop support
  • Additional operating system support (specifically RHEL)
  • Barclamp version repository
  • Network configuration
  • We’d like suggestions!  Please comment!

Sites for more information: Joseph George, Barton George (launch day), Dell

Austin CloudCamp 7/20 – Lightning Talk

If you’re in Austin on 7/20 then come to the 2011 ATX CloudCamp @ 6pm (Downtown)

In addition to the normal great unconference format, I’ll be giving one of the lightning talks.  My topic will be about Cloud Operations for OpenStack.

Here’s a copy of my CloudCamp 07 2011 preso.  Unfortunately, the video was not complete so I can’t include it.

 

Crowbar’s surprise value proposition: continuous integration (#ci) testing

As part of our Agile/Lean methodologies, our team at Dell is highly invested in automated testing and continuous integration. We’re running Jenkins to coordinate builds and EVERY CHECK-IN launches our full integration suite that tests our system end-to-end. It may not be typical, but I don’t consider that to be particularly noteworthy because it’s best practice. (Rob’s note: if you write code and don’t think you have the authority then you need to geek-up and just do it – that’s our MO at Dell)

It’s important to understand that since Crowbar is an installer, every check-in does a FULL CLEAN INSTALL of all the Cactus OpenStack components.  Our verification requires that we test OpenStack because that’s our #1 exit requirement.  Consequently, we have built an automated build system that does a continuous integration test of a full, multi-node Nova/Glance/Swift deployment.

Automated end-to-end integration tests of OpenStack are a very handy thing!

In the last few weeks, we’ve heard from Dell internal groups and partners who are contributing to OpenStack Diablo that they want to leverage our work in continuous integration.  This will allow them to make sure that their development work does not regress other functions.  It’s a significant opportunity to ensure that we can collaborate between organizations.  It also promotes early development and distribution of Diablo installation scripts.

To support this in Crowbar, we are already planning to incorporate more sophisticated revision control (likely based on Git) into Crowbar.

Note: YES, we consider our CI scripts to be part of our open source code.

Avoid false agreements and saying no with a yes. #TeamDeath

One of my favorite things about Agile is how it helps teams get committed toward a shared goal. There are so many distractions and confusions that we need to double down on ways to help people get and then stay on the same page. In some cases, it comes down to something as simple as word choice!

First, I feel like I need some explanation…

There comes a time in any disagreement when the team needs everyone to get on the same page even if they don’t agree.  As a rule, this should be a relatively small window (maybe 20 minutes max) because the team can defer issues by having a sprint long spike* or exploration story that collects more information to settle arguments down the road. 

Personal Experience Note: A team should NEVER spend much time arguing about the mid or long-term future!  It’s just not worth the time to convince someone that your vision is more compelling.  It’s more efficient to accept that there are MULTIPLE VALID FUTURES and that the team needs to watch to see which one(s) is  taking shape.  There is no need to be “right” about the future.

So, back to the fake agreement phrases that effective teams avoid.

#1 “Yes, but…”

This statement really means “Will you shut up already?  I don’t agree.”  The speaker says “yes” to acknowledge the first person has finished; however, it does not mean that they agree.  The confusing thing is the speaker typically does not even realize that they are sending you into a discussion death spiral. 

Anytime someone says “but” then they are disagreeing. Just for fun, try having discussions where people are not allowed to say “but” – it creates a whole new positive dynamic.

#2 “I don’t disagree”

This statement really means “You are full of shit and my opinion is more right.”  The speaker is trying to avoid addressing your points directly and refocus discussion on their opinion.  Agreement means that everyone believes the same thing.  There are many ways to not agree and only one way to agree.

This is one of my pet peeves because the speaker thinks they are rewarding you with some back-handed pat on the head. In reality, they are shutting your ideas down without validation or acknowledgement.

There are many such statements that waste team time and mask disagreement.  If you have some that bug you, please comment on this post and add to the dialog.  I’m sure that I won’t disagree with any of them!

* Spike stories are time bounded stories that have specific research or opinion deliverables.  They are intended to collect enough information that the team can take action and move forward.   Sometimes these are also called “time box” stories.

The unexpected openness of OpenStack: why it’s important to learn from others’ operations experience.

During the OpenStack Design Conference, Forrester’s James Staten (@Staten7) raved about OpenStack’s transparency compared to AWS. Within the enclave of OpenStack fan boys and supporters (Dell alone sent >14 people to the summit), his post drew considerable attention but did little to really further the value proposition.

“Open deployments” are a much more significant value to implementors than transparency from open source code.

For any technology solution, there are significant challenges that will only be understood when the system is under stress. In some cases, these challenges are code defects; however, many will be related to configuration and deployment choices that are site specific. It is correcting these issues that results in design patterns and practices that create a robust infrastructure; consequently, the process of hardening a solution is critical to its ultimate stability and success.

When a solution, like AWS, is deployed and managed by a single entity, it is extremely rare for operational lessons learned and best practices to make it to the larger community.  Amazon’s recent post mortem is a welcome exception.   This is not a bad thing (Roman Stanek’s contrasting point), it is just the reality of a proprietary cloud.  AWS operates as a black box and I don’t believe that Amazon’s operational experience would be relevant to others unless they were also operationally transparent.

While it makes business sense to remain operationally opaque, service providers lose the benefit of external lessons learned when there is no community working in parallel with them.

OpenStack’s community has an opportunity to iterate on CloudOps patterns and practices at a dramatically faster rate than any single provider.  This creates distinct value for OpenStack adopters because they can shorten or eliminate their own challenges because other adopters will have the same pains and benefit from the same fixes.

It is critical to understand that the benefit is conferred to both the party sharing the problem (they get advice and support) and the party lending assistance (they avoid the problem). This is distinctly different from proprietary clouds where sharing is likely to cause embarrassment and unlikely to create helpful outcomes.

I am not advocating that all OpenStack deployments be the same or follow prescriptive patterns.

I believe that each installation will be unique in some way; however, there will  be enough commonalities and shared code to make sharing worthwhile.  This is especially true for adopters who start with tools like Crowbar that leverage community based Chef Recipes and automating scripts.  Tools that encourage automation and shared scripts help accelerate the establishment of robust deployment patterns and practices.

Ultimately, the ability to collaborate on cloud operation practice does more to strengthen OpenStack than developers, code reviews or corporate endorsements.