Mayflies and Dinosaurs (extending Puppies and Cattle)

Dont Be FragileJosh McKenty and I were discussing the common misconception of the “Puppies and Cattle” analogy. His position is not anti-puppy! He believes puppies are sometimes unavoidable and should be isolated into portable containers (VMs) so they can be shuffled around seamlessly. His more provocative point is that we want our underlying infrastructure to be cattle so it remains highly elastic and flexible. More cattle means a more resilient system. To me, this is a fundamental CloudOps design objective.

We realized that the perfect cloud infrastructure would structurally discourage the creation of puppies.

Imagine a cloud in which servers were automatically decommissioned after a week of use. In a sort of anti-SLA, any VM running for more than 168 hours would be (gracefully) terminated. This would force a constant churn of resources within the infrastructure that enables true cattle-like management. This cloud would be able to very gracefully rebalance load and handle disruptive management operations because the workloads are designed for the churn.

We called these servers mayflies due to their limited life span.

While this approach requires a high degree of automation, the most successful cloud operators I have met are effectively building workloads with this requirement. If we require application workloads to be elastic and fault-resilient then we have a much higher degree of flexibility with the underlying infrastructure. I’ve seen this in practice with several OpenStack clouds: operators with helped applications deploy using automation were able to decommission “old” clouds much more gracefully. They effectively turned their entire cloud into a cow. Sadly, the ones without that investment puppified™ the ops infrastructure and created a much more brittle environment.

The opposite of a mayfly is the dinosaur: a server that is so brittle and locked that the slightest disturbance wipes out everything it touches.

Dinosaurs are puppies grown into a T-Rex with rows of massive razor sharp teeth and tiny manicured hands. These are systems that are so unique and historical that there’s no way to recreate them if there’s a failure. The original maintainers exit happy hour was celebrated by people who were laid-off two CEOs ago. The impact of dinosaurs goes beyond their operational risk; they are typically impossible to extend or maintain and, consequently, ossify other server around them. This type of server drains elasticity from your ops team.

Puppies do not grow up to become dogs, they become dinosaurs.

It’s a classic lean adage to do hard things more frequently. Perhaps it’s time to start creating mayflies in your ops infrastructure.

Competition should be core to OpenStack Technical Meritocracy

In my work at Dell, Technical Meritocracy means that we recognize and promote demonstrated talent into leadership roles. As a leader, one has to make technical judgments (OK, informed opinions) that focus limited resources in the (hopefully) right places. Being promoted does not automatically make someone right all the time.

I believe that good leaders recognize the value of a diverse set of opinions and the learning value of lean deliverables.

OpenStack is an amazingly diverse and evolving community. Leading in OpenStack requires a level of humility that forces me to reconsider my organization hierarchical thinking around “technical meritocracy.” Instead of a hierarchy where leadership chooses right and wrong, rising in the community meritocracy is about encouraging technical learning and user participation.

OpenStack is a melting pot of many interests and companies. Some of them naturally aligned (customers+vendors) and others are otherwise competitive (vendors). The vast majority of contribution to OpenStack is sponsored – companies pay people to participate and fund the foundation that organizes events. That does not diminish our enthusiasm for the community or open values, but it adds an additional dimension

If we are really seeking a Technical Meritocracy, we must create a place where ideas, teams, projects and companies can pursue different approaches within OpenStack. This is essential to our long term success because it provides a clear way for people to experiment within the project. Pushing away alternate approaches is likely to lead to forking. Specifically, I believe that the mostly likely competitor to any current OpenStack project will be that project’s .next version!

Calls for a “benevolent dictator” imply that our meritocracy has a single person with perspective on right and wrong. Not only is OpenStack simply too complex, I see our central design tenant as enabling multiple approaches to work it out in the community. This is especially important because many aspects of OpenStack are not one-size-fits all. The target diversity of our community requires that we enable multiple approaches so we can expand our user base.

The risk of anointing a single person, approach or project as “the OpenStack way” may appear to streamline the project, but it really stifles innovation. We have a healthy ecosystem of vendors who gladly express opinions about the right way to implement OpenStack. They help us test OpenStack technical merit by finding out which opinions appeal to users. It is essential to our success to enable a vibrant diversity because I don’t think there’s a single right answer or approach.

In every case, those vendor opinions are based on focused markets and customer needs; consequently, our job in the community is to respect and incorporate these divergent needs and find consensus.

7 takeaways from DevOps Days Austin

Block Tables

I spent Tuesday and Wednesday at DevOpsDays Austin and continue to be impressed with the enthusiasm and collaborative nature of the DOD events.  We also managed to have a very robust and engaged twitter backchannel thanks to an impressive pace set by Gene Kim!

I’ve still got a 5+ post backlog from the OpenStack summit, but wanted to do a quick post while it’s top of mind.

My takeaways from DevOpsDays Austin:

  1. DevOpsDays spends a lot of time talking about culture.  I’m a huge believer on the importance of culture as the foundation for the type of fundamental changes that we’re making in the IT industry; however, it’s also a sign that we’re still in the minority if we have to talk about culture evangelism.
  2. Process and DevOps are tightly coupled.  It’s very clear that Lean/Agile/Kanban are essential for DevOps success (nice job by Dominica DeGrandis).  No one even suggested DevOps+Waterfall as a joke (but Patrick Debois had a picture of a xeroxed butt in his preso which is pretty close).
  3. Still need more Devs people to show up!  My feeling is that we’ve got a lot of operators who are engaging with developers and fewer developers who are engaging with operators (the “opsdev” people).
  4. Chef Omnibus installer is very compelling.  This approach addresses issues with packaging that were created because we did not have configuration management.  Now that we have good tooling we separate the concerns between bits, configuration, services and dependencies.  This is one thing to watch and something I expect to see in Crowbar.
  5. The old mantra still holds: If something is hard, do it more often.
  6. Eli Goldratt’s The Goal is alive again thanks to Gene Kims’s smart new novel, The Phoenix project, about DevOps and IT  (I highly recommend both, start with Kim).
  7. Not DevOps, but 3D printing is awesome.  This is clearly a game changing technology; however, it takes some effort to get right.  Dell brought a Solidoodle 3D printer to the event to try and print OpenStack & Crowbar logos (watch for this in the future).

I’d be interested in hearing what other people found interesting!  Please comment here and let me know.

Crowbar and our Pivot (or, how we slipped and shipped Grizzly)

Crowbar Grizzly PostMy team at Dell uses Lean process because it forces us to be honest about making hard choices. Our recent decision to pivot back to Crowbar 1.x for the OpenStack Grizzly release is a great example how the pivot process works.

4/24 note: I have a longer post and ISO for Grizzly on Crowbar waiting until we enter QA. The Crowbar community is already very active around this work and you’re encouraged to join.

Like any refactor, there was schedule risk when we started the Crowbar 2.x release. To mitigate this risk, we made two critical choices. First, we choose to advance the OpenStack barclamps on the 1.x code base in parallel with the 2.x work. Second, we chose a pivot date for the team to choose releasing Grizzly on the 1.x or 2.x trunks.

Choosing to jump back to 1.x was one of the hardest choices I’ve made in my career. I’m proud that we had the foresight to keep that as an option and prouder that our team rallied to make it happen.

I acknowledge that 1.x has gaps; however, getting Grizzly into the field for PoCs and pilots with 1.x provide substantial benefits to the community.  That said, there are barclamps for HA deployments and other production features that are under development on the 1.x branch and will be available in the community.

The 2.x code base provides important features but we are building from on the 1.x deployment recipes. This means that development, testing and tuning applied to the Grizzly barclamps will translates directly into Crowbar 2.x field readiness. In fact, more completeness on OpenStack can dramatically simplify Crowbar 2.x testing efforts.  This is especially true on the OpenStack Networking (fka Quantum) barclamps because they are new work.

Delivering solutions is a balance between features, timing and field experience.  The Crowbar team’s preference is to collaborate with operators in the field and that means making workable software available quickly.

I hope that you’ll agree with our approach and help us make Grizzly the most deployable OpenStack yet.

What foo is “contribution” to open source? Mik Kersten & Tasktop @ SXSW

Nested

How do we really know who influences most in a software project?  We can easily track code commits, but there are more bits to the project than the commits.

I had the good fortune to attend Mik Kersten’s Code Graph presentation at SXSW last week. Mik started the Eclipse Mylyn project and went on to found Tasktop. Both are built on the very intriguing concepts that software development production (aka work) is organized around tasks.

His premise is that organizing around tasks provides a more manageable and actionable view of a project than a more traditional application life-cycle management (ALM) approaches.  I’m a sucker for any presentation about lean development process that includes references to both DevOps and industrial engineering (I have an MS in IE), but Mik surprised me by taking his code graph concept to a whole ‘nutha level.

The software value chain is much deeper than just the people who write code. Mik’s approach included managers, testers and operators in the interaction graphs for his projects.

By including all of the ALM artifacts in the analysis, you get a much richer picture of the influencers for a project.

For example, the development manager may never show up as a code committer; however, they are hugely influential in which work gets prioritized. If your graph includes who is touching the work assignments and stories then the manager’s influence jumps out immediately. That knowledge would completely change how and who you may interact with a team. It effectively brings a shadow contributor into the light.

The same is true for QA members who are running tests and opening defects and operators who are building deployment scripts. Ideally, it should include users who exercise different parts of the applications capabilities.

Mik’s graphs clearly showed the influence impact of managers because they touched all of the story cards for the project.  The people who own the story cards are the most potent influencers in a project, yet they are invisible in code repositories!

I would love to see an impact graph for a software project that equally reflected the wide range of contributions that people make to its life-cycle.  This type of information helps rebalance the power in a project.

Industrial engineering legend W.E. Demming‘s advice is to look at production as a system.  Finding ways to show everyone’s contributions is an important step towards bringing lean processes fully into software manufacturing.

Behavior Driven Development (BDD) and Crowbar

Test Test TestI’m a huge advocate of both behavior and test driven development (BDD & TDD). For the Crowbar 2 refactor, I’ve insisted (with community support) that new code has test coverage to the largest extent possible. While this inflicted some technical debt, it dramatically reduces the risk and effort for new developers to contribute.

For open source projects, they are even more important because they allow the community to contribute to the project with confidence.

A core part of this effort has been the Erlang BDD (“bravo delta”) tool that I had started before my team began Crowbar (code link).

I’m a big fan of BDD domain specific languages (DSL) because I think that they are descriptive. Ideally, everyone on the team including marketing and documentation authors should be able to understand and contribute to these tests.

I’ve been training our QA team on how the BDD system works and they are surprised at the clarity of the DSL. By reading the DSL for a feature, then can figure out what the developer had in mind for the system to do. Even better, they can see which use-cases the developer is already testing. Yet the real excitement comes from the potential to collaborate on the feature definitions before the code is written. My blue-sky-with-rainbows hope is that developers and testers will sit down together and review the BDD feature descriptions before code is written (perhaps even during planning). Even short of that nirvana, the BDD feature descriptions provide something that everyone can review and discuss where code (even with verbose documentation) falls short.

Ok, so you already know the benefits of BDD. Why didn’t I do this in the Cucumber? It’s the leading tool and a logical fit for a Rails project like Crowbar. Frankly, I have a love-hate relationship with Cucumber.

  1. It’s slow. And that does not scale for testing. I’m of the belief that slows tests destroy developer productivity because they encourage distractions.  Our BDD is fast and is not yet optimized.
  2. Too coupled to app framework – you can bypass the UI/API for testing if needed. If I’m doing behavior testing then I want to make sure that everything I test is accessible to the user too.
  3. While Cucumber has a lot of good “webrat” steps to validate basic web pages, I found that these were very basic and I quickly had to write my own.
  4. Erlang pattern matching made it much easier to define steps in a logical way with much less RegEx than Cucumber
  5. Erlang is designed to let us parallelize the tests.
  6. I like programming in Erlang (and I had started BDD before I started Crowbar)

And it goes beyond just testing our code…

We ultimately want to use the BDD infrastructure to gate Crowbar deployments not just code check-ins. That means that when Crowbar orchestrates an application deployment, the BDD infrastructure will execute tests to verify that the deployment is exhibiting the expected behaviors. If the installation does not pass then Crowbar would roll-back or hold the deployment.

This objective is not new or unique – it’s modus operandi at advanced companies who practice continuous deployment. Our position is that this should be an integral part of the orchestration framework.

One side benefit of the BDD system as designed is that it is also a simulator. We are able to take the same core infrastructure and turn it into a load generator and database populator. Unlike more coupled tools, you can run these from anywhere.

Post Script: Here’s the topic that I’m submitting for presentation at OSCON

Continue reading

I respectfully disagree – we are totally aligned on your lack of understanding

Team FacesOccasionally, my journeys into Agile and Lean process force me down to its foundation: cultural fit.  Frankly, there is nothing more central to the success of a team than culture. That’s especially true about Lean because of the humility and honesty required. If your team is not built on a foundation of trust and shared values then it’s impossible keep having the listening and responsive dialog with our customers.

Successful teams have to be honest about taking negative feedback and you cannot do that without trust.

Trust is built on working out differences. Ideally, it would be as simple as “we agree” or “we disagree.” In an ideal world, every team would be that binary.    Remember, that no team always agrees – it’s how we resolve those differences that makes the team successful.  That’s something we know as “diversity” and it’s like annealing of steel to increase its strength.

Unfortunately, there are four  modes of agreement and two are team poison.

  1. Yes: We agree! Let’s get to work!
  2. No: We disagree! Let’s figure out what’s different so that we’re stronger!
  3. Artificial Warfare:  We disagree!  While we are fundamentally aligned, everyone else thinks that the team does not have consensus and ignores the teams decisions.  We also waste a lot of time talking instead of acting.
  4. Artificial Harmony: We agree!  But then we don’t support each other in getting the work done or message alignment.  We never spend time talking about the real issues so we constantly have to redo our actions.

I’ve never seen a team that is as simple as agree/disagree but I’ve been at companies (Surgient) that tried to build a culture to support trust and conflict resolution (based on Lencioni’s excellent 5 dysfunctions book).  However, there’s a major gap between a team that needs to build trust through healthy conflict and one that wraps itself in the dysfunctions of artificial harmony and warfare.

If you find yourself on a team with this problem then you’ll need management by-in to fix it.  I have not seen it be a self-correcting problem.  I’d love to hear if you’ve gotten yourself healthy from a team with these issues.

Signs of artificial agreement syndrome include

  1. Lack of broad participation – discussions are dominated by a few voices
  2. Discussions that always seem to run to the meta topic instead of the actual problem
  3. Issues are not resolved and come up over and over
  4. People are still upset after the meeting because issues have not been resolved
  5. People have different versions of events
  6. Lack of trust for some people to speak for the group
  7. Outcomes of decision making meetings are surprises
  8. Lack of results or missed commitments by the team

Lean Process’ strength is being Honest and Humble

Lean process and methodology is important to me because I think it is central to the work that we are doing in the community.  Even more, it’s changing how my team at Dell creates and delivers products for customers?
This post may be long, but my answer to “why Lean” ends up being very simple: Lean process is honest and humble.
I believe Lean process is more honest because it assumes a lack of knowledge.  It’s more “truthy” to admit there are a lot of things that we don’t know (we can’t know!) until we’ve started doing the work.  It’s very hard to admit we don’t have answers for things until we are further along because we want to feel like  experts and we to lock deliveries.
The “building software is like building a house” analogy is often used to claim that Lean lacks the design “blue prints” that other processes have.  The argument goes that builders needed to understand how the entire house works with structural support, plumbing lines and electrical circuits and things like that.  However, if I was going to build a house I would still leave a lot of things to the last-minute.  The process of building a house evolves so that the basic outlines of structural elements are known. In a lot of cases the position of rooms, the outlets, air conditioning ducts, a lot of the functional components, even windows and doors while they are often placed in the design can easily be moved and changed as you go through.  You can do a walk-through of a house after it’s been framed out and make all sorts of changes and adjustments.  As things go forward in the design of the house things become more and more difficult to change. You are building a brick façade, moving the windows within the façade are very difficult. However, interior places they aren’t.  Likewise, I don’t want to order my life-gem counter-tops from the blue-prints – it’s much safer to order off actual measurements.
Software projects are also building projects. You build a façade, you build a structure and within that structure you have a lot of flexibility. As you go you make more decisions and your choices become more limited. But, that is the nature of building.  For that reason, saying “we don’t know everything we want” is not just good practice, it is much more honest.
But honesty is not enough for a strong Lean process.  The need for humility in Lean architects and business people really stands out.  The Lean process is humble because it starts with the assumption that we don’t really understand the value, drivers, interests and features that make our product special.
We need very strong ideas and a vision; however, we need to be motivated by making something that is significant to other people.  They are the ones who give it value.
We have to give up the idea that we can convince someone who our idea will be significant to them – we have to show and collaborate instead.  The most important thing in building any project and taking any product to market is listening to the people who are using your product and understanding what their needs.  Instead of telling them what they need;  show them something interesting, interact with them and get their opinion.
Contrast that to waterfall methodology where the assumption is that we can put smart people in a room, have them figure out what the requirements are, build a team, get everything ready to go and then start executing.  That assumption is highly optimized and seems very efficient, but it has a huge amount of hubris in the process.  The idea that we can sit down two years in advance of market need and identify what those features and capabilities are seems outrageous to me in the current technology market.  It is so much harder to try to get that information correct and then execute on it that get a directional statement and begin and then get feedback and interact, it is a world of difference between the two processes.
Ultimately, Lean process about having requirements that are less defined or well-known.  It’s driven by giving respect to the people consuming the product.  We can hear their ideas and their reactions.  Where the users’ input can be evaluated and taken in to account.  It’s about collaboration.
Humility it not just about listening and collecting feedback: it is about interacting and building relationships.
So just as our customers are building a relationship with our product, they are also building a relationship with the people creating that product. And that relationship is what drives the product forward and what makes it a great product and it is what gives you a strong and loyal customer base, rather than dictating, “This is what you wanted. Here it is. I hope you enjoy it.”
This is a completely different and powerful way of delivering product.  I believe that honesty and humility in a Lean process inherently creates stronger products and ones that are both faster delivered and better suited to their markets.

The Atlantic magazine explains why Lean process rocks (and saves companies $$)

GearsI’m certain that the Atlantic‘s Charles Fishman was not thinking software and DevOps when he wrote the excellent article about “The Insourcing Boom.”  However, I strongly recommend reading this report for anyone who is interested in a practical example of the inefficiencies of software lean process (If you are impatient, jump to page 2 and search for toaster).

It’s important to realize that this article is not about software! It’s an article about industrial manufacturing and the impact that lean process has when you are making stuff.  It’s about how US companies are using Lean to make domestic plants more profitable than Asian ones.  It turns out that how you make something really matters – you can’t really optimize the system if you treat major parts like a black box.

When I talk about Agile and Lean, I am talking about proven processes being applied broadly to companies that want to make profit selling stuff. That’s what this article is about

If you are making software then you are making stuff! Your install and deploy process is your assembly line. Your unreleased code is your inventory.

This article does a good job explaining the benefits of being close to your manufacturing (DevOps) and being flexible in deployment (Agile) and being connected to customers (Lean).  The software industry often acts like it’s inventing everything from scratch. When it comes to manufacturing processes, we can learn a lot from industry.

Unlike software, industry has real costs for scrap and lost inventory. Instead of thinking “old school” perhaps we should be thinking of it as the school of hard knocks.