Cloud Culture: Becoming L33T – Five ways to “go digital native” [Collaborative Series 7/8]

Subtitle: Five keys to earn Digital Natives’ trust

This post is #7 in a collaborative eight-part series by Brad Szollose and me about how culture shapes technology.

WARNING: These are not universal rules! These are two cultures. What gets high scores for Digital Natives is likely to get you sacked with Digital Immigrants.

How do Digital Natives do business?

You don’t sell! You collaborate with them to help solve their problems. They’ll discredit everything you say if you “go all marketing on them” and try to “sell them.”

Here are five ways that you can build a two-way collaborative relationship instead of one-way selling. These tips aren’t speculation: Brad has proven these ideas work in real-world business situations.

Interested in Digital Native culture? We recommend reading more books on the topic.

1) Share, don’t tell.

Remember the cultural response in Rob’s presentation discussed in the introduction to this paper? The shift took place because Rob wanted to share his expertise instead of selling the awesomeness of his employer. That is what changed the dynamic.

In a selling situation, the sales pitch doesn’t address our client’s needs. It addresses what we want to tell them and what we think they need. It is a one-way conversation. And when the choice in a sales meeting is simply “yes” or “no,” the client can always choose to say “no.”

Sharing draws our customers in so we can hear their problems and solve them. We also get a barometer of what they know versus what they need. When Rob is presenting to a customer, he’s qualifying the customer too. Solutions are not one-size-fits-all, and Digital Natives respect you more for admitting that.

Digital Native business takes a long-term, solution-driven approach instead of just positioning a product. If you’ve collaborated with customers and they agree you’ve got a solution for them, then it’s much easier to close the sale. And over the long term, it’s a more lucrative way to do business.

2) Eliminate bottlenecks.

Ten years ago, IT departments were the bottleneck to getting products into the market. If customers resisted, it could take years to get them to like something new. Today, Apple introduces new products every six months with a massive adoption rate because Digital Natives don’t wait for permission from an authority.

The IT buyer has made that sales cycle much more dynamic because our new buyers are Digital Natives. Where Digital Immigrants stayed entrenched in a process or technology, Digital Natives are more willing to try something unproven. Amazon’s EC2 public cloud presented a huge challenge to the authority of IT departments because developers were simply bypassing internal controls. Digital Natives have been trained to look for out-of-the-box solutions to problems.

Time-to-market has become the critical measure for success.

We now have IT end-user buyers who adopt and move faster through the decision process than ever before! We interfere with their decision process if we keep treating new buyers as if they can’t keep up and need us to educate them.

Today’s Digital Workers are smart self-starters who more than understand technology; they live it. Their intuitive grasp of technology and their capacity to use it effortlessly have become a cultural skill set. They can look up, absorb, and comprehend products on their own, and they did their homework before we walked in the door.

Digital Natives are impatient. They want to skip over what they already know and get to the real purpose and collaboration. You add bottlenecks when you force them back into a traditional, risk-averse decision process; instead, they are looking to business partners to help them iterate and accelerate.

How did this apply to the Crowbar project?

Crowbar addresses a generation’s impatience to be up and running in record time. But there is more to it than that: we engage with customers differently too. Our open source collaboration and design flexibility mean that we can dialog with customers and partners to figure out the real wants and needs in record time.

3) Let go of linear.

Digital Natives do not want to be walked through detailed linear presentations. They do want the information but leave out the hand holding. The best strategy is to prepare to be a well-trained digital commando—plan a direction, be confident, be ready to respond, and be willing to admit knowledge gaps. It’s a strategy without a strategy.

Ask questions at the beginning of a meeting—this becomes a knowledge base “smell test.” Listening to what our clients know and don’t know gets us to the heart and purpose of why we are there. Take notes. Stay open to curve balls, tough questions, and—dare we say it—the client telling us we are off base. You should not be surprised at how much they know.

For open source projects at Dell (Rob’s employer), customers have often downloaded and installed the product before they have talked to the sales team. Rob has had to stop being surprised when they are better informed about our offerings than our well-trained internal teams. Digital Natives love collecting information and getting started independently. This completely violates the normal linear sales process; instead, customers arrive more engaged and ready if you can be flexible enough to meet them where they already are.

4) Be attentively interactive.

No one likes to sit in one meeting after another. Why are meetings boring? Meetings should be engaging and collaborative; unfortunately, most meetings are simply one-way presentations or status updates. When Digital Natives interrupt a presentation, it may mean they are not getting what they want but it also means they are paying attention.

Aren’t instant messaging, texting, and tweeting attention-stealing distractions?

Don’t confuse IMing, texting, emailing, and tweeting as lack of attention or engagement.

Digital Natives use these “back channels” to speed up knowledge sharing while eliminating the face-to-face meeting inertia of centralized communication.

Of course, sometimes we do check out and stop paying attention.

Time and attention are valuable commodities!

With all the distractions and multi-tasking for speed and connectivity, giving someone undivided attention is about respect, and paying attention is not passive! When we ask questions, it shows that we’re engaged and paying attention. When we compile all the answers from those questions, our intention leads us to solutions. Solving our client’s problems is about getting to the heart of the matter and becomes the driving force behind every action and solution.

Don’t be afraid to stray from the agenda—our attention is the agenda.

5) Stay open to happy accidents.

In Brad’s book, Liquid Leadership, the chapter titled “Have Laptop. Will Travel” points out how Digital Natives have been trained in virtualized work habits because they are more effective.

Our customers are looking for innovative solutions to their problems and may find them in places that we do not expect. It is our job to stay awake and open to solution serendipity. Let’s take this statement out of our vocabulary: “That’s not how we do it.” Let’s try a new approach: “That isn’t traditionally how we would do it, but let us see if it could improve things.”

McDonald’s uses numbers for its combo meals to make ordering predictable and keep it under 30 seconds. It sounds simple, but changes like that come from listening to customers’ habits. We need to stop judging and start adapting. Imagine a company that adapts to the needs of its customers.

Sales guru Jeffrey Gitomer pays $100 in cash to any one of his employees who makes a mistake. The mistake is then analyzed to figure out whether it is worth applying or should be discarded. He doesn’t pay $100 if they make the same mistake twice. Mistakes are where we can discover breakthrough ideas, products, and methods.

Making these kinds of leaps requires that we first let go of rigid rules and opinions and make it OK to make a few mistakes … as long as we look at them through a lens of possibility. Digital Natives have spent 10,000 hours playing games, learning to make mistakes, take risks, and reach mastery.

Keep reading! The next post is Three Takeaways (previous: Win by Failing).

Cloud Culture: Level up – You win the game by failing successfully [Collaborative Series 6/8]

Translation: Learn by playing, fail fast, and embrace risk.

This post is #6 in a collaborative eight-part series by Brad Szollose and me about how culture shapes technology.

Digital Natives have been trained to learn the rules of the game by just leaping in and trying. They seek out mentors, learn the politics at each level, and fail as many times as possible in order to learn how NOT to do something. Think about it this way: you gain more experience when you try and fail quickly than when you carefully plan every step of your journey. As long as you are willing to adjust your plans, experience always trumps prediction.

Just like in life and business, games no longer come with an instruction manual.

In Wii Sports, users learn the basics in-game and figure out the subtleties as they level up. Tom Bissell, in Extra Lives: Why Video Games Matter, explains that this in-game learning model is core to the evolution of video games. Game design builds learning into the game experience itself; consequently, we’ve trained Digital Natives that success comes from overcoming failure.

Early failure is the expected process for mastery.

You don’t believe that games lead to better decision making in real life? In a January 2010 article, WIRED magazine reported that observations of the new generation of football players showed they had adapted tactics learned in Madden NFL to the field. It is not just the number of virtual downs played; these players have gained a strategic, field-level perspective on the game that was previously limited to coaches. Their experience playing video games has shattered the on-field hierarchy.

For your amusement, here is a College Humor video about L33T versus N00B culture: “L33Ts don’t date N00Bs.”  Youtu.be/JVfVqfIN8_c

Digital Natives embrace iteration and risk as a normal part of life.

Risk is also a trait we see in entrepreneurial startups. Changing the way things were done before requires you to push boundaries, try something new, and consistently discard what doesn’t work. In Lean Startup Lessons Learned, Eric Ries built his entire business model around the try-learn-adjust process. He’s shown that iterations don’t just work, they consistently out-innovate the competition.

The entire reason Dell grew from a dorm room into a multinational company is this type of fast-paced, customer-driven interactive learning. You are either creating something revolutionary or you will be quickly phased out of the Information Age. No one stays at the top just because he or she is cash rich anymore. Today’s Information Age company needs to be willing to reinvent itself consistently … and systematically.

Why do you think larger corporations that embrace entrepreneurship within their walls seem to survive through the worst of times and prosper like crazy during the good times?

Gamers have learned that risk with a purpose earns rewards.

To improve flow, we must view the OpenStack community as a Software Factory

This post was sparked by a conversation at OpenStack Atlanta between OpenStack Foundation board members Todd Moore (IBM) and Rob Hirschfeld (Dell/Community).  We share a background in industrial and software process and felt that the lessons of lean manufacturing translate directly to the challenges OpenStack faces.

While OpenStack has done an amazing job of growing contributors, scale has caused our code flow processes to be bottlenecked at the review stage.  This blocks flow throughout the entire system and presents a significant risk to both stability and feature addition.  Flow failures can ultimately lead to vendor forking.

Fundamentally, Todd and I felt that OpenStack needs to address system flows to build an integrated product.  This post expands on the “hidden influencers” issue and adds an additional challenge: improving flow requires that community influencers better understand the need to optimize inter-project work in a more systematic way.

Let’s start by visualizing the “OpenStack Factory”

Factory floor image from the Alpha Industries Wikipedia page

Imagine all of OpenStack’s thousands of developers working together in a single giant start-up warehouse, each project in its own floor area with the appropriate foosball tables, break areas and coffee bars.  It’s easy to visualize clusters of intent developers talking around tables or coding in dark corners while PTLs and TC members dash between groups coordinating work.

Expand the visualization so that we can actually see the code flowing between teams as little colored boxes.  Giving each project a unique color lets us quickly see the dependencies between teams.  Some features pile up waiting for review inside teams while others sit on pallets between projects, waiting on cross-project features that have not completed.  At release time, we’d be able to see PTLs sorting through stacks of completed boxes to pick which ones were ready to ship.

Watching a factory floor from above is a humbling experience and a key feature of systems thinking enlightenment in both The Phoenix Project and The Goal.  It’s very easy to be caught up in a single project (local optimization) and miss the broader system implications of local choices.

There is a large body of work about Lean Process for Manufacturing

You’ve already visualized OpenStack code creation as a manufacturing floor: it’s a small step to accept that we can use the same proven processes for software and physical manufacturing.

As features move between teams (work centers), it becomes obvious that we’ve created a highly interlocked sequence of component steps needed to deliver the product; unfortunately, we have minimal coordination between the owners of the work centers.  If a feature needs a critical resource (think programmer) to progress, then we rely on that resource to allocate time to the work.  Since that person’s manager may not agree with the priority, we have a conflict between system flow and individual optimization.

That conflict destroys flow in the system.

The #1 lesson from lean manufacturing is that putting individual optimization over system optimization reduces throughput.  Since our product and people managers are often competitors, we need to work doubly hard to address system concerns.  Worse yet, our inventory of work in process and the interdependencies between projects are harder to discern.  Unlike on a manufacturing floor, our developers and project leads cannot look down and see the physical work progressing from station to station in one holistic view.  The bottlenecks that throttle the OpenStack workflow are harder to see, but we can find them, as demonstrated later in this post.

Until we can engage the resource owners in balancing system flow, OpenStack’s throughput will decline as we add resources.  This same principle is at play in the famous aphorism: adding developers makes a late project later.
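To make the bottleneck effect concrete, here is a toy simulation (my own illustration with made-up numbers, not OpenStack data). Work flows through develop, review and integrate stations; adding capacity upstream of the review bottleneck does not increase what ships, it only grows the queue of work in process.

```python
# Toy model of a three-station flow (develop -> review -> integrate).
# Illustration only: station capacities are invented, not OpenStack measurements.
def simulate(capacities, incoming_per_day, days=30):
    """Return (items shipped, queues) when station i handles capacities[i] items/day."""
    queues = [0] * len(capacities)   # work-in-process waiting at each station
    shipped = 0
    for _ in range(days):
        queues[0] += incoming_per_day
        for i, capacity in enumerate(capacities):
            done = min(queues[i], capacity)
            queues[i] -= done
            if i + 1 < len(queues):
                queues[i + 1] += done    # hand off to the next station
            else:
                shipped += done          # finished product
    return shipped, queues

# Doubling developer capacity changes nothing while review stays the bottleneck:
print(simulate(capacities=[10, 4, 8], incoming_per_day=10))  # review-limited
print(simulate(capacities=[20, 4, 8], incoming_per_day=10))  # same output, bigger queue
```

The only change that raises system throughput is adding capacity at the bottleneck itself, which is exactly the lean lesson in the list below.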

Is there a solution?

There are lessons from Lean Manufacturing that can be applied:

  1. Make quality a priority (expand tests from function to integration)
  2. Ensure integration from station to station (prioritize working together over features)
  3. Make sure that owners of work are coordinating (expose hidden influencers)
  4. Find and manage from the bottleneck (classic Lean says find the bottleneck and improve it)
  5. Create and monitor a system view
  6. Have everyone value finished product, not workstation output

Postscript: I highly recommend reading Daniel Berrange’s email about this.

Back of the Napkin to Presentation in 30 seconds

I wanted to share a handy new process for creating presentations that I’ve been using lately; it involves cocktail napkins, smart phones and Google presentations.

Here’s the Process:

  1. sketch an idea out with my colleagues on a napkin, whiteboard or notebook during our discussion,
  2. snap a picture and upload it to my Google Drive from my phone,
  3. import the picture into my presentation using my phone,
  4. tell my team that I’ve updated the presentation using Slack on my phone.

Clearly, this is not a finished presentation; however, it does serve to quickly capture critical content from a discussion without disrupting the flow of ideas.  It also alerts everyone that we’re adding content and helps frame what that content will be as we polish it.  When we immediately position the napkin into a deck, it creates clear action items and reference points for the team.

While blindingly simple, having a quick feedback loop and visual placeholders translates into improved team communication.

a Ready State analogy: “roughed in” brings it Home for non-ops-nerds

I’ve been seeing great acceptance of the concept of ops Ready State.  Technologists from both ops and dev immediately understand the need to “draw a line in the sand” between system prep and installation.  We also admit that getting physical infrastructure to Ready State is largely taken for granted; however, it often takes multiple attempts to get right, and even small application changes can require a full system rebuild.

Since even small changes can redefine the Ready State requirements, changing Ready State can feel like being told to tear down your house just so you can remodel the kitchen.

A friend asked me to explain “Ready State” in non-technical terms.  So far, the best analogy that I’ve found is when a house is “Roughed In.”  It helps if you’ve ever been part of house construction, but it may not be universally accessible, so I’ll explain.

Getting to Rough In means that all of the basic infrastructure of the house is in place but nothing is finished.  The foundation is poured, the plumbing lines are placed, the electrical mains are ready, the roof is on and the walls are up.  The house has been built according to the architectural plans, and the major decisions have been made, like how many rooms there are and what each room is for (bathroom, kitchen, great room, etc.).  For Ready State, that’s like having the servers racked and set up with disk, BIOS, and network configured.

While we’ve built a lot, rough in is a relatively early milestone in construction.  Even major items like the type of roof, siding and windows can still be changed.  Speaking of windows, this is like installing an operating system in Ready State.  We want to treat this as a distinct milestone because there’s still room to make changes.  Once the roof and exteriors are added, changes become much more disruptive and expensive.

Once the house is roughed in, the finishing work begins.  Almost nothing from rough in will be visible to the people living in the house.  Like a Ready State setup, the users interact with what gets laid on top of the infrastructure.  For homes, it’s the walls, counters, fixtures and flooring.  For operators, it’s applications like Hadoop, OpenStack or CloudFoundry.

Taking this analogy back to where we started, what if we could make rebuilding an entire house take just a day?!  In construction, that’s simply not practical; however, we’re getting to a place in Ops where automation makes it possible to reconstruct the infrastructure configuration much faster.

While we can’t re-pour the foundation (aka swap out physical gear) instantly, we should be able to build up from there to ready state in a much more repeatable way.
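To translate the analogy into ops terms, here is a minimal sketch (hypothetical milestone names and steps, not OpenCrowbar’s actual object model): rebuilding from Ready State re-runs only the layers above the “roughed in” line instead of re-pouring the foundation.

```python
# Illustration only: milestone names and rebuild steps are hypothetical,
# not OpenCrowbar's actual object model.
from enum import IntEnum

class Milestone(IntEnum):
    RAW = 0           # bare metal racked ("foundation poured")
    READY_STATE = 1   # disk, BIOS and network configured ("roughed in")
    OS_INSTALLED = 2  # operating system laid down ("windows installed")
    WORKLOAD = 3      # Hadoop / OpenStack / CloudFoundry ("finishing work")

REBUILD_STEPS = {
    Milestone.READY_STATE: "configure RAID, BIOS and networking",
    Milestone.OS_INSTALLED: "install the operating system",
    Milestone.WORKLOAD: "deploy the application workload",
}

def rebuild(node_name, keep_through=Milestone.READY_STATE):
    """Re-run only the steps above the milestone we are keeping."""
    for milestone, action in REBUILD_STEPS.items():
        if milestone > keep_through:
            print(f"{node_name}: {action}")

# Remodel the kitchen without re-pouring the foundation:
rebuild("node-01", keep_through=Milestone.READY_STATE)
```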

Supply Chain Transparency drives Open Source adoption, 6 reasons besides cost

Author’s note: If you don’t believe that software is manufactured then go directly to your TRS80, do not collect $200.

I’m becoming increasingly impatient with people stating that “open source is about free software” because it’s blatantly untrue as a primary driver for corporate adoption.   Adopting open source often requires companies (and individuals) to trade off one cost (license expense) for another (building expertise).  It is exactly the same balance we make between insourcing, partnering and outsourcing.


When I probe companies about what motivates their use of open source, they universally talk about transparency of delivery, non-single-vendor ownership of the source and their ability to influence as critical selection factors.  They are generally willing to invest more to build expertise if it translates into these benefits.  Viewed in this light, licensed software or closed services both cost more and introduce significant business risks where open alternatives exist.

This is not new: it’s basic manufacturing applied to IT

We had this same conversation in the 90s around manufacturing as that industry joltingly shifted from batch to just-in-time (aka Lean) manufacturing.  The key driver for that transformation was improved integration and management of supply chains.   There are plenty of witty doctoral dissertations about inventory, drum-buffer-rope flow and economic order quantity; however, trust my summary: it all comes down to companies needing supply chain transparency.

As technology becomes more and more integral to delivering any type of product, companies must extend their need for supply chain transparency into their IT systems too.   That does not mean that companies expect to self-generate (insource) all of their technology.  The goal is to manage the supply chain, not to own every step.   Smart companies find a balance between control of owning their supply (making it themselves) and finding a reliable supply (multi-source is preferred).  If you cannot trust your suppliers then you must create inventory buffers and rigid contracts.  Both of these defenses limit agility and drive systemic dysfunction.  This was the lesson learned from Lean Just-In-Time manufacturing.

What does this look like for IT supply chains?

A healthy supply chain allows companies to address these issues.  They can:

  1. Change vendors / suppliers and get equivalent supply
  2. Check the status of deliveries (features)
  3. Review and impact quality
  4. Take deliverables in small frequent batches
  5. Collaborate with suppliers to manage & control the process
  6. Get visibility into the pipeline

None of these items are specific to software; instead, they are general attributes of a strong supply chain.  In a closed system, companies lose these critical supply chain values.  While tightly integrated partnerships can provide these benefits, they carry a cost premium and inherently limit vendor choice.

This sounds great!  What’s the cost?

You need to consider the level of supply chain transparency that’s right for you.  Most companies are no more likely to refine their own metal than to build from pure open source repositories.  There are transparency benefits from open source even from a single supplier.  Yet in some cases, like the OpenStack community, the systems are so essential that they warrant investment as core competencies and joining the contributing community.  Even in those cases, most rely on vendors to package and extend their chosen open source software.

But that misses the point: contributing to an open source project is not required to manage your IT supply chain.  Instead, you need to build operational infrastructure and processes that are open source ready.  That may require investing in skills and capabilities related to underlying technologies like the operating system, database or configuration management.  For cloud, it is likely to require more investment in fault-tolerant architecture and API-driven deployment.  Companies that are strong in these skills are better able to manage an open source IT supply chain.  In fact, they are better able to manage any IT supply chain because they have more control.

So, it’s not about cost…

When considering motivations for open source adoption, cost (or technology sizzle) should not be the primary factor.  In my experience, the most successful implementations focus first on operational readiness, project stability, and program transparency.  Those priorities indicate that a company is thinking with an IT supply chain focus.

PS: If you found this interesting, you’ll also like my upstream imperative post.

OpenCrowbar Design Principles: Emergent services [Series 5 of 6]

This is part 5 of 6 in a series discussing the principles behind the “ready state” and other concepts implemented in OpenCrowbar.  The content is reposted from the OpenCrowbar docs repo.

Emergent services

We see data center operations as a duel between conflicting priorities. On one hand, the environment is constantly changing and systems must adapt quickly to these changes. On the other hand, users of the infrastructure expect it to provide stable and consistent services for consumption. We’ve described that as “always ready, never finished.”

Our solution to this duality is to expect that the infrastructure Crowbar builds is decomposed into well-defined service layers that can be (re)assembled dynamically. Rather than require any component of the system to be in a ready state, Crowbar’s design principles assume that we can automate the construction of every level of the infrastructure from BIOS to network and application. Consequently, we can hold off (re)making decisions at the bottom levels until we’ve figured out what we’re doing at the top.

Effectively, we allow the overall infrastructure services configuration to evolve or emerge based on the desired end use. These concepts are built on computer science principles that we have appropriated for Ops use; since we also subscribe to Opscode’s “infrastructure as code,” we believe these terms fit well in a DevOps environment. In the next pages, we’ll explore the principles behind this approach, including simulated annealing, late binding, attribute injection and emergent design.
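As a rough illustration of the late binding and attribute injection ideas (a toy of my own, not Crowbar’s implementation), the sketch below defers each layer’s inputs until apply time: a service declares what it requires, and the values are injected from whatever the layers beneath it actually published rather than being hardcoded up front.

```python
# Illustration only: a toy of late binding / attribute injection,
# not OpenCrowbar's actual implementation or API.
class Service:
    def __init__(self, name, provides=None, requires=None):
        self.name = name
        self.provides = provides or {}   # attributes this service publishes
        self.requires = requires or []   # attributes it needs, resolved late

    def apply(self, environment):
        # Values are injected at apply time, not hardwired when the service is authored.
        injected = {key: environment[key] for key in self.requires}
        print(f"applying {self.name} with {injected}")
        environment.update(self.provides)

# Lower-level decisions stay open until the stack is actually assembled.
environment = {}
layers = [
    Service("network", provides={"admin_net": "10.0.0.0/24"}),
    Service("os-install", provides={"os": "ubuntu"}, requires=["admin_net"]),
    Service("openstack", requires=["admin_net", "os"]),
]
for service in layers:
    service.apply(environment)
```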

Emergent (aka iterative or evolutionary) design challenges the traditional assumption that all factors must be known before starting:

  • Dependency graph – multidimensional relationship
  • High degree of reuse via abstraction and isolation of service boundaries.
  • Increasing complexity of deployments means more dependencies
  • Increasing revision rates of dependencies but with higher stability of APIs

OpenCrowbar Design Principles: Reintroduction [Series 1 of 6]

While “ready state” as a concept has been getting a lot of positive response, I forget that much of the innovation and learning behind that concept never surfaced as posts here.  The Anvil (2.0) release included the OpenCrowbar team cataloging our principles in the docs.  Now it’s time to repost the team’s work as a short series over the next three days.

In architecting the Crowbar operational model, we’ve consistently twisted and adapted traditional computer science concepts like late binding, simulated annealing, emergent behavior, attribute injection and functional programming to create a repeatable platform for sharing open operations practice (post 2).

Functional DevOps aka “FuncOps”

Ok, maybe that’s not going to be the ’70s-era hype-bubble name, but… the operational model behind Crowbar is entering its third generation, and it’s important to understand that the state isolation and integration principles behind that model are closer to functional than declarative programming.

Parliament is Crowbar’s official FuncOps soundtrack

The model is critical because it shapes how Crowbar approaches infrastructure at a fundamental level, so it’s easier to interact with the platform if you see how we approach operations. Crowbar’s goal is to create emergent services.

We’ll explore those topics in this series to explain Crowbar’s core architectural principles.  Before we get into that, I’d like to review some history.

The Crowbar Objective

Crowbar delivers repeatable best practice deployments. Crowbar is not just about installation: we define success as a sustainable operations model where we continuously improve how people use their infrastructure. The complexity and pace of technology change is accelerating so we must have an approach that embraces continuous delivery.

Crowbar’s objective is to help operators become more efficient, stable and resilient over time.

Background

When Greg Althaus (github @GAlthaus) and Rob “zehicle” Hirschfeld (github @CloudEdge) started the project, we had some very specific targets in mind. We’d been working towards using organic emergent swarming (think ants) to model continuous application deployment. We had also been struggling with the most routine foundational tasks (BIOS, RAID, o/s install, networking, ops infrastructure) when bringing up early scale cloud & data applications. Another key contributor, Victor Lowther (github @VictorLowther), has critical experience in Linux operations, networking and dependency resolution that led to significant contributions around the Annealer and the networking model. These backgrounds heavily influenced how we approached Crowbar.

First, we started with the best-of-field DevOps infrastructure: Opscode Chef. There was already a remarkable open source community around this tool and an enthusiastic following among cloud and scale operators. Using Chef to do the majority of the installation left the Crowbar team free to focus on the key features below.

Key Features

  • Heterogeneous Operating Systems – choose which operating system you want to install on the target servers.
  • CMDB Flexibility (see picture) – don’t be locked in to a devops toolset. Attribute injection allows clean abstraction boundaries so you can use multiple tools (Chef and Puppet, playing together).
  • Ops Annealer – the orchestration at Crowbar’s heart combines the best of directed graphs with late binding and parallel execution (see the toy sketch after this list). We believe annealing is the key ingredient for repeatable, shared OpenOps code upgrades.
  • Upstream Friendly – infrastructure as code works best as a community practice, and Crowbar uses upstream code without injecting the “crowbarisms” that were previously required. So you can share your learning with the broader DevOps community even if they don’t use Crowbar.
  • Node Discovery (or not) – Crowbar maintains the same proven discovery image based approach that we used before, but we’ve streamlined and expanded it. You can use Crowbar’s API outside of the PXE discovery system to accommodate Docker containers, existing systems and VMs.
  • Hardware Configuration – Crowbar maintains the same optional, hardware-neutral approach to RAID and BIOS configuration. Configuring hardware with repeatability is difficult and requires much iterative testing. While our approach is open and generic, the team at Dell works hard to validate on a specific set of gear: it’s impossible to make statements beyond that test matrix.
  • Network Abstraction – Crowbar dramatically extends our DevOps network abstraction. We’ve learned that networking is the key to successful deployment and upgrade, so we’ve made Crowbar networking flexible and concise. Crowbar networking works with attribute injection so that you can avoid hardwiring networking into DevOps scripts.
  • Out of band control – when the Annealer hands off work, Crowbar gives the worker implementation the flexibility to do it on the node (using SSH) or remotely (using an API). Making agents optional allows operators and developers to make the best choices for the actions they need to take.
  • Technical Debt Paydown – we’ve also updated the Crowbar infrastructure to use the latest libraries like Ruby 2, Rails 4 and Chef 11. Even more importantly, we’ve dramatically simplified the code structure, including in-repo documentation and a Docker-based developer environment that makes building a working Crowbar environment fast and repeatable.
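Here is the toy sketch referenced in the Ops Annealer bullet above. It is my own simplification, not Crowbar’s Annealer code: instead of following a fixed linear script, each pass runs every step whose dependencies are already satisfied, so independent work can proceed in parallel and ordering emerges from the graph.

```python
# Illustration only: a toy "annealer" that repeatedly runs whatever work is
# unblocked rather than walking a fixed linear script. Not Crowbar's code.
def anneal(dependencies):
    """dependencies maps each step to the steps that must finish first."""
    done = set()
    while len(done) < len(dependencies):
        ready = [step for step, needs in dependencies.items()
                 if step not in done and set(needs) <= done]
        if not ready:
            raise RuntimeError("dependency cycle or missing step")
        print("running in parallel:", ready)   # each wave could run concurrently
        done.update(ready)

anneal({
    "bios": [],
    "raid": [],
    "os": ["bios", "raid"],
    "network": ["os"],
    "openstack": ["network"],
})
```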

OpenCrowbar (CB2) vs Crowbar (CB1)?

Why change to OpenCrowbar? This new generation of Crowbar is structurally different from Crowbar 1, and we’ve invested substantially in refactoring the tooling, paying down technical debt and cleaning up documentation. Since Crowbar 1 is still being actively developed, splitting the repositories allows both versions to progress with less confusion. The majority of the principles and deployment code is very similar, so I think of Crowbar as a single community.

Continue Reading > post 2

OpenCrowbar.Anvil released – hammering out a gold standard in open bare metal provisioning

I’m excited to announce OpenCrowbar’s first release, Anvil, for the community.  Looking back on our original design from June 2012, we’ve accomplished all of our original objectives and more.
Now that we’ve got the foundation ready, our next release (OpenCrowbar Broom) focuses on workload development on top of the stable Anvil base.  This means that we’re ready to start working on OpenStack, Ceph and Hadoop.  So far, we’ve limited engagement on workloads to ensure that those developers would not also be trying to keep up with core changes.  We follow emergent design so I’m certain we’ll continue to evolve the core; however, we believe the Anvil release represents a solid foundation for workload development.
There is no more comprehensive open bare metal provisioning framework than OpenCrowbar.  The project’s focus on a complete operations model that comprehends hardware and network configuration with just enough orchestration delivers on a system vision that sets it apart from any other tool.  Yet, Crowbar also plays nicely with others by embracing, not replacing, DevOps tools like Chef and Puppet.
Now that the core is proven, we’re porting the Crowbar v1 RAID and BIOS configuration into OpenCrowbar.  By design, we’ve kept hardware support separate from the core because we’ve learned that hardware generation cycles need to be independent from the operations control infrastructure.  Decoupling them eliminates the release disruptions that we experienced in Crowbar v1 and makes it much easier to incorporate hardware from a broad range of vendors.
Here are some key components of Anvil:
  • UI, CLI and API stable and functional
  • Boot and discovery process working PLUS ability to handle pre-populating and configuration
  • Chef and Puppet capabilities including Berkshelf v3 support to pull in community upstream DevOps scripts
  • Docker, VMs and Physical Servers
  • Crowbar’s famous “late-bound” approach to configuration and, critically, networking setup
  • IPv6 native, Ruby 2, Rails 4, preliminary scale tuning
  • Remarkably flexible and transparent orchestration (the Annealer)
  • Multi-OS deployment capability: Ubuntu, CentOS, or different versions of the same OS
Getting the workloads ported is still a tremendous amount of work, but the rewards are tremendous.  With OpenCrowbar, the community has a new way to collaborate and integrate this work.  It’s important to understand that while our goal is to start a quarterly release cycle for OpenCrowbar, the workload release cycles (including hardware) are NOT tied to OpenCrowbar.  The workloads choose which OpenCrowbar release they target.  From Crowbar v1, we learned that Crowbar needed to be independent of the workload releases, so we want OpenCrowbar to focus on maintaining a strong ops platform.
This release marks four years of hard-earned Crowbar v1 deployment experience and two years of v2 design, redesign and implementation.  I’ve talked with DevOps teams from all over the world and listened to their pains and needs.  We have a long way to go before we’re deploying 1000-node OpenStack and Hadoop clusters, but OpenCrowbar Anvil significantly moves the needle in that direction.
Thanks to the Crowbar community (Dell and SUSE especially) for nurturing the project, and congratulations to the OpenCrowbar team for getting us to this amazing place.

Mayflies and Dinosaurs (extending Puppies and Cattle)

Josh McKenty and I were discussing the common misconception of the “Puppies and Cattle” analogy. His position is not anti-puppy! He believes puppies are sometimes unavoidable and should be isolated into portable containers (VMs) so they can be shuffled around seamlessly. His more provocative point is that we want our underlying infrastructure to be cattle so it remains highly elastic and flexible. More cattle means a more resilient system. To me, this is a fundamental CloudOps design objective.

We realized that the perfect cloud infrastructure would structurally discourage the creation of puppies.

Imagine a cloud in which servers were automatically decommissioned after a week of use. In a sort of anti-SLA, any VM running for more than 168 hours would be (gracefully) terminated. This would force a constant churn of resources within the infrastructure that enables true cattle-like management. This cloud would be able to very gracefully rebalance load and handle disruptive management operations because the workloads are designed for the churn.

We called these servers mayflies due to their limited life span.
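A minimal sketch of that anti-SLA (the `cloud` client and its methods are hypothetical placeholders, not a real SDK): scan the running servers and gracefully retire anything that has lived past 168 hours.

```python
# Illustration only: `cloud` is a hypothetical client object, not a real SDK.
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=168)   # the one-week "anti-SLA"

def reap_mayflies(cloud):
    now = datetime.now(timezone.utc)
    for server in cloud.list_servers():          # assumes launched_at is tz-aware
        if now - server.launched_at > MAX_AGE:
            cloud.drain(server)                  # graceful: drain the workload first
            cloud.terminate(server)
            print(f"terminated mayfly {server.id} after one week")
```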

While this approach requires a high degree of automation, the most successful cloud operators I have met are effectively building workloads with this requirement in mind. If we require application workloads to be elastic and fault-resilient, then we have a much higher degree of flexibility with the underlying infrastructure. I’ve seen this in practice with several OpenStack clouds: operators who helped applications deploy using automation were able to decommission “old” clouds much more gracefully. They effectively turned their entire cloud into a cow. Sadly, the ones without that investment puppified™ the ops infrastructure and created a much more brittle environment.

The opposite of a mayfly is the dinosaur: a server that is so brittle and locked that the slightest disturbance wipes out everything it touches.

Dinosaurs are puppies grown into a T-Rex with rows of massive razor-sharp teeth and tiny manicured hands. These are systems so unique and historical that there’s no way to recreate them if there’s a failure. The original maintainer’s exit happy hour was celebrated by people who were laid off two CEOs ago. The impact of dinosaurs goes beyond their operational risk; they are typically impossible to extend or maintain and, consequently, ossify other servers around them. This type of server drains elasticity from your ops team.

Puppies do not grow up to become dogs, they become dinosaurs.

It’s a classic lean adage to do hard things more frequently. Perhaps it’s time to start creating mayflies in your ops infrastructure.