DefCore Update – slowly taming the Interop hydra.

Last month, the OpenStack board charged the DefCore committee to tighten the specification. That means adding more required capabilities to the guidelines and reducing the number of exceptions (“flags”).  Read the official report by Chris Hoge.

Cartography by Dave McAlister is licensed under a Creative Commons Attribution 4.0 International License.

It turns out interoperability is really, really hard in heterogeneous environments because it’s not just about the API – implementation choices change behavior.

I see this in both the cloud and physical layers. Since OpenStack is set up as a multi-vendor and multi-implementation (private/public) ecosystem, getting us back to a shared least common denominator is a monumental challenge. I also see a similar legacy in physical ops with OpenCrowbar where each environment is a snowflake and operators constantly reinvent the same tooling instead of sharing expertise.

Lack of commonality means the industry wastes significant effort recreating operational knowledge for marginal return. Increasing interop means reducing variations which, in turn, increases the stakes for vendors seeking differentiation.

We’ve been working on DefCore for years so that we could get to this point. Our first real Guideline, 2015.03, was an intentionally low bar with nearly half of the expected tests flagged as non-required. While the latest guidelines do not add new capabilities, they substantially reduce the number of exceptions granted. Further, we are in the process of adding networking capabilities for the planned 2016.01 guideline (ready for community review at the Tokyo summit).

Even though these changes take a long time to become fully required for vendors, we can start testing interoperability of clouds using them immediately.

While the DefCore guidelines, via Foundation licensing policy, do have teeth, vendors can take up to three years [1] to comply. That may sound slow, but the real authority of the program comes from customer and vendor participation, not enforcement [2].

For that reason, I’m proud that DefCore has become a truly diverse and broad initiative.

I’m further delighted by the leadership demonstrated by Egle Sigler, my co-chair, and Chris Hoge, the Foundation staff leading DefCore implementation.  Happily, their enthusiasm is also shared by many other people with long-term DefCore investments including mid-cycle attendees Mark Volker (VMware), Catherine Deip (IBM) who is also a RefStack PTL, Shamail Tahir (EMC), Carol Barrett (Intel), Rocky Grober (Huawei), Van Lindberg (Rackspace), Mark Atwood (HP), Todd Moore (IBM), and Vince Brunssen (IBM). We also had four DefCore-related project PTLs join our mid-cycle: Kyle Mestery (Neutron), Nikhil Komawar (Glance),  John Dickinson (Swift), and Matthew Treinish (Tempest).

Thank you all for helping keep DefCore rolling and working together to tame the interoperability hydra!

[1] On the current schedule – changes will now take 1 year to become required – vendors have a three year tail! Three years? Since the last two Guidelines are active, the soonest that networking capabilities will be a required option is after 2016.01 is superseded in January 2017. Vendors who (re)license just before that can use the mark for 12 months (until January 2018!).

[2] How can we make this faster? Simple, consumers need to demand that their vendors pass the latest guidelines. DefCore provides Guidelines, but consumers’ checkbooks are the real power in the ecosystem.
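
The end of the tail in footnote [1] is simple date arithmetic, so here is a minimal sketch of it. The footnote only gives months, so the first-of-month dates are my assumption, and `add_months` is a hypothetical helper written just for this illustration.

```python
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a first-of-month date forward by a whole number of months."""
    total = start.year * 12 + (start.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


# From footnote [1]: 2016.01 is superseded in January 2017 (day is assumed).
GUIDELINE_SUPERSEDED = date(2017, 1, 1)
# From footnote [1]: a (re)licensed vendor can use the mark for 12 months.
LICENSE_TERM_MONTHS = 12

# A vendor who (re)licenses just before the guideline is superseded keeps
# the mark until roughly this date.
mark_expires = add_months(GUIDELINE_SUPERSEDED, LICENSE_TERM_MONTHS)
print(mark_expires.strftime("%B %Y"))
```

Changing either constant shows how a shorter license term or a faster guideline cadence would shorten the tail.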

When Two Right Decisions Make Things Wrong [Digital Management Series, 7 of 8]


The Duality Trap is one digital management danger that’s so destructive, we felt this series would be incomplete without a discussion. It’s especially problematic for Digital Native managers and often mishandled by traditionally trained ones too.

Each apple is delicious. Which would you choose?

The Duality Trap occurs when there are multiple right answers to a question. How often does this happen? Every single time. In fact, it’s a side effect of good digital management. Why?

In hierarchical management, the boss is always right so there’s no duality. Since we’ve thrown out hierarchical decision making, every team action is potentially subject to review by everyone on the team. The very loose structure that allows individual autonomy and rapid response has the natural consequence of also creating cognitive friction when individuals approach problems differently.

These different approaches are generally all valid ways to progress.

Digital natives fundamentally understand choice duality and may present alternatives just to ensure team diversity. Unfortunately, while there may be multiple valid solutions, the team can only pick one [1]. Nine times out of ten, the team will simply pick and move on. In that outlier case, they are counting on you, their digital manager, to resolve the selection.

Here’s the trap: resolving a duality does not mean “picking the winner” because having a winner implies the choices were unequal. If your team is stuck then there are at least two good choices.

If you are a traditional manager, the temptation to become Ronald “the decider” Reagan is nearly irresistible. Under the title=authority to decide model, you must justify your salary by making a “right” decision. You’ve been waiting for days for this moment to exert your authority. But, unbeknownst to “the decider,” this big moment will immediately undermine the team’s autonomy. On the other hand, if you are a digital native then this is the moment you’ve been dreading because you’ve got to be decisive. Despite 5 to 10 really good choices, you have to make ONE. So, a digital native can appear to be indecisive. However, not deciding is the worst possible choice. So what should you do?

First, remember that teams are strengthened when they are clearly aligned around an intent.

Resolving the duality trap is an opportunity to emphasize your intent. The best approach is to ask your team to review the options again in light of your shared objectives. In many cases, they will be able to resolve the issue from that perspective. If not, then you should:

  1. validate all options could work
  2. have the team state desired outcomes that can be measured
  3. pick the option that most aligns with your intent
  4. ask whether the option your team chooses fits the overall agenda: speed of delivery at some cost to quality, or a deeper dive that raises quality but may miss a crucial deadline (this may narrow down your choices)
  5. ask the team to monitor for the results

In this case, even as you are driving a decision, you are still sharing the responsibility for the outcome with the team. It’s important for the team that you focus on the desired results and not on which course was chosen. It is very likely that any of the choices would work out and achieve positive outcomes.

So it’s OK to get out of the trap of picking “best” options when there are multiple right choices.  

In an age of ambiguity, it is easy to fall into the duality trap. Just remember, there is no one way to get it all done these days. That means a GREAT people manager realizes two things: a) your people need more of your support than ever, in the form of training, finding solutions, and building a team that has the right chemistry; and b) you need to get out of their way.

Get ready as we wrap up this series in post 8: Transitioning from a Bossy Boss into a Digital Age Leader.

[1] If you are in a situation where you can allow divergence for minimal cost (like which phone brand people use) then do not force your team to choose!

Setting The Tempo: 12 Tips for Winning at Digital Management [post 6 of 8]


Our advice comes down to a very simple concept: Today’s leaders MUST walk the talk.

Management authority in digital work comes from being the owner of the intention. Your team is working towards a shared goal. That is their motivation and it’s required for digital managers to provide a clear goal – this is what we call the intent of your organization.  So a manager’s job comes down to sharing your organization’s intent.

Like the 80’s “management by walking around,” walking the intent means that you spend most of your time helping your team understand the goals, not telling them how to achieve greatness. Managers provide alignment, not direction.

What does digital management look like:

  1. Pick a tone and repeat, repeat, repeat – You are the Jazz leader setting the tempo and harmony, your consistency allows others to improvise. If you set the stage, you can encourage others to take the lead off your base. Strong management is not about control. Strong management is about support. Support that streamlines productivity.
  2. Encourage cross-communication – Better, make people talk to each other. It’s OK to proxy, but don’t carry opinions for your reports as if they were your own. And don’t be upset if someone goes “above” you in the hierarchy. There is no such thing anymore.
  3. 1-to-1 communication is healthy – do a lot of it. 1) Don’t make decisions that way. 2) Don’t get stuck having 1-to-1 with the same people. 3) A lot of informal/small interactions are OK. Diversity is key. You may have to replay/rehash/proxy a whole 1-to-1 discussion for your team.
  4. Learn your Culture – This may be the hardest thing for leaders to do because they always assumed that culture didn’t matter. In today’s work environments, culture matters more than you could imagine. Just ask Peter Drucker!  Knowing who does what is important. Knowing how each individual communicates and what their strengths and weaknesses are is even more important.
  5. “Yes, AND…” The cornerstone of Improv is about saying yes to ideas, even fragile ones. Then it becomes about testing, experimenting and pushing boundaries. This is where innovation comes from. Saying yes and, instead of no but, ensures things get customized. Yes, you might fail, but fail fast, and move on.
  6. Be forceful on time keeping – make sure debates and discussions have known upfront limitations. Movement is good, uncertainty is frustrating.
  7. Check and adjust – check and don’t change is just as important. The key is to involve your team in the check-ups.  When you decide not to adjust, that’s also a decision to communicate.
  8. Don’t apologize for or delay making top down decisions – not all actions are team discussions. Sometimes, the team process is tiring and hard so the most strident voice wins.  No team always agrees so don’t be afraid to play the role of arbitrator.
  9. Fix personnel issues quickly – allowing people to abuse the system drives away the behaviors that you want. Focus instead on strengths, and become the mediator.  Be very sensitive to stereotypes and even mild name calling. Focus on the work, the outcomes and how everyone can do better. Then hold them accountable to their word.
  10. Ask people to define their own expected results – then keep them accountable. When they miss, have a no-blame post-mortem that focuses on improvement. A technique called the Feedback Sandwich helps by starting a difficult conversation with something a team member did right, then working your way through the conversation to the “meat” part of the sandwich: what they did that needed help, improvement or an admission that they might NOT be the person best qualified for that task. Let them state this on their own by asking better questions.
  11. Assume failures are from system, not individual – work together to fix the system. Communication and hand-offs are usually the biggest fails when meeting deadlines. Find solutions from the team. After all, who knows development operations better than the people working in it?
  12. Be careful about highlighting “grenade divers” [1] – All organizations need heroes, but feeding them will erode team performance. Once, they may have saved the day. When it becomes a habit, they might be creating the chaos they are always solving in order to have job security. After all, they seem to be the only one who can solve that problem…every time. In a symphony only a few get the solo. In Jazz, you play both solo and support. That flexibility gives your team strength.

These ideas may push you outside your comfort zone.  Find a peer for support!  You need to be strong to lead from the back.

Even without formal hierarchies, manager roles are still needed to drive value and make the hard calls. Before, that translated into making all the decisions. The new challenge is to allow for free falls (post 4) while sharing the responsibility.

If you walk your intent and communicate goals consistently then your team will be able to follow your lead.

Next up: When Two Right Decisions Make Things Wrong

[1] Grenade Diving or “wearing the cape” is a team anti-pattern where certain individuals are compelled to take dramatic actions to rescue an adverse situation.  While they often appear to be team heroes (Brad saved the batch of cookies again!  Who forgot to set the timer?), the result always distracts from the people who work hard to avoid emergencies.  We want people to step up when required but it should not become a pattern.

OpenCrowbar 2.3 (Drill) Overview Videos

Last week, Scott Jensen, RackN COO, uploaded a batch of OpenCrowbar install and demo videos.  I’ve presented them in reverse chronological order so you can see what OpenCrowbar looks like before you run the installation process.

But…If you want to start downloading while you watch, here are the docs.

Please reach out on chat, email or IRC (Freenode #crowbar) channels during your install and let us know how it’s going!

OpenCrowbar Basics & Provisioning (recommended start)

OpenCrowbar Install

OpenCrowbar Setup the Environment (install prep)

As Docker rises above (and disrupts) clouds, I’m thinking about their community landscape

Watching the lovefest of DockerConf last week had me digging up my April 2014 “Can’t Contain(erize) the Hype” post.  There’s no doubt that Docker (and containers more broadly) is delivering on its promise.  I was impressed with the container community navigating towards an open platform in RunC and vendor adoption of the trusted container platforms.

I’m a fan of containers and their potential; yet, watching remotely, the scope and exuberance of Docker partnerships seem out of proportion with the current capabilities of the technology.

The latest update to the Docker technology, v1.7, introduces a lot of important network, security and storage features.  The price of all that progress is disruption to ongoing work and integration to the ecosystem.

There are always two sides to the rapid innovation coin: “Sweet, new features!  Meh, breaking changes to absorb.”

Docker Ecosystem Explained

There remains a confusion between Docker the company and Docker the technology.  I like how the chart (right) maps out potential areas in the Docker ecosystem.  There are clearly a lot of places for companies to monetize the technology; however, it’s not as clear whether the company will be able to cede lucrative regions, like orchestration, so that they become a competitive landscape.

While Docker has clearly delivered a lot of value in just a year, they have a fair share of challenges ahead.  

If OpenStack is a leading indicator, we can expect to see vendor battlegrounds forming around networking and storage.  Docker (the company) has a chance to show leadership and build community here yet could cause harm by giving up the arbitrator role to be a contender instead.

One thing that would help control the inevitable border skirmishes will be clear definitions of core, ecosystem and adjacencies.  I see Docker blurring these lines with some of their tools around orchestration, networking and storage.  I believe that was part of their now-suspended kerfuffle with CoreOS.

Thinking a step further, parts of the Docker technology (RunC) have moved over to Linux Foundation governance.  I wonder if the community will drive additional shared components into open governance.  Looking at Node.js, there’s clear precedent and I wonder if Joyent’s big Docker plans have them thinking along these lines.

Is there something between a Container and VM? Apparently, yes.

The RackN team has started designing reference architectures for containers on metal (discussed in an earlier post) with the hope of finding a hardware design that is cost and performance optimized for containers instead of simply repurposing premium virtualized cloud infrastructure.  That discussion turned up something unexpected…

That post generated a Twitter thread that surfaced ClearLinux, among others, as hardware-enabled (Intel VT-x) alternatives to containers.

This container alternative likely escapes the notice of many because it requires hardware capabilities that are not exposed, or only partially exposed, inside cloud virtual machines; however, it could be a very compelling story for operators looking for containers on metal.

Here’s my basic understanding: these technologies offer container-like light-weight & elastic behavior with the isolation provided by virtual machines.  This is possible because they use CPU capabilities to isolate environments.

7/3 Update: Feedback about this post has largely been “making it easier for VMs to run docker automatically is not interesting.”  What’s your take on it?

Details behind RackN Kubernetes Workload for OpenCrowbar

Since I’ve already bragged about how this workload validates OpenCrowbar’s deep ops impact, I can get right down to the nuts and bolts of what RackN CTO Greg Althaus managed to pack into this workload.

Like any scale install, once you’ve got a solid foundation, the actual installation goes pretty quickly.  In Kubernetes’ case, that means creating strong networking and etcd configuration.

Here’s a 30 minute video showing the complete process from O/S install to working Kubernetes:

Here are the details:

Clustered etcd – distributed key store

etcd is the central data service that maintains the state for the Kubernetes deployment.  The strength of the installation rests on the correctness of etcd.  The workload builds an etcd cluster and synchronizes all the instances as nodes are added.
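
To make the clustering step a bit more concrete, here’s a rough sketch of how the per-node etcd command line could be assembled. This is not the workload’s actual code: the node names, addresses and helper functions are hypothetical, and only the etcd flags themselves (`--name`, `--initial-cluster`, `--initial-cluster-state` and the peer URL flags) are real.

```python
from typing import List, Mapping

PEER_PORT = 2380  # etcd's conventional peer port


def initial_cluster(peers: Mapping[str, str]) -> str:
    """Build the value of etcd's --initial-cluster flag from name -> IP pairs."""
    return ",".join(
        f"{name}=http://{ip}:{PEER_PORT}" for name, ip in sorted(peers.items())
    )


def etcd_args(name: str, peers: Mapping[str, str], joining: bool = False) -> List[str]:
    """Return the command line for one etcd member.

    joining=False bootstraps a brand new cluster; joining=True is for a
    member that starts after the cluster already exists.
    """
    peer_url = f"http://{peers[name]}:{PEER_PORT}"
    return [
        "etcd",
        "--name", name,
        "--initial-advertise-peer-urls", peer_url,
        "--listen-peer-urls", peer_url,
        "--initial-cluster", initial_cluster(peers),
        "--initial-cluster-state", "existing" if joining else "new",
    ]


# Hypothetical three-node cluster; names and addresses are illustrative only.
peers = {"node1": "10.0.0.11", "node2": "10.0.0.12", "node3": "10.0.0.13"}
print(" ".join(etcd_args("node1", peers)))
```

Every member gets the same `--initial-cluster` value, which is why the peer list has to be kept in sync as nodes are added. A member joining a running cluster also has to be registered first (for example with `etcdctl member add`) before it starts with the `existing` state.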

Networking with Flannel and Proxy

Flannel is the default overlay network for Kubernetes that handles IP assignment and intercontainer communication with UDP encapsulation.  The workload configures Flannel for networking with etcd as the backing store.
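
For a feel of what “etcd as the backing store” means, here’s a minimal sketch of the network definition that Flannel reads from etcd. The key is Flannel’s default location; the overlay range and the helper function are my assumptions for illustration, not values taken from the workload.

```python
import ipaddress
import json

# Key that flannel reads by default (etcd prefix /coreos.com/network).
FLANNEL_CONFIG_KEY = "/coreos.com/network/config"


def flannel_config(network: str, backend_type: str = "udp") -> str:
    """Return the JSON document flannel expects under FLANNEL_CONFIG_KEY."""
    ipaddress.ip_network(network)  # raises ValueError on a malformed CIDR
    return json.dumps(
        {"Network": network, "Backend": {"Type": backend_type}}, sort_keys=True
    )


# 10.1.0.0/16 is an illustrative overlay range, not the workload's value.
print(FLANNEL_CONFIG_KEY, flannel_config("10.1.0.0/16"))
```

With the etcd v2 tooling, that value could be stored with something like `etcdctl set /coreos.com/network/config '<json>'`; each Flannel daemon then leases its own subnet out of the overall range.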

An important part of the overall networking setup is the configuration of a proxy so that the nodes can get external access for Docker image repos.

Docker Setup

We install the latest Docker on the system.  That may not sound very exciting; however, Docker iterates faster than most Linux images so it’s important that we keep you current.

Master & Minion Kubernetes Nodes

Using etcd as a backend, the workload sets up one (or more) master nodes with the API server and other master services.  When the minions are configured, they are pointed to the master API server(s).  You get to choose how many masters and which systems become masters.  If you did not choose correctly, it’s easy to rinse and repeat.

Highly Available using DNS Round Robin

As the workload configures API servers, it also adds them to a DNS round robin pool (made possible by [new DNS integrations]).  Minions are configured to use the shared DNS name so that they automatically round-robin all the available API servers.  This ensures both load balancing and high availability.  The pool is automatically updated when you add or remove servers.
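
Here’s a small in-memory sketch of the round-robin behavior described above. It only simulates the rotation; in the real setup the DNS server does this by returning multiple records for the shared name. The class name and addresses are hypothetical.

```python
from typing import List


class RoundRobinPool:
    """In-memory stand-in for a DNS name that maps to several API servers."""

    def __init__(self) -> None:
        self._servers: List[str] = []
        self._cursor = 0

    def add(self, address: str) -> None:
        """Register an API server; duplicates are ignored."""
        if address not in self._servers:
            self._servers.append(address)

    def remove(self, address: str) -> None:
        """Drop an API server from the rotation if it is present."""
        if address in self._servers:
            self._servers.remove(address)

    def pick(self) -> str:
        """Return the next server in rotation."""
        if not self._servers:
            raise LookupError("no API servers registered")
        address = self._servers[self._cursor % len(self._servers)]
        self._cursor += 1
        return address


pool = RoundRobinPool()
pool.add("10.0.0.11")
pool.add("10.0.0.12")
print([pool.pick() for _ in range(3)])  # requests alternate across both masters

pool.remove("10.0.0.11")  # a master is taken out of the pool
print(pool.pick())  # remaining requests go to the surviving master
```

One caveat with plain round robin: it spreads requests but does not health check, so a dead server stays in rotation until it is removed from the pool.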

Installed on Real Metal

It’s worth including that we’ve done cluster deployments of 20 physical nodes (with 80 in process!).  Since OpenCrowbar architecture abstracts the vendor hardware, the configuration is multi-vendor and heterogeneous.  That means that this workload (and our others) delivers tangible scale implementations quickly and reliably.

Future Work for Advanced Networking

Flannel is really very basic SDN.  We’d like to see additional networking integrations including OpenContrail as per Pedro Marques’ work.

At this time, we are not securing communication with etcd.  That requires key management, which is a more advanced topic.

Why is RackN building this?  We are a physical ops automation company.

We are seeking to advance the state of data center operations by helping get complex scale platforms operationalized.  We want to work with the relevant communities to deliver repeatable best practices around next-generation platforms like Kubernetes.  Our speciality is in creating a general environment for ops success: we work with partners who are experts on using the platforms.

We want to engage with potential users before we turn this into an open community project; however, we’ve chosen to make the code public.  Please get us involved (community forum)!  You’ll need a working OpenCrowbar or RackN Enterprise install as a pre-req and we want to help you be successful.