How do platforms die? One step at a time [the Fidelity Gap]

The RackN team is working on the “Start to Scale” position for Digital Rebar that targets the IT industry-wide “fidelity gap” problem.  When we started on the Digital Rebar journey back in 2011 with Crowbar, we focused on “last mile” problems in metal and operations.  Only in the last few months did we recognize the importance of automating smaller “first mile” desktop and lab environments.

A fidelityFidelity Gap gap is created when work done on one platform, a developer laptop, does not translate faithfully to the next platform, a QA lab.   Since there are gaps at each stage of deployment, we end up with the ops staircase of despair.

These gaps hide defects until they are expensive to fix and make it hard to share improvements.  Even worse, they keep teams from collaborating.

With everyone trying out Container Orchestration platforms like Kubernetes, Docker Swarm, Mesosphere or Cloud Foundry (all of which we deploy, btw), it’s important that we can gracefully scale operational best practices.

For companies implementing containers, it’s not just about turning their apps into microservice-enabled immutable-rock stars: they also need to figure out how to implement the underlying platforms at scale.

My example of fidelity gap harm is OpenStack’s “all in one, single node” DevStack.  There is no useful single system OpenStack deployment; however, that is the primary system for developers and automated testing.  This design hides production defects and usability issues from developers.  These are issues that would be exposed quickly if the community required multi-instance development.  Even worse, it keeps developers from dealing with operational consequences of their decisions.

What are we doing about fidelity gaps?  We’ve made it possible to run and faithfully provision multi-node systems in Digital Rebar on a relatively light system (16 Gb RAM, 4 cores) using VMs or containers.  That system can then be fully automated with Ansible, Chef, Puppet and Salt.  Because of our abstractions, if deployment works in Digital Rebar then it can scale up to 100s of physical nodes.

My take away?  If you want to get to scale, start with the end in mind.

Introducing Digital Rebar. Building strong foundations for New Stack infrastructure

digital_rebarThis week, I have the privilege to showcase the emergence of RackN’s updated approach to data center infrastructure automation that is container-ready and drives “cloud-style” DevOps on physical metal.  While it works at scale, we’ve also ensured it’s light enough to run a production-fidelity deployment on a laptop.

You grow to cloud scale with a ready-state foundation that scales up at every step.  That’s exactly what we’re providing with Digital Rebar.

Over the past two years, the RackN team has been working on microservices operations orchestration in the OpenCrowbar code base.  By embracing these new tools and architecture, Digital Rebar takes that base into a new directions.  Yet, we also get to leverage a scalable heterogeneous provisioner and integrations for all major devops tools.  We began with critical data center automation already working.

Why Digital Rebar? Traditional data center ops is being disrupted by container and service architectures and legacy data centers are challenged with gracefully integrating this new way of managing containers at scale: we felt it was time to start a dialog the new foundational layer of scale ops.

Both our code and vision has substantially diverged from the groundbreaking “OpenStack Installer” MVP the RackN team members launched in 2011 from inside Dell and is still winning prizes for SUSE.

We have not regressed our leading vendor-neutral hardware discovery and configuration features; however, today, our discussions are about service wrappers, heterogeneous tooling, immutable container deployments and next generation platforms.

Over the next few days, I’ll be posting more about how Digital Rebar works (plus video demos).

RackN fills holes with Drill Release

Originally posted on :

Drill Man! by [creative commons] Drill Man! by [creative commons] We’re so excited about our in-process release that we’ve been relatively quiet about the last OpenCrowbar Drill release (video tour here).  That’s not a fair reflection of the level of capability and maturity reflected in the code base; yes, Drill’s purpose was to set the stage for truly ground breaking ops automation work in the next release (“Epoxy”).

So, what’s in Drill?  Scale and Containers on Metal Workloads!  [official release notes]

The primary focus for this release was proving our functional operations architectural pattern against a wide range of workloads and that is exactly what the RackN team has been doing with Ceph, Docker Swarm, Kubernetes, CloudFoundry and StackEngine workloads.

In addition to workloads, we put the platform through its paces in real ops environments at scale.  That resulted in even richer network configurations and options plus performance…

View original 100 more words

Deploy to Metal? No sweat with RackN new Ansible Dynamic Inventory API

Content originally posted by Ansibile & RackN so I added a video demo.  Also, see Ansible’s original post for more details about the multi-vendor “Simple OpenStack Initiative.”

The RackN team takes our already super easy Ansible integration to a new level with added SSH Key control and dynamic inventory with the recent OpenCrowbar v2.3 (Drill) release.  These two items make full metal control more accessible than ever for Ansible users.

The platform offers full key management.  You can add keys at the system. deployment (group of machines) and machine levels.  These keys are operator settable and can be added and removed after provisioning has been completed.  If you want to control access to groups on a servers or group of server basis, OpenCrowbar provides that control via our API, CLI and UI.

We also provide a API path for Ansible dynamic inventory.  Using the simple Python client script (reference example), you can instantly a complete upgraded node inventory of your system.  The inventory data includes items like number of disks, cpus and amount of RAM.  If you’ve grouped machines in OpenCrowbar, those groups are passed to Ansible.  Even better, the metadata schema includes the networking configuration and machine status.

With no added configuration, you can immediately use Ansible as your multi-server CLI for ad hoc actions and installation using playbooks.

Of course, the OpenCrowbar tools are also available if you need remote power control or want a quick reimage of the system.

RackN respects that data centers are heterogenous.  Our vision is that your choice of hardware, operating system and network topology should not break devops deployments!  That’s why we work hard to provide useful abstracted information.  We want to work with you to help make sure that OpenCrowbar provides the right details to create best practice installations.

For working with bare metal, there’s no simpler way to deliver consistent repeatable results

Transitioning from a Bossy Boss into a Digital Age Leader [Series Conclusion]


We hope you’ve enjoyed our discussion about digital management over the last seven posts. This series was born of our frustration with patterns of leadership in digital organizations: overly directing leaders stifle their team while hands-off leaders fail to provide critical direction. Neither culture is leading effectively!

Digital managers have to be two things at once

We felt that our “cultural intuition” is failing us.  That drove us to describe what’s broken and how to fix it.

Digital work and workers operate in a new model where top-down management is neither appropriate nor effective. To point, many digital workers actively resist being given too much direction, rules or structure. No, we are not throwing out management; on the contrary, we believe management is more important than ever, but changes to both work and workers has made it much harder than before.

That’s especially true when Boomers and Millennials try to work together because of differences in leadership experience and expectation. As Brad is always pointing out in his book Liquid Leadership, “what motivates a Millennial will not motivate a Boomer,” or even a Gen Xer.

Millennials may be so uncomfortable having to set limits and enforce decisions that they avoid exerting the very leadership that digital workers need! While GenX and Boomers may be creating and expecting unrealistic deadlines simply because they truly do not understand the depth of the work involved.

So who’s right and who’s wrong? As we’ve pointed out in previous posts, it’s neither! Why? Because unlike Industrial Age Models, there is no one way to get something done in The Information Age.

We desperately need a management model that works for everyone. How does a digital manager know when it’s time to be directing? If you’ve communicated a shared purpose well then you are always at liberty to 1) ask your team if this is aligned and 2) quickly stop any activity that is not aligned.

The trap we see for digital managers who have not communicated the shared goals is that they lack the team authority to take the lead.

We believe that digital leadership requires finding a middle ground using these three guidelines:

  1. Clearly express your intent and trust, don’t force, your team will follow it
  2. Respect your teams’ ability to make good decisions around the intent.
  3. Don’t be shy to exercise your authority when your team needs direction

Digital management is hard: you don’t get the luxury of authority or the comfort of certainty.

If you are used to directing then you have to trust yourself to communicate clearly at an abstract level and then let go of the details. If you are used to being hands-off then you have to get over being specific and assertive when the situation demands it.

Our frustration was that neither Boomer nor Millennial culture is providing effective management. Instead, we realized that elements of both are required. It’s up to the digital manager to learn when each mode is required.

Thank you for following along. It has been an honor.

DefCore Update – slowly taming the Interop hydra.

Last month, the OpenStack board charged the DefCore committee to tighten the specification. That means adding more required capabilities to the guidelines and reducing the number of exceptions (“flags”).  Read the official report by Chris Hoge.

Cartography by Dave McAlister is licensed under a. Creative Commons Attribution 4.0 International License.

It turns out interoperability is really, really hard in heterogenous environments because it’s not just about API – implementation choices change behavior.

I see this in both the cloud and physical layers. Since OpenStack is setup as a multi-vendor and multi-implementation (private/public) ecosystem, getting us back to a shared least common denominator is a monumental challenge. I also see a similar legacy in physical ops with OpenCrowbar where each environment is a snowflake and operators constantly reinvent the same tooling instead of sharing expertise.

Lack of commonality means the industry wastes significant effort recreating operational knowledge for marginal return. Increasing interop means reducing variations which, in turn, increases the stakes for vendors seeking differentiation.

We’ve been working on DefCore for years so that we could get to this point. Our first real Guideline, 2015.03, was an intentionally low bar with nearly half of the expected tests flagged as non-required. While the latest guidelines do not add new capabilities, they substantially reduce the number of exceptions granted. Further, we are in process of adding networking capabilities for the planned 2016.01 guideline (ready for community review at the Tokyo summit).

Even though these changes take a long time to become fully required for vendors, we can start testing interoperability of clouds using them immediately.

While, the DefCore guidelines via Foundation licensing policy does have teeth, vendors can take up to three years [1] to comply. That may sounds slow, but the real authority of the program comes from customer and vendor participation not enforcement [2].

For that reason, I’m proud that DefCore has become a truly diverse and broad initiative.

I’m further delighted by the leadership demonstrated by Egle Sigler, my co-chair, and Chris Hoge, the Foundation staff leading DefCore implementation.  Happily, their enthusiasm is also shared by many other people with long term DefCore investments including mid-cycle attendees Mark Volker (VMware), Catherine Deip (IBM) who is also a RefStack PTL, Shamail Tahir (EMC), Carol Barrett (Intel), Rocky Grober (Huawei), Van Lindberg (Rackspace), Mark Atwood (HP), Todd Moore (IBM), Vince Brunssen (IBM). We also had four DefCore related project PTLs join our mid-cycle: Kyle Mestery (Neutron), Nikhil Komawar (Glance),  John Dickinson (Swift), and Matthew Treinish (Tempest).

Thank you all for helping keep DefCore rolling and working together to tame the interoperability hydra!

[1] On the current schedule – changes will now take 1 year to become required – vendors have a three year tail! Three years? Since the last two Guideline are active, the fastest networking capabilities will be a required option is after 2016.01 is superseded in January 2017. Vendors who (re)license just before that can use the mark for 12 months (until January 2018!)

[2] How can we make this faster? Simple, consumers need to demand that their vendor pass the latest guidelines. DefCore provides Guidelines, but consumers checkbooks are the real power in the ecosystem.

When Two Right Decisions Make Things Wrong [Digital Management Series, 7 of 8]


The Duality Trap is one digital management danger that’s so destructive, we felt this series would be incomplete without a discussion. It’s especially problematic for The Digital Native managers and often mishandled by traditionally trained ones too.

Each apple is delicious. Which would you choose?

Each apple is delicious. Which would you choose?

The Duality Trap occurs when there are multiple right answers to a question. How often does this happen? Every single time. In fact, it’s a side effect of good digital management. Why?

In hierarchical management, the boss is always right so there’s no duality. Since we’ve thrown out hierarchical decision making, every team action is potentially subject to review by everyone on the team. The very loose structure that allows individual autonomy and rapid response has the natural consequence of also creating cognitive friction when individuals approach problems differently.

These different approaches are generally all valid ways to progress.

Digital natives fundamentally understand choice duality and may present alternatives just to ensure team diversity. Unfortunately, while where may be multiple valid solutions, the team can only pick one [1]. Nine times out of ten, the team will simply pick and move on. In that outlier case, they are counting on you, their digital manager, to resolve the selection.

Here’s the trap: resolving a duality does not mean “picking the winner” because having a winner implies the choices were unequal. If you’re team is stuck then there are at least two good choices.

If you are a traditional manager, the temptation to become Ronald “the decider” Reagan is nearly irresistible. Under the title=authority to decide model, you must justify your salary with making a “right” decision. You’ve been waiting for this moment to exert your authority for days. But, unbeknownst to “the decider,” this big moment will immediately undermine the team’s autonomy. On the other hand, If you are a digital native then this is the moment you’ve been dreading because you’ve got to be decisive. Despite 5 to 10 really good choices, you have to make ONE. So, a digital native can appear to be indecisive. However, not deciding is the worst possible choice. So what should you do?

First, remember that teams are strengthened when they are clearly aligned around an intent.

Resolving the duality trap is an opportunity to emphasize your intent. The best approach is to ask your team to review the options again in light of your shared objectives. In many cases, they will be able to resolve the issue from that perspective. If not, then you should:

  1. validate all options could work
  2. have the team state desired outcomes that can be measured
  3. pick the option that most aligns with your intent
  4. ask if the option your team does choose fit the overall agenda of; speed of delivery but quality drops, quality of deep diving into the project (upping the quality) but you may miss a crucial deadline (this may narrow down your choices.
  5. ask the team to monitor for the results

In this case, even as you are driving a decision, you are still sharing the responsibility for the outcome with the team. It’s important for the team that you focus on the desired results and not on which course was chosen. It is very likely that any of the choices would work out and achieve positive outcomes.

So it’s OK to get out of the trap of picking “best” options when there are multiple right choices.  

In an age of ambiguity, it is easy to fall into the duality trap. Just remember, there is no one way to get it all done these days. Which means a GREAT people manager realizes 2 things; a) your people need more of your support than ever. This comes in the form of training, finding solutions, and building a team that has the right chemistry. And b) getting out of their way.

Get ready as we wrap up this series in post 8: Transitioning from a Bossy Boss into a Digital Age Leader.

[1] If you are in a situation where you an allow divergence for minimal cost (like which phone brand people use) then do not force your team to choose!