Open Source as Reality TV and Burning Data Centers [gcOnDemand podcast notes]

During the OpenStack summit, Eric Wright (@discoposse) and I talked about a wide range of topics from scoring success of OpenStack early goals to burning down traditional data centers.

Why burn down your data center (and move to public cloud)? Because your ops process are too hard to change. Rob talks about how hybrid provides a path if we can made ops more composable.

Here are my notes from the audio podcast (source):

1:30 Why “zehicle” as a handle? Portmanteau from electrics cars… zero + vehicle

Let’s talk about OpenStack & Cloud…

  • OpenStack History
    • 2:15 Rob’s OpenStack history from Dell and Hyperscale
    • 3:20 Early thoughts of a Cloud API that could be reused
    • 3:40 The practical danger of Vendor lock-in
    • 4:30 How we implemented “no main corporate owner” by choice
  • About the Open in OpenStack
    • 5:20 Rob decomposes what “open” means because there are multiple meanings
    • 6:10 Price of having all open tools for “always open” choice and process
    • 7:10 Observation that OpenStack values having open over delivering product
    • 8:15 Community is great but a trade off. We prioritize it over implementation.
  • Q: 9:10 What if we started later? Would Docker make an impact?
    • Part of challenge for OpenStack was teaching vendors & corporate consumers “how to open source”
  • Q: 10:40 Did we accomplish what we wanted from the first summit?
    • Mixed results – some things we exceeded (like growing community) while some are behind (product adoption & interoperability).
  • 13:30 Interop, Refstack and Defcore Challenges. Rob is disappointed on interop based on implementations.
  • Q: 15:00 Who completes with OpenStack?
    • There are real alternatives. APIs do not matter as much as we thought.
    • 15:50 OpenStack vendor support is powerful
  • Q: 16:20 What makes OpenStack successful?
    • Big tent confuses the ecosystem & push the goal posts out
    • “Big community” is not a good definition of success for the project.
  • 18:10 Reality TV of open source – people like watching train wrecks
  • 18:45 Hybrid is the reality for IT users
  • 20:10 We have a need to define core and focus on composability. Rob has been focused on the link between hybrid and composability.
  • 22:10 Rob’s preference is that OpenStack would be smaller. Big tent is really ecosystem projects and we want that ecosystem to be multi-cloud.

Now, about RackN, bare metal, Crowbar and Digital Rebar….

  • 23:30 (re)Intro
  • 24:30 VC market is not metal friendly even though everything runs on metal!
  • 25:00 Lack of consistency translates into lack of shared ops
  • 25:30 Crowbar was an MVP – the key is to understand what we learned from it
  • 26:00 Digital Rebar started with composability and focus on operations
  • 27:00 What is hybrid now? Not just private to public.
  • 30:00 How do we make infrastructure not matter? Multi-dimensional hybrid.
  • 31:00 Digital Rebar is orchestration for composable infrastructure.
  • Q: 31:40 Do people get it?
    • Yes. Automation is moving to hybrid devops – “ops is ops” and it should not matter if it’s cloud or metal.
  • 32:15 “I don’t want to burn down my data center” – can you bring cloud ops to my private data center?

my 8 steps that would improve OpenStack Interop w/ AWS

I’ve been talking with a lot of OpenStack people about frustrating my attempted hybrid work on seven OpenStack clouds [OpenStack Session Wed 2:40].  This post documents the behavior Digital Rebar expects from the multiple clouds that we have integrated with so far.  At RackN, we use this pattern for both cloud and physical automation.

Sunday, I found myself back in front of the the Board talking about the challenge that implementation variation creates for users.  Ultimately, the question “does this harm users?” is answered by “no, they just leave for Amazon.”

I can’t stress this enough: it’s not about APIs!  The challenge is twofold: implementation variance between OpenStack clouds and variance between OpenStack and AWS.

The obvious and simplest answer is that OpenStack implementers need to conform more closely to AWS patterns (once again, NOT the APIs).

Here are the eight Digital Rebar node allocation steps [and my notes about general availability on OpenStack clouds]:

  1. Add node specific SSH key [YES]
  2. Get Metadata on Networks, Flavors and Images [YES]
  3. Pick correct network, flavors and images [NO, each site is distinct]
  4. Request node [YES]
  5. Get node PUBLIC address for node [NO, most OpenStack clouds do not have external access by default]
  6. Login into system using node SSH key [PARTIAL, the account name varies]
  7. Add root account with Rebar SSH key(s) and remove password login [PARTIAL, does not work on some systems]
  8. Remove node specific SSH key [YES]

These steps work on every other cloud infrastructure that we’ve used.  And they are achievable on OpenStack – DreamHost delivered this experience on their new DreamCompute infrastructure.

I think that this is very achievable for OpenStack, but we’re doing to have to drive conformance and figure out an alternative to the Floating IP (FIP) pattern (IPv6, port forwarding, or adding FIPs by default) would all work as part of the solution.

For Digital Rebar, the quick answer is to simply allocate a FIP for every node.  We can easily make this a configuration option; however, it feels like a pattern fail to me.  It’s certainly not a requirement from other clouds.

I hope this post provides specifics about delivering a more portable hybrid experience.  What critical items do you want as part of your cloud ops process?

OpenStack is caught in a snowstorm – it’s status quo for ops implementations to be snowflakes

OpenStack got into exactly the place we expected: operations started with fragmented and divergent data centers (aka snowflaked) and OpenStack did nothing to change that. Can we fix that? Yes, but the answer involves relying on Amazon as our benchmark.

In advance of my OpenStack Summit Demo/Presentation (video!) [slides], I’ve spent the last few weeks mapping seven (and counting) OpenStack implementations into the cloud provider subsystem of the Digital Rebar provisioning platform. Before I started working on adding OpenStack integration, RackN already created a hybrid DevOps baseline. We are able to run the same Kubernetes and Docker Swarm provisioning extensions on multiple targets including Amazon, Google, Packet and directly on physical systems (aka metal).

Before we talk about OpenStack challenges, it’s important to understand that data centers and clouds are messy, heterogeneous environments.

These variations are so significant and operationally challenging that they are the fundamental design driver for Digital Rebar. The platform uses a composable operational approach to isolate and then chain automation tasks together. That allows configurations, like networking, from infrastructure specific functions to be passed into common building blocks without user intervention.

Composability is critical because it allows operators to isolate variations into modular pieces and the expose common configuration elements. Since the pattern works successfully for crossing other clouds and metal, I anticipated success with OpenStack.

The challenge is that there is not “one standard OpenStack” implementation.  This issue is well documented under OpenStack as Project Shade.

If you only plan to operate a mono-cloud then these are not concerns; however, everyone I’ve met is using at least AWS and one other cloud. This operational fact means that AWS provides the common service behavior baseline. This is not an API statement – it’s about being able to operate on the systems delivered by the API.

While the OpenStack API worked consistently on each tested cloud (win for DefCore!), it frequently delivered systems that could not be deployed or were unusable for later steps.

While these are not directly OpenStack API concerns, I do believe that additional metadata in the API could help expose material configuration choices. The challenge becomes defining those choices in a reference architecture way. The OpenStack principle of leaving implementation choices open makes it challenging to drive these options to a narrow set of choices. Unfortunately, it means it is difficult to create an intra-OpenStack hybrid automation without hard-coded vendor identities or exploding configuration flags.

As series of individually reasonable options dominoes together to make to these challenges.  These are real issues that I made the integration difficult.

  • No default of externally accessible systems. I have to assign floating IPs (an anti-pattern for individual VMs) or be on the internal networks. No consistent naming pattern for networks, types (flavors) or starting images.  In several cases, the “private” network is the publicly accessible one and the “external” network is visible but unusable.
  • No consistent naming for access user accounts.  If I want to ssh to a system, I have to fail my first login before I learn the right user name.
  • No data to determine which networks provide which functions.  And there’s no metadata about which networks are public or private.  
  • Incomplete post-provisioning processes because they are left open to user customization.

There is a defensible and logical reason for each example above; sadly, those reasons do nothing to make OpenStack more operationally accessible.  While intra-OpenStack interoperability is helpful, I believe that ecosystems and users benefit from Amazon-like behavior.

What should you do?  Help broaden the OpenStack discussions to seek interoperability with the whole cloud ecosystem.


At RackN, we will continue to refine and adapt to these variations.  Creating a consistent experience that copes with variability is the raison d’etre for our efforts with Digital Rebar. That means that we ultimately use AWS as the yardstick for configuration of any infrastructure from physical, OpenStack and even Amazon!


One Cloud, Many Providers: The OpenStack Interop Challenge

We want OpenStack to work as a universal cloud API but it’s hard!  What’s the problem? 

Clouds DownThis post, written before the Tokyo Summit but not published, talks about how we got here without a common standard and offers some pointers.  At the Austin Summit, I’ve got a talk on hybrid Open Infrastructure Wednesday @ 2:40 where I talk specifically about solutions.  I’ve been working on multi-infrastructure hybrid – that means making ops portable between OpenStack, Google, Amazon, Physical and other options.

The Problem: At a fundamental level, OpenStack has yet to decide if it’s an infrastructure (IaaS) product or a open software movement.

Is there a path to be both? There are many vendors who are eager to sell you their distinct flavor of OpenStack; however, lack of consistency has frustrated users and operators. OpenStack faces internal and external competition if we do not address this fragmentation. Over the next few paragraphs, we’ll explore the path the Foundation has planned to offer users a consistent core product while fostering its innovative community.

How did we get down this path?  Here’s some background how how we got here.

Before we can discuss interoperability (interop), we need to define success for OpenStack because interop is the means, not the end. My mark for success is when OpenStack has created a sustainable market for products that rely on the platform. In business plan speak, we’d call that a serviceable available market (SAM). In practical terms, OpenStack is successful when businesses targets the platform as the first priority when building integrations over cloud behemoths: Amazon and VMware.

The apparent dominance of OpenStack in terms of corporate contribution and brand position does not translate into automatic long term success.  

While apparently united under a single brand, intentional technical diversity in OpenStack has lead to incompatibilities between different public and private implementations. While some of these issues are accidents of miscommunication, others were created by structural choices inherent to the project’s formation. No matter the causes, they frustrate users and limit the network effect of widespread adoption.

Technical diversity was both a business imperative and design objective for OpenStack during formation.

In order to quickly achieve critical mass, the project needed to welcome a diverse and competitive set of corporate sponsors. The commitment to support operating systems, multiple hypervisors, storage platforms and networking providers has been essential to the project’s growth and popularity. Unfortunately, it also creates combinatorial complexity and political headaches.

With all those variables, it’s best to think of interop as a spectrum.  

At the top of that spectrum is basic API compatibility and the boom is fully integrated operation where an application could run site unaware in multiple clouds simultaneously.  Experience shows that basic API compatibility is not sufficient: there are significant behavioral impacts due to implementation details just below the API layer that must also be part of any interop requirement.  Variations like how IPs are assigned and machines are initialized matter to both users and tools.  Any effort to ensure consistency must go beyond simple API checks to validate that these behaviors are consistent.

OpenStack enforces interop using a process known as DefCore which vendors are required to follow in order to use the trademark “OpenStack” in their product name.

The process is test driven – vendors are required to pass a suite of capability tests defined in DefCore Guidelines to get Foundation approval. Guidelines are published on a 6 month cadence and cover only a “core” part of OpenStack that DefCore has defined as the required minimum set. Vendors are encouraged to add and extend beyond that base which then leads for DefCore to expand the core based on seeing widespread adoption.

What is DefCore?  Here’s some background about that too!

By design, DefCore started with a very small set of OpenStack functionality.  So small in fact, that there were critical missing pieces like networking APIs from the initial guideline.  The goal for DefCore is to work through the coabout mmunity process to expand based identified best practices and needed capabilities.  Since OpenStack encourages variation, there will be times when we have to either accept or limit variation.  Like any shared requirements guideline, DefCore becomes a multi-vendor contract between the project and its users.

Can this work?  The reality is that Foundation enforcement of the Brand using DefCore is really a very weak lever. The real power of DefCore comes when customers use it to select vendors.

Your responsibility in this process is to demand compliance from your vendors. OpenStack interoperability efforts will fail if we rely on the Foundation to enforce compliance because it’s simply too late at that point. Think of the healthy multi-vendor WiFi environment: vendors often introduce products on preliminary specifications to get ahead of market. For success, OpenStack vendors also need to be racing to comply with the upcoming guidelines. Only customers can create that type of pressure.

From that perspective, OpenStack will only be as interoperable as the market demands.

That creates a difficult conundrum: without common standards, there’s no market and OpenStack will simply become vertical vendor islands with limited reach. Success requires putting shared interests ahead of product.

That brings us full circle: does OpenStack need to be both a product and a community?  Yes, it clearly does.  

The significant distinction for interop is that we are talking about a focus on the user community voice over the vendor and developer community.  For that, OpenStack needs to focus on product definition to grow users.

I want to thank Egle Sigler and Shamail Tahir for their early review of this post.  Even beyond the specific content, they have helped shape my views on this topics.  Now, I’d like to hear your thoughts about this!  We need to work together to address Interoperability – it’s a community thing.

OpenStack Brief Histories: Austin 2011 and DefCore

These two short items are sidebars for my “One Cloud, Many Providers: The OpenStack Interp Challenge” post.  They provide additional context for the more focused question in the post: “At a fundamental level, OpenStack has yet to decide if it’s an infrastructure product or a open software movement. Is there a path to be both?” 

Background 1: OpenStack, The Early Days

How did we get here?  It’s worth noting that 2011 OpenStack was structured as a heterogenous vendor playground.  At the inaugural OpenStack summit in Austin when the project was just forming around NASA’s Nova and Rackspace’s Swift projects, monolithic cloud stacks were a very real threat.  VMware and Amazon were the de facto standards but closed and proprietary.  The open alternatives, CloudStack (, Eucalyptus and OpenNebula were too tied to single vendors or lacking in scale.  Having a multi-vendor, multi-contributor project without a dictatorial owner was a critical imperative for the community and it continues to be one of the most distinctive OpenStack traits.

Background 2:  DefCore, The Community Interoperability Process

What is DefCore?  The name DefCore is a portmanteau of the committee’s job to “define core” functions of OpenStack.  The official explanation says “DefCore sets base requirements by defining 1) capabilities, 2) code and 3) must-pass tests for all OpenStack products. This definition uses community resources and involvement to drive interoperability by creating the minimum standards for products labeled OpenStack.”  Fundamentally, it’s an OpenStack Board committee with membership open to the community.  In very practical terms, DefCore picks which features and implementation details of OpenStack are required by the vendors; consequently, we’ve designed a governance process to ensure transparency and, hopefully, prevent individual vendors from exerting too much influence.

Post-OpenStack DefCore, I’m Chasing “open infrastructure” via cross-platform Interop

Like my previous DefCore interop windmill tilting, this is not something that can be done alone. Open infrastructure is a collaborative effort and I’m looking for your help and support. I believe solving this problem benefits us as an industry and individually as IT professionals.

2013-09-13_18-56-39_197So, what is open infrastructure?   It’s not about running on open source software. It’s about creating platform choice and control. In my experience, that’s what defines open for users (and developers are not users).

I’ve spent several years helping lead OpenStack interoperability (aka DefCore) efforts to ensure that OpenStack cloud APIs are consistent between vendors. I strongly believe that effort is essential to build an ecosystem around the project; however, in talking to enterprise users, I’ve learned that that their  real  interoperability gap is between that many platforms, AWS, Google, VMware, OpenStack and Metal, that they use everyday.

Instead of focusing inward to one platform, I believe the bigger enterprise need is to address automation across platforms. It is something I’m starting to call hybrid DevOps because it allows users to mix platforms, service APIs and tools.

Open infrastructure in that context is being able to work across platforms without being tied into one platform choice even when that platform is based on open source software. API duplication is not sufficient: the operational characteristics of each platform are different enough that we need a different abstraction approach.

We have to be able to compose automation in a way that tolerates substitution based on infrastructure characteristics. This is required for metal because of variation between hardware vendors and data center networking and services. It is equally essential for cloud because of variation between IaaS capabilities and service delivery models. Basically, those  minor  differences between clouds create significant challenges in interoperability at the operational level.

Rationalizing APIs does little to address these more structural differences.

The problem is compounded because the differences are not nicely segmented behind abstraction layers. If you work to build and sustain a fully integrated application, you must account for site specific needs throughout your application stack including networking, storage, access and security. I’ve described this as all deployments have 80% of the work common but the remaining 20% is mixed in with the 80% instead of being nicely layers. So, ops is cookie dough not vinaigrette.

Getting past this problem for initial provisioning on a single platform is a false victory. The real need is portable and upgrade-ready automation that can be reused and shared. Critically, we also need to build upon the existing foundations instead of requiring a blank slate. There is openness value in heterogeneous infrastructure so we need to embrace variation and design accordingly.

This is the vision the RackN team has been working towards with open source Digital Rebar project. We now able to showcase workload deployments (Docker, Kubernetes, Ceph, etc) on multiple cloud platforms that also translate to full bare metal deployments. Unlike previous generations of this tooling (some will remember Crowbar), we’ve been careful to avoid injecting external dependencies into the DevOps scripts.

While we’re able to demonstrate a high degree of portability (or fidelity) across multiple platforms, this is just the beginning. We are looking for users and collaborators who want to want to build open infrastructure from an operational perspective.

You are invited to join us in making open cross-platform operations a reality.

APIs and Implementations collide at OpenStack Interop: The Oracle Zones vs VMs Debate

I strive to stay neutral as OpenStack DefCore co-chair; however, as someone asking for another Board term, it’s important to review my thinking so that you can make an informed voting decision.

DefCore, while always on the edge of controversy, recently became ground zero for the “what is OpenStack” debate [discussion write up]. My preferred small core “it’s an IaaS product” answer is only one side. Others favor “it’s an open cloud community” while another faction champions an “open cloud platform.” I’m struggling to find a way that it can be all together at the same time.

The TL;DR is that, today, OpenStack vendors are required to implement a system that can run Linux guests. This is an example of an implementation over API bias because there’s nothing in the API that drives that specific requirement.

From a pragmatic “get it done” perspective, OpenStack needs to remain implementation driven for now. That means that we care that “OpenStack” clouds run VMs.

While there are pragmatic reasons for this, I think that long term success will require OpenStack to become an API specification. So today’s “right answer” actually undermines the long term community value. This has been a long standing paradox in OpenStack.

Breaking the API to implementation link allows an ecosystem to grow with truly alternate implementations (not just plug-ins). This is a threat to the community “upstream first” mentality.  OpenStack needs to be confident enough in the quality and utility of the shared code base that it can allow competitive implementations. Open communities should not need walls to win but they do need clear API definition.

What is my posture for this specific issue?  It’s complicated.

First, I think that the user and ecosystem expectations are being largely ignored in these discussions. Many of the controversial items here are vendor initiatives, not user needs. Right now, I’ve heard clearly that those expectations are for OpenStack to be an IaaS the runs VMs. OpenStack really needs to focus on delivering a reliably operable VM based IaaS experience. Until that’s solid, the other efforts are vendor noise.

Second, I think that there are serious test gaps that jeopardize the standard. The fundamental premise of DefCore is that we can use the development tests for API and behavior validation. We chose this path instead of creating an independent test suite. We either need to address tests for interop within the current body of tests or discuss splitting the efforts. Both require more investment than we’ve been willing to make.

We have mechanisms in place to collects data from test results and expand the test base.  Instead of creating new rules or guidelines, I think we can work within the current framework.

The simple answer would be to block non-VM implementations; however, I trust that cloud consumers will make good decisions when given sufficient information.  I think we need to fix the tests and accept non-VM clouds if they pass the corrected tests.

For this and other reasons, I want OpenStack vendors to be specific about the configurations that they test and support. We took steps to address this in DefCore last year but pulled back from being specific about requirements.  In this particular case, I believe we should require the official OpenStack vendor to state clear details about their supported implementation.  Customers will continue vote with their wallet about which configuration details are important.

This is a complex issue and we need community input.  That means that we need to hear from you!  Here’s the TC Position and the DefCore Patch.