As Docker rises above (and disrupts) clouds, I’m thinking about its community landscape

Watching the lovefest of DockerCon last week had me digging up my April 2014 “Can’t Contain(erize) the Hype” post.  There’s no doubt that Docker (and containers more broadly) is delivering on its promise.  I was impressed with the container community navigating towards an open platform in RunC and with vendor adoption of trusted container platforms.

I’m a fan of containers and their potential; yet, watching remotely, the scope and exuberance of Docker partnerships seem out of proportion with the current capabilities of the technology.

The latest update to the Docker technology, v1.7, introduces a lot of important network, security and storage features.  The price of all that progress is disruption to ongoing work and to integrations across the ecosystem.

There’s always two sides to the rapid innovation coin: “Sweet, new features!  Meh, breaking changes to absorb.”

Docker Ecosystem Explained

There remains confusion between Docker the company and Docker the technology.  I like how the chart (right) maps out potential areas in the Docker ecosystem.  There are clearly a lot of places for companies to monetize the technology; however, it’s not as clear if the company will be willing to cede lucrative regions, like orchestration, to a competitive landscape.

While Docker has clearly delivered a lot of value in just a year, they have a fair share of challenges ahead.  

If OpenStack is a leading indicator, we can expect to see vendor battlegrounds forming around networking and storage.  Docker (the company) has a chance to show leadership and build community here, yet it could cause harm by giving up the arbitrator role to be a contender instead.

One thing that would help control the inevitable border skirmishes would be clear definitions of core, ecosystem and adjacencies.  I see Docker blurring these lines with some of their tools around orchestration, networking and storage.  I believe that was part of their now-suspended kerfuffle with CoreOS.

Thinking a step further, parts of the Docker technology (RunC) have moved over to Linux Foundation governance.  I wonder if the community will drive additional shared components into open governance.  Looking at Node.js, there’s clear precedent and I wonder if Joyent’s big Docker plans have them thinking along these lines.

Is there something between a Container and VM? Apparently, yes.

The RackN team has started designing reference architectures for containers on metal (discussed on TheNewStack.io) with the hope of finding a hardware design that is cost and performance optimized for containers instead of simply repurposing premium virtualized cloud infrastructure.  That discussion turned up something unexpected…

That post generated a Twitter thread that surfaced Hyper.sh and ClearLinux as hardware-enabled (Intel VT-x) alternatives to containers.

This container alternative likely escapes the notice of many because it requires hardware capabilities that are not (or only partially) exposed inside cloud virtual machines; however, it could be a very compelling story for operators looking to run containers on metal.

Here’s my basic understanding: these technologies offer container-like lightweight and elastic behavior with the isolation provided by virtual machines.  This is possible because they use CPU virtualization capabilities to isolate environments.

7/3 Update: Feedback about this post has largely been “making it easier for VMs to run docker automatically is not interesting.”  What’s your take on it?

Details behind RackN Kubernetes Workload for OpenCrowbar

Since I’ve already bragged about how this workload validates OpenCrowbar’s deep ops impact, I can get right down to the nuts and bolts of what RackN CTO Greg Althaus managed to pack into this workload.

Like any scale install, once you’ve got a solid foundation, the actual installation goes pretty quickly.  In Kubernetes’ case, that means creating strong networking and etcd configuration.

Here’s a 30 minute video showing the complete process from O/S install to working Kubernetes:

Here are the details:

Clustered etcd – distributed key store

etcd is the central data service that maintains the state for the Kubernetes deployment.  The strength of the installation rests on the correctness of etcd.  The workload builds an etcd cluster and synchronizes all the instances as nodes are added.
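For the curious, here’s a minimal Go sketch (not the actual workload code) of the kind of check an installer can run once the cluster is up: ask etcd for its member list over the v2 HTTP API and confirm every node has joined.  The endpoint address is just an illustrative assumption.

```go
// etcd_members.go: a minimal sketch that asks an etcd cluster for its member
// list so an installer can confirm every node has joined before continuing.
// Uses the etcd v2 HTTP API that shipped with etcd at the time.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

type member struct {
	Name       string   `json:"name"`
	PeerURLs   []string `json:"peerURLs"`
	ClientURLs []string `json:"clientURLs"`
}

type memberList struct {
	Members []member `json:"members"`
}

func main() {
	// Endpoint is illustrative; substitute any reachable etcd client URL.
	resp, err := http.Get("http://127.0.0.1:2379/v2/members")
	if err != nil {
		log.Fatalf("etcd unreachable: %v", err)
	}
	defer resp.Body.Close()

	var list memberList
	if err := json.NewDecoder(resp.Body).Decode(&list); err != nil {
		log.Fatalf("bad members response: %v", err)
	}

	// A three-node etcd cluster should report three members here.
	for _, m := range list.Members {
		fmt.Printf("member %s peers=%v clients=%v\n", m.Name, m.PeerURLs, m.ClientURLs)
	}
}
```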

Networking with Flannel and Proxy

Flannel is the default overlay network for Kubernetes that handles IP assignment and inter-container communication with UDP encapsulation.  The workload configures Flannel for networking with etcd as the backing store.
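For illustration only, here’s a small Go sketch of seeding that backing store.  Flannel’s etcd-backed mode reads a JSON network definition from a well-known key (conventionally /coreos.com/network/config); the subnet and endpoint below are assumptions, not the values the workload actually uses.

```go
// flannel_config.go: a minimal sketch of seeding flannel's network
// configuration in etcd via the v2 keys API.
package main

import (
	"log"
	"net/http"
	"net/url"
	"strings"
)

func main() {
	// Overlay network that flannel will carve per-host subnets out of,
	// using the UDP encapsulation backend described above.
	// The 10.2.0.0/16 range is an arbitrary example.
	config := `{"Network": "10.2.0.0/16", "Backend": {"Type": "udp"}}`

	form := url.Values{"value": {config}}
	req, err := http.NewRequest(
		http.MethodPut,
		"http://127.0.0.1:2379/v2/keys/coreos.com/network/config",
		strings.NewReader(form.Encode()),
	)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatalf("failed to write flannel config: %v", err)
	}
	defer resp.Body.Close()
	log.Printf("etcd responded %s", resp.Status)
}
```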

An important part of the overall networking setup is the configuration of a proxy so that the nodes can get external access to Docker image repositories.

Docker Setup

We install the latest Docker on the system.  That may not sound very exciting; however, Docker iterates faster than most Linux distributions package it, so it’s important that we keep you current.

Master & Minion Kubernetes Nodes

Using etcd as a backend, the workload sets up one (or more) master nodes with the API server and other master services.  When the minions are configured, they are pointed to the master API server(s).  You get to choose how many masters and which systems become masters.  If you did not choose correctly, it’s easy to rinse and repeat.
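As a rough illustration (not the workload’s actual code), a minion-side sanity check might look like this: hit the configured master’s /healthz endpoint before starting the kubelet.  The hostname and insecure port are assumptions for the sketch; Kubernetes does expose a /healthz handler on the API server.

```go
// healthcheck.go: a minimal sketch of verifying that the master API server a
// minion was pointed at is reachable before starting node services.
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Hypothetical master address; the workload wires in the real one.
	resp, err := http.Get("http://kube-master.example.local:8080/healthz")
	if err != nil {
		log.Fatalf("API server unreachable: %v", err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Printf("API server says: %s (%s)\n", body, resp.Status)
}
```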

Highly Available using DNS Round Robin

As the workload configures API servers, it also adds them to a DNS round robin pool (made possible by the new DNS integrations described below).  Minions are configured to use the shared DNS name so that they automatically round-robin across all the available API servers.  This ensures both load balancing and high availability.  The pool is automatically updated when you add or remove servers.
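Here’s a tiny Go sketch of why that works from the minion side: resolving the shared name (a made-up example name below) returns every API server in the pool, so clients naturally spread out and can fail over to another record.

```go
// rr_lookup.go: a minimal sketch showing how one shared DNS name gives
// clients both load balancing and failover across the master pool.
package main

import (
	"fmt"
	"log"
	"math/rand"
	"net"
)

func main() {
	// Hypothetical round-robin name maintained by the workload.
	addrs, err := net.LookupHost("kube-api.example.local")
	if err != nil {
		log.Fatalf("lookup failed: %v", err)
	}

	fmt.Printf("API servers in the pool: %v\n", addrs)

	// Pick one at random; DNS servers also rotate record order, so even
	// naive clients end up spread across the masters and can retry another
	// record if one master is down.
	target := addrs[rand.Intn(len(addrs))]
	fmt.Printf("talking to %s\n", target)
}
```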

Installed on Real Metal

It’s worth noting that we’ve done cluster deployments of 20 physical nodes (with 80 in process!).  Since the OpenCrowbar architecture abstracts the vendor hardware, the configuration is multi-vendor and heterogeneous.  That means that this workload (and our others) delivers tangible scale implementations quickly and reliably.

Future Work for Advanced Networking

Flannel is a very basic SDN.  We’d like to see additional networking integrations, including OpenContrail per Pedro Marques’ work.

At this time, we are not securing communication with etcd.  That requires key management, which is a more advanced topic.

Why is RackN building this?  We are a physical ops automation company.

We are seeking to advance the state of data center operations by helping get complex scale platforms operationalized.  We want to work with the relevant communities to deliver repeatable best practices around next-generation platforms like Kubernetes.  Our speciality is in creating a general environment for ops success: we work with partners who are experts on using the platforms.

We want to engage with potential users before we turn this into an open community project; however, we’ve chosen to make the code public.  Please get us involved (community forum)!  You’ll need a working OpenCrowbar or RackN Enterprise install as a prerequisite, and we want to help you be successful.

From Metal Foundation to FIVE new workloads in five weeks

The OpenCrowbar Drill release (which will likely become v2.3) is wrapping up in the next few weeks, and it’s been amazing to watch the RackN team validate our designs by pumping out workloads and API integrations (list below).

I’ve posted about the acceleration from having a ready state operations base and we’re proving that out.  Having an automated platform is critical for metal deployment because there is substantial tuning and iteration needed to complete installations in the field.

Getting software set up once is not a victory: that’s called a snowflake.

Real success is tearing it down and having it work the second, third and nth times.  That’s because scale ops is not about being able to install platforms.  It’s about operationalizing them.

Integration: the difference between install and operationalization.

When we build a workload, we are able to build up the environment one layer at a time.  For OpenCrowbar, that starts with a hardware inventory and works up through RAID/BIOS and O/S configuration.  After the OS is ready, we are able to connect into the operational environment (SSH keys, NTP, DNS, Proxy, etc) and build real multi-switch/layer 2 topologies.  Next we coordinate multi-node actions like creating Ceph, Consul and etcd clusters so that the install is demonstrably correct across nodes and repeatable at every stage.  If something has to change, you can repeat the whole install or just the impacted layers.  That is what I consider integrated operation.
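To make the layering idea concrete, here’s a hypothetical sketch (not OpenCrowbar’s actual engine): stages run in a fixed order, and a change to one layer only forces that layer and everything built on top of it to re-run.

```go
// layers.go: a hypothetical sketch of layered, repeatable installs. The
// layer names mirror the description above; the logic is illustrative only.
package main

import "fmt"

type layer struct {
	name string
	run  func() error
}

func main() {
	layers := []layer{
		{"inventory", func() error { fmt.Println("discover hardware"); return nil }},
		{"raid-bios", func() error { fmt.Println("configure RAID/BIOS"); return nil }},
		{"os", func() error { fmt.Println("install O/S"); return nil }},
		{"ops-env", func() error { fmt.Println("SSH keys, NTP, DNS, proxy"); return nil }},
		{"cluster", func() error { fmt.Println("form Ceph/Consul/etcd clusters"); return nil }},
	}

	// Pretend the "ops-env" layer changed: only it and later layers re-run.
	changed := "ops-env"
	rerun := false
	for _, l := range layers {
		if l.name == changed {
			rerun = true
		}
		if rerun {
			if err := l.run(); err != nil {
				fmt.Printf("layer %s failed: %v\n", l.name, err)
				return
			}
		}
	}
}
```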

It’s not just automating a complex install.  We design to be repeatable site-to-site.

Here’s the list of workloads we’ve built on OpenCrowbar and for RackN in the last few weeks:

  1. Ceph (OpenCrowbar) with advanced hardware optimization and networking that synchronizes changes in monitors.
  2. Docker Swarm (RackN) (or DIY with Docker Machine on Metal)
  3. StackEngine (RackN) builds a multi-master cluster and connects all systems together.
  4. Kubernetes (RackN) that includes automatic highly available DNS configuration, Flannel networking and etcd cluster building.
  5. CloudFoundry on Metal via BOSH (RackN) uses pools of hardware that are lifecycle managed by OpenCrowbar, including powering off systems that are idle.
  6. I don’t count the existing RackN OpenStack via Packstack (RackN) workload because it does not directly leverage OpenCrowbar clustering or networking.  It could if someone wanted to help build it.

And… we also added a major DNS automation feature and updated the network mapping logic to work in environments where Crowbar does not manage the administrative networks (like inside clouds).  We’ve also been integrating deeply with Hashicorp Consul to allow true “ops service discovery.”

DNS is critical – getting physical ops integrations right matters

Why DNS? Maintaining DNS is essential to scale ops.  It’s not as simple as naming servers because each server will have multiple addresses (IPv4, IPv6, teams, bridges, etc) on multiple NICs depending on the system’s function and applications.  Plus, errors in DNS are hard to diagnose.

Names matter!  I love talking about the small Ops things that make a huge impact on the quality of automation.  Things like automatically building a squid proxy cache infrastructure.

Today, I get to rave about the DNS integration that just surfaced in the OpenCrowbar code base. RackN CTO, Greg Althaus, just completed work that incrementally updates DNS entries as new IPs are added into the system.

Why is that a big deal?  There are a lot of names & IPs to manage.

In physical ops, every time you bring up a physical or virtual network interface, you are assigning at least one IP to that interface. For OpenCrowbar, we are assigning two addresses: IPv4 and IPv6.  Servers generally have 3 or more active interfaces (e.g.: BMC, admin, internal, public and storage) so that’s a lot of references.  It gets even more complex when you factor in DNS round robin or other common practices.
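If you want to see the scale of the problem on a single host, this little Go sketch walks every interface and prints each address that would need a name record:

```go
// addr_count.go: a small sketch of why the record count balloons: enumerate
// every network interface on a host and print each address (IPv4 and IPv6)
// that would need a DNS entry, before round-robin pools are even counted.
package main

import (
	"fmt"
	"log"
	"net"
)

func main() {
	ifaces, err := net.Interfaces()
	if err != nil {
		log.Fatal(err)
	}

	total := 0
	for _, iface := range ifaces {
		addrs, err := iface.Addrs()
		if err != nil {
			continue
		}
		for _, addr := range addrs {
			fmt.Printf("%-10s %s\n", iface.Name, addr.String())
			total++
		}
	}
	fmt.Printf("%d addresses on this one host need name records\n", total)
}
```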

Plus mistakes are expensive.  Name resolution is an essential service for operations.

I know we all love memorizing IPv4 addresses (just wait for IPv6!), so accurate naming is essential.  OpenCrowbar already aligns the address 4th octet (Admin .106 goes to the same server as BMC .106) but that’s not always practical or useful.  This is not just a Day 1 problem – DNS drift or staleness becomes an increasingly challenging problem when you have to reallocate IP addresses.  The simple fact is that registering IPs is not the hard part of this integration – it’s the flexible and dynamic updates.

What DNS automation did we enable in OpenCrowbar?  Here’s a partial list:

  1. recovery of names and IPs when interfaces and systems are decommissioned
  2. use of flexible naming patterns so that you can control how the systems are registered
  3. ability to register names in multiple DNS infrastructures
  4. ability to understand sub-domains so that you can map DNS by region
  5. ability to register the same system under multiple names
  6. wildcard support for CNAMEs
  7. ability to create a DNS round-robin group and keep it updated

But there’s more! There are integrations for both BIND and PowerDNS. Since BIND does not have an API that allows incremental additions, Greg added a Golang service to wrap BIND and provide incremental updates and deletes.
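To give a flavor of the wrapper approach (this is a hypothetical sketch, not Greg’s actual service), a minimal version accepts a record over HTTP, appends it to a zone file and asks BIND to reload the zone via rndc.  The paths, zone name and request format are all assumptions; a real service would also bump the zone’s SOA serial and handle deletes.

```go
// bind_wrapper.go: a hypothetical sketch of wrapping BIND with an HTTP API
// for incremental record additions. Not production-ready: no SOA serial
// bump, no locking, no deletes.
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
	"os/exec"
)

const zoneFile = "/etc/bind/db.example.local" // assumed zone file path
const zone = "example.local"                  // assumed zone name

func addRecord(w http.ResponseWriter, r *http.Request) {
	name := r.URL.Query().Get("name")
	ip := r.URL.Query().Get("ip")
	if name == "" || ip == "" {
		http.Error(w, "name and ip are required", http.StatusBadRequest)
		return
	}

	// Append an A record to the zone file.
	f, err := os.OpenFile(zoneFile, os.O_APPEND|os.O_WRONLY, 0644)
	if err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	fmt.Fprintf(f, "%s\tIN\tA\t%s\n", name, ip)
	f.Close()

	// Tell BIND to pick up the change for this zone.
	if err := exec.Command("rndc", "reload", zone).Run(); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	fmt.Fprintln(w, "record added")
}

func main() {
	http.HandleFunc("/records", addRecord)
	log.Fatal(http.ListenAndServe(":8053", nil))
}
```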

When we talk about infrastructure ops automation and ready state, this is the type of deep integration that makes a difference and is the hallmark of the RackN team’s ops focus with RackN Enterprise and OpenCrowbar.

OpenStack Vancouver six observations: partners, metal, tents, DefCore, brands & breakage

As always, OpenStack conferences/summits are packed with talks and discussions.  Any one of these six points could be a full post; however, I would rather post now and start discussions.  Let me know what you think!

1. Partnering Everywhere – it’s froth, not milk

Everyone is partnering with everyone! It’s a good way to appear to cover more ground and appear more open. Right now, I believe these partnerships are for show and very shallow. There will be blood when money is flowing and both partners want the lion’s share.

2. Metal is Hot! attention on Ironic & MaaS

Metal is a very hot topic. No surprise, but I do not think that either MaaS or Ironic has the right architecture to deal with the real complexity of automating metal in a generalized way. The consequence is that they are limited and hard to operate.

Container talks were also very hot and I believe are ultimately disruptive.  The very fact that all the container talks were overflowing is an indication of the challenges facing virtualization.

3. DefCore – Just in the Nick of Time

I think that the press and analysts were ready to proclaim that OpenStack was fragmenting and unable to deliver the “one cloud, multiple vendors” vision. DefCore (presented as Interoperability by Jonathan Bryce, DefCore shout out!) came in at the buzzer to buy us more time.

4. Big Tent Concerns – what is ecosystem & release?

Big Tent is shorthand for project governance changes that make it easier for new projects to become OpenStack projects and removes the concept of integrated releases.  The exact definition is still a work in progress.

The top concerns I have are:

  1. We cannot tell the difference between community & ecosystem. We’re back to anointed projects because we’re now telling projects they have to join OpenStack to work with OpenStack.
  2. We’re changing the definition of the release but have not defined how it will change. I acknowledge that continuous release is ideal but we’re confusing people again.

5. Brands are battling – will they destroy the city?

OpenStack is hard for startups – read the full post here.  The short version is that big companies are taking up all the air.

While some are leading, others are still learning how to collaborate.  Those new to open source are slow to trust and uncertain about where to invest.  Unfortunately, we’ve created a visible contributions economy that does not reward doing the scut work, so it’s no surprise that there are concerns that some of the bigger companies are free riding.

6. OpenStack is broken talks – could we reboot?  no.

It’s a sign of OpenStack’s age that Bias, Termie and others suggested we need a clean slate.  Frankly, I think that OpenStack would be irrelevant by the time a rewrite was completed, and it is not helpful to suggest it.

What would I suggest?  I’d promote a strong core (doing!), ensure big companies collaborate on the roadmap (doing!) and stop having a single node install as the gate and dev reference (I’d happily help use OCB for this with partners).

PS: Apparently Neutron is not broken.

I’m very excited about the “just give me a network” work to make Neutron duplicate Nova-Net functionality.  Finally.

OpenSource.com Interview on DefCore, project management, and the future of OpenStack

Reposted from my interview with Red Hat’s OpenSource.com, conducted by Jason Baker.

Rob Hirschfeld has been involved with OpenStack since before the project was even officially formed, and so he brings a rich perspective as to the project’s history, its organization, and where it may be headed next. Recently, he has focused primarily on the physical infrastructure automation space, working with an enterprise version of OpenCrowbar, an “API-driven metal” project which started as an OpenStack installer and moved to a generic workload underlay.

Rob is speaking on two panels at the upcoming OpenStack Summit in Vancouver, including DefCore 2015 and the State of OpenStack Project Management. We caught up with Rob to get updates about these two topics and what else lies ahead for OpenStack.

We asked you to help walk us through DefCore as it was being developed last year; just as a reminder, what is DefCore and why should people care about it?

DefCore creates a minimal definition for OpenStack vendors to help ensure interoperability and stability for the user community. While DefCore definitions apply only to vendors asking to use the OpenStack trademark, there are technical impacts on the tests and APIs that we select as required. We’ve worked hard to make sure that the selection process for picking “core” is transparent and fair.

What did the changes approved by the OpenStack Foundation membership earlier this year mean for DefCore?

The by-laws changes approved by the community were important because they allow DefCore to use a more granular definition of Core. The previous by-laws were much more project focused. The changes allow us to select specific APIs and code components from a project as required instead of picking everything blindly. That allows projects to have both stable and new innovative components.

What can we expect from OpenStack’s structure and organization as we move forward towards the next release?

There are a lot of changes still to come. The technical leadership is making it easier to become part of the OpenStack code base. I’ve written about this change having potentially both positive and negative impacts on OpenStack by making it appear more like a suite of projects than a tightly integrated product. In many ways, DefCore helps vendors define OpenStack as a product as the community is expanding to include more capabilities. From my discussions, this is a good balance.

Switching gears a bit, you’ve also been heavily involved in the OpenStack project management working group. How has that group been progressing since they convened at the Paris Summit?

This group has made a lot of progress. We’ve seen non-board leadership step in and lead the group. That leadership is more organic and based in the companies that are directly contributing. I think that’s resulted in a lot of good ideas and documentation from the group. We’ll see some excellent results in Vancouver from them. It’s going to come back to the community and technical leadership to leverage that work. I think that’s the real test: we have to share ownership of direction between multiple perspectives. The first step in doing that is writing it down (which is what they have been doing).

Aside from the organization, let’s talk about the software itself. What are you hoping to see from the Liberty release?

I’m hoping to see adoption of Neutron accelerate. Having two network approaches makes it impossible to really have an interoperability story. That means Neutron has to be working technically, but also for operators and users. To be brutally honest, it also has to overcome its own reputation. If Neutron does not become the dominant choice, we are going to effectively have two major flavors of OpenStack. From the DefCore, vendor, or user perspective, that’s a very challenging position.

Anything else you’d like to add?

We’ve accomplished a lot together. In some ways, chasing too many targets is our biggest threat. I think that container workloads and orchestration are already being very disruptive for OpenStack. I’m hoping that we focus on delivering a stable core infrastructure. That’s why I’ve been working so hard on DefCore. Looking forward, there’s an increasing risk of trying to chase too many targets and losing the core of what users want.

This article is part of the Speaker Interview Series for OpenStack Summit Vancouver, a five-day conference for developers, users, and administrators of OpenStack Cloud Software.