Why Fork Docker? Complexity Wack-a-Mole and Commercial Open Source

Update 12/14/16: Docker announced that they would create a container engine only project, ContinainerD, to decouple the engine from management layers above.  Hopefully this addresses this issues outlined in the post below.

Monday, The New Stack broke news about a possible fork of the Docker Engine and prominently quoted me saying “Docker consistently breaks backend compatibility.”  The technical instability alone is not what’s prompting industry leaders like Google, Red Hat and Huawei to take drastic and potentially risky community action in a central project.

So what’s driving a fork?  It’s the intersection of Cash, Complexity and Community.

hamsterIn fact, I’d warned about this risk over a year ago: Docker is both a core infrastucture technology (the docker container runner, aka Docker Engine) and a commercial company that manages the Docker brand.  The community formed a standard, runC, to try and standardize; however, Docker continues to deviate from (or innovate faster) that base.

It’s important for me to note that we use Docker tools and technologies heavily.  So far, I’ve been a long-time advocate and user of Docker’s innovative technology.  As such, we’ve also had to ride the rapid release roller coaster.

Let’s look at what’s going on here in three key areas:

1. Cash

The expected monetization of containers is the multi-system orchestration and support infrastructure.  Since many companies look to containers as leading the disruptive next innovation wave, the idea that Docker is holding part of their plans hostage is simply unacceptable.

So far, the open source Docker Engine has been simply included without payment into these products.  That changed in version 1.12 when Docker co-mingled their competitive Swarm product into the Docker Engine.  That effectively forces these other parties to advocate and distribute their competitors product.

2. Complexity

When Docker added cool Swarm Orchestration features into the v1.12 runtime, it added a lot of complexity too.  That may be simple from a “how many things do I have to download and type” perspective; however, that single unit is now dragging around a lot more code.

In one of the recent comments about this issue, Bob Wise bemoaned the need for infrastructure to be boring.  Even as we look to complex orchestration like Swarm, Kubernetes, Mesos, Rancher and others to perform application automation magic, we also need to reduce complexity in our infrastructure layers.

Along those lines, operators want key abstractions like containers to be as simple and focused as possible.  We’ve seen similar paths for virtualization runtimes like KVM, Xen and VMware that focus on delivering a very narrow band of functionality very well.  There is a lot of pressure from people building with containers to have a similar experience from the container runtime.

This approach both helps operators manage infrastructure and creates a healthy ecosystem of companies that leverage the runtimes.

Note: My company, RackN, believes strongly in this need and it’s a core part of our composable approach to automation with Digital Rebar.

3. Community

Multi-vendor open source is a very challenging and specialized type of community.  In these communities, most of the contributors are paid by companies with a vested (not necessarily transparent) interest in the project components.  If the participants of the community feel that they are not being supported by the leadership then they are likely to revolt.

Ultimately, the primary difference between Docker and a fork of Docker is the brand and the community.  If there companies paying the contributors have the will then it’s possible to move a whole community.  It’s not cheap, but it’s possible.

Developers vs Operators

One overlooked aspect of this discussion is the apparent lock that Docker enjoys on the container developer community.  The three Cs above really focus on the people with budgets (the operators) over the developers.  For a fork to succeed, there needs to be a non-Docker set of tooling that feeds the platform pipeline with portable application packages.

In Conclusion…

The world continues to get more and more heterogeneous.  We already had multiple container runtimes before Docker and the idea of a new one really is not that crazy right now.  We’ve already got an explosion of container orchestration and this is a reflection of that.

My advice?  Worry less about the container format for now and focus on automation and abstractions.

 

Notes from OSCON Container Podcast: Dan Berg, Phil Estes and Rob Hirschfeld

At OSCON, I had the pleasure of doing a IBM Dojo Podcast with some deep experts in the container and data center space: Dan Berg (@DanCBerg) and Phil Estes (@estesp).

ibm-dojo-podcast-show-art-16x9-150x150We dove into a discussion around significant trends in the container space, how open technology relates to containers and looked toward the technology’s future. We also previewed next month’s DockerCon, which is set for June 19-21 in Seattle.

Highlights!  We think containers will be considered MORE SECURE next year and also have some comments about the linguistic shift from Docker to CONTAINERS.”

Here are my notes from the recording with time stamps if you want to skip ahead:

  • 00:35 – What are the trends in Containers?
    • Rob: We are still figuring out how to make them work in terms of networking & storage
    • Dan: There are still a lot of stateful work moving into containers that need storage
    • Phil: We need to use open standards to help customers navigate options
  • 2:45 – Are the changes keeping people from moving forward?
    • Phil: Not if you start with the right guidelines and architecture
    • Dan: It’s OK to pick one and keep going because you need to build expertise
    • Rob: RackN experience changed Digital Rebar to microservices was an iterative experience
  • 5:00 Dan likes that there is so much experimentation that’s forcing us to talk about how applications are engineered
  • 5:45  Rob points out that we got 5 minutes in without saying “Docker”
    • There are a lot of orchestration choices but there’s confusion between Docker and the container ecosystem.
  • 7:00 We’re at OSCON, how far has the technology come in being open?
    • Phil thinks that open container initiative (OCI) is helping bring a lot of players to the field.
    • Dan likes that IBM is experimenting in community and drive interactions between projects.
    • Rob is not sure that we need to get everyone on the same page: open source allows people to pursue their own path.
  • 10:50 We have to figure out how to compensate companies & individuals for their work
    • Dan: if you’ve got any worthwhile product, you’ve got some open source component of it.  There are various ways to profit around that.
  • 13:00 What are we going to be talking about this time next year?
    • Rob (joking) we’ll say containers are old and microkernels are great!
    • Rob wants to be talking about operations but knows that it’s never interesting
    • Phil moving containers way from root access into more secure operations
    • Dan believes that we’ll start to consider containers as more secure than what we have today.  <- Rob strongly agrees!
  • 17:20 What is the impact of Containers on Ops?  Aka DevOps
    • Dan said “Impact is HUGE!”  > Developers are going to get Ops & Capabilities for free
    • Rob brings up impact of Containers on DevOps – the discussion has really gone underground
  • 19:30 Role of Service Registration (Consul & Etcd)
    • Life cycle management of Containers has really changed (Dan)
    • Rob brings up the importance of Service Registration in container management
  • 20:30 2016.Dockercon Docket- what are you expecting?
    • Phil is speaking there on the contribute track & OCI.
    • Rob is doing the hallway track and looking to talk about the “underlay” ops and the competitive space around Docker and Container.
    • Dan will be talking to customers and watching how the community is evolving and experimenting
    • Rob & Dan will be at Open Cloud Technology Summit, June 22 in Seattle

 

OpenStack is caught in a snowstorm – it’s status quo for ops implementations to be snowflakes

OpenStack got into exactly the place we expected: operations started with fragmented and divergent data centers (aka snowflaked) and OpenStack did nothing to change that. Can we fix that? Yes, but the answer involves relying on Amazon as our benchmark.

In advance of my OpenStack Summit Demo/Presentation (video!) [slides], I’ve spent the last few weeks mapping seven (and counting) OpenStack implementations into the cloud provider subsystem of the Digital Rebar provisioning platform. Before I started working on adding OpenStack integration, RackN already created a hybrid DevOps baseline. We are able to run the same Kubernetes and Docker Swarm provisioning extensions on multiple targets including Amazon, Google, Packet and directly on physical systems (aka metal).

Before we talk about OpenStack challenges, it’s important to understand that data centers and clouds are messy, heterogeneous environments.

These variations are so significant and operationally challenging that they are the fundamental design driver for Digital Rebar. The platform uses a composable operational approach to isolate and then chain automation tasks together. That allows configurations, like networking, from infrastructure specific functions to be passed into common building blocks without user intervention.

Composability is critical because it allows operators to isolate variations into modular pieces and the expose common configuration elements. Since the pattern works successfully for crossing other clouds and metal, I anticipated success with OpenStack.

The challenge is that there is not “one standard OpenStack” implementation.  This issue is well documented under OpenStack as Project Shade.

If you only plan to operate a mono-cloud then these are not concerns; however, everyone I’ve met is using at least AWS and one other cloud. This operational fact means that AWS provides the common service behavior baseline. This is not an API statement – it’s about being able to operate on the systems delivered by the API.

While the OpenStack API worked consistently on each tested cloud (win for DefCore!), it frequently delivered systems that could not be deployed or were unusable for later steps.

While these are not directly OpenStack API concerns, I do believe that additional metadata in the API could help expose material configuration choices. The challenge becomes defining those choices in a reference architecture way. The OpenStack principle of leaving implementation choices open makes it challenging to drive these options to a narrow set of choices. Unfortunately, it means it is difficult to create an intra-OpenStack hybrid automation without hard-coded vendor identities or exploding configuration flags.

As series of individually reasonable options dominoes together to make to these challenges.  These are real issues that I made the integration difficult.

  • No default of externally accessible systems. I have to assign floating IPs (an anti-pattern for individual VMs) or be on the internal networks. No consistent naming pattern for networks, types (flavors) or starting images.  In several cases, the “private” network is the publicly accessible one and the “external” network is visible but unusable.
  • No consistent naming for access user accounts.  If I want to ssh to a system, I have to fail my first login before I learn the right user name.
  • No data to determine which networks provide which functions.  And there’s no metadata about which networks are public or private.  
  • Incomplete post-provisioning processes because they are left open to user customization.

There is a defensible and logical reason for each example above; sadly, those reasons do nothing to make OpenStack more operationally accessible.  While intra-OpenStack interoperability is helpful, I believe that ecosystems and users benefit from Amazon-like behavior.

What should you do?  Help broaden the OpenStack discussions to seek interoperability with the whole cloud ecosystem.

 

At RackN, we will continue to refine and adapt to these variations.  Creating a consistent experience that copes with variability is the raison d’etre for our efforts with Digital Rebar. That means that we ultimately use AWS as the yardstick for configuration of any infrastructure from physical, OpenStack and even Amazon!

 

SIG-ClusterOps: Promote operability and interoperability of Kubernetes clusters

Originally posted on Kubernetes Blog.  I wanted to repost here because it’s part of the RackN ongoing efforts to focus on operational and fidelity gap challenges early.  Please join us in this effort!

openWe think Kubernetes is an awesome way to run applications at scale! Unfortunately, there’s a bootstrapping problem: we need good ways to build secure & reliable scale environments around Kubernetes. While some parts of the platform administration leverage the platform (cool!), there are fundamental operational topics that need to be addressed and questions (like upgrade and conformance) that need to be answered.

Enter Cluster Ops SIG – the community members who work under the platform to keep it running.

Our objective for Cluster Ops is to be a person-to-person community first, and a source of opinions, documentation, tests and scripts second. That means we dedicate significant time and attention to simply comparing notes about what is working and discussing real operations. Those interactions give us data to form opinions. It also means we can use real-world experiences to inform the project.

We aim to become the forum for operational review and feedback about the project. For Kubernetes to succeed, operators need to have a significant voice in the project by weekly participation and collecting survey data. We’re not trying to create a single opinion about ops, but we do want to create a coordinated resource for collecting operational feedback for the project. As a single recognized group, operators are more accessible and have a bigger impact.

What about real world deliverables?

We’ve got plans for tangible results too. We’re already driving toward concrete deliverables like reference architectures, tool catalogs, community deployment notes and conformance testing. Cluster Ops wants to become the clearing house for operational resources. We’re going to do it based on real world experience and battle tested deployments.

Connect with us.

Cluster Ops can be hard work – don’t do it alone. We’re here to listen, to help when we can and escalate when we can’t. Join the conversation at:

The Cluster Ops Special Interest Group meets weekly at 13:00PT on Thursdays, you can join us via the video hangout and see latest meeting notes for agendas and topics covered.

Hybrid & Container Disruption [Notes from CTP Mike Kavis’ Interview]

Last week, Cloud Technology Partner VP Mike Kavis (aka MadGreek65) and I talked for 30 minutes about current trends in Hybrid Infrastructure and Containers.

leadership-photos-mike

Mike Kavis

Three of the top questions that we discussed were:

  1. Why Composability is required for deployment?  [5:45]
  2. Is Configuration Management dead? [10:15]
  3. How can containers be more secure than VMs? [23:30]

Here’s the audio matching the time stamps in my notes:

  • 00:44: What is RackN? – scale data center operations automation
  • 01:45: Digital Rebar is… 3rd generation provisioning to manage data center ops & bring up
  • 02:30: Customers were struggling on Ops more than code or hardware
  • 04:00: Rethinking “open” to include user choice of infrastructure, not just if the code is open source.
  • 05:00: Use platforms where it’s right for users.
  • 05:45: Composability – it’s how do we deal with complexity. Hybrid DevOps
  • 06:40: How do we may Ops more portable
  • 07:00: Five components of Hybrid DevOps
  • 07:27: Rob has “Rick Perry” Moment…
  • 08:30: 80/20 Rule for DevOps where 20% is mixed.
  • 10:15: “Is configuration management dead” > Docker does hurt Configuration Management
  • 11:00: How Service Registry can replace Configuration.
  • 11:40: Reference to John Willis on the importance of sequence.
  • 12:30: Importance of Sequence, Services & Configuration working together
  • 12:50: Digital Rebar intermixes all three
  • 13:30: The race to have orchestration – “it’s always been there”
  • 14:30: Rightscale Report > Enterprises average SIX platforms in use
  • 15:30: Fidelity Gap – Why everyone will hybrid but need to avoid monoliths
  • 16:50: Avoid hybrid trap and keep a level of abstraction
  • 17:41: You have to pay some “abstraction tax” if you want to hybrid BUT you can get some additional benefits: hybrid + ops management.
  • 18:00: Rob gives a shout out to Rightscale
  • 19:20: Rushing to solutions does not create secure and sustained delivery
  • 20:40: If you work in a silo, you loose the ability to collaborate and reuse other works
  • 21:05: Rob is sad about “OpenStack explosion of installers”
  • 21:45: Container benefit from services containers – how they can be MORE SECURE
  • 23:00: Automation required for security
  • 23:30: How containers will be more secure than containers
  • 24:30: Rob bring up “cheese” again…
  • 26:15: If you have more situationalleadership-photos-mike awareness, you can be more secure WITHOUT putting more work for developers.
  • 27:00: Containers can help developers worry about as many aspects of Ops
  • 27:45: Wrap up

What do you think?  I’d love to hear your opinion on these topics!

Smaller Nodes? Just the Right Size for Docker!

Container workloads have the potential to redefine how we think about scale and hosted infrastructure.

Last Fall, Ubiquity Hosting and RackN announced a 200 node Docker Swarm cluster as a phase one of our collaboration. Unlike cloud-based container workloads demonstrations, we chose to run this cluster directly on the bare metal.  

Why bare metal instead of virtualized? We believe that metal offers additional performance, availability and control.  

With the cluster automation ready, we’re looking for customers to help us prove those assumptions. While we could simply build on many VMs, our analysis is the a lot of smaller nodes will distribute work more efficiently. Since there is no virtualization overhead, lower RAM systems can still give great performance.

The collaboration with RackN allows us to offer customers a rapid, repeatable cluster capability. Their Digital Rebar automation works on a broad spectrum of infrastructure allow our users to rehearse deployments on cloud, quickly change components and iteratively tune the cluster.

We’re finding that these dedicated metal nodes have much better performance than similar VMs in AWS?  Don’t believe us – you can use Digital Rebar to spin up both and compare.   Since Digital Rebar is an open source platform, you can explore and expand on it.

The Docker Swarm deployment is just a starting point for us. We want to hear your provisioning ideas and work to turn them into reality.

2015 Container Review

It’s been a banner year for container awareness and adoption so we wanted to recap 2015.  For RackN, container acceleration is near to our heart because we both enable and use them in fundamental ways.   Look for Rob’s 2016 predictions on his blog.

The RackN team has truly deep and broad experience with containers in practical use.  In the summer, we delivered multiple container orchestration workloads including Docker Swarm, Kubernetes, Cloud Foundry, StackEngine and others.  In the fall, we refactored Digital Rebar to use Docker Compose with dramatic results.  And we’ve been using Docker since 2013 (yes, “way back”) for ops provisioning and development.

To make it easier to review that experience, we are consolidating a list of our container related posts for 2015.

General Container Commentary

RackN & Digital Rebar Related

¡Sí, Sí! That’s a Two Hundred Node Metal Docker Swarm Deployment

Today, RackN and Ubiquity Hosting announced a 200 node Docker Swarm deployment on hosted bare metal.

Leveraging the current Digital Rebar core and the RackN Swarm workload, this reference deployment was automatically configured using the same components that also work on a desktop VM deployment. That high fidelity deployment allows operators to start learning quickly on small systems then grow to AWS and if warranted, potentially smoothly transition to scale metal.

This deployment represents RackN starting a new chapter with Digital Rebar because it demonstrates a commitment deploy on any infrastructure: cloud, metal or something in between.

The RackN team started this journey with a “composable ops” vision that allows operators to mix and match. That spans both vendor physical resources and software components such as operating systems, software defined networking and platforms. In the 200 node Swarm cluster, physical infrastructure is provisioned by Ubiquity Hosting not Digital Rebar or RackN.  Historically, RackN focused on private infrastructure.  Now, users get the option of best-in-class metal deployment without having to own the infrastructure.

We experienced the futility of making Ops homogeneous and declared defeat.

Accepting that each data center has individual Ops was pivotal. Digital Rebar embraces heterogeneity at the most fundamental architectural level. Our system approach and unique composable abstractions allow users to make deployments portable between any infrastructure with existing tooling and operational processes. Portability means that we can both eliminate the fidelity gap as we scale and between deployments.

When multiple scales and sites can share deployment automation, we can finally work together on addressing critical operational issues like scale, high availability and upgrade

This 200 node deployment demonstrates more than scale and the deployment of the latest Docker technology. It is a milestone on the path toward sharable production operations.

Faster, Simpler AND Smaller – Immutable Provisioning with Docker Compose!

Nearly 10 TIMES faster system resets – that’s the result of fully enabling an multi-container immutable deployment on Digital Rebar.

Docker ComposeI’ve been having a “containers all the way down” month since we launched Digital Rebar deployment using Docker Compose. I don’t want to imply that we rubbed Docker on the platform and magic happened. The RackN team spent nearly a year building up the Consul integration and service wrappers for our platform before we were ready to fully migrate.

During the Digital Rebar migration, we took our already service-oriented code base and broke it into microservices. Specifically, the Digital Rebar parts (the API and engine) now run in their own container and each service (DNS, DHCP, Provisioning, Logging, NTP, etc) also has a dedicated container. Likewise, supporting items like Consul and PostgreSQL are, surprise, managed in dedicated containers too. All together, that’s over nine containers and we continue to partition out services.

We use Docker Compose to coordinate the start-up and Consul to wire everything together. Both play a role, but Consul is the critical glue that allows Digital Rebar components to find each other. These were not random choices. We’ve been using a Docker package for over two years and using Consul service registration as an architectural choice for over a year.

Service registration plays a major role in the functional ops design because we’ve been wrapping datacenter services like DNS with APIs. Consul is a separation between providing and consuming the service. Our previous design required us to track the running service. This worked until customers asked for pluggable services (and every customer needs pluggable services as they scale).

Besides being a faster to reset the environment, there are several additional wins:

  1. more transparent in how it operates – it’s obvious which containers provide each service and easy to monitor them as individuals.
  2. easier to distribute services in the environment – we can find where the service runs because of the Consul registration, so we don’t have to manage it.
  3. possible to have redundant services – it’s easy to spin up new services even on the same system
  4. make services pluggable – as long as the service registers and there’s an API, we can replace the implementation.
  5. no concern about which distribution is used – all our containers are Ubuntu user space but the host can be anything.
  6. changes to components are more isolated – changing one service does not require a lot of downloading.

Docker and microservices are not magic but the benefits are real. Be prepared to make architectural investments to realize the gains.

Introducing Digital Rebar. Building strong foundations for New Stack infrastructure

digital_rebarThis week, I have the privilege to showcase the emergence of RackN’s updated approach to data center infrastructure automation that is container-ready and drives “cloud-style” DevOps on physical metal.  While it works at scale, we’ve also ensured it’s light enough to run a production-fidelity deployment on a laptop.

You grow to cloud scale with a ready-state foundation that scales up at every step.  That’s exactly what we’re providing with Digital Rebar.

Over the past two years, the RackN team has been working on microservices operations orchestration in the OpenCrowbar code base.  By embracing these new tools and architecture, Digital Rebar takes that base into a new directions.  Yet, we also get to leverage a scalable heterogeneous provisioner and integrations for all major devops tools.  We began with critical data center automation already working.

Why Digital Rebar? Traditional data center ops is being disrupted by container and service architectures and legacy data centers are challenged with gracefully integrating this new way of managing containers at scale: we felt it was time to start a dialog the new foundational layer of scale ops.

Both our code and vision has substantially diverged from the groundbreaking “OpenStack Installer” MVP the RackN team members launched in 2011 from inside Dell and is still winning prizes for SUSE.

We have not regressed our leading vendor-neutral hardware discovery and configuration features; however, today, our discussions are about service wrappers, heterogeneous tooling, immutable container deployments and next generation platforms.

Over the next few days, I’ll be posting more about how Digital Rebar works (plus video demos).