yes, we are papering over Container ops [from @TheNewStack #DockerCon]

thenewstackIn this brief 7 minute interview made at DockerCon 16, Alex Williams and I cover a lot of ground ranging from operations’ challenges in container deployment to the early seeds of the community frustration with Docker 1.12 embedding swarm.

I think there’s a lot of pieces we’re still wishing away that aren’t really gone. (at 4:50)

Rather than repeat TheNewStack summary; I want to highlight the operational and integration gaps that we continue to ignore.

It’s exciting to watch a cluster magically appear during a keynote demo, but those demos necessarily skip pass the very real provisioning, networking and security work needed to build sustained clusters.

These underlay problems are general challenges that we can address in composable, open and automated ways.  That’s the RackN goal with Digital Rebar and we’ll be showcasing how that works with some new Kubernetes automation shortly.

Here is the interview on SoundCloud or youtube:

 

Why Fork Docker? Complexity Wack-a-Mole and Commercial Open Source

Update 12/14/16: Docker announced that they would create a container engine only project, ContinainerD, to decouple the engine from management layers above.  Hopefully this addresses this issues outlined in the post below.

Monday, The New Stack broke news about a possible fork of the Docker Engine and prominently quoted me saying “Docker consistently breaks backend compatibility.”  The technical instability alone is not what’s prompting industry leaders like Google, Red Hat and Huawei to take drastic and potentially risky community action in a central project.

So what’s driving a fork?  It’s the intersection of Cash, Complexity and Community.

hamsterIn fact, I’d warned about this risk over a year ago: Docker is both a core infrastucture technology (the docker container runner, aka Docker Engine) and a commercial company that manages the Docker brand.  The community formed a standard, runC, to try and standardize; however, Docker continues to deviate from (or innovate faster) that base.

It’s important for me to note that we use Docker tools and technologies heavily.  So far, I’ve been a long-time advocate and user of Docker’s innovative technology.  As such, we’ve also had to ride the rapid release roller coaster.

Let’s look at what’s going on here in three key areas:

1. Cash

The expected monetization of containers is the multi-system orchestration and support infrastructure.  Since many companies look to containers as leading the disruptive next innovation wave, the idea that Docker is holding part of their plans hostage is simply unacceptable.

So far, the open source Docker Engine has been simply included without payment into these products.  That changed in version 1.12 when Docker co-mingled their competitive Swarm product into the Docker Engine.  That effectively forces these other parties to advocate and distribute their competitors product.

2. Complexity

When Docker added cool Swarm Orchestration features into the v1.12 runtime, it added a lot of complexity too.  That may be simple from a “how many things do I have to download and type” perspective; however, that single unit is now dragging around a lot more code.

In one of the recent comments about this issue, Bob Wise bemoaned the need for infrastructure to be boring.  Even as we look to complex orchestration like Swarm, Kubernetes, Mesos, Rancher and others to perform application automation magic, we also need to reduce complexity in our infrastructure layers.

Along those lines, operators want key abstractions like containers to be as simple and focused as possible.  We’ve seen similar paths for virtualization runtimes like KVM, Xen and VMware that focus on delivering a very narrow band of functionality very well.  There is a lot of pressure from people building with containers to have a similar experience from the container runtime.

This approach both helps operators manage infrastructure and creates a healthy ecosystem of companies that leverage the runtimes.

Note: My company, RackN, believes strongly in this need and it’s a core part of our composable approach to automation with Digital Rebar.

3. Community

Multi-vendor open source is a very challenging and specialized type of community.  In these communities, most of the contributors are paid by companies with a vested (not necessarily transparent) interest in the project components.  If the participants of the community feel that they are not being supported by the leadership then they are likely to revolt.

Ultimately, the primary difference between Docker and a fork of Docker is the brand and the community.  If there companies paying the contributors have the will then it’s possible to move a whole community.  It’s not cheap, but it’s possible.

Developers vs Operators

One overlooked aspect of this discussion is the apparent lock that Docker enjoys on the container developer community.  The three Cs above really focus on the people with budgets (the operators) over the developers.  For a fork to succeed, there needs to be a non-Docker set of tooling that feeds the platform pipeline with portable application packages.

In Conclusion…

The world continues to get more and more heterogeneous.  We already had multiple container runtimes before Docker and the idea of a new one really is not that crazy right now.  We’ve already got an explosion of container orchestration and this is a reflection of that.

My advice?  Worry less about the container format for now and focus on automation and abstractions.

 

Notes from OSCON Container Podcast: Dan Berg, Phil Estes and Rob Hirschfeld

At OSCON, I had the pleasure of doing a IBM Dojo Podcast with some deep experts in the container and data center space: Dan Berg (@DanCBerg) and Phil Estes (@estesp).

ibm-dojo-podcast-show-art-16x9-150x150We dove into a discussion around significant trends in the container space, how open technology relates to containers and looked toward the technology’s future. We also previewed next month’s DockerCon, which is set for June 19-21 in Seattle.

Highlights!  We think containers will be considered MORE SECURE next year and also have some comments about the linguistic shift from Docker to CONTAINERS.”

Here are my notes from the recording with time stamps if you want to skip ahead:

  • 00:35 – What are the trends in Containers?
    • Rob: We are still figuring out how to make them work in terms of networking & storage
    • Dan: There are still a lot of stateful work moving into containers that need storage
    • Phil: We need to use open standards to help customers navigate options
  • 2:45 – Are the changes keeping people from moving forward?
    • Phil: Not if you start with the right guidelines and architecture
    • Dan: It’s OK to pick one and keep going because you need to build expertise
    • Rob: RackN experience changed Digital Rebar to microservices was an iterative experience
  • 5:00 Dan likes that there is so much experimentation that’s forcing us to talk about how applications are engineered
  • 5:45  Rob points out that we got 5 minutes in without saying “Docker”
    • There are a lot of orchestration choices but there’s confusion between Docker and the container ecosystem.
  • 7:00 We’re at OSCON, how far has the technology come in being open?
    • Phil thinks that open container initiative (OCI) is helping bring a lot of players to the field.
    • Dan likes that IBM is experimenting in community and drive interactions between projects.
    • Rob is not sure that we need to get everyone on the same page: open source allows people to pursue their own path.
  • 10:50 We have to figure out how to compensate companies & individuals for their work
    • Dan: if you’ve got any worthwhile product, you’ve got some open source component of it.  There are various ways to profit around that.
  • 13:00 What are we going to be talking about this time next year?
    • Rob (joking) we’ll say containers are old and microkernels are great!
    • Rob wants to be talking about operations but knows that it’s never interesting
    • Phil moving containers way from root access into more secure operations
    • Dan believes that we’ll start to consider containers as more secure than what we have today.  <- Rob strongly agrees!
  • 17:20 What is the impact of Containers on Ops?  Aka DevOps
    • Dan said “Impact is HUGE!”  > Developers are going to get Ops & Capabilities for free
    • Rob brings up impact of Containers on DevOps – the discussion has really gone underground
  • 19:30 Role of Service Registration (Consul & Etcd)
    • Life cycle management of Containers has really changed (Dan)
    • Rob brings up the importance of Service Registration in container management
  • 20:30 2016.Dockercon Docket- what are you expecting?
    • Phil is speaking there on the contribute track & OCI.
    • Rob is doing the hallway track and looking to talk about the “underlay” ops and the competitive space around Docker and Container.
    • Dan will be talking to customers and watching how the community is evolving and experimenting
    • Rob & Dan will be at Open Cloud Technology Summit, June 22 in Seattle

 

OpenStack is caught in a snowstorm – it’s status quo for ops implementations to be snowflakes

OpenStack got into exactly the place we expected: operations started with fragmented and divergent data centers (aka snowflaked) and OpenStack did nothing to change that. Can we fix that? Yes, but the answer involves relying on Amazon as our benchmark.

In advance of my OpenStack Summit Demo/Presentation (video!) [slides], I’ve spent the last few weeks mapping seven (and counting) OpenStack implementations into the cloud provider subsystem of the Digital Rebar provisioning platform. Before I started working on adding OpenStack integration, RackN already created a hybrid DevOps baseline. We are able to run the same Kubernetes and Docker Swarm provisioning extensions on multiple targets including Amazon, Google, Packet and directly on physical systems (aka metal).

Before we talk about OpenStack challenges, it’s important to understand that data centers and clouds are messy, heterogeneous environments.

These variations are so significant and operationally challenging that they are the fundamental design driver for Digital Rebar. The platform uses a composable operational approach to isolate and then chain automation tasks together. That allows configurations, like networking, from infrastructure specific functions to be passed into common building blocks without user intervention.

Composability is critical because it allows operators to isolate variations into modular pieces and the expose common configuration elements. Since the pattern works successfully for crossing other clouds and metal, I anticipated success with OpenStack.

The challenge is that there is not “one standard OpenStack” implementation.  This issue is well documented under OpenStack as Project Shade.

If you only plan to operate a mono-cloud then these are not concerns; however, everyone I’ve met is using at least AWS and one other cloud. This operational fact means that AWS provides the common service behavior baseline. This is not an API statement – it’s about being able to operate on the systems delivered by the API.

While the OpenStack API worked consistently on each tested cloud (win for DefCore!), it frequently delivered systems that could not be deployed or were unusable for later steps.

While these are not directly OpenStack API concerns, I do believe that additional metadata in the API could help expose material configuration choices. The challenge becomes defining those choices in a reference architecture way. The OpenStack principle of leaving implementation choices open makes it challenging to drive these options to a narrow set of choices. Unfortunately, it means it is difficult to create an intra-OpenStack hybrid automation without hard-coded vendor identities or exploding configuration flags.

As series of individually reasonable options dominoes together to make to these challenges.  These are real issues that I made the integration difficult.

  • No default of externally accessible systems. I have to assign floating IPs (an anti-pattern for individual VMs) or be on the internal networks. No consistent naming pattern for networks, types (flavors) or starting images.  In several cases, the “private” network is the publicly accessible one and the “external” network is visible but unusable.
  • No consistent naming for access user accounts.  If I want to ssh to a system, I have to fail my first login before I learn the right user name.
  • No data to determine which networks provide which functions.  And there’s no metadata about which networks are public or private.  
  • Incomplete post-provisioning processes because they are left open to user customization.

There is a defensible and logical reason for each example above; sadly, those reasons do nothing to make OpenStack more operationally accessible.  While intra-OpenStack interoperability is helpful, I believe that ecosystems and users benefit from Amazon-like behavior.

What should you do?  Help broaden the OpenStack discussions to seek interoperability with the whole cloud ecosystem.

 

At RackN, we will continue to refine and adapt to these variations.  Creating a consistent experience that copes with variability is the raison d’etre for our efforts with Digital Rebar. That means that we ultimately use AWS as the yardstick for configuration of any infrastructure from physical, OpenStack and even Amazon!

 

SIG-ClusterOps: Promote operability and interoperability of Kubernetes clusters

Originally posted on Kubernetes Blog.  I wanted to repost here because it’s part of the RackN ongoing efforts to focus on operational and fidelity gap challenges early.  Please join us in this effort!

openWe think Kubernetes is an awesome way to run applications at scale! Unfortunately, there’s a bootstrapping problem: we need good ways to build secure & reliable scale environments around Kubernetes. While some parts of the platform administration leverage the platform (cool!), there are fundamental operational topics that need to be addressed and questions (like upgrade and conformance) that need to be answered.

Enter Cluster Ops SIG – the community members who work under the platform to keep it running.

Our objective for Cluster Ops is to be a person-to-person community first, and a source of opinions, documentation, tests and scripts second. That means we dedicate significant time and attention to simply comparing notes about what is working and discussing real operations. Those interactions give us data to form opinions. It also means we can use real-world experiences to inform the project.

We aim to become the forum for operational review and feedback about the project. For Kubernetes to succeed, operators need to have a significant voice in the project by weekly participation and collecting survey data. We’re not trying to create a single opinion about ops, but we do want to create a coordinated resource for collecting operational feedback for the project. As a single recognized group, operators are more accessible and have a bigger impact.

What about real world deliverables?

We’ve got plans for tangible results too. We’re already driving toward concrete deliverables like reference architectures, tool catalogs, community deployment notes and conformance testing. Cluster Ops wants to become the clearing house for operational resources. We’re going to do it based on real world experience and battle tested deployments.

Connect with us.

Cluster Ops can be hard work – don’t do it alone. We’re here to listen, to help when we can and escalate when we can’t. Join the conversation at:

The Cluster Ops Special Interest Group meets weekly at 13:00PT on Thursdays, you can join us via the video hangout and see latest meeting notes for agendas and topics covered.

Smaller Nodes? Just the Right Size for Docker!

Container workloads have the potential to redefine how we think about scale and hosted infrastructure.

Last Fall, Ubiquity Hosting and RackN announced a 200 node Docker Swarm cluster as a phase one of our collaboration. Unlike cloud-based container workloads demonstrations, we chose to run this cluster directly on the bare metal.  

Why bare metal instead of virtualized? We believe that metal offers additional performance, availability and control.  

With the cluster automation ready, we’re looking for customers to help us prove those assumptions. While we could simply build on many VMs, our analysis is the a lot of smaller nodes will distribute work more efficiently. Since there is no virtualization overhead, lower RAM systems can still give great performance.

The collaboration with RackN allows us to offer customers a rapid, repeatable cluster capability. Their Digital Rebar automation works on a broad spectrum of infrastructure allow our users to rehearse deployments on cloud, quickly change components and iteratively tune the cluster.

We’re finding that these dedicated metal nodes have much better performance than similar VMs in AWS?  Don’t believe us – you can use Digital Rebar to spin up both and compare.   Since Digital Rebar is an open source platform, you can explore and expand on it.

The Docker Swarm deployment is just a starting point for us. We want to hear your provisioning ideas and work to turn them into reality.

12 Predictions for ’16: mono-cloud ambitions die as containers drive more hybrid IT

I expect 2016 to be a confusing year for everyone in IT.  For 2015, I predicted that new uses for containers are going to upset cloud’s apple cart; however, the replacement paradigm is not clear yet.  Consequently, I’m doing a prognostication mix and match: five predictions and seven items on a “container technology watch list.”

TL;DR: In 2016, Hybrid IT arrives on Containers’ wings.

Considering my expectations below, I think it’s time to accept that all IT is heterogeneous and stop trying to box everything into a mono-cloud.  Accepting hybrid as current state unblocks many IT decisions that are waiting for things to settle down.

Here’s the memo: “Stop waiting.  It’s not going to converge.”

2016 Predictions

  1. Container Adoption Seen As Two Stages:  We will finally accept that Containers have strength for both infrastructure (first stage adoption) and application life-cycle (second stage adoption) transformation.  Stage one offers value so we will start talking about legacy migration into containers without shaming teams that are not also rewriting apps as immutable microservice unicorns.
  2. OpenStack continues to bump and grow.  Adoption is up and open alternatives are disappearing.  For dedicated/private IaaS, OpenStack will continue to gain in 2016 for basic VM management.  Both competitive and internal pressures continue to threaten the project but I believe they will not emerge in 2016.  Here’s my complete OpenStack 2016 post?
  3. Amazon, GCE and Azure make everything else questionable.  These services are so deep and rich that I’d question anyone who is not using them.  At least one of them simply have to be part of everyone’s IT strategy for financial, talent and technical reasons.
  4. Cloud API becomes irrelevant. Cloud API is so 2011!  There are now so many reasonable clients to abstract various Infrastructures that Cloud APIs are less relevant.  Capability, interoperability and consistency remain critical factors, but the APIs themselves are not interesting.
  5. Metal aaS gets interesting.  I’m a big believer in the power of operating metal via an API and the RackN team delivers it for private infrastructure using Digital Rebar.  Now there are several companies (Packet.net, Ubiquity Hosting and others) that offer hosted metal.

2016 Container Tech Watch List

I’m planning posts about all these key container ecosystems for 2016.  I think they are all significant contributors to the emerging application life-cycle paradigm.

  1. Service Containers (& VMs): There’s an emerging pattern of infrastructure managed containers that provide critical host services like networking, logging, and monitoring.  I believe this pattern will provide significant value and generate it’s own ecosystem.
  2. Networking & Storage Services: Gaps in networking and storage for containers need to get solved in a consistent way.  Expect a lot of thrash and innovation here.
  3. Container Orchestration Services: This is the current battleground for container mind share.  Kubernetes, Mesos and Docker Swarm get headlines but there are other interesting alternatives.
  4. Containers on Metal: Removing the virtualization layer reduces complexity, overhead and cost.  Container workloads are good choices to re-purpose older servers that have too little CPU or RAM to serve as VM hosts.  Who can say no to free infrastructure?!  While an obvious win to many, we’ll need to make progress on standardized scale and upgrade operations first.
  5. Immutable Infrastructure: Even as this term wins the “most confusing” concept in cloud award, it is an important one for container designers to understand.  The unfortunate naming paradox is that immutable infrastructure drives disciplines that allow fast turnover, better security and more dynamic management.
  6. Microservices: The latest generation of service oriented architecture (SOA) benefits from a new class of distribute service registration platforms (etcd and consul) that bring new life into SOA.
  7. Paywall Registries: The important of container registries is easy to overlook because they seem to be version 2.0 of package caches; however, container layering makes these services much more dynamic and central than many realize.  (more?  Bernard Golden and I already posted about this)

What two items did not make the 2016 cut?  1) Special purpose container-focused operating systems like CoreOS or RancherOS.  While interesting, I don’t think these deployment technologies have architectural level influence.  2) Container Security via VMs. I’m seeing patterns where containers may actually be more secure than VMs.  This is FUD created by people with a vested interest in virtualization.

Did I miss something? I’d love to know what you think I got right or wrong!

2015 Container Review

It’s been a banner year for container awareness and adoption so we wanted to recap 2015.  For RackN, container acceleration is near to our heart because we both enable and use them in fundamental ways.   Look for Rob’s 2016 predictions on his blog.

The RackN team has truly deep and broad experience with containers in practical use.  In the summer, we delivered multiple container orchestration workloads including Docker Swarm, Kubernetes, Cloud Foundry, StackEngine and others.  In the fall, we refactored Digital Rebar to use Docker Compose with dramatic results.  And we’ve been using Docker since 2013 (yes, “way back”) for ops provisioning and development.

To make it easier to review that experience, we are consolidating a list of our container related posts for 2015.

General Container Commentary

RackN & Digital Rebar Related

¡Sí, Sí! That’s a Two Hundred Node Metal Docker Swarm Deployment

Today, RackN and Ubiquity Hosting announced a 200 node Docker Swarm deployment on hosted bare metal.

Leveraging the current Digital Rebar core and the RackN Swarm workload, this reference deployment was automatically configured using the same components that also work on a desktop VM deployment. That high fidelity deployment allows operators to start learning quickly on small systems then grow to AWS and if warranted, potentially smoothly transition to scale metal.

This deployment represents RackN starting a new chapter with Digital Rebar because it demonstrates a commitment deploy on any infrastructure: cloud, metal or something in between.

The RackN team started this journey with a “composable ops” vision that allows operators to mix and match. That spans both vendor physical resources and software components such as operating systems, software defined networking and platforms. In the 200 node Swarm cluster, physical infrastructure is provisioned by Ubiquity Hosting not Digital Rebar or RackN.  Historically, RackN focused on private infrastructure.  Now, users get the option of best-in-class metal deployment without having to own the infrastructure.

We experienced the futility of making Ops homogeneous and declared defeat.

Accepting that each data center has individual Ops was pivotal. Digital Rebar embraces heterogeneity at the most fundamental architectural level. Our system approach and unique composable abstractions allow users to make deployments portable between any infrastructure with existing tooling and operational processes. Portability means that we can both eliminate the fidelity gap as we scale and between deployments.

When multiple scales and sites can share deployment automation, we can finally work together on addressing critical operational issues like scale, high availability and upgrade

This 200 node deployment demonstrates more than scale and the deployment of the latest Docker technology. It is a milestone on the path toward sharable production operations.

Faster, Simpler AND Smaller – Immutable Provisioning with Docker Compose!

Nearly 10 TIMES faster system resets – that’s the result of fully enabling an multi-container immutable deployment on Digital Rebar.

Docker ComposeI’ve been having a “containers all the way down” month since we launched Digital Rebar deployment using Docker Compose. I don’t want to imply that we rubbed Docker on the platform and magic happened. The RackN team spent nearly a year building up the Consul integration and service wrappers for our platform before we were ready to fully migrate.

During the Digital Rebar migration, we took our already service-oriented code base and broke it into microservices. Specifically, the Digital Rebar parts (the API and engine) now run in their own container and each service (DNS, DHCP, Provisioning, Logging, NTP, etc) also has a dedicated container. Likewise, supporting items like Consul and PostgreSQL are, surprise, managed in dedicated containers too. All together, that’s over nine containers and we continue to partition out services.

We use Docker Compose to coordinate the start-up and Consul to wire everything together. Both play a role, but Consul is the critical glue that allows Digital Rebar components to find each other. These were not random choices. We’ve been using a Docker package for over two years and using Consul service registration as an architectural choice for over a year.

Service registration plays a major role in the functional ops design because we’ve been wrapping datacenter services like DNS with APIs. Consul is a separation between providing and consuming the service. Our previous design required us to track the running service. This worked until customers asked for pluggable services (and every customer needs pluggable services as they scale).

Besides being a faster to reset the environment, there are several additional wins:

  1. more transparent in how it operates – it’s obvious which containers provide each service and easy to monitor them as individuals.
  2. easier to distribute services in the environment – we can find where the service runs because of the Consul registration, so we don’t have to manage it.
  3. possible to have redundant services – it’s easy to spin up new services even on the same system
  4. make services pluggable – as long as the service registers and there’s an API, we can replace the implementation.
  5. no concern about which distribution is used – all our containers are Ubuntu user space but the host can be anything.
  6. changes to components are more isolated – changing one service does not require a lot of downloading.

Docker and microservices are not magic but the benefits are real. Be prepared to make architectural investments to realize the gains.