Open Source as Reality TV and Burning Data Centers [gcOnDemand podcast notes]

During the OpenStack summit, Eric Wright (@discoposse) and I talked about a wide range of topics from scoring success of OpenStack early goals to burning down traditional data centers.

Why burn down your data center (and move to public cloud)? Because your ops process are too hard to change. Rob talks about how hybrid provides a path if we can made ops more composable.

Here are my notes from the audio podcast (source):

1:30 Why “zehicle” as a handle? Portmanteau from electrics cars… zero + vehicle

Let’s talk about OpenStack & Cloud…

  • OpenStack History
    • 2:15 Rob’s OpenStack history from Dell and Hyperscale
    • 3:20 Early thoughts of a Cloud API that could be reused
    • 3:40 The practical danger of Vendor lock-in
    • 4:30 How we implemented “no main corporate owner” by choice
  • About the Open in OpenStack
    • 5:20 Rob decomposes what “open” means because there are multiple meanings
    • 6:10 Price of having all open tools for “always open” choice and process
    • 7:10 Observation that OpenStack values having open over delivering product
    • 8:15 Community is great but a trade off. We prioritize it over implementation.
  • Q: 9:10 What if we started later? Would Docker make an impact?
    • Part of challenge for OpenStack was teaching vendors & corporate consumers “how to open source”
  • Q: 10:40 Did we accomplish what we wanted from the first summit?
    • Mixed results – some things we exceeded (like growing community) while some are behind (product adoption & interoperability).
  • 13:30 Interop, Refstack and Defcore Challenges. Rob is disappointed on interop based on implementations.
  • Q: 15:00 Who completes with OpenStack?
    • There are real alternatives. APIs do not matter as much as we thought.
    • 15:50 OpenStack vendor support is powerful
  • Q: 16:20 What makes OpenStack successful?
    • Big tent confuses the ecosystem & push the goal posts out
    • “Big community” is not a good definition of success for the project.
  • 18:10 Reality TV of open source – people like watching train wrecks
  • 18:45 Hybrid is the reality for IT users
  • 20:10 We have a need to define core and focus on composability. Rob has been focused on the link between hybrid and composability.
  • 22:10 Rob’s preference is that OpenStack would be smaller. Big tent is really ecosystem projects and we want that ecosystem to be multi-cloud.

Now, about RackN, bare metal, Crowbar and Digital Rebar….

  • 23:30 (re)Intro
  • 24:30 VC market is not metal friendly even though everything runs on metal!
  • 25:00 Lack of consistency translates into lack of shared ops
  • 25:30 Crowbar was an MVP – the key is to understand what we learned from it
  • 26:00 Digital Rebar started with composability and focus on operations
  • 27:00 What is hybrid now? Not just private to public.
  • 30:00 How do we make infrastructure not matter? Multi-dimensional hybrid.
  • 31:00 Digital Rebar is orchestration for composable infrastructure.
  • Q: 31:40 Do people get it?
    • Yes. Automation is moving to hybrid devops – “ops is ops” and it should not matter if it’s cloud or metal.
  • 32:15 “I don’t want to burn down my data center” – can you bring cloud ops to my private data center?

Notes from OSCON Container Podcast: Dan Berg, Phil Estes and Rob Hirschfeld

At OSCON, I had the pleasure of doing a IBM Dojo Podcast with some deep experts in the container and data center space: Dan Berg (@DanCBerg) and Phil Estes (@estesp).

ibm-dojo-podcast-show-art-16x9-150x150We dove into a discussion around significant trends in the container space, how open technology relates to containers and looked toward the technology’s future. We also previewed next month’s DockerCon, which is set for June 19-21 in Seattle.

Highlights!  We think containers will be considered MORE SECURE next year and also have some comments about the linguistic shift from Docker to CONTAINERS.”

Here are my notes from the recording with time stamps if you want to skip ahead:

  • 00:35 – What are the trends in Containers?
    • Rob: We are still figuring out how to make them work in terms of networking & storage
    • Dan: There are still a lot of stateful work moving into containers that need storage
    • Phil: We need to use open standards to help customers navigate options
  • 2:45 – Are the changes keeping people from moving forward?
    • Phil: Not if you start with the right guidelines and architecture
    • Dan: It’s OK to pick one and keep going because you need to build expertise
    • Rob: RackN experience changed Digital Rebar to microservices was an iterative experience
  • 5:00 Dan likes that there is so much experimentation that’s forcing us to talk about how applications are engineered
  • 5:45  Rob points out that we got 5 minutes in without saying “Docker”
    • There are a lot of orchestration choices but there’s confusion between Docker and the container ecosystem.
  • 7:00 We’re at OSCON, how far has the technology come in being open?
    • Phil thinks that open container initiative (OCI) is helping bring a lot of players to the field.
    • Dan likes that IBM is experimenting in community and drive interactions between projects.
    • Rob is not sure that we need to get everyone on the same page: open source allows people to pursue their own path.
  • 10:50 We have to figure out how to compensate companies & individuals for their work
    • Dan: if you’ve got any worthwhile product, you’ve got some open source component of it.  There are various ways to profit around that.
  • 13:00 What are we going to be talking about this time next year?
    • Rob (joking) we’ll say containers are old and microkernels are great!
    • Rob wants to be talking about operations but knows that it’s never interesting
    • Phil moving containers way from root access into more secure operations
    • Dan believes that we’ll start to consider containers as more secure than what we have today.  <- Rob strongly agrees!
  • 17:20 What is the impact of Containers on Ops?  Aka DevOps
    • Dan said “Impact is HUGE!”  > Developers are going to get Ops & Capabilities for free
    • Rob brings up impact of Containers on DevOps – the discussion has really gone underground
  • 19:30 Role of Service Registration (Consul & Etcd)
    • Life cycle management of Containers has really changed (Dan)
    • Rob brings up the importance of Service Registration in container management
  • 20:30 2016.Dockercon Docket- what are you expecting?
    • Phil is speaking there on the contribute track & OCI.
    • Rob is doing the hallway track and looking to talk about the “underlay” ops and the competitive space around Docker and Container.
    • Dan will be talking to customers and watching how the community is evolving and experimenting
    • Rob & Dan will be at Open Cloud Technology Summit, June 22 in Seattle

 

5 Key Aspects of High Fidelity DevOps [repost from DevOps.com]

For all our cloud enthusiasm, I feel like ops automation is suffering as we increase choice and complexity.  Why is this happening?  It’s about loss of fidelity.

Nearly a year ago, I was inspired by a mention of “Fidelity Gaps” during a Cloud Foundry After Dark session.  With additional advice from DevOps leader Gene Kim, this narrative about the why and how of DevOps Fidelity emerged.

As much as we talk about how we should have shared goals spanning Dev and Ops, it’s not nearly as easy as it sounds. To fuel a DevOps culture, we have to build robust tooling, also.

That means investing up front in five key areas: abstraction, composability, automation, orchestration, and idempotency.

Together, these concepts allow sharing work at every level of the pipeline. Unfortunately, it’s tempting to optimize work at one level and miss the true system bottlenecks.

Creating production-like fidelity for developers is essential: We need it for scale, security and upgrades. It’s not just about sharing effort; it’s about empathy and collaboration.

But even with growing acceptance of DevOps as a cultural movement, I believe deployment disparities are a big unsolved problem. When developers have vastly different working environments from operators, it creates a “fidelity gap” that makes it difficult for the teams to collaborate.

Before we talk about the costs and solutions, let me first share a story from back when I was a bright-eyed OpenStack enthusiast…

Read the Full Article on DevOps.com including my section about Why OpenStack Devstack harms the project and five specific ways to improve DevOps fidelity.

my 8 steps that would improve OpenStack Interop w/ AWS

I’ve been talking with a lot of OpenStack people about frustrating my attempted hybrid work on seven OpenStack clouds [OpenStack Session Wed 2:40].  This post documents the behavior Digital Rebar expects from the multiple clouds that we have integrated with so far.  At RackN, we use this pattern for both cloud and physical automation.

Sunday, I found myself back in front of the the Board talking about the challenge that implementation variation creates for users.  Ultimately, the question “does this harm users?” is answered by “no, they just leave for Amazon.”

I can’t stress this enough: it’s not about APIs!  The challenge is twofold: implementation variance between OpenStack clouds and variance between OpenStack and AWS.

The obvious and simplest answer is that OpenStack implementers need to conform more closely to AWS patterns (once again, NOT the APIs).

Here are the eight Digital Rebar node allocation steps [and my notes about general availability on OpenStack clouds]:

  1. Add node specific SSH key [YES]
  2. Get Metadata on Networks, Flavors and Images [YES]
  3. Pick correct network, flavors and images [NO, each site is distinct]
  4. Request node [YES]
  5. Get node PUBLIC address for node [NO, most OpenStack clouds do not have external access by default]
  6. Login into system using node SSH key [PARTIAL, the account name varies]
  7. Add root account with Rebar SSH key(s) and remove password login [PARTIAL, does not work on some systems]
  8. Remove node specific SSH key [YES]

These steps work on every other cloud infrastructure that we’ve used.  And they are achievable on OpenStack – DreamHost delivered this experience on their new DreamCompute infrastructure.

I think that this is very achievable for OpenStack, but we’re doing to have to drive conformance and figure out an alternative to the Floating IP (FIP) pattern (IPv6, port forwarding, or adding FIPs by default) would all work as part of the solution.

For Digital Rebar, the quick answer is to simply allocate a FIP for every node.  We can easily make this a configuration option; however, it feels like a pattern fail to me.  It’s certainly not a requirement from other clouds.

I hope this post provides specifics about delivering a more portable hybrid experience.  What critical items do you want as part of your cloud ops process?

OpenStack is caught in a snowstorm – it’s status quo for ops implementations to be snowflakes

OpenStack got into exactly the place we expected: operations started with fragmented and divergent data centers (aka snowflaked) and OpenStack did nothing to change that. Can we fix that? Yes, but the answer involves relying on Amazon as our benchmark.

In advance of my OpenStack Summit Demo/Presentation (video!) [slides], I’ve spent the last few weeks mapping seven (and counting) OpenStack implementations into the cloud provider subsystem of the Digital Rebar provisioning platform. Before I started working on adding OpenStack integration, RackN already created a hybrid DevOps baseline. We are able to run the same Kubernetes and Docker Swarm provisioning extensions on multiple targets including Amazon, Google, Packet and directly on physical systems (aka metal).

Before we talk about OpenStack challenges, it’s important to understand that data centers and clouds are messy, heterogeneous environments.

These variations are so significant and operationally challenging that they are the fundamental design driver for Digital Rebar. The platform uses a composable operational approach to isolate and then chain automation tasks together. That allows configurations, like networking, from infrastructure specific functions to be passed into common building blocks without user intervention.

Composability is critical because it allows operators to isolate variations into modular pieces and the expose common configuration elements. Since the pattern works successfully for crossing other clouds and metal, I anticipated success with OpenStack.

The challenge is that there is not “one standard OpenStack” implementation.  This issue is well documented under OpenStack as Project Shade.

If you only plan to operate a mono-cloud then these are not concerns; however, everyone I’ve met is using at least AWS and one other cloud. This operational fact means that AWS provides the common service behavior baseline. This is not an API statement – it’s about being able to operate on the systems delivered by the API.

While the OpenStack API worked consistently on each tested cloud (win for DefCore!), it frequently delivered systems that could not be deployed or were unusable for later steps.

While these are not directly OpenStack API concerns, I do believe that additional metadata in the API could help expose material configuration choices. The challenge becomes defining those choices in a reference architecture way. The OpenStack principle of leaving implementation choices open makes it challenging to drive these options to a narrow set of choices. Unfortunately, it means it is difficult to create an intra-OpenStack hybrid automation without hard-coded vendor identities or exploding configuration flags.

As series of individually reasonable options dominoes together to make to these challenges.  These are real issues that I made the integration difficult.

  • No default of externally accessible systems. I have to assign floating IPs (an anti-pattern for individual VMs) or be on the internal networks. No consistent naming pattern for networks, types (flavors) or starting images.  In several cases, the “private” network is the publicly accessible one and the “external” network is visible but unusable.
  • No consistent naming for access user accounts.  If I want to ssh to a system, I have to fail my first login before I learn the right user name.
  • No data to determine which networks provide which functions.  And there’s no metadata about which networks are public or private.  
  • Incomplete post-provisioning processes because they are left open to user customization.

There is a defensible and logical reason for each example above; sadly, those reasons do nothing to make OpenStack more operationally accessible.  While intra-OpenStack interoperability is helpful, I believe that ecosystems and users benefit from Amazon-like behavior.

What should you do?  Help broaden the OpenStack discussions to seek interoperability with the whole cloud ecosystem.

 

At RackN, we will continue to refine and adapt to these variations.  Creating a consistent experience that copes with variability is the raison d’etre for our efforts with Digital Rebar. That means that we ultimately use AWS as the yardstick for configuration of any infrastructure from physical, OpenStack and even Amazon!

 

SIG-ClusterOps: Promote operability and interoperability of Kubernetes clusters

Originally posted on Kubernetes Blog.  I wanted to repost here because it’s part of the RackN ongoing efforts to focus on operational and fidelity gap challenges early.  Please join us in this effort!

openWe think Kubernetes is an awesome way to run applications at scale! Unfortunately, there’s a bootstrapping problem: we need good ways to build secure & reliable scale environments around Kubernetes. While some parts of the platform administration leverage the platform (cool!), there are fundamental operational topics that need to be addressed and questions (like upgrade and conformance) that need to be answered.

Enter Cluster Ops SIG – the community members who work under the platform to keep it running.

Our objective for Cluster Ops is to be a person-to-person community first, and a source of opinions, documentation, tests and scripts second. That means we dedicate significant time and attention to simply comparing notes about what is working and discussing real operations. Those interactions give us data to form opinions. It also means we can use real-world experiences to inform the project.

We aim to become the forum for operational review and feedback about the project. For Kubernetes to succeed, operators need to have a significant voice in the project by weekly participation and collecting survey data. We’re not trying to create a single opinion about ops, but we do want to create a coordinated resource for collecting operational feedback for the project. As a single recognized group, operators are more accessible and have a bigger impact.

What about real world deliverables?

We’ve got plans for tangible results too. We’re already driving toward concrete deliverables like reference architectures, tool catalogs, community deployment notes and conformance testing. Cluster Ops wants to become the clearing house for operational resources. We’re going to do it based on real world experience and battle tested deployments.

Connect with us.

Cluster Ops can be hard work – don’t do it alone. We’re here to listen, to help when we can and escalate when we can’t. Join the conversation at:

The Cluster Ops Special Interest Group meets weekly at 13:00PT on Thursdays, you can join us via the video hangout and see latest meeting notes for agendas and topics covered.

Fast Talk: Creating Operating Environments that Span Clouds and Physical Infrastructures

This short 15-minute talk pulls together a few themes around composability that you’ll see in future blogs where I lay out the challenges and solutions for hybrid DevOps practices.  Like any DevOps concept – it’s a mix of technology, attitude (culture) and process.

Our hybrid DevOps objective is simple: We need multi-infrastructure Amazon equivalence for ops automation.

IT perspective of AWSHere’s the summary:

  • Hybrid Infrastructure is new normal
  • Amazon is the Ops benchmark
  • Embrace operations automation
  • Invest in making IT composable

 

Want to listen to it?  Here’s the voice over: