my 8 steps that would improve OpenStack Interop w/ AWS

I’ve been talking with a lot of OpenStack people about the frustrations of my attempted hybrid work on seven OpenStack clouds [OpenStack Session Wed 2:40].  This post documents the behavior Digital Rebar expects from the multiple clouds that we have integrated with so far.  At RackN, we use this pattern for both cloud and physical automation.

Sunday, I found myself back in front of the Board talking about the challenge that implementation variation creates for users.  Ultimately, the question “does this harm users?” is answered by “no, they just leave for Amazon.”

I can’t stress this enough: it’s not about APIs!  The challenge is twofold: implementation variance between OpenStack clouds and variance between OpenStack and AWS.

The obvious and simplest answer is that OpenStack implementers need to conform more closely to AWS patterns (once again, NOT the APIs).

Here are the eight Digital Rebar node allocation steps [and my notes about general availability on OpenStack clouds], followed by a sketch of the flow in code:

  1. Add node specific SSH key [YES]
  2. Get Metadata on Networks, Flavors and Images [YES]
  3. Pick correct network, flavors and images [NO, each site is distinct]
  4. Request node [YES]
  5. Get PUBLIC address for node [NO, most OpenStack clouds do not have external access by default]
  6. Log into system using node SSH key [PARTIAL, the account name varies]
  7. Add root account with Rebar SSH key(s) and remove password login [PARTIAL, does not work on some systems]
  8. Remove node specific SSH key [YES]
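
For concreteness, here is a minimal sketch of these eight steps against openstacksdk’s cloud layer.  The cloud name, key material and selection heuristics are illustrative placeholders, not Digital Rebar’s actual code:

    import openstack

    NODE_PUBLIC_KEY = "ssh-ed25519 AAAA... rebar-node-1"  # placeholder key

    conn = openstack.connect(cloud="example-cloud")  # assumed clouds.yaml entry

    # 1. Add a node-specific SSH key.
    conn.create_keypair("rebar-node-1", public_key=NODE_PUBLIC_KEY)

    # 2. Get metadata on networks, flavors and images.
    networks = conn.list_networks()
    flavors = conn.list_flavors()
    images = conn.list_images()

    # 3. Pick the correct network, flavor and image.  This is the step that
    #    varies by site, so every heuristic here is cloud-specific guesswork.
    network = next((n for n in networks if not n.get("router:external")), None)
    image = next(i for i in images if "ubuntu" in i.name.lower())
    flavor = min(flavors, key=lambda f: f.ram)

    # 4. Request the node (auto_ip asks for a floating IP where one is needed).
    server = conn.create_server("rebar-node-1", image=image, flavor=flavor,
                                network=network, key_name="rebar-node-1",
                                auto_ip=True, wait=True)

    # 5. Get the node's PUBLIC address; empty on clouds without external access.
    address = server.public_v4

    # 6./7. Log in with the node key (the account name varies by image!) and
    #       install the Rebar root key; that happens over SSH, outside the SDK.

    # 8. Remove the node-specific SSH key.
    conn.delete_keypair("rebar-node-1")

Steps 3 and 5 are exactly where the variance bites: the selection heuristics and the public address behavior have to be re-discovered for every cloud.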

These steps work on every other cloud infrastructure that we’ve used.  And they are achievable on OpenStack – DreamHost delivered this experience on their new DreamCompute infrastructure.

I think that this is very achievable for OpenStack, but we’re going to have to drive conformance and figure out an alternative to the Floating IP (FIP) pattern; IPv6, port forwarding, or adding FIPs by default could all work as part of the solution.

For Digital Rebar, the quick answer is to simply allocate a FIP for every node.  We can easily make this a configuration option; however, it feels like a pattern fail to me.  It’s certainly not a requirement from other clouds.
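
In that model, the workaround is just a guard bolted onto the sketch above (the option name is hypothetical, not an actual Digital Rebar setting):

    # Continuing the earlier sketch; "allocate_fip" is a made-up option name.
    options = {"allocate_fip": True}

    # Only allocate a floating IP when the cloud hands out no public address.
    if not server.public_v4 and options["allocate_fip"]:
        conn.add_auto_ip(server, wait=True)   # attach an available floating IP
        server = conn.get_server(server.id)   # refresh to pick up the address

    ssh_target = server.public_v4 or server.private_v4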

I hope this post provides specifics about delivering a more portable hybrid experience.  What critical items do you want as part of your cloud ops process?

Hybrid DevOps: Union of Configuration, Orchestration and Composability

Steven Spector and I talked about “Hybrid DevOps” as a concept.  Our discussion led to a ‘there’s a picture for that!’ moment that helped clarify the concept.  We believe that this concept, like Rugged DevOps, is additive to existing DevOps thinking and culture.  It’s about expanding our thinking to include orchestration and composability.

Here’s our write-up: Hybrid DevOps: Union of Configuration, Orchestration and Composability

Hybrid & Container Disruption [Notes from CTP Mike Kavis’ Interview]

Last week, Cloud Technology Partners VP Mike Kavis (aka MadGreek65) and I talked for 30 minutes about current trends in Hybrid Infrastructure and Containers.


Three of the top questions that we discussed were:

  1. Why is Composability required for deployment?  [5:45]
  2. Is Configuration Management dead? [10:15]
  3. How can containers be more secure than VMs? [23:30]

Here’s the audio matching the time stamps in my notes:

  • 00:44: What is RackN? – scale data center operations automation
  • 01:45: Digital Rebar is… 3rd generation provisioning to manage data center ops & bring up
  • 02:30: Customers were struggling on Ops more than code or hardware
  • 04:00: Rethinking “open” to include user choice of infrastructure, not just if the code is open source.
  • 05:00: Use platforms where it’s right for users.
  • 05:45: Composability – it’s how do we deal with complexity. Hybrid DevOps
  • 06:40: How do we make Ops more portable
  • 07:00: Five components of Hybrid DevOps
  • 07:27: Rob has “Rick Perry” Moment…
  • 08:30: 80/20 Rule for DevOps where 20% is mixed.
  • 10:15: “Is configuration management dead” > Docker does hurt Configuration Management
  • 11:00: How Service Registry can replace Configuration.
  • 11:40: Reference to John Willis on the importance of sequence.
  • 12:30: Importance of Sequence, Services & Configuration working together
  • 12:50: Digital Rebar intermixes all three
  • 13:30: The race to have orchestration – “it’s always been there”
  • 14:30: Rightscale Report > Enterprises average SIX platforms in use
  • 15:30: Fidelity Gap – Why everyone will go hybrid but needs to avoid monoliths
  • 16:50: Avoid hybrid trap and keep a level of abstraction
  • 17:41: You have to pay some “abstraction tax” if you want to go hybrid BUT you can get some additional benefits: hybrid + ops management.
  • 18:00: Rob gives a shout out to Rightscale
  • 19:20: Rushing to solutions does not create secure and sustained delivery
  • 20:40: If you work in a silo, you lose the ability to collaborate and reuse others’ work
  • 21:05: Rob is sad about “OpenStack explosion of installers”
  • 21:45: Containers benefit from service containers – how they can be MORE SECURE
  • 23:00: Automation required for security
  • 23:30: How containers will be more secure than VMs
  • 24:30: Rob brings up “cheese” again…
  • 26:15: If you have more situational awareness, you can be more secure WITHOUT putting more work on developers.
  • 27:00: Containers can help developers not worry about as many aspects of Ops
  • 27:45: Wrap up

What do you think?  I’d love to hear your opinion on these topics!

Composability is Critical in DevOps: let’s break the monoliths

This post was inspired by my DevOps.com Git for DevOps post and is an evolution of my “Functional Ops (the cake is a lie)” talks.

2016 is the year we break down the monoliths.  We’ve spent a lot of time talking about monolithic applications and microservices; however, there’s an equally deep challenge in ops automation.

Anti-monolith composability means making our automation into function blocks that can be chained together by orchestration.

What is going wrong?  We’re building fragile tightly coupled automation.

Most of the automation scripts that I’ve worked with become very long interconnected sequences well beyond the actual application that they are trying to install.  For example, Kubernetes needs etcd as a datastore.  The current model is to include the etcd install in the install script.  The same is true for SDN install/configuration and post-install tests and dashboard UIs.  The simple “install Kubernetes” quickly explodes into a kitchen sink of related adjacent components.

Those installs quickly become fragile and bloated.  Even worse, they have hidden dependencies.  What happens when etcd changes?  Now, we’ve got to track down all the references to it buried in etcd-based applications.  Further, we don’t get the benefits of etcd deployment improvements like secure or scaled configurations.
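
To make the anti-monolith alternative concrete, here is a hedged sketch (the names are mine, hypothetical, and far simpler than real tooling): each component install is a block with an explicit input/output contract, and a small orchestrator chains them.

    from dataclasses import dataclass
    from typing import Callable, Dict

    @dataclass
    class Block:
        """One composable unit of automation with an explicit contract."""
        name: str
        run: Callable[[Dict[str, str]], Dict[str, str]]  # inputs -> outputs

    def install_etcd(inputs: Dict[str, str]) -> Dict[str, str]:
        # ... real install work would happen here ...
        return {"etcd_endpoint": "https://10.0.0.5:2379"}

    def install_kubernetes(inputs: Dict[str, str]) -> Dict[str, str]:
        endpoint = inputs["etcd_endpoint"]  # consumed by contract, not baked in
        # ... point the apiserver at the discovered endpoint ...
        return {"kubernetes_api": "https://10.0.0.5:6443"}

    def orchestrate(blocks):
        """Chain blocks so each one sees the outputs of those before it."""
        state: Dict[str, str] = {}
        for block in blocks:
            state.update(block.run(state))
        return state

    orchestrate([Block("etcd", install_etcd),
                 Block("kubernetes", install_kubernetes)])

Swapping in a different etcd block (a new version, an external cluster, a secured configuration) leaves the Kubernetes block untouched because the dependency is explicit instead of buried.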

What can we do about it?  Resist the urge to create vertical silos.

It’s tempting and fast to create automation that works in a very prescriptive way for a single platform, operating system and tool chain.  The work of creating abstractions between configuration steps seems like a lot of overhead.  Even if you create those boundaries or reuse upstream automation, you’re likely to be vulnerable to changes within that component.  All these concerns drive operators to walk away from working collaboratively with each other and with developers.

Giving up on collaborative Ops hurts us all and makes it impossible to engineer excellent operational tools.  

Don’t give up!  Like git for development, we can do this together.

Post-OpenStack DefCore, I’m Chasing “open infrastructure” via cross-platform Interop

Like my previous DefCore interop windmill tilting, this is not something that can be done alone. Open infrastructure is a collaborative effort and I’m looking for your help and support. I believe solving this problem benefits us as an industry and individually as IT professionals.

So, what is open infrastructure?  It’s not about running on open source software. It’s about creating platform choice and control. In my experience, that’s what defines open for users (and developers are not users).

I’ve spent several years helping lead OpenStack interoperability (aka DefCore) efforts to ensure that OpenStack cloud APIs are consistent between vendors. I strongly believe that effort is essential to build an ecosystem around the project; however, in talking to enterprise users, I’ve learned that their real interoperability gap is between the many platforms – AWS, Google, VMware, OpenStack and metal – that they use every day.

Instead of focusing inward to one platform, I believe the bigger enterprise need is to address automation across platforms. It is something I’m starting to call hybrid DevOps because it allows users to mix platforms, service APIs and tools.

Open infrastructure in that context is being able to work across platforms without being tied into one platform choice even when that platform is based on open source software. API duplication is not sufficient: the operational characteristics of each platform are different enough that we need a different abstraction approach.

We have to be able to compose automation in a way that tolerates substitution based on infrastructure characteristics. This is required for metal because of variation between hardware vendors and data center networking and services. It is equally essential for cloud because of variation between IaaS capabilities and service delivery models. Basically, those minor differences between clouds create significant challenges in interoperability at the operational level.
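
As a rough illustration of what “tolerates substitution” means in code (the interface and provider names are mine, not Digital Rebar’s), the workload automation binds to whichever provider matches the target infrastructure:

    from typing import Protocol

    class NodeProvider(Protocol):
        def allocate(self, profile: str) -> str: ...  # returns a node address

    class AwsProvider:
        def allocate(self, profile: str) -> str:
            # ... map the profile to an instance type and launch it ...
            return "203.0.113.10"

    class MetalProvider:
        def allocate(self, profile: str) -> str:
            # ... PXE-boot and discover a matching physical node ...
            return "10.0.0.42"

    def deploy_workload(provider: NodeProvider) -> None:
        address = provider.allocate("small")
        # ... everything past allocation is identical per platform ...
        print(f"configuring workload on {address}")

    deploy_workload(AwsProvider())  # swap in MetalProvider() with no changes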

Rationalizing APIs does little to address these more structural differences.

The problem is compounded because the differences are not nicely segmented behind abstraction layers. If you work to build and sustain a fully integrated application, you must account for site specific needs throughout your application stack including networking, storage, access and security. I’ve described this as all deployments having 80% of the work in common, but the remaining 20% is mixed in with the 80% instead of being nicely layered. So, ops is cookie dough, not vinaigrette.

Getting past this problem for initial provisioning on a single platform is a false victory. The real need is portable and upgrade-ready automation that can be reused and shared. Critically, we also need to build upon the existing foundations instead of requiring a blank slate. There is openness value in heterogeneous infrastructure so we need to embrace variation and design accordingly.

This is the vision the RackN team has been working towards with the open source Digital Rebar project. We are now able to showcase workload deployments (Docker, Kubernetes, Ceph, etc.) on multiple cloud platforms that also translate to full bare metal deployments. Unlike previous generations of this tooling (some will remember Crowbar), we’ve been careful to avoid injecting external dependencies into the DevOps scripts.

While we’re able to demonstrate a high degree of portability (or fidelity) across multiple platforms, this is just the beginning. We are looking for users and collaborators who want to build open infrastructure from an operational perspective.

You are invited to join us in making open cross-platform operations a reality.