5 Key Aspects of High Fidelity DevOps [repost from DevOps.com]

For all our cloud enthusiasm, I feel like ops automation is suffering as we increase choice and complexity.  Why is this happening?  It’s about loss of fidelity.

Nearly a year ago, I was inspired by a mention of “Fidelity Gaps” during a Cloud Foundry After Dark session.  With additional advice from DevOps leader Gene Kim, this narrative about the why and how of DevOps Fidelity emerged.

As much as we talk about how we should have shared goals spanning Dev and Ops, it’s not nearly as easy as it sounds. To fuel a DevOps culture, we also have to build robust tooling.

That means investing up front in five key areas: abstraction, composability, automation, orchestration, and idempotency.

Together, these concepts allow sharing work at every level of the pipeline. Unfortunately, it’s tempting to optimize work at one level and miss the true system bottlenecks.

Creating production-like fidelity for developers is essential: We need it for scale, security and upgrades. It’s not just about sharing effort; it’s about empathy and collaboration.

But even with growing acceptance of DevOps as a cultural movement, I believe deployment disparities are a big unsolved problem. When developers have vastly different working environments from operators, it creates a “fidelity gap” that makes it difficult for the teams to collaborate.

Before we talk about the costs and solutions, let me first share a story from back when I was a bright-eyed OpenStack enthusiast…

Read the Full Article on DevOps.com including my section about Why OpenStack Devstack harms the project and five specific ways to improve DevOps fidelity.

my 8 steps that would improve OpenStack Interop w/ AWS

I’ve been talking with a lot of OpenStack people about my frustrating attempts at hybrid work on seven OpenStack clouds [OpenStack Session Wed 2:40].  This post documents the behavior Digital Rebar expects from the multiple clouds that we have integrated with so far.  At RackN, we use this pattern for both cloud and physical automation.

Sunday, I found myself back in front of the Board talking about the challenge that implementation variation creates for users.  Ultimately, the question “does this harm users?” is answered by “no, they just leave for Amazon.”

I can’t stress this enough: it’s not about APIs!  The challenge is twofold: implementation variance between OpenStack clouds and variance between OpenStack and AWS.

The obvious and simplest answer is that OpenStack implementers need to conform more closely to AWS patterns (once again, NOT the APIs).

Here are the eight Digital Rebar node allocation steps [and my notes about general availability on OpenStack clouds]:

  1. Add node specific SSH key [YES]
  2. Get Metadata on Networks, Flavors and Images [YES]
  3. Pick correct network, flavors and images [NO, each site is distinct]
  4. Request node [YES]
  5. Get PUBLIC address for node [NO, most OpenStack clouds do not have external access by default]
  6. Log in to system using node SSH key [PARTIAL, the account name varies]
  7. Add root account with Rebar SSH key(s) and remove password login [PARTIAL, does not work on some systems]
  8. Remove node specific SSH key [YES]
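The eight steps above can be sketched in code. This is only an illustration, not the actual Digital Rebar provider: `FakeCloud`, `allocate_node`, and every method name here are hypothetical stand-ins I made up so the flow can run without a real cloud SDK. The two knobs on the fake cloud mirror the [NO] and [PARTIAL] notes: whether a node gets a public address by default, and which account name the image uses.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud provider (not a real SDK)."""
    login_user: str = "ubuntu"        # step 6: the account name varies by site
    public_by_default: bool = True    # step 5: False on most OpenStack clouds
    keys: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def add_key(self, name):
        self.keys.add(name)
        self.log.append(f"add_key {name}")

    def remove_key(self, name):
        self.keys.discard(name)
        self.log.append(f"remove_key {name}")

    def metadata(self):
        # Illustrative values only; each real site returns something different.
        return {"networks": ["private", "public"],
                "flavors": ["small", "large"],
                "images": ["ubuntu-14.04", "centos-7"]}

    def request_node(self, network, flavor, image, key):
        self.log.append(f"request_node {network}/{flavor}/{image}")
        ip = "203.0.113.10" if self.public_by_default else None
        return {"id": "node-1", "public_ip": ip}

    def run_as(self, node, user, key, command):
        # A real provider would open an SSH session here; we only record it.
        self.log.append(f"ssh {user}@{node['public_ip']}: {command}")


def allocate_node(cloud, node_name, site_choices, rebar_keys):
    """Walk the eight allocation steps against a provider-like object."""
    temp_key = f"{node_name}-bootstrap"
    cloud.add_key(temp_key)                                  # 1. node specific key
    try:
        meta = cloud.metadata()                              # 2. get metadata
        for kind in ("network", "flavor", "image"):          # 3. site-specific picks
            if site_choices[kind] not in meta[kind + "s"]:
                raise LookupError(f"{kind} {site_choices[kind]!r} not offered here")
        node = cloud.request_node(site_choices["network"],   # 4. request node
                                  site_choices["flavor"],
                                  site_choices["image"], temp_key)
        if node["public_ip"] is None:                        # 5. public address
            raise RuntimeError("node has no public address")
        cloud.run_as(node, cloud.login_user, temp_key, "whoami")          # 6. log in
        cloud.run_as(node, cloud.login_user, temp_key,                    # 7. root keys
                     f"install root keys {rebar_keys}; disable password login")
    finally:
        # 8. Cleanup in `finally` is a choice made for this sketch, so a
        # failed allocation does not leave the temporary key behind.
        cloud.remove_key(temp_key)
    return node
```

Running it with `FakeCloud(public_by_default=False)` fails at step 5, which is the OpenStack behavior described in the notes.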

These steps work on every other cloud infrastructure that we’ve used.  And they are achievable on OpenStack – DreamHost delivered this experience on their new DreamCompute infrastructure.

I think that this is very achievable for OpenStack, but we’re going to have to drive conformance and figure out an alternative to the Floating IP (FIP) pattern.  IPv6, port forwarding, or adding FIPs by default would all work as part of the solution.

For Digital Rebar, the quick answer is to simply allocate a FIP for every node.  We can easily make this a configuration option; however, it feels like a pattern fail to me.  It’s certainly not a requirement from other clouds.
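As a rough sketch of what that configuration option might look like (the function name, the `always_allocate_fip` flag, and the `allocate_fip` callback are all hypothetical, not Digital Rebar code):

```python
def public_address(node, allocate_fip, always_allocate_fip=False):
    """Return a public address for a node dict like {"id": ..., "public_ip": ...}.

    `allocate_fip` is a caller-supplied function (an assumption of this
    sketch) that takes a node id and returns a floating IP address.
    """
    if always_allocate_fip:
        # The "quick answer": request a FIP for every node, no questions asked.
        node["public_ip"] = allocate_fip(node["id"])
    if not node.get("public_ip"):
        # Other clouds hand back a usable address, so this path is
        # only expected where external access is off by default.
        raise RuntimeError(f"node {node['id']} has no public address")
    return node["public_ip"]
```

The flag being needed at all is the pattern fail: the caller has to know which kind of cloud it is talking to before it can get a reachable node.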

I hope this post provides specifics about delivering a more portable hybrid experience.  What critical items do you want as part of your cloud ops process?

SIG-ClusterOps: Promote operability and interoperability of Kubernetes clusters

Originally posted on Kubernetes Blog.  I wanted to repost here because it’s part of the RackN ongoing efforts to focus on operational and fidelity gap challenges early.  Please join us in this effort!

We think Kubernetes is an awesome way to run applications at scale! Unfortunately, there’s a bootstrapping problem: we need good ways to build secure & reliable scale environments around Kubernetes. While some parts of the platform administration leverage the platform (cool!), there are fundamental operational topics that need to be addressed and questions (like upgrade and conformance) that need to be answered.

Enter Cluster Ops SIG – the community members who work under the platform to keep it running.

Our objective for Cluster Ops is to be a person-to-person community first, and a source of opinions, documentation, tests and scripts second. That means we dedicate significant time and attention to simply comparing notes about what is working and discussing real operations. Those interactions give us data to form opinions. It also means we can use real-world experiences to inform the project.

We aim to become the forum for operational review and feedback about the project. For Kubernetes to succeed, operators need to have a significant voice in the project by weekly participation and collecting survey data. We’re not trying to create a single opinion about ops, but we do want to create a coordinated resource for collecting operational feedback for the project. As a single recognized group, operators are more accessible and have a bigger impact.

What about real world deliverables?

We’ve got plans for tangible results too. We’re already driving toward concrete deliverables like reference architectures, tool catalogs, community deployment notes and conformance testing. Cluster Ops wants to become the clearing house for operational resources. We’re going to do it based on real world experience and battle tested deployments.

Connect with us.

Cluster Ops can be hard work – don’t do it alone. We’re here to listen, to help when we can and escalate when we can’t. Join the conversation at:

The Cluster Ops Special Interest Group meets weekly at 13:00PT on Thursdays.  You can join us via the video hangout and see the latest meeting notes for agendas and topics covered.

Fast Talk: Creating Operating Environments that Span Clouds and Physical Infrastructures

This short 15-minute talk pulls together a few themes around composability that you’ll see in future blogs where I lay out the challenges and solutions for hybrid DevOps practices.  Like any DevOps concept – it’s a mix of technology, attitude (culture) and process.

Our hybrid DevOps objective is simple: We need multi-infrastructure Amazon equivalence for ops automation.

Here’s the summary:

  • Hybrid Infrastructure is new normal
  • Amazon is the Ops benchmark
  • Embrace operations automation
  • Invest in making IT composable

 

Want to listen to it?  Here’s the voice over:

 

Problems with the “Give me a Wookiee” hybrid API

Greg Althaus, RackN CTO, creates amazing hybrid DevOps orchestration that spans metal and cloud implementations.  When it comes to knowing the nooks and crannies of data centers, his ops scar tissue has scar tissue.  So, I knew you’d all enjoy this funny story he wrote after previewing my OpenStack API report.  

“APIs are only valuable if the parameters mean the same thing and you get back what you expect.” Greg Althaus

The following is a guest post by Greg:

While building the Digital Rebar OpenStack node provider, Rob Hirschfeld tried to integrate with 7+ OpenStack clouds.  While the APIs matched across instances, there are all sorts of challenges with what comes out of the API calls.  

The discovery made me realize that APIs are not the end of interoperability.  They are the beginning.  

I found I could best describe it with a story.

I found an API on a service and that API creates a Wookiee!

I can tell the API that I want a tall or short Wookiee or young or old Wookiee.  I test against the Kashyyyk service.  I consistently get an 8ft Brown 300 year old Wookiee when I ask for a Tall Old Wookiee.  

I get a 6ft Brown 50 Year old Wookiee when I ask for a Short Young Wookiee.  Exactly what I want, all the time.  

My pointy-haired emperor boss says I need to now use the Forest Moon of Endor (FME) Service.  He was told it is the exact same thing but cheaper.  Okay, let’s do this.  It consistently gives me a 5 year old, 4 ft tall Brown Ewok (called a Wookiee) when I ask for the Tall Young Wookiee.  

This is a fail.  I mean, yes, they are both furry and brown, but the Ewok can’t reach the top of my bookshelf.  

The next service has to work, right?  About the same price as FME, the Tatooine Service claims to be really good too.  It passes tests.  It hands out things called Wookiees.  The only problem is that, while size is an API field, the service requires the use of petite and big instead of short and tall.  This is just annoying.  This time my tall (well big) young Wookiee is 8 ft tall and 50 years old, but it is green and bald (scales are like that).  

I don’t really know what it is.  I’m sure it isn’t a Wookiee.  

And while she is awesome (better than the male Wookiees), she almost froze to death in the arctic tundra that is Boston.  

My point: APIs are only valuable if the parameters mean the same thing and you get back what you expect.
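For the code-minded, here is a toy version of the story. It is a sketch, not anything from the Digital Rebar provider: the three service functions and the `check` helper are invented for illustration, and wherever the story is silent (for example, what Tatooine returns for petite) the values are my assumptions. All three services "pass" in the sense that they return something labeled Wookiee; only a check on what comes back tells them apart.

```python
def kashyyyk(size, age):
    # The reference service: tall/old gives an 8 ft, 300 year old brown Wookiee.
    return {"type": "Wookiee", "color": "brown",
            "height_ft": {"tall": 8, "short": 6}[size],
            "age_years": {"old": 300, "young": 50}[age]}


def endor(size, age):
    # Accepts the same parameters, ignores them, and labels an Ewok a Wookiee.
    return {"type": "Wookiee", "color": "brown", "height_ft": 4, "age_years": 5}


def tatooine(size, age):
    # Same field, different vocabulary: big/petite instead of tall/short.
    # The petite height and the age mapping are assumptions for this sketch.
    return {"type": "Wookiee", "color": "green",
            "height_ft": {"big": 8, "petite": 6}[size],
            "age_years": {"old": 300, "young": 50}[age]}


# What the caller actually relies on when asking for a tall Wookiee.
EXPECTED = {"type": "Wookiee", "height_ft": 8, "color": "brown"}


def check(service, size="tall", age="old"):
    """Return a list of ways the result differs from what the caller expects."""
    try:
        result = service(size, age)
    except KeyError as err:
        return [f"rejected parameter {err}"]
    return [f"{key}: expected {want!r}, got {result.get(key)!r}"
            for key, want in EXPECTED.items() if result.get(key) != want]


for name, service in [("Kashyyyk", kashyyyk), ("Endor", endor), ("Tatooine", tatooine)]:
    print(name, check(service) or "OK")
```

Note that checking only the `type` field would have let the Ewok and the scaly green one through.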

 

Hybrid DevOps: Union of Configuration, Orchestration and Composability

Steven Spector and I talked about “Hybrid DevOps” as a concept.  Our discussion led to a ‘there’s a picture for that!’ moment that often helped clarify the concept.  We believe that this concept, like Rugged DevOps, is additive to existing DevOps thinking and culture.  It’s about expanding our thinking to include orchestration and composability.

Here’s our write-up: Hybrid DevOps: Union of Configuration, Orchestration and Composability

Is Hybrid DevOps Like The Tokyo Metro?

I LOVE OPS ANALOGIES!  The “Hybrid DevOps = Tokyo Metro” really works because it accepts that some complexity is inescapable.  It would be great if Tokyo was a single system, but it’s not.  Cloud and infrastructure are the same: they are not a single vendor system and are not going to converge.

With that intro… Dan Choquette writes: How is DevOps at scale like a major city’s subway system? Both require strict processes and operational excellence to move a lot of different parts at once. How else? If you had …

Source: Is Hybrid DevOps Like The Tokyo Metro?