Bugs Bunny, Prince and Enabling True Hybrid Infrastructure Consumption

OK, stay with me on this. I’m drawing parallels again. 🙂

Like many from my generation, my initial exposure to classical music and opera came from Bugs Bunny on Saturday mornings (culturally deprived, I know). One of the cartoons I remember well has Bugs trying to get even with the heavy-set opera singer who disrupts Bugs’ banjo playing. To exact his revenge, Bugs infiltrates the opera singer’s concert by impersonating the famous long-hared (hared…get it?) conductor, Leopold Stokowski. He proceeds to force the tenor to hit octaves that structurally compromise the amphitheater and, as it crumbles, leaves him bruised and battered. Bugs is, as always, victorious.


In examining Bugs’ strategy (let’s assume he actually had one), Bugs took over operations of the orchestra’s musical program to achieve his goal of getting the tenor “in-line” so to speak. As I prepare to head down to the OpenStack Conference in Austin, TX next week, I’m seeing similar patterns develop in the cloud and data center infrastructure space which are very “Bugs/Leopold-like”. With organizations deciding how to consolidate data centers, containerize apps and move to the cloud, vendors and open source technologies offer value; however, true operational, infrastructure and platform independence are not what they appear to be. For example, once you move your apps off the data center to AWS or VMware and then later determine you are paying too much or the workload is no longer appropriate for the infrastructure, good luck replicating the configuration work done on CloudFormation on another cloud or back in the data center. The same rationale applies to other technologies such as converged infrastructure and proprietary private cloud platforms. As the customer, to achieve scale and remove operational pain you must fall in line. That in itself is a big commitment to make in a still-evolving and maturing technology industry and a dynamic business climate.

On an unrelated topic, I was saddened to learn of the passing of Prince this past week. While not a die-hard fan, I liked his music. He was a great composer of songs and had a style all to his own. Beyond his music and sheer talent, I admired his business beliefs and deep desire to maintain creative ownership and control of his music and his brand.

Despite his fortune and fame, there was a period in the middle of Prince’s career in which he felt creatively and financially locked-in by the big record companies. Once Prince (and the unpronounceable symbol) broke away from Warner Music, he was able to produce music under his own label. This action enabled him to create music without a major record label dictating when he needed to produce a new album and what it needed to sound like. In addition, he was now able to market his new recordings to the distribution platform that supported his artistic and financial goals. While still having ties to Warner Music, he was no longer bound by their business practices. Along with starting his own music subscription service, Prince cut deals with Arista, Columbia, iTunes and Sony. Prince’s music production had operational portability, business agility and choice (seven Grammy awards and 100 million record sales also help create that kind of leverage).

While open APIs and containers offer some portability, at RackN we believe they do not offer a completely free market experience to the cloud and infrastructure consumer. If the business decides it is paying too much for AWS, it should not allow the operational underlay and configuration complexity to lock it in to the infrastructure provider. It should be able to transfer its business to Google, Azure, Rackspace or Dreamhost with ease. We believe technologies that create portable, composable operational workflows drive true infrastructure and platform independence and, as a benefit, reduce business risk. Choosing a platform and being forced to use it are two very different things.

In conclusion, when considering moving workloads to the cloud, converged infrastructure platforms or using DevOps automation tools, consider how you can achieve programmable operational portability and agility. Think about how you can best absorb new technologies without causing operational disruption in your infrastructure. Furthermore, ensure you can accomplish this in a repeatable, automated fashion. Analyze how you can abstract away complex configurations for security, networking and container orchestration technologies and make them adaptable from one infrastructure platform to another. Attempt to eliminate configuration versioning as much as possible and make upgrades simple and automated so your DevOps staff do not have to be experts (they are stressed out enough).

If you are attending the OpenStack Conference this week, look me up. While I am far from a music expert, I’ll be happy to share with you my insights on how to spot a technology vendor that likes to play a purple guitar as opposed to one that eats carrots and plays the banjo.

-Dan Choquette: Co-Founder, RackN

OpenStack is caught in a snowstorm – it’s status quo for ops implementations to be snowflakes

OpenStack got into exactly the place we expected: operations started with fragmented and divergent data centers (aka snowflaked) and OpenStack did nothing to change that. Can we fix that? Yes, but the answer involves relying on Amazon as our benchmark.

In advance of my OpenStack Summit Demo/Presentation (video!) [slides], I’ve spent the last few weeks mapping seven (and counting) OpenStack implementations into the cloud provider subsystem of the Digital Rebar provisioning platform. Before I started working on adding OpenStack integration, RackN had already created a hybrid DevOps baseline. We are able to run the same Kubernetes and Docker Swarm provisioning extensions on multiple targets including Amazon, Google, Packet and directly on physical systems (aka metal).

Before we talk about OpenStack challenges, it’s important to understand that data centers and clouds are messy, heterogeneous environments.

These variations are so significant and operationally challenging that they are the fundamental design driver for Digital Rebar. The platform uses a composable operational approach to isolate and then chain automation tasks together. That allows configurations, like networking, to be passed from infrastructure-specific functions into common building blocks without user intervention.

Composability is critical because it allows operators to isolate variations into modular pieces and then expose common configuration elements. Since the pattern works successfully for crossing other clouds and metal, I anticipated success with OpenStack.
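To make that pattern concrete, here is a minimal sketch of chaining infrastructure-specific steps into a common building block. It is not Digital Rebar's actual code: the step names, attribute keys, user names and addresses are all hypothetical and only illustrate the idea of isolating variation and exposing common configuration.

```python
from typing import Callable, Dict, List

# Hypothetical contract: a step reads the accumulated attributes and returns new ones.
Step = Callable[[Dict[str, str]], Dict[str, str]]


def aws_network(attrs: Dict[str, str]) -> Dict[str, str]:
    # Infrastructure-specific (assumed): the public address is directly usable.
    return {"admin_ip": attrs["public_ip"], "ssh_user": "ec2-user"}


def metal_network(attrs: Dict[str, str]) -> Dict[str, str]:
    # Infrastructure-specific (assumed): metal uses an operator-assigned address.
    return {"admin_ip": attrs["static_ip"], "ssh_user": "root"}


def install_docker(attrs: Dict[str, str]) -> Dict[str, str]:
    # Common building block: it only reads the normalized attributes.
    target = f'{attrs["ssh_user"]}@{attrs["admin_ip"]}'
    return {"docker_target": target}


def run_chain(steps: List[Step], attrs: Dict[str, str]) -> Dict[str, str]:
    """Run steps in order, merging each step's output into the shared attributes."""
    merged = dict(attrs)
    for step in steps:
        merged.update(step(merged))
    return merged


aws_result = run_chain([aws_network, install_docker], {"public_ip": "203.0.113.10"})
metal_result = run_chain([metal_network, install_docker], {"static_ip": "192.0.2.20"})
print(aws_result["docker_target"])    # ec2-user@203.0.113.10
print(metal_result["docker_target"])  # root@192.0.2.20
```

The common step never needs to know which infrastructure produced the attributes; only the first step in each chain changes.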

The challenge is that there is not “one standard OpenStack” implementation.  This issue is well documented under OpenStack as Project Shade.

If you only plan to operate a mono-cloud then these are not concerns; however, everyone I’ve met is using at least AWS and one other cloud. This operational fact means that AWS provides the common service behavior baseline. This is not an API statement – it’s about being able to operate on the systems delivered by the API.

While the OpenStack API worked consistently on each tested cloud (win for DefCore!), it frequently delivered systems that could not be deployed or were unusable for later steps.

While these are not directly OpenStack API concerns, I do believe that additional metadata in the API could help expose material configuration choices. The challenge becomes defining those choices in a reference architecture way. The OpenStack principle of leaving implementation choices open makes it challenging to drive these options to a narrow set of choices. Unfortunately, it means it is difficult to create an intra-OpenStack hybrid automation without hard-coded vendor identities or exploding configuration flags.

A series of individually reasonable options domino together to create these challenges.  These are real issues that made the integration difficult.

  • No default of externally accessible systems. I have to assign floating IPs (an anti-pattern for individual VMs) or be on the internal networks. No consistent naming pattern for networks, types (flavors) or starting images.  In several cases, the “private” network is the publicly accessible one and the “external” network is visible but unusable.
  • No consistent naming for access user accounts.  If I want to ssh to a system, I have to fail my first login before I learn the right user name.
  • No data to determine which networks provide which functions.  And there’s no metadata about which networks are public or private.  
  • Incomplete post-provisioning processes because they are left open to user customization.
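The practical consequence of the issues above is the "hard-coded vendor identities" workaround: automation ends up carrying a per-cloud lookup table. The sketch below shows what that might look like; the cloud names, user names and network names are invented for illustration and do not describe any real provider.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CloudProfile:
    ssh_user: str            # login name baked into the provider's images
    public_network: str      # name of the network that is actually reachable
    needs_floating_ip: bool  # whether a floating IP must be attached


# Hypothetical providers and values, purely for illustration.
PROFILES: Dict[str, CloudProfile] = {
    "cloud-a": CloudProfile("ubuntu", "external", True),
    "cloud-b": CloudProfile("root", "private", False),
}


def access_plan(cloud: str, networks: List[str]) -> Dict[str, object]:
    """Return what automation needs to know before it can ssh into a new VM."""
    profile = PROFILES[cloud]  # a KeyError here means an unmapped provider
    if profile.public_network not in networks:
        raise ValueError(f"{cloud}: expected network {profile.public_network!r}")
    return {
        "ssh_user": profile.ssh_user,
        "network": profile.public_network,
        "attach_floating_ip": profile.needs_floating_ip,
    }


plan = access_plan("cloud-b", ["private", "external"])
print(plan)
```

Every new provider means another table entry, which is exactly the maintenance burden that richer API metadata could remove.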

There is a defensible and logical reason for each example above; sadly, those reasons do nothing to make OpenStack more operationally accessible.  While intra-OpenStack interoperability is helpful, I believe that ecosystems and users benefit from Amazon-like behavior.

What should you do?  Help broaden the OpenStack discussions to seek interoperability with the whole cloud ecosystem.

At RackN, we will continue to refine and adapt to these variations.  Creating a consistent experience that copes with variability is the raison d’etre for our efforts with Digital Rebar. That means that we ultimately use AWS as the yardstick for configuration of any infrastructure, from physical to OpenStack and even Amazon!

SIG-ClusterOps: Promote operability and interoperability of Kubernetes clusters

Originally posted on Kubernetes Blog.  I wanted to repost here because it’s part of the RackN ongoing efforts to focus on operational and fidelity gap challenges early.  Please join us in this effort!

We think Kubernetes is an awesome way to run applications at scale! Unfortunately, there’s a bootstrapping problem: we need good ways to build secure & reliable scale environments around Kubernetes. While some parts of the platform administration leverage the platform (cool!), there are fundamental operational topics that need to be addressed and questions (like upgrade and conformance) that need to be answered.

Enter Cluster Ops SIG – the community members who work under the platform to keep it running.

Our objective for Cluster Ops is to be a person-to-person community first, and a source of opinions, documentation, tests and scripts second. That means we dedicate significant time and attention to simply comparing notes about what is working and discussing real operations. Those interactions give us data to form opinions. It also means we can use real-world experiences to inform the project.

We aim to become the forum for operational review and feedback about the project. For Kubernetes to succeed, operators need to have a significant voice in the project through weekly participation and by collecting survey data. We’re not trying to create a single opinion about ops, but we do want to create a coordinated resource for collecting operational feedback for the project. As a single recognized group, operators are more accessible and have a bigger impact.

What about real world deliverables?

We’ve got plans for tangible results too. We’re already driving toward concrete deliverables like reference architectures, tool catalogs, community deployment notes and conformance testing. Cluster Ops wants to become the clearing house for operational resources. We’re going to do it based on real world experience and battle tested deployments.

Connect with us.

Cluster Ops can be hard work – don’t do it alone. We’re here to listen, to help when we can and escalate when we can’t. Join the conversation at:

The Cluster Ops Special Interest Group meets weekly at 13:00PT on Thursdays. You can join us via the video hangout and see the latest meeting notes for agendas and topics covered.

AWS Ops patterns set the standard: embrace that and accelerate

RackN creates infrastructure agnostic automation so you can run physical and cloud infrastructure with the same elastic operational patterns.  If you want to make infrastructure unimportant then your hybrid DevOps objective is simple:

Create multi-infrastructure Amazon equivalence for ops automation.

Even if you are not an AWS fan, they are the universal yardstick (15 minute & 40 minute presos). That goes for other clouds (public and private) and for physical infrastructure too. Their footprint is simply so pervasive that you cannot ignore “works on AWS” as a need even if you don’t need to work on AWS.  Like PCs in the late-80s, we can use vendor competition to create user choice of infrastructure. That requires a baseline for equivalence between the choices. In the 90s, the Windows monopoly provided those APIs.

Why should you care about hybrid DevOps? As we increase operational portability, we empower users to make economic choices that foster innovation.  That’s valuable even for AWS locked users.

We’re not talking about “give me a VM” here! The real operational need is to build accessible, interconnected systems – what is sometimes called “the underlay.” It’s more about networking, configuration and credentials than simple compute resources. We need consistent ways to automate systems that can talk to each other and static services, have access to dependency repositories (code, mirrors and container hubs) and can establish trust with other systems and administrators.

These “post” provisioning tasks are sophisticated and complex. They cannot be statically predetermined. They must be handled dynamically based on the actual resource being allocated. Without automation, this process becomes manual, glacial and impossible to maintain. Does that sound like traditional IT?

Side Note on Containers: For many developers, we are adding platforms like Docker, Kubernetes and CloudFoundry, that do these integrations automatically for their part of the application stack. This is a tremendous benefit for their use-cases. Sadly, hiding the problem from one set of users does not eliminate it! The teams implementing and maintaining those platforms still have to deal with underlay complexity.

I am emphatically not looking for AWS API compatibility: we are talking about emulating their service implementation choices.  We have plenty of ways to abstract APIs. Ops is a post-API issue.

In fact, I believe that red herring leads us to a bad place where innovation is locked behind legacy APIs.  Steal APIs where it makes sense, but don’t blindly require them because it’s the layer under them where the real compatibility challenges lurk.

Side Note on OpenStack APIs (why they diverge): Trying to implement AWS APIs without duplicating all their behaviors is more frustrating than a fresh API without the implied AWS contracts.  This is exactly the problem with OpenStack variation.  The APIs work but there is not a behavior contract behind them.

For example, transitioning to IPv6 is difficult to deliver because Amazon still relies on IPv4. That lack makes it impossible to create hybrid automation that leverages IPv6 because it won’t work on AWS. In my world, we had to disable default use of IPv6 in Digital Rebar when we added AWS. Another example? Amazon’s regional AMI pattern, thankfully, is not replicated by Google; however, that difference means there’s no consistent image naming pattern.  In my experience, a bad pattern is generally better than inconsistent implementations.

As market dominance drives us to benchmark on Amazon, we are stuck with the good, bad and ugly aspects of their service.

For very pragmatic reasons, even AWS automation is highly fragmented. There are a large and shifting number of distinct system identifiers (AMIs, regions, flavors) plus a range of user-configured choices (security groups, keys, networks). Even within a single provider, these options make it impossible to maintain a generic automation process.  Since other providers logically model from AWS, we will continue to expect AWS-like behaviors from them.  Variation from those norms adds effort.
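One common way to cope with shifting identifiers is a translation table from a generic name to whatever each provider and region expects. The following is a rough sketch only; the image IDs, the map layout and the `resolve_image` helper are made up for illustration and are not real AMI values or Digital Rebar code.

```python
from typing import Dict, Tuple

# Hypothetical identifiers; real AMI IDs differ per region and change over time.
IMAGE_MAP: Dict[Tuple[str, str], Dict[str, str]] = {
    ("aws", "us-west-2"): {"ubuntu-14.04": "ami-00000001"},
    ("aws", "us-east-1"): {"ubuntu-14.04": "ami-00000002"},
    ("google", "us-central1"): {"ubuntu-14.04": "example-ubuntu-1404-image"},
}


def resolve_image(provider: str, region: str, generic_name: str) -> str:
    """Translate a generic image name into the identifier a provider expects."""
    try:
        return IMAGE_MAP[(provider, region)][generic_name]
    except KeyError:
        raise LookupError(
            f"no image mapping for {generic_name!r} on {provider}/{region}"
        ) from None


print(resolve_image("aws", "us-east-1", "ubuntu-14.04"))  # ami-00000002
```

The table has to be maintained by hand for every provider, region and image, which is the kind of effort that consistent behavior across providers would avoid.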

Failure to follow AWS without clear reason and alternative path is frustrating to users.

Do you agree?  Join us with Digital Rebar in creating a real hybrid operations platform.

Fast Talk: Creating Operating Environments that Span Clouds and Physical Infrastructures

This short 15-minute talk pulls together a few themes around composability that you’ll see in future blogs where I lay out the challenges and solutions for hybrid DevOps practices.  Like any DevOps concept – it’s a mix of technology, attitude (culture) and process.

Our hybrid DevOps objective is simple: We need multi-infrastructure Amazon equivalence for ops automation.

Here’s the summary:

  • Hybrid Infrastructure is the new normal
  • Amazon is the Ops benchmark
  • Embrace operations automation
  • Invest in making IT composable

Want to listen to it?  Here’s the voice over:

Problems with the “Give me a Wookiee” hybrid API

Greg Althaus, RackN CTO, creates amazing hybrid DevOps orchestration that spans metal and cloud implementations.  When it comes to knowing the nooks and crannies of data centers, his ops scar tissue has scar tissue.  So, I knew you’d all enjoy this funny story he wrote after previewing my OpenStack API report.  

“APIs are only valuable if the parameters mean the same thing and you get back what you expect.” Greg Althaus

The following is a guest post by Greg:

While building the Digital Rebar OpenStack node provider, Rob Hirschfeld tried to integrate with 7+ OpenStack clouds.  While the APIs matched across instances, there are all sorts of challenges with what comes out of the API calls.  

The discovery made me realize that APIs are not the end of interoperability.  They are the beginning.  

I found I could best describe it with a story.

I found an API on a service and that API creates a Wookiee!

I can tell the API that I want a tall or short Wookiee or young or old Wookiee.  I test against the Kashyyyk service.  I consistently get an 8ft Brown 300 year old Wookiee when I ask for a Tall Old Wookiee.  

I get a 6ft Brown 50 Year old Wookiee when I ask for a Short Young Wookiee.  Exactly what I want, all the time.  

My pointy-haired emperor boss says I need to now use the Forest Moon of Endor (FME) Service.  He was told it is the exact same thing but cheaper.  Okay, let’s do this.  It consistently gives me a 5 year old 4 ft tall Brown Ewok (called a Wookiee) when I ask for the Tall Young Wookiee.  

This is a fail.  I mean, yes, they are both furry and brown, but the Ewok can’t reach the top of my bookshelf.  

The next service has to work, right?  About the same price as FME, the Tatooine Service claims to be really good too.  It passes tests.  It hands out things called Wookiees.  The only problem is that, while size is an API field, the service requires the use of petite and big instead of short and tall.  This is just annoying.  This time my tall (well big) young Wookiee is 8 ft tall and 50 years old, but it is green and bald (scales are like that).  

I don’t really know what it is.  I’m sure it isn’t a Wookiee.  

And while she is awesome (better than the male Wookiees), she almost froze to death in the arctic tundra that is Boston.  

My point: APIs are only valuable if the parameters mean the same thing and you get back what you expect.
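In the spirit of the story, here is a small sketch of checking behavior rather than labels. The thresholds (tall means at least 7 ft, old means at least 100 years) and every name in the code are assumptions made up for this example; the three creatures are the ones described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Creature:
    label: str        # what the service calls it (always "Wookiee")
    height_ft: float
    age_years: int
    furry: bool


def contract_violations(size: str, age: str, got: Creature) -> List[str]:
    """Compare what was requested with what came back, ignoring the label."""
    problems = []
    # Assumed thresholds: "tall" is at least 7 ft, "old" is at least 100 years.
    if (size == "tall") != (got.height_ft >= 7):
        problems.append(f"asked for {size}, got {got.height_ft} ft")
    if (age == "old") != (got.age_years >= 100):
        problems.append(f"asked for {age}, got {got.age_years} years")
    if not got.furry:
        problems.append("not furry")
    return problems


kashyyyk = Creature("Wookiee", 8, 300, True)
endor = Creature("Wookiee", 4, 5, True)
tatooine = Creature("Wookiee", 8, 50, False)

print(contract_violations("tall", "old", kashyyyk))    # []
print(contract_violations("tall", "young", endor))     # ['asked for tall, got 4 ft']
print(contract_violations("tall", "young", tatooine))  # ['not furry']
```

All three services return something labeled "Wookiee", so a check on the label alone would pass every time; only the behavioral checks catch the Ewok and the scaly one.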

 

One Cloud, Many Providers: The OpenStack Interop Challenge

We want OpenStack to work as a universal cloud API but it’s hard!  What’s the problem? 

This post, written before the Tokyo Summit but not published, talks about how we got here without a common standard and offers some pointers.  At the Austin Summit, I’ve got a talk on hybrid Open Infrastructure Wednesday @ 2:40 where I talk specifically about solutions.  I’ve been working on multi-infrastructure hybrid – that means making ops portable between OpenStack, Google, Amazon, Physical and other options.

The Problem: At a fundamental level, OpenStack has yet to decide if it’s an infrastructure (IaaS) product or an open software movement.

Is there a path to be both? There are many vendors who are eager to sell you their distinct flavor of OpenStack; however, lack of consistency has frustrated users and operators. OpenStack faces internal and external competition if we do not address this fragmentation. Over the next few paragraphs, we’ll explore the path the Foundation has planned to offer users a consistent core product while fostering its innovative community.

How did we get down this path?  Here’s some background on how we got here.

Before we can discuss interoperability (interop), we need to define success for OpenStack because interop is the means, not the end. My mark for success is when OpenStack has created a sustainable market for products that rely on the platform. In business plan speak, we’d call that a serviceable available market (SAM). In practical terms, OpenStack is successful when businesses target the platform as the first priority when building integrations over cloud behemoths: Amazon and VMware.

The apparent dominance of OpenStack in terms of corporate contribution and brand position does not translate into automatic long term success.  

While apparently united under a single brand, intentional technical diversity in OpenStack has led to incompatibilities between different public and private implementations. While some of these issues are accidents of miscommunication, others were created by structural choices inherent to the project’s formation. No matter the causes, they frustrate users and limit the network effect of widespread adoption.

Technical diversity was both a business imperative and design objective for OpenStack during formation.

In order to quickly achieve critical mass, the project needed to welcome a diverse and competitive set of corporate sponsors. The commitment to support multiple operating systems, hypervisors, storage platforms and networking providers has been essential to the project’s growth and popularity. Unfortunately, it also creates combinatorial complexity and political headaches.

With all those variables, it’s best to think of interop as a spectrum.  

At the top of that spectrum is basic API compatibility and the bottom is fully integrated operation where an application could run site-unaware in multiple clouds simultaneously.  Experience shows that basic API compatibility is not sufficient: there are significant behavioral impacts due to implementation details just below the API layer that must also be part of any interop requirement.  Variations like how IPs are assigned and machines are initialized matter to both users and tools.  Any effort to ensure consistency must go beyond simple API checks to validate that these behaviors are consistent.

OpenStack enforces interop using a process known as DefCore which vendors are required to follow in order to use the trademark “OpenStack” in their product name.

The process is test driven – vendors are required to pass a suite of capability tests defined in DefCore Guidelines to get Foundation approval. Guidelines are published on a 6 month cadence and cover only a “core” part of OpenStack that DefCore has defined as the required minimum set. Vendors are encouraged to add and extend beyond that base, which then leads DefCore to expand the core once it sees widespread adoption.
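The must-pass idea can be sketched in a few lines. This is only an illustration of the concept: the capability names, test names and data layout below are hypothetical and are not taken from an actual DefCore Guideline.

```python
from typing import Dict, List, Set

# Hypothetical guideline: capability name -> tests that must all pass.
GUIDELINE: Dict[str, List[str]] = {
    "compute-servers-create": ["test_create_server", "test_delete_server"],
    "images-list": ["test_list_images"],
}


def missing_capabilities(guideline: Dict[str, List[str]], passed: Set[str]) -> List[str]:
    """List required capabilities with at least one failing or unreported test."""
    return sorted(
        capability
        for capability, tests in guideline.items()
        if not all(test in passed for test in tests)
    )


vendor_passed = {"test_create_server", "test_list_images"}
print(missing_capabilities(GUIDELINE, vendor_passed))  # ['compute-servers-create']
```

A product would only meet the guideline when this list is empty; anything a vendor adds beyond the required set is simply not examined.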

What is DefCore?  Here’s some background about that too!

By design, DefCore started with a very small set of OpenStack functionality.  So small in fact, that there were critical missing pieces like networking APIs from the initial guideline.  The goal for DefCore is to work through the community process to expand based on identified best practices and needed capabilities.  Since OpenStack encourages variation, there will be times when we have to either accept or limit variation.  Like any shared requirements guideline, DefCore becomes a multi-vendor contract between the project and its users.

Can this work?  The reality is that Foundation enforcement of the Brand using DefCore is really a very weak lever. The real power of DefCore comes when customers use it to select vendors.

Your responsibility in this process is to demand compliance from your vendors. OpenStack interoperability efforts will fail if we rely on the Foundation to enforce compliance because it’s simply too late at that point. Think of the healthy multi-vendor WiFi environment: vendors often introduce products on preliminary specifications to get ahead of market. For success, OpenStack vendors also need to be racing to comply with the upcoming guidelines. Only customers can create that type of pressure.

From that perspective, OpenStack will only be as interoperable as the market demands.

That creates a difficult conundrum: without common standards, there’s no market and OpenStack will simply become vertical vendor islands with limited reach. Success requires putting shared interests ahead of product.

That brings us full circle: does OpenStack need to be both a product and a community?  Yes, it clearly does.  

The significant distinction for interop is that we are talking about a focus on the user community voice over the vendor and developer community.  For that, OpenStack needs to focus on product definition to grow users.

I want to thank Egle Sigler and Shamail Tahir for their early review of this post.  Even beyond the specific content, they have helped shape my views on this topic.  Now, I’d like to hear your thoughts about this!  We need to work together to address Interoperability – it’s a community thing.

OpenStack Brief Histories: Austin 2011 and DefCore

These two short items are sidebars for my “One Cloud, Many Providers: The OpenStack Interop Challenge” post.  They provide additional context for the more focused question in the post: “At a fundamental level, OpenStack has yet to decide if it’s an infrastructure product or an open software movement. Is there a path to be both?” 

Background 1: OpenStack, The Early Days

How did we get here?  It’s worth noting that 2011 OpenStack was structured as a heterogeneous vendor playground.  At the inaugural OpenStack summit in Austin, when the project was just forming around NASA’s Nova and Rackspace’s Swift projects, monolithic cloud stacks were a very real threat.  VMware and Amazon were the de facto standards but closed and proprietary.  The open alternatives, CloudStack (Cloud.com), Eucalyptus and OpenNebula, were too tied to single vendors or lacking in scale.  Having a multi-vendor, multi-contributor project without a dictatorial owner was a critical imperative for the community and it continues to be one of the most distinctive OpenStack traits.

Background 2:  DefCore, The Community Interoperability Process

What is DefCore?  The name DefCore is a portmanteau of the committee’s job to “define core” functions of OpenStack.  The official explanation says “DefCore sets base requirements by defining 1) capabilities, 2) code and 3) must-pass tests for all OpenStack products. This definition uses community resources and involvement to drive interoperability by creating the minimum standards for products labeled OpenStack.”  Fundamentally, it’s an OpenStack Board committee with membership open to the community.  In very practical terms, DefCore picks which features and implementation details of OpenStack are required by the vendors; consequently, we’ve designed a governance process to ensure transparency and, hopefully, prevent individual vendors from exerting too much influence.

Repair Kenmore Dishwasher Error F4E1 needs THIS Reset Code

This post is paying it forward on the SEO for people repairing their own Kenmore Dishwasher.  The repair is VERY manageable but requires an undocumented clear code at the end of the replacement.

If you get an F4-E1 (washer pump bad) code, then you MUST clear the code after you replace the drive motor.  The reset code is pressing any three buttons in a 1-2-3 sequence three times (so 1-2-3, 1-2-3, 1-2-3).  I took a picture of my unit with all lights on after I entered the diagnostic code.

Kudos to Kenmore for making their unit incredibly serviceable.

Replacing the drive components was EASY because of their thoughtful design. The repair basically replaced all the mechanical parts of the device as a single unit for $250 (a new unit is about $900).  Once I had the drive, it only required removing a few obvious connections.  The parts effectively snap together.

My only moment of panic came when I put the unit back together with the new parts and got the original code again.  Entering the clear code made everything work.  While it’s easy to find the parts, the reset code is NOT easy to find.  Hopefully, this post helps you resolve this final step in the repair!