(re)Finding an Open Infrastructure Plan: Bridging OpenStack & Kubernetes

TL;DR: infrastructure operations is hard and we need to do a lot more to make these systems widely accessible, easy to sustain and lower risk.  We’re discussing these topics on twitter…please join in.  Themes include “do we really have consensus and will to act” and “already a solved problem” and “this hurts OpenStack in the end.”  

pexels-photo-229949I am always looking for ways to explain (and solve!) the challenges that we face in IT operations and open infrastructure.  I’ve been writing a lot about my concern that data center automation is not keeping pace and causing technical debt.  That concern led to my recent SRE blogging for RackN.

It’s essential to solve these problems in an open way so that we can work together as a community of operators.

It feels like developers are quick to rally around open platforms and tools while operators tend to be tightly coupled to vendor solutions because operational work is tightly coupled to infrastructure.  From that perspective, I’m been very involved in OpenStack and Kubernetes open source infrastructure platforms because I believe the create communities where we can work together.

This week, I posted connected items on VMblog and RackN that layout a position where we bring together these communities.

Of course, I do have a vested interest here.  Our open underlay automation platform, Digital Rebar, was designed to address a missing layer of physical and hybrid automation under both of these projects.  We want to help accelerate these technologies by helping deliver shared best practices via software.  The stack is additive – let’s build it together.

I’m very interested in hearing from you about these ideas here or in the context of the individual posts.  Thanks!

OpenStack’s Big Pivot: our suggestion to drop everything and focus on being a Kubernetes VM management workload

TL;DR: Sometimes paradigm changes demand a rapid response and I believe unifying OpenStack services under Kubernetes has become an such an urgent priority that we must freeze all other work until this effort has been completed.

See Also Rob’s VMblog.com post How is OpenStack so dead AND yet so very alive

By design, OpenStack chose to be unopinionated about operations.

pexels-photo-422290That made sense for a multi-vendor project that was deeply integrated with the physical infrastructure and virtualization technologies.  The cost of that decision has been high for everyone because we did not converge to shared practices that would drive ease of operations, upgrade or tuning.  We ended up with waves of vendors vying to have the the fastest, simplest and openest version.  

Tragically, install became an area of competition instead an area of collaboration.

Containers and microservice architecture (as required for Kubernetes and other container schedulers) is providing an opportunity to correct this course.  The community is already moving towards containerized services with significant interest in using Kubernetes as the underlay manager for those services.  I’ve laid out the arguments for and challenges ahead of this approach in other places.  

These technical challenges involve tuning the services for cloud native configuration and immutable designs.  They include making sure the project configurations can be injected into containers securely and the infra-service communication can handle container life-cycles.  Adjacent concerns like networking and storage also have to be considered.  These are all solvable problems that can be more quickly resolved if the community acts together to target just one open underlay.

The critical fact is that the changes are manageable and unifying the solution makes the project stronger.

Using Kubernetes for OpenStack service management does not eliminate or even solve the challenges of deep integration.  OpenStack already has abstractions that manage vendor heterogeneity and those abstractions are a key value for the project.  Kubernetes solves a different problem: it manages the application services that run OpenStack with a proven, understood pattern.  By adopting this pattern fully, we finally give operators consistent, shared and open upgrade, availability and management tooling.

Having a shared, open operational model would help drive OpenStack faster.

There is a risk to this approach: driving Kubernetes as the underlay for OpenStack will force OpenStack services into a more narrow scope as an infrastructure service (aka IaaS).  This is a good thing in my opinion.   We need multiple abstractions when we build effective IT systems.  

The idea that we can build a universal single abstraction for all uses is a dangerous distraction; instead; we need to build platform layers collaborativity.  

While initially resisting, I have become enthusiatic about this approach.  RackN has been working hard on the upgradable & highly available Kubernetes on Metal prerequisite.  We’ve also created prototypes of the fully integrated stack.  We believe strongly that this work should be done as a community effort and not within a distro.

My call for a Kubernetes underlay pivot embraces that collaborative approach.  If we can keep these platforms focused on their core value then we can build bridges between what we have and our next innovation.  What do you think?  Is this a good approach?  Contact us if you’d like to work together on making this happen.

See Also Rob’s VMblog.com post How is OpenStack so dead AND yet so very alive to SREs? 

Cloud Native PHYSICAL PROVISIONING? Come on! Really?!

We believe Cloud Native development disciplines are essential regardless of the infrastructure.

imageToday, RackN announce very low entry level support for Digital Rebar Provisioning – the RESTful Cobbler PXE/DHCP replacement.  Having a company actually standing behind this core data center function with support is a big deal; however…

We’re making two BIG claims with Provision: breaking DevOps bottlenecks and cloud native physical provisioning.  We think both points are critical to SRE and Ops success because our current approaches are not keeping pace with developer productivity and hardware complexity.

I’m going to post more about Provision can help address the political struggles of SRE and DevOps that I’ve been watching in our industry.   A hint is in the release, but the Cloud Native comment needs to be addressed.

First, Cloud Native is an architecture, not an infrastructure statement.

There is no requirement that we use VMs or AWS in Cloud Native.  From that perspective, “Cloud” is a useful but deceptive adjective.  Cloud Native is born from applications that had to succeed in hands-off, lower SLA infrastructure with fast delivery cycles on untrusted systems.  These are very hostile environments compared to “legacy” IT.

What makes Digital Rebar Provision Cloud Native?  A lot!

The following is a list of key attributes I consider essential for Cloud Native design.

Micro-services Enabled: The larger Digital Rebar project is a micro-services design.  Provision reflects a stand-alone bundling of two services: DHCP and Provision.  The new Provision service is designed to both stand alone (with embedded UX) and be part of a larger system.

Swagger RESTful API: We designed the APIs first based on years of experience.  We spent a lot of time making sure that the API conformed to spec and that includes maintaining the Swagger spec so integration is easy.

Remote CLI: We build and test our CLI extensively.  In fact, we expect that to be the primary user interface.

Security Designed In: We are serious about security even in challenging environments like PXE where options are limited by 20 year old protocols.  HTTPS is required and user or bearer token authentication is required.  That means that even API calls from machines can be secured.

12 Factor & API Config: There is no file configuration for Provision.  The system starts with command line flags or environment variables.  Deeper configuration is done via API/CLI.  That ensures that the system can be fully managed by remote and configured securely becausee credentials are required for configuration.

Fast Start / Golang:  Provision is a totally self-contained golang app including the UX.  Even so, it’s very small.  You can run it on a laptop from nothing in about 2 minutes including download.

CI/CD Coverage: We committed to deep test coverage for Provision and have consistently increased coverage with every commit.  It ensures quality and prevents regressions.

Documentation In-project Auto-generated: On-boarding is important since we’re talking about small, API-driven units.  A lot of Provisioning documentation is generated directly from the code into the actual project documentation.  Also, the written documentation is in Restructured Text in the project with good indexes and cross-references.  We regenerate the documentation with every commit.

We believe these development disciplines are essential regardless of the infrastructure.  That’s why we made sure the v3 Provision (and ultimately every component of Digital Rebar as we iterate to v3) was built to these standards.

What do you think?  Is this Cloud Native?  What did we miss?

RackN Ends DevOps Gridlock in Data Center [Press Release]

Today we announced the availability of Digital Rebar Provision, the industry’s first cloud-native physical provisioning utility.  We’ve had this in the Digital Rebar community for a few weeks before offering support and response has been great!

DR ProvisionBy releasing their API-driven provisioning tool as a stand-alone component of the larger Digital Rebar suite, RackN helps DevOps teams break automation bottlenecks in their legacy data centers without disrupting current operations. The stand-alone open utility can be deployed in under 5 minutes and fits into any data center design. RackN also announced a $1,000 starter support and consulting package to further accelerate transition from tools like Cobbler, MaaS or Stacki to the new Golang utility.

“We were seeing SREs suffering from high job turnover,” said Rob Hirschfeld, RackN founder and CEO. “When their integration plans get gridlocked by legacy tooling they quickly either lose patience or political capital. Digital Rebar Provision replaces the legacy tools without process disruption so that everyone can find shared wins early in large SRE initiatives.”

The first cloud-native physical provisioning utility

Data center provisioning is surprisingly complex because it’s caught between cutting edge hardware and arcane protocols and firmware requirements that are difficult to disrupt.  The heart of the system is a fickle combination of specific DHCP options, a firmware bootstrap environment (known as PXE), a very lightweight file transfer protocol (TFTP) and operating system specific templating tools like preseed and kickstart.  Getting all these pieces to work together with updated APIs without breaking legacy support has been elusive.

By rethinking physical ops in cloud-native terms, RackN has managed to distill out a powerful provisioning tool for DevOps and SRE minded operators who need robust API/CLI, Day 2 Ops, security and control as primary design requirements. By bootstrapping foundational automation with Digital Rebar Provision, DevOps teams lay a foundation for data center operations that improves collaboration between operators and SRE teams: operators enjoy additional control and reuse and SREs get a doorway into building a fully automated process.

A pragmatic path without burning downing the data center

“I’m excited to see RackN providing a pragmatic path from physical boot to provisioning without having to start over and rebuild my data center to get there.” said Dave McCrory, an early cloud and data gravity innovator.  “It’s time for the industry to stop splitting physical and cloud IT processes because snowflaked, manual processes slow everyone down.  I can’t imagine an easier on-ramp than Digital Rebar Provision”

The RackN Digital Rebar is making it easy for Cobbler, Stacki, MaaS and Forman users to evaluate our RESTful, Golang, Template-based PXE Provisioning utility.  Interested users can evaluate the service in minutes on a laptop or engage with RackN for a more comprehensive trail with expert support.  The open Provision service works both independently and as part of Digital Rebar’s full life-cycle hybrid control.

Scontactee specific features at http://rackn.com/provision/drsa.

Want help starting on this journey?  Contact us and we can help.

How about a CaaPuccino? Krish and Rob discuss containers, platforms, hybrid issues around Kubernetes and OpenStack.

CaaPuccino: A frothy mix of containers and platforms.

Check out Krish Subramanian’s (@krishnan) Modern Enterprise podcast (audio here) today for a surprisingly deep and thoughtful discussion about how frothy new technologies are impacting Modern Enterprise IT. Of course, we also take some time to throw some fire bombs at the end. You can use my notes below to jump to your favorite topics.

The key takeaways are that portability is hard and we’re still working out the impact of container architecture.

The benefit of the longer interview is that we really dig into the reasons why portability is hard and discuss ways to improve it. My personal SRE posts and those on the RackN blog describe operational processes that improve portability. These are real concerns for all IT organizations because mixed and hybrid models are a fact of life.

If you are not actively making automation that works against multiple infrastructures then you are building technical debt.

Of course, if you just want the snark, then jump forward to 24:00 minutes in where we talk future of Kubernetes, OpenStack and the inverted intersection of the projects.

Krish, thanks for the great discussion!

Rob’s Podcast Notes (39 minutes)

2:37: Rob intros about Digital Rebar & RackN

4:50: Why our Kubernetes is JUST UPSTREAM

5:35: Where are we going in 5 years > why Rob believes in Hybrid

  • Should not be 1 vendor who owns everything
  • That’s why we work for portability
  • Public cloud vision: you should stop caring about infrastructure
  • Coming to an age when infrastructure can be completely automated
  • Developer rebellion against infrastructure

8:36: Krish believes that Public cloud will be more decentralized

  • Public cloud should be part of everyone’s IT plan
  • It should not be the ONLY thig

9:25: Docker helps create portability, what else creates portability? Will there be a standard

  • Containers are a huge change, but it’s not just packaging
  • Smaller units of work is important for portability
  • Container schedulers & PaaS are very opinionated, that’s what creates portability
  • Deeper into infrastructure loses portability (RackN helps)
  • Rob predicts that Lambda and Serverless creates portability too

11:38: Are new standards emerging?

  • Some APIs become dominate and create de facto APIs
  • Embedded assumptions break portability – that’s what makes automation fragile
  • Rob explains why we inject configuration to abstract infrastructure
  • RackN works to inject attributes instead of allowing scripts to assume settings
  • For example, networking assumptions break portability
  • Platforms force people to give up configuration in ways that break portability

14:50: Why did Platform as a Service not take off?

  • Rob defends PaaS – thinks that it has accomplished a lot
  • Challenge of PaaS is that it’s very restrictive by design
  • Calls out Andrew Clay Shafer’s “don’t call it a PaaS” position
  • Containers provide a less restrictive approach with more options.

17:00: What’s the impact on Enterprise? How are developers being impacted?

  • Service Orientation is a very important thing to consider
  • Encapsulation from services is very valuable
  • Companies don’t own all their IT services any more – it’s not monolithic
  • IT Service Orientation aligns with Business Processes
    Rob says the API economy is a big deal
  • In machine learning, a business’ data may be more valuable than their product

19:30: Services impact?

  • Service’s have a business imperative
  • We’re not ready for all the impacts of a service orientation
  • Challenge is to mix configuration and services
  • Magic of Digital Rebar is that it can mix orchestration of both

22:00: We are having issues with simple, how are we going to scale up?

  • Barriers are very low right now

22:30: Will Kubernetes help us solve governance issues?

  • Kubernetes is doing a go building an ecosystem
  • Smart to focus on just being Kubernetes
  • It will be chaotic as the core is worked out

24:00: Do you think Kubernetes is going in the right direction?

  • Rob is bullish for Kubernetes to be the dominant platform because it’s narrow and specific
  • Google has the right balance of control
  • Kubernetes really is not that complex for what it does
  • Mesos is also good but harder to understand for users
  • Swarm is simple but harder to extend for an ecosystem
  • Kubernetes is a threat to Amazon because it creates portability and ecosystem outside of their platform
  • Rob thinking that Kubernetes could create platform services that compete with AWS services like RDS.
  • It’s likely to level the field, not create a Google advantage

27:00: How does Kubernetes fit into the Digital Rebar picture?

  • We think of Kubernetes as a great infrastructure abstraction that creates portability
  • We believe there’s a missing underlay that cannot abstract the infrastructure – that’s what we do.
  • OpenStack deployments broken because every data center is custom and different – vendors create a lot of consulting without solving the problem
  • RackN is creating composability UNDER Kubernetes so that those infrastructure differences do not break operation automation
  • Kubernetes does not have the constructs in the abstraction to solve the infrastructure problem, that’s a different problem that should not be added into the APIs
  • Digital Rebar can also then use the Kubernetes abstractions?

30:20: Can OpenStack really be managed/run on top of Kubernetes? That seems complex!

  • There is a MESS in the message of Kubernetes under OpenStack because it sends the message that Kubernetes is better at managing application than OpenStack
  • Since OpenStack is just an application and Kubernetes is a good way to manage applications
  • When OpenStack is already in containers, we can use Kubernetes to do that in a logical way
  • “I’m super impressed with how it’s working” using OpenStack Helm Packs (still needs work)
  • Physical environment still has to be injected into the OpenStack on Kubernetes environment

35:05 Does OpenStack have a future?

  • Yes! But it’s not the big “data center operating system” future that we expected in 2010. Rob thinks it a good VM management platform.
  • Rob provides the same caution for Kubernetes. It will work where the abstractions add value but data centers are complex hybrid beasts
  • Don’t “square peg a data center round hole” – find the best fit
  • OpenStack should have focused on the things it does well – it has a huge appetite for solving too many problems.

Why IBM’s hybrid “no-single-way” is a good plan

I got to spend a few days hearing IBM’s cloud plans at IBM Interconnect including a presentation, dinner and guest blogging.  Read below for links to that content.

As part of their CloudMinds group, we’re encouraged to look at the big picture of the conference and there’s a lot to take in. IBM has serious activity around machine learning, cognitive, serverless, functional languages, block chain, platform and infrastructure as a service. Frankly, that’s a confusing array of technologies.

Does this laundry list of technologies fit into a unified strategy? No, and that’s THE POINT.

Anyone who thinks they can predict a definitive right mix of technologies to solve customer problems is not paying attention to the pace of innovation. IBM is listening to their customers and hearing that needs are expanding not consolidating. In this type of market, limiting choice hurts customers.

That means that a hybrid strategy with overlapping offerings serves their customers interests.

IBM has the luxury and scale of being able to chase multiple technologies to find winners. Of course, there’s a danger of hanging on to losers too long too. So far, it looks like they are doing a good job of riding that sweet spot. Their agility here may be the only way that they can reasonably find a chink in Amazon’s cloud armour.

While the hybrid story is harder to tell, it’s the right one for this market.

Four Posts For Deeper Reading

The posts below cover a broad range of topics! Chris Ferris and I did some serious writing about collaboration and my DevOps/Hybrid post has been getting some attention. It’s all recommended reading so I’ve included some highlights.

#CloudMinds tackle the future of cognitive in Las Vegas huddle

Rob is part of the IBM CloudMinds group that meets occasionally to discuss rising cloud, infrastructure and technology challenges.

“Cognitive cannot and will not exist without trust. Humans will not trust cognitive unless we can show that our cognitive solutions understand them.”

How open communities can hurt, and help, interoperability

“The days of using open software passively from vendors are past, users need to have a voice and opinion about project governance. This post is a joint effort with Rob Hirschfeld, RackN, and Chris Ferris, IBM, based on their IBM Interconnect 2017 “Open Cloud Architecture: Think You Can Out-Innovate the Best of the Rest?” presentation.”

When DevOps and hybrid collide (2017 trend lines)

“We’ve clearly learned that DevOps automation pays back returns in agility and performance. Originally, small-batch, lean thinking was counter-intuitive. Now it’s time to make similar investments in hybrid automation so that we can leverage the most innovation available in IT today.”

Open Source Collaboration: The Power of No & Interoperability

“Users and operators can put significant pressure on project leaders and vendors to ensure that the platforms are interoperable. “

Packet Pushers 333: Orchestration v Automation < YES, this is what we are doing!

Iix34grhy_400x400 highly recommend catching Packet Pushers 333 “Automation & Orchestration In Networking” by Drew Conry-Murray and guests Pete Lumbis and Michael Damkot.

While the discussion is all about NETWORK DevOps, they do a good job of decrying WHY current state of system orchestration is so sad – in a word: heterogeneity.  It’s not going away because the alternative is lock-in.  They also do a good job of describing the difference between automation and orchestration; however, I think there’s a middle tier  of resource “scheduling” that better describes OpenStack and Kubernetes.

Around 5:00 minutes into the podcast, they effectively describe the composable design of Digital Rebar and the rationale for the way that we’ve abstracted interfaces for automation.  If you guys really do want to cash in by consulting with it (at 10 minutes), just give me a call.

It’s great to hear acknowledgement of both the complexity and need for solving these problems.   Thanks for the great podcast Drew, Pete and Michael!

Oh… and I’m going to be presenting at Interop ITX also.  Hopefully, I’ll get a chance to talk 1×1 with Drew.