How about a CaaPuccino? Krish and Rob discuss containers, platforms, hybrid issues around Kubernetes and OpenStack.

CaaPuccino: A frothy mix of containers and platforms.

Check out Krish Subramanian’s (@krishnan) Modern Enterprise podcast (audio here) today for a surprisingly deep and thoughtful discussion about how frothy new technologies are impacting Modern Enterprise IT. Of course, we also take some time to throw some fire bombs at the end. You can use my notes below to jump to your favorite topics.

The key takeaways are that portability is hard and we’re still working out the impact of container architecture.

The benefit of the longer interview is that we really dig into the reasons why portability is hard and discuss ways to improve it. My personal SRE posts and those on the RackN blog describe operational processes that improve portability. These are real concerns for all IT organizations because mixed and hybrid models are a fact of life.

If you are not actively making automation that works against multiple infrastructures then you are building technical debt.

Of course, if you just want the snark, then jump forward to 24:00 minutes in where we talk future of Kubernetes, OpenStack and the inverted intersection of the projects.

Krish, thanks for the great discussion!

Rob’s Podcast Notes (39 minutes)

2:37: Rob intros about Digital Rebar & RackN

4:50: Why our Kubernetes is JUST UPSTREAM

5:35: Where are we going in 5 years > why Rob believes in Hybrid

  • Should not be 1 vendor who owns everything
  • That’s why we work for portability
  • Public cloud vision: you should stop caring about infrastructure
  • Coming to an age when infrastructure can be completely automated
  • Developer rebellion against infrastructure

8:36: Krish believes that Public cloud will be more decentralized

  • Public cloud should be part of everyone’s IT plan
  • It should not be the ONLY thig

9:25: Docker helps create portability, what else creates portability? Will there be a standard

  • Containers are a huge change, but it’s not just packaging
  • Smaller units of work is important for portability
  • Container schedulers & PaaS are very opinionated, that’s what creates portability
  • Deeper into infrastructure loses portability (RackN helps)
  • Rob predicts that Lambda and Serverless creates portability too

11:38: Are new standards emerging?

  • Some APIs become dominate and create de facto APIs
  • Embedded assumptions break portability – that’s what makes automation fragile
  • Rob explains why we inject configuration to abstract infrastructure
  • RackN works to inject attributes instead of allowing scripts to assume settings
  • For example, networking assumptions break portability
  • Platforms force people to give up configuration in ways that break portability

14:50: Why did Platform as a Service not take off?

  • Rob defends PaaS – thinks that it has accomplished a lot
  • Challenge of PaaS is that it’s very restrictive by design
  • Calls out Andrew Clay Shafer’s “don’t call it a PaaS” position
  • Containers provide a less restrictive approach with more options.

17:00: What’s the impact on Enterprise? How are developers being impacted?

  • Service Orientation is a very important thing to consider
  • Encapsulation from services is very valuable
  • Companies don’t own all their IT services any more – it’s not monolithic
  • IT Service Orientation aligns with Business Processes
    Rob says the API economy is a big deal
  • In machine learning, a business’ data may be more valuable than their product

19:30: Services impact?

  • Service’s have a business imperative
  • We’re not ready for all the impacts of a service orientation
  • Challenge is to mix configuration and services
  • Magic of Digital Rebar is that it can mix orchestration of both

22:00: We are having issues with simple, how are we going to scale up?

  • Barriers are very low right now

22:30: Will Kubernetes help us solve governance issues?

  • Kubernetes is doing a go building an ecosystem
  • Smart to focus on just being Kubernetes
  • It will be chaotic as the core is worked out

24:00: Do you think Kubernetes is going in the right direction?

  • Rob is bullish for Kubernetes to be the dominant platform because it’s narrow and specific
  • Google has the right balance of control
  • Kubernetes really is not that complex for what it does
  • Mesos is also good but harder to understand for users
  • Swarm is simple but harder to extend for an ecosystem
  • Kubernetes is a threat to Amazon because it creates portability and ecosystem outside of their platform
  • Rob thinking that Kubernetes could create platform services that compete with AWS services like RDS.
  • It’s likely to level the field, not create a Google advantage

27:00: How does Kubernetes fit into the Digital Rebar picture?

  • We think of Kubernetes as a great infrastructure abstraction that creates portability
  • We believe there’s a missing underlay that cannot abstract the infrastructure – that’s what we do.
  • OpenStack deployments broken because every data center is custom and different – vendors create a lot of consulting without solving the problem
  • RackN is creating composability UNDER Kubernetes so that those infrastructure differences do not break operation automation
  • Kubernetes does not have the constructs in the abstraction to solve the infrastructure problem, that’s a different problem that should not be added into the APIs
  • Digital Rebar can also then use the Kubernetes abstractions?

30:20: Can OpenStack really be managed/run on top of Kubernetes? That seems complex!

  • There is a MESS in the message of Kubernetes under OpenStack because it sends the message that Kubernetes is better at managing application than OpenStack
  • Since OpenStack is just an application and Kubernetes is a good way to manage applications
  • When OpenStack is already in containers, we can use Kubernetes to do that in a logical way
  • “I’m super impressed with how it’s working” using OpenStack Helm Packs (still needs work)
  • Physical environment still has to be injected into the OpenStack on Kubernetes environment

35:05 Does OpenStack have a future?

  • Yes! But it’s not the big “data center operating system” future that we expected in 2010. Rob thinks it a good VM management platform.
  • Rob provides the same caution for Kubernetes. It will work where the abstractions add value but data centers are complex hybrid beasts
  • Don’t “square peg a data center round hole” – find the best fit
  • OpenStack should have focused on the things it does well – it has a huge appetite for solving too many problems.

Open Source Collaboration: The Power of No

TL;DR: The days of using open software passively from vendors are past, users need to have a voice and opinion about project governance. This post is a joint effort with Rob Hirschfeld, RackN, and Chris Ferris, IBM, based on their IBM Interconnect 2017 “Open Cloud Architecture: Think You Can Out-Innovate the Best of the Rest?” presentation.

nullIt’s a common misconception that open source collaboration means saying YES to all ideas; however, the reality of successful projects is the opposite.

Permissive open source licenses drive a delicate balance for projects. On one hand, projects that adopt permissive licenses should be accepting of contributions to build community and user base. On the other, maintainers need to adopt a narrow focus to ensure project utility and simplicity. If the project’s maintainers are too permissive, the project bloats and wanders without a clear purpose. If they are too restrictive then the project fails to build community.

It is human nature to say yes to all collaborators, but that can frustrate core developers and users.

For that reason, stronger open source projects have a clear, focused, shared vision. Historically, that vision was enforced by a benevolent dictator for life (BDFL); however, recent large projects have used a consensus of project elders to make the task more sustainable. These roles serve a critical need: they say “no” to work that does not align with the project’s mission and vision. The challenge of defining that vision can be a big one, but without a clear vision, it’s impossible for the community to sustain growth because new contributors can dilute the utility of projects. [author’s note: This is especially true of celebrity projects like OpenStack or Kubernetes that attract “shared glory” contributors]

There is tremendous social and commercial pressure driving this vision vs. implementation balance.

The most critical one is the threat of “forking.” Forking is what happens when the code/collaborator base of a project splits into multiple factions and stops working together on a single deliverable. The result is incompatible products with a shared history. While small forks are required to support releases, and foster development; diverging community forks can have unpredictable impacts for a project.

Forks are not always bad: they provide a control mechanism for communities.

The fundamental nature of open source projects that adopt a permissive license is what allows forks to become the primary governance tool. The nature of permissive licenses allows anyone to create a new line of development that’s different than the original line. Forks can allow special interests in a code base to focus on their needs. That could be new features or simply stabilization. Many times, a major release version of a project evolves into forks where both old and newer versions have independent communities because of deployment inertia. It can also allow new leadership or governance without having to directly displace an entrenched “owner”.

But forking is expensive because it makes it harder for communities to collaborate.

To us, the antidote for forking is not simply vision but a strong focus on interoperability. Interoperability (or interop) means ensuring that different implementations remain compatible for users. A simplified example would be having automation that works on one OpenStack cloud also work on all the others without modification. Strong interop creates an ecosystem for a project by making users confident that their downstream efforts will not be disrupted by implementation variance or version changes.

Good Interop relieves the pressure of forking.

Interop can only work when a project defines what is expected behavior and creates tests that enforce those standards. That activity forces project contributors to agree on project priorities and scope. Projects that refuse to define interop expectations end up disrupting their user and collaborator base in frustrating ways that lead to forking (Rob’s commentary on the potential Docker fork of 2016).

nullUnfortunately, Interop is not a generally a developer priority.

In the end, interoperability is a user feature that competes with other features. Sadly, it is often seen as hurting feature development because new features must work to maintain existing interop standards. For that reason, new contributors may see interop demands as a impediment to forward progress; however, it’s a strong driver for user adoption and growth.

The challenge is that those users are typically more focused on their own implementation and less visible to the project leadership. Vendors have similar disincentives to do work that benefits other vendors in the community. These tensions will undermine the health of communities that do not have strong BDFL or Elders leadership. So, who then provides the adult supervision?

Ultimately, users must demand interop and provide commercial preference for vendors that invest in interop.

Open source has definitely had an enormous impact on the software industry; generally, a change for the better. But, that change comes at a cost – the need for involvement, not just of vendors and individual developers, but, ultimately it demands the participation of consumers/users.

Interop isn’t naturally a vendor priority because it levels the playing field for all vendors; however, vendors do prioritize what their customers want.

Ideally, customer needs translate into new features that have a broad base of consumer interest. Interop ensure that features can be used broadly. Thus interop is an important attribute to consumers not only for vendors, but by the open source communities building the software. This alignment then serves as the foundation upon which (increasingly) that vendor software is based.

Customers should be actively and publicly supportive of interop efforts of projects on which their vendor’s offerings depend. If there isn’t such an initiative in those projects, then they should demand one be started through their vendor partners and in the public forums for the project.

Further, if consumers of an open source project sense that it lacks a strong, focused, vision and is wandering off course, they need to get involved and say so, either directly and/or through their vendor partners.

While open source has changing the IT industry, it also has a cost. The days of using software passively from vendors are past, users need to have a voice and opinion. The need to ensure that their chosen vendors are also supporting the health of the community.

What do you think? Reach out to Rob (@zehicle) and Chris (@christo4ferris) and let us know!

Note: Cross posted on IBM OpenTech site.

Accelerating Community Ops on Kubernetes in Hybrid Style

Preface: RackN is looking for SRE teams who are enthusiastic about accelerating Kubernetes on-premises in a long term operational way that can be shared and reused across the community.

kubernetesWe’re excited to see and be part of the community progress towards enterprise-ready Kubernetes operations on both cloud and on-premises.  The RackN team is excited to be part of multiple groups establishing patterns with shareable/reusable automation. I strongly recommend watching (or, better, collaborating in) these efforts if you are deploying Kubernetes even at experimental scale.

We’ve worked hard to make shared community ops work accessible, repeatable and multi-platform without compromising scale or security.

The RackN team has been enthusiastic supporters of Kubernetes since the 1.0 launch with our first deployments going back to June 2015 with updates for 1.2, 1.3 and now 1.5. I’m excited to report that fully converged the composable Digital Rebar approach with the Kubernetes Kargo Ansible. Our 1.2 efforts leveraged the Kargo predecessor “Kubespray.” This integration brings the parallel hybrid operation and node-by-node function of Digital Rebar with the Ansible community efforts around Kargo.

Composable design is a key element the RackN focus on SRE automation because it allows ecosystem

That allows a fully integrated deploy where Digital Rebar stages the environment and then use Kargo directly from upsteam to install Kubernetes. Post-deployment, Digital Rebar is able to extend the cluster with packages like Helm, Deis, Dashboard and others.

Since Digital Rebar supports parallel deployments, it’s possible to fully exercise the options enabled by Kargo simultaneously for development and testing.  Benefits????

For example, you can built-test-destroy coordinated Kubernetes installs on Centos, Redhat and Ubuntu as part of an automation pipeline. Unlike client side approaches like Terraform or Ansible, our infrastructure allows transparent monitoring of the deployments including Slack integration.

Flexibility is also important between users because Ops variation is both a benefit and a cost.

A key Digital Rebar design goal is for users to explore useful variation and still share operational best practices. We are proving that shared community automation can support many different scenarios including variation between between clouds, physical, operating system, networking and container engine.

If we cannot manage this variation in a consistent way then we’re doomed to operational fragmentation (like OpenStack has endured).

We’re inviting you to check out our open work supporting the Kubernetes Ops community. As Rob Hirschfeld says, looking for “Day 2” minded operators who want to make sure that we are always able to share Kubernetes best practices.

What does it take to Operate Open Platforms? Answers in Datanauts 72

Did I just let OpenStack ops off the hook….?  Kubernetes production challenges…?  

ix34grhy_400x400I had a lot of fun in this Datanauts wide ranging discussion with unicorn herders Chris Wahl and Ethan Banks.  I like the three section format because it gives us a chance to deep dive into distinct topics and includes some out-of-band analysis by the hosts; however, that means you need to keep listening through the commercial breaks to hear the full podcast.

Three parts?  Yes, Chris and Ethan like to save the best questions for last.

In Part 1, we went deep into the industry operational and business challenges uncovered by the OpenStack project. Particularly, Chris and I go into “platform underlay” issues which I laid out in my “please stop the turtles” post. This was part of the build-up to my SRE series.

In Part 2, we explore my operations-focused view of the latest developments in container schedulers with a focus on Kubernetes. Part of the operational discussion goes into architecture “conceits” (or compromises) that allow developers to get the most from cloud native design patterns. I also make a pitch for using proven tools to run the underlay.

In Part 3, we go deep into DevOps automation topics of configuration and orchestration. We talk about the design principles that help drive “day 2” automation and why getting in-place upgrades should be an industry priority.  Of course, we do cover some Digital Rebar design too.

Take a listen and let me know what you think!

On Twitter, we’ve already started a discussion about how much developers should care about infrastructure. My opinion (posted here) is that one DevOps idea where developers “own” infrastructure caused a partial rebellion towards containers.

Why is RackN advancing OpenStack on Kubernetes?

Yesterday, RackN CEO, Rob Hirschfeld, described the remarkable progress in OpenStack on Kubernetes using Helm (article link).  Until now, RackN had not been willing to officially support OpenStack deployments; however, we now believe that this approach is a game changer for OpenStack operators even if they are not actively looking at Kubernetes.

We are looking for companies that want to join in this work and fast-track it into production. If this is interesting, please contact us at sre@rackn.com.

Why should you sponsor? Current OpenStack operators facing “fork-lift upgrades” should want to find a path like this one that ensures future upgrades are baked into the plan. This approach provide a fast track to a general purpose, enterprise grade, upgradable Kubernetes infrastructure.

Here is Rob’s Demo

Rob’s Original Blog Post

RackN revisits OpenStack deployments with an eye on ongoing operations. I’ve been an outspoken skeptic of a Joint OpenStack Kubernetes Environment because I felt that the technical hurdles of cloud native architecture would prove challenging.

I was wrong: I underestimated how fast these issues could be addressed.

… read the rest at Beyond Expectations: OpenStack via Kubernetes Helm (Fully Automated with Digital Rebar) — Rob Hirschfeld

Beyond Expectations: OpenStack via Kubernetes Helm (Fully Automated with Digital Rebar)

RackN revisits OpenStack deployments with an eye on ongoing operations.

I’ve been an outspoken skeptic of a Joint OpenStack Kubernetes Environment (my OpenStack BCN presoSuper User follow-up and BOS Proposal) because I felt that the technical hurdles of cloud native architecture would prove challenging.  Issues like stable service positioning and persistent data are requirements for OpenStack and hard problems in Kubernetes.

I was wrong: I underestimated how fast these issues could be addressed.

youtube-thumb-nail-openstackThe Kubernetes Helm work out of the AT&T Comm Dev lab takes on the integration with a “do it the K8s native way” approach that the RackN team finds very effective.  In fact, we’ve created a fully integrated Digital Rebar deployment that lays down Kubernetes using Kargo and then adds OpenStack via Helm.  The provisioning automation includes a Ceph cluster to provide stateful sets for data persistence.  

This joint approach dramatically reduces operational challenges associated with running OpenStack without taking over a general purpose Kubernetes infrastructure for a single task.

sre-seriesGiven the rise of SRE thinking, the RackN team believes that this approach changes the field for OpenStack deployments and will ultimately dominate the field (which is already  mainly containerized).  There is still work to be completed: some complex configuration is required to allow both Kubernetes CNI and Neutron to collaborate so that containers and VMs can cross-communicate.

We are looking for companies that want to join in this work and fast-track it into production.  If this is interesting, please contact us at sre@rackn.com.

Why should you sponsor? Current OpenStack operators facing “fork-lift upgrades” should want to find a path like this one that ensures future upgrades are baked into the plan.  This approach provide a fast track to a general purpose, enterprise grade, upgradable Kubernetes infrastructure.

Closing note from my past presentations: We’re making progress on the technical aspects of this integration; however, my concerns about market positioning remain.

“Why SRE?” Discussion with Eric @Discoposse Wright

sre-series My focus on SRE series continues… At RackN, we see a coming infrastructure explosion in both complexity and scale. Unless our industry radically rethinks operational processes, current backlogs will escalate and stability, security and sharing will suffer.

ericewrightI was a guest on Eric “@discoposse” Wright of the Green Circle Community #42 Podcast (my previous appearance).

LISTEN NOW: Podcast #42 (transcript)

In this action-packed 30 minute conversation, we discuss the industry forces putting pressure on operations teams.  These pressures require operators to be investing much more heavily on reusable automation.

That leads us towards why Kubernetes is interesting and what went wrong with OpenStack (I actually use the phrase “dumpster fire”).  We ultimately talk about how those lessons embedded in Digital Rebar architecture.

Can we control Hype & Over-Vendoring?

Q: Is over-vendoring when you’ve had to much to drink?
A: Yes, too much Kool Aid.

There’s a lot of information here – skip to the bottom if you want to see my recommendation.

Last week on TheNewStack, I offered eight ways to keep Kubernetes on the right track (abridged list here) and felt that item #6 needed more explanation and some concrete solutions.

  1. DO: Focus on a Tight Core
  2. DO: Build a Diverse Community
  3. DO: Multi-cloud and Hybrid
  4. DO: Be Humble and Honest
  5. AVOID: “The One Ring” Universal Solution Hubris
  6. AVOID: Over-Vendoring (discussed here)
  7. AVOID: Coupling Installers, Brokers and Providers to the core
  8. AVOID: Fast Release Cycles without LTS Releases

kool-aid-manWhat is Over-Vendoring?  It’s when vendors’ drive their companies’ brands ahead of the health of the project.  Generally by driving an aggressive hype cycle where vendors are trying to jump on the hype bandwagon.

Hype can be very dangerous for projects (David Cassel’s TNS article) because it is easy to bypass the user needs and boring scale/stabilization processes to focus on vendor differentiation.  Unfortunately, common use-cases do not drive differentiation and are invisible when it comes to company marketing budgets.  That boring common core has the effect creating tragedy of the commons which undermines collaboration on shared code bases.

The solution is to aggressively keep the project core small so that vendors have specific and limited areas of coopetition.  

A small core means we do not compel collaboration in many areas of project.  This drives competition and diversity that can be confusing.  The temptation to endorse or nominate companion projects is risky due to the hype cycle.  Endorsements can create a bias that actually hurts innovation because early or loud vendors do not generally create the best long term approaches.  I’ve heard this described as “people doing the real work don’t necessarily have time to brag about it.”

Keeping a small core mantra drives a healthy plug-in model where vendors can differentiate.  It also ensures that projects can succeed with a bounded set of core contributors and support infrastructure.  That means that we should not measure success by commits, committers or lines of code because these will drop as projects successfully modularize.  My recommendation for a key success metric is to the ratio of committers to ecosystem members and users.

Tracking improving ratio of core to ecosystem shows that improving efficiency of investment.  That’s a better sign of health than project growth.

It’s important to note that there is also a serious risk of under-vendoring too!  

We must recognize and support vendors in open source communities because they sustain the project via direct contributions and bringing users.  For a healthy ecosystem, we need to ensure that vendors can fairly profit.  That means they must be able to use their brand in combination with the project’s brand.  Apache Project is the anti-pattern because they have very strict “no vendor” trademark marketing guidelines that can strand projects without good corporate support.

I’ve come to believe that it’s important to allow vendors to market open source projects brands; however, they also need to have some limits on how they position the project.

How should this co-branding work?  My thinking is that vendor claims about a project should be managed in a consistent and common way.  Since we’re keeping the project core small, that should help limit the scope of the claims.  Vendors that want to make ecosystem claims should be given clear spaces for marketing their own brand in participation with the project brand.

I don’t pretend that this is easy!  Vendor marketing is planned quarters ahead of when open source projects are ready for them: that’s part of what feeds the hype cycle. That means that projects will be saying no to some free marketing from their ecosystem.  Ideally, we’re saying yes to the right parts at the same time.

Ultimately, hype control means saying no to free marketing.  For an open source project, that’s a hard but essential decision.

 

Cloudcast.net gem about Cluster Ops Gap

15967Podcast juxtaposition can be magical.  In this case, I heard back-to-back sessions with pragmatic for cluster operations and then how developers are rebelling against infrastructure.

Last week, I was listening to Brian Gracely’s “Automatic DevOps” discussion with  John Troyer (CEO at TechReckoning, a community for IT pros) followed by his confusingly titled “operators” talk with Brandon Phillips (CTO at CoreOS).

John’s mid-recording comments really resonated with me:

At 16 minutes: “IT is going to be the master of many environments… If you have an environment is hybrid & multi-cloud, then you still need to care about infrastructure… we are going to be living with that for at least 10 years.”

At 18 minutes: “We need a layer that is cloud-like, devops-like and agile-like that can still be deployed in multiple places.  This middle layer, Cluster Ops, is really important because it’s the layer between the infrastructure and the app.”

The conversation with Brandon felt very different where the goal was to package everything “operator” into Kubernetes semantics including Kubernetes running itself.  This inception approach to running the cluster is irresistible within the community because the goal of the community is to stop having to worry about infrastructure.  [Brian – call me if you want to a do podcast of the counter point to self-hosted].

Infrastructure is hard and complex.  There’s good reason to limit how many people have to deal with that, but someone still has to deal with it.

I’m a big fan of container workloads generally and Kubernetes specifically as a way to help isolate application developers from infrastructure; consequently, it’s not designed to handle the messy infrastructure requirements that make Cluster Ops a challenge.  This is a good thing because complexity explodes when platforms expose infrastructure details.

For Kubernetes and similar, I believe that injecting too much infrastructure mess undermines the simplicity of the platform.

There’s a different type of platform needed for infrastructure aware cluster operations where automation needs to address complexity via composability.  That’s what RackN is building with open Digital Rebar: a the hybrid management layer that can consistently automate around infrastructure variation.

If you want to work with us to create system focused, infrastructure agnostic automation then take a look at the work we’ve been doing on underlay and cluster operations.

 

DevOps vs Cloud Native: Damn, where did all this platform complexity come from?

Complexity has always part of IT and it’s increasing as we embrace microservices and highly abstracted platforms.  Making everyone cope with this challenge is unsustainable.

We’re just more aware of infrastructure complexity now that DevOps is exposing this cluster configuration to developers and automation tooling. We are also building platforms from more loosely connected open components. The benefit of customization and rapid development has the unfortunate side-effect of adding integration points. Even worse, those integrations generally require operations in a specific sequence.

The result is a developer rebellion against DevOps on low level (IaaS) platforms towards ones with higher level abstractions (PaaS) like Kubernetes.
11-11-16-hirschfeld-1080x675This rebellion is taking the form of “cloud native” being in opposition to “devops” processes. I discussed exactly that point with John Furrier on theCUBE at Kubecon and again in my Messy Underlay presentation Defrag Conf.

It is very clear that DevOps mission to share ownership of messy production operations requirements is not widely welcomed. Unfortunately, there is no magic cure for production complexity because systems are inherently complex.

There is a (re)growing expectation that operations will remain operations instead of becoming a shared team responsibility.  While this thinking apparently rolls back core principles of the DevOps movement, we must respect the perceived productivity impact of making operations responsibility overly broad.

What is the right way to share production responsibility between teams?  We can start to leverage platforms like Kubernetes to hide underlay complexity and allow DevOps shared ownership in the right places.  That means that operations still owns the complex underlay and platform jobs.  Overall, I think that’s a workable diversion.