Containers, Private Clouds, GIFEE, and the Underlay Problem

ITRevoluion

Gene Kim (@RealGeneKim) posted an exclusive Q&A with Rob Hirschfeld (@zehicle) today on IT Technology: Rob Hirschfeld on Containers, Private Clouds, GIFEE, and the Remaining “Underlay Problem.” 

Questions from the post:

  • Gene Kim: Tell me about the landscape of docker, OpenStack, Kubernetes, etc. How do they all relate, what’s changed, and who’s winning?
  • GK: I recently saw a tweet that I thought was super funny, saying something along the lines “friends don’t let friends build private clouds” — obviously, given all your involvement in the OpenStack community for so many years, I know you disagree with that statement. What is it that you think everyone should know about private clouds that tell the other side of the story?
  • GK: We talked about how much you loved the book Site Reliability Engineering: How Google Runs Production Systems by Betsy Beyer, which I also loved. What resonated with you, and how do you think it relates to how we do Ops work in the next decade?
  • GK: Tell me what about the work you did with Crowbar, and how that informs the work you’re currently doing with Digital Rebar?

Read the full Q&A here.

What does it take to Operate Open Platforms? Answers in Datanauts 72

Did I just let OpenStack ops off the hook….?  Kubernetes production challenges…?  

ix34grhy_400x400I had a lot of fun in this Datanauts wide ranging discussion with unicorn herders Chris Wahl and Ethan Banks.  I like the three section format because it gives us a chance to deep dive into distinct topics and includes some out-of-band analysis by the hosts; however, that means you need to keep listening through the commercial breaks to hear the full podcast.

Three parts?  Yes, Chris and Ethan like to save the best questions for last.

In Part 1, we went deep into the industry operational and business challenges uncovered by the OpenStack project. Particularly, Chris and I go into “platform underlay” issues which I laid out in my “please stop the turtles” post. This was part of the build-up to my SRE series.

In Part 2, we explore my operations-focused view of the latest developments in container schedulers with a focus on Kubernetes. Part of the operational discussion goes into architecture “conceits” (or compromises) that allow developers to get the most from cloud native design patterns. I also make a pitch for using proven tools to run the underlay.

In Part 3, we go deep into DevOps automation topics of configuration and orchestration. We talk about the design principles that help drive “day 2” automation and why getting in-place upgrades should be an industry priority.  Of course, we do cover some Digital Rebar design too.

Take a listen and let me know what you think!

On Twitter, we’ve already started a discussion about how much developers should care about infrastructure. My opinion (posted here) is that one DevOps idea where developers “own” infrastructure caused a partial rebellion towards containers.

SRE role with DevOps for Enterprise [@HPE podcast]

sre-series

My focus on SRE series continues… At RackN, we see a coming infrastructure explosion in both complexity and scale. Unless our industry radically rethinks operational processes, current backlogs will escalate and stability, security and sharing will suffer.

Yes, DevOps and SRE are complementary

In this short 16 minute podcast, HPE’s Stephen Spector and I discuss how DevOps and SRE thinking overlaps and where are the differences.  We also discuss how Enterprises should be evaluating Site Reliability Engineering as a function and where it fits in their organization.

Why is RackN advancing OpenStack on Kubernetes?

Yesterday, RackN CEO, Rob Hirschfeld, described the remarkable progress in OpenStack on Kubernetes using Helm (article link).  Until now, RackN had not been willing to officially support OpenStack deployments; however, we now believe that this approach is a game changer for OpenStack operators even if they are not actively looking at Kubernetes.

We are looking for companies that want to join in this work and fast-track it into production. If this is interesting, please contact us at sre@rackn.com.

Why should you sponsor? Current OpenStack operators facing “fork-lift upgrades” should want to find a path like this one that ensures future upgrades are baked into the plan. This approach provide a fast track to a general purpose, enterprise grade, upgradable Kubernetes infrastructure.

Here is Rob’s Demo

Rob’s Original Blog Post

RackN revisits OpenStack deployments with an eye on ongoing operations. I’ve been an outspoken skeptic of a Joint OpenStack Kubernetes Environment because I felt that the technical hurdles of cloud native architecture would prove challenging.

I was wrong: I underestimated how fast these issues could be addressed.

… read the rest at Beyond Expectations: OpenStack via Kubernetes Helm (Fully Automated with Digital Rebar) — Rob Hirschfeld

Beyond Expectations: OpenStack via Kubernetes Helm (Fully Automated with Digital Rebar)

RackN revisits OpenStack deployments with an eye on ongoing operations.

I’ve been an outspoken skeptic of a Joint OpenStack Kubernetes Environment (my OpenStack BCN presoSuper User follow-up and BOS Proposal) because I felt that the technical hurdles of cloud native architecture would prove challenging.  Issues like stable service positioning and persistent data are requirements for OpenStack and hard problems in Kubernetes.

I was wrong: I underestimated how fast these issues could be addressed.

youtube-thumb-nail-openstackThe Kubernetes Helm work out of the AT&T Comm Dev lab takes on the integration with a “do it the K8s native way” approach that the RackN team finds very effective.  In fact, we’ve created a fully integrated Digital Rebar deployment that lays down Kubernetes using Kargo and then adds OpenStack via Helm.  The provisioning automation includes a Ceph cluster to provide stateful sets for data persistence.  

This joint approach dramatically reduces operational challenges associated with running OpenStack without taking over a general purpose Kubernetes infrastructure for a single task.

sre-seriesGiven the rise of SRE thinking, the RackN team believes that this approach changes the field for OpenStack deployments and will ultimately dominate the field (which is already  mainly containerized).  There is still work to be completed: some complex configuration is required to allow both Kubernetes CNI and Neutron to collaborate so that containers and VMs can cross-communicate.

We are looking for companies that want to join in this work and fast-track it into production.  If this is interesting, please contact us at sre@rackn.com.

Why should you sponsor? Current OpenStack operators facing “fork-lift upgrades” should want to find a path like this one that ensures future upgrades are baked into the plan.  This approach provide a fast track to a general purpose, enterprise grade, upgradable Kubernetes infrastructure.

Closing note from my past presentations: We’re making progress on the technical aspects of this integration; however, my concerns about market positioning remain.

“Why SRE?” Discussion with Eric @Discoposse Wright

sre-series My focus on SRE series continues… At RackN, we see a coming infrastructure explosion in both complexity and scale. Unless our industry radically rethinks operational processes, current backlogs will escalate and stability, security and sharing will suffer.

ericewrightI was a guest on Eric “@discoposse” Wright of the Green Circle Community #42 Podcast (my previous appearance).

LISTEN NOW: Podcast #42 (transcript)

In this action-packed 30 minute conversation, we discuss the industry forces putting pressure on operations teams.  These pressures require operators to be investing much more heavily on reusable automation.

That leads us towards why Kubernetes is interesting and what went wrong with OpenStack (I actually use the phrase “dumpster fire”).  We ultimately talk about how those lessons embedded in Digital Rebar architecture.

(re)OpenStack for 2017 – board voting week starts this Monday

[1/19 Update: I placed 9th in the results (or 6th if you go only by popular vote instead of total votes).  There are 8 seats, so I was not elected.]

The OpenStack Project needs a course correction and I’m asking for your community vote to put me back on the 2017 Board to help drive it.  As a start-up CEO, I’m neutral, yet I also have the right technical, commercial and community influence to make this a reality.

Vote Now!Your support is critical because OpenStack fills a very real need and should have a solid future; however, it needs to adapt to market realities to achieve that.

I want the Board to acknowledge and adapt to stumbles in ecosystem success including being dropped or re-prioritized by key sponsors.  This should include tightening the mission so the project can collaborate more freely with both open and proprietary platforms.  In 2016, I’ve been deeply involved OpenStack alternatives including Kubernetes and hybrid Cloud automation with Amazon and Google.

OpenStack must adjust to being one of several alternatives including AWS, Google and container platforms like Kubernetes.

That means focusing on our IaaS strengths and being unambiguous about core function like SDN and storage integration.   It also means ensuring that commercial members of the ecosystem can both profit and compete.  The Board has both the responsibility and authority to make these changes if the members are willing to act.

What’s my background?  I’ve been an active and vocal member of the OpenStack community since the very beginning of the project especially around Operator and Product Management issues.  I was elected to the board four times and played critical roles including launching the DefCore efforts and pushing for more definition of the Big Tent concept (which I believe has hurt the project).

In a great field of candidates!  Like other years, there are many very strong candidates whom I have worked with in a number of roles.  I always recommend distributing your eight votes to multiple people and limited “affinity voting” for your own company or geography.   While all candidates would serve the board, this year, I’d like to call attention specifically to  Shamail Tahair as a candidate who has invested significant time in helping with Product Management and Enterprise Readiness for OpenStack.

2016 Infrastructure Revolt makes 2017 the “year of the IT Escape Clause”

Software development technology is so frothy that we’re developing collective immunity to constant churn and hype cycles. Lately, every time someone tells me that they have hot “picked technology Foo” they also explain how they are also planning contingencies for when Foo fails. Not if, when.

13633961301245401193crawfish20boil204-mdRequired contingency? That’s why I believe 2017 is the year of the IT Escape Clause, or, more colorfully, the IT Crawfish.

When I lived in New Orleans, I learned that crawfish are anxious creatures (basically tiny lobsters) with powerful (and delicious) tails that propel them backward at any hint of any danger. Their ability to instantly back out of any situation has turned their name into a common use verb: crawfish means to back out or quickly retreat.

In IT terms, it means that your go-forward plans always include a quick escape hatch if there’s some problem. I like Subbu Allamaraju’s description of this as Change Agility.  I’ve also seen this called lock-in prevention or contingency planning. Both are important; however, we’re reaching new levels for 2017 because we can’t predict which technology stacks are robust and complete.

The fact is the none of them are robust or complete compared to historical platforms. So we go forward with an eye on alternatives.

How did we get to this state? I blame the 2016 Infrastructure Revolt.

Way, way, way back in 2010 (that’s about bronze age in the Cloud era), we started talking about developers helping automate infrastructure as part of deploying their code. We created some great tools for this and co-opted the term DevOps to describe provisioning automation. Compared to the part, it was glorious with glittering self-service rebellions and API-driven enlightenment.

In reality, DevOps was really painful because most developers felt that time fixing infrastructure was a distraction from coding features.

In 2016, we finally reached a sufficient platform capability set in tools like CI/CD pipelines, Docker Containers, Kubernetes, Serverless/Lambda and others that Developers had real alternatives to dealing with infrastructure directly. Once we reached this tipping point, the idea of coding against infrastructure directly become unattractive. In fact, the world’s largest infrastructure company, Amazon, is actively repositioning as a platform services company. Their re:Invent message was very clear: if you want to get the most from AWS, use our services instead of the servers.

For most users, using platform services instead of infrastructure is excellent advice to save cost and time.

The dilemma is that platforms are still evolving rapidly. So rapidly that adopters cannot count of the services to exist in their current form for multiple generations. However, the real benefits drive aggressive adoption. They also drive the rise of Crawfish IT.

As they say in N’Awlins, laissez les bon temps rouler!

Related Reading on the Doppler: Your Cloud Strategy Must Include No-Cloud Options

 

Can we control Hype & Over-Vendoring?

Q: Is over-vendoring when you’ve had to much to drink?
A: Yes, too much Kool Aid.

There’s a lot of information here – skip to the bottom if you want to see my recommendation.

Last week on TheNewStack, I offered eight ways to keep Kubernetes on the right track (abridged list here) and felt that item #6 needed more explanation and some concrete solutions.

  1. DO: Focus on a Tight Core
  2. DO: Build a Diverse Community
  3. DO: Multi-cloud and Hybrid
  4. DO: Be Humble and Honest
  5. AVOID: “The One Ring” Universal Solution Hubris
  6. AVOID: Over-Vendoring (discussed here)
  7. AVOID: Coupling Installers, Brokers and Providers to the core
  8. AVOID: Fast Release Cycles without LTS Releases

kool-aid-manWhat is Over-Vendoring?  It’s when vendors’ drive their companies’ brands ahead of the health of the project.  Generally by driving an aggressive hype cycle where vendors are trying to jump on the hype bandwagon.

Hype can be very dangerous for projects (David Cassel’s TNS article) because it is easy to bypass the user needs and boring scale/stabilization processes to focus on vendor differentiation.  Unfortunately, common use-cases do not drive differentiation and are invisible when it comes to company marketing budgets.  That boring common core has the effect creating tragedy of the commons which undermines collaboration on shared code bases.

The solution is to aggressively keep the project core small so that vendors have specific and limited areas of coopetition.  

A small core means we do not compel collaboration in many areas of project.  This drives competition and diversity that can be confusing.  The temptation to endorse or nominate companion projects is risky due to the hype cycle.  Endorsements can create a bias that actually hurts innovation because early or loud vendors do not generally create the best long term approaches.  I’ve heard this described as “people doing the real work don’t necessarily have time to brag about it.”

Keeping a small core mantra drives a healthy plug-in model where vendors can differentiate.  It also ensures that projects can succeed with a bounded set of core contributors and support infrastructure.  That means that we should not measure success by commits, committers or lines of code because these will drop as projects successfully modularize.  My recommendation for a key success metric is to the ratio of committers to ecosystem members and users.

Tracking improving ratio of core to ecosystem shows that improving efficiency of investment.  That’s a better sign of health than project growth.

It’s important to note that there is also a serious risk of under-vendoring too!  

We must recognize and support vendors in open source communities because they sustain the project via direct contributions and bringing users.  For a healthy ecosystem, we need to ensure that vendors can fairly profit.  That means they must be able to use their brand in combination with the project’s brand.  Apache Project is the anti-pattern because they have very strict “no vendor” trademark marketing guidelines that can strand projects without good corporate support.

I’ve come to believe that it’s important to allow vendors to market open source projects brands; however, they also need to have some limits on how they position the project.

How should this co-branding work?  My thinking is that vendor claims about a project should be managed in a consistent and common way.  Since we’re keeping the project core small, that should help limit the scope of the claims.  Vendors that want to make ecosystem claims should be given clear spaces for marketing their own brand in participation with the project brand.

I don’t pretend that this is easy!  Vendor marketing is planned quarters ahead of when open source projects are ready for them: that’s part of what feeds the hype cycle. That means that projects will be saying no to some free marketing from their ecosystem.  Ideally, we’re saying yes to the right parts at the same time.

Ultimately, hype control means saying no to free marketing.  For an open source project, that’s a hard but essential decision.

 

Infrastructure Masons is building a community around data center practice

IT is subject to seismic shifts right now. Here’s how we cope together.

For a long time, I’ve advocated for open operations (“OpenOps”) as a way to share best practices about running data centers. I’ve worked hard in OpenStack and, recently, Kubernetes communities to have operators collaborate around common architectures and automation tools. I believe the first step in these efforts starts with forming a community forum.

I’m very excited to have the RackN team and technology be part of the newly formed Infrastructure Masons effort because we are taking this exact community first approach.

infrastructure_masons

Here’s how Dean Nelson, IM organizer and head of Uber Compute, describes the initiative:

An Infrastructure Mason Partner is a professional who develop products, build or support infrastructure projects, or operate infrastructure on behalf of end users. Like their end users peers, they are dedicated to the advancement of the Industry, development of their fellow masons, and empowering business and personal use of the infrastructure to better the economy, the environment, and society.

We’re in the midst of tremendous movement in IT infrastructure.  The change to highly automated and scale-out design was enabled by cloud but is not cloud specific.  This requirement is reshaping how IT is practiced at the most fundamental levels.

We (IT Ops) are feeling amazing pressure on operations and operators to accelerate workflow processes and innovate around very complex challenges.

Open operations loses if we respond by creating thousands of isolated silos or moving everything to a vendor specific island like AWS.  The right answer is to fund ways to share practices and tooling that is tolerant of real operational complexity and the legitimate needs for heterogeneity.

Interested in more?  Get involved with the group!  I’ll be sharing more details here too.