SRE role with DevOps for Enterprise [@HPE podcast]

sre-series

My focus on SRE series continues… At RackN, we see a coming infrastructure explosion in both complexity and scale. Unless our industry radically rethinks operational processes, current backlogs will escalate and stability, security and sharing will suffer.

Yes, DevOps and SRE are complementary

In this short 16 minute podcast, HPE’s Stephen Spector and I discuss how DevOps and SRE thinking overlaps and where are the differences.  We also discuss how Enterprises should be evaluating Site Reliability Engineering as a function and where it fits in their organization.

Beyond Expectations: OpenStack via Kubernetes Helm (Fully Automated with Digital Rebar)

RackN revisits OpenStack deployments with an eye on ongoing operations.

I’ve been an outspoken skeptic of a Joint OpenStack Kubernetes Environment (my OpenStack BCN presoSuper User follow-up and BOS Proposal) because I felt that the technical hurdles of cloud native architecture would prove challenging.  Issues like stable service positioning and persistent data are requirements for OpenStack and hard problems in Kubernetes.

I was wrong: I underestimated how fast these issues could be addressed.

youtube-thumb-nail-openstackThe Kubernetes Helm work out of the AT&T Comm Dev lab takes on the integration with a “do it the K8s native way” approach that the RackN team finds very effective.  In fact, we’ve created a fully integrated Digital Rebar deployment that lays down Kubernetes using Kargo and then adds OpenStack via Helm.  The provisioning automation includes a Ceph cluster to provide stateful sets for data persistence.  

This joint approach dramatically reduces operational challenges associated with running OpenStack without taking over a general purpose Kubernetes infrastructure for a single task.

sre-seriesGiven the rise of SRE thinking, the RackN team believes that this approach changes the field for OpenStack deployments and will ultimately dominate the field (which is already  mainly containerized).  There is still work to be completed: some complex configuration is required to allow both Kubernetes CNI and Neutron to collaborate so that containers and VMs can cross-communicate.

We are looking for companies that want to join in this work and fast-track it into production.  If this is interesting, please contact us at sre@rackn.com.

Why should you sponsor? Current OpenStack operators facing “fork-lift upgrades” should want to find a path like this one that ensures future upgrades are baked into the plan.  This approach provide a fast track to a general purpose, enterprise grade, upgradable Kubernetes infrastructure.

Closing note from my past presentations: We’re making progress on the technical aspects of this integration; however, my concerns about market positioning remain.

“Why SRE?” Discussion with Eric @Discoposse Wright

sre-series My focus on SRE series continues… At RackN, we see a coming infrastructure explosion in both complexity and scale. Unless our industry radically rethinks operational processes, current backlogs will escalate and stability, security and sharing will suffer.

ericewrightI was a guest on Eric “@discoposse” Wright of the Green Circle Community #42 Podcast (my previous appearance).

LISTEN NOW: Podcast #42

In this action-packed 30 minute conversation, we discuss the industry forces putting pressure on operations teams.  These pressures require operators to be investing much more heavily on reusable automation.

That leads us towards why Kubernetes is interesting and what went wrong with OpenStack (I actually use the phrase “dumpster fire”).  We ultimately talk about how those lessons embedded in Digital Rebar architecture.

(re)OpenStack for 2017 – board voting week starts this Monday

[1/19 Update: I placed 9th in the results (or 6th if you go only by popular vote instead of total votes).  There are 8 seats, so I was not elected.]

The OpenStack Project needs a course correction and I’m asking for your community vote to put me back on the 2017 Board to help drive it.  As a start-up CEO, I’m neutral, yet I also have the right technical, commercial and community influence to make this a reality.

Vote Now!Your support is critical because OpenStack fills a very real need and should have a solid future; however, it needs to adapt to market realities to achieve that.

I want the Board to acknowledge and adapt to stumbles in ecosystem success including being dropped or re-prioritized by key sponsors.  This should include tightening the mission so the project can collaborate more freely with both open and proprietary platforms.  In 2016, I’ve been deeply involved OpenStack alternatives including Kubernetes and hybrid Cloud automation with Amazon and Google.

OpenStack must adjust to being one of several alternatives including AWS, Google and container platforms like Kubernetes.

That means focusing on our IaaS strengths and being unambiguous about core function like SDN and storage integration.   It also means ensuring that commercial members of the ecosystem can both profit and compete.  The Board has both the responsibility and authority to make these changes if the members are willing to act.

What’s my background?  I’ve been an active and vocal member of the OpenStack community since the very beginning of the project especially around Operator and Product Management issues.  I was elected to the board four times and played critical roles including launching the DefCore efforts and pushing for more definition of the Big Tent concept (which I believe has hurt the project).

In a great field of candidates!  Like other years, there are many very strong candidates whom I have worked with in a number of roles.  I always recommend distributing your eight votes to multiple people and limited “affinity voting” for your own company or geography.   While all candidates would serve the board, this year, I’d like to call attention specifically to  Shamail Tahair as a candidate who has invested significant time in helping with Product Management and Enterprise Readiness for OpenStack.

2016 Infrastructure Revolt makes 2017 the “year of the IT Escape Clause”

Software development technology is so frothy that we’re developing collective immunity to constant churn and hype cycles. Lately, every time someone tells me that they have hot “picked technology Foo” they also explain how they are also planning contingencies for when Foo fails. Not if, when.

13633961301245401193crawfish20boil204-mdRequired contingency? That’s why I believe 2017 is the year of the IT Escape Clause, or, more colorfully, the IT Crawfish.

When I lived in New Orleans, I learned that crawfish are anxious creatures (basically tiny lobsters) with powerful (and delicious) tails that propel them backward at any hint of any danger. Their ability to instantly back out of any situation has turned their name into a common use verb: crawfish means to back out or quickly retreat.

In IT terms, it means that your go-forward plans always include a quick escape hatch if there’s some problem. I like Subbu Allamaraju’s description of this as Change Agility.  I’ve also seen this called lock-in prevention or contingency planning. Both are important; however, we’re reaching new levels for 2017 because we can’t predict which technology stacks are robust and complete.

The fact is the none of them are robust or complete compared to historical platforms. So we go forward with an eye on alternatives.

How did we get to this state? I blame the 2016 Infrastructure Revolt.

Way, way, way back in 2010 (that’s about bronze age in the Cloud era), we started talking about developers helping automate infrastructure as part of deploying their code. We created some great tools for this and co-opted the term DevOps to describe provisioning automation. Compared to the part, it was glorious with glittering self-service rebellions and API-driven enlightenment.

In reality, DevOps was really painful because most developers felt that time fixing infrastructure was a distraction from coding features.

In 2016, we finally reached a sufficient platform capability set in tools like CI/CD pipelines, Docker Containers, Kubernetes, Serverless/Lambda and others that Developers had real alternatives to dealing with infrastructure directly. Once we reached this tipping point, the idea of coding against infrastructure directly become unattractive. In fact, the world’s largest infrastructure company, Amazon, is actively repositioning as a platform services company. Their re:Invent message was very clear: if you want to get the most from AWS, use our services instead of the servers.

For most users, using platform services instead of infrastructure is excellent advice to save cost and time.

The dilemma is that platforms are still evolving rapidly. So rapidly that adopters cannot count of the services to exist in their current form for multiple generations. However, the real benefits drive aggressive adoption. They also drive the rise of Crawfish IT.

As they say in N’Awlins, laissez les bon temps rouler!

Related Reading on the Doppler: Your Cloud Strategy Must Include No-Cloud Options

 

Can we control Hype & Over-Vendoring?

Q: Is over-vendoring when you’ve had to much to drink?
A: Yes, too much Kool Aid.

There’s a lot of information here – skip to the bottom if you want to see my recommendation.

Last week on TheNewStack, I offered eight ways to keep Kubernetes on the right track (abridged list here) and felt that item #6 needed more explanation and some concrete solutions.

  1. DO: Focus on a Tight Core
  2. DO: Build a Diverse Community
  3. DO: Multi-cloud and Hybrid
  4. DO: Be Humble and Honest
  5. AVOID: “The One Ring” Universal Solution Hubris
  6. AVOID: Over-Vendoring (discussed here)
  7. AVOID: Coupling Installers, Brokers and Providers to the core
  8. AVOID: Fast Release Cycles without LTS Releases

kool-aid-manWhat is Over-Vendoring?  It’s when vendors’ drive their companies’ brands ahead of the health of the project.  Generally by driving an aggressive hype cycle where vendors are trying to jump on the hype bandwagon.

Hype can be very dangerous for projects (David Cassel’s TNS article) because it is easy to bypass the user needs and boring scale/stabilization processes to focus on vendor differentiation.  Unfortunately, common use-cases do not drive differentiation and are invisible when it comes to company marketing budgets.  That boring common core has the effect creating tragedy of the commons which undermines collaboration on shared code bases.

The solution is to aggressively keep the project core small so that vendors have specific and limited areas of coopetition.  

A small core means we do not compel collaboration in many areas of project.  This drives competition and diversity that can be confusing.  The temptation to endorse or nominate companion projects is risky due to the hype cycle.  Endorsements can create a bias that actually hurts innovation because early or loud vendors do not generally create the best long term approaches.  I’ve heard this described as “people doing the real work don’t necessarily have time to brag about it.”

Keeping a small core mantra drives a healthy plug-in model where vendors can differentiate.  It also ensures that projects can succeed with a bounded set of core contributors and support infrastructure.  That means that we should not measure success by commits, committers or lines of code because these will drop as projects successfully modularize.  My recommendation for a key success metric is to the ratio of committers to ecosystem members and users.

Tracking improving ratio of core to ecosystem shows that improving efficiency of investment.  That’s a better sign of health than project growth.

It’s important to note that there is also a serious risk of under-vendoring too!  

We must recognize and support vendors in open source communities because they sustain the project via direct contributions and bringing users.  For a healthy ecosystem, we need to ensure that vendors can fairly profit.  That means they must be able to use their brand in combination with the project’s brand.  Apache Project is the anti-pattern because they have very strict “no vendor” trademark marketing guidelines that can strand projects without good corporate support.

I’ve come to believe that it’s important to allow vendors to market open source projects brands; however, they also need to have some limits on how they position the project.

How should this co-branding work?  My thinking is that vendor claims about a project should be managed in a consistent and common way.  Since we’re keeping the project core small, that should help limit the scope of the claims.  Vendors that want to make ecosystem claims should be given clear spaces for marketing their own brand in participation with the project brand.

I don’t pretend that this is easy!  Vendor marketing is planned quarters ahead of when open source projects are ready for them: that’s part of what feeds the hype cycle. That means that projects will be saying no to some free marketing from their ecosystem.  Ideally, we’re saying yes to the right parts at the same time.

Ultimately, hype control means saying no to free marketing.  For an open source project, that’s a hard but essential decision.

 

Infrastructure Masons is building a community around data center practice

IT is subject to seismic shifts right now. Here’s how we cope together.

For a long time, I’ve advocated for open operations (“OpenOps”) as a way to share best practices about running data centers. I’ve worked hard in OpenStack and, recently, Kubernetes communities to have operators collaborate around common architectures and automation tools. I believe the first step in these efforts starts with forming a community forum.

I’m very excited to have the RackN team and technology be part of the newly formed Infrastructure Masons effort because we are taking this exact community first approach.

infrastructure_masons

Here’s how Dean Nelson, IM organizer and head of Uber Compute, describes the initiative:

An Infrastructure Mason Partner is a professional who develop products, build or support infrastructure projects, or operate infrastructure on behalf of end users. Like their end users peers, they are dedicated to the advancement of the Industry, development of their fellow masons, and empowering business and personal use of the infrastructure to better the economy, the environment, and society.

We’re in the midst of tremendous movement in IT infrastructure.  The change to highly automated and scale-out design was enabled by cloud but is not cloud specific.  This requirement is reshaping how IT is practiced at the most fundamental levels.

We (IT Ops) are feeling amazing pressure on operations and operators to accelerate workflow processes and innovate around very complex challenges.

Open operations loses if we respond by creating thousands of isolated silos or moving everything to a vendor specific island like AWS.  The right answer is to fund ways to share practices and tooling that is tolerant of real operational complexity and the legitimate needs for heterogeneity.

Interested in more?  Get involved with the group!  I’ll be sharing more details here too.