Deep Thinking & Tech + Great Guests – L8ist Sh9y podcast relaunched

Posted on October 17, 2017 by Rob H

I love great conversations about technology – especially ones where the answer is not very neatly settled into winners and losers (which is ALL of them in IT). I’m excited that RackN has (re)launched the L8ist Sh9y (aka Latest Shiny) podcast around this exact theme.

Please check out the deep and thoughtful discussion I just had with Mark Thiele (notes) of Aperca where we covered Mark’s thoughts on why public cloud will be under 20% of IT and took on culture issues head on.

Spoiler: we have David Linthicum coming next, SO SUBSCRIBE.

I’ve been a guest on some great podcasts (Cloudcast, gcOnDemand, Datanauts, IBM Dojo, HPE, Foodfight) and have deep respect for the critical work they do in the industry.

We feel there’s still room for deep discussions specifically around automated IT Operations in cloud, data center and edge; consequently, we’re branching out to start including deep interviews in addition to our initial stable of IT Ops deep technical topics like Terraform, Edge Computing, GartnerSYM review, Kubernetes and, of course, our own Digital Rebar.

Soundcloud Subscription Information

SRE Thinking : Reframing Dev + Ops

Posted on August 16, 2017 by Rob H

Last month, Eric Wright and I were able to complete a discussion that inspired my guest post for CapitalOne “How Platforms and SREs Change the DevOps Contract.” While our conversation ranged widely over the challenges of building and integrating IT processes, the key message is simple: we need to make investments in operations.

This podcast explains why I’ve been using Site Reliability Engineering (SRE) as a proxy for this DevOps inspired rethinking of operations.

I hope you’ll take the time to listen to this deep conversation about very real IT issues. Eric and I are not shy about expressing our opinions, but we’re also anti-shaming. The simple reality is that building infrastructure is hard and we all make difficult choices. My hope is that we can start sharing the fixes and helping each other out.

Podcast Episode 50 – SRE Revisited plus the Challenges of Ops and more with Rob Hirschfeld (@zehicle)

Do these topics inspire you? Creating data center automation for SREs is our mission at RackN. We believe that well run infrastructure requires building APIs from the ground up and keeping them simple. I hope that you’ll take 5 minutes to try our latest offering, Digital Rebar Provision, and join us on the quest to drive excellence in operations.

July 28 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Posted on July 28, 2017 by Rob H

Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content, please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo).

This week, we launched our new RackN website to provide more information on our solutions and services as well as provide customer examples. Click over to our new site and let us know your thoughts.

SRE Items of the Week

Site Reliability Engineer: Don’t fall victim to the bias blind spot
http://sdtimes.com/site-reliability-engineer-dont-fall-victim-to-the-bias-blind-spot/

To ensure websites and applications deliver consistently excellent speed and availability, some organizations are adopting Google’s Site Reliability Engineering (SRE) model. In this model, a Site Reliability Engineer (SRE) – usually someone with both development and IT Ops experience – institutes clear-cut metrics to determine when a website or application is production-ready from a user performance perspective. This helps reduce friction that often exists between the “dev” and “ops” sides of organizations. More specifically, metrics can eliminate the conflict between developers’ desire to “Ship it!” and operations desire to not be paged when they are on-call. If performance thresholds aren’t met, releases cannot move forward. READ MORE
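The gating rule described in that excerpt can be sketched in a few lines. This is only an illustration of the idea, not how any particular SRE team or tool implements it: the metric names, the threshold values and the `release_blockers` helper are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Upper limit for one user-facing metric (metric names are hypothetical)."""
    metric: str
    max_value: float


def release_blockers(measured: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return one reason for every threshold that is missed or has no measurement."""
    blockers = []
    for t in thresholds:
        value = measured.get(t.metric)
        if value is None:
            # An unmeasured metric blocks the release rather than passing silently.
            blockers.append(f"{t.metric}: no measurement")
        elif value > t.max_value:
            blockers.append(f"{t.metric}: {value} > {t.max_value}")
    return blockers


# Hypothetical thresholds and measurements, for illustration only.
THRESHOLDS = [Threshold("p95_latency_ms", 300.0), Threshold("error_rate_pct", 1.0)]
candidate = {"p95_latency_ms": 280.0, "error_rate_pct": 2.5}

blockers = release_blockers(candidate, THRESHOLDS)
print("release blocked" if blockers else "release may proceed", blockers)
```

Treating a missing measurement as a blocker is a design choice in this sketch: it keeps the agreement between dev and ops explicit instead of letting an unmonitored release slip through.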

Episode 50 – SRE Revisited plus the Challenge of Ops and more with Rob Hirschfeld
http://podcast.discoposse.com/e/ep-50-sre-revisited-plus-the-challenges-of-ops-and-more-with-rob-hirschfeld-zehicle/

This fun chat expands on what we started talking about in episode 42 (http://podcast.discoposse.com/e/ep-42-spiraling-ops-debt-sre-solutions-and-rackn-chat-with-rob-hirschfeld-zehicle/) as we dive into the challenges and potential solutions for thinking and acting with the SRE approach. Big thanks to Rob Hirschfeld from @RackN for sharing his thoughts and experiences from the field on this very exciting subject. LISTEN HERE

Site Reliability Engineering – Operators and Developers Working Together
http://bit.ly/2u7eSmm

Rob Hirschfeld, Co-Founder and CEO of RackN, provides his thoughts on how operators are equivalent to developers and work together to accomplish the critical task of keeping the infrastructure running and available with constant changes in the data center.

Subscribe to our new daily DevOps, SRE, & Operations Newsletter https://paper.li/e-1498071701#/
_____________

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com.

DevOpsDays Dallas – August 29 – 30: Rob Hirschfeld Talk

OTHER NEWSLETTERS

SRE Weekly (@SREWeekly) – Issue #82
The DevOps/WebOps Marketing Geek – LINK from @LukasHertig
Julia Evans Blog – LINK

Let’s DevOps IRL: my SRE postings on RackN!

Posted on July 10, 2017 by Rob H

I’m investing in these Site Reliability Engineering (SRE) discussions because I believe operations (and by extension DevOps) is facing a significant challenge in keeping up with development tooling. The links below have been getting a lot of interest on twitter and driving some good discussion.

Addressing this Ops debt is our primary mission at my company, RackN: we believe that integrated system level tooling is required. We also believe that new tools should not disrupt environments so we work very hard to adapt to requirements of individual sites.

SRE is urgent because it provides a pragmatic path and rationale for investment.

Even if you don’t agree with Google’s term or all their practices, I think fundamental concepts of system thinking, status/pay, automation investment and developer collaboration are essential. It should come as no surprise that these are all Lean/DevOps concepts; however, SRE has the pragmatic side of being a job function.

Here are some recent relevant discussions I’ve been having about SREs with links to both the audio and my text show notes.

Cloud Cast about SRE concepts and decomposing Ops
Datanauts deep dive about SRE based on the “DevOps vs SRE” talk from DevOpsDays Austin (original post)
Charity Majors and I debate the SRE name and pay equity for Ops.
Further Reading Podcasts
- Turbonomic’s Eric Wright
- HPE’s Stephen Spector

Of course, RackN is also doing a WEEKLY SRE update that captures general interest items. Check that out and subscribe.

June 30 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Posted on June 30, 2017 by Rob H

SRE Items of the Week

Site Reliability Engineering at Dropbox with Tammy Butow @tammybutow

The mess and success of building open leadership (notes from Kubernetes Leadership Summit)
http://bit.ly/2tMTzEy

Three weeks ago, Kubernetes leaders met for a very busy day to reflect and plan how the community was growing. I was humbled to be part of the Kubernetes Leadership Summit due to my work as the Cluster Ops SIG co-chair. READ MORE

Ops integration will be scary, proceed with haste
http://bit.ly/2u2Wfhq

As CEO of RackN, I talk to a lot of operations teams who have big aspirations for automation that are faltering due to internal resistance. Generally, we’re talking to the SREs on the team. Sadly, those SREs are often stymied by narrowly scoped teams and house-of-cards technical debt. READ MORE

The Case for Ops Engineering Pay Equity with Charity Majors
http://bit.ly/2tZBjYD

Charity Majors is one of my DevOps and SRE heroes* so it was great fun to be able to debate SRE with her at Gluecon this spring. Encouraged by Mike Maney to retell the story, we got to recapture our disagreement about “Is SRE a Good Term?” from the evening before. READ MORE

Datanauts #89 Dives Deep on SRE Approach and Urgency
http://bit.ly/2tqmbGl

In Datanauts 089, Chris Wahl and Ethan Banks help me break down the concepts from my “DevOps vs SRE vs Cloud Native” presentation from DevOpsDays Austin last spring. They do a great job exploring the tough topics and concepts from the presentation. It’s almost like an extended Q&A so you may want to review the slides or recording before diving into the podcast.

Here are my notes from the podcast READ MORE

5 Laws every aspiring Devops engineer should know by @ChrisShort
https://opensource.com/open-organization/17/5/5-devops-laws

“A good engineer is a lazy engineer,” some will say. And to a certain extent, it’s true: Laziness is a great quality if you’re automating repetitive tasks. But laziness flies in the face of learning new technologies and getting new work done. Somewhere between Junior Systems Administrator and Senior DevOps Engineer, laziness no longer becomes an advantage.

Let’s discuss the five laws aspiring DevOps engineers should follow if they want to become great DevOps engineers. READ MORE
___________

Subscribe to our new daily DevOps, SRE, & Operations Newsletter https://paper.li/e-1498071701#/
____________

UPCOMING EVENTS

2017 New York Venture Summit – LINK

OTHER NEWSLETTERS

SRE Weekly (@SREWeekly) – Issue #78
The DevOps/WebOps Marketing Geek – LINK from @LukasHertig
Julia Evans Blog – LINK

Datanauts #89 dives deep on SRE approach and urgency

Posted on June 29, 2017 by Rob H

TL;DR: SRE makes Ops more Dev like in critical ways like status equity and tooling approaches.

In Datanauts 089, Chris Wahl and Ethan Banks help me break down the concepts from my “DevOps vs SRE vs Cloud Native” presentation from DevOpsDays Austin last spring. They do a great job exploring the tough topics and concepts from the presentation. It’s almost like an extended Q&A so you may want to review the slides or recording before diving into the podcast.

Advanced Reading: my follow-up discussion on SRE with the Cloudcast team and my previous Datanauts podcast.

Here are my notes from the podcast:

01:00 “Doing infrastructure in a way that the robots can take over”
01:51 Video where Charity & Rob Debated the SRE term
02:00 History of SRE term from Google vs Sys Ops – if the site was not up, money was not flowing. SRE culture fixed pay equity and career ladder, ops would have automation/dev time, devs on the hook for errors
03:00 Google took a systems approach with lots of time for automation and coding
03:20 Finding a 10x improvement in ops. Go buy the book.
04:00 SRE is a new definition of System Op
04:10 The S in SRE could be “system” or a physical location (not a web site).
05:00 We’re seeing SRE teams showing up in companies of every size. Replacing DevOps teams (which is a good thing). Rob is hoping that SRE is replacing DevOps as a job title.
06:10 Don’t fall for a title change from Sys Op to SRE without actually getting the pay and authority
06:45 Ethan believes that SRE is transforming to have a broad set of responsibilities. Is it just a new System Admin definition?
07:30 Rob thinks that the SRE expectation is for a much higher level of automation. There’s a big thinking shift.
08:00 SREs are still operators. You have to walk the walk to know how to run the system. Not developers who are writing the platform.
08:30 Chris asks about the Ops technical debt
09:00 We need to make Ops tooling “better enough” – we’re not solving this problem fast enough. We have to do a better job – Rob talks about the Wannacry event.
10:30 Chris asks how to fix this since complexity is increasing. Rob plugs Digital Rebar as a way to solve this.
11:00 People are excited about Digital Rebar but don’t have the time to fix the problem. They are running crisis to crisis so we never get to automation that actually improves things.
12:00 At best, Ops is invisible. SRE is different because it includes CI/CD with ongoing interactions. There’s a lot coming with immutable operating systems and constantly term.
13:00 The idea that a Linux system has been up for 10 years is an anti-pattern. Rob would rather have people say that none of their servers has been up for more than a week (because they are constantly refreshed)
13:19 Chris & Ethan – SECTION 1 REVIEW
- SRE is not new, it’s about moving into a proactive stance (automatically reacting)
- The power is the buy in so that Ops has ownership of the stack
15:00 SRE vs DevOps vs Cloud Native – not in conflict, but we love to create opposition
15:40 There is a difference, they are not interchangeable. SRE is a job title, DevOps is a process and Cloud Native is an architecture.
16:30 We need to resist the idea that Cloud Native is a “new shiny” that replaces DevOps. We don’t have to take things away.
17:00 Lean is a process where we’re trying to shorten the flow from ideation to delivery. Read the Goal [links] and The Phoenix Project [links].
18:00 Bottlenecks (where we’ve added work or delays) really break our pipelines.
19:00 Ethan adds the insight: If you don’t have small steps then you don’t really understand your process
20:00 Platform as a Service is not really reducing complexity, we’re just hiding/abstracting it. That moves the complexity. We may hide it from developers but may be passing it to the operators.
21:00 Chris asks if this can be mapped to legacy? Rob agrees that it’s a legacy architectural choice that was made to reduce incremental risk. Today, we’re trying to make our risk into smaller steps which makes it so that we will have smaller but more frequent breaks.
22:40 The way we deliver systems is changing to require a much faster pace of taking changes
23:00 SREs are data driven so they can feed information back to devs. They can’t (shouldn’t) walk away from running systems. This is an investment requirement so we can create data.
24:00 We let a lot of problems lurk below the surface that eventually surface as a critical issue. Cannot let toothaches turn into abscesses. SREs should watch systems over time.
25:20 If you are running under performance in the cloud, then you are wasting money.
26:00 Cloud Native, an architecture? What is it? It means a ton of things. For this preso, Rob made it about 12 factor and API driven infrastructure.
26:50 “If you are not worried about rising debt then we are in trouble.” We need to root cause! If not, they snowball and operators are just running fire to fire. We need to stop having operators be heroes / grenade divers because it’s an anti-pattern. Predictable systems do not create a lot of interrupts or crises. Operators should not be event driven.
28:40 Chris & Ethan – SECTION 2 REVIEW
- Chris: Being data driven combats complexity
- Ethan: Breaking down processes into smaller units reduces risk.
30:00 Cloud First is not Cloud Only. CNCF projects are not VM specific, they are about abstractions that help developers be more productive. Ideally, the abstractions remove infrastructure because developers don’t want to do any infrastructure. We should not care about which type of infrastructure we are using.
31:30 The similarities between the concepts is in their common outcomes/values. Cloud First wants to be infrastructure agnostic.
32:30 Chris asks how important CI/CD should be. Are these still important in non-Cloud environments? Rob thinks that Cloud Native may “cloud wash” architectures that are really just as important in traditional infrastructure.
34:00 Cloud Native was a defensive architecture because early cloud was not very good. CI/CD pipelines would be considered best practices in regular manufacturing.
35:00 These ideas are really good manufacturing process applied back to IT. Thankfully, there’s really nothing unexpected from repeatable production.
36:30 Lesson: Pay Equity. Traditionally operators are not paid as well as developers and that means that we’re giving them less respect. HiPPO (highest paid person in organization) is a very real effect where you can create a respect gap.
38:00 Lesson: Disrupt Less. We love the idea of disruption, but disruptions are very expensive and disproportionately so for operators. Change for Developers may be small but have big impacts on operators. More disruptive changes actually slow down adoption because that slows down inertia. SREs should be able to push back to insist on migration paths.
40:00 Rob talks about how RedFish, while good to replace IPMI, will take a long time before it does. There are pros and cons.
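The 13:00 note (long uptime as an anti-pattern) is easy to turn into a check. The sketch below is a minimal illustration under stated assumptions: the one-week limit is taken from the remark in the notes, while the host names, the `fleet` inventory and the `overdue_for_refresh` helper are hypothetical and not part of Digital Rebar or any other tool.

```python
from datetime import timedelta

# Assumption: "more than a week" from the notes becomes a 7-day policy limit.
MAX_UPTIME = timedelta(days=7)


def overdue_for_refresh(uptimes: dict[str, timedelta], limit: timedelta = MAX_UPTIME) -> list[str]:
    """Return hosts whose uptime exceeds the limit, longest-running first."""
    overdue = [host for host, up in uptimes.items() if up > limit]
    return sorted(overdue, key=lambda host: uptimes[host], reverse=True)


# Hypothetical inventory; in practice this would come from monitoring data.
fleet = {
    "web-01": timedelta(days=2),
    "db-01": timedelta(days=3650),
    "cache-01": timedelta(days=9),
}
print(overdue_for_refresh(fleet))
```

Reporting the longest-running hosts first puts the "up for 10 years" machines at the top of the list, which is where the refresh work is likely to be hardest.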

What makes ops hard? SRE/DevOps challenge & imperative [from Cloudcast 301]

Posted on June 27, 2017 by Rob H

TL;DR: Operators (DevOps & SREs) have a hard job, we need to make time and room for them to redefine their jobs in a much more productive way.

The Cloudcast.net by Brian Gracely and Aaron Delp brings deep experience and perspective into their discussions based on their impressive technology careers and understanding of the subject matter. Their podcasts go deep quickly with substantial questions that get to the heart of the issue. This was my third time on the show (previous notes).

In episode 301, we go deeply into the meaning and challenges for Site Reliability Engineering (SRE) functions. We also cover some popular technologies that are of general interest.

Author’s Note: For further information about SREs, listen to my discussion about “SRE vs DevOps vs Cloud Native” on the Datanauts podcast #89. (transcript pending)

Here are my notes from Cloudcast 301, with bold added for emphasis:

2:00 Rob defines SRE (more resources on RackN.com site).
- 2:30 Google’s SRE book gave a name, even changed the definition, to what I’ve been doing my whole career. Evolved name from being just about sites to a full system perspective.
- 3:30 SRE and DevOps are aligned at the core. While DevOps is about process and culture, SRE is more about the function and “factory.”
- 4:30 Developers don’t want to be shoving coal into the engine, but someone, SREs, have to make sure that everything keeps running

5:15 Brian asks about impedance mismatch between Dev and Ops. How do we fix that?

- 6:30 Rob talks about the crisis brewing for operations innovation gap (link). Digital Rebar is designed to create site-to-site automation so Operators can share repeatable best practices.
- 7:30 OpenStack ran aground with Operators because we never created the practices that could be repeated. “Managed service as the required pattern is a failure of building good operational software.”
- 8:00 RackN decomposes operations into isolated units so that individual changes don’t break the software on top
9:20 Brian talks about the increasing rate of releases means that operations doesn’t have the skills to keep up with patching.

- 10:10 That’s “underlay automation” and even scarier because software is composited with all sorts of parts that have their own release cycles that are not synchronized.
- 11:30 We need to get system level patch/security update hygiene to be automatic
- 12:20 This is really hard!
13:00 Brian asks what are the baby steps?

- 13:20 We have to find baby steps where there are nice clean boundaries at every layer from the very most basic. For RackN, that’s DHCP and PXE and then up to Kubernetes.
- 15:15 Rob rants that renaming Ops teams as SRE is a failure because SRE has objectives like job equity that need to be included.
- 16:00 Org silos get in the way of automation; they have antibodies that make it difficult for SREs and DevOps to succeed.
- 17:10 Those people have to be empowered to make change
- 17:40 The existing tools must be pluggable or you are hurting operators. There’s really no true greenfield, so we help people by making things work in existing data centers.
- 19:00 Scripts may have technical debt but that does not mean they should just be disposed.
- 19:20 New and shiny does not equal better. For example, Container Linux (aka CoreOS) does not solve all problems.
- 20:10 We need to do better creating bridges between existing and new.
- 20:40 How do we make Day 2 compelling?
21:15 Brian asks about running OpenStack on Kubernetes.

- 22:00 Rob is a fan of Kubernetes on Metal, but really, we don’t want metal and vms to be different. That means that Kubernetes can be a universal underlay which is threatening to OpenStack.
- 23:00 This is no longer a JOKE: “Joint OpenStack Kubernetes Environments”
- 23:30 Running things on Kubernetes (or OpenStack) is great because the abstractions hide complexity of infrastructure; however, at the physical layer you need something that exposes that complexity (which is what RackN does).
25:00 Brian asks at what point do you need to get past the easy abstractions

- 25:30 You want to never care ever. But sometimes you need the information for special cases.
- 26:20 We don’t want to make the core APIs complex just to handle the special cases.
- 27:00 There’s still a class of people who need to care about hardware. These needs should not be embedded into the Kubernetes (or OpenStack) API.
28:00 Brian summarizes that we should not turn 1% use cases into complexity for everyone. We need to foster the skill of coding for operators

- 28:45 For SREs, turning Operators toward coding & automation is essential. That’s a key point in the 50% programming statement for SREs.
- In the closing, Rob suggested checking out Digital Rebar Provision as a Cobbler replacement.

We’re very invested in talking about SRE and want to hear from you! How is your company transforming operations work to make it more sustainable, robust and human? We want to hear your stories and questions.

June 23 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Posted on June 23, 2017 by Rob H

SRE Items of the Week

Datanauts 089: SRE vs Cloud Native vs DevOps
http://bit.ly/2txPXWV

Rob Hirschfeld joins the Datanauts to talk about the term Site Reliability Engineer (SRE) and what it means for IT operations.

Rob explores how the SRE designation is an effort to put operations teams on a more equal footing with developers within an organization. Rob and the Datanauts also discuss how SREs line up with other industry trends such as the cloud native and DevOps movements. LISTEN HERE

Why Does DevOps Require a New Operating Model? By Mustafa Kapadia @MKapadiaTweets
https://devops.com/why-should-cios-redesign-their-organizations/

For many, redesigning the operating model is table stakes for a successful DevOps transformation. But have you ever wondered why? Popular wisdom will have you believe that the main reason for operating model redesign are to…

“Improve collaboration between business and IT”
“Realign metrics”
“Take full advantage of the new tools”
“And even jump start culture change”

While these are all good reasons, frankly they miss the point. Experience suggests there is a more practical reason – match ownership with desired output.

What do we mean by that? Well first, let’s look at how the current model works. READ MORE

What can developers learn from being on call? By Julia Evans @b0rk http://jvns.ca/blog/2017/06/18/operate-your-software/

We often talk about being on call as being a bad thing. For example, the night before I wrote this my phone woke me up in the middle of the night because something went wrong on a computer. That’s no fun! I was grumpy.

In this post, though, we’re going to talk about what you can learn from being on call and how it can make you a better software engineer! And to learn from being on call you don’t necessarily need to get woken up in the middle of the night. By “being on call”, here, I mean “being responsible for your code when it breaks”. It could mean waking up to issues that happened overnight and needing to fix them during your workday! READ MORE

Kargo Ansible Playbooks foster Collaborative Kubernetes Ops
http://bit.ly/2qENw3I

Why Kargo?
Making Kubernetes operationally strong is a widely held priority and I track many deployment efforts around the project. The incubated Kargo project is of particular interest for me because it uses the popular Ansible toolset to build robust, upgradable clusters on both cloud and physical targets. I believe using tools familiar to operators grows our community.

We’re excited to see the breadth of platforms enabled by Kargo and how well it handles a wide range of options like integrating Ceph for StatefulSet persistence and Helm for easier application uploads. Those additions have allowed us to fully integrate the OpenStack Helm charts (demo video). READ MORE

Subscribe to our new daily DevOps, SRE, & Operations Newsletter https://paper.li/e-1498071701#/

UPCOMING EVENTS

2017 New York Venture Summit – LINK

OTHER NEWSLETTERS

SRE Weekly (@SREWeekly) – Issue #77
The DevOps/WebOps Marketing Geek – LINK from @LukasHertig
Julia Evans Blog – LINK

June 16 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Posted on June 16, 2017 by Rob H

SRE Items of the Week

The Cloudcast #301 – SRE and Infrastructure Operations
http://www.thecloudcast.net/2017/06/the-cloudcast-301-sre-and.html

Description: Brian talks with Rob Hirschfeld (@zehicle, Founder/CEO of @RackN) about the concepts of SRE (Site Reliability Engineering), the challenges of maintaining infrastructure software, emerging tools and the next-generation of operations.

Show Notes:

Topic 1 – Welcome back to the show. Let’s start by talking about the concept of SRE (Site Reliability Engineering). Give us the basics and maybe explain how it differs from what people define in DevOps.

Topic 2 – Application development has been moving faster for quite a while (agile development, etc.). But now infrastructure/operations teams have to deal with faster software – especially around updates (e.g. Kubernetes releases every 3 months). How are companies managing this?

Topic 3 – Given that this pace of operations change may not slow down, how do you think about the challenge in terms of process/operations versus technology/tools?

Topic 4 – What are some of the steps that companies take to better prepare for this type of operational model? Tools, process, skills, etc.

Topic 5 – Do you see SRE as being a progression for existing infrastructure/operations people, or is this more focused on sysadmins or developers that want to get away from building applications?

_____________

DevOps Enterprise Summit London: Tales of courage and community
https://techbeacon.com/devops-enterprise-summit-london-tales-courage-community

After spending two amazing days with 700 of my closest DevOps cohorts from Europe, the Middle East, Africa, and beyond, I learned all about the latest and greatest IT and technology transformation reports at the DevOps Enterprise Summit London. With substantial growth in attendance from the first year, in 2016, the buzz around the show was palpable. And, what a location! From the venue, the QEII Centre, we had 360-degree views of central London, from Big Ben to the London Eye and beyond.

Read more from Steve Brodie, CEO of Electric Cloud @stbrodie
_____________

.IO! .IO! It’s off to a Service Mesh you should go [Gluecon 2017 notes]
http://bit.ly/2rjw4We

Gluecon turned out to be all about a microservice concept called a “service mesh” which was being promoted by Buoyant with Linkerd and IBM/Google/Lyft with Istio. This class of services is a natural evolution of the rush to microservices and something that I’ve written about in the past (microservice technical architecture on TheNewStack). READ MORE
_____________

A few things I’ve learned about Kubernetes
https://jvns.ca/blog/2017/06/04/learning-about-kubernetes/

I’ve been learning about Kubernetes at work recently. I only started seriously thinking about it maybe 6 months ago – my partner Kamal has been excited about Kubernetes for a few years (him: “julia! you can run programs without worrying what computers they run on! it is so cool!“, me: “I don’t get it, how is that even possible”), but I understand it a lot better now.

This isn’t a comprehensive explanation or anything, it’s some things I learned along the way that have helped me understand what’s going on.

Read more from Julia Evans @b0rk
_____________

UPCOMING EVENTS

2017 New York Venture Summit – LINK

OTHER NEWSLETTERS

SRE Weekly (@SREWeekly) – Issue #76
The DevOps/WebOps Marketing Geek – LINK from @LukasHertig
Julia Evans Blog – LINK

How about a CaaPuccino? Krish and Rob discuss containers, platforms, hybrid issues around Kubernetes and OpenStack.

Posted on April 24, 2017 by Rob H

CaaPuccino: A frothy mix of containers and platforms.

Check out Krish Subramanian’s (@krishnan) Modern Enterprise podcast (audio here) today for a surprisingly deep and thoughtful discussion about how frothy new technologies are impacting Modern Enterprise IT. Of course, we also take some time to throw some fire bombs at the end. You can use my notes below to jump to your favorite topics.

The key takeaways are that portability is hard and we’re still working out the impact of container architecture.

The benefit of the longer interview is that we really dig into the reasons why portability is hard and discuss ways to improve it. My personal SRE posts and those on the RackN blog describe operational processes that improve portability. These are real concerns for all IT organizations because mixed and hybrid models are a fact of life.

If you are not actively making automation that works against multiple infrastructures then you are building technical debt.

Of course, if you just want the snark, then jump forward to 24:00 minutes in where we talk future of Kubernetes, OpenStack and the inverted intersection of the projects.

Krish, thanks for the great discussion!

Rob’s Podcast Notes (39 minutes)

2:37: Rob intros about Digital Rebar & RackN

4:50: Why our Kubernetes is JUST UPSTREAM

5:35: Where are we going in 5 years > why Rob believes in Hybrid

Should not be 1 vendor who owns everything
That’s why we work for portability
Public cloud vision: you should stop caring about infrastructure
Coming to an age when infrastructure can be completely automated
Developer rebellion against infrastructure

8:36: Krish believes that Public cloud will be more decentralized

Public cloud should be part of everyone’s IT plan
It should not be the ONLY thing

9:25: Docker helps create portability, what else creates portability? Will there be a standard?

Containers are a huge change, but it’s not just packaging
Smaller units of work is important for portability
Container schedulers & PaaS are very opinionated, that’s what creates portability
Deeper into infrastructure loses portability (RackN helps)
Rob predicts that Lambda and Serverless creates portability too

11:38: Are new standards emerging?

Some APIs become dominant and create de facto APIs
Embedded assumptions break portability – that’s what makes automation fragile
Rob explains why we inject configuration to abstract infrastructure
RackN works to inject attributes instead of allowing scripts to assume settings
For example, networking assumptions break portability
Platforms force people to give up configuration in ways that break portability
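The point about injecting attributes instead of letting scripts assume settings can be illustrated with a small sketch. This is not RackN or Digital Rebar code; the template, the attribute names and the two example sites are hypothetical, and Python's standard `string.Template` stands in for whatever templating a real tool would use.

```python
from string import Template

# No site assumptions are embedded here: every site-specific value is a parameter.
NETWORK_TEMPLATE = Template("interface=$interface\ngateway=$gateway\ndns=$dns\n")


def render_network_config(attributes: dict[str, str]) -> str:
    """Render the config from injected attributes and fail loudly if one is missing."""
    # Template.substitute raises KeyError for a missing key instead of guessing a default.
    return NETWORK_TEMPLATE.substitute(attributes)


# Hypothetical attributes for two different sites.
site_a = {"interface": "eth0", "gateway": "10.0.0.1", "dns": "10.0.0.2"}
site_b = {"interface": "eno1", "gateway": "192.168.1.1", "dns": "192.168.1.53"}

print(render_network_config(site_a))
print(render_network_config(site_b))
```

The same template renders for both sites because nothing about either network is baked in. A script that hard-coded `eth0` would work at the first site and break at the second, which is the fragility the notes describe.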

14:50: Why did Platform as a Service not take off?

Rob defends PaaS – thinks that it has accomplished a lot
Challenge of PaaS is that it’s very restrictive by design
Calls out Andrew Clay Shafer’s “don’t call it a PaaS” position
Containers provide a less restrictive approach with more options.

17:00: What’s the impact on Enterprise? How are developers being impacted?

Service Orientation is a very important thing to consider
Encapsulation from services is very valuable
Companies don’t own all their IT services any more – it’s not monolithic
IT Service Orientation aligns with Business Processes
Rob says the API economy is a big deal
In machine learning, a business’ data may be more valuable than their product

19:30: Services impact?

Services have a business imperative
We’re not ready for all the impacts of a service orientation
Challenge is to mix configuration and services
Magic of Digital Rebar is that it can mix orchestration of both

22:00: We are having issues with simple, how are we going to scale up?

Barriers are very low right now

22:30: Will Kubernetes help us solve governance issues?

Kubernetes is doing a good job building an ecosystem
Smart to focus on just being Kubernetes
It will be chaotic as the core is worked out

24:00: Do you think Kubernetes is going in the right direction?

Rob is bullish for Kubernetes to be the dominant platform because it’s narrow and specific
Google has the right balance of control
Kubernetes really is not that complex for what it does
Mesos is also good but harder to understand for users
Swarm is simple but harder to extend for an ecosystem
Kubernetes is a threat to Amazon because it creates portability and ecosystem outside of their platform
Rob thinks that Kubernetes could create platform services that compete with AWS services like RDS.
It’s likely to level the field, not create a Google advantage

27:00: How does Kubernetes fit into the Digital Rebar picture?

We think of Kubernetes as a great infrastructure abstraction that creates portability
We believe there’s a missing underlay that cannot abstract the infrastructure – that’s what we do.
OpenStack deployments break because every data center is custom and different – vendors create a lot of consulting without solving the problem
RackN is creating composability UNDER Kubernetes so that those infrastructure differences do not break operation automation
Kubernetes does not have the constructs in the abstraction to solve the infrastructure problem, that’s a different problem that should not be added into the APIs
Digital Rebar can also then use the Kubernetes abstractions?

30:20: Can OpenStack really be managed/run on top of Kubernetes? That seems complex!

There is a MESS in the message of Kubernetes under OpenStack because it sends the message that Kubernetes is better at managing applications than OpenStack
Since OpenStack is just an application and Kubernetes is a good way to manage applications
When OpenStack is already in containers, we can use Kubernetes to do that in a logical way
“I’m super impressed with how it’s working” using OpenStack Helm Packs (still needs work)
Physical environment still has to be injected into the OpenStack on Kubernetes environment

35:05 Does OpenStack have a future?

Yes! But it’s not the big “data center operating system” future that we expected in 2010. Rob thinks it is a good VM management platform.
Rob provides the same caution for Kubernetes. It will work where the abstractions add value but data centers are complex hybrid beasts
Don’t “square peg a data center round hole” – find the best fit
OpenStack should have focused on the things it does well – it has a huge appetite for solving too many problems.

Rob Hirschfeld

On Computing, Containers, Cloud & Tech Culture

Tag Archives: Podcast
