June 2 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content, please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo).

SRE Items of the Week

RackN and our Co-Founder and CEO Rob Hirschfeld openly called for a significant change to the OpenStack and Kubernetes communities in his VMBlog.com post, How is OpenStack so dead AND yet so very alive to SREs?

“We’re going to keep solving problems in and around the OpenStack community.  I’m excited to see the Foundation embracing that mission.  There are still many hard decisions to make.  For example, I believe that Kubernetes as an underlay is compelling for operators and will drive the OpenStack code base into a more limited role as a Kubernetes workload (check out my presentation about that at Boston).  While that may refocus the coding efforts, I believe it expands the relevance of the open infrastructure community we’ve been building.

Building infrastructure software is hard and complex.  It’s better to do it with friends so please join me in helping keep these open operations priorities very much alive.”

To provide more information on this idea, Rob posted a new blog, OpenStack’s Big Pivot: our suggestion to drop everything and focus on being a Kubernetes VM management workload.

“Sometimes paradigm changes demand a rapid response and I believe unifying OpenStack services under Kubernetes has become such an urgent priority that we must freeze all other work until this effort has been completed.”

This proposal has drawn far more readers than a typical RackN blog post, along with discussion on social media, so Rob has published a second post to further the proposal: (re)Finding an Open Infrastructure Plan: Bridging OpenStack & Kubernetes.

“It’s essential to solve these problems in an open way so that we can work together as a community of operators.”

As you would expect, RackN is very interested in your thoughts on this proposal and its impact not only on the OpenStack and Kubernetes communities but also how it can transform the ability of IT infrastructure teams to deploy complex technologies in a reliable and scalable manner.

Please contact @zehicle and @rackngo to join the conversation.
_____________

Using Containers and Kubernetes to Enable the Development of Object-Oriented Infrastructure: Brendan Burns GlueCon Presentation

Is SRE a Good Term?
Interview with Rob Hirschfeld (RackN) and Charity Majors (Honeycomb) at Gluecon 2017


_____________

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com.

Velocity : June 19 – 20 in San Jose, CA

OTHER NEWSLETTERS

SRE Weekly (@SREWeekly) Issue #74

(re)Finding an Open Infrastructure Plan: Bridging OpenStack & Kubernetes

TL;DR: infrastructure operations is hard and we need to do a lot more to make these systems widely accessible, easy to sustain and lower risk.  We’re discussing these topics on Twitter… please join in.  Themes include “do we really have consensus and will to act” and “already a solved problem” and “this hurts OpenStack in the end.”

I am always looking for ways to explain (and solve!) the challenges that we face in IT operations and open infrastructure.  I’ve been writing a lot about my concern that data center automation is not keeping pace and is causing technical debt.  That concern led to my recent SRE blogging for RackN.

It’s essential to solve these problems in an open way so that we can work together as a community of operators.

It feels like developers are quick to rally around open platforms and tools while operators tend to be tightly coupled to vendor solutions because operational work is tightly coupled to infrastructure.  From that perspective, I’ve been very involved in the OpenStack and Kubernetes open source infrastructure platforms because I believe they create communities where we can work together.

This week, I posted connected items on VMblog and RackN that lay out a position where we bring together these communities.

Of course, I do have a vested interest here.  Our open underlay automation platform, Digital Rebar, was designed to address a missing layer of physical and hybrid automation under both of these projects.  We want to help accelerate these technologies by helping deliver shared best practices via software.  The stack is additive – let’s build it together.

I’m very interested in hearing from you about these ideas here or in the context of the individual posts.  Thanks!

OpenStack’s Big Pivot: our suggestion to drop everything and focus on being a Kubernetes VM management workload

TL;DR: Sometimes paradigm changes demand a rapid response and I believe unifying OpenStack services under Kubernetes has become such an urgent priority that we must freeze all other work until this effort has been completed.

See Also Rob’s VMblog.com post How is OpenStack so dead AND yet so very alive to SREs?

By design, OpenStack chose to be unopinionated about operations.

That made sense for a multi-vendor project that was deeply integrated with the physical infrastructure and virtualization technologies.  The cost of that decision has been high for everyone because we did not converge to shared practices that would drive ease of operations, upgrade or tuning.  We ended up with waves of vendors vying to have the fastest, simplest and openest version.

Tragically, install became an area of competition instead of an area of collaboration.

Containers and microservice architecture (as required for Kubernetes and other container schedulers) are providing an opportunity to correct this course.  The community is already moving towards containerized services with significant interest in using Kubernetes as the underlay manager for those services.  I’ve laid out the arguments for and challenges ahead of this approach in other places.

These technical challenges involve tuning the services for cloud native configuration and immutable designs.  They include making sure the project configurations can be injected into containers securely and the infra-service communication can handle container life-cycles.  Adjacent concerns like networking and storage also have to be considered.  These are all solvable problems that can be more quickly resolved if the community acts together to target just one open underlay.

The critical fact is that the changes are manageable and unifying the solution makes the project stronger.

Using Kubernetes for OpenStack service management does not eliminate or even solve the challenges of deep integration.  OpenStack already has abstractions that manage vendor heterogeneity and those abstractions are a key value for the project.  Kubernetes solves a different problem: it manages the application services that run OpenStack with a proven, understood pattern.  By adopting this pattern fully, we finally give operators consistent, shared and open upgrade, availability and management tooling.

Having a shared, open operational model would help drive OpenStack faster.

There is a risk to this approach: driving Kubernetes as the underlay for OpenStack will force OpenStack services into a more narrow scope as an infrastructure service (aka IaaS).  This is a good thing in my opinion.   We need multiple abstractions when we build effective IT systems.  

The idea that we can build a universal single abstraction for all uses is a dangerous distraction; instead, we need to build platform layers collaboratively.

While initially resistant, I have become enthusiastic about this approach.  RackN has been working hard on the upgradable & highly available Kubernetes on Metal prerequisite.  We’ve also created prototypes of the fully integrated stack.  We believe strongly that this work should be done as a community effort and not within a distro.

My call for a Kubernetes underlay pivot embraces that collaborative approach.  If we can keep these platforms focused on their core value then we can build bridges between what we have and our next innovation.  What do you think?  Is this a good approach?  Contact us if you’d like to work together on making this happen.

See Also Rob’s VMblog.com post How is OpenStack so dead AND yet so very alive to SREs? 

May 26 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content, please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo).

SRE Items of the Week

Co-Founders of RackN Rob Hirschfeld and Greg Althaus at GlueCon

Reuven Cohen and Rob Hirschfeld Chat at GlueCon17
Reuven Cohen (@ruv) and Rob Hirschfeld discuss data center infrastructure trends concerning provisioning, automation and challenges. Rob highlights his company RackN and the open source project Digital Rebar sponsored by RackN.


_____________

Is SRE a Good Term?
Interview with Rob Hirschfeld (RackN) and Charity Majors (Honeycomb) at Gluecon 2017


_____________

How Google Runs its Production Systems – Get the Book
http://www.techrepublic.com/article/want-to-understand-how-google-runs-its-production-systems-read-this-free-book/

The book Site Reliability Engineering helps readers understand how some Googlers think: It contains the ideas of more than 125 authors. The four editors, Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy, managed to weave all of the different perspectives into a unified work that conveys a coherent approach to managing distributed production systems.

Site Reliability Engineering delivers 34 chapters—totaling more than 500 printed pages from O’Reilly Media—that encompass the principles and practices that keep Google’s production systems working. The entire book is available online at https://landing.google.com/sre/book.html, along with links to other talks, interviews, publications, and events.

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com.

Velocity : June 19 – 20 in San Jose, CA

OTHER NEWSLETTERS

SRE Weekly (@SREWeekly) Issue #73

May 19 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content, please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo).

SRE Items of the Week


Kargo Ansible Playbooks foster Collaborative Kubernetes Ops
http://blog.kubernetes.io/2017/05/kargo-ansible-collaborative-kubernetes-ops.html


Making Kubernetes operationally strong is a widely held priority and I track many deployment efforts around the project. The incubated Kargo project is of particular interest for me because it uses the popular Ansible toolset to build robust, upgradable clusters on both cloud and physical targets. I believe using tools familiar to operators grows our community.

We’re excited to see the breadth of platforms enabled by Kargo and how well it handles a wide range of options like integrating Ceph for StatefulSet persistence and Helm for easier application uploads. Those additions have allowed us to fully integrate the OpenStack Helm charts (demo video). READ MORE
___________

Cybercrime for Profit? Five reasons why we need to start driving much more dynamic IT Operations
https://rackn.com/2017/05/16/cybercrime-for-profit-five-reasons-why-we-need-to-starting-driving-much-more-dynamic-it-operations/

There’s a frustrating cyberattack-driven security awareness cycle in IT Operations. Exploits and vulnerabilities are neither new nor unexpected; however, there is a new element taking shape that should raise additional alarm.

Cyberattacks are increasingly profit generating and automated. READ MORE
_____________

Building the SRE Culture at LinkedIn
https://engineering.linkedin.com/blog/2017/05/building-the-sre-culture-at-linkedin

Being a Site Reliability Engineer (SRE) means having to talk about hard problems. Site outages, complex failure scenarios, and other technical emergencies are the things we have to be prepared to deal with every day. When we’re not dealing with problems, we’re discussing them. We regularly perform post-mortems and root cause analyses, and we generally dig into complex technical problems in an unflinching way. READ MORE
_____________
Virtual Panel: OpenStack Summit Boston 2017 Debriefing


_____________

SRE vs. DevOps — a False Distinction?
https://devops.com/sre-vs-devops-false-distinction/

Just a few days before he died at the beginning of the 1990s, a wise man taught us that “the show must go on.” Freddie Mercury’s parting words have long provided the guiding light for many, if not all, ops teams. In their eyes, the production environment should be exposed to minimum risk, even at the expense of new features and problem resolution.

About 10 years ago, Google decided to change its approach to production management. It took the company only a few years to realize that while R&D focused on creating new features and pushing them to production, the Operations group was trying to keep production as stable as possible—the two teams were pulling in opposite directions. This tension arose due to the groups’ different backgrounds, skill sets, incentives and metrics by which they were measured. READ MORE
_____________

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com.

Gluecon : May 24 – 25, 2017 in Denver, CO

  • Surviving Day 2 in Open Source Hybrid Automation – May 23, 2017 : Rob Hirschfeld and Greg Althaus

OTHER NEWSLETTERS

SRE Weekly (@SREWeekly) Issue #72

Cybercrime for Profit!? Five reasons why we need to start driving much more dynamic IT Operations

Author’s call to action: if you think you already know this is a problem, then why do we keep reliving it?  We’re doing our part in the open with Digital Rebar and we need more help to secure infrastructure using foundational automation.

There’s a frustrating cyberattack-driven security awareness cycle in IT Operations.  Exploits and vulnerabilities are neither new nor unexpected; however, there is a new element taking shape that should raise additional alarm.

Cyberattacks are increasingly profit generating and automated.

The fundamental fact of the latest attacks is that patches were available.  The extensive impact we are seeing is caused by IT Operations that relies on end-of-life components and cannot absorb incremental changes.  These practices are based on dangerous obsolete assumptions about perimeter defense and long delivery cycles.

It’s not just new products using CI/CD pipelines and dynamic delivery: we must retrofit all IT infrastructure to be constantly refreshed.

We simply cannot wait because the cybersecurity challenges are accelerating.  What’s changed in the industry?  There is a combination of factors driving these trends:

  1. Profit motive – attacks are not simply about getting information, they are profit centers made simpler with hard to trace cryptocurrency.
  2. Shortening windows – we’re doing better at finding, publishing and fixing issues in the open than ever.  That cycle assumes that downstream users are also applying the fixes quickly.  Without downstream adoption, the process fails to realize its key benefit.
  3. Automation and machine learning – the attackers are using more and more sophisticated automation to find and exploit vulnerabilities.  Expect them to use machine learning to make it even more effective.
  4. No perimeter – our highly interconnected and mobile IT environments eliminate the illusion of a perimeter defense.  This is not just a networking statement: our code bases and service catalogs are built from many outside sources that often have deep access.
  5. Expanding surface area – finally, we’re embedding and connecting more devices into our infrastructure every second.  Costs are decreasing while capability increases.  There’s no turning back from that, so we should expect an ongoing list of vulnerabilities.

No company has all the answers for cybersecurity; however, it’s clear that we cannot solve cybersecurity at the perimeter while allowing the interior to remain static.

The only workable IT posture starts with a continuously deployed and updated foundation.

Companies typically skip this work because it’s very difficult to automate in a cross-infrastructure and reliable way.  I’ve been working in this space for nearly two decades and we’re only now delivering deep automation that can be applied in generalized ways as part of larger processes.  The good news is that we can finally start discussing real shared industry best practices.

Thankfully, with shared practices and tooling, we can get ahead of the attackers.

RackN focuses exclusively on addressing infrastructure automation in an open way.  We are solving this problem from the data center foundations upward.  That allows us to establish a security practice that is both completely trusted and constantly refreshed.  It’s definitely not the only thing companies need to do, but that foundation and posture help drive a better defense.

I don’t pretend to have complete answers to the cyberattacks we are seeing, but I hope they inspire us to more security discipline.  We are on the cusp of a new wave of automated and fast exploits.

Let us know if you are interested in working with RackN to build a more dynamic infrastructure.

If Private Cloud is dead. Where did it go? How did it get there? [JOINT POST]

TL;DR: Hybrid killed IT.

I’m a regular participant on BWG Roundtable calls and often extend those discussions one-on-one.  This post collects questions from one of those follow-up meetings where we explored how data center markets are changing based on new capacity and also the impact of cloud.

We both believe in the simple answer, “it’s going to be hybrid.” We both feel that this answer does not capture the real challenges that customers are facing.

So who are we?  Haynes Strader, Jr. comes at this from a real estate perspective via CBRE Data Center Solutions.  Rob Hirschfeld comes at this from an ops and automation perspective via RackN.  We are in very different aspects of the data center market.

Rob: I know that we’re building a lot of data center capacity.  So far, it’s been really hard to move operations to new infrastructure and mobility is a challenge.  Do you see this too?

Haynes: Yes.  Creating a data center network that is both efficient and affordable is challenging. A couple of key data center interconnection providers offer this model, but few companies are in a position to truly leverage the node-cloud-node model, where a company leverages many small data center locations (colo) that all connect to a cloud option for the bulk of their computing requirements. This works well for smaller companies with a spread-out workforce, or brand new companies with no legacy infrastructure, but the Fortune 2000 still have the majority of their compute sitting in-house in owned facilities that weren’t originally designed to serve as data centers. Moving these legacy systems is nearly impossible.

Rob: I see many companies feeling trapped by these facilities and looking to the cloud as an alternative.  You are describing a lot of inertia in that migration.  Is there something that can help improve mobility?

Haynes: Data centers are physical presences to hold virtual environments. The physical aspect can only be optimized when a company truly understands its virtual footprint. IT capacity planning is key to this. System monitoring and usage analytics are critical to make growth and consolidation decisions. Why isn’t this being adopted more quickly? Is it cost? Is it difficulty to implement in complex IT environments? Is it the fear of the unknown?

Rob: I think that it’s technical debt that makes it hard (and scary) to change.  These systems were built manually or assuming that IT could maintain complete control.  That’s really not how cloud-focused operations work.  Is there a middle step between full cloud and legacy?

Haynes: Creating an environment where a company maximizes the use of its owned assets (leveraging sale leasebacks and forward-thinking financing), rather than waiting until end of life and attempting to dispose of them, leads to opportunities to get capital injections early on and move to an OPEX model. This makes the transition to colo much easier, and avoids a large write-down that comes along with most IT transformations. Colocation is an excellent tool if it is properly negotiated because it can provide a flexible environment that can grow or shrink based on your utilization of other services. Sophisticated colo users know when it makes sense to pay top dollar for an environment that requires hyperconnectivity and when to save money for storage and day-to-day compute. They know when to leverage providers for services and when to manage IT tasks in-house. It is a daunting process, but the initial approach is key to getting to that place in the long term.

Rob:  So I’m back to thinking that the challenge for accessing all these colo opportunities is that it’s still way too hard to move operations between facilities and also between facilities and the cloud.  Until we improve mobility, choosing a provider can be a high stakes decision.  What factors do you recommend reviewing?

Haynes: There is an overwhelming number of factors in picking new colos:

  1. Location
  2. Connectivity/Latency
  3. Cloud Connectivity Options
  4. Pricing
  5. Quality of Services
  6. Security
  7. Hazard Risk Mitigation
  8. Comfort with services/provider
  9. Growth potential
  10. Flexibility of spend/portability (this is becoming ever-more important)

Rob: Yikes!  Are there minor operational differences between colos that are causing breaking changes in operations?

Haynes: We run into this with our clients occasionally, but it is usually because they created two very different environments with different providers. This is a big reason to use a broker. Creating identical terms, pricing models, SLAs and workflows gives clients a lot of leverage when they go to market. A select few of the top cloud providers do a really good job of this. They dominate the markets that they enter because they have a consistent, reliable process that is replicated globally. They also achieve some of the most attractive pricing and terms in the marketplace on a regular basis.

Rob: That makes sense.  Process matters for the operators and consistent practices make it easier to work with a partner.  Even so, moving can save a lot of money.  Is that savings justified against the risk and interruption?

Haynes: This is the biggest hurdle that our enterprise clients face. The risk of moving is risking an IT leader’s job. How do we do this with minimal risk and maximum upside? Long-term strategic planning is one answer, but in today’s world, IT leadership changes often and strategies go along with that. We don’t have a silver bullet for this one – but are always looking to partner with IT leaders that want to give it a shot and hopefully save a lot of money.

Rob: So is migration practical?

Haynes: Migration makes our clients cringe, but the ones that really try to take it on and make it happen strategically (not once it is too late) regularly reap the benefits of saving their company money and making them heroes to the organization.

Rob: I guess that brings us back to mixing infrastructures.  I know that public clouds have interconnect with colos that make it possible to avoid picking a single vendor.  Are you seeing this too?

Haynes: Hybrid, hybrid, hybrid. No one is the best one-stop shop. We all love 7-11 and it provides a lot of great solutions on the run, but I’m not grocery shopping there. Same reason I don’t run into a Kroger every time I need a bottle of water. Pick the right solution for the right application and workload.

Rob: That makes sense to me, but I see something different in practice.  Teams are too busy keeping the lights on to take advantage of longer-term thinking.  They seem so busy fighting fires that it’s hard to improve.

Haynes: I TOTALLY agree. I don’t know how to change this. I get it, though. The CEO says, “We need to be in the cloud, yesterday,” and the CIO jumps. Suddenly everyone’s strategic planning is out the window and it is off to the races to find a quick fix. Like most things, time and planning often reap more productive results.

Thanks for sharing our discussion!  

We’d love to hear your opinions about it.  We both agree that creating multi-site management abstractions could make life easier on IT and relatable to real estate and finance. With all of these organizations working in sync the world would be a better place. The challenge is figuring out how to get there!

May 12 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content, please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo).

SRE Items of the Week


OpenStack on Kubernetes: Will it blend? (OpenStack Summit Session) w/ Rob Hirschfeld

OpenStack and Kubernetes: Combining the Best of Both Worlds (OpenStack Summit Session) w/ Rob Hirschfeld

OpenStack Summit Boston Day 1 Notes by Rob Hirschfeld
https://robhirschfeld.com/2017/05/09/openstack-boston-day-1-notes/

Contrary to pundit expectations, OpenStack did not roll over and die during the keynotes yesterday.

In fact, I saw the signs of a maturing project seeing real use and adoption. More critically, OpenStack leadership started the event with an acknowledgement of being part of, not owning, the vibrant open infrastructure community. READ MORE

_______
Immutable Infrastructure Webinar

Attendees:

  • Greg Althaus, Co-Founder and CTO, RackN
  • Erica Windisch, Founder and CEO, Piston
  • Christopher MacGown, Advisor, IOpipe
  • Riyaz Faizullabhoy, Security Engineer, Docker
  • Sheng Liang, Founder and CEO, Rancher Labs
  • Moderated by Stephen Spector, HPE, Cloud Evangelist

_______
SREies Part 1: Configuration Management by Krishelle Hardson-Hurley

SREies is a series on topics related to my job as a Site Reliability Engineer (SRE). About a month ago, I wrote an article about what it means to be an SRE which included a compatibility quiz and resource list to those who were intrigued by the role. If you are unfamiliar with SRE, I would suggest starting there before moving on.

In this series, I will extend my description to include more specific summaries of concepts that I have learned during my first six months at Dropbox. In this edition, I will be discussing Configuration Management. READ MORE

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com.

Interop ITX : May 15 – 19, 2017 in Las Vegas, NV

Gluecon : May 24 – 25, 2017 in Denver, CO

  • Surviving Day 2 in Open Source Hybrid Automation – May 23, 2017 : Rob Hirschfeld and Greg Althaus

OTHER NEWSLETTERS

SRE Weekly (@SREWeekly) Issue #71

OpenStack Boston Day 1 Notes

Contrary to pundit expectations, OpenStack did not roll over and die during the keynotes yesterday.


In my 2011 Boston Summit shirt.

In fact, I saw the signs of a maturing project seeing real use and adoption. More critically, OpenStack leadership started the event with an acknowledgement of being part of, not owning, the vibrant open infrastructure community.

Continued Growth in Core Areas

Practical reasons for running dedicated infrastructure (compliance, control and cost) make OpenStack relevant for companies and governments with significant budgets. There is also a healthy shared infrastructure (aka public cloud) market living in the shadow of the big 3 players. It’s still unclear how this ecosystem will make money for the vendors.

What do customers buy? Should the Core be free?

My personal experience is that most customers are reluctant to (but grudgingly do) buy distros for the core open technology. They are much more willing to pay for adjacencies like security, storage and networking.

Emerging Challenges from Adjacent Technologies

Containers and Kubernetes are making a significant impact on the OpenStack community. At points, the OpenStack keynote was more about Kubernetes than OpenStack. It’s also clear that customers want to use containers as an abstraction layer to make infrastructure less visible or locked-in. That opens the market for using servers directly (bare metal) or other clouds. That portability is likely to help OpenStack more than hurt it because customers can exit workloads from the Big 3 players.

Friction for adoption remains a critical hurdle.

Containers, which are cloud first platforms, have much less friction than IaaS platforms. IaaS platforms, even managed ones, require physical infrastructure with the matching complexity and investment.

OpenStack: an open infrastructure software community

Overall, the summit remains an amazing community space for open infrastructure software and cloud alternatives to the Big 3 players. The Foundation’s pivot to embrace Kubernetes and foster several other open technologies helps maintain the central enthusiasm for open source infrastructure that gave birth to the platform in the first place.

A healthy pragmatic vibe

The summit may not have the same heady taking-on-the-world feeling as the early days; instead, it has a healthy pragmatic vibe. Considering how frothy this space remains, that may be a welcome relief.

What are your impressions? I’m looking forward to hearing from you!

May 5 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content, please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo).

SRE Items of the Week

RackN Announcement
[PRESS RELEASE] RackN Ends DevOps Gridlock in Data Center  

Today we announced the availability of Digital Rebar Provision, the industry’s first cloud-native physical provisioning utility.  We’ve had this in the Digital Rebar community for a few weeks before offering support and response has been great! READ MORE
_______

Cloud Native PHYSICAL PROVISIONING? Come on! Really?!
 By Rob Hirschfeld

Today, RackN announced very low entry-level support for Digital Rebar Provision – the RESTful Cobbler PXE/DHCP replacement.  Having a company actually standing behind this core data center function with support is a big deal; however…

We’re making two BIG claims with Provision: breaking DevOps bottlenecks and cloud native physical provisioning.  We think both points are critical to SRE and Ops success because our current approaches are not keeping pace with developer productivity and hardware complexity. READ MORE

RackN @ DevOpsDays Austin


Slides from Rob Hirschfeld’s talk – The Server Cage Match

SRE vs DevOps vs Cloud Native: The Server Cage Match by Rob Hirschfeld

I don’t believe in DevOps shaming. Our community seems compelled to correct use of DevOps as an adjective for tools, teams and teapots. The frustration is reasonable: DevOps clearly taps into head space for both devs and operators who see a brighter automated future together. For example, check out this excellent DevOps discourse by Cindy Sridharan.

As an industry, we crave artificial conflict so it’s natural to try and distill site reliability engineering (SRE), DevOps and cloud native into warring factions when they are not. They all share a focus on Lean process. READ MORE

SRE News

What is DevOps? By Cindy Sridharan @copyconstruct  
https://medium.com/@cindysridharan/what-is-devops-5b0181fdb953

It happened again this week.

At this Wednesday’s Prometheus meetup I was hosting, I asked one of the attendees what he did for work.

He looked at me briefly before he barked one word in reply — DevOps — and then promptly made a beeline for the pizza at the back of the room. READ MORE
________

An Influx of Kubernetes Installers Raises Questions Around Conformance
https://thenewstack.io/kubernetes-installer-explosion-natural-enthusiasm/

For KubeCon Europe last month, industry observer Joseph Jacks pulled together a list of over SIXTY (yes, 60) Kubernetes installers and services. This wealth of variation, which made itself known at the conference, happily kicked off a conformance effort to ensure that users get a consistent experience. I’m a strong believer that clear conformance builds ecosystems and have deep experience working on that from my OpenStack DefCore efforts.

In short, conformance is not a vendor issue: it’s a user experience and ecosystem issue.  READ MORE

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com.

OpenStack Summit : May 8 – 11, 2017 in Boston, MA  

  • OpenStack and Kubernetes. Combining the best of both worlds – Kubernetes Day

Interop ITX : May 15 – 19, 2017 in Las Vegas, NV during Open Source IT Summit – Tuesday, May 16, 9:00 – 5:00pm

  • 3:15 – 4:05pm OpenStack and Kubernetes
  • 4:05 – 5:00pm Kubernetes for All

Gluecon : May 24 – 25, 2017 in Denver, CO

  • Surviving Day 2 in Open Source Hybrid Automation – May 23, 2017 : Rob Hirschfeld and Greg Althaus

OTHER NEWSLETTERS

SRE Weekly (@SREWeekly) Issue #70