Building Kubernetes based highly customizable environments on OpenStack with Kubespray

This talk was given on November 8 at the OpenStack Summit Sydney event.

Abstract

Kubespray (formerly Kargo) – is a project under Kubernetes community umbrella. From the technical side, it is a set of tools, that bring the possibility to deploy production-ready Kubernetes cluster easily.

Kubespray supports multiple Linux distributions to host the Kubernetes clusters (including Ubuntu, Debian, CentOS/RHEL and Container Linux by CoreOS), multiple cloud providers to be used as an underlay for the cluster deployment (AWS, DigitalOcean, GCE, Azure and OpenStack), together with the ability to use Bare Metal installations. It may consume Docker and rkt as the container runtimes for the containerized workloads, as well as a wide variety of networking plugins (Flannel, Weave, Calico and Canal); or built-in cloud provider networking instead.

In this talk we will describe the options of using Kubespray for building Kubernetes environments on OpenStack and how can you benefit from it.

What can I expect to learn?

Active Kubernetes community members, Ihor Dvoretskyi and Rob Hirschfeld, will highlight the benefits of running Kubernetes on top of OpenStack, and will describe how Kubespray may simplify the cluster building and management options for these use-cases.

Complete presentation

Slides
https://www.slideshare.net/RackN/slideshelf

Speakers

Ihor Dvoretskyi

Ihor is a Developer Advocate at Cloud Native Computing Foundation (CNCF), focused on the upstream Kubernetes-related efforts. He acts as a Product Manager at Kubernetes community, leading Product Management Special Interest Group with the goals of growing Kubernetes as a #1 open source container orchestration platform.

Rob Hirschfeld

Rob Hirschfeld has been involved in OpenStack since the earliest days with a focus on ops and building the infrastructure that powers cloud and storage.  He’s also co-Chair of the Kubernetes Cluster Ops SIG and a four term OpenStack board member.

 

Breaking the Silicon Floor – Digital Rebar v3.2 unlocks full life-cycle control for hardware provisioning

The difficulty in fully automating physical infrastructure environments, especially for distributed edge, adds significant cost, complexity and delay when building IT infrastructure. We’ve called this “underlay” or “ready state” in the past but “last mile” may be just as apt. The simple fact is that underlay is the foundation for everything you build above it so mistakes there are amplified.

Historically, simple systems still required manual or custom steps while complex systems where fragile and hard to learn. This dichotomy drives operators to add a cloud abstraction layer as a compromise because the cloud adds simple provisioning APIs at the prices of hidden operational complexity.

What if we had those simple APIs directly against the metal? Without the operational complexity?

That’s exactly what we’ve achieved in the latest Digital Rebar release. In this release, the RackN team refined the Digital Rebar control flows introduced in v3.1 based on customer and field experience. These flow are simple to understand, composable to build and amazingly fast in execution.

For example, you can build workflows that handle discovering machines with burn-in and inventory stages that install ssh keys that automatically register themselves for Terraform consumption. Our Terraform provider can then take those machines and make new workflow requests like “install CentOS” and tell me when it’s ready. When you’re finished, another workflow will teardown the system and scrub the data. That’s very cloud like behavior but directly on metal.

These workflows are designed to drive automatic behavior (like joining a Kubernetes cluster), simplify API requests (like target state for Terraform), or prepare environments for orchestration (like dynamic inventory for Ansible). They reflect our design goal to ensure that Digital Rebar integrates upstack easily.

Our point with Digital Rebar is to drive full automation down into the physical layer. By fixing the underlay, our approach accelerates and simplifies orchestration and platform layers above. We’re excited about the progress and invite you take 5 minutes to try our quick start.

Follow the Digital Rebar Community:

Deep Thinking & Tech + Great Guests – L8ist Sh9y Podcast

I love great conversations about technology – especially ones where the answer is not very neatly settled into winners and losers (which is ALL of them in IT).  I’m excited that RackN has (re)launched the L8ist Sh9y (aka Latest Shiny) podcast around this exact theme.

Please check out the deep and thoughtful discussion I just had with Mark Thiele (notes) of Apcera where we covered Mark’s thought on why public cloud will be under 20% of IT and culture issues head on.

Spoiler: we have David Linthicum coming next, SO SUBSCRIBE.

I’ve been a guest on some great podcasts (CloudcastgcOnDemandDatanautsIBM DojoHPEFoodfight) and have deep respect for critical work they do in industry.

We feel there’s still room for deep discussions specifically around automated IT Operations in cloud, data center and edge; consequently, we’re branching out to start including deep interviews in addition to our initial stable of IT Ops deep technical topics like TerraformEdge ComputingGartnerSYM review, Kubernetes and, of course, our own Digital Rebar.

Soundcloud Subscription Information

Digital Rebar v3.1 Release Annoucement

We’ve made open network provisioning radically simpler.  So simple, you can install in 5 minutes and be provisioning in under 30.  That’s a bold claim, but it’s also an essential deliverable for us to bridge the Ops execution gap in a way that does not disrupt your existing tool chains.

We’ve got a remarkable list of feature additions between Digital Rebar Provision (DRP) v3.0 and v3.1 that take it from basic provision into a powerful distributed infrastructure automation tool.

But first, we need to put v3.1 into a broader perspective: the new features are built from hard learned DevOps lessons.  The v2 combination of integrated provisioning and orchestration meant we needed a lot of overhead like Docker, Compose, PostgreSQL, Consul and RAILS.  That was needed for complex “one-click” cluster builds; however it’s overkill for users of Ansible, Terraform and immutable infrastructure flows.  

The v3 mantra is about starting simple and allowing users to grow automation incrementally.  RackN has been building advanced automation packages and powerful UX management to support that mission.

So what’s in the release?  The v3.0 release focused on getting core Provision infrastructure APIs, process and patterns working as a stand alone service. The v3.1 release targeted major architectural needs to streamline content management, event notification and add out-of-band actions.  

Key v3.1 Features

  • New Mascot and Logo!  We have a cloud native bare metal bear.  DRP fans should ask about stickers and t-shirts. Name coming soon! 
  • Layered Storage System. DRP storage model allows for layered storage tiers to support the content model and a read only base layer. These features allow operators to distribute content in a number of different ways and make field upgrades and multi-site synchronization possible.
  • Content packaging system.  DRP contents API allows operators to manage packages of other models via a single API call.  Content bundles are read-only and versioned so that field upgrades and patches can be distributed.
  • Plug-in system.  DRP allows API extensions and event listeners that are in the same process space as the DRP server.  This enables IPMI extensions and slack notifiers.
  • Stages, Tasks & Jobs.  DRP has a simple work queue system in which tasks are stored and tracked on machines during stages in their boot sequences.  This feature combines server and DRP client actions to create fast, simple and flexible workflows that don’t require agents or SSH access.
  • Websocket API for event subscription.  DRP clients can subscribe to system events using a long term websocket interface.  Subscriptions include filters so that operators can select very narrow notification scopes.
  • Removal of the minimal embedded UI (moving to community hosted UX).   DRP decoupled the user interface from the service API.  This allows features to be added to the UX without having to replace the Service.  This also allows community members to create their own UX.  RackN has agreed to support community users at no cost on a limited version of our commercial UX.

All of these features enable DRP to perform 100% of the hardware provision workflows that our customers need to run a fully autonomous, CI/CD enabled data center.  RackN has been showing examples of Ansible, Kubernetes, and Terraform to Metal integration as a reference implementations.

Getting the physical layer right is critical to closing your infrastructure execution gaps.  DRP v3.1 goes beyond getting it right – it makes it fast, simple and open.  Take a test drive of the open source code or give RackN a call to see our advanced automation demos.

Edge Infrastructure is Not Just Thousands of Mini Clouds

I left the OpenStack OpenDev Edge Infrastructure conference with a lot of concerns relating to how to manage geographically distributed infrastructure at scale.  We’ve been asking similar questions at RackN as we work to build composable automation that can be shared and reused.  The critical need is to dramatically reduce site-specific customization in a way that still accommodates required variation – this is something we’ve made surprising advances on in Digital Rebar v3.1.

These are very serious issues for companies like AT&T with 1000s of local exchanges, Walmart with 10,000s of in-store server farms or Verizon with 10,000s of coffee shop Wifi zones.  These workloads are not moving into centralized data centers.  In fact, with machine learning and IoT, we are expecting to see more and more distributed computing needs.

Running each site as a mini-cloud is clearly not the right answer.

While we do need the infrastructure to be easily API addressable, adding cloud without fixing the underlying infrastructure management moves us in the wrong direction.  For example, AT&T‘s initial 100+ OpenStack deployments were not field up-gradable and lead to their efforts to deploy OpenStack on Kubernetes; however, that may have simply moved the upgrade problem to a different platform because Kubernetes does not address the physical layer either!

There are multiple challenges here.  First, any scale infrastructure problem must be solved at the physical layer first.  Second, we must have tooling that brings repeatable, automation processes to that layer.  It’s not sufficient to have deep control of a single site: we must be able to reliably distribute automation over thousands of sites with limited operational support and bandwidth.  These requirements are outside the scope of cloud focused tools.

Containers and platforms like Kubernetes have a significant part to play in this story.  I was surprised that they were present only in a minor way at the summit.  The portability and light footprint of these platforms make them a natural fit for edge infrastructure.  I believe that lack of focus comes from the audience believing (incorrectly) that edge applications are not ready for container management.

With hardware layer control (which is required for edge), there is no need for a virtualization layer to provide infrastructure management.  In fact, “cloud” only adds complexity and cost for edge infrastructure when the workloads are containerized.  Our current cloud platforms are not designed to run in small environments and not designed to be managed in a repeatable way at thousands of data centers.  This is a deep architectural gap and not easily patched.

OpenStack sponsoring the edge infrastructure event got the right people in the room but also got in the way of discussing how we should be solving these operational.  How should we be solving them?  In the next post, we’ll talk about management models that we should be borrowing for the edge…

Read 1st Post of 3 from OpenStack OpenDev: OpenStack on Edge? 4 Ways Edge is Distinct from Cloud

September 8 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo)

SRE Items of the Week

Nora Jones on Establishing, Growing, and Maturing a Chaos Engineering Practice
https://www.infoq.com/podcasts/nora-jones-chaos-engineering

Nora Jones, a senior software engineer on Netflix’ Chaos Team, talks with Wesley Reisz about what Chaos Engineering means today. She covers what it takes to build a practice, how to establish a strategy, defines cost of impact, and covers key technical considerations when leveraging chaos engineering. Read more and listen to podcast

SRE Jobs

I ran a job search on LinkedIn to find the # of available SRE positions currently open; there are 854 positions available as of this morning. Dice.com listed 30,665 positions based on a search. In comparison, DevOps only had 2,975 positions on Dice.com.

Podcast on Ansible, Kubernetes, Kubespray and Digital Rebar

Stephen Spector, HPE Cloud Evangelist talks with Rob Hirschfeld, Co-Founder and CEO RackN about the installation process for Kubernetes using Kubespray, Ansible, and Digital Rebar Provisioning. Additional commentary on the overviews of Kubernetes, Containers, and Installation in this podcast.

_____________

Subscribe to our new daily DevOps, SRE, & Operations Newsletter https://paper.li/e-1498071701#/
_____________

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com.

OTHER NEWSLETTERS

Podcast – Install Kubernetes with Ansible, Kubespray and Digital Rebar Provision

Stephen Spector, HPE Cloud Evangelist talks with Rob Hirschfeld, Co-Founder and CEO RackN about the installation process for Kubernetes using Kubespray, Ansible, and Digital Rebar Provisioning. Additional commentary on the overviews of Kubernetes, Containers, and Installation in this podcast.

More info on Digital Rebar Provisioning

Follow the RackN L8ist Sh9y Podcast

 

August 18 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo)

SRE Items of the Week

Beyond Google SRE: What is Site Reliability Engineering like at Medium?
https://blog.netsil.com/beyond-google-sre-what-is-site-reliability-engineering-like-at-medium-71c65bd35f4e


We had the opportunity to sit down with Nathaniel Felsen, DevOps Engineer at Medium and the author of “Effective DevOps with AWS”. We are happy to share some practical insights from Nathaniel’s extensive experience as a seasoned DevOps and SRE practitioner.

While we hear a lot about these experiences from Google, Netflix, etc., we wanted to gather perspectives on DevOps and SRE life with other easily relatable companies. From tech-stack challenges to organization structure, Nathaniel provides a wide range of practical insights that we hope will be valuable in improving DevOps practices at your organization. READ MORE

GitHub seeks to spur innovation with Kubernetes migration
http://www.zdnet.com/article/github-seeks-to-spur-innovation-with-kubernetes-migration/

GitHub on Wednesday is sharing the details of the massive technical endeavor its engineers went through to migrate the infrastructure that powers github.com and api.github.com — some of its most critical workloads — from a set of manually-configured physical servers to Kubernetes clusters that run application containers.

GitHub is confident the move will allow for faster innovation on the online code sharing and development platform. READ MORE

SRE Thinking: Reframing Dev + Ops
http://bit.ly/2w2I53F

Last month, Eric Wright and I were able to complete a discussion the inspired my guest post for CapitalOne “How Platforms and SREs Change the DevOps Contract.” While our conversation ranged widely over the challenges of building and integration of IT processes, the key message is simple: we need to make investments in operations. READ MORE

Coal or Diamonds? Configuration Management is Under Pressure
http://bit.ly/2uTvADN

Cloud Native thinking is thankfully changing the way we approach traditional IT infrastructure.  These profound changes in how we build applications with 12-factor design and containers has deep implications on how we manage configuration and the tools we use to do it.  These are not cloud only impacts – the changes impact every corner of IT data centers. READ MORE

Subscribe to our new daily DevOps, SRE, & Operations Newsletter https://paper.li/e-1498071701#/

_____________

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com.

OTHER NEWSLETTERS

July 14 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo)

SRE Items of the Week

Teradata Acquires San Diego-based Start-up StackIQ to Strengthen Teradata Everywhere and IntelliCloud Capabilities
http://prn.to/2vicpUb

SAN DIEGO, July 13, 2017 /PRNewswire/ — Teradata (NYSE:  TDC), the leading data and analytics company, today announced the acquisition of StackIQ, developers of one of the industry’s fastest bare metal software provisioning platforms which has managed the deployment of cloud and analytics software at millions of servers in data centers around the globe. The deal will leverage StackIQ’s expertise in open source software and large cluster provisioning to simplify and automate the deployment of Teradata Everywhere. Offering customers the speed and flexibility to deploy Teradata solutions across hybrid cloud environments, allows them to innovate quickly and build new analytical applications for their business.

How Platforms and SREs Change the DevOps Contract on  CapitalOne DevExchange
http://bit.ly/2uVXekf

capitalone
DevOps struggles under a “fully shared responsibility” contract for Developers and Operations that drives a futile search for elusive “full-stack engineers.” It’s time to revisit how to Dev and Ops are going to collaborate because these jobs often have different priorities.
READ MORE

RackN Introduction Video
Rob Hirschfeld, CEO and Co-Founder introduces RackN in 48 seconds

Kubernauts Worldwide Meetup
This video is from our first Kubernauts Worldwide Meetup covering the new features in Kubernetes 1.7 presented by Ihor Dvoretskyi, Kubernetes Pain Points and Upgrade presented by Rob Hirschfeld and about Kubernauts Training presented by Des Drury. Arash Kaffamanesh moderated the online meetup and provided a short overview about what Kubernauts are about.

Rob starts at 38 minute 50 seconds

Video Series w/ Packet.net
Three videos showing how to use Packet.net custom IPXE option with Digital Rebar IPXE provisioning

http://bit.ly/2t54J65      (Video 1 of 3)
http://bit.ly/2tO5WCy   (Video 2 of 3)
http://bit.ly/2vi5dXZ     (Video 3 of 3)

Let’s DevOps IRL: My SRE Postings on RackN by Rob Hirschfeld
http://bit.ly/2tzCvnj  

I’m investing in these Site Reliability Engineering (SRE) discussions because I believe operations (and by extension DevOps) is facing a significant challenge in keeping up with development tooling.   The links below have been getting a lot of interest on twitter and driving some good discussion. READ MORE

newsletter

Subscribe to our new daily DevOps, SRE, & Operations Newsletter https://paper.li/e-1498071701#/
_____________

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com.

OTHER NEWSLETTERS

 

June 30 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Welcome to the weekly post of the RackN blog recap of all things SRE. If you have any ideas for this recap or would like to include content please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rack ngo)

SRE Items of the Week

Site Reliability Engineering at Dropbox with Tammy Butow @tammybutow

The mess and success of building open leadership (notes from Kubernetes Leadership Summit)
http://bit.ly/2tMTzEy

Three weeks ago, Kubernetes leaders met for a very busy day to reflect and plan how the community was being growing.  I was humbled to be part of the Kubernetes Leadership Summit due to my work as the Cluster Ops SIG co-chair. READ MORE

Ops integration will be scary, proceed with haste
http://bit.ly/2u2Wfhq

As CEO of RackN, I talk to a lot of operations teams who have big aspirations for automation that are faltering due to internal resistance.  Generally, we’re talking to the SREs on the team.  Sadly, those SREs are often stymied by narrowly scoped teams and house-of-cards technical debt. READ MORE

The Case for Ops Engineering Pay Equity with Charity Majors
http://bit.ly/2tZBjYD

Charity Majors is one of my DevOps and SRE heroes* so it was great fun to be able to debate SRE with her at Gluecon this spring.  Encouraged by Mike Maney to retell the story, we got to recapture our disagreement about “Is SRE is Good Term?” from the evening before. READ MORE

Datanauts #89 Dives Deep on SRE Approach and Urgency
http://bit.ly/2tqmbGl

In Datanauts 089, Chris Wahl and Ethan Banks help me break down the concepts from my “DevOps vs SRE vs Cloud Native” presentation from DevOpsDays Austin last spring. They do a great job exploring the tough topics and concepts from the presentation.  It’s almost like an extended Q&A so you may want to review the slides or recording before diving into the podcast.

Here are my notes from the podcast READ MORE

5 Laws every aspiring Devops engineer should know by @ChrisShort
https://opensource.com/open-organization/17/5/5-devops-laws

“A good engineer is a lazy engineer,” some will say. And to a certain extent, it’s true: Laziness is a great quality if you’re automating repetitive tasks. But laziness flies in the face of learning new technologies and getting new work done. Somewhere between Junior Systems Administrator and Senior DevOps Engineer, laziness no longer becomes an advantage.

Let’s discuss the five laws aspiring DevOps engineers should follow if they want to become great DevOps engineers. READ MORE
___________

newsletter
Subscribe to our new daily DevOps, SRE, & Operations Newsletter https://paper.li/e-1498071701#/
____________

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com.

  • 2017 New York Venture Summit – LINK

OTHER NEWSLETTERS