Open Source, Operators, and DevOps Come Together for Data Center Automation

Running data centers is a complex challenge because the typical environment has multiple hardware platforms, operating systems, and processes to manage. Operators face daily “fire drills” to keep the machines running while simultaneously trying to expand service offerings and learn new technologies. The adoption of virtualization and cloud has not simplified anything for IT teams; it has only made their jobs more complicated.

Our founders have years of experience deploying and operating large, complex data center environments and clouds. They are also well versed in the open source community and see the merging of community and operations as a better way forward for data center management.

We are building an operators community that shares best practices and reusable code across sites to fully automate data centers. By working together, operators can not only solve operational challenges for their own infrastructure but also find common patterns to leverage across a broad set of architectures.

Community is a powerful force in the software industry, and there is no reason why those concepts cannot be leveraged by operators and DevOps teams to completely change the ROI of running a data center. RackN was founded on the belief that, by working together, we can transform data center management through automation and physical ops.

Join us today to help build the future of data center automation and provisioning technology.

RackN talks Cloud Native Landscape on Rishidot.TV

Rob Hirschfeld speaks on Rishidot.TV as part of the Cloud Native Landscape video interview series. Questions asked:

  • Background on RackN
  • Cloud Native Ecosystem Fit – embracing DevOps and Site Reliability Engineering
    • Running “Cloud” in their existing data centers
  • Differentiation – Built on open source Digital Rebar, replacing Cobbler, MAAS, and other provisioning tools
    • API driven, Infrastructure as Code feel
  • Use Cases – Immutable Infrastructure & API driven design
    • Image-based Deployments direct to Metal
    • CI/CD infrastructure, zero-touch automation

 

Migration Best Practices from Cobbler to Digital Rebar Provision

In this video, Rob Hirschfeld and Greg Althaus give operators real-world examples of how best to migrate their provisioning platform to Digital Rebar Provision. This post highlights one of these migration ideas.

Scenario

  • 10 servers running across multiple subnets
  • DHCP Server
  • Cobbler Provisioning Tool

Migration Process

  • Set up Digital Rebar Provision (DRP) in the network
    • Create a new subnet with a DHCP server installed
    • Operate the DHCP server in reservation mode
  • Run DRP to discover the entire network across subnets without DHCP access
    • Create a mapping of the infrastructure, including MAC address to IP address
  • Migrate control to DRP server by server
    • Turn off the old DHCP server’s control for a specific MAC address and turn it on for the new DHCP server
    • Reboot the node with that MAC address; DRP will then manage provisioning for that server
    • Confirm the server reset correctly and continue the changeover server by server
  • Other Options
    • Continue to manage Cobbler for existing infrastructure and use DRP for all new nodes
    • Split provisioning services based on the application being deployed
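The server-by-server cutover above can be sketched in Python to make the bookkeeping concrete. This is a minimal illustration under stated assumptions, not DRP’s API or the exact procedure from the video: `Node`, `DhcpServer`, `build_inventory`, `migration_order`, and `migrate_all` are hypothetical names, and a DHCP server in reservation mode is modeled as a dictionary that answers only the MAC addresses it holds a reservation for. The invariant it checks is that, after each step, exactly one server answers for the migrated MAC.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A discovered server (hypothetical inventory record)."""
    mac: str
    ip: str
    subnet: str


@dataclass
class DhcpServer:
    """Toy model of a DHCP server in reservation mode: it answers only
    MAC addresses that have a reservation (mac -> ip)."""
    name: str
    reservations: dict = field(default_factory=dict)

    def answers(self, mac: str) -> bool:
        return mac in self.reservations


def build_inventory(nodes):
    """Build the MAC-to-IP mapping produced by network discovery."""
    inventory = {}
    for node in nodes:
        if node.mac in inventory:
            raise ValueError(f"duplicate MAC address {node.mac}")
        inventory[node.mac] = node.ip
    return inventory


def migration_order(nodes):
    """Hypothetical ordering: one subnet at a time, then by MAC address."""
    return [n.mac for n in sorted(nodes, key=lambda n: (n.subnet, n.mac))]


def migrate_one(mac, inventory, old, new):
    """Hand a single MAC address from the old DHCP server to the new one."""
    old.reservations.pop(mac, None)         # old server stops answering this node
    new.reservations[mac] = inventory[mac]  # new server answers with the same IP
    # In a real migration, reboot this node here and confirm it was
    # provisioned correctly before moving on to the next one.
    if not (new.answers(mac) and not old.answers(mac)):
        raise RuntimeError(f"{mac} is not owned by exactly one DHCP server")


def migrate_all(order, inventory, old, new):
    """Migrate nodes one at a time in the given order; return the MACs moved."""
    done = []
    for mac in order:
        migrate_one(mac, inventory, old, new)
        done.append(mac)
    return done


if __name__ == "__main__":
    nodes = [
        Node(f"52:54:00:00:00:{i:02x}", f"10.0.{i % 2}.{10 + i}", f"10.0.{i % 2}.0/24")
        for i in range(10)
    ]
    inventory = build_inventory(nodes)
    legacy = DhcpServer("legacy", dict(inventory))
    drp = DhcpServer("drp")
    done = migrate_all(migration_order(nodes), inventory, legacy, drp)
    print(len(done), len(legacy.reservations), len(drp.reservations))  # 10 0 10
```

Because each step touches a single MAC address, a failed reboot affects only that one server, which is the point of migrating server by server rather than flipping the whole network at once.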

Watch the full video below to hear other migration scenarios.

Video Participants:

Rob Hirschfeld, Co-Founder / CEO, RackN (Twitter: @zehicle)
Greg Althaus, Co-Founder / CTO, RackN (Twitter: @galthaus)

Get started with Digital Rebar today:

Digital Rebar Provision: Community Content Demos

Shane Gibson, Community Evangelist, takes the viewer on a tour of the Digital Rebar Provision tool running all of the freely available open community content packages. The tour covers both the CLI and the Web UI, so viewers can follow along on whichever interface they are most comfortable with.

 

Video Activity / Start Time (Minutes.Seconds)
Introduction / Setup 0.43
Login to Existing Node 1.30
Install DR Provision from Tip 1.48
Start Server / Load Community ISOs 3.00
What Community Content is 4.00
“Contents Show” 4.23
What Bootenvs Are in a Content Pack? 6.12
Ubuntu Distribution Components 8.25
Templates in Ubuntu 11.22
Templates in Detail (CLI) 12.30
Templates in Detail (WebUI) 14.12
Clone a Read-Only Template (WebUI) 15.35
Content Page on WebUI 16.25
Root Access Keys (WebUI) 17.54
Root Access Keys (CLI) 18.50
Edit your own Content Pack 19.22
Set the Preferences to use Bootenvs 20.35
Adding Subnet 21.06
Packet Plugin (for Packet.net) 22.25
DRP Data Directory 23.50
TFTP Boot Directory 24.34
Swagger UI API 25.15

Additional Information:

Immutable Deployment Challenges for DevOps

Last week, Gareth Rushgrove (@garethr) and I hashed out our viewpoints on the intersection of DevOps and Immutable Infrastructure. We recorded the call because we want to expand the discussion to a broader audience, and we’d love to hear your opinions!

The gist of the call is that DevOps processes are moving faster and faster as teams embrace the create-destroy-repeat pattern of cloud automation. This pattern favors immutable images driven by cloud-init-style bootstrapping. That changes our configuration management practice because configuration is front-loaded. It also means that we destroy rather than patch.
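The “destroy rather than patch” idea can be illustrated with a small Python sketch. It is not tied to any particular cloud or tool and assumes a hypothetical `Instance` record (a frozen dataclass, so it cannot be modified after creation), a hypothetical `launch` helper that takes all configuration up front, and a `roll_out` function that replaces each instance with a new one built from the new image instead of changing the old one in place.

```python
from dataclasses import dataclass
from itertools import count

_next_id = count(1)  # simple global ID source for this sketch


@dataclass(frozen=True)
class Instance:
    """A running instance; frozen, so it cannot be changed after launch."""
    id: int
    image: str
    config: tuple  # configuration baked in at launch, as sorted (key, value) pairs


def launch(image, config):
    """Create a new instance with all configuration supplied up front."""
    return Instance(next(_next_id), image, tuple(sorted(config.items())))


def roll_out(fleet, image, config):
    """Replace each instance with a fresh one instead of patching it.

    Returns (new_fleet, destroyed_ids). Each replacement is launched
    before the old instance is recorded as destroyed.
    """
    new_fleet, destroyed = [], []
    for old in fleet:
        new_fleet.append(launch(image, config))
        destroyed.append(old.id)  # the old instance is discarded, never modified
    return new_fleet, destroyed


if __name__ == "__main__":
    fleet = [launch("web-v1", {"port": 8080}) for _ in range(3)]
    fleet, destroyed = roll_out(fleet, "web-v2", {"port": 8080})
    print([i.image for i in fleet], destroyed)
```

Making the record frozen is a deliberate choice here: any attempt to “patch” a running instance raises an error, so the only way to change the fleet is to build new instances.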

We both felt that this immutable pattern will become dominant over time.

However, there was significant nuance in our positions about this change and the challenges it will pose to operators. If you care about how immutable infrastructure is going to affect your DevOps plans, then you’ll enjoy listening to our short discussion.

If you’re still hungry for the hows and whys of immutable infrastructure, I suggest listening to the excellent panel discussion RackN hosted last May.

July 28 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Welcome to the RackN blog’s weekly recap of all things SRE. If you have ideas for this recap or would like to contribute content, please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo).

This week, we launched our new RackN website to provide more information on our solutions and services, along with customer examples. Click over to our new site and let us know your thoughts.

SRE Items of the Week

Site Reliability Engineer: Don’t fall victim to the bias blind spot
http://sdtimes.com/site-reliability-engineer-dont-fall-victim-to-the-bias-blind-spot/

To ensure websites and applications deliver consistently excellent speed and availability, some organizations are adopting Google’s Site Reliability Engineering (SRE) model. In this model, a Site Reliability Engineer (SRE) – usually someone with both development and IT Ops experience – institutes clear-cut metrics to determine when a website or application is production-ready from a user performance perspective. This helps reduce friction that often exists between the “dev” and “ops” sides of organizations. More specifically, metrics can eliminate the conflict between developers’ desire to “Ship it!” and operations’ desire to not be paged when they are on-call. If performance thresholds aren’t met, releases cannot move forward. READ MORE
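The gating rule in this excerpt (releases cannot move forward when performance thresholds are not met) can be sketched as a simple check. The metric names and limits below (`availability_pct`, `p95_latency_ms`, 99.9, 300) are illustrative assumptions, not values from the article, and treating a missing measurement as a failure is a design choice of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str
    limit: float
    higher_is_better: bool  # True for availability, False for latency


def release_gate(measured, thresholds):
    """Return (allowed, failures); any unmet or missing metric blocks release."""
    failures = []
    for t in thresholds:
        value = measured.get(t.metric)
        if value is None:
            failures.append(f"{t.metric}: not measured")
        elif t.higher_is_better and value < t.limit:
            failures.append(f"{t.metric}: {value} < {t.limit}")
        elif not t.higher_is_better and value > t.limit:
            failures.append(f"{t.metric}: {value} > {t.limit}")
    return not failures, failures


# Illustrative thresholds only; real values come from your own service targets.
THRESHOLDS = [
    Threshold("availability_pct", 99.9, higher_is_better=True),
    Threshold("p95_latency_ms", 300.0, higher_is_better=False),
]

if __name__ == "__main__":
    print(release_gate({"availability_pct": 99.95, "p95_latency_ms": 250.0}, THRESHOLDS))
    # (True, [])
    print(release_gate({"availability_pct": 99.5, "p95_latency_ms": 250.0}, THRESHOLDS))
    # (False, ['availability_pct: 99.5 < 99.9'])
```

Returning the list of failures alongside the yes/no answer gives both “dev” and “ops” the same concrete reason a release was held back.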

Episode 50 – SRE Revisited plus the Challenge of Ops and more with Rob Hirschfeld
http://podcast.discoposse.com/e/ep-50-sre-revisited-plus-the-challenges-of-ops-and-more-with-rob-hirschfeld-zehicle/

This fun chat expands on what we started talking about in episode 42 (http://podcast.discoposse.com/e/ep-42-spiraling-ops-debt-sre-solutions-and-rackn-chat-with-rob-hirschfeld-zehicle/) as we dive into the challenges and potential solutions for thinking and acting with the SRE approach. Big thanks to Rob Hirschfeld from @RackN for sharing his thoughts and experiences from the field on this very exciting subject. LISTEN HERE

Site Reliability Engineering – Operators and Developers Working Together
http://bit.ly/2u7eSmm 

Rob Hirschfeld, Co-Founder and CEO of RackN, shares his thoughts on how operators are the equals of developers and how the two work together on the critical task of keeping infrastructure running and available amid constant change in the data center.

Subscribe to our new daily DevOps, SRE, & Operations Newsletter https://paper.li/e-1498071701#/
_____________

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are either speaking or attending. If you would like to meet with them at these events, please email info@rackn.com.

OTHER NEWSLETTERS

July 14 – Weekly Recap of All Things Site Reliability Engineering (SRE)

Welcome to the RackN blog’s weekly recap of all things SRE. If you have ideas for this recap or would like to contribute content, please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo).

SRE Items of the Week

Teradata Acquires San Diego-based Start-up StackIQ to Strengthen Teradata Everywhere and IntelliCloud Capabilities
http://prn.to/2vicpUb

SAN DIEGO, July 13, 2017 /PRNewswire/ — Teradata (NYSE: TDC), the leading data and analytics company, today announced the acquisition of StackIQ, developers of one of the industry’s fastest bare metal software provisioning platforms which has managed the deployment of cloud and analytics software at millions of servers in data centers around the globe. The deal will leverage StackIQ’s expertise in open source software and large cluster provisioning to simplify and automate the deployment of Teradata Everywhere. Offering customers the speed and flexibility to deploy Teradata solutions across hybrid cloud environments allows them to innovate quickly and build new analytical applications for their business.

How Platforms and SREs Change the DevOps Contract on CapitalOne DevExchange
http://bit.ly/2uVXekf

DevOps struggles under a “fully shared responsibility” contract for Developers and Operations that drives a futile search for elusive “full-stack engineers.” It’s time to revisit how Dev and Ops are going to collaborate because these jobs often have different priorities.
READ MORE

RackN Introduction Video
Rob Hirschfeld, CEO and Co-Founder, introduces RackN in 48 seconds

Kubernauts Worldwide Meetup
This video is from our first Kubernauts Worldwide Meetup, covering the new features in Kubernetes 1.7 (presented by Ihor Dvoretskyi), Kubernetes pain points and upgrades (presented by Rob Hirschfeld), and Kubernauts training (presented by Des Drury). Arash Kaffamanesh moderated the online meetup and gave a short overview of what Kubernauts is about.

Rob’s segment starts at 38 minutes, 50 seconds.

Video Series w/ Packet.net
Three videos showing how to use the Packet.net custom iPXE option with Digital Rebar iPXE provisioning

http://bit.ly/2t54J65 (Video 1 of 3)
http://bit.ly/2tO5WCy (Video 2 of 3)
http://bit.ly/2vi5dXZ (Video 3 of 3)

Let’s DevOps IRL: My SRE Postings on RackN by Rob Hirschfeld
http://bit.ly/2tzCvnj  

I’m investing in these Site Reliability Engineering (SRE) discussions because I believe operations (and by extension DevOps) is facing a significant challenge in keeping up with development tooling. The links below have been getting a lot of interest on Twitter and driving some good discussion. READ MORE


The Case for Ops Engineering Pay Equity w/ Charity Majors

TL;DR: Operators need pay/status equity to succeed.

Charity Majors is one of my DevOps and SRE heroes*, so it was great fun to be able to debate SRE with her at Gluecon this spring. Encouraged by Mike Maney to retell the story, we recaptured our disagreement from the evening before about “Is SRE a Good Term?”

While it’s hard to fully recapture a debate that started over adult beverages, we were able to recreate the key points.

First, we both strongly agree that we need status and pay equity for operators. That part of the SRE message is essential regardless of the name of the department.

Then it gets more nuanced. Charity, who’s more of a Silicon Valley insider, believes that SRE is tainted by the “Google for Everyone” cargo cult. She has trouble separating the term SRE from the specific Google practices that helped define it.

As someone who simply commutes to Silicon Valley, I do not see that bias in the discussions I’ve been having. I do agree that trying to simply copy Google (or other unicorns) in every way is a failure pattern.

Charity: “I don’t want to get paid to keep someone else’s shit site alive”

I think Google did a good job with the book by defining the term for a broad audience. Charity believes the term signals that you are working for a big org. Charity suggested several better alternatives, such as Operations Engineer. In the end, the danger seems to be when Dev and Ops create silos instead of collaborating.

Consensus: Job title? Who cares. The need is to make operations more respected and equal.

What did you think of the video?  How is your team defining Operations titles and teams?

(*) yes, I’m working on an actual list – stay tuned.

OpenCrowbar 2.3 (Drill) Overview Videos

Last week, Scott Jensen, RackN COO, uploaded a batch of OpenCrowbar install and demo videos. I’ve presented them in reverse chronological order so you can see what OpenCrowbar looks like before you run the installation process.

But if you want to start downloading while you watch, here are the docs.

Please reach out on chat, email, or IRC (Freenode #crowbar) during your install and let us know how it’s going!

OpenCrowbar Basics & Provisioning (recommended start)

OpenCrowbar Install

OpenCrowbar Setup the Environment (install prep)

@NextCast chat about DefCore, Metal Ops and OpenStack evolution

In Vancouver, I sat down with Scott Sanchez (EMC) and Jeff Dickey (Redapt) for a NextCast discussion. We covered a lot of my favorite subjects, including DefCore and Ready State bare metal operations.

One of the things I liked about this discussion was that we were able to pull together the seemingly disparate threads that I’m working on around OpenStack.