Migration Best Practices from Cobbler to Digital Rebar Provision

In this video, Rob Hirschfeld and Greg Althaus provide operators with real-world examples of how best to migrate their provisioning platform to Digital Rebar Provision. This post highlights one of those migration scenarios.

Scenario

  • 10 Servers running in multiple subnets
  • DHCP Server
  • Cobbler Provisioning Tool

Migration Process

  • Set up Digital Rebar Provision (DRP) in the network
    • Create a new subnet with the DHCP server installed
    • Operate the DHCP server in reservation mode
  • Run DRP to discover the entire network across subnets without DHCP access
    • Create a mapping of the infrastructure, including MAC address to IP address
  • Migrate control to DRP server by server (see the sketch after this list)
    • Turn off the old DHCP server's control of a specific MAC address and enable it on the new DHCP server
    • Reboot that node; DRP then manages provisioning for that specific server
    • Confirm the reset server is healthy and continue the changeover server by server
  • Other Options
    • Continue to manage existing infrastructure with Cobbler and use DRP for all new nodes
    • Split provisioning services based on the application being deployed
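
To make the server-by-server handoff concrete, here is the minimal bookkeeping sketch referenced in the list above. It is illustrative only: the inventory entries and the helper functions stand in for whatever your existing DHCP/Cobbler tooling and the DRP API or drpcli actually expose; none of them are real commands.

    # Illustrative sketch only: the helper functions below are placeholders
    # for your real Cobbler/DHCP and DRP tooling, not actual APIs.

    # Inventory discovered by DRP across subnets: MAC address -> current IP.
    inventory = {
        "52:54:00:aa:bb:01": "10.0.1.21",
        "52:54:00:aa:bb:02": "10.0.2.34",
        # ... remaining servers ...
    }

    def remove_old_reservation(mac):
        """Placeholder: release this MAC's reservation on the old DHCP server."""
        print(f"old DHCP: released {mac}")

    def add_drp_reservation(mac, ip):
        """Placeholder: create a reservation for this MAC on the DRP DHCP server."""
        print(f"DRP DHCP: reserved {ip} for {mac}")

    def reboot_and_verify(mac):
        """Placeholder: reboot the node (IPMI, console, etc.) and confirm DRP provisioned it."""
        print(f"rebooted {mac}; confirm it PXE booted from DRP before continuing")

    # Migrate control one server at a time so a failure affects only one node.
    for mac, ip in inventory.items():
        remove_old_reservation(mac)
        add_drp_reservation(mac, ip)
        reboot_and_verify(mac)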

Watch the full video below to hear other scenarios presented for migration options.

Video Participants:

Rob Hirschfeld, Co-Founder/CEO, RackN (Twitter: @zehicle)
Greg Althaus, Co-Founder/CTO, RackN (Twitter: @galthaus)

Get started with Digital Rebar today.

Don’t Fear the Reboot – Safe Patterns for Automating Metal

Author: Greg Althaus, CTO/Co-Founder RackN

Over the past few years, we have spent time with a wide variety of IT organizations to better understand the challenges they face deploying and enabling solutions. Two key themes emerged from these conversations:

  1. Zero-Touch (or as close as possible) Infrastructure
  2. Manual Inventory and Processes Don’t Scale and are Error Prone

With these two fundamental concepts in mind, we developed a highly targeted solution built on technology from the Digital Rebar open source community:

  • Digital Rebar Provision (DRP) provides a lightweight, easy-to-deploy, API-driven system to drive machines through a complete life-cycle.
  • DRP is designed around the concept of composition: the ability to build units of function that can be added to workflows so that infrastructure can be built and rebuilt consistently, with fast error paths to discover problems.
  • RackN and the community offer content packages built for DRP to meet the needs of operators.

DRP operation follows a workflow pattern that takes machines from discovery through provisioning to decommissioning. This workflow approach allows operators to stage infrastructure provisioning with checks at critical points in the process. There are five common workflows built into the tool; however, additional workflows can be created and customized (a simple illustration follows the list):

  • Workflow 1: DISCOVERY
  • Workflow 2: INSTALL
  • Workflow 3: DECOMMISSION
  • Workflow 4: MAINTENANCE
  • Workflow 5: RESTART
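
As a rough illustration of the workflow idea promised above, the toy model below treats a workflow as an ordered list of stages with a pass/fail check at each gate, which is what gives the fast error paths described earlier. This is not DRP's actual data model; the stage names and checks are assumptions made up for the example.

    # Toy model of the workflow pattern: ordered stages with a check at each gate.
    # This is not DRP's real schema; names and checks are illustrative only.

    workflows = {
        "discovery":    ["dhcp-boot", "inventory", "classify"],
        "install":      ["partition", "write-image", "configure-boot"],
        "decommission": ["wipe-disks", "release-ip", "power-off"],
        "maintenance":  ["firmware-update", "burn-in"],
        "restart":      ["reboot", "health-check"],
    }

    def run_stage(machine, stage):
        """Placeholder for the real work done in a stage; return True on success."""
        print(f"{machine}: running stage '{stage}'")
        return True

    def run_workflow(machine, name):
        """Run each stage in order; stop at the first failed check (fast error path)."""
        for stage in workflows[name]:
            if not run_stage(machine, stage):
                raise RuntimeError(f"{machine}: stage '{stage}' failed; halting '{name}'")
        print(f"{machine}: workflow '{name}' complete")

    run_workflow("machine-01", "discovery")
    run_workflow("machine-01", "install")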

Over the next few weeks, I will be posting detailed blogs about each workflow stage, providing insight into why this architecture was chosen and the benefits to operators. If you are interested in learning more about the Digital Rebar community or in getting started with our RackN technology built on Digital Rebar, I encourage you to reach out.

Virtual Toilet Backing Up? Internet Plumbers get the dirty jobs

The latest mantra in IT is to cleanly abstract away everything: hardware, software, management, processes, etc. Take "serverless" for example – there are still servers involved, but they are much more hidden than before.  This abstraction obsession is rapidly changing the way that applications and services are developed and delivered.

However, the underlying abstractions hide, not remove, infrastructure; it is still there and, like plumbing, simply becomes someone else's problem to deal with. At RackN, we are focused on solving these hidden plumbing problems at the physical infrastructure operations layer.

Working with physical hardware is viewed as messy and is not going to be a trending hashtag anytime soon. We are OK with that. In fact, we view ourselves as Internet Plumbers keeping the "pipes" open without any hesitation about getting dirty.

Part of our mission is to standardize the processes in physical ops to provide site reliability engineers and DevOps teams with an automated, open, secure, scalable, and reliable solution. Our solution is built not only for today’s needs but also the coming Edge computing revolution whereby physical ops will move from hundreds of nodes to hundreds of thousands of endpoints.

We offer several ways to start working with our technology immediately:

  • Digital Rebar Provision – Our open source DHCP/PXE/iPXE service with community or corporate plug-ins for additional features
  • RackN Trial – Get access to our solution built on Digital Rebar Provision; contact RackN sales

Based on a prior Rob Hirschfeld post: Physical Ops = Plumbers of the Internet. Celebrating dirty IT jobs 8-bit style.

Sirens of Open Infrastructure beckon to the OpenStack Community

OpenStack is a real platform doing real work for real users.  So why does OpenStack have a reputation for not working?  It falls into the lack of core-focus paradox: being too much to too many undermines your ability to do something well.  In this case, we keep conflating the community and the code.

I have a long history with the project but have been pretty much outside of it (yay, Kubernetes!) for the last 18 months.  That perspective helps me feel like I'm getting closer to the answer after spending a few days with the community at the latest OpenStack Summit in Sydney, Australia.  While I love to think about the why, what the leaders are doing about it is very interesting too.

Fundamentally, OpenStack’s problem is that infrastructure automation is too hard and big to be solved within a single effort.  

It’s so big that any workable solution will fail for a sizable number of hopeful operators.  That does not keep people from the false aspiration that OpenStack code will perfectly fit their needs (especially if they are unwilling to trim their requirements).

But the problem is not inflated expectations for OpenStack VM IaaS code, it’s that we keep feeding them.  I have been a long time champion for a small core with a clear ecosystem boundary.  When OpenStack code claims support for other use cases, it invites disappointment and frustration.

So why is the OpenStack Foundation moving to expand its scope as an Open Infrastructure community with additional focus areas?  It's simple: the community is asking it to.

Within the vast space of infrastructure automation, there are clusters of aligned interest.  These clusters are sufficiently narrow that they can collaborate on shared technologies and practices.  They also have a partial (Venn-style) overlap with adjacencies where OpenStack is already present.  There is a strong economic and social drive for members of these overlapping communities to bridge together instead of creating new, disparate groups.  Having the OpenStack Foundation organize these efforts is a natural and expected function.

The danger of this expansion comes from also carrying the expectation that the technology (code) will be carried into the adjacencies.  That's my exact rationale for why the original VM IaaS needs to be smaller.  The wealth of non-core projects crosses clusters of interests.  Instead of allowing these clusters to optimize around their shared interests, users get the impression that they must broadly adopt unneeded or poorly fitting components.  The idea of "competitive" projects should be reframed because they may overlap in function but not in use-case fit.

It’s long past time to give up expectations that OpenStack is a “one-stop-shop” of infrastructure automation.  In my opinion, it undermines the community mission by excluding adjacencies.

I believe that OpenStack must work to embrace its role as an open infrastructure community; however, it must also do the hard work to create welcoming space for adjacencies.  These adjacencies will compete with existing projects currently under the OpenStack code tent.  The community needs to embrace that the hard work done so far may simply be sunk cost for new use cases. 

It’s the OpenStack community and the experience, not the code, that creates long term value.

Podcast with Yves Boudreau talks Heterogeneity in the Edge

Joining this week's L8ist Sh9y Podcast is Yves Boudreau, VP of Partnerships and Ecosystem Strategy at Ericsson. Rob Hirschfeld and Yves discuss the Ericsson Unified Delivery Network platform and the concept of a global content provider service built on heterogeneous infrastructure. Yves also provides insight into what webscale customers are looking for in the Edge as they think about balancing their applications between public cloud services and future edge clouds.  Finally, Rob and Yves talk about the coming fundamental change in how software is created and run "independent" of hardware.  Yves can be contacted via LinkedIn.

Topic (Time in Minutes.Seconds)

Introduction: 0.00 – 2.11
Ericsson Unified Delivery Network: 2.11 – 3.01
Service Providers Space: 3.01 – 4.05
Operator Customers: 4.05 – 5.22
Content Providers Want Global Coverage: 5.22 – 7.15
Example: 7.15 – 8.34
Edge Infrastructure w/ CDN: 8.34 – 9.42
Distributed Heterogeneous Infra: 9.42 – 11.30
Baking Cloud Consumption into Edge: 11.30 – 11.56
Multi-Tenant Infra at Edge: 11.56 – 14.05
Delivery of the Edge: 14.05 – 16.16
Amazon Lambda is Expectation: 16.16 – 20.36
Containers are Edge EC2?: 20.36 – 25.18
Is Edge Greenfield Work?: 25.18 – 29.12
Fundamental Software Change: 29.12 – 31.29
Locked-In "Debt" Always Re-appears: 31.29 – 35.28
Conclusion: 35.28 – END

Podcast Guest: Yves Boudreau

Mr. Boudreau is a 20-year veteran of the Digital, Telecom and Cable TV industries. From modest beginnings at one of the first cable broadband ISPs in Canada to the fast-paced technology hub of Silicon Valley, Yves joined ERICSSON in 2011 as Vice President of Technical Sales Support and most recently accepted a position as the VP of Partnerships and Ecosystem Strategy for the ERICSSON Unified Delivery Network. Previously, Mr. Boudreau worked in R&D, Systems Engineering & Business Development for companies such as Com21 Inc., ARRIS Group (Cable), Imagine Communication (Video Compression) and Verivue Inc. (CDN). Yves now resides in Atlanta, Georgia with his wife Josée and 3 children. Mr. Boudreau completed his undergraduate studies in Commerce at Laurentian University and graduate studies in Information Technology Management at Athabasca University. Yves also serves on the Board of Directors of the Streaming Video Alliance (www.streamingvideoalliance.org).

October 13 – Weekly Recap of All Things Digital Rebar and RackN

Welcome to the weekly post of the RackN blog recap of all things Digital Rebar, RackN, SRE, and DevOps. If you have any ideas for this recap or would like to include content please contact us at info@rackn.com or tweet Rob (@zehicle) or RackN (@rackngo)

Items of the Week

Digital Rebar

Digital Rebar Online Community Meetup #2

Community Content Video

Stay in Touch with the Community

RackN

Making Server Deployments 10x Faster – the ROI on Immutable Infrastructure

Rob Hirschfeld discusses the benefits of Immutable Infrastructure or Image-based provisioning around three concepts:

  • Simplicity
  • Repeatability
  • Speed

Read the post here

Fast, Simple, and Open: 10x ROI of Building Infrastructure in Layers

Read our new white paper on why RackN created a new data center provisioning software solution to automate and simplify data center operations.

Read the paper here

Coming Soon!

Next week we are releasing a new Podcast with Mark Thiele, Chief Strategy and Chief Information Officer at Apcera.

UPCOMING EVENTS

Rob Hirschfeld and Greg Althaus are preparing for a series of upcoming events where they are speaking or just attending. If you are interested in meeting with them at these events please email info@rackn.com

If you are attending any of these events please reach out to Rob Hirschfeld to setup time to learn more about our solutions or discuss the latest industry trends.

Making Server Deployment 10x Faster – the ROI on Immutable Infrastructure

Author’s note: We’re looking for RackN Beta participants who want to help refine next generation deployment capabilities like the one described below.  We have these processes working today – our goal is to make them broadly reusable and standardized.

We've been posting [Go CI/CD and Immutable Infrastructure for Edge Computing Management] and podcasting [Discoposse: The Death of Configuration Management, Immutable Deployment Challenges for DevOps] about the concept of immutable infrastructure because it offers simpler and more repeatable operations processes. Delivering a pre-built image with software that's already installed and mostly configured can greatly simplify deployment (see cloud-init).  It is simpler because all of the "moving parts" of the image can be pre-wired together and tested as a unit.  This model is the default for containers, but it's also widely used in cloud deployments where it's easy to push an AMI or VHD to the cloud as a master image.

It takes work and expertise to automate building these immutable images, so it’s important to understand the benefits of simplicity, repeatability and speed.

  • Simplicity: Traditional configuration approaches start from an operating system base and then run configuration scripts to install the application and its prerequisites.  This configuration process requires many steps that are sequence dependent and have external dependencies.  Even small changes can break the entire system and prevent deployments.  By delivering an image instead, deploy-time integration and configuration issues are eliminated.
  • Repeatability: Since the deliverable is an image, all environments use the exact same artifact across dev, test and production.  That consistency reduces error rates and encourages cross-team collaboration because all parties are invested in the provenance of the images.  In fact, immutable images are a great way to ensure that development and operations are at the table because neither team can create a custom environment.
  • Speed: Post-deployment configuration is slow.  If your installation has to pull patches, libraries and other components every time you install it, then you'll spend a lot of time waiting for downloads.  Believe it or not, the overhead of downloading a full image is small compared to the incremental delays of configuring an application stack.  Even the compromise of pre-staging items and then running local-only configuration still takes a surprisingly long time.

These benefits have been relatively easy to realize with Docker containers (it's built in!) or VM images; however, they are much harder to realize with physical systems.  Containers and VMs provide a consistent abstraction that is missing in hardware.  Variations in networking, storage or even memory can cause image deployments to fail.

But… if we could do image-based deployments to metal, then we'd be able to gain these significant advantages.  We'd also be able to create portability of images between cloud and physical infrastructure.  Between the pure speed of writing images directly to disk (compared to kickstart or pre-seed) and the elimination of post-provision configuration, immutable metal deploys can be 5x to 10x faster.

Deployments go from 30 minutes down to 6 or even 3.  That's a very big deal.
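
As a back-of-the-envelope illustration of where that 5x-10x comes from, the toy model below uses made-up durations chosen to line up with the 30-minute baseline above; they are assumptions, not measurements.

    # Toy deploy-time model; all durations are illustrative assumptions (minutes).

    # Traditional kickstart/pre-seed style: OS install plus sequential configuration
    # steps, each pulling packages and patches over the network.
    os_install = 12
    config_steps = [4, 5, 3, 6]          # package installs, patches, app config, ...
    traditional = os_install + sum(config_steps)

    # Immutable style: stream a pre-built image to disk, then a small first-boot init.
    image_write = 4
    first_boot_init = 2
    immutable = image_write + first_boot_init

    print(f"traditional: {traditional} min, immutable: {immutable} min")
    print(f"speedup: {traditional / immutable:.1f}x")   # ~30 min vs ~6 min -> ~5x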

That’s exactly why RackN has been working to create a standardized, repeatable process for immutable deployments.  We have this process working today with some expert steps required in image creation.  

If this type of process would help your operations team then please contact us and join the RackN Beta Program with advanced extensions for Digital Rebar Provision.

Note: There are risks to this approach as well.  There is no system wide patch or update mechanism except creating a new image and redeploying.  That means it takes more time to generate and roll an emergency patch to all systems.  Also, even small changes require replacing whole images.  These are both practical concerns; however, they are mitigated by maintaining a robust continuous deployment process where images are being constantly refreshed.

Digital Rebar v3.1 Release Announcement

We’ve made open network provisioning radically simpler.  So simple, you can install in 5 minutes and be provisioning in under 30.  That’s a bold claim, but it’s also an essential deliverable for us to bridge the Ops execution gap in a way that does not disrupt your existing tool chains.

We've got a remarkable list of feature additions between Digital Rebar Provision (DRP) v3.0 and v3.1 that take it from a basic provisioning service to a powerful distributed infrastructure automation tool.

But first, we need to put v3.1 into a broader perspective: the new features are built from hard-learned DevOps lessons.  The v2 combination of integrated provisioning and orchestration meant we needed a lot of overhead like Docker, Compose, PostgreSQL, Consul and Rails.  That was needed for complex "one-click" cluster builds; however, it's overkill for users of Ansible, Terraform and immutable infrastructure flows.

The v3 mantra is about starting simple and allowing users to grow automation incrementally.  RackN has been building advanced automation packages and powerful UX management to support that mission.

So what's in the release?  The v3.0 release focused on getting the core provisioning infrastructure APIs, processes and patterns working as a stand-alone service. The v3.1 release targeted major architectural needs: streamlined content management, event notification and out-of-band actions.

Key v3.1 Features

  • New Mascot and Logo!  We have a cloud native bare metal bear.  DRP fans should ask about stickers and t-shirts. Name coming soon! 
  • Layered Storage System.  The DRP storage model allows for layered storage tiers to support the content model and a read-only base layer. These features allow operators to distribute content in a number of different ways and make field upgrades and multi-site synchronization possible.
  • Content packaging system.  The DRP contents API allows operators to manage packages of other models via a single API call.  Content bundles are read-only and versioned so that field upgrades and patches can be distributed.
  • Plug-in system.  DRP allows API extensions and event listeners that run in the same process space as the DRP server.  This enables IPMI extensions and Slack notifiers.
  • Stages, Tasks & Jobs.  DRP has a simple work queue system in which tasks are stored and tracked on machines during stages in their boot sequences.  This feature combines server and DRP client actions to create fast, simple and flexible workflows that don't require agents or SSH access.
  • Websocket API for event subscription.  DRP clients can subscribe to system events over a long-lived websocket interface.  Subscriptions include filters so that operators can select very narrow notification scopes (a rough sketch follows this list).
  • Removal of the minimal embedded UI (moving to a community-hosted UX).  DRP decoupled the user interface from the service API.  This allows features to be added to the UX without having to replace the service.  It also allows community members to create their own UX.  RackN has agreed to support community users at no cost on a limited version of our commercial UX.
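
To illustrate the event-subscription pattern (the sketch promised in the list above): open one long-lived websocket, register a narrow filter, and react to events as they arrive. The endpoint URL and registration message below are assumptions standing in for DRP's real websocket interface, not the documented API; authentication is omitted for brevity.

    # Sketch of subscribing to a narrow slice of events over a long-lived websocket.
    # The endpoint and message format are assumptions; see the DRP docs for the
    # real interface and authentication details.
    import asyncio
    import json
    import websockets  # pip install websockets

    DRP_WS_URL = "wss://drp.example.local:8092/api/v3/ws"   # hypothetical endpoint

    async def watch_machine_events():
        async with websockets.connect(DRP_WS_URL) as ws:
            # Hypothetical registration message: subscribe only to machine updates.
            await ws.send("register machines.update.*")
            async for raw in ws:
                print("event:", json.loads(raw))

    asyncio.run(watch_machine_events())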

All of these features enable DRP to perform 100% of the hardware provisioning workflows that our customers need to run a fully autonomous, CI/CD-enabled data center.  RackN has been showing examples of Ansible, Kubernetes, and Terraform-to-metal integration as reference implementations.

Getting the physical layer right is critical to closing your infrastructure execution gaps.  DRP v3.1 goes beyond getting it right – it makes it fast, simple and open.  Take a test drive of the open source code or give RackN a call to see our advanced automation demos.

Go CI/CD and Immutable Infrastructure for Edge Computing Management

In our last post, we pretty much tore apart the idea of running mini-clouds on the edge because they are not designed to be managed at scale in resource-constrained environments without deep hardware automation.  While I'm a huge advocate of API-driven infrastructure, I don't believe in a one-size-fits-all API because a good API provides purpose-driven abstractions.

The logical extension is that having deep hardware automation means there’s no need for cloud (aka virtual infrastructure) APIs.  This is exactly what container-focused customers have been telling us at RackN in regular data centers so we’d expect the same to apply for edge infrastructure.

If “cloudification” is not the solution then where should we look for management patterns?  

We believe that software development CI/CD and immutable infrastructure patterns are well suited to edge infrastructure use cases.  We discussed this at a session at the OpenStack OpenDev Edge summit.

Continuous Integration / Continuous Delivery (CI/CD) software pipelines help to manage environments where the risk of making changes is significant by breaking the changes into small, verifiable units.  This is essential for edge because lack of physical access makes it very hard to mitigate problems.  Using CI/CD, especially with A/B testing, allows for controlled rolling distribution of new software.  

For example, in a 10,000 site deployment, the CI/CD infrastructure would continuously roll out updates and patches over the entire system.  Small incremental changes reduce the risk of a major flaw being introduced.  The effect is enhanced when changes are rolled slowly over the entire fleet instead of simultaneously rolled out to all sites (known as A/B or blue/green testing).  In the rolling deployment scenario, breaking changes can be detected and stopped before they have significant impacts.
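
Here is a minimal sketch of that rolling pattern. The site count matches the example above, but the batch sizes, failure threshold and health check are assumptions made up for illustration.

    import random

    # Toy rolling-deployment model for a large fleet of edge sites.
    # Batch sizes, failure threshold and the health check are illustrative assumptions.
    SITES = [f"site-{i:05d}" for i in range(10_000)]
    BATCHES = [100, 500, 2_000, 7_400]       # canary first, then progressively larger waves
    MAX_FAILURES_PER_BATCH = 2

    def deploy_and_check(site, version):
        """Placeholder for pushing the new image and verifying site health afterwards."""
        return random.random() > 0.0001      # pretend ~0.01% of deploys fail

    def rolling_deploy(version):
        cursor = 0
        for batch_size in BATCHES:
            batch = SITES[cursor:cursor + batch_size]
            failures = [s for s in batch if not deploy_and_check(s, version)]
            if len(failures) > MAX_FAILURES_PER_BATCH:
                print(f"halting rollout of {version}: {len(failures)} failures in batch of {batch_size}")
                return False
            cursor += batch_size
            print(f"{cursor}/{len(SITES)} sites on {version}")
        return True

    rolling_deploy("image-2017.11.03")

Halting on the first bad batch is what limits the blast radius compared with pushing the change to all 10,000 sites at once.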

These processes and the support software systems are already in place for large scale cloud software deployments.  There are likely gaps around physical proximity and heterogeneity; however, the process is there and initial use-case fit seems to be very good.

Immutable Infrastructure is a catch-all term for deployments based on images instead of configuration.  This concept is popular in cloud deployments where teams produce "golden" VM or container images that contain the exact version of software needed and are then provisioned with minimal secondary configuration.  In most cases, the images only need a small file injected (via cloud-init) to complete the process.

In this immutable pattern, images are never updated post deployment; instead, instances are destroyed and recreated.  It’s a deploy, destroy, repeat process.  At RackN, we’ve been able to adapt Digital Rebar Provisioning to support this even at the hardware layer where images are delivered directly to disk and re-provisioning happens on a constant basis just like a cloud managing VMs.
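
A minimal sketch of that deploy, destroy, repeat loop, with placeholder functions standing in for the real provisioning calls (whether a cloud API or DRP writing an image directly to disk); the node names and image versions are made up for illustration.

    # Toy reconciliation loop for the immutable pattern: instances are never patched
    # in place; anything running an old image is destroyed and re-provisioned.
    # The provision/destroy helpers are placeholders, not real APIs.

    desired_image = "app-image-v42"
    fleet = {"node-1": "app-image-v41", "node-2": "app-image-v42", "node-3": "app-image-v40"}

    def destroy(node):
        print(f"destroying {node}")

    def provision(node, image):
        print(f"writing {image} directly to {node} and booting it")

    for node, current_image in fleet.items():
        if current_image != desired_image:
            destroy(node)
            provision(node, desired_image)
            fleet[node] = desired_image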

The advantage of the immutable pattern is that we create a very repeatable and controlled environment.  Instead of trying to maintain elaborate configurations and bi-directional systems of record, we can simply reset whole environments.  In a CI/CD system, we constantly generate fresh images that are incrementally distributed through the environment.

Immutable Edge Infrastructure would mean building and deploying complete system images for our distributed environment.  Clearly, this requires moving around larger images than just pushing patches; however, these uploads can easily be staged and they provide critical repeatability in management.  The alternative is trying to keep track of which patches have been applied successfully to distributed systems.  Based on personal experience, having an atomic deliverable sounds very attractive.

CI/CD and Immutable patterns are deep and complex subjects that go beyond the scope of a single post; however, they also offer a concrete basis for building manageable data centers.

The takeaway is that we need to be looking first to scale distributed software management patterns to help build robust edge infrastructure platforms. Picking a cloud platform before we’ve figured out these concerns is a waste of time.

Previous 2 Posts on OpenStack Conference:

Post 1 – OpenStack on Edge? 4 Ways Edge is Distinct from Cloud
Post 2 – Edge Infrastructure is Not Just Thousands of Mini Clouds

Edge Infrastructure is Not Just Thousands of Mini Clouds

I left the OpenStack OpenDev Edge Infrastructure conference with a lot of concerns relating to how to manage geographically distributed infrastructure at scale.  We’ve been asking similar questions at RackN as we work to build composable automation that can be shared and reused.  The critical need is to dramatically reduce site-specific customization in a way that still accommodates required variation – this is something we’ve made surprising advances on in Digital Rebar v3.1.

These are very serious issues for companies like AT&T with 1000s of local exchanges, Walmart with 10,000s of in-store server farms, or Verizon with 10,000s of coffee shop Wi-Fi zones.  These workloads are not moving into centralized data centers.  In fact, with machine learning and IoT, we expect to see more and more distributed computing needs.

Running each site as a mini-cloud is clearly not the right answer.

While we do need the infrastructure to be easily API-addressable, adding cloud without fixing the underlying infrastructure management moves us in the wrong direction.  For example, AT&T's initial 100+ OpenStack deployments were not field-upgradable and led to their efforts to deploy OpenStack on Kubernetes; however, that may have simply moved the upgrade problem to a different platform because Kubernetes does not address the physical layer either!

There are multiple challenges here.  First, any scale infrastructure problem must be solved at the physical layer first.  Second, we must have tooling that brings repeatable, automation processes to that layer.  It’s not sufficient to have deep control of a single site: we must be able to reliably distribute automation over thousands of sites with limited operational support and bandwidth.  These requirements are outside the scope of cloud focused tools.

Containers and platforms like Kubernetes have a significant part to play in this story.  I was surprised that they were present only in a minor way at the summit.  The portability and light footprint of these platforms make them a natural fit for edge infrastructure.  I believe that lack of focus comes from the audience believing (incorrectly) that edge applications are not ready for container management.

With hardware layer control (which is required for edge), there is no need for a virtualization layer to provide infrastructure management.  In fact, “cloud” only adds complexity and cost for edge infrastructure when the workloads are containerized.  Our current cloud platforms are not designed to run in small environments and not designed to be managed in a repeatable way at thousands of data centers.  This is a deep architectural gap and not easily patched.

OpenStack sponsoring the edge infrastructure event got the right people in the room but also got in the way of discussing how we should be solving these operational challenges.  How should we be solving them?  In the next post, we'll talk about management models that we should be borrowing for the edge…

Read 1st Post of 3 from OpenStack OpenDev: OpenStack on Edge? 4 Ways Edge is Distinct from Cloud