DC2020: Is Exposing Bare Metal Practical or Dangerous?

One of IBM’s major announcements at Think 2018 was Managed Kubernetes on Bare Metal. The new service combines elements of IBM’s existing offerings to expose additional security, attestation and performance isolation capabilities. Bare metal has been a hot topic for cloud service providers recently, with AWS adding it to their platform and Oracle using it as their primary IaaS. With these offerings as a backdrop, let’s explore the role of bare metal in the 2020 Data Center (DC2020).

Physical servers (aka bare metal) are the core building block for any data center; however, they are often abstracted out of sight by a virtualization layer such as VMware, KVM, Hyper-V or many others. These platforms are useful for many reasons. In this post, we’re focused on the fact that they provide a control API for infrastructure that makes it possible to manage compute, storage and network requests. Yet the abstraction comes at a price in cost, complexity and performance.

The historical lack of good API control has made bare metal less attractive, but that is changing quickly due to two forces.

These two forces are Container Platforms and Bare Metal as a Service, or BMaaS (disclosure: RackN offers a private BMaaS platform called Digital Rebar). Container Platforms such as Kubernetes provide an application service abstraction for data center consumers that eliminates the need for users to worry about traditional infrastructure concerns. That means most users no longer rely on APIs for compute, network or storage, leaving the platform to handle those issues. On the other side, BMaaS brings VM-style infrastructure-level APIs to the actual physical layer of the data center, allowing users who do care about compute, network or storage to work without VMs.
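To ground both forces with a concrete example, here is a minimal sketch contrasting a VM-level control API with a bare metal one. The virsh commands are standard libvirt/KVM tooling; the drpcli call uses the Digital Rebar Provision CLI that appears later in this series, and the endpoint address is illustrative rather than a real system.

# VM-level control API: ask the hypervisor what it is running
virsh list --all

# Bare metal control API: ask the provisioning endpoint which physical machines it manages
drpcli --endpoint=https://drp.example.local:8092 machines list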

The combination of containers and bare metal APIs has the potential to squeeze virtualization into a limited role.

The IBM bare metal Kubernetes announcement illustrates both of these forces working together.  Users of the managed Kubernetes service are working through the container abstraction interface and really don’t worry about the infrastructure; however, IBM is able to leverage their internal bare metal APIs to offer enhanced features to those users without changing the service offering.  These benefits include security (IBM White Paper on Security), isolation, performance and (eventually) access to metal features like GPUs. While the IBM offering still includes VMs as an option, it is easy to anticipate that becoming less attractive for all but smaller clusters.

The impact for DC2020 is that operators need to rethink how they rely on virtualization as a ubiquitous abstraction. As more applications rely on container service abstractions, the platforms will grow in size and virtualization will provide less value. With the advent of better control of the bare metal infrastructure, operators have real options to get deep control without adding virtualization as a requirement.

Shifting to new platforms creates opportunities to streamline operations in DC2020.

Even with virtualization and containers, having better control of the bare metal is a critical addition to data center operations.  The ideal data center has automation and control APIs for every possible component from the metal up.

Learn more about the open source Digital Rebar community.

RackN Portal Management Connection to the 10 Minute Demo

In my previous blog, I provided step-by-step directions to install Digital Rebar Provision on a new endpoint and create a new node using Packet.net for users without a local hardware setup. (Demo Tool on GitHub) In this blog, I will introduce the RackN Portal and connect it to the active setup running on Packet.net at the end of the demo process.

NOTE – You will need to run the demo process again to have both the DRP endpoint and the provisioned node active on Packet.net.

Current Status

There will be two machines running in Packet:

  • Digital Rebar Provision running on an Endpoint
  • A new physical node provisioned by DRP

If you ran the process in the previous blog, you will already have created a RackN Portal account to get the RackN username that goes into the Secrets file.

Steps to Connect RackN Portal

When you first go to the RackN Portal you will see the following screen:

The first step is to enter the Endpoint Address, which comes from the Packet.net Endpoint server set up in the previous blog. To get the address, go to the “Configure DRP” step, where you will see the following output containing the endpoint HTTPS address:

running ACTION:  drp-setup-demo
+ set +x
+ drpcli --endpoint=https://147.##.##.63:8092 bootenvs uploadiso centos-7-install
{
  "Path": "CentOS-7-x86_64-Minimal-1708.iso",
  "Size": 830472192
}
+ set +x
{
  "centos-7-install": "packet-ssh-keys:Success",
  "discover": "packet-discover:Success",
  "packet-discover": "centos-7-install:Reboot",
  "packet-ssh-keys": "complete-nowait:Success"
}

Enter that HTTPS address, https://147.##.##.63:8092, into the Endpoint Address field and press the blue arrow. You will then be taken to the login screen where you enter the standard login info:

Select “Defaults” to have the system fill in the Login information. If you need more information on this screen, please review the Install Guide.
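If you would also like to sanity-check the endpoint from a terminal, the same drpcli syntax shown in the output above works. This assumes the install is still using the default credentials that the portal’s “Defaults” button fills in.

# A machine listing here confirms the DRP API on the endpoint is reachable
drpcli --endpoint=https://147.##.##.63:8092 machines list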

RackN Portal Tour

After completing the login your RackN Portal screen will look like this:

At this point, we want to see the new node that was created in the final step of our demo process. Select “Machines” on the left-hand navigation below SYSTEM and you will see the new machine that was created. NOTE – The Red X next to Subnets is appropriate for Packet.net infrastructure.

You can confirm this machine name against the machine name shown in the last stage of the demo process. Both the RackN Portal and the data below indicate that I have created a new node called “spectordemo-machines-ewr1-01”.

Selecting the newly created machine, you will see the following information:

In the next blog, we will use the RackN Portal to create a second node and look at the Workflow process to install an operating system on both nodes.

If you have any questions or would like to get started learning more about Digital Rebar Provision and RackN please join the Slack community.

DC2020: Skeptics Guide to Blockchain in the Data Center

At Think 2018, Machine Learning and Blockchain technologies are beyond pervasive: they are assumed to benefit ROI in every situation. That type of hype begs for closer review. In this post, we’ll look at a potentially real use of blockchain for operations.

There is so much noise about blockchain that it can be difficult to find a starting point. I’m leaving background reading as an exercise for the reader; instead, I want to focus on how blockchain creates a distributed ledger with shared trust. That’s a lot of buzzwords! Basically, we’re talking about a system where nodes share data and use consensus with their peers to determine whether the information is trustworthy.

The key concept in blockchain is moving from a central authority to a distributed authority.

In the data center, administrative trust is essential. The premises, networks, and access credentials all rely on the idea that we have a centralized authoritative group. Even PKI, which is designed for decentralized trust, relies on a central authority to sign keys. Looking objectively at the bundle of passwords, certificates, keys and isolation layers, there are gaping risks in this model. It only takes getting the right access to flip administrative control from an asset into a liability.
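As a small illustration of that central dependence, consider how certificate verification works today: a certificate is only trusted if it chains back to a CA that everyone agreed on in advance. The file names below are placeholders.

# A certificate only verifies if it chains to a pre-agreed central signer
openssl verify -CAfile corporate-ca.pem server-cert.pem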

Blockchain allows us to decentralize trust in the data center by requiring systems to collaboratively validate administrative instructions.

In this model, we’d still have administrative controls and management; however, the nodes would be able to validate configuration changes with their peers or other administrative sources. For example, an out-of-process change (potential hack?) on a single node would be confirmed via consensus with other nodes instead of automatically trusting the source. The body of nodes protects against a bad administrative request. It also allows operators to quickly propagate configurations peer-to-peer instead of relying on a central hub-and-spoke model.
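To make that flow concrete, here is a minimal shell sketch of peer validation. The peer names, the /validate endpoint and the simple majority rule are all hypothetical; a real implementation would ride on an actual distributed-ledger protocol rather than ad hoc HTTP polling.

#!/usr/bin/env bash
# Hypothetical sketch: only apply an administrative change when a majority of
# peers agree that the change's hash matches their copy of the shared ledger.
set -euo pipefail

PEERS=("node1.example.local" "node2.example.local" "node3.example.local")
CHANGE_FILE="${1:?path to proposed configuration change required}"
CHANGE_HASH=$(sha256sum "$CHANGE_FILE" | awk '{print $1}')

approvals=0
for peer in "${PEERS[@]}"; do
  # each peer checks the hash against what it has recorded (illustrative endpoint)
  if curl -fs "https://${peer}/validate?hash=${CHANGE_HASH}" >/dev/null; then
    approvals=$((approvals + 1))
  fi
done

# require a simple majority before trusting the administrative request
if (( approvals * 2 > ${#PEERS[@]} )); then
  echo "consensus reached (${approvals}/${#PEERS[@]}): applying change"
else
  echo "consensus failed (${approvals}/${#PEERS[@]}): rejecting change" >&2
  exit 1
fi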

This peer-validation model is even more powerful if configuration is composited from multiple sources in a pipeline. In a multiple-author system, each contributor is involved in verifying changes to the whole configuration. This ensures that downstream insertions are both communicated to and accepted by upstream steps. This works because blockchain is a distributed ledger: changes made to the chain are passed back to all parties. Just like in a decentralized supply chain model, this ensures both validation and transparency.

Blockchain’s ability to provide both horizontal and vertical integrity for operations is an intriguing possibility.

I’m interested in hearing your thoughts about this application for blockchain. From a RackN and Digital Rebar perspective, these capabilities are well aligned with our composable approach to configuration. We’d be happy to talk with operators who want to look more deeply into this type of integration.

DC2020: Mono-clouds are easier! Why do Hybrid?

Background: This post was inspired by a multi-cloud session at IBM Think 2018, where I am attending as a guest of IBM. Providing hybrid solutions is a priority for IBM, and its customers are clearly looking for multi-cloud options. In this way, IBM has made a choice to support competitive platforms. This post explores why they would do that.

There is considerable angst and hype over the terms multi-cloud and hybrid-cloud. While it would be much simpler if companies could silo into a single platform, innovation and economics require a multi-party approach. The problem is NOT that we want to have choice and multiple suppliers. The problem is that we are moving so quickly that there is minimal interoperability and minimal effort to create it.

To drive interoperability, we need a strong commercial incentive to create a neutral ecosystem.

Even something with a clear ANSI standard like SQL has interoperability challenges. It also seems like the software industry has given up on standards in favor of APIs and rapid innovation. The reality on the ground is that technology is fundamentally heterogeneous and changing. For this reason, mono-anything is a myth and hybrid is really status quo.

If we accept multi-* as a starting point, then we need to invest in portability and avoid platform assumptions when we build automation. Good design is to assume change at multiple points in your stack. Automation itself is a key requirement because it enables rapid iterative build, test and deploy cycles. It is not enough to automate for day 1; the key to working multi-* infrastructure is a continuous deployment pipeline.

Pipelines provide insurance for hybrid infrastructure by exposing issues quickly before they accumulate technical debt.

That means the utility of tools like Terraform, Ansible or Docker is limited to how often you exercise them. Ideally, we’d build abstraction automation layers above these primitives; however, this has proven very difficult in practice. The degrees of variation between environments and pace of innovation make it impossible to standardize without becoming very restrictive. This may be possible for a single company but is not practical for a vendor trying to support many customers with a single platform.
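Below is a hedged sketch of what “exercising the tools often” can look like as a recurring pipeline step. The file and inventory names are placeholders; the point is that plan and check modes surface drift between environments before it hardens into technical debt.

#!/usr/bin/env bash
# Recurring pipeline step: dry-run the automation against each environment so
# drift and provider or platform changes show up early.
set -euo pipefail

terraform init -input=false
terraform plan -input=false -out=plan.out                    # flags infrastructure drift without applying it

ansible-playbook --check --diff -i inventory.yml site.yml    # dry-runs configuration against live hosts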

This means that hybrid, while required in the market, carries an integration tax that needs to be considered.

My objective for discussing Data Center 2020 topics is to find ways to lower that tax and improve the outcome. I’m interested in hearing your opinion about this challenge and if you’ve found ways to solve it.

Counterpoint Addendum: if you are in a position to avoid multi-* deployments (e.g. a start-up) then you should consider that option. There is measurable overhead in heterogeneous automation; however, I’ve found the tipping point away from a mono-stack can be surprisingly low, and committing to a vertical stack does make applications less resilient to innovation.

Series Intro: A Focus on Sustaining Operations

When discussing the data center of the future, it’s critical that we start by breaking the concept of the data center as a physical site with guarded walls, raised floors, neat rows of servers and crash-cart-pushing operators. The Data Center of 2020 (DC2020) is a distributed infrastructure composed of many data centers, cloud services and connected devices.

The primary design concept of DC2020 is integrated automation, not any particular infrastructure.

As an industry, we need to actively choose implementations that unify our operational models to create portability and eliminate silos. This means investing more in sustaining operations (aka Day 2 Ops) to ensure our IT systems can be constantly patched, updated and maintained. The pace of innovation (and of discovered vulnerabilities!) requires that we build with the assumption of change. DC2020 cannot be a “fire and forget” build that assumes only occasional updates.

There are a lot of disruptive and exciting technologies entering the market. These create tremendous opportunities for improvement and faster innovation cycles. They also create significant risk of further fragmenting our IT operations landscape in ways that increase costs, decrease security and further churn our market.

It is possible to be for both rapid innovation and sustaining operations, but it requires a plan for building robust automation.

Tightly integrating development and operations work is a common theme in both DevOps and Site/System Reliability Engineering, topics that we cover all the time. These practices are not only practical; we believe they are essential requirements for building DC2020.

Over this week, I’m going to be using the backdrop of IBM Think to outline the concepts for DC2020. I’ll both pull in topics that I’m hearing there and revisit topics that we’ve been discussing on our blogs and the L8ist Sh9y podcast. Ultimately, we’ll create a comprehensive document; for now, we invite you to share your thoughts about this content in its rawer narrative form.

Podcast – Oliver Gould on Service Mesh, Containers, and Edge

Joining us this week is Oliver Gould, CTO of Buoyant, who provides a service mesh abstraction view of microservices and Kubernetes. Oliver and Rob also take a look at how applications are managed at the edge and highlight the future roadmap for Conduit.

Highlights

  • Defining microservices and Kubernetes from Buoyant viewpoint
  • Service mesh abstractions at a request level (load balance, get, put, …)
  • Conduit overview – client-side load balancing
  • Service mesh tool comparisons
  • Edge Computing discussion from service mesh view

Topic                                                                           Time (Minutes.Seconds)

Introduction                                                                0.00 – 1.39
Define Microservices                                                1.39 – 5.25
Define Kubernetes                                                     5.25 – 10.23 (Memory as a Service)
Service Mesh Abstractions                                       10.23 – 12.37 (L5 or L7)
Conduit Overview                                                      12.37 – 18.20 (Sidecar Container)
When do I need Service Mesh?                              18.20 – 19.55 (Complex Debugging)
Service Mesh Comparisons                                     19.55 – 22.31
Deployment Times / V2 to 3 for DRP                    22.31 – 25.13 (Kubernetes into Production)
Edge Computing                                                       25.13 – 27.04 (Define)
App in Cloud + Edge Device?                                  27.04 – 31.10 (POP = Point of Presence)
Containers + Serverless                                            31.10 – 34.30 (Proxy in Browser)
Future Roadmap                                                       34.30 – 37.06 (Conduit.io)
Wrap Up                                                                     37.06 – END

Podcast Guest:  Oliver Gould, CTO Buoyant

Oliver Gould is the CTO of Buoyant, where he leads open source development efforts. Previously, he was a staff infrastructure engineer at Twitter, where he was the tech lead of the Observability, Traffic, and Configuration and Coordination teams. Oliver is the creator of linkerd and a core contributor to Finagle, the high-volume RPC library used at Twitter, Pinterest, SoundCloud, and many other companies.

Week in Review : Test Digital Rebar in Minutes with Hosted Physical Infrastructure

Welcome to the RackN and Digital Rebar Weekly Review. You will find the latest news related to Edge, DevOps, SRE and other relevant topics.

Deploy and Test Digital Rebar Provision with No Infrastructure in 10 Minutes

For operators looking to better understand Digital Rebar Provision (DRP), RackN has developed an easy-to-follow process leveraging Packet.net for physical device creation. This process allows new users to create a physical DRP endpoint and then provision a new physical node on Packet. Information and code to run this guide are available at https://github.com/digitalrebar/provision/tree/master/examples/pkt-demo.

In this blog, I will take the reader through the process, with images captured while running it from my Mac.

Read More

Site Reliability Engineering: 4 Things to Know

Organizations that have embraced DevOps and cloud-native architecture might also want to investigate SRE. Interop ITX expert Rob Hirschfeld explains why.

To find out more about site reliability engineering, Network Computing spoke with Rob Hirschfeld, who has been in the cloud and infrastructure space for nearly 15 years, including work with early ESX betas and service on the OpenStack Foundation board. Hirschfeld, cofounder and CEO of RackN, will present “DevOps vs SRE vs Cloud Native” at Interop ITX 2018.

Read More



Deploy and Test Digital Rebar Provision in less than 10 Minutes : How To Guide

Part 1 of 3 in Digital Rebar Provision How To Blog Series

For operators looking to better understand Digital Rebar Provision (DRP), RackN has developed an easy-to-follow process leveraging Packet.net for physical device creation. This process allows new users to create a physical DRP endpoint and then provision a new physical node on Packet. Information and code to run this guide are available at https://github.com/digitalrebar/provision/tree/master/examples/pkt-demo.

In this blog, I will take the reader through the process, with images captured while running it from my Mac.

SETUP

  • You will need an account on Packet at https://www.packet.net/. I created a personal account and entered a credit card to pay for the services used. The cost on Packet to run this is minimal.
    • From your Packet.net account you will need to create a NEW Project and an API Key. The API key will look like 7DE1Be6NLjGP6KUH4mbUAbysjwOx9kHo and the Project will look like b5d29881-8561-4f3b-8efb-2d61003fe2e7 (a quick way to sanity-check the key appears after this list). NOTE – The values shown are changed and will not work in Packet.
  • You will need an account on the RackN Portal via https://portal.rackn.io. From this account you will need your Username which looks like t98743fk-3865-4315-8d11-11127p9e41bd. NOTE – The value shown is not a valid Username.
  • Mac Users – I needed to have Homebrew installed on my machine to run this demo script (the two Homebrew setup steps are not shown here).
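Before running the demo, you can optionally sanity-check the Packet API Key and Project from the second bullet above. This is not part of the RackN script and assumes Packet’s REST API conventions at the time (api.packet.net with an X-Auth-Token header); replace the placeholder with your own key.

# List the projects the API key can see; your new Project ID should appear in the response
curl -s -H "X-Auth-Token: <your_api_key>" https://api.packet.net/projects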

PROCESS

  • Git Clone the guide (DO NOT run w/ “sudo”); a sketch of this step follows the secrets file below
  • Edit the Secrets file with Packet and RackN Portal info from Setup
    • vi private-content/secrets

# specify your API KEY that has access to PROJECT ID below
API="insert_api_key_here"
# specify the PROJECT ID that API KEY has access to
PROJECT="insert_project_id_here"
# RackN Username - necessary to download registered (but free) content packs
USERNAME="insert username here"
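For reference, the clone step from the first bullet above looks like the following sketch, based on the repository URL given earlier in this post (run as a regular user, not with sudo):

# Clone the Digital Rebar Provision repo and work from the pkt-demo example directory
git clone https://github.com/digitalrebar/provision.git
cd provision/examples/pkt-demo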

  • Run the demo-run.sh Script
    • ./demo-run.sh : this will launch the guide and you will see the Digital Rebar bear along with a request to run the next step

  • <RETURN> “Install Terraform”

  • <RETURN> “Install Secrets”

  • <RETURN> “Generate Public/Private RSA Keys”

  • <RETURN> “Packet SSH Key”

  • <RETURN> “2nd Packet SSH Key”

  • <RETURN> Creating the DRP Endpoint on Packet

  • <RETURN> Create a Terraform Plan

  • <RETURN> Download DRP to Endpoint

  • <RETURN> SSH Keygen

  • <RETURN> SSH Keyscan

  • <RETURN> Install DRP onto Packet Host Endpoint

Additional Installation Content Not Shown

  • <RETURN> Configure DRP

Additional Configuration Content Not Shown

NOTE – Getting a FAILED at this stage is expected and you should continue

  • <RETURN> Setup DRP Endpoint

  • <RETURN> Create new Packet Physical Node from DRP Endpoint

At this point you will have 2 machines running in Packet:

  • Digital Rebar Provision running on an Endpoint
  • A new physical node provisioned by DRP
  • To clean up this process and shut down the 2 Packet machines, run the following command: ./bin/control.sh cleanup
    • It will clean up Packet as well as reset all files back to the original state they were in when cloned from GitHub.
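After cleanup completes, you can optionally confirm that nothing is still running (and billing) in the project, under the same hedged Packet API assumption noted in the setup section; replace the placeholders with your own values.

# Should return an empty device list once cleanup has finished
curl -s -H "X-Auth-Token: <your_api_key>" https://api.packet.net/projects/<your_project_id>/devices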

In my next blog, I will introduce the process to connect your Packet Endpoint machine to the RackN Portal so you can see the newly created node and begin working with it from the RackN Portal.

If you have any questions, please leverage the RackN Slack #Community channel where Digital Rebar community members and RackN engineers are available to assist.

Cloud Native Surfing at IBM Think 2018

Rob Hirschfeld speaks with Kevin Allen, Content Lead, IBM [@KevJosephAllen] about next week’s IBM Think 2018 conference (March 19-22) in Las Vegas. Contact us if you are interested in setting up a meeting with Rob next week at the event.

Highlights:

What is RackN working on? Physical Infrastructure Automation to manage metal in the data center as you would a VM in the cloud.

Trends in the Infrastructure and Cloud space? Getting involved in immutable infrastructure, CI/CD pipelines, and a focus on zero-touch management. We have also been talking about Edge Computing and how it will be managed versus the cloud.

The Cloud Native movement is developers on surfboards seeing a huge wave in the distance; where are we now? We are still at the point in open source where the technology is powerful and people are still learning how the tools work. Layers are forming on top of these container tools and customers are moving up the stack to understand more and more. The tide is coming in and the waves are getting bigger, with lots and lots of wavelets still growing out at sea.

The enterprise user base is looking for more integration from projects; it doesn’t have to be within one project, but can come from multiple projects connecting with each other.

Hybrid Cloud conversation has changed? Hybrid Cloud is the way people do business. The focus has moved to Hybrid IT, with infrastructure located at various places, allowing customers to take advantage of best-of-breed options based on their needs. The market is hybrid, and customers need to integrate data flows between these services. The marketplace still lacks tools to manage this.

Looking forward to Think 2018? Interested in new AI and machine learning, but the key focus for the event is talking to real users and seeing real applications. Focus on actual deployments of this technology is more important than what is coming.

Advice for Event? Comfortable shoes. Allow time for unexpected things to happen – attend new talks based on speakers or topics you don’t know much about.

Podcast – Yadin Porter de León on critical open source community failings

Joining us this week is Yadin Porter de León (@porterdeleon), who leads IT Community at Druva, is part of the Level Up Project, and hosts the Tech Village podcast.

Highlights

  • Open Source Communities and the People
  • Relationship of Corporations in Open Source and Community
  • Do users of open source care about community?
  • Communities should FOCUS and not overlap into adjacencies

Topic                                                                               Time (Minutes.Seconds)

Introduction                                                                     0.0 –  1.48
Community and people                                                1.48 – 2.14
Impact of code released on users                              2.14 – 5.47 (Community at speed of code)
Code can be an ugly child                                            5.47 – 8.55
Community can be an ugly child                                8.55 – 15.55 (Plamondon Podcast)
What open source is commoditizing                         15.55 – 20.25 (SIGs in projects & control)
How open source is being consumed                       20.25 – 25.53 (Guidance from Corp in OS)
Standard bodies and open source                             25.53 – 28.51 (Code has inertia)
Community changes to solve these issues               28.51 – 34.39 (Focus on core of project)
Defensive communities from code                            34.39 – 39.18 (Community volatility)
Projects must address existing customers               39.18 – 41.58
Wrap Up                                                                          41.58 – END

Podcast Guest: Yadin Porter de León, IT Community at Druva and Founder of Level Up Project

Yadin has been a B2B technology change agent across multiple industries for over 10 years. He currently leads content marketing, influence marketing and IT community at Druva and is the founder of the Level Up Project.  Yadin has helped grow companies small and large as a leader, individual contributor, and board member and, as a steering committee member of the Silicon Valley VMUG chapter, he has also been an event speaker and organizer.