We need DevOps without Borders! Is that “Hybrid DevOps?”

The RackN team has been working on making DevOps more portable for over five years.  Portability between vendors, sites, tools and operating systems means that our automation needs to be hybrid in multiple dimensions by design.

Why drive for hybrid?  It’s about giving users control.

I believe that applications should drive the infrastructure, not the reverse.  I’ve heard many times that the “infrastructure should be invisible to the user.”  Unfortunately, lack of abstraction and composability makes it difficult to code across platforms.  I like the term “fidelity gap” to describe the cost of these differences.

What keeps DevOps from going hybrid?  Shortcuts related to platform entangled configuration management.

Everyone wants to get stuff done quickly; however, we make the same hard-coded ops choices over and over again.  Big bang configuration automation that embeds sequence assumptions into the script is not just technical debt, it’s fragile and difficult to upgrade or maintain.  The problem is not configuration management (that’s a critical component!), it’s the lack of system level tooling that forces us to overload the configuration tools.

What is system level tooling?  It’s integrating automation that expands beyond configuration into managing sequence (aka orchestration), service orientation, script modularity (aka composability) and multi-platform abstraction (aka hybrid).

My ops automation experience says that these four factors must be solved together because they are interconnected.

What would a platform that embraced all these ideas look like?  Here is what we’ve been working towards with Digital Rebar at RackN:

Mono-Infrastructure IT vs. “Hybrid DevOps”

  • Locked into a single platform → Portable between sites and infrastructures with layered ops abstractions.
  • Limited interop between tools → Adaptive to mix and match best-for-job tools.  Use the right scripting for the job at hand and never force-migrate working automation.
  • Ad hoc security based on site specifics → Secure using repeatable automated processes.  We fail at security when things get too complex to change and adapt.
  • Difficult to reuse ops tools → Composable Modules enable Ops Pipelines.  We have to be able to interchange parts of our deployments for collaboration and upgrades.
  • Fragile Configuration Management → Service orientation simplifies API integration.  The number of APIs and services is increasing; configuration management alone is not sufficient.
  • Big bang “configure then deploy” scripting → Orchestrated action, because sequence matters.  Building a cluster requires sequential (often iterative) operations between nodes in the system.  We cannot build robust deployments without ongoing control over the order of operations.

Should we call this “Hybrid DevOps?”  That sounds so buzz-wordy!

I’ve come to believe that Hybrid DevOps is the right name.  More technical descriptions like “composable ops” or “service oriented devops” or “cross-platform orchestration” just don’t capture the real value.  All these names fail to capture the portability and multi-system flavor that drives the need for user control of hybrid in multiple dimensions.

Simply put, we need devops without borders!

What do you think?  Do you have a better term?

DNS is critical – getting physical ops integrations right matters

Why DNS? Maintaining DNS is essential to scale ops.  It’s not as simple as naming servers because each server will have multiple addresses (IPv4, IPv6, teams, bridges, etc.) on multiple NICs depending on the system’s function and applications. Plus, errors in DNS are hard to diagnose.

I love talking about the small Ops things that make a huge impact on the quality of automation.  Things like automatically building a squid proxy cache infrastructure.

Today, I get to rave about the DNS integration that just surfaced in the OpenCrowbar code base. RackN CTO, Greg Althaus, just completed work that incrementally updates DNS entries as new IPs are added into the system.

Why is that a big deal?  There are a lot of names & IPs to manage.

In physical ops, every time you bring up a physical or virtual network interface, you are assigning at least one IP to that interface. For OpenCrowbar, we are assigning two addresses: IPv4 and IPv6.  Servers generally have 3 or more active interfaces (e.g.: BMC, admin, internal, public and storage) so that’s a lot of references.  It gets even more complex when you factor in DNS round robin or other common practices.

Plus mistakes are expensive.  Name resolution is an essential service for operations.

I know we all love memorizing IPv4 addresses (just wait for IPv6!), so accurate naming is essential.  OpenCrowbar already aligns the 4th octet of addresses (Admin .106 goes to the same server as BMC .106), but that’s not always practical or useful.  This is not just a Day 1 problem – DNS drift or staleness becomes an increasingly challenging problem when you have to reallocate IP addresses.  The simple fact is that registering IPs is not the hard part of this integration – it’s the flexible and dynamic updates.

What DNS automation did we enable in OpenCrowbar?  Here’s a partial list:

  1. recovery of names and IPs when interfaces and systems are decommissioned
  2. use of flexible naming patterns so that you can control how the systems are registered (see the sketch after this list)
  3. ability to register names in multiple DNS infrastructures
  4. ability to understand sub-domains so that you can map DNS by region
  5. ability to register the same system under multiple names
  6. wild card support for C-Names
  7. ability to create a DNS round-robin group and keep it updated
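
To make the flexible naming patterns idea concrete, here is a minimal sketch of how a pattern could be expanded from discovery facts using Go’s text/template.  The pattern syntax and field names below are illustrative stand-ins, not OpenCrowbar’s actual implementation.

```go
package main

import (
	"fmt"
	"os"
	"text/template"
)

// NodeInfo holds the discovery facts we might feed into a naming pattern.
// Field names here are illustrative, not OpenCrowbar's actual schema.
type NodeInfo struct {
	Role   string // e.g. "admin", "bmc", "storage"
	Serial string // hardware serial or asset tag
	Region string // sub-domain selector, e.g. "dfw" or "lon"
	Domain string // base DNS zone
}

func main() {
	// A hypothetical operator-supplied pattern: role, serial and region
	// combine into a fully qualified name within a regional sub-domain.
	pattern := "{{.Role}}-{{.Serial}}.{{.Region}}.{{.Domain}}"

	tmpl, err := template.New("dnsname").Parse(pattern)
	if err != nil {
		fmt.Fprintln(os.Stderr, "bad pattern:", err)
		os.Exit(1)
	}

	node := NodeInfo{Role: "admin", Serial: "7xk42", Region: "dfw", Domain: "example.com"}
	if err := tmpl.Execute(os.Stdout, node); err != nil {
		fmt.Fprintln(os.Stderr, "expand failed:", err)
		os.Exit(1)
	}
	fmt.Println() // -> admin-7xk42.dfw.example.com
}
```

Because the operator owns the pattern, the same discovery data can register a node under several names or in region-specific sub-domains without touching the automation itself.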

But there’s more! The integration covers both BIND and PowerDNS. Since BIND does not have an API that allows incremental additions, Greg added a Golang service to wrap BIND and provide incremental updates and deletes.
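
For context, here is a rough sketch of the shape such a wrapper can take; it is not Greg’s actual service.  It assumes BIND is configured to accept dynamic updates with a TSIG key and simply shells out to the standard nsupdate tool; the listen port, key path and URL layout are made up for illustration.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"os/exec"
	"strings"
)

// applyUpdate feeds an nsupdate script to the local BIND server. This assumes
// the zone allows dynamic updates via a TSIG key (key path is hypothetical).
func applyUpdate(script string) error {
	cmd := exec.Command("nsupdate", "-k", "/etc/nsupdate.key")
	cmd.Stdin = strings.NewReader(script)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("nsupdate failed: %v: %s", err, out)
	}
	return nil
}

// recordHandler adds or deletes a single A record based on the HTTP verb.
// Query parameters: name=<fqdn> ip=<address>
func recordHandler(w http.ResponseWriter, r *http.Request) {
	name := r.URL.Query().Get("name")
	ip := r.URL.Query().Get("ip")
	if name == "" {
		http.Error(w, "name is required", http.StatusBadRequest)
		return
	}

	var script string
	switch r.Method {
	case http.MethodPut:
		script = fmt.Sprintf("update add %s 300 A %s\nsend\n", name, ip)
	case http.MethodDelete:
		script = fmt.Sprintf("update delete %s A\nsend\n", name)
	default:
		http.Error(w, "use PUT or DELETE", http.StatusMethodNotAllowed)
		return
	}

	if err := applyUpdate(script); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	fmt.Fprintln(w, "ok")
}

func main() {
	// Hypothetical port; the orchestration would call PUT/DELETE on /record
	// every time an interface gains or loses an address.
	http.HandleFunc("/record", recordHandler)
	log.Fatal(http.ListenAndServe(":6754", nil))
}
```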

When we talk about infrastructure ops automation and ready state, this is the type of deep integration that makes a difference and is the hallmark of the RackN team’s ops focus with RackN Enterprise and OpenCrowbar.

Docker-Machine Crowbar Driver Delivers Metal Containers

I’ve just completed a basic Docker Machine driver for OpenCrowbar.  This enables you to quickly spin up (and down) remote Docker hosts on bare metal servers from the Docker Machine command line tool.  There are significant cost, simplicity and performance advantages to this approach if you are already planning to dedicate servers to container workloads.

Docker Machine

The basics are pretty simple: using Docker Machine CLI you can “create” and “rm” new Docker hosts on bare metal using the crowbar driver.  Since we’re talking about metal, “create” is really “assign a machine from an available pool.”

Behind the scenes, Crowbar is doing a full provision cycle of the system, including installing the operating system and injecting the user keys.  Crowbar’s design would allow operators to automatically inject additional steps into the provisioning process, such as adding monitoring agents and security, without changing the driver operation.

Beyond Create, the driver supports the other Machine verbs like remove, stop, start, ssh and inspect.  In the case of remove, the Machine is cleaned up and put back in the pool for the next user [note: work remains on the full remove>recreate process].

Overall, this driver allows Docker Machine to work transparently against metal infrastructure alongside whatever cloud services you also choose.
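
To give a feel for how thin the metal-as-cloud layer can be, here is a hypothetical sketch of the allocate/release calls a driver like this might make.  The endpoint paths, port and payloads are placeholders, not the actual OpenCrowbar or Docker Machine APIs; the point is that “create” and “rm” reduce to pool operations.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// crowbarClient is a tiny illustrative client; the endpoint paths and
// payloads below are hypothetical, not the real OpenCrowbar API.
type crowbarClient struct {
	base string // e.g. "http://192.168.124.10:3000" (placeholder)
	user string
	pass string
}

// allocate asks Crowbar to move an available node into the Docker-Machines
// pool; Crowbar then runs its normal provision cycle (OS install, key inject).
func (c *crowbarClient) allocate(name string) error {
	body, _ := json.Marshal(map[string]string{"name": name, "pool": "docker-machines"})
	req, err := http.NewRequest("PUT", c.base+"/api/v2/machines", bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.SetBasicAuth(c.user, c.pass)
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("allocate failed: %s", resp.Status)
	}
	return nil
}

// release puts the node back into the system pool so the next "create" can
// reuse it (the metal equivalent of destroying a cloud instance).
func (c *crowbarClient) release(name string) error {
	req, err := http.NewRequest("DELETE", c.base+"/api/v2/machines/"+name, nil)
	if err != nil {
		return err
	}
	req.SetBasicAuth(c.user, c.pass)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("release failed: %s", resp.Status)
	}
	return nil
}

func main() {
	c := &crowbarClient{base: "http://192.168.124.10:3000", user: "crowbar", pass: "crowbar"}

	// "docker-machine create" maps to allocate; "docker-machine rm" maps to release.
	if err := c.allocate("testme"); err != nil {
		fmt.Println("create failed:", err)
		return
	}
	fmt.Println("node allocated; Crowbar is provisioning it now")

	if err := c.release("testme"); err != nil {
		fmt.Println("rm failed:", err)
	}
}
```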

Want to try it out?

  1. You need to set up OpenCrowbar – if you follow the defaults (192.168.124.10 IP, user, password) then the Docker Machine driver defaults will also work. Also, make sure you have the Ubuntu 14.04 ISO available for the Crowbar provisioner.
  2. Discover some nodes in Crowbar – you do NOT need metal servers to try this, the tests work fine with virtual machines (tools/kvm-slave &)
  3. Clone my Machine repo (we’re looking for feedback before a pull to Docker/Machine)
  4. Compile the code using script/build.
  5. Allocate a Docker node using ./docker-machine create --driver crowbar testme
  6. Go to the Crowbar UI to watch the node be provisioned and configured into the Docker-Machines pool
  7. Release the node using ./docker-machine rm testme
  8. Go to the Crowbar UI to watch the node be redeployed back to the System pool
  9. Try to contain your enthusiasm 🙂

Want More?  Linux binary & readme.

Manage Hardware like a BOSS – latest OpenCrowbar brings API to Physical Gear

A few weeks ago, I posted about VMs being squeezed between containers and metal.   That observation comes from our experience fielding the latest metal provisioning feature sets for OpenCrowbar, so it’s exciting to see the team has cut the next quarterly release:  OpenCrowbar v2.2 (aka Camshaft).  Even better, you can top it off with official software support.

Camshaft coordinates activity

Dual overhead camshaft housing by Neodarkshadow from Wikimedia Commons

The Camshaft release had two primary objectives: Integrations and Services.  Both build on the unique functional operations and ready state approach in Crowbar v2.

1) For Integrations, we’ve been busy leveraging our ready state API to make physical servers work like a cloud.  It gets especially interesting with the RackN burn-in/tear-down workflows added in.  Our prototype Chef Provisioning driver showed how you can use the Crowbar API to spin servers up and down.  We’re now expanding this cloud-like capability for Saltstack, Docker Machine and Pivotal BOSH.

2) For Services, we’ve taken ops decomposition to a new level.  The “secret sauce” for Crowbar is our ability to interweave ops activity between components in the system.  For example, building a cluster requires setting up pieces on different systems in a very specific sequence.  In Camshaft, we’ve added externally registered services (using Consul) into the orchestration.  That means that Crowbar will either use existing DNS, Database, or NTP services or set up its own.  Basically, Crowbar can now FIT YOUR EXISTING OPS ENVIRONMENT without forcing dedicated Crowbar-only services like DHCP or DNS.
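
As an illustration of the external registration idea, here is a small sketch that registers an existing DNS server with a local Consul agent using Consul’s standard agent service registration endpoint.  The service name and address are placeholders, and this is not Crowbar’s internal code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// serviceDef mirrors a subset of the fields Consul's agent service
// registration accepts; the DNS server address below is illustrative.
type serviceDef struct {
	Name    string   `json:"Name"`
	Address string   `json:"Address"`
	Port    int      `json:"Port"`
	Tags    []string `json:"Tags,omitempty"`
}

func main() {
	// Register an existing (external to Crowbar) DNS server with the local
	// Consul agent so orchestration can discover and reuse it instead of
	// standing up a dedicated one.
	svc := serviceDef{Name: "dns", Address: "10.0.0.53", Port: 53, Tags: []string{"existing"}}

	body, err := json.Marshal(svc)
	if err != nil {
		panic(err)
	}
	req, err := http.NewRequest("PUT", "http://127.0.0.1:8500/v1/agent/service/register", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("consul replied:", resp.Status)
}
```

Once the service is in Consul, any orchestration step that needs DNS can look it up by name rather than assuming a Crowbar-provided instance.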

In addition to all these features, you can now purchase support for OpenCrowbar from RackN (my company).  The Enterprise version includes additional server life-cycle workflow elements and features like HA and Upgrade as they are available.

There are AMAZING features coming in the next release (“Drill”) including a message bus to broadcast events from the system, more operating systems (ESXi, Xenserver, Debian and Mirantis’ Fuel) and increased integration/flexibility with existing operational environments.  Several of these have already been added to the develop branch.

It’s easy to set up and test OpenCrowbar using containers, VMs or metal.  Want to learn more?  Join our community in Gitter, on the email list or at the weekly interactive community meetings (Wednesdays @ 9am PT).

Talking Functional Ops & Bare Metal DevOps with vBrownBag [video]

Last Wednesday (3/11/15), I had the privilege of talking with the vBrownBag crowd about Functional Ops and bare metal deployment.  In this hour, I talk about how functional operations (FuncOps) works as an extension of ready state.  FuncOps is a critical concept for providing abstractions to scale heterogeneous physical operations.

Timing for this was fantastic since we’d just worked out ESXi install capability for OpenCrowbar (it will be exposed for work starting in Drill, the next Crowbar release cycle).

Here’s the brown bag:

If you’d like to see a demo, I’ve got hours of them posted:

Video Progression

Crowbar v2.1 demo: Visual Table of Contents [click for playlist]

too easy to bare metal? Ansible just works with OpenCrowbar

I’ve talked before about how OpenCrowbar distributes SSH keys automatically as part of its deployment process.  Now, it’s time to unleash some of the subsequent magic!

[5/21 Update: We added the “crowbar-access” role to the Drill release that allows you to inject/remove keys on a per node basis from the API or CLI at any point in the node life-cycle]

If you provision servers with your keys in place, then Ansible will just work with truly minimal configuration (one line in a file!).

Video Demo (steps below):

Here are my steps:

  1. Install OpenCrowbar and run some nodes to ready state [videos]
  2. Install Ansible [simple steps]
  3. Add hosts range “192.168.124.[81:83]  ansible_ssh_user=root” to the
    “/etc/ansible/hosts” file
  4. If you are really lazy, add a “[defaults]” section with “host_key_checking = False” to your “~/.ansible.cfg” file
  5. now ping the hosts, “ansible all -m ping”
  6. pat yourself on the back, you’re done.
  7. to show off:
    1. touch all machines: ansible all -a “/bin/echo hello”
    2. look at types of Linux: ansible all -a “uname -a”

Further integration work can make this even more powerful.  

I’d like to see OpenCrowbar generate the Ansible inventory file from the discovery data and to map Ansible groups from deployments.  Crowbar could also call Ansible directly to use playbooks or even do a direct hand-off to Tower to complete an install without user intervention.
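
To sketch what that could look like, here is a hypothetical generator that pulls node data from a Crowbar-style API and emits an INI inventory grouped by deployment.  The API path and JSON fields are stand-ins, not the real OpenCrowbar schema; only the inventory format (ansible_ssh_host / ansible_ssh_user) is standard Ansible.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// node is a simplified view of Crowbar discovery data; the field names and
// the API path below are hypothetical stand-ins, not the real schema.
type node struct {
	Name       string `json:"name"`
	Address    string `json:"address"`
	Deployment string `json:"deployment"`
}

func main() {
	resp, err := http.Get("http://192.168.124.10:3000/api/v2/nodes")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	var nodes []node
	if err := json.NewDecoder(resp.Body).Decode(&nodes); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Group hosts by deployment so each deployment becomes an Ansible group.
	groups := map[string][]node{}
	for _, n := range nodes {
		groups[n.Deployment] = append(groups[n.Deployment], n)
	}

	// Emit an INI-style inventory: [group] then "host ansible_ssh_host=ip" lines.
	for name, members := range groups {
		fmt.Printf("[%s]\n", name)
		for _, n := range members {
			fmt.Printf("%s ansible_ssh_host=%s ansible_ssh_user=root\n", n.Name, n.Address)
		}
		fmt.Println()
	}
}
```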

Wow, that would be pretty handy!   If you think so too, please join us in the OpenCrowbar community.

Online Meetup Today (1/13): Build a rock-solid foundation under your OpenStack cloud

Reminder: Online meetup w/ Crowbar + OpenStack DEMO TODAY

Here’s the notice from the site (with my added picture)

Building cloud infrastructure requires a rock-solid foundation. 

In this hour, Rob Hirschfeld will demo automated tooling, specifically OpenCrowbar, to prepare and integrate physical infrastructure to ready state and then use PackStack to install OpenStack.


The OpenCrowbar project started in 2011 as an OpenStack installer and has grown into a general purpose provisioning and infrastructure orchestration framework that works in parallel with multiple hardware vendors, operating systems and devops tools.  These tools create a fast, durable and repeatable environment to install OpenStack, Ceph, Kubernetes, Hadoop or other scale platforms.


Rob will show off the latest features and discuss key concepts from the Crowbar operational model including Ready State, Functional Operations and Late Binding. These concepts, built into Crowbar, can be applied generally to make your operations more robust and scalable.

OpenCrowbar v2.1 Video Tour from Metal to OpenStack and beyond

With OpenCrowbar v2.1 out, I’ve been asked to update the video library of Crowbar demos.  Since a complete tour is about 3 hours, I decided to cut it down into focused demos that would allow you to start at an area of interest and work backwards.

I’ve linked all the videos below by title.  Here’s a visual table of contents:

Video Progression

Crowbar v2.1 demo: Visual Table of Contents [click for playlist]

The heart of the demo series is the Annealer and Ready State (video #3).

  1. Prepare Environment
  2. Bootstrap Crowbar
  3. Add Nodes ♥ Ready State (good starting point)
  4. Boot Hardware
  5. Install OpenStack (Juno using PackStack on CentOS 7)
  6. Integrate with Chef & Chef Provisioning
  7. Integrate with SaltStack

I’ve tried to do some post-production to limit dead air and focus on key areas.  As always, I value content over production values, so feedback is very welcome!

Nextcast #14 Transcription on OpenStack & Crowbar > “we can’t hand out trophies to everyone”

Last week, I was a guest on the NextCast OpenStack podcast hosted by Niki Acosta (EMC) [Jeff Dickey could not join].   I’ve taken some time to transcribe highlights.

We had a great discussion about OpenStack, Ops and Crowbar.  I appreciate Niki’s insightful questions and the opportunity to share my opinions.  I feel that we covered years of material in just 1 hour and I appreciate the opportunity to appear on the podcast.

Video from full post (youtube) and the audio for download.

Plus, a FULL TRANSCRIPT!  Here’s my Next Cast #14 Short Transcription

The objective of this transcription is to help navigate the recording, not replace it.  I did not provide complete context for remarks.

  • 04:30 Birth of Crowbar (to address Ops battle scars)
  • 08:00 The need for repeatable Ready State baseline to help community work together
  • 10:30 Should hardware matter in OpenStack? It has to; details and topology matter, not vendor.
  • 11:20 OpenCompute – people are trying to open source hardware design
  • 11:50 When you are dealing with hardware, it matters. You have to get it right.
  • 12:40 Customers are hardware heterogeneous by design (and for ops tooling). Crowbar is neutral territory
  • 14:50 It’s not worth telling people they are wrong, because they are not. There are a lot of right ways to install OpenStack
  • 16:10 Sometimes people make expensive choices because it’s what they are comfortable with, and it’s not helpful for me to tell them they are wrong – they are not.
  • 16:30 You get into a weird corner if you don’t tell anyone no. And an equally weird corner if you tell everyone yes.
  • 18:00 Aspirations of having an interoperable cloud was much harder than the actual work to build it
  • 18:30 The community wants to say yes, “bring your code,” but to operators that’s very frustrating because they want to be able to make substitutions
  • 19:30 Thinking that if something is included then it’s required – that’s not clear
  • 19:50 Interlock Dilemma [see my back reference]
  • 20:10 Orwell Animal Farm reference – “all animals equal but pigs are more equal”
  • 22:20 Rob defines DefCore, it’s not big and scary
  • 22:35 DefCore is about commercial use, not running the technical project
  • 23:35 OpenStack had to make money for the companies that are paying for the developers who participate… they need to see ROI
  • 24:00 OpenStack is an infrastructure project, stability is the #1 feature
  • 24:40 You have to give a reason why you are saying no and a path to yes
  • 25:00 DefCore is test driven: quantitative results
  • 26:15 Balance between whole project and parts – examples are Swiftstack (wants Object only) and Dreamhost (wants Compute only)
  • 27:00 DefCore created core components vs platform levels
  • 27:30 No vendor has said they can implement DefCore without some effort
  • 28:10 We have outlets for vendors who do not want to implement the process
  • 28:30 The Board is not in a position to make technical call about what’s in, we had to build a process for community input
  • 29:10 We had to define something that could say, “this is it and we have to move on”
  • 29:50 What we want is for people to start with the core and then bring in the other projects. We want to know what people are adding so we can make that core in time
  • 30:10 This is not a recommendation, it is a base.
  • 30:35 OpenStack is a bubble – it does not help if we just get together to pat each other on the back, we want to have a thriving ecosystem
  • 31:15 Question: “have vendors been selfish”
  • 31:35 Rob rephrased as “does OpenStack have a tragedy of the commons problem”
  • 32:30 We need to make sure that everyone is contributing back upstream
  • 32:50 Benefit of a Benevolent Dictator is that they can block features unless community needs are met
  • 33:10 We have NOT made it clear where companies should be contributing to the community. We are not doing a good job directing community efforts
  • 33:45 Hidden Influencers becomes OpenStack Product group
  • 34:55 Hidden Influencers were not connecting at the summit in a public way (like developers were)
  • 35:20 Developers could not really make big commitments of their time without the buy in from their managers (product and line)
  • 35:50 Subtle selfishness – focusing on your own features can disrupt the whole release where things would flow better if they helped others
  • 37:40 Rob was concerned that there was a lot of drift between developers and company’s product descriptions
  • 38:20 BYLAWS CHANGES – vote! here’s why we need to change
  • 38:50 Having whole projects designated as core sucks – code in core should be slower and less changing. Innovation at the core will break interoperability
  • 39:40 Hoping that core will help product managers understand where they are using the standard and adding values
  • 41:10 All babies are ugly > with core, that’s good. We are looking for the grown ups who can do work and deliver value. Babies are things you nurture and help grow because they have potential.
  • 42:00 We undermine our credibility in the community when we talk about projects that are babies as if they were ready.
  • 43:15 DefCore’s job was to help pick projects. If everyone is core then we look like a youth soccer team where everyone is getting a trophy
  • 44:30 Question: “What do you tell to users to instill confidence in OpenStack”
  • 44:50 first thing: focus on operations and automation. Table stakes (for any cloud) is getting your deployments automated. Puppies vs Cattle.
  • 45:25 People who were successful with early OpenStack were using automated deployments against the APIs.
  • 46:00 DevOps is a fundamental part of cloud computing – if you’re hand-built and not automated then you are old school IT.
  • 46:40 Niki references Gartner “Bimodal IT” [excellent reference, go read it!]
  • 47:20 VMWare is a great crutch for OpenStack. We can use VMWare for the puppies.
  • 47:45 OpenStack is not going to run on every server (perhaps that’s heresy) but it does not make sense for every workload
  • 48:15 One size does not fit all – we need to be good at what we’re good at
  • 48:30 OpenStack needs to focus on doing something really well. That means helping people who want to bring automated workloads into the cloud
  • 49:20 Core was about sending a signal about what’s ready and people can rely on
  • 49:45 Back in 2011, I was saying OpenStack was ready for people who would make the operational investment
  • 50:30 We use Crowbar because it makes it easier to do automated deployments for infrastructure like Hadoop and Ceph where you want access to the physical media
  • 51:00 We should be encouraging people to use OpenStack for its use cases
  • 51:30 Existential question for OpenStack: are we a suite or product. The community is split here
  • 51:30 In comparing with Amazon, does OpenStack have to implement it or build an ecosystem to compete
  • 53:00 As soon as you make something THE OpenStack project (like Heat) you are sending a message that the alternates are not welcome
  • 54:30 OpenStack ends up in a trap if we pick a single project and make it the way that we are going to do something. New implementations are going to surface from WITHIN the projects and we need to be ready for that.
  • 55:15 new implementations are coming, we have to be ready for that. We can make ourselves vulnerable to splitting if we do not prepare.
  • 56:00 API vs Implementation? This is something that splits the community. Ultimately we want to be an API spec but we are not ready for that. We have a lot of work to do first using the same code base.
  • 56:50 DefCore has taken a balanced approach using our diversity as a strength
  • 57:20 Bylaws did not allow for enough flexibility for what is core
  • 59:00 We need voters for the quorum!
  • 59:30 Rob recommended Rocky Grober (Huawei) and Shamail Tahir (EMC) for future shows

unBIOSed? Is Redfish an IPMI retread or can vendors find unification?

Server management interfaces stink.  They are inconsistent both between vendors and within their own product suites.  Ideally, vendors would agree on a single API; however, it’s not clear if the diversity is a product of competition or actual platform variation.  Likely, it’s both.

What is Redfish?  It’s a REST API for server configuration that aims to replace both IPMI and vendor-specific server interfaces (like WSMAN).  Here’s the official text from RedfishSpecification.org.

Redfish is a modern intelligent [server] manageability interface and lightweight data model specification that is scalable, discoverable and extensible.  Redfish is suitable for a multitude of end-users, from the datacenter operator to an enterprise management console.
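
To show how simple the model is, here is a minimal sketch that lists the systems a Redfish service exposes.  The /redfish/v1/Systems path comes from the spec; the BMC address and credentials are placeholders, and real code should verify TLS certificates rather than skipping them.

```go
package main

import (
	"crypto/tls"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

// systemsCollection captures just enough of a Redfish collection response to
// list its members; @odata.id links are part of the Redfish data model.
type systemsCollection struct {
	Members []struct {
		ID string `json:"@odata.id"`
	} `json:"Members"`
}

func main() {
	// BMC address and credentials are placeholders for this sketch.
	const bmc = "https://10.0.0.42"

	// Many BMCs ship self-signed certificates, so verification is skipped
	// here for illustration only; a real tool should pin or verify the cert.
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}

	req, err := http.NewRequest("GET", bmc+"/redfish/v1/Systems", nil)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	req.SetBasicAuth("admin", "password")

	resp, err := client.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	var col systemsCollection
	if err := json.NewDecoder(resp.Body).Decode(&col); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, m := range col.Members {
		fmt.Println("system:", m.ID)
	}
}
```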

I think that it’s great to see vendors trying to get on the same page and I’m optimistic that we could get something better than IPMI (that’s a very low bar).  However, I don’t expect that vendors can converge to a single API; it’s just not practical due to release times and pressures to expose special features.  I think the divergence in APIs is due both to competitive pressures and to real variance between platforms.

Even if we manage to achieve a grand server management unification, the problem of interface heterogeneity has a long legacy tail.

In the best case reality, we’re going from N versions to N+1 (and likely N*2) versions because the legacy gear is still around for a long time.  Adding Redfish means API sprawl is going to get worse until it gets back to being about the same as it is now.

Putting pessimism aside, the sprawl problem is severe enough that it’s worth supporting Redfish on the hope that it makes things better.

That’s easy to say, but expensive to do.  If I were making hardware (I left Dell in Oct 2014), I’d consider it an expensive investment for an uncertain return.  Even so, several major hardware players are stepping forward to help standardize.  I think Redfish would have good ROI for smaller vendors looking to displace a major player, since they can ride on the standard.

Redfish is GREAT NEWS for me since RackN/Crowbar provides hardware abstraction and heterogeneous interface support.  More API variation makes my work more valuable.

One final note: if Redfish improves hardware security in a real way then it could be a game changer; however, embedded firmware web servers can be tricky to secure and patch compared to larger application-focused software stacks.  This is one area where I’m hoping to see a lot of vendor collaboration!  [note: this should be its own subject – the security issue is more than API, it’s about system-wide configuration.  Stay tuned!]