About Rob H

A Baltimore transplant to Austin, Rob drives an electric car, the RAVolt, and thinks about ways of building software for the clouds using Agile processes. He works for Dell and works software to enable hyperscale clouds.

Can’t Contain(erize) the Hype – is Docker real or a bubble?

The new application portability darling, Docker, was so popular at this week’s Red Hat Summit that I was expecting Miley Cyrus’ flock of paparazzi to abandon in her favor of Ben Golub.

Personally, I find Docker to be a useful tool and we’ve been embedding it into our dev and test processes in useful ways for DefCore TCUP (at Conference), OpenCrowbar Admin and Dev Nodes.  To me, these a concrete and clear use cases.

There are clearly a lot more great use cases for Docker but I can’t help but feel like it’s being thrown into architectural layer cakes and markitectures as a substitute for the non-worlds “cloud”, “amazing” and “revolutionary.”

How do I distinguish hot from hype?  I look for places where Docker is solving just one problem set instead being a magic wand solution to a raft of systemic issues.

Places where I think Docker is potent and disruptive

  • Creating a portable and consistent environment for dev, test and delivery
  • Helping Linux distros keep updating the kernel without breaking user space (RHEL 7 anyone?)
  • Reducing the virtualization overhead of tenant isolation (containers are lighter)
  • Reducing the virtualization overhead for DevOps developers testing multi-node deployments

But I’m concerned that we’re expecting too many silver bullets

  • Packaging is still tricky:  Creating a locked box helps solve part of downstream problem (you know what you have) but not the upstream problem (you don’t know what you depend on).
  • Container sprawl: Breaking deployments into more functional discrete parts is smart, but that means we have MORE PARTS to manage.   There’s an inflection point between separation of concerns and sprawl.
  • PaaS Adoption: Docker helps with PaaS but it does not solve neither the “you have to model your apps for a PaaS” nor the “PaaS needs scalable data services” problems

Speaking of Miley Cyrus, it’s not the container that matters, but what’s on the inside.  Docker can take a lesson from Miley: attention is great but you’ve still got to be able to sing.    I’m not sure about Miley, but I am digging the tracks that Docker is laying down.  Docker is worth putting on your play list.

Rocking Docker – OpenCrowbar builds solid foundation & life-cycle [VIDEOS]

Docker has been gathering a substantial about of interest as an additional way to solve application portability and dependency hell.  We’ve been enthusiastic participants in this fledgling community (Docker in OpenStack) and my work in DefCore’s Tempest in a Container (TCUP).

flying?  not flying!In OpenCrowbar, we’ve embedded Docker much deeper to solve a few difficult & critical problems: speeding up developing multi-node deployments and building the environment for the containers.  Check out my OpenCrowbar does Docker video or the community demo!

Bootstrapping Docker into a DevOps management framework turns out to be non-trivial because integrating new nodes into a functioning operating environment is very different on Docker than using physical servers or a VMs.  Containers don’t PXE boot and have more limited configuration options.

How did we do this?  Unlike other bare metal provisioning frameworks, we made sure that Crowbar did not require DHCP+PXE as the only node discovery process.  While we default to and fully support PXE with our sledgehammer discovery image, we also allow operators to pre-populate the Crowbar database using our API and make configuration adjustments before the node is discovered/created.

We even went a step farther and enabled the Crowbar dependency graph to take alternate routes (we call it the “provides” role).  This enhancement is essential for dealing with “alike but different” infrastructure like Docker.

The result is that you can request Docker nodes in OpenCrowbar (using the API only for now) and it will automatically create the containers and attach them into Crowbar management.  It’s important to stress that we are not adding existing containers to Crowbar by adding an agent; instead, Crowbar manages the container’s life-cycle and then then work inside the container.

Getting around the PXE cycle using containers as part of Crowbar substantially improves Ops development cycle time because we don’t have to wait for boot > discovery > reboot > install to create a clean environment.  Bringing fresh Docker containers into a dev system takes seconds instead,

The next step is equally powerful: Crowbar should be able to configure the Docker host environment on host nodes (not just the Admin node as we are now demonstrating).  Setting up the host can be very complex: you need to have the correct RAID, BIOS, Operating System and multi-NIC networking configuration.  All of these factors must be done with a system perspective that match your Ops environment.  Luckily, this is exactly Crowbar’s sweet spot!

Until we’ve got that pulled together, OpenCrowbar’s ability to use upstream cookbooks and this latest Dev/Test focused step provides remarkable out of the gate advantages for everyone build multi-node DevOps tools.

Enjoy!

PS: It’s worth noting that we’ve already been using Docker to run & develop the Crowbar Admin server.  This extra steps makes Crowbar even more Dockeriffic.

Running with scissors > DefCore “must-pass” Road Show Starts [VIDEOS]

The OpenStack DefCore committee has been very active during this cycle turning the core definition principles into an actual list of “must-pass” capabilities (working page).  This in turn gives the community something tangible enough to review and evaluate.

Capabilities SelectionTL;DR!  We appreciate those in the community who have been patient enough to help define and learn the process we’re using the make selections; however, we also recognize that most people want to jump to the results.

This week, we started a “DefCore roadshow” with the goal of learning how to make this huge body of capabilities, process and impact easier to digest (draft write-up for review & Troy Toman’s notes).  So far we’ve had two great sessions on this topic.  We took notes and recorded at both meetups (San Francisco & Austin).

My takeaways of these initial meetups are:

  • Jump to the Capabilities right away, the process history is not needed up front
  • You need more graphics – specifically, one for the selection criteria (what do you think of my 1st attempt?)
  • Work from some examples of scored capabilities
  • Include some specific use-cases with a user, 2 types of private cloud and a public cloud to help show the impact

Overall, people like what they are hearing.  It makes sense and decisions are justified.

We need more feedback!  Please help us figure out how to explain this for the broader community.

OpenCrowbar Multi-OS deploy from Docker Admin

Last week I talked about OpenCrowbar reaching a critical milestone and this week I’ve posted two videos demonstrating how the new capabilities work.

annealingThe first video highlights the substantial improvements we’ve made testing and developing OpenCrowbar.  By using Docker containers, OpenCrowbar is fast and reliable to setup and test.  We’ve dramatically streamlined the development environment and consolidated the whole code base into logical groups with logical names.

The second video shows off the OpenCrowbar doing it’s deployment work (including setting up Docker nodes!).  This demonstration goes through the new node discovery and install process.  The new annealing process is very transparent and gives clear and immediate feedback about the entire discovery and provisioning process.  I also show how to configure networks (IPv4 and IPv6) and choose which operating system gets installed.

Note: In the videos, I demonstrate using our Docker install process.  Part of moving from Crowbar v2 (in the original Crowbar repo) to OpenCrowbar was so that we could also organize the code for an RPM install.  In either install process, OpenCrowbar no longer uses bloated ISOs with all components pre-cached so you must be connected to the Internet to complete the installation.

Mayflies and Dinosaurs (extending Puppies and Cattle)

Dont Be FragileJosh McKenty and I were discussing the common misconception of the “Puppies and Cattle” analogy. His position is not anti-puppy! He believes puppies are sometimes unavoidable and should be isolated into portable containers (VMs) so they can be shuffled around seamlessly. His more provocative point is that we want our underlying infrastructure to be cattle so it remains highly elastic and flexible. More cattle means a more resilient system. To me, this is a fundamental CloudOps design objective.

We realized that the perfect cloud infrastructure would structurally discourage the creation of puppies.

Imagine a cloud in which servers were automatically decommissioned after a week of use. In a sort of anti-SLA, any VM running for more than 168 hours would be (gracefully) terminated. This would force a constant churn of resources within the infrastructure that enables true cattle-like management. This cloud would be able to very gracefully rebalance load and handle disruptive management operations because the workloads are designed for the churn.

We called these servers mayflies due to their limited life span.

While this approach requires a high degree of automation, the most successful cloud operators I have met are effectively building workloads with this requirement. If we require application workloads to be elastic and fault-resilient then we have a much higher degree of flexibility with the underlying infrastructure. I’ve seen this in practice with several OpenStack clouds: operators with helped applications deploy using automation were able to decommission “old” clouds much more gracefully. They effectively turned their entire cloud into a cow. Sadly, the ones without that investment puppified™ the ops infrastructure and created a much more brittle environment.

The opposite of a mayfly is the dinosaur: a server that is so brittle and locked that the slightest disturbance wipes out everything it touches.

Dinosaurs are puppies grown into a T-Rex with rows of massive razor sharp teeth and tiny manicured hands. These are systems that are so unique and historical that there’s no way to recreate them if there’s a failure. The original maintainers exit happy hour was celebrated by people who were laid-off two CEOs ago. The impact of dinosaurs goes beyond their operational risk; they are typically impossible to extend or maintain and, consequently, ossify other server around them. This type of server drains elasticity from your ops team.

Puppies do not grow up to become dogs, they become dinosaurs.

It’s a classic lean adage to do hard things more frequently. Perhaps it’s time to start creating mayflies in your ops infrastructure.

OpenCrowbar reaches critical milestone – boot, discover and forge on!

OpenCrowbarWe started the Crowbar project because we needed to make OpenStack deployments to be fast, repeatable and sharable.  We wanted a tool that looked at deployments as a system and integrated with our customers’ operations environment.  Crowbar was born as an MVP and quickly grew into a more dynamic tool that could deploy OpenStack, Hadoop, Ceph and other applications, but most critically we recognized that our knowledge gaps where substantial and we wanted to collaborate with others on the learning.  The result of that learning was a rearchitecture effort that we started at OSCON in 2012.

After nearly two years, I’m proud to show off the framework that we’ve built: OpenCrowbar addresses the limitations of Crowbar 1.x and adds critical new capabilities.

So what’s in OpenCrowbar?  Pretty much what we targeted at the launch and we’ve added some wonderful surprises too:

  • Heterogeneous Operating Systems – chose which operating system you want to install on the target servers.
  • CMDB Flexibility – don’t be locked in to a devops toolset.  Attribute injection allows clean abstraction boundaries so you can use multiple tools (Chef and Puppet, playing together).
  • Ops Annealer –the orchestration at Crowbar’s heart combines the best of directed graphs with late binding and parallel execution.  We believe annealing is the key ingredient for repeatable and OpenOps shared code upgrades
  • Upstream Friendly – infrastructure as code works best as a community practice and Crowbar use upstream code without injecting “crowbarisms” that were previously required.  So you can share your learning with the broader DevOps community even if they don’t use Crowbar.
  • Node Discovery (or not) – Crowbar maintains the same proven discovery image based approach that we used before, but we’ve streamlined and expanded it.  You can use Crowbar’s API outside of the PXE discovery system to accommodate Docker containers, existing systems and VMs.
  • Hardware Configuration – Crowbar maintains the same optional hardware neutral approach to RAID and BIOS configuration.  Configuring hardware with repeatability is difficult and requires much iterative testing.  While our approach is open and generic, my team at Dell works hard to validate a on specific set of gear: it’s impossible to make statements beyond that test matrix.
  • Network Abstraction – Crowbar dramatically extended our DevOps network abstraction.  We’ve learned that a networking is the key to success for deployment and upgrade so we’ve made Crowbar networking flexible and concise.  Crowbar networking works with attribute injection so that you can avoid hardwiring networking into DevOps scripts.
  • Out of band control – when the Annealer hands off work, Crowbar gives the worker implementation flexibility to do it on the node (using SSH) or remotely (using an API).  Making agents optional means allows operators and developers make the best choices for the actions that they need to take.
  • Technical Debt Paydown – We’ve also updated the Crowbar infrastructure to use the latest libraries like Ruby 2, Rails 4, Chef 11.  Even more importantly, we’re dramatically simplified the code structure including in repo documentation and a Docker based developer environment that makes building a working Crowbar environment fast and repeatable.

Why change to OpenCrowbar?  This new generation of Crowbar is structurally different from Crowbar 1 and we’ve investing substantially in refactoring the tooling, paying down technical debt and cleanup up documentation.  Since Crowbar 1 is still being actively developed, splitting the repositories allow both versions to progress with less confusion.  The majority of the principles and deployment code is very similar, I think of Crowbar as a single community.

Interested?  Our new Docker Admin node is quick to setup and can boot and manage both virtual and physical nodes.

Anyone else find picking OpenStack summit sessions overwhelming?!

Choices!The scope and diversity of sessions for the upcoming OpenStack conference in Atlanta are simply overwhelming.   As a board member, that’s a positive sign of our success as a community; however, it’s also a challenge as we attempt to pick topics.  That’s why we turn to you, the OpenStack community, to help sift and select the content.

Even if you are not attending, we need your help in selection!  Content from the summits is archived and has a much larger outside of the two conference days.  You’re voice matters for the community.

While it’s a simple matter to ask you to vote for my DefCore presentation and some excellent ones from my peers at Dell, I’d also like share some of my thoughts about general trends I saw illustrated by the offerings:

  • Swift has a strong following as a solution outside of other products
  • Ceph seems to emerging as a critical component with Cinder
  • Neutron has breath but not depth in practice
  • HA and Upgrades remain challenges
  • We are starting to see specializations emerge (like NFV)
  • OpenStack case studies!  There are many – some of uncertain utility as references
  • Some community members and companies are super prolific in submitting sessions.  Perhaps these sessions are all great but on first pass it seems out of balance.
  • Vendor pitch or conference session?  You often get both in the same session.  We’re still not certain how to balance this.

The number and diversity of sessions is staggering – we need your help on voting.

We also need you to be part of the dialog about the conference and summits to make sure they are meeting the community needs.  My review of the sessions indicates that we are trying to serve many different audiences in a very limited time window.  I’m interested in hearing yours!  Review some sessions and let me know.

OpenStack Board Elections: What I’ll do in 2014: DefCore, Ops, & Community

Rob HirschfeldOpenStack Community,

The time has come for you to choose who will fill the eight community seats on the Board (ballot links went out Sunday evening CST).  I’ve had the privilege to serve you in that capacity for 16 months and would like to continue.  I have leadership role in Core Definition and want to continue that work.

Here are some of the reasons that I am a strong board member:

  • Proven & Active Leadership on Board - I have been very active and vocal representing the community on the Board.  In addition to my committed leadership in Core Definition, I have played important roles shaping the Gold Member grooming process and trying to adjust our election process.  I am an outspoken yet pragmatic voice for the community in board meetings.
  • Technical Leader but not on the TC – The Board needs members who are technical yet detached from the individual projects enough to represent outside and contrasting views.
  • Strong User Voice – As the senior OpenStack technologist at Dell, I have broad reach in Dell and RedHat partnership with exposure to a truly broad and deep part of the community.  This makes me highly accessible to a lot of people both in and entering the community.
  • Operations Leadership – Dell was an early leader in OpenStack Operations (via OpenCrowbar) and continues to advocate strongly for key readiness activities like upgrade and high availability.  In addition, I’ve led the effort to converge advanced cookbooks from the OpenCrowbar project into the OpenStack StackForge upstreams.  This is not a trivial effort but the right investment to make for our community.
  • And there’s more… you can read about my previous Board history in my 2012 and 2013 “why vote for me” posts or my general OpenStack comments.

And now a plea to vote for other candidates too!

I had hoped that we could change the election process to limit blind corporate affinity voting; however, the board was not able to make this change without a more complex set of bylaws changes.  Based on the diversity and size of OpenStack community, I hope that this issue may no longer be a concern.  Even so, I strongly believe that the best outcome for the OpenStack Board is to have voters look beyond corporate affiliation and consider a range of factors including business vs. technical balance, open source experience, community exposure, and ability to dedicate time to OpenStack.

How are we picking the OpenStack DefCore “must pass” tests?

Fancy ElephantThis post comes with a WARNING LABEL… THE FOLLOWING SELECTION CRITERIA ARE PRELIMINARY TO GET FEEDBACK AND HELP VALIDATE THE PROCESS.
As part of the DefCore work, we have the challenge of taking all the Tempest tests and figuring out which ones are the “must-pass” tests that will define core (our note pages).  We want to have a very transparent and objective process for picking the tests so we need to have well defined criteria and a selection process.
Figuring out the process will be iterative.  The list below represents a working set of selection criteria that are applied to the tests.  The DefCore committee will determine relative weights for the criteria after the tests have been scored because it was clear in discussion that not all of these criteria should have equal weight.
Once a test passes the minimum criteria score and becomes “must-pass” the criteria score does not matter – the criteria are only used for selecting tests. As per the Core principles, passing all “must-pass” test will be required to be considered core.
So what are these 13 preliminary criteria (source)?
1. Test is required stable for >2 releases (because things leaving Core are bad)
  • the least number/amount of must pass tests as possible (due to above)
  • but noting that the number will increase over time
  • least amount of change from current requirements as possible (nova, swift 2 versions)
  • (Acknowledge that deprecation is punted for now, but can be executed by TC)
2. Where the code being tested has an designed area of alternate implementation (extension framework) as per the Core Principles, there should be parity in capability tested across extension implementations
  • Test is not configuration specific (test cannot meet criteria if it requires a specific configuration)
  • Test does not require an non-open extension to pass (only the OpenStack code)
3. Capability being tested is Service Discoverable (can be found in Keystone and via service introspection) – MONTY TO FIX WORDING around REST/DOCS, etc.
  • Nearly core or “compatible” clouds need to be introspected to see what’s missing
  • Not clear at this point if it’s project or capability level enforced.  Perhaps for Elephant it’s project but it should move to capability for later
4A, 4B & 4C. Candidates are widely used capabilities
  • 4A favor capabilities that are supported by multiple public cloud providers and private cloud products
    • Allow the committee to use expert judgement to promote capabilities that need to resolve the “chicken-and-egg”
    • Goals are both diversity and quantity of users
  • 4B. Should be included if supported by common tools (Ecosystem products includes)
  • 4C. Should be included if part of common libraries (Fog, Apache jclouds, etc)
5. Test capabilities that are required by other must-pass tests and/or depended on by many other capabilities
6. Should reflect future technical direction (from the project technical teams and the TC)
  • Deprecated capabilities would be excluded (or phased out)
  • This could potentially become a “stick” if used incorrectly because we could force capabilities
7. Should be well documented, particularly the expected behavior.
  • includes the technical references for others in the project as well as documentation for the users and or developers accessing the feature or functionality
8. A test that is a must-pass test should stay a must-pass test (makes must-pass tests sticky release per release)
9. A test for a Capability with must-pass tests is more likely to be considered must-pass
10 Capabilities is unique and cannot be build out of other must-pass capabiliies
  • Candidates favor capabilities that users cannot implement if given the presence of other capabilities
  • consider the pain to users if a cloud doesn’t have the capability – not so much pain if they can run it themselves
  • “Unique capabilities that cannot be build out of other must-pass capabilities should not be considered as strongly”
11. Tests do not require administrative right to execute
We expect these criteria to change based on implementation experience and community input; however, we felt that further discussion without implementation was getting diminishing returns.  It’s import to remember that all of the criteria are not equal, they will have relative weights to help drive tune the results.

OpenStack Core Definition (DefCore) Progress in 6 key areas

DefCore Elephant Cycle

I’m excited to report about the OpenStack Board progress on defining OpenStack core.  At the Hong Kong summit, Joshua McKenty and I were asked to chair a new standing committee, now known as DefCore, to define “OpenStack Core” based on the core principles that we determined over the last 6 months (aka “the spider”).

Joshua and I took on the challenge with gusto and I’m proud to say that we’ve already made significant progress against an aggressive timeline to have the pilot must-pass tests for Havana defined before the Juno Summit in April 2014.  It’s important to remember that we’re moving from a project based definition of core to test-driven capabilities because this best addresses our interoperability objectives.

In the 8 weeks since the summit, we’ve had six very productive meetings (etherpads for Prep, DefCore.1, DefCore.2, Criteria 1 and 2) with detailed notes and recorded content. Here’s my summary of our results so far:

  1. An Aggressive Timeline for having pilot Havana must-pass tests approved by the Juno summit in May 2014.  That drives the schedule backward toward a preliminary list in March.  Once we have a pilot list for Havana, we expect to have Ice House done +90 days and Juno done at the Paris summit.

  2. Test Selection Criteria a preliminary set of 14 criteria (needs a stand alone post) that will be used to quantitatively score the current 700+ tests.  We also agreed to use a max 100 point weighting system for the criteria.  The weights and score requirement iteratively once we have done a first scoring pass.  Our objective is to make must-pass test selection as objective and transparent as possible (post with details).

  3. Distinction between Capability & Test is important because we recognize that individual tests may validate multiple capabilities and individual capabilities may have multiple tests.  Our hope is to present the results in terms of capabilities not individual tests.

  4. Holding Off on Bylaws Changes needed to clarify how OpenStack manage core definition.  It was widely expected that the DefCore committee would have to make changes to the OpenStack bylaws; however, we believe we can proceed without rushing changes.  We have an active subcommittee preparing changes in advance of the next DefCore cycle.

  5. Program vs. Project Definition efforts are needed to help take pressure off requests to have “projects promoted to core status” and how the OpenStack trademark is used for projects.  We are trying to clarify OpenStack Programs (e.g.: OpenStack™ Compute) carry to the trademark while OpenStack Projects (e.g.: Nova and Glace) are members of those programs and do not carry the OpenStack trademark directly.  Consequently, we’d expect people to say “OpenStack Compute Project Nova” instead of “OpenStack Nova.”  This approach addresses several issues that impact DefCore Board activities around trademark, core and brand.

  6. RefStack Development and Use Cases provide the framework for community reporting of test results.  We consider this infrastructure critical to getting community input about must-pass tests and also sharing interoperability information.  This effort is just beginning and needs help from the community.

For all this progress, we are only starting!  We’ve cleared the blocks preventing implementation and that will expose a new set issues to discuss.  Look for us to start applying the criteria to tests in the next months.  That will quickly expose the strengths and weaknesses of our criteria set.  We’ve also got to make progress on Program vs. Project and start RefStack coding.

We want community participation!  Please let us know what you think.