Mayflies and Dinosaurs (extending Puppies and Cattle)

Dont Be FragileJosh McKenty and I were discussing the common misconception of the “Puppies and Cattle” analogy. His position is not anti-puppy! He believes puppies are sometimes unavoidable and should be isolated into portable containers (VMs) so they can be shuffled around seamlessly. His more provocative point is that we want our underlying infrastructure to be cattle so it remains highly elastic and flexible. More cattle means a more resilient system. To me, this is a fundamental CloudOps design objective.

We realized that the perfect cloud infrastructure would structurally discourage the creation of puppies.

Imagine a cloud in which servers were automatically decommissioned after a week of use. In a sort of anti-SLA, any VM running for more than 168 hours would be (gracefully) terminated. This would force a constant churn of resources within the infrastructure that enables true cattle-like management. This cloud would be able to very gracefully rebalance load and handle disruptive management operations because the workloads are designed for the churn.

We called these servers mayflies due to their limited life span.

While this approach requires a high degree of automation, the most successful cloud operators I have met are effectively building workloads with this requirement. If we require application workloads to be elastic and fault-resilient then we have a much higher degree of flexibility with the underlying infrastructure. I’ve seen this in practice with several OpenStack clouds: operators with helped applications deploy using automation were able to decommission “old” clouds much more gracefully. They effectively turned their entire cloud into a cow. Sadly, the ones without that investment puppified™ the ops infrastructure and created a much more brittle environment.

The opposite of a mayfly is the dinosaur: a server that is so brittle and locked that the slightest disturbance wipes out everything it touches.

Dinosaurs are puppies grown into a T-Rex with rows of massive razor sharp teeth and tiny manicured hands. These are systems that are so unique and historical that there’s no way to recreate them if there’s a failure. The original maintainers exit happy hour was celebrated by people who were laid-off two CEOs ago. The impact of dinosaurs goes beyond their operational risk; they are typically impossible to extend or maintain and, consequently, ossify other server around them. This type of server drains elasticity from your ops team.

Puppies do not grow up to become dogs, they become dinosaurs.

It’s a classic lean adage to do hard things more frequently. Perhaps it’s time to start creating mayflies in your ops infrastructure.

OpenStack Havana provides foundation for XXaaS you need

Folsom SummitIt’s been a long time, and a lot of summits, since I posted how OpenStack was ready for workloads (back in Cactus!).  We’ve seen remarkable growth of both the platform technology and the community surrounding it.  So much growth that now we’re struggling to define “what is core” for the project and I’m proud be on the Foundation Board helping to lead that charge.

So what’s exciting in Havana?

There’s a lot I am excited about in the latest OpenStack release.

Complete Split of Compute / Storage / Network services

In the beginning, OpenStack IaaS was one service (Nova).  We’ve been breaking that monolith into distinct concerns (Compute, Network, Storage) for the last several releases and I think Havana is the first release where all of the three of the services are robust enough to take production workloads.

This is a major milestone for OpenStack because knowledge that the APIs were changing inhibited adoption.

ENABLING TECH INTEGRATION: Docker & Ceph

We’ve been hanging out with the Ceph and Docker teams, so you can expect to see some interesting.  These two are proof of the a fallacy that only OpenStack projects are critical to OpenStack because neither of these technologies are moving under the official OpenStack umbrella.  I am looking forward to seeing both have dramatic impacts in how cloud deployments.

Docker promises to make Linux Containers (LXC) more portable and easier to use.  This paravirtualization approach provides near bear metal performance without compromising VM portability.  More importantly, you can oversubscribe LXC much more than VMs.  This allows you to dramatically improve system utilization and unlocks some other interesting quality of service tricks.

Ceph is showing signs of becoming the scale out storage king.  Beyond its solid data dispersion algorithm, a key aspect of its mojo is that is delivers both block and object storage.  I’ve seen a lot of interest in consolidating both types of storage into a single service.  Ceph delivers on that plus performance and cost.  It’s a real winner.

Crowbar Integration & High Availability Configuration!

We’ve been making amazing strides in the Crowbar + OpenStack integration!  As usual, we’re planning our zero day community build (on the “Roxy” branch) to get people started thinking about operationalizing OpenStack.   This is going to be especially interesting because we’re introducing it first on Crowbar 1 with plans to quickly migrate to Crowbar 2 where we can leverage the attribute injection pattern that OpenStack cookbooks also use.  Ultimately, we expect those efforts to converge.  The fact that Dell is putting reference implementations of HA deployment best practices into the open community is a major win for OpenStack.

Tests, Tests, Tests & Continuous Delivery

OpenStack continue to drive higher standards for reviews, integration and testing.  I’m especially excited to the volume and activity around our review system (although backlogs in reviews are challenges).  In addition, the community continues to invest in the test suites like the Tempest project.  These are direct benefits to operators beyond simple code quality.  Our team uses Tempest to baseline field deployments.  This means that OpenStack test suites help validate live deployments, not just lab configurations.

We achieve a greater level of quality when we gate code check-ins on tests that matter to real deployments.   In fact, that premise is the basis for our “what is core” process.  It also means that more operators can choose to deploy OpenStack continuously from trunk (which I consider to be a best practice scale ops).

Where did we fall short?

With growth comes challenges, Havana is most complex release yet.  The number of projects that are part the OpenStack integrated release family continues to expand.  While these new projects show the powerful innovation engine at work with OpenStack, they also make the project larger and more difficult to comprehend (especially for n00bs).  We continue to invest in Crowbar as a way to serve the community by making OpenStack more accessible and providing open best practices.

We are still struggling to resolve questions about interoperability (defining core should help) and portability.  We spent a lot of time at the last two summits on interoperability, but I don’t feel like we are much closer than before.  Hopefully, progress on Core will break the log jam.

Looking ahead to Ice House?

I and many leaders from Dell will be at the Ice House Summit in Hong Kong listening and learning.

The top of my list is the family of XXaaS services (Database aaS, Load Balanacer aaS, Firewall aaS, etc) that have appeared.  I’m a firm believer that clouds are more than compute+network+storage.  With a stable core, OpenStack is ready to expand into essential platform services.

If you are at the summit, please join Dell (my employer) and Intel for the OpenStack Summit Welcome Reception (RSVP!) kickoff networking and social event on Tuesday November 5, 2013 from 6:30 – 8:30pm at the SkyBistro in the SkyCity Marriott.   My teammate, Kamesh Pemmaraju, has a complete list of all Dell the panels and events.

Collaboration between Dell Crowbar & VMware Cloud Foundry – unleashes your inner cloud

Sometimes a single sprint can deliver magic: when I signed up to document how to create a Crowbar module (aka a barclamp) two weeks ago, I had no idea that it would add a new flavor to Crowbar .

I’m proud to announce that the first public non-Dell Crowbar module will be supporting the VMware Cloud Foundry Open PaaS project.

Development is still in progress (on the Crowbar “CF” branch) and you’ll be able to watch us (even help!) collaborate on this project.  Initially, the deployment will be to single server but we’re hoping to quickly expand to a distributed install that fully leverages the capabilities of both projects.

By creating a Crowbar module, Cloud Foundry™ is able to leverage the cloud deployment capabilities that allow it to be setup on any physical or virtualized data center.  This is core to the Crowbar message: the value of a cloud solution can best be realized when it’s coupled with open practices for deploying it.

There are many significant aspects of this collaboration:

  1. Cloud Foundry is taking the right approach to PaaS.  Their team’s perspective on PaaS mirrors my own: A PaaS is a collection of application services.  That approach makes it extensible and flexible.  Plus, they are also multi-language and multi-platform.
  2. Crowbar is proving our breadth of support.  Last week we announced coming RHEL support and now adding Cloud Foundry is a natural extension.  We did not design Crowbar to be a one-trick pony.  It’s modular design makes it easy to extend while leveraging the existing body of work.
  3. Big companies are acting like start-ups.  Both Crowbar and Cloud Foundry are projects that focus on putting the core functionality out quickly to prove their value proposition, get feedback, and change the game.  This collaboration is positive proof of these companies being Agile and starting a project Lean.
  4. Big companies are acting in the open.  Both Dell via Crowbar and VMware via Cloud Foundry are contributing their source and working on it in the open.

Stay tuned for that “how to create a barclamp” post (or check out the barclamp rake task).

For more information:

PaaS Simplified: an application architecture that responds to load

handoff

In addition to attending the great sessions at the OpenStack Design Conference, our Dell team realized that we’ve been making Platform as a Service (PaaS) much more complex.  Stripping away the detritus is important because it looks like “What is a PaaS” is changing on a daily basis so boiling it down to the must fundamental is essential.

At its core, a PaaS is an application that changes its architecture based on the load.   That’s it no further definition is required.

I’ve been playing with this definition since April and am finding that it’s a much more productive definition of PaaS than any that I’ve used so far.  The reason is that it’s

  1. application focused,
  2. not language or services bound and
  3. captures the business use cases

Of course, I’m going to have to provide more backup in future posts.  I want to invite discussion about this perspective on PaaS.  I’m especially interesting in seeing how recent offerings from VMware (OpenPaaS/CloudFoundry) or Amazon (Elastic Beanstalk) measure against this concept.