“Stack Shop” cover of Macklemore’s Thrift Shop

Sometimes a meme glitters too strongly for me to resist getting pulled in… that happened to great effect just before the OpenStack Havana summit. When my code-addled mind kept swapping “poppin’ tags” for “OpenStack” on the radio edit, I stopped fighting and rewrote the Thrift Shop lyrics for OpenStack (see below the split).

With a lot of help from summit attendees (many of them are OpenStack celebrities, CEOs, VPs and members of the OpenStack Foundation board), I was able to create a freaking awesome cover of Macklemore’s second hand confection (NSFW).

Frankly, I don’t know everyone in the video (what, what?)!

But here’s a list of those that I do know.  I’m happy to update so the victims (er, actors) get credit.  Singers (in order):

Rob Hirschfeld (me) & Monty Taylor, Peter Pouliot, Judd Maltin, Forrest Norrod, Josh Kleinpeter, Tristan Goode, Dan Bode, Jay Pipes, Prabhakar Gopalan, Peter Chadwick, Simon Andersen, Vish Ishaya, Wayne Walls, Alex Freedland, Niki Acosta, Ops Track Monday Session 1, Ben Cherian, Eric Windisch, Brandon Draeger, Joseph B George, Mark Collier, Joseph Heck, Tim Bell, Chris Kemp, Kyle McDonald & Joshua McKenty.

OpenStack’s next hurdle: Interoperability. Why should you care?

SXSW life size Newton's Cradle

The OpenStack Board spent several hours (yes, hours) discussing interoperability related topics at the last board meeting.  Fundamentally, the community benefits when users can operate easily across multiple OpenStack deployments (their own and/or public clouds).

Cloud interoperability: the ability to transfer workloads between systems without changes to the deployment operations management infrastructure.

This is NOT hybrid (which I defined as a workload transparently operating in multiple systems); however it is a prereq to achieve scalable hybrid operation.

Interoperability matters because the OpenStack value proposition is all about creating a common platform.  IT World does a good job laying out the problem (note, I work for Dell).  To create sites that can interoperate, we have to do some serious lifting.

At the OpenStack Summit, there are multiple chances to engage on this.  I’m moderating a panel about Interop and also sharing a session about the highly related topic of Reference Architectures with Monty Taylor.

The Interop Panel (topic description here) is Tuesday @ 5:20pm.  If you join, you’ll get to see me try to stump our awesome panelists:

  • Jonathan LaCour, DreamHost
  • Troy Toman, Rackspace
  • Bernard Golden,  Enstratius
  • Monty Taylor, OpenStack Board (and HP)
  • Peter Pouliot, Microsoft

PS: Oh, and I’m also talking about DevOps Upgrade Patterns during the very first session (see a preview).

DevOps approaches to upgrade: Cube Visualization

I’m working on my OpenStack summit talk about DevOps upgrade patterns and got to a point where there are three major vectors to consider:

  1. Step Size (shown as X axis): do we make upgrades in small frequent steps or queue up changes into larger bundles? Larger steps mean that there are more changes to be accommodated simultaneously.
  2. Change Leader (shown as Y axis): do we upgrade the server or the client first? Regardless of the choice, the followers should be able to handle multiple protocol versions if we are going to have any hope of a reasonable upgrade.
  3. Safeness (shown as Z axis): do the changes preserve the data and productivity of the entity being upgraded? It is simpler to assume that we simply add new components and remove old components; however, this approach carries significant risks or redundancy requirements.
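To make the Change Leader point concrete, here is a minimal Python sketch of a follower that tolerates two protocol versions at once. The message shape, the version numbers and the handler names are all invented for illustration; nothing here comes from a real OpenStack API.

```python
# Hypothetical sketch: a follower that accepts two protocol versions
# while an upgrade is in flight. All field names are assumptions.

def handle_v1(payload):
    # Assumed v1 shape: a single "name" field.
    return {"greeting": "hello " + payload["name"]}

def handle_v2(payload):
    # Assumed v2 shape: the name is split into two fields.
    return {"greeting": "hello " + payload["first"] + " " + payload["last"]}

# One handler per version the follower still supports.
HANDLERS = {1: handle_v1, 2: handle_v2}

def handle(message):
    """Dispatch on the declared version; unversioned messages count as v1."""
    version = message.get("version", 1)
    if version not in HANDLERS:
        raise ValueError(f"unsupported protocol version: {version}")
    return HANDLERS[version](message["payload"])

old_reply = handle({"version": 1, "payload": {"name": "rob"}})
new_reply = handle({"version": 2, "payload": {"first": "rob", "last": "h"}})
```

Retiring a version then amounts to deleting its entry from `HANDLERS` once no leader still speaks it.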

I’m strongly biased towards continuous deployment because I think it reduces risk and increases agility; however, laying out all the vertices of the upgrade cube helps to visualize where the costs and risks are being added into the traditional upgrade models.

Breaking down each vertex:

  1. Continuous Deploy – core infrastructure is updated on a regular (usually daily or faster) basis
  2. Protocol Driven – like changing to HTML5, the clients are tolerant to multiple protocols and changes take a long time to roll out
  3. Staged Upgrade – tightly coordinate migration between major versions over a short period of time in which all of the components in the system step from one version to the next together.
  4. Rolling Upgrade – system operates a small band of versions simultaneously where the components with the oldest versions are in process of being removed and their capacity replaced with new nodes using the latest versions.
  5. Parallel Operation – two server systems operate and clients choose when to migrate to the latest version.
  6. Protocol Stepping – rollout of clients that support multiple versions and then upgrade the server infrastructure only after all clients can support both versions.
  7. Forced Client Migration – change the server infrastructure and then force the clients to upgrade before they can reconnect.
  8. Big Bang – you have to shut down all components of the system to upgrade it
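The eight vertices fall out of three two-valued axes. The sketch below enumerates the corners of the cube in Python. The endpoint labels for each axis are my shorthand for the questions posed above, and it deliberately does not claim which named pattern sits at which corner, since that mapping lives in the graphic rather than in the text.

```python
from itertools import product

# Axis endpoints are shorthand assumptions drawn from the three questions:
# small vs. large steps, server vs. client first, preserve vs. replace.
AXES = {
    "step_size": ("small", "large"),
    "change_leader": ("server", "client"),
    "safeness": ("preserving", "replacing"),
}

def cube_vertices(axes=AXES):
    """Return one dict per corner of the cube (2 * 2 * 2 = 8 corners)."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]

vertices = cube_vertices()
for vertex in vertices:
    print(vertex)
```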

This type of visualization helps me identify costs and options. It’s not likely to get much time in the final presentation so I’m hoping to hear in advance if it resonates with others.

PS: Like this visualization? Check out my “magic 8 cube” for cloud hosting options.

What foo is “contribution” to open source? Mik Kersten & Tasktop @ SXSW

How do we really know who has the most influence in a software project?  We can easily track code commits, but there are more bits to the project than the commits.

I had the good fortune to attend Mik Kersten’s Code Graph presentation at SXSW last week. Mik started the Eclipse Mylyn project and went on to found Tasktop. Both are built on the very intriguing concept that software development production (aka work) is organized around tasks.

His premise is that organizing around tasks provides a more manageable and actionable view of a project than more traditional application life-cycle management (ALM) approaches.  I’m a sucker for any presentation about lean development process that includes references to both DevOps and industrial engineering (I have an MS in IE), but Mik surprised me by taking his code graph concept to a whole ‘nutha level.

The software value chain is much deeper than just the people who write code. Mik’s approach included managers, testers and operators in the interaction graphs for his projects.

By including all of the ALM artifacts in the analysis, you get a much richer picture of the influencers for a project.

For example, the development manager may never show up as a code committer; however, they are hugely influential in which work gets prioritized. If your graph includes who is touching the work assignments and stories then the manager’s influence jumps out immediately. That knowledge would completely change how, and with whom, you interact on a team. It effectively brings a shadow contributor into the light.

The same is true for QA members who are running tests and opening defects and operators who are building deployment scripts. Ideally, it should include users who exercise different parts of the application’s capabilities.

Mik’s graphs clearly showed the influence impact of managers because they touched all of the story cards for the project.  The people who own the story cards are the most potent influencers in a project, yet they are invisible in code repositories!
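Here is a rough Python sketch of that effect. It is not Mik’s or Tasktop’s method, just a simple count of distinct artifacts touched per person, and the people and events in the log are made up. The same data is scored twice: once looking only at commits, once across every artifact type.

```python
from collections import defaultdict

# Made-up activity log: (person, artifact kind, artifact id).
EVENTS = [
    ("dev_a", "commit", "c1"),
    ("dev_a", "commit", "c2"),
    ("dev_b", "commit", "c3"),
    ("manager", "story", "s1"),
    ("manager", "story", "s2"),
    ("manager", "story", "s3"),
    ("tester", "defect", "d1"),
    ("dev_a", "story", "s1"),
]

def touch_counts(events, kinds=None):
    """Count distinct artifacts each person touched, optionally by kind."""
    touched = defaultdict(set)
    for person, kind, artifact in events:
        if kinds is None or kind in kinds:
            touched[person].add((kind, artifact))
    return {person: len(items) for person, items in touched.items()}

commits_only = touch_counts(EVENTS, kinds={"commit"})
all_artifacts = touch_counts(EVENTS)
print("commits only: ", commits_only)
print("all artifacts:", all_artifacts)
```

In the commits-only view the manager and tester do not exist at all; once stories and defects are counted, the manager ties the busiest developer.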

I would love to see an impact graph for a software project that equally reflected the wide range of contributions that people make to its life-cycle.  This type of information helps rebalance the power in a project.

Industrial engineering legend W. Edwards Deming’s advice is to look at production as a system.  Finding ways to show everyone’s contributions is an important step towards bringing lean processes fully into software manufacturing.

5 things keeping DevOps from playing well with others (Chef, Crowbar and Upstream Patterns)

Since my earliest days on the OpenStack project, I’ve wanted to break the cycle on black box operations with open ops. With the rise of community driven DevOps platforms like Opscode Chef and Puppetlabs, we’ve reached a point where it’s both practical and imperative to share operational practices in the form of code and tooling.

Being open and collaborating are not the same thing.

It’s a huge win that we can compare OpenStack cookbooks. The real victory comes when multiple deployments use the same trunk instead of forking.

This has been an objective I’ve helped drive for OpenStack (with Matt Ray). It has been the Crowbar objective from the start and is the keystone of our Crowbar 2 work.

This has proven to be a formidable challenge for several reasons:

  1. DevOps patterns that diverge between private, public, large, small, and other deployments -> solution: attribute injection pattern is promising
  2. tooling gaps prevent operators from leveraging shared deployments -> solution: this is part of Crowbar’s mission
  3. underinvesting in community-supporting features because they are seen as taking away from getting into production -> solution: need leadership, and others will join
  4. drift between target versions creates the need for forking even if the cookbooks are fundamentally the same -> solution: pull from source approaches help create distro independent baselines
  5. missing reference architectures interfere with having a stable baseline to deploy against -> solution: agree to a standard, machine consumable RA format like OpenStack Heat.

Unfortunately, these five challenges are tightly coupled and we have to progress on them simultaneously. The tooling and community require patterns and RAs.

The good news is that we are making real progress.

Judd Maltin (@newgoliath), a Crowbar team member, has documented the emerging Attribute Injection practice that Crowbar has been leading. That practice has been refined in the open by AT&T and Rackspace. It is forming the foundation of the OpenStack cookbooks.

Understanding, discussing and supporting that pattern is an important step toward accelerating open operations. Please engage with us as we make the investments for open operations and help us implement the pattern.
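For readers who have not met the idea, here is a loose sketch of what injecting attributes can look like. It is my simplified reading, written in Python rather than Chef’s Ruby, and it is not the documented pattern itself: the shared code reads only the attributes handed to it, and each deployment injects its own overrides on top of common defaults. Every name and value below is hypothetical.

```python
import copy

# Shared defaults that ship with the (hypothetical) common cookbook.
DEFAULTS = {"database": {"host": "localhost", "port": 3306}, "workers": 2}

def deep_merge(base, overrides):
    """Return a copy of base with overrides layered on top, key by key."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

def render_config(attributes):
    """Shared logic: reads only injected attributes, never site details."""
    db = attributes["database"]
    return f"db={db['host']}:{db['port']} workers={attributes['workers']}"

# Two deployments reuse the same code and differ only in what they inject.
small_lab = deep_merge(DEFAULTS, {})
big_site = deep_merge(
    DEFAULTS, {"database": {"host": "db.example.internal"}, "workers": 16}
)
print(render_config(small_lab))
print(render_config(big_site))
```

The point of the sketch is that neither deployment has to fork `render_config` to get its own settings.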

Installing SSD + Windows8 = Blank Primary Monitor (fixable!)

I could not find the solution to this easily, so I’m leaving a breadcrumb trail here… I did not keep the links so I cannot give proper attribution but will try to pay it forward.

Short version: Try HDMI/VGA output if your Win8 primary monitor is blank.  Then update the BIOS.

Long version:

I decided to update my wife’s Dell Inspiron N5110 laptop to an SSD and Windows 8.  Sadly, the machine’s factory config had a very slow HDD and that was impacting the system’s total performance.  Replacing the HDD with an SSD required major surgery to the laptop – it is not for the faint of heart.

After installing the SSD and installing Windows 8 (painless!) the system booted through the splash screen and turned off the display.  Yes, it simply went completely blank.

I stumbled upon a tip that suggested that the system was working but using the HDMI output.  That proved correct.  I was able to complete the configuration using HDMI and/or VGA monitors.

Even after completing the configuration and updating, the built-in monitor stayed blank in Windows.  It was clearly working because the BIOS screens and splash screen displayed on it.  Deleting the Video Card from Devices did NOT work.

Ultimately, I found a site that recommended updating the BIOS (was A09, now A11 from 11/2012).  The BIOS update corrected the problem.

I should have known to update the BIOS and firmware before starting the upgrade.  I hope you learn from my experience.

Oh… the SSD+Win8 made an AMAZING performance difference.  It’s like a brand new 10x faster laptop and an excellent investment.  I’ve become a bit of a Linux apologist; however, I was pleasantly surprised to find Windows 8 to be very usable once I learned the latest hot-key assignments (Search on Win key -> Win+F).

Behavior Driven Development (BDD) and Crowbar

I’m a huge advocate of both behavior and test driven development (BDD & TDD). For the Crowbar 2 refactor, I’ve insisted (with community support) that new code has test coverage to the largest extent possible. While this inflicted some technical debt, it dramatically reduces the risk and effort for new developers to contribute.

For open source projects, tests are even more important because they allow the community to contribute to the project with confidence.

A core part of this effort has been the Erlang BDD (“bravo delta”) tool that I had started before my team began Crowbar (code link).

I’m a big fan of BDD domain specific languages (DSL) because I think that they are descriptive. Ideally, everyone on the team including marketing and documentation authors should be able to understand and contribute to these tests.

I’ve been training our QA team on how the BDD system works and they are surprised at the clarity of the DSL. By reading the DSL for a feature, they can figure out what the developer had in mind for the system to do. Even better, they can see which use-cases the developer is already testing. Yet the real excitement comes from the potential to collaborate on the feature definitions before the code is written. My blue-sky-with-rainbows hope is that developers and testers will sit down together and review the BDD feature descriptions before code is written (perhaps even during planning). Even short of that nirvana, the BDD feature descriptions provide something that everyone can review and discuss where code (even with verbose documentation) falls short.
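If you have not seen a BDD DSL, the sketch below shows the general shape in Python. The feature text uses generic Given/When/Then wording and the step matcher is a toy I wrote for this example; it is not the syntax or the implementation of the Erlang BDD tool.

```python
import re

# A feature description that non-developers can read and review.
FEATURE = """\
Scenario: Add two numbers
  Given I have entered 2
  And I have entered 3
  When I press add
  Then the result should be 5
"""

STEPS = []  # (compiled pattern, function) pairs, filled by the decorator

def step(pattern):
    """Register a function as the implementation of a step sentence."""
    def register(func):
        STEPS.append((re.compile(pattern), func))
        return func
    return register

@step(r"I have entered (\d+)")
def enter(ctx, number):
    ctx.setdefault("stack", []).append(int(number))

@step(r"I press add")
def press_add(ctx):
    ctx["result"] = sum(ctx["stack"])

@step(r"the result should be (\d+)")
def check_result(ctx, expected):
    assert ctx["result"] == int(expected)

def run_feature(text):
    """Run each step line against the registered steps; return the count."""
    ctx, executed = {}, 0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("Scenario:"):
            continue
        # Drop the leading keyword (Given/And/When/Then) before matching.
        _keyword, _, body = line.partition(" ")
        for pattern, func in STEPS:
            match = pattern.fullmatch(body)
            if match:
                func(ctx, *match.groups())
                executed += 1
                break
        else:
            raise LookupError(f"no step matches: {line}")
    return executed

steps_run = run_feature(FEATURE)
```

The feature text at the top is the part a tester or documentation author would review; the decorated functions are the part a developer maintains.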

Ok, so you already know the benefits of BDD. Why didn’t I do this in Cucumber? It’s the leading tool and a logical fit for a Rails project like Crowbar. Frankly, I have a love-hate relationship with Cucumber.

  1. It’s slow. And that does not scale for testing. I’m of the belief that slow tests destroy developer productivity because they encourage distractions.  Our BDD is fast and is not yet optimized.
  2. Too coupled to app framework – you can bypass the UI/API for testing if needed. If I’m doing behavior testing then I want to make sure that everything I test is accessible to the user too.
  3. While Cucumber has a lot of good “webrat” steps to validate basic web pages, I found that these were very basic and I quickly had to write my own.
  4. Erlang pattern matching made it much easier to define steps in a logical way with much less RegEx than Cucumber
  5. Erlang is designed to let us parallelize the tests.
  6. I like programming in Erlang (and I had started BDD before I started Crowbar)

And it goes beyond just testing our code…

We ultimately want to use the BDD infrastructure to gate Crowbar deployments not just code check-ins. That means that when Crowbar orchestrates an application deployment, the BDD infrastructure will execute tests to verify that the deployment is exhibiting the expected behaviors. If the installation does not pass then Crowbar would roll-back or hold the deployment.
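A minimal Python sketch of that gate follows. The function names and the checks are hypothetical and are not Crowbar’s API; the point is only the control flow of deploying, verifying behavior, and then either releasing or rolling back.

```python
def gated_deploy(deploy, checks, rollback):
    """Deploy, run every behavior check, and roll back if any check fails."""
    deploy()
    failures = [name for name, check in checks.items() if not check()]
    if failures:
        rollback()
        return ("rolled_back", failures)
    return ("released", [])

# Usage with stand-in callables; real checks would probe the live system.
log = []
status, failed = gated_deploy(
    deploy=lambda: log.append("deploy"),
    checks={"api_responds": lambda: True, "can_boot_vm": lambda: False},
    rollback=lambda: log.append("rollback"),
)
print(status, failed, log)
```

Holding the deployment instead of rolling it back would be a small variation: skip the `rollback()` call and return a "held" status.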

This objective is not new or unique – it’s the modus operandi at advanced companies who practice continuous deployment. Our position is that this should be an integral part of the orchestration framework.

One side benefit of the BDD system as designed is that it is also a simulator. We are able to take the same core infrastructure and turn it into a load generator and database populator. Unlike more coupled tools, you can run these from anywhere.

Post Script: Here’s the topic that I’m submitting for presentation at OSCON

Don’t complicate my cloud! It’s just infrastructure with an API

I’ve been “in cloud” for over 13 years (@dmcrory and I submitted patents using it starting in 2001) and I’m continually amazed at how complicated people want to make it.

For my role at Dell, I’m continually invited to seasons of meetings to define cloud, cloud architecture and cloud strategy. The reason these meetings go on and on is that everyone wants to make cloud complicated when it’s really very simple.

Cloud is infrastructure with an API.

That’s it. Everything else is just a consequence of having infrastructure with an API because an API provides remote control.

What else do people try to lump into cloud?  Here are some of my top cloud obfuscators:

  • (inter)network.  Yes, networks make an API interesting.  They are just an essential component but they are not cloud.  Most technologies are interesting because of networks: can we stop turning everything networked into cloud?  Thanks to nonsensical mega-dollar marketing campaigns, I despair this is a moot point.
  • as-a-service.  That’s another way of saying “accessible via an API.”  We have many flavors of Platform, Data, Application, Love or whatever as a Service.  That means they have an API.  Infrastructure as a Service (IaaS) is a cloud.
  • virtualization.  VMs were the first good example of hardware with an API; however, we had virtual containers (on Mainframes!) long before we had “cloud.”    They make cloud easier but they are not cloud.
  • pay-as-you-go (service pricing).  This is a common cloud requirement but it’s really a business model.  If someone builds a private cloud then it is still a cloud.

  • multi-tenant.  Another common requirement where we expect a cloud to be able to isolate users.  I agree that this is a highly desirable attribute of a good API implementation; however, it’s not essential to a cloud.  For example, most public clouds do not have a true network isolation model.
  • elastic demand.  IMHO, another word for API driven provisioning.
  • live migration.  This is a cool feature often implemented on top of virtualization, but it’s not cloud.  We were doing live migrate with shared storage and clusters before anyone heard of cloud.   I don’t think this is cloud at all but someone out there does so I included it in the list.
  • security.  Totally important consideration and required for deployments large and small, but its presence or absence does not make something cloud.

We start talking about these points and then forget the whole API thing.  These items are important, but they do not make it “a cloud.”  When Dave McCrory and I first discussed API Infrastructure as “cloud,” it was driven by the fact that you could hide the actual infrastructure behind the API.  The critical concept was that the API allowed you to manage a server anywhere from anywhere.
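A toy Python sketch of that hiding: callers ask for servers and get back opaque ids, and they never learn which physical or virtual host sits behind them. The class, method names and host labels are invented for illustration and do not mirror any real cloud API.

```python
import itertools

class InfrastructureAPI:
    """Hypothetical facade: callers see server ids, never the hardware."""

    def __init__(self, hosts):
        self._free = list(hosts)      # physical or virtual; callers can't tell
        self._leased = {}             # server id -> hidden host
        self._ids = itertools.count(1)

    def create_server(self):
        if not self._free:
            raise RuntimeError("no capacity")
        server_id = f"srv-{next(self._ids)}"
        self._leased[server_id] = self._free.pop(0)
        return server_id

    def delete_server(self, server_id):
        # Return the hidden host to the free pool.
        self._free.append(self._leased.pop(server_id))

    def list_servers(self):
        return sorted(self._leased)

api = InfrastructureAPI(["rack1-u3", "vm-host-7"])
first = api.create_server()
second = api.create_server()
api.delete_server(first)
remaining = api.list_servers()
```

Swapping the host list for a different data center changes nothing for the caller, which is the whole argument.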

When Amazon offered the first EC2 service, it had to be a cloud because the servers were remote. It was not a cloud because it was on the internet; plenty of other companies were offering hosted servers. It was a cloud because their offering required operators to use an API to interact with the infrastructure.  I remember EC2’s lack of a UI (and SLA) causing many to predict it would be a failure; instead, it sparked a revolution.

I’m excited now because we’re entering a new generation of cloud where Infrastructure APIs include networking and storage in addition to compute.  Mix in some of the interesting data and network services and we’re going to have truly dynamic and powerful clouds.  More importantly, we’re going to have some truly amazing applications.

What do you think?  Is API a sufficient definition of cloud in your opinion?

PS: Yes, if you have a physical server/network/store that is completely controllable by an API then you’ve got a cloud on your hands.

double Block Head with OpenStack+Equallogic & Crowbar+Ceph

Whew… Yesterday, Dell announced TWO OpenStack block storage capabilities (Equallogic & Ceph) for our OpenStack Essex Solution (I’m on the Dell OpenStack/Crowbar team) and community edition.  The addition of block storage effectively fills the “persistent storage” gap in the solution.  I’m quadruply excited because we now have:

  1. both open source (Ceph) and enterprise (Equallogic) choices
  2. both Nova drivers’ code is in the open as part of our open source Crowbar work

Frankly, I’ve been having trouble sitting on the news until Dell World because both features have been available on GitHub before the announcement (EQLX and Ceph-Barclamp).  Such is the emerging intersection of corporate marketing and open source.

As you may expect, we are delivering them through Crowbar; however, we’ve already had customers pick up the EQLX code and apply it without Crowbar.

The Equallogic+Nova Connector

If you are using Crowbar 1.5 (Essex 2) then you already have the code!  Of course, you still need to have the admin information for your SAN – we did not automate the configuration of the storage system, only the Nova Volume integration.

We have it under a split test so you need to do the following to enable the configuration options:

  1. Install OpenStack as normal
  2. Create the Nova proposal
  3. Enter “Raw” Attribute Mode
  4. Change the “volume_type” to “eqlx”
  5. Save
  6. The Equallogic options should be available in the custom attribute editor!  (of course, you can edit in raw mode too)

Want Docs?  Got them!  Check out the EQLX Driver Install Addendum.

Usage note: the integration uses SSH sessions.  It has been performance tested but has not been tested at scale.

The Ceph+Nova Connector

The Ceph capability includes a Ceph barclamp!  That means that all the work to set up and configure Ceph is done automatically by Crowbar.  Even better, their Nova barclamp (Ceph provides it from their site) will automatically find the Ceph proposal and link the components together!

Ceph has provided excellent directions and videos to support this install.

The Atlantic magazine explains why Lean process rocks (and saves companies $$)

I’m certain that the Atlantic’s Charles Fishman was not thinking software and DevOps when he wrote the excellent article about “The Insourcing Boom.”  However, I strongly recommend reading this report for anyone who is interested in a practical example of how lean process roots out inefficiencies (if you are impatient, jump to page 2 and search for toaster).

It’s important to realize that this article is not about software! It’s an article about industrial manufacturing and the impact that lean process has when you are making stuff.  It’s about how US companies are using Lean to make domestic plants more profitable than Asian ones.  It turns out that how you make something really matters – you can’t really optimize the system if you treat major parts like a black box.

When I talk about Agile and Lean, I am talking about proven processes being applied broadly to companies that want to make profit selling stuff. That’s what this article is about.

If you are making software then you are making stuff! Your install and deploy process is your assembly line. Your unreleased code is your inventory.

This article does a good job explaining the benefits of being close to your manufacturing (DevOps) and being flexible in deployment (Agile) and being connected to customers (Lean).  The software industry often acts like it’s inventing everything from scratch. When it comes to manufacturing processes, we can learn a lot from industry.

Unlike software, industry has real costs for scrap and lost inventory. Instead of thinking “old school” perhaps we should be thinking of it as the school of hard knocks.