Four OpenStack Trends from Summit: Practical, Friendly, Effective and Deployable

With the next OpenStack Austin meetup on Thursday (sponsored by Puppet), I felt like it was past time for me to post my thoughts and observations about the Spring 2012 OpenStack design conference.  This was my fifth OpenStack conference (my notes about Bexar, Cactus, Diablo & Essex).  Every conference has been unique, exciting, and bigger than the previous.

My interest lies in the trend lines of OpenStack.  For details about sessions, I recommend Stefano Maffulli‘s  excellent link aggregation post for the Summit.

1. Technology Trend: Practical with Potential.

OpenStack started with a BIG vision to become the common platform for cloud API and operations.  That vision is very much alive and on-track; however, our enthusiasm for what could be is tempered by the need to build a rock solid foundation.  The drive to stability over feature expansion has had a very positive impact.  I give a lot of credit for this effort to the leadership of the project technical leads (PTLs), Canonical‘s drive to include OpenStack in the 12.04 LTS and the Rackspace Cloud drive to deploy Essex.  My team at Dell has also been part of this trend by focusing so much effort on making OpenStack production deployable (via Crowbar).

Overall, I am seeing a broad-based drive to minimize disruption.

2. Culture Trend: Friendly but some tension.

Companies at both large and small ends of the spectrum are clearly jockeying for position.  I think the market is big enough for everyone; however, we are also bumping into each other.  Overall, we are putting aside these real and imagined differences to focus on enlarging the opportunity of having a true community cloud platform.  For example, the OpenStack Foundation investment formation has moneyed competitors jostling for position to partner together.

However, it’s not just about paying into the club; OpenStack’s history is clearly about execution.  Looking back to the original Austin Summit sponsors, we’ve clearly seen that intent and commitment are different.

3. Discussion Trend: Small Groups Effective

The depth & quality of discussions inside sessions was highly variable.  Generally, I saw that large group discussions stayed at a very high level.  The smaller sessions required deep knowledge of the code to participate and seemed more productive.  We continue to have a juggle between discussions that are conceptual or require detailed knowledge of the code.  If conceptual, it’s too far removed.  If code, it becomes inaccessible to many people.

This has happened at each Summit and I now accept that it is natural.  We are using vision sessions to ensure consensus and working sessions to coordinate deliverables for the release.

I cannot over emphasize importance of small groups and delivery driven execution interactions: I spent most of my time in small group discussions with partners aligning efforts.

4. Deployment Trend: Testing and Upstreams matter

Operations for deploying OpenStack is a substantial topic at the Summit.  I find that to be a significant benefit to the community because there are a large block of us who were vocal advocates for deployability at the very formation of the project.

From my perspective at Dell, we are proud to see that wide spread acknowledgement of our open source contribution, Crowbar, as the most prominent OpenStack deployer.   Our efforts at making OpenStack installable are recognized as a contribution; however, we’re also getting feedback that we need to streamline and simplify Crowbar.  We also surprised to hear that Crowbar is “opinionated.”   On reflection, I agree (and am proud) of this assessment because it matches best practice coding styles.  Since our opinions also drive our test matrix there is a significant value for our OpenStack deployment is that we spend a lot of time testing (automated and manual) our preferred install process.

There’s a push to reconcile the various Chef OpenStack cookbooks into a single upstream.  This seems like a very good idea because it will allow various parties to collaborate on open operations.  The community needs leadership from Opscode to make this happen.  It appears that Puppet Labs is interested in playing a similar role for Puppet modules but these are still emerging and have not had a chance to fragment.

No matter which path we take, the deployment scripts are only as good as their level of testing.   Unreliable deployment scripts have are less than worthless.

OpenStack vs CloudStack? It’s about open innovation.

Yesterday, I got a short drive in a “Kick-Ass” Fisker Karma.  As someone who converted a car to electric, it was a great treat to see the amazing quality, polish and sophistication of the Karma.  Especially since, six years ago, I had to build my own EV.  Today there is a diversity of choices ranging from the Nissan, GM, Tesla, Aptera, Fisker and others.  Yet even with all these choices, EVs are far from the main stream.

How does that relate to Cloud *aaS?  It’s all about innovation cycles and adoption.

Cloud platforms (really, IaaS software) have transformed from DIY to vibrant projects in the last few years; however, we still don’t know what the finished products will look like – we are only at the beginning of the innovation cycle.

With yesterday’s Citrix’s “CloudStack joins Apache” announcement painted as a shot against OpenStack, it is tempting to get pulled into a polarized view of the right or wrong way to implement cloud software (NetworkWorld,  JavaWorld, CloudPundit).  I think that feature by feature comparisons miss the real dynamics of the cloud market.

The question is not about features today, it is about forward velocity tomorrow.  There are important areas needing technology development (network, storage, etc) in the cloud infrastructure space.  

So the real story, expressed eloquently by Thierry Carrez, is about open collaboration and the resulting pace of innovation.  That means that I consider all the cloud platforms in the market to be immature because we are still learning the scope of the “cloud” opportunity.

The critical question is how the various cloud projects will maintain growth and adopt innovation.  Like the current generation of EVs, we must both prove value in production and demonstrate our ability to learn and improve.

The Citrix decision to submit CloudStack to the Apache Foundation underscores this point: success of projects is about attracting collaboration and innovation.

From the perspective of building innovation and attracting developers, the tension between OpenStack and CloudStack is very real.

Open Source Cloud Bootstrapping Revisised

At the OpenStack last design conference, Greg Althaus and I presented about updates (presentation here) we were making to a Nov 2010 cloud architecture white paper.

The revised “Bootstrapping Open Source Clouds” white paper has been out for a few months so I thought it was past time to throw out a link.

I’m really pleased about this update because it reflects real world experience my team has working with customers and partners on OpenStack (and Hadoop) deployments.

Executive Summary
Bringing a cloud infrastructure online can be a daunting bootstrapping challenge. Before
hanging out a shingle as a private or public cloud service provider, you must select a platform,
acquire hardware, configure your network, set up operations services, and integrate it to work
well together. That is a lot of moving parts before you have even installed a sellable application.
This white paper walks you through the decision process to get started with an open source
cloud infrastructure based on OpenStack™ and Dell™ PowerEdge™ C servers. At the end, you’ll
be ready to design your own trial system that will serve as the foundation of your hyperscale
cloud.
2011 Revision Notes
In the year since the the original publication of this white paper, we worked with many
customers building OpenStack clouds. These clouds range in size from small six-node lab
systems to larger production deployments. Based on these experiences, we updated this white
paper to reflect lessons learned.

Four alternatives to Process Interlock

Note: This is the third and final part of 3 part series about the “process interlock dilemma.”

In post 1, I’ve spelled out how evil Process Interlock causes well intentioned managers to add schedule risk and opportunity cost even as they appear to be doing the right thing. In post 2, I offered some alternative outcomes when process interlock is avoided. In this post, I attempt to provide alternatives to the allure of process interlock. We must have substitute interlocks types to replace our de facto standard because there are strong behavioral and traditional reasons to keep broken processes. In other words, process Interlock feels good because it gives you the illusion that your solution is needed and vital to other projects.

If your product is vital to another team then they should be able to leverage what you have, not what you’re planning to have.

We should focus on delivered code instead of future promises. I am not saying that roadmaps and projections are bad – I think they are essential. I am saying that roadmaps should be viewed as potential not as promises.

  1. No future commits (No interlock)

    The simplest way to operate without any process interlock is to never depend on other groups for future deliveries. This approach is best for projects that need to move quickly and have no tolerance for schedule risk. This means that your project is constrained to use the “as delivered” work product from all external groups. Depending on needs, you may further refine this as only rely on stable and released work.

    For example, OpenStack Cactus relied on features that were available in the interim 10.10 Ubuntu version. This allowed the project to advance faster, but also limited support because the OS this version was not a long term support (LTS) release.

  2. Smaller delivery steps (MVP interlock)

    Sometimes a new project really needs emerging capabilities from another project. In those cases, the best strategy is to identify a minimum viable feature set (or “product”) that needs to be delivered from the other project. The MVP needs to be a true minimum feature set – one that’s just enough to prove that the integration will work. Once the MVP has been proven, a much clearer understanding of the requirements will help determine the required amount of interlock. My objective with an MVP interlock is to find the true requirements because IMHO many integrations are significantly over specified.

    For example, the OpenStack Quantum project (really, any incubated OpenStack projects) focuses on delivering the core functionality first so that the ecosystem and other projects can start using it as soon as possible.

  3. Collaborative development (Shared interlock)

    A collaborative interlock is very productive when the need for integration is truly deep and complex. In this scenario, the teams share membership or code bases so that the needs of each team is represented in real time. This type of transparency exposes real requirements and schedule risk very quickly. It also allows dependent teams to contribute resources that accelerate delivery.

    For example, our Crowbar OpenStack team used this type of interlock with the Rackspace OpenStack team to ensure that we could get Diablo code delivered and deployed as fast as possible.

  4. Collaborative requirements (Fractal interlock)

    If you can’t collaborate or negotiate an MVP then you’re forced into working at the requirements level instead of development collaboration. You can think of this as a sprint-roadmap fast follow strategy because the interlocked teams are mutually evolving design requirements.

    I call this approach Fractal because you start at big concepts (road maps) and drill down to more and more detail (sprints) as the monitored project progresses. In this model, you interlock on a general capability initially and then work to refine the delivery as you learn more. The goal is to avoid starting delays or injecting false requirements that slow delivery.

    For example, if you had a product that required power from hamsters running in wheels then you’d start saying that you needed a small fast running animal. Over the next few sprints, you’d likely refine that down to four legged mammals and then to short tailed high energy rodents. Issues like nocturnal or bites operators could be addressed by the Hamster team or by the Wheel team as the issues arose. It could turn out that the right target (a red bull sipping gecko) surfaces during short tail rodent design review. My point is that you can avoid interlocks by allowing scope to evolve.

Breaking Process Interlocks delivers significant ROI

I have been trying to untangle both the cause and solution of process interlock for a long time. My team at Dell has an interlock-averse culture and it accelerates our work delivery. I write about this topic because I have real world experience that eliminating process interlocks increases

  1. team velocity
  2. collaboration
  3. quality
  4. return on investment

These are significant values that justify adoption of these non-interlock approachs; however, I have a more selfish motivation.

We want to work with other teams that are interlock-averse because the impacts multiply. Our team is slowed when others attempt to process interlock and accelerated when we are approached in the ways I list above.

I suspect that this topic deserves a book rather than a three part blog series and, perhaps, I will ultimately create one. Until then, I welcome your comments, suggestions and war stories.

Cloudera Manager Barclamp posted! (part of updated Dell | Cloudera Apache Hadoop Solution)

My team at Dell has been driving to transparency and openness around Crowbar plus our OpenStack and Hadoop powered solutions.  Specifically, our work for our coming release is maintained in the open on the Dell CloudEdge Github site.  You can see (and participate in!) our development and validation work in advance of our official release.

I’m pleased to note that our Cloudera Manager barclamp has been posted to Github!

This barclamp supersedes  the Hadoop barclamp in the next release of the Dell | Cloudera Apache Hadoop solution.  You can built it in Crowbar using the “cloudera-os-build”  branch for Crowbar.  Do not fear!  The Hadoop barclamp still exists (hadoop-os-build branch).

Both the new and original Hadoop barclamp use the Cloudera Hadoop distribution (aka CDH); however, the new barclamp is able to leverage Cloudera‘s latest management capabilities.  For the Dell solution, Cloudera Manager has always been part of the offering.  The primary difference is that we are improving the level of integration.  I promise to post more about the features of the solution as we get closer to release.

OpenStack Austin: What we’d like to see at the Design Summit

Last week, the OpenStack Austin user group discussed what we’d like to see at the upcoming OpenStack Design Summit. We had a strong turnout (48?!).

  1. To get the meeting started, Marc Padovani from HP (this month’s sponsor) provided some lessons learned from the HP OpenStack-Powered Cloud. While Marc noted that HP has not been able to share much of their development work on OpenStack; he was able to show performance metrics relating to a fix that HP contributed back to the OpenStack community. The defect related to the scheduler’s ability to handle load. The pre-fix data showed a climb and then a gap where the scheduler simply stopped responding. Post-fix, the performance curve is flat without any “dead zones.” (sharing data like this is what I call “open operations“)
  2. Next, I (Rob Hirschfeld) gave a brief overview of the OpenStack Essex Deploy Day (my summary) that Dell coordinated with world-wide participation. The Austin deploy day location was in the same room as the meetup so several of the OSEDD participants were still around.
  3. The meat of the meetup was a freeform discussion about what the group would like to see discussed at the Design Summit. My objective for the discussion was that the Austin OpenStack community could have a broader voice is we showed consensus for certain topics in advance of the meeting.

At Jim Plamondon‘s suggestion, we captured our brain storming on the OpenStack etherpad. The Etherpad is super cool – it allows simultaneous editing by multiple parties, so the notes below were crowd sourced during the meeting as we discussed topics that we’d like to see highlighted at the conference. The etherpad preserves editors, but I removed the highlights for clarity.

The next step is for me to consolidate the list into a voting page and ask the membership to rank the items (poll online!) below.

Brain storm results (unedited)

Stablity vs. Features

API vs. Code

  • What is the measurable feature set?
  • Is it an API, or an implementation?
  • Is the Foundation a formal-ish standards body?
  • Imagine the late end-game: can Azure/VMWare adopt OPenStack’s APIs and data formats to deliver interop, without running OpenStack’s code? Is this good? Are there conversations on displacing incumbents and spurring new adoption?
  • Logo issues

Documentation Standards

  • Dev docs vs user docs
  • Lag of update/fragmentation (10 blogs, 10 different methods, 2 “work”)
  • Per release getting started guide validated and available prior or at release.

Operations Focus

  • Error messages and codes vs python stack traces
  • Alternatively put, “how can we make error messages more ops-friendly, without making them less developer-friendly?”
  • Upgrade and operations of rolling updates and upgrades. Hot migrations?

If OpenStack was installable on Windows/Hyper-V as a simple MSI/Service installer – would you try it as a node?

  • Yes.

Is Nova too big?  How does it get fixed?

  • libraries?
  • sections?
  • make it smaller sub-projects
  • shorter release cycles?

nova-volume

  • volume split out?
  • volume expansion of backend storage systems
  • Is nova-volume the canonical control plane for storage provisioning?  Regardless of transport? It presently deals in block devices only… is the following blueprint correctly targeted to nova-volume?

https://blueprints.launchpad.net/nova/+spec/filedriver

Orchestration

  • Is the Donabe project dead?

Discussion about invitations to Summit

  • What is a contribution that warrants an invitation
  • Look at Launchpad’s Karma system, which confers karma for many different “contributory” acts, including bug fixes and doc fixes, in addition to code commitments

Summit Discussions

  • Is there a time for an operations summit?
  • How about an operators’ track?
  • Just a note: forums.openstack.org for users/operators to drive/show need and participation.

How can we capture the implicit knowledge (of mailing list and IRC content) in explicit content (documentation, forums, wiki, stackexchange, etc.)?

Hypervisors: room for discussion?

  • Do we want hypervisor featrure parity?
  • From the cloud-app developer’s perspective, I want to “write once, run anywhere,” and if hypervisor features preclude that (by having incompatible VM images, foe example)
  • (RobH: But “write once, run anywhere” [WORA] didn’t work for Java, right?)
  • (JimP: Yeah, but I was one of Microsoft’s anti-Java evangelists, when we were actively preventing it from working — so I know the dirty tricks vendors can use to hurt WORA in OpenStack, and how to prevent those trick from working.)

CDMI

Swift API is an evolving de facto open alternative to S3… CDMI is SNIA standards track.  Should Swift API become CDMI compliant?  Should CDMI exist as a shim… a la the S3 stuff.

Analyze This! Big Data | Apache Hadoop | Dell | Cloudera | Crowbar

This article about Target using buying patterns to expose a teen was pregnant before she told her parents puts big data analysis into everyday terms better than the following 555 words (of course, I recommend that you read both).

Recently, I had the pleasure of being one of our team presenting Dell’s BIG DATA story at an internal conference. From the questions and buzz, it’s clear that the big data is big news this year. My team is at the center of that storm because we are responsible for the Dell | Cloudera Apache™ Hadoop™ solution. The solution is significant because we’ve integrated many pieces necessary to build and sustain a Hadoop cluster: that includes Dell servers, the Cloudera Hadoop distribution, the Crowbar framework and Services to make it useful.

Big Data Analytics spins data straws into information gold.

Before I jump into technical details, it’s worth stating the big data analytics value proposition. The problem is that we are awash in a tsunami of data: we’ve grown beyond the neat rows and columns of application databases, data today include source like website click logs and emails to call records and cash register receipts to including social media tweets and posts. While much of the data is unstructured noise, there is also incredibility valuable information.  (video of my Hadoop “escalator pitch”)

Value is not just hidden inside the bulk data; it lies in correlations between sets of the data.

The big data analytics value proposition is to provide a system to hold a lot of loosely structured information (thus “big data”) and then sift and correlate the information (thus “analytics”). The result is a technology that helps us make data driven decisions. In many applications, the analysis is fed directly back into applications so they can alter behavior in near real-time. For example, an online retail store could offer you purple bunny slippers as you browse for crowbars in the hardware section knowing that you’re reading this post. That is the type of correlations on disparate data that I’m talking about.

This is really two problems: storing a lot of data and then computing over it.

Hadoop, the leading open source big data analytics project, is a suite of applications that implement and extend two core capabilities: a distributed file system (HDFS) and the map-reduce (M-R) algorithm. My point is not to define Hadoop (others have done better and here); instead, I want to highlight that it’s a combination big data analysis is a merger of storage and compute. When learning about any big data analysis solution, you cannot decouple how the data is stored from how the data is analyzed – storage and compute are fundamentally linked.

For that reason, the architecture of a Hadoop cluster is different than either a traditional database or compute cluster. The IO and the resiliency patterns are different. Since Hadoop is a distributed system, hardware redundancy is less important and eliminating IO bottlenecks is paramount. For this reason, our Hadoop clusters use a lot of local, non-RAID drives with a target of delivering a 1:1 CPU core to spindle ratio (ratios are tuned based on planned loads).

Imagine that you are looking for correlations in web click data. To do that analysis, Hadoop need to spend a lot of time cracking open log files, sifting for specific data and then reporting back its results. That process involves thousands of jobs each doing disk IO, CPU & RAM workload and then network transfer; consequently, contention between network and disk demands reduces performance.

Wow… that’s a lot of description and just scratching the surface of Big Data Analytics. I’ll going to have to add the technical details about the Dell solution architecture (Hardware) and software components (Cloudera & Crowbar) in another post.

Why Governance Matters in Open Source: Discussing the OpenStack Foundation

This post is part of my notes from the 2/1 Boston OpenStack meetup.

OpenStack Foundation

Your’s truly (Rob Hirschfeld) gave the presentation about the OpenStack Foundation.  To readers of this blog, it’s obvious that I’m a believer in the OpenStack mission; however, it’s not obvious how creating a foundation helps with that mission and why OpenStack needs its own. As one person at the meetup put it, “Why not? Every major project needs a foundation!”

Governance does not sound sexy compared to writing code and deploying clouds, but it’s very important to the success of the project.

Here are my notes without the poetic elocution I exuded during the meetup…

The basics:

  • What: Creating a neutral body to govern OpenStack. Rackspace has been leading OpenStack. This means that they own the copyrights, name and also pay the people who organize the community. They committed (to executives at Dell and others) that they would ultimately setup a standalone body to govern the project before the project was public and endorsed by those early partners. Dell (my employer), Citrix, Accenture and NASA were some of biggest names at the Austin conference launch.
  • Why: A neutral body is needed because a lot of companies are committing significant time and money to the project. They cannot risk their investments on Rackspace good will alone. This may mean many things. It could be they don’t like Rackspace direction or they feel that Rackspace is not investing enough.
  • When: Right now and over the next few releases.  You should give feedback right now on the OpenStack Foundations mission.  The actual foundation will take more time to establish because it requires legal work and funding commitments.
  • Who: The community – all stakeholders. This is important stuff! While trying to standup a financially independent Foundation, which requires moneys, the little guys are not left out. There is a clear realization and desire to enable independent developers and contributors and small players to have a seat at the table.
  • How Much: The amounts are unclear, but establishing a foundation will require a significant ongoing investment from highly involved and moneyed parties (Rackspace, Dell, Cisco, HP, Citrix, NTT, startups?, etc).  The funding will pay salaries for people dedicated to the community doing the things that I’ll discuss below.  Overall, the ROI for those investments must be clear!

The foundation does “governance.” But, what does that mean? Here is a list of vitally important work that the foundation is responsible for.

  • Branding – Protecting, certifying, and promoting the OpenStack brand is important because it ensures that “OpenStack” has a valuable and predictable meaning to contributors and users. A strong the brand also means a stronger temptation for people to abuse the brand by claiming compatibility, participation and integration.
  • API – Many would assume that the OpenStack API is the very heart of the project and there is merit to this position. As more and more OpenStack implementations emerge, it is essential that we have a body that can certify which implementations (and even which versions of the implementation!) are valid. This is a substantial value to the community because API integrity ensures project continuity and helps the ecosystem monetize the project. Note: my opinion differs from others here because I think we should favor API over implementation
  • Community – The OpenStack community is not an accident. It is the function of deliberate actions and choices made by Rackspace and supported by key contributors. That community requires virtual and physical places to coalesce and leaders to organize and manage those meeting places. The excellent conferences, wikis, blogs, media awareness, documentation and meetups are a product of consistent community management.
  • Arbitration – An open source community is a family and siblings do not always get along. Today, Rackspace must be very careful about balancing their own interests because they are like the oldest sibling playing the parent role – you can get away with it until something serious happens. We need a neutral party so that Rackspace can protect their own interests (alternate spin: because Rackspace protects their own interests at the expense of the community).
  • Leadership – OpenStack today is a collection of projects with individual leadership. We will increasingly need coordinated leadership as the number of projects and users increases. Centralized leadership is essential because the good of the project as a whole may mean sacrifices within individual projects. It may even mean that some projects chose to leave the OpenStack tent. Stewarding these challenges will require a new level of leadership.
  • Legal – This is a function of all the above but also something more. From a legal stand point, OpenStack be able to represent itself. There is a significant amount of intellectual property being created. It would be foolish to overlook that this property is valuable and needs adequate legal representation.

I used “vitally important” to describe the above items. Is that an exaggeration? Our goal is collaboration and that requires some infrastructure and rules to make it sustainable. We must have a foundation that encourages innovation (multiple implementations) and collaboration (discourages forking). Innovation and collaboration are the heartbeat of an open source project.

The foundation is vitally important because collaboration by competitors is fragile.

In addition to the core areas above, the foundation needs to handle routine tactical items such as:

  • Delivering on milestones & releases
  • Moving new subprojects into OpenStack
  • Electing and maintaining Project Policy Board
  • Electing and maintaining Project Technical Leads
  • Ensuring adherence and extensions to the current bylaws

At the end of the day, OpenStack monetization is the central value for the Foundation.

In order for the OpenStack project, and thus its foundation, to flourish, the contributors, ecosystem, sponsors and users of the project must be able to see a reasonable return (ROI) on their investment. I would love to believe that the foundation is allow about people banding together to solve important problems for the benefit of all; however, it is more realistic to embrace that we can both collaborate and profit simultaneously. Acknowledging the pragmatic self-interested view allows us to create the right incentives and processes as embodied by the OpenStack foundation.

CloudOps white paper explains “cloud is always ready, never finished”

I don’t usually call out my credentials, but knowing the I have a Masters in Industrial Engineering helps (partially) explain my passion for process as being essential to successful software delivery. One of my favorite authors, Mary Poppendiek, explains undeployed code as perishable inventory that you need to get to market before it loses value. The big lessons (low inventory, high quality, system perspective) from Lean manufacturing translate directly into software and, lately, into operation as DevOps.

What we have observed from delivering our own cloud products, and working with customers on thier’s, is that the operations process for deployment is as important as the software and hardware. It is simply not acceptable for us to market clouds without a compelling model for maintaining the solution into the future. Clouds are simply moving too fast to be delivered without a continuous delivery story.

This white paper [link here!] has been available since the OpenStack conference, but not linked to the rest of our OpenStack or Crowbar content.