double Block Head with OpenStack+Equallogic & Crowbar+Ceph

Block Head

Whew….Yesterday, Dell announced TWO OpenStack block storage capabilities (Equallogic & Ceph) for our OpenStack Essex Solution (I’m on the Dell OpenStack/Crowbar team) and community edition.  The addition of block storage effectively fills the “persistent storage” gap in the solution.  I’m quadrupally excited because we now have:

  1. both open source (Ceph) and enterprise (Equallogic) choices
  2. both Nova drivers’ code is in the open at part of our open source Crowbar work

Frankly, I’ve been having trouble sitting on the news until Dell World because both features have been available in Github before the announcement (EQLX and Ceph-Barclamp).  Such is the emerging intersection of corporate marketing and open source.

As you may expect, we are delivering them through Crowbar; however, we’ve already had customers pickup the EQLX code and apply it without Crowbar.

The Equallogic+Nova Connector

block-eqlx

If you are using Crowbar 1.5 (Essex 2) then you already have the code!  Of course, you still need to have the admin information for your SAN – we did not automate the configuration of the storage system, but the Nova Volume integration.

We have it under a split test so you need to do the following to enable the configuration options:

  1. Install OpenStack as normal
  2. Create the Nova proposal
  3. Enter “Raw” Attribute Mode
  4. Change the “volume_type” to “eqlx”
  5. Save
  6. The Equallogic options should be available in the custom attribute editor!  (of course, you can edit in raw mode too)

Want Docs?  Got them!  Check out these > EQLX Driver Install Addendum

Usage note: the integration uses SSH sessions.  It has been performance tested but not been tested at scale.

The Ceph+Nova Connector

block-ceph

The Ceph capability includes a Ceph barclamp!  That means that all the work to setup and configure Ceph is done automatically done by Crowbar.  Even better, their Nova barclamp (Ceph provides it from their site) will automatically find the Ceph proposal and link the components together!

Ceph has provided excellent directions and videos to support this install.

Interview Transcript about OpenStack, Foundation, Dell & Crowbar

Rafael Knuth, Dell Rafael Knuth (@RafaelKnuth with Dell as EMEA Social Media Community Manager) has set an ambitious objective to interview all 24 of the OpenStack Foundation Board members.  I have the privilege of being the first interview posted (Japanese version!).   Here’s the whole series.

This is not puff interview – We spent an hour together and Rafael did not shy away from asking hard questions like “Why did Dell jump into OpenStack?” and “is VMware a threat to OpenStack?”  Rather than posting the whole transcript (it’s posted here), I’m including the questions (as a teaser) below.   There is some real meat in these answers about OpenStack, Dell, Crowbar and challenges facing the project.

WARNING: My job is engineering, not marketing.  You may find my answers (which are MY OWN ANSWERS) to be more direct that you are expecting.  If you find yourself needing additional circumlocution then simply close your browser and move on.

Some highlights…

Dell’s interest in OpenStack has been very pragmatic. OpenStack is something we really see a market need for.

Rackspace …  runs on OpenStack pretty much off trunk …  That’s exactly the type of vibrant community we want to see.  At the same time, there is a growing community that wants to use OpenStack distributions with support, certifications and they are fine with being 6 months behind OpenStack off trunk. That’s good, and we want that shadow, we want that combination of pure minded early adopters and less sophisticated OpenStack users both working together.

We are working with different partners to bring OpenStack to different customers in different ways. It is confusing. Your question about Dell Crowbar was right … it is targeted at a certain class of users, and I don’t want enterprise customers who expect a lot of shiny chrome and zero touch. That’s not the target by now for Dell Crowbar. We definitely need that sort of magic decoder page to help customers understand our commercial offering.

Questions from Interview:

  1. Dell is one of the very early contributors to OpenStack. Why is Dell engaging in this project?
  2. How does Dell contribute to OpenStack?
  3. Let’s talk a bit about Dell Crowbar, your team’s deployment mechanism for OpenStack.
  4. Let’s talk a bit about OpenStack raw vs. OpenStack distributions.
  5. What are the biggest barriers to OpenStack adoption as of now?
  6. What does a customer specifically need to do when moving from OpenStack Essex to Folsom for example?
  7. My next question is around proof of concept versus production, Rob. How are customers using OpenStack and can you give examples for both scenarios?
  8. I hear very often two different statements: “Open Stack is an alternative to Amazon.” The other is: “OpenStack is an alternative to VMware … maybe, hopefully in two or three years from now.”  Which of both statements is true?
  9. How do you view VMware joining OpenStack. Is it a threat to OpenStack or does VMware add value to the project?
  10. Let us speak about market adoption. Who are the early adopters of OpenStack? And when do you expect OpenStack to hit the tipping point for mass market adoption?
  11. Rob, for all those interested in Dell’s commercial offering around OpenStack … can you give a brief overview?
  12. Dell TechCenter that provides customers an overview over our OpenStack offering: Dell Crowbar as our DevOps tool in its various shapes and forms, OpenStack distros we support … cloud services we build around OpenStack … hardware capabilities optimized for OpenStack.
  13. What are the challenges for the OpenStack Board of Directors?

Crowbar releases for OpenStack Essex and Folsom

On the eve of the OpenStack design summit, it’s worth noting that the Crowbar team at Dell cut our final Essex release (aka Betty) last week. We’ve also committed the initial Folsom deployment scripts to the 1x development trunk under “feature/folsom” if you are doing Crowbar builds from DevTool (see bit.ly/crowbardevtool).

If you don’t want to build it, I’ve cut ISOs from the open source bits and posted them to http://crowbar.zehicle.com.

Andi Abes is presenting about the new Pull From Source (pfs) feature at the Summit on Monday. There’s a feature branch for that too and I’m going to check with him and try to post and ISO for that too.

20121014-132246.jpg

OpenStack Summit: Let’s talk DevOps, Fog, Upgrades, Crowbar & Dell

If you are coming to the OpenStack summit in San Diego next week then please find me at the show! I want to hear from you about the Foundation, community, OpenStack deployments, Crowbar and anything else.  Oh, and I just ordered a handful of Crowbar stickers if you wanted some CB bling.

Matt Ray (Opscode), Jason Cannavale (Rackspace) and I were Ops track co-chairs. If you have suggestions, we want to hear. We managed to get great speakers and also some interesting sessions like DevOps panel and up streaming deploy working sessions. It’s only on Monday and Tuesday, so don’t snooze or you’ll miss it.

My team from Dell has a lot going on, so there are lots of chances to connect with us:

At the Dell booth, Randy Perryman will be sharing field experience about hardware choices. We’ve got a lot of OpenStack battle experience and we want to compare notes with you.

I’m on the board meeting on Monday so likely occupied until the Mirantis party.

See you in San Diego!

PS: My team is hiring for Dev, QA and Marketing. Let me know if you want details.

Why doesn’t Chef call them “bowls” instead of roles?

The extended Crowbar team (my employer Dell and community) recently had a bit of a controversy heated discussion over the renaming of “proposals” to “configurations.” It was pretty clear that the term “proposal” confused users because an “active proposal” seems like a bit of an oxymoron. Excepting Scott Jensen, our schedule-czar and director of engineering, we had relatively few die-hard “I love proposal” advocates; however, deciding on an alternative was not quite so easy.

Ultimately, the question came down to “do we use an invented term or an intuitive term.”  The answer lies in when to use or avoid the congruity theory as articulated by Roger Cauvin.

We considered many alternatives like calling them “fixtures” to go along with the Crowbar & Barclamp tool theme. Even “Chuck Norris” was considered until copyright issues were flagged. The top alternative, “configuration” seemed just too bland. Frankly and amazingly, we originally considered it “too descriptive!”

The crux of the argument really revolved around the users’ ability to intuitively grasp a concept or to force them learn a new term. For example, we specifically chose “barclamp” instead of “module” because we felt that there were more components to a barclamp than just being a Crowbar module. In many ways, module would be sufficiently descriptive; however, we saw that there was benefit to the user tax in introducing a new term. It also fit nicely within our tool theme.

Opscode Chef is an example of investing heavily in a naming theme. For example, the concept of “cookbooks” and “recipes” seems relatively intuitive for users but starts getting stretched for “knife” because it is not immediately clear to users what that component does (it executes instructions on nodes and the server). After learning Chef, I appreciate “knife” as the universal tool but still remember having to figure it out.

A good theme is awesome, but it can quickly encumber usability.

For example, what if Chef has used “bowl” instead of role. It’s logical: you put a group of ingredients to mix into a bowl that acts as a container. While it may be logical to the initiated, it mainly extends the learning curve for new users. A role is a commonly accepted term for an operational classification so it is a much better term for users. The same is true for “node” and “data bag.”

I love a good incongruent theme as much as any meme-enabled tech geek but themes must not hinder usability. After all, we all fight for the users.

Crowbar 2.0 Design Summit Notes (+ open weekly meetings starting)

I could not be happier with the results Crowbar collaborators and my team at Dell achieved around the 1st Crowbar design summit. We had great discussions and even better participation.

The attendees represented major operating system vendors, configuration management companies, OpenStack hosting companies, OpenStack cloud software providers, OpenStack consultants, OpenStack private cloud users, and (of course) a major infrastructure provider. That’s a very complete cross-section of the cloud community.

I knew from the start that we had too little time and, thankfully, people were tolerant of my need to stop the discussions. In the end, we were able to cover all the planned topics. This was important because all these features are interlocked so discussions were iterative. I was impressed with the level of knowledge at the table and it drove deep discussion. Even so, there are still parts of Crowbar that are confusing (networking, late binding, orchestration, chef coupling) even to collaborators.

In typing up these notes, it becomes even more blindingly obvious that the core features for Crowbar 2 are highly interconnected. That’s no surprise technically; however, it will make the notes harder to follow because of knowledge bootstrapping. You need take time and grok the gestalt and surf the zeitgeist.

Collaboration Invitation: I wanted to remind readers that this summit was just the kick-off for a series of open weekly design (Tuesdays 10am CDT) and coordination (Thursdays 8am CDT) meetings. Everyone is welcome to join in those meetings – information is posted, recorded, folded, spindled and mutilated on the Crowbar 2 wiki page.

These notes are my reflection of the online etherpad notes that were made live during the meeting. I’ve grouped them by design topic.

Introduction

  • Contributors need to sign CLAs
  • We are refactoring Crowbar at this time because we have a collection of interconnected features that could not be decoupled
  • Some items (Database use, Rails3, documentation, process) are not for debate. They are core needs but require little design.
  • There are 5 key topics for the refactor: online mode, networking flexibility, OpenStack pull from source, heterogeneous/multi operating systems, being CDMB agnostic
  • Due to time limits, we have to stop discussions and continue them online.
  • We are hoping to align Crowbar 2 beta and OpenStack Folsom release.

Online / Connected Mode

  • Online mode is more than simply internet connectivity. It is the foundation of how Crowbar stages dependencies and components for deploy. It’s required for heterogeneous O/S, pull from source and it has dependencies on how we model networking so nodes can access resources.
  • We are thinking to use caching proxies to stage resources. This would allow isolated production environments and preserves the run everything from ISO without a connection (that is still a key requirement to us).
  • Suse’s Crowbar fork does not build an ISO, instead it relies on RPM packages for barclamps and their dependencies.
  • Pulling packages directly from the Internet has proven to be unreliable, this method cannot rely on that alone.

Install From Source

  • This feature is mainly focused on OpenStack, it could be applied more generally. The principals that we are looking at could be applied to any application were the source code is changing quickly (all of them?!). Hadoop is an obvious second candidate.
  • We spent some time reviewing the use-cases for this feature. While this appears to be very dev and pre-release focused, there are important applications for production. Specifically, we expect that scale customers will need to run ahead of or slightly adjacent to trunk due to patches or proprietary code. In both cases, it is important that users can deploy from their repository.
  • We discussed briefly our objective to pull configuration from upstream (not just OpenStack, but potentially any common cookbooks/modules). This topic is central to the CMDB agnostic discussion below.
  • The overall sentiment is that this could be a very powerful capability if we can manage to make it work. There is a substantial challenge in tracking dependencies – current RPMs and Debs do a good job of this and other configuration steps beyond just the bits. Replicating that functionality is the real obstacle.

CMDB agnostic (decoupling Chef)

  • This feature is confusing because we are not eliminating the need for a configuration management database (CMDB) tool like Chef, instead we are decoupling Crowbar from the a single CMDB to a pluggable model using an abstraction layer.
  • It was stressed that Crowbar does orchestration – we do not rely on convergence over multiple passes to get the configuration correct.
  • We had strong agreement that the modules should not be tightly coupled but did need a consistent way (API? Consistent namespace? Pixie dust?) to share data between each other. Our priority is to maintain loose coupling and follow integration by convention and best practices rather than rigid structures.
  • The abstraction layer needs to have both import and export functions
  • Crowbar will use attribute injection so that Cookbooks can leverage Crowbar but will not require Crowbar to operate. Crowbar’s database will provide the links between the nodes instead of having to wedge it into the CMDB.
  • In 1.x, the networking was the most coupled into Chef. This is a major part of the refactor and modeling for Crowbar’s database.
  • There are a lot of notes captured about this on the etherpad – I recommend reviewing them

Heterogeneous OS (bare metal provisioning and beyond)

  • This topic was the most divergent of all our topics because most of the participants were using some variant of their own bare metal provisioning project (check the etherpad for the list).
  • Since we can’t pack an unlimited set of stuff on the ISO, this feature requires online mode.
  • Most of these projects do nothing beyond OS provisioning; however, their simplicity is beneficial. Crowbar needs to consider users who just want a stream-lined OS provisioning experience.
  • We discussed Crowbar’s late binding capability, but did not resolve how to reconcile that with these other projects.
  • Critical use cases to consider:
    • an API for provisioning (not sure if it needs to be more than the current one)
    • pick which Operating Systems go on which nodes (potentially with a rules engine?)
    • inventory capabilities of available nodes (like ohai and factor) into a database
    • inventory available operating systems

Open Community Access to Crowbar 2 Efforts

We’re moving along on the Crowbar2 refactoring work (`/release/rails3anddb/master` branch for now) and it’s time to start making it easier for you to participate if you are interested.

We are planning to start having TWO weekly community sprint meetings.

NOTE: Times can still shift depending on community input! We are trying to a truly global community so we need your input.

The weekly design discussions will be on Tuesdays @ 10am Central (GMT -6). The topics will be relevant to the coming sprint and we expect dialog. The topics will be determined by Greg Althaus based on progress on the refactor. You’re welcome to contact him or the list with suggestions.

The purpose of these meetings is to

  • discuss/resolve design related to the refactor
  • Document use-cases
  • identify issues that need to be addressed in the next sprint

The weekly coordination meetings will be on Thursdays @ 8am Central (GMT -6). We want to respect everyone’s time and will strictly limit these to 1 hour. This meeting is in between our internal sprint review and planning so we have flexibility to adjust our plans based on your input. It is important to us that we make it possible for you to contribute and we need your input to make sure that you are not blocked!

The coordination meetings will be structured as follows (times approximate)

  • Voice & Screencast: https://join.me/dellcrowbar
  • 25 minutes for review of current progress
  • 10 minutes for feedback/adjustments on process and workflow
  • 25 minutes for planning of next sprint
  • Online discussion & notes on the http://crowbar.sync.in/sprintMMDD etherpads
  • Identification of working groups for further discussion and coding collaboration

These meetings are the primary way that we will be making sure the community is not blocked by our development efforts.

The purpose of these meetings is

  • to synchronize quickly so that we can connect the people who should be collaborating
  • eliminate blocking items for Crowbar contributors

Of course, we will attempt to record and post all of these meetings.

Stay tuned! We will likely announce additional meetings for community collaboration.

PS: These are design meetings, they are NOT Crowbar training meetings. Please consult http://bit.ly/crowbarwiki for links and videos about learning Crowbar.

Ops “Late Binding” is critical best practice and key to Crowbar differentiation

From OSCON 2012: Portland's light rail understands good dev practice.Late binding is a programming term that I’ve commandeered for Crowbar’s DevOps design objectives.

We believe that late binding is a best practice for CloudOps.

Understanding this concept is turning out to be an important but confusing differentiation for Crowbar. We’ve effectively inverted the typical deploy pattern of building up a cloud from bare metal; instead, Crowbar allows you to build a cloud from the top down.  The difference is critical – we delay hardware decisions until we have the information needed to do the correct configuration.

If Late Binding is still confusing, the concept is really very simple: “we hold off all work until you’ve decided how you want to setup your cloud.”

Late binding arose from our design objectives. We started the project with a few critical operational design objectives:

  1. Treat the nodes and application layers as an interconnected system
  2. Realize that application choices should drive down the entire application stack including BIOS, RAID and networking
  3. Expect the entire system to be in a constantly changing so we must track state and avoid locked configurations.

We’d seen these objectives as core tenets in hyperscale operators who considered bare metal and network configuration to be an integral part of their application deployment. We know it is possible to build the system in layers that only (re)deploy once the application configuration is defined.

We have all this great interconnected automation! Why waste it by having to pre-stage the hardware or networking?

In cloud, late binding is known as “elastic computing” because you wait until you need resources to deploy. But running apps on cloud virtual machines is simple when compared to operating a physical infrastructure. In physical operations, RAID, BIOS and networking matter a lot because there are important and substantial variations between nodes. These differences are what drive late binding as a one of Crowbar’s core design principles.

Late Binding Visualized

The real workloads begin: Crowbar’s Sophomore Year

Given Crowbar‘s frenetic Freshman year, it’s impossible to predict everything that Crowbar could become. I certainly aspire to see the project gain a stronger developer community and the seeds of this transformation are sprouting. I also see that community driven work is positioning Crowbar to break beyond being platforms for OpenStack and Apache Hadoop solutions that pay the bills for my team at Dell to invest in Crowbar development.

I don’t have to look beyond the summer to see important development for Crowbar because of the substantial goals of the Crowbar 2.0 refactor.

Crowbar 2.0 is really just around the corner so I’d like to set some longer range goals for our next year.

  • Growing acceptance of Crowbar as an in data center extension for DevOps tools (what I call CloudOps)
  • Deeper integration into more operating environments beyond the core Linux flavors (like virtualization hosts, closed and special purpose operating systems.
  • Improvements in dynamic networking configuration
  • Enabling more online network connected operating modes
  • Taking on production ops challenges of scale, high availability and migration
  • Formalization of our community engagement with summits, user groups, and broader developer contributions.

For example, Crowbar 2.0 will be able to handle downloading packages and applications from the internet. Online content is not a major benefit without being able to stage and control how those new packages are deployed; consequently, our goals remains tightly focused improvements in orchestration.

These changes create a foundation that enables a more dynamic operating environment. Ultimately, I see Crowbar driving towards a vision of fully integrated continuous operations; however, Greg & Rob’s Crowbar vision is the topic for tomorrow’s post.

Our Vision for Crowbar – taking steps towards closed loop operations

When Greg Althaus and I first proposed the project that would become Dell’s Crowbar, we had already learned first-hand that there was a significant gap in both the technologies and the processes for scale operations. Our team at Dell saw that the successful cloud data centers were treating their deployments as integrated systems (now called DevOps) in which configuration of many components where coordinated and orchestrated; however, these approaches feel short of the mark in our opinion. We wanted to create a truly integrated operational environment from the bare metal through the networking up to the applications and out to the operations tooling.

Our ultimate technical nirvana is to achieve closed-loop continuous deployments. We want to see applications that constantly optimize new code, deployment changes, quality, revenue and cost of operations. We could find parts but not a complete adequate foundation for this vision.

The business driver for Crowbar is system thinking around improved time to value and flexibility. While our technical vision is a long-term objective, we see very real short-term ROI. It does not matter if you are writing your own software or deploying applications; the faster you can move that code into production the sooner you get value from innovation. It is clear to us that the most successful technology companies have reorganized around speed to market and adapting to pace of change.

System flexibility & acceleration were key values when lean manufacturing revolution gave Dell a competitive advantage and it has proven even more critical in today’s dynamic technology innovation climate.

We hope that this post helps define a vision for Crowbar beyond the upcoming refactoring. We started the project with the idea that new tools meant we could take operations to a new level.

While that’s a great objective, we’re too pragmatic in delivery to rest on a broad objective. Let’s take a look at Crowbar’s concrete strengths and growth areas.

Key strength areas for Crowbar

  1. Late binding – hardware and network configuration is held until software configuration is known.  This is a huge system concept.
  2. Dynamic and Integrated Networking – means that we treat networking as a 1st class citizen for ops (sort of like software defined networking but integrated into the application)
  3. System Perspective – no Application is an island.  You can’t optimize just the deployment, you need to consider hardware, software, networking and operations all together.
  4. Bootstrapping (bare metal) – while not “rocket science” it takes a lot of careful effort to get this right in a way that is meaningful in a continuous operations environment.
  5. Open Source / Open Development / Modular Design – this problem is simply too complex to solve alone.  We need to get a much broader net of environments and thinking involved.

Continuing Areas of Leadership

  1. Open / Lean / Incremental Architecture – these are core aspects of our approach.  While we have a vision, we also are very open to ways that solve problems faster and more elegantly than we’d expected.
  2. Continuous deployment – we think the release cycles are getting faster and the only way to survive is the build change into the foundation of operations.
  3. Integrated networking – software defined networking is cool, but not enough.  We need to have semantics that link applications, networks and infrastructure together.
  4. Equilivent physical / virtual – we’re not saying that you won’t care if it’s physical or virtual (you should), we think that it should not impact your operations.
  5. Scale / Hybrid – the key element to hybrid is scale and to hybrid is scale.  The missing connection is being able to close the loop.
  6. Closed loop deployment – seeking load management, code quality, profit, and cost of operations as factor in managed operations.