Quick turn OpenStack Essex on Crowbar (BOOM, now we’re at v1.4!)

Don’t blink if you’ve been watching the Crowbar release roadmap!

My team at Dell is about to turn another release of Crowbar. Version 1.3 was released 5/14 (focused on Cloudera Apache Hadoop), and our original schedule showed several sprints of work on OpenStack Essex. Upon evaluation, we believe that the current community-developed Essex barclamps are ready now.

The healthy state of the OpenStack Essex deployment is a reflection of 1) the quality of Essex and 2) our early community activity in creating deployments based on Essex RC1 and Ubuntu Beta1.

We are planning many improvements to our OpenStack Essex and Crowbar Framework; however, most deployments can proceed without these enhancements.  This also enables participants in the 5/31 OpenStack Essex Deploy Day.

By releasing a core stable Essex reference deployment, we are accelerating field deployments and enabling the OpenStack ecosystem. In terms of previous posts, we are eliminating release interlocks to enable more downstream development. Ultimately, we hope that we are also creating a baseline OpenStack deployment.

We are also reducing the pressure to rush more disruptive Crowbar changes (like enabling high availability, adding multiple operating systems, moving to Rails 3, reducing crowbarisms in cookbooks and streamlining networking). With this foundational Essex release behind us (we call it an MVP), we can work on more depth and breadth of capability in OpenStack.

One small challenge: some of the changes that we’d expected to drop have been postponed slightly, specifically markdown-based documentation (/docs) and some new UI pages (/network/nodes, /nodes/families). All are already in the product but not wired into the default UI (basically, a split test).

On the bright side, we did manage to expose 10g networking awareness for barclamps; however, we have not yet refactored the barclamps to leverage the change.

Cloudera Manager Barclamp posted! (part of updated Dell | Cloudera Apache Hadoop Solution)

My team at Dell has been driving to transparency and openness around Crowbar plus our OpenStack and Hadoop powered solutions.  Specifically, our work for our coming release is maintained in the open on the Dell CloudEdge Github site.  You can see (and participate in!) our development and validation work in advance of our official release.

I’m pleased to note that our Cloudera Manager barclamp has been posted to Github!

This barclamp supersedes the Hadoop barclamp in the next release of the Dell | Cloudera Apache Hadoop solution.  You can build it in Crowbar using the “cloudera-os-build” branch for Crowbar.  Do not fear!  The Hadoop barclamp still exists (hadoop-os-build branch).

Both the new and original Hadoop barclamps use the Cloudera Hadoop distribution (aka CDH); however, the new barclamp is able to leverage Cloudera’s latest management capabilities.  For the Dell solution, Cloudera Manager has always been part of the offering.  The primary difference is that we are improving the level of integration.  I promise to post more about the features of the solution as we get closer to release.

Work with me! Our Dell team is hiring architects, engineers & open source gurus

If you’ve been watching my team’s progress at Dell on Crowbar, OpenStack and Hadoop and want a front row seat in these exciting open source projects, then the ball is in your court!   We are poised to take all three of these projects into new territories that I cannot reveal here, but, take my word for it, there has never been a better time to join our team.

Let me repeat: my team has a lot of open engineering and marketing positions.

Not only are we doing some really kick ass projects, we are also helping redefine how Dell delivers software.  Dell is investing significantly in building our software capabilities and focus.

Basically, we are looking for engineers with a passion for scale applications, devops and open source.   Experience in Hadoop and/or OpenStack will move you to the top of the pile.   These positions say Hadoop, but we’re also looking for OpenStack, DevOps and Chef.  We think like a start-up.

Ideally in Austin, Boston or the Bay.  We’ll also be happy to hear from you if you’ve got l33t chOps but are not as senior as these positions require.
If you are interested, the BEST NEXT  STEP IS TO APPLY ONLINE.
If you don’t want to click the links, I’m attaching the descriptions of the engineering positions after the split.


Analyze This! Big Data | Apache Hadoop | Dell | Cloudera | Crowbar

This article about Target using buying patterns to reveal that a teen was pregnant before she told her parents puts big data analysis into everyday terms better than the following 555 words (of course, I recommend that you read both).

Recently, I had the pleasure of being one of our team presenting Dell’s BIG DATA story at an internal conference. From the questions and buzz, it’s clear that big data is big news this year. My team is at the center of that storm because we are responsible for the Dell | Cloudera Apache™ Hadoop™ solution. The solution is significant because we’ve integrated many pieces necessary to build and sustain a Hadoop cluster: that includes Dell servers, the Cloudera Hadoop distribution, the Crowbar framework and Services to make it useful.

Big Data Analytics spins data straws into information gold.

Before I jump into technical details, it’s worth stating the big data analytics value proposition. The problem is that we are awash in a tsunami of data: we’ve grown beyond the neat rows and columns of application databases, and data today includes sources ranging from website click logs and emails to call records, cash register receipts and social media tweets and posts. While much of the data is unstructured noise, there is also incredibly valuable information.  (video of my Hadoop “escalator pitch”)

Value is not just hidden inside the bulk data; it lies in correlations between sets of the data.

The big data analytics value proposition is to provide a system to hold a lot of loosely structured information (thus “big data”) and then sift and correlate the information (thus “analytics”). The result is a technology that helps us make data-driven decisions. In many applications, the analysis is fed directly back into applications so they can alter behavior in near real-time. For example, an online retail store could offer you purple bunny slippers as you browse for crowbars in the hardware section knowing that you’re reading this post. That is the type of correlation across disparate data that I’m talking about.

This is really two problems: storing a lot of data and then computing over it.

Hadoop, the leading open source big data analytics project, is a suite of applications that implement and extend two core capabilities: a distributed file system (HDFS) and the map-reduce (M-R) algorithm. My point is not to define Hadoop (others have done better and here); instead, I want to highlight that big data analysis is a merger of storage and compute. When learning about any big data analysis solution, you cannot decouple how the data is stored from how the data is analyzed – storage and compute are fundamentally linked.

For that reason, the architecture of a Hadoop cluster is different than either a traditional database or compute cluster. The IO and the resiliency patterns are different. Since Hadoop is a distributed system, hardware redundancy is less important and eliminating IO bottlenecks is paramount. For this reason, our Hadoop clusters use a lot of local, non-RAID drives with a target of delivering a 1:1 CPU core to spindle ratio (ratios are tuned based on planned loads).
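
To make the core-to-spindle target concrete, here is a minimal Python sketch of the arithmetic. The function names and the node sizes are hypothetical, chosen only for illustration; they are not taken from the Dell reference architecture.

```python
import math


def core_to_spindle_ratio(cpu_cores, data_drives):
    """Return CPU cores per data drive (spindle) for one node."""
    if data_drives <= 0:
        raise ValueError("a node needs at least one data drive")
    return cpu_cores / data_drives


def drives_needed(cpu_cores, target_ratio=1.0):
    """Return how many local drives a node needs to hit the target ratio.

    target_ratio is cores per spindle; 1.0 is the 1:1 target, and a
    tuned deployment could pass a different value for its planned load.
    """
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    return math.ceil(cpu_cores / target_ratio)


# Hypothetical node: 12 cores and 12 non-RAID data drives.
ratio = core_to_spindle_ratio(12, 12)
drives = drives_needed(12)
```

A node that falls short of the target is a hint that disk IO, not CPU, is likely to be the bottleneck for that load.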

Imagine that you are looking for correlations in web click data. To do that analysis, Hadoop needs to spend a lot of time cracking open log files, sifting for specific data and then reporting back its results. That process involves thousands of jobs each doing disk IO, CPU & RAM workload and then network transfer; consequently, contention between network and disk demands reduces performance.
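
That crack-open, sift and report-back cycle is the map-reduce pattern. Below is a toy, single-process Python sketch of it; it does not use Hadoop's API. The log format, the sample records and the function names are all invented for illustration. A real cluster runs the map and reduce steps as many parallel jobs next to the data in HDFS.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical click-log records in an invented "<user> <page>" format.
CLICK_LOG = [
    "alice /hardware/crowbars",
    "bob /slippers/purple-bunny",
    "alice /slippers/purple-bunny",
    "carol /hardware/crowbars",
]


def map_phase(line):
    """Sift one record and emit (key, value) pairs: here, (page, 1)."""
    _user, page = line.split()
    yield page, 1


def shuffle(pairs):
    """Group values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse every value for one key into a single reported result."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


page_hits = run_job(CLICK_LOG)
```

Even in this toy version the shape of the problem shows: the map step is dominated by reading records (disk IO), while the shuffle step moves data between workers (network), which is where the contention described above comes from.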

Wow… that’s a lot of description and just scratching the surface of Big Data Analytics. I’m going to have to add the technical details about the Dell solution architecture (Hardware) and software components (Cloudera & Crowbar) in another post.

Crowbar+OpenStack Insights for the week: Food Fight Podcast & Boston Meetup 2/1

Please don’t confuse a lack of posts with a lack of activity!  I’ve been in the center of a whirlwind of Crowbar, OpenStack and Hadoop for my team at Dell.  I’ve also been working on an interesting side project with Liquid Leadership author (and would-be star ship captain) Brad Szollose.

I just don’t have time to post all of the awesomeness.  I can tell you that my team is very focused on Hadoop (RHEL 6.2/CentOS 6.2 + open Cloudera Distro) barclamps as we get some Diablo deployments done.  Also the Crowbar list has been very active about Diablo.  If you’re looking for advanced information, there is  some inside scoop on the Crowbar FoodFight podcast I did with Bryan Berry & Matt Ray.

I’ll be in BOSTON THIS WEDNESDAY 2/1 for the OpenStack Meetup there.  We’re going to be talking about Quantum and the OpenStack Foundation.  I suspect that Keystone will come up too (but that’s the subject of another post).  Of course, it’s not just your humble blogger: the whole Dell CloudEdge OpenStack/Crowbar team will be on hand!  So put on your cloud geek hat and take a trip to Harvard for the meetup!

Early crop of Crowbar 1.3 features popping up

My team at Dell is still figuring out some big items for the 1.3 release; however, some things were just added that are worth calling out.

  1. Ubuntu 11.04 support!   Thanks to Justin Shepherd from Rackspace Cloud Builders!
  2. Alias names for nodes in the UI
  3. User managed node groups in the UI
  4. Ability to pre-populate the alias, description and group for a node (not integrated with DNS yet)
  5. Hadoop is working again – we addressed the missing Ganglia repo issue.  Thanks to Victor Lowther.
For items 2 – 4, I made a short video tour: Node Alias & Group

Also, I’ve spun new open source ISOs with the new features.  User beware!

2012: A year of Cloud Coalescence (whatever that means)

This post is a collaboration between three Dell Cloud activists: Rob Hirschfeld (@zehicle), Joseph B George (@jbgeorge) and Stephen Spector (@SpectoratDell).

We’re not making predictions for the “whole” Cloud market; this is a relatively narrow perspective based on technologies that are on our daily radar. These views are strictly our own and based on publicly available data. They do not reflect plans, commitments, or internal data from our employer (Dell).

The major 2012 theme is cloud coalescence.  However, Rob worries that we’ll see slower adoption due to lack of engineers and confusing names/concepts.

Here are our twelve items for 2012:

  1. Open source continues to be a disruptive technology delivery model. It’s not “free” software – there’s an emerging IT culture that is doing business differently, including a number of large enterprises. The stable of sleeping giant vendors is waking up to this in 2012 but full engagement will take time.
  2. Linux. It is the cloud operating system and had a great 2011. It seems silly pointing this out since it seems obvious, but it’s the foundation for open source acceleration.
  3. Tight market for engineering and product development talent will get tighter. The catch-22 of this is that potential mentors are busy breaking new ground and writing code, making it hard for new experts to be developed.
  4. On track, OpenStack moves into its awkward adolescence. It is still gangly and rebelling against authority, but coming into its own. Expect to see a groundswell of installations and an expected wave of issues and challenges that will drive the community. By the “F” release, expect to see OpenStack cement itself as a serious, stable contender with notable public deployments and a significant international private deployment footprint.
  5. We’ll start seeing OpenStack Quantum (networking) in near-production pilots by year end. OpenStack Quantum is the glue that holds the big players in OpenStack Nova together. The potential for next generation cloud networking based on open standards is huge, but it will not emerge without a killer app (OpenStack Nova in this case) pushing it forward. The OpenStack community will pull together to keep Quantum on track.
  6. Hadoop will cross into mainstream awareness as the need for big data analysis grows exponentially along with the data. Hadoop is on fire in select circles and completely obscure in others. The challenge for Hadoop is there are not enough engineers who know how to operate it. We suspect that lack of expertise will throttle demand until we get more proprietary tools to simplify analysis. We also predict a lot of very rich entrepreneurs and VCs emerging from this market segment.
  7. DevOps will enter mainstream IT discussions. Marketers from major IT brands will struggle and fail to find a better name for the movement. Our prediction is that by 2015, it will just be the way that “IT” is done and the name won’t matter.
  8. KVM continues to gain believers as the open source hypervisor. In 2011, I would not have believed this prediction, but KVM is making great strides and getting a lot of love from the OpenStack community, though Xen is also a key open source technology. I believe that Libvirt compatibility between LXC & KVM will further accelerate both virtualization approaches.
  9. Big Data and NoSQL will continue to converge. While NoSQL enthusiasm as a universal replacement for structured databases appears to be deflating, real applications will win.
  10. Java will continue to encounter turbulence as a software platform under Oracle’s overly heavy-handed management.
  11. PaaS continues to be a confusing term. Cloud players will struggle with a definition but I don’t think a common definition will surface in 2012. I think the big news will be convergence between DevOps and PaaS; however, that will be under the radar since most of the market is still getting educated on both of those concepts.
  12. Hybrid cloud will continue to make strides but will not truly emerge in 2012 – we’ll try to develop this technology, and expose gaps that will get us there ultimately (see PaaS and Quantum above)

Thoughts?  We’d love to hear your comments.

Rob, JBG, and Stephen

You can follow Rob at www.RobHirschfeld.com or @zehicle on Twitter.
You can follow Joseph at www.JBGeorge.net or @jbgeorge on Twitter.

You can follow Stephen at http://en.community.dell.com/members/dell_2d00_stephen-sp/blogs/default.aspx or @SpectoratDell on Twitter.

Crowbar 1.2 released includes OpenStack Diablo Final

With the holiday rush, I neglected to post about Monday’s Crowbar v1.2 release (ISO here)!

The core focus for this release was to support the OpenStack Diablo Final bits (which my employer, Dell, includes as part of the “Dell OpenStack Powered Cloud Solution”); however, we added a lot of other capability as we continue to iterate on Crowbar.

I’m proud of our team’s efforts on this release, on both features and quality.  I’m equally delighted about the Crowbar community engagement via the Crowbar list server.  Crowbar is not hardware or operating system specific so it’s encouraging to hear about deployments on other gear and see the community helping us port to new operating system versions.

We are driving more and more content to Crowbar’s Github as we work to improve community visibility for Crowbar.  As such, I’ve been regularly updating the Crowbar Roadmap.  I’m also trying to make videos for Crowbar training (suggestions welcome!).  Please check back for updates about upcoming plans and sprint activity.

Crowbar Added Features in v1.2:

  • Central feature was OpenStack Diablo Final barclamps (tag “openstack-os-build”)
  • Improved barclamp packaging
  • Added concepts for “meta” barclamps that are suites of other barclamps
  • Proposal queue and ordering
  • New UI states for nodes & barclamps (LED spinner!)
  • Install includes self-testing
  • Service monitoring (bluepill)

Looking forward

Dell has a long list of pending Hadoop and OpenStack deployments using these bits, so you can expect to see updates and patches matching our field experiences.  We are very sensitive to community input and want to make Crowbar the best way to deliver a sustainable, repeatable reference deployment of OpenStack, Hadoop and other cloud technologies.

Barclamps: now with added portability!

I had a question about moving barclamps between solutions.  Since Victor just changed the barclamp build to create a tar for each barclamp (with the debs/rpms), I thought it was the perfect time to explain the new feature.

You can find the barclamps on the Crowbar ISO under “/dell/barclamps” and you can install the TAR onto a Crowbar system using “./barclamp_install foo.tar.gz” where foo is the name of your barclamp.

Here’s a video of how to find and install barclamp tars:

Note: while you can install OpenStack into a Hadoop system, that combination is NOT tested.  We only test OpenStack on Ubuntu 10.10 and Hadoop on RHEL 5.7.   Community help in expanding support is always welcome!

Hadoop Crowbar released to open source! (plus AN HOUR of videos!)

I’m proud to announce that my team at Dell has open sourced our Apache Hadoop barclamps!  This release follows our Dell | Cloudera Hadoop Solution open source commitment from Hadoop World earlier this month.

As part of this release, we’ve created nearly AN HOUR of video content showing the Hadoop Barclamps in action, installing Crowbar (on CentOS), building Crowbar ISOs in the cloud and specialized developer focused builds.

If you want to talk to the Crowbar team, we’re attending events in Boston 11/29, Seattle 11/30, and Austin 12/8.

Here are links to the videos:

More Hadoop perspectives from Dell: Joseph George on what it means and Barton George’s backgrounder about barclamps.