jump to navigation

Analyze This! Big Data | Apache Hadoop | Dell | Cloudera | Crowbar February 20, 2012

Posted by Rob H in Big Data Analytics, Cloudera, Hadoop, Open source.
Tags: , , , , ,
add a comment

This article about Target using buying patterns to expose a teen was pregnant before she told her parents puts big data analysis into everyday terms better than the following 555 words (of course, I recommend that you read both).

Recently, I had the pleasure of being one of our team presenting Dell’s BIG DATA story at an internal conference. From the questions and buzz, it’s clear that the big data is big news this year. My team is at the center of that storm because we are responsible for the Dell | Cloudera Apache™ Hadoop™ solution. The solution is significant because we’ve integrated many pieces necessary to build and sustain a Hadoop cluster: that includes Dell servers, the Cloudera Hadoop distribution, the Crowbar framework and Services to make it useful.

Big Data Analytics spins data straws into information gold.

Before I jump into technical details, it’s worth stating the big data analytics value proposition. The problem is that we are awash in a tsunami of data: we’ve grown beyond the neat rows and columns of application databases, data today include source like website click logs and emails to call records and cash register receipts to including social media tweets and posts. While much of the data is unstructured noise, there is also incredibility valuable information.  (video of my Hadoop “escalator pitch”)

Value is not just hidden inside the bulk data; it lies in correlations between sets of the data.

The big data analytics value proposition is to provide a system to hold a lot of loosely structured information (thus “big data”) and then sift and correlate the information (thus “analytics”). The result is a technology that helps us make data driven decisions. In many applications, the analysis is fed directly back into applications so they can alter behavior in near real-time. For example, an online retail store could offer you purple bunny slippers as you browse for crowbars in the hardware section knowing that you’re reading this post. That is the type of correlations on disparate data that I’m talking about.

This is really two problems: storing a lot of data and then computing over it.

Hadoop, the leading open source big data analytics project, is a suite of applications that implement and extend two core capabilities: a distributed file system (HDFS) and the map-reduce (M-R) algorithm. My point is not to define Hadoop (others have done better and here); instead, I want to highlight that it’s a combination big data analysis is a merger of storage and compute. When learning about any big data analysis solution, you cannot decouple how the data is stored from how the data is analyzed – storage and compute are fundamentally linked.

For that reason, the architecture of a Hadoop cluster is different than either a traditional database or compute cluster. The IO and the resiliency patterns are different. Since Hadoop is a distributed system, hardware redundancy is less important and eliminating IO bottlenecks is paramount. For this reason, our Hadoop clusters use a lot of local, non-RAID drives with a target of delivering a 1:1 CPU core to spindle ratio (ratios are tuned based on planned loads).

Imagine that you are looking for correlations in web click data. To do that analysis, Hadoop need to spend a lot of time cracking open log files, sifting for specific data and then reporting back its results. That process involves thousands of jobs each doing disk IO, CPU & RAM workload and then network transfer; consequently, contention between network and disk demands reduces performance.

Wow… that’s a lot of description and just scratching the surface of Big Data Analytics. I’ll going to have to add the technical details about the Dell solution architecture (Hardware) and software components (Cloudera & Crowbar) in another post.

2/9 Webcast about mixing GPUs & Big Data Analysis February 9, 2012

Posted by Rob H in Dell, Hadoop.
Tags: , , ,
add a comment

Here’s something from my employer (Dell) that may be interesting to you: it’s about using GPUs for Big Data Analytics.   I meant to discuss/post this earlier, but… oh well.  Here’s the information

Premieres LIVE: 2pm EST (11 AM PST) TODAY  Free  - Register Now!         

What You’ll Learn:

  • Not just for video games any more: GPUs for simulation and parallel processing
  • Impact on business workflows in seismic processing, interpretation and reservoir modeling
  • ROI: 5x performance in 5 days
  • Cost-effective and flexible cluster configurations
  • Show me the metrics: Tangible results from a variety of customers

Need More Details?

Crowbar+OpenStack Insights for the week: Food Fight Podcast & Boston Meetup 2/1 January 30, 2012

Posted by Rob H in Crowbar, Hadoop, Meetup, OpenStack.
Tags: , , , , , , ,
add a comment

Please don’t confuse a lack of posts with a lack of activity!  I’ve been in the center of a whirlwind of Crowbar, OpenStack and Hadoop for my team at Dell.  I’ve also working on an interesting side project with Liquid Leadership author (and would-be star ship captain) Brad Szollose.

I just don’t have time to post all of the awesomeness.  I can tell you that my team is very focused on Hadoop (RHEL 6.2/CentOS 6.2 + open Cloudera Distro) barclamps as we get some Diablo deployments done.  Also the Crowbar list has been very active about Diablo.  If you’re looking for advanced information, there is  some inside scoop on the Crowbar FoodFight podcast I did with Bryan Berry & Matt Ray.

I’ll be in BOSTON THIS WEDNESDAY 2/1 for the OpenStack Meetup there.  We’re going to be talking about Quantum and the OpenStack Foundation.  I suspect that Keystone will come up too (but that’s the subject of another post).  Of course, it’s not just your humble blogger: the whole Dell CloudEdge OpenStack/Crowbar team will be on hand!  So put on your cloud geek hat and take a trip to Harvard for the meetup!

CloudOps white paper explains “cloud is always ready, never finished” January 16, 2012

Posted by Rob H in CloudFoundry, CloudOps, Crowbar, DevOps, Greg Althaus, Hadoop, Linux, Open source, OpenStack.
add a comment

I don’t usually call out my credentials, but knowing the I have a Masters in Industrial Engineering helps (partially) explain my passion for process as being essential to successful software delivery. One of my favorite authors, Mary Poppendiek, explains undeployed code as perishable inventory that you need to get to market before it loses value. The big lessons (low inventory, high quality, system perspective) from Lean manufacturing translate directly into software and, lately, into operation as DevOps.

What we have observed from delivering our own cloud products, and working with customers on thier’s, is that the operations process for deployment is as important as the software and hardware. It is simply not acceptable for us to market clouds without a compelling model for maintaining the solution into the future. Clouds are simply moving too fast to be delivered without a continuous delivery story.

This white paper [link here!] has been available since the OpenStack conference, but not linked to the rest of our OpenStack or Crowbar content.

Early crop of Crowbar 1.3 features popping up January 11, 2012

Posted by Rob H in Crowbar, Hadoop, Video.
Tags: , , , ,
7 comments

My team at Dell is still figuring out some big items for the 1.3 release; however, somethings were just added that is worth calling out.

  1. Ubuntu 11.04 support!   Thanks to Justin Shepherd from Rackspace Cloud Builders!
  2. Alias names for nodes in the UI
  3. User managed node groups in the UI
  4. Ability to pre-populate the alias, description and group for a node (not integrated with DNS yet)
  5. Hadoop is working again – we addressed the missing Ganglia repo issue.  Thanks to Victor Lowther.
For items 2 – 4, I made a short video tour: Node Alias & Group

Also, I’ve spun new open source ISOs with the new features.  User beware!

Crowbar 1.2 released includes OpenStack Diablo Final December 21, 2011

Posted by Rob H in Crowbar, Hadoop, OpenStack.
Tags: , , ,
17 comments

With the holiday rush, I neglected to post about Monday’s Crowbar v1.2 release (ISO here)!

The core focus for this release was to support the OpenStack Diablo Final bits (which my employer, Dell, includes as part of the “Dell OpenStack Powered Cloud Solution“); however, we added a lot of other capability as we continue to iterate on Crowbar.

I’m proud of our team’s efforts on this release on both on features and quality.  I’m equally delighted about the Crowbar community engagement via the Crowbar list server.  Crowbar is not hardware or operating system specific so it’s encouraging to hear about deployments on other gear and see the community helping us port to new operating system versions.

We driving more and more content to Crowbar’s Github as we are working to improve community visibility for Crowbar.  As such, I’ve been regularly updating the Crowbar Roadmap.  I’m also trying to make videos for Crowbar training (suggestions welcome!).  Please check back for updates about upcoming plans and sprint activity.

Crowbar Added Features in v1.2:

  • Central feature was OpenStack Diablo Final barclamps (tag “openstack-os-build”)
  • Improved barclamp packaging
  • Added concepts for “meta” barclamps that are suites of other barclamps
  • Proposal queue and ordering
  • New UI states for nodes & barclamps (led spinner!)
  • Install includes self-testing
  • Service monitoring (bluepill)

Looking forward

Dell has a long list of pending Hadoop and OpenStack deployments using these bits so you can expect to see updates and patches matching our field experiences.  We are very sensitive to community input and want to make Crowbar the best way to deliver a sustainable repeatable reference deployment of OpenStack, Hadoop and other cloud technologies.

Hadoop Crowbar released to open source! (plus AN HOUR of videos!) November 29, 2011

Posted by Rob H in Crowbar, Crowbar, Dell, Hadoop, Hadoop, Open source, Uncategorized, Video.
Tags: , , , , , , , ,
1 comment so far

I’m proud to announce that my team at Dell has open sourced our Apache Hadoop barclamps!  This release follows our Dell | Cloudera Hadoop Solution open source commitment from Hadoop World earlier this month.

As part of this release, we’ve created nearly AN HOUR of video content showing the Hadoop Barclamps in action, installing Crowbar (on CentOS), building Crowbar ISOs in the cloud and specialized developer focused builds.

If you want to talk to the Crowbar team.  We’re attending events in Boston 11/29, Seattle 11/30, and Austin 12/8.

Here are links to the videos:

More Hadoop perspectives from Dell:  Joseph George on what it means and  Barton George‘s backgrounder about barclamps.
(more…)

Seattle meetup on 11/30 (will bring massive laptops for OpenStack, Hadoop & Crowbar demos) November 22, 2011

Posted by Rob H in Crowbar, Greg Althaus, Hadoop, Meetup, OpenStack, Opscode.
add a comment

After Greg Althaus and I are done attending the sold out Opcode Community Summit (11/29-30), Opscode has offered to let us have an informal meetup at Opscode HQ from 6:30 to 8pm on 11/30.  I’ve proposed this as an official Seattle OpenStack Meetup (waiting on confirmation from @heckj).

We’re not limiting the agenda to OpenStack!  We’ll happily talk about Hadoop, Crowbar, Opscode or any other cloud technology that’s on your mind.  For 90 minutes, we’re offering Cloud Geeking as a Service (CGaaS).

Not in Seattle?  Never fear!  You can hook up with other members of my team at Dell in Boston on 11/29 & Austin 12/8.

Dell is open sourcing Crowbar Apache Hadoop barclamps! November 8, 2011

Posted by Rob H in Barton George, Crowbar, Hadoop, Hadoop, Joseph George, Open source, Video.
Tags: , , , , ,
6 comments

I’m very excited to announce that my team at Dell will be open sourcing our Apache Hadoop Crowbar barclamps by the end of the month.

This release raises the bar on open Hadoop deployments by making them faster, scalable, more integrated and repeatable.

These barclamps were developed in conjunction with our licensed Dell | Cloudera Solution. The licensed solution is for customers seeking large scale and professionally supported big data solutions. The purpose of the open barclamps (which pull the open source parts from the Cloudera distro) is to help you get started with Hadoop and reduce your learning curve. Our team invested significant testing effort in ensuring that these barclamps work smoothly because they are the foundational layer of our for-pay Hadoop solution.

Included in the Hadoop barclamp suite are Hadoop Map Reduce, Hive, Pig, ZooKeeper and Sqoop running on RHEL 5.7. These barclamps cover the core parts of the Hadoop suite. Like other Crowbar deployments (see OpenStack), the barclamps automatically discover the service configurations and interoperate. One of our team members (call him Scott Jensen) said it very simply “I can deploy a fully an integrated Hadoop cluster in a few hours. That friggin’ rocks!” I just can’t put it more eloquently than that!

I’ll post again when we flip the “open” bit and invite our community to dig in and help us continue to set the standards on open Hadoop deployments.

For more perspectives on this release, check out posts by Barton George (just for devs), Joseph George (About Hadoop) and Aurelian Dumitru

Barton posted these two videos of me talking about the release too:

Hadoop & Crowbar:

Dev’s Only Short:

Talk with Team Crowbar! Online 11/8, Austin 11/15, Boston 11/29 & 11/29 & Seattle 11/30 November 7, 2011

Posted by Rob H in Andi Abes, Dell, Greg Althaus, Hadoop, Meetup, Open source, OpenStack, Opscode.
Tags: , , , , , , , , ,
2 comments

My team at Dell has been getting a great response from our community about Crowbar. Thanks! We’re actively working a rock solid OpenStack deployment that will raise the bar on ease of deploy and drive operational excellence.

We have also heard that we need to improve access to the team; consequently, I’m delighted to announce a long list of places and dates where you can access us online AND in person.

Here’s the list:

Or in a calendar view:

Sun Mon Tuesday Wed Thursday Fri Sat
11/8 Online
Crowbar Chat
11/15 Austin
Cloud User
11/29 Boston
OpenStack Meetup
11/30 Seattle
Crowbar Drinks TBD
12/6 Boston
Opscode BoaF
12/8 Austin
OpenStack Meetup
Follow

Get every new post delivered to your Inbox.

Join 831 other followers