jump to navigation

Analyze This! Big Data | Apache Hadoop | Dell | Cloudera | Crowbar February 20, 2012

Posted by Rob H in Big Data Analytics, Cloudera, Hadoop, Open source.
Tags: , , , , ,
add a comment

This article about Target using buying patterns to expose a teen was pregnant before she told her parents puts big data analysis into everyday terms better than the following 555 words (of course, I recommend that you read both).

Recently, I had the pleasure of being one of our team presenting Dell’s BIG DATA story at an internal conference. From the questions and buzz, it’s clear that the big data is big news this year. My team is at the center of that storm because we are responsible for the Dell | Cloudera Apache™ Hadoop™ solution. The solution is significant because we’ve integrated many pieces necessary to build and sustain a Hadoop cluster: that includes Dell servers, the Cloudera Hadoop distribution, the Crowbar framework and Services to make it useful.

Big Data Analytics spins data straws into information gold.

Before I jump into technical details, it’s worth stating the big data analytics value proposition. The problem is that we are awash in a tsunami of data: we’ve grown beyond the neat rows and columns of application databases, data today include source like website click logs and emails to call records and cash register receipts to including social media tweets and posts. While much of the data is unstructured noise, there is also incredibility valuable information.  (video of my Hadoop “escalator pitch”)

Value is not just hidden inside the bulk data; it lies in correlations between sets of the data.

The big data analytics value proposition is to provide a system to hold a lot of loosely structured information (thus “big data”) and then sift and correlate the information (thus “analytics”). The result is a technology that helps us make data driven decisions. In many applications, the analysis is fed directly back into applications so they can alter behavior in near real-time. For example, an online retail store could offer you purple bunny slippers as you browse for crowbars in the hardware section knowing that you’re reading this post. That is the type of correlations on disparate data that I’m talking about.

This is really two problems: storing a lot of data and then computing over it.

Hadoop, the leading open source big data analytics project, is a suite of applications that implement and extend two core capabilities: a distributed file system (HDFS) and the map-reduce (M-R) algorithm. My point is not to define Hadoop (others have done better and here); instead, I want to highlight that it’s a combination big data analysis is a merger of storage and compute. When learning about any big data analysis solution, you cannot decouple how the data is stored from how the data is analyzed – storage and compute are fundamentally linked.

For that reason, the architecture of a Hadoop cluster is different than either a traditional database or compute cluster. The IO and the resiliency patterns are different. Since Hadoop is a distributed system, hardware redundancy is less important and eliminating IO bottlenecks is paramount. For this reason, our Hadoop clusters use a lot of local, non-RAID drives with a target of delivering a 1:1 CPU core to spindle ratio (ratios are tuned based on planned loads).

Imagine that you are looking for correlations in web click data. To do that analysis, Hadoop need to spend a lot of time cracking open log files, sifting for specific data and then reporting back its results. That process involves thousands of jobs each doing disk IO, CPU & RAM workload and then network transfer; consequently, contention between network and disk demands reduces performance.

Wow… that’s a lot of description and just scratching the surface of Big Data Analytics. I’ll going to have to add the technical details about the Dell solution architecture (Hardware) and software components (Cloudera & Crowbar) in another post.

Why Governance Matters in Open Source: Discussing the OpenStack Foundation February 8, 2012

Posted by Rob H in RackSpace, OpenStack, Open source, Citrix, Meetup, Dell.
Tags: , , ,
1 comment so far

This post is part of my notes from the 2/1 Boston OpenStack meetup.

OpenStack Foundation

Your’s truly (Rob Hirschfeld) gave the presentation about the OpenStack Foundation.  To readers of this blog, it’s obvious that I’m a believer in the OpenStack mission; however, it’s not obvious how creating a foundation helps with that mission and why OpenStack needs its own. As one person at the meetup put it, “Why not? Every major project needs a foundation!”

Governance does not sound sexy compared to writing code and deploying clouds, but it’s very important to the success of the project.

Here are my notes without the poetic elocution I exuded during the meetup…

The basics:

  • What: Creating a neutral body to govern OpenStack. Rackspace has been leading OpenStack. This means that they own the copyrights, name and also pay the people who organize the community. They committed (to executives at Dell and others) that they would ultimately setup a standalone body to govern the project before the project was public and endorsed by those early partners. Dell (my employer), Citrix, Accenture and NASA were some of biggest names at the Austin conference launch.
  • Why: A neutral body is needed because a lot of companies are committing significant time and money to the project. They cannot risk their investments on Rackspace good will alone. This may mean many things. It could be they don’t like Rackspace direction or they feel that Rackspace is not investing enough.
  • When: Right now and over the next few releases.  You should give feedback right now on the OpenStack Foundations mission.  The actual foundation will take more time to establish because it requires legal work and funding commitments.
  • Who: The community – all stakeholders. This is important stuff! While trying to standup a financially independent Foundation, which requires moneys, the little guys are not left out. There is a clear realization and desire to enable independent developers and contributors and small players to have a seat at the table.
  • How Much: The amounts are unclear, but establishing a foundation will require a significant ongoing investment from highly involved and moneyed parties (Rackspace, Dell, Cisco, HP, Citrix, NTT, startups?, etc).  The funding will pay salaries for people dedicated to the community doing the things that I’ll discuss below.  Overall, the ROI for those investments must be clear!

The foundation does “governance.” But, what does that mean? Here is a list of vitally important work that the foundation is responsible for.

  • Branding – Protecting, certifying, and promoting the OpenStack brand is important because it ensures that “OpenStack” has a valuable and predictable meaning to contributors and users. A strong the brand also means a stronger temptation for people to abuse the brand by claiming compatibility, participation and integration.
  • API – Many would assume that the OpenStack API is the very heart of the project and there is merit to this position. As more and more OpenStack implementations emerge, it is essential that we have a body that can certify which implementations (and even which versions of the implementation!) are valid. This is a substantial value to the community because API integrity ensures project continuity and helps the ecosystem monetize the project. Note: my opinion differs from others here because I think we should favor API over implementation
  • Community – The OpenStack community is not an accident. It is the function of deliberate actions and choices made by Rackspace and supported by key contributors. That community requires virtual and physical places to coalesce and leaders to organize and manage those meeting places. The excellent conferences, wikis, blogs, media awareness, documentation and meetups are a product of consistent community management.
  • Arbitration – An open source community is a family and siblings do not always get along. Today, Rackspace must be very careful about balancing their own interests because they are like the oldest sibling playing the parent role – you can get away with it until something serious happens. We need a neutral party so that Rackspace can protect their own interests (alternate spin: because Rackspace protects their own interests at the expense of the community).
  • Leadership – OpenStack today is a collection of projects with individual leadership. We will increasingly need coordinated leadership as the number of projects and users increases. Centralized leadership is essential because the good of the project as a whole may mean sacrifices within individual projects. It may even mean that some projects chose to leave the OpenStack tent. Stewarding these challenges will require a new level of leadership.
  • Legal – This is a function of all the above but also something more. From a legal stand point, OpenStack be able to represent itself. There is a significant amount of intellectual property being created. It would be foolish to overlook that this property is valuable and needs adequate legal representation.

I used “vitally important” to describe the above items. Is that an exaggeration? Our goal is collaboration and that requires some infrastructure and rules to make it sustainable. We must have a foundation that encourages innovation (multiple implementations) and collaboration (discourages forking). Innovation and collaboration are the heartbeat of an open source project.

The foundation is vitally important because collaboration by competitors is fragile.

In addition to the core areas above, the foundation needs to handle routine tactical items such as:

  • Delivering on milestones & releases
  • Moving new subprojects into OpenStack
  • Electing and maintaining Project Policy Board
  • Electing and maintaining Project Technical Leads
  • Ensuring adherence and extensions to the current bylaws

At the end of the day, OpenStack monetization is the central value for the Foundation.

In order for the OpenStack project, and thus its foundation, to flourish, the contributors, ecosystem, sponsors and users of the project must be able to see a reasonable return (ROI) on their investment. I would love to believe that the foundation is allow about people banding together to solve important problems for the benefit of all; however, it is more realistic to embrace that we can both collaborate and profit simultaneously. Acknowledging the pragmatic self-interested view allows us to create the right incentives and processes as embodied by the OpenStack foundation.

Concerns about SOPA January 18, 2012

Posted by Rob H in Culture, Open source.
Tags: , ,
add a comment

Joining in the blackout; however, I’ll defer to more articulate bloggers on this topic.

CloudOps white paper explains “cloud is always ready, never finished” January 16, 2012

Posted by Rob H in CloudFoundry, CloudOps, Crowbar, DevOps, Greg Althaus, Hadoop, Linux, Open source, OpenStack.
add a comment

I don’t usually call out my credentials, but knowing the I have a Masters in Industrial Engineering helps (partially) explain my passion for process as being essential to successful software delivery. One of my favorite authors, Mary Poppendiek, explains undeployed code as perishable inventory that you need to get to market before it loses value. The big lessons (low inventory, high quality, system perspective) from Lean manufacturing translate directly into software and, lately, into operation as DevOps.

What we have observed from delivering our own cloud products, and working with customers on thier’s, is that the operations process for deployment is as important as the software and hardware. It is simply not acceptable for us to market clouds without a compelling model for maintaining the solution into the future. Clouds are simply moving too fast to be delivered without a continuous delivery story.

This white paper [link here!] has been available since the OpenStack conference, but not linked to the rest of our OpenStack or Crowbar content.

2012: A year of Cloud Coalescence (whatever that means) January 5, 2012

Posted by Rob H in Clouds, Hadoop, Linux, Open source, OpenStack.
Tags: , , , , ,
1 comment so far

This post is a collaboration between three Dell Cloud activists: Rob Hirschfeld (@zehicle), Joseph B George (@jbgeorge) and Stephen Spector (@SpectoratDell).

We’re not making predictions for the “whole” Cloud market, this is a relatively narrow perspective based on technologies that on our daily radar. These views are strictly our own and based on publicly available data. They do not reflect plans, commitments, or internal data from our employer (Dell).

The major 2012 theme is cloud coalescence.  However, Rob worries that we’ll see slower adoption due to lack of engineers and confusing names/concepts.

Here are our twelve items for 2012:

  1. Open source continues to be a disruptive technology delivery model. It’s not “free” software – there’s an emerging IT culture that is doing business differently, including a number of large enterprises. The stable of sleeping giant vendors are waking up to this in 2012 but full engagement will take time.
  2. Linux. It is the cloud operating system and had a great 2012. It seems silly pointing this out since it seems obvious, but it’s the foundation for open source acceleration.
  3. Tight market for engineering and product development talent will get tighter. The catch-22 of this is that potential mentors are busy breaking new ground and writing code, making it hard for new experts to be developed.
  4. On track, OpenStack moves into its awkward adolescence. It is still gangly and rebelling against authority, but coming into its own. Expect to see a groundswell of installations and an expected wave of issues and challenges that will drive the community. By the “F” release, expect to see OpenStack cement itself as a serious, stable contender with notable public deployments and a significant international private deployment foot print.
  5. We’ll start seeing OpenStack Quantum (networking) in near-production pilots by year end. OpenStack Quantum is the glue that holds the big players in OpenStack Nova together. The potential for next generation cloud networking based on open standards is huge, but it will emerge without a killer app (OpenStack Nova in this case) pushing it forward. The OpenStack community will pull together to keep Quantum on track.
  6. Hadoop will cross into mainstream awareness as the need for big data analysis grows exponentially along with the data. Hadoop is on fire in select circles and completely obscure in others. The challenge for Hadoop is there are not enough engineers who know how to operate it. We suspect that lack of expertise will throttle demand until we get more proprietary tools to simplify analysis. We also predict a lot of very rich entrepreneurs and VCs emerging from this market segment.
  7. DevOps will enter mainstream IT discussions. Marketers from major IT brands will struggle and fail to find a better name for the movement. Our prediction is that by 2015, it will just be the way that “IT” is done and the name won’t matter.
  8. KVM continues to gain believers as the open source hypervisor. In 2011, I would not have believed this prediction but KVM making great strides and getting a lot of love from the OpenStack community, though Xen is also a key open source technology as well. I believe that Libvirt compatibility between LXE & KVM will further accelerate both virtualization approaches. 
      
  9. Big Data and NoSQL will continue to converge. While NoSQL enthusiasm as a universal replacement for structured databases appears to be deflating, real applications will win.
  10. Java will continue to encounter turbulence as a software platform under Oracle’s overly heady handed management.
  11. PaaS continues to be a confusing term. Cloud players will struggle with a definition but I don’t think a common definition will surface in 2012. I think the big news will be convergence between DevOps and PaaS; however, that will be under the radar since most of the market is still getting educated on both of those concepts.
  12. Hybrid cloud will continue to make strides but will not truly emerge in 2012 – we’ll try to develop this technology, and expose gaps that will get us there ultimately (see PaaS and Quantum above)

Thoughts?  We’d love to hear your comments.

Rob, JBG, and Stephen

You can follow Rob at www.RobHirschfeld.com or @zehicle on Twitter.
You can follow Joseph at www.JBGeorge.net or @jbgeorge on Twitter.

You can follow Stephen at http://en.community.dell.com/members/dell_2d00_stephen-sp/blogs/default.aspx or @SpectoratDell on Twitter.

Hadoop Crowbar released to open source! (plus AN HOUR of videos!) November 29, 2011

Posted by Rob H in Crowbar, Crowbar, Dell, Hadoop, Hadoop, Open source, Uncategorized, Video.
Tags: , , , , , , , ,
1 comment so far

I’m proud to announce that my team at Dell has open sourced our Apache Hadoop barclamps!  This release follows our Dell | Cloudera Hadoop Solution open source commitment from Hadoop World earlier this month.

As part of this release, we’ve created nearly AN HOUR of video content showing the Hadoop Barclamps in action, installing Crowbar (on CentOS), building Crowbar ISOs in the cloud and specialized developer focused builds.

If you want to talk to the Crowbar team.  We’re attending events in Boston 11/29, Seattle 11/30, and Austin 12/8.

Here are links to the videos:

More Hadoop perspectives from Dell:  Joseph George on what it means and  Barton George‘s backgrounder about barclamps.
(more…)

Rackspace unveils OpenStack reference architecture & private cloud offering November 8, 2011

Posted by Rob H in Open source, OpenStack, RackSpace.
Tags: , , , , , , , ,
1 comment so far

Yesterday, Rackspace Cloud Builders unveiled both their open reference architecture (RA) and a private cloud offering (on GigaOM) based upon the RA.  The RA (which is well aligned with our Dell OpenStack RA) does a good job laying out the different aspects of an OpenStack deployment.  It also calls for the use of Dell C6100 servers and the open source version of Crowbar.

The Rackspace RA and Crowbar deployment barclamps share the same objective: sharing of best practices for OpenStack operations.

Over the last 12+ months, my team at Dell has had the opportunity to work with many customers on OpenStack deployment designs.  While no two of these are identical, they do share many similarities.  We are pleased to collaborate with Rackspace and others on capturing these practices as operational code (or “opscode” if you want a reference to the Chef cookbooks that are an intrinsic part of Crowbar’s architecture).

In our customer interactions, we hear clearly that Crowbar must remain flexible and ready to adapt to both customer on-site requirements and evolution within the OpenStack code base.  You are also telling us that there is a broader application space for Crowbar and we are listening to that too.

I believe that it will take some time for the community and markets to process today’s Rackspace announcements.  Rackspace is showing strong leadership in both sharing information and commercialization around OpenStack.  Both of these actions will drive responses from the community members.

Dell is open sourcing Crowbar Apache Hadoop barclamps! November 8, 2011

Posted by Rob H in Barton George, Crowbar, Hadoop, Hadoop, Joseph George, Open source, Video.
Tags: , , , , ,
6 comments

I’m very excited to announce that my team at Dell will be open sourcing our Apache Hadoop Crowbar barclamps by the end of the month.

This release raises the bar on open Hadoop deployments by making them faster, scalable, more integrated and repeatable.

These barclamps were developed in conjunction with our licensed Dell | Cloudera Solution. The licensed solution is for customers seeking large scale and professionally supported big data solutions. The purpose of the open barclamps (which pull the open source parts from the Cloudera distro) is to help you get started with Hadoop and reduce your learning curve. Our team invested significant testing effort in ensuring that these barclamps work smoothly because they are the foundational layer of our for-pay Hadoop solution.

Included in the Hadoop barclamp suite are Hadoop Map Reduce, Hive, Pig, ZooKeeper and Sqoop running on RHEL 5.7. These barclamps cover the core parts of the Hadoop suite. Like other Crowbar deployments (see OpenStack), the barclamps automatically discover the service configurations and interoperate. One of our team members (call him Scott Jensen) said it very simply “I can deploy a fully an integrated Hadoop cluster in a few hours. That friggin’ rocks!” I just can’t put it more eloquently than that!

I’ll post again when we flip the “open” bit and invite our community to dig in and help us continue to set the standards on open Hadoop deployments.

For more perspectives on this release, check out posts by Barton George (just for devs), Joseph George (About Hadoop) and Aurelian Dumitru

Barton posted these two videos of me talking about the release too:

Hadoop & Crowbar:

Dev’s Only Short:

Talk with Team Crowbar! Online 11/8, Austin 11/15, Boston 11/29 & 11/29 & Seattle 11/30 November 7, 2011

Posted by Rob H in Andi Abes, Dell, Greg Althaus, Hadoop, Meetup, Open source, OpenStack, Opscode.
Tags: , , , , , , , , ,
2 comments

My team at Dell has been getting a great response from our community about Crowbar. Thanks! We’re actively working a rock solid OpenStack deployment that will raise the bar on ease of deploy and drive operational excellence.

We have also heard that we need to improve access to the team; consequently, I’m delighted to announce a long list of places and dates where you can access us online AND in person.

Here’s the list:

Or in a calendar view:

Sun Mon Tuesday Wed Thursday Fri Sat
11/8 Online
Crowbar Chat
11/15 Austin
Cloud User
11/29 Boston
OpenStack Meetup
11/30 Seattle
Crowbar Drinks TBD
12/6 Boston
Opscode BoaF
12/8 Austin
OpenStack Meetup

Dell Crowbar Project: Open Source Cloud Deployer expands into the Community October 18, 2011

Posted by Rob H in CloudFoundry, Crowbar, Hadoop, Open source, OpenStack, Opscode, RackSpace, VMware.
Tags: , , , , , , , , , , ,
1 comment so far

Note: Cross posted on Dell Tech Center Blogs.

Background: Crowbar is an open source cloud deployment framework originally developed by Dell to support our OpenStack and Hadoop powered solutions.  Recently, it’s scope has increased to include a DevOps operations model and other deployments for additional cloud applications.

It’s only been a matter of months since we open sourced the Dell Crowbar Project at OSCON in June 2011; however, the progress and response to the project has been over whelming.  Crowbar is transforming into a community tool that is hardware, operating system, and application agnostic.  With that in mind, it’s time for me to provide a recap of Crowbar for those just learning about the project.

Crowbar started out simply as an installer for the “Dell OpenStack™-Powered Cloud Solution” with the objective of deploying a cloud from unboxed servers to a completely functioning system in under four hours.  That meant doing all the BIOS, RAID, Operations services (DNS, NTP, DHCP, etc.), networking, O/S installs and system configuration required creating a complete cloud infrastructure.  It was a big job, but one that we’d been piecing together on earlier cloud installation projects.  A key part of the project involved collaborating with Opscode Chef Server on the many system configuration tasks.  Ultimately, we met and exceeded the target with a complete OpenStack install in less than two hours.

In the process of delivering Crowbar as an installer, we realized that Chef, and tools like it, were part of a larger cloud movement known as DevOps.

The DevOps approach to deployment builds up systems in a layered model rather than using packaged images.  This layered model means that parts of the system are relatively independent and highly flexible.  Users can choose which components of the system they want to deploy and where to place those components.  For example, Crowbar deploys Nagios by default, but users can disable that component in favor of their own monitoring system.  It also allows for new components to identify that Nagios is available and automatically register themselves as clients and setup application specific profiles.  In this way, Crowbar’s use of a DevOps layered deployment model provides flexibility for BOTH modularized and integrated cloud deployments.

We believe that operations that embrace layered deployments are essential for success because they allow our customers to respond to the accelerating pace of change.  We call this model for cloud data centers “CloudOps.”

Based on the flexibility of Crowbar, our team decided to use it as the deployment model for our Apache™ Hadoop™ project (“Dell | Apache Hadoop Solution”).  While a good fit, adding Hadoop required expanding Crowbar in several critical ways.

  1. We had to make major changes in our installation and build processes to accommodate multi-operating system support (RHEL 5.6 and Ubuntu 10.10 as of Oct 2011).
  2. We introduced a modularization concept that we call “barclamps” that package individual layers of the deployment infrastructure.  These barclamps reach from the lowest system levels (IPMI, BIOS, and RAID) to the highest (OpenStack and Hadoop).

Barclamps are a very significant architecture pattern for Crowbar:

  1. They allow other applications to plug into the framework and leverage other barclamps in the solution.  For example, VMware created a Cloud Foundry barclamp and Dream Host has created a Ceph barclamp.  Both barclamps are examples of applications that can leverage Crowbar for a repeatable and predictable cloud deployment.
  2. They are independent modules with their own life cycle.  Each one has its own code repository and can be imported into a live system after initial deployment.  This allows customers to expand and manage their system after initial deployment.
  3. They have many components such as Chef Cookbooks, custom UI for configuration, dependency graphs, and even localization support.
  4. They offer services that other barclamps can consume.  The Network barclamp delivers many essential services for bootstrapping clouds including IP allocation, NIC teaming, and node VLAN configuration.
  5. They can provide extensible logic to evaluate a system and make deployment recommendations.  So far, no barclamps have implemented more than the most basic proposals; however, they have the potential for much richer analysis.

Making these changes was a substantial investment by Dell, but it greatly expands the community’s ability to participate in Crowbar development.  We believe these changes were essential to our team’s core values of open and collaborative development.

Most recently, our team moved Crowbar development into the open.  This change was reflected in our work on OpenStack Diablo (+ Keystone and Dashboard) with contributions by Opscode and Rackspace Cloud Builders.  Rather than work internally and push updates at milestones, we are now coding directly from the Crowbar repositories on Github.  It is important to note that for licensing reasons, Dell has not open sourced the optional BIOS and RAID barclamps.  This level of openness better positions us to collaborate with the crowbar community.

For a young project, we’re very proud of the progress that we’ve made with Crowbar.  We are starting a new chapter that brings new challenges such as expanding community involvement, roadmap transparency, and growing Dell support capabilities.  You will also begin to see optional barclamps that interact with proprietary and licensed hardware and software.  All of these changes are part of growing Crowbar in framework that can support a vibrant and rich ecosystem.

We are doing everything we can to make it easy to become part of the Crowbar community.  Please join our mailing list, download the open source code or ISO, create a barclamp, and make your voice heard.  Since Dell is funding the core development on this project, contacting your Dell salesperson and telling them how much you appreciate our efforts goes a long way too.

Follow

Get every new post delivered to your Inbox.

Join 831 other followers