Posted by Rob H in Crowbar, Dell, DevOps, Hadoop.
Tags: ARM, Crowbar, Dell, DevOps
One of the most critical lessons my team at Dell learned from hyperscale cloud deployments was that DevOps tooling and operations processes are key to success. Our Crowbar project was born out of this realization.
I have been tracking the progress of the Copper ARM-based server internally, from design to implementation. Now, I’m excited to see it getting some deserved attention.
The Copper platform is really cool because the cost, power, and density ratios of the nodes are unparalleled. This makes it an ideal platform for distributed mixed compute/store workloads like Hadoop. The nodes in the platform have excellent RAM/CPU/Spindle ratios.
While Copper is driving huge density, it also drives forward the same hyperscale challenges that we’ve been trying to address with Crowbar; consequently, we’re already working to ensure that we can deploy and manage Copper with Crowbar at scale.
Copper and Crowbar make a natural team and we’re excited to be part of today’s announcement:
Dell is staging clusters of the Dell “Copper” ARM server within the Dell Solution Centers and with TACC so developers may book time on the platforms. Dell also will deliver an ARM-supported version of Crowbar, Dell’s open-source management infrastructure software, to the industry in the future.
Congratulations to the Copper team!
Posted by Rob H in OpenStack, Lean, Crowbar, Hadoop, Dell, Process Interlock.
Tags: hadoop, OpenStack, Crowbar, Interlock, v1.4
Don’t blink if you’ve been watching the Crowbar release roadmap!
My team at Dell is about to turn another release of Crowbar. Version 1.3 was released on 5/14 (focused on Cloudera Apache Hadoop) and our original schedule showed several sprints of work on OpenStack Essex. Upon evaluation, we believe that the current community-developed Essex barclamps are ready now.
The healthy state of the OpenStack Essex deployment is a reflection of 1) the quality of Essex and 2) our early community activity in creating deployments based on Essex RC1 and Ubuntu Beta1.
We are planning many improvements to our OpenStack Essex and Crowbar Framework; however, most deployments can proceed without these enhancements. This also enables participants in the 5/31 OpenStack Essex Deploy Day.
By releasing a core stable Essex reference deployment, we are accelerating field deployments and enabling the OpenStack ecosystem. In terms of previous posts, we are eliminating release interlocks to enable more downstream development. Ultimately, we hope that we are also creating a baseline OpenStack deployment (link pending).
We are also reducing the pressure to rush more disruptive Crowbar changes (like enabling high availability, adding multiple operating systems, moving to Rails 3, reducing crowbarisms in cookbooks and streamlining networking). With this foundational Essex release behind us (we call it an MVP), we can work on more depth and breadth of capability in OpenStack.
One small challenge: some of the changes that we’d expected to drop have been postponed slightly. Specifically, these are the markdown-based documentation (/docs) and some new UI pages (/network/nodes, /nodes/families). All are already in the product but not wired into the default UI (basically, a split test).
On the bright side, we did manage to expose 10g networking awareness for barclamps; however, we have not yet refactored the barclamps to leverage the change.
Posted by Rob H in Cloudera, Crowbar, Hadoop, OpenStack.
I’m very pleased to post that we’ve cut the 1.3 Crowbar release!
1.3 Release Highlights (branch “elefante”)
- Introduction of Cloudera 3.7 Hadoop Barclamp
- New Operating System versions: Ubuntu 11.10, RHEL 6.2, CentOS 6.2
- Upgrade the Sledgehammer image to CentOS 6.2
- Alias & Group Feature (Alias is linked into DNS & Chef Search)
- Barclamp import from UI
- Pre-populate Node names & descriptions
- Export of logs & database snapshot from UI
- For the Dell Additions: Support for 12g Hardware Models (720xd & 720) via WSMAN
1.4 Previews already in the tree (“essex-hack” branch)
We’ve been working in advance for the 1.4 Release on the Essex-Hack branch.
- Ubuntu 12.04 Support
- OpenStack Essex Packages (from Ubuntu)
Posted by Rob H in Big Data Analytics, Cloudera, Crowbar, Dell, Hadoop, Open source.
Tags: Barclamp, CloudEra, Crowbar, Dell, hadoop, Open Source
My team at Dell has been driving to transparency and openness around Crowbar plus our OpenStack and Hadoop powered solutions. Specifically, our work for our coming release is maintained in the open on the Dell CloudEdge Github site. You can see (and participate in!) our development and validation work in advance of our official release.
I’m pleased to note that our Cloudera Manager barclamp has been posted to Github!
This barclamp supersedes the Hadoop barclamp in the next release of the Dell | Cloudera Apache Hadoop solution. You can build it in Crowbar using the “cloudera-os-build” branch for Crowbar. Do not fear! The Hadoop barclamp still exists (hadoop-os-build branch).
Both the new and original Hadoop barclamps use the Cloudera Hadoop distribution (aka CDH); however, the new barclamp is able to leverage Cloudera’s latest management capabilities. For the Dell solution, Cloudera Manager has always been part of the offering. The primary difference is that we are improving the level of integration. I promise to post more about the features of the solution as we get closer to release.
Posted by Rob H in Big Data Analytics, Cloudera, Hadoop, Open source.
Tags: Big Data, CloudEra, hadoop, HDFS, map-reduce, Unstructured Data
This article about Target using buying patterns to expose that a teen was pregnant before she told her parents puts big data analysis into everyday terms better than the following 555 words (of course, I recommend that you read both).
Recently, I had the pleasure of being one of our team presenting Dell’s BIG DATA story at an internal conference. From the questions and buzz, it’s clear that big data is big news this year. My team is at the center of that storm because we are responsible for the Dell | Cloudera Apache™ Hadoop™ solution. The solution is significant because we’ve integrated the many pieces necessary to build and sustain a Hadoop cluster: Dell servers, the Cloudera Hadoop distribution, the Crowbar framework and Services to make it useful.
Big Data Analytics spins data straws into information gold.
Before I jump into technical details, it’s worth stating the big data analytics value proposition. The problem is that we are awash in a tsunami of data: we’ve grown beyond the neat rows and columns of application databases. Data today includes sources ranging from website click logs and emails to call records and cash register receipts to social media tweets and posts. While much of the data is unstructured noise, there is also incredibly valuable information. (video of my Hadoop “escalator pitch”)
Value is not just hidden inside the bulk data; it lies in correlations between sets of the data.
The big data analytics value proposition is to provide a system to hold a lot of loosely structured information (thus “big data”) and then sift and correlate the information (thus “analytics”). The result is a technology that helps us make data-driven decisions. In many applications, the analysis is fed directly back into applications so they can alter behavior in near real-time. For example, an online retail store could offer you purple bunny slippers as you browse for crowbars in the hardware section, knowing that you’re reading this post. That is the type of correlation across disparate data that I’m talking about.
This is really two problems: storing a lot of data and then computing over it.
Hadoop, the leading open source big data analytics project, is a suite of applications that implement and extend two core capabilities: a distributed file system (HDFS) and the map-reduce (M-R) algorithm. My point is not to define Hadoop (others have done better); instead, I want to highlight that big data analysis is a merger of storage and compute. When learning about any big data analysis solution, you cannot decouple how the data is stored from how the data is analyzed; storage and compute are fundamentally linked.
For that reason, the architecture of a Hadoop cluster is different than either a traditional database or compute cluster. The IO and the resiliency patterns are different. Since Hadoop is a distributed system, hardware redundancy is less important and eliminating IO bottlenecks is paramount. For this reason, our Hadoop clusters use a lot of local, non-RAID drives with a target of delivering a 1:1 CPU core to spindle ratio (ratios are tuned based on planned loads).
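To make the core-to-spindle target concrete, here is a minimal sketch of the arithmetic. The function names and the node sizes in the usage note are hypothetical illustrations, not figures from the Dell reference architecture.

```python
import math


def core_to_spindle_ratio(cpu_cores: int, data_drives: int) -> float:
    """Return CPU cores per data drive (spindle) for a single node."""
    if data_drives <= 0:
        raise ValueError("a node needs at least one data drive")
    return cpu_cores / data_drives


def drives_for_target(cpu_cores: int, target_ratio: float = 1.0) -> int:
    """Return how many drives a node needs to reach target_ratio cores per spindle."""
    if target_ratio <= 0:
        raise ValueError("target_ratio must be positive")
    # Round up so the node never has fewer spindles than the target calls for.
    return math.ceil(cpu_cores / target_ratio)
```

For a hypothetical 12-core node, `drives_for_target(12)` asks for 12 local drives to hit 1:1, while a lighter IO load tuned to 2 cores per spindle (`drives_for_target(12, 2.0)`) would need only 6.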
Imagine that you are looking for correlations in web click data. To do that analysis, Hadoop needs to spend a lot of time cracking open log files, sifting for specific data and then reporting back its results. That process involves thousands of jobs, each doing disk IO, CPU and RAM work and then network transfer; consequently, contention between network and disk demands reduces performance.
Wow… that’s a lot of description and just scratching the surface of Big Data Analytics. I’m going to have to add the technical details about the Dell solution architecture (Hardware) and software components (Cloudera & Crowbar) in another post.
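The click-log scenario can be sketched as a tiny map-reduce job. This is a single-process illustration of the pattern in plain Python, not Hadoop’s actual API; the log format, the sample lines and every function name are assumptions made up for the example. In a real cluster, the map and reduce steps run as many parallel tasks next to the data stored in HDFS.

```python
from collections import defaultdict

# Hypothetical click-log lines in the form "<user> <page>".
LOG_LINES = [
    "alice /hardware/crowbars",
    "bob /hardware/crowbars",
    "alice /slippers/purple-bunny",
    "carol /hardware/crowbars",
]


def map_clicks(line):
    """Map step: emit one (page, 1) pair per log line."""
    _user, page = line.split()
    yield page, 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_clicks(page, counts):
    """Reduce step: total the counts emitted for one page."""
    return page, sum(counts)


def run_job(lines):
    """Run map, shuffle and reduce in sequence and return clicks per page."""
    mapped = (pair for line in lines for pair in map_clicks(line))
    grouped = shuffle(mapped)
    return dict(reduce_clicks(page, counts) for page, counts in grouped.items())


print(run_job(LOG_LINES))
```

Each map call reads its own slice of the log, which is why the disk IO and the network transfer during the shuffle compete with each other at scale.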
Posted by Rob H in Dell, Hadoop.
Tags: Big Data, Dell, GPU, Webcast
Here’s something from my employer (Dell) that may be interesting to you: it’s about using GPUs for Big Data Analytics. I meant to discuss/post this earlier, but… oh well. Here’s the information:
Premieres LIVE: 2pm EST (11 AM PST) TODAY Free - Register Now!
What You’ll Learn:
- Not just for video games any more: GPUs for simulation and parallel processing
- Impact on business workflows in seismic processing, interpretation and reservoir modeling
- ROI: 5x performance in 5 days
- Cost-effective and flexible cluster configurations
- Show me the metrics: Tangible results from a variety of customers
Need More Details?
Posted by Rob H in Crowbar, Hadoop, Meetup, OpenStack.
Tags: Chef, Crowbar, FoodFight, hadoop, meetup, OpenStack, quantum, szollose
Please don’t confuse a lack of posts with a lack of activity! I’ve been in the center of a whirlwind of Crowbar, OpenStack and Hadoop for my team at Dell. I’ve also been working on an interesting side project with Liquid Leadership author (and would-be star ship captain) Brad Szollose.
I just don’t have time to post all of the awesomeness. I can tell you that my team is very focused on Hadoop (RHEL 6.2/CentOS 6.2 + open Cloudera Distro) barclamps as we get some Diablo deployments done. Also the Crowbar list has been very active about Diablo. If you’re looking for advanced information, there is some inside scoop on the Crowbar FoodFight podcast I did with Bryan Berry & Matt Ray.
I’ll be in BOSTON THIS WEDNESDAY 2/1 for the OpenStack Meetup there. We’re going to be talking about Quantum and the OpenStack Foundation. I suspect that Keystone will come up too (but that’s the subject of another post). Of course, it’s not just your humble blogger: the whole Dell CloudEdge OpenStack/Crowbar team will be on hand! So put on your cloud geek hat and take a trip to Harvard for the meetup!
Posted by Rob H in CloudFoundry, CloudOps, Crowbar, DevOps, Greg Althaus, Hadoop, Linux, Open source, OpenStack.
I don’t usually call out my credentials, but knowing that I have a Master’s in Industrial Engineering helps (partially) explain my passion for process as being essential to successful software delivery. One of my favorite authors, Mary Poppendieck, explains undeployed code as perishable inventory that you need to get to market before it loses value. The big lessons (low inventory, high quality, system perspective) from Lean manufacturing translate directly into software and, lately, into operations as DevOps.
What we have observed from delivering our own cloud products, and working with customers on theirs, is that the operations process for deployment is as important as the software and hardware. It is simply not acceptable for us to market clouds without a compelling model for maintaining the solution into the future. Clouds are simply moving too fast to be delivered without a continuous delivery story.
This white paper [link here!] has been available since the OpenStack conference, but not linked to the rest of our OpenStack or Crowbar content.
Posted by Rob H in Crowbar, Hadoop, Video.
Tags: Alias, Crowbar, Ganglia, hadoop, OpenStack
My team at Dell is still figuring out some big items for the 1.3 release; however, some things were just added that are worth calling out.
- Ubuntu 11.04 support! Thanks to Justin Shepherd from Rackspace Cloud Builders!
- Alias names for nodes in the UI
- User managed node groups in the UI
- Ability to pre-populate the alias, description and group for a node (not integrated with DNS yet)
- Hadoop is working again – we addressed the missing Ganglia repo issue. Thanks to Victor Lowther.
Also, I’ve spun new open source ISOs with the new features. User beware!
Posted by Rob H in Crowbar, Hadoop, OpenStack.
Tags: Crowbar, Diablo, hadoop, OpenStack
With the holiday rush, I neglected to post about Monday’s Crowbar v1.2 release (ISO here)!
The core focus for this release was to support the OpenStack Diablo Final bits (which my employer, Dell, includes as part of the “Dell OpenStack Powered Cloud Solution”); however, we added a lot of other capability as we continue to iterate on Crowbar.
I’m proud of our team’s efforts on this release on both features and quality. I’m equally delighted about the Crowbar community engagement via the Crowbar list server. Crowbar is not hardware or operating system specific, so it’s encouraging to hear about deployments on other gear and see the community helping us port to new operating system versions.
We are driving more and more content to Crowbar’s Github as we work to improve community visibility for Crowbar. As such, I’ve been regularly updating the Crowbar Roadmap. I’m also trying to make videos for Crowbar training (suggestions welcome!). Please check back for updates about upcoming plans and sprint activity.
Crowbar Added Features in v1.2:
- Central feature was OpenStack Diablo Final barclamps (tag “openstack-os-build”)
- Improved barclamp packaging
- Added concepts for “meta” barclamps that are suites of other barclamps
- Proposal queue and ordering
- New UI states for nodes & barclamps (led spinner!)
- Install includes self-testing
- Service monitoring (bluepill)
Looking forward
Dell has a long list of pending Hadoop and OpenStack deployments using these bits so you can expect to see updates and patches matching our field experiences. We are very sensitive to community input and want to make Crowbar the best way to deliver a sustainable repeatable reference deployment of OpenStack, Hadoop and other cloud technologies.