Today my boss at Dell, John Igoe, is helping to announce the report from the TechAmerica Federal Big Data Commission (direct pdf). I was fully expecting the report to be a real snoozer brimming with corporate synergies and win-win externalities. Instead, I found myself reading a practical guide to applying Big Data to government. Flipping past the short, obligatory “what is…” section, the report dives right into a survey of practical applications for big data spanning nearly every government service. Over half of the report is dedicated to case studies with specific recommendations and buying criteria.
Ultimately, the report calls for agencies to treat data as an asset, one that can improve how government operates.
There are a few items that stand out in this report:
Clear tables of case studies on page 16 and characteristics on page 11 that help pinpoint a path through the options.
Definitive advice to focus on a single data vector (velocity, volume, or variety) for initial success on page 28 (and elsewhere).
I strongly agree with one point the report repeats: although more data is available, our ability to comprehend that data is shrinking. The sheer volume of examples the report cites is proof enough that agencies are, and will continue to be, inundated with data.
One shortcoming of this report is that it does not flag the extreme shortage of data scientists. Many of the cases discussed assume a ready army of engineers to implement these solutions; however, I’m uncertain how the government will fill those positions in a very tight labor market. Ultimately, I think we will simply have to open the data to citizen and non-governmental analysis because, as the report clearly states, data is growing faster than our capability to use it.
I commend the TechAmerica commission for their Big Data clarity: success comes from starting with a narrow scope. So the answer, ironically, is in knowing which questions we want to ask.
Not only are we releasing both of these solutions simultaneously, but they also reflect a significant acceleration in our pace of delivery. Both solutions had beta support for their core technologies (Cloudera 4 & OpenStack Essex) when those components were released, and we have dramatically reduced the lag from component RC to solution release compared to previous milestones (3.7 & Diablo).
As before, the core deployment logic of these open source-based solutions was developed in the open on Crowbar’s GitHub. You are invited to download and try these solutions yourself. For Dell solutions, we include validated reference architectures, hardware configuration extensions for Crowbar, services, and support.
The latest versions of Hadoop and OpenStack represent great strides for both solutions. It’s great to have made them more deployable and faster to evaluate and manage.
The response to Crowbar has been exciting and humbling. I most appreciate those who looked at Crowbar and saw more than a bare metal installer. They are the ones who recognized that we are trying to solve a bigger problem: it has been too difficult to cope with change in IT operations.
During this year, we have made many changes. Many have been driven by customer, user and partner feedback while others support Dell product delivery needs. Happily, these inputs are well aligned in intent if not always in timing.
Introduction of barclamps as modular components
Expansion into multiple applications (most notably OpenStack and Apache Hadoop)
Working in the open (with public commits)
Collaborative License Agreements
Dell’s understanding of open source and open development has undergone a similar transformation. Crowbar was originally open sourced under the Apache 2 license because we imagined it becoming part of the OpenStack project. While that ambition has faded, the practical benefits of open collaboration have proven to be substantial.
The results from this first year are compelling:
For OpenStack Diablo, coordination with the Rackspace Cloud Builder team enabled Crowbar to include the Keystone and Dashboard projects in Dell’s solution.
We’ve amassed hundreds of mailing list subscribers and GitHub followers.
Support for multiple releases of RHEL, CentOS & Ubuntu, including Ubuntu 12.04 while it was still in beta.
SuSE did its own port of Crowbar to SuSE, with important advances in Crowbar’s install model (from ISO to packages).
We stand on the edge of many exciting transformations for Crowbar’s second year. Given how much changed this year, I’m hesitant to make long-term predictions. Yet, within just the next few months, there are significant plans based on the Crowbar 2.0 refactor. We have line of sight to changes that expand our tool choices, improve networking, add operating systems, and make Crowbar even more capable for production operations.
With the GA drop, the Crowbar Cloudera Barclamps are effectively at release candidate state (ISO). The Cloudera Barclamps include a freemium version of Cloudera Enterprise 4 that supports up to 50 nodes.
This post is a collaboration between three Dell Cloud activists: Rob Hirschfeld (@zehicle), Joseph B George (@jbgeorge) and Stephen Spector (@SpectoratDell).
We’re not making predictions for the “whole” Cloud market; this is a relatively narrow perspective based on technologies that are on our daily radar. These views are strictly our own and based on publicly available data. They do not reflect plans, commitments, or internal data from our employer (Dell).
The major 2012 theme is cloud coalescence. However, Rob worries that we’ll see slower adoption due to lack of engineers and confusing names/concepts.
Here are our twelve items for 2012:
Open source continues to be a disruptive technology delivery model. It’s not “free” software; there’s an emerging IT culture that is doing business differently, including a number of large enterprises. The stable of sleeping-giant vendors is waking up to this in 2012, but full engagement will take time.
Linux. It is the cloud operating system and will have a great 2012. It may seem silly to point this out because it seems obvious, but Linux is the foundation for open source acceleration.
The tight market for engineering and product development talent will get tighter. The catch-22 is that potential mentors are busy breaking new ground and writing code, which makes it hard to develop new experts.
On track, OpenStack moves into its awkward adolescence. It is still gangly and rebelling against authority, but it is coming into its own. Expect to see a groundswell of installations and an expected wave of issues and challenges that will drive the community. By the “F” release, expect to see OpenStack cement itself as a serious, stable contender with notable public deployments and a significant international private deployment footprint.
We’ll start seeing OpenStack Quantum (networking) in near-production pilots by year end. OpenStack Quantum is the glue that holds the big players in OpenStack Nova together. The potential for next-generation cloud networking based on open standards is huge, but it will not emerge without a killer app (OpenStack Nova in this case) pushing it forward. The OpenStack community will pull together to keep Quantum on track.
Hadoop will cross into mainstream awareness as the need for big data analysis grows exponentially along with the data. Hadoop is on fire in select circles and completely obscure in others. The challenge for Hadoop is that there are not enough engineers who know how to operate it. We suspect that this lack of expertise will throttle demand until more proprietary tools emerge to simplify analysis. We also predict that a lot of very rich entrepreneurs and VCs will emerge from this market segment.
DevOps will enter mainstream IT discussions. Marketers from major IT brands will struggle and fail to find a better name for the movement. Our prediction is that by 2015, it will just be the way that “IT” is done and the name won’t matter.
KVM continues to gain believers as the open source hypervisor. In 2011, I would not have believed this prediction, but KVM is making great strides and getting a lot of love from the OpenStack community, though Xen remains a key open source technology as well. I believe that libvirt compatibility between LXC & KVM will further accelerate both virtualization approaches.
Big Data and NoSQL will continue to converge. While enthusiasm for NoSQL as a universal replacement for structured databases appears to be deflating, real applications will win.
Java will continue to encounter turbulence as a software platform under Oracle’s heavy-handed management.
PaaS continues to be a confusing term. Cloud players will struggle to define it, and I don’t think a common definition will surface in 2012. I think the big news will be convergence between DevOps and PaaS; however, that will fly under the radar since most of the market is still getting educated on both concepts.
Hybrid cloud will continue to make strides but will not truly emerge in 2012. We’ll work on developing this technology and expose the gaps that will ultimately get us there (see PaaS and Quantum above).
This release raises the bar on open Hadoop deployments by making them faster, more scalable, more integrated, and repeatable.
These barclamps were developed in conjunction with our licensed Dell | Cloudera Solution. The licensed solution is for customers seeking large scale and professionally supported big data solutions. The purpose of the open barclamps (which pull the open source parts from the Cloudera distro) is to help you get started with Hadoop and reduce your learning curve. Our team invested significant testing effort in ensuring that these barclamps work smoothly because they are the foundational layer of our for-pay Hadoop solution.
Included in the Hadoop barclamp suite are Hadoop MapReduce, Hive, Pig, ZooKeeper, and Sqoop running on RHEL 5.7. These barclamps cover the core parts of the Hadoop suite. Like other Crowbar deployments (see OpenStack), the barclamps automatically discover the service configurations and interoperate. One of our team members (call him Scott Jensen) said it very simply: “I can deploy a fully integrated Hadoop cluster in a few hours. That friggin’ rocks!” I just can’t put it more eloquently than that!
I’ll post again when we flip the “open” bit and invite our community to dig in and help us continue to set the standards on open Hadoop deployments.