OpenStack Essex Events (Austin & Boston 3/8, WW Hack Day 3/1, Docs 3/6) February 29, 2012
Posted by Rob H in Dell, HP, Meetup, OpenStack, OpenStack Design Summit. Tags: Austin, Boston, Dell, essex, HP, meetup, OpenStack
The excitement over the OpenStack Essex release is building! While my team has been making plans around the upcoming design summit in SF, there is more immediate action afoot.
Tomorrow (3/1), numerous sites are gathering for a World Wide Essex Hack Day. If you want to participate or even host a hack venue, get on the list and IRC channel (details).
My team at Dell is organizing a community follow-up, an OpenStack Essex Install Day, next week (3/8) in both Austin and Boston. Just like the Hack Day, the install fest will focus on Essex release code with both online and local presence. Unlike the Hack Day, our focus will be on deployments. For the Dell team, that means working on the Essex deployment for Crowbar. We’re still working on a schedule and partner list so stay tuned. I’m trying to webcast Crowbar & OpenStack training sessions during the install day.
The hack day will close with the regularly scheduled 3/8 OpenStack Austin Meetup (6:30pm at Austin TechRanch). The topic for the meetup will be, …. wait for it …., the Essex Release. Thanks go to HP and Dell for sponsoring!
It’s important to note that Anne Gentle is also coordinating an OpenStack Essex Doc Day on 3/6.
To recap:
- 3/1: World Wide Essex Hack Day
- 3/6: OpenStack Doc Day.
- 3/8: OpenStack Install Day
- 3/8: OpenStack Austin Meetup
Wow… that should satisfy your Essex cravings.
Work with me! Our Dell team is hiring architects, engineers & open source gurus February 27, 2012
Posted by Rob H in Dell, Hiring. Tags: Dell, DevOps, Engineers, hadoop, Hiring, OpenStack
If you’ve been watching my team’s progress at Dell on Crowbar, OpenStack and Hadoop and want a front row seat in these exciting open source projects then the ball is in your court! We are poised to take all three of these projects into new territories that I cannot reveal here, but, take my word for it, there has never been a better time to join our team.
Let me repeat: my team has a lot of open engineering and marketing positions.
Not only are we doing some really kick ass projects, we are also helping redefine how Dell delivers software. Dell is investing significantly in building our software capabilities and focus.
Basically, we are looking for engineers with a passion for scale applications, devops and open source. Experience in Hadoop and/or OpenStack will move you to the top of the pile. These positions say Hadoop, but we’re also looking for OpenStack, DevOps and Chef. We think like a start-up.
- Engineering
- Marketing
If you are interested, the BEST NEXT STEP IS TO APPLY ONLINE.
Analyze This! Big Data | Apache Hadoop | Dell | Cloudera | Crowbar February 20, 2012
Posted by Rob H in Big Data Analytics, Cloudera, Hadoop, Open source. Tags: Big Data, CloudEra, hadoop, HDFS, map-reduce, Unstructured Data
This article about Target using buying patterns to reveal that a teen was pregnant before she told her parents puts big data analysis into everyday terms better than the following 555 words (of course, I recommend that you read both).
Recently, I had the pleasure of being one of the team members presenting Dell’s BIG DATA story at an internal conference. From the questions and buzz, it’s clear that big data is big news this year. My team is at the center of that storm because we are responsible for the Dell | Cloudera Apache™ Hadoop™ solution. The solution is significant because we’ve integrated many pieces necessary to build and sustain a Hadoop cluster: that includes Dell servers, the Cloudera Hadoop distribution, the Crowbar framework and Services to make it useful.
Big Data Analytics spins data straws into information gold.
Before I jump into technical details, it’s worth stating the big data analytics value proposition. The problem is that we are awash in a tsunami of data. We’ve grown beyond the neat rows and columns of application databases: data today includes sources ranging from website click logs and emails to call records and cash register receipts to social media tweets and posts. While much of the data is unstructured noise, there is also incredibly valuable information. (video of my Hadoop “escalator pitch”)
Value is not just hidden inside the bulk data; it lies in correlations between sets of the data.
The big data analytics value proposition is to provide a system to hold a lot of loosely structured information (thus “big data”) and then sift and correlate the information (thus “analytics”). The result is a technology that helps us make data driven decisions. In many applications, the analysis is fed directly back into applications so they can alter behavior in near real-time. For example, an online retail store could offer you purple bunny slippers as you browse for crowbars in the hardware section knowing that you’re reading this post. That is the type of correlations on disparate data that I’m talking about.
This is really two problems: storing a lot of data and then computing over it.
Hadoop, the leading open source big data analytics project, is a suite of applications that implement and extend two core capabilities: a distributed file system (HDFS) and the map-reduce (M-R) algorithm. My point is not to define Hadoop (others have done better and here); instead, I want to highlight that big data analysis is a merger of storage and compute. When learning about any big data analysis solution, you cannot decouple how the data is stored from how the data is analyzed: storage and compute are fundamentally linked.
For that reason, the architecture of a Hadoop cluster is different than either a traditional database or compute cluster. The IO and the resiliency patterns are different. Since Hadoop is a distributed system, hardware redundancy is less important and eliminating IO bottlenecks is paramount. For this reason, our Hadoop clusters use a lot of local, non-RAID drives with a target of delivering a 1:1 CPU core to spindle ratio (ratios are tuned based on planned loads).
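To make the core-to-spindle target concrete, here is a minimal Python sketch that compares a node's CPU cores with its local data drives. The node names and numbers are made up for illustration; they are not actual Dell configurations, and the right ratio still depends on the planned load.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    """Hypothetical data-node description; the numbers are illustrative only."""
    name: str
    cpu_cores: int
    data_drives: int  # local, non-RAID spindles holding HDFS data

    def cores_per_spindle(self) -> float:
        return self.cpu_cores / self.data_drives


def spindle_gap(node: NodeSpec, target_ratio: float = 1.0) -> int:
    """Drives to add (positive) or drives to spare (negative) to hit the target ratio."""
    wanted_drives = round(node.cpu_cores / target_ratio)
    return wanted_drives - node.data_drives


nodes = [NodeSpec("balanced", 12, 12), NodeSpec("cpu-heavy", 16, 8)]
for node in nodes:
    print(node.name, node.cores_per_spindle(), spindle_gap(node))
```

A node that comes out "cpu-heavy" here is one where cores would likely sit idle waiting on disk, which is the IO bottleneck the 1:1 target is meant to avoid.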
Imagine that you are looking for correlations in web click data. To do that analysis, Hadoop needs to spend a lot of time cracking open log files, sifting for specific data and then reporting back its results. That process involves thousands of jobs each doing disk IO, CPU & RAM workload and then network transfer; consequently, contention between network and disk demands reduces performance.
Wow… that’s a lot of description and just scratching the surface of Big Data Analytics. I’m going to have to add the technical details about the Dell solution architecture (Hardware) and software components (Cloudera & Crowbar) in another post.
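The click-log example can be sketched as a toy map-reduce in plain Python. This is not Hadoop code: the log format, the chunking and the function names are all assumptions made for illustration, and a real cluster would run the map step on the nodes that hold each chunk of the file.

```python
from collections import defaultdict
from itertools import chain

# Made-up click-log lines in the form "<user> <url>", split into two chunks
# the way a distributed file system would split a large file across nodes.
LOG_CHUNKS = [
    ["alice /crowbars", "bob /slippers", "alice /slippers"],
    ["carol /crowbars", "bob /crowbars"],
]


def map_clicks(lines):
    """Map step: emit (url, 1) for every click in one chunk of the log."""
    for line in lines:
        _user, url = line.split()
        yield url, 1


def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


mapped = chain.from_iterable(map_clicks(chunk) for chunk in LOG_CHUNKS)
click_counts = reduce_counts(shuffle(mapped))
print(click_counts)  # {'/crowbars': 3, '/slippers': 2}
```

Even in this toy version the three costs show up: reading the chunks (disk IO), parsing and summing (CPU & RAM), and moving the grouped pairs to the reducer (network transfer).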
2/9 Webcast about mixing GPUs & Big Data Analysis February 9, 2012
Posted by Rob H in Dell, Hadoop. Tags: Big Data, Dell, GPU, Webcast
Here’s something from my employer (Dell) that may be interesting to you: it’s about using GPUs for Big Data Analytics. I meant to discuss/post this earlier, but… oh well. Here’s the information:
Premieres LIVE: 2pm EST (11 AM PST) TODAY Free - Register Now!
What You’ll Learn:
- Not just for video games any more: GPUs for simulation and parallel processing
- Impact on business workflows in seismic processing, interpretation and reservoir modeling
- ROI: 5x performance in 5 days
- Cost-effective and flexible cluster configurations
- Show me the metrics: Tangible results from a variety of customers
OpenStack Boston Meetup 2/1 covers Quantum & Foundation February 8, 2012
Posted by Rob H in Andi Abes, Cisco, Dell, HP, Meetup, OpenStack, RackSpace, Suse. Tags: Boston, Foundation, meetup, OpenStack, quantum
My team at Dell was in Beantown (several of us are Nashua based) for an annual team meeting so the timing for this Boston meetup was ideal for us. While we showed up in force, so did many other Stackers including people from HP, Nicira, Suse, Harvard, Voxel, RedHat, ESPN and many more! The turnout for the event was great and I’m taking notes that Austin may need to upgrade our pizza and Boston may need to upgrade their cookies (just sayin’). Special thanks to Andi Abes for organizing and Suse for Sponsoring!!
We covered two primary topics: Quantum and the OpenStack Foundation.
In typing up my notes from the sessions, I ended up with so much information that it made more sense to break them into independent blog posts. Wow – that’s a lot of value from a free meetup!
The Quantum session by David Lapsley from Nicira talked about the architecture and applications of Quantum. I think that Quantum is an exciting incubated project for OpenStack; however, it is important to remember that Essex stands alone without it. I believe this fact gets forgotten in enthusiasm over Quantum’s shiny potential.
The OpenStack session by Rob Hirschfeld from Dell (me!) talked about the importance of governance for OpenStack and how the Foundation will play a key role in transitioning it from Rackspace to a neutral party. There are many feel-good community benefits that the Foundation brings; however, the collaborators’ ROI is the driver for creating a strong foundation. There is nothing wrong with acknowledging that fact and using it to create a more sustainable OpenStack.
Quantum: Network Virtualization in the OpenStack Essex Release February 8, 2012
Posted by Rob H in Andi Abes, Cisco, Meetup, OpenStack.
This post is part of my notes from the 2/1 Boston OpenStack meetup.
Quantum
David Lapsley from Nicira gave the Quantum presentation (his slides). My notes include additional explication and interpretation so he is not to blame for errors (but I’ll share credit for clarity).
The objective for Quantum is to replace the current networking modes (flat, vlan, dhcp, dhcp ha) with a programmatic networking API. The idea is that cloud users would use the API to request the network topology they wanted to implement rather than have it imposed by the infrastructure’s network mode. To accomplish this, the API must allow users to create complex & hierarchical network topologies without being aware of the underlying network infrastructure (aka “an abstraction layer”).

In simpler terms: Quantum allows users to design their own isolated networks without knowing how the network is actually deployed.
Quantum is a stand-alone service with its own API. It is not simply an extension of the Nova API. The Quantum API has an extensibility model similar to Nova and it also has a plug-in architecture so that it can be implementation agnostic. The plug-ins are needed to map the user’s API abstraction into actual networking. For example, if the user requests a network tunnel between two VMs then the plug-in may choose to implement a tagged VLAN, OpenFlow connections, iptables filters, or encapsulated tunnels. The goal is that the implementation of the API should not matter to the user of the API!
For (hopefully) obvious reasons, the use cases for Quantum are similar to the Amazon EC2 VPC. The notable exception is service injection. Quantum wants to allow vendors/providers to innovate around value-added services. This should result in a diversity of choices as vendors offer additional network services such as load balancers, IPS, IDS, etc. While this is a great concept, it’s important to note that Quantum is currently limited to a single plug-in! [see note in comments by Quantum PTL Dan Wendlandt (@danwendlandt)]
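The plug-in idea can be sketched in a few lines of Python. This is only an illustration of the pattern: `NetworkPlugin`, `VlanPlugin`, `TunnelPlugin` and `NetworkService` are hypothetical names, not Quantum's actual plug-in interface.

```python
from abc import ABC, abstractmethod


class NetworkPlugin(ABC):
    """Hypothetical plug-in interface; NOT Quantum's real plug-in API."""

    @abstractmethod
    def connect(self, vm_a: str, vm_b: str) -> str:
        """Build a private link between two VMs and describe what was done."""


class VlanPlugin(NetworkPlugin):
    """Implements the link as a tagged VLAN (tags start at an arbitrary 100)."""

    def __init__(self) -> None:
        self._next_tag = 100

    def connect(self, vm_a: str, vm_b: str) -> str:
        tag = self._next_tag
        self._next_tag += 1
        return f"vlan {tag}: {vm_a} <-> {vm_b}"


class TunnelPlugin(NetworkPlugin):
    """Implements the same request as an encapsulated tunnel."""

    def connect(self, vm_a: str, vm_b: str) -> str:
        return f"tunnel: {vm_a} <-> {vm_b}"


class NetworkService:
    """The user-facing API: callers never see which plug-in is loaded."""

    def __init__(self, plugin: NetworkPlugin) -> None:
        self._plugin = plugin

    def request_link(self, vm_a: str, vm_b: str) -> str:
        return self._plugin.connect(vm_a, vm_b)


for plugin in (VlanPlugin(), TunnelPlugin()):
    print(NetworkService(plugin).request_link("web", "db"))
```

The user's call, `request_link("web", "db")`, is identical in both cases; only the operator's choice of plug-in changes what happens on the wire.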
The expectation is that cloud users will want to create traditional application topologies with different tiers of access. For example, applications may require a dedicated network between web and database tiers or a DMZ between web and load balancer. The challenge is that these are patterns not rigid requirements. Ultimately, the simplest solution for the feature is to allow users to create “virtual VLANs.”
Essentially, the current Quantum API is creating virtual VLANs.
The Quantum API has four basic abstractions: interface, network, port and attachment. These primitives are used to build up a virtual network just as they are in physical networks.
- Interface: cloud / tenant / server / GUID / eth0
- Network: cloud / tenant / network / GUID
- Port: cloud ID / tenant / network / GUID / port / GUID
- Attachment: interface & network & port
To use the Quantum API, you must create a network, add ports (to network) and interfaces (to vms) then attach the network, interface, and port together. This gives users very fine grained control over their network topology. It is up to the plug-in to translate these primitives into a working physical topology.
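The create-then-attach workflow can be sketched with a toy in-memory model. `ToyQuantum` and its method names are assumptions made for illustration; the real Quantum API is a web service with its own calls, and a real plug-in would turn each attachment into actual network configuration.

```python
import uuid


class ToyQuantum:
    """In-memory stand-in for the four abstractions; not the real Quantum API."""

    def __init__(self):
        # network GUID -> {"tenant", "name", "ports": {port GUID: interface or None}}
        self.networks = {}

    def create_network(self, tenant, name):
        net_id = str(uuid.uuid4())
        self.networks[net_id] = {"tenant": tenant, "name": name, "ports": {}}
        return net_id

    def create_port(self, net_id):
        port_id = str(uuid.uuid4())
        self.networks[net_id]["ports"][port_id] = None  # no attachment yet
        return port_id

    def attach(self, net_id, port_id, interface):
        """Attachment: ties one VM interface to one port on one network."""
        ports = self.networks[net_id]["ports"]
        if ports[port_id] is not None:
            raise ValueError("port already has an attachment")
        ports[port_id] = interface

    def attachments(self, net_id):
        ports = self.networks[net_id]["ports"]
        return sorted(iface for iface in ports.values() if iface is not None)


quantum = ToyQuantum()
net = quantum.create_network("tenant-a", "web-to-db")
for interface in ("web-vm/eth0", "db-vm/eth0"):
    port = quantum.create_port(net)
    quantum.attach(net, port, interface)
print(quantum.attachments(net))
```

Each step is small on purpose: the user composes networks, ports and attachments, and nothing in the calls says whether the result is a VLAN, a tunnel or an OpenFlow rule.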
According to my teammate and OSBOS organizer, Andi Abes, the Quantum API reached consensus in the community quickly because it started with this basic but extensible API. In the meeting, I added that this approach is typical for OpenStack where it is considered better to demonstrate working core functionality than build extra complexity into the initial delivery. This approach links back to the API vs. Implementation debate I’ve discussed before. This simple API also provides room for innovation. While it provides the basic constructs, it is light and does not encumber mappings of this API to different underlying technologies with lots of extras. OEM vendors and service providers thus have an easier time differentiating their offerings, be it equipment or services.
In my experience, people often link OpenFlow and Quantum into a single technology base. I have certainly been guilty of making that generalization. Quantum does not require OpenFlow or vice versa; however, they are highly complementary. OpenFlow takes over the switches’ “flow table” and allows administrators to control how every packet that touches the switch is routed. The potential for OpenFlow is to create highly dynamic and controlled network conduits. Quantum needs exactly that functionality to most directly map the virtual network requests into a physical fabric. In this way, OpenFlow is the most direct approach to building a fully enabled Quantum plug-in.
In the Essex release, progress has been made (and still is being made) towards integrating Nova and Quantum. The workflow of attaching a VIF (virtual interface) to the right network and assigning it an appropriate IP (using Melange, the OpenStack IP address management project) is making headway. That said, the dashboard integration still lags and more progress is required.
Overall, my impression is that Quantum has great potential; however, I think that Nova in Essex will be sufficient for real applications without Quantum. As my freshman roommate used to say, “potential means you’ve got to keep working on it.”
Why Governance Matters in Open Source: Discussing the OpenStack Foundation February 8, 2012
Posted by Rob H in Citrix, Dell, Meetup, Open source, OpenStack, RackSpace. Tags: Foundation, meetup, OpenStack, rackspace
This post is part of my notes from the 2/1 Boston OpenStack meetup.
OpenStack Foundation
Yours truly (Rob Hirschfeld) gave the presentation about the OpenStack Foundation. To readers of this blog, it’s obvious that I’m a believer in the OpenStack mission; however, it’s not obvious how creating a foundation helps with that mission and why OpenStack needs its own. As one person at the meetup put it, “Why not? Every major project needs a foundation!”
Governance does not sound sexy compared to writing code and deploying clouds, but it’s very important to the success of the project.
Here are my notes without the poetic elocution I exuded during the meetup…
The basics:
- What: Creating a neutral body to govern OpenStack. Rackspace has been leading OpenStack. This means that they own the copyrights, name and also pay the people who organize the community. They committed (to executives at Dell and others) that they would ultimately set up a standalone body to govern the project before the project was public and endorsed by those early partners. Dell (my employer), Citrix, Accenture and NASA were some of the biggest names at the Austin conference launch.
- Why: A neutral body is needed because a lot of companies are committing significant time and money to the project. They cannot risk their investments on Rackspace’s goodwill alone. This may mean many things. It could be they don’t like Rackspace’s direction or they feel that Rackspace is not investing enough.
- When: Right now and over the next few releases. You should give feedback right now on the OpenStack Foundation’s mission. The actual foundation will take more time to establish because it requires legal work and funding commitments.
- Who: The community – all stakeholders. This is important stuff! Standing up a financially independent Foundation requires money, but the little guys are not left out. There is a clear realization and desire to enable independent developers, contributors and small players to have a seat at the table.
- How Much: The amounts are unclear, but establishing a foundation will require a significant ongoing investment from highly involved and moneyed parties (Rackspace, Dell, Cisco, HP, Citrix, NTT, startups?, etc). The funding will pay salaries for people dedicated to the community doing the things that I’ll discuss below. Overall, the ROI for those investments must be clear!
The foundation does “governance.” But, what does that mean? Here is a list of vitally important work that the foundation is responsible for.
- Branding – Protecting, certifying, and promoting the OpenStack brand is important because it ensures that “OpenStack” has a valuable and predictable meaning to contributors and users. A strong brand also means a stronger temptation for people to abuse the brand by claiming compatibility, participation and integration.
- API – Many would assume that the OpenStack API is the very heart of the project and there is merit to this position. As more and more OpenStack implementations emerge, it is essential that we have a body that can certify which implementations (and even which versions of the implementation!) are valid. This is a substantial value to the community because API integrity ensures project continuity and helps the ecosystem monetize the project. Note: my opinion differs from others here because I think we should favor API over implementation
- Community – The OpenStack community is not an accident. It is the function of deliberate actions and choices made by Rackspace and supported by key contributors. That community requires virtual and physical places to coalesce and leaders to organize and manage those meeting places. The excellent conferences, wikis, blogs, media awareness, documentation and meetups are a product of consistent community management.
- Arbitration – An open source community is a family and siblings do not always get along. Today, Rackspace must be very careful about balancing their own interests because they are like the oldest sibling playing the parent role – you can get away with it until something serious happens. We need a neutral party so that Rackspace can protect their own interests (alternate spin: because Rackspace protects their own interests at the expense of the community).
- Leadership – OpenStack today is a collection of projects with individual leadership. We will increasingly need coordinated leadership as the number of projects and users increases. Centralized leadership is essential because the good of the project as a whole may mean sacrifices within individual projects. It may even mean that some projects choose to leave the OpenStack tent. Stewarding these challenges will require a new level of leadership.
- Legal – This is a function of all the above but also something more. From a legal standpoint, OpenStack must be able to represent itself. There is a significant amount of intellectual property being created. It would be foolish to overlook that this property is valuable and needs adequate legal representation.
I used “vitally important” to describe the above items. Is that an exaggeration? Our goal is collaboration and that requires some infrastructure and rules to make it sustainable. We must have a foundation that encourages innovation (multiple implementations) and collaboration (discourages forking). Innovation and collaboration are the heartbeat of an open source project.

The foundation is vitally important because collaboration by competitors is fragile.
In addition to the core areas above, the foundation needs to handle routine tactical items such as:
- Delivering on milestones & releases
- Moving new subprojects into OpenStack
- Electing and maintaining the Project Policy Board
- Electing and maintaining Project Technical Leads
- Ensuring adherence and extensions to the current bylaws
At the end of the day, OpenStack monetization is the central value for the Foundation.
In order for the OpenStack project, and thus its foundation, to flourish, the contributors, ecosystem, sponsors and users of the project must be able to see a reasonable return (ROI) on their investment. I would love to believe that the foundation is all about people banding together to solve important problems for the benefit of all; however, it is more realistic to embrace that we can both collaborate and profit simultaneously. Acknowledging the pragmatic self-interested view allows us to create the right incentives and processes as embodied by the OpenStack foundation.
Superbowl Ad Bingo for either High IQ or Matrix Challenged February 5, 2012
Posted by Rob H in Random. Tags: bingo, Superbowl
If you, like me, are going to a party to observe expensive ads interrupted by costumed men line dancing aggressively then you may enjoy the bingo card generator that I put together.
This spreadsheet creates TWO types of bingo cards:
- Smart cards have TWO dimensions: the columns select the type of pitch while the rows select the item being pitched.
- Dumb cards just present choices (there are more) for you to find
Note: to use, you need to press “F9” between each printing to generate new randomness.
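For readers without the spreadsheet, the two card types can be approximated in a few lines of Python. This is a rough sketch, not a port of the spreadsheet: apart from “Sexy Model” and “Betty White”, the pitch types, items and choices below are invented, and the 3x3 size is arbitrary.

```python
import random

# Only "Sexy Model" and "Betty White" come from the post; the rest are invented.
PITCHES = ["Sexy Model", "Talking Animal", "Celebrity", "Explosion", "Cute Kid"]
ITEMS = ["Car", "Beer", "Snack", "Soda", "Phone"]
CHOICES = ["Betty White", "Talking baby", "Monkey", "Robot", "Cowboy",
           "Slow motion", "Movie trailer", "Dancing", "Puppy", "Sports star"]


def smart_card(pitches, items, size=3, rng=None):
    """Two dimensions: column headers are pitch types, row headers are items.
    A cell is marked when an ad uses that pitch for that item."""
    rng = rng or random.Random()
    return rng.sample(pitches, size), rng.sample(items, size)


def dumb_card(choices, size=3, rng=None):
    """One dimension: a size x size grid of distinct things to spot."""
    rng = rng or random.Random()
    picks = rng.sample(choices, size * size)
    return [picks[row * size:(row + 1) * size] for row in range(size)]


columns, rows = smart_card(PITCHES, ITEMS)
print("pitch columns:", columns)
print("item rows:", rows)
for row in dumb_card(CHOICES):
    print(row)
```

Running the script again plays the role of pressing “F9”: each call draws a fresh random card.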
Post Superbowl note: Apparently, I should have had “Factor Setting” instead of the “Sexy Model” column. Disappointing! Also, liquor & political ads are not shown during the Superbowl. Suggestions for improvement are welcome. If you got “Betty White” multiple times, I don’t want to hear about it.
Creative Commons licensed. No warranty.
OpenStack Keystone makes smart & bold move to improve quality January 30, 2012
Posted by Rob H in OpenStack. Tags: Diablo, essex, Keystone, OpenStack
Just after the OpenStack Essex 3 milestone, Ziad Sawalha of Rackspace announced a major shift in the Keystone code base. I applaud the clarity of Ziad’s email but want to restate my understanding of the changes here rather than simply parrot him.
These changes improve Keystone and OpenStack in several ways.
The Keystone team is keeping the current APIs while swapping their implementation. They recommend switching back to an implementation based on the Rackspace Cloud Builder’s Keystone Light code base. I say switching back because my team at Dell has some experience with the Keystone Light (KSL) code. KSL was used with our first Diablo release work while legacy Keystone (Diablo Keystone?) was being readied for release. Upon reflection, the confusion around Keystone readiness for Diablo may have been an indicator of some disconnects that ultimately contributed to last week’s decision.
This is not an 11th hour rewrite. Keystone Light (now Essex Keystone?) offers
- An existing code base that has been proven in real deployments
- Stronger identity pluggability, better EC2 compatibility and higher production readiness
- An existing testing framework and proven extensibility and flexibility
- Plus, the team has committed to ensure a simple migration path
Beyond the code and Keystone, making a change like this takes confidence and guts.
This change is not all sunshine and rainbows. Making a major change midway through the release cycle introduces schedule and delivery risk. Even though not fully graduated to core project status, Keystone is already an essential component in OpenStack. People will certainly raise valid questions about production readiness and code churn within the project. Changes like these are the reality for any major project and doubly so for platforms.
The very fact that this change is visible and discussed by the OpenStack community shows our strength.
Acknowledging and quickly fixing a weakness in the OpenStack code base is exactly the type of behavior that the community needs to be successful and converge towards a great platform. The fact that maintaining the API is a priority shows that OpenStack is moving in the direction of more API based standards. While the Keystone change is not a recommendation for dual implementations (the Diablo Keystone fork will likely die out), it should help set the stage for how the community will handle competing implementations. If nothing else, it is a strong argument for maintaining API tests and compliance.
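The argument for API tests can be made concrete with a small Python sketch: one contract check run unchanged against two implementations. Both classes and the `authenticate` call are hypothetical stand-ins, not Keystone code or the Keystone API.

```python
import hashlib


class LegacyIdentity:
    """Hypothetical first implementation: keeps passwords as given."""

    def __init__(self, users):
        self._users = dict(users)

    def authenticate(self, user, password):
        if self._users.get(user) != password:
            return None
        return "token-" + user


class LightIdentity:
    """Hypothetical replacement: same API, different internals (hashed passwords)."""

    def __init__(self, users):
        self._hashes = {name: self._digest(pw) for name, pw in users.items()}

    @staticmethod
    def _digest(password):
        return hashlib.sha256(password.encode()).hexdigest()

    def authenticate(self, user, password):
        if self._hashes.get(user) != self._digest(password):
            return None
        return "token-" + user


def check_contract(make_backend):
    """API-level checks that any implementation has to pass."""
    backend = make_backend({"alice": "s3cret"})
    assert backend.authenticate("alice", "s3cret") is not None
    assert backend.authenticate("alice", "wrong") is None
    assert backend.authenticate("mallory", "s3cret") is None
    return True


results = {impl.__name__: check_contract(impl)
           for impl in (LegacyIdentity, LightIdentity)}
print(results)
```

Because the checks only touch the public call, swapping the implementation underneath does not require rewriting them, which is the property that makes a mid-cycle change like this one survivable.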
The Keystone change is a forward looking one. Our Crowbar team will investigate how we will incorporate it. As part of OpenStack, the new Keystone code will (re)surface for the Essex deployment and that code will be part of the Dell OpenStack-Powered Cloud. This work, like the previous, will be done in the open as part of the OpenStack barclamps that we maintain on the Crowbar github.
