Crowbar 2.0 Objectives: Scalable, Heterogeneous, Flexible and Connected

The seeds for Crowbar 2.0 have been in the 1.x code base for a while and were recently accelerated by SuSE.  With the Dell | Cloudera 4 Hadoop and Essex OpenStack-powered releases behind us, we will now be totally focused bringing these seeds to fruition in the next two months.

Getting the core Crowbar 2.0 changes working is not a major refactoring effort in calendar time; however, it will impact current Crowbar developers by changing improving the programming APIs. The Dell Crowbar team decided to treat this as a focused refactoring effort because several important changes are tightly coupled. We cannot solve them independently without causing a larger disruption.

All of the Crowbar 2.0 changes address issues and concerns raised in the community and are needed to support expanding of our OpenStack and Hadoop application deployments.

Our technical objective for Crowbar 2.0 is to simplify and streamline development efforts as the development and user community grows. We are seeking to:

  1. simplify our use of Chef and eliminate Crowbar requirements in our Opscode Chef recipes.
    1. reduce the initial effort required to leverage Crowbar
    2. opens Crowbar to a broader audience (see Upstreaming)
  2. provide heterogeneous / multiple operating system deployments. This enables:
    1. multiple versions of the same OS running for upgrades
    2. different operating systems operating simultaneously (and deal with heterogeneous packaging issues)
    3. accommodation of no-agent systems like locked systems (e.g.: virtualization hosts) and switches (aka external entities)
    4. UEFI booting in Sledgehammer
  3. strengthen networking abstractions
    1. allow networking configurations to be created dynamically (so that users are not locked into choices made before Crowbar deployment)
    2. better manage connected operations
    3. enable pull-from-source deployments that are ahead of (or forked from) available packages.
  4. improvements in Crowbar’s core database and state machine to enable
    1. larger scale concerns
    2. controlled production migrations and upgrades
  5. other important items
    1. make documentation more coupled to current features and easier to maintain
    2. upgrade to Rails 3 to simplify code base, security and performance
    3. deepen automated test coverage and capabilities

Beyond these great technical targets, we want Crowbar 2.0 is to address barriers to adoption that have been raised by our community, customers and partners. We have been tracking concerns about the learning curve for adding barclamps, complexity of networking configuration and packaging into a single ISO.

We will kick off to community part of this effort with an online review on 7/16 (details).

PS: why a refactoring?

My team at Dell does not take on any refactoring changes lightly because they are disruptive to our community; however, a convergence of requirements has made it necessary to update several core components simultaneously. Specifically, we found that desired changes in networking, operating systems, packaging, configuration management, scale and hardware support all required interlocked changes. We have been bringing many of these changes into the code base in preparation and have reached a point where the next steps require changing Crowbar 1.0 semantics.

We are first and foremost an incremental architecture & lean development team – Crowbar 2.0 will have the smallest footprint needed to begin the transformations that are currently blocking us. There is significant room during and after the refactor for the community to shape Crowbar.

Technical details of pending Crowbar changes

We’re testing a HUGE batch of changes to Crowbar before we commit them. The changes support the barclamp modularization work and also include the addition of RHEL and network barclamp update.

You may be eager to dig in; however, disruptiveness of these changes means that we are taking extra time to make sure that the build and install still work.

Here’s what you’ll see when we commit the changes:

  • Changes in naming to be more generic
    • Crowbar server user/pass is now crowbar/crowbar (was openstack/openstack)
    • Rails app path now crowbar_framework (was openstack_manager )
  • The pre-split barclamps (/change-image/dell/barclamps/*) have been moved into individual github repos (barclamp-*).
    • Barclamps are pulled into the build using “git submodule”
    • Chef scripts for barclamps are no longer copied and comingled together in the chef directory. They remain in their source directories (default /opt/dell/barclamps)
  • Inside the barclamps, you’ll find
    • A crowbar configuration file to direct the barclamp installer including localization and menu extensions.
    • Path changes to better align with the destination paths (command_line -> bin, app ->crowbar_framework)
    • App views moved under subdirectories
  • Changes to installation scripts
    • Barclamp installation changed to a ruby library so it can do more and be used individually outside of the install process. This allows barclamps to be imported or updated after installation.
    • Changes to create accommodate multiple operating systems
  • Addition of a “redhat-5.6-extra” directory with the RHEL 5.6 installation build components.
    • The RHEL version installs Opcode Chef Server 0.10 (Ubuntu is still 0.9 – community help here?)
  • Crowbar framework Rails app runs under Rainbow instead of Apache.
  • The code for the framework and the barclamp installer has been moved into the crowbar barclamp.
    • The installer bootstraps the crowbar barclamp to install itself.
  • The network barclamp has been substantially changed – that will require additional documentation. Features include
    • Concept of “conduits” that are constructed on nodes to be shared between barclamps
    • Ability to map adapters in a general way to deal with inconsistent enumeration
    • Mapping conduits to adapters allows for new teaming and multiple teaming configurations

We’ll post to the Crowbar listserv when changes. They will be posted to Crowbar HEAD. If you want the current build, we have created a “v1.0″ tag.

Abstractions are great, except when they’re not

Or “please don’t make my life 100% easier, 80% is enough”

I had an interesting argument recently in a very crowded meeting – maybe we were all getting that purple meeting haze, but it started to take on all the makings of a holy war (so I knew it would make a good blog post).

We were discussing an API for interacting with a server cloud and the API was intentionally very abstracted.  Specifically, you could manage a virtual server but you could not see which host was providing the resources.  The vendor wanted to hide the raw resources from API consumers.  This abstraction was good; it made the API simpler and allowed the provider to flexibility about how it implemented the backend.  The API abstraction made the underlying system opaque.

So far it was all rainbows, unicorns and smiling yellow hatted yard gnomes.

Then I wanted to know if it was possible to relate information between the new API and the existing resource transparent API.  Why would I want to do that?  I was interested in the 5% case where we needed to get information about the specific resources that assigned.  For example, when setting up redundant database replication, we want to make sure that they are not assigned to the same physical hosts.

More importantly, I do not want the vendor to clutter their new abstracted API with stuff to handle these odd ball use cases.  Calling them 5% use-cases is deceptive: they are really in the hugely diverse bucket of use-cases outside of the 95% that are handled nicely the abstractions bucket.  Trying to crow bar in these extra long tail use-cases will make the API unwieldy for the intended audience. 

Someone else in the meeting disagreed with the premise of my question and wanted me to explain it.  In answer, I used the tautology “Abstractions are useful, until they are not.”

The clearest example of this concept is the difference between Rails ActiveRecord and Hibernate.  Both are excellent object-relational (OR) abstractions.   The make the most general cases (select * from foo where ID = bar) quick and easy.   But they are radically different at the edge of the abstraction.  ActiveRecord expects that programmers will write directly to the database’s native SQL to handle the 5% exceptions.  Hibernate added the complex and cumbersome HQL on top of their abstraction layer.  HQL is nearly (in some cases, more) complex than the SQL language that it tries to abstract.  For Hibernate, this is really an anti-abstraction that’s no longer useful.

 Over stretching an abstraction encourages the wrong behaviors and leads to overly complex and difficult to maintain APIs.  When you reach the edge of an abstraction, it’s healthy to peek under the covers.  Chances are that you’re doing something wrong or something unique enough that you’ve outgrown the abstraction.

And that’s enough for now because blog posts are useful, until they are not.

WhatTheBus gets its move on: live maps arrive

The “livemap” tag for WhatTheBus completes the work that I seeded in the last commit.

In the last update, we had finished a page that used jQuery AJAX updates to get the latest bus location from the cache.  Using the simulator to provide updates, the AJAX updates shows that we could get location updates as latitude and longitudes.  By watching the dev web server, I also saw that these requests were super light (0ms DB, 0ms view).

In this update, I took the same basic code and added Google maps (v3) interaction.  Using Google and jQuery together maps is refreshingly easy.

The first step was to render the map when the page loads.  This required adding the Google javascript library and pointing the page’s onload event  to initMap.  The initMap function creates a new map object centered on the bus’ location, creates a bus marker at the center, and then replaces the page’s live_map div with the map.

<script type="text/javascript" src="http://maps.google.com/maps/api/js?sensor=true&amp;key=<%= MAP_KEY || "not_set_in_#{RAILS_ENV}_config" %>"> </script>
<script type="text/javascript">
var map;
var busMarker;
var xref = '<%= @bus.xref %>';
var name = '<%= @bus.name %>';
var tstamp = new Date();
function initMap() {
  var centerCoord = new google.maps.LatLng(<%= @pos %>);
  var mapOptions = {
    zoom: 16,
    center: centerCoord,
    mapTypeId: google.maps.MapTypeId.ROADMAP
  };
  map = new google.maps.Map(document.getElementById("live_map"), mapOptions);
  busMarker = new google.maps.Marker({
    position: centerCoord,
    map: map,
    title: name,
    icon: "/images/bus.png"
  });
  window.setInterval('updateMap();', <%= MAP_UPDATE_SECS %>000);
}

The second step was to add the AJAX call on a timer.  After adding the update function timer registration, the updates simply extended the existing AJAX request.  This existing request already had the bus’ position so the work centered on interacting with the map.

To keep things friendly, we turn the last marker into a bullet point and change the bus name to the last time.  Then we create a new marker based on that latest position and center the map on that position too.  To prevent reduce server load for non-reporting buses, we move off the map page if there is no position data.

function updateMap() {
  busMarker.setIcon("/images/track.png");
  busMarker.setTitle(tstamp.getHours() + ":" + tstamp.getMinutes() + ":" + tstamp.getSeconds());
  tstamp = new Date();
  // get the data for the map
  jQuery.getJSON("/bus/index/"+ xref +".json?cache", {}, function(data){
    if (data.length==0) window.location.href("/bus/index/<%= @bus.xref %>");
      var newCoord = new google.maps.LatLng(parseFloat(data.buses[xref].lat), parseFloat(data.buses[xref].lng));
      busMarker = new google.maps.Marker({
        position: newCoord,
        map: map,
        title: name,
        icon: "/images/bus.png"
      });
      map.setCenter(newCoord);
  });
}

After a few tweaks to add navigation between the existing pages, the first use-case of WhatTheBus is in the bag!

The next step is to setup the code online and get a district to send updates!

WhatTheBus Map seeds planted, looks like Cucumbers

Today’s WhatTheBus progress (see tag “MapSeeds”) created the foundation for live maps. These maps will only show a single bus at a time because our initial use case is that a parent wants to monitor their child’s bus.

To accomplish this use case, we need to provide a list of all the buses in the system (grouped by district) and then allow users to select a bus and see the map.

This foundation started with a series of Cucumber tests that verified the navigation structure and the base map page.  These pages used the existing page tests (yawn).

In fact, most of this work is pretty dull Rails web navigation stuff.  The interesting parts were:

  1. DRY the bus location from cache by moving it into the model
  2. Add jRails so that we can use jQuery goodness
  3. Optimization: added index for Bus xref
  4. Optimization: added ?cache flag for the bus json requests to bypass database calls on location only requests

In our next pass, we’ll add support for Google Maps (v3).  Until then, we just have some simple script that pulls the current bus position using our existing bus JSON request.  I am using the simulator (“rake sim:move”) to verify this; unfortunately, Cucumber does have native AJAX support.

Here’s the <script>

<script type="text/javascript"> var xref = '<%= @bus.xref %>'; var name = '<%= @bus.name %>'; function initMap() { window.setInterval('updateMap();', <%= MAP_UPDATE_SECS %>000); updateMap(); } function updateMap() { jQuery("#ll").text("?,?"); // get the data for the map jQuery.getJSON("/bus/index/"+ xref +".json?cache", {}, function(data){ jQuery("#ll").text(data.buses[xref].lat + ',' + data.buses[xref].lng); }); } </script>

WhatTheBus fun with Cucumber and MemCacheD

Sometimes a problem has to kick you upside the head so you can learn an important lesson.  Tonight’s head slapper was an interaction between Cucumber and MemCacheD.

If you are using CUCUMBER AND MEMCACHE read this post carefully so you don’t get burned.  If you’re using MemCache and not writing tests then return to Jail, do not collect $200.

It’s important to note that Cucumber has the handy side effect of running each scenario in a transaction.  The impact is that the data from each scenario does not impact the next scenario.  (note: you can pre-load data into cucumber using fixtures).

However, Cucumber does not do any rollback for Cache keys added into MemCache.  In fact, your MemCache entries will happily persist between your development and test systems.

WhatTheBus has a simple check to reduce database writes – it only writes to the database if there is no cache hit for the bus.  My thinking is that we only need to add a new bus if there is no key as shown in this partial snippet:

  cache = Rails.cache.read params[:id]
  if cache.nil?
     bus = Bus.find_or_create_by_xref :name => params[:name], :xref => params[:id]
  end

This works great for live testing, but fails in technicolor for Cucumber because tests with the same ID will not make it to the find_or_create.

To solve the problem, I had to add a pre-condition (‘given’ in Cucumber speak) to each scenario to make sure the cache was cleared.  It looks like this in the scenario feature:

  Given no cache for "1234"

And that’s translated as code in the steps like so:

  Given /^no cache for "([^\"]*)"$/ do |id|
   Rails.cache.delete id
  end

WhatTheDB? Adding mySQL into WhatTheBus

Today’s WhatTheBus update added data persistence to the application. Ultimately, I am planning to use CouchDB for persistence; however, I wanted to show a SQL to document migration as part of this process. My objective is to allow dual modes for this application.

In the latest updates, I continued to show Test Driven Development (TDD) process using Cucumber. Before starting work, I ran the test suite and found a bug – spectacular failure if MemCacheD is not running. So my first check-in adds recovery and logging around that event. Next I wrote a series of tests for database persistence. These tests included checking a web page that did not exist at this time. I ran the tests – as expected, all failed.

The persistence was very simple: models for bus and district. These minimal models are created dynamically when a bus location is updated. The data contract is that the first location update should include the bus name and distract in the url. After the first update, only ID and location (lat, lng) are expected. In addition to the model and migrations, I also updated the database.yml to use mySQL.

Creating a web page for the bus (bus/index/[xref id]) required the addition of a little infrastructure for the application. Specifically, I had to add an application layout and style sheet. Just because I have a styles sheet, does not mean there is any style (I’ve got style, brother. I’ve got million dollar charm, sister. I’ve got headaches and toothaches and bad times too).

To preserve simplicity, I am not storing the location information in the database. Location is so time sensitive that I don’t want to create any storage burden and I’m using cache expiration to ensure that we don’t keep stale locations around.

Up next…. I’m going to add a simulator (in rake) to make it easier to work on the application.

WhatTheBus, Day1: MemCacheD roundtrip

Today I got the very basic bus data collection working using Cucumber TDD.  That means that I wrote the basic test I wanted to prove BEFORE I wrote the code that operates the test.

The Cucumber feature test looks like this:

Feature: Mobile Access
In order to ensure that location updates are captured
School Bus Location providers
want to have data they send stored on the site

Scenario: Update Location
When bus named “lion” in the “eanes” district with a id of “1234” goes to “32,-97″
When I go to the bus “1234” page
Then json has an object called “buses”
And json has a record “1234” in “buses” with “lat” value “32”
And json has a record “1234” in “buses” with “lng” value “-97″

There’s is some code behind this feature that calls the web page and gets the JSON response back.  The code that actually does the work in the bus controller is even simpler:

The at routine takes location updates just parses the parameters and stuffs it into our cache.  For now, we’ll ignore names and district data.

def at

Rails.cache.write params[:id], “#{params[:lat]},#{params[:lng]},#{params[:name]},#{params[:district]}”, :raw=>:true, :unless_exist => false, :expires_in => 5.minutes
render :nothing => true

end

The code that returns the location (index) pulls the string out of the cache and returns the value as simple JSON.

def index

data = Rails.cache.read(params[:id], :raw => true).split(‘,’)
if data.nil?
render :nothing => true
else
render :json => {:buses => { params[:id].to_sym => { :lat => data[0], :lng => data[1] } } }
end

end

Not much to it!  It’s handy that Rails has memcache support baked right in!  I just had to add a line to the environment.rb file and start my memcached server.

Cloud Reference App, “What The Bus” intro

Today I started working on an application to demonstrate “Cloud Scale” concepts.  I had planned to do this using the PetShop application; unfortunately, the 1995 era PetShop Rails migration would take more repair work then a complete rewrite (HTML tables, no CSS, bad forms, no migrations, poor session architecture).

If I’m considering a fresh start, I’d rather do it with one of my non-PetShop pet projects called “WhatTheBus.”  The concept combines inbound live data feeds and geo mapping with a hyper-scale use target.  The use case is to allow parents to see when their kids’ bus is running late using the phone from the bus stop.

I’m putting the code in git://github.com/ravolt/WhatTheBus.git and tracking my updates on this bog.

My first sprint is to build the shell for this application.  That includes:

  • the shell RAILS application
  • Cucumber for testing
  • MemCacheD
  • Simple test that sets the location of a bus (using a GET, sorry) in the cache and checks that it can retrieve that update.

This sprint does not include a map or any database.  I’ll post more as we build out this app.

Note: http://WhatTheBus.com is a working name for this project because it appeals to m warped sense of humor.  It will likely appear under the sanitary ShowBus moniker: http://showb.us.

Petshop, Updated Day 1

As part of a Cloud computing project, I’ve taken on updating the Rails port of the JPetShop project to Rails 2.0 and have the project on SourceForge.  This port dates back to 2005 so many of the latest conventions (e.g. CSS) were not in vogue.

My ultimate objective is to show scale out techniques on a very simple base app.  Before we can get there, I’ve got some clean-up work to do.  I’d also like to add a test framework (Cucumber?).  I’ll document the progress through this exercise here.

My first check-in provided the base level of function.  Currently, none the forms are working but the catalog is visible.

Today’s update was to fix the login page:

  • Change the view to use the form_tag helper.  This let us put protect_from_forgery into the code base again!
  • Remove the extra login form (not sure why that was there)
  • Clean-up all the references to use symbols (:field) instead of strings (‘field’)
  • Change the controller to handle both the initial request (GET) and form processing (POST)
  • Update the layout and other pages to direct users to the correct login page

I’ve been resisting:

  • removing the tables in favor of definition lists (DL)
  • add CSS
  • changing the session to store an ID instead of the full account object