Tweaking DefCore to subdivide OpenStack platform

The following material will be a major part of the discussion at the OpenStack Board meeting on Monday 10/20. Comments and suggestions are welcome!

[Illustration: OpenStack in Parts]

Author's 2/26/2015 Note: This proposal was approved by the Board in October 2014.

For nearly two years, the OpenStack Board has been moving towards creating a common platform definition that can help drive interoperability. At the last meeting, the Board paused to further review one of the core tenets of the DefCore process (Item #3: Core definition can be applied equally to all usage models).

Outside of my role as DefCore chair, I see the OpenStack community asking itself an existential question: “are we one platform or a suite of projects?”  I’m having trouble believing “we are both” is an acceptable answer.

During the post-meeting review, Mark Collier drafted a Foundation-supported recommendation that creates an additional core tier without changing the fundamental capabilities & designated code concepts. This proposal has been reviewed by the DefCore committee (but not formally approved in a meeting).

The original DefCore proposed capabilities set becomes the “platform” level, while capability subsets are called “components.” We are considering two initial components, Compute & Object, and both are included in the platform (see illustration below). The approach leaves the door open for new core components to exist both under and outside of the platform umbrella.

In the proposal, OpenStack vendors who meet either the component or platform requirements can qualify for the “OpenStack Powered” logo; however, vendors using only a component (instead of the full platform) will have more restrictive marks and limitations on how they can use the term OpenStack.

This approach addresses the “is Swift required?” question. For the platform, Swift capabilities will be required; however, vendors will be able to implement the Compute component without Swift and implement the Object component without Nova/Glance/Cinder.
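To make the relationship concrete, here is a rough sketch of how the platform/component split could be expressed (purely illustrative; the capability names below are placeholders, not actual DefCore identifiers):

// Illustrative only: DefCore capabilities are defined by the DefCore process,
// not in code, and these capability names are made up for this example.
var components = {
  compute: { capabilities: ['compute-server-create', 'image-list'] },
  object:  { capabilities: ['object-create', 'container-list'] }
};

// The platform level is the union of the component requirements, so a
// "platform" vendor must pass both Compute and Object capabilities, while a
// component-only vendor needs to pass just one of the sets above.
var platform = {
  capabilities: components.compute.capabilities.concat(components.object.capabilities)
};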

It’s important to note that there is only one yardstick for components and the platform: the capabilities groups and designated code defined by the DefCore process. From that perspective, OpenStack is one consistent thing. This change allows vendors to choose sub-components if that serves their business objectives.

It’s up to the community to prove the platform value of all those sub-components working together.

Notes from 10/27 OpenStack Austin Meetup (via Stephen Spector)

Stephen Spector (now a Dell Services employee!) gave me permission to repost his excellent notes from the first OpenStack Austin (#OSATX) Meetup Group.

Here are his notes:

[Stephen] wanted to update everyone on the Austin OpenStack Meetup last night at the Austin TechRanch sponsored by Joseph and Rob (that’s me!) of the Dell OpenStack team (I think I got that right?). You can find all the tweets from the event at https://twitter.com/#!/search/%23osatx as we created a new hashtag for tweeting during the event, #osatx.

Here are some highlights from the event:

  • About 60 or so attendees, with a good number from Dell (Barton George, Logan McCloud) and Rackspace; Opscode (Matt Ray), Puppet Labs, SUSE (who talked about their OpenStack commitment: http://t.co/bBnIO7xv), and Ubuntu folks were there as well
  • John Dickinson, who is the Project Technical Lead for Swift (Object Storage), was there and presented information on the current Swift offering. It is interesting to note that Swift releases continuously, while most of OpenStack, like Nova (Compute), releases on the 6-month development cycle
  • Stephen and Jim Plamondon from Rackspace presented information on the overall community and talked about yesterday's announcement from Internap about their Compute public cloud, as well as the MercadoLibre 600-node Compute cloud running their business:

“With 58 million users of MercadoLibre.com and growing rapidly, we need to provide our teams instant access to computing resources without heavy administrative layers. With OpenStack, our internal users can instantly provision what they need without having to wait for a system administrator,” said Alejandro Comisario, Infrastructure Senior Engineer, MercadoLibre, the largest online trading platform in Latin America. “With our success running OpenStack Compute in production, we plan to roll OpenStack Diablo out more broadly across the company, and have appreciated the community support in this venture, especially through the OpenStack Forums, where we are also global moderators.”

  • Discussion of the OpenStack API issue, which is a significant open question at this time – should OpenStack focus on creating an API specification and then let multiple implementations of that API move forward, or build one implementation of the API as the official OpenStack (see my post for more on this).
  • Greg Althaus gave a demo of the Nova Dashboard
  • Future Meetings
  • Three organizations have offered to help host (pizza $ and TechRanch space $) but we always need more!  You can offer to sponsor via the meetup site.
  • There will be future OpenStack Austin Meetups so sign up for the group and you’ll be notified automatically.

Pictures…


Crowbar design: solving the multi-master update issue and adding a pause before configuration

The last few weeks for my team at Dell have been all about testing as Crowbar goes through our QA cycle and enters field trials. These activities are the run-up to Dell open sourcing the bits.

The Crowbar testing cycle drove two significant architectural changes that are interesting as general challenges and important in the details for Crowbar adopters.

Challenge #1: Configuration Sequence.

Crowbar has control of every step of deployment, from discovery and BIOS/RAID configuration to the base image, core services, and applications. That’s a great value prop, but there’s a chicken-and-egg problem: how do you set the RAID for a system when you have not yet decided which applications you are going to install on it?

The urgency of solving this problem became obvious during our first full integration tests. Nova and Swift need very different hardware configurations. In our first Crowbar flows, we configured the hardware before you had selected the purpose of the node. This was a side effect of “rushing” into a Chef-client-ready state.

We also needed a concept of collecting enough nodes to deploy a solution.  Building an OpenStack cloud requires that you have enough capacity to build the components of the system in the correct sequence.

Our solution was to inject a “pause” state just after node discovery. In the current Crowbar state machine, nodes pause after discovery. This allows you to assign them to the roles that you want them to play in your system.
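Conceptually, the flow looks something like the sketch below (the state names are illustrative placeholders, not necessarily Crowbar's actual state machine):

// Illustrative sketch of a discovery flow with a pause for role assignment.
var states = ['discovering', 'discovered', 'paused', 'hardware-installing', 'installing', 'ready'];

function nextState(node) {
  // Nodes stop at 'paused' until an operator assigns them to roles;
  // after that, the PXE state machine advances them automatically.
  if (node.state === 'paused' && !node.rolesAssigned) { return 'paused'; }
  var i = states.indexOf(node.state);
  return (i >= 0 && i < states.length - 1) ? states[i + 1] : node.state;
}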

In testing, we’ve found that the pause state helps manage the system deployment; however, it also adds a new user-action requirement.

Challenge #2: Multi-Master Updates

In Chef, the owner of a node’s data in the centralized database is the node, not the server. This is a logical (but not typical) design pattern and has interesting side effects. Specifically, updates from Chef client runs on the nodes are considered authoritative and will overwrite changes made on the server.

This is correct behavior because Chef’s primary focus is updating the node (edge), not the central system (core). If the authority were reversed, then we would miss critical changes that Chef effected on the nodes. From this perspective, the server is a collection point for data that is owned and maintained at the nodes.

Unfortunately, Crowbar’s original design was to inject configuration into the Chef server’s node objects.  We found that Crowbar’s changes could be silently lost since the server is not the owner of the data.  This is not a locking issue – it is a data ownership issue.  Crowbar was not talking to the master of the data when it made updates!

To correct this problem, we (really Greg Althaus, in a coding blitz) changed Crowbar to store data in a special role mapped to each node. This works because roles are mastered on the server. Crowbar can make reliable updates to the node’s dedicated role without worrying that remote data will override the changes.

This pattern is a better separation of concerns because Crowbar and barclamp configuration is stored in a clearly delineated location (a role named crowbar-[node]) and is not mixed with edge configuration data.
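As a rough sketch of the idea (the attribute layout and values below are hypothetical, not Crowbar's actual schema), the server-mastered role for a node might carry data like this:

// Hypothetical example of a per-node role mastered on the Chef server.
var crowbarRole = {
  name: 'crowbar-node1',
  override_attributes: {
    crowbar: {
      raid:    { profile: 'raid10' },            // hardware intent chosen by barclamps
      network: { admin_ip: '192.168.124.81' }    // made-up value for illustration
    }
  }
};
// Because the Chef client run never rewrites this role the way it rewrites the
// node object, Crowbar's updates here survive simultaneous edge/server activity.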

It turns out that these two design changes are tightly coupled. Simultaneous edge/server writes became very common after we added the pause state. They are infrequent for single-node changes; however, the frequency increases when you are changing a system of interconnected nodes through multiple states.

More simply put: Crowbar is busy changing the node configs at exactly the same time the nodes are busy changing their own configuration.

Whew! I hope that helped clarify some of the interesting considerations behind Crowbar's design.

Note: I want to repeat that Crowbar is not tied to Dell hardware! We have modules that are specifically for our BIOS/RAID, but Crowbar will happily do all the other great deployment work if those barclamps are missing.

Cooking up OpenStack Chef recipes with Opscode

Our OpenStack team here at Dell has been busy getting Crowbar ready to open source and that does not leave much time for blog posts.  We’re putting on a new UI, modularizing with barclamps and creating network options for Nova Cactus.

[Image: Sharing is good]

However, I wanted to take a minute to update the community about the Swift and Nova recipes that we are intentionally leaking out to the community in advance of the larger Crowbar code drop.

As part of our collaboration with Opscode, Matt Ray has been merging our recipes into his most excellent OpenStack cookbook tree. If you want to see our unmerged recipes, we’re also posting those to our github. So far, we have the Swift recipes available (thanks to Andi Abes!), with Nova to follow soon.

5/31 Update: These are now online.

Modularizing Crowbar via Barclamps – Dell prepares to open source our #OpenStack installer

My team at Dell is working diligently to release Crowbar (Apache 2) to the community.

  • We have ramped up our team size (Andi Abes was spotted recently posting on the Swift list).
  • We are collaborating with partners like Rackspace, Opscode and Citrix.
  • We brought in UI expertise (Jon Roberts) to improve usability and polish.
  • We are making sure that the code is integrated with our Dell OpenStack Solution (DOSS).
  • We are lining up customers for real field trials.

The single most critical aspect of Crowbar involves a recent architectural change by Greg Althaus to make Crowbar much more modular. He dubbed the modules “barclamps” because they are used to attach new capabilities to the system. For example, we include barclamps for DNS, discovery, Nova, Swift, Nagios, Ganglia, and BIOS config. Users select which combination to use based on their deployment objectives.

In the Crowbar architecture, nearly every capability of the system is expressed as a barclamp.  This means that the code base can be expanded and updated modularly.  We feel that this pattern is essential to community involvement.

For example, another hardware vendor can add a barclamp that does the BIOS configuration for their specific equipment (yes! that is our intent).  While many barclamps will be included with the open source release to install open source components, we anticipate that other barclamps will be only available with licensed products or in limited distribution.

A barclamp is like a cloud menu planner: it evaluates the whole environment and proposes configurations, roles, and recipes that fit your infrastructure.  If you like the menu, then it tells Chef to start cooking.

Barclamps complement the “PXE state machine” aspect of Crowbar by providing logic that Crowbar evaluates as the servers reach deployment states. These states are completely configurable via the provisioner barclamp; consequently, Crowbar users can choose to change the order of operations. They can also add barclamps and easily incorporate them into their workflow where needed.

Barclamps take the form of a Rails controller that inherits from the barclamp superclass. The superclass provides the basic REST verbs that each barclamp must service, while the child class implements the logic to create a “proposal” leveraging the wealth of information in Chef. Proposals are JSON collections that include the configuration data needed for the deployment recipes and a mapping of nodes into roles.
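A rough sketch of what such a proposal might contain (the field names and values are illustrative, not Crowbar's exact JSON layout):

// Hypothetical Swift proposal: configuration for the recipes plus a node-to-role mapping.
var swiftProposal = {
  id: 'swift-proposal-1',
  version: 2,                                    // proposals are versioned for review and editing
  attributes: {
    swift: { replica_count: 3 }                  // config data consumed by the deployment recipes
  },
  deployment: {
    'swift-proxy':   ['node1'],                  // mapping of nodes into roles
    'swift-storage': ['node2', 'node3', 'node4']
  }
};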

Users are able to review and edit proposals (which are versioned) before asking Crowbar to implement the proposal in Chef. The proposal is implemented by assigning the nodes to the proposed roles and allowing Chef to work its magic.

Users can operate barclamps in parallel. In fact, most of our barclamps are designed to operate in conjunction with one another.

Reminder: It is vital to understand that Crowbar is not a stand-alone utility.  It is coupled to Chef Server for deployment and data storage.  Our objective was to leverage the outstanding capabilities and community support for Chef as much as possible.

We’re excited about this architecture addition to Crowbar and encourage you to think about barclamps that would be helpful to your cloud deployment.

OpenStack Swift Retriever Demo Online (with JavaScript xmlhttprequest image retrieval)

This is a follow-up to my earlier post with the addition of WORKING CODE and an ONLINE DEMO. Before you go all demo happy, you need to have your own credentials to either a local OpenStack Swift (object storage) system or RackSpace CloudFiles.

The demo is written entirely in client-side JavaScript. That is really important because it allows you to test Swift WITHOUT A WEB SERVER. All the other Swift/Rackspace libraries (there are several) are intended for your server application to connect and then pass the file back to the client. In addition, the API uses metadata headers that are not settable from a normal browser request, so you can’t just browse into your Swift repos.

Here’s what the demo does:

  1. Log in to your CloudFiles site – this returns the URL & token for further requests.
  2. Get a list of your containers
  3. See the files in each container (click on the container)
  4. Retrieve the file (click on the file) to see a preview if it is an image file

The purpose of this demo is to be functional, not aesthetic. Little hacks like pumping the config JSON data to the bottom of the page are helpful for debugging and make the action more obvious. Comments and suggestions are welcome.

The demo code is 4 files:

  1. demo.html has all the component UI and the JavaScript to update the UI
  2. demo.js has the Swift interfacing code (I’ll show a snippet below) to interact with Swift in a generic way
  3. demo.css is my lame attempt to make the page readable
  4. jQuery.js is some first-class code that I’m using to make my code shorter and more functional.

1/17 update: In testing, we are working out differences between Swift and RackSpace. Please expect updates.

HACK NOTE: This code does something unusual and interesting. It uses the JavaScript XMLHttpRequest object to retrieve and render a BINARY IMAGE file. Doing this required pulling together information from several sources; I have not seen anyone pull the whole process together on a single page! The keys to making this work are overrideMimeType (line G), masking each character code down to a single byte (the & 0xFF in the encode routine), using Base64 encoding (line 8 and the encode routine), and then “src='data:image/jpg;base64,[DATA GOES HERE]'” in the tag (see the demo.html file).
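For reference, here is a minimal sketch of what such an encode routine could look like (my illustration, not necessarily the exact Swift.encode in demo.js):

// Sketch: convert the responseText of a request made with
// overrideMimeType('text/plain; charset=x-user-defined') into a base64 string.
function encodeBinary(text) {
  var bytes = '';
  for (var i = 0; i < text.length; i++) {
    // x-user-defined maps each byte into a private Unicode range; & 0xFF recovers the raw byte
    bytes += String.fromCharCode(text.charCodeAt(i) & 0xFF);
  }
  return window.btoa(bytes);                     // base64-encode for use in a data: URI
}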

Here’s a snippet of the core JavaScript code (full code) to interact with Swift. Fundamentally, the API is very simple: inject the token into the metadata (lines E-F), request the /container/file that you want (line D), and wait for the results (lines H & 2). I made it a little more complex because the same function handles EITHER binary or JSON returns. Enjoy!

retrieve : function(config, path, status, binary, results) {
1   xmlhttp = new XMLHttpRequest();
2   xmlhttp.onreadystatechange=function()  //callback
3      {
4         if (xmlhttp.readyState==4 && xmlhttp.status==200) {
5            var out = xmlhttp.responseText;
6            var type = xmlhttp.getResponseHeader("content-type");
7            if (binary)
8               results(Swift.encode(out), type);
9            else
A               results(JSON.parse(out));
B         }
C      }
D   xmlhttp.open('GET',config.site+'/'+path+'?format=json', true)
E   xmlhttp.setRequestHeader('Host', config.host);
F   xmlhttp.setRequestHeader('X-Auth-Token', config.token);
G   if (binary) xmlhttp.overrideMimeType('text/plain; charset=x-user-defined');
H   xmlhttp.send();
}
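To tie this back to the demo steps above, calls for a container listing versus an image retrieval might look like the following (a hedged sketch; the exact object name and callback behavior in demo.js may differ):

// List the files in a container as JSON (binary = false).
Swift.retrieve(config, 'photos', '#status', false, function(list) {
  console.log(list);                             // array of file descriptions returned by Swift
});

// Fetch an image as binary (binary = true) and preview it via a data: URI.
Swift.retrieve(config, 'photos/sample.jpg', '#status', true, function(data, type) {
  document.getElementById('preview').src = 'data:' + type + ';base64,' + data;
});

Here config holds the site URL, host, and token returned by the login step; 'photos', 'photos/sample.jpg', and the 'preview' image element are made-up names for the example.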

OpenStack Swift Demo (in a browser)

I’m working on a mini-demo project for OpenStack Swift. To keep things very simple and easy to understand, I decided that the whole demo would work in JavaScript in the browser. I also chose to use RackSpace’s CloudFiles as a Swift target for testing since they have the same API and are universally available (unlike my lab systems).

One advantage of this approach is that FireBug makes it very nice to debug and check the activity of the code. Unfortunately, FireBug also seems to eat the headers that I need. *Let me phrase that in a Google-friendly way so that someone else will not lose the 2 hours I just lost:*

“XmlHttpRequest setRequestHeader FireFox Not Respected when using FireBug”

It works great in Safari, so onward and upward. So far, I’ve got step #1 ready – getting the authorization token back from the cloud site.

Here’s the HTML page (you need jQuery too).  Basically, it uses the username and key from the inputs to set “x-auth-user” and “x-auth-key” header attributes.  These attributes will allow Swift to return a token that you can use on future requests when you want to do useful work.

<!DOCTYPE html>
<html>
<head>
<title>Dell Swift Demo [0.0]</title>
<script src="jquery.js" type="text/javascript"></script>
<script type="text/javascript" charset="utf-8">
var xmlhttp = null;
function swiftLogin() {
  var usr = $('input:text[name=usr]').val();
  var key = $('input:text[name=key]').val();
  // code for IE7+, Firefox, Chrome, Opera, Safari (UR SOL IE<7)
  xmlhttp = new XMLHttpRequest();
  xmlhttp.onreadystatechange=function() //callback
  {
    if (xmlhttp.readyState==2)
    {
      $('#status').replaceWith(xmlhttp.getResponseHeader("X-Auth-Token"));
    }
  }
  xmlhttp.open('GET','https://auth.api.rackspacecloud.com/v1.0', true);
  xmlhttp.setRequestHeader('Host', 'auth.api.rackspacecloud.com');
  xmlhttp.setRequestHeader('X-Auth-User', usr);
  xmlhttp.setRequestHeader('X-Auth-Key', key);
  xmlhttp.send();
}
</script>
</head>
<body>
<div id="credentials">
<fieldset id="credentials" class="">
<legend>Swift Login</legend>
<label for="user">User: </label><input type="text" name="usr" value="user" id="user">
<label for="key">Key: </label><input type="text" name="key" value="key" id="key">
<input type="button" name="Login" value="login" id="Login" onclick="swiftLogin();">
</fieldset>
</div>
<div id="status">[pending]</div>
<div id="footer">Time?</div>
<script type="text/javascript">
$('#footer').replaceWith((new Date).toString());
swiftLogin();
</script>
</body>
</html>