The Go-Fasterer OpenStack Cloud Strategy

Dell’s OpenStack strategy (besides being interesting by itself) brings together Agile and Lean approaches and serves as a good illustration of the difference between the two approaches.

Before I can start the illustration, I need to explain the strategy clearly enough that the discussion makes sense.  Of course, my group is selling these systems, so the strategy starts as a sales pitch.  Bear with me; this is a long post and I promise we’ll get to the process parts as fast as possible.

Dell’s OpenStack strategy is to enter the market with the smallest practical working cloud infrastructure.  We have focused maniacally on eliminating all barriers and delays for customers’ evaluation processes.  Our targets are early adopters who want to invest in a real, hands-on OpenStack evaluation and understand they will have to work to figure out OpenStack.  White gloves, silver spoons and expensive licensed applications are not included in this offering.

We are delivering a cloud foundation kit: a 7U hardware setup (6 nodes + switch), white paper, installer, and a dollop of consulting services.  It is a very small footprint system with very little integration.  The most notable deliverable is our target of going from boxes to working cloud in less than 4 hours (I was calling this “nuts to soup before lunch” but marketing didn’t bite).

Enough background?  Let’s talk about business process!

From this point on, our product offering is just an example.  You should imagine your product or service in these descriptions.  You should think about the internal reconfiguration needed to bring your product or service to market in the way I am describing.

There are two critical elements in the go-fasterer strategy:

  1. a very limited “lean” product and
  2. a very fast “agile” installation process.

The offering challenges the de facto definition of solutions as being complete packages bursting with features, prescriptive processes, licensed companion products and armies of consultants.  While Dell will eventually have a solution that meets (or exceeds) these criteria, our team did not think we should wait until we had all those components before we began engaging customers.

Our first offering is not for everyone by design.  It is highly targeted to early adopters who have specific needs (desire to move quickly) that outweigh all other feature requirements.  They are willing to invest in a less complete product because the core alone solves an important problem.

The concept of stripping back your product to the very core is the essence of Lean process.  Along this line of thinking, maintaining ship readiness is the primary mantra – if you can’t sell your product then your entire company’s existence is at risk.  I like the way the Poppendiecks describe it: you should consider product features as perishable inventory.  If you were selling fruit salad and had bananas and apples but no cherries, it makes sense to sell an apple/banana medley while you work on the cherries.

Whittling back a product to the truly smallest possible feature set is very threatening and difficult.  It forces teams to take risks and guesses that leave you with a product that many customers will reject.  Let me repeat that: your objective is to create a product that many customers will reject.  You must do this because it:

  1. gets you into the market much faster for some customers (earning $ is wonderfully clarifying)
  2. teaches you immediately what’s missing (fewer future guesses)
  3. teaches you immediately what’s important to customers (less risk)
  4. builds credibility that you are delivering something (you’re building relationships)

Ironically, while lean approaches exist to reduce risk and guesswork, they will feel very risky, like gambling, to organizations used to traditional processes.  This is not surprising: our objective is to go faster, so initially we will be uncomfortable about whether we have enough information to make decisions.

The best cure for lack of information is not more analysis!  The cure is interacting with customers.

Lean says that you need product if you want to interact meaningfully with customers.  This is because customers (even those who are not buying right away) will take you more seriously if you’ve got a product.  Talking about products that you are going to release is like talking about the person you wanted to take to prom but never asked.

To achieve product early, you need to find the true minimum product set.  This is not the smallest comfortable set.  It is the set that is so small, so uncomfortable, so stripped down that it seems to barely do anything at all.

In our case, we considered it sufficient if the current OpenStack release could be reliably and quickly installed on Dell hardware.  We believe there are early-adopter customers who want to evaluate OpenStack right away and whose primary concern is starting their pilot and working towards eventual deployment.

Mixing Agile into Lean is needed to make the “skinny down” discipline practical and repeatable.

Agile brings in a few critical disciplines to enable Lean:

  1. Prioritized roadmaps keep teams focused on what’s needed first without losing sight of longer term plans.
  2. Predictable pace of delivery allows committed interactions with customers that give timelines for fixing issues or adding capabilities.
  3. Working out of order keeps the great from being the enemy of the good so that we don’t delay field testing while we solve imagined problems.
  4. Focus on quality / automation / repeatability reduces paying for technical debt internally and time firefighting careless defects when a product is “in the wild” with customers.
  5. Insistence on installable “ship ready” product ensures that product gets into the field whenever the right customer is found.  Note: this does not mean any customer.  Selling to the wrong customer can be deadly too, but that’s a different topic.
  6. Feedback driven iterations ensures that Lean engagements with customers are interactive and inform development.

These disciplines are important for any organization but vital when you go Lean.  To take your product early and aggressively to market, you must have confidence that you can continue to deliver after your customers get a taste of the product.

You cannot succeed with Lean if you cannot quickly evolve your initial offering.

The enabling compromise with Lean is that you will keep the train running with incremental improvements:  Lean fails if you engage customers early and then disappear back into a long delivery cycle.  That means committing to an Agile product delivery cycle if you want Lean (note: the reverse is not true).

I think of Lean and Agile as two sides of the same results driven coin: Lean faces towards the customer and market while Agile faces internally to engineering.

Please let me know how your team is trying to accelerate product delivery.

Note: of course, you’re also welcome to contact me if you’re interested in being an early adopter for our OpenStack foundation kit.

Why cloud compute will be free

Today at Dell, I was presenting to our storage teams about cloud storage (aka the “storage banana”) and Dave “Data Gravity” McCrory reminded me that I had not yet posted my epiphany explaining “why cloud compute will be free.”  This realization derives from other topics that he and I have blogged but not stated so simply.

Overlooking the fact that compute is already free at Google and Amazon, you must understand that it’s a cloud eat cloud world out there where losing a customer places your cloud in jeopardy.  Speaking of Jeopardy…

Answer: Something sought by cloud hosts to make profits (and further the agenda of our AI overlords).

Question: What is lock-in?

Hopefully, it’s already obvious to you that clouds are all about data.  Cloud data takes three primary forms:

  1. Data in transformation (compute)
  2. Data in motion (network)
  3. Data at rest (storage)

These three forms combine to create cloud architecture applications (service oriented, externalized state).

The challenge is to find a compelling charge model that both:

  1. Makes it hard to leave your cloud AND
  2. Encourages customers to use your resources effectively (see #1 in Azure Top 20 post)

While compute demands are relatively elastic, storage demand is very consistent, predictable and constantly grows.  Data is easily measured and difficult to move.  In this way, data represents the perfect anchor for cloud customers (model rule #1).  A host with a growing data consumption footprint will have a long-term predictable revenue base.

However, storage consumption alone does not encourage model rule #2.  Since storage is the foundation for the cloud, hosts can fairly judge resource use by measuring data egress, ingress and sidegress (attrib @mccrory 2/20/11).  This means tracking not only data in and out of the cloud, but also data transacted between the provider’s own cloud services.  For example, Azure charges for both data at rest ($0.15/GB/mo) and data in motion ($0.01/10K).
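To see how that charge model adds up for a tenant, here is a back-of-envelope sketch.  The function name and the traffic figures are mine, the rates come from the numbers quoted above, and real provider pricing changes constantly:

```javascript
// Illustrative only: $0.15/GB/mo for data at rest and $0.01 per 10K
// storage transactions, per the Azure example above.
function monthlyStorageBill(gbAtRest, transactions) {
  var restCharge = gbAtRest * 0.15;                 // data at rest
  var motionCharge = (transactions / 10000) * 0.01; // transacted data
  return restCharge + motionCharge;
}

// A tenant parking 500 GB and making 2M requests pays about $77/mo,
// and the $75 "at rest" share is the sticky, predictable part.
var bill = monthlyStorageBill(500, 2000000);
```

Note how the at-rest share dwarfs the transaction share: exactly why data at rest makes such a good anchor.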

Consequently, the financially healthiest providers are the ones with most customer data.

If hosting success is all about building a larger, persistent storage footprint then service providers will give away services that drive data at rest and/or in motion.  Giving away compute means eliminating the barrier for customers to set up web sites, develop applications, and build their business.  As these accounts grow, they will deposit data in the cloud’s data bank and ultimately deposit dollars in their piggy bank.

However, there is a no-free-lunch caveat: free compute will not have a meaningful service level agreement (SLA).  The host will continue to charge customers who need their applications to operate consistently.  I expect that we’ll see free compute (or “spare compute” from the cloud provider’s perspective) highly used for early life-cycle (development, test, proof-of-concept) and background analytic applications.

The market is starting to wake up to the idea that cloud is not about IaaS – it’s about who has the data and the networks.

Oh, dem golden spindles!  Oh, dem golden spindles!

32nd rule to measure complexity + 6 hyperscale network design rules

If you’ve studied computer science then you know there are algorithms that calculate “complexity.” Unfortunately, these have little practical use for data center operators.  My complexity rule does not require a PhD:

The 32nd rule: If it takes more than 30 seconds to pick out what would be impacted by a device failure then your design is too complex.

6 Hyperscale Network Design Rules

  1. Cost Matters
  2. Keep Networks Flat
  3. Filter at the Edge
  4. Design Fault Zones
  5. Plan for Local Traffic
  6. Offer Load Balancers (to your users)

Sorry for the teaser… I’ll be able to release more substance behind this list soon.   Until then comments are (as always) welcome!


Please (I’m begging here) consider the psychology of meetings!

It’s an occupational hazard that I go to a lot of meetings.  That’s not a bad thing because my job is a team sport.  Except for rare moments of programming bliss, my primary responsibility is to design, collaborate, and coordinate with other people.

The Problem

My problem with meetings is that we often forget about psychology when we hold them.  I’m not going to go all B. F. Skinner or de Bono’s Hats on you, but I want to reinforce that we all have DIFFERENT MODES OF THINKING during meetings.  Some examples:

  • Listening to a presentation *
  • Giving/getting status *
  • Designing a product or presentation
  • Negotiating requirements
  • Making collaborative decisions
  • Celebrating success *
  • Giving subjective feedback

Let’s get concrete.  It is impossible to get people to do design during a status meeting.  Even if you have the same people in the room, the way they behave during a status meeting (taking turns talking, rushing to finish, linear) is fundamentally different from how I would expect them to act during a design session (dynamic back and forth, pausing to reflect, circular).

It’s even worse when you factor in time-bound meetings (I added * to them above) for open-ended activities like design, feedback, and collaboration.  If you want to kill ideas or suggestions then ask for them during the closing 2 minutes of a meeting.

A big part of the solution is to remember how people are framing your meeting. 

  • If you want decisions but need to get people up to speed then plan a clear end to the status part of the meeting
  • If you want feedback and discussion then don’t throw in a lot of status or one-way presentations.
  • Remember: any meeting with a PowerPoint deck is a PRESENTATION.  It is not a discussion.

Agile Meetings

One of the things I like about Agile is that there is a lot of psychology built into it!
 
The meetings in Agile are planned so that people are in the right frame of mind for the meeting. 
  • Stand-up (scrum) is a time bound status meeting.  It should be tight and focused.
  • Review is a FEEDBACK meeting and NOT a presentation meeting.  I like to have different people talking during the meeting so that it does not put people into a PowerPoint stupor.
  • Retros are discussion meetings.  They must be open-ended so people are not rushed.
  • Planning is a design meeting where new ideas are formed.  People must be relaxed so they can interact and collaborate.
It’s even more important to understand that the way iterations unwind is part of the magic:
  • Reviews reinforce that team members have completed something.  It puts them in the frame of mind that they have closed something and are ready to start on new work.
  • Reviews give feedback to set priorities so that people feel like they know what’s important.  Their expectation is that they are working on what’s most important.
  • Retros build up team spirit.  They celebrate achievement and clear the air so the team can collaborate better. 
  • Retros establish team trust.  Trust is the single most essential ingredient for good design because people must take risks when they make design suggestions.
  • Planning brings together the feeling of accomplishment and purpose from reviews with the trust and collaboration from retros.
  • Planning should empower a team to be in control of their work and be confident that they are on the right track.
  • Stand-up, when planning works, becomes a quick reinforcement of priorities.
  • Stand-up should be status confirming that the plan is on track, or lead to a discussion (a new focus) for correction.

If meetings are getting you down then consider what type of meeting you are trying to create and focus your effort on that outcome.

Oh, and my #1 tip for more effective meetings

Find ways to be collaborative by sharing screens, co-editing documents, or just rotating speakers.  It’s not a productive meeting when just one person has the conch.

Bootstrapping Hyperscale OpenStack Clouds – slides from 2/3 OpenStack SJC Meetup

The OpenStack meetup lightning talk is only 5 minutes, so the deck is mostly pictures that support points around a more detailed followup.

Here’s the deck: bootstrapping clouds preso

 and my Hyperscale white paper (links through Dell.com)

The theme of the talk is that hyperscale systems require a fundamentally different management paradigm because at hyperscale, hardware faults are common, manual steps are impractical, and small costs add up quickly.

Included in the preso are concepts I introduced at Flatness at the Edge.

2/10 Update: Now you can watch it!  Thanks to @opnstk_com_mgr Stephen Spector for the lightning talks video of Rob Hirschfeld, Dell at the Santa Clara, CA Meetup, Feb 3, 2011: http://ow.ly/3U8OA

“Flatness at the Edges” guides hyperscale cloud design

As I’m working on a larger “cloud bootstrapping” white paper (look for a pending Dell release), I stumbled on an apparent unifying principle for hyperscale cloud design.  I’m interested in feedback about this concept to see if it fairly encapsulates a common target for cloud hardware, networking and software design.

“Flatness at the Edges” is one of the guiding principles of hyperscale cloud designs.  

Flatness means that cloud infrastructure avoids creating tiers where possible.  For example, having a blade in a frame aggregating networking that is connected to a SAN via a VLAN is a tiered design in which the components are vertically coupled.  A single node with local disk connected directly to the switch has all the same components but in a single “flat” layer.  

Edges are the bottom tier (or “leaves” to us CS geeks) of the cloud.  Being flat creates a lot of edges because most of the components are self contained.  To scale and reduce complexity, clouds must rely on the edges to make independent decisions such as how to route network traffic, where to replicate data, or when to throttle VMs.  The anti-example of edge design is using VLANs to segment tenants because VLANs (a limited resource) require configuration at the switching tier to manage traffic generated by an edge component.  We are effectively distributing an intelligence overhead tax on each component of the cloud rather than relying on a “centralized overcloud” to rule them all. 

Combining flatness and edges evolves the sympathetic concepts into a full-fledged cloud design principle.

Interested in discussing this face to face?  I’ll be presenting this and other cloud setup concepts at the SJC OpenStack meetup on 2/3.

Forward-looking Reviews: Feedback loops essential for Agile success

To keep pace with cloud innovations, my team at Dell drives aggressively forward.  Agile is essential to our success because it provides critical organization, control and feedback for our projects.  One recurring challenge I’ve had with the Agile decorations (aka meetings) is confusion between the name of the meeting and the process objectives.

The Agile process is very simple:  get feedback -> decide -> act -> repeat

People miss the intent of our process because of their predisposition about what’s supposed to happen in a meeting based on its name.

Some examples of names I avoid:

  • Demo – implies a one-way communication instead of a feedback loop
  • Post-mortem – implies it’s too late to fix problems
  • Retrospective – implies we are talking about the past instead of looking forward
  • Schedule – assumes that we can make promises about the future (not bad, but limits flexibility)
  • Person-Weeks – focuses on time frame, not on the use cases we want to accomplish

Names that work well with Agile

  • Planning – we’re working together to figure out what we’re going to do.
  • Review – talking over work that’s been done with input expected.
  • Roadmap – implies a journey in which we have to achieve certain landmarks before we reach our destination.
  • Story Points – avoids time references in favor of relative weights and something that can be traded.
  • Velocity – conveys working quickly and making progress.  Works well with roadmaps.

We have to recognize the powerful influence of semantics on people participating in any process.  If people arrive with the wrong mindset, we face significant danger (IMHO, soul-numbing meetings are murder) of completely missing critical opportunities to get feedback and drive decisions.  We rarely review WHY we are meeting, so it’s easy for people to disengage or make poor assumptions based on nothing more than our word choice.

The most powerful mitigation to semantic confusion is to constantly seek feedback.  Ask for feedback specifically.  Ask for feedback using the word feedback.

Does this make sense?  I’d like your feedback.

Scrambled & Confused? Use Shorter Sprints!

As part of my Dell Agile coaching work, I was comparing notes with another coach today about a team that was going into “scramble mode.”  In addition to being behind, they had a new manager without Agile experience.

My suggestion: Use Weekly Sprints!

You should consider short sprints in the following cases:

  1. New Team / Learning Agile (more time working together)
  2. Uncertain Requirements (more feedback from marketing)
  3. Behind Schedule (more delivery points and visibility)
  4. Prototypes / New Technology (more feedback from engineering)
  5. QA / Hardening Phase (more integration points)

Shorter sprints (this team is on 3-week sprints) have a proven record of boosting productivity, increasing teamwork and getting more accurate results.  As an additional benefit, the new manager gets to experience 3x more planning meetings and become more familiar with how the process works.

The reason is simple: more planning means more feedback.

For many, shorter sprints seem contradictory to being more productive, but the boost aligns with my personal experience.  A team that plans infrequently generally plans inefficiently: long meetings, vague commitments, fuzzy tasks and poor estimates.  In many cases, the team simply does not remember what was planned by the last week!

Shorter sprints, while a lot of work to manage well, keep each meeting much shorter.  Reviews cover less ground, retros are more concise, and planning has fewer items to consider.  Weekly planning for a practiced team should take less than 4 hours.

Next time your team stumbles with Agile, consider shorter sprints as a way back to your normal pace.

Adding #11 to RightScale CEO’s Top 10 Cloud Myths

Generally, I think of a “Top 10 Cloud Myths” post as pure self-serving marketing fluffery, so I was pleasantly surprised to see Michael Crandell (RightScale’s CEO) producing a list with some substance.  Don’t get me wrong, the list is still a RightScale value prop 101.  It’s just that they have the good fortune to be addressing real problems and creating real value.

So, here’s my Myth #11: “We have to re-write our applications to run in the cloud.”  While that’s largely a myth, it may be a good myth to keep around because many applications SHOULD be rewritten – not for the cloud, but for changing usage patterns (more mobile users, more remote users, more SOA clients, etc.).

OpenStack Swift Retriever Demo Online (with JavaScript xmlhttprequest image retrieval)

This is a follow-up to my earlier post with the addition of WORKING CODE and an ONLINE DEMO. Before you go all demo happy, you need to have your own credentials to either a local OpenStack Swift (object storage) system or RackSpace CloudFiles.

The demo is written entirely using client side JavaScript. That is really important because it allows you to test Swift WITHOUT A WEB SERVER. All the other Swift/Rackspace libraries (there are several) are intended for your server application to connect and then pass the file back to the client. In addition, the API requires request headers that you cannot set by simply typing a URL, so you can’t just browse into your Swift repos.

Here’s what the demo does:

  1. Login to your CloudFiles site – returns the URL & Token for further requests.
  2. Get a list of your containers
  3. See the files in each container (click on the container)
  4. Retrieve the file (click on the file) to see a preview if it is an image file
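For flavor, step 1 above might look something like the sketch below.  This is a hedged reconstruction, not a copy of demo.js: the function name is mine, and the classic Swift/CloudFiles v1.0-style auth headers (X-Auth-User / X-Auth-Key) and header-only response are assumptions about the flow.

```javascript
// Hypothetical login sketch: POSTs nothing, just sends credentials as
// headers and reads the storage URL and token back out of the response
// headers (the v1.0 auth convention returns them that way).
function login(authUrl, user, key, done) {
  var xhr = new XMLHttpRequest();
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status >= 200 && xhr.status < 300) {
      // success typically has no body; the interesting data is headers
      done({
        site: xhr.getResponseHeader("X-Storage-Url"),
        token: xhr.getResponseHeader("X-Auth-Token")
      });
    }
  };
  xhr.open("GET", authUrl, true);
  xhr.setRequestHeader("X-Auth-User", user);
  xhr.setRequestHeader("X-Auth-Key", key);
  xhr.send();
}
```

The object handed to `done` matches the config shape (site, token) that the retrieve snippet further down consumes.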

The purpose of this demo is to be functional, not esthetic. Little hacks like pumping the config JSON data to the bottom of the page are helpful for debugging and make the action more obvious. Comments and suggestions are welcome.

The demo code is 4 files:

  1. demo.html has all the component UI and javascript to update the UI
  2. demo.js has the Swift interfacing code (I’ll show a snippet below) to interact with Swift in a generic way
  3. demo.css is my lame attempt to make the page readable
  4. jQuery.js is some first class code that I’m using to make my code shorter and more functional.

1/17 update: in testing, we are working out differences between Swift and RackSpace. Please expect updates.

HACK NOTE: This code does something unusual and interesting. It uses the JavaScript XmlHttpRequest object to retrieve and render a BINARY IMAGE file. Doing this required pulling together information from several sources. I have not seen anyone pull together a document for the whole process onto a single page! The key to making this work is overrideMimeType (line G), truncating the 16-bit character codes to 8-bit bytes (the & 0xFF in the encode routine), using Base64 encoding (line 8 and the encode routine), and then “src=’data:image/jpg;base64,[DATA GOES HERE]'” in the tag (see demo.html file).
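The encode routine itself is not in the snippet below, so here is a sketch of what a byte-masking Base64 encoder looks like.  The names are mine and demo.js may differ; the point is the & 0xFF mask applied before the standard Base64 bit-shuffling.

```javascript
// Base64 encoder that masks each char code down to its low byte first,
// matching the truncation trick described in the hack note above.
function encode(input) {
  var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  var out = "";
  for (var i = 0; i < input.length; i += 3) {
    var b1 = input.charCodeAt(i) & 0xFF;   // keep only the low 8 bits
    var has2 = i + 1 < input.length;
    var has3 = i + 2 < input.length;
    var b2 = has2 ? input.charCodeAt(i + 1) & 0xFF : 0;
    var b3 = has3 ? input.charCodeAt(i + 2) & 0xFF : 0;
    // pack three bytes into four 6-bit Base64 digits, padding with =
    out += chars.charAt(b1 >> 2);
    out += chars.charAt(((b1 & 0x03) << 4) | (b2 >> 4));
    out += has2 ? chars.charAt(((b2 & 0x0F) << 2) | (b3 >> 6)) : "=";
    out += has3 ? chars.charAt(b3 & 0x3F) : "=";
  }
  return out;
}
```

The masking matters because with charset=x-user-defined the browser hands back characters whose codes can sit above 255 even though only the low byte is real image data.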

Here’s a snippet of the core JavaScript code (full code) to interact with Swift. Fundamentally, the API is very simple: inject the token into the meta data (line E-F), request the /container/file that you want (line D), wait for the results (line H & 2). I made it a little more complex because the same function does EITHER binary or JSON returns. Enjoy!

retrieve : function(config, path, status, binary, results) {
1   var xmlhttp = new XMLHttpRequest();
2   xmlhttp.onreadystatechange = function()  //callback
3      {
4         if (xmlhttp.readyState==4 && xmlhttp.status==200) {
5            var out = xmlhttp.responseText;
6            var type = xmlhttp.getResponseHeader("content-type");
7            if (binary)
8               results(Swift.encode(out), type);
9            else
A               results(JSON.parse(out));
B         }
C      };
D   xmlhttp.open('GET', config.site+'/'+path+'?format=json', true);
E   xmlhttp.setRequestHeader('Host', config.host);
F   xmlhttp.setRequestHeader('X-Auth-Token', config.token);
G   if (binary) xmlhttp.overrideMimeType('text/plain; charset=x-user-defined');
H   xmlhttp.send();
}