Are Clouds using Dark Cycles?

Or “Darth Vader vs Godzilla”

Way way back in January, I’d heard loud and clear that companies were not expecting to mix cloud computing loads.  I was treated like a three-eyed Japanese tree slug for suggesting that we could mix HPC and Analytics loads with business applications in the same clouds.  The consensus was that companies would stand up independent clouds for each workload.  The analysis work was too important to interrupt and the business applications too critical to risk.

It has always rankled me that all those unused compute cycles (“the dark cycles”) go to waste when they could be put to good use.  It appeals to my eco-geek side to make the best possible use of all those idle servers.  Dave McCrory and I even wrote some cloud patents around this.

However, I succumbed to the scorn and accepted the separation.

Now, all of a sudden, this idea seems to be playing Godzilla to a Tokyo-shaped cloud data center.  I see several forces converging to resurrect the idea of mixing workloads.

  1. Hadoop (and other map-reduce Analytics) are becoming required business tools
  2. Public clouds are making it possible to quickly (if not cheaply) set up analytic clouds
  3. Governance of virtualization is getting better
  4. Companies want to save some $$$

This trend will only continue as Moore’s Law improves the compute density of hardware.  Since our architectures are trending toward scale-out designs that distribute applications over multiple nodes, it is not practical to expect a single application to consume all the power of a single computer.

That leaves a lot of lonely dark cycles looking for work.


Network World on Ubuntu Cloud

My team at Dell is working on solutions around this cloud strategy.  I like the approach that Canonical & Eucalyptus are taking concerning the use of open source (KVM), ad hoc API standards (Amazon), and flexible storage configurations (DAS or SAN).

Looking at usage trends, stateless server designs (as we get closer to PaaS) will allow us to rethink how we architect hypervisor-based clouds.  Of course, this requires us to rethink application architectures and the OS choices that we make to run them.

Thanks to BartonGeorge.net for the link that got this thought started.  Network World says…

“Ubuntu Enterprise Cloud provides tight integration between Ubuntu and Eucalyptus and a series of CLI tools (made even more simple by apps like HybridFox, which gives them a GUI) that follows along Amazon’s construction. Work done for Ubuntu Enterprise Cloud ends up being somewhat reusable if you’re transporting your work to Amazon.”
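
That “somewhat reusable” point is easy to see in code.  Here is a minimal sketch using the boto library (my example, not Network World’s); the endpoint, credentials, and image id are placeholders:

    # A minimal sketch, assuming the boto Python library; the endpoint, keys,
    # and image id below are placeholders, not values from the article.
    import boto
    from boto.ec2.regioninfo import RegionInfo

    euca = boto.connect_ec2(
        aws_access_key_id="YOUR_EUCA_KEY",            # placeholder credential
        aws_secret_access_key="YOUR_EUCA_SECRET",     # placeholder credential
        is_secure=False,
        region=RegionInfo(name="eucalyptus", endpoint="euca.example.com"),
        port=8773,
        path="/services/Eucalyptus",
    )

    # The call shape is the same one you would use against Amazon EC2.
    reservation = euca.run_instances(image_id="emi-12345678", instance_type="m1.small")
    print(reservation.instances[0].id)

Swap the region and credentials for Amazon’s and the rest of the script stays the same, which is exactly the portability the quote is describing.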

Very Costly Accounting and why I value ship ready code

Or be careful what you measure because you’ll get it.

One of Agile’s most effective, least understood and seldom realized disciplines is that teams should “maintain ship-readiness” of their product at all times.  Explaining the real value of this discipline in simple terms eluded me for years until I was talking to an accountant on a ski lift.

Before I talk about snow and mountain air inspired insights, let me explain what ship-ready means.  It means that you should always be ready to deliver the output from your last iteration to your most valuable customer.  For example, when my company, BEware, finished our last iteration, we could have delivered it to our top customer, Fin & Key, without losing our much-needed beauty sleep.  Because we maintain ship-readiness, we had already ensured that they could upgrade, with complete documentation and high quality, without us spending extra cycles on release hardening.

The version that we’d ship to Fin & Key would probably not have any new features enabled (see tippy towers & Eric Ries on split testing), but it would have fixed defects and the incremental code for those new features baked in.  While we may decide to limit shipments to fixed times for marketing reasons, that must not keep us from always being ready to ship.  A simple feature flag, sketched below, is one way to keep that incremental code in the build without turning it on.
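
This is a minimal sketch of that idea, not BEware’s actual code; the flag and function names are illustrative only:

    # Feature-flag sketch: new code ships in every build but stays disabled,
    # so each iteration remains ship-ready.  Names here are made up.
    FEATURE_FLAGS = {
        "new_reporting_ui": False,   # incremental code is baked in, just not enabled
    }

    def is_enabled(flag):
        return FEATURE_FLAGS.get(flag, False)

    def render_classic_dashboard():
        return "classic dashboard"

    def render_new_reporting_ui():
        return "new reporting UI (work in progress)"

    def render_dashboard():
        # What Fin & Key would get today vs. what a split test could enable later.
        if is_enabled("new_reporting_ui"):
            return render_new_reporting_ui()
        return render_classic_dashboard()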

Meanwhile, back at 8,200 feet, my accountant friend was enrolled in Cost Accounting 101.  To fulfill my mission as an Agile Subversive, I suggested reading “The Goal” by Eli Goldratt, which comes out very strongly against the evils of cost accounting.  Goldratt’s logic is simple – if you want people to sub-optimize and ignore overall system productivity, then assign costs to each sub-component of your system.  The result in manufacturing is that people will always try to keep the most expensive machine 100% utilized even if that causes lots of problems elsewhere and drives up costs all over the factory.

Cost Accounting’s process of measuring on a per-cost basis (instead of a system basis) causes everyone to minimize the cost in their own area rather than collaborate to make the system more efficient.  I’ve worked in many environments where management tried to optimize expensive developer time by offloading documentation, quality, and support to different teams.  The result was a much less effective system where defects were fixed late (if ever), test automation was never built, documentation was incomplete, and customer field issues lingered like the smell of stale malt beverage in a frat house.

No one wanted these behaviors, yet they were endemic because the company optimized developer time instead of working product.

Agile maintains ship-readiness by making it Engineering’s primary measurement.  Making sellable product the top priority focuses the team on systems and collaboration.  It may not be as “efficient” to have a highly paid developer running tests; however, it does real economic harm if that developer keeps writing untested code ahead of your ability to verify it.  Even more significantly, a developer who plays a larger part in test, documentation, and support is much more invested in fixing problems early.

If your company wants to ship product, then find a metric that gets you to that goal.  For Agile teams, that metric is the percentage of iterations delivering ship-ready product.  My condolences if your team’s top metrics are milestones completed, bugs fixed, or hours worked.

Death by Ant Bytes

Or the Dangers of Incremental Complexity

Products are not built in big bangs: they are painfully crafted layer upon layer, decision after decision, day by day.  It’s also a team sport where each member makes countless decisions that, hopefully, flow together into something customers love.

In fact, these decisions are so numerous and small that they seem to cost nothing.  Our judgment and creativity build the product crumb by crumb.  Each and every morning we show up for work ready to bake wholesome chocolaty goodness into the product.  It’s the seeming irrelevance of each atomic bit that lulls us into the false belief that every addition is just a harmless Pythonesque “wafer thin” bite.

That’s right, not all these changes are good.  It’s just as likely (perhaps more likely) that the team is tinkering with the recipe.  Someone asks them to add a pinch of cardamom today, pecans tomorrow, and raisins next week.  Individually, these little changes seem to be trivial.  Taken together, they can delay your schedule at best or ruin your product at worst.

Let me give you a concrete example:

In a past job, we had to build an object model for taxis.  At our current stage, this was pretty simple: a taxi has a name, a home base, and an assigned driver.  One member of our team looked ahead and decided on his own that he should also add make, model, MPG, and other performance fields.  He also decided that assignments needed a whole new model since they could span a date range (start, end) and handle multiple drivers.  Many of you are probably thinking this is just what engineers are supposed to do – anticipate needs.  Read on…

By the time he’d built the taxi model, it had taken 5x as long and resulted in hundreds of lines of code.  It got worse the very next week when we built the meter interface code and learned more about the system.  For reporting requirements, MPG and the performance fields had to be handled outside the taxi model.  We also found that driver assignments were much more naturally handled by looking at fare information.  Not only had we wasted a lot of time, we had to spend even more time reversing the changes we’d put in.  The contrast looked roughly like the sketch below.
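
Here is a rough reconstruction of the two versions (the field names are illustrative, not the actual code):

    # What the iteration actually needed:
    class Taxi:
        def __init__(self, name, home_base, driver):
            self.name = name
            self.home_base = home_base
            self.driver = driver

    # The "anticipate needs" version that took 5x as long:
    class SpeculativeTaxi:
        def __init__(self, name, home_base, drivers, make=None, model=None,
                     mpg=None, assignment_start=None, assignment_end=None):
            self.name = name
            self.home_base = home_base
            self.drivers = drivers                    # whole new assignment model
            self.make = make                          # reporting later needed these elsewhere
            self.model = model
            self.mpg = mpg
            self.assignment_start = assignment_start  # replaced by fare data a week later
            self.assignment_end = assignment_end

Every one of those extra fields had to be designed, tested, and then backed out.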

One of my past CEOs called this a “death by ant bites” and “death of a million cuts.”

It’s one of the most pernicious forms of feature creep because every single one of the changes can be justified.  I’m not suggesting that all little adds are bad, but they all cost something.   Generally, if someone says they are anticipating a future need, then you’re being bitten by an ant.

You need to make sure that your team is watching each other’s back and keeping everyone honest.  It’s even better to take turns playing devil’s advocate on each feature.  It’s worth an extra 10 minutes in a meeting to justify if that extra feature is required.

PS: Test Driven Design (TDD) repels ants because it exposes the true cost of those anticipatory or seemingly minor changes.  That “10 minute” feature is really a half day of work to design, test, integrate, and document.  If it’s not worth doing right, then it’s not worth adding to the product.
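
As a minimal sketch of that point (reusing the illustrative Taxi class from above, not real project code), even the smallest anticipatory field immediately owes the team a test:

    import unittest

    class Taxi:
        def __init__(self, name, home_base, driver):
            self.name = name
            self.home_base = home_base
            self.driver = driver

    class TaxiTest(unittest.TestCase):
        def test_taxi_has_name_base_and_driver(self):
            taxi = Taxi("Cab 7", "Downtown Depot", "Pat")
            self.assertEqual("Pat", taxi.driver)

        # The moment someone adds MPG "for later," an honest team owes a test
        # like this one too – plus the design, integration, and documentation time.
        # def test_mpg_is_tracked_per_taxi(self):
        #     ...

    if __name__ == "__main__":
        unittest.main()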

Wearing the Cape

Or Team Death by Heroism

I remember the day that I put on the hero’s cape and killed our team.

A few years ago I would have felt really good about saving the deadline and exposing the deadweight we’d been carrying on the team.  I’d get a good performance review, a bonus, and some expectation that I’d be picked first on the corporate playground baseball team.  Everyone was oblivious to the fact that I had just cost the company a lot of time and money.

So with all this goodness raining down on me like gumballs in an Adam Sandler movie, how does wearing the cape cause team death?  Let us count the ways:

  1. I’m going to do my job badly.  A Hero only focuses on winning the day.  Like Batman saving Gotham, we aren’t worried about the collateral damage.  We don’t care about the unfortunate drivers in the way of my super awesome armored tank-bike!  Can you afford to have documentation, cross-training and automated tests become collateral damage?
  2. My teammates are going to feel pretty crummy.  Who gets to be Aquaman on your team?  I’m telling them that they could not do their job, so I’ve got to do my job AND their job.  There’s no way to sugar coat that ego buster, and they’re not likely to offer much help after that.
  3. Decisions will be one-sided.  Without a team, the hero has no balancing ideas and will go racing off into obvious traps and dead-ends.  On my latest hero project, I worked all weekend and was told on Monday that there was a check-box on the operating system that would have done the same thing!  Doh.  Why didn’t my teammate give me a heads up on Friday?  Why should he bother – it was better cinema to watch me ricochet all over the project.
  4. Heroes require drama, so nothing will ever stabilize.  It’s silly to dash in and keep saving the day if everything’s working pretty well.  Once I’ve put on the cape, I’m much more likely to invent crises to solve because most of the work that is needed to ship a product is pretty drama-free.  Some of the heroes I’ve met just leap from job to job faster than a speeding bullet.

Looking for some kryptonite?  It’s not that hard to find:

  1. Stop rewarding heroics with praise, money, pizza, or beer.
  2. Make the hero take a long vacation, transfer them, or fire them.  Really, you can live without them.
  3. If you’ve got a hero, pair them with another person (they can respect) to minimize the damage.
  4. Have heroes take on the grunt work instead of letting them cherry pick the best work.
  5. Stop the team.  If you believe the team needs a hero, then you need to reevaluate your deadlines, feature list, or external messaging.  Generally, an emergent hero is a symptom of a larger problem.

Wonder team powers, activate!

Tippy Towers

or “Synergistic Nondisruptive Integration”

Back in my Space Suit Architect days, I was a fan of “green field” or second system redesign.  It made sense that we should use our hard earned knowledge of the application domain to “build the product right this time instead of rushed crapware.”  Since then, I’ve come to appreciate a more pragmatic approach that I call “Tippy Towers.”

Let’s get specific with an example…

My company, BEware, has a shipping product, Alpha, that works pretty well even though it’s all written in this obsolete language AWFOL-88.  Yeah, it’s got problems scaling, only runs on Vista, and it’s hard to add new features; however, it’s earning revenue and the sales team can sell it.

The engineering team is really hot to recreate Alpha as a new product, Omega, using Ruby on Rails (yeah, baby!) and marketing is behind the move because they’re told it’s the only way to add features.  Side note: in my experience, these claims are always exaggerated but generally true (look for a future post about working out of order).

In a green field approach, everyone agrees to stop working on Alpha and focus all work on Omega.  The company enters the tunnel of death and hopes to emerge before dying of starvation, eventually shipping a buggy v1.0 of Omega to its few remaining customers.

Here’s the plan.  Since we’re astronauts, our design is to build the new hotness, Omega, first and later bring over the boring old features of Alpha (like an installer).  As good Agile engineers, we’ll build the most critical use cases into Omega first.  We’ll identify early customers that only need a subset of the Alpha features.  Since these are a known subset of core features, we quickly build some kick-ass next-generation hyphen-inducing wonderware.

The early trial customers love the stripped down Omega product and start demanding more of the Alpha features.  Now the Alpha tower begins to tip over because we need Omega to duplicate all the features that are working adequately in Alpha.  Customers can do a comparison between Alpha and Omega, and conversations start to sound like “I would buy Omega, but I need the green cross fade accounting API from Alpha.”

At this point, our Alpha tower is in free fall.  Omega has enough features that it is hurting sales of Alpha while customers wait for Omega.  Worse, the abandoned Alpha is falling behind in the market.  Now we’re going to spend many months porting working Alpha features to Omega.  This is not value-added work; it’s quality-optional crisis management where we rush Omega to be “feature complete” ASAP.

Our alternative is to see Alpha as a tippy tower that we want to keep standing.  Each feature we pull from Alpha to Omega can make it more unstable.  It takes more effort to support both; however, if Omega can leverage Alpha and vice versa, then we have the luxury to focus on new features and better designs.

In a Tippy Tower mindset, Alpha is valuable.  The existing code base is field proven, generating revenue, and shipping.  Even if Omega holds promise, it’s worth our time to keep Alpha selling.  It may slow down Omega, but it ultimately gives the company more time to get Omega right.

Let’s take the Tippy Tower concept a bit further.  I argue that Alpha and Omega should be designed to co-exist.  For example, Omega’s new web interface could be displayed on a form in Alpha’s UI.  It may take some time to enable a browser in Alpha, but Alpha users would have immediate access to Omega’s new capability.  In this case, we’ve done work that helps create Omega and also added uplift to our shipping product.
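
A minimal sketch of what that co-existence might look like (the endpoint and names are hypothetical, and a real integration would depend on Alpha’s UI toolkit):

    # Alpha delegates one screen to Omega instead of waiting for a full port.
    # The URL below is a made-up internal endpoint, not a real product detail.
    import urllib.request

    OMEGA_BASE_URL = "http://omega.internal.example/reports"   # hypothetical Omega service

    def alpha_show_report(report_id):
        """Alpha form handler that fetches Omega's new web report for display."""
        with urllib.request.urlopen(f"{OMEGA_BASE_URL}/{report_id}") as response:
            return response.read().decode("utf-8")   # rendered by Alpha's embedded browser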

In an ideal Tippy Tower case, those legacy Alpha features just stay in the Alpha code base and die a natural death because Omega replaces them with a better paradigm.  Keeping Alpha from tipping over spares us from having to port them and from creating more rushed crapware.

One last note: sales people love this approach – they always have something to sell and customers to reference.

Just Striping in the RAIN

Or “behold the power of the unreliable”

In a previous post, I discussed the concept of a Redundant Array of Inexpensive Nodes (RAIN) as a way to create more reliable and scalable applications.   Deploying a RAIN application is like being the House in Vegas – it’s about having enough size that the odds come out in your favor.  Even if one player is on a roll, you can predict that nearly everyone else is paying your rent.  Imagine what would happen if all the winning gamblers were in your casino!  If you don’t want to go bankrupt when deploying a RAIN app, then ensure that the players spread out all over the Strip.

One of my core assumptions is that you’ll deploy a RAIN application on a cloud.  This is significant because we’re assuming that your nodes are

  1. idle most of the time because your traffic loads are cyclic
  2. unreliable because the cloud provider does not offer much SLA
  3. divisible because renting ten 1/10ths of a server costs roughly the same as a whole one
  4. burstable because a 1/10th server can sometimes consume the extra 9/10ths of the server

The burstable concept is a dramatic power multiplier that tips the whole RAIN equation heavily towards clouds. 

Bursting means that under load, your ten 1/10th servers (roughly one server in cost) could instantly expand to the power of ten full servers!  That lets you absorb an order-of-magnitude spike in demand without any change to your application.
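
The arithmetic is simple enough to sketch (the numbers are illustrative assumptions, not measured cloud pricing):

    # Back-of-the-envelope sketch of the bursting claim; all values are assumptions.
    nodes = 10                # RAIN nodes, each rented as roughly 1/10 of a server
    steady_fraction = 0.1     # each node's normal share of a physical server
    burst_fraction = 1.0      # under load, a node may briefly grab a whole server

    steady_capacity = nodes * steady_fraction   # ~1 server's worth of compute
    burst_capacity = nodes * burst_fraction     # ~10 servers' worth, same footprint

    print(f"steady: {steady_capacity:.0f} server(s), burst: {burst_capacity:.0f} server(s)")
    # steady: 1 server(s), burst: 10 server(s) – the order-of-magnitude headroom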

In the past, we’ve racked extra servers to handle this demand.  That meant that we had a lot of extra capacity taking up rack space, clubbing innocent migratory electrons for their soft velvety fur, and committing over provisioning atrocities.  

Today, multi-tenant clouds allow us to average out these bursts by playing the odds on when application bursts will occur.  In this scenario, the infrastructure provider benefits from the fact that applications need to be over provisioned.  The application author benefits because they can instantly tap more resources on demand.  Now, that is what I would call a win-win synergy!

All this goodness depends on

  • standard patterns & practices that developers use to scale out RAIN applications
  • platform improvements in cloud infrastructure to enable smooth scale out
  • commercial models that fairly charge for multi-tenant over-subscription
  • workable security that understands the underlying need for co-mingling workloads

The growing dominance of cloud deployments looks different when you understand the multiplying interplay between multi-tenant clouds and cloud-ready RAIN applications.

Green Clouds?

This is an interesting take on clouds by the Guardian.  Dell’s new cloud offerings are more power efficient; however, we are racking lots and lots of servers.  It’s like everyone in China buying fuel efficient cars – they are better than Hummers, but still going to use gas.

We’re clearly entering an age where compute consumed per person is going up dramatically.  They are correct that the cost and environmental impact of that compute is hidden from the consumer.  I have a front row seat to these cloud data centers and I can verify that lots and lots of new servers are being brought online every day. 

Welcome back to 2001.

Screening Recruits for Agile Savvy

We’re hiring new managers and developers into my team, and it’s important (to me) that we find people who will embrace our Agile processes.

Sadly, many people’s experience is with the fluffy Agile decorations and not its core disciplines; consequently, interviewees will answer “yes, I’ve done Agile” without really (IMHO) knowing what they are saying.

So I wanted to craft some questions that will help identify good Agile candidates even if they have no experience (or negative experience) with the process.

  • Explain a time that you did not agree with a design decision that was being made. [Good candidates will tell you that they had a healthy debate about it, made sure they were heard, and then supported the team decision.  Excellent candidates will give you a specific case where they were wrong and the outcome was better than their suggestion.]
  • How have you handled the trade-off between shipping quality software and getting a release done on time? [Good candidates will be pragmatic about the need to release but own quality as their responsibility.  Excellent candidates will talk about implementing TDD and automation so that quality can be maintained throughout a release cycle.]
  • How have you made changes to your work habits based on retrospectives? [Good candidates will tell you about items where they had to acknowledge other people’s suggestions and change their behavior.  Excellent candidates will be excited about having ownership in their team’s continuous improvement and can give examples.]
  • Why are sprint reviews important? [Good candidates will say that it’s important for a team to show progress to other groups.  Excellent candidates will tell you that it’s how a team shows that it is meeting its commitments and getting feedback to improve the product.]
  • Is it possible to achieve the objective to be “ship ready” at the end of each sprint? [Good candidates will say that ship ready is a great target but only practical in the last sprints before a release.  Excellent candidates will explain that being ship ready is a core driver for the process that ensures the team is focused on priorities, quality, and breaking work into components.]
  • Tell me about the best performing team that you’ve been part of. What made it a great team? [Good candidates will tell you about having quality people or a very tight focus.  Excellent candidates will tell you about the shared goals of the team and how people gave up individual recognition to accomplish team objectives.]
  • What does it mean for a team to be transparent? [Good candidates will talk about status reports and documentation.  Excellent candidates will talk about being willing to take risks and fail fast.]

If they can’t pass these questions, then go buy a lifeboat.  You’ll want it for that waterfall you’re going to be riding down shortly.

Don’t confuse the decorations for the process!

Or “Icing is pretty, but you need a cake too!”

Good Agile Process books will explain in chapter 1 that the stand-ups, iterations, demos, retrospectives, long planning meetings, et cetera are all “decorations” for the process.  This is an important concept that gets completely lost when they spend the next 43 chapters talking about how the decorations are used to support the process.

The decorations are not the process!

Having stand-ups does not make you agile.  Doing work in sprints does not make you agile.  Using story cards does not make you agile.  Doing retrospectives where people share honest feedback does not make you agile (but could help you become agile if you actually take the feedback).

Agile is a business discipline.

Agile teams have the discipline to focus on delivering a product that generates revenue.  Management supporting agile teams has the discipline to reward teamwork.  Product owners have the discipline to provide crisp, actionable priorities.  Everyone has the discipline to be transparent and willing to adapt as the environment evolves.

These disciplines are hard to build and maintain, but they are fundamentally valuable.  They are honest and practical.  They are revolutionary.

Now, if you embrace the agile disciplines then the decorations will reinforce your efforts like mocha fudge icing on a double-dutch chocolate groom’s cake.

Unfortunately, if you lack the discipline then the decorations become a whip used to micromanage teams into a long frustrating death march.  Sadly, I’m finding that this experience is the more common one in the industry. 

Bon appétit!