Death by Ant Bytes

Or the Dangers of Incremental Complexity

Products are not built in big bangs: they are painfully crafted layer upon layer, decision after decision, day by day.  It’s also a team sport where each member makes countless decisions that hopefully help flow towards something customers love.

In fact, these decisions are so numerous and small that they seem to cost nothing.  Our judgment and creativity build the product crumb by crumb, drop by drop.  Each and every morning we show up for work ready to bake wholesome chocolaty goodness into the product.  It’s the seeming irrelevance of each atomic bit that lulls us into thinking that every addition is just a harmless Pythonesque “wafer thin” bite.

That’s right, not all these changes are good.  It’s just as likely (perhaps more likely) that the team is tinkering with the recipe.  Someone asks them to add a pinch of cardamom today, pecans tomorrow, and raisins next week.  Individually, these little changes seem to be trivial.  Taken together, they can delay your schedule at best or ruin your product at worst.

Let me give you a concrete example:

In a past job, we had to build an object model for taxis.  At that stage, this was pretty simple: a taxi has a name, a home base, and an assigned driver.  One of our team looked ahead and decided on his own that he should also add make, model, MPG, and other performance fields.  He also decided that assignments needed a whole new model so they could carry a date range (start, end) and handle multiple drivers.  Many of you are probably thinking all this was just what engineers are supposed to do – anticipate needs.  Read on…

By the time he’d built the taxi model, it had taken 5x as long and resulted in hundreds of lines of code.  It got worse the very next week when we built the meter interface code and learned more about the system.  For reporting requirements, MPG and performance fields had to be handled outside the taxi model.  We also found that driver assignments were much more naturally handled by looking at fare information.  Not only had we wasted a lot of time, we had to spend even more time reversing the changes we’d put in.
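For contrast, here is a minimal sketch in Ruby of everything the taxi model needed at that stage.  It is an illustration, not the original code: the `Taxi` name and the sample values are assumptions.

```ruby
# Hypothetical sketch: the taxi model as it was needed at that stage.
# Field names follow the post; nothing here is the original code.
Taxi = Struct.new(:name, :home_base, :driver)

# Sample values are made up for illustration.
taxi = Taxi.new("Cab 12", "Downtown Depot", "Pat")
puts "#{taxi.name} is based at #{taxi.home_base} and driven by #{taxi.driver}"
```

Make, model, MPG, and date-ranged assignments can each be added later, when a requirement actually shows where they belong.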

One of my past CEOs called this “death by ant bites” or “death of a million cuts.”

It’s one of the most pernicious forms of feature creep because every single one of the changes can be justified.  I’m not suggesting that all little adds are bad, but they all cost something.   Generally, if someone says they are anticipating a future need, then you’re being bitten by an ant.

You need to make sure that your team members are watching each other’s backs and keeping everyone honest.  It’s even better to take turns playing devil’s advocate on each feature.  It’s worth an extra 10 minutes in a meeting to decide whether that extra feature is really required.

PS: Test Driven Development (TDD) repels ants because it exposes the true cost of those anticipatory or seemingly minor changes.  That “10 minute” feature is really a half day of work to design, test, integrate, and document.  If it’s not worth doing right, then it’s not worth adding to the product.

Cloud Application Life Cycle

Or “you learn by doing, and doing, and doing”

One of the most consistent comments I hear about cloud applications is that they fundamentally change the way applications are written.  I’m not talking about the technologies, but the processes and infrastructure.

Since the underlying assumption of a cloud application is that node failure is expected, our development efforts need to build in that assumption before any code is written.  Consequently, cloud apps should be written directly on cloud infrastructure.
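As a rough sketch of what building in that assumption can look like in code, the Ruby below tries each node in turn and treats a dead node as a routine event.  The `NodeDown` error, the `fetch_with_failover` helper, and the node names are all hypothetical, and the cluster is simulated rather than real.

```ruby
# Hypothetical sketch: node failure is handled as a normal event.
# NodeDown, fetch_with_failover, and the node names are made up for illustration.
class NodeDown < StandardError; end

# Try each node in order and return the first successful result.
# Raises NodeDown only when every node has failed.
def fetch_with_failover(nodes)
  failures = []
  nodes.each do |node|
    begin
      return yield(node)
    rescue NodeDown => e
      failures << "#{node}: #{e.message}"
    end
  end
  raise NodeDown, "all nodes failed (#{failures.join('; ')})"
end

# Simulated cluster in which the first node is down.
down = ["node-a"]
result = fetch_with_failover(%w[node-a node-b node-c]) do |node|
  raise NodeDown, "connection refused" if down.include?(node)
  "served by #{node}"
end
puts result
```

The caller never special-cases the failure; it only sees an error when no node at all can answer.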

In old school development, I would have all the components for my application on my desktop.  That’s necessary for daily work, but does not give me a warm fuzzy for success in production.

Today’s scale production environments involve replicated data with synchronization lags, shared multi-writer memcache, load balancers, and mixed code versions.  There is no way that I can simulate that on my desktop!   There is no way I can fully anticipate how that will behave all together!

The traditional alternative is to wait.  Wait for QA to try and find bugs through trial and error.  Or (more likely) wait for users to discover the problem post deployment.

My alternative is to constantly deploy the application to a system that matches production.    As a bonus, I then attack the deployment with integration tests and simulators.

If you’re thinking that is too much effort then you are not thinking deeply enough.  This model forces developers to invest in install and deployment automation.  That means that you will be able to test earlier in the cycle.  It means you will be able to fix issues more quickly.  And that you’ll be able to ship more often.  It means that you can involve operations and networking specialists well before production.  You may even see more collaboration between your development, quality, and operations teams.

Forget about that last one – if those teams actually worked together you might accidentally ship product on time.  Gasp!

WhatTheDB? Adding MySQL into WhatTheBus

Today’s WhatTheBus update added data persistence to the application. Ultimately, I am planning to use CouchDB for persistence; however, I wanted to show a SQL to document migration as part of this process. My objective is to allow dual modes for this application.

In the latest updates, I continued to show the Test Driven Development (TDD) process using Cucumber. Before starting work, I ran the test suite and found a bug – a spectacular failure if MemCacheD is not running. So my first check-in adds recovery and logging around that event. Next I wrote a series of tests for database persistence. These tests included checking a web page that did not exist at this time. I ran the tests – as expected, all failed.
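The recovery-and-logging idea can be sketched in plain Ruby as a thin wrapper that logs a warning and carries on when the cache is unreachable.  This is not the WhatTheBus check-in: `SafeCache`, `DeadClient`, and the `get`/`set` client interface are assumptions made for illustration, not a real memcached client API.

```ruby
require "logger"
require "stringio"

# Hypothetical sketch of recovery and logging around a missing cache.
# SafeCache and the client's get/set interface are assumptions.
class SafeCache
  def initialize(client, logger)
    @client = client
    @logger = logger
  end

  # Return the cached value, or nil when the cache is unreachable.
  def get(key)
    @client.get(key)
  rescue StandardError => e
    @logger.warn("cache unavailable on get(#{key}): #{e.message}")
    nil
  end

  # Return true when the write succeeded, false when the cache is unreachable.
  def set(key, value)
    @client.set(key, value)
    true
  rescue StandardError => e
    @logger.warn("cache unavailable on set(#{key}): #{e.message}")
    false
  end
end

# Stand-in for a cache server that is not running.
class DeadClient
  def get(_key)
    raise IOError, "connection refused"
  end

  def set(_key, _value)
    raise IOError, "connection refused"
  end
end

log = StringIO.new
cache = SafeCache.new(DeadClient.new, Logger.new(log))
value = cache.get("bus-42")                    # nil instead of a crash
stored = cache.set("bus-42", [30.27, -97.74])  # false instead of a crash
puts log.string
```

A missing cache then degrades the application instead of taking it down, and the log shows why.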

The persistence was very simple: models for bus and district. These minimal models are created dynamically when a bus location is updated. The data contract is that the first location update should include the bus name and district in the URL. After the first update, only ID and location (lat, lng) are expected. In addition to the model and migrations, I also updated the database.yml to use MySQL.
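The data contract can be sketched in plain Ruby, with in-memory hashes standing in for the real models.  `BusRegistry`, its `update` method, and the sample values are hypothetical; rejecting a first update that lacks a name or district is also my assumption, since the post only says what that update should include.

```ruby
# Hypothetical sketch of the update contract. In-memory hashes stand in for
# the real models; BusRegistry and its method names are illustrative only.
class BusRegistry
  Bus = Struct.new(:xref, :name, :district)

  def initialize
    @buses = {}      # bus records, keyed by external (xref) id
    @locations = {}  # latest position per bus
  end

  # The first update for a bus must carry name and district;
  # later updates need only the id and the position.
  def update(xref, lat:, lng:, name: nil, district: nil)
    unless @buses.key?(xref)
      if name.nil? || district.nil?
        raise ArgumentError, "first update for #{xref} needs name and district"
      end
      @buses[xref] = Bus.new(xref, name, district)
    end
    @locations[xref] = [lat, lng]
    @buses[xref]
  end

  def location(xref)
    @locations[xref]
  end
end

registry = BusRegistry.new
# First update: creates the bus record on the fly.
registry.update("1234", lat: 30.27, lng: -97.74, name: "Bus 7", district: "Eanes")
# Later update: id and position only.
bus = registry.update("1234", lat: 30.28, lng: -97.75)
puts "#{bus.name} (#{bus.district}) is at #{registry.location('1234').inspect}"
```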

Creating a web page for the bus (bus/index/[xref id]) required the addition of a little infrastructure for the application. Specifically, I had to add an application layout and style sheet. Just because I have a style sheet does not mean there is any style (I’ve got style, brother. I’ve got million dollar charm, sister. I’ve got headaches and toothaches and bad times too).

To preserve simplicity, I am not storing the location information in the database. Location is so time sensitive that I don’t want to create any storage burden and I’m using cache expiration to ensure that we don’t keep stale locations around.
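Here is a minimal sketch of that expiration behavior, using an in-memory stand-in rather than the real cache.  `ExpiringStore`, the injectable clock, and the 60-second lifetime are assumptions for illustration; the application's actual expiry setting is not stated here.

```ruby
# Hypothetical sketch of expiring location entries. ExpiringStore is an
# in-memory stand-in for the cache; the 60-second lifetime is arbitrary.
class ExpiringStore
  def initialize(ttl_seconds, clock: -> { Time.now })
    @ttl = ttl_seconds
    @clock = clock     # injectable so the example can control time
    @entries = {}
  end

  # Store the value together with the time at which it goes stale.
  def write(key, value)
    @entries[key] = [value, @clock.call + @ttl]
  end

  # Return the value, or nil if it is missing or has expired.
  def read(key)
    value, expires_at = @entries[key]
    return nil if expires_at.nil?

    if @clock.call >= expires_at
      @entries.delete(key)
      return nil
    end
    value
  end
end

now = Time.at(0)
store = ExpiringStore.new(60, clock: -> { now })
store.write("bus-42", [30.27, -97.74])
fresh = store.read("bus-42")   # still within the lifetime
now += 61                      # move the simulated clock past expiry
stale = store.read("bus-42")   # entry has expired
puts "fresh=#{fresh.inspect} stale=#{stale.inspect}"
```

A stale position simply disappears, so nothing has to clean up old locations.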

Up next…. I’m going to add a simulator (in rake) to make it easier to work on the application.