WhatTheBus fun with Cucumber and MemCacheD

Sometimes a problem has to kick you upside the head so you can learn an important lesson.  Tonight’s head slapper was an interaction between Cucumber and MemCacheD.

If you are using CUCUMBER AND MEMCACHE read this post carefully so you don’t get burned.  If you’re using MemCache and not writing tests then return to Jail, do not collect $200.

It’s important to note that Cucumber has the handy side effect of running each scenario in a transaction.  The impact is that the data from each scenario does not impact the next scenario.  (note: you can pre-load data into cucumber using fixtures).

However, Cucumber does not do any rollback for Cache keys added into MemCache.  In fact, your MemCache entries will happily persist between your development and test systems.

WhatTheBus has a simple check to reduce database writes – it only writes to the database if there is no cache hit for the bus.  My thinking is that we only need to add a new bus if there is no key as shown in this partial snippet:

  cache = Rails.cache.read params[:id]
  if cache.nil?
     bus = Bus.find_or_create_by_xref :name => params[:name], :xref => params[:id]

This works great for live testing, but fails in technicolor for Cucumber because tests with the same ID will not make it to the find_or_create.

To solve the problem, I had to add a pre-condition (‘given’ in Cucumber speak) to each scenario to make sure the cache was cleared.  It looks like this in the scenario feature:

  Given no cache for "1234"

And that’s translated as code in the steps like so:

  Given /^no cache for "([^\"]*)"$/ do |id|
   Rails.cache.delete id

Ready to Fail

Or How Monte Python taught me to program

Sometimes you learn the most from boring conference calls.  In this case, I was listening to a deployment that was so painfully reference-example super-redundant by-the-book that I could have completed the presenter’s sentences.  Except that he kept complaining about the cost.  It turns out that our typical failure-proofed belt-and-suspenders infrastructure is really, really expensive.

Shouldn’t our applications be Monte Python’s Black Knight yelling “It’s just a flesh wound!  Come back and fight!”   Instead, we’ve grown to tolerate princess applications that throw a tantrum of over skim milk instead of organic soy in their mochaito.

Making an application failure-ready requires a mindset change.  It means taking of our architecture space suit and donning our welding helmet.

Fragility is often born from complexity and complexity is the compounded interest from system design assumptions.

Let’s consider a transactional SQL database.  I love relational databases.  Really, I do.  Just typing SELECT * FROM or LEFT OUTER JOIN gives me XKCD-like goose bumps.  Unfortunately, they are as fragile as Cinderella’s glass slippers.  The whole concept of relational databases requires a complex web of sophisticated data integrity we’ve been able to take for granted.  The web requires intricate locking mechanisms that make data replication tricky.  We could take it for granted because our operations people have built up super-complex triple-redundant infrastructure so that we did not have to consider what happens when the database can’t perform its magic.

What is the real cost for that magic?

I’m learning about CouchDB.  It’s not a relational database, it a distributed JSON document warehouse with smart indexing.  And compared some of the fine grained features of SQL, it’s an arc welder.   The data in CouchDB is loosely structured (JSON!) and relationships are ad hoc.  The system doesn’t care (let alone enforce) that if you’ve maintained referential integrity within the document – it just wants to make sure that the documents are stored, replicated, and indexed.   The goodness here is that CouchDB allows you to distribute your data broadly so that it can be local and redundant.  Even better, weak structure allows you to evolve your schema agilely (look for a future post on this topic).

If you’re cringing about lack referential integrity then get over it – every SQL backed application I ever wrote required RI double-checking anyway!

If you’re cringing about possible dirty reads or race conditions then get over it – every SQL backed application I ever wrote required collision protection too!

I’m not pitching CouchDB (or similar) is a SQL replacement.   I’m holding it up as an example of a pragmatic approach to failure-ready design.   I’m asking you to think about the hidden complexity and consequential fragility that you may blindly inherit.

So cut off my arms and legs – I can still spit on your shoes.

WhatTheDB? Adding mySQL into WhatTheBus

Today’s WhatTheBus update added data persistence to the application. Ultimately, I am planning to use CouchDB for persistence; however, I wanted to show a SQL to document migration as part of this process. My objective is to allow dual modes for this application.

In the latest updates, I continued to show Test Driven Development (TDD) process using Cucumber. Before starting work, I ran the test suite and found a bug – spectacular failure if MemCacheD is not running. So my first check-in adds recovery and logging around that event. Next I wrote a series of tests for database persistence. These tests included checking a web page that did not exist at this time. I ran the tests – as expected, all failed.

The persistence was very simple: models for bus and district. These minimal models are created dynamically when a bus location is updated. The data contract is that the first location update should include the bus name and distract in the url. After the first update, only ID and location (lat, lng) are expected. In addition to the model and migrations, I also updated the database.yml to use mySQL.

Creating a web page for the bus (bus/index/[xref id]) required the addition of a little infrastructure for the application. Specifically, I had to add an application layout and style sheet. Just because I have a styles sheet, does not mean there is any style (I’ve got style, brother. I’ve got million dollar charm, sister. I’ve got headaches and toothaches and bad times too).

To preserve simplicity, I am not storing the location information in the database. Location is so time sensitive that I don’t want to create any storage burden and I’m using cache expiration to ensure that we don’t keep stale locations around.

Up next…. I’m going to add a simulator (in rake) to make it easier to work on the application.