Five ways I’m Sad, Mad and Scared: the new critical security flaw in firmware no one will patch.

There is a new security vulnerability that should be triggering a massive fleet-wide server upgrade and patch cycle for data center operators everywhere.  This one undermines fundamental encryption features embedded in servers’ trusted platform modules (TPMs).  According to Sophos.com, “this one’s a biggie.”
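
Step one would be knowing whether your TPMs are even affected.  Here’s a minimal sketch of reading the TPM manufacturer and firmware version on Linux, assuming tpm2-tools 4.x or later is installed (older releases used tpm2_getcap -c properties-fixed and a different output layout); which firmware versions are actually vulnerable depends on your TPM vendor’s advisory.

```python
import subprocess

# tpm2_getcap dumps the TPM's fixed properties, including the manufacturer
# ID and the firmware version registers (tpm2-tools 4.x+ syntax).
out = subprocess.run(["tpm2_getcap", "properties-fixed"],
                     capture_output=True, text=True, check=True).stdout

# The output is YAML-like: a property key at column 0, then indented
# raw/value lines.  Print only the manufacturer and firmware blocks.
show = False
for line in out.splitlines():
    if line and not line[0].isspace():
        show = "MANUFACTURER" in line or "FIRMWARE_VERSION" in line
    if show:
        print(line)
```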

Yet, it’s unlikely anyone will actually patch their firmware to fix this serious issue.

Why?  A lack of automation.  Even if you agree with the urgency of this issue,

  1. It’s unlikely that you can perform a system-wide software patch or system re-image without significant manual effort or operational risk
  2. It’s unlikely that you are actually using TPMs because they are tricky to set up and maintain
  3. It’s unlikely that you have any tooling that automates firmware updates across your fleet
  4. It’s unlikely that you have automation that can gracefully roll out an update while coordinating BIOS and operating system updates
  5. Even if you can do all of the above (IF YOU CAN, PLEASE CALL ME), it’s unlikely that you can coordinate both patching the BIOS and re-encrypting/rotating the data signed by the keys in the TPM (a minimal sketch of this coordination follows the list)
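
To make the gap concrete, here’s a minimal sketch of the rolling coordination that items 3 through 5 would require.  Every node name, command path and helper in it is a hypothetical placeholder standing in for whatever drain, firmware-update and key-rotation tooling your fleet actually has; none of them are real utilities.

```python
import subprocess
import time

NODES = ["node01", "node02", "node03"]   # placeholder inventory

def ssh(node, *cmd):
    # Run a command on the node; the commands passed in below are placeholders.
    subprocess.run(["ssh", node, *cmd], check=True)

def wait_for_ssh(node, timeout=900):
    # Poll until the node answers ssh again after its reboot.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if subprocess.run(["ssh", node, "true"]).returncode == 0:
            return
        time.sleep(15)
    raise TimeoutError(f"{node} did not come back within {timeout}s")

for node in NODES:                        # one node at a time, no fleet-wide outage
    ssh(node, "drain-workloads")          # placeholder: evacuate the node
    ssh(node, "/opt/vendor/bios-update", "--apply")  # placeholder vendor tool
    subprocess.run(["ssh", node, "reboot"])          # connection drops here
    wait_for_ssh(node)
    ssh(node, "rotate-tpm-keys")          # placeholder: re-seal/re-encrypt data
    ssh(node, "admit-workloads")          # placeholder: return node to service
```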

Being able to perform these actions should be foundational; however, I know from talking to many operators that there are serious automation and process gaps at this layer.  These gaps weaken the whole system because we neither turn on the security features embedded in our infrastructure nor automate ways to systematically maintain them.

This type of work is hard to do.  So we don’t do it, we don’t demand it and we don’t budget for it.

Our systems are way too complex to expect issues like this to be improved away by the next wave of technology.  In fact, we see the exact opposite: the faster we move, the more flaws are injected into the system.  This is not a security problem alone.  Bugs, patches and dependencies cause even more system churn and risk.

I have not given up hope that our industry will prioritize infrastructure automation so that we can improve our posture.  I’ve seen that fixing the bottom layers of the stack makes a meaningful difference in the layers above.  If you’ve been following our work, then you already know that is the core of our mission at RackN.

It’s up to each of us individually to start fixing the problem.  It won’t be easy but you don’t have to do it alone.  We have to do this together.

unBIOSed? Is Redfish an IPMI retread or can vendors find unification?

Server management interfaces stink.  They are inconsistent both between vendors and within their own product suites.  Ideally, vendors would agree on a single API; however, it’s not clear whether the diversity is a product of competition or of actual platform variation.  Likely, it’s both.

What is Redfish?  It’s a REST API for server configuration that aims to replace both IPMI and vendor-specific server interfaces (like WSMAN).  Here’s the official text from RedfishSpecification.org.

Redfish is a modern intelligent [server] manageability interface and lightweight data model specification that is scalable, discoverable and extensible.  Redfish is suitable for a multitude of end-users, from the datacenter operator to an enterprise management console.
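
To see what that means in practice, here’s a minimal sketch of walking the API with Python’s requests library.  The /redfish/v1/ service root and the Systems collection it links to are defined by the standard; the BMC address and credentials below are placeholders.

```python
import requests

BMC = "https://10.0.0.42"     # placeholder BMC address
AUTH = ("admin", "password")  # placeholder credentials

# The service root is always at /redfish/v1/ per the spec.
# verify=False only because BMCs typically ship self-signed certs.
root = requests.get(f"{BMC}/redfish/v1/", auth=AUTH, verify=False).json()

# Follow the Systems collection link advertised by the service root.
systems = requests.get(BMC + root["Systems"]["@odata.id"],
                       auth=AUTH, verify=False).json()

# Print each server's model and power state.
for member in systems["Members"]:
    system = requests.get(BMC + member["@odata.id"],
                          auth=AUTH, verify=False).json()
    print(system.get("Model"), system.get("PowerState"))
```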

I think that it’s great to see vendors trying to get on the same page and I’m optimistic that we could get something better than IPMI (that’s a very low bar).  However, I don’t expect that vendors can converge to a single API; it’s just not practical due to release times and pressures to expose special features.  I think the divergence in APIs is due both to competitive pressures and to real variance between platforms.

Even if we manage a grand server management unification, the problem of interface heterogeneity has a long legacy tail.

In the best case, we’re going from N versions to N+1 (and more likely N*2) because the legacy gear will still be around for a long time.  Adding Redfish means API sprawl is going to get worse before it gets back to about where it is now.

Putting pessimism aside, the sprawl problem is severe enough that it’s worth supporting Redfish on the hope that it makes things better.

That’s easy to say, but expensive to do.  If I were making hardware (I left Dell in Oct 2014), I’d consider it an expensive investment for an uncertain return.  Even so, several major hardware players are stepping forward to help standardize.  I also think Redfish would have good ROI for smaller vendors: those looking to displace a major player can ride on the standard.

Redfish is GREAT NEWS for me since RackN/Crowbar provides hardware abstraction and heterogeneous interface support.  More API variation makes my work more valuable.

One final note: if Redfish improves hardware security in a real way then it could be a game changer; however, embedded firmware web servers can be tricky to secure and patch compared to larger application-focused software stacks.  This is one area where I’m hoping to see a lot of vendor collaboration!  [note: this should be its own subject – the security issue is more than API, it’s about system-wide configuration.  stay tuned!]

Crowbar lays it all out: RAID & BIOS configs officially open sourced

Today, Dell (my employer) announced a plethora of updates to our open-source-derived solutions (OpenStack and Hadoop). These solutions include the latest bits (Grizzly and Cloudera) for each project. And there’s another important notice for people tracking the Crowbar project: we’ve opened the remainder of its provisioning capability.

Yes, you can now build the open version of Crowbar and it has the code to configure a bare metal server.

Let me be very specific about this… my team at Dell tests Crowbar on a limited set of hardware configurations. Specifically, Dell server versions R720 + R720XD (using WSMAN and iDRAC) and C6220 + C8000 (using open tools). Even on those servers, we have a limited RAID and NIC matrix; consequently, we are not positioned to duplicate other field configurations in our lab. So, while we’re excited to work with the community, caveat emptor applies to open source.

Another thing about RAID and BIOS is that it’s REALLY HARD to get right. I know this because our team spends a lot of time testing and tweaking these, now open, parts of Crowbar. I’ve learned that doing hard things creates value; however, it also means that contributors to these barclamps need to be prepared to get some silicon under their fingernails.

I’m proud that we’ve reached this critical milestone and I hope that it encourages you to play along.

PS: It’s worth noting that community activity on Crowbar has really increased. I’m excited to see all the engagement.