DC2020: Putting the Data back in the Data Center

For the past two decades, data centers have been more about compute than data, but the machine learning and IoT revolutions are changing that focus for the 2020 Data Center (aka DC2020). My experience at IBM Think 2018 suggests that we should be challenging our compute-centric view of the data center; instead, we should be considering the flow and processing of data. Since data is not localized, this reinforces our concept of DC2020 as a distributed and integrated environment.

We have defined data centers by the compute infrastructure stored there. Cloud (especially when equated with virtualized machines) has been an infrastructure as a service (IaaS) story. Even big data “lakes” are primarily compute clusters with distributed storage. This model dominates because, with data sources locked in application silos, control of the compute translates directly to control of the data.

What if control of data is being decoupled from applications? Data is becoming its own thing, driven by new technologies like machine learning, IoT, blockchain and other distributed data sources.

In a data-centric model, we are more concerned with the movement of and access to data than with building applications to control it. Think of event-driven (serverless) and microservice platforms that effectively operate on data-in-flight. As function as a service progresses, it will become impossible to know all the ways that data is manipulated because applications no longer have boundaries.
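
To make that concrete, here is a minimal Python sketch of a data-in-flight handler; the handle() name, event shape and enrichment step are hypothetical illustrations, not any particular platform’s API:

    import json

    def handle(event: dict) -> dict:
        """Transform one event as it flows past; no application 'owns' this data."""
        reading = json.loads(event["body"])        # e.g. {"sensor": "t1", "celsius": 21.5}
        reading["fahrenheit"] = reading["celsius"] * 9 / 5 + 32   # enrich in flight
        return {"status": 200, "body": json.dumps(reading)}

    # Local stand-in for the runtime; a real platform wires handle() to an event source.
    print(handle({"body": '{"sensor": "t1", "celsius": 21.5}'}))

Each function sees only the event in front of it, which is exactly why no single application boundary can describe everything that happens to the data.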

This data-centric, distributed architecture model will be even more pronounced as processing moves out of data centers and into the edge. IT infrastructure at the edge will be used for handling latency-critical data and for aggregating data for centralization. These operations will not look like traditional application stacks: they will be data processing microservices and functions, as the sketch below illustrates.
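
As a hedged illustration of that edge pattern (all names here are invented), raw readings stay local while only a compact summary moves upstream:

    from statistics import mean

    def aggregate(window: list) -> dict:
        """Collapse a window of raw edge readings into one record for the core."""
        return {"count": len(window), "mean": mean(window), "max": max(window)}

    readings = [21.5, 21.7, 23.9, 22.0]   # latency-critical samples handled at the edge
    print(aggregate(readings))            # only this summary is centralized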

This data-centric approach relegates infrastructure services to a subordinate role. We should not care about servers or machines except as they support platforms driving data flows.

I am not abandoning making infrastructure simple and easy – we need that more than ever! However, it’s easy to underestimate the coming transformation of application architectures based on advanced data processing and sharing technologies. The amount and number of sources of data have already grown beyond human comprehension, yet we still think of applications in a client-server mindset.

We’re only at the start of really embedding connected sensors and devices into our environment. As devices from many sources and vendors proliferate, they also need to coordinate. That means we’re reaching a point where devices will start talking to each other locally instead of via our centralized systems. It’s part of the coming data avalanche.

Current management systems will not survive this explosive growth.  We’re entering a phase where existing control and management paradigms simply cannot keep up.

As an industry, we are rethinking management automation from declarative (“start this”) to intent-focused (“maintain this”) systems.  That is the simplest way to express the difference between OpenStack and Kubernetes. The change is required to create autonomous infrastructure designs; however, it also means changing how we think about infrastructure: as something that follows data instead of leading it.
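
A toy Python contrast makes the distinction clearer; start_server() and the state model are invented stand-ins, but controllers in systems like Kubernetes follow the same observe-diff-act loop:

    running = []                       # stand-in for observed infrastructure state

    def start_server(name):            # declarative era: issue the command once
        running.append(name)

    def reconcile(desired):            # intent era: converge on the declared state
        while len(running) < desired:
            start_server(f"node-{len(running)}")
        while len(running) > desired:
            running.pop()              # also heal drift above the intent

    reconcile(3)                       # establish intent
    running.pop()                      # simulate a node failure
    reconcile(3)                       # the loop restores the declared intent
    print(running)                     # ['node-0', 'node-1', 'node-2']

The “start this” call runs once and forgets; the “maintain this” loop runs continuously and repairs drift, which is what autonomous infrastructure requires.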

That’s exactly what RackN has solved with Digital Rebar Provision.  Deeply composable, simple APIs and extensible workflows are essential components of integrated automation in DC2020 that put the data back in the data center.

Why cloud compute will be free

Today at Dell, I was presenting to our storage teams about cloud storage (aka the “storage banana”) and Dave “Data Gravity” McCrory reminded me that I had not yet posted my epiphany explaining “why cloud compute will be free.”  This realization derives from topics that he and I have blogged about but never stated so simply.

Overlooking the fact that compute is already free at Google and Amazon, you must understand that it’s a cloud-eat-cloud world out there, where losing a customer places your cloud in jeopardy.  Speaking of Jeopardy…

Answer: Something sought by cloud hosts to make profits (and further the agenda of our AI overlords).

Question: What is lock-in?

Hopefully, it’s already obvious to you that clouds are all about data.  Cloud data takes three primary forms:

  1. Data in transformation (compute)
  2. Data in motion (network)
  3. Data at rest (storage)

These three forms combine to create cloud architecture applications (service-oriented, with externalized state).

The challenge is to find a compelling charge model that both:

  1. Makes it hard to leave your cloud AND
  2. Encourages customers to use your resources effectively (see #1 in Azure Top 20 post)

While compute demands are relatively elastic, storage demand is consistent, predictable and constantly growing.  Data is easily measured and difficult to move.  In this way, data represents the perfect anchor for cloud customers (model rule #1).  A host with a growing data consumption footprint will have a long-term, predictable revenue base.

However, storage consumption alone does not encourage model rule #2.  Since storage is the foundation of the cloud, hosts can fairly judge resource use by measuring data egress, ingress and sidegress (attrib @mccrory 2/20/11).  This means tracking not only data in and out of the cloud, but also data transacted between the provider’s own cloud services.  For example, Azure charges for both data at rest ($0.15/GB/mo) and data in motion ($0.01/10K transactions), as the quick calculation below shows.
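
A back-of-the-envelope calculation using those quoted rates (the usage figures are invented for illustration):

    storage_gb = 500                            # data at rest for the month
    transactions = 2_000_000                    # storage transactions in, out and side

    at_rest = storage_gb * 0.15                 # 500 GB * $0.15/GB/mo = $75.00
    in_motion = (transactions / 10_000) * 0.01  # 200 * $0.01 = $2.00

    print(f"monthly bill: ${at_rest + in_motion:.2f}")   # monthly bill: $77.00

Note how the recurring at-rest charge dwarfs the per-transaction charge: the revenue anchor is the data itself.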

Consequently, the financially healthiest providers are the ones with the most customer data.

If hosting success is all about building a large, persistent storage footprint, then service providers will give away services that drive data at rest and/or in motion.  Giving away compute eliminates the barrier for customers to set up web sites, develop applications, and build their businesses.  As these accounts grow, they will deposit data in the cloud’s data bank and ultimately deposit dollars in the host’s piggy bank.

However, there is a no-free-lunch caveat:  free compute will not carry a meaningful service level agreement (SLA).  The host will continue to charge customers who need their applications to operate consistently.  I expect that we’ll see free compute (or “spare compute,” from the cloud provider’s perspective) used heavily for early life-cycle work (development, test, proof-of-concept) and background analytic applications.

The market is starting to wake up to the idea that cloud is not about IaaS – it’s about who has the data and the networks.

Oh, dem golden spindles!  Oh, dem golden spindles!