DC2020: Putting the Data back in the Data Center

For the past two decades, data centers have been more about compute than data, but the machine learning and IoT revolutions are changing that focus for the 2020 Data Center (aka DC2020). My experience at IBM Think 2018 suggests that we should be challenging our compute centric view of a data center; instead, we should be considering the flow and processing of data. Since data is not localized, that reinforces our concept of DC2020 as a distributed and integrated environment.

We have defined data centers by the compute infrastructure stored there. Cloud (especially equated with virtualized machines) has been an infrastructure as a service (IaaS) story. Even big data “lakes” are primary compute clusters with distributed storage. This model dominates because data sources locked in application silos control of the compute translates directly to control of the data.

What if control of data is being decoupled from applications? Data is becoming it’s own thing with new technologies like machine learning, IoT, blockchain and other distributed sourcing.

In a data centric model, we are more concerned with movement and access to data than building applications to control it. Think of event driven (serverless) and microservice platforms that effectively operate on data-in-flight. It will become impossible to actually know all the ways that data is manipulated as function as a service progresses because there are no longer boundaries for applications.

This data-centric, distributed architecture model will be even more pronounced as processing moves out of data centers and into the edge. IT infrastructure at the edge will be used for handling latency critical data and aggregating data for centralization. These operations will not look like traditional application stacks: they will be data processing microservices and functions.

This data centric approach relegates infrastructure services to a subordinate role. We should not care about servers or machines except as they support platforms driving data flows.

I am not abandoning making infrastructure simple and easy – we need to do that more than ever! However, it’s easy to underestimate the coming transformation of application architectures based on advanced data processing and sharing technologies. The amount and sources of data have already grown beyond human comprehension because we still think of applications in a client-server mindset.

We’re only at the start of really embedding connected sensors and devices into our environment. As devices from many sources and vendors proliferate, they also need to coordinate. That means we’re reaching a point where devices will start talking to each other locally instead of via our centralized systems. It’s part of the coming data avalanche.

Current management systems will not survive explosive growth. We’re entering a phase where control and management paradigms cannot keep up.

As an industry, we are rethinking management automation from declarative (“start this”) to intent (“maintain this”) focused systems. This is the simplest way to express the difference between OpenStack and Kubernetes. That change is required to create autonomous infrastructure designs; however, it also means that we need to change our thinking about infrastructure as something that follows data instead of leads it.

That’s exactly what RackN has solved with Digital Rebar Provision. Deeply composable, simple APIs and extensible workflows are an essential component for integrated automation in DC2020 to put the data back in data center.

Rob Hirschfeld

On Computing, Containers, Cloud & Tech Culture

DC2020: Putting the Data back in the Data Center

Leave a comment Cancel reply

Share this:

Leave a comment Cancel reply