Podcast – Rob Lalonde on HPC in the Cloud, Machine Learning and Autonomous Cars

Joining us this week is Rob Lalonde, VP & General Manager, Navops at Univa.

About Univa

Univa is the leading independent provider of software-defined computing infrastructure and workload orchestration solutions.

Univa’s intelligent cluster management software increases efficiency while accelerating enterprise migration to hybrid clouds. We help hundreds of companies to manage thousands of applications and run billions of tasks every day.

 Highlights

  • 1 min 6 sec: Introduction of Guest
  • 1 min 43 sec: HPC in the Cloud?
    • Huge migration of workloads to public clouds for additional capacity
    • Specialized resources like GPUs, massive memory machines, …
  • 3 min 29 sec: Cost perspective of cloud vs local HPC hardware
    • Primarily a burst to cloud model today
  • 5 min 10 sec: Good for machine learning or analytics?
  • 5 min 40 sec: What does Univa and Navops do?
    • Cloud cluster automation
  • 7 min 35 sec: Role of Scheduling
    • Job layer & infrastructure layer
    • Diversity of jobs across organizations
  • 9 min 30 sec: Machine learning impact on HPC
    • Survey of Users ~ Results
      • Machine learning not yet in production ~ still research
      • HPC very much linked to machine learning
      • Cloud and Hybrid cloud usage is very high
      • GPU usage for machine learning
  • 15 min 09 sec: GPU discussion
    • Similar to early cloud stories
  • 16 min 00 sec: Concurrency in operations in HPC & machine learning
    • Workload dependency ~ weather modeling
  • 18 min 12 sec: People bring workloads in-house after running in cloud?
    • Sophistication in what workloads work best where
    • HPC is very efficient ~ 1 Million Cores on Amazon : Successful when AWS called about taking all their resources for other customers 🙂
  • 23 min 56 sec: Autonomous cars discussion
    • Processing in the car or offloaded?
    • Oil and Gas exploration example (Edge Infrastructure example)
      • Pre-process data on ship then upload via satellite to find information required
  • 29 min 12 sec: Is Kubernetes in the HPC / Machine Learning world?
    • KubeFlow project
  • 35 min 8 sec: Wrap-Up

Podcast Guest: Rob Lalonde, VP & General Manager, Navops

Rob Lalonde brings over 25 years of executive management experience to lead Univa’s accelerating growth and entry into new markets. Rob has held executive positions in multiple, successful high tech companies and startups. He possesses a unique and multi-disciplined set of skills having held positions in Sales, Marketing, Business Development, and CEO and board positions. Rob has completed MBA studies at York University’s Schulich School of Business and holds a degree in computer science from Laurentian University.

DC2020: Mono-clouds are easier! Why do Hybrid?

Background: This post was inspired by a multi-cloud session at IBM Think2018, which I am attending as a guest of IBM. Providing hybrid solutions is a priority for IBM, and its customers are clearly looking for multi-cloud options. In this way, IBM has made a choice to support competitive platforms. This post explores why they would do that.

There is considerable angst and hype over the terms multi-cloud and hybrid-cloud. While it would be much simpler if companies could silo into a single platform, innovation and economics require a multi-party approach. The problem is NOT that we want to have choice and multiple suppliers. The problem is that we are moving so quickly that there is minimal interoperability and minimal effort to create it.

To drive interoperability, we need a strong commercial incentive to create a neutral ecosystem.

Even something with a clear ANSI standard like SQL has interoperability challenges. It also seems like the software industry has given up on standards in favor of APIs and rapid innovation. The reality on the ground is that technology is fundamentally heterogeneous and changing. For this reason, mono-anything is a myth and hybrid is really status quo.

If we accept multi-* as a starting point, then we need to invest in portability and avoid platform assumptions when we build automation. Good design is to assume change at multiple points in your stack. Automation itself is a key requirement because it enables rapid, iterative build, test and deploy cycles. It is not enough to automate for day 1; the key to working multi-* infrastructure is a continuous deployment pipeline.

Pipelines provide insurance for hybrid infrastructure by exposing issues quickly before they accumulate technical debt.

That means the utility of tools like Terraform, Ansible or Docker is limited by how often you exercise them. Ideally, we’d build automation abstraction layers above these primitives; however, this has proven very difficult in practice. The degree of variation between environments and the pace of innovation make it impossible to standardize without becoming very restrictive. This may be possible for a single company but is not practical for a vendor trying to support many customers with a single platform.

This means that hybrid, while required in the market, carries an integration tax that needs to be considered.

My objective for discussing Data Center 2020 topics is to find ways to lower that tax and improve the outcome. I’m interested in hearing your opinion about this challenge and if you’ve found ways to solve it.

Counterpoint Addendum: if you are in a position to avoid multi-* deployments (e.g. a start-up) then you should consider that option. There is measurable overhead in heterogeneous automation; however, I’ve found the tipping point away from a mono-stack can be surprisingly low, and committing to a vertical stack does make applications less resilient to innovation.