Podcast – Ash Young talks Everything in your PC is IoT

Joining us this week is Ash Young, Chief Evangelist of Cachengo and OPNFV Ambassador. Cachengo builds smart, predictive storage for machine learning.

NOTE – We had a microphone problem that is solved at the 9 minute 19 second mark of the podcast. Start there if you find the clicking noise an issue.

Highlights

  • 1 min 34 sec: Time to Change Basic Storage Architecture
    • Converged Protocol Appliances & Nothing has changed from the early 90s
  • 7 min 8 sec: Sounds like Hadoop?
    • Underlying hardware still used proprietary protocols
  • 9 min 19 sec: Single Drive Cluster – it’s built?
    • 24 Servers and 24 Drives in a 1U; has done 48 drives
    • Working on a new design for 96 drives in a 1U
  • 11 min 52 sec: Truly a Distributed Storage Array
    • Storage focused microservers
  • 13 min 24 sec: Limitations in Operations with Hardware
    • Hinders Innovation
  • 15 min 40 sec: Lessons Learned on Managing Devices
    • Over-dependence on tunneling protocols requiring full networking (e.g. VPN)
    • Move to peer-to-peer network slicing
  • 17 min 28 sec: Software Defined Networking Topology
    • Introduce devices to each other and get out of the way
  • 18 min 33 sec: Every Storage Node is Part of the Network
    • Moves into a world of networking challenges
    • IPv4 cannot support this model
  • 21 min 06 sec: Networking Magic in the Model
    • Peer to Peer w/ Broker Introduction and then Removal from Traffic
    • Scale out for Edge Computing Requires this New Model
    • 5G Energy Cost Savings are a Must
  • 27 min 28 sec: Issues of Powering On/Off Machines to Save Money
    • Creating a massive array of smaller GPUs for Machine Learning
    • Build a fast, cheap, lower power storage system to get started in the model
  • 34 min 09 sec: Doesn’t fit the model that Edge infrastructure will be Cloud patterned
    • Rob makes a point to listeners to consider various ideas in future Edge infrastructure
  • 36 min 48 sec: State of Open Source?
    • Consortiums and open source standards
    • Creating the lowest common denominator free thing so competitors can build differentiation on top of it for revenue
    • Not a fan of open core models
  • 41 min 44 sec: Does Open Source include Supporting Implementation?
    • Look at the old WINE project financing
    • You can’t just deploy people onsite for free
  • 48 min 24 sec: Wrap-Up
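
The "broker introduction and then removal from traffic" idea from the 21 min 06 sec highlight can be illustrated with a small sketch. This is not Cachengo's implementation; the `Peer` and `Broker` classes, their method names, and the addresses are hypothetical, and the code only models the pattern in memory: the broker exchanges addresses between two nodes, and every later message goes directly from peer to peer.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """A storage node that talks directly to other peers (hypothetical model)."""
    name: str
    address: str
    known_peers: dict = field(default_factory=dict)
    inbox: list = field(default_factory=list)

    def send(self, target: "Peer", payload: str) -> None:
        # Direct peer-to-peer delivery: the broker is not involved here.
        if target.name not in self.known_peers:
            raise LookupError(f"{self.name} has not been introduced to {target.name}")
        target.inbox.append((self.name, payload))


class Broker:
    """Introduces peers to each other and then stays out of the data path."""

    def __init__(self) -> None:
        self.registry = {}
        self.messages_relayed = 0  # never incremented: the broker carries no traffic

    def register(self, peer: Peer) -> None:
        self.registry[peer.name] = peer

    def introduce(self, a: str, b: str) -> None:
        # Exchange addresses only; all later traffic flows peer to peer.
        pa, pb = self.registry[a], self.registry[b]
        pa.known_peers[pb.name] = pb.address
        pb.known_peers[pa.name] = pa.address


# Illustrative usage with made-up node names and addresses.
broker = Broker()
node1 = Peer("node1", "fd00::1")
node2 = Peer("node2", "fd00::2")
broker.register(node1)
broker.register(node2)
broker.introduce("node1", "node2")
node1.send(node2, "block 42")
```

After the introduction, `node2.inbox` holds the message while `broker.messages_relayed` is still zero, which is the point of the model: the broker's load scales with the number of introductions, not with the volume of storage traffic.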

Podcast Guest: Ash Young, Chief Evangelist of Cachengo

Technology leader with over 20 years experience, primarily in storage. Created the first open source NAS (network attached storage) stack, the first unified block/file storage stack for Linux, the first storage management software, and the list goes on.

Since 2012, I have been heavily involved in NFV (Network Functions Virtualization). I wrote a bunch of the standards and was editor for the Compute/Storage Domain in the Infrastructure Working Group for NFV. And then I started up the open source effort to close the gaps for achieving our vision of the NFVI. This was the precursor to OPNFV.

The best way to understand what I do is to imagine being a high-level marketing exec who comes up with a whiz bang product and business idea, including business plan, competitive analysis, MRD, everything, but now comes the hand-off with your engineering organization, only to hear a litany of nos. Well, I got tired of being told “No, it can’t be done” or “No, we don’t know how to do it”, so I started doing it myself. I call this skill “Rapid Prototyping”, and over the years I have found it to fill a significant gap in the product development process. When Marketing comes up with ideas, we need a way to very efficiently validate the technology and business concepts before we commit to a lengthy engineering cycle.

I’m just one person, working in a company of over 180,000 people and in a very dynamic industry. My ability to get creative and to influence businesses is never a dull moment; and I will probably be 100 years old and still writing open source software.

Week in Review: Operational Paralysis is Real

Welcome to the RackN and Digital Rebar Weekly Review. You will find the latest news related to Edge, DevOps, SRE and other relevant topics.

Mobilize your Ops Team Against Operational Paralysis

Many IT departments struggle with keeping “the lights on” as legacy hardware and software consume significant resources, preventing the team from taking advantage of new technologies to modernize their infrastructure. These legacy issues not only consume resources but also make it hard to find qualified experts to keep them operational: the older the technology, the less likely it is that experienced support is available. Even worse, new employees are typically not interested in working on old technology while the IT press obsesses on what comes next.

Freezing older technology in place without capable support or an understanding of how the product works is certainly not an industry best practice; however, it is commonly accepted in many large IT organizations. RackN has built a single, open source platform to manage not just new technologies but also legacy services allowing IT teams to actively engage the older technology without fear.

Issue: Expertise & the Unknown

  • Existing Infrastructure – legacy technology abounds in modern enterprise infrastructure with few employees capable of maintaining it
  • State of the Art vs the Past – new employees are experienced in the latest technology and not interested in working on legacy solutions

Impact: Left Behind

  • Stuck in the Past – IT teams are unwilling to touch old technology that just works
  • Employee Exodus – limited future for employees maintaining the past

RackN Solution: Stagnation to Action

  • Operations Excellence – RackN’s foundational management ensures IT can operate services regardless of platform (e.g. data center, public cloud, etc)
  • Operational Paralysis – RackN delivers a single platform for IT capable of supporting existing solutions and newly arriving technologies, as well as preparing for future innovation down the road.

The RackN team is ready to unlock your operational potential by preventing paralysis:

Series Intro: A Focus on Sustaining Operations

When discussing the data center of the future, it’s critical that we start by breaking the concept of the data center as a physical site with guarded walls, raised floors, neat rows of servers and crash-cart-pushing operators. The Data Center of 2020 (DC2020) is a distributed infrastructure comprised of many data centers, cloud services and connected devices.

The primary design concept of DC2020 is integrated automation, not actual infrastructure.

As an industry, we need to actively choose implementations that unify our operational models to create portability and eliminate silos. This means investing more in sustaining operations (aka Day 2 Ops) that ensure our IT systems can be constantly patched, updated and maintained. The pace of innovation (and discovered vulnerabilities!) requires that we build with the assumption of change. DC2020 cannot be a “fire and forget” build that assumes only occasional updates.

There are a lot of disruptive and exciting technologies entering the market. These create tremendous opportunities for improvement and faster innovation cycles. They also create significant risk of further fragmenting our IT operations landscape in ways that increase costs, decrease security and further churn our market.

It is possible to be for both rapid innovation and sustaining operations, but it requires a plan for building robust automation.

The focus on tightly integrated development and operations work is a common theme in both DevOps and Site/System Reliability Engineering topics that we cover all the time. They are not only practical, we believe they are essential requirements for building DC2020.

Over this week, I’m going to be using the backdrop of IBM Think to outline the concepts for DC2020. I’ll both pull in topics that I’m hearing there and revisit topics that we’ve been discussing on our blogs and L8ist Sh9y podcast. Ultimately, we’ll create a comprehensive document; for now, we invite you to share your thoughts about this content in its more raw narrative form.

Raise your Data Center above the Clouds

As more and more companies move workloads and storage to public clouds, CIOs are forced to account for existing data center investments. Simply abandoning existing infrastructure is not an option and IT teams need to find new methodologies to fully use their available resources.   

Of course, legacy services continue to be served in these data centers; however, there is ample opportunity for new technology to leverage the compute power by selecting a foundational provision and control solution from RackN. Our solution brings the management tooling found in public clouds to your data centers, giving IT teams the option of keeping resources in house.

Issue: Data Center as Clouds

  • Public Clouds – Shadow IT and leverage of public clouds have exposed the shortcomings of IT’s ability to deliver services in a timely manner for business needs
  • Data Center Sunk Costs – CIOs cannot simply leave existing data centers under-utilized with only legacy services and must enhance the operational skills of the IT team  

Impact: Data Center Investment

  • Data Center Management – Many services cannot be moved into a public cloud and IT teams must enhance their skills to maximize their available resources in-house
  • Automated Provision/Control – Standardizing your infrastructure foundation with RackN allows IT to manage all platforms including data centers and public clouds at scale and securely

RackN Solution: Data Center Efficiency

  • Operations Excellence – RackN’s foundational management ensures IT can operate services regardless of platform (e.g. data center, public cloud, etc)
  • Operational Improvement – RackN delivers a single platform for IT to leverage across deployment vehicles as well as ensure IT team efficiency across services

The RackN team is ready to start you on the path to operations excellence:

Standardize your Operational Chaos for Provisioning Bliss

A common side-effect of rapid growth for any organization is the introduction of complexity and one-off solutions to keep things moving regardless of the long-term impact. Over time, these decisions add up to create a chaotic environment for IT teams who find themselves unable to find an appropriate time to stop and reset.  

IT operations teams also struggle in this environment as management knowledge for all these technologies is not often shared appropriately and it is common to have only one operator capable of supporting specific technologies. Obviously, enterprises are at great risk when knowledge is not shared and there is no standard process across a team.

Issue: Infrastructure Management

  • One-Off Operations – Customized operation tooling per service leads to team dysfunction as operators cannot support each other due to inexperience with unique tools
  • IT Productivity – Data centers struggle to meet business needs with no standard process or tools; cloud platforms expose this deficiency causing business to go shadow IT

Impact: Delivery Times

  • Costly and Slow – Many data centers operate with dated processes and tools causing significant delays in new service rollout as well as maintaining existing services
  • Cross Platform Support – IT teams MUST maintain control over company services by supporting internal data centers as well as cloud deployments from a single platform

RackN Solution: Global Standard

  • Operations Excellence – RackN’s foundational management ensures IT can operate services regardless of platform (e.g. data center, public cloud, etc)
  • Operational Standardization – RackN delivers a single platform for IT to leverage across deployment vehicles as well as ensure IT team efficiency across services

The RackN team is ready to start you on the path to operations excellence:

Take part in the Digital Rebar Community

Shine a Light into the Darkness to Regain Control of your IT Services

Internal business units continue to bypass traditional IT in many organizations, creating shadow IT that leaves corporate data unsecured, exposes networks through unknown entry points, and risks wasting IT resources by paying for services the company already provides. CIOs must regain control of their IT sprawl to ensure security, resource allocation, and operational control of the business.

RackN offers IT leaders a new way forward to take back control of their services by establishing a solid foundation capable of managing internal data centers, external hosting services, public clouds, and even the upcoming edge infrastructure opportunity.

Issue: IT & Business Conflict on IT Service Delivery and Ownership

  • Shadow IT – corporate services (internal and external) are delivered without the knowledge or control of the IT department  
  • Public Clouds – services are delivered via 3rd party hardware that may have noisy neighbors, unknown geographic location, and other issues requiring IT management

Impact: Loss of Operational Control of IT

  • IT Business Alignment – IT must engage with the overall business as more than just a delivery mechanism. IT is now a strategic MUST at the executive table  
  • Operational Failure – Without operational control and management, companies are highly vulnerable to attacks and unable to deliver on customer needs

RackN Solution: Operational Flexibility & Excellence for Service Delivery

  • Operations Excellence – RackN’s foundational management ensures IT can operate services regardless of platform (e.g. data center, public cloud, etc)
  • Revitalize the Data Center – RackN delivers cloud-like operational capabilities for existing data centers offering internal business units less reason to create shadow IT

The RackN team is ready to start you on the path to operations excellence:

SRE Thinking : Reframing Dev + Ops

Last month, Eric Wright and I were able to complete a discussion that inspired my guest post for CapitalOne “How Platforms and SREs Change the DevOps Contract.” While our conversation ranged widely over the challenges of building and integrating IT processes, the key message is simple: we need to make investments in operations.

This podcast explains why I’ve been using Site Reliability Engineering (SRE) as a proxy for this DevOps inspired rethinking of operations.

I hope you’ll take the time to listen to this deep conversation about very real IT issues. Eric and I are not shy about expressing our opinions, but we’re also anti-shaming. The simple reality is that building infrastructure is hard and we all make difficult choices. My hope is that we can start sharing the fixes and helping each other out.

Podcast Episode 50 – SRE Revisited plus the Challenges of Ops and more with Rob Hirschfeld (@zehicle) 

Do these topics inspire you? Creating data center automation for SREs is our mission at RackN. We believe that well run infrastructure requires building APIs from the ground up and keeping them simple. I hope that you’ll take 5 minutes to try our latest offering, Digital Rebar Provision, and join us on the quest to drive excellence in operations.

 

Ops integration will be scary, proceed with haste!

TL;DR: Your own tool silos (and the teams supporting them) are blocking your progress.

As CEO of RackN, I talk to a lot of operations teams who have big aspirations for automation that are faltering due to internal resistance.  Generally, we’re talking to the SREs on the team.  Sadly, those SREs are often stymied by narrowly scoped teams and house-of-cards technical debt.

Last week, I examined some of my DevOps scar tissue and tweeted:  “consider, ops integration will be scary – you have to give up control of individual actions and silos.  it’s hard to give up control”


The tweet seemed to strike a nerve with others because change and control are so often at war.  It was based on a recurring theme that the RackN team sees from ops organizations: antibodies towards integrated solutions in favor of DIY projects combining disparate tools.  

It makes sense to me that operators want a sense of control and ownership; however, those same motivations are counter to the automation imperative that should be driving them forward.  Patching together a solution today is adding technical debt that becomes insurmountable when used in production.

This challenge is why so much DevOps content is targeted at organization culture instead of tools.  While this is clearly the root, I also think that our tools are not designed to work together as a system.  The fact that teams prefer it that way is a key part of the problem.

Let’s do ourselves a favor – let’s take the time to solve operations issues at the system level like we’ve been trying to do with Digital Rebar.  We’ll all move faster together.

 

 

What makes ops hard? SRE/DevOps challenge & imperative [from Cloudcast 301]

TL;DR: Operators (DevOps & SREs) have a hard job; we need to make time and room for them to redefine their jobs in a much more productive way.

The Cloudcast.net by Brian Gracely and Aaron Delp brings deep experience and perspective into their discussions based on their impressive technology careers and understanding of the subject matter.  Their podcasts go deep quickly with substantial questions that get to the heart of the issue.  This was my third time on the show (previous notes).

In episode 301, we go deeply into the meaning and challenges for Site Reliability Engineering (SRE) functions.  We also cover some popular technologies that are of general interest.

Author’s Note: For further information about SREs, listen to my discussion about “SRE vs DevOps vs Cloud Native” on the Datanauts podcast #89.  (transcript pending)

Here are my notes from Cloudcast 301, with bold added for emphasis:

  • 2:00 Rob defines SRE (more resources on RackN.com site).
    • 2:30 Google’s SRE book gave a name, even changed the definition, to what I’ve been doing my whole career. Evolved name from being just about sites to a full system perspective.  
    • 3:30 SRE and DevOps are aligned at the core.  While DevOps is about process and culture, SRE is more about the function and “factory.”
    • 4:30 Developers don’t want to be shoving coal into the engine, but someone, SREs, have to make sure that everything keeps running
  • 5:15 Brian asks about impedance mismatch between Dev and Ops.  How do we fix that?
    • 6:30 Rob talks about the crisis brewing for operations innovation gap (link).  Digital Rebar is designed to create site-to-site automation so Operators can share repeatable best practices.
    • 7:30 OpenStack ran aground with Operators because we never created the practices that could be repeated.  “Managed service as the required pattern is a failure of building good operational software.”
    • 8:00 RackN decomposes operations into isolated units so that individual changes don’t break the software on top

  • 9:20 Brian talks about how the increasing rate of releases means that operations doesn’t have the skills to keep up with patching.
    • 10:10 That’s “underlay automation” and even scarier because software is composed of all sorts of parts that have their own release cycles that are not synchronized.
    • 11:30 We need to get system level patch/security update hygiene to be automatic
    • 12:20 This is really hard!

  • 13:00 Brian asks what are the baby steps?
    • 13:20 We have to find baby steps where there are nice clean boundaries at every layer from the very most basic.  For RackN, that’s DHCP and PXE and then up to Kubernetes.
    • 15:15 Rob rants that renaming Ops teams as SRE is a failure because SRE has objectives like job equity that need to be included.
    • 16:00 Org silos get in the way of automation; they have antibodies that make it difficult for SREs and DevOps to succeed.
    • 17:10 Those people have to be empowered to make change
    • 17:40 The existing tools must be pluggable or you are hurting operators.  There’s really no true greenfield, so we help people by making things work in existing data centers.
    • 19:00 Scripts may have technical debt but that does not mean they should just be disposed of.
    • 19:20 New and shiny does not equal better.  For example, Container Linux (aka CoreOS) does not solve all problems.
    • 20:10 We need to do better creating bridges between existing and new.
    • 20:40 How do we make Day 2 compelling?

  • 21:15 Brian asks about running OpenStack on Kubernetes.
    • 22:00 Rob is a fan of Kubernetes on Metal, but really, we don’t want metal and VMs to be different.  That means that Kubernetes can be a universal underlay which is threatening to OpenStack.
    • 23:00 This is no longer a JOKE: “Joint OpenStack Kubernetes Environments”
    • 23:30 Running things on Kubernetes (or OpenStack) is great because the abstractions hide complexity of infrastructure; however, at the physical layer you need something that exposes that complexity (which is what RackN does).

  • 25:00 Brian asks at what point do you need to get past the easy abstractions
    • 25:30 You want to never care ever.  But sometimes you need the information for special cases.
    • 26:20 We don’t want to make the core APIs complex just to handle the special cases.
    • 27:00 There’s still a class of people who need to care about hardware.  These needs should not be embedded into the Kubernetes (or OpenStack) API.

  • 28:00 Brian summarizes that we should not turn 1% use cases into complexity for everyone.  We need to foster the skill of coding for operators
    • 28:45 For SREs, getting Operators into coding & automation is essential.  That’s a key point in the 50% programming statement for SREs.
    • In the closing, Rob suggested checking out Digital Rebar Provision as a Cobbler replacement.

We’re very invested in talking about SRE and want to hear from you! How is your company transforming operations work to make it more sustainable, robust and human? We want to hear your stories and questions.