SDO 001 - What is DataOps? - Christopher Bergh
What are your thoughts on DataOps?
Between MLOps and DataOps, it may seem like X-Ops is the new trend in the data space, but make no mistake: these concepts are rooted in years of data practitioners sharing how they solved real data problems within businesses. The term DataOps was first coined by Lenny Liebmann in 2014 as “the discipline that ensures alignment between data science and infrastructure.” Though the Modern Data Stack has taken up the most mindshare since then, the pain points that DataOps addresses have only grown, underscoring its necessity. Specifically, the recent rise of data engineering and the push toward data-centric AI signal DataOps' importance in delivering high-quality data to customers and internal business stakeholders.
Hear from Christopher Bergh, CEO, Founder & Head Chef of DataKitchen:
Hear from "XYZ" highlights real-world use cases for all of us to learn best practices and upcoming trends within the DataOps space. When I asked my network who I needed to talk to regarding DataOps, Christopher Bergh was repeatedly mentioned. It became clear, after looking further into his profile and reading the DataOps Manifesto, that Christopher is leading the charge for the DataOps movement.
What is DataOps, and why should organizations care about it?
Christopher: “Yeah, I guess kind of two related reasons. One is like if you actually do the work in data science or data engineering, your job sort of sucks, honestly, because you spend a lot of time doing things that aren't really delivering value. You have a lot of failure. A lot of people are quitting and being upset.
We did a survey with 700 data engineers with data.world last year, and 78% of data engineers were so stressed they wanted a therapist on their job, and so you're caught between this kind of, ‘I gotta work really hard,’ ‘I'm always behind,’ ‘my customers always want new stuff,’ and then it's breaking left and right.
And so your life sort of sucks in a lot of ways. If you go on the other side, like you look at the people who are really trying to influence with data, right? Business people, maybe people on your website, they're dissatisfied as well. They want more. They don't understand it. And so there's all this potential around data and all this potential to sort of help and change.
And the people who are kind of working to make that happen are very unhappy, and the results aren't there. Most projects fail, most models don't get into production. And so I think that's really what we're trying to address with DataOps: that sort of failure and pain that people have doing their work.”
An organization’s data maturity falls on a spectrum. At what stage in the data maturity journey should an organization start taking DataOps seriously?
Christopher: “It depends on how much you like being a hero. Like all the people say, ‘wow, you worked all weekend,’ ‘you're amazing,’ ‘I love you.’ And like, how long can you be a hero? Right? And what happens with people is if they don't start thinking about building a system to make their life easier where they can test and automate and observe and iterate, they get burnt out and then they end up honestly quitting.
And so we all wanna do the cool things, but if you're doing the cool thing as a hero, no matter if your team is big or small, no matter if you're at the start of a project or the end, you're creating problems. For me, when I was younger, I spent a lot of time kind of being the hero on the project, and I left a lot of sort of hairballs for other people to pick up, and it's not cool.
You write a bunch of code, it's untested, it's not version controlled, you change it right on production. Every software engineer goes, ‘ew,’ and we do the same thing in data, but no one goes ‘ew.’ And so I just want everyone to collectively go ‘ew, you're hairballing it.’ And that's my quest, to get the entire data community to just start doing that collective ‘ew’ that software engineers do.”
What's something that you believe the broader data industry is missing about DataOps, but should care more about?
Christopher: “Well, I think it's observability-led DataOps. So I've come to believe that we are, as a company, DataKitchen, not gonna change right now how people build things. They've already built things, they're already being heroes. I've been saying the same message now for six years and the world hasn't changed.
And so what I'm saying first is observe what's happening with your system. Stick little thermometers at various points in the process and measure is it running, is the data right, is the model still predicting, and centralize that information. And you're gonna be really surprised at what you see.
That information, that data is gonna drive your behavior change. If you can measure errors, either from poor code that's getting into production or poor data that's getting into your system, if you can see that you're late, if you can see if your system's utilized, that source of information is actually really insightful.
So observability first, get the information, stick a bunch of thermometers all over your data pipelines, your models, your viz, measure all that stuff. Then look at the data and say, ‘where are the bottlenecks, where are the errors? Let's just automate them.’
And to me, I think that's gonna be the gospel we're talking about first because people are gonna continue to hero out and build these systems and they're in production, they don't wanna change them, they're afraid to change them. And so once you stick the thermometers in, once you start seeing the problem, you'll say ‘oh wow, maybe I should put some automation on it, maybe I should develop some more unit tests or system tests, maybe I should figure out deployment and version control.’ All these things will start to come.”
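To make the "thermometers" idea concrete, here is a minimal Python sketch of what sticking a few measurements on one pipeline step and centralizing them might look like. It is an illustration only, not DataKitchen's product or Christopher's implementation: the `Thermometers` class, the `load_orders` step, the check names, and the sample rows are all hypothetical and made up for this example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Reading:
    """One measurement taken at a named point in a pipeline."""
    step: str
    check: str
    passed: bool
    detail: str = ""


@dataclass
class Thermometers:
    """Hypothetical central store for readings from every pipeline step."""
    readings: list = field(default_factory=list)

    def record(self, step, check, passed, detail=""):
        self.readings.append(Reading(step, check, passed, detail))

    def timed(self, step, func, *args, max_seconds=60.0):
        """Run one step, recording whether it ran and whether it was late."""
        start = time.perf_counter()
        try:
            result = func(*args)
        except Exception as exc:
            self.record(step, "ran", False, repr(exc))
            raise
        elapsed = time.perf_counter() - start
        self.record(step, "ran", True)
        self.record(step, "on_time", elapsed <= max_seconds, f"{elapsed:.3f}s")
        return result

    def failures_by_step(self):
        """Count failed checks per step to show where errors cluster."""
        counts = {}
        for r in self.readings:
            if not r.passed:
                counts[r.step] = counts.get(r.step, 0) + 1
        return counts


def load_orders():
    # Stand-in for a real extract step; the rows are made-up sample data.
    return [{"id": 1, "amount": 20.0}, {"id": 2, "amount": None}]


monitor = Thermometers()
rows = monitor.timed("load_orders", load_orders)  # is it running, is it late?
monitor.record("load_orders", "row_count>0", len(rows) > 0, f"{len(rows)} rows")
missing = sum(1 for r in rows if r["amount"] is None)  # is the data right?
monitor.record("load_orders", "no_null_amounts", missing == 0, f"{missing} null")

print(monitor.failures_by_step())
```

Nothing in the pipeline step itself changes here; the measurements sit around it. That matches the point of the interview: observe the system as it already exists, then let the centralized readings show which bottlenecks and errors are worth automating away.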
Person Profile:
Christopher Bergh is the CEO and Head Chef at DataKitchen. Chris has more than 25 years of research, software engineering, data analytics, and executive management experience. At various points in his career, he has been a COO, CTO, VP, and Director of engineering. Chris has an M.S. from Columbia University and a B.S. from the University of Wisconsin-Madison.
Chris is a recognized expert on DataOps. He is the co-author of the “DataOps Cookbook” and the “DataOps Manifesto,” and a speaker on DataOps at many industry conferences. Chris began his career at the Massachusetts Institute of Technology's Lincoln Laboratory and NASA Ames Research Center. There he created software and algorithms that provided aircraft arrival optimization at several major airports in the United States. Chris served as a Peace Corps Volunteer Math Teacher in Botswana, Africa.
What are others saying in the DataOps space?
DataOps: Why Big Data Infrastructure Matters - Lenny Liebmann
What: An early warning of the limitations of big data only focused on data science.
Why: See the origins of the DataOps movement as defined by Lenny Liebmann.
Who: You would like to understand how we decided on specific frameworks within technology.
What: A reflection on DataOps over the past seven years, covering the trends driving the movement and the difference between DataOps and DevOps.
Why: The author, Andy Palmer, was one of the earlier voices in popularizing DataOps.
Who: You are trying to understand why DataOps is important and how it compares to DevOps.
Download The DataOps Cookbook | DataKitchen
What: A ~200-page, in-depth book of “methodologies and tools that reduce analytics cycle time while improving quality.”
Why: An excellent reference to guide your journey into DataOps.
Who: You have moved beyond learning about DataOps and want to start implementing DataOps.