SDO 005 - Why You Need a Data Champion - Mona Rakibe
What are your thoughts on getting buy-in for data quality?
A challenge I have experienced repeatedly, and heard others struggle with, is getting buy-in for solving the root issues of data quality. The data profession constantly hears the mantra of “garbage in, garbage out,” yet we are often left resolving this garbage downstream, and at worst it is never solved at all. Many times, data quality is only prioritized as a reactionary response to a fire, such as a model going down or a customer seeing wrong numbers. Thankfully, I think our industry is starting to recognize that data quality matters as much as the data itself. One of the strongest signals I see is Andrew Ng’s push for data-centric AI. Yet how can we get buy-in from key business stakeholders who are not technical? I’m excited to dive into this topic with my interview with Mona Rakibe.
Hear from Mona Rakibe, Co-founder & CEO of Telmai:
The “Hear from XYZ” series highlights a real-world use case for all of us to learn from. Founders in the data space are often on the edge of data technology, solving some of the trickiest problems. One problem that seems easy on the surface, but is much harder in reality, is data observability. One would think that having data about data is standard, yet many organizations still struggle to understand a baseline of their data quality and how data moves through their systems. Mona Rakibe, CEO and Co-founder of Telmai, is currently at the forefront of helping companies overcome this challenge.
Being obsessed with data reliability gives you a unique perspective in the data space. What have you seen work in companies that lead to stronger data teams and cultures?
Mona: “That's a great question, to be honest. I have closely worked with many, many teams, specifically in the context of data observability. It's a new way of thinking. It's a new way of understanding the health of your data. A couple of things stand out for companies that are really doing it well.
First, there is an internal champion. This champion can be a data architect, a data product manager, a data engineer, or even a CDO. It doesn't have to be top-down; it can be bottom-up. There's this personality trait where they want to challenge the status quo, where they've had enough of dealing with bad data and the firefighting.
We've got to get the tools, processes, and people together to fix this problem. Because this is not a simple problem to solve, and it cannot be done by tools alone. So it needs a lot of getting people together, getting the processes in shape, and so on.
The second thing that I have noticed work very well is that there is a strong focus and priority around data reliability, and the entire DataOps team is very well aligned on that focus.
One of my favorite teams is one where the data product manager, the data engineer, the data architect, and the solution architects are on pretty much every single call around implementation, how data observability is going to be used, what the priorities are, and so on. And they're getting a very good ROI out of this initiative, because everyone is so well aligned.”
You and your company Telmai were part of the S21 cohort of Y Combinator. What was your main takeaway from that experience in helping you grow your data company?
Mona: “So if you're familiar with Y Combinator, its biggest motto is 'build something that people want.' And throughout the program, there is a tremendous emphasis on this part. It's a very simple message, build something that people want, but it's very, very deep. And by 'want,' your product has to constantly give value to people.
So a lot of what we were doing came down to focusing on product value. We have been obsessed with delivering that customer value and with shrinking the time to value. And if you tie it back to the data ecosystem, it's not very easy. If I had come to you and said, 'Hey, I have this ML/AI-based tool which will improve your data reliability,' maybe I'd have your attention, but it wouldn't move the needle for me.
So what we feel works best for technical people, and for people who really want to see things working, is a proof-of-concept-based approach, where we founders literally worked as solution architects for those customers and walked the journey with them until they were constantly getting value out of our product.
Right? So in all of our initial POCs, which ran two to three weeks, we plugged data observability into production systems where users could actually start seeing the value in their production environment. Then they start getting these aha moments: 'Oh, this is awesome. I couldn't have caught this without observability in place.'
This way they're able to quantify the outcome in their world, and in turn we are building a product that solves real-world problems. So this, to me, was the biggest takeaway from Y Combinator: really working in lockstep, finding that customer value, and finding the repeatability of that value. All of this, I feel, is tremendously important for building a strong business foundation.”
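To make the idea of “plugging observability into a production system” concrete, here is a minimal sketch of the kind of baseline-and-drift check such a POC provides. This is an illustration only, not Telmai's implementation; the field name, metrics, and three-sigma threshold are all hypothetical placeholders.

```python
# Toy data observability check: learn a baseline from historical batches,
# then flag metrics in a new batch that drift far from that baseline.
# The "amount" field and 3-sigma threshold are hypothetical examples.
from statistics import mean, stdev

def profile(batch):
    """Profile one batch of records: row count and null rate for a field."""
    values = [r.get("amount") for r in batch]
    nulls = sum(1 for v in values if v is None)
    return {"row_count": len(batch), "null_rate": nulls / len(batch)}

def build_baseline(history):
    """Baseline = (mean, stdev) of each metric across historical batches."""
    profiles = [profile(b) for b in history]
    return {m: (mean(p[m] for p in profiles), stdev(p[m] for p in profiles))
            for m in profiles[0]}

def check(batch, baseline, sigmas=3.0):
    """Return the metrics that deviate more than `sigmas` std devs."""
    alerts = {}
    for metric, value in profile(batch).items():
        mu, sd = baseline[metric]
        if sd and abs(value - mu) > sigmas * sd:
            alerts[metric] = value
    return alerts
```

The point of the sketch is the “aha moment” Mona describes: nobody wrote a rule saying “null rate must stay under 3%,” yet a batch with a sudden spike in nulls gets flagged because it departs from the learned baseline.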
As a technical founder, you constantly straddle being a technical expert and a salesperson. What advice can you give to data teams to help them sell the importance of data quality and infrastructure internally?
Mona: “I remember having this conversation with you, Mark, when we first met, and I think this is one of the most important aspects of what we are building in data observability. How do you sell the value of data observability, especially to technical people?
My number one recommendation is to treat data quality itself as a product, not just data as a product. I know all of us have heard 'data as a product' a lot; I'm specifically saying data quality as a product too. When you treat data quality as a product, you've got to start thinking about your MVP: something that can show the value of what you're doing in weeks, or at most months, right?
Typically when we talk about data quality, people think of it as an initiative that's going to take months or years, but we don't have to boil the ocean. The moment you start thinking of it as a product, you pick the tools that can show value immediately, and today we have the technology to support those kinds of faster wins. Then you keep showing the incremental value, which makes the internal justification of data quality much easier.
Once you do that, the other important aspect is really showing the value of data quality and an initiative like data observability. I generally like to start with the cost of not doing anything at all. There is always a cost of doing nothing, and different teams measure it differently, but at the very least there is an operational cost. Without something like data observability, some of your most highly skilled people, data engineers, data architects, data product owners, are spending their time investigating, doing root cause analysis, and firefighting data issues, which definitely causes a lot of operational inefficiency in your system. And oftentimes this is easy to quantify, to show that a lot of time is being wasted.
The other thing you can easily show is lost opportunity. A lot of the time, your data engineering teams are supposed to be doing work that shows the value of the product, but what they are actually doing is finding issues in it. So how do you make sure your teams are focusing on the things that are core and important to your business? Time spent building validations and rules, or investigating issues, instead of building, is a lost opportunity.
And the last thing is loss of revenue. If you look at historic examples, data quality issues are very expensive; a simple human error can cost millions of dollars. I'll give you an example: someone recently told me about a human error, an extra zero added to a number, which caused an improper $3 million inventory write-off. The inventory was perishable produce, so there was a direct loss of revenue there.
At the very least, you can classify your observability initiative and its importance into these three buckets, then push this internally. And the first and most important thing: please treat data quality as a product. Start with an MVP; pick a low-hanging but highly impactful piece. Don't try to monitor all the tables at full fidelity. Start with the highest-impact use case. That will definitely help you get started on this journey.
And a lot of this goes back to my first answer that somebody has to champion this, right? Somebody has to be there to say that, "I'm gonna work on this and this is important for my organization and I'm gonna make sure that we kind of rally up the people, processes and tools to get this started." So that's the champion for the team.”
Person Profile:
Mona Rakibe is the Co-founder & CEO of Telmai. Feel free to connect with her on LinkedIn to learn more about her work. In addition, Mona and her team are extremely passionate about data reliability; check out the free course they created, the Data Quality and Observability Academy, linked below to learn data quality best practices!
What are others saying in the DataOps space?
Data Quality and Observability Academy
What: A free learning module to help you start your journey of understanding data quality and observability best practices.
Why: The “Data Quality Indicators” section was extremely helpful for me to share with my team to start more meaningful conversations on how we can start improving data quality.
Who: You have become a “data champion” at your organization and want to improve data quality.
Why it's time for 'data-centric artificial intelligence' | MIT Sloan
What: A brief introduction to Andrew Ng’s emphasis on data-centric AI.
Why: Learn how improving data quality can help organizations reach the full potential of AI.
Who: You want to stay up to date on the latest trends within the data space.
The Existential Threat of Data Quality
What: A warning about the shortcomings of the Modern Data Stack and its impact on data quality.
Why: We should reflect on why the Modern Data Stack has risen to popularity, its implications, and how we can iterate to improve the data architecture many companies have now.
Who: Anyone who has implemented or used the Modern Data Stack and thought, “there has to be a better way.”