SDO Rewind - Navigating the Complexity of Healthcare - Ben Doremus
A lot has changed since this episode was originally released in October 2022. Specifically, Ben has been promoted from VP to CTO of Magenta Care Continuum, where he's continuing his push for high-quality healthcare data! If you want to hear his perspective from the CTO angle, then I highly encourage checking out this panel I did with him as well:
— Mark
What are your thoughts on healthcare data?
Why is healthcare so slow in adopting AI and precision medicine within the US? There are a multitude of reasons (policy, high regulation, etc.), but the largest challenge is data quality. In my first data science role, I worked with a registry dataset with 80% of all ophthalmology records in the US, and it was one of the hardest datasets I had ever navigated. This startup had an exceptional data model and an exceptional data engineering team… but we didn’t control the source data. Since US healthcare is so fractured, every hospital (and even different departments within one) has different electronic health record software (e.g. Epic). Even within the same software, they would have different configurations for collecting data. Scale that up to any data product, and you are going to be slowed down by a myriad of data quality issues. If we want to advance data in healthcare, we have to drastically improve data quality, and it’s why I’m so obsessed with DataOps.
Hear from Ben Doremus, VP Technical Operations at Magenta Care Continuum:
Hear from "XYZ" highlights real-world use cases for all of us to learn from. I often joke with my friends that you must be a masochist to enjoy working with healthcare data due to the messiness and complexity. It has everything: high regulation, high stakes of improving or harming lives, and high domain knowledge, all collected by electronic health records that are universally hated by doctors entering the data. One person who has repeatedly jumped into the chaos of health data to bring order and insights is Ben Doremus.
Electronic healthcare data is some of the messiest data I have worked with. What are your top lessons from scaling data pipelines in this domain?
Ben: “I mean, it just comes down to trust no one and trust no one data source either. Like you gotta have multiple sources of truth. It's one thing that we found time and time again. When we get data, we get data from revenue cycle, we get it from clinical, we get it from quality. We try to get data from different systems that don't talk to each other so that we can try from there to triangulate to the truth for what's really going on.
Because especially when you're looking for like clinical truth, healthcare data is based on claims. It's based on billing cycles. It's not actually centered on any of what's really going on clinically. You can try to extrapolate, you can use, you know, SNOMED ontologies or whatever else you like, but it's still all based on billing codes, so you never really know what's going on.
So, multiple sources of truth. If you can get the notes from the physicians, that's a gold mine, but they also use templates. So it's super messy there as well, so you can't trust it. We found multiple notes where within the same note, the provider refers to the person as male and female inside the same note.
So like, you're just using a template, this is not helping anyone here. So yeah, multiple sources of truth, that is like a hundred percent where you start. But then once you've got your multiple sources of truth, you're still kind of in hot water. What do you do when you have conflicts? What do you do when you find logical inconsistencies in the data?
And this is where I would say 80 to 90% of my work has come, is in figuring out how to reconcile those situations. So it's really all coming down to business rules and business logic, and for anybody who's implemented reams of business logic before, you know that it's really hard to get all of that to align, to be logically consistent.
It's really hard to keep it up to date, to keep these things consistent. If you go from ICD-9 to ICD-10, what are you gonna do? How are you gonna deal with that? So all of this business logic implementation has to be done really carefully in a way that's nice and modular, easy to keep up to date, well documented.
I hear people go on about how, ‘Hey, I don't need to comment my code because it is self-referencing.’ It doesn't work when you need context. That doesn't work when you've got business implementations. You gotta have really good contextual comments to make all that stay up to date and usable a year or two down the road when all of your assumptions fall down.
You have to have access to clinicians and coders. They're not the same thing. They have totally different skill sets, clinicians and coders, and you need both, and they need to be like at your elbow while you're doing this.
One of the great things that we did at the first healthcare company I was at was we actually mixed up our teams. So there wasn't a tech side and there wasn't a clinical side. They were blended teams. So the team that I oversaw had nurses on it. They were part of the team. Their responsibility was the same as the software engineers': to make sure that the data we're putting out makes clinical sense. And getting those people working together is really hard.
So you gotta have the right people to do it. You gotta have people who want to learn, who are curious, who want to expand their horizons, both on the software side and the clinical side, right? Like if you've got a clinical person who doesn't want to learn how to fix their computer, they're not gonna do great in this scenario.
But there are so many nurses from the ORs and so many coders who want to expand their horizons and learn more and grow and figure out how this thing called SQL works. Like if you can empower them and get these people together, oh, it's magic. It's so cool, like a different version of multiple sources of truth, right?
It's like do you have the coders defining the truth and the clinicians defining the truth and the software engineers defining the truth. You need everyone together and like that's how you work with really messy data is you never trust it and you get multiple opinions and you figure out how to make sure those opinions are scalable and maintainable.”
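To make Ben's "multiple sources of truth" point concrete, here is a minimal Python sketch of triangulating one field across independent feeds and flagging conflicts for a human to review. This is an illustration, not Ben's actual implementation: the source names, the status labels, the majority-vote rule, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Reconciled:
    value: Optional[str]  # the value we decided to trust, if any
    status: str           # "missing", "single_source", "agreed", "majority", "conflict"
    votes: Dict[str, int]  # how many sources reported each value


def reconcile_field(values_by_source: Dict[str, Optional[str]]) -> Reconciled:
    """Triangulate one field across sources (hypothetical business rule).

    Assumed rule: a strict plurality wins; a tie is a conflict that gets
    routed to a clinician or coder instead of being silently resolved.
    """
    # Normalize and drop sources that did not report the field
    reported = {
        source: value.strip().lower()
        for source, value in values_by_source.items()
        if value and value.strip()
    }
    if not reported:
        return Reconciled(None, "missing", {})

    counts = Counter(reported.values())
    (top_value, top_count), *rest = counts.most_common()

    if len(counts) == 1:
        # Everyone who reported agrees; one lone source is weaker evidence
        status = "agreed" if len(reported) > 1 else "single_source"
        return Reconciled(top_value, status, dict(counts))
    if rest[0][1] == top_count:
        # Tie between the top values: do not guess
        return Reconciled(None, "conflict", dict(counts))
    return Reconciled(top_value, "majority", dict(counts))


def has_conflicting_sex_mentions(note: str) -> bool:
    """Flag a note that refers to the patient as both male and female."""
    found = {m.lower() for m in re.findall(r"\b(male|female)\b", note, flags=re.IGNORECASE)}
    return len(found) > 1


if __name__ == "__main__":
    # Hypothetical feeds, named after the ones mentioned in the interview
    result = reconcile_field({"revenue_cycle": "F", "clinical": "f ", "quality": "M"})
    print(result)
    print(has_conflicting_sex_mentions("54-year-old male. Female patient reports pain."))
```

Keeping each rule in its own small, commented function is one way to get the modularity Ben describes: when an assumption changes, you update one rule rather than hunting through a pipeline.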
Something that's important but not discussed enough is data security. How do you ensure you keep patient data protected as you scale these complex data systems?
Ben: “So having been at startups, you get to wear multiple hats, and I've had the pleasure of going through three different HITRUST implementations now, and the first time I rolled my eyes the whole time, I'm like, ‘You're gonna make me do what?’ It's, you know, some of these things are written a decade ago, and they're all about in person work and dial up connections and things like that where you're like, ‘Oh, this is ridiculous.’
And the second time through I was like, ‘Oh. I see why they did this now. This is what they're protecting with that. And I can ignore all the other stuff.’ And now like I'm right in the middle of my third time, and this is the first time I've really done it from scratch, where I've got a blank slate, a company that hasn't existed yet, and I get to literally build the company with compliance in mind.
And it's been a game changer for me to view it that way. So, if I could suggest to anyone what to do, it'd be build the company with compliance in mind. Have a framework identified early. HITRUST is actually a really good one. I've been really impressed with it. We've gotten some template policies and procedures that tell us, ‘here's all the controls you need to hit,’ and once you can step back and really see the whole landscape of what you're trying to protect with this, it makes the nuance of each individual control way, way easier to implement and understand contextually. When you just get this big old list of, ‘here's my security checklist,’ it sucks. That is drudgery at that point, right? But if you can start to build a system out of it rather than just little check marks, it really works, like it really comes together in a nice way.
And you know, there's some new technologies that make this really easy too. Identity management providers, you know, Oktas and stuff like that. Holy cow. This just makes it so much easier to manage your user access and your permission levels, and it makes it easier to gather all these things and, you know, stuff that didn't exist just a few years ago when I was doing this, or at least it wasn't popular.
So there's more and more things coming on that aren't necessarily built for healthcare IT and all of that. But really, if you know the general landscape of what's available, it's way easier now than it used to be. Especially with cloud providers, right? Like they take a huge burden off when you're using the cloud rather than having all these on-prem things.
I know it's a totally different story when you're a bigger company and a totally different story if you are a healthcare provider yourself. But from my seat of small companies aiming to help healthcare be better, it's not as bad as it used to be. It's getting even better, and there's all these new laws coming out around interoperability that I am itching to see if they make a dent in the problem, because they also help with security. You know, they themselves have their own controls in place and they take off some of the burden that we have right now in sending and receiving data securely. So it's ever changing. The landscape is always on the move, and staying on top of what's available to you is probably the most important thing.
Have a framework and know what your options are. I'd say, boom, there's your two.”
Not only have you scaled data systems, you have scaled data teams as well. What guides you in growing high-performing data teams tackling the complexity of healthcare?
Ben: “This is really hard to answer because even in the five years I've been doing this, I have seen so much variety in what is considered a data team. I've had teams that I would term production analytics, we call them data engineers, and I've had teams where we were serving essentially a data infrastructure as a self-service platform, data engineers.
I've had teams that were really just converting JSON to CSV, data engineers. Like all of these different things fall under the same title. So when you ask for, ‘give me advice on building data teams,’ well, like what type? They're all so different. The job I had prior to this one, I was in charge of two different data engineering teams working on two different products with two widely different skill sets.
There was zero overlap between them, like seriously, zero overlap. And the context switching for me was brutal, right? Because on one team I'd be like, ‘all right, we are implementing a domain-specific dbt model to solve this problem,’ and on the other side we're trying to deal with Airflow scaling issues.
There's just no technical consistency between them, but there are still a lot of things you can do in terms of managing any team at scale. Some of the things I mentioned in the first question. Pull together the content experts with the people doing the coding. You can't separate them.
And to that point, some of the worst performers were people who had blinders on and said, ‘Not my job.’ People who said, ‘I was told to do this, so I did this,’ and they didn't gather context, and they didn't go figure out why they were being asked to do it. That's a recipe for failure. Probably in any job anywhere, but especially in the data space where context is so important.
So curiosity, I guess, is what it is. If you can foster curiosity and empower people to have a broader ownership over what they're doing, rather than just, ‘I got my job, I do my thing, here's my ticket, check,’ that is just necessary. You can't scale a team of any size. You can't start a team of any size if they don't have that curiosity, but it definitely won't work at scale because one of the problems that I had earlier too was like we had disparate people coming into the data team.
I came in from data science, these other people came in from analytics, and people had their projects that came with them. So then they said, ‘This is my vertical. These are the projects that I take. I take this side of the ticket and then I'm done.’ And when we moved from the tech team and services team to the blended model, another thing we did was we said, ‘We're tearing down all the walls. You don't have projects anymore. That is not your code base, this is everyone's code base.’ You have to be able to move in and out of all of the different projects so that you can see the whole ecosystem of what we're dealing with. Otherwise, you're just gonna make assumptions that are bad. So now you gotta have a primary code owner.
They can handle all the PRs and all that, but they shouldn't be doing all the coding. You need other people in there. You can't have silos is really what it is. You can't have silos in technical skills. You can't have silos in the code that's being worked on. You do want the code to be separated though, especially early on.
Modularity is super, super important because things are gonna thrive and things are gonna die. So building a modular code base, very important. But don't let people find their niche and hold on to it and say, ‘This part is mine and I don't need to know anything else.’”
Person Profile:
Ben Doremus is the Vice President of Technical Operations at Magenta Care Continuum. Feel free to connect with him on LinkedIn to learn more about his work.
What are others saying in the DataOps space?
Data Pipelines in the Healthcare Industry
What: A great intro describing clinical workflows and how they're represented in data pipelines.
Why: Healthcare requires so much domain knowledge to get started, and this article can help.
Who: You are relatively new to healthcare and want to gain a high-level overview of working with such data.
Artificial Intelligence and Machine Learning in Software
What: The U.S. FDA finally released its guidelines on using AI and ML within healthcare after years of anticipation.
Why: This is a BIG DEAL, as companies finally have guidelines on implementing AI and ML software within clinical workflows and whether or not it’s considered a medical device.
Who: You are creating production ML systems within clinical settings.
How to build an effective DataOps team
What: High-level overview of the various roles within an effective DataOps team.
Why: Insights into what roles are needed, how they support DataOps efforts, and how to tailor it to your organization.
Who: You are a leader planning data strategy and/or headcount.
About On the Mark Data:
On the Mark Data helps brands connect to data professionals through captivating content, such as this newsletter and other featured content! Please feel free to check out my website to learn how I can support your data brand via influencer marketing or content and go-to-market strategy consulting.