SDO 017 - Navigating the ML, AI, and Data (MAD) Landscape as a VC - Matt Turck

Interview: Matt Turck, Managing Director at FirstMark Capital

Mar 10, 2023

What are your thoughts on venture capital?

Last week we discussed the changing data market from a founder's perspective. For this edition, we are going to the other side of startups to understand the VC perspective. I highly encourage checking out the articles at the bottom of last week’s interview to get the historical context of our market. Then check out the MAD Landscape articles in this edition to understand the current context.

My first interaction with venture capital was in grad school when my co-founder and I applied to Pear VC for our health data startup idea. I quickly realized I was out of my depth as I was asked about TAM, competitors, and other details about the health market... rejected. We then applied to Lean Launchpad, got an interview, and again crumbled when VCs started questioning our business model… rejected. A few more rejections that year and a pattern became clear: VCs are some of the best individuals at identifying holes in your proposed ideas, for they are in the business of saying “no.”

The more I learned about venture capital, the more I understood why they must be quick to say “no.” VCs are in the challenging position of seeing hundreds of startup pitches but only having the ability to allocate funds to a handful of companies. In addition, venture capital is beholden to the “power law,” where a successful return for their respective firm’s fund is often driven by a handful of companies in their portfolio. It’s a tough job, but this is precisely why VCs are some of the best individuals to learn from concerning changing trends within markets.

— Mark

Hear from Matt Turck, Managing Director at FirstMark Capital:

Hear from "XYZ" highlights real-world use cases for all of us to learn best practices and upcoming trends within the DataOps space. One of the coolest things about creating content is that it can be a platform to enable you to meet people you look up to. Matt Turck is one such individual who has significantly shaped how I view the data market. Ever since I started my career in data, I have looked forward to Matt and his team’s release of the MAD Landscape and seeing how the data industry has grown. This landscape has inspired many of the questions I’ve asked leaders in this newsletter regarding how they are navigating such a fractured data market. So I’m beyond excited to share with you all this interview with Matt and learn where the data industry is potentially going in the future. Enjoy!

With the Fed aggressively raising interest rates, institutional capital has become expensive again, directly impacting venture capital. What impact will this economic shift have on the MAD Landscape in the next few years?

Matt: “I do think that we are in a very different world all of a sudden. It's true for any kind of startup, but certainly for MAD, including data infrastructure companies. In the wake of the Snowflake IPO, there was, as we all know, an enormous amount of excitement around data infrastructure, the rise of the Modern Data Stack, all those things.

So you end up with a bunch of very interesting companies getting started and a lot of venture capital that was more than happy to fund those companies and then fund them again, and then six months later, fund them again. Occasionally, again and again. And that was a lot of fun and a little dizzying, but ultimately led to a lot of categories emerging overnight and getting very crowded overnight. And it was not a completely irrational approach because certainly the Snowflake IPO was an unlock for an entire space.

And if you believe, as I do, that ultimately every company is a data company, meaning every company needs to have a data infrastructure and be data-driven, then the market for this is very large because the market is ultimately everyone, right?

So it was not completely irrational. At the same time, this led to a market that was super crowded, and now the music has stopped, and everyone needs to or is in the process of adapting to a new reality. And look, I'm a huge fan of the data ecosystem, machine learning, AI, and all the things. But equally, I think we should be all very transparent and honest with ourselves about what's happening, and what's happening is not necessarily a lot of fun. I think something's gonna have to give, right? You can't have a lot of very young, often single-feature kind of companies, many companies being below five million in ARR.

And then on the other hand you have customers who are probably gonna be much more selective and discerning in their buying process because they are gonna be under strict scrutiny by their CFO. A lot of supply, arguably at least for now, less demand. And then, on top of that, less easily available venture capital money. All the things are not gonna be able to work together for a very long time. Unfortunately, I expect that this is gonna be pretty tough here in data infrastructure in 2023, perhaps 2024.

By all means, I hope I'm wrong. I hope the macro environment comes roaring back. I have all sorts of companies that would benefit from that. So, by all means, I'm with everyone on that. But I think we are at the beginning of a more Darwinian period where in each category there's gonna be one or two companies that survive and then a bunch of companies that are not; and when I say they're not, it doesn't mean they necessarily go bankrupt.

Unfortunately, some of them will. But I think it means that a lot of companies get gobbled up, very often for not very much money and not generating the kind of return that the founders, the employees, and the investors were hoping for. But I think we are at the beginning of this trend as opposed to the end of it. I think for a lot of those companies the moment of reckoning has not even happened yet because so many companies raised a bunch of cash in 2021, and they still have a year, two years, sometimes three years of cash.

So they don't really have to worry about the immediate reality of what raising a round will entail. But I think that moment is gonna come at some point, and when it comes, that's when it accelerates everything in terms of "okay, our business is not working. We can't raise another round, therefore we need to find our home. Oh shit, we cannot find our home, therefore we are going to have to go and just move on and do something else."

And look, I think a lot of those companies are just early, right? A bunch of companies were started in 2019, 2021, so they're one, two, three years old. And it's just not a lot of time to build a business, even with great founders, even with a smart positioning of the product, even with great execution, even with venture capital money. There's a little bit of a time-on-Earth dimension that matters immensely in building a company. And it is just not enough time to get the kind of escape velocity that will enable you to raise more money and therefore be able to survive in a tough market.

So again, I hope I'm wrong. I hope somehow it all works out. I have a lot of friends in this industry. There are a lot of people I deeply respect in this industry and all the things. But I think we're gonna have a tough couple of years ahead of us.”

We are currently seeing an evolution away from the “Modern Data Stack” within the data market. Given this change, what type of data infrastructure startups are you most excited to invest in?

Matt: “Yes, I ask myself that question a lot these days. So I do agree that there's at least chatter about evolving away from the Modern Data Stack. Although I think the reality is that it's gonna be a lot more nuanced than this. But, like everybody else, in conversations I'm hearing about the whole idea of paying an ELT or ETL vendor a lot of money, and then your data warehouse a lot of money, and then the transformation layer a lot of money, and the visualization layer a lot of money, and then stitching everything together, and that it's expensive and time-consuming.

I'm also hearing about that general philosophy, which has been one of the core tenets of big data since the Hadoop days, of taking all your data and dumping it into a repository and worrying about what you're gonna do with it later. That's certainly getting under scrutiny because, as it turns out, it's technically doable because Snowflake is super elastic and all the things, but still very expensive and not always that useful. I'm seeing people say, for the first time since the Hadoop days, that maybe that's not exactly the right approach. So all of this is definitely happening.

Now, in terms of companies I'm excited about, there's certainly a generation of companies that seem to be taking a different stance. The perfect example, which has obviously been very buzzy, is DuckDB, and this concept of disaggregating it all, if you want. So instead of one central thing, doing it in an embedded manner. That's certainly interesting. Despite the buzz, and I'm perhaps sadly not an investor in the company, from what I'm hearing it is still pretty early, with a lot to prove and all those things. But in terms of approach, I think that's really interesting.

There's a whole different category of companies where I was not sure at first, but I actually do think it's interesting, and they seem to be getting quite a bit of traction, which is the rise of the fully managed platform. I'm not an investor in those companies and I don't know the details yet: Y42, Mozart Data, Keboola. And they do things in different ways, right? Mozart Data and Y42, as I understand it, just take all the usual suspects, stitch them together, and abstract away the complexity, whereas Keboola has built all the tools themselves natively. But regardless, I think that approach of "okay, you have one platform that does everything, regardless of how you do it" is interesting in a context where you start focusing more on just simplicity and everything working, and you may have fewer resources on your end to do experimental stitching together. So the disaggregation of all-apps-on-one and then the rise of fully managed, I think those are two interesting trends.

And then, maybe a little bit at the periphery of the Modern Data Stack, I'm excited about the general evolution towards more convergence and more simplicity. Precisely when you look at the MAD, you have tools everywhere that you also see different things. And it seems, whether that's at the database abstraction layer or in other parts of the ecosystem, that things are converging toward one another.

If you look at the database abstraction space, which is what I call it, it may not be the exact term, but I recently invested in a company called SurrealDB, which is an abstraction layer on top of a key-value store. And it does a lot of things, like document and graph and real-time, and it's serverless, so a lot of the things that people have been talking about for a while and that have been historically very hard to combine. That's now happening. So I think that's pretty interesting.

I was also an investor in CockroachDB, which is like more SQL with some of the capabilities of NoSQL in terms of scalability. But that's also like simplifying that complex problem space and doing different things that people didn't think could be combined together. So I think those are interesting spaces.

And in the same theme of convergence, it's weird that as a space we had the people that do the batch things and then another group of people that do the real-time things. So I'm interested in that convergence as well. And I'm an investor in a company called Estuary that, at its core, asks a lot of this: "why do we need to have real-time and why do we need to have batch? How can we just unify everything together and rethink our ELT in that general context?" So that trend toward convergence and simplification feels inevitable to me and very interesting.”

You are the organizer and host of Data Driven NYC, one of the largest communities focused on data, ML/AI, and enterprise software. What has been key for you in both growing and maintaining such a strong tech community over the past 10+ years?

Matt: “It's gonna sound terribly self-serving, but the love for it has been the key driver because it's very easy to do one or two events, or four or five. It's really hard to do it over several years and even harder to do it over ten years, which is what I've been doing at Data Driven NYC.

And especially there's an offline component, which is a big part of Data Driven, which has pros and cons, but ultimately I think is very powerful. Every event you need to get butts in seats and you're just constantly rebuilding the event each time; like look, this is not show business, but you are literally only as good as your last event.

And so the consistency of the effort required I think has been the key driver for doing this. And then that's on my side. And then I think on the community side, I think you wanna create that flywheel where great speakers bring a great audience and keep great speakers coming back.

And it's not just a number. I think the biggest surprise to me for Data Driven over the years has been the sheer quality of the people that show up. And there are, again and again, people that could, and maybe will, be on stage in the future. I guess that's the beauty of doing something that's pretty geeky. You get a self-selecting group of people who show up again and again at some meetup to talk about data and machine learning or AI; people that are irrelevant to the space will go once, but they won't come back. But the people that keep coming back tend to be of a level of intellect and passion and just knowledge of the space, sophistication. That makes everybody's experience just incredible, including the speakers.

Over the years, speakers have commented again and again about the quality of the discussions that they had, and it's every speaker's fear that after you've done your video panel or your presentation, you get swarmed by people that are just wasting your time. And you know what, what I heard over the years is that it's very different from their perspective just because people are just good in the audience in general.”

Person Profile:

Matt Turck is the Managing Director at FirstMark Capital. Feel free to connect with him on Twitter and LinkedIn to learn more about his work.

Matt Turck @mattturck

How it started (2012) / How it’s going (2023)

What are others saying in the DataOps space?

The 2023 MAD (Machine Learning, Artificial Intelligence & Data) Landscape

What: A dizzying display of the various vendors in the data industry that just grows every single year… this is a lot of work to compile, so it’s awesome that our industry has this.
Why: An opportunity for you to validate that you have indeed heard about a new data tool every other day.
Who: Any data professional interested in a snapshot of our industry.

MAD 2023, PART II: FINANCINGS, M&A AND IPOs

What: An overview of the funding side of our data industry for the past year.
Why: A great discussion on how data companies can adjust to our market potentially headed towards a downturn.
Who: You are a founder that is looking to raise funding and wants a reality check.

MAD 2023, PART III: TRENDS IN DATA INFRASTRUCTURE

What: A dive into how data infrastructure is starting to change in our industry (e.g. bundling and consolidation).
Why: This is one of the best summaries of emerging trends in the data infrastructure space.
Who: You are a tech lead or head of data platforms trying to understand what your data stack could look like in a few years.

MAD 2023, PART IV: TRENDS IN ML/AI

What: Have you heard of this thing called ChatGPT? Of course, you have!
Why: Even though we have heard a lot about ChatGPT as a tool, few are talking about the market implications of generative AI… this article is a great primer for understanding its impact.
Who: Anyone who is trying to make sense of generative AI and its potential impact on their business and/or career.

Bonus: Very timely is the situation of Silicon Valley Bank and its impact on the startup ecosystem. Below is a great tweet thread describing the situation.

samir kaji @Samirkaji

A lot of panic re: SVB (you should see my phone/emails!). A bank run driven by panic is the real risk here, not the action of selling LT securities at loss I have no inside information as I left SVB in 2012, but know enough about banking to piece together. Quick 🧵

About On the Mark Data:

On the Mark Data helps brands connect to data professionals through captivating content, such as this newsletter and other featured content! Please feel free to check out my website to learn how I can support your data brand via influencer marketing or content and go-to-market strategy consulting.
