SDO 012 - The Data Driving Agriculture - Ahraz Husain
What are your thoughts on AgTech?
One of the largest data revolutions is hiding in plain sight amongst the produce within grocery stores— AgTech is quickly adopting data best practices that we can all learn from. According to this IDC article, the "…average farmer generates 500,000 data points every day…" ranging from telemetry to satellite data. Yet, we all know "mo' data mo' problems" is a pillar of the pain we experience in the data industry, given data's affinity for entropy. All of which makes AgTech one of the most interesting areas to explore for DataOps given the space is 1) in the midst of digital transformation, 2) there is a tremendous amount of rich data from disparate sources, and 3) data quality is a major pain point holding back impact. I hope you enjoy learning more about this exciting space from my guest!
Real quick, a shoutout to the Data Teams Summit conference, which will host a LIVE session of the Scaling DataOps Newsletter on January 25th, 2023! For this session, I will ask data leaders at various levels how they handle their data infrastructure when they face scaling issues. I can’t wait to hear my guests’ insights!
You can register for this free virtual event here!
Hear from Ahraz Husain, VP of Data at Growers Edge:
One of my selfish reasons for starting this newsletter is to have an excuse to talk to interesting people in data, especially within industries where I have minimal experience. I met Ahraz at the Scale AI TransformX conference a few months ago, and during our brief chat, I was so excited to learn about the data problems he faced in agriculture. So I had to get him on the newsletter so you all could learn more about this space too. Even if you are not in agriculture, we can learn so much from this industry, especially in handling the anomalies caused by the pandemic.
When people hear about AI, many first think of self-driving cars or recommendation engines behind social media, yet agriculture has a wide array of AI use cases that can strengthen our food systems. For the folks unfamiliar, what are the unique data challenges you face within this domain?
Ahraz: “Before I jump into the challenges, let me give you a few examples of AI uses in agriculture. Apart from, you know, the usuals (transportation, pricing, supply, demand), robo-workers are an emerging space as well. But there are a few other AI use cases, probably a lot more dominant than these or very specific to ag.
You have agronomic prescriptions. What seed should I plant, where, how much, and what chemicals should I apply so there isn't runoff, yet the crops get the right amount of nutrients they require? Do I need fungicide for this tree or not? Then you have the good old question of crop monitoring. This has been probably one of the first sci-fi ideas: using satellites to monitor crop progress, how's corn doing, how are the trees doing, and such. So crop monitoring has been around a really long time, but monitoring isn't a big challenge.
The challenge is identifying causes, which is more AI-driven. Why did yield fall in this geography or this part of my field? Then obviously, you have good old production forecasting: how much yield can I expect? And again, when we think of production, it's at a macro level. US-wide, how many grapes are we going to get this fall or whatever?
Or on the flip side, there are challenges even within field levels. Here's a small field, 40 acres, what can I expect out of this field? If I'm a farmer, that is what I care more about, for obvious reasons.
Then you have things around leak detection. We apply fertilizers, and all of a sudden here in this river there are a lot more nutrients, nitrogen, phosphates, or whatever, that shouldn't be there. Can we use AI to help track it all the way back to where it came from? Then you have regulatory compliance, EPA, and many other compliance challenges. And then last but not least, this whole emerging field of carbon, carbon offsets in ag, is being driven by AI.
So these are all examples of where AI is being used, and when it comes to challenges, they're abundant as with everything, but my biggest thing is... there is a lot of data, Mark. More data than anybody could imagine.
So just for context, let's talk about corn fields, because I'm here in Iowa, and it's always fun to talk about corn around here. So you've got seed, right? An acre of land can have 32,000 kernels of seed, and today's planters can really track them down. So you can really have a GPS code which is tied to each seed planted. Then you end up applying the right chemicals, so you have information on those too. So let's think about just simply 32,000 data points per acre. Now, there are 19 million acres of corn in the US, so a huge amount of data is created.
The second biggest challenge is that the data is noisy. Up till now, and this has been a big challenge, calibration was an issue, right? So I harvested corn, for instance, or soybean, and I forgot to change the crop selection. Little things like that, all of which AI is now helping with too, because now we can auto-detect crops and stuff with harvesters. Then obviously, noise. Talk about noise: satellite imagery. We talked about remote sensing to detect crop performance, but clouds get in the way, and now they're using AI to see through cloud cover. There are new sensors and new technologies that can do that.
The single biggest thing, much of ag, is driven by geospatial computation. And GIS technologies didn't really go far over the last few decades, but over the last two or three years, maybe they've come a long way. Now we can run geospatial systems in Spark, but I wouldn't even call them production grade even today in Spark, but things are getting there.
Then there are challenges around access. Very few companies or organizations actually create the data. So think John Deere. Think Caterpillar, big equipment companies, really up till now, they've been exchanging it, but now they wanna get into the data processing game. It makes it challenging for startups and other innovative companies to come and play.
Then ownership. There's this big question: who owns the data now? Is it the companies whose equipment collected it? Is it the farm manager? Is it the owner of the farm? Is it the landlord? Is it the companies like us that process data for analytics, among others?
And then obviously security is a big deal. We've had cyberattacks and all those challenges even in ag. So again, you are never short of problems to solve.”
A major aspect of data products in agriculture is forecasting produce demand to help growers understand their key business metrics— which was heavily impacted by covid supply chain changes. How can data teams navigate handling such a huge anomaly in historical data in providing accurate forecasts?
Ahraz: “What a great question. I'm gonna take an agriculture spin on this a little, just thinking about forecasting produce demand, right? It's a function of your expected production, export demands, and then ending stocks. These three are the major economic factors that go into assessing and establishing what your demand might look like. Now, apart from that, obviously different grades and qualities of different produce or dairy products have different demand curves. And then geography plays a big factor, too, right? You can't ship corn out of Utah, but you can out of Iowa, because we have a lot of shipping routes through the rivers here. So those are factors. Obviously, we've always had challenges with determining supply, and I know your question's more about demand, but let me pick on supply a little, too, just as an example.
Weather is the single biggest influence on supply; weather you can never predict. You can come up with the best estimates, but best of luck trying to get an accurate number.
I'll pick on an example from 2012, there were widespread droughts across the Midwest US, and I think we lost roughly 25, 30% of all expected yields by the end of the season. So that's a huge number. It was a brutal year for ag, at least row crops. And then you have diseases, you have things like insects, and product failures, all of which can throw your supply off completely.
But talking about demand, right? The question you specifically had. One of the biggest demand signals is through economic reports because that really tells you where to go, right?
So the USDA puts out a lot of these things picking specifically on covid, right? USDA put out an estimate that there would be a high supply and strong exports of milk. I'm talking about March 2020, and they said you could expect 25 cents for raw milk. But by the time April hit a month later, the price was cut down by 25-30% in a span of 30 days.
And I remember reading stories about milk producers, dairy farmers just throwing milk away because they didn't want the prices to go down to unsustainable levels, and so on. But going back to all these anomalies in the data that we see and that our team sees, right? Some things that I have seen us do in ag specifically have been around literally just removing the seasons that are anomaly years, because you're trying to make predictions or forecasts, and anomalies can be tricky and can happen for so many different reasons. So the simple way: just remove those bad seasons. So again, from a data standpoint, not an ideal way, but from an agronomic or economic standpoint, you might do that sometimes.
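The "remove those bad seasons" idea can be sketched in a few lines. Everything here is hypothetical: the yearly demand index values are invented, the flagged set stands in for seasons a domain expert has marked as anomalous, and the "forecast" is just a mean to keep the example small.

```python
from statistics import mean

# Hypothetical yearly demand index; values are invented for illustration.
history = {2016: 100.0, 2017: 102.0, 2018: 101.0,
           2019: 103.0, 2020: 74.0, 2021: 102.0}
ANOMALY_SEASONS = {2020}  # seasons flagged by a domain expert (assumption)


def drop_anomaly_seasons(series, flagged):
    """Return a copy of the series without the flagged seasons."""
    return {year: value for year, value in series.items() if year not in flagged}


clean = drop_anomaly_seasons(history, ANOMALY_SEASONS)

# A naive "typical season" estimate with and without the anomaly year.
naive_all = mean(history.values())
naive_clean = mean(clean.values())
print(f"with anomaly: {naive_all:.1f}, without: {naive_clean:.1f}")
```

The trade-off is the one named above: the estimate is no longer dragged down by one unusual year, but the model also never sees that such a year can happen.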
Obviously, you can substitute your outliers in the data, but now we're talking about a lot of data, a lot of outliers. So that by itself is heavy compute, heavy processing, right? I've seen some models leverage pre-covid expectations. So come March, here's what we anticipated, let's use that as an actual, right? Because within 2-3% is always where we ended up end of year and so on. You can model demand, right? What would have happened if Covid wasn't there? But again, we are now getting into synthetic data, which can create its own nuances.
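One way to picture "use the pre-covid expectation as an actual" is the sketch below. The monthly numbers, the forecast made before the event, and the choice of which months count as the event window are all invented assumptions, not figures from the interview.

```python
# Hypothetical monthly demand for 2020; all numbers are invented.
actuals = {"2020-01": 100.0, "2020-02": 101.0, "2020-03": 99.0,
           "2020-04": 72.0, "2020-05": 75.0, "2020-06": 90.0}

# Forecast produced before the event (assumed to have been made in March).
pre_event_forecast = {"2020-04": 102.0, "2020-05": 103.0, "2020-06": 104.0}

EVENT_MONTHS = {"2020-04", "2020-05"}  # window treated as anomalous (assumption)


def substitute_expected(observed, expected, event_months):
    """Replace observed values inside the event window with prior expectations."""
    adjusted = dict(observed)  # copy so the raw actuals stay untouched
    for month in event_months:
        adjusted[month] = expected[month]
    return adjusted


training_series = substitute_expected(actuals, pre_event_forecast, EVENT_MONTHS)
print(training_series)
```

Keeping the raw actuals alongside the adjusted series matters, since the substituted values are effectively synthetic data and carry the nuances mentioned above.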
Then if you are really smart, you can do event-aware forecasting, which again is a different ballgame because we are talking about an unlikely event. What were the odds that covid would hit? And it threw off pretty much every model, right? Even economists didn't know what to expect. Expecting models to know would be a very tricky thing.
And then last but not least, depending on your modeling framework: in agriculture and food, usually the predictions are made not so much as time series as they are at set points, like pre-season or right after harvest, which is when people try to model and forecast demand. You can always bring in economic demand signals and macro signals, right?
It really depends on the modeling approach and the problem you're trying to solve. Sometimes it's as easy as removing the data from your input. But the best thing you can do is always have an eye on economics, which is a major driver in ag, at least. And the second cool thing you can do is build your models to be agile. And when I say agile, I mean being able to train them quickly as situations change, and being able to make quick predictions. If you're trying to make a model that does things once, where you train it once a year or once a season and make predictions once a season, well, you're in for a fairly rocky time when things don't go to plan. So really, options are limitless, and that's the challenge, right? That's the beauty of our field.”
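A very small illustration of what "event-aware" can mean: give the model an explicit event flag so the anomaly is explained by a feature instead of distorting the baseline. This is a sketch under strong assumptions (invented data, a single binary flag, no trend or seasonality), in which a regression on an intercept plus an indicator reduces to two group means. It is not a description of any model used at Growers Edge.

```python
from statistics import mean

# Hypothetical (demand, event_flag) observations; values are invented.
observations = [(100.0, 0), (102.0, 0), (101.0, 0),
                (73.0, 1), (75.0, 1), (101.0, 0)]


def fit_event_model(rows):
    """Fit demand = baseline + effect * event_flag via group means."""
    baseline = mean(y for y, flag in rows if flag == 0)
    during_event = mean(y for y, flag in rows if flag == 1)
    return baseline, during_event - baseline


def predict(model, event_flag):
    """Predict demand for a period given whether the event is active."""
    baseline, effect = model
    return baseline + effect * event_flag


model = fit_event_model(observations)
print("normal period:", predict(model, 0))
print("event period:", predict(model, 1))
```

The hard part raised above remains: the flag only helps once the event has been observed and labeled, so it does nothing to anticipate an event nobody expected.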
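The point about agile models, retraining quickly as situations change, can be illustrated with a toy walk-forward comparison. The series, the level shift, and the "model" (a mean over the two most recent observations) are all invented for this sketch; only the retraining cadence differs between the two runs.

```python
from statistics import mean

# Hypothetical demand series with a sudden level shift at index 4.
series = [100.0, 101.0, 99.0, 100.0, 70.0, 72.0, 71.0, 70.0]


def fit(window):
    """Toy model: predict the mean of the most recent observations."""
    return mean(window)


def walk_forward(values, start, window, retrain_every):
    """Mean absolute error when refitting every `retrain_every` steps."""
    errors = []
    model = None
    for t in range(start, len(values)):
        if model is None or (t - start) % retrain_every == 0:
            model = fit(values[t - window:t])  # only data before time t
        errors.append(abs(values[t] - model))
    return mean(errors)


trained_once = walk_forward(series, start=4, window=2, retrain_every=10**9)
retrained_often = walk_forward(series, start=4, window=2, retrain_every=1)
print(f"train once: {trained_once:.2f}, retrain each step: {retrained_often:.2f}")
```

In this toy setup the frequently retrained model recovers within a couple of periods after the shift, while the model trained once keeps predicting the old level.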
In a span of four years, you have gone from a senior IC role, to a technical lead position, and now you serve as a VP. How has your approach to solving data challenges evolved with your career progression?
Ahraz: “Questions like this make you retrospect and really think about how you evolved. Cause oftentimes, you simply focus on where you are versus how you got there. So it's a great question to ponder upon certain things that have happened.
To begin with, a really important thing for me has been learning from leaders I look up to. Find a few leaders you love. One thing I've known forever is don't follow them, but learn from them. Always challenge, and ask questions. So really, that's been the key for me, at least.
Now to answer your question very directly, there have been three fundamentals that have evolved over time and have stayed really consistent for me. One is perseverance. At least in the data domains, failures are a lot more dominant than success stories; you try 80 models, and 19 work out. And so perseverance is the single most important thing that I've truly learned and benefited from.
Second is flexibility, so what that means is knowing when to back off, and you know that good old failing early concept. When it comes to modeling, the sooner you realize something might not work, the better off you are. It's better to know what the early indicators of failure are and really know when to change course.
And then finally, but not least, this is what Silicon Valley has been talking about for decades, is that first principles thinking. It's not about the solution but really the problem at hand, how this problem should be solved, and so on. So I guess those are things that have really stayed consistent for me. Obviously, what they mean has evolved and changed in my own role and what I get to do.
I guess in my first role, everything was very ad hoc. Anything that I was asked to do, I would just change gears and do that. Then it moved over to a place where, working in bigger teams and leading teams, I realized real quick that doesn't work the best either. So somewhere in the middle, the flexibility to take different tracks with different projects that you're building has truly been the key. And I cannot emphasize enough how important that is, let alone for a startup like the one I'm at right now. It's important to realize the same process, the same project life cycle, doesn't work for everything, not in our domain of data analytics. My own team today follows at least three different life cycle processes for different types of projects we do: you have the A-type, the analysis projects; you have the B-type, the build projects; and then you have kind of a mix of both, and so on.
So I think it's really important to understand that as well. And I used to joke, so one of my titles, before I became a lead, was a solutions architect. And I used to joke once I became a manager, "I'm more of a problems architect." So it's not about the solutions, it's more about finding the right problems that are worth solving. And I think that, again, is really important, as important as the solution itself.
I remember one of my ex-bosses saying something, and he quoted, "the most successful people are those that think about implications." And the more you can be obsessed with the implications of your choices and decisions through the modeling process, through selections of projects and products, and whatever else it is that you're doing, the more likely you are to succeed. So don't get lost in the problem. Always focus on the implication. I guess that's been a big mantra of mine thus far.”
Ahraz Husain is the VP of Data at Growers Edge. Feel free to connect with him on LinkedIn to learn more about their work.
What are others saying in the DataOps space?
How COVID-19 is disrupting data analytics strategies | MIT Sloan
What: An article highlighting how the pandemic is disrupting data workflows and what firms are doing to account for it.
Why: I can't think of any other event in recent history that changed everyone's behaviors both at a global scale and at the same time; thus, what do we do with this anomaly presented in our data?
Who: You are a data leader still navigating the pandemic's havoc on your data quality and production models.
The Importance of Data Quality in the Agri-Food Sector
What: A high-level overview of the data quality challenges faced in agriculture.
Why: I like to start as high level as possible whenever I am learning within a new space.
Who: You are trying to understand the pain points experienced in agriculture data.
Better data, higher impact: improving agricultural data systems for societal change
What: An in-depth review of the data challenges faced within agriculture, the opportunities, and roadblocks.
Why: Seeing similar data challenges faced by industries outside my own is an excellent reminder of how hard working with data is and how early we are in getting the most out of data.
Who: You are potentially looking to solve data problems in the agriculture space and want to identify opportunities.