Field notes: Part 1 of 10

Plant Identification Looks Solved. Until You Try to Build It

27 May 2026 Originally published on Medium

Modern plant identification apps will tell you, with confidence, that a potentially toxic plant is a culinary herb.

I know because one of them did. I took a photo of a seedling in my garden. The app told me it was basil. I left it to grow, watched it mature, and worked out, eventually, that it was annual mercury. Mildly toxic if you eat it. Basil isn’t.

The app didn’t offer alternatives. It didn’t qualify its answer. It just said: basil.

Basil Identification. Illustration generated with AI.

On its own, it isn’t a big deal. I identified that plant correctly in the end. But it stuck with me for a long time.

I spent nearly a year building a plant identification system from scratch to understand why.

What I found was a problem that looks simple on the surface, but breaks in ways that aren’t obvious until you’re deep inside it.

This series is a record of that process: what worked, what failed, and what turned out to be much harder than I expected.

Who’s writing this

I’ve spent fifteen years in engineering. The last five leading Site Reliability Engineering (SRE) teams supporting AI systems in production. I’ve built smaller machine learning (ML) systems before. Perceptrons, multi-layer networks, deep learning models. But I’m not a computer vision specialist, and I’m not a botanist.

Close enough to the problem to care. But I’m not expert enough to get it right first time.

Why this is harder than it looks

At first glance, plant identification looks like a standard computer vision task: take an image, classify it, then return a label.

The problems start almost immediately.

Plants don’t look like themselves. The same sunflower can look completely different as a seedling, a mature plant, a flowering head, and a dried seed stage. It’s like being asked to recognise a person from a baby photo, a blurry CCTV still, and a picture of them at a party in fancy dress or cosplay, then being told they’re all the same individual. A model trained on one stage will quietly fail on another, and the user won’t know why.

Sunflower seedling. Rob Duval, CC BY-SA 4.0.

Sunflower vegetative plant. Anshul24Sharma, CC BY-SA 4.0.

One beautiful blooming sunflower isolated in front of a sunflower field — Sunflower flower. Wikimedia Commons contributors, CC BY-SA 4.0.

Sunflower seed head. Wikimedia Commons contributors, CC BY-SA 4.0.

The scale is unforgiving. Most image classifiers deal with tens or hundreds of categories. A useful plant identification system needs thousands of common species, plus a long tail of rarer ones that matter to the people who care most. Many of those species differ only in subtle ways: a slightly different leaf edge, a minor vein pattern, a different arrangement on the stem. These are distinctions botanists spend years learning. We expect a model to get it right from a phone photo in a fraction of a second.

Leaf arrangement diagram (alternate, opposite, whorled). Agnieszka Kwiecień, GFDL 1.3.

And then there’s the real world. People don’t take clean, well-lit, centred photos. They take pictures in bad light, at awkward angles, with cluttered backgrounds and half the plant out of frame. The model still has to decide.

Underneath all of this is a deeper issue. Most image models are trained to classify: to choose from a fixed set of labels. But that’s not how humans identify plants. When someone holds up their phone, they’re really asking: what does this look most similar to? That’s a different problem. More about comparison than categorisation. The gap between those two approaches quietly determines how well these systems work in practice.

Why I decided to build it anyway

I’ve always had a hobbyist interest in plants and gardening. I’m not a specialist, but I am someone who likes to get my hands dirty in the garden and I do notice when something looks wrong and I want to understand why.

When I used existing apps, I kept hitting the same limitation. They could usually tell me what a plant was. They couldn’t reliably tell me whether it was healthy, what was wrong with it, or what I should do next.

They are all technically impressive. None quite felt complete.

So I decided to build it myself. Not because I thought I could do better immediately, but because if the problem were genuinely solved, it shouldn’t feel this unreliable to use.

There was also something different about this project for me personally. For the first time I wasn’t maintaining or extending someone else’s system. I was building end-to-end: model, data pipeline, product decisions, how it would actually be used. And I wasn’t building a standalone classifier. The plan was a full mobile app for gardeners and plant owners. Something that didn’t just identify a plant but helped people care for it.

The hypothesis

Plant identification isn’t a general reasoning problem. It’s a perception problem with hard constraints: latency, reliability, privacy, cost.

I started this project partly to test a hypothesis I’d been forming over years of working on production AI systems. The default way people build AI systems today is to reach for a single large model, run it in the cloud, and treat it as a general solution to a wide class of problems.

I’d come to suspect this is the wrong default for systems that have to work under real-world constraints. The bigger the model, the more it abstracts away from the specifics of any particular task. The more cloud-dependent it is, the more it inherits the cloud’s downsides: latency, cost, privacy exposure, vendor risk. The more general it is, the less it can be optimised for the specific job in front of it.

My hypothesis: AI systems should be architected as composed, specialised components, not monolithic models, when operating under real-world constraints. Smaller specialised models that do specific things well, integrated into a system that knows what it’s doing and what it’s not. This is the kind of architecture that has been standard practice in software engineering for fifty years, and which the current AI moment has partly forgotten.

I didn’t know if that was actually true. This project was a way to find out.

This project is the test. The series that follows is what I found.

Why on-device matters

Almost every existing solution runs in the cloud. Capture image, upload to a server, run a big model, return the result. It’s the easiest way to build this kind of system. It’s also a bad match for how the product is actually used.

Plant identification happens in gardens, on allotments, on hikes, in greenhouses. Places where signal is patchy or absent. If your app depends on a server, the moment the user needs it most is often the moment it stops working. That alone is reason enough to take on-device seriously. But there are a few others worth acknowledging.

Every image sent to a cloud service comes with metadata: your location, your surroundings, sometimes the inside of your home. On-device removes that problem rather than promising to handle it carefully. No uploads, no tracking, no backend dependency to get breached later. For a product where the user is often photographing their own garden, that should matter to people.

There’s an economic argument too. Cloud inference has a real per-request cost, and for a subscription app that cost scales with engagement. The more your best users use the product, the more they cost you. On-device inverts that: once the model is deployed, inference is effectively free, and heavy users become your cheapest users instead of your most expensive ones.

Beyond the unit economics, you’re also not tying your product’s viability to a handful of providers whose pricing you don’t control. You don’t need to predict where compute costs go next; you just need to notice that the dependency exists.

The trade-offs are real, and I don’t want to downplay them. On-device means hard constraints: limited compute, limited memory, strict model size budgets. You can’t scale your way out of a problem by renting a bigger GPU. You have to design carefully, make trade-offs explicit, and actually understand what the system is doing. It removes the safety net, which is uncomfortable. It also forces the kind of careful design that often produces better systems.

Doing this without cutting corners

One thing I want to be upfront about: AI has a trust problem with the public, and a lot of that trust has been burned by how large companies treat other people’s data and intellectual property. Even small ML projects can be tempted to gloss over this. I didn’t want to.

The rule I set for myself was simple: only open datasets with clear licences, respect the terms of service of any external source, and pay for access where paying is the right model. For APIs, that means respecting rate limits and not scraping aggressively, treating shared infrastructure as shared.

Technically accessible isn’t the same as fair to use. Holding that line made the project harder. It also made it something I’d be willing to put my name on.

What I actually wanted to build

Not a demo. Not a classifier dressed up as a product. Something that could answer the questions people actually ask when they’re standing in front of a plant holding their phone:

What is this? Is it healthy? Does it have a disease? What growth stage is it in? What do I do next?

I wanted useful answers, no latency, no connection required, all running inside a pocket computer’s thermal and memory budget. That combination is what makes the problem genuinely interesting. And much harder than it looks.

What this series is about

This isn’t the polished tutorial version. It’s what actually happens when you try to build an ML system of this scope from scratch: what worked, what failed, what I misunderstood, and what turned out to be harder than I expected.

I went into this thinking the hard part would be building the model.

It wasn’t.

The first thing that broke was the data.

To be continued.

Images sourced from Wikimedia Commons (https://commons.wikimedia.org):

Sunflower seedling. Rob Duval, CC BY-SA 4.0.
Sunflower vegetative plant. Anshul24Sharma, CC BY-SA 4.0.
Sunflower flower. Wikimedia Commons contributors, CC BY-SA 4.0.
Sunflower seed head. Wikimedia Commons contributors, CC BY-SA 4.0.
Leaf arrangement diagram (alternate, opposite, whorled). Agnieszka Kwiecień, GFDL 1.3.

Licences: CC BY-SA 4.0 and GFDL 1.3

All field notes