
Why AI writes better code than product specs

Troy

CEO

Tue Jun 30 2026



On most teams today, AI is more reliable at writing code than at writing product specs. That's not a knock against product teams. It's an observation about the kind of information each discipline hands the AI, and what the AI can actually do with it once it has it.

The artifacts engineering teams produce give AI a firmer foundation to ground itself in as it works. I've seen this play out across different teams. An engineer points an agent at a problem and gets back something solid. A PM uses AI to write a story and gets back something that reads fine but is quietly off. At first I thought PMs might just be slower to adapt. It isn't that.

The question is whether the things that make AI good on the code side have an equivalent on the product side. I think they do, and getting there is what we've been exploring at Atono.

Why code is easier for AI

AI is better at code for a few reasons, and none of them are about the model being smarter.

The first is that most coding tasks come with the hard part already done: someone's already decided what to build. The agent just has to figure out how, and even that part usually has a narrower range of right answers than a product decision does.

The second is that code can be checked. A test passes or it doesn't, so the agent finds out right away when it's wrong, instead of someone catching it three weeks later.

The third is the biggest one, and it's not really about AI at all. Engineering built itself scaffolding long before any of this: tests, architecture decision records, schemas, rules files. It did that because engineers needed a record other engineers could trust, not because it had a model in mind. It just turns out that the same record is something a model can read too.

Why specs are harder

Product teams have their own version of that record: PRDs, design docs, Slack threads, the things everyone on the team just knows. The information isn't missing. It just wasn't written down in a way a model, or honestly even a new hire, could pick up and use right away.

Take a screen we call "Everything." Telling a model the word refers to a screen would help, but only slightly. What it actually needs is the product knowledge nobody keeps in one place. The screen shows every story, bug, and epic across the workspace a user has permission to see. It supports filtering and saved views. Private team items get excluded or redacted depending on who's looking. None of that is a definition. It's context, and context like that doesn't live anywhere a model could go find it.

The harder version of this same problem is the stuff that was never written down, because nobody felt they needed to: a workflow that's only available on certain plans, a field that behaves differently above a contract threshold, a decision the team made two quarters ago. None of that lives in a PRD, a ticket, or the code itself. It lives in whoever happened to be in the room. Researchers studying AI-assisted teams have a name for this now. They call it Intent Debt, the backlog of decisions and reasoning that piles up because no one captured it, and that an agent has no way to recover on its own.

And even past all of that, there's still no test for whether a spec is right. Code either compiles or it doesn't. Product judgment has no equivalent. Deciding what to build is usually treated as the easy part, ahead of the "real" work of building it. I think it's actually the harder one.

Why this is showing up now

I don't think this gap is new. I think it's becoming more visible as we change how we use AI.

When AI was mainly used to draft text for a person to review, the gap was survivable because the human in the loop caught what was off before it went anywhere. Now that agents are doing more of the execution themselves, that review step is thinning out, and there isn't always a person in the middle to catch what the model got wrong.

You can see this in how teams are actually using AI today. A recent survey of about 350 engineering professionals, run with Refactoring.fm, found that 71% use AI for writing code, versus just 9% for product requirements. There's nothing about a spec that should be inherently harder for a model to draft than code is. The real question is why teams aren't using it for that more.

My guess is that it comes down to what we've been describing. The same survey found that 64% of teams keep their most critical knowledge primarily in people's heads, not in a document or a ticket. A model writing a spec against that kind of gap isn't really wrong, it's filling it with something plausible yet not actually true for your product. Rework after the fact costs time too, sometimes almost as much as writing the spec from scratch. If the math stops working out, going back to what already worked is an easier call.

There's also a reason this can take longer to notice than it should. Bad code throws an error. A spec that's slightly off just becomes the thing everyone builds against, and nobody finds out it was wrong until later, by which point it's not obvious AI had anything to do with it.

The gap, in one place

Taken together, the difference comes down to where product knowledge lives, and whether a model can reach it while it's working.

AI needs to know…	Engineering usually has…	Product often has…
What does this term mean?	Types, schemas, and enforced naming	Shared language that’s rarely documented
Why was this decision made?	Decision records, tests, and rules files	PRDs, Slack threads, and team memory
How do I access it while I’m working?	It’s already open in the repo	Someone has to be asked, and that takes time the model doesn’t have

What we've been trying

So what does closing that gap actually look like?

I don't want to claim we have this fully solved. But the challenge isn't just documenting product knowledge, it's making that knowledge available while the model is doing the work. We've found three pieces that matter most: a product Glossary that defines the language, AI Context that captures the reasoning behind decisions, and an MCP server that delivers both to the model at the moment it needs them.

Product Glossary

The Glossary is where the product itself gets defined. Not just terms, but the roles, workflows, permissions, and features those terms actually point to. "Everything" stops being just a word and starts carrying the real product knowledge behind it: what the screen shows, who can see what, how it behaves depending on permissions.

It's the most straightforward of the three to build, because the information already exists somewhere. You can point Atono's Glossary builder at docs you already have and it works through them to surface the concepts that actually need defining, the relationships between them, and the synonyms: cases where different teams use different words for the same thing.
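To make that concrete, here is a minimal sketch of what one Glossary entry could look like as structured data. It is not Atono's actual schema. The `GlossaryEntry` class, its field names, the `build_index` helper, and the "all items view" synonym are all assumptions made up for illustration. Only the description of the "Everything" screen comes from this post.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GlossaryEntry:
    """One product concept, plus the context a model needs to use it correctly."""
    term: str
    kind: str                      # hypothetical categories: "screen", "role", "workflow"
    definition: str
    behaviors: list[str] = field(default_factory=list)
    synonyms: list[str] = field(default_factory=list)
    related: list[str] = field(default_factory=list)


def build_index(entries):
    """Map every term and synonym (case-insensitive) to its entry."""
    index = {}
    for entry in entries:
        for name in [entry.term, *entry.synonyms]:
            index[name.lower()] = entry
    return index


everything = GlossaryEntry(
    term="Everything",
    kind="screen",
    definition=(
        "Shows every story, bug, and epic across the workspace "
        "that the user has permission to see."
    ),
    behaviors=[
        "Supports filtering and saved views.",
        "Private team items are excluded or redacted depending on who is looking.",
    ],
    synonyms=["all items view"],   # invented synonym, for illustration only
    related=["story", "bug", "epic", "workspace"],
)

index = build_index([everything])
print(index["all items view"].term)
```

The point of the index is the synonym case: whichever word a team uses, the lookup lands on the same entry and the same behaviors.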

AI Context

Most teams trying to ground AI in product knowledge default to giving it too much: every doc, every channel, every tool connected at once. The model ends up with a pile of context and no signal for what actually matters to the task in front of it. We went the other way.

Instead of connecting everything, we curate a specific few things that matter most, including the design decisions behind a piece of work, the technical investigation that led there, and the implementation details once it shipped, all attached directly to the story or bug it belongs to.

For product specifically, it's the design decisions and implementation details that carry the most weight: the why behind a choice, the alternatives that got ruled out, the thing someone already tried that didn't work, and what the team actually built once the decision was made. Most of that reasoning never makes it into acceptance criteria. The call your team made about refunds or account limits last spring is local knowledge. No model could infer it on its own, no matter how good the model is, because the answer was never written down anywhere it could find it. Now it's in AI Context instead.
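A rough sketch of the curation idea, assuming a very simple data model: notes are attached to a single story, each tagged with a kind, and only the kinds relevant to the task are handed to the model. The `ContextNote` and `Story` classes, the `context_for` function, the kind labels, and the sample story text are hypothetical placeholders, not Atono's implementation or real product decisions.

```python
from dataclasses import dataclass, field


@dataclass
class ContextNote:
    kind: str   # "design_decision", "technical_investigation", or "implementation"
    text: str


@dataclass
class Story:
    story_id: str
    title: str
    notes: list = field(default_factory=list)


def context_for(story, kinds=("design_decision", "implementation")):
    """Return only the curated notes of the requested kinds, as prompt-ready text."""
    lines = [f"Story {story.story_id}: {story.title}"]
    for note in story.notes:
        if note.kind in kinds:
            lines.append(f"[{note.kind}] {note.text}")
    return "\n".join(lines)


# Placeholder content, invented for illustration only.
story = Story(
    story_id="STORY-1",
    title="Refund flow",
    notes=[
        ContextNote("design_decision", "Chose manual approval over automatic refunds."),
        ContextNote("technical_investigation", "Compared two payment provider APIs."),
        ContextNote("implementation", "Approval step shipped behind a role check."),
    ],
)

text = context_for(story)
print(text)
```

The default `kinds` mirror the claim above that design decisions and implementation details carry the most weight for product work. A coding task could ask for the technical investigation instead.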

MCP server

Even with the language defined and the decisions captured, the model still needs to reach both while it's actually working, without having to hunt for them itself and without being handed everything at once and left to sort out what matters. That's the last piece. We expose both through an MCP server, so whatever tools your team already uses, Claude, Cursor, or anything else, can reach them directly. MCP is an open protocol, so this doesn't pull you onto a new surface. It meets your stack where it already is.
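For readers who haven't looked at MCP, here is a minimal sketch of the shape of a tool call. MCP messages are JSON-RPC 2.0, and a client invokes a tool with the `tools/call` method. This is a plain function, not a real MCP server and not Atono's. The tool names `lookup_term` and `get_story_context`, the in-memory stores, and their contents are all assumptions for illustration. A real server would also handle initialization, tool listing, and transport.

```python
import json

# Hypothetical in-memory stores standing in for a glossary and per-story context.
GLOSSARY = {
    "everything": "Screen showing every story, bug, and epic the user has permission to see.",
}
STORY_CONTEXT = {
    "STORY-1": "Design decision: placeholder text for illustration.",
}

# Hypothetical tool names mapped to lookup functions.
TOOLS = {
    "lookup_term": lambda args: GLOSSARY.get(
        args.get("term", "").lower(), "No glossary entry found."
    ),
    "get_story_context": lambda args: STORY_CONTEXT.get(
        args.get("story_id", ""), "No context attached."
    ),
}


def handle_request(request):
    """Answer a JSON-RPC 2.0 'tools/call' request with an MCP-style text result."""
    req_id = request.get("id")
    if request.get("method") != "tools/call":
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32601, "message": "Method not found"}}
    params = request.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return {"jsonrpc": "2.0", "id": req_id,
                "error": {"code": -32602, "message": f"Unknown tool: {params.get('name')}"}}
    text = tool(params.get("arguments", {}))
    return {"jsonrpc": "2.0", "id": req_id,
            "result": {"content": [{"type": "text", "text": text}], "isError": False}}


request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "lookup_term", "arguments": {"term": "Everything"}},
}
response = handle_request(request)
print(json.dumps(response, indent=2))
```

The model asks for one term or one story's context at the moment it needs it, rather than receiving the whole knowledge base up front.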

What we've seen so far

When we checked our own AI-generated stories against our Glossary, about 60% of them needed changes. That's a lot. As we added AI Context underneath that, rework dropped toward 20% in our own testing.

This is preliminary, and it's based on our own product, not a controlled study. But it's what we expected. Give a model what your product means and the design decisions behind it, and the spec is more likely to match the product on the first pass. Fewer surprises mid-cycle, and the rewrites that used to eat a PM's afternoon mostly stop happening.

None of this is a new idea. It's the same scaffolding engineering built for itself, decades before any model needed it, just applied to the half of the work that never got one. The teams that pull ahead here won't be the ones running the smartest model. They'll be the ones who gave product the same thing engineering already had: something structured enough, and close enough at hand, for a model to actually use.

A couple of questions worth answering directly

Does this mean product teams need to write more documentation?

No. If anything, that's the wrong takeaway. The information we're talking about usually already exists, it's just scattered across PRDs, Slack threads, and people's heads, in a form a model can't reach mid-task. Adding more documents on top of that doesn't fix the reachability problem, it just adds more places for the same gap to hide. The fix is structuring what already exists and putting it where a model can actually find it.

Do we have to move off Jira or Linear to do this?

No. A tracker tells you what work exists and who owns it. It was never going to tell a model why the work exists or which decisions are already settled, and that's still true no matter which tracker you use. What we've built runs alongside whatever you're already on. You're adding a layer, not replacing one.

Troy

Troy is happy to be back at it.

After many years of leading SaaS companies, Troy took a year to work in the fight against hunger and homelessness. Time well spent. Now he is stoked to be helping shape the future of software development. We can do so much better!

Spare time is invested with family, friends, faith, and fur (two German Shepherds).
