
Building above and below a story

Troy

CEO

Mon May 11 2026

9 min read


The PM walks into sprint planning with an AI-drafted story.

It calls for building a RAG bot that engineering already shipped in Q4. The acceptance criteria reference a workflow deprecated six months ago. The story uses "Everything" as plain English — not as the name of the actual product screen the team built last year.

She didn't write a bad story. The AI did. And she didn't catch it because it read fine.

Call her Sally. She's a composite of every PM I've talked to in the last six months — competent, busy, leaning on AI to clear the backlog the way every PM in 2026 is leaning on AI. She's not lazy. She's not bad at her job. She's plugged a tool that doesn't know her product into a workflow that assumes the tool does.

This happens at scale, every day, in product orgs that have convinced themselves AI is helping. Sometimes engineering catches it. Sometimes QA does. Sometimes the customer does. And sometimes — most expensively — nobody does, and the team builds the wrong thing for two weeks before someone notices.

That's the morning. Let's talk about why it happens.

The story is the wrong unit of work

The story is what we track. It is not what we build from.

Every story in your backlog sits on top of an enormous foundation of assumed knowledge — what your product means, what your terms refer to, what has already shipped, what has been deprecated, who your customer is, which features compete with which, which business rules apply where. And every story flows down into an enormous implementation surface — design decisions, technical investigation, edge cases discovered mid-sprint, the actual code that ends up in production.

Tools track the story. The work happens above and below it.

Humans handle this naturally, at small scale. A founding PM and a founding engineer share so much context that the story can literally be a sentence. "Add the export thing we talked about." Done. Both of them know what "the export thing" is, why it matters, what it should do, and what it shouldn't.

Scale up and that compression breaks. A PM joining a 200-person product org is missing most of what that founding pair carried in their heads. They learn it the slow way — Slack threads, design reviews, retros, the lived experience of shipping things that turned out wrong.

AI tools have none of that. Every session starts cold. They don't know what was deprecated last quarter. They don't know your customer. They don't know that "Everything" is a screen and not a noun. They have your prompt and whatever you happen to paste in.

When I was running xMatters, this was a problem we solved by talking to each other — small team, shared context, frequent in-person calibration. AI didn't exist. Now the team is bigger, the stack is larger, AI is in everyone's workflow, and no amount of hallway conversation loads context into Claude or Cursor.

Two failure modes, blamed on each other

So two things are happening at once.

The first failure happens above the story. AI doesn't know your product. It doesn't know your terminology. It doesn't know what's shipped and what isn't. It has no structured product context to draw on, so it generates stories that drift — confident, plausible, and wrong in ways that take a careful reader to catch. The PM reads them and they sound right. They look right. They are not.

The second failure happens below the story. Even when the story is correct, the implementation context evaporates the moment work begins. The reasoning a senior engineer applied in week one doesn't carry to the engineer picking up the next ticket in week four. The next AI session starts cold. Living Stories should carry the why from spec through production — instead they die at the first handoff. The team rediscovers its own thinking, sprint after sprint.

These are not the same problem. But they have the same cause: nothing in the stack is structuring the context that should flow into and out of the story. Jira tracks tickets. Linear tracks tickets. Cursor reads code. Claude Code reads code. None of them know your product, and none of them carry your product knowledge forward.

AI doesn't fail loudly here. It doesn't crash. It doesn't return a stack trace. It just hands you something that looks correct, and you ship it.

Wrong gets caught. Almost right gets shipped.

What we found when we measured it

Here's the part that changed how I think about this.

When we compared Atono's own AI-generated stories against the Glossary, 60% of them needed changes. Sixty percent. On our own product, in our own org, with our own AI tools, written by people who built the thing.

We caught features mislabeled as new that were actually enhancements — the kind of mistake that costs a sprint. Acceptance criteria for capabilities that had already shipped. Terminology that had drifted from how we actually use the words inside our company. We were building stories on top of a fuzzy understanding of our own product, and we did not know it until we built the tool that told us.

We are not unusual. The Context Gap Report we ran with Refactoring.fm — a survey of 350 engineering teams — found the same shape at scale. Fifty-two percent of teams have no shared AI context. Sixty-four percent store critical product knowledge in people's heads, not in any structured form. Only 9% are using AI for requirements work at all. Most teams are aiming AI at the place it has the most context (code) and pointing it away from the place it has the least (specs). That is exactly backwards.

If you're a 50-person product org and you believe your AI output is fine, I would gently suggest you have not yet measured it.

What changes when you build above and below the story

Fix it on both sides, and two things start to compound.

Above the story is where Velocity lives. When AI has structured product context — your terminology, your customer profile, your feature inventory, your business rules, the things your team takes for granted — the first draft of a story is actually usable. PMs stop spending mornings rewriting AI output to match the product. Engineers stop building to specs that don't capture intent. The cycle from idea to spec to acceptance criteria gets faster, and what comes out the other end matches the product you're actually building. Not the product the AI guessed you might be building.

Below the story is where Quality lives. When the design decisions, technical investigation, and implementation choices flow from the story into the work — and back out into the next story — the team builds on its own thinking instead of rediscovering it every sprint. AI sessions don't start cold. The next engineer doesn't have to reverse-engineer last week's reasoning from a Slack thread. The story carries the why from spec through production, and the next round of work starts from accumulated understanding instead of a blank prompt.

The chain looks like this. Product context above the story produces accurate stories. Accurate stories produce accurate implementation. Accurate implementation, captured below the story, produces context for the next story. Each loop runs better than the last. Velocity and Quality compound.

Most teams are running this chain broken at both ends and wondering why faster AI isn't making them faster. They've handed individual contributors AI tools that make those individuals dramatically faster — and the team's throughput hasn't moved, because the rework lives in the seams between stories, not inside them.

The fix isn't a better AI model. The model is fine. The fix is giving AI the product context your team already carries in its head, and capturing the implementation context that's currently leaking out of every sprint into Slack threads and individual memory. That layer sits alongside Jira or Linear and makes the context above and below the story readable — to AI, and to the next engineer who picks up the work. The ticketing tool was never going to do that. It wasn't built for it.

The honest part

I want to be straight about something.

We do not have this fully solved. We caught the 60% problem when we started comparing our AI-generated stories against the Glossary, and the result was uncomfortable. The most valuable tool we shipped this year is the one that told us 60% of what we'd written was wrong. We are still finding edges. We are still adding types of product context — personas, customer journeys, business rules — beyond the Glossary. Some of it works first try. Some of it doesn't.

But I am confident about the shape. The unit of AI input shouldn't be the story. It should be the story plus what's above it plus what's below it — repeatable, refreshable, reviewable, permissioned — and any tool that doesn't structure that context is leaving most of the value on the table. The story-as-input approach made sense in 2019. It doesn't anymore.

If you're seeing this

Right now the work above and below the story lives in heads, in Slack threads, in the morning a PM spends rewriting AI output to match the product. The cost is rework that doesn't show up on a dashboard.

If any of this matches what you're seeing — AI output that reads fine and isn't, stories that drift from the product, individual speed without team velocity — the Context Gap Report has the data underneath the pattern. 350 engineering teams. The shape of the problem at scale.

It's free. Ten minutes to read. If after that you want to talk about how the context layer actually works, happy to chat — troy@atono.io.

