Why traditional Agile breaks with agents

A developer asks an agent to implement a feature. Working code ships the same afternoon. Six weeks later, nobody on the team can explain why it works the way it does — including the developer who shipped it. The agent did exactly what it was asked. The process had no point where understanding was required before moving on.

That's not a tool problem. It's a process one. And it compounds.

Three things break

Comprehension doesn't keep up with output

One feature shipped without full understanding is manageable. Repeated as a pattern, it becomes a problem: the codebase grows, the number of people who can confidently touch any given part of it shrinks, and the cost of change rises while the metrics say everything is going well.

Velocity went up. Comprehension didn't. Nothing in the process caught that.

This is the core accountability gap. Agentic Agile closes it with a structured Review at the end of every unit of work — questions the agent asks the developer about what was just built. Not optional. Not a quiz. A checkpoint that can't be skipped.

The gap itself doesn't appear in velocity charts. It appears in post-mortems, in estimates that keep coming in higher than expected, and in the conversations about why a "simple" change took three weeks.

The process is running on a broken assumption

Standard Agile was built around one thing: the person who writes the code understands the code. The ceremonies — planning, review, retrospective — all assume that. They were designed to coordinate humans who had that understanding.

When an agent writes the code, that assumption is gone. The ceremonies still run. The tickets still move. But the accountability the process was designed to produce isn't there — because nobody changed what "done" means when AI is doing the execution.

Where each ceremony breaks

Every Agile ceremony was designed around one assumption: the people building the thing understand it. AI breaks that assumption silently. The ceremonies still run, the tickets still move — but the accountability they were designed to produce is no longer guaranteed.

Here's where it shows up in practice.

Sprint planning

Story points measured effort. With an agent, effort collapses — a three-point story takes twenty minutes. Planning poker becomes a conversation about comprehension, not build time, but nobody has changed the format to reflect that. Teams over-commit because the work looks small. The sprint fills with phases that need careful human oversight at every step, and nobody planned for that oversight.

Backlog refinement

Tickets written for humans don't work for agents. "Add pagination" is fine for a developer who infers defaults and conventions. An agent will make different assumptions each time. Refinement still produces a Jira ticket — not an agent-ready phase definition. The gap between the two is nobody's job to close.

Daily standup

"What did you do?" has no good answer when the agent did most of it. Blockers have changed character — context window limits, scope drift, checkpoint gaps — but the standup format has no vocabulary for them. If the agent ran overnight and completed three phases, the morning standup is describing history, not coordinating work.

Sprint review

The demo shows working software. What it doesn't show is whether the developer understands what was built. An agent can produce correct code that nobody on the team can explain. The review passes. The knowledge gap doesn't surface until something breaks six weeks later. Velocity numbers spike, stakeholders raise expectations, and the team is now committed to a pace that depends on AI working perfectly — which it won't.

Sprint retrospective

Retrospectives are human-driven and remain unchanged. The team reflects on what happened, what worked, and what to improve — none of that changes with AI doing the execution. The retro is one ceremony Agentic Agile leaves alone.

Code review

Traditional code review looks for errors in the diff — logic mistakes, edge cases the author missed. AI-generated code rarely has those. The real failure modes are spec divergence and comprehension gaps, neither of which is visible in a diff. Agentic Agile addresses those directly: the spec is set before Build, and the Review verifies understanding before the phase closes. Line-by-line review of AI output is looking in the wrong place.

Definition of done

The DoD was written for humans. "Code reviewed," "tests written," "deployed to staging" — none of these criteria address AI-specific failure modes. There's no criterion for "the developer can explain what was built and why." That was assumed to be automatic when humans wrote the code. It isn't when an agent did.

Ceremony	What it was solving for	What breaks with AI
Sprint planning	Estimating human effort	Effort collapses; comprehension cost is invisible
Backlog refinement	Tickets clear enough for a developer	Tickets not precise enough for an agent
Daily standup	Coordinating human work in progress	No vocabulary for AI-specific blockers
Sprint review	Demonstrating working software	Working software ≠ understood software
Retrospective	Learning from what happened	Unchanged — human-driven, no Agentic Agile modifications
Code review	Catching errors before merge	AI code is rarely wrong line-by-line; the real risks are spec divergence and comprehension gaps, which the spec and the Review address before merge
Definition of done	Shared quality bar	No criterion for ownership or understanding

Patching doesn't work

The instinct is to bolt AI onto the existing process. Let developers use the tools during sprints. Add an AI review step to the PR checklist. Keep everything else the same.

This fails because the existing process was built around a constraint that no longer exists — humans as the execution bottleneck. Every ceremony assumes that. Remove the constraint and the ceremonies are solving for the wrong problem.

The bottleneck now is understanding and scope discipline: ensuring the work was built to spec, and that the people who shipped it can explain what they built. No standard Agile ceremony is designed to enforce either of those things.

What you're actually doing

Diagram: the phase loop. Spec defines it, build executes it, review closes it.

You decompose a ticket into phases — each one small enough that you can specify the outcome before the agent starts and verify it afterwards. That's the entire mechanism. The agent builds within those boundaries. Then it stops and leads a Review: questions about what was just built, surfacing anything you can't explain. Gaps get logged. Work doesn't block.

This works regardless of what's underneath. A 200,000-line legacy codebase doesn't change what a phase is — it changes how carefully you scope one. The checkpoints exist precisely because the agent can confidently produce something plausible that breaks things you didn't know were coupled.
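The loop described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the `Phase` type, the `advance` and `logGap` helpers, and the "Done" status are hypothetical names introduced here, while "Spec", "In Progress", and "In Review" are the status values that appear in this guide's examples.

```typescript
// Hypothetical model of one phase moving through the loop.
// "Done" is an assumed final status; the other names come from the guide's examples.
type PhaseStatus = "Spec" | "In Progress" | "In Review" | "Done";

interface Phase {
  name: string;
  status: PhaseStatus;
  specApproved: boolean; // Product Owner sign-off on the spec entry
  gaps: string[];        // things the developer could not explain during Review
}

// The loop only moves forward, one step at a time.
const NEXT: Record<PhaseStatus, PhaseStatus | null> = {
  "Spec": "In Progress",
  "In Progress": "In Review",
  "In Review": "Done",
  "Done": null,
};

function advance(phase: Phase): Phase {
  const next = NEXT[phase.status];
  if (next === null) {
    throw new Error(`${phase.name} is already Done`);
  }
  // Build may not start until the spec entry has been approved.
  if (phase.status === "Spec" && !phase.specApproved) {
    throw new Error(`${phase.name}: spec not approved, Build cannot start`);
  }
  return { ...phase, status: next };
}

// Review records gaps; logging a gap does not stop the phase from closing.
function logGap(phase: Phase, gap: string): Phase {
  if (phase.status !== "In Review") {
    throw new Error("Gaps are logged during Review only");
  }
  return { ...phase, gaps: [...phase.gaps, gap] };
}
```

The one hard gate in this sketch is the approved spec before Build. Gaps found in Review are recorded rather than enforced, which mirrors "Gaps get logged. Work doesn't block."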

The spec entry

The team has an objective: users need to be able to log in. That becomes a ticket, PROJ-42, broken into three phases. Phase 1 gets a spec entry before any Build begins: the scope is narrowed from the ticket's full intent to one testable outcome, and the Product Owner approves it. The agent doesn't start until this exists.

Ticket → Spec entry

PROJ-42: Add user authentication (FEAT-10, User Account Security)

Users need to be able to log in. Implement the full authentication flow.

Acceptance criteria

✓ POST /auth/login returns a signed JWT on valid credentials
✓ Returns 401 on invalid credentials
✓ Token payload contains user ID only; no email or sensitive fields

SPEC.md

## Phase 1: POST /auth/login endpoint

Ticket: PROJ-42
Status: Spec

Criterion:
POST /auth/login returns a signed JWT on valid credentials and 401 on invalid.

Constraints:
- Token payload: user ID only; no email or sensitive fields
- JWT_SECRET from environment; do not hardcode
- Route handler lives in src/routes/auth.ts
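To make the "agent doesn't start until this exists" rule concrete, a spec entry could be modelled as data and checked before Build. This is a minimal sketch, not part of the guide's tooling: the `SpecEntry` shape, its field names, and `readinessProblems` are assumptions made for illustration.

```typescript
// Hypothetical shape for one SPEC.md phase entry; field names are assumptions.
interface SpecEntry {
  ticket: string;
  phase: number;
  title: string;
  criterion: string;     // the single testable outcome
  constraints: string[];
  approvedBy?: string;   // Product Owner; undefined until approved
}

// Returns the reasons an entry is not ready for Build; an empty array means ready.
function readinessProblems(entry: SpecEntry): string[] {
  const problems: string[] = [];
  if (entry.criterion.trim() === "") {
    problems.push("criterion is empty");
  }
  if (entry.approvedBy === undefined) {
    problems.push("not yet approved by the Product Owner");
  }
  return problems;
}

// The Phase 1 entry from SPEC.md, before approval.
const phase1: SpecEntry = {
  ticket: "PROJ-42",
  phase: 1,
  title: "POST /auth/login endpoint",
  criterion:
    "POST /auth/login returns a signed JWT on valid credentials and 401 on invalid.",
  constraints: [
    "Token payload: user ID only; no email or sensitive fields",
    "JWT_SECRET from environment; do not hardcode",
    "Route handler lives in src/routes/auth.ts",
  ],
};

// readinessProblems(phase1) returns ["not yet approved by the Product Owner"]
```

A check like this cannot tell whether the criterion is actually testable or narrow enough. That judgment stays with the Developer and the Product Owner.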

The command

The Developer writes the Command — the file that tells the agent its role, its scope, and where to stop. If the Command is hard to write, the spec entry needs more work.

.agent/commands/build-phase.md

Build Command: PROJ-42 Phase 1

You are a developer on this team. Your job is to build Phase 1 of PROJ-42.

Phase: POST /auth/login endpoint
Spec: SPEC.md § Phase 1
Ticket: PROJ-42

Criterion:
POST /auth/login returns a signed JWT on valid credentials and 401 on invalid.

Constraints:
- Token payload: user ID only; no email or sensitive fields
- JWT_SECRET from environment; do not hardcode
- Route handler lives in src/routes/auth.ts

When the criterion is met:
1. Update SPEC.md Phase 1 status to 'In Review'
2. Stop. Do not begin Phase 2.
3. Lead the Review.
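Because the Command repeats the spec entry almost verbatim, it could in principle be generated from it. The sketch below assumes a hypothetical `SpecEntry` shape and a `renderBuildCommand` helper. Neither is part of any real tool, and the guide itself has the Developer write the Command by hand.

```typescript
// Hypothetical spec entry shape; field names are assumptions for illustration.
interface SpecEntry {
  ticket: string;
  phase: number;
  title: string;
  criterion: string;
  constraints: string[];
}

// Renders a Build Command in the layout shown above.
function renderBuildCommand(entry: SpecEntry): string {
  // A Command with no criterion has no stopping point, so refuse to render one.
  if (entry.criterion.trim() === "") {
    throw new Error("Spec entry has no criterion; refine the spec first");
  }
  const lines: string[] = [
    `Build Command: ${entry.ticket} Phase ${entry.phase}`,
    "",
    `You are a developer on this team. Your job is to build Phase ${entry.phase} of ${entry.ticket}.`,
    "",
    `Phase: ${entry.title}`,
    `Spec: SPEC.md § Phase ${entry.phase}`,
    `Ticket: ${entry.ticket}`,
    "",
    "Criterion:",
    entry.criterion,
    "",
    "Constraints:",
    ...entry.constraints.map((c) => `- ${c}`),
    "",
    "When the criterion is met:",
    `1. Update SPEC.md Phase ${entry.phase} status to 'In Review'`,
    `2. Stop. Do not begin Phase ${entry.phase + 1}.`,
    "3. Lead the Review.",
  ];
  return lines.join("\n");
}
```

The guard on an empty criterion reflects the point made earlier: if the Command is hard to write, the spec entry needs more work.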

The session

The Command runs. The agent builds within the defined scope, then leads the Review. The spec file and the ticket update as the phase progresses.

PROJ-42: Add user authentication

Status: In Progress | Sprint 3 | Assignee: @developer

Description

Users need to be able to log in. Implement the full authentication flow.

Acceptance criteria

- POST /auth/login returns a signed JWT on valid credentials; 401 on invalid
- POST /auth/refresh returns a new token given a valid, non-expired token
- POST /auth/logout invalidates the token server-side

auth-service

## PROJ-42: Add user authentication

Criterion: POST /auth/login returns a signed JWT
           on valid credentials; 401 on invalid.
Status:    In Progress

The rest of this guide walks through each part in detail. But the shape is always the same: a ticket becomes a spec entry, a spec entry becomes a Command, a Command runs a Build, and a Build closes with a Review.