A developer asks an agent to implement a feature. Working code ships the same afternoon. Six weeks later, nobody on the team can explain why it works the way it does — including the developer who shipped it. The agent did exactly what it was asked. The process had no point where understanding was required before moving on.
That's not a tool problem. It's a process one. And it compounds.
Three things break
Comprehension doesn't keep up with output
One feature shipped without full understanding is manageable. Repeated, the pattern becomes a problem: the codebase grows, the number of people who can confidently touch any given part of it shrinks, and the cost of change rises while the metrics say everything is going well.
Velocity went up. Comprehension didn't. Nothing in the process caught that.
This is the core accountability gap. Agentic Agile closes it with a structured Review at the end of every unit of work — questions the agent asks the developer about what was just built. Not optional. Not a quiz. A checkpoint that can't be skipped.
The gap doesn't appear in velocity charts. It appears in post-mortems, in estimates that keep coming in higher than expected, and in conversations about why a "simple" change took three weeks.
The process is running on a broken assumption
Standard Agile was built around one thing: the person who writes the code understands the code. The ceremonies — planning, review, retrospective — all assume that. They were designed to coordinate humans who had that understanding.
When an agent writes the code, that assumption is gone. The ceremonies still run. The tickets still move. But the accountability the process was designed to produce isn't there — because nobody changed what "done" means when AI is doing the execution.
Where each ceremony breaks
AI silently breaks the assumption behind every ceremony: that the people building the thing understand it.
Here's where that shows up in practice.
Sprint planning
Story points measured effort. With an agent, effort collapses — a three-point story takes twenty minutes. Planning poker becomes a conversation about comprehension, not build time, but nobody has changed the format to reflect that. Teams over-commit because the work looks small. The sprint fills with phases that need careful human oversight at every step, and nobody planned for that oversight.
Backlog refinement
Tickets written for humans don't work for agents. "Add pagination" is fine for a developer who infers defaults and conventions. An agent will make different assumptions each time. Refinement still produces a Jira ticket — not an agent-ready phase definition. The gap between the two is nobody's job to close.
Daily standup
"What did you do?" has no good answer when the agent did most of it. Blockers have changed character — context window limits, scope drift, checkpoint gaps — but the standup format has no vocabulary for them. If the agent ran overnight and completed three phases, the morning standup is describing history, not coordinating work.
Sprint review
The demo shows working software. What it doesn't show is whether the developer understands what was built. An agent can produce correct code that nobody on the team can explain. The review passes. The knowledge gap doesn't surface until something breaks six weeks later. Velocity numbers spike, stakeholders raise expectations, and the team is now committed to a pace that depends on AI working perfectly — which it won't.
Sprint retrospective
Retrospectives are human-driven and remain unchanged. The team reflects on what happened, what worked, and what to improve — none of that changes with AI doing the execution. The retro is one ceremony Agentic Agile leaves alone.
Code review
Traditional code review looks for errors in the diff — logic mistakes, edge cases the author missed. AI-generated code rarely has those. The real failure modes are spec divergence and comprehension gaps, neither of which is visible in a diff. Agentic Agile addresses those directly: the spec is set before Build, and the Review verifies understanding before the phase closes. Line-by-line review of AI output is looking in the wrong place.
Definition of done
The DoD was written for humans. "Code reviewed," "tests written," "deployed to staging" — none of these criteria address AI-specific failure modes. There's no criterion for "the developer can explain what was built and why." That was assumed to be automatic when humans wrote the code. It isn't when an agent did.
| Ceremony | What it was solving for | What breaks with AI |
|---|---|---|
| Sprint planning | Estimating human effort | Effort collapses; comprehension cost is invisible |
| Backlog refinement | Tickets clear enough for a developer | Tickets not precise enough for an agent |
| Daily standup | Coordinating human work in progress | No vocabulary for AI-specific blockers |
| Sprint review | Demonstrating working software | Working software ≠ understood software |
| Retrospective | Learning from what happened | Unchanged — human-driven, no Agentic Agile modifications |
| Code review | Catching errors before merge | AI code is rarely wrong line-by-line; the real risks are spec divergence and comprehension gaps, which the spec and the Review address before merge |
| Definition of done | Shared quality bar | No criterion for ownership or understanding |
Patching doesn't work
The instinct is to bolt AI onto the existing process. Let developers use the tools during sprints. Add an AI review step to the PR checklist. Keep everything else the same.
This fails because the existing process was built around a constraint that no longer exists — humans as the execution bottleneck. Every ceremony assumes that. Remove the constraint and the ceremonies are solving for the wrong problem.
The bottleneck now is understanding and scope discipline: ensuring the work was built to spec, and that the people who shipped it can explain what they built. No standard Agile ceremony is designed to enforce either of those things.
What you're actually doing
You decompose a ticket into phases — each one small enough that you can specify the outcome before the agent starts and verify it afterwards. That's the entire mechanism. The agent builds within those boundaries. Then it stops and leads a Review: questions about what was just built, surfacing anything you can't explain. Gaps get logged. Work doesn't block.
This works regardless of what's underneath. A 200,000-line legacy codebase doesn't change what a phase is — it changes how carefully you scope one. The checkpoints exist precisely because the agent can confidently produce something plausible that breaks things you didn't know were coupled.
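One way to picture the Review step is as a small record: the agent asks questions, the developer answers, and anything the developer can't explain is logged as a gap while the phase still closes. The sketch below is illustrative only. `ReviewQuestion`, `ReviewResult`, and `runReview` are hypothetical names, and treating an empty question list as a skipped checkpoint is an assumption, not something the process prescribes.

```typescript
// Hypothetical shapes for a phase Review; none of these names come from the guide.
interface ReviewQuestion {
  question: string;             // what the agent asks about the code it just built
  developerCanExplain: boolean; // the developer's honest answer
}

interface ReviewResult {
  phaseClosed: boolean; // gaps never block the work
  gaps: string[];       // questions the developer could not answer, logged for follow-up
}

// Run the Review: log every gap, but do not block the phase on them.
function runReview(questions: ReviewQuestion[]): ReviewResult {
  if (questions.length === 0) {
    // Assumption: a Review with no questions counts as skipping the checkpoint.
    throw new Error("Review cannot be skipped: at least one question is required");
  }
  const gaps = questions
    .filter((q) => !q.developerCanExplain)
    .map((q) => q.question);
  return { phaseClosed: true, gaps };
}

const reviewResult = runReview([
  { question: "What does this phase change, and where?", developerCanExplain: true },
  { question: "Which existing code depends on the changed module?", developerCanExplain: false },
]);
console.log(reviewResult.gaps);
```

The second question ends up in `gaps`, and the phase closes anyway. The gap is recorded, not hidden.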
The spec entry
The team has an objective: users need to be able to log in. That becomes a ticket, PROJ-42, broken into three phases.
Phase 1 gets a spec entry before Build begins. The scope is narrowed from the ticket's full intent to one testable outcome. The Product Owner approves it. The agent doesn't start until this exists.
Ticket PROJ-42: Add user authentication
Users need to be able to log in. Implement the full authentication flow.
Acceptance criteria
- ✓ POST /auth/login returns a signed JWT on valid credentials
- ✓ Returns 401 on invalid credentials
- ✓ Token payload contains user ID only, no email or sensitive fields
SPEC.md
## Phase 1: POST /auth/login endpoint
Ticket: PROJ-42
Status: Spec
Criterion:
POST /auth/login returns a signed JWT on valid credentials and 401 on invalid.
Constraints:
- Token payload: user ID only — no email or sensitive fields
- JWT_SECRET from environment — do not hardcode
- Route handler lives in src/routes/auth.ts
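The rule that the agent doesn't start until an approved spec entry exists can be expressed as a simple gate. This is a minimal sketch, assuming the spec entry is held as a plain object. The `SpecEntry` type, the `approvedByProductOwner` flag, and `buildBlockers` are hypothetical and only mirror the fields shown in SPEC.md above.

```typescript
// Hypothetical in-memory form of one SPEC.md phase entry (illustrative only).
interface SpecEntry {
  phase: string;
  ticket: string;
  status: string;                  // e.g. "Spec" or "In Review"
  criterion: string;               // one testable outcome
  constraints: string[];
  approvedByProductOwner: boolean; // assumption: approval is tracked as a flag
}

// Returns the reasons a Build may not start; an empty list means it may.
function buildBlockers(entry: SpecEntry): string[] {
  const blockers: string[] = [];
  if (entry.criterion.trim() === "") blockers.push("criterion is empty");
  if (!entry.approvedByProductOwner) blockers.push("Product Owner has not approved");
  if (entry.status !== "Spec") blockers.push(`status is '${entry.status}', expected 'Spec'`);
  return blockers;
}

const phase1: SpecEntry = {
  phase: "Phase 1: POST /auth/login endpoint",
  ticket: "PROJ-42",
  status: "Spec",
  criterion: "POST /auth/login returns a signed JWT on valid credentials and 401 on invalid.",
  constraints: [
    "Token payload: user ID only, no email or sensitive fields",
    "JWT_SECRET from environment; do not hardcode",
    "Route handler lives in src/routes/auth.ts",
  ],
  approvedByProductOwner: true,
};

console.log(buildBlockers(phase1));
console.log(buildBlockers({ ...phase1, approvedByProductOwner: false }));
```

Returning a list of reasons, not a bare boolean, keeps the conversation concrete: the developer sees exactly what the spec entry still needs.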
The command
The Developer writes the Command — the file that tells the agent its role, its scope, and where to stop. If the Command is hard to write, the spec entry needs more work.
Build Command — PROJ-42 Phase 1
You are a developer on this team. Your job is to build Phase 1 of PROJ-42.
Phase: POST /auth/login endpoint
Spec: SPEC.md § Phase 1
Ticket: PROJ-42
Criterion:
POST /auth/login returns a signed JWT on valid credentials and 401 on invalid.
Constraints:
- Token payload: user ID only; no email or sensitive fields
- JWT_SECRET from environment; do not hardcode
- Route handler lives in src/routes/auth.ts
When the criterion is met:
1. Update SPEC.md Phase 1 status to 'In Review'
2. Stop. Do not begin Phase 2.
3. Lead the Review.
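Because the Command repeats the spec entry almost word for word, one option is to generate it from the spec instead of typing it by hand. The sketch below assumes the spec entry is available as a plain object. `PhaseSpec` and `renderBuildCommand` are hypothetical names, and the output only approximates the layout of the Command shown above.

```typescript
// Hypothetical structured form of a spec entry; field names are assumptions.
interface PhaseSpec {
  number: number;
  title: string;
  ticket: string;
  criterion: string;
  constraints: string[];
}

// Render a Build Command that fixes the agent's role, scope, and stopping point.
function renderBuildCommand(spec: PhaseSpec): string {
  const lines = [
    `Build Command: ${spec.ticket} Phase ${spec.number}`,
    `You are a developer on this team. Your job is to build Phase ${spec.number} of ${spec.ticket}.`,
    `Phase: ${spec.title}`,
    `Spec: SPEC.md § Phase ${spec.number}`,
    `Ticket: ${spec.ticket}`,
    "Criterion:",
    spec.criterion,
    "Constraints:",
    ...spec.constraints.map((c) => `- ${c}`),
    "When the criterion is met:",
    `1. Update SPEC.md Phase ${spec.number} status to 'In Review'`,
    `2. Stop. Do not begin Phase ${spec.number + 1}.`,
    "3. Lead the Review.",
  ];
  return lines.join("\n");
}

const phaseOneSpec: PhaseSpec = {
  number: 1,
  title: "POST /auth/login endpoint",
  ticket: "PROJ-42",
  criterion: "POST /auth/login returns a signed JWT on valid credentials and 401 on invalid.",
  constraints: [
    "Token payload: user ID only; no email or sensitive fields",
    "JWT_SECRET from environment; do not hardcode",
    "Route handler lives in src/routes/auth.ts",
  ],
};

const command = renderBuildCommand(phaseOneSpec);
console.log(command);
```

Generating the Command does not replace the check described above: if the rendered text reads vaguely, the spec entry still needs more work.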
The session
The Command runs. The agent builds within the defined scope, then leads the Review. The spec file and the ticket are updated as the phase progresses.
Ticket PROJ-42: Add user authentication
Description
Users need to be able to log in. Implement the full authentication flow.
Acceptance criteria
- POST /auth/login returns a signed JWT on valid credentials; 401 on invalid
- POST /auth/refresh returns a new token given a valid, non-expired token
- POST /auth/logout invalidates the token server-side
The rest of this guide walks through each part in detail. But the shape is always the same: a ticket becomes a spec entry, a spec entry becomes a Command, a Command runs a Build, and a Build closes with a Review.
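That shape can be read as a small state machine over a phase's status. The following is a sketch, not a prescribed implementation: 'Spec' and 'In Review' are status values that appear in this guide, while using 'Build' and 'Done' as status values, the `NEXT` table, and the `nextStatus` and `lifecycle` functions are assumptions made for illustration.

```typescript
// 'Spec' and 'In Review' appear in the guide; 'Build' and 'Done' are assumed labels.
type PhaseStatus = "Spec" | "Build" | "In Review" | "Done";

// Each status has exactly one successor; a closed phase has none.
const NEXT: Record<PhaseStatus, PhaseStatus | null> = {
  "Spec": "Build",
  "Build": "In Review",
  "In Review": "Done",
  "Done": null,
};

// Advance one step; there is no path from Build to Done that skips the Review.
function nextStatus(current: PhaseStatus): PhaseStatus {
  const next = NEXT[current];
  if (next === null) throw new Error("Phase is already closed");
  return next;
}

// Walk a phase from its spec entry to the end of its Review.
function lifecycle(): PhaseStatus[] {
  const history: PhaseStatus[] = ["Spec"];
  for (let next: PhaseStatus | null = NEXT["Spec"]; next !== null; next = NEXT[next]) {
    history.push(next);
  }
  return history;
}

console.log(lifecycle().join(" -> ")); // prints "Spec -> Build -> In Review -> Done"
```

Encoding the order as a lookup table makes the missing shortcut explicit: the only way to reach the final status is through 'In Review'.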