Post-Change Design Reflection: Making Every Task Leave the Codebase Cleaner
How a CMP practice turns the context an agent just acquired into design feedback—syncing docs, flagging duplication, and leaving ownership decisions for humans—so each change leaves the codebase easier to modify.
One of the most common worries about coding agents has little to do with whether they can write a correct function. It is longer-term and harder:
Will a coding agent slowly turn my codebase into a mess?
The worry is reasonable. An agent can finish the task in front of it quickly. But real projects are never written once — they are modified round after round. When every change means only “the feature runs, the tests pass, the task is done,” while a boundary quietly gets bypassed, a doc drifts out of sync, and duplicate logic scatters across the tree, the codebase turns into its own kind of AI slop.
The damage stays hidden at first. Each step is just a small shortcut, a temporary helper, an OpenAPI spec that never got synced, a doc still describing the old architecture, a business rule parked under the wrong owner. Any one of them looks harmless. A few dozen of them later, the codebase has become steadily harder to change.
I recently finished a series on the Context Minimization Principle (CMP), which proposes a practice for exactly this: Post-Change Design Reflection — have the agent run a short design reflection the moment a task is complete, while the entire execution path still sits in its context window.
What is Post-Change Design Reflection?
A normal task summary reports which files changed, what shipped, which tests ran, and whether they passed. That is useful, and it discards something valuable. Post-Change Design Reflection asks a different class of question:
What context was hard to find during this change? Did this change leave behind a spot that the next change will forget? Did it bypass an existing boundary? Did it create an ownership or placement inconsistency? Will it mislead future work through a stale doc, a wrong comment, or an old schema?
The focus shifts from “is the task finished?” to a sharper question: after this change, is the codebase still easy to change correctly?
That is the heart of CMP. The core of software development is modifying existing systems, and the hard part of a correct modification is acquiring enough context; the final few lines of code are the easy part. A maintainable codebase is simply one where that context is cheap to acquire.
Working from that idea, I wrote a skill. The full text lives on GitHub: post-task-reflection/SKILL.md. It has the agent check for three kinds of problems.
Three checks
Missing Path. Did the agent work hard to find context that should have been easy to reach? Or, among the several places this change had to keep consistent, will a future change miss one of them for lack of a clue?
Unauthorized Shortcut. To finish the task, did the agent route around a boundary that was supposed to hold? Business logic dropped into a controller, a use case reaching straight for an external SDK, adapter details leaking into an inner layer, a block of existing logic copied instead of called through its proper boundary. The tests can still pass while a hole gets dug in the architecture.
Unowned Capability. Did the task introduce a new capability with no clear owner — a new formatter, renderer, mapper, policy, validator, or converter? When you cannot say where a function belongs as you write it, then the next time the same logic is needed, you will not know where to look to see whether it already exists. That is how duplication is born.
One rule here matters as much as the checks. When a fix would touch a design decision — inventing a new abstraction, moving a rule to a different owner, redrawing a boundary, creating a shared helper — the agent must not make that call on its own. It flags the problem and stops: this rule may need a clearer owner, this capability’s current spot may be only a temporary fit, this boundary may be under pressure. Settling any of them is the human’s call.
That is the boundary of post-change reflection. The agent stays out of the architect’s chair. After each change it lays out the context path it just walked, fixes the small problems that already have a clear design purpose, and raises the genuine design decisions — ownership, boundaries, new concepts — for a person to settle.
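The split between "fix it" and "flag it" can be sketched in a few lines. This is only an illustration of the rule described above, not code from the skill: the names `Check`, `Finding`, `touches_design_decision`, and `triage` are hypothetical, and in practice the agent makes this judgment in prose rather than through a boolean field.

```python
from dataclasses import dataclass
from enum import Enum


class Check(Enum):
    """The three checks described above (names are illustrative)."""
    MISSING_PATH = "missing path"
    UNAUTHORIZED_SHORTCUT = "unauthorized shortcut"
    UNOWNED_CAPABILITY = "unowned capability"


@dataclass(frozen=True)
class Finding:
    check: Check
    summary: str
    # Hypothetical flag: True when the fix would invent an abstraction,
    # move a rule to a different owner, redraw a boundary, or create a
    # shared helper.
    touches_design_decision: bool


def triage(findings):
    """Split findings into those the agent may fix and those it only flags."""
    fix_now = [f for f in findings if not f.touches_design_decision]
    flag_for_human = [f for f in findings if f.touches_design_decision]
    return fix_now, flag_for_human


findings = [
    Finding(Check.MISSING_PATH,
            "architecture doc still describes the old ownership", False),
    Finding(Check.UNOWNED_CAPABILITY,
            "new formatter has no obvious home", True),
]
fix_now, flag_for_human = triage(findings)
print("fix now:", [f.summary for f in fix_now])
print("flag for human:", [f.summary for f in flag_for_human])
```

The point of the sketch is the asymmetry: a finding with a clear design purpose gets fixed, while anything that would settle ownership or boundaries is reported and left alone.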
The first task already showed a change
The first real task came from an internal project, CertReporter. CertReporter is a structured collaboration system for large certification and testing reports. These reports run dozens or hundreds of pages as formal certification documents. Content is filled in, reviewed, and tracked clause by clause against a standard, then rendered into a strictly formatted Word report. The problem it solves is structural: break the report into sections, clauses, and work items so many people can fill, review, and track in parallel, then let the system render the final document.
This task was a structural change. In the old model, Standard owned both the clause schema and the clause render template, and one ReportType linked to exactly one Standard. That model was inaccurate. Standard should describe the data structure of a standard’s clauses. The render template should belong to a specific section under a ReportType. And different sections under one ReportType should be free to link to different standards.
Old model:

ReportType ──points to one──► Standard
                               ├─ owns ─► Clause schema
                               └─ owns ─► Rendering template

New model:

ReportType ──► Section A
               ├─ owns ─► Template A
               └─ chooses ─► Standard 1
                             └─ owns ─► Schema 1

ReportType ──► Section B
               ├─ owns ─► Template B
               └─ chooses ─► Standard 2
                             └─ owns ─► Schema 2
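As a rough sketch, the new ownership could be modeled as below. This is not CertReporter's actual code: the class shapes, the field names (`clause_schema`, `render_template`, `sections`), and the `standard_names` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Standard:
    name: str
    # Standard owns only the data structure of its clauses.
    clause_schema: dict


@dataclass(frozen=True)
class Section:
    title: str
    # The render template now belongs to the section, not the standard.
    render_template: str
    # Each section chooses its own standard.
    standard: Standard


@dataclass
class ReportType:
    name: str
    sections: list = field(default_factory=list)

    def standard_names(self):
        """Sorted, de-duplicated names of the standards its sections use."""
        return sorted({s.standard.name for s in self.sections})


std1 = Standard("Standard 1", {"clause": "schema 1"})
std2 = Standard("Standard 2", {"clause": "schema 2"})
report = ReportType("Example report", [
    Section("Section A", "Template A", std1),
    Section("Section B", "Template B", std2),
])
print(report.standard_names())
```

Note that `ReportType` no longer holds a reference to a single `Standard`; the link exists only through its sections, which is what lets one report type span several standards.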
This is a change in ownership and in the data model. An agent that cared only about shipping the feature might delete a few fields, fix the API, get the tests green, and stop. This time Codex did more.
Why did it change the docs too?
The change touched 46 files (6 of them in the domain layer), and the agent also updated the relevant descriptions across four Markdown docs, a behavior I had not seen before. So I asked why.
The answer was the interesting part. It treated the docs as part of the architecture — routing signals for future development. Whether the next contributor is a human or another agent, they will read the system’s structure through its docs. When the ownership described in a doc is wrong, future changes get routed to the wrong place. A wrong doc carries real cost: wrong docs produce wrong routing.
I asked directly whether this came from the new skill. It did. The skill never told the agent to “always update docs.” It made the agent ask one more question after finishing: is the context routing this change left behind still correct? Where it was not, the agent fixed it.
It also held back from refactoring
Another detail mattered just as much. In the reflection, Codex noticed that the frontend had multiple helpers, in several places, that find the Standard from a section or work item. If multi-standard reports keep growing, that logic might eventually deserve a shared selector. It recorded the point as a future convergence item and left the code alone.
This restraint is the whole point. Some changes are implementation details safe to hand to the agent. Others touch how the business is abstracted and where module boundaries sit — things a human needs to know and confirm, and that the agent should never change quietly. A mature agent asks first: does this duplication already amount to a real ownership problem, or is it just a signal that may need converging later?
That is the change I wanted from post-change reflection: a more restrained agent. It recognizes design problems while leaving new owned concepts for a human to name. It spots duplication risk while resisting the urge to abstract on sight. It reports placement problems while keeping human design decisions out in the open instead of buried in implementation details.
Agents can keep a codebase cleaner
AI coding agents already write code well. Writing fast is only the first stage. The value that lasts is a codebase that grows clearer as it changes.
That asks the agent to understand the design state after a change, not just to generate code. It needs to know which artifacts belong to the same modification closure, the set of places a change has to keep consistent. It needs to treat docs, tests, OpenAPI specs, and mocks as routing signals for future work. It needs to know which design decisions are not its to make. And it needs to know when to hold still rather than refactor.
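One mechanical way to picture a modification closure is as a list of artifacts that should move together, compared against the files a change actually touched. The sketch below is purely illustrative and not taken from the skill: the function name and every file path are hypothetical, and a real reflection relies on the agent's reading of the change rather than a fixed list.

```python
def untouched_in_closure(closure_artifacts, changed_files):
    """Return closure artifacts that the change did not modify, in order."""
    changed = set(changed_files)
    return [path for path in closure_artifacts if path not in changed]


# Hypothetical closure for a change to how standards are owned.
closure = [
    "domain/standard.py",
    "docs/architecture.md",
    "openapi.yaml",
    "mocks/standards.json",
    "tests/test_standard.py",
]
# Hypothetical set of files the task actually changed.
changed = [
    "domain/standard.py",
    "tests/test_standard.py",
    "docs/architecture.md",
]

for path in untouched_in_closure(closure, changed):
    print("review before closing the task:", path)
```

An untouched artifact is not automatically wrong; it is only a prompt to check whether it still describes the system correctly.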
This experiment sent a strong signal: under an explicit design-reflection mechanism, an agent can hold architectural consistency more steadily than many human developers. It manages this for a plain reason — after every change it reliably does the closing-up work that people so often find too tedious to bother with.
People cut corners. The docs can wait. The mock does not touch the main flow. The seed data is fine where it is. Arguing with a teammate over this helper’s style is not worth it, so writing my own again won’t hurt.
An agent told to close the modification closure can do that same tedious work every time. This is AI-assisted continuous refactoring: every development task also completes one more step of closing up the design, with boundaries reconfirmed, docs synced to the latest facts, duplication flagged, and ownership brought into the open. Each change adds to the system and recalibrates its design at the same time.
Pushed to its limit, the practice answers the biggest fear about agentic coding. A codebase maintained this way can stay cleaner, more consistent, and more ready for the next change than most codebases kept by human teams.
Writing code fast is the starting line of AI programming. Making every task leave the codebase cleaner is where AI begins to rewrite software engineering.