The Kaizen of Context
In 2019, I was responsible for creating the prototype of the Google Play Store’s visual refresh, bringing it in line with modern Material Design. I built a beautiful, high-fidelity prototype using the latest and greatest Android development tools of the day. It was fluid, it was polished, and it sailed through several executive reviews.
But when I finally sat down with the engineering team to walk them through what I had made, the Tech Lead looked at the prototype and flatly said, “This makes us look bad.” They explained that implementing those beautiful page transitions would be a Herculean lift because the production code architecture had to support a massive, long tail of antiquated Android versions, something my pristine, disconnected prototype had completely ignored.
That was a hard lesson in the dangers of prototyping in a vacuum. Fast forward to the agentic era of 2026, and the paradigm has shifted entirely. Thanks to autonomous agents, prototyping in production isn’t just possible; it is the default. With Generative UI and coding agents, we don’t have to guess whether a UI component will survive the backend architecture, because the agent is wiring it up to the real backend in real time.
But this incredible power introduces a fascinating new dynamic. Speed without structural boundaries can quickly become chaotic. By adding the right constraints, however, that speed unlocks true agility.
The Shift from Generation to Verification
To understand the scale of this opportunity, we have to look at how software engineering is fundamentally changing. As engineering leader Addy Osmani recently outlined, we are no longer just writing code. We are building the factory that builds our software.
In this new factory model, you aren’t hand-holding a single agent through a single task. You are orchestrating fleets of agents. You spin up many agents in parallel: one handles a backend refactor, another implements a UI feature, and another updates the documentation.
Because of this, generation is no longer the bottleneck. Verification is.
Agents can produce impressive, functional code at blinding speeds. But confirming whether that output is structurally correct, secure, and perfectly aligned with your architectural intent is a distinctly human challenge. When you oversee dozens of agents running in parallel, clear requirements become your highest point of leverage, ensuring that the swarm’s velocity compounds into massive value rather than technical debt.
If we are transitioning from “Operators” who manually write code to “Architects of Intent” who evaluate it, our rigor simply moves upstream. We have the opportunity to embrace what I call the Kaizen of Context.
Derived from the Japanese business philosophy of Kaizen (continuous improvement), this is the practice of constantly refining our workflows, identifying friction, and eliminating waste. In the Middle Loop, Context Engineering is our Kaizen. It is the systematic design, structuring, and optimization of the information we feed to our models. And one of the most powerful mechanisms the Intent Architect has to enforce this continuous improvement is Test-Driven Development (TDD).
The Trap of Post-Implementation Testing
In the human-driven workflow, writing tests after the implementation was a manageable, if imperfect, practice. In an agentic workflow, skipping test-first development misses a critical opportunity to guide the agent.
Autonomous agents optimize for the stated objective. If you ask an agent to build a feature and write the tests for it simultaneously, the agent will naturally find ways to pass the tests. It will grade its own homework. If the tests are written after the implementation, as noted by senior leaders at a recent Thoughtworks engineering retreat, they are highly likely to test what the implementation happens to do, rather than what it should do.
TDD as the Ultimate Prompt
In the Middle Loop, TDD is no longer just a quality assurance practice; it is a vital form of prompt engineering.
By establishing a strict Red/Green TDD workflow, you create clear, unbreakable boundaries for the machine. You write the tests first. You confirm they fail (the Red phase). Then, and only then, do you unleash the agent to iterate on the implementation until the tests pass (the Green phase).
This sequence provides deterministic validation for non-deterministic generation. The test suite becomes your automated Approval Interface. It tells the agent exactly what success looks like—whether that involves specific API integration boundaries or exact accessibility requirements—and encourages it to relentlessly self-correct and iterate until your standard is met.
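The Red/Green sequence can be sketched as a small loop. This is a minimal illustration, not a real harness: the `run_tests` helper and the `agent_propose` stub are hypothetical stand-ins for an actual test runner and an actual coding agent.

```python
def run_tests(impl, cases):
    """Green check: True only if impl passes every (input, expected) case."""
    try:
        return all(impl(x) == expected for x, expected in cases)
    except Exception:
        return False

def agent_propose(attempt):
    """Stub agent: returns a (hopefully) better implementation each round.
    A real agent would regenerate code from the failing test output."""
    candidates = [
        lambda s: s,                  # attempt 0: identity, fails
        lambda s: s.upper(),          # attempt 1: forgets to trim, fails
        lambda s: s.strip().upper(),  # attempt 2: satisfies the spec
    ]
    return candidates[min(attempt, len(candidates) - 1)]

# 1. Write the tests first -- the suite IS the prompt.
cases = [("  hello ", "HELLO"), ("Hi", "HI")]

# 2. Red phase: confirm the suite fails before any implementation exists.
assert not run_tests(lambda s: None, cases)

# 3. Green phase: the agent iterates until the suite passes.
attempt = 0
impl = agent_propose(attempt)
while not run_tests(impl, cases):
    attempt += 1
    impl = agent_propose(attempt)

print(attempt)  # how many rounds the stub agent needed to go Green
```

The key property is that the success condition is fixed before generation begins: the agent can iterate freely, but it cannot redefine what passing means.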
Diamond Prompting
This continuous refinement of context perfectly mirrors the famous double-diamond model of the UX design process, where we alternate between diverging to explore ideas and converging to refine them. UX pioneer Jakob Nielsen recently adapted this concept for the AI era, calling it Diamond Prompting.
As Intent Architects, we can alternate between two distinct prompting styles:
- Exploratory Prompting: We start with broad, zero-shot prompts to benefit from the AI’s inherent ideation capabilities. We ask it to generate twenty different layout variants or structural approaches, broadening our thinking about the problem space.
- Detail-Refining Prompting: Once we select the right path, we converge. We switch to highly specific, few-shot prompts. This is where we feed the agent our strict TDD constraints, our programmable design rules, and our specific failure modes.
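The two phases can be sketched as prompt templates. The wording of these templates and the example constraints are illustrative assumptions, not a prescribed format.

```python
# Diverge: broad zero-shot ideation to open up the problem space.
EXPLORE = "Generate {n} structurally different approaches for: {goal}"

# Converge: specific, constraint-laden direction tied to the test suite.
REFINE = (
    "Implement the chosen approach for: {goal}\n"
    "Hard constraints (every test below must pass):\n{constraints}"
)

def exploratory_prompt(goal, n=20):
    """Diamond phase 1: exploratory, zero-shot."""
    return EXPLORE.format(n=n, goal=goal)

def refining_prompt(goal, constraints):
    """Diamond phase 2: detail-refining, with explicit TDD constraints."""
    bullets = "\n".join(f"- {c}" for c in constraints)
    return REFINE.format(goal=goal, constraints=bullets)

print(exploratory_prompt("a settings page layout"))
print(refining_prompt(
    "a settings page layout",
    ["test_keyboard_navigation passes", "test_contrast_ratio passes"],
))
```

Note the asymmetry: the exploratory prompt deliberately omits constraints to widen the search, while the refining prompt folds the failing test suite directly into the instruction.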
By treating our test suites and constraints as our primary design tools, we shift from anxiously managing code to confidently directing outcomes. We aren’t abandoning quality; we are automating its enforcement. As an Intent Architect, you don’t need to read every single line of code the agent wrote, because you designed the framework that guides the code to success.