Codex Powers Self-Improving Tax Agents
Alps Wang
May 28, 2026
The Self-Improving Agent Loop
The article presents a case study on building self-improving AI agents by feeding practitioner feedback into an automated iteration loop powered by Codex. Its framework rests on three pillars: close practitioner engagement, production trace evidence, and a Codex-driven iteration loop. The detailed rental property example shows how human corrections are translated into structured signals for refining the agent, which the article credits with measurable gains in accuracy and efficiency. The emphasis on production traces as the bridge between manual correction and automated learning is the most insightful part, because it addresses a common challenge: making real-world failures actionable for AI systems. The system's reported ability to handle increasingly complex tax filings over time, driven by this feedback mechanism, suggests potential for continued improvement.
However, a key limitation is that this automation applies only to a "bounded layer" of the product. The article states that engineers remain responsible for architecture and product decisions, but it does not elaborate on how complex those higher-level decisions are or how the self-improving loop interacts with them. The reliance on "structured signals" derived from practitioner feedback also assumes that the feedback itself is consistently accurate and comprehensive. Ambiguous or subjective practitioner judgments, which are common in tax law, may still be hard to automate robustly. The article also cites "tax judgment" as a potential reason for divergence, implying that some cases may remain outside the scope of automated self-improvement without significant human oversight. Finally, while the results in tax preparation are promising, extending this bounded loop to very different domains may require significant re-engineering of the eval infrastructure and of the mechanisms for capturing practitioner expertise.
Key Points
- Tax AI, developed by Thrive Holdings and OpenAI, leverages Codex to create self-improving tax preparation agents.
- The system moves beyond manual error correction by using production use cases to generate structured signals for autonomous improvement.
- Key to this is a three-pillar approach: expert practitioner feedback, production traces (input-to-output history), and a Codex-driven iteration loop with tailored evaluations.
- Measurable improvements include significant increases in accuracy (e.g., reaching 75% correct field completion from 25% to 86% within six weeks) and throughput.
- The process captures practitioner corrections and transforms them into actionable "findings" and "eval targets" for Codex.
- Codex investigates issues by inspecting source packages, extraction schemas, mapper behavior, and code paths, proposing and validating fixes against targeted and regression evals.
- This loop is applied to a "bounded layer" of the product (extraction and mapping), with engineers retaining responsibility for architecture and product decisions.
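
To make the correction-to-eval flow in the points above more concrete, here is a minimal Python sketch. It is an illustration, not the system described in the article: the `Correction` and `EvalTarget` shapes, the function names, and the acceptance rule (the fix must pass every targeted eval and must not lower the regression pass rate) are all assumptions. The article only says that fixes are validated against targeted and regression evals.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    """A practitioner's fix to one field (hypothetical shape)."""
    trace_id: str      # production trace the correction came from
    field: str         # e.g. "rental_income"
    agent_value: str   # value the agent produced
    expert_value: str  # value the practitioner entered instead


@dataclass(frozen=True)
class EvalTarget:
    """Expected value for one field of one trace (hypothetical shape)."""
    trace_id: str
    field: str
    expected: str


def to_eval_target(correction: Correction) -> EvalTarget:
    # Assumption: the practitioner's value is treated as ground truth.
    return EvalTarget(correction.trace_id, correction.field, correction.expert_value)


def pass_rate(outputs: dict[tuple[str, str], str], targets: list[EvalTarget]) -> float:
    """Fraction of targets whose expected value matches the output."""
    if not targets:
        return 1.0
    hits = sum(1 for t in targets if outputs.get((t.trace_id, t.field)) == t.expected)
    return hits / len(targets)


def accept_fix(
    candidate: dict[tuple[str, str], str],
    baseline: dict[tuple[str, str], str],
    targeted: list[EvalTarget],
    regression: list[EvalTarget],
) -> bool:
    """Assumed gate: fix all targeted evals without lowering the regression pass rate."""
    fixes_targets = pass_rate(candidate, targeted) == 1.0
    no_regression = pass_rate(candidate, regression) >= pass_rate(baseline, regression)
    return fixes_targets and no_regression
```

In this sketch, a proposed change would be run over the affected traces to produce `candidate` outputs keyed by `(trace_id, field)`, and it would be kept only if `accept_fix` returns `True`. Note that treating every correction as ground truth sidesteps the ambiguity concern raised earlier; a real system would likely need a review step for corrections that reflect tax judgment.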

