One Month with OpenAI Codex 5.2 — And How Opus 4.6 Cleaned It Up Afterward

April 13, 2026 – 11 min read

I'm someone who tests new tools seriously. Not for an afternoon, but genuinely — with real projects, real code, real pressure.

When OpenAI released Codex 5.2 in December 2025, it was clear that it deserved a proper test. So I ran one.

What came out of it was more interesting than I expected — and the biggest insight came not from the experiment itself, but from what happened afterward.

Codex 5.2: Where It Excels

I'll start where Codex 5.2 genuinely shines — because it does.

Delivery speed. When I describe a clear feature, Codex delivers fast. Not always perfect, but fast. For a first draft, for a proof of concept, for turning an idea into code — Codex is agile and direct.

Large repositories. Working in large codebases, navigating across module boundaries — Codex has real strengths there. Context compaction works well for long sessions in large projects, and the results are solid.

Refactoring and migrations. For larger code changes, restructuring, porting old code — Codex shows what it can do. Especially for large, clearly defined restructurings, it delivers reliably.

Visual processing. Screenshots, UI diagrams, technical drawings — Codex handles visual inputs well and can generate working code from them. That genuinely saved me time when translating design mockups into code.

That's not nothing. And in the first weeks, I was impressed by the speed.

What Showed Up Over Time

Then came what always comes when you work deeply with a tool: the patterns become visible.

Complexity as the default solution. Codex tends to build more complex solutions than necessary. Not wrong, not broken — but more elaborate than it needs to be. Abstraction layers nobody asked for. Helper methods used exactly once. Configuration options with no job to do.
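
To make the pattern concrete, here is a minimal, hypothetical sketch in Python. It is not actual Codex output, and every name in it (`PriceFormatter`, `FormatterConfig`, `DefaultPriceFormatter`, `make_formatter`, `format_price`) is invented for illustration. Both versions return the same string; only one of them needs three extra types to get there.

```python
from dataclasses import dataclass
from typing import Protocol


# --- Over-built version (hypothetical illustration, not real model output) ---

class PriceFormatter(Protocol):
    """Abstraction layer nobody asked for: there is only one implementation."""

    def format(self, cents: int) -> str: ...


@dataclass
class FormatterConfig:
    symbol: str = "$"
    # Configuration option with no job to do: nothing ever reads it.
    locale_hint: str = "en"


class DefaultPriceFormatter:
    def __init__(self, config: FormatterConfig) -> None:
        self._config = config

    def format(self, cents: int) -> str:
        return self._render(cents)

    # Helper method used exactly once.
    def _render(self, cents: int) -> str:
        return f"{self._config.symbol}{cents / 100:.2f}"


def make_formatter() -> PriceFormatter:
    return DefaultPriceFormatter(FormatterConfig())


# --- Direct version: same behavior, one function ---

def format_price(cents: int) -> str:
    return f"${cents / 100:.2f}"
```

Neither version is wrong. The first one simply costs more to read, test, and change, and that cost is paid every time someone touches the module.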

Dependencies that add up. Over weeks, dependencies accumulated — some unnecessary, some in tension with each other. Harmless individually, a growing problem together.

Intent and description. Codex is good at building technically correct solutions. But sometimes it didn't quite catch my intent — the code did what I described, not what I meant. That's a subtle but important difference that compounds over time.

Long autonomous sessions. In truly long, autonomous sessions, Codex sometimes struggled to maintain the thread. It would stop, ask questions, or deliver incomplete results that I then had to finish.

None of this is catastrophic. The code ran. But it was quietly accumulating technical debt.

Opus 4.6 Enters the Picture

After about a month with Codex 5.2 (not one continuous month, but sessions spread over several weeks), it was clear that I needed a different perspective.

I started reviewing individual modules with Opus 4.6. Not with the goal of tearing everything down, but of understanding what was there — and what could be better.

What followed was a gradual, iterative process:

**Audit:** Opus 4.6 analyzed the existing code, identified patterns, mapped dependencies

**Prioritization:** We decided together what actually needed to be touched

**Refactoring in waves:** Module by module, with clear focus on simplification

**Dependency cleanup:** Unnecessary packages removed, conflicts resolved

**Tests added:** Where tests were missing, they were written
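
The dependency part of the audit can be approximated with very little code. The sketch below is an assumption-laden illustration, not the process Opus 4.6 actually followed: it assumes a Python codebase, and the function names `imported_modules` and `unused_dependencies` are hypothetical. It parses source text with the standard `ast` module and reports declared packages that are never imported.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python source."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import, which is not an external package.
            if node.level == 0 and node.module:
                found.add(node.module.split(".")[0])
    return found


def unused_dependencies(declared: set[str], sources: list[str]) -> set[str]:
    """Declared packages that no source file imports.

    Assumption: the declared name equals the import name. That is not always
    true (a package can install under a different module name), so results
    are candidates for review, not a removal list.
    """
    used: set[str] = set()
    for source in sources:
        used |= imported_modules(source)
    return declared - used


if __name__ == "__main__":
    files = [
        "import requests\nfrom os import path\n",
        "from . import helpers\nimport numpy.linalg\n",
    ]
    print(unused_dependencies({"requests", "numpy", "leftpad"}, files))
```

A static check like this misses dynamic imports and plugins loaded by name, which is one reason the cleanup still needed a review pass per module.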

This wasn't quick. It was work over several weeks — but it was worth it.

What the Process Revealed

Here's the interesting part: I don't think the Codex experiment was a mistake.

The code Codex produced worked. The project moved forward. And contact with a different model showed me things about my own style I wouldn't have seen otherwise.

The real insight came during the cleanup with Opus 4.6.

Code quality is not an immediate property. It shows itself over time. A feature that works today can become the source of problems three months from now — not because it's broken, but because it's too complex, too tightly coupled, too hard to change.

Opus 4.6 thinks in systems. What stood out during cleanup: Opus doesn't just see the code in front of it, but the system it describes. It doesn't just ask "Does this work?" but "Is this the right thing?"

The ability to clean up is just as important as the ability to build. That sounds trivial, but it isn't. Not every tool — not every developer — is good at both. Knowing which tool you need and when is the actual skill.

What I Would Do Differently

If I were starting the experiment today, I'd start it with clear checkpoints.

Not "try Codex 5.2 for a month" — but "try Codex 5.2 for specific tasks, and review code quality weekly."

The difference: I would have seen earlier where technical debt was building up — and would have addressed it continuously, rather than cleaning it up at the end.

That's a lesson that applies regardless of which AI tool you're using.

Different Tools, Different Strengths

What remains after this experiment?

Codex 5.2 is a tool worth taking seriously. For fast delivery, for large refactors, for prototypes — it has its place. Those who know when to deploy it can be productive with it.

Opus 4.6 is my tool for everything that needs to live beyond the next few weeks and months. Where code doesn't just need to work, but also needs to be readable, maintainable, and extensible.

Those aren't just two different models. They're two different philosophies of writing code, and the work lies in matching each one to the right task.

Conclusion: The Experiment Was Worth It

I would run the experiment again. Because with every test I learn — about the tools, about my own style, and about what I really expect from a coding partner.

And what I know now: for my way of working, for projects with long-term ambition, for code that should still look good a year from now — Claude Opus 4.6 is the right choice.

Codex 5.2 confirmed that for me, in its own way.

Patrik Germann

Solo Developer, AIpuna App