
Vibe Check: Opus 4.6—The Best Coding Model We’ve Tested (With Some Maddening Habits)

It one-shotted a problem other models missed—and brings agentic, parallel work to non-coding tasks




Anthropic’s latest model, Opus 4.6, is the best AI coder we’ve tested. It also sometimes makes maddening errors.

It solved a real iOS coding task that stumped both GPT-5.3 Codex and Opus 4.5. It is more thorough, explores context more carefully, and is smarter than Opus 4.5. But that power has trade-offs: It's slower and a bit more verbose, and it still falls prey to classic Claudisms—it sometimes makes changes you didn't expect, and it doesn't always know its own skills and capabilities.

When it comes to coding, it feels like Anthropic looked at Codex's strong points—thorough, precise work on tough tasks—and tried to incorporate them into this release.

On writing and editing, the drafting experience is more fluid than Opus 4.5. It applies editorial rules more consistently and translates technical concepts into accessible prose more naturally. But in a blind test, the team preferred Opus 4.5’s prose—Opus 4.6 seems more prone to AI-isms like “X not Y” constructions than its predecessor.

What Anthropic told us

  1. Best-in-class for coding and professional work: Built to power agents that handle whole categories of real-world work, excelling across the entire software development lifecycle.
  2. Its most agentic model yet: It drives tasks forward with less handholding—parallelizing work, gathering more context, and taking smarter autonomous actions.
  3. “Adaptive Thinking” replaces “Extended Thinking”: The model adjusts how much it thinks based on the difficulty of your question. On easy tasks, it skips the deep reasoning step entirely. This is on by default everywhere.

The Reach Test


Reach Test legend

🥇: Paradigm shift

🟩: Psyched about this release

🟨: It’s okay, but I wouldn’t use it every day

🟥: Trash release


Dan Shipper—the multi-threaded CEO

  1. Verdict: 🟩
  2. Quote: I shipped a merged pull request on a codebase I’ve never touched—it researched an unsolved iOS problem and wrote a working fix that left me stunned. I also like its default parallelization for knowledge-work tasks. It raises the ceiling for what’s possible with AI coding. But I also found it sometimes unreliable, and it requires closer management than Codex does.

Kieran Klaassen—the Rails-pilled master of Claude Code

  1. Verdict: 🟩
  2. Quote: It’s better than Opus 4.5 at understanding codebases and sustaining longer stretches of work. For vibe coders starting fresh, it might not be a huge jump, but it’s a really nice step up for day-to-day coders with bigger projects, and a great refinement of what was missing. The model’s medium thinking option, a setting that makes it think less, is a good, faster alternative.

Naveen Naidu—graduate of IIT Bombay (the MIT of India 💅)

  1. Verdict: 🟩
  2. Quote: On the morning of the release, I canceled my Anthropic subscription because I wasn’t using it anymore. That afternoon, I got access to the new model, and I might resubscribe because of it. It’s good at thinking through gnarly issues—such as how to start Monologue keyboard dictation without forcing the user to switch back manually to the original app—and at new features that are complicated to implement. I’m quite shocked that Dan, with Opus 4.6, was able to push a pull request to the Monologue iOS app.

Katie Parrott—AI-pilled writer by day, vibe coder by night

  1. Verdict: 🟩
  2. Quote: The drafting experience is so much more fluid and responsive than Opus 4.5—I feel like I’m collaborating rather than wrestling. What I’m really loving, though, is the agentic-ness of the chat experience. The resourcefulness and adaptation to the needs of your request are noticeable; I almost thought I was in Claude Code. One caveat: In the blind writing test, I was the outlier—the rest of the team preferred Opus 4.5’s prose. But I’m confident the AI “smell” will resolve with time and better prompting.

The headline findings

Two big stories emerged from our testing:


Become a paid subscriber to Every to unlock this piece and learn about:

  1. How Opus 4.6 performed on Every’s LGF benchmarks
  2. How the model performed on writing and editing tests