
Vibe Check: Opus 4.6—The Best Coding Model We’ve Tested (With Some Maddening Habits)

It one-shotted a problem other models missed—and brings agentic, parallel work to non-coding tasks




Anthropic’s latest model, Opus 4.6, is the best AI coder we’ve tested. It also sometimes makes maddening errors.

It solved a real iOS coding task that stumped both GPT-5.3 Codex and Opus 4.5. It is more thorough, explores context more carefully, and is smarter than Opus 4.5. But that power has trade-offs: It's slower and a bit more verbose, and it still falls prey to classic Claudisms—it sometimes makes changes you didn't expect, and it doesn't always know its own skills and capabilities.

When it comes to coding, it feels like Anthropic looked at Codex's strong points—thorough, precise work on tough tasks—and tried to incorporate them into this release.

On writing and editing, the drafting experience is more fluid than Opus 4.5. It applies editorial rules more consistently and translates technical concepts into accessible prose more naturally. But in a blind test, the team preferred Opus 4.5’s prose—Opus 4.6 seems more prone to AI-isms like “X not Y” constructions than its predecessor.

What Anthropic told us

  1. Best-in-class for coding and professional work: Built to power agents that handle whole categories of real-world work, excelling across the entire software development lifecycle.
  2. Its most agentic model yet: It drives tasks forward with less handholding—parallelizing work, gathering more context, and taking smarter autonomous actions.
  3. “Adaptive Thinking” replaces “Extended Thinking”: The model adjusts how much it thinks based on the difficulty of your question. On easy tasks, it skips the deep reasoning step entirely. This is on by default everywhere.

The Reach Test


Reach Test legend

🥇: Paradigm shift

🟩: Psyched about this release

🟨: It’s okay, but I wouldn’t use it every day

🟥: Trash release


Dan Shipper—the multi-threaded CEO

  1. Verdict: 🟩
  2. Quote: I shipped a merged pull request on a codebase I’ve never touched—it researched an unsolved iOS problem and wrote a working fix that left me stunned. I also like its default parallelization for knowledge-work tasks. It raises the ceiling for what’s possible with AI coding. But I also found it sometimes unreliable, and it requires closer management than Codex does.

Kieran Klaassen—the Rails-pilled master of Claude Code

  1. Verdict: 🟩
  2. Quote: It’s better than Opus 4.5 at understanding codebases and sustaining longer stretches of work. For vibe coders starting fresh, it might not be a huge jump, but it’s a really nice step up for day-to-day coders with bigger projects, and a great refinement of what was missing. The model’s medium thinking option, a setting that makes it think less, is a good, faster alternative.

Naveen Naidu—graduate of IIT Bombay (the MIT of India 💅)

  1. Verdict: 🟩
  2. Quote: On the morning of the release, I canceled my Anthropic subscription because I wasn’t using it anymore. That afternoon, I got access to the new model, and I might resubscribe because of it. It’s good at thinking through gnarly issues—such as how to start Monologue keyboard dictation without forcing the user to switch back manually to the original app—and at new features that are complicated to implement. I’m quite shocked that Dan, with Opus 4.6, was able to push a pull request to the Monologue iOS app.

Katie Parrott—AI-pilled writer by day, vibe coder by night

  1. Verdict: 🟩
  2. Quote: The drafting experience is so much more fluid and responsive than Opus 4.5—I feel like I’m collaborating rather than wrestling. What I’m really loving, though, is the agentic-ness of the chat experience. The resourcefulness and adaptation to the needs of your request are noticeable; I almost thought I was in Claude Code. One caveat: In the blind writing test, I was the outlier—the rest of the team preferred Opus 4.5’s prose. But I’m confident the AI “smell” will resolve with time and better prompting.

The headline findings

Two big stories emerged from our testing:


Become a paid subscriber to Every to unlock this piece and learn about:

  1. How Opus 4.6 performed on Every’s LGF benchmarks
  2. How the model performed on writing and editing tests