
Was this newsletter forwarded to you? Sign up to get it in your inbox.
It was a crowded week for AI model releases—so crowded, it’s hard not to suspect that the big labs are stepping on each other's toes on purpose in a bid to hog the spotlight. As Spiral general manager Danny Aziz put it in Every’s Discord, “Begun, the petty AI wars have.”
In a 24-hour span on Tuesday, the big labs released Genie 3, a text-to-world simulator from Google DeepMind that you can actually walk around in (once it’s out of research preview, anyway); Claude Opus 4.1, Anthropic’s incremental (but important) coding upgrade; and gpt-oss-120b and 20b, OpenAI’s first open-weight models since GPT-2 back in 2019.
Then came OpenAI again on Thursday, crashing through a wall like the Kool-Aid man to announce the arrival of its newest flagship model, GPT-5.
We've spent the last few days testing what we can, reading every benchmark table, and parsing the takes. Let's talk about what's new, what each model does best, and what the industry thinks about this AI rush hour.
Genie 3: Google's text-to-world machine
Genie 3 feels like the most sci-fi release of the week. Unlike earlier text-to-video models like Sora, which generate fixed, non-interactive clips, Genie 3 creates interactive 3D worlds that respond to your actions in real time. Think of it as the difference between watching a movie and playing a video game, except the game is generated on the fly from whatever you type.
What it's great at:
Real-time world building with memory: Genie 3 generates 720p worlds at 24 frames per second and remembers them—maintaining visual consistency for minutes and recalling details from up to a minute ago. You can walk away (virtually) and return, and everything is right where you left it.
Promptable world events: Mid-exploration, you can type new commands to alter the simulation in real time. Exploring a desert? Add a thunderstorm. Walking through a forest? Spawn a herd of deer.
Physics without programming: Genie 3 develops an intuitive understanding of physics—water flows, objects fall, lighting behaves naturally—without relying on hard-coded engines. The model teaches itself how the world works by remembering what it has generated and reasoning over long time horizons.
Claude Opus 4.1: The precision upgrade
If Genie 3 is the flashy sci-fi reveal, Claude Opus 4.1 is the practical upgrade that developers will use every day. It’s the kind of measured improvement that Anthropic has become known for, targeting the specific pain points that matter most in real-world programming workflows.
What it's great at:
Precision coding: Opus 4.1 achieves 74.5 percent on SWE-bench Verified, a version of the software engineering coding benchmark that’s been checked by humans. That's up from 72.5 percent for the original Opus 4, and it’s near the top in real-world coding tasks where precision matters more than speed.
Making changes across many files without breaking the code: The model excels at making complex changes across multiple files without introducing bugs—a notoriously difficult task that separates good coding assistants from great ones.
Enhanced research and detail tracking: Beyond coding, Opus 4.1 improves at in-depth research and data analysis tasks. It remembers small but important facts across long documents (aka detail tracking), and uses agentic search to find and connect relevant information on its own.
gpt-oss 120b and 20b: OpenAI's open-source return
Six years after GPT-2, OpenAI is back in the open-weights game with two models that pack serious reasoning capabilities into surprisingly efficient packages.
What they're great at:
Efficient reasoning on your own hardware: The larger gpt-oss 120b runs only part of its brain at a time, making it powerful but still able to run on a single high-end graphics processing unit (GPU). The smaller 20b model is light enough to run on a standard laptop with 16 gigabytes of memory, instead of needing expensive cloud servers; a minimal local-inference sketch follows this list.
Tool use and agentic workflows: Both models excel at chain-of-thought reasoning and can act as intelligent intermediaries. They can decide whether to handle a task themselves, or hand it off to another tool or system.
Customizable and permissive: Unlike models you access via API and pay-per-use, gpt-oss models are released under the Apache 2.0 license, so you can download, modify, and use them commercially without paying royalties—even offline on your own hardware.
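If you want to poke at the smaller model yourself, here is a minimal sketch of local inference with Hugging Face's transformers library. It assumes the openai/gpt-oss-20b checkpoint and a machine with roughly 16 gigabytes of free memory; treat it as an illustration rather than the fastest path, since local-model runners like Ollama advertise simpler one-command setups for the same weights.

```python
# Minimal sketch: run gpt-oss-20b locally with Hugging Face transformers.
# Assumes the "openai/gpt-oss-20b" checkpoint and ~16 GB of free RAM/VRAM.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # swap for "openai/gpt-oss-120b" on beefier hardware
    device_map="auto",           # place weights on whatever GPU/CPU memory is available
)

result = generator(
    "Explain in two sentences why running a language model locally can be useful.",
    max_new_tokens=120,
)
print(result[0]["generated_text"])
```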
GPT-5: The everything model?
OpenAI is betting big on its new flagship model, GPT-5. So big that it’s sunsetting past models you may know and love, like GPT-4o and GPT-4.5. GPT-5 collapses the choice paralysis of modern AI into a single system that figures out what you need and how hard to think about it.
In ChatGPT, it's the end of the model picker. In the API, it's priced to make competitors question their life choices. But for the cutting edge of AI development, it feels more like a refinement of the old paradigm than a glimpse of the new one.
What it’s great at:
- Speed-adaptive intelligence: Routes questions between fast chat mode and deeper reasoning, adjusting how long it “thinks” based on complexity.
- Aggressive pricing: GPT-5 Standard is $1.25 per million input tokens—1/12th the cost of Claude Opus 4.1—with GPT-5-mini even cheaper, on par with Google’s Gemini 2.5 Flash (a quick cost comparison follows this list).
- Beginner-friendly coding: In ChatGPT’s Canvas, it can generate working front-end apps from a description—good enough to wow first-time users, if not yet a pro’s first choice.
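To make that pricing gap concrete, here is a rough back-of-the-envelope comparison. The Opus 4.1 figure is inferred from the 1/12th ratio above rather than quoted from a price sheet, and real bills also include output tokens, which this sketch ignores.

```python
# Rough input-token cost comparison (illustrative only; output tokens billed separately).
GPT5_INPUT_PER_M = 1.25            # dollars per million input tokens, as stated above
OPUS_41_INPUT_PER_M = 1.25 * 12    # implied by the "1/12th the cost" comparison

prompt_tokens = 50_000  # e.g., a hefty chunk of a codebase pasted into one request
for name, rate in [("GPT-5", GPT5_INPUT_PER_M), ("Claude Opus 4.1", OPUS_41_INPUT_PER_M)]:
    print(f"{name}: ${prompt_tokens / 1_000_000 * rate:.4f} per request")
```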
Read our full Vibe Check on GPT-5.
What everyone at Every is thinking…
… about GPT-5
When do we stop switching models?
“From an AI editorial ops perspective, my question is: Are we GPT-5 people now? We just committed to Claude Opus 4 to run our AI editor. What happens when Opus 5 hits and is better? At what point do the switching costs become too high?”—Katie Parrott, writer and AI operations lead
Requiem for a deprecated model
“I’m kind of mad I don’t have access to GPT-4.5 anymore. I use it for a lot of writing, and I had prompts that were good for it, and they just don't work the same in GPT-5. I felt like I had a feel for it.”—Alex Duffy, head of AI training
… about Genie 3
Incredible tech, but to what end?
“I think Genie 3 is incredible—it’s insane that it’s even possible. Same with Opus getting better and GPT-5 landing as well. But they’re still just tools. How do we use them to meaningfully improve lives? We need a vision. A ‘we’re going to the moon’ moment that ties it all together.”—Alex
… about Claude Opus 4.1
Still wins for complex coding tasks
“I’ll still reach for Claude Code 4.1 for big, moving-parts features and migrations that need to keep the original structure intact. When I migrated the backend [of Monologue], both GPT-5 and Claude got it done, but Claude stayed faithful to the architecture instead of simplifying to make it work.”—Naveen Naidu, entrepreneur in residence
Shines as a research partner
“I think 4.1 is brilliant. I have it do research and it's insanely good!”—Kieran Klaassen, general manager of Cora
Big model smell
“Opus 4.1 is the only BIG model left [in terms of parameter counts, a proxy for its size and capacity], and in our use it even had more of a ‘big model smell’ than GPT-5. GPT-5’s intelligence is impressive, but Anthropic is the only big lab to tame and monetize a BIG model.”—Nityesh Agarwal, engineer on Cora
… about gpt-oss 120b and 20b
Great for tinkering, not for day-to-day
“Can't wait to see what people cook up. Having o4-mini running on my machine is sick, but outside of fun little projects, I don't see myself using it. Why would I mess around with a model whose intelligence is worse when there's a pretty affordable model (GPT-5) that I can access via the API?”—Danny Aziz, general manager of Spiral
Fun to try, not ready to trust
“We tried the OSS models [in Cora]. They are cool but not production ready. Too wild in their outputs.”—Kieran
What everyone else is thinking…
… about GPT-5
Enterprise testing suggests breakthrough capabilities
Box CEO Aaron Levie says GPT-5 significantly improved accuracy in the company’s evaluations compared with GPT-4.1. In tests on contracts, research data, government documents, and other complex files, GPT-5 handled intricate logic, extracted structured data more reliably, reduced hallucinations, and improved visual reasoning—key for enterprise workflows in legal, finance, healthcare, and more.
More professional, less sycophantic
Peter Steinberger, a full-time open-source developer, said GPT-5 “feels more like working with a professional.” It’s less likely to slide into sycophancy than 4o and quick to recognize when the error is on the user’s side.
Ready for scientific primetime?
Sean Bruich, senior vice president of AI and data at biopharmaceutical outfit Amgen, emphasizes that GPT-5 clears the bar for scientific accuracy, "doing a better job navigating ambiguity where context matters," suggesting readiness for research and healthcare applications.
… about Genie 3
Originality beyond imitation
Serial entrepreneur Pieter Levels called Genie 3 “mindblowing," praising it for creating “something brand new that could not have existed without AI technology” rather than copying human work.
A potential edge in the headset wars
Product developer Parker Ortolani thinks Genie-like models could give Google a major advantage in the competition over AR/VR hardware, especially “when up against Apple, who…hasn't gotten itself together on static image models yet.”
Robotics will be a tougher test
Nvidia’s director of robotics Jim Fan draws a line between generating stable game worlds and simulating the complexity of real-world robotics—especially controlling humanoid hands and teaching robots to handle many different kinds of objects in different ways. Still, he thinks Genie-style simulators could become the “clean energy” of robotics training data, replacing the “fossil fuel” of human teleoperation (people directly controlling robots to demonstrate tasks).
… about Claude Opus 4.1
A coding upgrade you can feel
Rakuten’s general manager of AI Yusuke Kaji said Opus handled a complex open-source project autonomously for nearly seven hours—“a huge leap in AI capabilities that left the team amazed.” The extended, uninterrupted run suggests gains in both reliability and the ability to sustain performance over long coding sessions.
The timing is interesting… very interesting
Developer Alec Velikanov is among several to suggest Opus 4.1 feels like "a rushed release to get ahead of GPT-5," comparing the model unfavorably to competitors in user interface tasks.
… about gpt-oss 120b and 20b
An open-weight debut with instant traction
Hugging Face CEO Clément Delangue says gpt-oss is the top trending model out of nearly 2 million on the machine learning community platform, and he sees the move as potentially transformative for the AI ecosystem—just as GPT-2 was in 2019.
Hallucinations are an issue
Not all feedback is glowing. Researcher Wenhu Chen says the 120b model “hallucinates a lot” and suspects it may have been trained by copying a larger, closed-source model’s answers—using AI-generated reasoning examples that may have built in and amplified the bigger model’s mistakes.
Small but mighty
Indie open-source developer Simon Willison was surprised by how close the models come to proprietary small-model performance: 120b is near parity with o4-mini on reasoning benchmarks, and 20b performs similarly to o3-mini. “I was not expecting the open-weights releases to be anywhere near that class.”
The bigger picture
If there’s a theme to this week’s pile-up of launches, it’s that AI progress is happening on multiple, diverging fronts at once. Genie 3 pushes the boundary of embodied AI, showing how text-to-world generation could reshape games, AR, and robotics. Claude Opus 4.1 doubles down on the unglamorous but high-impact work of making tools that slot into code production. gpt-oss revives the open-weights play, betting that developer freedom and local inference (running the model directly on your own hardware instead of through a paid API) will unlock a new wave of experimentation. And GPT-5’s debut reminds us that the big labs still see value in unifying everything into a single flagship that adapts to the task.
Taken together, it’s a snapshot of a maturing field where no single model or lab dominates the narrative, although that doesn’t stop them from trying. We’re heading into a stretch of overlapping races—open versus closed, frontier versus specialized, headline-grabbing demos beside everyday tools. If the past few years were about proving that large models could work, the next few will be about deciding which kinds of intelligence—and which ways of giving it away—make the leap from headline-makers to industry standards.
Katie Parrott is a writer, editor, and content marketer focused on the intersection of technology, work, and culture. You can read more of her work in her newsletter.
To read more essays like this, subscribe to Every, and follow us on X at @every and on LinkedIn.
We build AI tools for readers like you. Automate repeat writing with Spiral. Organize files automatically with Sparkle. Deliver yourself from email with Cora.
We also do AI training, adoption, and innovation for companies. Work with us to bring AI into your organization.
Get paid for sharing Every with your friends. Join our referral program.