
Was this newsletter forwarded to you? Sign up to get it in your inbox.
It was a crowded week for AI model releases—so crowded, it’s hard not to suspect that the big labs are stepping on each other's toes on purpose in a bid to hog the spotlight. As Spiral general manager Danny Aziz put it in Every’s Discord, “Begun, the petty AI wars have.”
In a 24-hour span on Tuesday, the big labs released Genie 3, a text-to-world simulator from Google DeepMind that you can actually walk around in (once it’s out of research preview, anyway); Claude Opus 4.1, Anthropic’s incremental (but important) coding upgrade; and gpt-oss-120b and 20b, OpenAI’s first open-weight models since GPT-2 back in 2019.
Then came OpenAI again on Thursday, crashing through a wall like the Kool-Aid man to announce the arrival of its newest flagship model, GPT-5.
We've spent the last few days testing what we can, reading every benchmark table, and parsing the takes. Let's talk about what's new, what each model does best, and what the industry thinks about this AI rush hour.
Genie 3: Google's text-to-world machine
Genie 3 feels like the most sci-fi release of the week. Unlike earlier text-to-video models like Sora, which generate fixed, non-interactive clips, Genie 3 creates interactive 3D worlds that respond to your actions in real time. Think of it as the difference between watching a movie and playing a video game, except the game is generated on the fly from whatever you type.
What it's great at:
Real-time world building with memory: Genie 3 generates 720p worlds at 24 frames per second and remembers them—maintaining visual consistency for minutes and recalling details from up to a minute ago. You can walk away (virtually) and return, and everything is right where you left it.
Promptable world events: Mid-exploration, you can type new commands to alter the simulation in real time. Exploring a desert? Add a thunderstorm. Walking through a forest? Spawn a herd of deer.
Physics without programming: Genie 3 develops an intuitive understanding of physics—water flows, objects fall, lighting behaves naturally—without relying on hard-coded engines. The model teaches itself how the world works by remembering what it has generated and reasoning over long time horizons.
Claude Opus 4.1: The precision upgrade
If Genie 3 is the flashy sci-fi reveal, Claude Opus 4.1 is the practical upgrade that developers will use every day. It’s the kind of measured improvement that Anthropic has become known for, targeting the specific pain points that matter most in real-world programming workflows.
What it's great at:
Precision coding: Opus 4.1 achieves 74.5 percent on SWE-bench Verified, a version of the software engineering coding benchmark that’s been checked by humans. That's up from 72.5 percent for the original Opus 4, and it’s near the top in real-world coding tasks where precision matters more than speed.
Making changes across many files without breaking the code: The model excels at making complex changes across multiple files without introducing bugs—a notoriously difficult task that separates good coding assistants from great ones.
Enhanced research and detail tracking: Beyond coding, Opus 4.1 improves at in-depth research and data analysis tasks. It remembers small but important facts across long documents (aka detail tracking), and uses agentic search to find and connect relevant information on its own.
gpt-oss 120b and 20b: OpenAI's open-source return
Six years after GPT-2, OpenAI is back in the open-weights game with two models that pack serious reasoning capabilities into surprisingly efficient packages.
What they're great at:
Efficient reasoning on your own hardware: The larger gpt-oss 120b runs only part of its brain at a time, making it powerful but still able to run on a single high-end graphics processing unit (GPU). The smaller 20b model is light enough to run on a standard laptop with 16 gigabytes of memory, instead of needing expensive cloud servers; a minimal local-inference sketch follows this list.
Tool use and agentic workflows: Both models excel at chain-of-thought reasoning and can act as intelligent intermediaries. They can decide whether to handle a task themselves, or hand it off to another tool or system.
Customizable and permissive: Unlike models you access via API and pay-per-use, gpt-oss models are released under the Apache 2.0 license, so you can download, modify, and use them commercially without paying royalties—even offline on your own hardware.
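If you want to poke at the smaller model yourself, here is a minimal sketch of local inference with Hugging Face's transformers library. It assumes the openai/gpt-oss-20b checkpoint and a machine with roughly 16 gigabytes of free memory; treat it as an illustration rather than the fastest path, since local-model runners like Ollama advertise simpler one-command setups for the same weights.

```python
# Minimal sketch: run gpt-oss-20b locally with Hugging Face transformers.
# Assumes the "openai/gpt-oss-20b" checkpoint and ~16 GB of free RAM/VRAM.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",  # swap for "openai/gpt-oss-120b" on beefier hardware
    device_map="auto",           # place weights on whatever GPU/CPU memory is available
)

result = generator(
    "Explain in two sentences why running a language model locally can be useful.",
    max_new_tokens=120,
)
print(result[0]["generated_text"])
```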
GPT-5: The everything model?
OpenAI is betting big on its new flagship model, GPT-5. So big that it’s sunsetting past models you may know and love, like GPT-4o and GPT-4.5. GPT-5 collapses the choice paralysis of modern AI into a single system that figures out what you need and how hard to think about it.
In ChatGPT, it's the end of the model picker. In the API, it's priced to make competitors question their life choices. But for the cutting edge of AI development, it feels more like a refinement of the old paradigm than a glimpse of the new one.
What it’s great at:
- Speed-adaptive intelligence: Routes questions between fast chat mode and deeper reasoning, adjusting how long it “thinks” based on complexity.
- Aggressive pricing: GPT-5 Standard is $1.25 per million input tokens—1/12th the cost of Claude Opus 4.1—with GPT-5-mini even cheaper, on par with Google’s Gemini 2.5 Flash (a quick cost comparison follows this list).
- Beginner-friendly coding: In ChatGPT’s Canvas, it can generate working front-end apps from a description—good enough to wow first-time users, if not yet a pro’s first choice.
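To make that pricing gap concrete, here is a rough back-of-the-envelope comparison. The Opus 4.1 figure is inferred from the 1/12th ratio above rather than quoted from a price sheet, and real bills also include output tokens, which this sketch ignores.

```python
# Rough input-token cost comparison (illustrative only; output tokens billed separately).
GPT5_INPUT_PER_M = 1.25            # dollars per million input tokens, as stated above
OPUS_41_INPUT_PER_M = 1.25 * 12    # implied by the "1/12th the cost" comparison

prompt_tokens = 50_000  # e.g., a hefty chunk of a codebase pasted into one request
for name, rate in [("GPT-5", GPT5_INPUT_PER_M), ("Claude Opus 4.1", OPUS_41_INPUT_PER_M)]:
    print(f"{name}: ${prompt_tokens / 1_000_000 * rate:.4f} per request")
```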
Read our full Vibe Check on GPT-5.
What everyone at Every is thinking…
… about GPT-5
When do we stop switching models?
“From an AI editorial ops perspective, my question is: Are we GPT-5 people now? We just committed to Claude Opus 4 to run our AI editor. What happens when Opus 5 hits and is better? At what point do the switching costs become too high?”—Katie Parrott, writer and AI operations lead
Requiem for a deprecated model
“I’m kind of mad I don’t have access to GPT-4.5 anymore. I use it for a lot of writing, and I had prompts that were good for it, and they just don't work the same in GPT-5. I felt like I had a feel for it.”—Alex Duffy, head of AI training
… about Genie 3
Incredible tech, but to what end?
“I think Genie 3 is incredible—it’s insane that it’s even possible. Same with Opus getting better and GPT-5 landing as well. But they’re still just tools. How do we use them to meaningfully improve lives? We need a vision. A ‘we’re going to the moon’ moment that ties it all together.”—Alex
… about Claude Opus 4.1
Still wins for complex coding tasks
“I’ll still reach for Claude Code 4.1 for big, moving-parts features and migrations that need to keep the original structure intact. When I migrated the backend [of Monologue], both GPT-5 and Claude got it done, but Claude stayed faithful to the architecture instead of simplifying to make it work.”—Naveen Naidu, entrepreneur in residence
Shines as a research partner
“I think 4.1 is brilliant. I have it do research and it's insanely good!”—Kieran Klaassen, general manager of Cora
Big model smell
“Opus 4.1 is the only BIG model left [in terms of parameter counts, a proxy for its size and capacity], and in our use it even had more of a ‘big model smell’ than GPT-5. GPT-5’s intelligence is impressive, but Anthropic is the only big lab to tame and monetize a BIG model.”—Nityesh Agarwal, engineer on Cora
… about gpt-oss 120b and 20b
Great for tinkering, not for day-to-day
“Can't wait to see what people cook up. Having o4-mini running on my machine is sick, but outside of fun little projects, I don't see myself using it. Why would I mess around with a model whose intelligence is worse when there's a pretty affordable model (GPT-5) that I can access via the API?”—Danny Aziz, general manager of Spiral
Fun to try, not ready to trust
“We tried the OSS models [in Cora]. They are cool but not production ready. Too wild in their outputs.”—Kieran
What everyone else is thinking…
… about GPT-5
Enterprise testing suggests breakthrough capabilities
Box CEO Aaron Levie says GPT-5 significantly improved accuracy in the company’s evaluations compared with GPT-4.1. In tests on contracts, research data, government documents, and other complex files, GPT-5 handled intricate logic, extracted structured data more reliably, reduced hallucinations, and improved visual reasoning—key for enterprise workflows in legal, finance, healthcare, and more.
More professional, less sycophantic
Peter Steinberger, a full-time open-source developer, said GPT-5 “feels more like working with a professional.” It’s less likely to slide into sycophancy than 4o and quick to recognize when the error is on the user’s side.
Ready for scientific primetime?
Sean Bruich, senior vice president of AI and data at biopharmaceutical outfit Amgen, emphasizes that GPT-5 clears the bar for scientific accuracy, "doing a better job navigating ambiguity where context matters," suggesting readiness for research and healthcare applications.
… about Genie 3
Originality beyond imitation
Serial entrepreneur Pieter Levels called Genie 3 “mindblowing," praising it for creating “something brand new that could not have existed without AI technology” rather than copying human work.
A potential edge in the headset wars
Product developer Parker Ortolani thinks Genie-like models could give Google a major advantage in the competition over AR/VR hardware, especially “when up against Apple, who…hasn't gotten itself together on static image models yet.”
Robotics will be a tougher test
Nvidia’s director of robotics Jim Fan draws a line between generating stable game worlds and simulating the complexity of real-world robotics—especially controlling humanoid hands and teaching robots to handle many different kinds of objects in different ways. Still, he thinks Genie-style simulators could become the “clean energy” of robotics training data, replacing the “fossil fuel” of human teleoperation (people directly controlling robots to demonstrate tasks).
… about Claude Opus 4.1
A coding upgrade you can feel
Rakuten’s general manager of AI Yusuke Kaji said Opus handled a complex open-source project autonomously for nearly seven hours—“a huge leap in AI capabilities that left the team amazed.” The extended, uninterrupted run suggests gains in both reliability and the ability to sustain performance over long coding sessions.
The timing is interesting… very interesting
Developer Alec Velikanov is among several to suggest Opus 4.1 feels like "a rushed release to get ahead of GPT-5," comparing the model unfavorably to competitors in user interface tasks.
… about gpt-oss 120b and 20b
An open-weight debut with instant traction
Hugging Face CEO Clément Delangue says gpt-oss is the top trending model out of nearly 2 million on the machine learning community platform, and he sees the move as potentially transformative for the AI ecosystem—just as GPT-2 was in 2019.
Hallucinations are an issue
Not all feedback is glowing. Researcher Wenhu Chen says the 120b model “hallucinates a lot” and suspects it may have been trained by copying a larger, closed-source model’s answers—using AI-generated reasoning examples that may have built in and amplified the bigger model’s mistakes.
Small but mighty
Indie open-source developer Simon Willison was surprised by how close the models come to proprietary small-model performance: 120b is near parity with o4-mini on reasoning benchmarks, and 20b performs similarly to o3-mini. “I was not expecting the open-weights releases to be anywhere near that class.”
The bigger picture
If there’s a theme to this week’s pile-up of launches, it’s that AI progress is happening on multiple, diverging fronts at once. Genie 3 pushes the boundary of embodied AI, showing how text-to-world generation could reshape games, AR, and robotics. Claude Opus 4.1 doubles down on the unglamorous but high-impact work of making tools that slot into code production. gpt-oss revives the open-weights play, betting that developer freedom and local inference (running the model directly on your own hardware instead of through a paid API) will unlock a new wave of experimentation. And GPT-5’s debut reminds us that the big labs still see value in unifying everything into a single flagship that adapts to the task.
Taken together, it’s a snapshot of a maturing field where no single model or lab dominates the narrative, although that doesn’t stop them from trying. We’re heading into a stretch of overlapping races—open versus closed, frontier versus specialized, headline-grabbing demos beside everyday tools. If the past few years were about proving that large models could work, the next few will be about deciding which kinds of intelligence—and which ways of giving it away—make the leap from headline-makers to industry standards.
Katie Parrott is a writer, editor, and content marketer focused on the intersection of technology, work, and culture. You can read more of her work in her newsletter.
To read more essays like this, subscribe to Every, and follow us on X at @every and on LinkedIn.
We build AI tools for readers like you. Automate repeat writing with Spiral. Organize files automatically with Sparkle. Deliver yourself from email with Cora.
We also do AI training, adoption, and innovation for companies. Work with us to bring AI into your organization.
Get paid for sharing Every with your friends. Join our referral program.