Google's Gemini models may not dominate conversations (or searches) like OpenAI’s—but they’re starting to dominate something more important: the developer software stack.
Inside Every, Gemini 2.5 Pro and Gemini 2.5 Flash are already powering production workflows, and Flash runs quietly in the background of products like Cora and Sparkle. Our team is hardly the only one getting mileage out of these models, though; Pro has become the default brain inside go-to AI-powered developer tools like Cursor and Windsurf. And according to Google Cloud CEO Thomas Kurian, more than four million developers are building with Gemini.
With a fresh update to Gemini 2.5 Pro landing this week that’s meant to have stronger coding support and wider developer access, Google’s bid to win the hearts and minds of developers is getting harder to ignore.
Let’s dig into what Pro and Flash do best, how the Every team is putting them to work, and why Gemini could be the backend stack’s dark horse.
Gemini 2.5 Pro: The quietly powerful workhorse
Gemini 2.5 Pro debuted in March 2025 as Google’s first “thinking model,” a territory previously mapped by OpenAI with its o1 release in September 2024 and Anthropic with the release of Claude 3.7 Sonnet in February 2025. A thinking model, also known as a reasoning model, is an LLM that pauses to plan a step-by-step solution before answering. That extra planning, plus a 1 million-token context window—enough to read an entire codebase, a full research report, or about an hour of video—lets it handle problems other models have to tackle in bite-sized chunks.
On May 6, ahead of its annual I/O conference later this month, Google announced an update to 2.5 Pro (you may see it referred to as “Gemini-2.5-Pro-05-06,” in case you thought OpenAI was the only one with naming challenges). This launch touts sharper coding skills, richer web-app demos (click-to-try sample sites that let anyone play around with the model inside a browser), and, crucially, general access in AI Studio (Google’s free playground for quick experiments) and Vertex AI (its managed cloud service for production workloads). In other words: It’s easier to try, and far simpler for companies to roll straight into their apps.
What it’s great at:
- Coding and debugging at scale: Pro remembers details from massive context dumps and often catches and corrects its own logic.
- Long-context planning: Handles multi-turn planning, where each new prompt builds on the last, and can steer big engineering jobs, such as rewriting or reorganizing an entire codebase (see the sketch after this list).
- Multimodal reasoning: Solid performance across text, code, and images in the same prompt thread.
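To make the long-context point concrete, here’s a minimal sketch of dumping a small codebase into a single Pro request and asking for a refactoring plan, using Google’s google-genai Python SDK. The preview model name, file-gathering logic, and prompt are illustrative assumptions, not Every’s actual setup.

```python
# A minimal sketch of using Pro's long context window: concatenate a small
# repo into one prompt and ask for a step-by-step refactoring plan.
# Model name and file paths are assumptions for illustration.
from pathlib import Path
from google import genai

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

# Gather every Python file in the project into a single context dump.
repo = Path("./my_project")
codebase = "\n\n".join(
    f"# file: {path}\n{path.read_text()}" for path in repo.rglob("*.py")
)

response = client.models.generate_content(
    model="gemini-2.5-pro-preview-05-06",  # assumed preview identifier
    contents=(
        "Here is our codebase:\n\n" + codebase +
        "\n\nPropose a step-by-step plan to reorganize it into clearly "
        "separated modules, noting which files change in each step."
    ),
)
print(response.text)
```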
Gemini 2.5 Flash: The glue model with speed control
Gemini 2.5 Flash landed in mid-April as Google’s first hybrid-reasoning model—designed to be fast by default, but able to “pause and think” when a task gets tricky. Developers get a thinking-budget knob (0–24,000 tokens) that trades cost and latency for extra brainpower: Leave it at 0 for 2.0-level speed, or bump it up when problems need multi-step logic.
Google calls this its best price-to-performance option. Because most requests (known as “calls”) a developer sends to the model are lightweight (for tasks like routing, re-formatting, or quick look-ups), teams can keep 90 percent of their work with Flash at rock-bottom prices and reserve extra reasoning—and cost—for the rare tasks that require it, like drafting a complete product-feature spec from scattered meeting notes.
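Here’s roughly what that knob looks like in code: a minimal sketch using Google’s google-genai Python SDK and its thinking_config setting. The preview model name and API key placeholder are assumptions; check Google’s docs for the current identifiers.

```python
# A minimal sketch of Flash's thinking-budget knob via the google-genai SDK.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")  # assumes an AI Studio key

# Budget of 0: skip thinking entirely for a cheap, low-latency call.
quick = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",  # assumed preview identifier
    contents="Classify this support ticket as 'billing', 'bug', or 'other': ...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0)
    ),
)

# Larger budget: let the model reason through several steps before answering.
thorough = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="Draft a product-feature spec from these meeting notes: ...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=8192)
    ),
)

print(quick.text)
print(thorough.text)
```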
What it’s great at:
- Low-latency orchestration: Flash acts like a real-time dispatcher—cleaning up AI responses, deciding where each request should go (like a heavier-duty reasoning model for complex questions or an external API for fresh data), and tagging or sorting huge streams of data without slowing anything down. (A sketch of this dispatcher pattern follows this list.)
- Programmable reasoning control: Teams can dial up or down how much “thinking time” the model spends on each request—aka its inference depth. Less thinking time means fast, cheap answers for simple tasks; more thinking time lets the model pause, reason through several steps, and return a more thorough solution (at the cost of a few extra tokens and milliseconds).
- Multimodal on a budget: Handles image input with surprisingly strong results—at a fraction of what Claude or GPT-4 charge.
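Below is a hypothetical version of that dispatcher pattern: Flash, with thinking disabled, triages each request, and only the hard ones get escalated to 2.5 Pro. The routing labels, prompts, and model names are assumptions for illustration, not a description of any team’s production setup.

```python
# A hypothetical dispatcher: Flash (thinking budget 0) triages each request,
# and only complex ones are escalated to 2.5 Pro.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

FLASH = "gemini-2.5-flash-preview-04-17"  # assumed preview identifiers
PRO = "gemini-2.5-pro-preview-05-06"

def answer(user_request: str) -> str:
    # Cheap, fast triage call with thinking disabled.
    triage = client.models.generate_content(
        model=FLASH,
        contents=f"Answer 'simple' or 'complex' only. Request: {user_request}",
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(thinking_budget=0)
        ),
    )
    needs_reasoning = "complex" in triage.text.lower()

    # Escalate to Pro for multi-step work; otherwise let Flash answer directly.
    response = client.models.generate_content(
        model=PRO if needs_reasoning else FLASH,
        contents=user_request,
    )
    return response.text

print(answer("Summarize this paragraph in one sentence: ..."))
```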
What everyone at Every thinks
… about Gemini 2.5 Pro
It’s the new default inside Cursor—because it just works
“I love how powerful it is. Its large context window and reasoning capabilities mean I can dump in a lot of context, and I feel pretty confident in using it well (remembering certain implementation details from some random file I gave it several messages ago, etc.).”—Danny Aziz, general manager of Spiral
Tool calls and comment spam are a problem
“Tool calls, at least within Cursor, are still a bit problematic—sometimes it generates code inline instead of applying changes, so you have to do it manually. That’s kinda annoying. Another annoying thing is the comments [explanatory lines throughout the file, which add clutter and degrade performance]—it adds a lot, and I always have to ask it to remove them. But other than those two things, it’s really great.”—Naveen Naidu, entrepreneur in residence
Comments
A super interesting article; however, when reading about the frontier reasoning models, I always find myself wishing there were more commentary about the incredible use cases they have for your 'average knowledge worker' and/or senior leaders.
Gigantic context windows + multi-turn, high-powered strategic thinking = superpowers for desk workers and strategic leaders.
These models are not just great for coders.