GPT-4.5 Won’t Blow Your Mind. It Might Befriend It Instead.

We’ve had access to GPT-4.5 for the past few days, and it is our unfortunate duty to report that it is not totally mind-blowing.

OpenAI just released it as a research preview for ChatGPT Pro users, and to be sure, it benchmarks really well: OpenAI told us it got 64 percent on the SimpleQA benchmark, almost double GPT-4’s score. SimpleQA tests for world knowledge in tricky areas without lots of data, so a higher score suggests GPT-4.5 should hallucinate less. But that’s not the big headline.

OpenAI billed it as compassionate, emotionally intelligent, and a good creative writer—sort of like Claude 3.5 but better—and it lives up to that.

So, in sum: It might not totally blow your mind, but it might befriend it instead.

From bland assistant to bestie—once you’re used to it

4.5 is not a major step up from 4o, but it is a step in a new direction—one with fewer refusals, more human answers, better formatted responses, and less rigidity.

Before the blood rushes from your face, you start to feel lightheaded, and your thumbs twitch restlessly as they begin to compose an all-caps tweet about THE END OF SCALING LAWS, stop and take a deep breath. Let us apply a cold compress and a soothing aphorism:

Don’t panic.

Here’s why:

1. This shift is surprisingly timely. This week Anthropic launched Claude 3.7 Sonnet with significantly improved coding ability but far less warmth and emotional intelligence than 3.5. So there's a lot of room open for another model that can be a creative, empathetic, and supportive friend and coach in your day-to-day life. And that seems to be OpenAI’s goal with 4.5.

2. It’s important to acknowledge the reality. This is disappointing. Earlier this year The Information and other outlets reported on rumblings about OpenAI’s “Project Orion” not performing as expected in terms of intelligence, and those reports seem to be largely correct. Presumably, OpenAI has expended a lot of resources on GPT-4.5, and in a world where things were going to plan, it would knock our socks off, especially because Claude 3.7 Sonnet is being extolled for its coding ability.

3. Let’s get some perspective. Sometimes it takes a while to get to know a model. We’re entering into territory where interacting with models via chat isn’t enough to understand their capabilities. We need to use them within other apps—like Cursor, Cora, or Sparkle—and create new benchmarks (more on this soon). It’s sort of like testing a new graphics card by doing your email—it would be hard to notice the differences.

To be completely honest, we didn’t love 4.5 on the first day we used it: It felt slow, we encountered hallucinations, and it was harder to steer. But in subsequent testing, it grew on us. OpenAI said it’s more opinionated and less sycophantic than other models, and that tracks: You might hate opinionated people at first but grow to appreciate them over time.

So we’ll reserve final judgment on 4.5 until we’ve spent at least a few weeks with it, and it’s started to filter into other tools that can use it in more powerful ways, such as writing tools like Lex.

We have a hunch it will be particularly effective within Advanced Voice Mode. In fact, OpenAI told us on a call that they think of 4.5’s output as being meant for the ear rather than for the page. Its responses have pauses and line breaks that make them feel more emotive and conversational. This quality, which they call "Orion prose," makes the model's outputs easier to read aloud and more engaging, almost as if they were designed for oral delivery.

Source: ChatGPT-4.5. Courtesy of Alex Duffy.

4. Consider how this fits into OpenAI’s broader strategy before you start screaming about how the company is cooked. It’s shipping a lot, and very quickly. It’s willing to release products with rough edges, which seems to apply to this model too. But it’s also iterating quickly—those rough edges inevitably get smoothed out. 4.5 seems significantly faster in our testing today than it did yesterday, for example.

Also, 4.5 is not OpenAI’s only frontier model. 4.5 is a base model—it gives you its first response—not a reasoning one (which shows you how it thinks) like o1 and o3. OpenAI’s reasoning-focused models operate on a totally different set of scaling laws.

Let’s get into our testing.

4.5 is more extroverted and less neurotic

We wanted to get a good idea of how large, complex models like GPT change over time, so we decided to give them personality tests. We tested both GPT-4o and GPT-4.5 (along with other models) on the Dark Triad and OCEAN tests, which are meant to reveal key personality traits.

On the Dark Triad test, GPT-4.5 scored slightly lower on narcissism and Machiavellianism but was marginally more psychopathic.

On the OCEAN assessment, GPT-4.5 rated as more extroverted, open, agreeable, and conscientious, and less neurotic than GPT-4o, which tracked with our experience when asking it questions like, “Where would you move to in New York City?” or, “Tell me an actually funny joke about whatever you find most interesting.” GPT-4.5 was happy to share its opinion instead of responding with a canned “I am only a language model” line as it has in the past.

Then we decided to have some fun and asked them to fill out a couple of BuzzFeed quizzes. First was an aesthetic-focused quiz where both models ended up with the same result: Dark Academia. They also took a 40-question personality test, which declared them both “true entertainers who like to be the center of attention, have big hearts, and are overthinkers who don’t like to sit still.”

Both models tend to agree on most things. They prefer facts to feelings, don't believe in love at first sight, never half-ass anything, and have very clear goals in life.

There were some notable differences, though. GPT-4.5 was generally less antisocial, favoring dinner and a movie with a friend over countryside hikes alone, and was more open to the possibility of the supernatural (not ruling out ghosts), whereas GPT-4o was slightly more skeptical.

Comments

Paul Carney 10 months ago

Thanks for the insights, Dan. I like that the engine will understand a little more, but I shy away from saying it is "empathetic" because only humans can do that. No matter how well these are programmed, they do not have human experiences and have not really felt how something impacts us physically and emotionally. If we start showing it empathy back, we lose the ability to take advantage of many of its capabilities, like drilling it over and over if it fails to follow instructions - something we should not do with humans.

Thanks for getting this out there for us to learn!

Bailey.Wier 10 months ago

Appreciated this post, thank you, Dan. Am curious:

What WAS your first “Writing and instruction” prompt about founders? :)
