The transcript of AI & I with Spiral general manager Danny Aziz is below. Watch on X or YouTube, or listen on Spotify or Apple Podcasts.
Timestamps
- Introduction: 00:01:00
- How Danny used Spiral to prepare for this podcast: 00:05:26
- Why slowing down makes AI writing better: 00:08:29
- The agents working under the hood for Spiral: 00:13:42
- How Spiral helps you explore the canvas of possibilities: 00:14:46
- Why Danny pivoted away from the old version of Spiral: 00:24:41
- How to use AI without losing your craft: 00:31:51
- Danny’s workflow for building Spiral as a solo engineer: 00:34:55
- Code with AI while staying in control: 00:40:39
- What Danny learned about getting AI to write well: 00:45:26
- How Danny used DSPy to give AI taste: 00:47:52
- Dan versus AI Dan: Can the machine match the man?: 00:56:16
Transcript
(00:00:00)
Dan Shipper
So seven, eight months, we went through a ton of iterations, and then you were like, I want to make this the most beautiful fucking app I've ever made. Like, show us the app right now. Show us what it looks like.
Danny Aziz
All right. Okay. So here we are. This is the new Spiral. And there's like so many tiny moments that we like to think about on this. You know, James and I and Lucas, we all really just like to spend a lot of time thinking about, what is the experience that you wanna have when you come here? And, you know, it's very subtle. This is one thing that I just like, I love it every single time, but we have this slight papery feeling effect. It's noise, but in the background it feels like it's paper. We intentionally chose this typeface.
Dan Shipper
Wait, I wanna pause you. Like, just looking at this, right? There's so much here. There's obviously like, what are we writing today, Danny? And for people who are listening, there's the box where you type in the post you wanna write about, and then there's a plus button where you can attach more context. There's a microphone button where you can talk to it. There's a styles button where you can add styles, but then there's a whole sidebar of different workspaces. And within each workspace you have different products. There are these little icons. There's one for Cora, if you wanna write about Cora, another product. You can go in there and that's all beautifully designed. And then there's a history. And what I wanna emphasize, to your point, is you built all of this by yourself. That is crazy. That would've been totally impossible a year ago. Or two years ago. And so I think you're making a really important point about what you can achieve with these tools if you use them well and you wanna make something great.
Danny Aziz
Yeah. I think it really, you know, allows you to go from an idea to something really quickly. And I think you can go from something to something beautiful also really quickly. And, you know, I worked with some really talented people like Lucas and James. We all jumped in here, and I think the really cool thing was that it allowed both Lucas and James, who are not engineers, to mess around in Claude and be like, oh, can we figure this out? Like, to your point of being able to speak to it, we have a Monologue integration, and we have this dithering animation, and I think Lucas made the first one in a code sandbox using Claude, and then we took it from there and messed around in Figma. So it also allows a bunch of people who aren't necessarily engineers to jump in and help play around and take it to that final place.
Dan Shipper
So that makes a lot of sense. For people who are not as familiar with the Every ecosystem, Monologue is another product that we build internally. And so now Spiral connects to Monologue, which is super cool, and that would actually be kind of fun to talk about. But why don't you show us how Spiral works, just so we get a sense, once you're opening the Pandora's box of it, for what is possible. And then what's really interesting to me right now is I wanna learn more about your workflow, because I think we do a ton of Claude Code content on Every, and you are actually a Droid boy. And I feel like you have some ways of working that are maybe a little bit different from some of the stuff that we've done previously. And this is really making me, you know, one of the things I feel about you, just from looking at your hair and how beautiful your hair is, and also that leather jacket you're wearing: you really have this, and you've said this before about yourself, you have this measure twice, cut once mentality. You're a craftsman. And I think that has come through in this product, it has come through in how you think about building a tool for writers. And I think, to a point you made earlier, it's sort of at odds with what people think you can build with AI. It's like, oh, you're just gonna make slop, and actually you can build beautiful things with AI. And I think you're a really good example of that. So let's start with using Spiral.
Danny Aziz
Okay. Thank you for those lovely words, by the way. You're welcome. So I actually prepared for this interview beforehand by talking about it with Spiral, and I thought that could be kind of meta to go through.
So first of all, I did it inside the Spiral workspace, and the whole idea of workspaces, it's kinda like projects in ChatGPT or Claude: it just has documents and stuff and context beforehand, so it knows what you're talking about. And so I actually said, hey, I'm about to jump on a podcast with Dan to talk about you, help me prepare some notes. And what it actually did, oh, that was a little janky, I gotta fix that. But one of the things that it did was it asked itself, what do I know about myself? And it kinda thought about it and spat it all out and was like, hey, here are a bunch of things that you could talk to Dan about. Like, most AI writing is slop, Spiral has taste. We can talk about what makes Spiral different. It chats first and it interviews. It explores three different angles. These are all things to come. We can also talk about the evolution story. So Spiral suggested a bunch of things to me, and then I could pick from them. Then the first thing that Spiral does is it asks questions. It always wants to know more concrete details from you. And so it asks, would you like me to expand on these, or help you think through specific stories or examples that you could share?
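Spiral's internals aren't shown on screen, but a workspace as Danny describes it is essentially a bundle of documents and prior context loaded before the conversation starts. A minimal sketch of that idea in TypeScript, assuming a hypothetical Workspace shape and buildWorkspaceContext helper rather than Spiral's actual code:

```typescript
// Hypothetical sketch: a workspace as a bundle of documents and notes
// that get folded into the model's context before the chat starts.
interface WorkspaceDoc {
  title: string;
  content: string;
}

interface Workspace {
  name: string;         // e.g. "Spiral"
  docs: WorkspaceDoc[]; // prior writing, product notes, etc.
}

// Concatenate workspace material into a context block the model sees
// ahead of the user's first message.
function buildWorkspaceContext(ws: Workspace): string {
  const docs = ws.docs
    .map((d) => `## ${d.title}\n${d.content}`)
    .join("\n\n");
  return `You are helping write for the "${ws.name}" workspace.\nBackground documents:\n\n${docs}`;
}
```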
And so I said, actually, one of the other things I wanna talk about is the sheer number of iterations that we went through, testing and experimenting. Then Spiral again looked through what it knew of the story of Spiral and kind of came back with, here are these different arcs, you know, the realization the old one isn't working, the pivot. And then it started asking me questions for each one. What was the first thing that I tried differently? When did the chat-first insight emerge? How many iterations did I go through? What did you learn about what makes writing good that surprised you? And so it's gone through its understanding of itself, but it's asking me questions all along the way to really figure out concretely, what the hell is it that I'm actually trying to say?
And this is super useful, because I think a lot of the times when I've gone to old Spiral and other AI tools to, like, ask for something, it may ask questions, but usually you have to prompt it to ask you questions, and the questions don't really help me think about what I'm saying that well. We had an early user who said something about Spiral. He said that Spiral helped him go downstream from the original thought that he shared with Spiral, as opposed to it just being a repurposing of the original thought that he shared. And I think that put it really well and succinctly: Spiral will help you take that initial jumble of thoughts that you had and go a little bit further with it from these questions, which I think is super useful in figuring out what it is that you're trying to say.
And I guess seeing it from different perspectives also really helps you understand the shape of the thing that you're trying to talk about. So I monologued a bunch of stuff and then it sort of helped me think through the different things that I wanted to talk about right now. So that's the interviewing part of Spiral, sort of figuring things out. And then it is still—
Dan Shipper
I just want to, I just want to pause you there. I think the interview thing is an interesting thing and it's, it's one of the things that we came to very early on in this process because we were thinking about, oh yeah, how do you build a, how do you build a writer with taste? And one of the, one of the metaphors that we use to think about this is, how would a person do this? And I think that's a really good and important metaphor for building AI products. If you're building an agent first you have to think about, like, how does an actual person do the tasks that the agent is going to do?
And so, you know, I was thinking about, for me, in my writing, how do I ghostwrite for people well? And in order to do that well, I need to get a sense of who that person is and ask them questions about it. There's this mutual discovery process, and you took that and built this basically like interview mode, which I think is one of the things that makes this really different: its job is not to just spit out the answer right away, because that's how you get slop. It does spit out an answer if it knows who you are and what you want and all that kind of stuff.
Because if you've built a relationship with a ghostwriter, for example, and you're doing something like what you've done before, that can actually be a pretty quick process. But otherwise a good ghostwriter, their job is to make you feel like you really get underneath the thing that you're trying to do. And that's how you ghostwrite something that feels like it came from the person and isn't just some generic bullshit.
Danny Aziz
Yeah. Yeah. And I think LLMs, and this actually goes deeper into the story of how else this product works, but LLMs are great at doing that, just sort of helping you see things from an infinite number of perspectives. And one of the things that we spent a lot of time on with the prompt for the interviewer inside Spiral is that it "yes, ands" you, sort of, you know, coming from improv. It always kind of has to provide value in the questions. Some questions are kind of annoying if they're not providing value, and we try our best to make sure it's "yes, and."
And it is giving me a lot of questions that help me. You know, how do you encode asking good questions into the prompts, which is kind of what you just asked. And so it's thinking through all of the things. And to your point about slop, one of the things that we're very intentional on is that if we're going to use a reasoning model, we're actually going to show the user all of the reasoning. And you can see it all, because, you know, we're calling it a writing partner, and I think that word partner is really important. I think it's kind of one of those principles that we've anchored everything around. And with any good collaborator or partner that you're working with, whether it's writing or, you know, engineering or design.
(00:10:00)
Understanding the perspective that they're coming from and how they're thinking about the thing helps you understand, oh, this is where they're thinking about it. This is where we're, here's where there's a delta between how I'm thinking about it, how they're thinking about it. Or, oh, they've got it completely wrong. They've understood something incorrectly or differently from how I have. And it's really useful to be able to look at the thinking.
When you start a conversation with Spiral, the thinking is open by default. You don't have to click it open. As opposed to, you know, other AI apps: they do show you the thinking, but you have to click it open. I think that's a very small, tiny, intentional thing, because we want you to read it and actually understand what it's doing. Because if it's doing something you don't want it to do, we still want people to see that, and you can just tell it. Yeah, I think that's super important in building something like this when everybody else is just trying to make content. You know, even the old version of Spiral, it'll spit out an infinite number of things at a click of a button. And I think we've gone completely the opposite way, where we've kind of zagged and been, let's go slowly. We're going to use the thinking model, and it's going to think for quite a few seconds and slow down, be a bit more intentional, answer the questions. And sometimes it's annoying, sometimes the questions are annoying. But sometimes, and more and more, the questions are really helpful for being, oh yes, that's exactly what I meant. Or it'll ask me a question and I'll be, you're way off, dude, and here is actually what I mean. But that helps me articulate what I actually mean, because you're asking me this question that's way out of left field.
Dan Shipper
Totally. Okay, so I think we've gone through the interview stuff now. Show us, once it's gotten a sense of who you are and what you're trying to do, how does it, you know, we really focus this around short form content. So tweets, LinkedIn posts, emails. You know, other types of short form marketing type stuff. How does it, how do you actually make something with this once it's interviewed you?
Danny Aziz
So usually when you start with it, most people will say, help me write a LinkedIn post, X post, email, blah, blah, blah. So usually by the time you've answered quite a few of its questions, it will just be, great, I'm ready, let's start writing. But in this case, I'm just going to say, hey, help me write a banger tweet about this, even though I started with saying I'm doing a podcast. And the first thing that it's going to do is reason over all the things that you asked it, that you've replied to it with.
And then one of the things that we learned in the journey of building this is that it needed to be a multi-agent system. Having one big model do both the interviewing and the writing just really didn't work. As we tried to layer on more functionality, it just started to break because of the context window. All of these labs are talking about 1 million token context windows, and it's just, yes, it can do that, but how well can it actually pay attention to all those tokens? It just clearly isn't there yet. So we have a handoff system where the interviewer gives it to the writer, but actually they share the entire same context window.
So it's not a summary of the previous thing. It's not a tool call where the interviewer is summarizing the conversation to the writer. We are literally taking the previous context window and just writing it into a new system prompt, sorry, writing it into a new context window, just being, hey, here's your previous chat. And we do some prompting to make the writer understand what's happening, hey, you're coming from the interviewer, and that kind of stuff. And so we just saw here, it did a bunch of thinking, and the first thing that it did is it spat out three options. And we have this UI where you can kind of explore all three and write in one go. And it's thought a lot about what we've said.
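Danny doesn't show the code, but the handoff he describes, where the writer receives the interviewer's entire conversation verbatim inside a fresh system prompt rather than a summary, might look roughly like this sketch (the Message shape and buildWriterMessages helper are assumptions for illustration, not Spiral's implementation):

```typescript
// Hypothetical sketch of the interviewer -> writer handoff Danny describes:
// the writer gets the interviewer's entire conversation verbatim, framed by
// a new system prompt, rather than a summary or a tool-call digest.
type Role = "system" | "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

const WRITER_SYSTEM_PROMPT = `You are the writer. The conversation below was
conducted by the interviewer agent; it is the full, unabridged chat with the
user. Use it to draft on their behalf.`;

function buildWriterMessages(interviewerChat: Message[]): Message[] {
  // Render the prior chat verbatim so nothing is lost to summarization.
  const transcript = interviewerChat
    .filter((m) => m.role !== "system") // drop the interviewer's own system prompt
    .map((m) => `${m.role.toUpperCase()}: ${m.content}`)
    .join("\n\n");

  return [
    { role: "system", content: WRITER_SYSTEM_PROMPT },
    { role: "user", content: `Here is your previous chat with the user:\n\n${transcript}` },
  ];
}
```

The design choice being illustrated is simply that copying the raw transcript costs more tokens than a summary but avoids losing the concrete details the interviewer surfaced.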
The reason that we did this is, I think when we first started down this process of creating a new product, you were really, really interested in the idea of exploring the sort of infinite canvas of possibilities that LLMs could give. And this was something you were really, really excited about: traversing up the tree and then branching from the tree, the tree thing. Yeah, March 2025 was a lot of trees and thinking about trees. I actually vividly remember, I think it was a Thursday or a Friday, and I was in our old office in Manhattan and you just sent me a fucking tree on the Discord. I was here in Brooklyn, it was probably in Fort Greene or something. You just sent me a picture of a tree and I was, this fucking guy. Yeah. We were really, really interested in this idea. I mean, it's true, right?
Because they can just produce an infinite number of tokens and you can kind of explore all these different ways of saying the same thing. I think what we realized is, one, from the same context window, every single iteration that you ask it for just degrades in quality. And we even see that still today with Spiral. We're still not there yet with its writing ability. But there is still something really interesting there about seeing things from different angles and sort of pruning, or recently I've been saying chiseling marble: you start with this blank slate. Maybe it's not the best analogy, because you have these. But you kind of pick and choose what works and what doesn't work. So it starts off in this UI where you have three options. And very quickly, it gives you sort of titles of what the general gist of this version is. So this is an honest realization.
This one's more about the failed experiment, and this one is "questions are greater than drafts." And I think we spent a lot of time thinking about how you actually interact with these three options, because chat is one thing, I can come in here and I can type, but we also kind of want the AI and you to be able to interact with all three of these things. So right now, Spiral can read all three of these things. It has context of what's open. So if I close one, it knows that there's only two open right now. If I close all of them, it'll understand that. But also if I make edits, it understands the manual edits that I've made. The other thing you can do is just come over here and highlight things. I can say, here's a bunch of stuff across all three that I don't like, remove them, and I can still collaborate with Spiral across all three drafts. It doesn't have to be one draft at a time. And it kind of just allows me to go and traverse the tree. Sort of, but just three at a time.
There's only so many that I can handle mentally. It's also text. I think one of the things that we learned over the last six, seven months is everybody is really used to seeing multiple options when it comes to image generation, from tools like Midjourney, but with text generation, multiple options are really hard to parse. What the hell are the differences? You can tell, okay, the middle one here is bigger, but is it that much better? And so three kind of felt like that perfect middle ground between showing you multiple options but not overwhelming you. So, yeah, that's how it works. And typically when people use it, when I've noticed a lot of our users using it, they sort of start here. They'll find one angle and they're, ah, I don't want to go down this angle, I want to go down these two angles.
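None of this state machinery is visible in the demo, but Spiral "knowing" which of the three drafts are open, what you've edited by hand, and what you've highlighted amounts to serializing that editor state into the model's context on each turn. A hedged sketch of what such a payload could look like, with a hypothetical DraftState type and serializeDraftState function:

```typescript
// Hypothetical sketch: the editor state for the three drafts, serialized
// into the model's context each turn so it can collaborate across all of them.
interface DraftState {
  id: string;
  title: string;          // e.g. "An honest realization"
  content: string;        // current text, including the user's manual edits
  open: boolean;          // whether the user still has this draft open
  highlighted?: string[]; // passages the user has selected for feedback
}

function serializeDraftState(drafts: DraftState[]): string {
  return drafts
    .map((d) => {
      const status = d.open ? "OPEN" : "CLOSED";
      const highlights = d.highlighted?.length
        ? `\nUser highlighted:\n- ${d.highlighted.join("\n- ")}`
        : "";
      return `### Draft "${d.title}" [${status}]\n${d.content}${highlights}`;
    })
    .join("\n\n");
}
```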
Dan Shipper
"Questions are greater than drafts." I think that hook actually got me.
Danny Aziz
Right? And so that's typically what happens with people. They use it and they're, ah, that one didn't get me, but that one got me. They'll find bits that they like and they'll bring them over, and then, a couple more conversations back and forth, and they'll find something that they're pretty happy with. So, yeah, that's just generally how it works. We're calling it a writing partner. A big part of that is that it talks to you, has conversations, sorry, asks you questions. And then it drafts, and it's just very helpful in drafting across three different things. I can ask it, hey, give me a bunch of options for the hook, and we have this UX where it'll generate a bunch of different options within one specific draft, and you can very quickly change them and switch them out and see what works and what doesn't work. And we've taken a lot of inspiration from Claude Code. So it has all these tool calls: it reads the draft, and then it'll make edits. So I built an arrow, so I can go next. Here's another one, here's another one. I can always go back and see the original. Here's another one.
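The Claude Code inspiration Danny mentions is essentially a tool-use loop: the model calls one tool to read the current draft, another to apply a targeted edit, and another to propose alternate hooks the user can cycle through. A sketch of what such tool definitions might look like; the tool names and schemas here are guesses, not Spiral's actual API:

```typescript
// Hypothetical sketch of Claude Code-style tools the writer agent could call:
// read the current draft, apply a targeted edit, or propose alternate hooks.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>; // JSON Schema describing the arguments
}

const writerTools: ToolDefinition[] = [
  {
    name: "read_draft",
    description: "Return the full current text of a draft by id.",
    inputSchema: {
      type: "object",
      properties: { draftId: { type: "string" } },
      required: ["draftId"],
    },
  },
  {
    name: "edit_draft",
    description: "Replace one passage of a draft with new text.",
    inputSchema: {
      type: "object",
      properties: {
        draftId: { type: "string" },
        find: { type: "string" },    // exact text to replace
        replace: { type: "string" }, // replacement text
      },
      required: ["draftId", "find", "replace"],
    },
  },
  {
    name: "suggest_hooks",
    description: "Generate alternate opening hooks the user can cycle through.",
    inputSchema: {
      type: "object",
      properties: {
        draftId: { type: "string" },
        count: { type: "number" },
      },
      required: ["draftId"],
    },
  },
];
```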
Dan Shipper
Oh, interesting. I've never even seen this. This is really cool. Yeah, you're building so fast I can't even keep up.
Danny Aziz
Yeah. And so there's a lot of tiny little moments like this that we've added to help make it useful to do that. That's kind of currently where it is. We have, like we've talked about, workspaces. We have writing styles, which is very much grounded in what old Spiral was: you give it a bunch of examples and it tries its best to emulate that writing style. And again, one of the things that I found over this experience is that writing is something that is fluid. When we first started this, one of the things that I did actually was I read Why I Write by George Orwell, which was written around the Second World War. When I was reading it, I thought, this writing kind of sucks.
(00:20:00)
There were some parts of it that I didn't like. And I think that made me realize how much the way that we talk, the way that we communicate, changes. And so good writing is so fluid. I think that's something that I've had to come to accept. And we are not training our own model.
I kind of was, how far can we push frontier models, just with prompting and sort of context management and economics and doing handoffs and that kind of stuff? And there's so much of that training data that is slop, and that is something that we're up against. It's interesting, I remember when we first started this, I was going down this very much rules-based approach. All of my prompts had these rubrics with scores, and I would ask the writer to score itself on how well it wrote a hook. And none of that worked. It was all really, really bad. And I think we decided to move more into, let's allow it to be fluid. Let's allow it to be flexible.
Let's not constrain it with these really hard rules. And that, plus the most recent generation of models, Claude 4 or GPT-5, unlocked a new level of capabilities that just wasn't possible before, especially for writing, for interviewing well, and for reasoning over a long context window.
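For context on the approach Danny says he abandoned, the rules-based version amounted to asking the model to grade its own drafts against a rubric before returning them. The prompts below are invented to illustrate that pattern and the looser replacement he describes; they are not Spiral's real prompts:

```typescript
// Hypothetical sketch of the rubric-style self-scoring prompt pattern Danny
// describes abandoning: the model grades its own hook against hard-coded
// criteria before returning a draft. He found this constrained the writing
// and produced worse results than looser prompting.
const rubricPrompt = `
Write a hook for the post below, then score it 1-5 on each criterion:
- Specificity: does it name a concrete detail?
- Tension: does it open a question the reader wants resolved?
- Brevity: is it under 20 words?
If any score is below 4, rewrite the hook and score it again.
`;

// The replacement approach: no scores, no hard rules, just the interview
// context and a request to draft, leaving room for the model to be fluid.
const fluidPrompt = `
You've just interviewed the user about what they want to say. Draft three
distinct angles on it, in their voice, without forcing any fixed structure.
`;

export { rubricPrompt, fluidPrompt };
```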
Dan Shipper
Yeah, I mean, I love all of that. I think the thing that's in my head, just looking at this and listening to you talk: I'm so proud of you. Because where you were eight months ago is so different, and where the product was is so different, and you've just taken this thing and completely reinvented it. In a lot of ways you've reinvented the product, and, I don't know if you've reinvented yourself necessarily, but you've certainly pushed yourself in this new direction to really understand what makes writing great. And it's been just awesome to watch. And what I want people to see is where you started. So could you open up old Spiral so we can show them what that looked like, so they can get a sense for how different it was? Let's do it.
Danny Aziz
And interestingly, this is not the first version of Spiral. I guess that's true. Yes. So the first version of Spiral was AI-generated slop by Dan. It was, that's my specialty. So this is the current, or rather the previous, version of Spiral. It got a bit of a facelift when I first joined, about a year ago or so now. What it does is you would feed it examples of writing that you liked, and then you would give it a new input and it would try its best to recreate those examples that you gave it. So here is one: rough outlines into punchy short tweets is one of the Spirals. And the way that it worked is we just gave it a bunch of these examples of things, like, look, Japan's population is collapsing. And what it would do is take all those examples and create basically a style guide. And then the prompting was very simple. You would just give it, let's take maybe just the prompt for that, let's see what that does with that. And then it would just generate three. And there are people who still use this today. I think there are certain parts of Every that still use this for summaries and bullet points of things. And this allowed you to very quickly get some options. I think what was interesting to me was that this was a great first start for last year. I think that's why a lot of people kind of gravitated towards this. It was, oh, it's made well. What it said it did, it definitely did. But we would look at the outputs that came out of this and we were just, ugh, it's not there. And then this paradigm didn't allow for collaborating with the AI specifically. And I tried to hack it a bunch of ways.
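The old Spiral pipeline Danny walks through is a simple two-step transform: distill the user's examples into a style guide once, then reuse that guide to generate a few options from each new input. A rough sketch of that shape; the function names and the generate placeholder are illustrative, not the original code:

```typescript
// Hypothetical sketch of old Spiral's pipeline: distill examples into a
// style guide once, then generate a few options per input from that guide.
async function generate(prompt: string): Promise<string> {
  // Placeholder for whatever LLM call old Spiral used.
  throw new Error("wire up your model call here");
}

async function buildStyleGuide(examples: string[]): Promise<string> {
  return generate(
    `Study these examples and describe the writing style as a concise guide:\n\n${examples.join("\n---\n")}`
  );
}

async function oldSpiral(examples: string[], input: string): Promise<string[]> {
  const styleGuide = await buildStyleGuide(examples);
  // One-shot transformation: no interview, no collaboration loop.
  return Promise.all(
    [1, 2, 3].map(() =>
      generate(`Style guide:\n${styleGuide}\n\nRewrite this in that style:\n${input}`)
    )
  );
}
```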
And I tried to add on a bunch of things. A lot of people who still use this today will write custom instructions, and that's where they will try and push the AI towards where they're actually trying to go. But it never really worked. Yeah, I think we saw usage declining, and personally, I never liked the stuff that came out of it and I never found myself reaching for it for anything. And when I did, it was because it very much felt like a should: I should be using the product that I'm claiming to want to work on. Yeah. But as I've learned over the last couple of years, shoulds are not what I should actually be doing with my life.
And yeah, there was also this interesting tension with the people who actually use this thing. There are still people who use this thing today, which is really cool. But I just, I'm struggling to care, and we are not using it. And I think one of the things that has become very apparent for the product studio over the last couple of months is, if we're not using it, it just doesn't matter. It doesn't matter if other people are using it. I think Naveen and Monologue are such good examples of that. He just hammered away for months until he found something that he was really excited about and everybody else was excited about. And then he hammered away until it actually worked. I think out of everybody at Every, I was the last holdout to use Monologue, because it just wasn't there yet. And then, and then he got me. And I think that's one of the things that we've learned at the product studio: if we're super excited about it and the people around us are super excited about it, let's just do it and it's going to be great. Yeah. And this, I don't think any of us were excited about this anymore.
Dan Shipper
Yeah. I think there's a couple of lessons in here for me. One of them is, when I first built the first version of Spiral, it was very cool. But things move so fast in AI, and it stayed with this sort of text transformation thing. As soon as the models got better, it needed to be totally different, and we just didn't do that, basically. So that's one thing. Another thing is, it really underscored for me how important a daily-use product is. And it made me think about, well, how useful is this sort of text transformation for a writer, and how often am I going to need to do that? And also how complicated it is to actually produce really good content, and how hard it is to do that in a one-shot way, and that we needed something more dynamic.
But I think the biggest thing is, I built the first version of this and then handed it off to you. And this is something that we've seen a lot now. We've seen it with you. We've seen it with Yash, who is building Sparkle, which is something that I built in 2020. And there's this process that a GM has to go through, if you're working on something that's not yours, to fully own it. Because, I think seven or eight months ago, whenever that was, you came to me and you're like, I don't know if I'm the right guy for this, I don't know if I should be working on this. And you were just banging your head against the wall being, I got to get the MRR up. And that sucked to see. And when we dug into it together, it was clear that you didn't have a vision for it, because it didn't feel like yours. And that was really demotivating. And for me, I was, well, I mean, I have a thousand things, I can give you a bunch of stuff, but I don't think that solves the problem. Because I'm not the GM.
You have to solve the problem, and I can be there to support and help and whatever. And I sort of put it to you: is this what you want to spend your life on? Do you want to work on this, or do you want to do something else? Because if you want to do something else, one of the lovely things about the setup we have at Every is you can do something else. It's not, the funding runs out and then the startup fails. It's, okay, if this didn't work, great, let's do something else, let's keep experimenting. And I think you went back and thought a lot about, do I care about this problem? Do I really want to make great writing? And I think what you came to is that the thing that motivates you is the great writing bit, and the text transformation for marketing was not that interesting to you. Tell me about that. I think, yeah, the journey was really interesting.
(00:30:00)
Danny Aziz
I think it was March. I'd spent January, February, and March just dragging myself out of bed, just feeling guilt. And yeah, we had that conversation, and I think the thing that I realized was that good writing is downstream of good thinking. And that is a really interesting problem. Everybody is going towards, let me just generate a thousand things, make a million blog posts, put out a bunch of tweets, AI-generated replies, and so on. One, I just think, what do I want to do myself? I think I want to be intentional with what I write. I want to be intentional with what I think. I want to be intentional with what I say. But that doesn't mean that I have to shun modern tools. I'm not going to shun away this crazy thing where we've made sand think. I'm not going to say no to that. But how can I use it in a way that it actually helps yes-and me, or gives me that, you know, the sum greater than the equal than its parts, or whatever that phrase is.
Dan Shipper
I think it's, the whole is greater than the sum of its parts.
Danny Aziz
That's exactly what I was trying to say. Dan Shipper steals hats and he steals words out of my mouth. We cut that.
Dan Shipper
I think I said that in our Q4 kickoff. So who's stealing from whom? That's fair.
Danny Aziz
And yeah, I think there's something really powerful with good writing, or it's not even about good writing: articulating what you want to say to the world well can do so many things. You know, not to go all crazy and be, this changes the world, but good writing, good thinking, good articulation moves mountains. It moves people, it gets people to do crazy things that they think are impossible. It gets people to shun the status quo. It gets people to do all sorts of things. And I worry about a world where people are, ah, cool, I'll just let ChatGPT do all the thinking for me. I'll let Claude do all the thinking for me. It's, no, let's use it intentionally. And I think what that requires is, if you go back to the first podcast that we did, I actually said this: most people aren't good prompters, and 99 percent of people who are using AI are probably rubbish prompters. And I still think there is this beautiful space for building products where it's, hey, don't worry about the prompting, just use the product well. And if the product is designed and prompted well to do the thing that I think should be out in the world, which is taking a beat and actually thinking about what you're writing, AI can help you do that. But with the models as they are, the path of least resistance is, just give me this, just write, don't make me think. And I don't think we should exist in that world. I think deep down people don't want to exist in that world. And we don't have to be Luddites, we don't have to shun away AI to still have that world. We can have both.
Dan Shipper
I totally agree. And I think that is a really natural segue into how you are building this, and what you're using to make such a polished, fully featured, useful product as one engineer.
Danny Aziz
Yeah. You know, to your point, the models just change so quickly. The tooling changes so quickly. If I think back to where we were in March when we first started this, I was probably bouncing around from Cursor to Windsurf, and that was it.
Dan Shipper
Windsurf was the new thing. It was, whoa, this was the first thing that was a little bit agentic.
Danny Aziz
Yeah. They had their Cascade feature, and Cascade was so different. It was agentic and it would do all this stuff. And now I'm in my terminal all of the time. You know, the one thing I regret is learning how to use Vim, because I never use Vim anymore. I'm never bouncing around a text editor in Vim anymore. Shout out to all the people who know how to use Vim. So now I'm completely inside my terminal. For a very long time, it was all Claude Code. And as people who follow Every know, we all love Claude Code. I think Claude Code was that first thing that kind of got rid of the fluff. You don't need an IDE, you don't need all of this. It's just a text input; give the agent a bunch of tools, and it's actually not that many tools. It's read and edit, which is kind of all it did to begin with, and make to-dos for yourself.
I think those are the first three tools that Claude Code probably had. And use Bash, right? So four things. And it was amazing. The models themselves were really good. And then I think there was this moment, at least this year, who knows, probably by next year it's not going to matter, but Claude 4 for me feels like such a pivotal moment in this space, because I think it just unlocked so much capability in programming, writing, and a lot of things. The other thing to say is that it also opened up the space for everybody else to try this. So there are so many CLI tools now.
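To make that concrete, here is a minimal sketch of the kind of loop Danny is describing: a text input plus a handful of tools (read, edit, to-dos, Bash) driven by a model. It assumes the Anthropic Python SDK; the model id, tool names, and tool behaviors are illustrative stand-ins, not Claude Code's actual implementation.

```python
# A minimal sketch of an agent loop with four tools: read, edit, to-dos, and Bash.
# Assumes the Anthropic Python SDK and an ANTHROPIC_API_KEY in the environment.
import subprocess
import anthropic

client = anthropic.Anthropic()
todos: list[str] = []

TOOLS = [
    {"name": "read_file", "description": "Read a file from disk.",
     "input_schema": {"type": "object", "properties": {"path": {"type": "string"}},
                      "required": ["path"]}},
    {"name": "edit_file", "description": "Overwrite a file with new contents.",
     "input_schema": {"type": "object",
                      "properties": {"path": {"type": "string"},
                                     "contents": {"type": "string"}},
                      "required": ["path", "contents"]}},
    {"name": "write_todo", "description": "Add an item to the agent's to-do list.",
     "input_schema": {"type": "object", "properties": {"item": {"type": "string"}},
                      "required": ["item"]}},
    {"name": "bash", "description": "Run a shell command and return its output.",
     "input_schema": {"type": "object", "properties": {"command": {"type": "string"}},
                      "required": ["command"]}},
]

def run_tool(name: str, args: dict) -> str:
    # Dispatch each tool call to a plain Python implementation.
    if name == "read_file":
        return open(args["path"]).read()
    if name == "edit_file":
        open(args["path"], "w").write(args["contents"])
        return "ok"
    if name == "write_todo":
        todos.append(args["item"])
        return "\n".join(todos)
    if name == "bash":
        return subprocess.run(args["command"], shell=True,
                              capture_output=True, text=True).stdout
    return f"unknown tool: {name}"

def agent(task: str, model: str = "claude-sonnet-4-5") -> str:
    # Keep calling the model until it stops asking for tools, feeding results back in.
    messages = [{"role": "user", "content": task}]
    while True:
        response = client.messages.create(model=model, max_tokens=4096,
                                          tools=TOOLS, messages=messages)
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        messages.append({"role": "assistant", "content": response.content})
        results = [{"type": "tool_result", "tool_use_id": b.id,
                    "content": run_tool(b.name, b.input)}
                   for b in response.content if b.type == "tool_use"]
        messages.append({"role": "user", "content": results})
```

The point is how little scaffolding is involved: the loop just keeps calling the model until it stops asking for tools.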
We have AMP, OpenAI has Codex, and most recently I've been using Factory's droids. And I've completely fallen in love with Droid. One of the reasons why, and we can jump into it, I'll share my screen, is that Droid is model agnostic. They give you five models to play with. They give you GPT-5, they give you all the Anthropic models, and most recently they have the GLM 4.6 model. The other thing that I think they've done really well is they've really thought about the ergonomics for each model. They're clearly prompting each one differently. They've really thought about, okay, we're only going to support these models, and how the models interact with each other. And the ergonomics work really well, where I will literally have the same type of request inside Claude Code and another one inside Droid, and Droid will just do so much better with the same model. I'll give you a very concrete example.
I had a database migration that completely fucked up, and I was actually in a car going to Ikea in New Jersey. So I was in the car going over the Williamsburg Bridge, New Jersey, not Red Hook, talking to my wife, it's okay, she ordered the thing at the wrong place. So she's driving this car, going over the Williamsburg Bridge, and I'm there in the passenger seat just going insane. Claude Code was leading this database migration and it's fucked. And then I just took the same thing over, same model, it was Opus 4.1, and it just got it in three minutes. It was, oh, here's the issue. Got it. So they've done a better job than the Anthropic team has at giving the model the ergonomics and the tools that it needs. It's, oh, here are the tools that I need, and here's, I think, the right prompting. I don't know what else they're doing, but there's something about it where I feel so much more confident coming to Droid than I do in some of the other tools. The other thing, about your original question of how I build this as a single engineer: I always get shocked every time I talk to other people. At Every, we're clearly at the edge of how to use these things.
I think for a very long time I was the number one user of Droid. I've been ripping the most tokens, and it surprises me. I'm on their max plan, and it was October 3rd to October 24th, or something like that, and I'd already run out of tokens. I ran out of tokens well before time. And you've got to call Ben Tassel and get me some more credits. Because yeah, I think we are just ripping them so much. And going into the deep end: there are a lot of people, maybe not now, but the last time I spoke to people, they wouldn't use YOLO mode.
They were afraid of using YOLO mode. And I think that's kind of a foregone conclusion for us. It was, of course you're going to use YOLO mode. Why would you sit there accepting everything that it does? You're just going to let it do it. So yeah, I think it's about really leaning in, and letting go of the identity of, I'm an engineer and I'm going to use Vim and I'm going to write the code and blah, blah, blah. Letting go of that identity, because when you really peel back the layers, what I think I am is somebody who just makes things, and I don't really care how I make them. I mean, I do care about how it's built and I clearly care about the end product, but I don't care if I have an agent writing the code or if I'm writing the code. As long as it's good code and it does what I want it to do, does it really matter? You know.
Dan Shipper
But how do you do that? Because I think a lot of people listening or watching are saying, cool, great, but how do you actually stay a craftsman if something else is coding for you? Isn't that all just going to be slop?
Danny Aziz
Let me open up my terminal before I share my screen. I think it's a philosophy of, you know, you brought this up right at the top, of measuring twice and cutting once. And I think this is actually the one thing that I've learned from trying to build something that writes well, which is just looking at something and feeling internally: what actually calls out to me, and does that feel right or does it not feel right? And you can do that with a button. Does that button feel right or not? Does the spacing on that thing feel right? Not just, does the code feel right, but does the thing that Claude just ripped out do it the way that I wanted it done? And further, does it add to the piece of the puzzle in the right way? LLMs can skin a cat a million different ways. There are probably only a handful that you actually should be using, that are right, or that actually fit the way that you want it done.
(00:40:00)
So I think a lot of building this product has been feeling out, okay, when I ask Claude or one of the other models to do something, does it actually do it the way that I want it done? So if I share my screen, I can show a lot of the ways that I use these tools, and you could do a lot of this in Claude Code as well. One is that I'm always doing multiple things at once, because waiting around for the model to complete is a killer. And I think the worst habit that I got into, and that I had to cut out, was scrolling Twitter while I waited for an agent to complete. That is just a killer to productivity for me.
Dan Shipper
So what is it that you do? Because you do a thing that I actually started to copy for this. How do you stop yourself from scrolling Twitter? So worktrees are the big thing, so... Oh, no, no, I was saying, what I've noticed you do is you take your phone and you put it in another room.
Danny Aziz
So I've started putting my phone next to yours, and now we've confused whose phone this is. Yeah. So we have a couple of places in the office where there are just a couple of ledges, or we have this beautiful Every sign. To talk about intentions, one of the things that I try to do the moment I walk in is I take my phone out of my pocket and I just place it face down on one of these ledges. And I leave it there and I try my best not to look at it. And also, I'm always on Do Not Disturb, as you can see here in the top right corner. I have notifications off. I have an Apple Watch, but it notifies me about nothing. And I think it's really easy to be, oh, now because of this tooling, I can do the job of three people as one person.
And I have seen in myself and in other people that actually what people end up doing is they end up doing the job of one person and then they just procrastinate or they do the other things in their life that they think they want to do. And I noticed in myself that I wasn't happy doing that. I really am somebody who's all in. And when I'm working, I'm working, don't talk to me about anything else. My wife hates that about me. And then if I'm not working, I'm not working. I'm trying to be present and intentional. If I'm with somebody, I'm not texting other people at the same time. And that's been really helpful.
And I think it's also really important when working with LLMs, because programming before was, I would look at the code, I would spend a lot of time and attention reading it, tests would pass. And now it's, no, I'm just going to let it go rip and I'm going to go do something else. And so what that's allowed, and it requires being really intentional, is doing parallel things, doing multiple things at once. One of the ways that manifests is in one terminal tab, I'll have two panes, or sometimes three panes, open. And I'm usually ripping droids on a bunch of different things. So here, in this one I was fixing a bug. In this one, earlier today, Kate gave me some copy edits inside the app, so I had the droid go through and do those copy edits. But then here in another window, I've actually been messing around with some of our judging prompts, and I've been using Python notebooks. So I'm writing up Python notebooks with a droid. And I had this other one here, which is the policy for the judging.
Dan Shipper
The judging prompts, meaning judging whether writing is good. And that's the core component of making good writing: getting the AI to recognize if writing is good or not.
Danny Aziz
And I'll go on a quick tangent. I think the one thing that we saw about Claude 4 Opus specifically, back in June or July, or whenever it shipped, is that it's really good at judging writing and not rubber-stamping. Models before this, you would give them some of their own writing and it was, yeah, my writing's great, it's a B plus. Or you would give them other people's writing or other agents' writing and it would always be a B plus. And there were a lot of tricks that people would do: give it a rubric and then tell it that actually a seven out of 10 is impossible, forcing it to actually commit, but then everything would just be a six or a 5.8.
And what we found was that Opus was actually really good at reliably, repeatedly giving good answers and good prose as to why it was saying yes or no to something. So we have this one test internally where we see if writing is engaging: does each sentence keep a reader engaged as you go through? And we found that Opus 4 was the first model that would reliably, almost every single time, give the same answer, and an answer that we would agree with. That then also came over to Sonnet 4.5 and somewhat now to Haiku 4.5.
And that's a really important part of the product. But it's also really important for us building this product, and for me building this product, to know where it needs to get better. This is actually a feature that we're going to build directly into the product, so Spiral can kind of judge itself using another agent to see where else it can improve. And that helps us create this feedback loop of being able to watch how good or bad the outputs coming out of it are, along with people just telling Spiral, I don't like this. That's one of the benefits we didn't have with the old product. The old product only had thumbs up and thumbs down, and that was really useless. With a chat-based product, people just say, I don't like this because... And we're using something called Raindrop, which is basically Sentry for AI agents, and we just get a report every single day of, this is what people said they liked, this is what people said they didn't like, and here's why. I can just see the chat, and it surfaces it all up for me. So being able to see how people use the product and just read what they're saying to Spiral is really helpful in improving it.
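For a sense of what a check like the engagement test Danny describes might look like in practice, here is a hedged sketch of an LLM judge: ask a Claude model for a sentence-by-sentence yes/no plus its reasoning. The model id, prompt wording, and JSON shape are assumptions for illustration, not Spiral's actual judge.

```python
# A hedged sketch of an "is this engaging?" judge, not Spiral's real prompt.
# Assumes the Anthropic Python SDK and that the model returns bare JSON.
import json
import anthropic

client = anthropic.Anthropic()

JUDGE_SYSTEM = (
    "You are judging a piece of writing. For each sentence, decide whether it keeps "
    "a reader engaged enough to continue to the next one. Quote the sentence, give a "
    "one-line reason, and answer yes or no. Finish with an overall verdict. "
    'Respond only with JSON of the form: {"sentences": [{"quote": str, '
    '"engaging": bool, "reason": str}], "overall": bool}'
)

def judge_engagement(draft: str, model: str = "claude-opus-4-1") -> dict:
    # One call per draft; the structured verdict can feed a feedback loop or dashboard.
    response = client.messages.create(
        model=model,
        max_tokens=2048,
        system=JUDGE_SYSTEM,
        messages=[{"role": "user", "content": draft}],
    )
    return json.loads(response.content[0].text)
```

The value of the structured output is less the score itself than the quoted sentences and reasons, which is what makes the judge useful for spotting where a draft loses the reader.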
Dan Shipper
One of the interesting tangents that we've been down over the last couple weeks is, and this is one of those things where I only ever read it, so I don't know if this is actually the right way to say it, but DSPy is how I say it in my head. Other people may pronounce it differently, I don't know. But basically there's this prompt optimizer that is sort of having a moment right now among AI people. You can give it a prompt and an objective, and it'll optimize your prompt for that objective. There's a specific method for doing that that I can't totally explain, but it's a new and interesting way of doing it. And so one of the experiments that we've tried is giving me a bunch of tweets, having me thumbs-up and thumbs-down them, and then give concrete feedback, to see if we can generate a judge that has my taste. Do you want to talk about that, and what we found, what's worked and what hasn't worked?
Danny Aziz
Yeah. So we have a great piece on Every with Michael Taylor around DSPy. He's actually the one who came here from the UK, sat in the office, and taught us how he uses it.
Dan Shipper
Does he say DSPY?
Danny Aziz
I don't know, but I think it's DSPy. I think that sounds so much better, even if the other way is probably the right way to say it. And yeah, for a very long time this year, I'd been going to their landing page and thinking, I have no clue what this thing is. And he did a really good job of explaining it, and I think we have a video out where he's actually teaching me and Kira how to use it. His advice is to just ignore 80 percent of what it is and use it as an optimizer. And when you just use it as an optimizer, it makes so much sense. You don't have to throw away all of the frameworks that you might be using already in your app, if you're using the AI SDK or whatever. Just use it as an optimizer, extract a prompt, put it in your product. It's very much built to be this all-encompassing framework, which I find really confusing, but using it as a prompt optimizer is great. And I think we can actually look at that judge that we created for you. From labels, no. Let's generate from labels. Sorry. There we go. The way that it works is you basically just give it a bunch of predefined labels: generate a tweet about blah, and then we had you literally give it a thumbs up or a thumbs down and write out your thoughts. And we started with a basic prompt to see how well it did. The base prompt was about 58.8 percent accurate to you, and that was just a prompt that I asked Claude to come up with. So just over half of the time it would say the same thing that you would say.
Dan Shipper
It's not terrible, but... Yeah, yeah.
Danny Aziz
Yeah. But, you know. You could flip a coin basically. Yeah, that's true. I guess that's true, right? You're totally right.
Dan Shipper
Yeah. It's terrible.
Danny Aziz
Especially, you know, this is just for writing marketing materials, but 50 percent, if you're trying to do something important, is really bad. And the way that it works then, and I also can't speak to exactly how it works, but my understanding is that you give it a metric. What it will do is generate a new prompt, test that prompt against your judges and the data that you've given it, and see how close this newly generated prompt gets to the sort of gold standard, which is you in this example. Then it goes, okay, this prompt that I tried wasn't that good, so let me try another prompt. And so the way that I see it is that it's just trying as many prompts as it can until it finds something that's better.
(00:50:00)
So it really is just throwing spaghetti at a wall and seeing what sticks. It will rip through a lot of tokens. You can basically set it to a max number: here I have a max of five full evals, which means it goes through 85 iterations. I don't really know exactly what that means, but whatever. And so here you can see, here's one of the prompts it came up with, and then it came up with another prompt, and that prompt was 66.7 percent accurate. And so it keeps going until it gets a little bit better. Here's another one that it tried that was 33.3 percent accurate, so way worse. And so the idea is just the inverse.
Dan Shipper
Of that prompt and it would be exactly.
Danny Aziz
And so it learns from trial and error, which I think is cool, because when I think about how I prompted for very subjective things like this, it was just trial and error. I would work with Claude and be, okay, the output kind of came out like this, but it was missing this thing, give me a suggestion. Or, one of the things that I would do a lot before this, and I still do, is I would find other people's prompting guides and say to Claude, distill down the principles of these prompting guides and then help me. And that's somewhat useful. But this, at least for creating judge prompts, feels a little bit more scientific. Maybe it's because we're in a Python notebook and there are percentages and stuff. But it also lets me very clearly see the before and after. So then finally it spat out this prompt that was 76.5 percent accurate, which is a lot better than half. Again, not perfect, but we're getting closer and closer. And then we can actually use it. So maybe what could be fun is if I stop sharing my screen... oh, that might not work.
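Based on Danny's description, here is a minimal sketch of what that notebook flow might look like in DSPy, treating it purely as a prompt optimizer. The signature fields, model string, example tweets, and the "light" optimization budget are illustrative assumptions, not the actual Spiral notebook; in practice you would also score against a held-out dev set rather than the training labels.

```python
# A hedged sketch: optimize a "would Dan approve this tweet?" judge with DSPy.
# Field names, model string, and examples are illustrative, not Spiral's real code.
import dspy

dspy.configure(lm=dspy.LM("anthropic/claude-sonnet-4-5"))  # any LiteLLM model string

class JudgeTweet(dspy.Signature):
    """Decide whether Dan would give this tweet a thumbs up."""
    tweet: str = dspy.InputField()
    thumbs_up: bool = dspy.OutputField()
    feedback: str = dspy.OutputField(desc="why, in Dan's voice")

judge = dspy.Predict(JudgeTweet)  # the "base prompt" baseline

# Labeled data: the tweet plus Dan's actual thumbs up or thumbs down.
trainset = [
    dspy.Example(tweet="I feel like the VC community is doing a disservice...",
                 thumbs_up=False).with_inputs("tweet"),
    dspy.Example(tweet="We shipped the new onboarding in a week. Here's what broke.",
                 thumbs_up=True).with_inputs("tweet"),
    # ...many more labeled examples in a real run...
]

# The metric the optimizer climbs: agreement with Dan's verdict.
def agrees_with_dan(example, prediction, trace=None):
    return example.thumbs_up == prediction.thumbs_up

# MIPROv2 proposes candidate prompts, scores each against the metric on the
# labeled data, and keeps the best one -- the trial-and-error loop Danny describes.
optimizer = dspy.MIPROv2(metric=agrees_with_dan, auto="light")
dan_judge = optimizer.compile(judge, trainset=trainset)

evaluate = dspy.Evaluate(devset=trainset, metric=agrees_with_dan, display_progress=True)
print(evaluate(judge), evaluate(dan_judge))  # roughly the before/after jump Danny shows
```

The optimized instructions can then be pulled out of the compiled program and pasted into whatever framework the product already uses, which is the "just use it as an optimizer, extract a prompt" approach Danny credits to Michael Taylor.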
Dan Shipper
Before you go there, I just want to know what DSPy Dan is. What is in the prompt? Give me a couple of things that are in the prompt, because one of the things that's really cool about this is it's telling me stuff about how my brain works that I might not even know.
Danny Aziz
The prompt is actually very simple. If you asked me to write a prompt like this, a prompt that emulates Dan's judging, I would not write something this simple, which I think is kind of counterintuitive. It's just a simple list of things. And I don't think it's perfect, and it doesn't work all the time.
In the dataset we gave it, it said 76 percent, but I feel like it's probably more like 65 percent accurate to what I would perceive as actually good content versus bad content. So it definitely still needs some work, but it's some very simple things, like simple, concrete, specific language, not vague generalities. And I think what's really interesting about this is that past generations of prompting would be really, really specific: well, what is simple, concrete, specific language? Here are a bunch of examples of really simple, concrete, specific language. And what we've seen from doing this process is to actually just let the model breathe. You know what, they're big enough, they have enough parameters, they have enough training. They clearly know what simple, concrete, specific language is. Maybe sometimes it's a little too buzzwordy, and so there's stuff that we have to do there, but the models have gotten a lot better. You know, a clear promise that gets paid off.
I actually think that's a pretty important one, and a lot of the feedback that you've given me is like that: relatable, sometimes funny, focuses on topics that people care about, the proper tense one, a clear idea. And what is bad content? It's generic openings that could apply to anything. It's the wrong tense. It's too wordy. Breathless tone. Chop slop. Unclear, choppy syntax. Telling instead of showing. So there are some specifics there, like proper tense, but also principles, like telling instead of showing. I think the other thing that's interesting about this prompt is the evaluation approach, how it actually does that.
It starts with a hook: is the opening sentence surprising or counterintuitive, or is it generic and could apply to anything? And then, always pull exact quotes from the content to support your points. So that's how it evaluates: it's assessing the voice, looking for specifics, checking for a payoff. I think the thing that's most surprising to me about this is how simple it actually is. If I were to write a prompt that did this with Claude, it probably would have been many more lines of text; this is 112 lines. It's very, very interesting.
Dan Shipper
Well, let's test it. Let's do the Newlywed Game with DSPy Dan and Dan. We'll do a little graphic here, and I'll give a thumbs up or down, and we'll see what DSPy Dan says and whether we match.
Danny Aziz
Okay. I'm going to find a tweet on my timeline and send it to you. Let's see. Okay, this is a good one. Where are you, Dan Shipper? Okay, I'm going to send it to you, and then I'm going to paste it into DSPy Dan. And what I want from you, Dan, is a thumbs up or a thumbs down.
Dan Shipper
Cool. Do you want any of my actual feedback?
Danny Aziz
Yeah, that could be good too.
Dan Shipper
Okay. And it gives feedback too, right? Yeah. So we should match on the thumbs up or thumbs down, that would be the main thing, but I'm also curious about my feedback. Okay, so the tweet you gave me, and we'll put it up so people can see it, but the tweet you gave me was: I feel like the VC community is doing a disservice to the startup community ecosystem by constantly organizing all these events, especially on weekdays at 6:00 PM. And I never understand how so many founders are able to make it to these. Leaving the office at 5:30 PM makes sense from their POV.
This is very thumbs down, sorry. Okay, so the first thing is starting a tweet with "I feel like." Sometimes it actually really does work, because it feels more casual, like something that you didn't edit, but it also lowers the impact. So if you're using "I feel like," you want to be pretty intentional about that. And this tweet is very contrarian; he's kind of going after people a little bit. I think it would be better with a full send: the VC community is doing way too many events, organizing so many events. Oh, you know what? I was only seeing part of the tweet, because I was looking at it in the Discord preview, so it continues.
So, I feel like the VC community is doing a disservice, blah, blah, blah. I never understood how so many founders are able to make it to these. Leaving the office at 5:30 PM makes sense from their POV; they're building a network and effectively running the equivalent of an enterprise sales motion. But in the end, it results in drag in the community. Too few founders dare to say no to 99 percent of the invites they receive, which hurts their business. The thing I like about it is there's something true here. And it feels very stream of consciousness, like he was just in the car on the way to something and was, blah, blah, blah, here's what I think. And I think that's where all the best tweets come from. And there's something interesting about the core idea of it, which is, if I had to take it a little bit further, there are certain stable equilibria in ecosystems that everybody in the ecosystem dislikes, but you end up stuck there. Having so many events is one of those things where it's actually not good for anyone, because VCs are, I guess I have to put on all these events and I'm wasting money on it, and people show up, but I don't know if it's really that differentiated, and founders are, I wish I could be working. But everyone keeps doing it anyway because they feel like they have to. So I think there's a core idea there that's kind of interesting. But I think this would just hit way harder if it was: I think VCs are actively hurting the startup community by holding so many events, with maybe one or two more sentences of why that is.
Dan Shipper
As it is, it's a little wordy, and he's also going after VCs and founders at the same time. As I think about it a little bit, it feels like I don't necessarily want to open up a two-front war. So I give him points for an honest observation that has an interesting idea at the core of it, but the execution isn't done in a way that allows me to light up and be, yeah, you're totally right. I'm kind of, I don't know, it's too long to read. And I think he's kind of making fun of me too, instead of getting me on board with him, because I would be down to bash VCs, you know? But yeah, that's my feedback. Thumbs down.
Danny Aziz
Thumbs down. Because you said that thumbs down with such vigor when you first said it. Is it a softer thumbs down now, or is it still a really strong thumbs down?
(01:00:00)
Dan Shipper
It's maybe softer in the sense that I only read the preview, so it looked like it was just cut off at "makes sense from their POV," where it stopped. And I was, what even is that? So I'm a little less vigorous, but it's still not a great tweet.
Danny Aziz
Okay. So this is what DSPy Dan had to say. He gave the tweet a thumbs up. Whoa. But what was interesting was that he actually said similar things to you: slightly wordy in places, a clear payoff. And clearly it felt like he'd written this himself, it felt handwritten, not AI-generated. But DSPy Dan felt like it was counterintuitive: the VC community is doing a disservice to the startup ecosystem, sort of a bold, contrarian take. So there is clearly still work to do. I think DSPy Dan is really thinking more about what the tweet is saying as opposed to how it is written. That is something we have other judges for, judges that look at how things are written. And so one of the things that we found, and I kind of said this earlier, is that having a model do many things doesn't work.
Having it do one thing is good, but sometimes that one thing isn't the whole thing. So it would be interesting to add this alongside some of the other judges we have and see how it acts. But also, one of the things that I've come to accept is that a product like this is going to be always evolving. The models are going to change. You know, Sonnet 4.6 is going to come out next week or something like that. I hope not, I'm so tired. And GPT-5 was really interesting: to use it, you have to prompt differently. There was a while where GPT-5 was the writing model inside Spiral, and the prompt was so different, it had all these XML tags. And now we've gone back to a Claude model and it's a much different prompt. And yeah, I don't think there was ever really a notion of "done" when it comes to building products, especially software.
I think it's more that it's always evolving, because the product itself is non-deterministic. You can put whatever you want in the chat input, and whatever it wants is going to come out, and we want that to be output we can stand behind, which is actually a really interesting problem. If I were making Instagram filters, if I were an engineer at Instagram, which I never was, just to be clear, and I was coding up a filter, I could be, great, this filter is sick, it works, and I never have to touch it ever again unless it breaks, right? Whereas now I actually can't do that. I would love to say that I could stand by all of the writing that comes out of Spiral, but I can't. I literally cannot do that, because it would require an infinite number of me looking at an infinite number of outputs, which is impossible. So it's a really interesting shift in building products that I don't think I've fully come to terms with just yet.
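Circling back to the "many narrow judges" point from a moment ago, here is a hedged, self-contained sketch of composing single-purpose checks, where a draft only passes if every judge agrees. The judges below are toy heuristics standing in for the LLM judges Danny describes; the names and thresholds are invented for illustration.

```python
# A toy sketch of composing narrow judges; each checks exactly one quality.
# In Spiral these would be LLM judges (engagement, Dan's taste, tense, and so on);
# the placeholder heuristics here are illustrative only.
from typing import Callable

Judge = Callable[[str], bool]

def is_concrete(draft: str) -> bool:
    # Placeholder: penalize a couple of vague, buzzwordy phrases.
    return not any(p in draft.lower() for p in ("game-changer", "in today's world"))

def has_hook(draft: str) -> bool:
    # Placeholder: the opening sentence should be short enough to land.
    first_sentence = draft.split(".")[0]
    return 0 < len(first_sentence.split()) <= 20

def run_judges(draft: str, judges: dict[str, Judge]) -> dict[str, bool]:
    # One verdict per judge, so you can see exactly which check failed.
    return {name: judge(draft) for name, judge in judges.items()}

if __name__ == "__main__":
    draft = "I feel like the VC community is doing a disservice to the startup ecosystem..."
    verdicts = run_judges(draft, {"concrete": is_concrete, "hook": has_hook})
    print(verdicts, "passes" if all(verdicts.values()) else "needs work")
```

Keeping each judge narrow is what makes the feedback actionable: a failed check points at one specific quality rather than a single opaque grade.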
Dan Shipper
I think you're right. Good writing is alive. It comes from something that is alive and is honest to a living experience. And so anything that is static can be good for a while, but over time it becomes slop, because it's static. And that's something that I think we're starting to grapple with, and it has some really interesting implications for the direction of this product over time.
Danny Aziz
Yeah. And I think it has implications for what the next generation of models look like and what they're trained on. A lot of these models have AI-isms, and I really think the AI-isms come from two things. One, I think we just get used to them, especially us, because we're looking at AI-isms all the time. But two, it's the amount of nonsense that was written on the internet over the last 15, 20 years, and just how much of that is in the training data. You know, in an ideal world, Anthropic would hire us to prune the nonsense out of their training data and then train Sonnet 4.9 on the Every bespoke training data. That would be great, Dario.
Dan Shipper
Yeah, hit us up. If you're listening, let us know.
Danny, this is fantastic. I actually feel like I learned a lot. Like I said, I'm so proud of you and what you built, and so excited. Thank you for getting this out into the world, getting it into people's hands, and also for what comes next. Where can people find you, and where can they find Spiral if they want to give it a shot?
Danny Aziz
So for Spiral, you can go to writing.new, that's a domain. It's also at writewithspiral.com, and @tryspiral on Twitter. You can also follow Every everywhere and you'll find links for it. And I am DannyAziz97 everywhere on the internet. Thank you very much for having me.
Dan Shipper
Thanks for coming on.
Thanks to Scott Nover for editorial support.
Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast AI & I. You can follow him on X at @danshipper and on LinkedIn, and Every on X at @every and on LinkedIn.
We build AI tools for readers. Write brilliantly with Spiral. Organize files automatically with Sparkle. Deliver yourself from email with Cora. Dictate effortlessly with Monologue.
We also do AI training, adoption, and innovation for companies. Work with us to bring AI into your organization.
Get paid for sharing Every with your friends. Join our referral program.