
TL;DR: Today we’re releasing a new episode of our podcast How Do You Use ChatGPT? I go in depth with Nathan Labenz, founder of the AI video creation tool Waymark, host of the Cognitive Revolution podcast, and GPT-4 red teamer (one of the people OpenAI tapped to test the new ChatGPT model). Watch on X, YouTube, or Spotify, or listen on Apple Podcasts.
Sometimes ChatGPT is a copilot, sitting beside you, taking directions, and guiding you as you fly through your work. At other times, it’s something entirely different—a subordinate of sorts. It’s not just an assistant or a companion, but a trusted deputy, someone—or something—to which you can take a moment to explain exactly what you want, and it’ll go off and handle it for you.
That’s how Nathan Labenz, my guest on this week’s podcast, sees the twin roles that ChatGPT currently plays in our work lives. In this episode, Labenz—the founder of the AI video company Waymark and the host of his own podcast called The Cognitive Revolution, which will also air a version of this conversation—uses ChatGPT to write complex code in languages he’s unfamiliar with. Walking me through his process, he says he’s able to move through a project in two to three hours rather than two to three days.
When ChatGPT simply acts like a copilot, we’re able to save a lot of time. But when we can delegate tasks to it, we’re moving closer to the eventual promise of AI: just like machines replaced muscles during the Industrial Revolution, AI will one day replace brainpower. That’s not to say that, with AI at our fingertips, we’ll necessarily become lazy, but if AI can handle some of the drudgery of everyday work, then maybe we can find the extra time for work that’s more creative, thoughtful, and, dare I say, human.
Here’s a taste of what we talk about. Read on for more analysis from me at the bottom.
- Delegating and co-piloting. We discuss all the ways AI can assist us now and in the future. What’s still missing, Nathan thinks, is the middle ground between co-piloting and delegating tasks. He calls it ad-hoc delegation. “Ideally, I would like to delegate more and bigger subtasks to the AI on the fly,” he tells me. “And that's where we're not quite there yet.”
- Building a prompt coach. For one of his ventures, Nathan is building a program to help train human assistants to work directly with AI. Enter the prompt coach, a tool to direct efficient use of LLM-powered chatbots. “We still see that assistants sometimes need coaching on how to effectively prompt a language model,” he says, walking us through how he built his app.
- Withholding instruction. One of the pitfalls of human-AI interaction is that sometimes we, as humans, are too specific, Nathan says. That precision can prevent the kind of analysis he wants out of ChatGPT. What’s better? Nathan explains his triple-A approach: “Analysis before answer always.”
- Removing drudgery from work. Sometimes Nathan knows what he wants, but he doesn’t really know how to do it. That’s where ChatGPT comes in. He explains what he wants to the chatbot and, in turn, the chatbot walks him through how to do it in real time. He says it “liberates” him from that frustration and eliminates drudgery.
- AI’s time horizon of knowledge. We talk about the benefits and shortcomings of ChatGPT’s default setting as a contextless agent, starting with a clean slate for each chat session. I share my preference for custom instructions, and Nathan provides some examples of when a lack of context is preferable.
- Shopping for a used car. While we mostly discuss ChatGPT, Nathan brings in a tool he prefers called Perplexity, which he finds is more reliable than ChatGPT when looking for answers to factual questions. Think of it as a closer-but-not-perfect alternative to Google Search, he says. With this assistant, Nathan walks us through his process for one of the most daunting tasks known to humankind: buying a used car on the internet.
You can watch the episode on Twitter/X, Spotify, or YouTube. Links and timestamps are below:
- Watch on X
- Listen on Spotify (make sure to follow to help us rank!)
- Watch on YouTube
- Listen on Apple Podcasts
Timestamps:
- Intro 1:03
- Copilot mode and delegation mode 1:46
- ChatGPT for coding 8:53
- Building a “prompt coach” 12:21
- Best practices for talking to ChatGPT 24:32
- The “dance” between you and AI 38:47
- AIs that know you 40:41
- ChatGPT as a “thought partner” 45:17
- Using Perplexity AI instead of a search engine 56:27
- What’s ahead for AI 1:05:00
What do you use ChatGPT for? Have you found any interesting or surprising use cases? We want to hear from you—and we might even interview you. Reply here to talk to me!
Miss an episode? Catch up on my recent conversations with Notion engineer Linus Lee, writer Nat Eliason, and Gumroad CEO Sahil Lavingia, and learn how they use ChatGPT.
My take on this show and the episode transcript is below for paying subscribers.
One of the biggest debates in AI is about whether tools like ChatGPT can and should augment or replace human thought. I think this episode is an incredibly interesting example of the ways in which today’s AI is a powerful augmentor—humans and AIs working together can achieve things that neither one could achieve on their own.
In the episode, Nathan walks me through how he used ChatGPT to help him build a web app for a company he’s been advising. Working with ChatGPT, he was able to build an app in a few hours that would’ve ordinarily taken him a few days. And both he and ChatGPT played important roles in the process.
His job was to provide ChatGPT with context about what he was building, what he wanted, and what was working and what wasn’t. ChatGPT’s job was to teach him about programming languages and concepts that he wasn’t familiar with, output code snippets for him to build with, and help him get unstuck when things went wrong.
Together, they did a dance, mutually filling in gaps for each other until the final project was complete. I think this is a beautiful way to understand how to use ChatGPT in our lives—I hope you agree.
Transcript
Nathan Labenz (00:00:01)
This probably took me two to three hours. It writes all the code. Because again, I've never written a line of React code in my life. So bit by bit, we're refining the experience. We're finding the interface. And I would guess that this would have taken me easily an order of magnitude longer in a pre-ChatGPT era.
If this was two to three hours, it's probably two to three days of work to figure out all this stuff.
Daniel Shipper (00:00:39)
Nathan, welcome to the show.
Nathan Labenz (00:00:42)
Thank you, Dan. Great to be here. I'm excited for this.
Daniel Shipper (00:00:43)
I'm excited too. For people who don't know, you are the founder of Waymark. You are the host of the excellent podcast, The Cognitive Revolution, and you are a GPT-4 red-teamer. So you were responsible—or one of the people on a team of people who were trying to figure out how to make GPT-4 do bad stuff before it was released, which you had a really interesting tweet thread about, I don't know, I think a week ago or two weeks ago, something like that.
So we're very excited to have you. I think you'll have a lot of insights that I'm excited to share with everyone. One of the things that stands out in thinking about your work, and about The Cognitive Revolution in particular, the podcast that you run, is that you have this idea that one of the values of AI is in helping us to offload cognitive work.
So just like in the way that [with] machines, in the Industrial Revolution, we offloaded manual physical labor, AI will augment or offload a lot of cognitive labor from humans. And I wanted you to just talk about that. Tell me more about what that means. And then tell me, is that a good thing? And where is it a good thing?
Nathan Labenz (00:01:46)
Well, that's a big question. I would say … I talk about AI doing work and helping us in a couple of different modes. For starters, we will probably spend most of our time today in what I call copilot mode, which is the ChatGPT experience of: you, as a human, are going through your life and going through your work and encountering situations where, especially as you get used to it, you realize, oh, AI can help me here.
So you make a conscious decision in real time to switch over to interacting with AI for a second or a minute or whatever, to get the help that you need, and then you proceed, but you are the agent, right? In that situation, going around and, you know, pursuing your goals. In contrast, the other mode that I think is also really interesting is delegation mode, and that is where you are truly offloading a task.
And I always say the goal of delegation mode is to get the output to the point where it is consistent enough that you don't have to review every single output. And if you can get there, then you can start to really shift work to AI, in a way that you no longer have to do it. And that can be useful in different kinds of ways, right?
The copilot mode is about helping you be better. That's your classic symbiosis or intelligence augmentation. And then the delegation mode is more like, we can save a ton of time and money on things that used to be a pain in our butts, or we can scale things that are not currently scalable. And there's a lot of that in the world, right?
I think almost everybody has things where they would say, if you just ask the question, is there stuff that you could be doing that would be really valuable to have done, but you just don't have time to do it? There's a lot of that that can be quite transformative. In the middle, and what's kind of missing right now, between copilot mode, where you're getting this kind of real-time help and deciding how to work it into whatever you're doing, and delegation mode on the other end, is ad-hoc delegation, where it's: I'm going along, but ideally I would like to delegate more and bigger subtasks to the AI on the fly. And that's where we're not quite there yet. The agents probably can't do much in the way of a significant task. So you're still shoehorned into one of two scenarios: you're engaging with it in real time and getting help, or you're going through the process of doing a setup and doing a validation, setting up a workflow, to where you can truly delegate. And it's that in-between gap that I think probably gets closed over the next year as agents, quote unquote, begin to work. And then we can start to delegate bigger chunks of work on the fly.
The next question was, is it good? I don't know if I have a great answer to that. I think it's largely good. I'd say it's good as long as humans stay in control of the overall dynamic. And I'm definitely one who considers everything to be in play for the future, both on the positive side (I don't think it's crazy to think of a post-scarcity world) and on the negative side (to quote Sam Altman, I wouldn't rule out lights out for all of us). I think we are definitely playing with a force here that has the potential to be totally transformative in good and bad and probably a combination of ways.
I'm thrilled by how much more productive I can be, and that's some of the stuff that we'll get into in more detail. I am thrilled by the prospect of having infinite access to expertise, and especially for people who have far less means than I do to have that kind of access to expertise. I am a pretty privileged person who can go to the doctor without really thinking twice about taking the time off from work or what that's going to cost me or whatever.
Obviously, a lot of people don't have that luxury. I think there is a real way in which AI can cover a lot of those gaps, not fully yet, but already significantly. And obviously more and more over time. I think that kind of stuff is going to be potentially disruptive. It may be the source of a lot of political debates and challenges, but anyway, there's so much upside, but I think there is a very real risk. It's very easy to hold those two perspectives at the same time—to be just thrilled by the capability, but also to be always keeping in mind a sort of healthy fear.
Daniel Shipper (00:06:20)
I love that. I think that's such a rare perspective and, as humans, we just tend to collapse on one—either it's horrible, or it's great. And then we have these camps. And I think, obviously, the wise perspective is that there's going to be some really amazing stuff about this, and there are dangers: when technology changes society, it'll change our brains. We will adapt to this in the same way that it is adapting to us, that will change things, and we'll need to deal with the dangers that it presents.
I think that's a very wise perspective. And I asked that question, is it a good thing that cognitive work will be offloaded, because I think that there's good and bad. One of the things that I feel is that the fear scenario is quite dominant for a lot of people, and I think the people who are anti-fear or presenting a hopeful view are a little bit too rose-colored glasses about it. And I think finding real ways and real use cases for how offloading some of this cognitive work actually helps people is just a really important part of creating a world where AI is a force for good or a force for creativity, rather than a world where it just replaces people or creates dangers or—I don't know, all of the bad scenarios.
And one of the things that I've felt going back to your kind of copilot mode versus delegation mode point. One of the things that I felt is that AI reveals to me how much drudgery there is, even in highly-valuable, highly-creative knowledge work, and that we sort of lie to ourselves about the amount of drudgery because that work is so romantic compared to, I don't know, working in a factory maybe, or just any other kind of job. And it's easy to look at a lawyer and be like, well, a lawyer's job is full of drudgery or whatever. But I write, I run a business, I have a YouTube show, now I have a podcast. There's a lot of stuff that's just pure drudgery.
And I find it really interesting because using AI tools more broadly has made me aware of how many repetitive or just overall kind of braindead things I have to do just to write something smart on the internet on Every. And once it's visible, I use AI for it and then I don't have to think about it as much anymore. And I think that's a really cool thing.
Nathan Labenz (00:08:53)
Totally. For me, coding comes to mind most there when you talk about the drudgery of high-value and, again, pretty privileged work to be doing. But I'm not a full-time coder. I have been for a couple of short stretches in life, but more often I've been somebody who's dipped in and out of it, and it is a real pain in the butt to have to Google everything. And obviously different people have different strengths and weaknesses. I do not remember syntax super well. Sometimes if it's been a while, I'm like, wait a second, am I remembering JavaScript or am I remembering Python? What exactly is going on here? And so to be able to just have the thing type out even relatively simple stuff for me is often a multiple-x speed-up in terms of productivity, and often an improvement in just strict quality too, compared to what I would have done on my own, and it makes it so much easier to get into the mode in the first place.
There's this kind of—I wouldn't even call this drudgery—but it's gearing up… People talk about [this] in birdwatching, getting your eyes on: really focusing on what are you seeing and trying to get that detector right. There's a similar thing, at least for me, in terms of getting into code mode.
And it also just streamlines that tremendously because, next thing you know, it's writing the code and I'm reading the code, and reading the code is a lot easier than writing the code. So I do find tremendous satisfaction [and] pleasure in just seeing this stuff output for me at superhuman pace, better than me in quality, maybe not superhuman quality, but super-Nathan quality. It's awesome.
Daniel Shipper (00:10:35)
What you're making me think about is… because I think in large part, not all of it, but in large part, what the current class, especially of text models, are doing is different forms of summarizing. And how much summarizing is involved in creative work—in programming, in writing, in decision making—a lot of it is just summarizing. In programming, you're summarizing what you find on Google. You have to decide what to summarize and you have to summarize it in the right exact way for your specific use case. But that's a lot of times what you're doing. Same thing for writing. A lot of the stuff in my pieces are summaries of books that I've read or conversations I've had or ideas that I found somewhere else that I'm stringing together in a sort of unique way.
And obviously I still have to do the overall management task of deciding which summaries to put in which order and like how they work or whatever, but a lot of it is summary. And I think that's a way that, using these tools, you start to see the world a little bit differently. And you're like, Oh yeah, there's a whole, there's a whole class of things I'm doing that are summaries that I don't have to do anymore. And I really think that's cool.
Nathan Labenz (00:11:47)
One of the areas where I have not adopted AI as much as I probably should have is in repurposing content—making more of what I do with the podcast, because I've put out a lot of episodes, there's a lot of stuff there. And we do use AI in our workflows to, for example, create the timestamp outline, right, of the different discussion topics at different times throughout the show. That's the most classic summarization where I'm not looking for a lot of color commentary. It's literally just what was the topic at each time, get it right. So we've got some stuff like that we go to pretty regularly. But I have not done as much as I probably could or should—maybe this will be a New Year's resolution—to bring that to all the different platforms.
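For the curious, here's a minimal sketch of what a timestamp-outline step like this could look like, assuming the OpenAI Python SDK (v1+); the prompt wording and model choice are illustrative, not Nathan's actual pipeline.

```python
# A minimal sketch of a timestamp-outline generator, assuming a transcript
# that already includes timestamps. Prompt wording and model choice are
# illustrative, not Nathan's actual pipeline.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def timestamp_outline(transcript: str) -> str:
    """Ask the model for a neutral-voice topic outline keyed to timestamps."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    "You create timestamp outlines for podcast episodes. "
                    "For each major topic change, output a line in the form "
                    "'- <topic> <timestamp>'. Neutral voice, no commentary: "
                    "just the topic at each time."
                ),
            },
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content
```

A long episode would need to be chunked to fit the model's context window, with the per-chunk outlines stitched back together.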
And I think this is partly a personal quirk, and also, I think, a limitation of the current language models, that I never quite feel like I want them to write as me. I'm very interested to hear your thoughts on how you relate to it in the writing process.
When I put something out in my own name, I basically don't use ChatGPT at all for it. I can use it, I find, for voice-of-the-show: if I want to do that timestamp outline or just create a quick summary, that's in kind of a neutral voice where it's not signed Nathan and isn't supposed to be representing my perspective. But I haven't really had a great synthesis yet to help create stuff that I want to say in my own voice and my own name.
So if you have tips on that, that'd be something I would love to come away with a better plan of attack on, because I'm not quite there.
Daniel Shipper (00:13:29)
I do. I definitely do. I love it. I think it goes back again to when you talk about being a copilot. I think that the failure mode is usually trying to use it when it's a little bit more in delegation mode—just go do this whole thing. That's when it doesn't really work, but as a copilot, it really works incredibly well for specific microtasks in writing. So, first example, as I just brought up, everything is a summary. I often have to explain an idea. I was writing a piece a couple of months ago where I had to explain an idea that I knew well: I was talking about SBF and FTX's collapse, and whether or not the philosophy of utilitarianism and effective altruism contributed to the collapse.
And in order to write that article, I had to summarize the main tenets of utilitarianism. And I studied philosophy in college and I've read a lot of Peter Singer's work and I just generally know it, but I haven't written about that in a while. Ordinarily I would have had to spend three hours going back through all the different stuff to formulate my three- or four-sentence summary, but I just asked ChatGPT and it gave me the summary in the context that I needed it in three or four sentences and I didn't use that wholesale, but it gave me basically the thing I needed to like tweak it and put it into my voice. And so that's a really simple example, but I think you can use it in all different parts in the writing process from at the very beginning, I'll often just record myself on a walk, just spewing ideas and random thoughts free-associating, and then I'll have it transcribe it and summarize it and pull out the main things, and that'll help me find little article ideas.
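As a rough illustration, that walk-and-talk step maps onto two API calls: one to transcribe the audio and one to summarize it. Here's a minimal sketch, assuming the OpenAI Python SDK; the file name and prompt wording are hypothetical.

```python
# A rough sketch of the voice-memo workflow: transcribe a recorded walk
# with Whisper, then ask the model to summarize and surface article ideas.
# The file name and prompt wording are hypothetical.
from openai import OpenAI

client = OpenAI()

# Transcribe the recording (Whisper accepts common audio formats).
with open("walk_memo.m4a", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio
    ).text

# Summarize the ramble and pull out candidate article ideas.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "user",
            "content": (
                "Here is a rambling voice memo. Summarize it, then list "
                "the main ideas as possible article angles:\n\n" + transcript
            ),
        },
    ],
)
print(response.choices[0].message.content)
```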
When I have an article idea, I'll often start with just this really messy document full of quotes and sentences and little things that might go into it, and then I'll be like, I don't even know where to start with this. This is crazy. And then I will just be like, can you put this into an outline? And I'll just paste the entire document into ChatGPT, and it'll often find an outline.
And the outlines it comes up with are really basic, but sometimes… I think one of the things it's really good at is pointing out the obvious solution that you missed because you're too close to the problem.
Oh, of course. Like the outline for this article is set up the problem and then talk about the solution to the problem that you came up with or whatever, that's such a common format for an article. But if you're in your head about it and you're being really precious, it can be hard to be like, for this special article, it's going to be this basic thing that you've written a thousand times before with the same basic structure.
And then I think like one of the, one of the other really great things is it's just incredibly good for helping you figure out what you're trying to express, put into words what you're going for, and then also going through the different options of like how to express what you want to express until you find something that exactly says the thing you want, for example, like trying to find exactly the right metaphor.
Okay. What kind of metaphor are you trying to find? What's the idea you're trying to express? And then here's 50 different options of ways to express that with a metaphor. And 49 of them will be trash. And one of them will be amazing. Or one of them will push you in the direction of what the actual one that you come up with is.
And I have like zillions of examples of that. So I find that ChatGPT is all over my writing, but none of the stuff that makes it into the writing I publish is wholesale from ChatGPT. It's like doing some of those microtasks for me all the time.
Nathan Labenz (00:17:02)
Yeah. That's interesting. Some of the stuff that you mentioned there, I have had some luck with: the talking to it on a walk is quite helpful in some cases. I've done a couple of things where I tried to draft a letter and do, as you said, talk my way through it. Here's what I want to say. I'm writing to this person. Here's a little bit of context. Here's the key points I want to get across. Can you do a draft and then, iterating verbally on that draft, a lot of times I'll follow up and be, okay, that's pretty good, but you can give it pretty detailed feedback. The transcription in the app is so good. Again, point of privilege, it understands me extremely well. So I can literally just scroll through its first generation and say, in the first paragraph I don't really want to say that, it's more like this, in the second paragraph more emphasis on this, add this detail, give it eight things. But you could wish it would do a little bit better on the revision.
I've had a few moments where, at the end of that process, I have something where, all right, when I get back to the desk, it's not that far of a leap from that to the actual version that I'll use. Yeah, it's probably still underutilized for me. I should go on more walks, honestly, get more time away from the screen, get the blood flowing a little bit and use a different modality.
The microtasks, I should do more though. I think that's the tip that I'm taking here is, and there's a separation between … Sometimes where I feel like it's hurting me is if I haven't, and this will even now start to happen in Gmail or anywhere where there's this like autocomplete that's popping up … Sometimes I'm like, I'm on the verge of a thought that is really the thought that I'm trying to articulate and then this autocomplete comes up and it's like, that's not right, but it can derail you at times where you're like, don't guess for me right now. Let me get the core ideas down first.
If you don't have those core ideas, then for me, it's been a real struggle to get anything good, but I think I've probably not done enough experimentation in the writing process of, okay, I do have some core ideas. Can you help me order them, structure them, iterate on them?
Interestingly, I also do use it at the other end, often. Critique this. Here's an email, here's a whatever, here's an intro to a podcast. Critique this. That can be really useful. Its critiques are usually worthy of consideration, at least I would say.
Daniel Shipper (00:19:29)
It truly is good at that. And at Every, we have an editor—we have multiple editors who are highly skilled—and I still use it to be like, what do you think of this intro? 'Cause I'm up at 2:00 a.m. the night before a deadline.
Nathan Labenz (00:19:41)
It's really hard to beat the availability. The responsiveness is clearly superhuman on that.
Daniel Shipper (00:19:48)
I think the writing sounds really fun. If you're ready for it, I would love to start just diving into how you actually use ChatGPT.
You sent me a doc with a bunch of historical chats, and this was the first one. Give us the setup. What were you doing? And at what point were you like, oh, I need to go into ChatGPT? Take us from there.
Nathan Labenz (00:20:03)
So I am working as the AI advisor at a company called Athena, which was founded by a friend of mine named Jonathan. And—
Daniel Shipper (00:20:18)
Is this the virtual assistant company—the Thumbtack—
Nathan Labenz (00:20:21)
Yes. He is one of the founders of Thumbtack and this is a different company, but founded on some of the lessons that he learned in the Thumbtack experience. He legendarily built up like a really amazing operation powered by contractors in the Philippines.
And that included hiring an assistant for himself and his role at Thumbtack who became another key partner in his life over a long time. And then Athena was built to essentially try to scale that magic for startup founders and executives in general. They hire executive assistants in the Philippines. They pay a premium wage. They're really focused on getting super high-quality people. And the idea is to empower the most ambitious and most high-impact people by equipping them with this ability to delegate to their assistant in a transformative way. Okay. Now we're working on “what does AI mean for us,” right? How do we bring that into the assistants’ work? So one of the things I've done is train the assistants on the use of AI. And that's been a fascinating experience, putting content together, examples, et cetera. Another thing that I've done is just worked on building a number of prototype demos for what the technology of the future might start to look like.
And this chat, which we call Athena Chat, is basically our own custom in-house ChatGPT. It was built on an open-source project, so I didn't have to code every line of it. But, it is amazing how quickly you can build things like this today with a bit of know-how. So it's been me and one other person who have built a number of these prototypes.
In this case, what we wanted to do is say, can we create a long-lived profile that represents the client, that can assist the EA in all sorts of ways. It's essentially like a plugin, but with plugins you have some limitations, so we're experimenting with this on our own. One of the big things we wanted to enable is adding information to the client profile and updating information that's already in there. So the hope is that this could be a hub where, over time, client preferences and history and even background context documents all can gradually find their way in. And you have this holistic view where the assistant can go query anything they need, but again, it's also in theory supposed to evolve over time, right?
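As a concrete sketch of that idea: the profile could be a JSON document injected into the system prompt for querying, with a helper for appending new facts. The field names below are hypothetical; this reconstructs the concept, not Athena Chat's actual implementation.

```python
# A minimal sketch of a long-lived client profile hub: a JSON document the
# assistant can query through a chat interface, plus a helper for adding
# new facts. Field names are hypothetical, not Athena Chat's actual schema.
import json
from openai import OpenAI

client = OpenAI()

profile = {
    "name": "Example Client",
    "preferences": ["no meetings before 10am"],
    "background": [],
}

def add_fact(section: str, fact: str) -> None:
    """Append a new preference or piece of background to the profile."""
    profile.setdefault(section, []).append(fact)

def ask(question: str) -> str:
    """Let the EA query the profile in a ChatGPT-like way."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {
                "role": "system",
                "content": (
                    "You assist an executive assistant. Answer questions "
                    "using this client profile:\n"
                    + json.dumps(profile, indent=2)
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# e.g. add_fact("preferences", "prefers window seats")
#      print(ask("When can I schedule this client's first meeting?"))
```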
So we have this ChatGPT-like interface. And one of the things that we've noticed is that we still see, despite our attempts at education, it's not perfect. We still see that assistants sometimes need coaching on how to effectively prompt a language model. So that was my motivation coming into this little thing.
I already had this React app, which is again, just a ChatGPT-like little app. And I wanted to add a module to it. The module I wanted to add was a prompt coach. So I wanted to put in another little layer where it would look at what the assistant, the human assistant, put into the chat app and send that through its own prompt to say, are you applying all the best practices?
Are you telling the AI what role you want it to play? What job you want it to do? Are you specifying a format that you want your response back in? Often these days it will do this by default, but are you setting it up in such a way that it will do some sort of chain-of-thought, think-out-loud, think-step-by-step reasoning before giving a final answer? That's actually one of the most common ways I see people shoot themselves in the foot with AI performance: prompting in such a way that it prevents what is now the kind of trained-in default behavior of explaining, analyzing, thinking about it a little bit before getting to a final answer.
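Here's a minimal sketch of what such a prompt-coach layer could look like, with the checklist reconstructed from Nathan's description above; it's an illustration, not Athena's actual code.

```python
# A minimal sketch of a "prompt coach" layer: before the assistant's message
# goes to the main chat model, a second model call critiques the draft
# against a checklist of prompting best practices. The checklist is
# reconstructed from the conversation, not Athena Chat's actual code.
from openai import OpenAI

client = OpenAI()

COACH_SYSTEM = """You are a prompt coach. Given a user's draft prompt for a
chat model, check whether it:
1. Tells the AI what role to play
2. States the job to be done
3. Specifies the desired response format
4. Leaves room for step-by-step reasoning before the final answer
Return brief, concrete suggestions, or 'Looks good' if none apply."""

def coach(draft_prompt: str) -> str:
    """Critique the assistant's draft prompt before it reaches the main model."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": COACH_SYSTEM},
            {"role": "user", "content": draft_prompt},
        ],
    )
    return response.choices[0].message.content

# e.g. print(coach("Summarize this email."))
```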
So you just have like a number of best practices—
Daniel Shipper (00:24:24)
Let me stop you there real quick. What are people doing that would prevent the model from doing the kind of chain-of-thought best practice that makes it reason best?
Nathan Labenz (00:24:32)
Anything that just sets it up in such a way where it's got to answer immediately with no ability to scratch its way through the problem is bad.
And I see that very often. It's common. It happens even in academic publications not infrequently. Often that's a hangover from the earlier era of multi-shot prompting. And obviously this is all changing super quick. But if you go back, the first instruction model that hit the public was OpenAI's text-davinci-002 in January of 2022.
So we're almost two years, but still not even two years, since you could first just tell the AI, “Write me a haiku,” and it would attempt to write you a haiku. At that point, it was not necessarily going to get the syllables right. With the earlier generations, you would have to write “a haiku by [author name],” a colon, and then hope that it would continue the pattern.
That's the classic prompting. With instructions, now you can tell it what you want it to do. And obviously that's gotten better and better, but in the benchmarking, in the academic context that developed before this instruction change, typically you would have “question, answer, question, answer, question, answer, question.” And the AI's job would be to give you the answer. So they would often be measured on five-shot prompts or what have you. But all that scaffolding was built before people had even figured out chain-of-thought. And so now if you take that exact structure and bring it to a GPT-4, you're often much better off just giving it the single question with no structure and letting it spell out its reasoning, because now it will do that by default and then give you an answer. Versus, if you set up “question, answer, question, answer, question, answer,” it will respect the implicit structure that you are establishing and jump straight to an answer.
Often these are multiple-choice, or they could be a number or what have you. It will jump to an answer, but the quality of the answer is much reduced compared to the default behavior, where you just let it think through it. And I've even seen this in Bard. I think this is hopefully now fixed, but not too long ago, Bard would give you an answer before the explanation by default.
And again, you're going to have a problem there. Sometimes people do that by mistake. They'll say, “Give an answer and then explain your reasoning.” You're just hurting yourself, right? Because it will explain its reasoning for a wrong answer once the wrong answer is established. So in the EA education, it's triple-A for triple-A results: “analysis before answer always.”
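To make Nathan's triple-A rule concrete, here is a minimal illustration (our sketch, not his actual coaching material) of the same task prompted both ways:

```
Weaker (the few-shot pattern that forces an immediate answer):
  Q: <question 1>  A: <answer 1>
  Q: <question 2>  A: <answer 2>
  Q: <new question>  A:

Stronger (analysis before answer, always):
  You are an expert on <topic>. Think through the question step by step,
  out loud, and only then give your final answer on the last line in the
  form "Answer: <choice>".
```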
Daniel Shipper (00:27:14)
I have never heard that before. I like that. And just to summarize, I think what you're saying is that a previous generation of prompting really encouraged you, in your prompt, to give multiple examples of the kind of question-and-answer exchange you wanted the model to produce, and then set up the last example such that the next thing the model would do is give you a direct response.
But what we've found over time is that one really effective thing to do, rather than having the model give a direct answer to the question or problem posed to it, is letting the model, quote unquote, think out loud first by reasoning through the problem, just like a human would with a word problem, and then give an answer at the end of its response. That improves the quality of the result you get from the model.
And what has happened is that OpenAI and other model providers have made that more of the default behavior, so it'll pretty much always do that. But using previous prompting techniques, like few-shot or multi-shot prompting where you're giving examples, might lead it to just answer directly, and you should look out for that and try to avoid it.
Nathan Labenz (00:28:25)
It's a great summary. Yes. If it is jumping directly to an answer, you are, for sure, leaving performance on the table for all but maybe the most trivial tasks.
Daniel Shipper (00:28:40)
And just an aside, see how much of creative work is just summarizing?
Nathan Labenz (00:28:25)
Important. Because I tend to give it a long, long version by default. That's my default behavior.
Daniel Shipper (00:28:46)
This is one of those microtasks that I'll just be handing off to an AI avatar version of me at some point. Okay. So let's get back to this: You're working on an app and you want to add a module to it that explains some prompting techniques. And it looks like the app itself is something that you didn't build from scratch and you're trying to get the lay of the land so you know what to do.
Nathan Labenz (00:29:05)
Exactly. Yeah. And the problem is, I know how to code generally, and I've even coded in JavaScript quite a bit, but React is a JavaScript framework with a sort of hierarchy of best practices: if you know them and can easily apply them, you can work quickly with the framework, right? That's the value of all these frameworks. But if you don't know them and you're coming in cold, like I was, then where do I even go? There are all these different folders and file structures, and where exactly am I supposed to look for the kind of thing that I want to do? And where do I put a new module?
And so that's where this chat really starts. I have a working app. I have the code for the app, but I've never worked with a React app personally, hands-on, before. So I literally just set up the scenario. I don't really use too much in the way of custom instructions or super-elaborate prompts in my copilot-mode work.
Certainly in delegation mode, you get into a lot more detailed prompts with “if this, then that” cases, structured formats, et cetera. But I often find a pretty naive approach is effective for things like this. So I just start off by telling it, “I'm working on this React app project and I am a bit lost. Can you explain the structure of the app?” I give it a little bit more information and it starts giving me a tutorial of what it is that I'm looking at. And then you've got React, and you've got Redux, and then you've got these additional things, Redux Toolkit slices and sagas, and these frameworks, in some cases, take on a life of their own, where there are whole conferences, right, and companies, and you can be very deep down this rabbit hole. And whoever built this open-source project that I'm trying to modify is using a bunch of different things that are not even necessarily standard, but are common or whatever. So there are like five different things here that I have no idea about.
And without this kind of tutorial, I'd be going off to search for, “Okay, what is this saga thing? What does that even do?” It's able to give me that entire rundown extremely quickly. And then came what I thought was a really interesting moment, because I get a lot of value from things like this, where I feel like it's prompting me. It wasn't exactly that here, but it gives me this general structure.
And then I was like, oh, I find this to be a general pattern: if you can give it something in a format that it natively showed you, that's probably going to work pretty well. So sometimes, even in delegation mode, I'll think, I don't exactly know what structure this should have, but maybe if I have it suggest the structure, then we'll get a structure that it can naturally work well with. In this case, the structure is dictated by the world; it's pretty well known that, okay, this is going to be your structure for a project in this React framework. Okay, cool.

But this got me thinking: I should give it my actual structure. I want to print this thing out for the project that I'm working on, because I didn't make it, I don't know what it is, and I want it to help me interpret the full thing. But then again, I'm like, how do I print something like this? I don't even know how to do that. So my next question for it is, “Can you write me the command to print out the file structure?” And this is where you're like, okay, this is magic, right? Because, again, I don't know how to do this. This tree command, I don't know if it was installed for me or not, but okay, it shows me how to do it. And next thing, oh, there's another step here of installing some package that needed to be installed. Okay. It was helping me with that too.

So I'm just encountering all of this. It's the classic developer experience. Conceptually, I have a clear idea of what I want to do, but now I'm three nested problems down, right? Oh, okay, I need to understand this framework. Oh, okay, I need to print out the structure to better understand the version I'm working with in this framework. Oh, now I need to install something so I can do that print. And this is where time goes to die, right? You talk to programmers and it's like, yeah, you didn't get anything done today, but what happened was, I was on the way to getting my app together, and then I had to install this thing, and then I couldn't install it. But each of these things, it's helping me get over.

And now, finally, I'm able to say, okay, here is my app. This is the app that I actually am working with. And now we're really getting into something good, because it can now break that down. And the names of the things are pretty semantic.
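For reference, the exchange he describes typically lands on something like the following (a sketch assuming macOS with Homebrew; the exact package manager and flags from the original chat were not preserved):

```
brew install tree              # install the missing package
tree -L 3 -I "node_modules"    # print the project's file structure,
                               # three levels deep, skipping node_modules
```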
Notice, I haven't even given it any code here. I've just given it the file names, but the file names carry a kind of indication of what is what, and it gets a sense, just from that, of what the app actually is. So let's go over to it; I think I just got a link to a working version of the app.
It's pretty simple. It's a ChatGPT-like environment: we can create these client profiles, we have our chats, we have our history, a couple of different models, and there's function-calling in the background that connects the chat experience to the client profile. And what I'm trying to add is a module in the lower right-hand corner, which I'm actually not sure this version has, but the point of it is to take my prompts, run them through this meta-prompt as we discussed, and then show feedback warnings of, hey, you may or may not be doing this quite right. So back to the chat: I've given it the file structure, and it's now able to understand the file structure. And now I'm saying, okay, here's what I'm trying to do. I'm trying to create this prompt coach. I forget exactly how I had approached this. Yeah, this is a different file. Let me see exactly what I'm doing here.
Daniel Shipper (00:34:42)
It seems like maybe you had some sample code or something you'd written or—
Nathan Labenz (00:34:45)
Yeah, I did. I guess I took one stab at it myself and it didn't work. The human version was me looking at the same file structure and thinking, okay, I see that there's this module, there's a sidebar here, because you see these names, right? So you've got a sidebar and search, and there's going to be chat history here somewhere.
I'm looking at this and I'm like, okay, I see all these different elements and all these things. Let me just try to copy one, mess with it a little bit, and hopefully get somewhere. And then I'm not getting anywhere. It's not showing up where I want it to show up. I'm not seeing it. And so that's where I come in to say, okay, here's what I tried. Why isn't it working? And I explain my problem here at the end: the problem I have is that it's being shown in the wrong place.
Daniel Shipper (00:35:30)
And then it explains the answer.
Nathan Labenz (00:35:32)
Yep. Next thing, it's giving me instructions with code: modify this, put it over here. This is pretty cool too. Unfortunately, we can't share the old screenshots, and I don't know exactly what I used, but this was right as vision was being introduced to ChatGPT as well. So I was able to then say, here's my screenshot, here's where it is showing up, and here's where I want it to show up. Can you help me with that as well? So from the screenshots, from the HTML structure, we basically just work through this entire thing.
I continue to run into issues. We're only 25% of the way through this whole thing. This probably took me, I don't know, two to three hours total to get these suggestions, implement them, see what's going wrong, yada, yada, yada. It writes all the code, basically, because, again, I've never written a line of React code in my life, so I don't know any of this syntax. There are a million ways to get it not quite right when you have no idea what you're doing anyway. So it's writing all the code, and bit by bit we're refining the experience and the interface. Here, we're creating some CSS. We have a particular style pack that's already built into this, so, again, that's just another thing I'm not at all familiar with. This is the syntax for figuring out how to use that style pack. Good luck making that up on your own. And then, basically, after a couple of hours, I got to a working module where the prompt coach would intercept your call, do the meta-prompt, parse the response, identify—
I had it giving suggestions, plus the urgency of each suggestion, so we're color-coding those suggestions as they come up. If it's serious, you get it in red. And if it's not, you get it in yellow, or just a notice. And I would guess that this would have taken me easily an order of magnitude longer in a pre-ChatGPT era.
If this was two to three hours, it's probably two to three days of work to figure out all this stuff, and a lot more frustration, because I'm not a super patient person. The feeling is: a million people have done something almost exactly like this. There's nothing differentiated or special about what I'm doing. I'm just in this phase of not knowing what I'm doing and getting constantly stuck, constantly stumbling, constantly running into friction. I really don't enjoy that; I think most people don't. This was none of that, or almost none of it. Even just going back to the install, or the command to print out the structure: man, this is so stupid. I know exactly what I want. I know that it is doable. I know that it's been done a million times in a million places, and yet I don't know how to do it. And having it liberate me from that frustration is, it turns out, your drudgery point, right?
That was probably 80-90% of the time in a world where I was doing this on my own. And now we're down to the two to three hours, where it was really about defining what I want. This could have been one hour if I really knew React, but it taught me the ropes and did the task with, again, probably 80-90% time savings compared to the unassisted version.
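For readers who want a feel for what a module like this involves, here is a minimal sketch in React of the intercept-and-coach flow Nathan describes. This is our illustration, not Athena's actual code: the meta-prompt wording, the JSON response shape, and the direct browser call to OpenAI's chat completions API are assumptions made for the sake of a compact example.

```jsx
// A minimal "prompt coach" sketch: run the assistant's prompt through a
// meta-prompt, then color-code the resulting suggestions by urgency.
import React, { useState } from "react";

const META_PROMPT = `You are a prompt coach. Review the prompt below against
these best practices: state a role, describe the job, specify an output
format, and allow analysis before the answer. Respond with ONLY a JSON array
of objects like {"severity": "serious" | "notice", "tip": "..."}.

Prompt to review:
`;

export function PromptCoach({ userPrompt }) {
  const [suggestions, setSuggestions] = useState([]);

  async function review() {
    // Calling OpenAI directly from the browser is for illustration only;
    // a production app would proxy this through its own server.
    const res = await fetch("https://api.openai.com/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.REACT_APP_OPENAI_API_KEY}`,
      },
      body: JSON.stringify({
        model: "gpt-4",
        messages: [{ role: "user", content: META_PROMPT + userPrompt }],
      }),
    });
    const data = await res.json();
    // A real version would validate the model's JSON instead of trusting it.
    setSuggestions(JSON.parse(data.choices[0].message.content));
  }

  return (
    <div>
      <button onClick={review}>Coach my prompt</button>
      {suggestions.map((s, i) => (
        // Serious issues render in red, minor notices in a yellow tone.
        <p key={i} style={{ color: s.severity === "serious" ? "red" : "#b8860b" }}>
          {s.tip}
        </p>
      ))}
    </div>
  );
}
```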
Daniel Shipper (00:38:47)
I love this. I think this is such a cool example, and I really appreciate you bringing it. Because, one, as a programmer looking at this, I'm like, yeah, this is so much of what you do as a programmer, especially if you're working on startup stuff.
It's like, this is doable. It's been achieved before. I just need to do it in my specific context. And it's obvious that this would have taken you, or really anyone, days to do from scratch, but ChatGPT makes it way quicker and takes away a lot of the drudgery. But I think what's really cool and really beautiful, which is weird to say about this stuff, and it's striking me right now, is that there's this dance happening in this chat. At the beginning, obviously, you're asking it to help you, but you are also giving it what it needs, filling in the gaps it needs in order to help you. And it is filling in gaps for you as well. It is explaining React to you, and you are explaining, here is the project that I have, and here are the specific details that I want done. And then there's this dance back and forth where you're mutually filling in gaps that neither of you could fill on your own. I think that is really cool to watch evolve, where at the start you don't know React, you don't know where to put your code, and you don't know why it's not working.
And at the start, it doesn't know who you are, what you're trying to accomplish, or what the specifics of your project are. But as you build up this chat, you yourself are starting to understand things more. You didn't ask it to just go do this for you. You asked, how does a React project work? What is the structure?
And so you learned more about React and it learned more about you. And as your mutual understanding increased, you were both able to accomplish the thing together. And I think that's really cool.
Nathan Labenz (00:40:41)
Yeah, it's awesome. And as for the next generation: right now, it's episodic. We're still only halfway through this scroll, for all the scrolling I've done.
I just highlighted this. Okay, cool. This is working because at this point I'm starting to get into refinements. Okay. Now I want to dial in the styling. And basically at this point, the core problems have been solved. And now again, it's just going to do the drudgery of making sure that there's padding and things are centered and so on and so forth.
I try to be polite and encouraging to my AIs wherever I possibly can. But you can envision a future, and I think that future is already starting to become visible through the mist a little bit as more and more gets published on the research side, where this sort of episodic relationship changes. Right now, when I start a new chat, it knows nothing about this, right?
I can continue this chat up to a limit, and it obviously has superhuman, expansive background knowledge, but zero contextual knowledge. It can't retain that from one episode to the next. But I do think that is also coming soon, and there are a couple of different ways it could shape up. Within a year, certainly not that much longer than that, I can't imagine, we'll start to see things where all this history is accumulated, or maybe divided into different threads or whatever, but where this kind of context can follow you forward into different tasks in a history-aware way. I think that will be another level of unlock.
Daniel Shipper (00:42:12)
I think you're totally right. That's what custom instructions is: a step in that direction. Unfortunately, custom instructions is very hard to set up, but if you do set it up, it's really great. It's really nice for it to have context on you. And I do think you're right: ChatGPT will definitely have a memory, where it can reference this stuff, reference the context of what you need and who you are.
Even at the same level of model intelligence, that will make it like 10x more useful and 10x faster to get to the right answer.
Nathan Labenz (00:42:39)
How much do you put into custom instructions? For something like this, it might be my profile, my writing sample, maybe whatever, but I probably wouldn't have put in, “by the way, Nathan's a React novice and he doesn't know how to install anything.”
So do you have a vision, or a sort of recommendation, for a custom instruction that would help me with things like this?
Daniel Shipper (00:43:01)
You're asking the right person. I have a very extensive custom instruction and a lot of opinions about it. If you want, I can share them. I can share it with you right now and we can talk about it.
Nathan Labenz (00:43:10)
Sure, yeah. Let's check it out.
Daniel Shipper (00:43:12)
Okay. The first part of custom instructions is: what do you want ChatGPT to know about you? And I actually really like having it know a little bit about who I am, because there's enough about me on the internet that it knows my name. And that actually helps. Same thing with Every: there's enough about Every on the internet that it knows the company.
And every once in a while, not having to explain who I am or what the company is that I run is really useful. For example, I was thinking a couple of weeks ago about starting a course, and I was working with ChatGPT to decide how to do it. The first prompt was, “I want to do a course. Can you help me think about it?” With custom instructions on, it knows that I'm a writer and entrepreneur, so it says, “Cool. I'll help you build a course. Here's how to think about it,” because it knows that I'm probably going to build one. But if I turn custom instructions off, it will be like, “Cool. What course do you want to take?”
And it's those little things that really make a difference for me. But basically, I have stuff like the serious relationships in my life in here: my sister, her husband, her son. I have my girlfriend in there, and who the people at Every are, because referencing their names is just much easier. Being able to say, “Oh, Kate,” when I'm talking about something, and not have to explain who she is every single time, is really helpful.
I think another really interesting thing is adding into custom instructions, what are the things about you that you know that you're trying to like work on? For example, I feel like I have a fear of rejecting people, which causes me to be too agreeable. I'm a little bit too opportunistic and I would like to be more strategic.
Stuff like that is really helpful to put in custom instructions, because these are the little realizations that you have every day, where you're like, “Wow, yeah, I am a little too opportunistic.” I think ChatGPT is great for being the thing that can help you, in the moment, day-to-day, remember to pull back and incorporate some of these insights that everyone has about themselves.
And same thing for goals: having it know what your goals are and bring you back to them all the time as you're using it is really helpful.
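As a rough illustration of the structure Daniel describes, a custom instruction along these lines might look like this (a hypothetical template with placeholder details, not his actual text):

```
Who I am: [name], a writer and the CEO of [company].
People I mention often: [girlfriend], [sister and her family], [teammates].
Things I'm working on about myself: a fear of rejecting people makes me
too agreeable; I'd like to be more strategic and less opportunistic.
Gently push back when you see this showing up.
My current goals: [goal 1], [goal 2]. Bring me back to these when relevant.
```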
Nathan Labenz (00:45:17)
Cool. Well, thanks for sharing. I think I use it a lot more for very unfamiliar topics. Just look at these examples that we had queued up: there's an app in a framework that I've never touched and know nothing about, and there's working on a patent application and creating diagrams for a patent application, where I don't really know how to do that at all.
Again, I'm starting with these very basic questions. What's a good syntax that I might use to create a diagram for a patent application? I just come in so cold. But it does suggest that you are doing a lot more thought-partner brainstorming about your core stuff, which is interesting.
I'm much more on these kinds of episodic things outside my history, and in a lot of cases this doesn't overlap at all. But it just goes to show how many different ways of using these tools there are, too, and this could be another New Year's resolution: to try to bring it a little bit closer to the core of what I do.
It's not to say that it's not at the core of what I do. With things like Waymark, I'm working very closely with language models to make an app work well, and I feel like I have intimate knowledge of the details of how it works in that respect, and it's a big project for me. But again, it's a different mode than the interactive-dance kind of mode that you described. Fascinating.
Daniel Shipper (00:46:49)
Yeah, that makes a lot of sense. I definitely use it for some of this knowledge-exploration stuff too, but it's totally a sort of thought partner for me. I'd love to keep looking through some of the other chats you brought.
Nathan Labenz (00:47:00)
Cool. Here's this next one, on working on diagrams: a combination of a provisional patent application and the supporting diagrams for it. This is something that I was doing for Waymark. We have this ensemble method of creating advertising video for small business. Basically, folks come to the site and enter a website URL; typically people will give the homepage of their small business website. We have some code that goes in and grabs content off of that website, and then we build a profile, kind of synthetically, your custom instructions, so to speak, within the context of our app: Who are you as a user? What's your business? What are you all about? What kind of business? What images? And then, to actually create the video, you give a very specific, although super short, instruction: “I want to make a video for my sale this Saturday,” or “I'm opening a new location and here's the address,” or whatever.
It's like this very “this is my purpose in this moment” prompt. And then we've got some pretty complicated machinery that takes all those inputs and works with a language model to write a script. And then it has computer-vision components that decide which of the images from your library should be used to complement the script at all these different points along the way.
It's a pretty cool experience now, really, compared to pre-AI. What we had before was an easy-to-use template library, and what we have now is really the AI making you content. It's a phase change in terms of how easy it is to use, how quick the experience is, how much you can just rifle through ideas.
If you don't like the first thing, you just ask it to do another. And it's qualitatively just way more fun. People used to have to sit there and type stuff in, and they were like, oh, okay, what do I have? What did I say? I'm not sure what to say. A lot of people are not content creators, but everybody—I always refer to the old Mr. Burns episode of The Simpsons, where he goes to an art museum for the reveal of some piece of art, and when they reveal it he says, “I'm no art critic, but I know what I hate.” That's exactly how our users operate, I feel. They ask for something, they wait 30 seconds, and they get to watch a video featuring their business. If they like it, they can proceed. And if they don't like it, it's very obvious to them, and they can very quickly be like, “No, not that. Give me another one,” and give an alternate instruction.
So anyway, this is the app that we built and now we were like, okay, maybe we should think about filing a provisional patent on that. Like most software companies, we're never going to prosecute our patents, but we just want to make sure nobody can come in and give us a hard time. So how do I write a patent and how do I create the diagrams?
And I want to be able to update it. I want something that's not just a total mess. So this was a series of different interactions that ultimately led me to these diagrams. Initially, I provided basically what I just said to you, a rambling sort of instruction: here is my app, here's what it does, here's how it works, here are some of the parts behind it, the language model writes the script, the content is scraped from the website, and then there's the other part with the computer vision that figures out which images to use. I literally just tell it the whole thing and say, “Now, can you use some syntax to make me a diagram that shows the structure of that app I just word-vomited to you?”
And so there are a bunch of different diagram syntaxes out there, and that's the first part of that conversation: you could use the Mermaid syntax, or you could use Graphviz, or a couple of other things, but what are the pros and cons of those? And can they represent certain different kinds of structures?
We dialed it in on either Mermaid or Graphviz, and it started to make me a thing. And then, you can see here too, this is interesting, because I did find in this one that at some point it got confused. I'd given it this thing, it had generated this syntax, and I asked for refinements on the syntax, because I'm taking the syntax, by the way, and going over to another app.
What's cool about the syntax is that you drop in this pure-text syntax and it will render the diagram for you, right? So you've got things like this Graphviz digraph—what is a digraph? I don't even really know what it's called. This digraph G has these elements, and they have these properties, and they're connected in this sort of graph structure, blah, blah, blah.
You load it in, and in half a second it renders, and you're like, “Oh no, that's not quite right. This point should be connected to this point. It's skipping one.” So you give it these kinds of iterations. It would make progress, but it also seemed to get confused after a number of rounds, maybe because it was just too much syntax.
So at some point I did say, okay, let me use the episodic memory to my advantage, working around its working-memory weaknesses by just wiping and starting over. I'm like, okay, here's the best one from that chat, the one that was closest to what I wanted it to represent. We just go have another chat, and this time we skip all the parts about which format to use and skip all the word salad, and I can just say, “Here's a diagram. I want to make some changes to it,” and have it do more localized edits for me. And again, there are a lot of little details, a lot of nuance here, but it's happy to do that. We worked through a number of rounds of it. And I believe I attached the result for you.
What I ended up with, after a couple of chats: you even get to the point where you're color-coding and really starting to make sense of it. The green in this diagram now is the things that the user does. So the user tells us what their business website is. Then there's code to go scrape it. Then there's this fork where we have to grab all the images and process them in various ways. One of the big challenges is: which parts of this can happen in parallel, and which parts depend on which parts?
This is actually something that we didn't have until I did this, even for the technology team. And I'm not sure how well all the members of the technology team could have even drawn this. So now we actually have a better reference internally also to be like, “Hey, what depends on the image aesthetic step?”
Now we can go look at it and be like, oh, okay, yeah, you can't select the best images until you have the aesthetic scores completed. Just having that clarity is, I think, operationally useful too. But this is the sort of thing that you can attach to a provisional patent application and at least begin to protect yourself from future patent trolls coming your way.
You know, again, how long would this take? If I had drawn it freehand, I maybe could have done it in a somewhat comparable amount of time to what I spent in the exchange, but having the syntax, having it in that structured-language form, also makes this much more maintainable. It can fit into other things; it can even be more readily used by language models.
The vision understanding is getting very good, but I would say it's probably still better at understanding the syntax of the graph than this visual rendering.
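To give a sense of the structured-language form Nathan is describing, here is a minimal Graphviz sketch of this kind of pipeline diagram. The node names and flow are our illustration, not Waymark's actual architecture; green marks the user's steps, per the color-coding he mentions:

```
digraph G {
  rankdir=LR;
  node [shape=box];

  enter_url [label="User enters website URL", color=green];
  scrape    [label="Scrape site content"];
  profile   [label="Build business profile"];
  script    [label="LLM writes script"];
  aesthetic [label="Score images (computer vision)"];
  select    [label="Select best images"];
  video     [label="Assemble video"];

  enter_url -> scrape -> profile -> script -> video;
  // The image branch runs in parallel with scriptwriting, but image
  // selection depends on the aesthetic scores being completed.
  profile -> aesthetic -> select -> video;
}
```

Pasting text like this into any Graphviz renderer produces the diagram, which is what makes it easy to maintain and iterate on.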
Daniel Shipper (00:53:57)
Yeah. I think that's great. Obviously, ChatGPT has the DALL-E integration, so I'm familiar with that, but I've been thinking a lot about this: sometimes I want to create something that looks like this, a graph with text and boxes and all that kind of stuff, and I didn't even think to have it just write Graphviz markup or something like that and paste it somewhere else.
So I think that's a really cool thing to know it can do. And it's also pretty clear that in a year, it'll probably just render the Graphviz stuff for you, and you'll be able to move it around and do all that kind of stuff without necessarily having to chat back and forth after the first round. I think a very cool next step for ChatGPT would be to jump into an edit mode for something like this.
Nathan Labenz (00:54:35)
The closest thing I've seen to that so far is DiagramGPT. This is a slightly different notation, but basically you can prompt in natural language, it will then generate, in this case, Mermaid syntax for you in response.
And then it will immediately render your image. You can then edit the syntax, though you can't quite drag and drop within the interface itself. But I think this highlights a really interesting question: what things should be in ChatGPT versus what things should have their own distinct experience, even if there's still a very AI-assistant component to it?
This is one actually that I would expect lives outside of ChatGPT. Who knows, right? In the fullness of time, maybe you have dynamic UIs getting generated on the fly. We're starting to see that a little bit already, but I don't think OpenAI is going to say “What we need to do is create a UI where people can edit these graph things.”
It possibly could do that, but GPTs don't really give you the ability to create custom editor experiences, yet anyway. So for now, if you want something like that, you have to bring it to a different app. But increasingly these are out there as well, right? They just use ChatGPT plus a renderer.
So I had the AI doing all the syntax and then the renderer showing me what it actually is and then going back and continuing the dialogue with ChatGPT.
Daniel Shipper (00:56:03)
I think you're right. I could see a world where they let developers build their own renderers inside of ChatGPT. Not for really serious stuff, but if you're dabbling in a graph one time, or making a little video or whatever, having something in the interface so you can do a rough version in there is really helpful. But then, yeah, I think you're right: there will have to be other pro tools outside of ChatGPT for the people who do nothing but make graphs all day.
Nathan Labenz (00:56:27)
So here's another one. This is a recent episode in my life where I had to admit defeat after 10 years of swearing that I would not replace my car until the replacement was self-driving. And we're not quite there. So I finally—and I've had three kids in the meantime—had to break down and get a minivan.
Like many parents of young kids, I'm like, oh, what my kids do is really depreciate stuff pretty quickly. So I figured I'd get a used minivan, because if I get a new one, it's going to be used pretty quick anyway. So let me just look at what's out there. Now, anybody who's ever shopped for a used car knows that it's a total jungle, right? The car dealer websites are terrible. What features they have is a huge question. And what you end up encountering very quickly is these trim levels, which, if you're not a car head, you may not even know what that is. You've got your make, which is the brand of the car, your Chevrolet or your Toyota or whatever; you've got your model, which is the kind of car—the Dodge Caravan is the make and model—and then you've got this trim, which is often just a couple of letters, like the XRT or the SRT or the L Limited or whatever. These are package levels, right?
What features, what upsells have been included? Does it have a sunroof? Does it have a screen in the back that drops down out of the ceiling for the kids, or whatever, right? It's a jungle to even try to figure out what levels there are and what those things have. So this is Perplexity, which is a great complement to ChatGPT. It is more specifically focused on answering questions.
So in this way, it's a more direct rival to a Google search. It's not so much meant to be a brainstorming partner. They really aim for accurate answers to concrete questions, and they do a phenomenal job of it. So here, I had a number of runs of this as well, different kinds of questions or whatever. But, okay: these minivans that are not super old, but old, pretty cheap—what do they have? What do they not have? And this would have taken—I don't even know. If I'd had to really try, I wouldn't have done it, right? This is one of those things that I just wouldn't do. If you had set out to go collate all the makes and the models and the trims and what they have, you're going to be in user manuals or something; I don't even know where that information is stored in ground truth. But just by asking the question, I was able to get the trim levels for all of the different brands for this window of time, and easily get a handle I could reference back to. Okay, this one on this dealer site: it doesn't have any pictures, it doesn't say anything, but it does say, for example, oh, it's an SXT. Okay, cool. Now I can at least know that's the second of however many trim levels. So the SE, that's your base one, then your SXT. You can imagine, right, trying to sort this out on your own? And then you get the AVP/SE.
Who comes up with this stuff? It's ridiculous. But it's super useful if you're like, I don't want to drive across Metro Detroit to go look at this minivan if it doesn't have something that I really cared about. And the things that I zeroed in on were like fairly basic safety features. I wanted the blind spot detection and the backup camera.
So there were other questions too, like: when did USB charging get introduced into cars in general? I didn't know the answer to that. I'm old enough to remember when you had to plug the thing into the lighter, and I didn't want that. I don't want a car so old that I have to use the lighter outlet; I want a car that's at least into the USB-charger era. But when did the USB-charger era begin for cars? That was another one that Perplexity was able to answer. And it is so good. I think this is about to be a huge trend, if I had to guess, because I've been a big fan of this app for a while. I had the CEO, Aravind, on The Cognitive Revolution twice.
And they ship super fast. They win head-to-head comparisons for answer accuracy. The product itself is super fast. It's got a great UI with these sources, and it's starting to become more multimodal with images as well, which is relatively new. It's a great experience all the way around. And I see it as setting a new standard for answers. I'm starting to use the term “per Perplexity.”
I'm not sure this is necessarily rock-solid ground truth. Perplexity is not always right, but it's the most accurate AI tool; it's usually right in my experience. You might be able to find something here that is wrong, but everything I ended up fact-checking turned out to be true. So I think there's this very interesting good-enough-for-practical-purposes standard, where I don't necessarily need it to be 100% accurate for it to be very, very useful, and I would make my decisions on it. Did I trust it enough, for example, to be confident that there was in fact going to be a USB charger in the car that I went to go look at? Yes. And in fact, it was correct about that.
And so I have this kind of per-Perplexity standard of verification in my mind now, where I'm like, yeah, in many situations, it's good enough to act on. I wouldn't make life-and-death decisions without more fact-checking, but I don't even need to follow these links in most cases. For something like this, I'll trust it.
And it's an emerging standard in the family as well. My wife asks, “Do we really have to get a car that's that old? Do they have this? Do they have that?” And I was able to ask Perplexity and send her: yep, it should have a backup camera, per Perplexity. It should have a USB charger; it should have the blind-spot detection. It's an incredible time-saver.
It's a worthy alternative even to something like Wirecutter, which has been the standard my wife has used for a long time. But obviously that's an editorial approach, where you can't just ask any question you want. Here you can ask any question you want, and oftentimes you get something that is a worthy rival even to a much more editorial product.
Daniel Shipper (01:03:05)
No, that makes perfect sense. It reminds me of Wirecutter. It reminds me of all those sites that are like Quora, but for this new generation, where no one previously had to have asked this particular question, and it can just gather and answer the question for you immediately. And I think that's so powerful. It's really starting to click for me when and how I might use it. There are so many questions I have where I basically want to get to the best answer for a fact-based question, more or less, and I'm so lazy that I really don't want to do all the research. ChatGPT will do one search and then sort of crib the first article. And this feels a lot better than that.
Nathan Labenz (01:03:49)
Yeah. It's really good. It's faster than ChatGPT on the browsing side, so you're getting to an answer notably faster, and it's marginally more accurate—just more of the sort of answer that I want, a lot of the time. I've had a couple of instances where I tried the same thing with ChatGPT and was able to get there, but it was slower on the browse and didn't give me the full answer the first time. I was like, “No, but I need a little more.”
And then I was able to get over the hump and get there, but this was definitely just a faster, cleaner experience that I do believe is a bit more accurate as well. It goes to show that there are different roles you want AI to play. And it's interesting: there are forces pushing both ways, right?
What makes the AIs so compelling is that they're extremely general-purpose, and it seems like there is a fundamental reality that they get really powerful at scale, and to scale they have to be general-purpose. So that kind of comes as a package. But here, the scope has been narrowed.
And there are a lot of things that ChatGPT does for people that this is not trying to do. In its specialization, it does seem to be achieving higher heights in the domain it really attempts to be best in. So I definitely recommend Perplexity a lot. And I'm just old enough to remember when people first started saying that they were Googling it.
And this has a similar vibe to me where it's a standard that I think people can comfortably socially transact on and feel like they're on pretty solid ground.
Daniel Shipper (01:05:30)
I love this. You're using it to build stuff, but also really using it to fuel your curiosity. And I'm curious, before we wrap up, what are you excited about now? What are you thinking about right now? What's on your radar that you think people should be paying attention to in ChatGPT specifically, but broadly in AI over the next couple of years?
Nathan Labenz (01:05:51)
Boy, broadly in AI over the next couple of years … I think almost anything is possible. I take the leaders of the field pretty much at their word, in the sense that I treat their statements as honest reflections of their expectations.
And you listen to what Sam Altman thinks might happen over the next couple of years. You listen to what Dario Amodei from Anthropic thinks might happen over the next couple of years. And we are potentially looking at something that is superhuman in very substantial and meaningful ways. I think there's a lot of kind of conflation and talking past one another when people try to analyze this.
And I do think it's important to say you can be superhuman in very consequential ways without being omnipotent or infallible. And I think there's actually quite a lot of space, right, between human performance and omnipotence or infallibility. And I kind of expect that AI is going to land there for a lot of different things over the next couple of years.
So I think the value of the kinds of things we'll be engaging with it for is only headed up. Just recently, there was a result from Google DeepMind on using their best language models for differential diagnosis, and it was extremely striking. This team has been on an absolute tear. It was only maybe a year ago that they first got a language model to pass medical-licensing tests, which, hey, that's crazy. But you could just say, “Well, it's a test. It's more structured. The real world is messy, and they're only passing. You wouldn't want a doctor that's merely passing.” Okay. Guess what? We didn't stop there. Next thing you know, it was hitting expert-level performance on the test.
Next thing you know, they've added multimodality, and it can now do a pretty good job of reading your x-rays and other tissue slides. And again, is it perfect? No. It would probably be on the lower end of what actual human radiologists do, although even there it was close: I think it was something like 60-40, the human radiologist beating the AI radiologist.
So, okay, that's a pretty narrow margin, and obviously we're not done. The current thing is taking case studies out of medical journals, case studies being extreme, hard-to-figure-out cases, right? When a case gets reported in a medical journal, it's because the case is thought to be highly instructive: it was a confusing situation, an unfamiliar combination of symptoms, or what have you. They don't publish the routine cold in the journals. So they take these case studies out of journals, and they ran a study comparing the AI's effectiveness at differential diagnosis versus a human with access to AI versus a human alone.
And the AI was the best by a significant margin; the human alone was last. In their presentation of this, they're very modest. In my view, they take an almost too-grounded, almost willfully-burying-the-lede approach at times. One of the main conclusions of the paper was that we need better interfaces so that doctors can take better advantage of this. And to me, yes, that's one lesson I would take away from this paper. But the other lesson is that the AI is getting it right about twice as often as the human clinician, something like 60% to 30%. Another big lesson I take away from a lot of these things is that we don't often measure human performance. Because we've lived in this world for a long time, we look at human doctors as a standard: we know some are better than others, but there's a human doctor, they're licensed, they're supposed to be good. But how often do they get the right diagnosis on cases like this? It turned out that in this particular data set, it was in the ballpark of 30%.
So there's a lot of room for improvement. And you could perhaps say, well, what would the best doctor in the world do? The best doctor in the world is surely a lot better, maybe even better than the 60% that their language model was able to do, but you probably can't access that person. We are, apparently, headed for a world where you should be able to access that AI doctor.
And if it's 2x better performance on a task as challenging as differential diagnosis, then I think we're headed for a world of radical access to expertise, at unbelievably low prices, and I think that is going to be a transformative force in society, right? It's going to be one of the greatest blows ever struck for equality of opportunity and equality of access, in many ways. It's also going to change a lot of market dynamics and change what wages can be commanded for different kinds of services. I'm excited about that. I also think it probably is going to be fairly disruptive, and it probably is going to become more and more political.
The upside of that, I think, is pretty clear and extremely compelling, so I hope we do get to actually enjoy the fruits of that future. Then one other thing I'll say is: the transformer is not the end of history. ChatGPT is not the end of history.
This sort of no-memory AI is not the end state either. Just this last week or two, we've seen a flurry of activity around the state-space model architecture. If you're on Twitter and seeing this stuff, it's like, hey, there's a new thing that might even be better than the transformer. It might be a transformer successor, a transformer alternative, a transformer replacement. It has some nice properties that transformers don't: better long-term memory, better scaling, better speed, better throughput. Maybe we all just flip over from one to the other, and oh, the transformer was the old thing, this is the new thing.
But I strongly suspect that what we are going to see is a mixture of these architectures. Just like in the brain, where we obviously don't have one single unit that gets repeated over and over again, but a lot of different modules, including some that do repeat, it seems like we're almost surely headed for AIs that are composites of different kinds of architectures, each bringing its own strengths and weaknesses in information processing to the table. So, as shocking as the progress from GPT-2 to GPT-4 in just four years has been, I have to say, I think the next few years are going to bring at least as much change. It's going to be a wild ride.
Daniel Shipper (01:12:28)
It's exciting. It's inspiring. I'm excited for the future. And I really appreciate you taking the time to share your thoughts and show us how you use ChatGPT. And I'd love to have you back and see where we are, see what new stuff comes up on the horizon.
Nathan Labenz (01:12:42)
Yeah, thank you. I appreciate the opportunity, Dan. This has been a lot of fun, and I definitely learned some things and was inspired to go chase down a few more use cases as well. So hopefully next time I'll have some better custom instructions and a little bit better track record in the brainstorming department. I think it's been a great exchange.
Daniel Shipper (01:12:57)
That sounds great. Thanks a lot.