
Hello, and happy Sunday! Our course How to Write With AI, taught by Evan Armstrong, is open for early bird registration. You’ll learn how to fashion expert prose and get your writing to resonate with your intended audience while harnessing the latest AI technology. If you enroll before Jan. 23, you can jump the nearly 10,000-person waitlist for our new AI email tool, Cora. Don’t miss out—our next cohort kicks off on February 13.—Kate Lee
Was this newsletter forwarded to you? Sign up to get it in your inbox.
Release notes
Google launches Titans: Language models that can learn from experience
In 2017, Google laid the groundwork for the AI boom with the seminal paper introducing the transformer architecture that has powered the language models that have come since. This week, the company claimed it has taken another step forward with its new Titans architecture. It tackles one of the outstanding problems with the previous generation of artificial intelligence models: Instead of forgetting what you tell them, Titans-based models can learn from experience.
With better memory, AI models could recall past conversations, learn from new data, and—hopefully—never make the same mistake twice. If researchers have indeed cracked the code for AI memory, then we may soon enter an age of AI where users can trust and rely upon chatbots in a much deeper way.
The limitations of transformers—and what the new research means
AI chatbots are like geniuses with memory loss. You need to provide context for each and every conversation because, while they’re smart and perceptive, all they know is what was in their training data and the information you share.
Short-term memory is not a bug of modern AI, but rather a feature of the transformer architecture. In the same way that using a specific language can change how you think, these new AI architectures could shape the nature of the interactions we have with language models.
In 2017, Google researchers published “Attention Is All You Need,” which defined transformers and ultimately led to the development of OpenAI’s ChatGPT (GPT stands for “generative pre-trained transformer”). This week the company shared Titans, a new architecture that could be the next step forward and enable AI to continuously learn. Meanwhile, Sakana AI, a Japanese R&D firm that draws on ideas from nature, published the paper "Transformer²." Each takes inspiration from how human brains work, building on the short-term attention of transformers and aspiring to add a longer-term, more persistent memory. Would you trust AI more if LLMs remembered past conversations, learned from them, and always cited sources?
Here is how transformers work today and some of the new ideas proposed by the Google and Sakana teams:
The (non)persistence of memory: Transformers’ key strength is being able to focus singularly on a provided prompt and predict how it might continue, one word at a time. Their “brains” are trained for months on giant datasets before being deployed to answer prompts. But by the time they’re deployed, they’re also unplugged, so to speak: They are incapable of learning anything new after that point. While transformers get more powerful with scale, they're fundamentally constrained by their original design: They can't learn from conversations, build long-term memory, or adapt to new situations without extensive retraining. In practice, if a prompt gets too long, performance dramatically declines (the first sketch after this list shows the fixed window at work).
Source: The Illustrated Transformer.
Titans’ “surprise” memorization mechanism: Google’s Titans aims to improve on the limitations of transformers by learning how to memorize key bits of information in a long prompt. It tries to mirror how human memory works, complete with a “surprise” mechanism that helps it decide what information is worth remembering. Just as our brains prioritize events that violate our expectations—i.e., surprising ones—Titans attempts to simulate the same process by checking whether the words it is processing are themselves surprising or associated with a past surprise (see the second sketch after this list). Like our brains, the new architecture can handle much more information than current models, working with sequences of more than 2 million tokens (about as long as the entire Harry Potter series, twice over). The promise of Titans is that a next-generation language model could hold all of that information in its memory at once—and learn from it. Titans outperforms OpenAI’s GPT-4 on many benchmarks, including BABILong, which tests whether models can remember what happened in a story.
Transformer²’s self-adaptive AI: For Transformer², the Sakana AI team took inspiration from how the human brain rewires itself after an injury. That vision of self-adaptation is at the heart of the model’s two-step approach: First, it analyzes the incoming task to understand its requirements; then it dynamically adjusts and provides results tailored to that task (see the third sketch after this list). Just as octopuses use camouflage, changing their skin patterns to match their environment, Transformer² adapts its architecture on the fly. Both Google and Sakana have effectively made AI models more adaptive, potentially a step change from the status quo.
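To make the transformer limitation concrete, here is a minimal sketch in plain NumPy of why anything outside a model's context window simply doesn't exist to it. The tiny window size, dimensions, and function name are illustrative assumptions, not any real system's values.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    # Standard attention over whatever is in the window -- nothing else exists.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

CONTEXT_WINDOW = 8                               # made-up, tiny window
d_model = 16
rng = np.random.default_rng(0)

full_history = rng.normal(size=(50, d_model))    # 50 tokens of "conversation"
visible = full_history[-CONTEXT_WINDOW:]         # only the last 8 are attended to

query = rng.normal(size=(1, d_model))
output = scaled_dot_product_attention(query, visible, visible)
print(output.shape)   # (1, 16): the other 42 tokens had no influence at all
```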
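And here is a toy sketch of the surprise idea described for Titans: a simple associative memory that writes most strongly when its prediction error (the "surprise") is high, with momentum so inputs near a recent surprise also get stored. This illustrates the general concept under our own simplifying assumptions; the linear memory and the constants are invented for the example and are not Google's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
memory = np.zeros((d, d))        # linear associative memory: value ~ memory @ key
momentum = 0.0                   # carries recent surprise forward in time
LEARN_RATE, DECAY = 0.5, 0.9     # invented constants for the toy

def surprise(memory, key, value):
    # How badly does the memory predict this value? (prediction-error norm)
    return np.linalg.norm(value - memory @ key)

key, value = rng.normal(size=d), rng.normal(size=d)
for step in range(3):
    s = surprise(memory, key, value)
    momentum = DECAY * momentum + s            # a recent surprise keeps the "write" gate open
    gate = np.tanh(momentum)                   # 0 = ignore, ~1 = memorize strongly
    error = value - memory @ key
    memory += LEARN_RATE * gate * np.outer(error, key) / (key @ key)
    print(f"pass {step}: surprise = {s:.3f}")  # drops as the pair is absorbed
```

The point is only to show the mechanism the piece describes: what surprises the model is what it keeps, and once something has been memorized, seeing it again is no longer surprising.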
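Finally, a toy sketch of the two-pass idea behind Transformer²: identify the task first, then adapt the weights before answering. Here the adaptation is a per-task rescaling of one matrix's singular values, a deliberately simplified stand-in; the task names, keyword classifier, and expert vectors are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
W = rng.normal(size=(d, d))                 # a frozen base weight matrix
U, S, Vt = np.linalg.svd(W)                 # decompose it once, ahead of time

# Hypothetical per-task "expert" vectors that rescale the singular values.
experts = {
    "math":   1.0 + 0.3 * rng.normal(size=d),
    "coding": 1.0 + 0.3 * rng.normal(size=d),
}

def identify_task(prompt: str) -> str:
    # Pass 1: a stand-in task classifier (keyword matching, purely for illustration).
    return "coding" if "def " in prompt or "bug" in prompt else "math"

def adapted_forward(prompt: str, x: np.ndarray) -> np.ndarray:
    # Pass 2: rebuild the weights with the chosen expert, then run the input through them.
    z = experts[identify_task(prompt)]
    W_task = U @ np.diag(S * z) @ Vt        # same matrix, task-tuned spectrum
    return W_task @ x

print(adapted_forward("fix this bug in my function", rng.normal(size=d)))
```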