
In Michael Taylor’s work as a prompt engineer, he’s found that many of the issues he encounters in managing AI tools—such as their inconsistency, tendency to make things up, and lack of creativity—are ones he used to struggle with when he ran a marketing agency. It’s all about giving these tools the right context to do the job, just like with humans. In the latest piece in his series Also True for Humans, about managing AIs like you'd manage people, Michael explores the inner workings of AI agents—the next generation of assistive AI technology—and what they need to succeed. He goes in-depth on the reason and act (ReAct) pattern, one of the first attempts to give large language models the tools they need to be truly helpful.—Kate Lee
Was this newsletter forwarded to you? Sign up to get it in your inbox.
When I was in high school, I never understood why we couldn’t use calculators on tests. My teacher used to say, “You might not always have a calculator in your pocket.” I was annoyed, but I can’t blame her for not predicting the smartphone.
I’m not naturally gifted at doing multiplication in my head, but armed with a calculator app on my phone, I can be faster and more accurate than someone who doesn’t have a calculator. Having access to tools can make such a difference to our own abilities that using a calculator on a test can be seen as “cheating.” But, of course, when you finish school and get a job, your employer expects you to use a calculator to do your work as efficiently as possible. There’s no time for doing math in your head, and errors can be costly in the real world.
Teachers are concerned about students using ChatGPT to do their homework, but when those students graduate, their employers might expect them to be able to use AI to do their work. Using generative artificial intelligence has already been shown to improve workers’ productivity by 40 percent, according to one estimate, and if the rate of progress holds, we can expect AI models to get 10 times better for the same cost year after year.
Source: Reddit.
The rate of progress in AI has had an immediate impact on knowledge workers such as marketers, graphic designers, and software engineers, who work primarily with text or images that can be generated with AI technology. Less impacted are jobs that require interaction with the physical world, or knowledge workers whose jobs aren’t as accessible programmatically. For example, I used to pay people to manually pull advertising data and format it into weekly reports for our clients because (perhaps counter-intuitively) it was significantly cheaper than paying the software vendors that automate this task.
AI agents are already being built that are capable of taking sophisticated action on your behalf without having to pay engineers to maintain expensive automations. Instead of paying a human to do manual data entry, imagine if I could give web-browsing access to an AI agent that could log into each advertising platform for me and put the relevant data into a spreadsheet, just like a human would. Most agents are at the prototype phase, as current LLMs aren't yet good enough at reasoning to operate at length without getting stuck. However, I've learned that with AI, anything that nearly works now might be fully functional—and feel like magic—in only six months, and could be ubiquitous in a year. The term “AI agents” is already surpassing “prompt engineering” in Google Search Trends, signaling a shift in the primary mechanism for getting better results from AI.
Source: Google Search Trends.
The prototypes of today won’t be prototypes for much longer. A new era of AI is coming. Let’s dive into how AI models use tools and the implications of your workplace being cohabited by AI agents.
How LLMs can take action by themselves
LLMs simply generate text, so you might be wondering how they can take action in the real world. That’s where code comes in. If you tell the LLM which tools are available, it will respond with the one it wants to use and the input to give it. From there, your script takes that action and reports the result back to the LLM.
The most popular LLMs (ChatGPT, Claude, Gemini) all include some form of tool use, also known as function calling, which opens up a world of potential for what tasks AI systems can do. In theory, any action that is available through an application programming interface (API) can be taken by an LLM. This functionality is available with OpenAI’s custom GPTs—its platform for creating and hosting your own version of ChatGPT—where it’s called GPT Actions.
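With OpenAI’s Python SDK, for example, you describe each function as a JSON schema, and the model responds with a structured call whenever it decides a tool is needed. Here’s a minimal sketch (the `get_temperature` tool previews the weather example later in this piece):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Describe the tool so the model knows when and how to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "get_temperature",
        "description": "Get the forecast temperature for a location on a date.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "date": {"type": "string", "description": "YYYY-MM-DD"},
            },
            "required": ["location", "date"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "How hot will it be in Manchester tomorrow?"}],
    tools=tools,
)

# If the model chose to act, it returns a structured tool call rather than
# prose; your code runs the function and sends the result back in a new turn.
print(response.choices[0].message.tool_calls)
```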
Source: Apify.
For an LLM to be able to decide which actions to take, you need to tell it what functions are available and how to structure its output so your code can parse it. Because LLMs only generate text, we have to watch that text for signs the model intends to take one of the available actions. Our system then takes the intended action on the LLM’s behalf and returns the result for the model to deliberate on.
Here’s a simple implementation of a prompt that guides an LLM to take action:
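One minimal version of such a prompt, following the Thought/Action/Observation format that the ReAct pattern (covered below) popularized (the wording here is illustrative):

```
Answer the following question as best you can. You have access to this tool:

use_calculator: evaluates a math expression, e.g. use_calculator('1 + 1')

Use this format:

Question: the question you must answer
Thought: reason about what to do next
Action: the tool to use, e.g. use_calculator('1 + 1')
Observation: the result of the action
... (Thought/Action/Observation can repeat as many times as needed)
Answer: the final answer to the original question
```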
We are instructing the AI to first output a Thought on how it can best answer a question, then output an Action that it needs to take to answer that question. For example, it might say “Thought: I need to use a calculator to add up 1 + 1,” then “Action: use_calculator(‘1 + 1’).” Our system has code that identifies the words “use_calculator” and runs those numbers through a calculator, before sending back the result as “Observation: 2.”
In essence, an agent is an AI system that prompts itself to do a task. The Thought, Action, Observation, Answer loop is the core pattern that allows it to do so. Rather than a user prompting ChatGPT what to do at each step, an agent runs a loop, “thinking” during each step in order to “decide” what to do without a prompt from the user. The loop finishes when the agent is confident enough to return the final answer, or give up.
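Stripped down to code, the loop looks something like this (a sketch: `PROMPT` is the template above, and `complete()` is an assumed helper that sends text to your LLM of choice and returns its reply):

```python
import re

# Assumed helpers: PROMPT is the prompt template above, and complete(text)
# sends it to your LLM of choice and returns the model's reply as a string.
# Real implementations also stop generation at "Observation:" so the model
# can't invent tool results itself.

def use_calculator(expression: str) -> str:
    # eval() keeps the sketch short; use a real expression parser in production.
    return str(eval(expression))

TOOLS = {"use_calculator": use_calculator}

def run_agent(question: str, max_steps: int = 10) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = complete(PROMPT + "\n" + transcript)  # next Thought and Action
        transcript += reply + "\n"
        if "Answer:" in reply:  # confident enough to stop
            return reply.split("Answer:", 1)[1].strip()
        match = re.search(r"Action: (\w+)\('(.+?)'\)", reply)
        if match:
            tool, argument = match.groups()
            observation = TOOLS[tool](argument)  # act on the LLM's behalf
            transcript += f"Observation: {observation}\n"
    return "I give up."  # the loop also ends if no Answer is ever reached
```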
Here’s a more sophisticated example: To let an LLM know the weather, I wrote code that gives OpenAI’s GPT-4o model access to a weather API. It can run a function called `get_temperature` by specifying a location and a date. When it “runs” the function, it returns the text “Action: get_temperature” with the location and date. My code looks for that text, calls the weather API, and returns the actual temperature. Now that it can access accurate temperatures, the agent can answer simple questions like, “How much hotter will it be tomorrow in Manchester, England?” From there, it can reason about which actions to take and act on them.
Source: Screenshot courtesy of the author.
Let’s run this through our Thought, Action, Observation, Answer loop in order to break down how it works.
First it has a Thought that it needs to find today’s temperature in Manchester. It runs that Action and Observes the result. Next, it has another Thought that it will need tomorrow’s temperature in Manchester as well. Finally, it has the Thought that it will need to calculate the difference to answer the final question.
Because LLMs are bad at math, I also gave it a calculator, which it can use just by returning “Action: calculate” and the formula it wants to run. When it sends calculate 25.88 – 25.69 to the calculator, the code runs that function and returns 0.19, giving it the context it needs to finally Answer the question.
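Plugged into the same loop, the two tools from this example might be registered like so (a sketch; the weather call is stubbed, standing in for whichever API you use):

```python
def get_temperature(location: str, date: str) -> float:
    # Stub standing in for a real weather API call. In the example run above,
    # this returned 25.88 for today and 25.69 for tomorrow in Manchester.
    ...

def calculate(expression: str) -> float:
    # "Action: calculate('25.88 - 25.69')" lands here and returns 0.19.
    # eval() keeps the sketch short; use a real expression parser in production.
    return eval(expression)

# Registered so the agent loop can dispatch on the Action name it spots.
TOOLS = {"get_temperature": get_temperature, "calculate": calculate}
```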
Think before you act
While LLMs have demonstrated usefulness across a wide range of language comprehension and writing tasks, they are limited in their ability to do real-world tasks like browsing the web or purchasing products without some way to take action.
The first convincing example of tool use from LLMs that I saw came in the 2023 paper on ReAct—reasoning and acting—published by researchers on the Google Brain team. They combined chain-of-thought reasoning (thinking through a task step by step) with action-plan generation (choosing between available actions) to allow LLMs to use tools to complete tasks the way humans would. The trick was the combination of the two techniques: Relying solely on reasoning from the model’s built-in knowledge led to it hallucinating, or making things up. And the ability to take actions wasn’t enough on its own: Without reasoning, the LLMs displayed a lack of common sense in choosing the right tools for the job at hand. Combining reasoning with actions enabled the model to navigate more complex tasks that required multiple Thought, Action, Observation loops.
Source: Arxiv.
ReAct could both reliably retrieve facts and reason about them to construct valid answers, much like how humans think to guide their actions and act to inform their thinking. Combining reasoning and acting produced a much lower hallucination rate (6 percent versus 14 percent) on HotpotQA, a dataset of questions that can only be answered by combining information from two or more Wikipedia passages. It also improved accuracy from 37 percent to 71 percent at taking the right actions in a text-based game environment simulating household tasks.
The limitation of the ReAct pattern is that you have to know ahead of time what tools will be useful to the LLM so you can define them in the prompt. Further research into autonomous agents has shown that LLMs can actually make their own tools and refine them in response to environmental feedback. In order to test AI agent capabilities, researchers have been using video game environments, where there are no real-world consequences if the agent goes haywire.
For example, in a 2023 paper by researchers at Nvidia, an AI agent called Voyager successfully played the game Minecraft after being given tasks to complete, a way to take action in the game (writing and running JavaScript code), and a skill library in which to save the programs it wrote (in-game actions like “mine diamond,” “craft stone sword,” and “cook steak”).
Source: Arxiv.
The AI agent explored the map and observed what happened to it, like being attacked by a zombie. Then it would plan out actions to take to achieve its goals. When the agent decided it needed to do something it hadn’t done before, it would write a program in JavaScript to achieve its goal using the game’s APIs (because a robot can’t simply press buttons like a human can). These programs would often be wrong at first, so when an action didn’t have the desired effect, the agent would refine the program until it worked. This ability to build and refine a skill library resulted in 330 percent better performance compared to AI agents that didn’t have it.
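The core of that loop can be sketched in a few lines (a sketch of the skill-library idea, not Voyager’s actual code; the helpers `draft_program`, `run_in_game`, and `refine_program` are hypothetical stand-ins for Voyager’s LLM prompts and game API calls):

```python
# task description -> working program source, reused as examples for new tasks
skills: dict[str, str] = {}

def acquire_skill(task: str, max_attempts: int = 5) -> str:
    program = draft_program(task, examples=skills)  # assumed: LLM writes a first draft
    for _ in range(max_attempts):
        feedback = run_in_game(program)             # assumed: execute via the game's API
        if feedback.success:
            skills[task] = program                  # save the working program for reuse
            return program
        # The action didn't have the desired effect: refine and retry.
        program = refine_program(program, feedback, examples=skills)
    raise RuntimeError(f"could not learn skill: {task}")
```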
Source: Arxiv.
What happens when AI takes over your computer?
LLMs are able to interact with Minecraft and other computer games by writing code. But that is also possible with many online products and services that can be accessed via API: An LLM could send a command to Uber to book a ride and to OpenTable to reserve a table. Anything you can do with code can now be done by AI, and many app developers are in the process of redesigning their APIs and documentation to be more LLM-friendly.
Many of our daily tasks like sending a personal email or buying a product online don’t have easily programmable APIs, however, because they’re designed primarily for humans, not automated code. While some LLM frameworks and applications are exploring “human-in-the-loop” architectures in which the AI can ping a human to do a task it itself can’t do, who wants to be a cog in a machine?
The solution is to train the AI to interact with the world the same way humans do. Tesla takes this approach with self-driving cars, eliminating expensive lidar sensors in favor of computer vision, so the car drives the way a human would. Anthropic has done something similar with its Computer Use functionality, in which its model Claude takes the reins to navigate complex tasks. Claude takes screenshots every few seconds to “see,” moves the user’s mouse to interact, and generates text to fill form fields.
Source: Anthropic.
In my experience, these AI agents are still limited: They’re too slow, get stuck too often, and make mistakes. But if there’s one thing I’ve learned from AI, it’s that nascent tools and systems can improve rapidly. OpenAI’s o1-preview model was designed to be better than previous models at reasoning tasks, and the company plans to release an AI agent, code-named “Operator,” early next year. Chief Product Officer Kevin Weil predicted that 2025 is “going to be the year that agentic systems finally hit the mainstream.”
In the future, LLMs will no longer be confined to the chat window and might start to inhabit your Slack channels, make changes to your Word documents, and show up in your social media timelines. AI coding agents like Devin are already being tested on completing real-world tasks from the freelancer platform Upwork, and have even been spotted asking questions in coding communities when they get stuck. For $500 per month, Devin can work on your code base independently, doing work equivalent to that of a junior engineer. As models progress, it’s easy to imagine Devin taking on more intermediate and advanced tasks with less coaching from a human developer.
Source: Cognition AI.
In a sign of stranger times to come, venture capitalist Marc Andreessen funded a semi-autonomous AI bot on X with $50,000 in bitcoin, which later pumped a meme-based cryptocurrency called Goatseus Maximus to a market cap that at one point surpassed $1 billion.
Source: CoinMarketCap.
There are many obvious downsides to letting an AI loose on your computer, not least the fact that it will drain your bank account through the token costs of processing all of those images and generating plans of what to do next. (These things aren’t free!) Rogue AI systems could also do something illegal, like commit fraud, or potentially immoral, like persuading someone to vote for an unsavory political candidate. Perhaps Computer Use is what finally elicits catastrophic impact from AI, but it could also turn out to make our lives immeasurably better, with a few negative externalities we have to deal with.
There are risks, but the upsides are too great to ignore. If AIs are able to act more independently, rather than requiring your full attention to prompt them and check their results, you can have a virtual coworker—or a team of coworkers—who can work on their own to achieve broader goals. Given the obvious economic benefits of having someone on your team who never needs to take a break, it seems plausible that most—or at least some—of your coworkers will be AI in five years. Elon Musk has a tendency to be optimistic about timelines, so I doubt that humanoid Optimus robots will be in production by 2026, but I wouldn’t bet against him producing them eventually. A humanoid body is the ultimate tool to give an LLM, because any tool designed for humans could be used by a robot. No longer limited to knowledge work, AI could in theory do any job currently done by humans.
Source: IOT World Today.
In eliminating real-world manual work, AI may finally live up to what we want out of it. As one Twitter user put it: “I want AI to do my laundry and dishes so that I can do art and writing, not for AI to do my art and writing so that I can do my laundry and dishes.”
Humans learning to use tools was key to our evolution, and working to put those tools in the hands of AI will be the thing that finally frees us from manual labor.
Michael Taylor is a freelance prompt engineer, the creator of the top prompt engineering course on Udemy, and the coauthor of Prompt Engineering for Generative AI.
To read more essays like this, subscribe to Every, and follow us on X at @every and on LinkedIn.
We also build AI tools for readers like you. Automate repeat writing with Spiral. Organize files automatically with Sparkle. Write something great with Lex.