What is an Agent?
I've spent the last two years building AI systems, and in that time I've watched the same argument loop endlessly on Twitter, in Slack channels, at conferences: what counts as an "agent"?
Some say an agent is an LLM that can use tools. Others insist it needs memory. Others say it needs to plan, or to be autonomous, taking actions without human approval. The discourse is infinite and circular: AI Twitter argues about definitions, frameworks ship "agent" features, companies rebrand their chatbots as "agentic." And the whole conversation misses the point.
"Agent" is not a technical primitive. It's a marketing term that gestures vaguely at "AI that does stuff."
When you peel back the abstraction, there's no fundamental unit called an "agent." There's just code: context construction, model calls, tool execution, state management, control flow. The patterns you choose to compose from these primitives determine what your system can do. The word "agent" obscures more than it reveals, and the debate about what qualifies as one is the wrong debate entirely.
The useful question is different: what are the actual engineering primitives for building intelligent systems?
The Illusion of the Entity
When people say "agent," they typically imagine something like a persistent entity: a digital mind that remembers you, pursues goals, and takes actions in the world. When you talk to "the agent," you're talking to it, a coherent self that exists across your interactions, accumulating context and refining its understanding of you over time.
This intuition is incomplete. The entity is real, but it doesn't live inside the model. It's distributed across four components:
- The model: Stateless text transformation. Given input, produce output. No memory, no identity, no continuity between calls.
- The context: The input constructed for each call. This is where "memory" lives during a conversation: the system prompt, the message history, retrieved documents, tool results. Everything the model knows comes from what you put in front of it.
- The storage: Persistent state across sessions. Databases, vector stores, file systems. What the system "remembers" between conversations lives here, not in the model.
- The orchestration: Control flow logic that decides when to call the model, what context to construct, which tools to invoke, when to stop. This is the glue that turns stateless calls into coherent behavior.
The "agent" is the composite of all four. The illusion of a continuous entity emerges from your system design, not from any capability inherent to the model itself.
You can prove this to yourself: build a chatbot that randomly switches between Claude, GPT-4, and Gemini on each turn, while maintaining the same context window. The user experiences one coherent assistant. The "agent" is your architecture, not any particular model.
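That experiment fits in a few lines. Here's a hedged sketch with stub functions standing in for the three model APIs; the only point it demonstrates is that the conversation state lives outside every model:

```python
import random

# Hypothetical stand-ins for three different model APIs; each is a
# stateless function from a context (list of messages) to a reply.
def call_claude(messages):  return f"[claude] saw {len(messages)} messages"
def call_gpt4(messages):    return f"[gpt4] saw {len(messages)} messages"
def call_gemini(messages):  return f"[gemini] saw {len(messages)} messages"

MODELS = [call_claude, call_gpt4, call_gemini]

def chat_turn(history, user_message):
    """One turn of the chatbot: the context (history) persists,
    while the model backing each call is chosen at random."""
    history.append({"role": "user", "content": user_message})
    reply = random.choice(MODELS)(history)
    history.append({"role": "assistant", "content": reply})
    return reply

history = []  # the "agent" lives here, not in any model
chat_turn(history, "hello")
chat_turn(history, "what did I just say?")
```

Because every model sees the same `history`, the user experiences one continuous assistant no matter which backend answered.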
This reframe matters because it shifts where you look for leverage. If you think the agent is the model, you wait for better models. If you understand the agent is the composite, you realize something important: you control three of the four components entirely. Context construction, storage design, and orchestration logic are all your code. The model is a capability you rent.
Agent Is the Wrong Primitive
So if "agent" means "the composite system," why do we need the word at all?
Mostly, we don't.
When someone says "I'm building an agent," what do they actually mean? In my experience, it's usually one of four things:
- A chatbot with tool use
- An autonomous loop that runs until a task is done
- A system with multiple specialized LLM calls coordinated together
- Something that remembers context across sessions
These are four completely different architectures with almost no shared implementation patterns, yet the word "agent" lumps them together as if they were the same thing.
The frameworks don't help clarify matters. LangChain alone has Agent, AgentExecutor, create_react_agent, create_openai_functions_agent, create_tool_calling_agent: a proliferation of abstractions that reify "agent" as a thing you instantiate, rather than a pattern you implement. The abstraction becomes the territory, and you lose sight of what's actually happening underneath.
Strip it down. At the core of every system, regardless of what you call it, there are two fundamental operations:
- Context construction: Deciding what information to put in front of the model for a given call.
- Control flow: Deciding what to do with the model's output: call again? execute a tool? return to user? give up?
Everything else, memory, tools, planning, multi-agent coordination, is a specific pattern built on top of these two primitives. When you say "agent," you're usually pointing at a particular combination of context construction and control flow patterns. The useful conversation is about those patterns directly, not about whether something qualifies for the label.
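The two primitives can be sketched directly. This is illustrative only: `build_context`, `step`, and `run_tool` are made-up names, not any framework's API.

```python
def run_tool(call):
    # Stub tool executor, for illustration only.
    return f"result of {call}"

def build_context(state):
    """Context construction: decide what the model sees on this call.
    Here: a fixed system prompt plus the running message log."""
    system = {"role": "system", "content": "You are a helpful assistant."}
    return [system] + state["messages"]

def step(state, model_call):
    """Control flow: decide what to do with the model's output."""
    output = model_call(build_context(state))
    if output.get("tool_call"):
        state["messages"].append({"role": "tool", "content": run_tool(output["tool_call"])})
        return "continue"   # call again, with the tool result in context
    state["messages"].append({"role": "assistant", "content": output["text"]})
    return "done"           # return the final text to the user
```

Every pattern discussed below is some arrangement of these two functions.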
The Reasoning Loop
The core pattern that people gesture at with "agentic" is the reasoning loop: the model is called repeatedly, with the output of each call informing the input to the next, until some termination condition is met. This is distinct from single-call usage, where you construct a context, call the model once, and return the result. Single-call is stateless request-response; the reasoning loop is iterative search.
The simplest reasoning loop looks like this:
```python
while not done:
    response = model.call(context)
    if response.wants_tool:
        result = execute_tool(response.tool_call)
        context.add(tool_result=result)
    else:
        done = True
return response.text
```

This is the ReAct pattern that underlies most "agentic" frameworks: the model reasons, decides to act (tool call), observes the result, and reasons again. But it's just one loop topology among many.
Reflection loop: The model generates output, then critiques its own output, then revises. The "tool" is self-evaluation.
```python
draft = model.generate(context)
critique = model.critique(draft)
final = model.revise(draft, critique)
```

Planning loop: The model decomposes a task into steps, executes each step, and potentially re-plans if a step fails.
```python
plan = model.plan(task)
for step in plan:
    result = execute(step)
    if failed(result):
        plan = model.replan(task, progress, failure)
```

Search loop: The model explores multiple branches, evaluates them, and selects the best path. Monte Carlo Tree Search with an LLM as the evaluation function.
Hierarchical loop: An outer loop calls an inner loop. A "manager" model decomposes work and delegates to "worker" models, each running their own reasoning loops.
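A minimal sketch of the hierarchical topology, with stub functions standing in for the manager and worker model calls (all names are made up for illustration):

```python
def manager_decompose(task):
    # Stand-in for an outer "manager" model call that splits the task.
    return [f"{task}: part {i}" for i in (1, 2)]

def worker_loop(subtask):
    # Stand-in for an inner "worker" reasoning loop (e.g. a small
    # ReAct loop), collapsed to a single step for the sketch.
    return f"done: {subtask}"

def hierarchical_run(task):
    # The outer loop delegates each piece of work to an inner loop.
    return [worker_loop(sub) for sub in manager_decompose(task)]

hierarchical_run("write report")
```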
The point is that "agentic" isn't a binary property. It's a spectrum of loop complexity. A single ReAct loop is weakly agentic. A hierarchical planner with reflection and search is strongly agentic. But they're all compositions of the same underlying primitives: context construction, model calls, and control flow.
Why Multiple Calls?
A reasonable question: why make multiple LLM calls if you could fit everything in one? After all, context windows are enormous now. GPT-4 handles 128k tokens. Claude handles 200k. Why not just dump everything in and let the model figure it out?
The answer is that a single call has fixed information. The call boundary is where three critical things happen:
New information enters. You can't know what a web search returns until you call it. You can't know what's in a file until you read it. Tool results add information that simply wasn't available when you constructed the original context.
Real verification happens. The model can "reason" about whether code is correct, but only execution proves it. The model can "think" about what a user wants, but only asking confirms it. Verification requires leaving the model and returning with evidence.
Actions execute. Sending an email, writing a file, making an API call: these happen at the boundary between model and world. The model produces intent; the system produces effect.
A single call is a pure function over its input. Multiple calls let you interleave computation with observation, verification, and action. The loop is what makes the system actually do things in the world, not just talk about doing things.
This is why the reasoning loop is the fundamental unit of "agency." It's not about whether the model is "smart enough." It's about whether the system architecture allows for incorporating new information, verifying beliefs against reality, and producing effects in the world.
The Anthropomorphization Trap
Now there's a tempting move here: if agents are composite systems, and systems can have multiple components, why not design "multi-agent" systems where multiple agents collaborate?
You've probably seen the demos. A "researcher agent" and a "writer agent" work together. A "manager agent" delegates to "worker agents." A whole "society of agents" debates and votes. The framing is seductive because it maps to how humans organize: specialized roles, delegation, collaboration, consensus.
I want to argue this framing is usually wrong, and often counterproductive.
The problem isn't that multiple LLM calls are bad. It's that anthropomorphizing them as "agents" with identities, roles, and social dynamics imports a ton of assumptions that don't actually hold.
Consider what's actually happening when you split work across two LLM calls, say a "planner" and an "executor": you're constructing two different contexts (one optimized for planning, one for execution), potentially using different prompts or even different models, and defining an interface for how information passes between them. This is just function decomposition. It's the same pattern as splitting a monolithic function into two functions with a defined interface. Calling them "agents" that "collaborate" adds nothing but confusion.
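Here is that decomposition made literal, with a hypothetical `fake_model` stub in place of real LLM calls. The interface between "planner" and "executor" is nothing more than a list of strings:

```python
def plan(task, model_call):
    """One call, with a context optimized for planning."""
    context = f"Break this task into numbered steps:\n{task}"
    return model_call(context).splitlines()

def execute(step, model_call):
    """Another call, with a context optimized for execution."""
    context = f"Carry out this single step and report the result:\n{step}"
    return model_call(context)

def run(task, model_call):
    # The "collaboration" is just a pipeline with a defined interface
    # (a list of step strings) between the two calls.
    return [execute(step, model_call) for step in plan(task, model_call)]

# Stub model for illustration only:
fake_model = lambda ctx: ("1. draft\n2. revise" if "numbered steps" in ctx
                          else f"did: {ctx.splitlines()[-1]}")
run("write a memo", fake_model)
```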
The anthropomorphic framing creates specific problems:
Implicit statefulness. When you think of agents as entities, you expect them to "remember" their interactions. But each call is stateless unless you explicitly persist and reconstruct state. The "researcher agent" doesn't remember what it found unless you build that memory yourself.
Social dynamics that don't exist. "Debate" between agents isn't actually debate. It's two context constructions that include each other's outputs. There's no persuasion, no updating of beliefs, no compromise. Just text concatenation.
Reliability collapse. Each agent boundary is a point of potential failure. If one agent misunderstands the task, the cascade fails. Multi-agent systems compound unreliability rather than reducing it.
The unsexy truth is that for most applications, you want the simplest loop that accomplishes the task. Usually that's a single ReAct loop. Sometimes it's a reflection loop. Rarely is it a "society of agents."
The multi-agent paradigm has its place: in research, in exploring novel architectures, in cases where context isolation genuinely helps. But it's not a default. It's not the future of all AI. It's one pattern among many, and often not the right one.
System Design, Not Agent Design
Here's the reframe that dissolves the whole debate: agent design is system design.
More specifically, it's a branch of distributed systems engineering. The problems you face building intelligent systems are the same problems you face building any distributed system:
- State management: Where does state live? How is it persisted? How is it accessed?
- Coordination: How do different components communicate? How do they agree on shared state?
- Failure handling: What happens when a call fails? How do you retry? How do you recover?
- Observability: How do you know what the system is doing? How do you debug it?
- Consistency: How do you ensure different parts of the system have coherent views?
The fact that one of your components is an LLM doesn't change the fundamental nature of the engineering.
When you adopt this lens, the "what is an agent?" question dissolves. You stop asking whether something counts as an agent and start asking concrete questions: What's my state model? What's my control flow? How do I construct context? How do I handle failures? How do I observe behavior?
These questions have concrete answers. They lead to actual design decisions. And they generalize across every system you'll build, whether you call it an "agent" or not.
The 10 Elements of Intelligent System Design
So what are the fundamental primitives? After two years of building AI systems, and after writing a book-length treatment trying to derive them from first principles, I've landed on 10 elements that I believe capture the complete design space.
Each element is a category of design decisions you face when building any intelligent system.
1. Context
Every model call receives a context: the text that constitutes its input. Context construction is the most leveraged decision you make. What goes in, what stays out, how it's ordered, how it's formatted: these choices determine what the model can do.
Context is bounded (the window has a limit) and expensive (more tokens means more cost and latency). Managing this constraint, deciding what to include, what to summarize, what to retrieve on demand, is a core design problem.
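One common way to manage the constraint is a token budget: keep the system prompt, then fit as many recent messages as the budget allows. A sketch, assuming you supply a `count_tokens` function from your tokenizer of choice:

```python
def build_context(system_prompt, messages, budget_tokens, count_tokens):
    """Keep the system prompt, then pack recent messages newest-first
    until the token budget runs out. Illustrative only; older messages
    could instead be summarized or retrieved on demand."""
    remaining = budget_tokens - count_tokens(system_prompt)
    kept = []
    for msg in reversed(messages):       # newest first
        cost = count_tokens(msg["content"])
        if cost > remaining:
            break
        kept.append(msg)
        remaining -= cost
    return [{"role": "system", "content": system_prompt}] + list(reversed(kept))
```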
2. Memory
The model is stateless, but systems need to remember. Memory is the persistence layer: databases, vector stores, file systems, caches. The design questions are: what gets stored? How is it indexed? How is it retrieved?
Memory systems range from simple (append messages to a list) to complex (hierarchical summarization with semantic retrieval). The right choice depends on your access patterns.
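Toward the simple end of that range, here is a sketch of file-backed memory with naive keyword retrieval. `FileMemory` is a made-up name for illustration; a real system would likely use a database or embedding-based retrieval.

```python
import json, os

class FileMemory:
    """Minimal persistence layer: a JSON file on disk, with crude
    keyword-overlap retrieval."""
    def __init__(self, path):
        self.path = path
        if os.path.exists(path):
            with open(path) as f:
                self.items = json.load(f)
        else:
            self.items = []

    def store(self, text):
        self.items.append(text)
        with open(self.path, "w") as f:
            json.dump(self.items, f)

    def retrieve(self, query, k=3):
        # Relevance = number of words shared with the query.
        words = set(query.lower().split())
        ranked = sorted(self.items,
                        key=lambda t: -len(words & set(t.lower().split())))
        return ranked[:k]
```

Because the state lives in the file, a fresh `FileMemory` pointed at the same path "remembers" everything from earlier sessions.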
3. Agency
Agency is the capacity to produce effects in the world: to call tools, write files, send messages, query APIs. A model without agency can only talk about the world. A model with agency can change it.
The design questions: what actions can the system take? What constraints apply? How are actions authorized? How are results observed?
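One way those questions show up in code is a tool registry whose allowlist is the authorization boundary. A sketch with made-up names:

```python
class ToolRegistry:
    """Tools the system may invoke; registration is the explicit
    authorization boundary. Illustrative only."""
    def __init__(self):
        self.tools = {}

    def register(self, name, fn):
        self.tools[name] = fn

    def invoke(self, name, *args):
        if name not in self.tools:
            raise PermissionError(f"tool '{name}' is not authorized")
        # The returned result is what flows back into the next context.
        return self.tools[name](*args)

registry = ToolRegistry()
registry.register("add", lambda a, b: a + b)
registry.invoke("add", 2, 3)          # allowed
# registry.invoke("delete_files")     # would raise PermissionError
```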
4. Reasoning
Reasoning is the structure of model calls. Single call? Iterative loop? Reflection? Planning? The reasoning architecture determines what kinds of problems the system can solve.
More complex reasoning (planning, search, verification) enables harder tasks but costs more and introduces more failure modes. The design question: what's the minimum reasoning complexity that accomplishes the task?
5. Coordination
When a system has multiple components making model calls (whether you call them "agents" or just "functions"), they need to coordinate. How does information flow between them? How is shared state managed? What's the topology?
Coordination patterns include: sequential (pipeline), parallel (fan-out/fan-in), hierarchical (delegation), and cyclic (iterative refinement).
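The first two topologies are a few lines each. A sketch with plain functions standing in for components that would each make a model call:

```python
from concurrent.futures import ThreadPoolExecutor

def sequential(stages, x):
    """Pipeline: each stage's output is the next stage's input."""
    for stage in stages:
        x = stage(x)
    return x

def parallel(workers, x):
    """Fan-out/fan-in: run all workers on the same input, collect results."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda w: w(x), workers))

# Stand-ins for model-calling components:
upper = lambda s: s.upper()
excl  = lambda s: s + "!"
sequential([upper, excl], "hi")   # pipeline
parallel([upper, excl], "hi")     # fan-out
```

Hierarchical and cyclic topologies compose these: a manager fans out to workers, or a pipeline feeds its output back into its own input.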
6. Artifacts
For complex tasks, the output isn't a message but an artifact: a document, codebase, analysis, or plan. Artifacts are structured, versioned, and evolve over multiple iterations.
The design questions: what's the structure of your output? How do intermediate states get represented? How do multiple components contribute to the same artifact?
7. Autonomy
Autonomy is about who initiates action. In a chatbot, the user initiates and the system responds. In an autonomous system, the system initiates on its own: scheduled tasks, triggered workflows, proactive notifications.
The design question: what triggers computation? User messages? Schedules? Events? Conditions in the environment?
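As a sketch, the simplest self-initiated trigger is a polling loop; schedules and event queues are the usual, less wasteful alternatives. `run_when` is a made-up helper for illustration:

```python
import time

def run_when(trigger, action, poll_seconds=1.0, max_polls=None):
    """System-initiated computation: poll a trigger condition and run
    the action when it fires. Returns the action's result, or None if
    the polling budget runs out first."""
    polls = 0
    while max_polls is None or polls < max_polls:
        if trigger():
            return action()
        time.sleep(poll_seconds)
        polls += 1
    return None
```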
8. Evaluation
How do you know if your system is working? Evaluation is the practice of defining and measuring success. Without evaluation, you're flying blind.
Evaluation ranges from vibes ("this response looks good") to rigorous (automated evals on test suites with quantitative metrics). The design question: how do you measure quality, and how does that measurement inform iteration?
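Toward the rigorous end, even a tiny harness beats vibes. A sketch where each test case pairs an input with a predicate on the output (all names hypothetical):

```python
def run_evals(system, cases):
    """Minimal eval harness: each case is (input, check), where `check`
    is a predicate on the system's output. Returns the pass rate."""
    passed = sum(1 for inp, check in cases if check(system(inp)))
    return passed / len(cases)

# Stub "system" and checks, for illustration:
system = lambda q: q.upper()
cases = [
    ("hello", lambda out: out == "HELLO"),
    ("world", lambda out: "W" in out),
]
run_evals(system, cases)
```

Tracking that pass rate across prompt and architecture changes is what turns iteration from guessing into measurement.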
9. Feedback
Feedback is information that flows back into the system to improve it: user corrections, thumbs up/down signals, evaluation results, explicit preferences. Feedback is what closes the loop between deployment and improvement.
The design questions: what signals do you collect? How do they influence future behavior? Online learning, batch retraining, or just prompt iteration?
10. Learning
Learning is the capacity for the system to improve over time. Not the model weights (those are frozen), but the system: better prompts, better retrieval, better tools, better control flow.
The design question: where is learning localized? Is it manual (you edit prompts) or automatic (the system self-improves)? What's the feedback loop?
Using the Framework
Use the 10 elements as a design space map, not a checklist. Every intelligent system makes choices in each dimension, whether explicitly or by default.
When you're designing a system, walk through each element:
- Context: What information does the model need? How will I construct it?
- Memory: What needs to persist across sessions? How will I store and retrieve it?
- Agency: What actions can the system take? What are the boundaries?
- Reasoning: What's the loop structure? Single call, iteration, planning?
- Coordination: Is this one component or multiple? How do they communicate?
- Artifacts: What's the output structure? How does it evolve?
- Autonomy: What triggers the system? User-initiated or self-initiated?
- Evaluation: How will I know if it's working?
- Feedback: What signals will inform improvement?
- Learning: How does the system get better over time?
Each question has many valid answers. The framework doesn't tell you what to build. It tells you what decisions you're making, even when you don't realize you're making them.
Beyond Agents
I've been working on a longer treatment of these ideas: a book called Elements of Agentic System Design that derives each element from first principles and shows the patterns in code. The framework provides a vocabulary for intelligence design that's more precise than "agent" and more general than any particular framework's abstractions. It maps the space of choices, so you can make them deliberately rather than defaulting to whatever the framework gives you.
What is an agent?
Wrong question.
The right questions are: What's my context construction strategy? What's my reasoning loop topology? What's my state model? How do I handle coordination? How do I evaluate success?
These questions have answers. They lead to code. They compose into systems that actually work.
The word "agent" will keep being useful as a marketing term, as a shorthand, as a way to gesture at "AI that does stuff." That's fine. But when you're building, think in systems, not agents.