What an AI Agent Actually Is

Strip away the marketing and an agent is four parts — a driver, tools, an environment, and a loop. Name the broken part and the fix is usually obvious.

The word “agent” has gotten away from us. It now stretches from a single prompt-plus-tool-call demo all the way to fully autonomous systems running unattended for hours. That’s a wide enough range to be useless as a definition, so when I talk about agents with other engineers I find it more productive to break the thing into its actual moving parts.

Strip away the marketing and an AI agent is four things glued together: a driver, a set of tools, an environment, and a loop. Once you see it that way, the architecture of any given agent — from a Claude Code session to a cron-driven scraper to a phone-based assistant — becomes a lot easier to reason about.

The driver

The driver is what generates content and decides what to do next. In 2026 this is almost always an LLM, but it doesn’t have to be — vision-language models, multimodal models, and even non-language models can play this role for the right task. Anthropic’s definition of an agentic system draws a useful line here: a workflow orchestrates LLMs and tools through predefined code paths, while an agent is a system where LLMs “dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks” (Anthropic 2024).

That dynamic control is the whole point. The driver isn’t just a content generator — it’s the thing choosing which tool to call, how to interpret the result, and when to stop. Everything else in the system exists to give the driver useful options and a faithful view of the world.
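To make the workflow/agent line concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration rather than anything from Anthropic's post: the two toy tools, the `workflow` and `agent` functions, and `fake_driver`, which stands in for a real model call.

```python
from typing import Callable, Optional

# Hypothetical toy tools, used only for illustration.
def summarize(text: str) -> str:
    return text.split(".")[0] + "."

def shout(text: str) -> str:
    return text.upper()

TOOLS: dict[str, Callable[[str], str]] = {"summarize": summarize, "shout": shout}

def workflow(text: str) -> str:
    """Workflow: the programmer fixed the code path in advance."""
    return shout(summarize(text))

def fake_driver(text: str, history: list[str]) -> Optional[str]:
    """Stand-in for an LLM: names the next tool, or returns None to stop."""
    if len(text) > 40 and "summarize" not in history:
        return "summarize"
    if "shout" not in history:
        return "shout"
    return None

def agent(text: str, driver=fake_driver) -> tuple[str, list[str]]:
    """Agent: the driver decides which tool runs next and when to stop."""
    history: list[str] = []
    while (choice := driver(text, history)) is not None:
        text = TOOLS[choice](text)
        history.append(choice)
    return text, history
```

The two functions can produce the same output on a given input. The difference is who chose the path: `workflow` always summarizes and then shouts, while `agent` skips the summary when the driver judges the input short enough.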

The tools

Tools are how the driver acts on anything outside its own token stream. They come in a few flavors:

MCP servers — the open protocol Anthropic introduced in November 2024 for connecting LLM applications to external data and capabilities (Model Context Protocol n.d.). MCP has since been adopted by OpenAI, Google DeepMind, and most of the major coding-agent vendors, and there are now thousands of community-built servers. The model talks to a standard interface; the server handles the actual integration with Slack, Postgres, GitHub, or whatever else.
External APIs — any HTTP endpoint the agent can hit directly, with or without an MCP wrapper.
Custom code — local functions, shell commands, file operations. Coding agents in particular live and die by this category.

Coding assistants like Claude Code (Anthropic n.d.) and OpenCode (OpenCode n.d.) are useful examples because they bundle all three. Out of the box they ship a curated tool set (file reads and edits, shell execution, search), and they let you plug in additional MCP servers on top (Model Context Protocol n.d.). OpenCode in particular makes a point of being provider-agnostic: you can drive it with Claude, GPT, Gemini, or local models, which is a nice illustration of how cleanly the driver separates from the tools in practice.
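One way to picture a tool surface is as a registry of named, described callables behind a single dispatch function. The sketch below covers only the "custom code" flavor, and every name in it (`Tool`, `REGISTRY`, `execute`, the action dictionary shape) is an assumption made for illustration, not the API of Claude Code, OpenCode, or MCP.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Tool:
    name: str
    description: str          # the text the driver would see in its context
    run: Callable[..., str]

def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()

def word_count(text: str) -> str:
    return str(len(text.split()))

# Hypothetical registry; an MCP server or HTTP API could sit behind `run` too.
REGISTRY = {t.name: t for t in [
    Tool("read_file", "Read a UTF-8 text file and return its contents.", read_file),
    Tool("word_count", "Count whitespace-separated words in a string.", word_count),
]}

def execute(action: dict) -> str:
    """Dispatch an action shaped like {"tool": "word_count", "args": {...}}."""
    tool = REGISTRY.get(action["tool"])
    if tool is None:
        return f"error: unknown tool {action['tool']!r}"
    try:
        return tool.run(**action["args"])
    except Exception as exc:  # hand errors back as observations, do not crash
        return f"error: {exc}"
```

Returning errors as strings is a deliberate choice in this sketch: the driver gets a chance to read the failure and try something else, which is usually more useful than an exception that kills the run.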

The interesting design question with tools isn’t “what should the agent be able to do” but “what should the agent see.” Tool descriptions live in the context window and cost real tokens. Anthropic’s post on code execution with MCP makes the point concrete: in its worked example, letting the agent load tool definitions on demand instead of all upfront cut usage from 150,000 tokens to about 2,000, a 98.7% reduction (Anthropic 2025). The shape of your tool surface is a first-class design concern, not an afterthought.
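The arithmetic behind that figure, and the general idea of loading definitions on demand, can be sketched in a few lines. This is a simplification and not the mechanism Anthropic's post describes. The catalog entries are made up, and `estimate_tokens` uses a crude four-characters-per-token guess that is an assumption, not a real tokenizer.

```python
# Hypothetical catalog: names are short, full definitions are long.
CATALOG = {
    "slack_post": "Post a message to a Slack channel. Arguments: channel (str), "
                  "text (str), thread_ts (optional str). Returns the message id.",
    "pg_query": "Run a read-only SQL query against Postgres. Arguments: sql (str), "
                "params (list). Returns rows as a list of dicts.",
    "gh_issue": "Create a GitHub issue. Arguments: repo (str), title (str), "
                "body (str), labels (list of str). Returns the issue URL.",
}

def estimate_tokens(text: str) -> int:
    """Very rough proxy (assumption): about four characters per token."""
    return max(1, len(text) // 4)

def upfront_cost() -> int:
    """Every full definition sits in context from the start."""
    return sum(estimate_tokens(f"{name}: {desc}") for name, desc in CATALOG.items())

def on_demand_cost(needed: list[str]) -> int:
    """Only names are listed up front; definitions load when needed."""
    index = sum(estimate_tokens(name) for name in CATALOG)
    loaded = sum(estimate_tokens(CATALOG[name]) for name in needed)
    return index + loaded

def reduction_percent(before: int, after: int) -> float:
    return round(100 * (before - after) / before, 1)

# The figures quoted from the post: 150,000 tokens down to about 2,000.
quoted = reduction_percent(150_000, 2_000)
```

With three tools the saving is small in absolute terms. The gap widens as the catalog grows, because the upfront cost scales with every definition while the on-demand cost scales with the ones the task actually uses.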

The environment

An agent runs somewhere. That somewhere matters more than people give it credit for, because it determines what the agent can touch, what it costs to run, and what happens when something goes wrong.

A few common shapes:

Locally, on a developer’s machine — the default for coding agents. Cheap, fast, easy to debug, but tied to that one machine.
In a Kubernetes cluster or on a VM — the default for backend agents that need to run on a schedule, persist state, or serve multiple users.
In a sandbox — short-lived containerized environments for untrusted code execution. Increasingly the default for “let the agent run arbitrary commands” workflows.
On a phone or in a browser — the emerging frontier, with all the constraints that implies (intermittent connectivity, battery, narrow tool access).

The environment shapes the failure modes. A local coding agent that gets confused can be killed with Ctrl-C. A scheduled agent running unattended in a VM needs real guardrails, retry logic, and observability, because nobody is watching the terminal when it goes off the rails. Anthropic’s “Building Effective Agents” post argues for exactly this: pause for human feedback at checkpoints or when the agent hits a blocker, and put guardrails and sandboxed testing in place before you hand it autonomy (Anthropic 2024). The right checkpoint design depends entirely on where the agent lives.
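As a rough illustration of what retry logic and checkpoints might look like for an unattended agent, here is a small sketch. The helper names, the backoff schedule, and the set of risky action names are all assumptions chosen for the example. A real deployment would add logging and metrics on top.

```python
import time
from typing import Callable

def with_retries(fn: Callable[[], str], attempts: int = 3,
                 base_delay: float = 0.5, sleep=time.sleep) -> str:
    """Retry a flaky step with exponential backoff, then fail loudly."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts - 1:
                raise RuntimeError(f"gave up after {attempts} attempts") from exc
            sleep(base_delay * 2 ** attempt)   # 0.5s, 1s, 2s, ...

# Hypothetical action names that should not run without a person signing off.
RISKY = {"delete", "deploy", "send_email"}

def needs_human(action: str, unattended: bool) -> bool:
    """Checkpoint policy: unattended runs pause before risky actions."""
    return unattended and action in RISKY
```

The `unattended` flag is where the environment enters the policy. On a laptop a person is already watching, so the same action can proceed. On a scheduled VM run it waits for approval.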

The loop

This is the part most people gloss over, and it’s where the actual agent lives.

A loop is just iteration, and we’ve been writing loops forever. A for loop is a loop. What makes an agent loop different is what triggers it and what runs inside it. The trigger is usually one of:

A prompt: a user (or another agent) sends a prompt, the driver reasons and acts until the task is done, then the loop waits for the next prompt. This is the shape of every interactive coding agent.
An event: something happens in the world — a webhook fires, a file changes, a metric crosses a threshold — and the agent wakes up to handle it. This is the shape of most production automation.
A scheduler: cron or its equivalent fires every N minutes and kicks off a run. This is the shape of monitoring and maintenance agents.
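The three triggers differ in where the work comes from, not in what happens next. A small sketch of that idea follows, assuming a hypothetical `Trigger` record and an in-memory queue. `run_agent` is a placeholder for the driver loop, and the task wording is invented.

```python
import queue
from dataclasses import dataclass

@dataclass
class Trigger:
    kind: str      # "prompt", "event", or "schedule"
    payload: str

def run_agent(task: str) -> str:
    """Placeholder for the driver loop; here it only echoes the task."""
    return f"handled: {task}"

def to_task(trigger: Trigger) -> str:
    """Normalize any trigger into a task description for the driver."""
    if trigger.kind == "prompt":
        return trigger.payload
    if trigger.kind == "event":
        return f"Investigate event: {trigger.payload}"
    if trigger.kind == "schedule":
        return f"Run scheduled check: {trigger.payload}"
    raise ValueError(f"unknown trigger kind: {trigger.kind!r}")

def drain(inbox: queue.Queue) -> list[str]:
    """Process every pending trigger through the same agent entry point."""
    results = []
    while True:
        try:
            trigger = inbox.get_nowait()
        except queue.Empty:
            return results
        results.append(run_agent(to_task(trigger)))
```

In practice the queue would be fed by a chat UI, a webhook handler, or cron. The point of the sketch is that all three can share one entry point.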

Inside any of those triggers, the loop body usually looks like some variant of the ReAct pattern (Yao et al. 2022): the driver produces a thought, picks an action (a tool call), observes the result, and feeds the observation back into its context for the next iteration. It continues until it decides the task is done — or until something external stops it.

In rough pseudocode:

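while True:
    prompt = wait_for_trigger()      # prompt, event, or scheduler tick
    context = [prompt]
    while True:
        thought, action = driver.step(context)
        context.append(thought)
        if action is None:           # assumed convention: no action means done
            break
        observation = tools.execute(action)
        context.extend([action, observation])
    respond(context)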

That’s the entire pattern. Everything else — multi-agent orchestration, planning, evaluator/optimizer loops, the whole menagerie of patterns Anthropic catalogs in their agents post (Anthropic 2024) — is a variation on the shape of that loop or a way of composing multiple loops together.
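For readers who want to run the pattern, here is a toy version of the inner loop with a scripted stand-in for the model. The `scripted_driver`, the one-tool `TOOLS` table, the string-prefixed context entries, and the "action of `None` means done" convention are all assumptions made for this sketch. A real driver would be a model call.

```python
from typing import Callable, Optional

def calculator(expression: str) -> str:
    """Toy tool: adds integers written as 'a + b + c'."""
    return str(sum(int(part) for part in expression.split("+")))

TOOLS: dict[str, Callable[[str], str]] = {"calculator": calculator}

Action = tuple[str, str]   # (tool name, argument)

def scripted_driver(context: list[str]) -> tuple[str, Optional[Action]]:
    """Stand-in for a model. Returns (thought, action); None action means done."""
    observations = [c for c in context if c.startswith("observation: ")]
    if not observations:
        return "I should add the numbers.", ("calculator", "2 + 3")
    answer = observations[-1].split(": ", 1)[1]
    return f"The answer is {answer}.", None

def run(prompt: str, driver=scripted_driver, max_steps: int = 5) -> list[str]:
    context = [f"prompt: {prompt}"]
    for _ in range(max_steps):          # external stop if the driver never finishes
        thought, action = driver(context)
        context.append(f"thought: {thought}")
        if action is None:
            break
        name, arg = action
        context.append(f"action: {name}({arg})")
        context.append(f"observation: {TOOLS[name](arg)}")
    return context
```

Calling `run("What is 2 + 3?")` yields a five-entry transcript: the prompt, a thought, the calculator action, its observation, and a final thought containing the answer. The `max_steps` cap is the "something external stops it" case.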

Why the decomposition matters

The four-part frame is useful because it tells you where to look when something is wrong:

Bad outputs? That’s usually the driver. Different model, different prompt, different reasoning strategy.
Can’t get the work done? That’s usually the tools. Wrong tool surface, missing capability, descriptions that don’t tell the model what it needs.
Flaky, slow, or expensive? That’s usually the environment. Wrong place to run this thing, or the wrong observability story.
Goes off the rails? That’s usually the loop. No stopping condition, no checkpoints, no guardrails, or context that accumulates until the driver loses track of what it was doing.
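For the accumulating-context failure in particular, one simple mitigation is to pin the original task and drop the oldest intermediate entries once the context passes a budget. The sketch below counts entries rather than tokens, and `trim_context` is a hypothetical helper. Summarizing old entries instead of dropping them is another common option.

```python
def trim_context(context: list[str], max_items: int) -> list[str]:
    """Keep the first entry (the task) plus the most recent entries."""
    if max_items < 2:
        raise ValueError("max_items must leave room for the task and one entry")
    if len(context) <= max_items:
        return list(context)
    return [context[0]] + context[-(max_items - 1):]
```

Keeping the first entry pinned means the driver always has the original task in view, however long the run gets.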

Most “the agent is broken” complaints I hear from other engineers are really complaints about one of these four pieces, dressed up as a complaint about agents in general. Once you can name which piece, the fix tends to be obvious.

Agents aren’t magic, and they aren’t a single thing. They’re a driver, a set of tools, an environment, and a loop. Build each one deliberately and the system mostly takes care of itself.

References

Anthropic. n.d. “Claude Code.” Anthropic. Accessed May 15, 2026. https://www.anthropic.com/claude-code.
Anthropic. 2024. “Building Effective Agents.” Anthropic. https://www.anthropic.com/engineering/building-effective-agents.
Anthropic. 2025. “Code Execution with MCP: Building More Efficient Agents.” Anthropic. https://www.anthropic.com/engineering/code-execution-with-mcp.
Model Context Protocol. n.d. “What Is the Model Context Protocol (MCP)?” Accessed May 15, 2026. https://modelcontextprotocol.io/.
OpenCode. n.d. “OpenCode.” Accessed May 15, 2026. https://opencode.ai/.
Yao, Shunyu, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. 2022. “ReAct: Synergizing Reasoning and Acting in Language Models.” arXiv. https://arxiv.org/abs/2210.03629.