Explainer
What are AI agents, really?
An AI agent is software that takes a goal, decides what to do next, uses tools to do it, checks its work, and reports back. That is the whole idea. Everything else is implementation detail — and where most of the confusion lives.
The simplest possible definition
A chatbot answers a question. An agent works through a task. It might search, open files, write code, call APIs, drive a browser, send an email, or wait for an event before continuing. The interesting word is “decide”: the model is choosing the next step, not just generating text.
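The loop is easier to see in code than in prose. A minimal sketch, not any particular framework's API: `call_model` stands in for whatever LLM you use, the `search` tool is a hypothetical example, and the stub returns canned answers so the loop actually runs.

```python
# Minimal agent loop: goal in, decide -> act -> record, answer out.
def call_model(goal: str, history: list[dict]) -> dict:
    # Stand-in for a real LLM call. A real model reads the goal and the
    # history and decides the next action; this stub searches once, then stops.
    if not history:
        return {"tool": "search", "args": {"query": goal}}
    return {"tool": "done", "args": {"answer": history[-1]["result"]}}

TOOLS = {
    "search": lambda query: f"top result for {query!r}",  # hypothetical tool
}

def run_agent(goal: str, max_steps: int = 10) -> str:
    history: list[dict] = []
    for _ in range(max_steps):            # hard step limit, never "run forever"
        action = call_model(goal, history)
        if action["tool"] == "done":      # the model decides it is finished
            return action["args"]["answer"]
        result = TOOLS[action["tool"]](**action["args"])
        history.append({"action": action, "result": result})
    return "stopped: step limit reached"

print(run_agent("find the release date of the 3.0 SDK"))
```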
The four things every agent needs
- A goal. Stated clearly, with a definition of done. Vague goals produce vague agents.
- Tools. Concrete actions the agent can take outside the chat box — read a file, call an API, send a message, run a query. The smaller and sharper the tool list, the better the agent.
- Memory. Some way to keep track of what has already happened — short-term within a task, longer-term across tasks if relevant.
- Limits. Boundaries on what the agent can do alone, and where a human reviews. Agents that touch money, permissions, or production data without a human in the loop are how teams get into trouble.
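Concretely, the four pieces can be explicit fields rather than emergent properties. A sketch under assumed names (`AgentSpec` and `requires_approval` are illustrative, not a real framework's API):

```python
# The four ingredients as explicit fields. All names here are illustrative.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentSpec:
    goal: str                                    # stated, with a definition of done
    tools: dict[str, Callable]                   # small, sharp list of actions
    memory: list = field(default_factory=list)   # what has already happened
    max_steps: int = 15                          # limit: bounded autonomy
    requires_approval: set = field(default_factory=set)  # limit: human-gated tools

spec = AgentSpec(
    goal="Answer each open support ticket or escalate it",
    tools={
        "read_ticket": lambda tid: f"ticket {tid}: ...",      # hypothetical stubs
        "draft_reply": lambda tid, text: f"draft for {tid}",
        "send_reply": lambda tid, text: f"sent to {tid}",
    },
    requires_approval={"send_reply"},  # sending touches customers: human reviews
)
```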
The three flavors people confuse
“Agent” is a stretched word. Three different things sit underneath it:
- Workflows with model steps. A pipeline where one or more steps happen to be an LLM call. You write the control flow by hand. Most “agents in production” are actually this — and that is fine.
- Tool-using agents. A model picks tools and arguments inside a bounded loop. Good for structured tasks: filling forms from documents, triaging tickets, running a known sequence of API calls with judgment in between.
- Open-ended agents. A model picks what to do next with broad freedom. Browser agents, autonomous coding agents, computer-use agents. These are the demos. They are also the ones that fail over long horizons.
The mistake is to use “agent” to mean the third one when you actually want the first.
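The difference between the first two flavors is who owns the control flow. A sketch, with `llm` and `llm_choose_tool` as stand-ins for real model calls; the stubs return canned values so the example runs.

```python
def llm(prompt: str) -> str:
    # Stand-in for a plain completion call; returns canned text.
    return "billing" if "Categorize" in prompt else "Summary of the ticket."

def llm_choose_tool(task: str, history: list, tool_names: list) -> dict:
    # Stand-in for a tool-choice call; a real model would pick from tool_names.
    return {"tool": "finish", "output": "routed to billing"}

# Flavor 1: a workflow with model steps. You write the control flow;
# the model only fills in individual steps, in a fixed order.
def summarize_ticket(ticket: str) -> str:
    category = llm(f"Categorize this ticket: {ticket}")
    return llm(f"Summarize for the {category} queue: {ticket}")

# Flavor 2: a tool-using agent. The model picks the next tool each turn,
# but only from a bounded list, for a bounded number of turns.
def triage(ticket: str, tools: dict, max_turns: int = 8) -> str:
    history: list = []
    for _ in range(max_turns):
        choice = llm_choose_tool(ticket, history, list(tools))
        if choice["tool"] == "finish":
            return choice["output"]
        history.append(tools[choice["tool"]](**choice.get("args", {})))
    return "stopped: turn limit reached"

print(summarize_ticket("I was charged twice this month"))
print(triage("I was charged twice this month", tools={}))
```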
What 2026 changed
Three things finally lined up. Frontier models got reliably good at multi-step planning. Tool use, structured output, and long-context support all matured. And inference got cheap enough that an agent run that cost $4 a year ago now costs cents. Result: the question changed from “can an agent do this?” to “is the agent cheaper, faster, or more accurate than what we do today?”
What works in practice today
- Customer-support triage with a human approving sends.
- Internal knowledge agents over the company wiki and codebase.
- Coding agents on bounded tickets with tests.
- Sales-ops automation: lead enrichment, meeting notes, CRM hygiene.
Where the hype goes wrong
Three places, repeatedly:
- “Autonomous employee.” Long-horizon open-ended agents drift after twenty steps. They are good at sprints, not marathons.
- Multi-agent debate. Two agents arguing with each other looks impressive in a demo and burns tokens in production. Specialist sub-agents called by an orchestrator work; round-table debate mostly does not.
- “It will just figure it out.” The teams that ship cleanly bound the tool list, force structured output, write evals before they write the prompt, and put a human between the agent and any irreversible action.
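The last point is mechanical, not philosophical. A sketch of a tool allowlist, forced structured output, and a human gate on the irreversible step; the tool names and the action schema are hypothetical examples:

```python
import json

ALLOWED = {"lookup_order", "draft_refund", "send_refund"}   # bounded tool list
IRREVERSIBLE = {"send_refund"}                              # needs a human

def parse_action(raw: str) -> dict:
    # Structured output, enforced: non-JSON or unknown tools are rejected,
    # not "interpreted".
    action = json.loads(raw)
    if action.get("tool") not in ALLOWED:
        raise ValueError(f"tool not allowed: {action.get('tool')!r}")
    return action

def execute(action: dict, tools: dict) -> str:
    if action["tool"] in IRREVERSIBLE:
        # A human between the agent and anything it cannot undo.
        ok = input(f"Approve {action['tool']} {action.get('args')}? [y/N] ")
        if ok.strip().lower() != "y":
            return "rejected by reviewer"
    return tools[action["tool"]](**action.get("args", {}))

tools = {"lookup_order": lambda order_id: f"order {order_id}: $42.00"}
raw = '{"tool": "lookup_order", "args": {"order_id": "A1"}}'
print(execute(parse_action(raw), tools))
```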
How to start, today
- Pick one workflow with a clear time or cost metric. The more boring the better.
- List the smallest set of tools the agent needs. Resist adding more.
- Write five test cases before you write the prompt (a sketch follows this list).
- Build the simplest version. Run it behind a human approval step for two weeks.
- Look at the failures. They tell you whether to expand the agent or shrink it.
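The test-case step can be this small. A sketch for a hypothetical ticket-triage agent: the cases and labels are invented for illustration, `run_agent` is whatever you end up building, and the keyword matcher is only a day-one placeholder baseline.

```python
# Five test cases before any prompt exists. Inputs and labels are hypothetical.
CASES = [
    ("I was charged twice this month", "billing"),
    ("The app crashes when I log in", "bug"),
    ("How do I export my data?", "how-to"),
    ("Please cancel my subscription", "billing"),
    ("Feature request: dark mode", "feedback"),
]

def evaluate(run_agent) -> float:
    passed = sum(run_agent(text) == label for text, label in CASES)
    print(f"{passed}/{len(CASES)} cases passed")
    return passed / len(CASES)

# Placeholder baseline so the harness runs on day one. Swap in the real
# agent and rerun after every prompt or tool change.
baseline = lambda text: "billing" if "charge" in text or "cancel" in text else "bug"
evaluate(baseline)
```

Rerunning this after each change is what tells you, per the last step, whether to expand the agent or shrink it.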