Two Ways to Give an Agent Capabilities — And Why the Choice Isn't Settled

Henrik Holst
Apr 19
6 min read

Every team shipping AI agents right now quietly hits the same fork in the road, usually a few weeks into the work.

You have a capability. Maybe it's your product's API, maybe it's an internal service, maybe it's a third-party integration. You want an agent to be able to use it. You have two credible paths, and they point in genuinely different directions.

Path A: expose it as a tool-calling surface. The Model Context Protocol (MCP) is the current incarnation. The agent's host — Claude Code, Codex, Claude Desktop, any MCP-aware client — discovers your operations as a typed menu. The LLM picks one, fills in the arguments, the host wraps the call in JSON-RPC, your server runs it, the result flows back into the LLM's context.

Path B: expose it as an HTTP API and let the agent call it. Write the OpenAPI spec. Stand up the service. Hand the agent a shell tool or a Python tool, and let it curl or requests.post(...) the endpoints directly. This is how every other piece of software integrates with every other piece of software, and it has worked for thirty years.

Both paths lead somewhere. They do not lead to the same place. Teams keep picking one without fully seeing what they traded away.

The tool-calling surface

What it buys you is real:

Typed argument schemas per operation. The LLM receives a JSON Schema for every tool. The host validates arguments before dispatch. For narrow, well-shaped operations, this drops the error rate materially compared to asking the model to hand-roll a request body from a spec.
Discovery as a protocol feature. `tools/list` tells the client what exists. The agent doesn't have to be told — it asks.
Auditability at the host layer. Every tool call shows up in the client's UI as a discrete, user-visible event with its arguments. A user can watch, approve, or block specific operations. This is a product-grade experience, not a debugging affordance.
Auth isolation. Credentials live in the server process, not the model's context. Secrets never appear in tokens the LLM emits or consumes.
Cross-client portability. The same server works across every MCP-aware host. Write once, consume from many.

What it costs you, and this part is less advertised:

The LLM composes arguments in its output stream, token by token. Large payloads get expensive and unreliable. The protocol is built around "the model says the arguments."
Tools are peers, not composable primitives. One tool cannot invoke another. The LLM is the dispatcher, and composition happens only by the LLM deciding to call a second tool after seeing the result of the first. Scripts the agent writes cannot reach into the MCP surface from inside their execution.
The ceiling on data volume is a protocol ceiling. Anything that crosses the MCP channel is UTF-8 JSON-RPC. Binary or bulk data has to be base64'd or referenced out-of-band via shared storage. There is no mode within the channel that avoids this.

The HTTP surface

What it buys you is also real:

Universality. Every language has a client. Every agent framework has a way to make an HTTP call. No protocol adoption required — HTTP is the adoption.
Free composition. A Python tool the agent invokes can chain ten calls, branch on responses, handle retries, transform payloads — all inside one execution. The LLM emits code, not a sequence of dispatches. For agents that think in code (an emerging and powerful pattern), this is the native shape.
No ceiling imposed by the integration layer. Streaming responses, binary uploads, unusual content types — the HTTP stack already handles them. You don't have to design around a protocol's data model.
Observability infrastructure already exists. Every tool for watching HTTP traffic — proxies, tracing, logging, rate limiters — works unchanged. You're not starting over.

What it costs you:

No typed tool surface for the LLM. The model either reads the OpenAPI spec at runtime (token-expensive, inconsistent across sessions) or operates from documentation you wrote (staleness, drift). Argument validation happens at the server boundary, not before the call leaves the model.
No host-level audit. HTTP calls buried inside a shell or Python tool invocation are opaque to the client UI. Users see "the agent ran some Python," not "the agent called DELETE /invoices/42." Approval-per-call is not available.
Discovery is your problem. The agent either already knows the API or doesn't. There is no tools/list equivalent built in. If you want discovery, you build it.
Auth becomes the agent's problem. Tokens, refresh, secret injection — if the agent is making the HTTP call, it needs the credentials somewhere in its context or its environment. The boundary is less clean.

The deeper divide: what is an agent?

This is the question that actually sits underneath the surface choice, and most teams don't ask it out loud.

There are two working mental models, both with real products behind them:

Agent as LLM-plus-tools-plus-loop. The LLM is a decision-maker choosing among a small, curated menu of operations. Each turn, it picks one, executes it, observes the result, decides the next. The primitive is the tool call. This model excels at interactive, user-facing tasks where the user wants to see what the agent is about to do and has the option to intervene. Tool-calling surfaces fit this shape natively.

Agent as LLM-that-writes-code. The LLM is a programmer whose output artifact is an orchestration script. Capabilities are libraries. The "agent" is the code it emits plus the sandbox that runs it. This model excels at batch jobs, multi-step workflows, and anything where the LLM needs to loop, branch, or handle data at volume. HTTP APIs plus a code execution tool fit this shape natively.

Neither is a subset of the other. A coding agent that dispatches individual tool calls for each line of work is painfully slow and expensive. A user-facing agent that writes a Python script to edit someone's calendar has surrendered the approval loop that made the product tolerable in the first place.

How to actually decide

Stop framing it as "MCP vs REST" and frame it as "what shape of agent am I building, and who watches it work?"

Signals pointing toward a tool-calling surface:

End users interact with the agent turn by turn and want to see each action.
The API is small-to-medium (roughly 10–50 operations), and each operation is discrete.
Argument correctness matters and errors are expensive or irreversible.
The integration needs to work across multiple agent hosts.

Signals pointing toward an HTTP surface consumed by code:

The agent runs unattended, in a background or batch context.
Work involves composing many calls, handling large data, or branching on response content.
The API is already shipped for human consumers — an OpenAPI spec and an HTTP endpoint are the primary artifact.
Users interact with the outcome, not the sequence of calls.

And the answer for most non-trivial products is not one or the other. It is both. The same capability gets exposed twice — as a typed tool surface for the interactive dispatch model, and as an HTTP API for the code-generating model — from the same underlying implementation. The cost of doing both has been collapsing as tooling matures.

The unresolved part

Three things are genuinely unsettled, and the answers will shape which surface dominates over the next few years:

How good will LLMs get at reading an OpenAPI spec and writing correct orchestration code in one shot? If they get very good, the case for a typed tool surface weakens — the LLM becomes its own integration layer. If they plateau, typed surfaces remain the reliability floor.
How much value does host-level auditing hold as agents take on higher-stakes work?The "approve each call" UX is a tool-calling feature. If regulatory and trust pressures push agents toward reviewable action traces, tool-calling surfaces gain structural advantage. If agents move toward background autonomy, the auditing story matters less.
Does the MCP ecosystem reach the scale where adoption is self-reinforcing? Protocols succeed when the network effect tips. HTTP's network effect is already tipped. MCP's is forming. The next two years will tell.

The honest answer to "which surface should I build for?" is that both will exist, neither will win outright, and the interesting engineering question is which shape of agent you are actually trying to ship. Pick that first. The surface decision follows.

The tool-calling surface

The HTTP surface

The deeper divide: what is an agent?

How to actually decide

The unresolved part

Comments