How AgentWarden Works

AgentWarden uses one policy model across build-time review and runtime enforcement.

Build-time review helps teams understand what an agent can reach. Runtime integrations then send AgentWarden normalized events at the points where prompts, tools, tool outputs, or final responses cross a security boundary.

The Core Flow

Runtime event -> signal -> flag -> policy -> effect

Step	Meaning
Runtime event	A prompt, tool request, tool output, or final response reaches a checkpoint.
Signal	AgentWarden inspects the event for evidence, such as private data, external communication, prompt injection, or PII.
Flag	A signal emits a named fact that policy can match.
Policy	A reviewed rule decides what should happen for that use case.
Effect	AgentWarden returns the decision for the host application or runtime to enforce.

Runtime Boundaries

AgentWarden protects boundaries, not frameworks. A boundary is a point where data, instructions, or side effects can move from one trust zone into another.

Boundary	Why it matters
Prompt input	Evaluates user input before the model receives it.
Tool request	Evaluates an action before the tool executes or changes state.
Tool output	Evaluates tool results before they become model context.
Final response	Evaluates the final answer before it reaches the user.

Prompt filtering alone is not enough. Unsafe instructions can arrive later, inside a tool result. Side effects also need a checkpoint before execution, because blocking the final response is too late if the tool already sent data, wrote to a system, or triggered an external action.

Pre-Action and Content Decisions

AgentWarden has two kinds of runtime decisions.

Pre-action decisions: run before a prompt or tool action continues. They can allow, block, request approval, or return a safe response.
Content decisions: run after content is produced but before it moves onward. They can allow, replace, block with feedback, or add context.

The host still owns the agent loop. AgentWarden evaluates the event, returns a policy decision, and records the evidence needed for review.

Build-Time and Runtime Work Together

Build-time review provides the evidence used to define reviewed policy: tool inventory, capability labels, use-case context, and observed trajectories. Runtime enforcement applies the reviewed policy and records what happened.

That creates a feedback loop:

Review tools and trajectories -> approve policy -> enforce at runtime -> use runtime evidence for future review

The same model works whether events come from an SDK inside a custom agent or from hooks inside a supported coding-agent runtime.

Next, see Build-Time Flow for how evaluation evidence becomes reviewed runtime policy.

The Core Flow​

Runtime Boundaries​

Pre-Action and Content Decisions​

Build-Time and Runtime Work Together​

The Core Flow

Runtime Boundaries

Pre-Action and Content Decisions

Build-Time and Runtime Work Together