Skip to main content

How AgentWarden Works

AgentWarden uses one policy model across build-time review and runtime enforcement.

Build-time review helps teams understand what an agent can reach. Runtime integrations then send AgentWarden normalized events at the points where prompts, tools, tool outputs, or final responses cross a security boundary.

The Core Flow

Runtime event -> signal -> flag -> policy -> effect

StepMeaning
Runtime eventA prompt, tool request, tool output, or final response reaches a checkpoint.
SignalAgentWarden inspects the event for evidence, such as private data, external communication, prompt injection, or PII.
FlagA signal emits a named fact that policy can match.
PolicyA reviewed rule decides what should happen for that use case.
EffectAgentWarden returns the decision for the host application or runtime to enforce.

Runtime Boundaries

AgentWarden protects boundaries, not frameworks. A boundary is a point where data, instructions, or side effects can move from one trust zone into another.

BoundaryWhy it matters
Prompt inputEvaluates user input before the model receives it.
Tool requestEvaluates an action before the tool executes or changes state.
Tool outputEvaluates tool results before they become model context.
Final responseEvaluates the final answer before it reaches the user.

Prompt filtering alone is not enough. Unsafe instructions can arrive later, inside a tool result. Side effects also need a checkpoint before execution, because blocking the final response is too late if the tool already sent data, wrote to a system, or triggered an external action.

Pre-Action and Content Decisions

AgentWarden has two kinds of runtime decisions.

  • Pre-action decisions: run before a prompt or tool action continues. They can allow, block, request approval, or return a safe response.
  • Content decisions: run after content is produced but before it moves onward. They can allow, replace, block with feedback, or add context.

The host still owns the agent loop. AgentWarden evaluates the event, returns a policy decision, and records the evidence needed for review.

Build-Time and Runtime Work Together

Build-time review provides the evidence used to define reviewed policy: tool inventory, capability labels, use-case context, and observed trajectories. Runtime enforcement applies the reviewed policy and records what happened.

That creates a feedback loop:

Review tools and trajectories -> approve policy -> enforce at runtime -> use runtime evidence for future review

The same model works whether events come from an SDK inside a custom agent or from hooks inside a supported coding-agent runtime.

Next, see Build-Time Flow for how evaluation evidence becomes reviewed runtime policy.