How AgentWarden Works
AgentWarden uses one policy model across build-time review and runtime enforcement.
Build-time review helps teams understand what an agent can reach. Runtime integrations then send AgentWarden normalized events at the points where prompts, tools, tool outputs, or final responses cross a security boundary.
The Core Flow
Runtime event -> signal -> flag -> policy -> effect
| Step | Meaning |
|---|---|
| Runtime event | A prompt, tool request, tool output, or final response reaches a checkpoint. |
| Signal | AgentWarden inspects the event for evidence, such as private data, external communication, prompt injection, or PII. |
| Flag | A signal emits a named fact that policy can match. |
| Policy | A reviewed rule decides what should happen for that use case. |
| Effect | AgentWarden returns the decision for the host application or runtime to enforce. |
Runtime Boundaries
AgentWarden protects boundaries, not frameworks. A boundary is a point where data, instructions, or side effects can move from one trust zone into another.
| Boundary | Why it matters |
|---|---|
| Prompt input | Evaluates user input before the model receives it. |
| Tool request | Evaluates an action before the tool executes or changes state. |
| Tool output | Evaluates tool results before they become model context. |
| Final response | Evaluates the final answer before it reaches the user. |
Prompt filtering alone is not enough. Unsafe instructions can arrive later, inside a tool result. Side effects also need a checkpoint before execution, because blocking the final response is too late if the tool already sent data, wrote to a system, or triggered an external action.
Pre-Action and Content Decisions
AgentWarden has two kinds of runtime decisions.
- Pre-action decisions: run before a prompt or tool action continues. They can allow, block, request approval, or return a safe response.
- Content decisions: run after content is produced but before it moves onward. They can allow, replace, block with feedback, or add context.
The host still owns the agent loop. AgentWarden evaluates the event, returns a policy decision, and records the evidence needed for review.
Build-Time and Runtime Work Together
Build-time review provides the evidence used to define reviewed policy: tool inventory, capability labels, use-case context, and observed trajectories. Runtime enforcement applies the reviewed policy and records what happened.
That creates a feedback loop:
Review tools and trajectories -> approve policy -> enforce at runtime -> use runtime evidence for future review
The same model works whether events come from an SDK inside a custom agent or from hooks inside a supported coding-agent runtime.
Next, see Build-Time Flow for how evaluation evidence becomes reviewed runtime policy.