Policy Outputs
Privacy (PII)
- redacted_text - Text with the specified entity, regex, or blocked types redacted
- redacted_entities - Dict mapping each redacted type to a dict mapping each unique redaction placeholder to a list of the entities it replaced
- redacted_entity_positions - List of tuples, each containing a redaction placeholder and the span positions it occupies in the redacted text
Example:
{
  "redacted_entities": {
    "LOC": {
      "<LOC_1>": [
        "US"
      ]
    }
  },
  "redacted_entity_positions": [
    [
      "<LOC_1>",
      19,
      26
    ]
  ],
  "redacted_text": "Who is the current <LOC_1> president?"
}
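To make the mapping concrete, here is a minimal sketch of a consumer that restores the original entities from this output. The `unredact` helper is hypothetical (not part of the policy API) and assumes each placeholder replaced a single unique value, as in the example above.

```python
# Hypothetical helper: restore redaction placeholders using the policy output.
# Assumes each placeholder maps back to exactly one original entity.

def unredact(output: dict) -> str:
    text = output["redacted_text"]
    for entity_type, placeholders in output["redacted_entities"].items():
        for placeholder, originals in placeholders.items():
            # Substitute the placeholder back only when it replaced one unique value.
            if len(originals) == 1:
                text = text.replace(placeholder, originals[0])
    return text

output = {
    "redacted_entities": {"LOC": {"<LOC_1>": ["US"]}},
    "redacted_entity_positions": [["<LOC_1>", 19, 26]],
    "redacted_text": "Who is the current <LOC_1> president?",
}
print(unredact(output))  # Who is the current US president?
```

Note that the span `[19, 26]` locates `<LOC_1>` in `redacted_text`, which is why restoration by placeholder substitution is straightforward here.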
Toxicity
- classification - Classification (safe or unsafe)
- reason - Reason for the classification
Hallucination
- avg_entailment_probability - Entailment probability. Higher is better
RAG Hallucination
- retrieval_relevance - Probability representing how relevant the retrieved context is to the user prompt. Higher is better
- response_faithfulness - Probability representing how faithful the model response is to the retrieved context. Higher is better
- response_relevance - Probability representing how relevant the model response is to the user prompt. Higher is better
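Since all three scores are probabilities where higher is better, a caller can gate a RAG response on them jointly. This is a sketch, not part of the policy API: the 0.5 threshold and the example values are assumptions for illustration; only the field names come from the output above.

```python
# Minimal sketch of gating on the RAG hallucination scores. The 0.5
# threshold and the example score values are assumptions.

def rag_check_passes(result: dict, threshold: float = 0.5) -> bool:
    # Require every score to clear the threshold, since each probes a
    # different failure mode: bad retrieval, ungrounded answer, off-topic answer.
    scores = (
        result["retrieval_relevance"],
        result["response_faithfulness"],
        result["response_relevance"],
    )
    return all(score >= threshold for score in scores)

result = {
    "retrieval_relevance": 0.91,
    "response_faithfulness": 0.42,  # response not grounded in the context
    "response_relevance": 0.88,
}
print(rag_check_passes(result))  # False: faithfulness falls below the threshold
```

Requiring all three to pass is one reasonable design; a deployment could instead weight the scores or apply per-metric thresholds.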
Content (Alignment)
- guard_classification - Classification (safe/unsafe) the guardrail model gave to the query
- guard_rationale - Rationale for the classification
- violated - Boolean indicating whether this policy was violated
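A downstream service would typically branch on `violated` and keep the classification and rationale for logging. The helper below is a hypothetical consumer of this output; the example values are assumptions, only the field names come from the docs above.

```python
# Hypothetical consumer of the Content (Alignment) policy output.
# Field names match the docs; the example values are assumptions.

def summarize_alignment(result: dict) -> str:
    # `violated` drives the decision; classification and rationale are
    # surfaced for logging and human review.
    status = "VIOLATED" if result["violated"] else "OK"
    return f"{status}: {result['guard_classification']} - {result['guard_rationale']}"

result = {
    "guard_classification": "unsafe",
    "guard_rationale": "Query asks for instructions the policy disallows",
    "violated": True,
}
print(summarize_alignment(result))
```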