Policy Outputs
Privacy (PII)
- redacted_text - Text with the specified entity, regex, or blocked types redacted
- redacted_entities - Dict mapping each redacted type to a dict mapping each unique redaction placeholder to a list of the entities it replaced
- redacted_entity_positions - List of tuples, each containing a redaction placeholder and the span positions it occupies in the redacted text
Example:
{
  "redacted_entities": {
    "LOC": {
      "<LOC_1>": [
        "US"
      ]
    }
  },
  "redacted_entity_positions": [
    [
      "<LOC_1>",
      19,
      26
    ]
  ],
  "redacted_text": "Who is the current <LOC_1> president?"
}
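To make the mapping concrete, here is a minimal sketch of a consumer that restores the original entities from this output. The `unredact` helper is hypothetical (not part of the policy API) and assumes each placeholder replaced a single unique value, as in the example above.

```python
# Hypothetical helper: restore redaction placeholders using the policy output.
# Assumes each placeholder maps back to exactly one original entity.

def unredact(output: dict) -> str:
    text = output["redacted_text"]
    for entity_type, placeholders in output["redacted_entities"].items():
        for placeholder, originals in placeholders.items():
            # Substitute the placeholder back only when it replaced one unique value.
            if len(originals) == 1:
                text = text.replace(placeholder, originals[0])
    return text

output = {
    "redacted_entities": {"LOC": {"<LOC_1>": ["US"]}},
    "redacted_entity_positions": [["<LOC_1>", 19, 26]],
    "redacted_text": "Who is the current <LOC_1> president?",
}
print(unredact(output))  # Who is the current US president?
```

Note that the span `[19, 26]` locates `<LOC_1>` in `redacted_text`, which is why restoration by placeholder substitution is straightforward here.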
Toxicity
- classification - Classification (safe or unsafe)
- reason - Reason for the classification
Hallucination
- avg_entailment_probability - Entailment probability. Higher is better
RAG Hallucination
- retrieval_relevance - Probability representing how relevant the retrieved context is to the user prompt. Higher is better
- response_faithfulness - Probability representing how faithful the model response is to the retrieved context. Higher is better
- response_relevance - Probability representing how relevant the model response is to the user prompt. Higher is better
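Since all three scores are probabilities where higher is better, a caller can gate a RAG response on them jointly. This is a sketch, not part of the policy API: the 0.5 threshold and the example values are assumptions for illustration; only the field names come from the output above.

```python
# Minimal sketch of gating on the RAG hallucination scores. The 0.5
# threshold and the example score values are assumptions.

def rag_check_passes(result: dict, threshold: float = 0.5) -> bool:
    # Require every score to clear the threshold, since each probes a
    # different failure mode: bad retrieval, ungrounded answer, off-topic answer.
    scores = (
        result["retrieval_relevance"],
        result["response_faithfulness"],
        result["response_relevance"],
    )
    return all(score >= threshold for score in scores)

result = {
    "retrieval_relevance": 0.91,
    "response_faithfulness": 0.42,  # response not grounded in the context
    "response_relevance": 0.88,
}
print(rag_check_passes(result))  # False: faithfulness falls below the threshold
```

Requiring all three to pass is one reasonable design; a deployment could instead weight the scores or apply per-metric thresholds.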
Content (Alignment)
- guard_classification - Classification (safe/unsafe) the guardrail model gave to the query
- guard_rationale - Rationale for the classification
- violated - Boolean indicating whether this policy was violated
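A downstream service would typically branch on `violated` and keep the classification and rationale for logging. The helper below is a hypothetical consumer of this output; the example values are assumptions, only the field names come from the docs above.

```python
# Hypothetical consumer of the Content (Alignment) policy output.
# Field names match the docs; the example values are assumptions.

def summarize_alignment(result: dict) -> str:
    # `violated` drives the decision; classification and rationale are
    # surfaced for logging and human review.
    status = "VIOLATED" if result["violated"] else "OK"
    return f"{status}: {result['guard_classification']} - {result['guard_rationale']}"

result = {
    "guard_classification": "unsafe",
    "guard_rationale": "Query asks for instructions the policy disallows",
    "violated": True,
}
print(summarize_alignment(result))
```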