Application Metrics - DynamoGuard API Metrics Documentation
This document describes the Application Level OpenTelemetry metrics captured by the DynamoGuard API. These metrics are used to monitor the performance and latency of various guardrail operations.
Note: These metrics are collected by the opentelemetry-collector-application deployment, which can export them to any desired backend (Prometheus in the DynamoAI package).
⚠️ Deprecation Notice: These Application Level metrics will be deprecated in the 3.25 minor release. As part of moving away from MongoDB for moderation logs and storing them in PostgreSQL DB instead, we are also removing Prometheus dependency from DynamoGuard. These metrics may be reintroduced in future releases.
Overview
The DynamoGuard API captures six key metrics:
Latency Metrics (Histograms):
- Input Guardrail Latency (`dynamoguard_input_guardrail_latency`)
- LLM Completion Latency (`dynamoguard_llm_completion_latency`)
- Output Guardrail Latency (`dynamoguard_output_guardrail_latency`)
- End-to-End Latency (`dynamoguard_e2e_latency`)
Request Metrics (Counters):
- Successful Requests (`dynamoguard_successful_requests`)
- Failed Requests (`dynamoguard_failed_requests`)
All latency metrics are measured in milliseconds (ms).
Understanding Latency Metrics
The four latency metrics measure different phases of a request lifecycle. Understanding their relationship helps you identify where time is being spent in your guardrail pipeline.
Request Flow Timeline
For a typical request to /moderation/model/:modelId/chat/:session_id, the metrics capture time in this sequence:
Request Start
│
├─► [Input Guardrail Latency] ──► Input analysis & policy evaluation
│
├─► [LLM Completion Latency] ────► LLM generates response (if input not blocked)
│
├─► [Output Guardrail Latency] ──► Output analysis & policy evaluation
│
└─► [End-to-End Latency] ────────► Total request time (includes all above + overhead)
Metric Comparison
| Metric | What It Measures | What It Excludes | When It's Captured |
|---|---|---|---|
| Input Guardrail Latency | Time to analyze and apply policies on user input | LLM processing, output analysis, request overhead | During input moderation phase |
| LLM Completion Latency | Time for LLM to generate response | Input guardrails, output guardrails, request overhead | Only when input is not blocked |
| Output Guardrail Latency | Time to analyze and apply policies on LLM output | Input guardrails, LLM processing, request overhead | During output moderation phase |
| End-to-End Latency | Total request time from start to finish | Nothing - includes everything | Always captured |
Key Relationships
- End-to-End Latency ≥ Input Guardrail Latency + LLM Completion Latency + Output Guardrail Latency
- The End-to-End Latency includes all processing phases plus any additional overhead (network, serialization, etc.)
- LLM Completion Latency is only measured when the input guardrail doesn't block the request
- For `/moderation/analyze`, only Input or Output Guardrail Latency is captured (depending on `textType`), not both
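To make these relationships concrete, here is a small sketch that decomposes an end-to-end measurement into the three phase latencies plus overhead. The numbers are hypothetical, purely for illustration:

```python
# Hypothetical latency samples (ms) for a single chat request.
input_guardrail_ms = 42.0
llm_completion_ms = 910.0
output_guardrail_ms = 55.0
e2e_ms = 1031.5

# End-to-end latency covers all three phases plus request overhead
# (network, serialization, etc.), so the remainder is that overhead.
phase_total_ms = input_guardrail_ms + llm_completion_ms + output_guardrail_ms
overhead_ms = e2e_ms - phase_total_ms

assert e2e_ms >= phase_total_ms  # the key invariant from this section
print(f"overhead: {overhead_ms:.1f} ms")  # overhead: 24.5 ms
```

If the gap between `dynamoguard_e2e_latency` and the sum of the phase histograms grows, the extra time is being spent outside the guardrail/LLM phases.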
Metric Details
1. Input Guardrail Latency
Metric Name: dynamoguard_input_guardrail_latency
Type: Histogram
Unit: milliseconds (ms)
Description: Measures the time taken to analyze and apply guardrail policies on model input (user prompts/messages).
What it measures:
- The duration from the start of content moderation until completion
- Includes policy evaluation, content analysis, and decision-making for input text
- Covers the entire content moderation process for input text
APIs where it's captured:
- POST `/moderation/analyze`
  - Condition: Only when analyzing model input (when `textType` is `MODEL_INPUT`)
  - Measurement: Time from start of content moderation until completion
  - Labels/Attributes: None
- POST `/moderation/model/:modelId/chat/:session_id`
  - Condition: Always captured for input analysis
  - Measurement: Time from start of input moderation (including RAG context retrieval for RAG models) until completion
  - Labels/Attributes: `modelId` (the ID of the model being used)
  - Note: Includes time for RAG context retrieval if the model is a custom-rag model
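The measurement pattern described above can be sketched as follows. This is not DynamoGuard's actual implementation; the `Histogram` class and `moderate_input` function are hypothetical stand-ins for the OpenTelemetry histogram instrument and the real policy-evaluation code:

```python
import time

class Histogram:
    """Hypothetical in-memory stand-in for an OpenTelemetry histogram."""
    def __init__(self, name, unit="ms"):
        self.name, self.unit, self.samples = name, unit, []

    def record(self, value, attributes=None):
        self.samples.append((value, attributes or {}))

input_guardrail_latency = Histogram("dynamoguard_input_guardrail_latency")

def moderate_input(text, model_id):
    # Hypothetical guardrail evaluation standing in for policy checks.
    start = time.perf_counter()
    action = "BLOCK" if "forbidden" in text else "ALLOW"
    # Record the elapsed time in milliseconds, tagged with modelId
    # as on the chat endpoint (the analyze endpoint records no labels).
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    input_guardrail_latency.record(elapsed_ms, {"modelId": model_id})
    return action

moderate_input("hello", "model-123")
```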
2. LLM Completion Latency
Metric Name: dynamoguard_llm_completion_latency
Type: Histogram
Unit: milliseconds (ms)
Description: Measures the time taken for the LLM to generate a response after receiving the (potentially sanitized) input.
What it measures:
- The duration from initiating the LLM chat call until receiving the response
- Includes network latency to the LLM provider and model inference time
- Does NOT include input guardrail processing or output guardrail processing
APIs where it's captured:
- POST `/moderation/model/:modelId/chat/:session_id`
  - Condition: Always captured when the input is not blocked
  - Measurement: Time from initiating the LLM request until response is received
  - Labels/Attributes: `modelId` (the ID of the model being used)
  - Note: Only measured if input analysis does not result in a `BLOCK` action
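The `BLOCK` condition means this histogram has fewer samples than the others whenever inputs are being blocked. A hedged sketch of the control flow (`guarded_chat` and the `llm_latency_samples` list are hypothetical stand-ins, not the real service code):

```python
import time

# Stand-in for the dynamoguard_llm_completion_latency histogram.
llm_latency_samples = []

def guarded_chat(input_action, model_id, call_llm):
    """Sketch of the chat flow: the LLM timer only runs when the
    input guardrail did not return a BLOCK action."""
    if input_action == "BLOCK":
        return None  # no LLM call, so no llm_completion_latency sample
    start = time.perf_counter()
    response = call_llm()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    llm_latency_samples.append((elapsed_ms, {"modelId": model_id}))
    return response

guarded_chat("BLOCK", "m1", lambda: "unused")
assert llm_latency_samples == []       # blocked input: nothing recorded
guarded_chat("ALLOW", "m1", lambda: "ok")
assert len(llm_latency_samples) == 1   # one sample once the LLM ran
```

When comparing this histogram's count against `dynamoguard_e2e_latency`'s count, the difference approximates how many requests were blocked at the input stage.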
3. Output Guardrail Latency
Metric Name: dynamoguard_output_guardrail_latency
Type: Histogram
Unit: milliseconds (ms)
Description: Measures the time taken to analyze and apply guardrail policies on model output (LLM responses).
What it measures:
- The duration from the start of content moderation until completion
- Includes policy evaluation, content analysis, and decision-making for output text
- Covers the entire content moderation process for response text
APIs where it's captured:
- POST `/moderation/analyze`
  - Condition: Only when analyzing model output (when `textType` is `MODEL_RESPONSE`)
  - Measurement: Time from start of content moderation until completion
  - Labels/Attributes: None
Note: The output guardrail latency is NOT currently captured in the /moderation/model/:modelId/chat/:session_id endpoint, even though output analysis is performed. This is a known limitation in metrics collection.
4. End-to-End Latency
Metric Name: dynamoguard_e2e_latency
Type: Histogram
Unit: milliseconds (ms)
Description: Measures the total time taken for the entire request to complete, from when the request is received until the response is sent.
What it measures:
- The duration from when the request handler starts until the response is successfully sent
- Includes all processing: input guardrails, LLM completion (if applicable), output guardrails, and any other request processing
- Represents the complete user-facing latency for the API call
APIs where it's captured:
- POST `/moderation/analyze`
  - Condition: Always captured
  - Measurement: Time from request handler start until response is sent
  - Labels/Attributes: `modelId` (extracted from request body if present)
- POST `/moderation/model/:modelId/chat/:session_id`
  - Condition: Always captured
  - Measurement: Time from request handler start until response is sent
  - Labels/Attributes: `modelId` (the ID of the model being used, from URL parameter)
5. Successful Requests
Metric Name: dynamoguard_successful_requests
Type: Counter
Unit: count
Description: Counts the number of requests that completed successfully (without errors).
What it measures:
- Increments by 1 for each request that completes without throwing an error
- Captured at the interceptor level, so it includes all successful responses regardless of HTTP status code (as long as no exception was thrown)
APIs where it's captured:
- POST `/moderation/analyze`
  - Condition: Always captured for successful requests
  - Labels/Attributes: `modelId` (extracted from request body if present)
- POST `/moderation/model/:modelId/chat/:session_id`
  - Condition: Always captured for successful requests
  - Labels/Attributes: `modelId` (the ID of the model being used, from URL parameter)
6. Failed Requests
Metric Name: dynamoguard_failed_requests
Type: Counter
Unit: count
Description: Counts the number of requests that failed with an error.
What it measures:
- Increments by 1 for each request that throws an error or exception
- Captured at the interceptor level when an error occurs in the request handler
APIs where it's captured:
- POST `/moderation/analyze`
  - Condition: Always captured for failed requests
  - Labels/Attributes:
    - `modelId` (extracted from request body if present)
    - `failureType` (HTTP status code from error response, defaults to 500 if not available)
- POST `/moderation/model/:modelId/chat/:session_id`
  - Condition: Always captured for failed requests
  - Labels/Attributes:
    - `modelId` (the ID of the model being used, from URL parameter)
    - `failureType` (HTTP status code from error response, defaults to 500 if not available)
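The `failureType` fallback behavior can be sketched as a small helper. `failure_attributes` is a hypothetical name for illustration, not a function in the DynamoGuard codebase:

```python
def failure_attributes(model_id, error_status=None):
    """Sketch of the failed-request counter attributes: failureType
    falls back to 500 when the error carries no HTTP status code."""
    return {
        "modelId": model_id,
        "failureType": error_status if error_status is not None else 500,
    }

# An error with an explicit HTTP status keeps that status...
assert failure_attributes("m1", 429) == {"modelId": "m1", "failureType": 429}
# ...while an error without one is labeled as a 500.
assert failure_attributes("m1") == {"modelId": "m1", "failureType": 500}
```

One consequence of the fallback: a spike in `failureType="500"` may mean internal errors or errors that simply lacked a status code, so it is worth checking logs before concluding the failures are server-side.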
API Endpoint Summary
POST /moderation/analyze
- Purpose: Apply policies on messages (standalone analysis)
- Metrics Captured:
  - `dynamoguard_input_guardrail_latency` (when analyzing model input)
  - `dynamoguard_output_guardrail_latency` (when analyzing model output)
  - `dynamoguard_e2e_latency` (always)
  - `dynamoguard_successful_requests` (for successful requests)
  - `dynamoguard_failed_requests` (for failed requests)
- Labels/Attributes:
  - `modelId` (for e2e latency, successful requests, and failed requests; from request body if present)
  - `failureType` (for failed requests only; HTTP status code)
POST /moderation/model/:modelId/chat/:session_id
- Purpose: Guardrailed chat endpoint that applies policies, sends to LLM, and analyzes response
- Metrics Captured:
  - `dynamoguard_input_guardrail_latency` (always, with `modelId` attribute)
  - `dynamoguard_llm_completion_latency` (when input is not blocked, with `modelId` attribute)
  - `dynamoguard_output_guardrail_latency` (NOT currently captured; see note above)
  - `dynamoguard_e2e_latency` (always, with `modelId` attribute)
  - `dynamoguard_successful_requests` (for successful requests, with `modelId` attribute)
  - `dynamoguard_failed_requests` (for failed requests, with `modelId` and `failureType` attributes)
- Labels/Attributes:
  - `modelId` (the ID of the model being used, from URL parameter)
  - `failureType` (for failed requests only; HTTP status code)
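Because the two request counters partition all requests (every request increments exactly one of them), a per-model error rate can be derived directly from their readings. A sketch with hypothetical counter values; in practice you would run the equivalent query in your metrics backend:

```python
# Hypothetical cumulative counter readings, grouped by modelId.
successful = {"m1": 980, "m2": 47}  # dynamoguard_successful_requests
failed = {"m1": 20, "m2": 3}        # dynamoguard_failed_requests

def error_rate(model_id):
    """Fraction of requests for a model that failed."""
    ok = successful.get(model_id, 0)
    bad = failed.get(model_id, 0)
    total = ok + bad
    return bad / total if total else 0.0

assert error_rate("m1") == 0.02  # 20 / (980 + 20)
assert error_rate("m2") == 0.06  # 3 / (47 + 3)
```

Breaking the failed count down further by `failureType` distinguishes client-side failures (4xx) from server-side ones (5xx, including the 500 fallback).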