Application Metrics - DynamoGuard API Metrics Documentation
This document describes the Application Level OpenTelemetry metrics captured by the DynamoGuard API. These metrics are used to monitor the performance and latency of various guardrail operations.
Note: These metrics are collected by the opentelemetry-collector-application deployment, which can export them to any desired backend (Prometheus in the DynamoAI package).
⚠️ Deprecation Notice: These Application Level metrics will be deprecated in the 3.25 minor release. As part of moving away from MongoDB for moderation logs and storing them in PostgreSQL DB instead, we are also removing Prometheus dependency from DynamoGuard. These metrics may be reintroduced in future releases.
Overview
The DynamoGuard API captures six key metrics:
Latency Metrics (Histograms):
- Input Guardrail Latency (`dynamoguard_input_guardrail_latency`)
- LLM Completion Latency (`dynamoguard_llm_completion_latency`)
- Output Guardrail Latency (`dynamoguard_output_guardrail_latency`)
- End-to-End Latency (`dynamoguard_e2e_latency`)
Request Metrics (Counters):
- Successful Requests (`dynamoguard_successful_requests`)
- Failed Requests (`dynamoguard_failed_requests`)
All latency metrics are measured in milliseconds (ms).
Understanding Latency Metrics
The four latency metrics measure different phases of a request lifecycle. Understanding their relationship helps you identify where time is being spent in your guardrail pipeline.
Request Flow Timeline
For a typical request to /moderation/model/:modelId/chat/:session_id, the metrics capture time in this sequence:
Request Start
│
├─► [Input Guardrail Latency] ──► Input analysis & policy evaluation
│
├─► [LLM Completion Latency] ────► LLM generates response (if input not blocked)
│
├─► [Output Guardrail Latency] ──► Output analysis & policy evaluation
│
└─► [End-to-End Latency] ────────► Total request time (includes all above + overhead)
Metric Comparison
| Metric | What It Measures | What It Excludes | When It's Captured |
|---|---|---|---|
| Input Guardrail Latency | Time to analyze and apply policies on user input | LLM processing, output analysis, request overhead | During input moderation phase |
| LLM Completion Latency | Time for LLM to generate response | Input guardrails, output guardrails, request overhead | Only when input is not blocked |
| Output Guardrail Latency | Time to analyze and apply policies on LLM output | Input guardrails, LLM processing, request overhead | During output moderation phase |
| End-to-End Latency | Total request time from start to finish | Nothing - includes everything | Always captured |
Key Relationships
- End-to-End Latency ≥ Input Guardrail Latency + LLM Completion Latency + Output Guardrail Latency
- The End-to-End Latency includes all processing phases plus any additional overhead (network, serialization, etc.)
- LLM Completion Latency is only measured when the input guardrail doesn't block the request
- For `/moderation/analyze`, only Input or Output Guardrail Latency is captured (depending on `textType`), not both
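To make these relationships concrete, here is a small sketch that decomposes an end-to-end measurement into the three phase latencies plus overhead. The numbers are hypothetical, purely for illustration:

```python
# Hypothetical latency samples (ms) for a single chat request.
input_guardrail_ms = 42.0
llm_completion_ms = 910.0
output_guardrail_ms = 55.0
e2e_ms = 1031.5

# End-to-end latency covers all three phases plus request overhead
# (network, serialization, etc.), so the remainder is that overhead.
phase_total_ms = input_guardrail_ms + llm_completion_ms + output_guardrail_ms
overhead_ms = e2e_ms - phase_total_ms

assert e2e_ms >= phase_total_ms  # the key invariant from this section
print(f"overhead: {overhead_ms:.1f} ms")  # overhead: 24.5 ms
```

If the gap between `dynamoguard_e2e_latency` and the sum of the phase histograms grows, the extra time is being spent outside the guardrail/LLM phases.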
Metric Details
1. Input Guardrail Latency
Metric Name: dynamoguard_input_guardrail_latency
Type: Histogram
Unit: milliseconds (ms)
Description: Measures the time taken to analyze and apply guardrail policies on model input (user prompts/messages).
What it measures:
- The duration from the start of content moderation until completion
- Includes policy evaluation, content analysis, and decision-making for input text
- Covers the entire content moderation process for input text
APIs where it's captured:
- POST `/moderation/analyze`
  - Condition: Only when analyzing model input (when `textType` is `MODEL_INPUT`)
  - Measurement: Time from start of content moderation until completion
  - Labels/Attributes: None
- POST `/moderation/model/:modelId/chat/:session_id`
  - Condition: Always captured for input analysis
  - Measurement: Time from start of input moderation (including RAG context retrieval for RAG models) until completion
  - Labels/Attributes: `modelId` (the ID of the model being used)
  - Note: Includes time for RAG context retrieval if the model is a custom-rag model
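The measurement pattern described above can be sketched as follows. This is not DynamoGuard's actual implementation; the `Histogram` class and `moderate_input` function are hypothetical stand-ins for the OpenTelemetry histogram instrument and the real policy-evaluation code:

```python
import time

class Histogram:
    """Hypothetical in-memory stand-in for an OpenTelemetry histogram."""
    def __init__(self, name, unit="ms"):
        self.name, self.unit, self.samples = name, unit, []

    def record(self, value, attributes=None):
        self.samples.append((value, attributes or {}))

input_guardrail_latency = Histogram("dynamoguard_input_guardrail_latency")

def moderate_input(text, model_id):
    # Hypothetical guardrail evaluation standing in for policy checks.
    start = time.perf_counter()
    action = "BLOCK" if "forbidden" in text else "ALLOW"
    # Record the elapsed time in milliseconds, tagged with modelId
    # as on the chat endpoint (the analyze endpoint records no labels).
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    input_guardrail_latency.record(elapsed_ms, {"modelId": model_id})
    return action

moderate_input("hello", "model-123")
```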
2. LLM Completion Latency
Metric Name: dynamoguard_llm_completion_latency
Type: Histogram
Unit: milliseconds (ms)
Description: Measures the time taken for the LLM to generate a response after receiving the (potentially sanitized) input.
What it measures:
- The duration from initiating the LLM chat call until receiving the response
- Includes network latency to the LLM provider and model inference time
- Does NOT include input guardrail processing or output guardrail processing
APIs where it's captured:
- POST `/moderation/model/:modelId/chat/:session_id`
  - Condition: Always captured when the input is not blocked
  - Measurement: Time from initiating the LLM request until response is received
  - Labels/Attributes: `modelId` (the ID of the model being used)
  - Note: Only measured if input analysis does not result in a `BLOCK` action
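The `BLOCK` condition means this histogram has fewer samples than the others whenever inputs are being blocked. A hedged sketch of the control flow (`guarded_chat` and the `llm_latency_samples` list are hypothetical stand-ins, not the real service code):

```python
import time

# Stand-in for the dynamoguard_llm_completion_latency histogram.
llm_latency_samples = []

def guarded_chat(input_action, model_id, call_llm):
    """Sketch of the chat flow: the LLM timer only runs when the
    input guardrail did not return a BLOCK action."""
    if input_action == "BLOCK":
        return None  # no LLM call, so no llm_completion_latency sample
    start = time.perf_counter()
    response = call_llm()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    llm_latency_samples.append((elapsed_ms, {"modelId": model_id}))
    return response

guarded_chat("BLOCK", "m1", lambda: "unused")
assert llm_latency_samples == []       # blocked input: nothing recorded
guarded_chat("ALLOW", "m1", lambda: "ok")
assert len(llm_latency_samples) == 1   # one sample once the LLM ran
```

When comparing this histogram's count against `dynamoguard_e2e_latency`'s count, the difference approximates how many requests were blocked at the input stage.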
3. Output Guardrail Latency
Metric Name: dynamoguard_output_guardrail_latency
Type: Histogram
Unit: milliseconds (ms)
Description: Measures the time taken to analyze and apply guardrail policies on model output (LLM responses).
What it measures:
- The duration from the start of content moderation until completion
- Includes policy evaluation, content analysis, and decision-making for output text
- Covers the entire content moderation process for response text
APIs where it's captured:
- POST `/moderation/analyze`
  - Condition: Only when analyzing model output (when `textType` is `MODEL_RESPONSE`)
  - Measurement: Time from start of content moderation until completion
  - Labels/Attributes: None
Note: The output guardrail latency is NOT currently captured in the /moderation/model/:modelId/chat/:session_id endpoint, even though output analysis is performed. This is a known limitation in metrics collection.
4. End-to-End Latency
Metric Name: dynamoguard_e2e_latency
Type: Histogram
Unit: milliseconds (ms)
Description: Measures the total time taken for the entire request to complete, from when the request is received until the response is sent.
What it measures:
- The duration from when the request handler starts until the response is successfully sent
- Includes all processing: input guardrails, LLM completion (if applicable), output guardrails, and any other request processing
- Represents the complete user-facing latency for the API call
APIs where it's captured:
- POST `/moderation/analyze`
  - Condition: Always captured
  - Measurement: Time from request handler start until response is sent
  - Labels/Attributes: `modelId` (extracted from request body if present)
- POST `/moderation/model/:modelId/chat/:session_id`
  - Condition: Always captured
  - Measurement: Time from request handler start until response is sent
  - Labels/Attributes: `modelId` (the ID of the model being used, from URL parameter)
5. Successful Requests
Metric Name: dynamoguard_successful_requests
Type: Counter
Unit: count
Description: Counts the number of requests that completed successfully (without errors).
What it measures:
- Increments by 1 for each request that completes without throwing an error
- Captured at the interceptor level, so it includes all successful responses regardless of HTTP status code (as long as no exception was thrown)
APIs where it's captured:
- POST `/moderation/analyze`
  - Condition: Always captured for successful requests
  - Labels/Attributes: `modelId` (extracted from request body if present)
- POST `/moderation/model/:modelId/chat/:session_id`
  - Condition: Always captured for successful requests
  - Labels/Attributes: `modelId` (the ID of the model being used, from URL parameter)
6. Failed Requests
Metric Name: dynamoguard_failed_requests
Type: Counter
Unit: count
Description: Counts the number of requests that failed with an error.
What it measures:
- Increments by 1 for each request that throws an error or exception
- Captured at the interceptor level when an error occurs in the request handler
APIs where it's captured:
- POST `/moderation/analyze`
  - Condition: Always captured for failed requests
  - Labels/Attributes:
    - `modelId` (extracted from request body if present)
    - `failureType` (HTTP status code from error response, defaults to 500 if not available)
- POST `/moderation/model/:modelId/chat/:session_id`
  - Condition: Always captured for failed requests
  - Labels/Attributes:
    - `modelId` (the ID of the model being used, from URL parameter)
    - `failureType` (HTTP status code from error response, defaults to 500 if not available)
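The `failureType` fallback behavior can be sketched as a small helper. `failure_attributes` is a hypothetical name for illustration, not a function in the DynamoGuard codebase:

```python
def failure_attributes(model_id, error_status=None):
    """Sketch of the failed-request counter attributes: failureType
    falls back to 500 when the error carries no HTTP status code."""
    return {
        "modelId": model_id,
        "failureType": error_status if error_status is not None else 500,
    }

# An error with an explicit HTTP status keeps that status...
assert failure_attributes("m1", 429) == {"modelId": "m1", "failureType": 429}
# ...while an error without one is labeled as a 500.
assert failure_attributes("m1") == {"modelId": "m1", "failureType": 500}
```

One consequence of the fallback: a spike in `failureType="500"` may mean internal errors or errors that simply lacked a status code, so it is worth checking logs before concluding the failures are server-side.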
API Endpoint Summary
POST /moderation/analyze
- Purpose: Apply policies on messages (standalone analysis)
- Metrics Captured:
  - `dynamoguard_input_guardrail_latency` (when analyzing model input)
  - `dynamoguard_output_guardrail_latency` (when analyzing model output)
  - `dynamoguard_e2e_latency` (always)
  - `dynamoguard_successful_requests` (for successful requests)
  - `dynamoguard_failed_requests` (for failed requests)
- Labels/Attributes:
  - `modelId` (for e2e latency, successful requests, and failed requests; from request body if present)
  - `failureType` (for failed requests only; HTTP status code)
POST /moderation/model/:modelId/chat/:session_id
- Purpose: Guardrailed chat endpoint that applies policies, sends to LLM, and analyzes response
- Metrics Captured:
  - `dynamoguard_input_guardrail_latency` (always, with `modelId` attribute)
  - `dynamoguard_llm_completion_latency` (when input is not blocked, with `modelId` attribute)
  - `dynamoguard_output_guardrail_latency` (NOT currently captured; see note above)
  - `dynamoguard_e2e_latency` (always, with `modelId` attribute)
  - `dynamoguard_successful_requests` (for successful requests, with `modelId` attribute)
  - `dynamoguard_failed_requests` (for failed requests, with `modelId` and `failureType` attributes)
- Labels/Attributes:
  - `modelId` (the ID of the model being used, from URL parameter)
  - `failureType` (for failed requests only; HTTP status code)
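Because the two request counters partition all requests (every request increments exactly one of them), a per-model error rate can be derived directly from their readings. A sketch with hypothetical counter values; in practice you would run the equivalent query in your metrics backend:

```python
# Hypothetical cumulative counter readings, grouped by modelId.
successful = {"m1": 980, "m2": 47}  # dynamoguard_successful_requests
failed = {"m1": 20, "m2": 3}        # dynamoguard_failed_requests

def error_rate(model_id):
    """Fraction of requests for a model that failed."""
    ok = successful.get(model_id, 0)
    bad = failed.get(model_id, 0)
    total = ok + bad
    return bad / total if total else 0.0

assert error_rate("m1") == 0.02  # 20 / (980 + 20)
assert error_rate("m2") == 0.06  # 3 / (47 + 3)
```

Breaking the failed count down further by `failureType` distinguishes client-side failures (4xx) from server-side ones (5xx, including the 500 fallback).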