
Application Metrics

This document describes the Application Level OpenTelemetry metrics captured by the DynamoGuard API. These metrics are used to monitor the performance and latency of various guardrail operations.

Note: These metrics are collected by the opentelemetry-collector-application deployment (an OpenTelemetry Collector), from which they can be exported to any desired backend (Prometheus in the DynamoAI package).

⚠️ Deprecation Notice: These Application Level metrics will be deprecated in the 3.25 minor release. As part of moving moderation logs from MongoDB to a PostgreSQL DB, we are also removing the Prometheus dependency from DynamoGuard. These metrics may be reintroduced in future releases.

Overview

The DynamoGuard API captures six key metrics:

Latency Metrics (Histograms):

  1. Input Guardrail Latency (dynamoguard_input_guardrail_latency)
  2. LLM Completion Latency (dynamoguard_llm_completion_latency)
  3. Output Guardrail Latency (dynamoguard_output_guardrail_latency)
  4. End-to-End Latency (dynamoguard_e2e_latency)

Request Metrics (Counters):

  1. Successful Requests (dynamoguard_successful_requests)
  2. Failed Requests (dynamoguard_failed_requests)

All latency metrics are measured in milliseconds (ms).
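As a reference point, here is a minimal, hypothetical sketch of how these six instruments could be declared with the OpenTelemetry JavaScript API. The metric names and units are the ones documented here; the meter name, the `./metrics` module layout, and the variable names are illustrative assumptions, not DynamoGuard's actual code. Later sketches in this document import these instruments from this hypothetical module.

```typescript
import { metrics } from "@opentelemetry/api";

// Meter name is an illustrative assumption; metric names and units are from this document.
const meter = metrics.getMeter("dynamoguard-api");

export const inputGuardrailLatency = meter.createHistogram(
  "dynamoguard_input_guardrail_latency",
  { unit: "ms", description: "Time to analyze and apply policies on model input" },
);
export const llmCompletionLatency = meter.createHistogram(
  "dynamoguard_llm_completion_latency",
  { unit: "ms", description: "Time for the LLM to generate a response" },
);
export const outputGuardrailLatency = meter.createHistogram(
  "dynamoguard_output_guardrail_latency",
  { unit: "ms", description: "Time to analyze and apply policies on model output" },
);
export const e2eLatency = meter.createHistogram(
  "dynamoguard_e2e_latency",
  { unit: "ms", description: "Total request time from start to finish" },
);
export const successfulRequests = meter.createCounter(
  "dynamoguard_successful_requests",
  { description: "Requests that completed without errors" },
);
export const failedRequests = meter.createCounter(
  "dynamoguard_failed_requests",
  { description: "Requests that failed with an error" },
);
```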

Understanding Latency Metrics

The four latency metrics measure different phases of a request lifecycle. Understanding their relationship helps you identify where time is being spent in your guardrail pipeline.

Request Flow Timeline

For a typical request to /moderation/model/:modelId/chat/:session_id, the metrics capture time in this sequence:

```
Request Start
├─► [Input Guardrail Latency] ──► Input analysis & policy evaluation
├─► [LLM Completion Latency] ────► LLM generates response (if input not blocked)
├─► [Output Guardrail Latency] ──► Output analysis & policy evaluation
└─► [End-to-End Latency] ────────► Total request time (includes all above + overhead)
```

Metric Comparison

| Metric | What It Measures | What It Excludes | When It's Captured |
| --- | --- | --- | --- |
| Input Guardrail Latency | Time to analyze and apply policies on user input | LLM processing, output analysis, request overhead | During input moderation phase |
| LLM Completion Latency | Time for LLM to generate response | Input guardrails, output guardrails, request overhead | Only when input is not blocked |
| Output Guardrail Latency | Time to analyze and apply policies on LLM output | Input guardrails, LLM processing, request overhead | During output moderation phase |
| End-to-End Latency | Total request time from start to finish | Nothing; includes everything | Always captured |

Key Relationships

  • End-to-End Latency ≥ Input Guardrail Latency + LLM Completion Latency + Output Guardrail Latency; the difference is request overhead
  • The End-to-End Latency includes all processing phases plus any additional overhead (network, serialization, etc.)
  • LLM Completion Latency is only measured when the input guardrail does not block the request (illustrated in the sketch below)
  • For /moderation/analyze, only Input or Output Guardrail Latency is captured (depending on textType), not both
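To make these relationships concrete, here is a hedged sketch of how the chat endpoint (/moderation/model/:modelId/chat/:session_id) could record the histograms declared earlier. runInputGuardrails, callLlm, and runOutputGuardrails are hypothetical placeholders; only the recording order and conditions are taken from this document.

```typescript
import { inputGuardrailLatency, llmCompletionLatency, e2eLatency } from "./metrics";

// Hypothetical placeholder signatures, for illustration only.
declare function runInputGuardrails(text: string): Promise<{ action: string; text: string }>;
declare function callLlm(modelId: string, text: string): Promise<string>;
declare function runOutputGuardrails(text: string): Promise<void>;

export async function handleChat(modelId: string, message: string): Promise<string> {
  const requestStart = Date.now();
  const attrs = { modelId };

  // Input guardrail latency: always recorded on this endpoint.
  const inputStart = Date.now();
  const verdict = await runInputGuardrails(message);
  inputGuardrailLatency.record(Date.now() - inputStart, attrs);

  let response = "Blocked by input guardrails."; // illustrative blocked-path response
  if (verdict.action !== "BLOCK") {
    // LLM completion latency: only recorded when the input is not blocked.
    const llmStart = Date.now();
    response = await callLlm(modelId, verdict.text);
    llmCompletionLatency.record(Date.now() - llmStart, attrs);

    // Output analysis still runs, but its latency is not recorded on this
    // endpoint (a known limitation, noted later in this document).
    await runOutputGuardrails(response);
  }

  // End-to-end latency: all phases plus overhead, always recorded.
  e2eLatency.record(Date.now() - requestStart, attrs);
  return response;
}
```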

Metric Details

1. Input Guardrail Latency

Metric Name: dynamoguard_input_guardrail_latency
Type: Histogram
Unit: milliseconds (ms)
Description: Measures the time taken to analyze and apply guardrail policies on model input (user prompts/messages).

What it measures:

  • The duration from the start of content moderation until completion
  • Includes policy evaluation, content analysis, and decision-making for input text
  • Covers the entire content moderation process for input text

APIs where it's captured:

  1. POST /moderation/analyze

    • Condition: Only when analyzing model input (when textType is MODEL_INPUT)
    • Measurement: Time from start of content moderation until completion
    • Labels/Attributes: None
  2. POST /moderation/model/:modelId/chat/:session_id

    • Condition: Always captured for input analysis
    • Measurement: Time from start of input moderation (including RAG context retrieval for RAG models) until completion
    • Labels/Attributes: modelId (the ID of the model being used)
    • Note: Includes time for RAG context retrieval if the model is a custom-rag model, as sketched below
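A small sketch of the measurement boundary described in the note above: the timer starts before RAG context retrieval, so for custom-rag models that retrieval time is counted in the input guardrail histogram. retrieveRagContext and runInputGuardrails are hypothetical names.

```typescript
import { inputGuardrailLatency } from "./metrics";

// Hypothetical placeholders, for illustration only.
declare function retrieveRagContext(message: string): Promise<string>;
declare function runInputGuardrails(message: string, ragContext?: string): Promise<void>;

export async function timeInputPhase(
  modelId: string,
  modelType: string,
  message: string,
): Promise<void> {
  // The timer starts before RAG retrieval, so for custom-rag models the
  // retrieval time is included in the input guardrail histogram.
  const start = Date.now();
  const ragContext =
    modelType === "custom-rag" ? await retrieveRagContext(message) : undefined;
  await runInputGuardrails(message, ragContext);
  inputGuardrailLatency.record(Date.now() - start, { modelId });
}
```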

2. LLM Completion Latency

Metric Name: dynamoguard_llm_completion_latency
Type: Histogram
Unit: milliseconds (ms)
Description: Measures the time taken for the LLM to generate a response after receiving the (potentially sanitized) input.

What it measures:

  • The duration from initiating the LLM chat call until receiving the response
  • Includes network latency to the LLM provider and model inference time
  • Does NOT include input guardrail processing or output guardrail processing

APIs where it's captured:

  1. POST /moderation/model/:modelId/chat/:session_id
    • Condition: Always captured when the input is not blocked
    • Measurement: Time from initiating the LLM request until response is received
    • Labels/Attributes: modelId (the ID of the model being used)
    • Note: Only measured if input analysis does not result in a BLOCK action

3. Output Guardrail Latency

Metric Name: dynamoguard_output_guardrail_latency
Type: Histogram
Unit: milliseconds (ms)
Description: Measures the time taken to analyze and apply guardrail policies on model output (LLM responses).

What it measures:

  • The duration from the start of content moderation until completion
  • Includes policy evaluation, content analysis, and decision-making for output text
  • Covers the entire content moderation process for response text

APIs where it's captured:

  1. POST /moderation/analyze
    • Condition: Only when analyzing model output (when textType is MODEL_RESPONSE)
    • Measurement: Time from start of content moderation until completion
    • Labels/Attributes: None

Note: The output guardrail latency is NOT currently captured in the /moderation/model/:modelId/chat/:session_id endpoint, even though output analysis is performed. This is a known limitation in metrics collection.
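For /moderation/analyze, the textType field decides which of the two guardrail histograms is recorded. A minimal sketch follows, with runGuardrails as a hypothetical placeholder; note that neither recording attaches attributes, matching the sections above.

```typescript
import { inputGuardrailLatency, outputGuardrailLatency } from "./metrics";

// Hypothetical placeholder, for illustration only.
declare function runGuardrails(text: string): Promise<unknown>;

export async function analyze(
  text: string,
  textType: "MODEL_INPUT" | "MODEL_RESPONSE",
): Promise<unknown> {
  const start = Date.now();
  const result = await runGuardrails(text);
  const elapsedMs = Date.now() - start;

  // Per the sections above, these recordings carry no attributes.
  if (textType === "MODEL_INPUT") {
    inputGuardrailLatency.record(elapsedMs);
  } else {
    outputGuardrailLatency.record(elapsedMs);
  }
  return result;
}
```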


4. End-to-End Latency

Metric Name: dynamoguard_e2e_latency
Type: Histogram
Unit: milliseconds (ms)
Description: Measures the total time taken for the entire request to complete, from when the request is received until the response is sent.

What it measures:

  • The duration from when the request handler starts until the response is successfully sent
  • Includes all processing: input guardrails, LLM completion (if applicable), output guardrails, and any other request processing
  • Represents the complete user-facing latency for the API call

APIs where it's captured:

  1. POST /moderation/analyze

    • Condition: Always captured
    • Measurement: Time from request handler start until response is sent
    • Labels/Attributes: modelId (extracted from request body if present)
  2. POST /moderation/model/:modelId/chat/:session_id

    • Condition: Always captured
    • Measurement: Time from request handler start until response is sent
    • Labels/Attributes: modelId (the ID of the model being used, from URL parameter)

5. Successful Requests

Metric Name: dynamoguard_successful_requests
Type: Counter
Unit: count
Description: Counts the number of requests that completed successfully (without errors).

What it measures:

  • Increments by 1 for each request that completes without throwing an error
  • Captured at the interceptor level, so it includes all successful responses regardless of HTTP status code (as long as no exception was thrown)

APIs where it's captured:

  1. POST /moderation/analyze

    • Condition: Always captured for successful requests
    • Labels/Attributes: modelId (extracted from request body if present)
  2. POST /moderation/model/:modelId/chat/:session_id

    • Condition: Always captured for successful requests
    • Labels/Attributes: modelId (the ID of the model being used, from URL parameter)

6. Failed Requests

Metric Name: dynamoguard_failed_requests
Type: Counter
Unit: count
Description: Counts the number of requests that failed with an error.

What it measures:

  • Increments by 1 for each request that throws an error or exception
  • Captured at the interceptor level when an error occurs in the request handler (see the interceptor sketch below)

APIs where it's captured:

  1. POST /moderation/analyze

    • Condition: Always captured for failed requests
    • Labels/Attributes:
      • modelId (extracted from request body if present)
      • failureType (HTTP status code from error response, defaults to 500 if not available)
  2. POST /moderation/model/:modelId/chat/:session_id

    • Condition: Always captured for failed requests
    • Labels/Attributes:
      • modelId (the ID of the model being used, from URL parameter)
      • failureType (HTTP status code from error response, defaults to 500 if not available)
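Since both counters are maintained at the interceptor level, here is a hypothetical sketch of such a NestJS-style interceptor (NestJS is an assumption based on the interceptor terminology, not confirmed by this document). The increment conditions and the failureType fallback to 500 follow this document; everything else is illustrative.

```typescript
import { CallHandler, ExecutionContext, Injectable, NestInterceptor } from "@nestjs/common";
import { Observable, throwError } from "rxjs";
import { catchError, tap } from "rxjs/operators";
import { failedRequests, successfulRequests } from "./metrics";

@Injectable()
export class RequestMetricsInterceptor implements NestInterceptor {
  intercept(context: ExecutionContext, next: CallHandler): Observable<unknown> {
    const req = context.switchToHttp().getRequest();
    // modelId comes from the URL parameter (chat) or the request body (analyze).
    const modelId = req.params?.modelId ?? req.body?.modelId;

    return next.handle().pipe(
      // Success: the handler completed without throwing, regardless of status code.
      tap(() => successfulRequests.add(1, { modelId })),
      catchError((err) => {
        // failureType: HTTP status from the error response if available, else 500.
        const failureType = typeof err?.getStatus === "function" ? err.getStatus() : 500;
        failedRequests.add(1, { modelId, failureType });
        return throwError(() => err); // re-throw so normal error handling proceeds
      }),
    );
  }
}
```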

API Endpoint Summary

POST /moderation/analyze

  • Purpose: Apply policies on messages (standalone analysis)
  • Metrics Captured:
    • dynamoguard_input_guardrail_latency (when analyzing model input)
    • dynamoguard_output_guardrail_latency (when analyzing model output)
    • dynamoguard_e2e_latency (always)
    • dynamoguard_successful_requests (for successful requests)
    • dynamoguard_failed_requests (for failed requests)
  • Labels/Attributes:
    • modelId (for e2e latency, successful requests, and failed requests - from request body if present)
    • failureType (for failed requests only - HTTP status code)

POST /moderation/model/:modelId/chat/:session_id

  • Purpose: Guardrailed chat endpoint that applies policies, sends to LLM, and analyzes response
  • Metrics Captured:
    • dynamoguard_input_guardrail_latency (always, with modelId attribute)
    • dynamoguard_llm_completion_latency (when input is not blocked, with modelId attribute)
    • dynamoguard_output_guardrail_latency (NOT currently captured - see note above)
    • dynamoguard_e2e_latency (always, with modelId attribute)
    • dynamoguard_successful_requests (for successful requests, with modelId attribute)
    • dynamoguard_failed_requests (for failed requests, with modelId and failureType attributes)
  • Labels/Attributes:
    • modelId (the ID of the model being used, from URL parameter)
    • failureType (for failed requests only - HTTP status code)