Overview

Metrics provide a way of monitoring and understanding behavior in aggregate. The DynamoAI API provides two types of metrics:

  1. System Level Metrics: the API captures request statistics and sends them to OpenTelemetry (available from release 3.24.0)
  2. Application Level Metrics: currently limited to DynamoGuard latency metrics

Architecture Overview

The metrics collection architecture follows a pipeline pattern that enables flexibility in metric storage and analysis:

        ┌─────────────┐
        │  DynamoAI   │
        │     API     │
        └──────┬──────┘
               │
               │ Sends metrics via OTLP
               │
               ├───────────────────────────────┐
               │                               │
               ▼                               ▼
    ┌─────────────────────┐     ┌─────────────────────────────┐
    │ opentelemetry-      │     │ opentelemetry-collector-    │
    │ collector           │     │ application                 │
    │ (System Metrics)    │     │ (Application Metrics)       │
    └──────────┬──────────┘     └──────────────┬──────────────┘
               │                               │
               └───────────────┬───────────────┘
                               │
                               │ Exports metrics
                               ▼
                      ┌─────────────────┐
                      │   Prometheus    │
                      │    (Storage)    │
                      └─────────────────┘

Flow Description

  1. API Layer: The DynamoAI API captures metrics at two levels:

    • System Metrics: Automatically captured at the HTTP level (request/response statistics, errors, latency)
    • Application Metrics: Explicitly sent by the application code (DynamoGuard latency metrics)
  2. OpenTelemetry Collectors: Metrics are sent via OTLP (OpenTelemetry Protocol) to two separate collector deployments:

    • opentelemetry-collector: Receives and processes System Level Metrics
    • opentelemetry-collector-application: Receives and processes Application Level Metrics
  3. Storage Backend: The OpenTelemetry collectors export metrics to Prometheus (when Metrics Storage is enabled in the DynamoAI package). Prometheus acts as the time-series database for metric storage and querying.

  4. Visualization: Grafana (shipped with the DynamoAI package) connects to Prometheus as a data source to provide dashboards and visualizations.

This architecture provides flexibility: customers can configure the OpenTelemetry collectors to export metrics to their own backends (e.g., their own Prometheus instance, Datadog, New Relic) in addition to, or instead of, the DynamoAI-provided Prometheus instance.
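The routing and fan-out described above can be modeled in a few lines of Python. This is a toy sketch, not how the collectors are implemented: the `Metric` and `Collector` classes, the `route` function, and the metric names are all hypothetical, and in practice this behavior comes from each collector's receiver and exporter configuration. It shows system metrics going to `opentelemetry-collector`, application metrics going to `opentelemetry-collector-application`, and one collector exporting to a customer backend in addition to Prometheus.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Metric:
    """Hypothetical, heavily simplified metric data point."""
    name: str
    level: str  # "system" or "application"
    value: float


Exporter = Callable[[Metric], None]


@dataclass
class Collector:
    """Toy stand-in for an OpenTelemetry collector deployment."""
    name: str
    exporters: List[Exporter] = field(default_factory=list)

    def receive(self, metric: Metric) -> None:
        # Forward every received data point to each configured exporter.
        for export in self.exporters:
            export(metric)


# Metric level -> collector deployment, mirroring the two deployments.
COLLECTOR_BY_LEVEL = {
    "system": "opentelemetry-collector",
    "application": "opentelemetry-collector-application",
}


def route(metric: Metric, collectors: Dict[str, Collector]) -> None:
    collectors[COLLECTOR_BY_LEVEL[metric.level]].receive(metric)


# Lists stand in for backends: the bundled Prometheus and a customer backend.
prometheus_store: List[Metric] = []
customer_store: List[Metric] = []

collectors = {
    "opentelemetry-collector": Collector(
        "opentelemetry-collector", [prometheus_store.append]
    ),
    # This collector exports to both backends (the "in addition to" case).
    "opentelemetry-collector-application": Collector(
        "opentelemetry-collector-application",
        [prometheus_store.append, customer_store.append],
    ),
}

route(Metric("hypothetical_http_request_duration_ms", "system", 12.5), collectors)
route(Metric("hypothetical_guard_input_latency_ms", "application", 40.0), collectors)
print(len(prometheus_store), len(customer_store))  # prints "2 1"
```

Dropping `prometheus_store.append` from a collector's exporter list models the "instead of" case, where that collector's metrics go only to a customer-owned backend.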

System Metrics

Note: System Metrics were introduced in release 3.24.0.

System Level Metrics help determine the health of the system, including:

  • API response time
  • Error rates
  • Request/response statistics

These metrics are captured at the HTTP level and provide insights into API performance and reliability.
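As a rough illustration of HTTP-level capture, the sketch below wraps a request handler, records each request's status code and duration, and then derives an error rate and a p95 response time. It assumes a hypothetical handler signature (`path -> status code`), counts only 5xx responses as errors, and uses a nearest-rank percentile; the DynamoAI API's actual instrumentation and metric definitions may differ.

```python
import math
import time
from typing import Callable, Dict, List, Tuple

# One record per request: (HTTP status code, duration in milliseconds).
Record = Tuple[int, float]


def instrument(handler: Callable[[str], int], sink: List[Record]) -> Callable[[str], int]:
    """Wrap a hypothetical handler so every call is timed and recorded in `sink`."""
    def wrapped(path: str) -> int:
        start = time.perf_counter()
        status = 500  # recorded as a server error if the handler raises
        try:
            status = handler(path)
            return status
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            sink.append((status, elapsed_ms))
    return wrapped


def summarize(records: List[Record]) -> Dict[str, float]:
    """Request count, error rate (5xx / total), and nearest-rank p95 latency."""
    if not records:
        return {"requests": 0, "error_rate": 0.0, "p95_ms": 0.0}
    errors = sum(1 for status, _ in records if status >= 500)
    durations = sorted(duration for _, duration in records)
    rank = math.ceil(0.95 * len(durations))  # 1-based nearest rank
    return {
        "requests": len(records),
        "error_rate": errors / len(records),
        "p95_ms": durations[rank - 1],
    }


def demo_handler(path: str) -> int:
    # Hypothetical endpoint behavior used only for this example.
    if path == "/boom":
        raise RuntimeError("simulated failure")
    return 404 if path == "/missing" else 200


records: List[Record] = []
handle = instrument(demo_handler, records)
for path in ["/health", "/missing", "/boom", "/health"]:
    try:
        handle(path)
    except RuntimeError:
        pass  # the failed request was still recorded as a 500

stats = summarize(records)
print(stats["requests"], stats["error_rate"])  # prints "4 0.25"
```

Whether 4xx responses should also count toward the error rate is a policy choice; this sketch treats only server-side (5xx) failures as errors.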

Application Metrics

Application Level Metrics capture data about what happens inside the application. Currently, this includes:

  • DynamoGuard latency metrics for input guardrails, LLM completion, and output guardrails
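The sketch below shows one way per-stage latency could be measured for a guarded request: each of the three stages is wrapped in a timer and its duration recorded in milliseconds. The stage functions (`run_input_guardrails`, `call_llm`, `run_output_guardrails`) are hypothetical placeholders, not DynamoGuard APIs, and a real implementation would record these values on OpenTelemetry instruments sent to `opentelemetry-collector-application` rather than return a dict.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, Tuple


@contextmanager
def timed(phase: str, timings: Dict[str, float]) -> Iterator[None]:
    """Store the wall-clock duration of the enclosed block, in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] = (time.perf_counter() - start) * 1000.0


# Hypothetical placeholders for the three stages of a guarded request.
def run_input_guardrails(prompt: str) -> str:
    return prompt


def call_llm(prompt: str) -> str:
    return f"echo: {prompt}"


def run_output_guardrails(completion: str) -> str:
    return completion


def guarded_completion(prompt: str) -> Tuple[str, Dict[str, float]]:
    """Run all three stages and return the result plus per-stage latency."""
    timings: Dict[str, float] = {}
    with timed("input_guardrails", timings):
        checked = run_input_guardrails(prompt)
    with timed("llm_completion", timings):
        completion = call_llm(checked)
    with timed("output_guardrails", timings):
        result = run_output_guardrails(completion)
    return result, timings


result, timings = guarded_completion("hello")
print(result)  # prints "echo: hello"
print(sorted(timings))  # prints "['input_guardrails', 'llm_completion', 'output_guardrails']"
```

Timing each stage separately, rather than only the whole request, is what lets guardrail overhead be distinguished from LLM completion time.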

How does DynamoAI capture these metrics?

DynamoAI uses OpenTelemetry collectors as an intermediary layer to collect these metrics. This architecture allows customers to ship metrics to their own backends as well.

There are two separate OpenTelemetry collector deployments in the Kubernetes cluster:

Collector Deployment                  Metric Level          Purpose
opentelemetry-collector               System Metrics        Captures system-level request statistics from the API
opentelemetry-collector-application   Application Metrics   Receives metrics sent by the application and delivers them to the appropriate backend
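Applications reach a collector over OTLP. Assuming each collector deployment is exposed by a Kubernetes Service of the same name (an assumption, as is the `dynamoai` namespace used below), the in-cluster OTLP/HTTP metrics endpoint for each metric level can be built as follows. Port 4318 and the `/v1/metrics` path are the OTLP/HTTP defaults; OTLP over gRPC uses port 4317 instead.

```python
from typing import Dict

# OTLP/HTTP default port and metrics path from the OpenTelemetry specification.
OTLP_HTTP_PORT = 4318
OTLP_HTTP_METRICS_PATH = "/v1/metrics"

# Which collector deployment receives which metric level.
COLLECTOR_BY_LEVEL: Dict[str, str] = {
    "system": "opentelemetry-collector",
    "application": "opentelemetry-collector-application",
}


def otlp_metrics_endpoint(level: str, namespace: str = "dynamoai") -> str:
    """Return the in-cluster OTLP/HTTP metrics URL for the collector handling `level`.

    Assumes a Kubernetes Service named after each deployment; the default
    namespace "dynamoai" is a hypothetical placeholder.
    """
    service = COLLECTOR_BY_LEVEL[level]
    host = f"{service}.{namespace}.svc.cluster.local"
    return f"http://{host}:{OTLP_HTTP_PORT}{OTLP_HTTP_METRICS_PATH}"


print(otlp_metrics_endpoint("application"))
# prints "http://opentelemetry-collector-application.dynamoai.svc.cluster.local:4318/v1/metrics"
```

With the OpenTelemetry SDKs, a signal-specific URL like this is typically supplied through the `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` environment variable.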

Where does DynamoAI store these metrics?

All metrics are sent to the OpenTelemetry collectors first, from which they can be exported to any desired backend.

  • If you've opted for Metrics Storage from DynamoAI, a Prometheus instance is shipped as part of the DynamoAI package
  • All metrics (both System and Application level) collected are sent to the Prometheus instance deployed on the Kubernetes cluster
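Once metrics are in Prometheus, they can be read back through its HTTP API, which is also what Grafana queries. The sketch below builds an instant-query URL for `GET /api/v1/query` and parses the documented JSON response shape. The base URL `http://prometheus:9090` and the metric name `hypothetical_requests_total` are placeholders, and the sample body is hand-written to show the format, not real output. To run a query for real, the URL could be fetched with `urllib.request.urlopen`.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for Prometheus's instant-query endpoint (GET /api/v1/query)."""
    query = urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def parse_vector(body: str) -> List[Tuple[Dict[str, str], float]]:
    """Return (labels, value) pairs from an instant-vector query response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError("query failed: %s" % payload.get("error"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector result, got %s" % data["resultType"])
    # Each sample's "value" is [unix_timestamp, "value encoded as a string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


# Placeholder service URL and metric name; adjust for your deployment.
url = instant_query_url(
    "http://prometheus:9090", "sum(rate(hypothetical_requests_total[5m]))"
)
print(url)

# Hand-written body in the documented response shape (not real data).
sample = (
    '{"status": "success", "data": {"resultType": "vector", "result": '
    '[{"metric": {"job": "api"}, "value": [1700000000.0, "0.5"]}]}}'
)
print(parse_vector(sample))  # prints "[({'job': 'api'}, 0.5)]"
```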