Analytics & Cost Tracking

Open Model Prism tracks every gateway request and aggregates statistics at multiple levels: per-tenant, per-model, per-user, and per-day. All analytics are written asynchronously (fire-and-forget) so they never add latency to the gateway response.

Metrics Tracked Per Request

Field            Description
tenantId         Which tenant made the request
userId           User identifier (from API key or explicit header)
model            Actual model used (after routing resolution)
requestedModel   Model requested by the client (may be "auto" or an alias)
category         Auto-routing category (if auto-routed)
inputTokens      Prompt + context tokens
outputTokens     Generated tokens
inputCost        Input token cost (USD)
outputCost       Output token cost (USD)
actualCost       Total actual cost (USD)
baselineCost     Cost at the list price of the originally requested model
saved            baselineCost − actualCost (reflects routing savings)
durationMs       End-to-end request duration
status           success or error
errorMessage     Error details if status = error
autoRouted       Whether auto-routing was used
contextFallback  Whether a context-overflow fallback occurred
streaming        Whether the response was streamed
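As a sketch, the fire-and-forget analytics write described above might look like the following (the `recordRequest` function and `writer` callback are illustrative names, not the actual implementation):

```javascript
// Illustrative sketch: record a per-request analytics document without
// blocking the gateway response. `writer` stands in for whatever
// persistence call the gateway uses (hypothetical).
function recordRequest(fields, writer) {
  const doc = { timestamp: new Date().toISOString(), ...fields };
  // Fire-and-forget: start the write but never await it, and swallow
  // errors so a failed analytics write can never fail or slow the request.
  Promise.resolve()
    .then(() => writer(doc))
    .catch(() => {});
  return doc;
}
```

The caller returns its response immediately; the write runs on a later microtask, which is what keeps analytics off the request's critical path.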

Cost Calculation

Costs are calculated using a cascading pricing lookup (first match wins):

1. Tenant pricing override  (set per-model on the tenant)
   ↓ not found
2. pricingDefaults.js        (flat table with fuzzy model name matching)
   ↓ not found
3. modelRegistry.js          (inputPer1M / outputPer1M fields)
   ↓ not found
4. $0.00                     (unknown model — no cost tracked)
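A minimal sketch of that cascade (the `pricingOverrides` field and the trivial fuzzy matcher below are assumptions for illustration; the real lookup lives in pricingDefaults.js and modelRegistry.js):

```javascript
// Trivial fuzzy matcher: exact key first, then any defaults key
// contained in the model name (assumption; the real matching may differ).
function fuzzyMatch(table, model) {
  if (table[model]) return table[model];
  const key = Object.keys(table).find((k) => model.includes(k));
  return key ? table[key] : undefined;
}

// Cascading lookup; returns { inputPer1M, outputPer1M } in USD.
function resolvePricing(model, tenant, pricingDefaults, modelRegistry) {
  const override = tenant.pricingOverrides && tenant.pricingOverrides[model];
  if (override) return override;                   // 1. tenant pricing override
  const fromDefaults = fuzzyMatch(pricingDefaults, model);
  if (fromDefaults) return fromDefaults;           // 2. flat defaults table
  const reg = modelRegistry[model];
  if (reg) return { inputPer1M: reg.inputPer1M, outputPer1M: reg.outputPer1M }; // 3. registry
  return { inputPer1M: 0, outputPer1M: 0 };        // 4. unknown model: no cost tracked
}
```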

The baseline cost is calculated at the list price of the originally requested model (before routing). This allows the dashboard to show how much was saved by routing to a cheaper model.

Example:

Client requests:  model="auto" (resolves to gpt-4o-mini via routing)
Baseline model:   gpt-4o (the default a client would have used had it specified a model)

inputTokens:  1 200
outputTokens: 340

actualCost:   1200/1M × $0.15  +  340/1M × $0.60  = $0.000384
baselineCost: 1200/1M × $2.50  +  340/1M × $10.00 = $0.006400
saved:        $0.006016  (94% savings)
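The arithmetic above can be reproduced directly (per-1M rates are the example list prices; `computeCosts` is an illustrative name, not the actual function):

```javascript
// Compute actual, baseline, and saved cost from token counts
// and per-1M-token rates (USD).
function computeCosts(inputTokens, outputTokens, actualRates, baselineRates) {
  const cost = (r) =>
    (inputTokens / 1e6) * r.inputPer1M + (outputTokens / 1e6) * r.outputPer1M;
  const actualCost = cost(actualRates);
  const baselineCost = cost(baselineRates);
  return { actualCost, baselineCost, saved: baselineCost - actualCost };
}

// The worked example: gpt-4o-mini actual vs. gpt-4o baseline.
const r = computeCosts(
  1200, 340,
  { inputPer1M: 0.15, outputPer1M: 0.60 },  // gpt-4o-mini (example rates)
  { inputPer1M: 2.50, outputPer1M: 10.00 }  // gpt-4o (example rates)
);
```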

Daily Aggregates

For dashboard performance, per-day aggregates are maintained in DailyStat. Updates are written with $inc operators — no read-modify-write cycles.

DailyStat {
  tenantId,
  date: "2026-04-01",
  requests:        1 240,
  autoRoutedCount:   980,
  inputTokens:   4 820 000,
  outputTokens:    920 000,
  actualCost:         18.42,
  baselineCost:       74.80,
  saved:              56.38,
}
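A sketch of the corresponding upsert, assuming a MongoDB-style store given the `$inc` operators (the builder function and exact call shape are hypothetical):

```javascript
// Build an atomic upsert for the per-tenant, per-day aggregate.
// $inc lets concurrent requests update the same document without
// any read-modify-write cycle.
function dailyStatUpdate(rec) {
  return {
    filter: { tenantId: rec.tenantId, date: rec.date }, // e.g. date: "2026-04-01"
    update: {
      $inc: {
        requests: 1,
        autoRoutedCount: rec.autoRouted ? 1 : 0,
        inputTokens: rec.inputTokens,
        outputTokens: rec.outputTokens,
        actualCost: rec.actualCost,
        baselineCost: rec.baselineCost,
        saved: rec.saved,
      },
    },
    options: { upsert: true }, // first request of the day creates the document
  };
}
```

The same document shape would be passed to something like `collection.updateOne(filter, update, options)`.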

Dashboard

The dashboard provides an overview across all tenants (admin/maintainer) or scoped to assigned tenants (tenant-viewer/tenant-admin).

KPI Cards

Card                 Value                  Subtitle
Total Cost           Sum of actual cost     Last N days
Savings vs Baseline  Sum of saved           Percentage of baseline
Requests             Total request count    % auto-routed (count)
Input Tokens         Sum of input tokens    Prompt / context tokens
Output Tokens        Sum of output tokens   Generated tokens
Total Tokens         Input + Output         Breakdown in subtitle

Charts

  • Cost Over Time — Area chart with two series: Actual Cost (filled blue) and Baseline Cost (filled grey)
  • Token Usage Over Time — Stacked bar chart: Input Tokens (orange) and Output Tokens (yellow)
  • Model Usage — Bar chart: requests per model over the selected period

Filters

  • Tenant selector — filter dashboard to a single tenant (all tenants by default)
  • Time range — Last 7 / 30 / 90 days

Request Log

The request log shows individual requests with full details. Available to admin, maintainer, and finops roles.

Filters: Tenant, Model, Status (all/success/error), Auto-routed, Date range.

Special indicators: context_fallback badge (orange, shows original→fallback model), auto_routed badge (shows category and confidence), error badge (red, with truncated error on hover).

Token Estimation

For requests where the provider does not return token counts (e.g. streaming), tokens are estimated locally from character counts:

estimatedTokens ≈ totalChars / 3.5

Adjustments: code blocks are counted at 1 char/token, and ~4 tokens of overhead are added per message. The token service also performs a pre-flight check: if the estimated tokens exceed the model's context window, the request is automatically upgraded to a larger-context model.
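A sketch of that heuristic (treating fenced ``` blocks as code and the exact accounting are assumptions; the real token service may differ):

```javascript
// Estimate tokens for a chat request from character counts:
// prose at ~3.5 chars/token, fenced code blocks at 1 char/token,
// plus ~4 tokens of overhead per message.
function estimateTokens(messages) {
  let total = 0;
  for (const msg of messages) {
    const text = msg.content || '';
    // Assumption: "code" is anything inside ``` fences.
    const codeChars = (text.match(/```[\s\S]*?```/g) || []).join('').length;
    const proseChars = text.length - codeChars;
    total += codeChars + proseChars / 3.5 + 4;
  }
  return Math.ceil(total);
}
```

A pre-flight check can then compare the estimate against the model's context window and upgrade the request if it would overflow.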

Prometheus Metrics

Available at /metrics (unauthenticated; restrict via firewall in production):

  • http_requests_total — by method, path, status
  • http_request_duration_seconds — histogram
  • omp_gateway_requests_total — by tenant, model, status
  • omp_gateway_tokens_total — input/output by tenant
  • omp_gateway_cost_total — actual cost by tenant
  • Node.js default metrics (event loop lag, heap, GC)
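For reference, these counters are served in the Prometheus text exposition format. The small sketch below renders one counter family that way (the real endpoint presumably uses a client library such as prom-client rather than hand-built strings):

```javascript
// Render a counter family in the Prometheus text exposition format:
// "# HELP" and "# TYPE" lines followed by one sample line per label set.
function renderCounter(name, help, samples) {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const { labels, value } of samples) {
    const labelStr = Object.entries(labels)
      .map(([k, v]) => `${k}="${v}"`)
      .join(',');
    lines.push(`${name}{${labelStr}} ${value}`);
  }
  return lines.join('\n');
}

const body = renderCounter(
  'omp_gateway_requests_total',
  'Gateway requests by tenant, model, status',
  [{ labels: { tenant: 'acme', model: 'gpt-4o-mini', status: 'success' }, value: 3 }]
);
```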