Analytics & Cost Tracking

Open Model Prism tracks every gateway request and aggregates statistics at multiple levels: per-tenant, per-model, per-user, and per-day. All analytics are written asynchronously (fire-and-forget) so they never add latency to the gateway response.

Metrics Tracked Per Request

Field            Description
tenantId         Which tenant made the request
userId           User identifier (from API key or explicit header)
model            Actual model used (after routing resolution)
requestedModel   Model requested by the client (may be "auto" or an alias)
category         Auto-routing category (if auto-routed)
inputTokens      Prompt + context tokens
outputTokens     Generated tokens
inputCost        Input token cost (USD)
outputCost       Output token cost (USD)
actualCost       Total actual cost (USD)
baselineCost     Cost at the list price of the originally requested model
saved            baselineCost − actualCost (reflects routing savings)
durationMs       End-to-end request duration
status           success or error
errorMessage     Error details if status = error
autoRouted       Whether auto-routing was used
contextFallback  Whether a context-overflow fallback occurred
streaming        Whether the response was streamed
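As a sketch, the fire-and-forget analytics write described above might look like the following (the `recordRequest` function and `writer` callback are illustrative names, not the actual implementation):

```javascript
// Illustrative sketch: record a per-request analytics document without
// blocking the gateway response. `writer` stands in for whatever
// persistence call the gateway uses (hypothetical).
function recordRequest(fields, writer) {
  const doc = { timestamp: new Date().toISOString(), ...fields };
  // Fire-and-forget: start the write but never await it, and swallow
  // errors so a failed analytics write can never fail or slow the request.
  Promise.resolve()
    .then(() => writer(doc))
    .catch(() => {});
  return doc;
}
```

The caller returns its response immediately; the write runs on a later microtask, which is what keeps analytics off the request's critical path.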

Cost Calculation

Costs are calculated using a cascading pricing lookup (first match wins):

1. Tenant pricing override  (set per-model on the tenant)
   ↓ not found
2. pricingDefaults.js        (flat table with fuzzy model name matching)
   ↓ not found
3. modelRegistry.js          (inputPer1M / outputPer1M fields)
   ↓ not found
4. $0.00                     (unknown model — no cost tracked)
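A minimal sketch of that cascade (the `pricingOverrides` field and the trivial fuzzy matcher below are assumptions for illustration; the real lookup lives in pricingDefaults.js and modelRegistry.js):

```javascript
// Trivial fuzzy matcher: exact key first, then any defaults key
// contained in the model name (assumption; the real matching may differ).
function fuzzyMatch(table, model) {
  if (table[model]) return table[model];
  const key = Object.keys(table).find((k) => model.includes(k));
  return key ? table[key] : undefined;
}

// Cascading lookup; returns { inputPer1M, outputPer1M } in USD.
function resolvePricing(model, tenant, pricingDefaults, modelRegistry) {
  const override = tenant.pricingOverrides && tenant.pricingOverrides[model];
  if (override) return override;                   // 1. tenant pricing override
  const fromDefaults = fuzzyMatch(pricingDefaults, model);
  if (fromDefaults) return fromDefaults;           // 2. flat defaults table
  const reg = modelRegistry[model];
  if (reg) return { inputPer1M: reg.inputPer1M, outputPer1M: reg.outputPer1M }; // 3. registry
  return { inputPer1M: 0, outputPer1M: 0 };        // 4. unknown model: no cost tracked
}
```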

The baseline cost is calculated at the list price of the originally requested model (before routing). This allows the dashboard to show how much was saved by routing to a cheaper model.

Example:

Client requests:  model="auto" (resolves to gpt-4o-mini via routing)
Baseline model:   gpt-4o (the default a client would have used had it specified a model)

inputTokens:  1 200
outputTokens: 340

actualCost:   1200/1M × $0.15  +  340/1M × $0.60  = $0.000384
baselineCost: 1200/1M × $2.50  +  340/1M × $10.00 = $0.006400
saved:        $0.006016  (94% savings)
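The arithmetic above can be reproduced directly (per-1M rates are the example list prices; `computeCosts` is an illustrative name, not the actual function):

```javascript
// Compute actual, baseline, and saved cost from token counts
// and per-1M-token rates (USD).
function computeCosts(inputTokens, outputTokens, actualRates, baselineRates) {
  const cost = (r) =>
    (inputTokens / 1e6) * r.inputPer1M + (outputTokens / 1e6) * r.outputPer1M;
  const actualCost = cost(actualRates);
  const baselineCost = cost(baselineRates);
  return { actualCost, baselineCost, saved: baselineCost - actualCost };
}

// The worked example: gpt-4o-mini actual vs. gpt-4o baseline.
const r = computeCosts(
  1200, 340,
  { inputPer1M: 0.15, outputPer1M: 0.60 },  // gpt-4o-mini (example rates)
  { inputPer1M: 2.50, outputPer1M: 10.00 }  // gpt-4o (example rates)
);
```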

Daily Aggregates

For dashboard performance, per-day aggregates are maintained in DailyStat. Updates are written with $inc operators — no read-modify-write cycles.

DailyStat {
  tenantId,
  date: "2026-04-01",
  requests:        1 240,
  autoRoutedCount:   980,
  inputTokens:   4 820 000,
  outputTokens:    920 000,
  actualCost:         18.42,
  baselineCost:       74.80,
  saved:              56.38,
}
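A sketch of the corresponding upsert, assuming a MongoDB-style store given the `$inc` operators (the builder function and exact call shape are hypothetical):

```javascript
// Build an atomic upsert for the per-tenant, per-day aggregate.
// $inc lets concurrent requests update the same document without
// any read-modify-write cycle.
function dailyStatUpdate(rec) {
  return {
    filter: { tenantId: rec.tenantId, date: rec.date }, // e.g. date: "2026-04-01"
    update: {
      $inc: {
        requests: 1,
        autoRoutedCount: rec.autoRouted ? 1 : 0,
        inputTokens: rec.inputTokens,
        outputTokens: rec.outputTokens,
        actualCost: rec.actualCost,
        baselineCost: rec.baselineCost,
        saved: rec.saved,
      },
    },
    options: { upsert: true }, // first request of the day creates the document
  };
}
```

The same document shape would be passed to something like `collection.updateOne(filter, update, options)`.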

Dashboard

The dashboard provides an overview across all tenants (admin/maintainer) or scoped to assigned tenants (tenant-viewer/tenant-admin).

KPI Cards

Card                 Value                  Subtitle
Total Cost           Sum of actual cost     Last N days
Savings vs Baseline  Sum of saved           Percentage of baseline
Requests             Total request count    % auto-routed (count)
Input Tokens         Sum of input tokens    Prompt / context tokens
Output Tokens        Sum of output tokens   Generated tokens
Total Tokens         Input + Output         Breakdown in subtitle

Charts

  • Cost Over Time — Area chart with two series: Actual Cost (filled blue) and Baseline Cost (filled grey)
  • Token Usage Over Time — Stacked bar chart: Input Tokens (orange) and Output Tokens (yellow)
  • Model Usage — Bar chart: requests per model over the selected period

Filters

  • Tenant selector — filter dashboard to a single tenant (all tenants by default)
  • Time range — Last 7 / 30 / 90 days

Request Log

The request log shows individual requests with full details. Available to admin, maintainer, and finops roles.

Filters: Tenant, Model, Status (all/success/error), Auto-routed, Date range.

Special indicators: context_fallback badge (orange, shows original→fallback model), auto_routed badge (shows category and confidence), error badge (red, with truncated error on hover).

Token Estimation

For requests where the provider does not return token counts (e.g. streaming), tokens are estimated locally from character counts:

estimatedTokens ≈ totalChars / 3.5

Adjustments: code blocks are counted at 1 char/token, and ~4 tokens of overhead are added per message. The token service also performs a pre-flight check: if the estimated tokens exceed the model's context window, the request is automatically upgraded to a larger-context model.
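A sketch of that heuristic (treating fenced ``` blocks as code and the exact accounting are assumptions; the real token service may differ):

```javascript
// Estimate tokens for a chat request from character counts:
// prose at ~3.5 chars/token, fenced code blocks at 1 char/token,
// plus ~4 tokens of overhead per message.
function estimateTokens(messages) {
  let total = 0;
  for (const msg of messages) {
    const text = msg.content || '';
    // Assumption: "code" is anything inside ``` fences.
    const codeChars = (text.match(/```[\s\S]*?```/g) || []).join('').length;
    const proseChars = text.length - codeChars;
    total += codeChars + proseChars / 3.5 + 4;
  }
  return Math.ceil(total);
}
```

A pre-flight check can then compare the estimate against the model's context window and upgrade the request if it would overflow.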

Prometheus Metrics

Available at /metrics (unauthenticated; restrict via firewall in production):

  • http_requests_total — by method, path, status
  • http_request_duration_seconds — histogram
  • omp_gateway_requests_total — by tenant, model, status
  • omp_gateway_tokens_total — input/output by tenant
  • omp_gateway_cost_total — actual cost by tenant
  • Node.js default metrics (event loop lag, heap, GC)
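For reference, these counters are served in the Prometheus text exposition format. The small sketch below renders one counter family that way (the real endpoint presumably uses a client library such as prom-client rather than hand-built strings):

```javascript
// Render a counter family in the Prometheus text exposition format:
// "# HELP" and "# TYPE" lines followed by one sample line per label set.
function renderCounter(name, help, samples) {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const { labels, value } of samples) {
    const labelStr = Object.entries(labels)
      .map(([k, v]) => `${k}="${v}"`)
      .join(',');
    lines.push(`${name}{${labelStr}} ${value}`);
  }
  return lines.join('\n');
}

const body = renderCounter(
  'omp_gateway_requests_total',
  'Gateway requests by tenant, model, status',
  [{ labels: { tenant: 'acme', model: 'gpt-4o-mini', status: 'success' }, value: 3 }]
);
```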