Analytics & Cost Tracking
Open Model Prism tracks every gateway request and aggregates statistics at multiple levels: per-tenant, per-model, per-user, and per-day. All analytics are written asynchronously (fire-and-forget) so they never add latency to the gateway response.
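The fire-and-forget pattern can be sketched as follows. This is a minimal illustration, not the actual gateway code; `recordAnalytics` and `finishRequest` are hypothetical names:

```javascript
// Hypothetical stand-in for the real analytics write path.
async function recordAnalytics(record) {
  // e.g. insert the record into the analytics store
}

function finishRequest(record) {
  // Deliberately not awaited: the gateway response is never blocked on the
  // analytics write, and a failed write is logged instead of propagated.
  recordAnalytics(record).catch((err) =>
    console.error('analytics write failed:', err),
  );
  return record;
}
```

The `.catch` is essential: an unhandled rejection from a fire-and-forget promise would otherwise crash a Node.js process.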
Metrics Tracked Per Request
| Field | Description |
|---|---|
| tenantId | Which tenant made the request |
| userId | User identifier (from API key or explicit header) |
| model | Actual model used (after routing resolution) |
| requestedModel | Model requested by the client (may be "auto" or an alias) |
| category | Auto-routing category (if auto-routed) |
| inputTokens | Prompt + context tokens |
| outputTokens | Generated tokens |
| inputCost | Input token cost (USD) |
| outputCost | Output token cost (USD) |
| actualCost | Total actual cost (inputCost + outputCost) |
| baselineCost | Cost at the list price of the originally requested model |
| saved | baselineCost − actualCost (reflects routing savings) |
| durationMs | End-to-end request duration |
| status | success or error |
| errorMessage | Error details if status = error |
| autoRouted | Whether auto-routing was used |
| contextFallback | Whether a context-overflow fallback occurred |
| streaming | Whether the response was streamed |
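Putting the fields together, a single tracked request might look like this (illustrative values only; the category name is hypothetical):

```json
{
  "tenantId": "acme",
  "userId": "u_123",
  "model": "gpt-4o-mini",
  "requestedModel": "auto",
  "category": "simple-qa",
  "inputTokens": 1200,
  "outputTokens": 340,
  "inputCost": 0.00018,
  "outputCost": 0.000204,
  "actualCost": 0.000384,
  "baselineCost": 0.0064,
  "saved": 0.006016,
  "durationMs": 812,
  "status": "success",
  "autoRouted": true,
  "contextFallback": false,
  "streaming": false
}
```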
Cost Calculation
Costs are calculated using a three-level pricing lookup:
1. Tenant pricing override (set per-model on the tenant)
↓ not found
2. pricingDefaults.js (flat table with fuzzy model name matching)
↓ not found
3. modelRegistry.js (inputPer1M / outputPer1M fields)
↓ not found
4. $0.00 (unknown model — no cost tracked)

The baseline cost is calculated at the list price of the originally requested model (before routing). This allows the dashboard to show how much was saved by routing to a cheaper model.
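The lookup chain and the cost math can be sketched as below. The data shapes and function names are illustrative assumptions; the real pricingDefaults table also does fuzzy model-name matching, which is omitted here for brevity:

```javascript
// Illustrative pricing tables; the real lookups live in pricingDefaults.js
// and modelRegistry.js.
const pricingDefaults = { 'gpt-4o-mini': { inputPer1M: 0.15, outputPer1M: 0.6 } };
const modelRegistry = { 'gpt-4o': { inputPer1M: 2.5, outputPer1M: 10.0 } };

// Resolve per-1M-token prices via the three-level chain described above.
function resolvePricing(model, tenantOverrides = {}) {
  return (
    tenantOverrides[model] ??             // 1. tenant pricing override
    pricingDefaults[model] ??             // 2. flat defaults table
    modelRegistry[model] ??               // 3. model registry
    { inputPer1M: 0, outputPer1M: 0 }     // 4. unknown model: $0.00
  );
}

function requestCost(model, inputTokens, outputTokens, tenantOverrides) {
  const p = resolvePricing(model, tenantOverrides);
  return (inputTokens / 1e6) * p.inputPer1M + (outputTokens / 1e6) * p.outputPer1M;
}
```

With the token counts from the worked example below, `requestCost('gpt-4o-mini', 1200, 340)` yields ≈ $0.000384 while `requestCost('gpt-4o', 1200, 340)` yields ≈ $0.0064, giving the reported saving of ≈ $0.006016.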
Example:
Client requests: model="auto" (resolves to gpt-4o-mini via routing)
Baseline model: gpt-4o (default if client had specified a model)
inputTokens: 1,200
outputTokens: 340
actualCost: 1200/1M × $0.15 + 340/1M × $0.60 = $0.000384
baselineCost: 1200/1M × $2.50 + 340/1M × $10.00 = $0.006400
saved: $0.006016 (94% savings)

Daily Aggregates
For dashboard performance, per-day aggregates are maintained in DailyStat. Updates are written with $inc operators — no read-modify-write cycles.
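A sketch of how such an atomic update could be built; `buildDailyStatUpdate` and the collection name are assumptions, and only the $inc-with-upsert pattern comes from the text:

```javascript
// Build an atomic, upserting daily-aggregate update from one request record.
// Every numeric field is an increment, so there is no read-modify-write.
function buildDailyStatUpdate(r) {
  return {
    filter: { tenantId: r.tenantId, date: r.date },
    update: {
      $inc: {
        requests: 1,
        autoRoutedCount: r.autoRouted ? 1 : 0,
        inputTokens: r.inputTokens,
        outputTokens: r.outputTokens,
        actualCost: r.actualCost,
        baselineCost: r.baselineCost,
        saved: r.baselineCost - r.actualCost,
      },
    },
    options: { upsert: true }, // create the day's document on first write
  };
}
// Applied as e.g.:
// db.collection('dailystats').updateOne(filter, update, options)
```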
DailyStat {
  tenantId,
  date: "2026-04-01",
  requests: 1240,
  autoRoutedCount: 980,
  inputTokens: 4820000,
  outputTokens: 920000,
  actualCost: 18.42,
  baselineCost: 74.80,
  saved: 56.38,
}

Dashboard
The dashboard provides an overview across all tenants (admin/maintainer) or scoped to assigned tenants (tenant-viewer/tenant-admin).
KPI Cards
| Card | Value | Subtitle |
|---|---|---|
| Total Cost | Sum of actual cost | Last N days |
| Savings vs Baseline | Sum of saved | Percentage of baseline |
| Requests | Total request count | % auto-routed (count) |
| Input Tokens | Sum of input tokens | Prompt / context tokens |
| Output Tokens | Sum of output tokens | Generated tokens |
| Total Tokens | Input + Output | Breakdown in subtitle |
Charts
- Cost Over Time — Area chart with two series: Actual Cost (filled blue) and Baseline Cost (filled grey)
- Token Usage Over Time — Stacked bar chart: Input Tokens (orange) and Output Tokens (yellow)
- Model Usage — Bar chart: requests per model over the selected period
Filters
- Tenant selector — filter dashboard to a single tenant (all tenants by default)
- Time range — Last 7 / 30 / 90 days
Request Log
The request log shows individual requests with full details. Available to admin, maintainer, and finops roles.
Filters: Tenant, Model, Status (all/success/error), Auto-routed, Date range.
Special indicators: context_fallback badge (orange, shows original→fallback model), auto_routed badge (shows category and confidence), error badge (red, with truncated error on hover).
Token Estimation
For requests where the provider does not return token counts (e.g. streaming), tokens are estimated offline:
estimatedTokens ≈ totalChars / 3.5

Adjustments: code blocks are counted at ~1 char/token, and ~4 tokens of overhead are added per message. The token service also performs a pre-flight check — if the estimated tokens exceed the model's context window, the request is automatically upgraded to a larger-context model.
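The heuristic can be sketched as follows; the function and constant names are assumptions, not the token service's actual API:

```javascript
const CHARS_PER_TOKEN = 3.5;     // prose: ~3.5 chars per token
const CODE_CHARS_PER_TOKEN = 1;  // code blocks: ~1 char per token
const PER_MESSAGE_OVERHEAD = 4;  // role/formatting overhead per message

function estimateTokens(messages) {
  let tokens = 0;
  for (const { content } of messages) {
    // Separate fenced code blocks so they can be weighted differently.
    const codeBlocks = content.match(/```[\s\S]*?```/g) ?? [];
    const proseChars = content.split(/```[\s\S]*?```/).join('').length;
    const codeChars = codeBlocks.join('').length;
    tokens +=
      Math.ceil(proseChars / CHARS_PER_TOKEN) +
      Math.ceil(codeChars / CODE_CHARS_PER_TOKEN) +
      PER_MESSAGE_OVERHEAD;
  }
  // A pre-flight check would compare this estimate against the model's
  // context window and upgrade to a larger-context model if exceeded.
  return tokens;
}
```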
Prometheus Metrics
Available at /metrics (unauthenticated — restrict access via firewall in production):
- http_requests_total — by method, path, status
- http_request_duration_seconds — histogram
- omp_gateway_requests_total — by tenant, model, status
- omp_gateway_tokens_total — input/output by tenant
- omp_gateway_cost_total — actual cost by tenant
- Node.js default metrics (event loop lag, heap, GC)