Architecture
System Overview
Open Model Prism is a Node.js/Express backend that serves both the REST API and the compiled React frontend as static files. MongoDB is the only external dependency. It can run as a single container or split into a Control Plane and one or more Worker pods for horizontal scaling.
Single-Pod (default)
┌─────────────────────────────────────────────────────────────┐
│ Browser / Client Tools │
│ (OpenWebUI · Cursor · Continue · Claude Code · SDK · ...) │
└──────────────────┬──────────────────────────────────────────┘
│ HTTP / SSE
▼
┌─────────────────────────────────────────────────────────────┐
│ Open Model Prism (Node.js 20, Express v5) │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────┐ │
│ │ Admin UI │ │ Gateway API │ │ Static Frontend │ │
│ │ /api/prism/ │ │ /api/:slug │ │ React + Mantine │ │
│ └──────┬───────┘ └──────┬───────┘ └──────────────────┘ │
│ │ │ │
│ ┌──────▼─────────────────▼──────────────────────────────┐ │
│ │ Services │ │
│ │ routerEngine · analyticsEngine · pricingService │ │
│ │ tokenService · modelEnrichmentService │ │
│ └──────────────────────────┬────────────────────────────┘ │
│ │ │
│ ┌──────────────────────────▼────────────────────────────┐ │
│ │ Provider Adapters │ │
│ │ OpenAI-compat · Bedrock · Azure · Ollama │ │
│ └──────────────────────────────────────────────────────┘ │
└──────────────────────────┬──────────────────────────────────┘
│
┌────────────────┼────────────────┐
▼ ▼ ▼
MongoDB LLM Providers models.dev
(data store) (OpenAI, Anthropic, (enrichment,
Bedrock, Ollama...) optional)
Scaled Deployment (Control Plane + Workers)
For larger teams, the gateway can be split from the Admin UI and scaled independently using NODE_ROLE:
Clients (Continue · Cursor · Claude Code · Open WebUI · SDK ...)
│
├── /api/:tenant/v1/* ──────────────────────────┐
│ │
└── /* (admin UI, /api/prism/admin/*) ──────┐ │
│ │
┌────────────────────────▼─┐ │
│ Control Plane │ │
│ NODE_ROLE=control │ │
│ Admin UI + Admin API │ │
└──────────────────────────┘ │
│
┌───────────────┬──────────────────────────▼──┐
│ Worker Pod 1 │ Worker Pod 2 │ ...
│ NODE_ROLE= │ NODE_ROLE=worker │
│ worker │ Gateway only │
└───────┬───────┴──────────────────────┘
│
MongoDB (shared state: tenants, providers,
routing rules, analytics, pod metrics)
Deployment Modes
Three modes are available via the NODE_ROLE environment variable:
| NODE_ROLE | Serves | Use case |
|---|---|---|
| full (default) | Admin UI + Gateway | Development, small teams (<50 users) |
| control | Admin UI + Admin API only | Control plane in a scaled deployment |
| worker | Gateway only (/api/:tenant/v1/*) | Horizontally scaled request handling |
All pods share a single MongoDB database. Worker pods are stateless and can be added or removed without downtime. See Operations for capacity planning and configuration details.
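The role switch can be pictured as a mount-time branch over route groups. A minimal sketch, assuming hypothetical names (the function and the exact grouping are illustrative, not the actual source):

```javascript
// Illustrative sketch of NODE_ROLE-based route mounting.
// Route groups mirror the table above; names are hypothetical.
function routesForRole(role = "full") {
  const gateway = ["/api/:tenant/v1/*"];
  const admin = ["/api/prism/auth/*", "/api/prism/admin/*"];
  const ui = ["/*"]; // static React bundle, mounted last as catch-all
  if (role === "worker") return gateway;               // gateway only
  if (role === "control") return [...admin, ...ui];    // control plane
  return [...admin, ...gateway, ...ui];                // "full" (default)
}
```

Mounting the UI catch-all last matters in Express: otherwise `/*` would shadow the API routes.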
Request Flow (Gateway)
Every client request hitting /api/:tenant/v1/... passes through these stages:
POST /api/team-alpha/v1/chat/completions
│
├─ 1. Tenant Auth
│ Look up tenant by slug -> find API key hash -> verify Bearer token
│ Check key enabled, not expired
│ Load tenant config (providers, routing, model aliases) — cached 60s
│
├─ 2. Per-Tenant Rate Limiting
│ Sliding window in-memory counter (requestsPerMinute per tenant)
│ Returns 429 on breach
│
├─ 3. Model Alias Resolution
│ Resolve tenant-specific aliases e.g. "gpt-4" -> "gpt-4o"
│
├─ 4. Force-Auto-Route Check
│ If tenant has forceAutoRoute=true, override requested model with "auto"
│
├─ 5. Auto-Routing (only if model = "auto")
│ Signal extraction -> override rules -> [LLM classifier if needed]
│ -> category + confidence + target model ID
│ See: routing docs
│
├─ 6. Provider Selection
│ Find provider that has the target model
│ Select provider adapter (OpenAI-compat / Bedrock / Azure / Ollama)
│
├─ 7. Context Pre-flight
│ Estimate total token count (tokenService)
│ If count > model.contextWindow -> findLargerContextModel -> swap model
│
├─ 8. Provider Adapter Call
│ Forward request (non-streaming: await response)
│ (streaming: pipe SSE chunks directly to client)
│
├─ 9. Context Overflow Retry (non-streaming only)
│ If provider returns context-overflow error -> retry with larger model
│ Adds context_fallback field to response
│
├─ 10. Response Enrichment
│ Inject cost_info, auto_routing, context_fallback into response body
│
└─ 11. Async Analytics Logging (fire-and-forget via setImmediate)
Write RequestLog, update DailyStat, update DailyUserStat
Directory Structure
open-model-prism/
├── server/ Express backend (ESM, Node 20+)
│ ├── index.js Entry point: routes, middleware, graceful shutdown
│ ├── config.js Env-var config with defaults
│ ├── db.js Mongoose connection + retry logic
│ │
│ ├── models/ Mongoose schemas
│ │ ├── Tenant.js slug, apiKeyHash, providers, routing, modelConfig
│ │ ├── Provider.js type, encrypted credentials, discoveredModels
│ │ ├── User.js roles, LDAP flag, tenants, lastLogin
│ │ ├── RoutingCategory.js key, costTier, defaultModel, examples
│ │ ├── RequestLog.js per-request log: tokens, cost, category, model
│ │ └── DailyStat.js daily aggregates: cost, tokens, requests
│ │
│ ├── routes/
│ │ ├── admin/ Provider, tenant, category, dashboard, user CRUD
│ │ ├── gateway/ OpenAI-compat proxy (/api/:tenant/v1/*)
│ │ ├── auth.js Login, JWT issue, /me
│ │ ├── setup.js First-run setup wizard endpoints
│ │ ├── tenant-portal.js Self-service API for tenant-admin role
│ │ └── metrics.js Prometheus /metrics
│ │
│ ├── middleware/ auth, RBAC, setup check
│ ├── services/ routerEngine, analyticsEngine, pricingService, ...
│ ├── providers/ OpenAI-compat, Bedrock, Azure, Ollama adapters
│ ├── data/ modelRegistry, presetProfiles, pricingDefaults
│ └── utils/ encryption, logger, sanitize
│
├── frontend/src/ React + Mantine admin UI
│ ├── App.jsx App shell, routing, nav
│ └── pages/ Dashboard, Providers, Tenants, Models, ...
│
└── docs/ Documentation
Authentication & Security
Admin UI (JWT)
All /api/prism/admin/* and /api/prism/auth/* endpoints use JWT Bearer tokens:
POST /api/prism/auth/login -> { token: "eyJ..." }
GET /api/prism/admin/... -> Authorization: Bearer eyJ...
Tokens are signed with JWT_SECRET, contain { id, username, role, tenants }, and have no built-in expiry (server restart invalidates all tokens).
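The token shape can be sketched with a minimal HS256 sign/verify pair built on node:crypto. This is illustrative only; the actual server presumably uses a JWT library rather than hand-rolling the format:

```javascript
// Minimal HS256 JWT sketch (illustrative of the token shape only).
import crypto from "node:crypto";

const b64url = (s) => Buffer.from(s).toString("base64url");

function signToken(payload, secret) {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const body = b64url(JSON.stringify(payload));
  const sig = crypto.createHmac("sha256", secret)
    .update(`${header}.${body}`).digest("base64url");
  return `${header}.${body}.${sig}`;
}

function verifyToken(token, secret) {
  const [header, body, sig] = token.split(".");
  const expected = crypto.createHmac("sha256", secret)
    .update(`${header}.${body}`).digest("base64url");
  // constant-time comparison to avoid timing side channels
  if (!crypto.timingSafeEqual(Buffer.from(sig), Buffer.from(expected))) {
    throw new Error("invalid signature");
  }
  return JSON.parse(Buffer.from(body, "base64url").toString());
}
```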
Gateway (Per-Tenant API Key)
omp-<64 hex chars> (example: omp-a3f9b2...)
The key is shown only once on creation. The stored value is SHA-256(key). Checks performed on every request:
- Extract Bearer token → compute SHA-256 → look up tenant
- keyEnabled === false → 401 key_disabled
- keyExpiresAt < now → 401 key_expired
- Tenant enabled === false → 401 tenant_disabled
Credential Encryption
Provider API keys and secrets are encrypted at rest using AES-256-GCM:
Format: enc:<ivHex>:<authTagHex>:<ciphertextHex>
Key: ENCRYPTION_KEY env var (32-byte hex)
RBAC Roles
| Role | Scope | Key permissions |
|---|---|---|
| admin | Global | Everything — users, LDAP, providers, tenants, categories |
| maintainer | Global | Providers, tenants, categories, model registry, analytics |
| finops | Global | Read-only: dashboard, costs, request log (all tenants) |
| tenant-viewer | Assigned tenants | Read-only: dashboard scoped to own tenants |
| tenant-admin | Assigned tenants | Self-service: model access config, generate client configs |
Role checks are applied in middleware before route handlers. Admin/maintainer bypass all tenant-scoping checks.
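The two checks described above — a role gate in middleware and a tenant-scoping bypass for global roles — can be sketched as follows. The function names and middleware shape are illustrative, not the actual source:

```javascript
// RBAC sketch: role names come from the table above; the middleware
// shape and function names are hypothetical.
function canAccessTenant(user, tenantSlug) {
  // admin and maintainer bypass all tenant-scoping checks
  if (user.role === "admin" || user.role === "maintainer") return true;
  return user.tenants.includes(tenantSlug);
}

function requireRole(...allowed) {
  // Express-style middleware: runs before the route handler
  return (req, res, next) => {
    if (allowed.includes(req.user?.role)) return next();
    res.status(403).json({ error: "forbidden" });
  };
}
```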
Offline Mode
Set OFFLINE=true to disable all outbound internet calls:
- modelEnrichmentService loads data/modelsDev.snapshot.json instead of fetching models.dev
- Token estimation uses the built-in character-based heuristic (no external tokenizer)
- All routing, gateway, and admin functionality works fully offline
No other behavior changes. Suitable for air-gapped enterprise environments.
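The character-based heuristic isn't specified here; a common approximation for English text is roughly four characters per token, which would look like this (a sketch under that assumption, not the actual tokenService code):

```javascript
// Hypothetical character-based token estimator (~4 chars/token for
// English text). The real heuristic in tokenService may differ.
function estimateTokens(messages) {
  const chars = messages.reduce((n, m) => n + (m.content?.length ?? 0), 0);
  return Math.ceil(chars / 4);
}
```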
Rate Limiting
Three independent limiters are applied at different layers:
| Layer | Limit | Scope |
|---|---|---|
| Auth endpoints | 20 req/min | Per IP — brute-force protection |
| Admin API | 300 req/min | Per IP |
| Gateway | 600 req/min | Per IP (outer layer) |
| Per-tenant gateway | Configurable req/min | Per tenant (sliding window, in-memory) |
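The per-tenant limiter in the last row can be sketched as a sliding window over request timestamps, kept in process memory. Implementation details (data structure, pruning strategy) are illustrative:

```javascript
// In-memory sliding-window limiter sketch for the per-tenant layer.
// The real implementation may differ; behavior matches the table above.
const windows = new Map(); // tenant slug -> timestamps (ms) of recent hits

function allowRequest(tenant, limitPerMinute, now = Date.now()) {
  const cutoff = now - 60_000;
  // drop hits older than one minute, then check the remaining count
  const hits = (windows.get(tenant) ?? []).filter((t) => t > cutoff);
  if (hits.length >= limitPerMinute) {
    windows.set(tenant, hits);
    return false; // caller responds 429
  }
  hits.push(now);
  windows.set(tenant, hits);
  return true;
}
```

Because the window lives in process memory, each worker pod enforces the limit independently rather than cluster-wide.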
Analytics Pipeline
Request analytics are written asynchronously (fire-and-forget via setImmediate) to avoid adding latency to the gateway response path:
Gateway response sent to client
│
└─ setImmediate -> analyticsEngine.logRequest()
│
├─ upsert RequestLog document
├─ $inc DailyStat (cost, tokens, requests)
└─ $inc DailyUserStat (per user breakdown)
If the analytics write fails, the gateway request is unaffected.
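The fire-and-forget pattern above can be sketched as a small wrapper: the write is deferred with setImmediate so it runs after the response is sent, and any failure is swallowed. The wrapper name is hypothetical; `logFn` stands in for analyticsEngine.logRequest:

```javascript
// Fire-and-forget analytics sketch. logFn stands in for
// analyticsEngine.logRequest (illustrative, not the actual source).
function logRequestAsync(logFn, entry) {
  setImmediate(async () => {
    try {
      await logFn(entry);
    } catch (err) {
      // an analytics failure must never affect the gateway response
      console.error("analytics write failed:", err.message);
    }
  });
}
```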