Monitoring
Track requests, latency, cost, and errors for your deployed endpoints.
Once your endpoints are deployed, track them from the Observe dashboard and the Usage page.
Observe dashboard
The Observe dashboard shows real-time metrics for all your endpoints:
- Requests: Total count, success rate, error rate
- Latency: Average, P50, P95, P99
- Throughput: Requests per second
- Tokens: Input and output token counts
- Cost: Per-request and aggregate cost
Filter by time period (7d, 30d, 90d) and view per-endpoint breakdowns.
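To make the request and latency figures concrete, here is a minimal Python sketch of how such summary values can be derived from raw request records. The record fields (`latency_ms`, `ok`), the `summarize` helper, and the `window_seconds` parameter are hypothetical, and the dashboard may use a different percentile method. Treat this as an illustration rather than the platform's actual computation.

```python
from statistics import mean, quantiles


def summarize(records, window_seconds):
    """Summarize request records into dashboard-style metrics.

    Assumed (hypothetical) record shape: {"latency_ms": float, "ok": bool}.
    Needs at least two records, because statistics.quantiles requires that.
    """
    latencies = [r["latency_ms"] for r in records]
    # quantiles(n=100) returns 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = quantiles(latencies, n=100, method="inclusive")
    total = len(records)
    succeeded = sum(1 for r in records if r["ok"])
    return {
        "requests": total,
        "success_rate": succeeded / total,
        "error_rate": (total - succeeded) / total,
        "avg_ms": mean(latencies),
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "requests_per_second": total / window_seconds,
    }


# Example: 101 requests over one minute, one of which failed.
sample = [{"latency_ms": ms, "ok": ms != 101} for ms in range(1, 102)]
summary = summarize(sample, window_seconds=60)
print(summary["p50_ms"], summary["p95_ms"], summary["p99_ms"])
```

The average and the percentiles answer different questions: a few very slow requests can raise P99 sharply while barely moving P50, which is why the dashboard reports them separately.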
Usage & Credits
The Usage page shows:
- Daily cost chart over the last 30 days
- Token breakdown (input vs output)
- Request breakdown (success vs error)
- Cost by model (which models cost the most)
- Per-model table with request count, cost, and average latency
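The per-model table is essentially a group-by over request records. The sketch below shows one way to reproduce it offline, assuming you have records with hypothetical `model`, `cost_usd`, and `latency_ms` fields; the field names and the `cost_by_model` helper are illustrative and do not represent a documented export format.

```python
from collections import defaultdict


def cost_by_model(records):
    """Group request records by model and return rows sorted by total cost.

    Assumed (hypothetical) record shape:
    {"model": str, "cost_usd": float, "latency_ms": float}.
    """
    totals = defaultdict(lambda: {"requests": 0, "cost_usd": 0.0, "latency_sum": 0.0})
    for r in records:
        row = totals[r["model"]]
        row["requests"] += 1
        row["cost_usd"] += r["cost_usd"]
        row["latency_sum"] += r["latency_ms"]
    rows = [
        {
            "model": model,
            "requests": t["requests"],
            "cost_usd": t["cost_usd"],
            "avg_latency_ms": t["latency_sum"] / t["requests"],
        }
        for model, t in totals.items()
    ]
    # Most expensive model first, matching a "which models cost the most" view.
    return sorted(rows, key=lambda row: row["cost_usd"], reverse=True)


usage_sample = [
    {"model": "model-a", "cost_usd": 0.5, "latency_ms": 200.0},
    {"model": "model-b", "cost_usd": 0.25, "latency_ms": 100.0},
    {"model": "model-a", "cost_usd": 0.25, "latency_ms": 400.0},
]
for row in cost_by_model(usage_sample):
    print(row)
```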
Deployments page
The Deployments page shows all your endpoints at a glance:
- Pipeline name and model
- GPU type and deployment mode
- Status (active, stopped, provisioning)
- Request count and cost (month-to-date)
- Quick actions (stop, start, configure)
Metrics retention
| Plan | Retention |
|---|---|
| Starter | 7 days |
| Pro | 90 days |
| Team | 1 year |
| Enterprise | Unlimited |
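If you need to reason about whether a given data point should still be available, the table can be encoded as a small lookup. This is a sketch under stated assumptions: `RETENTION_DAYS` and `is_retained` are hypothetical names, "1 year" is treated as 365 days, and the exact cutoff behavior at the boundary is not specified by the table.

```python
from datetime import datetime, timedelta, timezone

# Retention windows from the table; None means unlimited.
# Treating "1 year" as 365 days is an assumption.
RETENTION_DAYS = {"Starter": 7, "Pro": 90, "Team": 365, "Enterprise": None}


def is_retained(plan, recorded_at, now):
    """Return True if a metric recorded at `recorded_at` falls within the plan's retention window."""
    days = RETENTION_DAYS[plan]
    if days is None:
        return True
    # Inclusive boundary is an assumption; the table does not define it.
    return now - recorded_at <= timedelta(days=days)


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
ten_days_ago = now - timedelta(days=10)
print(is_retained("Starter", ten_days_ago, now))
print(is_retained("Pro", ten_days_ago, now))
```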