Monitoring

Track requests, latency, cost, and errors for your deployed endpoints.

Once deployed, track everything from the Observe dashboard and the Usage page.

Observe dashboard

The Observe page shows real-time metrics for all your endpoints:

Requests: Total count, success rate, error rate
Latency: Average, P50, P95, P99
Throughput: Requests per second
Tokens: Input and output token counts
Cost: Per-request and aggregate cost

Filter by time period (7d, 30d, 90d) and view per-endpoint breakdowns.
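To make the metric definitions concrete, here is a minimal sketch of how the same figures could be derived from raw request logs. It is illustrative only: the `RequestRecord` shape, its field names, and the `summarize` helper are hypothetical and not part of the platform, and the dashboard may compute percentiles or throughput differently. This sketch uses the nearest-rank percentile method.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    # Hypothetical log record; field names are assumptions, not a platform schema.
    timestamp: float      # seconds since epoch
    latency_ms: float
    ok: bool              # True if the request succeeded
    input_tokens: int
    output_tokens: int
    cost_usd: float


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(records):
    """Aggregate a non-empty list of RequestRecord into dashboard-style metrics."""
    if not records:
        raise ValueError("records must not be empty")

    total = len(records)
    successes = sum(1 for r in records if r.ok)
    latencies = sorted(r.latency_ms for r in records)
    timestamps = [r.timestamp for r in records]
    # Avoid dividing by zero when every record shares the same timestamp.
    window_s = max(max(timestamps) - min(timestamps), 1.0)
    total_cost = sum(r.cost_usd for r in records)

    return {
        "requests": total,
        "success_rate": successes / total,
        "error_rate": (total - successes) / total,
        "latency_avg_ms": sum(latencies) / total,
        "latency_p50_ms": percentile(latencies, 50),
        "latency_p95_ms": percentile(latencies, 95),
        "latency_p99_ms": percentile(latencies, 99),
        "throughput_rps": total / window_s,
        "input_tokens": sum(r.input_tokens for r in records),
        "output_tokens": sum(r.output_tokens for r in records),
        "cost_total_usd": total_cost,
        "cost_per_request_usd": total_cost / total,
    }
```

Calling `summarize` on the records for a single endpoint gives a per-endpoint breakdown; calling it on all records gives the aggregate view.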

The Usage page shows:

The Deployments page shows all your endpoints at a glance:
