RunInfra/Docs

Monitoring

Track requests, latency, cost, and errors for your deployed endpoints.

Once an endpoint is deployed, you can follow its metrics from the Observe dashboard, the Usage page, and the Deployments page.

Observe dashboard

The Observe page shows real-time metrics for all your endpoints:

  • Requests: Total count, success rate, error rate
  • Latency: Average, P50, P95, P99
  • Throughput: Requests per second
  • Tokens: Input and output token counts
  • Cost: Per-request and aggregate cost

Filter by time period (7d, 30d, 90d) and view per-endpoint breakdowns.
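
To make the request and latency metrics concrete, here is a minimal sketch of how such a summary could be computed from raw request records. The `RequestRecord` type, the `summarize` function, and the nearest-rank percentile method are assumptions for illustration; this page does not state how RunInfra computes its percentiles or what its export format looks like.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical per-request record; not a RunInfra export format."""
    latency_ms: float
    ok: bool


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending, non-empty list (p in 0-100)."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(records):
    """Compute request counts, rates, and latency statistics for the records."""
    if not records:
        raise ValueError("no records to summarize")
    latencies = sorted(r.latency_ms for r in records)
    total = len(records)
    successes = sum(1 for r in records if r.ok)
    return {
        "total": total,
        "success_rate": successes / total,
        "error_rate": (total - successes) / total,
        "avg_ms": sum(latencies) / total,
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
    }
```

P95 and P99 are worth watching alongside the average because a few slow requests can shift the tail considerably while barely moving the mean.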

Usage & Credits

The Usage page shows:

  • Daily cost chart over the last 30 days
  • Token breakdown (input vs output)
  • Request breakdown (success vs error)
  • Cost by model (which models cost the most)
  • Per-model table with request count, cost, and average latency
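
The per-model table can be thought of as a group-by over individual requests. The sketch below assumes a hypothetical list of request dictionaries with `model`, `cost`, and `latency_ms` keys; these field names are illustrative and are not taken from a RunInfra API.

```python
from collections import defaultdict


def per_model_table(requests):
    """Aggregate request dicts into one row per model.

    Each request is assumed to look like:
    {"model": str, "cost": float, "latency_ms": float}
    """
    totals = defaultdict(lambda: {"requests": 0, "cost": 0.0, "latency_sum": 0.0})
    for req in requests:
        row = totals[req["model"]]
        row["requests"] += 1
        row["cost"] += req["cost"]
        row["latency_sum"] += req["latency_ms"]

    table = [
        {
            "model": model,
            "requests": row["requests"],
            "cost": row["cost"],
            "avg_latency_ms": row["latency_sum"] / row["requests"],
        }
        for model, row in totals.items()
    ]
    # Most expensive models first, similar to a "cost by model" view.
    table.sort(key=lambda r: r["cost"], reverse=True)
    return table
```

Sorting by cost puts the models that dominate spend at the top, which is usually the first thing to check when the daily cost chart rises.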

Deployments page

The Deployments page shows all your endpoints at a glance:

  • Pipeline name and model
  • GPU type and deployment mode
  • Status (active, stopped, provisioning)
  • Request count and cost (month-to-date)
  • Quick actions (stop, start, configure)

Metrics retention

Plan          Retention
Starter       7 days
Pro           90 days
Team          1 year
Enterprise    Unlimited
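
As a rough illustration of how retention relates to the dashboard's time filters, the sketch below lists which filter windows (7d, 30d, 90d) fit entirely within each plan's retention period. It is only arithmetic on the numbers published here: "1 year" is approximated as 365 days, the names `RETENTION_DAYS`, `FILTER_WINDOWS`, and `covered_windows` are hypothetical, and this page does not describe how the dashboard behaves when a filter exceeds the retention period.

```python
# Retention per plan in days, from the retention table.
# Assumption: "1 year" is treated as 365 days; None means unlimited.
RETENTION_DAYS = {"Starter": 7, "Pro": 90, "Team": 365, "Enterprise": None}

# Time filters listed for the Observe page, in days.
FILTER_WINDOWS = {"7d": 7, "30d": 30, "90d": 90}


def covered_windows(plan):
    """Return the filter windows that fit entirely within a plan's retention."""
    limit = RETENTION_DAYS[plan]
    return [
        name
        for name, days in FILTER_WINDOWS.items()
        if limit is None or days <= limit
    ]


if __name__ == "__main__":
    for plan in RETENTION_DAYS:
        print(plan, covered_windows(plan))
```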
