RunInfra streams long-running operations (optimization sessions, chat completions, runbook execution) over Server-Sent Events (SSE). This page documents every event type the engine emits and how clients should handle them.
Transport
All SSE responses follow the standard wire format and are sent with these headers:

| Header | Value |
|---|---|
| Content-Type | text/event-stream; charset=utf-8 |
| Cache-Control | no-cache, no-transform |
| Connection | keep-alive |
| X-Accel-Buffering | no (when behind a proxy) |
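
For readers who have not handled the SSE wire format directly, the following is a minimal parsing sketch. It follows the generic SSE rules (fields separated by a colon, events terminated by a blank line, lines starting with a colon treated as comments) and is not RunInfra's own client code. The names SSEEvent and parse_sse are illustrative, and the sketch assumes the lines have already been decoded as UTF-8 text.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class SSEEvent(NamedTuple):
    event: str          # event type; "message" when no event: field was sent
    data: str           # data lines joined with "\n"
    id: Optional[str]   # most recent id: field seen, if any


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Yield events from an iterable of already-decoded text lines."""
    event_type, data_lines, last_id = "", [], None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the event; dispatch only if data was received.
            if data_lines:
                yield SSEEvent(event_type or "message", "\n".join(data_lines), last_id)
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment line (for example a heartbeat), not an event
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
        elif field == "id":
            last_id = value
```

The parser works on any line iterator, so it can be fed from whichever HTTP client the application already uses for streaming responses.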
Chat completions stream
These events flow when stream: true is set on POST /v1/chat/completions. The transport is OpenAI-compatible.
data (chunk)
Standard OpenAI streaming chunk. Each event delivers a choices[].delta partial token.
data: [DONE]
Sent as the final event when the stream is complete.
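
As a rough illustration of consuming this stream, the sketch below concatenates partial tokens until the [DONE] sentinel arrives. It assumes each data payload is a JSON chunk in the OpenAI streaming shape, with the text under choices[].delta.content. The helper name collect_chat_text is hypothetical, and the sketch expects the data: prefix to have been stripped already.

```python
import json
from typing import Iterable, List


def collect_chat_text(data_payloads: Iterable[str]) -> str:
    """Join the streamed deltas into one string, stopping at [DONE]."""
    parts: List[str] = []
    for payload in data_payloads:
        if payload.strip() == "[DONE]":
            break  # final event: the stream is complete
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # Some chunks carry only a role or a finish reason and no text.
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```

This version merges every choice into a single string, which is only appropriate when one completion is requested. A client asking for several choices would need to group deltas by choice index.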
Optimization stream
These events flow when an optimization session is running. Subscribe via the chat panel’s SSE channel during a /start_optimization tool call, or directly against the optimization stream endpoint.
activity_step
A discrete tool step in the agent’s activity timeline. Used for tools that don’t render their own card (file edits, intake updates, status checks).
tool_card
A rich tool result that renders as its own card (kernel agent, quantization variants, KV cache, speculation, judge, serving, hardware, compatibility). The card type is in tool and the full payload is in result.
EXPERIMENT
Emitted at the end of an optimization session, one event per model variant tested. Each event carries the full benchmark result for that variant. The frontend persists these events to rebuild the optimization summary on reload.
infra_log
Forwarded log lines from the engine and provisioning layer (Modal, RunPod). Used to render the live terminal in the chat panel during long-running profiling waits.
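
One way a client might route the four optimization event types is sketched below. It rests on assumptions this page does not confirm: that the event name arrives in the SSE event: field and that every payload is JSON. Only the tool and result fields of tool_card come from the description above. The OptimizationView class and the payload contents used with it are hypothetical.

```python
import json
from collections import defaultdict
from typing import Any, DefaultDict, List


class OptimizationView:
    """Collects optimization events into the structures a UI could render."""

    def __init__(self) -> None:
        self.timeline: List[Any] = []                 # activity_step entries
        self.cards: DefaultDict[str, List[Any]] = defaultdict(list)  # by card type
        self.experiments: List[Any] = []              # one result per variant
        self.log_lines: List[Any] = []                # live terminal output
        self.unknown: List[str] = []                  # unrecognized event types

    def handle(self, event_type: str, data: str) -> None:
        payload = json.loads(data)  # assumption: payloads are JSON
        if event_type == "activity_step":
            self.timeline.append(payload)
        elif event_type == "tool_card":
            # Card type is in "tool", full payload is in "result".
            self.cards[payload["tool"]].append(payload["result"])
        elif event_type == "EXPERIMENT":
            self.experiments.append(payload)
        elif event_type == "infra_log":
            self.log_lines.append(payload)
        else:
            # Record unfamiliar types instead of failing on them.
            self.unknown.append(event_type)
```

Recording unknown event types rather than raising keeps an older client working if the engine starts emitting additional events.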
Heartbeats
The engine sends a comment-style heartbeat every 15 seconds while a session is idle (waiting on GPU profiling, optimization runs, or external systems). A leading colon (:) is the SSE comment syntax. Clients should:
- Not parse the heartbeat as an event
- Reset their idle timer when one arrives
- Detect a stuck connection if three consecutive heartbeats are missed (45 s without traffic)
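
The three rules above can be sketched as a small idle watchdog. The 15-second interval and the three-beat threshold come from this page. The IdleWatchdog class and its injectable clock are illustrative choices, not part of any RunInfra SDK.

```python
import time
from typing import Callable

HEARTBEAT_INTERVAL_S = 15.0     # heartbeat cadence stated above
MISSED_BEATS_BEFORE_STUCK = 3   # 3 x 15 s = 45 s without traffic


class IdleWatchdog:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last_traffic = clock()

    def on_line(self, line: str) -> bool:
        """Reset the idle timer; return True if the line is a comment to skip."""
        self._last_traffic = self._clock()
        return line.startswith(":")

    def is_stuck(self) -> bool:
        """True once 45 seconds have passed without any traffic."""
        limit = HEARTBEAT_INTERVAL_S * MISSED_BEATS_BEFORE_STUCK
        return self._clock() - self._last_traffic >= limit
```

A reader loop would call on_line for every received line, skip those for which it returns True, and poll is_stuck from a timer to decide when to drop the connection and reconnect.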
Reconnection
If the connection drops mid-session, the client should:
- Re-issue the request to the same session/job id endpoint
- Pass Last-Event-ID if the server emitted ids

The server replays events from that id forward; if it can’t, it replays from the beginning of the still-running step.
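
The reconnection steps can be sketched as a generator that remembers the last id it saw and sends it back on the next attempt. The connect callable is a hypothetical stand-in for whatever opens the request against the session or job endpoint and yields (id, data) pairs. The retry limit is an arbitrary illustration, and a production client would probably add backoff between attempts.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

Event = Tuple[Optional[str], str]  # (event id or None, data)


def stream_with_resume(
    connect: Callable[[Dict[str, str]], Iterable[Event]],
    max_reconnects: int = 3,
) -> Iterator[Event]:
    """Yield events, reconnecting with Last-Event-ID after a dropped connection."""
    last_id: Optional[str] = None
    attempts = 0
    while True:
        headers = {"Accept": "text/event-stream"}
        if last_id is not None:
            headers["Last-Event-ID"] = last_id  # only if the server emitted ids
        try:
            for event_id, data in connect(headers):
                if event_id is not None:
                    last_id = event_id
                yield event_id, data
            return  # the server closed the stream normally
        except ConnectionError:
            attempts += 1
            if attempts > max_reconnects:
                raise
```

Because the server may replay from the beginning of the still-running step, consumers of this generator should tolerate seeing the same event more than once.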
Error events
If something fails mid-stream, the server emits an error event and closes the connection.
retryable: true means the client can resubmit. retryable: false means the input is rejected (bad request, auth failure, plan limit).
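
A client could turn the error event into an exception that carries the retry decision, as in the sketch below. Only the retryable field is taken from this page. The sketch assumes the error payload is JSON and conservatively treats a missing retryable flag as non-retryable. StreamError and check_event are illustrative names.

```python
import json
from typing import Any, Dict


class StreamError(Exception):
    """Raised when the server emits an error event mid-stream."""

    def __init__(self, payload: Dict[str, Any]) -> None:
        super().__init__(str(payload))
        self.payload = payload
        # Assumption: anything other than an explicit true is not retryable.
        self.retryable = payload.get("retryable") is True


def check_event(event_type: str, data: str) -> None:
    """Raise StreamError for error events; do nothing for all other events."""
    if event_type == "error":
        raise StreamError(json.loads(data))
```

The caller can then resubmit when retryable is true and surface the failure to the user otherwise, since a rejected input will fail again if sent unchanged.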