/v1/responses is a chat-completions compatibility adapter for LLM and vision-language deployments. The gateway converts supported input and instructions fields into chat messages, forwards the request through the same serving path as /v1/chat/completions, then wraps the result in a Responses-shaped envelope.
Minimal request
Streaming
Setstream: true to receive server-sent events. The stream uses OpenAI-shaped Responses event names for text deltas and terminal completion events when the selected deployment supports streaming.
Supported request fields
Model id returned by
GET /v1/models, or default for a pipeline-scoped snippet.Prompt text, or an array of supported Responses input message objects.
System-level instruction text. The adapter maps this into a system message.
Return a server-sent event stream instead of one JSON response.
Maximum generated output tokens. The adapter maps this to the serving backend’s chat completion token limit.
Sampling temperature for compatible LLM deployments.
Nucleus sampling cutoff for compatible LLM deployments.
Structured output format passed through to the chat-completions serving path. Support depends on the selected deployment.
OpenAI chat-completions tool definitions accepted as pass-through input for compatible deployments. This adapter does not execute hosted tools or manage a stateful tool loop.
Tool-selection preference passed through to compatible chat-completions deployments.
Not shipped on this adapter
- Stateful response retrieval or deletion.
include,reasoning, hosted tools, conversation items, file search, web search, computer use, and background jobs.- Stateful tool execution or hosted tool orchestration. Use Chat completions for production tool loops.
Retry semantics
The native RunInfra SDK treats non-streamingresponses.create() as replay-safe only when you provide an idempotency key. Streaming Responses requests are sent once because a partial stream may already have reached your app.
Next steps
Chat completions
The canonical endpoint for tools, structured output, and streaming chat.
RunInfra SDK
Native request IDs, typed errors, idempotency helpers, and streaming wrappers.