Skip to main content

Documentation Index

Fetch the complete documentation index at: https://runinfra.ai/docs/llms.txt

Use this file to discover all available pages before exploring further.

LangChain’s ChatOpenAI speaks the OpenAI REST API. Point it at RunInfra by overriding the base URL and key.

Install

pip install langchain-openai

Chat model

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="default",
    openai_api_base="https://api.runinfra.ai/v1",
    openai_api_key="YOUR_RUNINFRA_API_KEY",
)

response = llm.invoke("What is RunInfra?")
print(response.content)

Embeddings

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="default",
    openai_api_base="https://api.runinfra.ai/v1",
    openai_api_key="YOUR_RUNINFRA_API_KEY",
)

vectors = embeddings.embed_documents(["Hello", "World"])

Streaming

for chunk in llm.stream("Tell me a short story"):
    print(chunk.content, end="", flush=True)

Tool calling via LangChain agents

from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"{city}: 21C, partly cloudy"

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful weather assistant."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_openai_tools_agent(llm, [get_weather], prompt)
executor = AgentExecutor(agent=agent, tools=[get_weather])
result = executor.invoke({"input": "What's the weather in Paris?"})
print(result["output"])

RAG with LangChain + RunInfra embeddings

from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA

store = FAISS.from_texts(
    ["RunInfra cold starts under 2 seconds.", "RunInfra serves Llama, Qwen, Mistral..."],
    embeddings,
)
qa = RetrievalQA.from_chain_type(llm=llm, retriever=store.as_retriever(k=2))
print(qa.invoke({"query": "How fast are cold starts?"})["result"])

Known gotchas

  • model="default" routes to your pipeline’s verified active deployment. For multi-model pipelines, pass the alias you configured in chat.
  • Streaming callbacks (StreamingStdOutCallbackHandler) work unchanged.
  • LangChain retries use exponential backoff. Pair with max_retries=3 and let the library handle 429s.

Next steps

LlamaIndex

Same OpenAI-base pattern for LlamaIndex.

OpenAI compatibility

The underlying contract.

RAG cookbook

Runnable end-to-end RAG example.

Tool calling cookbook

Raw OpenAI tool loop (no framework).