Documentation Index
Fetch the complete documentation index at: https://runinfra.ai/docs/llms.txt
Use this file to discover all available pages before exploring further.
LangChain’s ChatOpenAI speaks the OpenAI REST API. Point it at RunInfra by overriding the base URL and key.
Install
pip install langchain-openai
Chat model
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
model="default",
openai_api_base="https://api.runinfra.ai/v1",
openai_api_key="YOUR_RUNINFRA_API_KEY",
)
response = llm.invoke("What is RunInfra?")
print(response.content)
Embeddings
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings(
model="default",
openai_api_base="https://api.runinfra.ai/v1",
openai_api_key="YOUR_RUNINFRA_API_KEY",
)
vectors = embeddings.embed_documents(["Hello", "World"])
Streaming
for chunk in llm.stream("Tell me a short story"):
print(chunk.content, end="", flush=True)
from langchain.agents import create_openai_tools_agent, AgentExecutor
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
@tool
def get_weather(city: str) -> str:
"""Get the current weather for a city."""
return f"{city}: 21C, partly cloudy"
prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful weather assistant."),
("human", "{input}"),
("placeholder", "{agent_scratchpad}"),
])
agent = create_openai_tools_agent(llm, [get_weather], prompt)
executor = AgentExecutor(agent=agent, tools=[get_weather])
result = executor.invoke({"input": "What's the weather in Paris?"})
print(result["output"])
RAG with LangChain + RunInfra embeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
store = FAISS.from_texts(
["RunInfra cold starts under 2 seconds.", "RunInfra serves Llama, Qwen, Mistral..."],
embeddings,
)
qa = RetrievalQA.from_chain_type(llm=llm, retriever=store.as_retriever(k=2))
print(qa.invoke({"query": "How fast are cold starts?"})["result"])
Known gotchas
model="default" routes to your pipeline’s verified active deployment. For multi-model pipelines, pass the alias you configured in chat.
- Streaming callbacks (
StreamingStdOutCallbackHandler) work unchanged.
- LangChain retries use exponential backoff. Pair with
max_retries=3 and let the library handle 429s.
Next steps
LlamaIndex
Same OpenAI-base pattern for LlamaIndex.
OpenAI compatibility
The underlying contract.
RAG cookbook
Runnable end-to-end RAG example.
Tool calling cookbook
Raw OpenAI tool loop (no framework).