Models
Any Hugging Face model, plus custom uploads.
RunInfra currently supports large language models (LLMs) from Hugging Face. Thousands of text generation models work out of the box. Just give the agent a model name or Hugging Face ID and it handles the rest.
Vision models, speech-to-text, text-to-speech, image generation, and embedding models are coming soon. The same chat-driven workflow will apply to the full AI stack.
Browse popular models at Models or ask the agent:
Find a good 7B model for code generation

Popular model families
These are some of the most commonly used models on RunInfra, but you're not limited to this list:
| Family | Sizes | Good for |
|---|---|---|
| Llama 3.1/3.2/3.3/4 (Meta) | 1B-405B | General purpose, chat, reasoning |
| Qwen 2.5 (Alibaba) | 0.5B-72B | Multilingual, code, math |
| Mistral / Mixtral (Mistral AI) | 7B-123B | Instruction following, code |
| DeepSeek | V2, V3, R1 | Long context, reasoning |
| Gemma 2 (Google) | 2B-27B | Lightweight, edge deployment |
| Phi-3/Phi-4 (Microsoft) | 3.8B-14B | Small, fast, cost-effective |
| Cohere | Command-R/R+ | RAG, enterprise search |
Any transformer-based LLM on Hugging Face works. If the agent detects a compatibility issue with a specific model, it tells you before consuming any GPU time.
Don't know which model to use? Describe your use case and the agent recommends one:
I need a cheap, fast model for simple Q&A. What do you suggest?

Selecting models
By name
Use Llama 3.1 8B for this pipeline

With @ mention
Optimize @Qwen-2.5-14B with FP8

By Hugging Face ID
Deploy microsoft/Phi-3-mini-4k-instruct optimized for latency

Token pricing by model size
Estimated starting rates. Actual cost depends on your full pipeline configuration.
| Size | Input (from) | Output (from) |
|---|---|---|
| Small (1-8B) | $0.08/MTok | $0.20/MTok |
| Medium (8-30B) | $0.20/MTok | $0.80/MTok |
| Large (30-70B) | $0.45/MTok | $1.50/MTok |
| XL (70B+) | $0.80/MTok | $2.50/MTok |
See GPU and Pricing for details on what affects your cost.
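The starting rates above can be turned into a quick back-of-the-envelope estimator. This is a hedged sketch, not a billing tool: the tier names, the `estimate_cost` helper, and the assumption that cost is simply rate times tokens are illustrative, and actual cost depends on your full pipeline configuration.

```python
# Rough cost floor from the starting rates in the table above.
# Keys and rates mirror the table; real billing may differ.
RATES = {  # tier: (input $/MTok, output $/MTok)
    "small":  (0.08, 0.20),  # 1-8B
    "medium": (0.20, 0.80),  # 8-30B
    "large":  (0.45, 1.50),  # 30-70B
    "xl":     (0.80, 2.50),  # 70B+
}

def estimate_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated minimum dollar cost for one workload at the starting rates."""
    in_rate, out_rate = RATES[tier]
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

# Example: 10M input tokens and 2M output tokens on a small (1-8B) model
print(f"${estimate_cost('small', 10_000_000, 2_000_000):.2f}")
```

At the listed starting rates, that example works out to $0.80 of input plus $0.40 of output, or about $1.20 before any pipeline-specific adjustments.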
Custom model uploads
Custom model uploads require a Team plan or higher.
Upload your own fine-tuned models at Models. Supported formats: SafeTensors, PyTorch, GGUF, and ONNX, up to 50 GB per model.
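Before uploading, it can be worth checking a file against the limits above locally. The sketch below is hypothetical: the extension-to-format mapping and the `check_upload` helper are assumptions for illustration, not part of the RunInfra API.

```python
import os

# Hypothetical pre-upload check mirroring the documented limits:
# supported formats (SafeTensors, PyTorch, GGUF, ONNX) and the 50 GB cap.
SUPPORTED_EXTENSIONS = {".safetensors", ".pt", ".bin", ".gguf", ".onnx"}
MAX_BYTES = 50 * 1024**3  # 50 GB

def check_upload(path: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'no extension'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"{size_bytes / 1024**3:.1f} GB exceeds the 50 GB limit")
    return problems

# A 14 GB SafeTensors checkpoint passes both checks
print(check_upload("fine-tuned-llama-7b.safetensors", 14 * 1024**3))
```

Size is passed in as a parameter here so the sketch stays self-contained; in practice you would read it with `os.path.getsize`.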
Uploaded models go through the same optimization pipeline as catalog models. Use them in chat just like any other model:
Use my uploaded model "fine-tuned-llama-7b" for this pipeline