
Models

Any Hugging Face model, plus custom uploads.

RunInfra currently supports large language models (LLMs) from Hugging Face. Thousands of text generation models work out of the box. Just give the agent a model name or Hugging Face ID and it handles the rest.

Vision models, speech-to-text, text-to-speech, image generation, and embedding models are coming soon. The same chat-driven workflow will apply to the full AI stack.

Browse popular models on the Models page, or ask the agent:

Find a good 7B model for code generation

Popular model families

These are some of the most commonly used models on RunInfra, but you're not limited to this list:

| Family | Sizes / variants | Good for |
| --- | --- | --- |
| Llama 3.1/3.2/3.3/4 (Meta) | 1B-405B | General purpose, chat, reasoning |
| Qwen 2.5 (Alibaba) | 0.5B-72B | Multilingual, code, math |
| Mistral / Mixtral (Mistral AI) | 7B-123B | Instruction following, code |
| DeepSeek | V2, V3, R1 | Long context, reasoning |
| Gemma 2 (Google) | 2B-27B | Lightweight, edge deployment |
| Phi-3/Phi-4 (Microsoft) | 3.8B-14B | Small, fast, cost-effective |
| Cohere | Command-R/R+ | RAG, enterprise search |

Any transformer-based LLM on Hugging Face works. If the agent detects a compatibility issue with a specific model, it tells you before consuming any GPU time.

Don't know which model to use? Describe your use case and the agent recommends one:

I need a cheap, fast model for simple Q&A. What do you suggest?

Selecting models

By name

Use Llama 3.1 8B for this pipeline

With @ mention

Optimize @Qwen-2.5-14B with FP8

By Hugging Face ID

Deploy microsoft/Phi-3-mini-4k-instruct optimized for latency
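A Hugging Face ID always has the form `organization/model-name`. If you script against the agent and want to distinguish an ID from a plain display name before sending it, a minimal sketch (not part of RunInfra's API) might look like:

```python
import re

# Hugging Face repo IDs are "namespace/repo-name": alphanumerics,
# hyphens, underscores, and dots in each segment.
HF_ID_PATTERN = re.compile(r"^[\w.-]+/[\w.-]+$")

def looks_like_hf_id(model_ref: str) -> bool:
    """Return True if model_ref resembles a Hugging Face model ID."""
    return bool(HF_ID_PATTERN.match(model_ref))

print(looks_like_hf_id("microsoft/Phi-3-mini-4k-instruct"))  # True
print(looks_like_hf_id("Llama 3.1 8B"))  # False: a display name, not an ID
```

Either form works in chat; the agent resolves display names to concrete Hugging Face repos for you.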

Token pricing by model size

Estimated starting rates. Actual cost depends on your full pipeline configuration.

| Size | Input (from) | Output (from) |
| --- | --- | --- |
| Small (1-8B) | $0.08/MTok | $0.20/MTok |
| Medium (8-30B) | $0.20/MTok | $0.80/MTok |
| Large (30-70B) | $0.45/MTok | $1.50/MTok |
| XL (70B+) | $0.80/MTok | $2.50/MTok |
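Rates are per million tokens (MTok), so a rough estimate is `input_tokens / 1e6 × input_rate + output_tokens / 1e6 × output_rate`. A minimal sketch using the starting rates above (remember these are floors; your actual rate depends on your pipeline configuration):

```python
# Starting per-MTok rates from the table above: (input, output), in USD.
RATES = {
    "small": (0.08, 0.20),
    "medium": (0.20, 0.80),
    "large": (0.45, 1.50),
    "xl": (0.80, 2.50),
}

def estimate_cost(size: str, input_tokens: int, output_tokens: int) -> float:
    """Rough cost in USD for a size class and token volume."""
    rate_in, rate_out = RATES[size]
    return input_tokens / 1e6 * rate_in + output_tokens / 1e6 * rate_out

# e.g. 10M input + 2M output tokens on a small (1-8B) model:
print(f"${estimate_cost('small', 10_000_000, 2_000_000):.2f}")  # $1.20
```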

See GPU and Pricing for details on what affects your cost.

Custom model uploads

Custom model uploads require Team plan or higher.

Upload your own fine-tuned models on the Models page. Supported formats: SafeTensors, PyTorch, GGUF, ONNX. Maximum size: 50GB.
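Before uploading a large artifact, it can save time to sanity-check it locally. This is an illustrative pre-flight sketch, not a RunInfra API; the extension list maps the supported formats above to their common file extensions (an assumption on our part), and the cap reflects the 50GB limit:

```python
from pathlib import Path

# Common extensions for the supported formats (assumed mapping):
# SafeTensors, PyTorch (.pt/.bin), GGUF, ONNX.
SUPPORTED_EXTENSIONS = {".safetensors", ".pt", ".bin", ".gguf", ".onnx"}
MAX_BYTES = 50 * 1024**3  # 50GB upload cap

def check_upload(path: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(no extension)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"exceeds 50GB cap ({size_bytes / 1024**3:.1f}GB)")
    return problems

print(check_upload("fine-tuned-llama-7b.safetensors", 13 * 1024**3))  # []
```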

Uploaded models go through the same optimization pipeline as catalog models. Use them in chat just like any other model:

Use my uploaded model "fine-tuned-llama-7b" for this pipeline
