
Ollama VPS Requirements: RAM Needed for Llama 3 8B, 13B & 70B (CPU Inference)

spec · Updated June 10, 2026

You can run Ollama on a CPU-only VPS — it’s slower than a GPU, but for chat, drafting, and automation it’s often fast enough and far cheaper. The constraint is RAM: the model has to fit in memory (plus overhead), and storage has to hold the weights.

These are estimates for 4-bit quantized models, not measured benchmarks. Real tokens/second numbers per provider are being measured — see methodology.
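Until measured numbers are available, you can check throughput on your own VPS. The sketch below is one way to do it, not this guide's benchmark methodology: it sends a single non-streaming request to Ollama's `/api/generate` endpoint and divides the `eval_count` field (generated tokens) by `eval_duration` (nanoseconds). The helper names `tokens_per_second` and `measure` are made up for illustration, and the default host assumes Ollama is listening locally on its standard port.

```python
import json
import urllib.request


def tokens_per_second(response: dict) -> float:
    """Generation speed from Ollama's response counters.

    eval_count is the number of generated tokens;
    eval_duration is reported in nanoseconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def measure(model: str, prompt: str, host: str = "http://localhost:11434") -> float:
    """Run one non-streaming generation and return tokens/second.

    Assumes an Ollama server is reachable at `host` and that
    `model` has already been pulled.
    """
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return tokens_per_second(json.load(resp))
```

For example, `measure("llama3.1:8b", "Explain what a VPS is in three sentences.")` returns a single sample. One run is noisy, and the first request also pays the model load time, so average several runs before comparing plans.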

RAM and storage by model

| Model | Approx. RAM | Disk | Realistic CPU VPS |
| --- | --- | --- | --- |
| Phi-3 / small (~3B) | ~6 GB | ~8 GB | 4 vCPU / 8 GB |
| 7B (Mistral, Gemma) | ~8 GB | ~12 GB | 4 vCPU / 8–16 GB |
| Llama 3.1 8B | ~10 GB | ~16 GB | 4 vCPU / 16 GB |
| 13B | ~16 GB | ~26 GB | 6 vCPU / 16–24 GB |
| 70B | ~64 GB+ | ~80 GB | 8+ vCPU / 64 GB (heavy, slow on CPU) |

Rule of thumb: budget the model’s RAM plus ~2–4 GB for the OS and Ollama. On CPU, expect 7–8B models to generate somewhere between single-digit and low-teens tokens/second.
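The rule of thumb can be written as a small calculation. This is a minimal sketch, assuming the approximate RAM figures from the table and a hypothetical list of plan sizes (`PLAN_SIZES_GB` is invented for illustration; substitute your provider's real tiers). It applies the rule literally, so its answer can sit at or above the upper end of the "Realistic CPU VPS" column.

```python
# Approximate RAM per model in GB, copied from the table above.
# These are the guide's estimates for 4-bit quantized models, not benchmarks.
MODEL_RAM_GB = {
    "phi-3 (~3B)": 6,
    "7B": 8,
    "llama 3.1 8B": 10,
    "13B": 16,
    "70B": 64,
}

# Hypothetical plan sizes in GB; replace with your provider's actual tiers.
PLAN_SIZES_GB = [8, 16, 24, 32, 64, 96, 128]


def ram_budget_gb(model, overhead_gb=2):
    """Model RAM plus OS/Ollama overhead (the rule suggests 2-4 GB)."""
    return MODEL_RAM_GB[model] + overhead_gb


def smallest_plan_gb(model, overhead_gb=2, plans=PLAN_SIZES_GB):
    """Smallest plan that covers the budget, or None if none is big enough."""
    need = ram_budget_gb(model, overhead_gb)
    for size in sorted(plans):
        if size >= need:
            return size
    return None


for name in MODEL_RAM_GB:
    print(f"{name}: budget {ram_budget_gb(name)} GB, plan {smallest_plan_gb(name)} GB")
```

Raising `overhead_gb` to 4 gives the conservative end of the rule, which is the safer choice if other services share the machine.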

Which VPS fits

Use the VPS cost calculator to size your exact model and concurrency and to compare live prices. Running Ollama next to n8n? See the combined sizing in the calculator’s “n8n + Ollama” workload.
