You can run Ollama on a CPU-only VPS. It’s slower than on a GPU, but for chat, drafting, and automation it’s often fast enough and far cheaper. The constraint is RAM: the model has to fit in memory (plus overhead), and storage has to hold the weights.
The figures below are estimates for 4-bit quantized models, not measured benchmarks. Measured tokens/second figures per provider are still being collected; see methodology.
RAM and storage by model
| Model | Approx. RAM | Disk | Realistic CPU VPS |
|---|---|---|---|
| Phi-3 / small (~3B) | ~6 GB | ~8 GB | 4 vCPU / 8 GB |
| 7B (Mistral, Gemma) | ~8 GB | ~12 GB | 4 vCPU / 8–16 GB |
| Llama 3.1 8B | ~10 GB | ~16 GB | 4 vCPU / 16 GB |
| 13B | ~16 GB | ~26 GB | 6 vCPU / 16–24 GB |
| 70B | ~64 GB+ | ~80 GB | 8+ vCPU / 64 GB (heavy, slow on CPU) |
Rule of thumb: budget the model’s RAM plus ~2–4 GB for the OS and Ollama, so a model listed at ~10 GB needs roughly 12–14 GB in total. On CPU, expect single-digit to low-teens tokens/second for 7–8B models.
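The rule of thumb can be sketched as a small calculation. This is a minimal illustration, not a sizing tool: the model RAM values are copied from the table above, while the dictionary keys and the list of plan sizes are hypothetical and should be replaced with your provider's actual tiers.

```python
from __future__ import annotations

# Approximate model RAM in GB, copied from the table above (4-bit estimates).
# The keys are illustrative labels, not Ollama model tags.
MODEL_RAM_GB = {
    "phi3-small-3b": 6,
    "7b": 8,
    "llama3.1-8b": 10,
    "13b": 16,
    "70b": 64,
}

# Hypothetical list of common VPS RAM sizes in GB; adjust to your provider.
PLAN_SIZES_GB = [8, 16, 24, 32, 64, 96, 128]


def ram_budget_gb(model: str, overhead_gb: tuple[int, int] = (2, 4)) -> tuple[int, int]:
    """Return (low, high) total RAM: model RAM plus 2-4 GB for the OS and Ollama."""
    base = MODEL_RAM_GB[model]
    return base + overhead_gb[0], base + overhead_gb[1]


def smallest_plan_gb(model: str) -> int | None:
    """Smallest plan that covers the high end of the budget, or None if none does."""
    _, high = ram_budget_gb(model)
    for size in PLAN_SIZES_GB:
        if size >= high:
            return size
    return None


if __name__ == "__main__":
    for name in MODEL_RAM_GB:
        low, high = ram_budget_gb(name)
        print(f"{name}: budget {low}-{high} GB, smallest plan {smallest_plan_gb(name)} GB")
```

Sizing against the high end of the overhead range is a deliberately conservative choice, so the result can land one tier above the tightest configuration listed in the table.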
Which VPS fits
- 7B–8B models: a 16 GB box is the sweet spot — Hetzner CPX31 (~$12, editorial pick) or Contabo VPS M (16 GB, ~$11). Both comfortably hold an 8B model with room to work.
- 13B: go to 24–32 GB (Contabo VPS L / Hetzner CPX41).
- 70B on CPU: technically possible on a 64 GB box, but slow — only worth it for batch jobs, not interactive chat. A GPU instance is usually better value at that size.
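To judge whether a given box is fast enough for interactive chat or better suited to batch jobs, one option is to compute tokens/second from the timing fields that Ollama's `/api/generate` endpoint returns (`eval_count` and `eval_duration`, the latter in nanoseconds). The sketch below assumes a local Ollama server on the default port; the 5 tokens/second cut-off and the `classify` helper are illustrative assumptions, not measured thresholds.

```python
import json
import urllib.request

# Assumed cut-off for "feels interactive"; illustrative only, tune to taste.
INTERACTIVE_MIN_TPS = 5.0


def tokens_per_second(response: dict) -> float:
    """Generation speed from Ollama's eval_count and eval_duration (nanoseconds)."""
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def classify(tps: float) -> str:
    """Label a measured speed using the assumed cut-off above."""
    return "interactive" if tps >= INTERACTIVE_MIN_TPS else "batch only"


def measure(model: str, prompt: str, host: str = "http://localhost:11434") -> float:
    """Run one non-streaming generation against an Ollama server and return tokens/second."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return tokens_per_second(json.load(resp))


# Example (requires a running Ollama server with the model already pulled):
# print(classify(measure("llama3.1:8b", "Summarize why RAM limits model size.")))
```

A single short prompt gives only a rough reading; averaging over several prompts of realistic length would be closer to what you would see in practice.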
Use the VPS cost calculator to size your exact model and concurrency and to compare live prices. Running Ollama next to n8n? See the combined sizing in the calculator’s “n8n + Ollama” workload.