RAGFlow GPU vs CPU: Full Explanation (2025 Edition)

Why Does RAGFlow Still Need a GPU Even When Using Ollama?

You are absolutely right to ask this question.

Most people configure RAGFlow to use external services (Ollama, Xinference, vLLM, OpenAI, etc.) for:

  • Embedding model

  • Re-ranking model

  • Inference LLM (the actual answer generator)

Ollama runs these models entirely on your GPU; RAGFlow only sends HTTP requests. So why do the official documentation and the community still strongly recommend the ragflow-gpu image (or setting DEVICE=gpu)?

The answer is simple: Deep Document Understanding (DeepDoc) — the part that happens before any embedding or LLM call.
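To make the split concrete, here is a minimal single-container sketch; it is not a full deployment (a real setup uses the project's docker compose files, which also start Elasticsearch, MySQL, MinIO, and Redis). The image tag, port, and model names are illustrative, and DEVICE=gpu follows the convention discussed above.

# Ollama serves embeddings and the LLM over plain HTTP (model names are examples)
ollama pull bge-m3
ollama pull llama3

# RAGFlow itself also needs GPU access so that DeepDoc parsing is accelerated
docker run -d --name ragflow-server \
  --gpus all \
  -e DEVICE=gpu \
  -p 9380:9380 \
  infiniflow/ragflow:latest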

Complete RAGFlow Pipeline (with GPU usage marked)

| Step | Component | Runs inside RAGFlow? | Uses GPU? |
|------|-----------|----------------------|-----------|
| 1. Document upload (parsing) | DeepDoc parser | Yes (core of RAGFlow) | Yes, heavily |
| 2. Chunking | Text splitting | Yes | No (pure CPU) |
| 3. Embedding | Sentence-Transformers, BGE, etc. | External (Ollama, etc.) | GPU via Ollama/vLLM |
| 4. Vector storage | Elasticsearch / Infinity | Yes | No |
| 5. Retrieval | Vector + keyword search | Yes | No |
| 6. Re-ranking | Cross-encoder (optional) | External or local | GPU via external service |
| 7. Answer generation | LLM (Llama 3, Qwen2, etc.) | External (Ollama, etc.) | GPU via Ollama/vLLM |

→ The only part that RAGFlow itself accelerates with GPU is Step 1 — but it is by far the most compute-intensive for real-world documents.
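To see how thin RAGFlow's role is in steps 3 and 7, note that they are ordinary HTTP calls against Ollama's standard API. A minimal sketch (model names are examples of models you have already pulled):

# Step 3: embedding request
curl http://localhost:11434/api/embeddings \
  -d '{"model": "bge-m3", "prompt": "What does DeepDoc do?"}'

# Step 7: answer generation
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3", "prompt": "Summarize the retrieved context.", "stream": false}'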

What DeepDoc Actually Does (and Why GPU Makes It 5–20× Faster)

When you upload a PDF, scanned image, or complex report, DeepDoc performs these AI-heavy tasks:

  1. Layout Detection: detects columns, headers, footers, and reading order using deep-learning layout models (LayoutLM-style document AI).

  2. Table Structure Recognition (TSR): identifies table boundaries, row/column spans, and merged cells; this is extremely important for accurate retrieval.

  3. Formula & Math Recognition: converts images of formulas into readable LaTeX/text.

  4. Enhanced OCR: for scanned PDFs, runs deep-learning OCR models (not just CPU-bound Tesseract).

  5. Visual Language Model Tasks: optionally calls lightweight VLMs (Qwen2-VL, LLaVA, etc.) to describe images (charts, diagrams, screenshots) inside the document.

All of these run inside RAGFlow’s deepdoc module using PyTorch + CUDA when DEVICE=gpu is enabled.
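A quick sanity check that DeepDoc can actually reach CUDA is to ask PyTorch inside the RAGFlow container. The container name ragflow-server is an assumption matching the default compose setup:

docker exec -i ragflow-server python3 - <<'EOF'
import torch
# DeepDoc's models only run on the GPU if this prints True
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
EOF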

Real-World Performance Numbers

Exact figures vary with your GPU, document mix, and OCR settings, but for scanned or table-heavy PDFs the GPU-accelerated DeepDoc path is typically reported as 5–20× faster than CPU-only parsing.

When Do You Actually Need ragflow-gpu?

YES – You need it if your documents contain any of the following (a quick check is shown below):

  • Scanned pages (images instead of selectable text)

  • Complex tables or financial reports

  • Charts, graphs, screenshots

  • Mixed layouts (multi-column, sidebars, footnotes)

  • Handwritten notes or formulas

NO – You can stay on ragflow-cpu if:

  • All documents are clean, born-digital text (Word → PDF, Markdown, etc.)

  • You only do quick prototypes with a few simple files

  • You have no NVIDIA GPU available
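Not sure which category a PDF falls into? A quick heuristic is to try extracting its text layer with pdftotext (from poppler-utils; report.pdf is a placeholder). Readable output means born-digital text; empty or garbled output means scanned pages that will need DeepDoc's OCR path.

# Dump the text layer of the first three pages to stdout
pdftotext -l 3 report.pdf - | head -n 20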

Monitoring & Verification

During document upload:

watch -n 1 nvidia-smi

You will see:

  • The RAGFlow container using 4–12 GB of VRAM during parsing

  • The Ollama container using VRAM only when embedding or generating
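watch only shows aggregate usage; to attribute VRAM to individual processes (and thus containers), use nvidia-smi's query mode:

# One row per GPU process: PID, process name, and VRAM in use
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv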

Conclusion

  • ragflow-cpu → fine for clean text only

  • ragflow-gpu → mandatory for real-world unstructured documents

Even if Ollama handles your LLM and embeddings perfectly, without ragflow-gpu, the very first step (understanding the document) will be painfully slow or inaccurate.

Use DEVICE=gpu — it’s the difference between a toy and a true enterprise-grade RAG system.