RAGFlow GPU vs CPU: Full Explanation (2025 Edition)
Why Does RAGFlow Still Need a GPU Even When Using Ollama?
You are absolutely right to ask this question.
Most people configure RAGFlow to use external services (Ollama, Xinference, vLLM, OpenAI, etc.) for:
- Embedding model
- Re-ranking model
- Inference LLM (the actual answer generator)
Ollama runs these models entirely on your GPU — RAGFlow only sends HTTP requests.
So why does the official documentation and community still strongly recommend the ragflow-gpu image (or setting DEVICE=gpu)?
The answer is simple: Deep Document Understanding (DeepDoc) — the part that happens before any embedding or LLM call.
Complete RAGFlow Pipeline (with GPU usage marked)
| Step | Component | Runs inside RAGFlow? | Uses GPU when? |
|---|---|---|---|
| 1. Parsing | DeepDoc parser | Yes (core of RAGFlow) | Yes, heavily |
| 2. Chunking | Text splitting | Yes | No (pure CPU) |
| 3. Embedding | Sentence-Transformers, BGE… | External (Ollama, etc.) | GPU via Ollama/vLLM |
| 4. Indexing | Elasticsearch / InfiniFlow DB | Yes | No |
| 5. Retrieval | Vector + keyword search | Yes | No |
| 6. Re-ranking | Cross-encoder (optional) | External or local | GPU via external service |
| 7. Generation | LLM (Llama 3, Qwen2, etc.) | External (Ollama, etc.) | GPU via Ollama/vLLM |
→ The only part that RAGFlow itself accelerates with GPU is Step 1 — but it is by far the most compute-intensive for real-world documents.
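Before relying on DEVICE=gpu, it is worth confirming that the host can pass GPUs into containers at all, since DeepDoc's CUDA path depends on it. A quick sanity check with any CUDA base image (the image tag below is just an example; the NVIDIA Container Toolkit must be installed on the host):

```bash
# If GPU passthrough works, this prints the same nvidia-smi table
# you would see on the host; if not, the run fails immediately.
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```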
What DeepDoc Actually Does (and Why GPU Makes It 5–20× Faster)
When you upload a PDF, scanned image, or complex report, DeepDoc performs these AI-heavy tasks:
1. **Layout detection**: detects columns, headers, footers, and reading order using CNN-based models (LayoutLM-style).
2. **Table Structure Recognition (TSR)**: identifies table boundaries, row/column spans, and merged cells; extremely important for accurate retrieval.
3. **Formula & math recognition**: converts LaTeX/math images into readable text.
4. **Enhanced OCR**: for scanned PDFs, runs deep-learning OCR models (not just CPU Tesseract).
5. **Visual language model tasks**: optionally calls lightweight VLMs (Qwen2-VL, LLaVA, etc.) to describe charts, diagrams, and screenshots inside the document.
All of these run inside RAGFlow’s deepdoc module using PyTorch + CUDA when DEVICE=gpu is enabled.
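If you want to confirm that DEVICE=gpu actually took effect, you can ask the container's PyTorch directly. A minimal check, assuming your RAGFlow container is named ragflow-server (substitute your actual container name):

```bash
# Prints True only if PyTorch inside the container can reach a CUDA GPU
docker exec ragflow-server python3 -c "import torch; print(torch.cuda.is_available())"
```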
When Do You Actually Need ragflow-gpu?
YES – you need it if your documents contain any of the following:
- Scanned pages (images instead of selectable text; a quick check is shown after this list)
- Complex tables or financial reports
- Charts, graphs, screenshots
- Mixed layouts (multi-column, sidebars, footnotes)
- Handwritten notes or formulas

NO – you can stay on ragflow-cpu if:
- All documents are clean, born-digital text (Word → PDF, Markdown, etc.)
- You only do quick prototypes with a few simple files
- You have no NVIDIA GPU available
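If you are unsure whether a PDF is born-digital or scanned, one quick heuristic is to extract its text layer with pdftotext (from poppler-utils): born-digital files yield plenty of text, scanned ones yield almost nothing.

```bash
# A byte count near zero means there is no text layer: the PDF is
# scanned images and will need DeepDoc's deep-learning OCR path.
pdftotext report.pdf - | wc -c
```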
Recommended Setup in 2025 (Best of Both Worlds)
```bash
# .env file
DEVICE=gpu          # ← enables DeepDoc GPU acceleration
SVR_HTTP_PORT=80

# Use Ollama (or vLLM) for embeddings + LLM
EMBEDDING_MODEL=ollama/bge-large
RERANK_MODEL=reranker   # rerankers cannot be served by Ollama; vLLM or llama.cpp is needed
LLM_MODEL=ollama/llama3.1:70b
```
Result:
- DeepDoc parsing → blazing fast on your NVIDIA GPU (inside the RAGFlow container)
- Embeddings + LLM → also blazing fast on the same GPU (inside the Ollama container)
- No bottlenecks anywhere
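To verify that the embedding side is reachable before pointing RAGFlow at it, you can call Ollama's embeddings endpoint directly. A sketch assuming Ollama's default port 11434 and that the bge-large model from the config above has already been pulled:

```bash
# Returns a JSON object with an "embedding" array on success
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "bge-large", "prompt": "hello world"}'
```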
Monitoring & Verification
During document upload:
```bash
watch -n 1 nvidia-smi
```
You will see:
- The RAGFlow container using 4–12 GB of VRAM during parsing
- The Ollama container using VRAM only when embedding or generating
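To see exactly which process owns which slice of VRAM, you can query per-process memory instead of watching the full dashboard:

```bash
# One line per GPU process: PID, process name, and VRAM used (MiB)
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```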
Conclusion
- ragflow-cpu → fine for clean text only
- ragflow-gpu → mandatory for real-world unstructured documents
Even if Ollama handles your LLM and embeddings perfectly, without ragflow-gpu, the very first step (understanding the document) will be painfully slow or inaccurate.
Use DEVICE=gpu — it’s the difference between a toy and a true enterprise-grade RAG system.