# RAGFlow GPU vs CPU: Full Explanation (2025 Edition)
## Why Does RAGFlow Still Need a GPU Even When Using Ollama?
It's a fair question.
Most people configure RAGFlow to use external services (Ollama, Xinference, vLLM, OpenAI, etc.) for:
- Embedding model
- Re-ranking model
- Inference LLM (the actual answer generator)
Ollama runs these models entirely on your GPU — RAGFlow only sends HTTP requests.
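Under the hood, those calls are nothing more than plain HTTP. As a minimal sketch, an embedding request against Ollama's REST API looks like this (the model name is illustrative; use whichever embedding model you have pulled):

```bash
# One embedding request per chunk - this is all RAGFlow sends to Ollama.
# Assumes Ollama is running on its default port 11434.
curl http://localhost:11434/api/embeddings \
  -d '{"model": "bge-large", "prompt": "What is deep document understanding?"}'
```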
So why do the official documentation and the community still strongly recommend the ragflow-gpu image (or setting DEVICE=gpu)?
The answer is simple: Deep Document Understanding (DeepDoc) — the part that happens before any embedding or LLM call.
## Complete RAGFlow Pipeline (with GPU Usage Marked)
| Step | Component | Runs inside RAGFlow? | Uses GPU when? |
|---|---|---|---|
| 1. Document upload | DeepDoc parser | Yes (core of RAGFlow) | Yes, heavily |
| 2. Chunking | Text splitting | Yes | No (pure CPU) |
| 3. Embedding | Sentence-Transformers, BGE, etc. | External (Ollama, etc.) | GPU via Ollama/vLLM |
| 4. Vector storage | Elasticsearch / Infinity | Yes | No |
| 5. Retrieval | Vector + keyword search | Yes | No |
| 6. Re-ranking | Cross-encoder (optional) | External or local | GPU via external service |
| 7. Answer generation | LLM (Llama 3, Qwen2, etc.) | External (Ollama, etc.) | GPU via Ollama/vLLM |
→ The only part that RAGFlow itself accelerates with GPU is Step 1 — but it is by far the most compute-intensive for real-world documents.
## What DeepDoc Actually Does (and Why GPU Makes It 5–20× Faster)
When you upload a PDF, scanned image, or complex report, DeepDoc performs these AI-heavy tasks:
- **Layout Detection**: detects columns, headers, footers, and reading order using CNN-based models (LayoutLM-style).
- **Table Structure Recognition (TSR)**: identifies table boundaries, row/column spans, and merged cells, which is critical for accurate retrieval.
- **Formula & Math Recognition**: converts rendered formulas and math images into readable LaTeX/text.
- **Enhanced OCR**: runs deep-learning OCR models on scanned PDFs (not just CPU-bound Tesseract).
- **Visual Language Model Tasks**: optionally calls lightweight VLMs (Qwen2-VL, LLaVA, etc.) to describe charts, diagrams, and screenshots inside the document.
All of these run inside RAGFlow’s deepdoc module using PyTorch + CUDA when DEVICE=gpu is enabled.
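A quick way to verify that PyTorch inside the container actually sees CUDA (the container name ragflow-server is the default from the official compose files; adjust if yours differs):

```bash
# Should print "True" when DEVICE=gpu is effective.
docker exec ragflow-server python3 -c "import torch; print(torch.cuda.is_available())"
```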
## When Do You Actually Need ragflow-gpu?
YES – You need it if your documents contain any of the following:
- Scanned pages (images instead of selectable text)
- Complex tables or financial reports
- Charts, graphs, screenshots
- Mixed layouts (multi-column, sidebars, footnotes)
- Handwritten notes or formulas
NO – You can stay on ragflow-cpu if:
- All documents are clean, born-digital text (Word → PDF, Markdown, etc.; a quick check is sketched after this list)
- You only do quick prototypes with a few simple files
- You have no NVIDIA GPU available
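Not sure whether a PDF is born-digital or scanned? A quick heuristic, assuming pdftotext from poppler-utils is installed: born-digital PDFs carry a selectable text layer, scans do not.

```bash
# Dump the first few hundred characters of the PDF's text layer.
# Requires pdftotext (poppler-utils).
pdftotext report.pdf - | head -c 300
# Readable text        -> born-digital; ragflow-cpu may be enough
# Empty/garbage output -> scanned; DeepDoc OCR on GPU pays off
```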
## Recommended Setup in 2025 (Best of Both Worlds)
```bash
# .env file
DEVICE=gpu        # ← enables DeepDoc GPU acceleration
SVR_HTTP_PORT=80

# Use Ollama (or vLLM) for embeddings + LLM
EMBEDDING_MODEL=ollama/bge-large
RERANK_MODEL=reranker    # rerankers cannot be served by Ollama; vLLM or llama.cpp is needed
LLM_MODEL=ollama/llama3.1:70b
```
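Setting DEVICE=gpu only works if the container can reach the GPU in the first place. RAGFlow ships GPU-ready compose files, but if you maintain your own, the standard Docker Compose device reservation is a sketch like this (the service name is illustrative; the NVIDIA Container Toolkit must be installed on the host):

```yaml
# docker-compose.override.yml (sketch)
services:
  ragflow:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```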
Result:
- DeepDoc parsing → blazing fast on your NVIDIA GPU (inside RAGFlow container)
- Embeddings + LLM → also blazing fast on the same GPU (inside Ollama container)
- No bottlenecks anywhere
## Monitoring & Verification
During document upload:
```bash
watch -n 1 nvidia-smi
```
You will see:
- RAGFlow container using 4–12 GB VRAM during parsing
- Ollama container using VRAM only when embedding or generating
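To attribute that VRAM to the right container, list compute processes by PID:

```bash
# Per-process VRAM usage - distinguishes the RAGFlow parser from Ollama.
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
```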
## Conclusion
- ragflow-cpu → fine for clean text only
- ragflow-gpu → mandatory for real-world unstructured documents
Even if Ollama handles your LLM and embeddings perfectly, without ragflow-gpu, the very first step (understanding the document) will be painfully slow or inaccurate.
Use DEVICE=gpu — it’s the difference between a toy and a true enterprise-grade RAG system.