RAGFlow GPU vs CPU: Full Explanation (2025 Edition)

Why Does RAGFlow Still Need a GPU Even When Using Ollama?

You are absolutely right to ask this question.

Most people configure RAGFlow to use external services (Ollama, Xinference, vLLM, OpenAI, etc.) for:

- Embeddings (BGE, Sentence-Transformers, …)
- Re-ranking (an optional cross-encoder)
- Answer generation (the LLM itself)

Ollama runs these models entirely on your GPU — RAGFlow only sends HTTP requests. So why does the official documentation and community still strongly recommend the ragflow-gpu image (or setting DEVICE=gpu)?

The answer is simple: Deep Document Understanding (DeepDoc) — the part that happens before any embedding or LLM call.

Complete RAGFlow Pipeline (with GPU usage marked)

| Step | Component | Runs inside RAGFlow? | Uses GPU? |
|------|-----------|----------------------|-----------|
| 1. Document upload | DeepDoc parser | Yes (core of RAGFlow) | YES, heavily |
| 2. Chunking | Text splitting | Yes | No (pure CPU) |
| 3. Embedding | Sentence-Transformers, BGE, … | External (Ollama, etc.) | GPU via Ollama/vLLM |
| 4. Vector storage | Elasticsearch / InfiniFlow DB | Yes | No |
| 5. Retrieval | Vector + keyword search | Yes | No |
| 6. Re-ranking | Cross-encoder (optional) | External or local | GPU via external service |
| 7. Answer generation | LLM (Llama 3, Qwen2, etc.) | External (Ollama, etc.) | GPU via Ollama/vLLM |

→ The only part that RAGFlow itself accelerates with GPU is Step 1 — but it is by far the most compute-intensive for real-world documents.
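
To make that split concrete, here is roughly what the external steps reduce to on the wire. The endpoints are Ollama's standard REST API; the model names are only examples and must match models you have actually pulled.

# Step 3 (embedding): a plain HTTP request; the GPU work happens inside Ollama
curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "bge-large", "prompt": "What is DeepDoc?"}'

# Step 7 (answer generation): same idea, different endpoint
curl -s http://localhost:11434/api/generate \
  -d '{"model": "llama3.1:70b", "prompt": "Summarize DeepDoc.", "stream": false}'

RAGFlow spends no GPU cycles on either call; it simply waits for the JSON response.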

What DeepDoc Actually Does (and Why GPU Makes It 5–20× Faster)

When you upload a PDF, scanned image, or complex report, DeepDoc performs these AI-heavy tasks:

  1. Layout Detection: detects columns, headers, footers, and reading order using CNN-based models (LayoutLM-style).
  2. Table Structure Recognition (TSR): identifies table boundaries, row/column spans, and merged cells; extremely important for accurate retrieval.
  3. Formula & Math Recognition: converts images of mathematical notation into readable text (e.g., LaTeX).
  4. Enhanced OCR: for scanned PDFs, runs deep-learning OCR models rather than plain CPU-bound Tesseract.
  5. Visual Language Model Tasks: optionally calls lightweight VLMs (Qwen2-VL, LLaVA, etc.) to describe charts, diagrams, and screenshots inside the document.

All of these run inside RAGFlow’s deepdoc module using PyTorch + CUDA when DEVICE=gpu is enabled.
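
A quick sanity check that this flag actually has hardware to use: ask PyTorch inside the container whether it sees a CUDA device. The container name ragflow-server below is an assumption based on RAGFlow's default compose setup; confirm yours with docker ps.

# Container name is an assumption; list running containers with `docker ps`
docker exec ragflow-server python3 -c "import torch; print(torch.cuda.is_available())"

If this prints False, DEVICE=gpu cannot help: there is no CUDA device visible for the DeepDoc models to run on.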

Real-World Performance Numbers

Exact throughput depends on your GPU, document mix, and page count, but the pattern is consistent: clean digital-born text parses quickly either way, while scan-heavy and table-heavy PDFs are where DeepDoc on GPU delivers the 5–20× speedup mentioned above.
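
Rather than trusting anyone's benchmarks, measure your own hardware. A minimal sketch, assuming a source checkout of RAGFlow (the deepdoc module ships small test drivers such as t_ocr.py; verify the script path in your version):

# Time DeepDoc's OCR stage on one of your own scanned PDFs, run from the repo root
time python deepdoc/vision/t_ocr.py --inputs ./my_scan.pdf --output_dir ./ocr_out

Running it twice, once normally and once with CUDA_VISIBLE_DEVICES= set to empty (which hides the GPU), shows the gap for your own documents.
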
When Do You Actually Need ragflow-gpu?

YES – You need it if your documents contain any of the following:

- Scanned pages or photographed documents (OCR-heavy)
- Complex tables with merged cells, row/column spans, or multi-page layouts
- Mathematical formulas
- Charts, diagrams, or screenshots you want described
- Multi-column or otherwise complex page layouts

NO – You can stay on ragflow-cpu if:

- Your documents are clean, digital-born text (plain text, Markdown, simple single-column PDFs)
- You have no scans, tables, formulas, or charts to parse
- Your ingestion volume is small and you can tolerate slower parsing

# .env file
DEVICE=gpu                     # ← Enables DeepDoc GPU acceleration
SVR_HTTP_PORT=80

# Use Ollama (or vLLM) for embeddings + LLM
EMBEDDING_MODEL=ollama/bge-large
RERANK_MODEL=reranker          # cannot be served by Ollama; use vLLM or llama.cpp
LLM_MODEL=ollama/llama3.1:70b
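
Note that DEVICE=gpu only takes effect if the container can reach the GPU at all, which requires the NVIDIA Container Toolkit on the host. Recent RAGFlow releases ship a GPU variant of the compose file; the filename below may differ between versions.

# Run from ragflow/docker; requires the NVIDIA Container Toolkit on the host
docker compose -f docker-compose-gpu.yml up -d   # filename may vary by version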

Result:

- DeepDoc parses documents on the GPU inside RAGFlow
- Embeddings, re-ranking, and answer generation run on GPU via the external services
- No stage of the pipeline is left bottlenecked on CPU

Monitoring & Verification

During document upload:

watch -n 1 nvidia-smi

You will see:

- GPU utilization spiking while DeepDoc runs layout detection, OCR, and table recognition
- The RAGFlow server process holding GPU memory during parsing
- Separate activity from Ollama/vLLM once embedding and generation start
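
To confirm that it is really the RAGFlow process (DeepDoc) using the GPU during parsing, and not only Ollama, list the compute processes by name:

# Show which processes currently hold GPU memory
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
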
Conclusion

Even if Ollama handles your LLM and embeddings perfectly, without ragflow-gpu, the very first step (understanding the document) will be painfully slow or inaccurate.

Use DEVICE=gpu — it’s the difference between a toy and a true enterprise-grade RAG system.