Security Concerns

RAGFlow originates from China: it is developed by InfiniFlow, a Chinese company, and was open-sourced in 2024.

The Chinese origin of RAGFlow is not, by itself, a material security concern in practice. The project is one of the technically strongest deep-document-understanding RAG engines available right now.

For organizations that must comply with very strict procurement rules or national security guidelines, or that have board-level “no Chinese tech” policies, the origin is frequently considered a blocking factor, even if the technical risk is low.

Choose according to your actual risk appetite and compliance requirements, not just the country of origin.

considerations:

To mitigate the risk:

→ Self-host and build your own Docker image from the official source code
→ Audit the diff yourself or let your security team do it (very feasible, since the codebase is not enormous)
→ Use only Western, local, or open-source LLMs and embedding models
→ Result: the concern level drops to the same as for any other active open-source project

sandbox:

RAGFlow is an excellent open-source deep-document-understanding and Agent-oriented RAG engine. It contains a very powerful feature: the Code component inside Agents. Users can write Python or JavaScript code directly in the workflow/agent, and this code can:

Process retrieved chunks
Call external APIs
Do calculations / data cleaning
Transform data
Call other tools dynamically
Basically anything Python/JS can do

This is extremely powerful, but also extremely dangerous if you let random users (or even semi-trusted internal users) write code.
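To make the risk concrete, here is a minimal sketch of what a user-supplied snippet could do if it ran directly in the host process with no sandbox around it. The function name `main` and its return shape are assumptions made for illustration; they are not taken from RAGFlow's Code component API. The snippet only reports the names of environment variables, never their values.

```python
import os

# Hypothetical user-supplied snippet. `main` is an illustrative name,
# not RAGFlow's documented entry point.
def main() -> dict:
    # Code running in the host process can read that process's environment,
    # which often holds database passwords and API keys.
    markers = ("KEY", "SECRET", "TOKEN", "PASSWORD")
    secret_like = sorted(
        name for name in os.environ
        if any(marker in name.upper() for marker in markers)
    )
    # Only variable names are returned here; real hostile code would not be so polite.
    return {"secret_like_env_vars": secret_like}

if __name__ == "__main__":
    print(main())
```

The same reasoning applies to the file system and the network: without isolation, the snippet inherits whatever the host process can reach.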

That’s where RAGFlow Sandbox + gVisor comes in. The sandbox can be enabled in “.env”, and code execution then flows as follows:

User → Agent workflow → Code component (Python/JS)
        ↓
RAGFlow Sandbox Executor Manager
        ↓
Spawns short-lived container(s) using runtime: runsc (gVisor)
        ↓
Code runs inside very strong syscall sandbox
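The general pattern behind this flow can be sketched in a few lines. This is not RAGFlow's executor manager; it is a minimal illustration that assumes Docker is installed with gVisor registered as the `runsc` runtime. The helper names `build_sandbox_command` and `run_untrusted`, the image `python:3.12-slim`, and the resource limits are all assumptions chosen for the example.

```python
import shlex
import subprocess
from typing import List

def build_sandbox_command(code: str,
                          image: str = "python:3.12-slim",
                          runtime: str = "runsc") -> List[str]:
    """Build a `docker run` command for a short-lived, locked-down container."""
    return [
        "docker", "run",
        "--rm",                              # delete the container when it exits
        f"--runtime={runtime}",              # runsc is gVisor's OCI runtime
        "--network=none",                    # no network access from inside
        "--memory=256m",                     # illustrative memory cap
        "--pids-limit=64",                   # limit the number of processes
        "--read-only",                       # read-only root file system
        "--cap-drop=ALL",                    # drop all Linux capabilities
        "--security-opt=no-new-privileges",  # block privilege escalation
        image,
        "python", "-c", code,
    ]

def run_untrusted(code: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run the snippet in the sandbox and capture its output.

    Note: the timeout stops the docker client process; a production setup
    would also have to make sure the container itself is killed.
    """
    return subprocess.run(build_sandbox_command(code),
                          capture_output=True, text=True, timeout=timeout_s)

if __name__ == "__main__":
    # Print the command instead of running it, so this works without Docker.
    print(shlex.join(build_sandbox_command("print('hello from the sandbox')")))
```

Separating command construction from execution keeps the isolation flags easy to review and to unit-test without a container runtime.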

elasticsearch:

xpack.security.enabled: true
Transport TLS enabled and working (verification_mode: certificate is acceptable)
HTTP TLS enabled (HTTPS) on all nodes that receive traffic
Strong passwords set for all built-in users (elastic, kibana_system, logstash_system, beats_system, apm_system, …)
Dedicated roles + users for each service (never use elastic superuser in production apps)
Keystore used for all passwords (not plaintext in yml!)
Firewall: port 9300 (transport) open only between nodes, port 9200 (HTTP) open only to trusted sources
Regular certificate rotation plan (at least yearly, better automated)
Monitoring of certificate expiration
Audit logging enabled (at least for authentication/security events)
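
The certificate-expiration item in the checklist above is easy to automate. The following is a minimal sketch using only the Python standard library; the function names, the default port 9200, and the optional `ca_file` argument (for clusters whose certificates are signed by a private CA) are assumptions for illustration, and the warning threshold is left to the caller.

```python
import socket
import ssl
import time
from typing import Optional

def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time,
    given in the format returned by getpeercert(), e.g. 'Jan  5 09:34:43 2018 GMT'."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / 86400

def fetch_not_after(host: str, port: int = 9200,
                    ca_file: Optional[str] = None) -> str:
    """Connect over TLS and return the server certificate's notAfter field."""
    context = ssl.create_default_context(cafile=ca_file)
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]

if __name__ == "__main__":
    # Offline demonstration with a fixed timestamp; against a real node one
    # would call days_until_expiry(fetch_not_after("es-node.example.internal")).
    sample = "Jan  5 09:34:43 2018 GMT"
    print(f"{days_until_expiry(sample):.1f} days until expiry")
```

A negative result means the certificate has already expired; a scheduled job could alert once the value falls below whatever lead time the rotation plan requires.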