Security Concerns
The origin of ragflow is Chinese. Infiniflow is a Chinese company. It was open-sourced in 2024.
Chinese origin of RAGFlow is not a material security concern in practice — the project is one of the technically strongest deep-document-understanding RAG engines available right now.For organizations that must comply with very strict procurement rules, national security guidelines, or have board-level “no Chinese tech” policies → yes, it’s frequently considered a blocking factor, even if the technical risk is low.Choose according to your actual risk appetite and compliance requirements — not just origin country.
considerations:
to mitigate risc : → Self-host + build your own Docker image from the official source code → Audit the diff yourself or let your security team do it (very feasible — the codebase is not enormous) → Use only Western/local/open-source LLMs & embedding models → Result: concern level drops to same as any other active open-source project
sandbox:
RAGFlow is an excellent open-source deep-document-understanding + Agent-oriented RAG engine.It contains a very powerful feature: Code component inside Agents. Users can write Python or JavaScript code directly in the workflow/agent → this code can:Process retrieved chunks Call external APIs Do calculations / data cleaning Transform data Call other tools dynamically basically anything Python/JS can do
→ This is extremely powerful, but also extremely dangerous if you let random users (or even semi-trusted internal users) write code.
That’s where RAGFlow Sandbox + gVisor comes in
This can be enabled in “.env”.
- User → Agent workflow → Code component (Python/JS)
↓
- RAGFlow Sandbox Executor Manager
↓
- Spawns short-lived container(s) using
- runtime: runsc (gVisor)
↓
Code runs inside very strong syscall sandbox
elasticsearch:
xpack.security.enabled: true
Transport TLS enabled + working (verification_mode: certificate is ok)
HTTP TLS enabled (HTTPS) on all nodes that receive traffic
Strong passwords set for all built-in users (elastic, kibana_system, logstash_system, beats_system, apm_system, …)
Dedicated roles + users for each service (never use elastic superuser in production apps)
Keystore used for all passwords (not plaintext in yml!)
Firewall: 9300 only between nodes, 9200 only from trusted sources
Regular certificate rotation plan (at least yearly, better automated)
Monitoring of certificate expiration
Audit logging enabled (at least for authentication/security events)