homelab
problem older hardware :
Xeon E5-2690 v2
Tesla P100 16GB
Driver CUDA 11.8
software :
CUDA 12.8
AVX512 or AV2 (> xeon v3
problem reranking:
compiled llama.cpp and gguf model which works OK
unfortunately not recognised in infiniflow/ragflow (not anymore)
possible solution:
installing xinference (cpu version) (dockercontainer for GPU too big and CUDA 12.8)
docker run -d \
--name xinference-cpu \
-p 9997:9997 \
--shm-size=8g \
--restart unless-stopped \
-v /home/jan/models:/models \
xprobe/xinference:latest-cpu \
xinference-local -H 0.0.0.0
docker exec xinference-cpu xinference launch \
--model-name qwen3-reranker-4b \
--model-format gguf \
--model-uri /models/Qwen3-Reranker-4B-q5_k_m.gguf \
--model-type rerank
docker exec -it xinference-cpu xinference launch \
--model-name Qwen3-Reranker-4B \
--model-type rerank \
-- \
--model-format gguf \
--model-uri /models/Qwen3-Reranker-4B-q5_k_m.gguf
in the container : (this seems to work OK)
xinference launch \
--model-name Qwen3-Reranker-4B \
--model-type rerank \
-- \
--model-format gguf \
--model-uri /models/Qwen3-Reranker-4B-q5_k_m.gguf