# Triton CUDA Init Failure in Rootless Podman
**Symptom:** `cudaGetDeviceCount()` returns `err=3` (`cudaErrorInitializationError`) in the ai-gateway container. Triton cannot load GPU models. ai-llm (llama.cpp) works on the same GPU.

**Environment:** Ubuntu 24.04, rootless Podman, NVIDIA driver 525+, CDI (`nvidia.com/gpu=0` or `all`)
## Root Cause
CUDA Driver API vs Runtime API:
| Component | API | Rootless behavior |
|---|---|---|
| ai-llm (llama.cpp) | CUDA Driver API (libcuda.so) | ✅ Works |
| ai-gateway (Triton) | CUDA Runtime API (libcudart.so) | ❌ Fails with err=3 |
| nvidia-smi | NVML (Driver API) | ✅ Works |
On driver 525+, the CUDA Runtime API may require nvidia-cap device access for initialization. Rootless Podman's user namespace prevents proper nvidia-cap passthrough even when /dev/nvidia0 and other devices are present.
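The split in the table can be probed directly with `ctypes`. The sketch below is an illustrative diagnostic (not part of this project): it loads each library and calls its init-style entry point. On an affected rootless host, the Driver API call would succeed while the Runtime API call returns 3.

```python
import ctypes

def probe(libname: str, fn: str, *args) -> str:
    """Load a CUDA library and call an init-style function.

    Returns "missing" if the library cannot be loaded; otherwise the
    integer status code the call returned, as a string (0 = success).
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return "missing"
    return str(getattr(lib, fn)(*args))

if __name__ == "__main__":
    # Driver API entry point: the path llama.cpp takes
    print("cuInit:", probe("libcuda.so", "cuInit", 0))
    # Runtime API entry point: the path Triton/PyTorch take
    count = ctypes.c_int()
    print("cudaGetDeviceCount:",
          probe("libcudart.so", "cudaGetDeviceCount", ctypes.byref(count)))
```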
## Solutions (in order of recommendation)
### 1. Run ai-gateway with rootful Podman (most reliable)
This project already uses rootful for DCGM exporter (same limitation). Run ai-gateway as a rootful service:
```bash
# Stop rootless stack
podman compose -f docker-compose.prod.yml down

# Start ai-gateway with rootful Podman
sudo podman compose -f docker-compose.prod.yml up -d ai-gateway

# Start the rest with rootless (backend depends on ai-gateway)
podman compose -f docker-compose.prod.yml up -d
```
Alternative: run the entire stack rootful with `sudo podman compose -f docker-compose.prod.yml up -d`.
### 2. Rootless CDI spec in user directory
The project's setup uses /etc/cdi/nvidia.yaml (system-wide). Rootless Podman may need the spec in the user directory:
```bash
# Generate CDI spec for rootless (no sudo)
mkdir -p ~/.config/cdi
nvidia-ctk cdi generate --output=$HOME/.config/cdi/nvidia.yaml

# Verify Podman finds it
podman info | grep -A5 cdi

# Test
podman run --rm --device nvidia.com/gpu=0 nvidia/cuda:12.0-base-ubuntu22.04 \
  python3 -c "import ctypes; lib=ctypes.CDLL('libcudart.so'); c=ctypes.c_int(); print('err=', lib.cudaGetDeviceCount(ctypes.byref(c)), 'count=', c.value)"
```
If this still returns err=3, rootless + CUDA Runtime API is the blocker.
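To confirm which error a numeric code corresponds to, the runtime itself can translate it: `cudaGetErrorName` returns the symbolic name for any CUDA Runtime error value. A small helper (illustrative; assumes `libcudart.so` is on the loader path inside the container):

```python
import ctypes

def cuda_error_name(code: int, libname: str = "libcudart.so") -> str:
    """Translate a CUDA Runtime error code to its symbolic name.

    Falls back to the raw code when libcudart is not available
    (e.g. when run outside a CUDA container).
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return f"error {code} (libcudart not available)"
    lib.cudaGetErrorName.restype = ctypes.c_char_p
    return lib.cudaGetErrorName(code).decode()

# Inside a working CUDA container, code 3 maps to cudaErrorInitializationError
print(cuda_error_name(3))
```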
### 3. nvidia-cap device permissions (driver 525+)
CUDA Runtime API on newer drivers may require /dev/nvidia-caps/nvidia-cap1 and nvidia-cap2:
```bash
# Check current permissions
ls -la /dev/nvidia-caps/

# Option A: add to /etc/modprobe.d/nvidia-ai.conf (then reboot):
#   options nvidia NVreg_DeviceFileMode=0666

# Option B: udev rule (create /etc/udev/rules.d/70-nvidia-cap.rules):
#   KERNEL=="nvidia-cap*", MODE="0666"
# Then reload:
sudo udevadm control --reload-rules && sudo udevadm trigger
```
Note: NVreg_DeviceFileMode=0666 affects /dev/nvidia*, not necessarily nvidia-cap. Some users report nvidia-cap stays at 400 after reboot. The udev rule targets nvidia-cap specifically.
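As a quick host-side check after either change, this sketch (illustrative, stdlib only) walks `/dev/nvidia-caps` and flags any node a rootless container could not read:

```python
import os
import stat

def world_readable(mode: int) -> bool:
    """True if the 'other' read bit is set (0666, 0444, ...)."""
    return bool(mode & stat.S_IROTH)

def check_caps(dev_dir: str = "/dev/nvidia-caps") -> None:
    """Print each nvidia-cap node's mode and whether rootless can read it."""
    if not os.path.isdir(dev_dir):
        print(f"{dev_dir} not present (older driver, or caps not created)")
        return
    for name in sorted(os.listdir(dev_dir)):
        mode = os.stat(os.path.join(dev_dir, name)).st_mode
        perms = oct(stat.S_IMODE(mode))
        status = "ok" if world_readable(mode) else "NOT world-readable"
        print(f"{name}: {perms} {status}")

if __name__ == "__main__":
    check_caps()
```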
### 4. Verify CDI spec includes nvidia-cap
```bash
# Inspect CDI spec
grep -A20 "containerEdits" /etc/cdi/nvidia.yaml
# or
grep -A20 "containerEdits" ~/.config/cdi/nvidia.yaml
```
The spec should list nvidia-cap1 and nvidia-cap2 in deviceNodes for rootless to pass them. If they're missing, regenerate:
```bash
# System-wide:
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# For rootless:
nvidia-ctk cdi generate --output=$HOME/.config/cdi/nvidia.yaml
```
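Rather than eyeballing the grep output, the check can be scripted against the parsed spec. The sketch below operates on a CDI spec already loaded into a dict (in real use, `yaml.safe_load()` on `nvidia.yaml`); the inline `sample` is a made-up stand-in, not this project's actual spec:

```python
def missing_caps(spec: dict) -> list[str]:
    """Return the nvidia-cap device nodes NOT listed in the CDI spec's
    containerEdits.deviceNodes (top-level and per-device)."""
    want = {"/dev/nvidia-caps/nvidia-cap1", "/dev/nvidia-caps/nvidia-cap2"}
    seen = set()
    edits = [spec.get("containerEdits", {})]
    edits += [d.get("containerEdits", {}) for d in spec.get("devices", [])]
    for edit in edits:
        for node in edit.get("deviceNodes", []):
            seen.add(node.get("path", ""))
    return sorted(want - seen)

# Minimal stand-in for a parsed nvidia.yaml
sample = {
    "containerEdits": {
        "deviceNodes": [
            {"path": "/dev/nvidiactl"},
            {"path": "/dev/nvidia-caps/nvidia-cap1"},
        ]
    },
    "devices": [],
}
print(missing_caps(sample))  # cap2 is absent from this sample
```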
### 5. Explicit nvidia-cap bind mounts (compose override)
If CDI doesn't pass nvidia-cap in rootless, try explicit mounts. Create docker-compose.override.rootless-gpu.yml:
```yaml
services:
  ai-gateway:
    devices:
      - nvidia.com/gpu=0
      - /dev/nvidia-caps/nvidia-cap1:/dev/nvidia-caps/nvidia-cap1
      - /dev/nvidia-caps/nvidia-cap2:/dev/nvidia-caps/nvidia-cap2
```
Then:
```bash
podman compose -f docker-compose.prod.yml -f docker-compose.override.rootless-gpu.yml up -d ai-gateway
```
Prerequisite: nvidia-cap must be world-readable on the host (chmod 666 or udev rule). If the host has them at 400, rootless cannot access them.
## Project-specific configuration
### CDI generation (this codebase)
- `setup_lib/nvidia_toolkit.py`: `configure_podman_runtime()` runs `sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml`
- Rootless path: not currently generated by setup. Manually run `nvidia-ctk cdi generate --output=$HOME/.config/cdi/nvidia.yaml`
### ai-gateway service (docker-compose.prod.yml)
```yaml
ai-gateway:
  devices:
    - nvidia.com/gpu=0  # or nvidia.com/gpu=all for multi-GPU
  environment:
    - CUDA_VISIBLE_DEVICES=${GPU_AI_SERVICES:-0}
```
For a single GPU (e.g. Brev A100), set `GPU_AI_SERVICES=0` in `.env`.
### USE_AI_GATEWAY fallback
The backend supports `USE_AI_GATEWAY=false` with individual service URLs. However, docker-compose.prod.yml does not define standalone ai-yolo26, ai-enrichment, ai-florence, or ai-clip services; those would need to be added. More importantly, the standalone enrichment services also use PyTorch/ONNX Runtime (CUDA Runtime API) and would likely fail with the same error under rootless. The only service that works rootless is ai-llm (llama.cpp, Driver API).
## Quick repro
```bash
podman exec <ai-gateway-container> python3 -c "
import ctypes
lib = ctypes.CDLL('libcudart.so')
count = ctypes.c_int()
err = lib.cudaGetDeviceCount(ctypes.byref(count))
print(f'err={err}, count={count.value}')  # err=3, count=0 when broken
"
```
## References
- NVIDIA Container Toolkit CDI
- Podman #9926: CUDA under rootless
- NVIDIA #210: Insufficient Permissions rootless
- This project: GPU Setup — rootless CDI path at ~/.config/cdi/nvidia.yaml
- This project: `setup_lib/rootful_services.py` (DCGM requires rootful; the same pattern may apply to Triton)
## See Also
- GPU Issues - General GPU troubleshooting
- Troubleshooting Index - Back to symptom index