Triton CUDA Init Failure in Rootless Podman

Symptom: cudaGetDeviceCount() returns err=3 (cudaErrorInitializationError) in ai-gateway container. Triton cannot load GPU models. ai-llm (llama.cpp) works on the same GPU.

Environment: Ubuntu 24.04, rootless Podman, NVIDIA driver 525+, CDI (nvidia.com/gpu=0 or all)


Root Cause

CUDA Driver API vs Runtime API:

Component            API                               Rootless behavior
ai-llm (llama.cpp)   CUDA Driver API (libcuda.so)      ✅ Works
ai-gateway (Triton)  CUDA Runtime API (libcudart.so)   ❌ Fails with err=3
nvidia-smi           NVML (Driver API)                 ✅ Works

On driver 525+, the CUDA Runtime API may require nvidia-cap device access for initialization. Rootless Podman's user namespace prevents proper nvidia-cap passthrough even when /dev/nvidia0 and other devices are present.
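A quick way to confirm which API layer is failing is to probe both libraries with ctypes from inside the container. This is a minimal sketch that degrades gracefully when a library is absent; `cuInit` is the Driver API entry point and `cudaGetDeviceCount` exercises Runtime API initialization:

```python
# Probe both CUDA entry points to see which one fails in this container.
# cuInit lives in libcuda.so (Driver API); cudaGetDeviceCount lives in
# libcudart.so (Runtime API). err=3 from the Runtime API while the
# Driver API succeeds matches the rootless Podman failure above.
import ctypes

def probe(lib_name, init_call):
    """Load lib_name and run init_call on it; return a one-line status."""
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError:
        return f"{lib_name}: not loadable"
    return f"{lib_name}: err={init_call(lib)}"

# Driver API: cuInit(0) returns 0 (CUDA_SUCCESS) on a healthy setup.
print(probe("libcuda.so", lambda lib: lib.cuInit(0)))

# Runtime API: cudaGetDeviceCount() returns 3 (initialization error)
# in the broken rootless case even when the Driver API works.
count = ctypes.c_int()
print(probe("libcudart.so",
            lambda lib: lib.cudaGetDeviceCount(ctypes.byref(count))))
```

If the first line reports err=0 and the second err=3, the container matches the ai-llm-works / ai-gateway-fails split in the table above.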


Solutions (in order of recommendation)

1. Run ai-gateway with rootful Podman (most reliable)

This project already uses rootful Podman for the DCGM exporter (same limitation). Run ai-gateway as a rootful service too:

# Stop rootless stack
podman compose -f docker-compose.prod.yml down

# Start ai-gateway with rootful Podman
sudo podman compose -f docker-compose.prod.yml up -d ai-gateway

# Start rest with rootless (backend depends on ai-gateway)
podman compose -f docker-compose.prod.yml up -d

Alternative: Run the entire stack rootful:

sudo podman compose -f docker-compose.prod.yml up -d

2. Rootless CDI spec in user directory

The project's setup uses /etc/cdi/nvidia.yaml (system-wide). Rootless Podman may need the spec in the user directory:

# Generate CDI spec for rootless (no sudo)
mkdir -p ~/.config/cdi
nvidia-ctk cdi generate --output=$HOME/.config/cdi/nvidia.yaml

# Verify Podman finds it
podman info | grep -A5 cdi

# Test
podman run --rm --device nvidia.com/gpu=0 nvidia/cuda:12.0.0-base-ubuntu22.04 \
  python3 -c "import ctypes; lib=ctypes.CDLL('libcudart.so'); c=ctypes.c_int(); print('err=', lib.cudaGetDeviceCount(ctypes.byref(c)), 'count=', c.value)"
# (if the image lacks python3, run the same probe in the ai-gateway image
#  instead -- see "Quick repro" below)

If this still returns err=3, rootless + CUDA Runtime API is the blocker.
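To see at a glance which spec file Podman can actually find, the candidate locations can be checked directly. This sketch assumes the paths used in this document plus CDI's default system directory /var/run/cdi; adjust if your distribution configures different spec directories:

```python
# Report which CDI spec locations exist. Rootless Podman reads the
# per-user directory in addition to the system-wide ones; a spec in
# any of these is enough for --device nvidia.com/gpu=... to resolve.
import os

def cdi_spec_candidates(home=None):
    """Return the CDI spec paths to check, per-user directory first."""
    home = home or os.path.expanduser("~")
    return [
        os.path.join(home, ".config/cdi/nvidia.yaml"),
        "/etc/cdi/nvidia.yaml",
        "/var/run/cdi/nvidia.yaml",
    ]

for path in cdi_spec_candidates():
    status = "present" if os.path.exists(path) else "missing"
    print(f"{path}: {status}")
```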


3. nvidia-cap device permissions (driver 525+)

CUDA Runtime API on newer drivers may require /dev/nvidia-caps/nvidia-cap1 and nvidia-cap2:

# Check current permissions
ls -la /dev/nvidia-caps/

# Add to /etc/modprobe.d/nvidia-ai.conf (then reboot)
options nvidia NVreg_DeviceFileMode=0666

# Or udev rule for nvidia-cap (create /etc/udev/rules.d/70-nvidia-cap.rules)
# KERNEL=="nvidia-cap*", MODE="0666"
# Then: sudo udevadm control --reload-rules && sudo udevadm trigger

Note: NVreg_DeviceFileMode=0666 affects /dev/nvidia*, not necessarily nvidia-cap. Some users report nvidia-cap stays at 400 after reboot. The udev rule targets nvidia-cap specifically.


4. Verify CDI spec includes nvidia-cap

# Inspect CDI spec
grep -A20 "containerEdits" /etc/cdi/nvidia.yaml
# or
grep -A20 "containerEdits" ~/.config/cdi/nvidia.yaml

The spec should list nvidia-cap1 and nvidia-cap2 in deviceNodes for rootless to pass them. If they're missing, regenerate:

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# For rootless:
nvidia-ctk cdi generate --output=$HOME/.config/cdi/nvidia.yaml
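Rather than eyeballing the grep output, the spec can be scanned for nvidia-cap entries programmatically. This is a sketch that avoids a YAML dependency by relying on the fact that device paths appear literally in the generated spec:

```python
# Scan a CDI spec for nvidia-cap device nodes without a YAML parser;
# a substring scan is enough since device paths appear literally on
# lines like `- path: /dev/nvidia-caps/nvidia-cap1`.
def caps_in_cdi_spec(spec_text):
    """Return the set of /dev/nvidia-caps/* paths mentioned in the spec."""
    found = set()
    for line in spec_text.splitlines():
        if "/dev/nvidia-caps/" in line:
            found.add("/dev/" + line.split("/dev/", 1)[1].strip())
    return found

if __name__ == "__main__":
    import sys
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/cdi/nvidia.yaml"
    try:
        with open(path) as fh:
            caps = caps_in_cdi_spec(fh.read())
        print(f"{path}: {sorted(caps) or 'no nvidia-cap entries'}")
    except FileNotFoundError:
        print(f"{path}: not found")
```

An empty result means the spec will not pass the cap devices into the container and should be regenerated as shown above.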

5. Explicit nvidia-cap bind mounts (compose override)

If CDI doesn't pass nvidia-cap in rootless, try explicit mounts. Create docker-compose.override.rootless-gpu.yml:

services:
  ai-gateway:
    devices:
      - nvidia.com/gpu=0
      - /dev/nvidia-caps/nvidia-cap1:/dev/nvidia-caps/nvidia-cap1
      - /dev/nvidia-caps/nvidia-cap2:/dev/nvidia-caps/nvidia-cap2

Then:

podman compose -f docker-compose.prod.yml -f docker-compose.override.rootless-gpu.yml up -d ai-gateway

Prerequisite: nvidia-cap must be world-readable on the host (chmod 666 or udev rule). If the host has them at 400, rootless cannot access them.
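Whether that prerequisite holds can be checked with a plain stat of the device nodes. A minimal sketch; it only tests the "other" read bit, which is what an unprivileged user inside the rootless user namespace ends up needing:

```python
# Check whether the nvidia-cap device nodes are world-readable on the
# host -- the prerequisite for the bind-mount approach, since rootless
# containers access them as an unprivileged user.
import os
import stat

def world_readable(path):
    """True if 'other' has read permission on path."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)

for name in ("nvidia-cap1", "nvidia-cap2"):
    path = f"/dev/nvidia-caps/{name}"
    if not os.path.exists(path):
        print(f"{path}: missing")
    elif world_readable(path):
        print(f"{path}: ok (world-readable)")
    else:
        print(f"{path}: mode too restrictive; apply the udev rule above")
```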


Project-specific configuration

CDI generation (this codebase)

  • setup_lib/nvidia_toolkit.py: configure_podman_runtime() runs sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
  • Rootless path: Not currently generated by setup. Manually run: nvidia-ctk cdi generate --output=$HOME/.config/cdi/nvidia.yaml

ai-gateway service (docker-compose.prod.yml)

ai-gateway:
  devices:
    - nvidia.com/gpu=0   # or nvidia.com/gpu=all for multi-GPU
  environment:
    - CUDA_VISIBLE_DEVICES=${GPU_AI_SERVICES:-0}

For single GPU (e.g. Brev A100): set GPU_AI_SERVICES=0 in .env.

USE_AI_GATEWAY fallback

The backend supports USE_AI_GATEWAY=false with individual service URLs. However, docker-compose.prod.yml does not define standalone ai-yolo26, ai-enrichment, ai-florence, ai-clip — those would need to be added. More importantly, standalone enrichment services also use PyTorch/ONNX Runtime (CUDA Runtime API) and would likely fail with the same error in rootless. The only service that works in rootless is ai-llm (llama.cpp, Driver API).


Quick repro

podman exec <ai-gateway-container> python3 -c "
import ctypes
lib = ctypes.CDLL('libcudart.so')
count = ctypes.c_int()
err = lib.cudaGetDeviceCount(ctypes.byref(count))
print(f'err={err}, count={count.value}')  # err=3, count=0 when broken
"
