
AI Services Configuration

Configure environment variables for AI inference services.

Time to read: ~12 min
Prerequisites: AI Installation


Environment Variables

Set these in your shell profile (~/.bashrc or ~/.zshrc) or in a .env file at the project root.

Required Variables

| Variable | Description | Example |
| --- | --- | --- |
| PROJECT_ROOT | Root directory of the project | $HOME/github/home-security-intelligence |

AI Service Startup

These control the startup script (scripts/start-ai.sh):

| Variable | Description | Default |
| --- | --- | --- |
| YOLO26_PORT | Port for the YOLO26 detection server | 8095 |
| NEMOTRON_PORT | Port for the NVIDIA Nemotron LLM server | 8091 |

Note: Log file paths are hardcoded:

  • YOLO26: /tmp/yolo26-detector.log
  • NVIDIA Nemotron: /tmp/nemotron-llm.log
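
As a quick sanity check, you can start both services on non-default ports and follow the hardcoded logs. A minimal sketch, assuming scripts/start-ai.sh honors the two port variables above:

```bash
# Start both AI services on non-default ports and follow their logs
# (log paths are hardcoded, as noted above)
YOLO26_PORT=9095 NEMOTRON_PORT=9091 ./scripts/start-ai.sh
tail -f /tmp/yolo26-detector.log /tmp/nemotron-llm.log
```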

YOLO26 Detection Server

Configuration for ai/yolo26/model.py. The YOLO26 server uses TensorRT-optimized engines for efficient GPU inference.

Core Configuration

| Variable | Description | Default |
| --- | --- | --- |
| YOLO26_MODEL_PATH | Path to TensorRT engine (.engine) or PyTorch model | /models/yolo26/exports/yolo26m_fp16.engine |
| YOLO26_CONFIDENCE | Detection confidence threshold (0.0-1.0) | 0.5 |
| YOLO26_CACHE_CLEAR_FREQUENCY | Clear CUDA cache every N detections (0 to disable) | 1 |
| PORT | Server port (direct execution) | 8095 |
| HOST | Bind address | 0.0.0.0 |
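
For direct execution outside the startup script, these variables can be set inline. A minimal sketch, assuming ai/yolo26/model.py starts the server when run directly:

```bash
# Run the YOLO26 server directly with explicit settings
YOLO26_MODEL_PATH=/models/yolo26/exports/yolo26m_fp16.engine \
YOLO26_CONFIDENCE=0.4 \
PORT=8095 HOST=0.0.0.0 \
python ai/yolo26/model.py
```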

TensorRT Engine Paths

TensorRT engines are GPU-architecture specific. Use the appropriate engine for your deployment:

| Precision | Engine Path | VRAM | Latency | Use Case |
| --- | --- | --- | --- | --- |
| FP16 | /models/yolo26/exports/yolo26m_fp16.engine | ~2 GB | 10-20 ms | Default; highest accuracy |
| INT8 | /models/yolo26/exports/yolo26m_int8.engine | ~1.5 GB | 5-10 ms | High throughput, multi-camera |
| PyTorch (.pt) | /models/yolo26/yolo26m.pt (fallback) | ~3 GB | 30-50 ms | Development, or when TensorRT is unavailable |

Production engines should be stored at /export/ai_models/model-zoo/yolo26/exports/.

TensorRT Version Compatibility

TensorRT engines are version-specific. The server automatically handles version mismatches:

| Variable | Description | Default |
| --- | --- | --- |
| YOLO26_AUTO_REBUILD | Auto-rebuild the engine on a TensorRT version mismatch | true |
| YOLO26_PT_MODEL_PATH | Path to the source .pt model used when auto-rebuild is enabled | (derived) |

When the TensorRT runtime version differs from the engine version:

  1. The server detects the version mismatch error
  2. If YOLO26_AUTO_REBUILD=true, it rebuilds the engine from the .pt model
  3. If rebuild fails or is disabled, it falls back to PyTorch inference
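
To make the rebuild path deterministic rather than derived, both variables can be pinned in .env. A sketch using the default paths from this page:

```bash
# Pin the rebuild source explicitly instead of relying on the derived path
YOLO26_AUTO_REBUILD=true
YOLO26_PT_MODEL_PATH=/models/yolo26/yolo26m.pt
```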

torch.compile Optimization (PyTorch fallback only)

When using PyTorch models (not TensorRT engines), torch.compile provides a 15-30% speedup:

| Variable | Description | Default |
| --- | --- | --- |
| TORCH_COMPILE_ENABLED | Enable PyTorch 2.0+ graph compilation | true |
| TORCH_COMPILE_MODE | Compilation mode | reduce-overhead |
| TORCH_COMPILE_BACKEND | Compilation backend | inductor |
| TORCH_COMPILE_CACHE_DIR | Cache directory for compiled graphs | (system default) |

Compilation modes:

  • default - Balanced optimization
  • reduce-overhead - Faster compilation, good speedup (recommended)
  • max-autotune - Best performance, slower compilation

Note: torch.compile is automatically skipped for TensorRT engines since they are already graph-optimized.
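
For long-running deployments on the PyTorch fallback, the compile cost of max-autotune amortizes quickly, and a persistent cache directory avoids recompiling after restarts. A sketch (the cache path is illustrative):

```bash
# torch.compile tuning for the PyTorch fallback path
TORCH_COMPILE_ENABLED=true
TORCH_COMPILE_MODE=max-autotune                  # best runtime speed, slowest first compile
TORCH_COMPILE_BACKEND=inductor
TORCH_COMPILE_CACHE_DIR=/var/cache/torch-compile # illustrative path; persists compiled graphs across restarts
```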

Exporting TensorRT Engines

Use the export script to generate TensorRT engines for your GPU:

```bash
# Export FP16 engine (default, higher accuracy)
python ai/yolo26/export_tensorrt.py --model yolo26m.pt --output exports/

# Export INT8 engine (2x throughput, requires calibration)
python ai/yolo26/export_tensorrt.py \
    --model yolo26m.pt \
    --int8 \
    --data config/yolo26_calibration.yaml \
    --output exports/

# Benchmark exported engine
python ai/yolo26/export_tensorrt.py --benchmark exports/yolo26m_fp16.engine
```

INT8 calibration requirements:

  • 100-500 representative images from your deployment environment
  • Cover various lighting conditions and camera angles
  • Include all security-relevant object classes
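
One way to assemble such a set is to sample a few hundred snapshots from your own recordings. A sketch, where the source directory is hypothetical:

```bash
# Sample ~300 snapshots from recorded footage into a calibration folder
# (/export/camera-snapshots is a hypothetical source directory)
mkdir -p datasets/calibration
find /export/camera-snapshots -name '*.jpg' | shuf -n 300 | \
    xargs -I{} cp {} datasets/calibration/
```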

For detailed export options, see ai/yolo26/README.md.

NVIDIA Nemotron LLM Server

Configuration for ai/start_llm.sh (development) and ai/start_nemotron.sh (production):

| Variable | Description | Default (Dev) |
| --- | --- | --- |
| NEMOTRON_MODEL_PATH | Path to GGUF model file | $PROJECT_ROOT/ai/nemotron/nemotron-mini-4b-instruct-q4_k_m.gguf |
| NEMOTRON_PORT | Server port | 8091 |
| GPU_LAYERS | Layers offloaded to GPU | 35 |
| CTX_SIZE | Context window size | 131072 (128K) |
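
These can also be overridden inline for a one-off run. A sketch, assuming ai/start_llm.sh reads the variables above:

```bash
# Launch the development LLM server with explicit overrides
NEMOTRON_MODEL_PATH="$PROJECT_ROOT/ai/nemotron/nemotron-mini-4b-instruct-q4_k_m.gguf" \
NEMOTRON_PORT=8091 GPU_LAYERS=35 CTX_SIZE=131072 \
./ai/start_llm.sh
```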

Model Zoo State Machine

Figure: Model Zoo state machine showing model lifecycle transitions: loading, loaded, unloading, and error states.

Model Options:

| Deployment | Model | File | VRAM | Context |
| --- | --- | --- | --- | --- |
| Production | NVIDIA Nemotron-3-Nano-30B-A3B | Nemotron-3-Nano-30B-A3B-Q4_K_M.gguf | ~14.7 GB | 131,072 |
| Development | Nemotron Mini 4B Instruct | nemotron-mini-4b-instruct-q4_k_m.gguf | ~3 GB | 4,096 |
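
Switching between the two means pointing NEMOTRON_MODEL_PATH at the other file and sizing the context window to match. A hedged .env sketch; the nemotron subdirectory under the model zoo is an assumption based on the YOLO26 layout above:

```bash
# Production model (path assumes a nemotron/ directory in the model zoo)
NEMOTRON_MODEL_PATH=/export/ai_models/model-zoo/nemotron/Nemotron-3-Nano-30B-A3B-Q4_K_M.gguf
CTX_SIZE=131072

# Development model
# NEMOTRON_MODEL_PATH=$PROJECT_ROOT/ai/nemotron/nemotron-mini-4b-instruct-q4_k_m.gguf
# CTX_SIZE=4096
```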

For comprehensive NVIDIA Nemotron documentation, see /ai/nemotron/AGENTS.md.

Backend Configuration

These configure how the backend connects to AI services (backend/core/config.py):

| Variable | Description | Default |
| --- | --- | --- |
| YOLO26_URL | Full URL to the YOLO26 service | http://localhost:8095 |
| NEMOTRON_URL | Full URL to the NVIDIA Nemotron service | http://localhost:8091 |
| YOLO26_API_KEY | API key for YOLO26 authentication | (none) |
| NEMOTRON_API_KEY | API key for NVIDIA Nemotron authentication | (none) |
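
A quick connectivity check from the backend host can confirm the URLs before starting the full stack. This assumes each service exposes a /health endpoint, which may differ in your build:

```bash
# Verify both services respond (the /health endpoint is an assumption)
curl -fsS "${YOLO26_URL:-http://localhost:8095}/health"
curl -fsS "${NEMOTRON_URL:-http://localhost:8091}/health"
```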

Setting Up Environment Variables

Option 1: Shell Profile

Add to ~/.bashrc or ~/.zshrc:

```bash
export PROJECT_ROOT="$HOME/github/home-security-intelligence"
```

Option 2: Project .env File

```bash
# Expand $HOME now, since most .env loaders read values literally
echo "PROJECT_ROOT=$HOME/github/home-security-intelligence" >> .env
```

Option 3: Current Session Only

```bash
export PROJECT_ROOT="/path/to/your/project"
```
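
If you use a .env file but also want its variables in your current shell (for scripts run outside the backend), a common idiom is to auto-export while sourcing it. This only works for simple KEY=VALUE files:

```bash
# Export every variable defined in .env into the current shell
set -a
source .env
set +a
```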

Container Networking

When AI services run in containers, use appropriate host resolution:

For a decision table and copy/paste .env snippets for every deployment mode, see Deployment Modes & AI Networking.

| Platform | Runtime | AI Service URLs |
| --- | --- | --- |
| macOS | Docker Desktop | http://host.docker.internal:8095 |
| macOS | Podman | http://host.containers.internal:8095 |
| Linux | Docker/Podman | http://192.168.1.100:8095 (host IP) |
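
To confirm host resolution from inside a container, a hedged check (the container name and /health endpoint are illustrative):

```bash
# From inside the backend container, verify the AI host is reachable
# (swap in host.containers.internal for Podman)
docker exec backend curl -fsS http://host.docker.internal:8095/health
```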

When running docker-compose.prod.yml, the backend reaches AI services by compose DNS:

```bash
YOLO26_URL=http://ai-yolo26:8095
NEMOTRON_URL=http://ai-llm:8091
FLORENCE_URL=http://ai-florence:8092
CLIP_URL=http://ai-clip:8093
ENRICHMENT_URL=http://ai-enrichment:8094
```

Example .env for macOS with Docker:

```bash
YOLO26_URL=http://host.docker.internal:8095
NEMOTRON_URL=http://host.docker.internal:8091
```

Example .env for macOS with Podman (note that export statements and ${VAR} interpolation are shell syntax and are not expanded by most .env loaders, so write literal values):

```bash
YOLO26_URL=http://host.containers.internal:8095
NEMOTRON_URL=http://host.containers.internal:8091
```

Example for Linux (command substitution like $(hostname -I) does not expand inside a .env file, so resolve the host IP in your shell and append literal values):

```bash
# Resolve the host IP, then append literal URLs to .env
AI_HOST=$(hostname -I | awk '{print $1}')
echo "YOLO26_URL=http://${AI_HOST}:8095" >> .env
echo "NEMOTRON_URL=http://${AI_HOST}:8091" >> .env
```

Detection Settings

Fine-tune object detection behavior:

| Variable | Description | Default |
| --- | --- | --- |
| DETECTION_CONFIDENCE_THRESHOLD | Minimum confidence to store a detection | 0.5 |
| FAST_PATH_CONFIDENCE_THRESHOLD | Threshold for fast-path processing | 0.90 |
| FAST_PATH_OBJECT_TYPES | Object types eligible for the fast path | ["person"] |

Confidence threshold trade-offs:

  • Lower (0.3-0.5): More detections, more false positives
  • Higher (0.6-0.8): Fewer detections, fewer false positives
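
As a worked example, you might lower the storage threshold to catch more events while keeping the fast path strict. The extra object type below is illustrative and must match a class your model actually emits:

```bash
# Store more candidate detections, but keep the fast path conservative
DETECTION_CONFIDENCE_THRESHOLD=0.4
FAST_PATH_CONFIDENCE_THRESHOLD=0.92
FAST_PATH_OBJECT_TYPES=["person","car"]   # "car" is illustrative; use classes your model emits
```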

Timeout Settings

Control connection and read timeouts:

| Variable | Description | Default |
| --- | --- | --- |
| AI_CONNECT_TIMEOUT | Connection timeout (seconds) | 10.0 |
| YOLO26_READ_TIMEOUT | Detection read timeout (seconds) | 60.0 |
| NEMOTRON_READ_TIMEOUT | LLM analysis read timeout (seconds) | 120.0 |

Systemd Services

For production deployment with systemd:

| Variable | Description | Default |
| --- | --- | --- |
| AI_SERVICE_USER | User to run the systemd services as | Current user |
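
To change the service user after installation, a systemd drop-in avoids editing the unit file directly. The unit name below is hypothetical; substitute your actual unit:

```bash
# Override the run-as user via a drop-in (ai-yolo26 is a hypothetical unit name)
sudo systemctl edit ai-yolo26
# In the editor that opens, add:
#   [Service]
#   User=aisvc
```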

Complete Example .env

```bash
# Project root
PROJECT_ROOT=/home/user/home-security-intelligence

# AI service ports (for startup scripts)
YOLO26_PORT=8095
NEMOTRON_PORT=8091

# AI service URLs (for backend)
YOLO26_URL=http://localhost:8095
NEMOTRON_URL=http://localhost:8091

# YOLO26 TensorRT configuration
YOLO26_MODEL_PATH=/models/yolo26/exports/yolo26m_fp16.engine
YOLO26_CONFIDENCE=0.5
YOLO26_CACHE_CLEAR_FREQUENCY=1
YOLO26_AUTO_REBUILD=true
# YOLO26_PT_MODEL_PATH=/models/yolo26/yolo26m.pt  # Optional: explicit fallback path

# torch.compile (PyTorch fallback only)
TORCH_COMPILE_ENABLED=true
TORCH_COMPILE_MODE=reduce-overhead

# Detection tuning
DETECTION_CONFIDENCE_THRESHOLD=0.5
FAST_PATH_CONFIDENCE_THRESHOLD=0.90
FAST_PATH_OBJECT_TYPES=["person"]

# Timeouts
AI_CONNECT_TIMEOUT=10.0
YOLO26_READ_TIMEOUT=60.0
NEMOTRON_READ_TIMEOUT=120.0
```
