AI Documentation - Agent Guide

Purpose

This directory contains architecture and design documentation for the AI model zoo and inference pipeline. Use it to understand, configure, and extend the AI subsystem.

Directory Contents

docs/ai/
├── AGENTS.md          # This file - navigation guide
└── model-zoo.md       # AI Model Zoo Architecture documentation

Key Documents

Model Zoo Architecture (`model-zoo.md`)

This document covers:

Architecture Overview - Detection pipeline and service topology
Always-Loaded Models - YOLO26, Florence-2, CLIP, Nemotron
On-Demand Models - Threat detection, pose estimation, demographics, clothing, vehicle, pet, re-ID, depth, action recognition
VRAM Management - On-demand loading with LRU eviction and priority-based ordering
API Reference - Unified enrichment endpoint and model management APIs
Environment Variables - Configuration for all AI services
Adding New Models - Step-by-step guide for extending the model zoo

Quick Links

| Topic | Location |
| --- | --- |
| Model zoo architecture | model-zoo.md |
| AI service implementation | ai/AGENTS.md |
| YOLO26 detection | ai/yolo26/AGENTS.md |
| Florence-2 vision-language | ai/florence/AGENTS.md |
| CLIP embeddings | ai/clip/AGENTS.md |
| Enrichment service | ai/enrichment/AGENTS.md |
| Nemotron LLM | ai/nemotron/AGENTS.md |

Common Tasks

Understanding Model Capabilities

Read model-zoo.md for the complete model inventory. Each model section includes:

- Model source (Hugging Face link)
- VRAM requirements
- Input/output formats
- Trigger conditions

Configuring VRAM Budget

See the "VRAM Management" section in model-zoo.md:

Default budget: 6.8 GB for on-demand models
Configure via the VRAM_BUDGET_GB environment variable
The priority system controls eviction order
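As a rough illustration of how a budget, a priority, and LRU ordering can interact, here is a minimal Python sketch. It is not the project's implementation: the `VramBudget` and `LoadedModel` names, the per-model sizes, and the rule that lower priority numbers are evicted first are all assumptions made for the example. Only the 6.8 GB default and the `VRAM_BUDGET_GB` variable come from this guide; model-zoo.md remains the reference for the actual behavior.

```python
import os
from collections import OrderedDict
from dataclasses import dataclass

DEFAULT_BUDGET_GB = 6.8  # default on-demand budget stated in this guide


@dataclass
class LoadedModel:
    name: str
    vram_gb: float
    priority: int  # assumption: lower number is evicted first


class VramBudget:
    """Hypothetical tracker: evicts by priority, then by least recent use."""

    def __init__(self, budget_gb=None):
        if budget_gb is None:
            budget_gb = float(os.environ.get("VRAM_BUDGET_GB", DEFAULT_BUDGET_GB))
        self.budget_gb = budget_gb
        self._loaded = OrderedDict()  # least recently used entry first

    @property
    def used_gb(self):
        return sum(m.vram_gb for m in self._loaded.values())

    def touch(self, name):
        """Mark a loaded model as most recently used."""
        self._loaded.move_to_end(name)

    def load(self, model):
        """Admit a model and return the names evicted to make room."""
        if model.name in self._loaded:
            self.touch(model.name)
            return []
        if model.vram_gb > self.budget_gb:
            raise ValueError(f"{model.name} does not fit in {self.budget_gb} GB")
        evicted = []
        while self.used_gb + model.vram_gb > self.budget_gb:
            candidates = list(enumerate(self._loaded.values()))
            # Lowest priority first; ties go to the least recently used.
            _, victim = min(candidates, key=lambda item: (item[1].priority, item[0]))
            del self._loaded[victim.name]
            evicted.append(victim.name)
        self._loaded[model.name] = model
        return evicted
```

Calling `VramBudget().load(...)` without arguments picks up `VRAM_BUDGET_GB` from the environment, so the same sketch can be used to reason about how a larger or smaller budget changes which models stay resident.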

Adding New Models

Follow the "Adding New Models" guide in model-zoo.md:

1. Create a model wrapper in ai/enrichment/models/
2. Register it in model_registry.py
3. Add trigger conditions
4. (Optional) Add an API endpoint
5. Update the docker-compose volumes
6. Update the documentation
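The first three steps in the list above (wrapper, registration, trigger) can be sketched roughly as follows. This guide does not show the real interface of model_registry.py, so every name here (`ModelSpec`, `MODEL_REGISTRY`, `register_model`, `ExamplePetClassifier`, `is_pet`, `models_for`), the shape of the detection dictionary, and the VRAM and priority values are hypothetical. Treat it as an outline of the idea and follow model-zoo.md for the actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ModelSpec:
    name: str
    vram_gb: float
    priority: int
    loader: Callable[[], Any]
    trigger: Callable[[Dict[str, Any]], bool]


# Hypothetical stand-in for whatever model_registry.py actually exposes.
MODEL_REGISTRY: Dict[str, ModelSpec] = {}


def register_model(spec: ModelSpec) -> ModelSpec:
    if spec.name in MODEL_REGISTRY:
        raise ValueError(f"model already registered: {spec.name}")
    MODEL_REGISTRY[spec.name] = spec
    return spec


class ExamplePetClassifier:
    """Wrapper step: a real wrapper would load weights in load()."""

    def __init__(self) -> None:
        self.loaded = False

    def load(self) -> "ExamplePetClassifier":
        self.loaded = True  # placeholder for loading weights onto the GPU
        return self


def is_pet(detection: Dict[str, Any]) -> bool:
    """Trigger step: the detection dict shape is an assumption."""
    return detection.get("label") in {"cat", "dog"}


# Registration step: the VRAM size and priority are made-up values.
register_model(
    ModelSpec(
        name="example_pet",
        vram_gb=0.5,
        priority=1,
        loader=lambda: ExamplePetClassifier().load(),
        trigger=is_pet,
    )
)


def models_for(detection: Dict[str, Any]) -> List[str]:
    """Return the names of registered models whose trigger matches."""
    return [spec.name for spec in MODEL_REGISTRY.values() if spec.trigger(detection)]
```

Keeping the loader as a callable means nothing is loaded at registration time, which fits the on-demand loading described for the model zoo.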

Debugging Model Loading

Check model status via API:

curl http://localhost:8094/models/status
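The same check can be scripted. The sketch below uses only the Python standard library and assumes the endpoint returns a JSON body; this guide does not document the response schema, so the function simply returns whatever JSON it receives. `fetch_model_status` and its `opener` parameter are illustrative names, with `opener` there only so the call can be swapped out in tests.

```python
import json
import urllib.request

# Same host, port, and path as the curl command in this guide.
STATUS_URL = "http://localhost:8094/models/status"


def fetch_model_status(url=STATUS_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Fetch the status endpoint and return the decoded JSON body."""
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `print(json.dumps(fetch_model_status(), indent=2))` pretty-prints the response for easier reading than raw curl output.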

Review logs for loading/eviction events:

docker logs ai-enrichment 2>&1 | grep -E "(Loading|Evicting|Unloaded)"
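To summarize these events instead of scanning them by eye, a small Python helper can apply the same pattern as the grep command and count each keyword. The function name `count_events` is illustrative, and the exact log line format is not specified in this guide, so the sketch only relies on the three keywords appearing somewhere in a line.

```python
import re
from collections import Counter

# Same alternation as the grep -E pattern above.
EVENT_PATTERN = re.compile(r"(Loading|Evicting|Unloaded)")


def count_events(lines):
    """Count how many lines mention each loading/eviction keyword."""
    counts = Counter()
    for line in lines:
        match = EVENT_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

One way to use it is to pipe the `docker logs` output into a script that calls `count_events(sys.stdin)`; a high `Evicting` count relative to `Loading` may suggest the VRAM budget is too tight for the workload.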

Related Documentation

Architecture Overview: docs/architecture/overview.md
AI Pipeline API: docs/developer/api/ai-pipeline.md
Deployment Guide: docs/operator/deployment/README.md
Troubleshooting: docs/reference/troubleshooting/ai-issues.md

Entry Points

Model zoo documentation: model-zoo.md - Start here for AI architecture
Implementation code: ai/AGENTS.md - For service implementation details
Backend integration: backend/services/ - Client code for AI services