AI Orchestration Hub¶
This hub documents the AI model infrastructure that powers the home security intelligence system. The system uses a multi-model architecture with dedicated services for object detection, risk analysis, and context enrichment.
Model Inventory¶
| Model | Port | Container | VRAM | Purpose |
|---|---|---|---|---|
| YOLO26 | 8095 | ai-yolo26 | ~650MB | Primary object detection |
| Nemotron 30B | 8091 | ai-llm | ~14.7GB | Risk analysis and reasoning |
| Florence-2 | 8092 | ai-florence | ~1.2GB | Vision-language captioning |
| Enrichment Service | 8094 | ai-enrichment | ~6.8GB (budget) | Multi-model enrichment |
Architecture Overview¶
%%{init: {
'theme': 'dark',
'themeVariables': {
'primaryColor': '#3B82F6',
'primaryTextColor': '#FFFFFF',
'primaryBorderColor': '#60A5FA',
'secondaryColor': '#A855F7',
'tertiaryColor': '#009688',
'background': '#121212',
'mainBkg': '#1a1a2e',
'lineColor': '#666666'
}
}}%%
flowchart TB
subgraph Backend["Backend API"]
DC[detector_client]
NA[nemotron_analyzer]
EC[enrichment_client]
AF[ai_fallback]
end
subgraph Clients["Client Layer"]
DCL[DetectorClient]
NAL[NemotronAnalyzer]
ECL[EnrichmentClient]
end
subgraph AI["AI Services"]
YOLO["YOLO26<br/>Port 8095<br/>Object Detection"]
NEM["Nemotron-3-Nano-30B-A3B<br/>Port 8091<br/>Risk Analysis"]
ENR["Enrichment Svc<br/>Port 8094<br/>Multi-model Zoo"]
end
DC --> DCL
NA --> NAL
EC --> ECL
DCL --> YOLO
YOLO --> NAL
NAL --> NEM
NEM --> ENR VRAM Budget Allocation¶
Total GPU VRAM: ~24GB (RTX 3090/4090)
| Component | VRAM | Notes |
|---|---|---|
| Nemotron-3-Nano-30B-A3B (Q4_K_M) | ~14,700 MB | Always loaded via llama.cpp |
| YOLO26 | 650 MB | Always loaded |
| Enrichment Model Zoo | 1,650 MB | On-demand loading with LRU eviction |
The enrichment service manages its own VRAM budget of ~6.8GB with LRU eviction for its internal models. See model-zoo.md for details.
Documents¶
| Document | Purpose |
|---|---|
| model-zoo.md | Model registry, VRAM management, LRU eviction |
| yolo26-client.md | YOLO26 detection client interface |
| nemotron-analyzer.md | LLM-based risk analysis service |
| enrichment-pipeline.md | Multi-model enrichment flow |
| fallback-strategies.md | Graceful degradation patterns |
Key Source Files¶
| File | Purpose |
|---|---|
backend/services/model_zoo.py | Backend-side model zoo registry |
backend/services/detector_client.py | YOLO26 HTTP client |
backend/services/nemotron_analyzer.py | Nemotron LLM analyzer |
backend/services/enrichment_client.py | Enrichment service client |
backend/services/ai_fallback.py | Fallback and degradation management |
ai/enrichment/model_manager.py | On-demand model manager |
ai/enrichment/model_registry.py | Enrichment model configurations |
Processing Pipeline¶
- Detection Phase: Images sent to YOLO26 for object detection
- Enrichment Phase: Detections enriched with additional context (pose, clothing, vehicle type, etc.)
- Analysis Phase: Nemotron LLM analyzes enriched detections and assigns risk scores
- Fallback Phase: If any service fails, graceful degradation provides default values
Circuit Breaker Integration¶
All AI clients integrate with the circuit breaker pattern to prevent cascade failures:
- Closed: Normal operation, requests pass through
- Open: Service unhealthy, requests rejected immediately
- Half-Open: Recovery testing with limited requests
See fallback-strategies.md for detailed degradation behavior.
Metrics and Observability¶
Key Prometheus metrics:
# Detection pipeline
hsi_detection_processed_total
hsi_detection_filtered_total
hsi_ai_request_duration_seconds{service="yolo26|nemotron|enrichment"}
# Model management
enrichment_vram_usage_bytes
enrichment_vram_utilization_percent
enrichment_model_evictions_total{model_name, priority}
enrichment_model_load_time_seconds{model_name}
# Circuit breakers
hsi_circuit_breaker_state{service}
hsi_circuit_breaker_trips_total{service}