AI Pipeline Architecture¶

AI-generated diagram illustrating the end-to-end AI pipeline architecture with detection, batching, and analysis components.

This document provides comprehensive technical documentation for the AI-powered detection and analysis pipeline in the Home Security Intelligence system. It is intended for maintainers who need to debug, extend, or optimize the AI processing flow.
Table of Contents¶
- Pipeline Overview
- File Watcher
- YOLO26 Integration
- Batching Logic
- Nemotron Analysis
- Risk Score Calculation
- Error Handling
- Data Models
- Configuration Reference
Pipeline Overview¶
The AI pipeline transforms raw camera images into risk-scored security events through a multi-stage process. The pipeline is designed for efficiency (batching similar detections) and responsiveness (fast-path for critical detections).
High-Level Flow¶

The pipeline processes images through two paths:
- Normal Path: Detections are batched over 30-90 second windows before LLM analysis
- Fast Path: High-confidence (≥90%) critical detections bypass batching for immediate analysis
Complete Pipeline Sequence Diagram¶
End-to-end sequence showing FTP upload, FileWatcher processing, YOLO26 detection, batch aggregation with fast-path logic, Nemotron LLM analysis, and WebSocket broadcast to dashboard.
Diagram: Pipeline Sequence¶
sequenceDiagram
participant Camera as Foscam Camera
participant FTP as FTP Server
participant FW as FileWatcher
participant DQ as detection_queue
participant DW as DetectionQueueWorker
participant RT as YOLO26 (8095)
participant DB as PostgreSQL
participant BA as BatchAggregator
participant AQ as analysis_queue
participant AW as AnalysisQueueWorker
participant NEM as Nemotron LLM (8091)
participant WS as WebSocket
Note over Camera,WS: Normal Path (batched)
Camera->>FTP: Upload image via FTP
FTP->>FW: inotify/FSEvents trigger
FW->>FW: Debounce (0.5s)
FW->>FW: Validate image integrity
FW->>DQ: Queue {camera_id, file_path}
DW->>DQ: BLPOP (5s timeout)
DQ-->>DW: Detection job
DW->>RT: POST /detect (multipart image)
Note right of RT: ~30-50ms inference
RT-->>DW: JSON {detections, inference_time_ms}
DW->>DB: INSERT Detection records
Note over DW,NEM: Enrichment (best-effort)
DW->>BA: add_detection(camera_id, detection_id, confidence, object_type)
alt Fast Path (confidence >= 0.90 AND object_type in fast_path_types)
BA->>NEM: Immediate analysis (may include enrichment/context)
NEM-->>BA: Risk assessment JSON
BA->>DB: INSERT Event (is_fast_path=true)
BA->>WS: Broadcast event
else Normal Batching
BA->>BA: Add to batch, update last_activity
end
Note over BA,AQ: Batch Timeout Check (every 10s)
BA->>BA: check_batch_timeouts()
alt Window expired (90s) OR Idle timeout (30s)
BA->>AQ: Push {batch_id, camera_id, detection_ids}
BA->>BA: Cleanup Redis keys
end
AW->>AQ: BLPOP (5s timeout)
AQ-->>AW: Analysis job
AW->>DB: SELECT Detection details
AW->>AW: Load detections + run ContextEnricher + EnrichmentPipeline (best-effort)
AW->>NEM: POST /completion (prompt with enrichment)
Note right of NEM: ~2-5s inference
NEM-->>AW: JSON {risk_score, risk_level, summary, reasoning}
AW->>DB: INSERT Event record
AW->>WS: Broadcast new event
Pipeline Timing Characteristics¶
| Stage | Typical Duration | Notes |
|---|---|---|
| File upload detection | ~10ms | Native filesystem notifications (inotify/FSEvents) |
| Debounce delay | 500ms | Configurable, prevents duplicate processing |
| Image validation | ~5-10ms | PIL verify() |
| YOLO26 inference | 30-50ms | GPU accelerated, RTX A5500 |
| Database write (detections) | ~5-10ms | PostgreSQL async |
| Batch aggregation | ~1ms | Redis operations |
| Batch window | 30-90s | Collects related detections |
| Nemotron LLM inference | 2-5s | GPU accelerated |
| Event creation + broadcast | ~10ms | Database + WebSocket |
Total latency:
- Fast path (critical detections): ~3-6 seconds
- Normal path: 30-95 seconds (dominated by batch window)
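As a sanity check, the totals follow from summing the per-stage figures in the table (illustrative numbers, not measurements):

```python
# Illustrative latency budget from the per-stage table above.
# All values in seconds; tuples are (best case, worst case).
stages_fast = {
    "debounce": (0.5, 0.5),
    "validation": (0.005, 0.010),
    "yolo26": (0.030, 0.050),
    "nemotron": (2.0, 5.0),
    "event_broadcast": (0.010, 0.010),
}

best = sum(lo for lo, _ in stages_fast.values())
worst = sum(hi for _, hi in stages_fast.values())
print(f"fast path: ~{best:.1f}-{worst:.1f}s")

# The normal path adds the 30-90s batch window, which dominates:
print(f"normal path: ~{best + 30:.0f}-{worst + 90:.0f}s")
```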
File Watcher¶
The FileWatcher service monitors camera upload directories for new images and initiates the processing pipeline.
Source Files¶
- /backend/services/file_watcher.py - Main watcher implementation
- /backend/services/dedupe.py - Duplicate file prevention
How It Works¶
1. Filesystem Monitoring: Uses the watchdog library with native OS backends:
   - Linux: inotify (kernel-level notifications)
   - macOS: FSEvents
   - Windows: ReadDirectoryChangesW
2. Event Handling: Monitors on_created and on_modified events for image files (.jpg, .jpeg, .png)
3. Debounce Logic: Waits 0.5 seconds after the last modification before processing. This handles:
   - FTP uploads that write in chunks
   - Multiple watchdog events for the same file
   - Incomplete file writes
4. Thread-Safe Async Bridge: Watchdog runs in a separate thread, so events are scheduled onto the asyncio event loop via asyncio.run_coroutine_threadsafe()
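The thread-safe bridge in step 4 can be sketched as follows. This is illustrative, not the actual handler in /backend/services/file_watcher.py; a plain thread stands in for watchdog's callback thread:

```python
import asyncio
import threading

# Watchdog callbacks fire on a worker thread, so they must not touch
# asyncio objects directly; run_coroutine_threadsafe hands the work
# over to the running event loop.

async def enqueue(queue: asyncio.Queue, path: str) -> None:
    await queue.put(path)

def on_created(loop: asyncio.AbstractEventLoop, queue: asyncio.Queue, path: str) -> None:
    """Stand-in for the watchdog on_created callback (runs on a non-asyncio thread)."""
    future = asyncio.run_coroutine_threadsafe(enqueue(queue, path), loop)
    future.result(timeout=5)  # block this thread until the loop has accepted the item

async def main() -> str:
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()
    t = threading.Thread(
        target=on_created,
        args=(loop, queue, "/export/foscam/front_door/MDAlarm_20251228_120000.jpg"),
    )
    t.start()
    path = await queue.get()  # consumed on the event-loop side
    t.join()
    return path

result = asyncio.run(main())
print(result)
```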
Directory Structure¶
/export/foscam/ # FOSCAM_BASE_PATH
front_door/ # Camera ID = "front_door"
MDAlarm_20251228_120000.jpg
MDAlarm_20251228_120030.jpg
backyard/ # Camera ID = "backyard"
snap_20251228_120015.jpg
Deduplication¶
Files are deduplicated using SHA256 content hashes stored in Redis:
This prevents duplicate processing from:
- Watchdog create/modify event bursts
- Service restarts during file processing
- FTP upload retries
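A minimal sketch of the hash-based check, assuming a redis-py style client; the real logic lives in /backend/services/dedupe.py, and the `dedupe:` key prefix here is an assumption:

```python
import hashlib

# SET with NX (only-if-absent) and EX (TTL) makes the check-and-mark a
# single atomic Redis operation, so concurrent workers cannot both
# claim the same file.
DEDUPE_TTL_SECONDS = 300  # matches the documented default

def file_sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def is_new_file(redis_client, path: str) -> bool:
    """True if this file's content hash has not been seen within the TTL."""
    key = f"dedupe:{file_sha256(path)}"  # hypothetical key prefix
    # redis-py: set(..., nx=True) returns None when the key already existed
    return redis_client.set(key, "1", nx=True, ex=DEDUPE_TTL_SECONDS) is not None
```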
Queue Output Format¶
{
"camera_id": "front_door",
"file_path": "/export/foscam/front_door/MDAlarm_20251228_120000.jpg",
"timestamp": "2025-12-28T12:00:00.500000",
"file_hash": "a3f9c8b2d1e4..."
}
YOLO26 Integration¶
YOLO26 (Real-Time Detection Transformer v2) performs object detection on camera images, identifying security-relevant objects with bounding boxes and confidence scores.
Source Files¶
- /ai/yolo26/model.py - FastAPI inference server
- /backend/services/detector_client.py - HTTP client
What It Does¶
YOLO26 is a state-of-the-art transformer-based object detector that combines DETR's end-to-end detection approach with real-time inference speeds:
- Receives camera images via HTTP POST
- Preprocesses images (RGB conversion, normalization)
- Runs inference on GPU with PyTorch
- Post-processes outputs to extract bounding boxes
- Filters to security-relevant classes only
- Returns JSON with detections
API Format¶
Endpoint: POST http://localhost:8095/detect
Request (multipart/form-data):
Request (JSON with base64):
Response:
{
"detections": [
{
"class": "person",
"confidence": 0.95,
"bbox": {
"x": 100,
"y": 150,
"width": 200,
"height": 400
}
},
{
"class": "car",
"confidence": 0.87,
"bbox": {
"x": 500,
"y": 300,
"width": 350,
"height": 200
}
}
],
"inference_time_ms": 45.2,
"image_width": 1920,
"image_height": 1080
}
Confidence Scores and Thresholds¶
| Parameter | Default | Description |
|---|---|---|
| YOLO26_CONFIDENCE | 0.5 | Server-side minimum confidence |
| DETECTION_CONFIDENCE_THRESHOLD | 0.5 | Backend filtering threshold |
Detections below the threshold are discarded. Higher thresholds reduce false positives but may miss legitimate detections.
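The backend-side filter amounts to a simple comparison against the response payload; this is an illustrative helper, not the actual detector_client code:

```python
DETECTION_CONFIDENCE_THRESHOLD = 0.5  # documented default

def filter_detections(payload: dict, threshold: float = DETECTION_CONFIDENCE_THRESHOLD) -> list[dict]:
    """Drop detections below the confidence threshold before storing them."""
    return [d for d in payload.get("detections", []) if d["confidence"] >= threshold]

sample = {
    "detections": [
        {"class": "person", "confidence": 0.95},
        {"class": "car", "confidence": 0.42},
    ]
}
print([d["class"] for d in filter_detections(sample)])  # ['person']
```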
Bounding Box Format¶
Bounding boxes use top-left origin with width/height:
- x: Top-left X coordinate (pixels from left edge)
- y: Top-left Y coordinate (pixels from top edge)
- width: Box width in pixels
- height: Box height in pixels
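Many drawing and IoU utilities expect corner coordinates instead; converting from this format is a one-liner (illustrative helper):

```python
def bbox_to_corners(bbox: dict) -> tuple[int, int, int, int]:
    """(x1, y1, x2, y2) from the top-left/width/height format above."""
    return (bbox["x"], bbox["y"], bbox["x"] + bbox["width"], bbox["y"] + bbox["height"])

print(bbox_to_corners({"x": 100, "y": 150, "width": 200, "height": 400}))  # (100, 150, 300, 550)
```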
Security-Relevant Classes¶
Only these 9 COCO classes are returned (all others filtered):
| Class | Description |
|---|---|
| person | Human detection |
| car | Passenger vehicle |
| truck | Large vehicle |
| dog | Canine |
| cat | Feline |
| bird | Avian |
| bicycle | Bike |
| motorcycle | Motorbike |
| bus | Public transport |
Performance Characteristics¶
| Metric | Value |
|---|---|
| Inference time | 30-50ms per image |
| VRAM usage | ~4GB |
| Throughput | ~20-30 images/second |
| Model warmup | ~1-2 seconds (3 iterations) |
| Batch processing | Sequential (one image at a time) |
Health Check¶
{
"status": "healthy",
"model_loaded": true,
"device": "cuda:0",
"cuda_available": true,
"model_name": "/export/ai_models/yolo26v2/yolo26_v2_r101vd",
"vram_used_gb": 4.2
}
Batching Logic¶

AI-generated visualization of the 90-second batch processing timeline showing detection accumulation and LLM analysis.
The BatchAggregator groups related detections into batches before sending them to the LLM for analysis. This provides better context and reduces noise.
Source Files¶
- /backend/services/batch_aggregator.py - Batch management
- /backend/services/pipeline_workers.py - BatchTimeoutWorker
Why We Batch¶
1. Better LLM Context: A "person approaching door" event might span 30 seconds across 15 images. Analyzing them together gives the LLM full context.
2. Reduced Noise: Individual frame detections can be noisy. Batching smooths out false positives and provides a clearer picture.
3. Efficiency: One LLM call for 10 detections is more efficient than 10 separate calls.
4. Meaningful Events: Users want to see "Person at front door for 45 seconds" not "15 separate person detections."
Batch Lifecycle State Machine¶
State machine showing batch lifecycle from creation through collection to closure, with timeout triggers and Redis cleanup.
Diagram: Batch Lifecycle State Machine¶
stateDiagram-v2
[*] --> NoActiveBatch: No batch exists
NoActiveBatch --> BatchActive: First detection arrives
note right of BatchActive: Create batch with UUID<br/>Set started_at = now()<br/>Set last_activity = now()
BatchActive --> BatchActive: Detection added
note right of BatchActive: Append detection_id<br/>Update last_activity
BatchActive --> BatchClosed: Window timeout (90s from start)
BatchActive --> BatchClosed: Idle timeout (30s no activity)
BatchActive --> BatchClosed: Force close (API call)
BatchClosed --> [*]: Push to analysis_queue<br/>Cleanup Redis keys
state BatchActive {
[*] --> Collecting
Collecting --> Collecting: add_detection()
Collecting --> [*]: Timeout check triggers
}
Timing Parameters¶
| Parameter | Default | Purpose |
|---|---|---|
| BATCH_WINDOW_SECONDS | 90 | Maximum batch duration from first detection |
| BATCH_IDLE_TIMEOUT_SECONDS | 30 | Close batch after this long with no new detections |
| Timeout check interval | 10s | How often check_batch_timeouts() runs |
Redis Key Structure¶
All batch keys have a 1-hour TTL for orphan cleanup if service crashes:
batch:{camera_id}:current -> batch_id (string)
batch:{batch_id}:camera_id -> camera_id (string)
batch:{batch_id}:detections -> ["det_1", "det_2", ...] (JSON array)
batch:{batch_id}:started_at -> 1703764800.123 (Unix timestamp float)
batch:{batch_id}:last_activity -> 1703764845.456 (Unix timestamp float)
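The key layout above maps to a small set of Redis operations in add_detection(); a simplified sketch (illustrative, assuming a redis-py style client; the real code in batch_aggregator.py also handles the fast path and encoding details):

```python
import json
import time
import uuid

BATCH_TTL = 3600  # 1-hour orphan-cleanup TTL on every key

def add_detection(r, camera_id: str, detection_id: str) -> str:
    """Create or extend the camera's current batch, refreshing last_activity."""
    batch_id = r.get(f"batch:{camera_id}:current")
    if batch_id is None:
        # First detection for this camera: open a new batch
        batch_id = uuid.uuid4().hex
        r.set(f"batch:{camera_id}:current", batch_id, ex=BATCH_TTL)
        r.set(f"batch:{batch_id}:camera_id", camera_id, ex=BATCH_TTL)
        r.set(f"batch:{batch_id}:detections", json.dumps([]), ex=BATCH_TTL)
        r.set(f"batch:{batch_id}:started_at", time.time(), ex=BATCH_TTL)
    # Append the detection and bump the idle-timeout clock
    detections = json.loads(r.get(f"batch:{batch_id}:detections"))
    detections.append(detection_id)
    r.set(f"batch:{batch_id}:detections", json.dumps(detections), ex=BATCH_TTL)
    r.set(f"batch:{batch_id}:last_activity", time.time(), ex=BATCH_TTL)
    return batch_id
```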
Fast-Path for Critical Detections¶
High-confidence detections of critical object types bypass normal batching for immediate analysis:
Decision flowchart showing how high-confidence critical detections bypass batching for immediate LLM analysis.
Diagram: Fast-Path Decision Flow¶
flowchart TD
A[Detection arrives] --> B{confidence >= 0.90?}
B -->|No| D[Normal batching]
B -->|Yes| C{object_type in fast_path_types?}
C -->|No| D
C -->|Yes| E[FAST PATH]
E --> F[Immediate LLM analysis]
F --> G[Create Event with is_fast_path=true]
G --> H[WebSocket broadcast]
D --> I[Add to batch]
I --> J[Wait for batch timeout]
J --> K[Normal analysis flow]
Fast-Path Configuration:
| Parameter | Default | Environment Variable |
|---|---|---|
| Confidence threshold | 0.90 | FAST_PATH_CONFIDENCE_THRESHOLD |
| Object types | ["person"] | FAST_PATH_OBJECT_TYPES |
When Fast-Path Triggers:
- Detection confidence >= 0.90
- AND object_type is in the fast_path_types list
- Creates event with is_fast_path=True flag
- Returns fast_path_{detection_id} as batch_id
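The trigger conditions reduce to a two-clause predicate; a minimal sketch using the documented defaults:

```python
FAST_PATH_CONFIDENCE_THRESHOLD = 0.90
FAST_PATH_OBJECT_TYPES = ["person"]

def is_fast_path(object_type: str, confidence: float) -> bool:
    """Mirror of the fast-path decision flow above."""
    return (
        confidence >= FAST_PATH_CONFIDENCE_THRESHOLD
        and object_type in FAST_PATH_OBJECT_TYPES
    )

print(is_fast_path("person", 0.95))  # True
print(is_fast_path("person", 0.85))  # False: below threshold
print(is_fast_path("car", 0.99))     # False: not a fast-path type
```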
Analysis Queue Output¶
When a batch closes, it's pushed to the analysis queue:
{
"batch_id": "a3f9c8b2d1e4f5g6h7i8j9k0",
"camera_id": "front_door",
"detection_ids": ["1", "2", "3", "4", "5"],
"timestamp": 1703764890.123
}
NVIDIA Nemotron Analysis¶
NVIDIA Nemotron (via llama.cpp server) is the LLM that analyzes detections and generates risk assessments with natural language explanations.
Model Options¶
| Deployment | Model | VRAM | Context |
|---|---|---|---|
| Production | NVIDIA Nemotron-3-Nano-30B-A3B | ~14.7 GB | 131,072 |
| Development | Nemotron Mini 4B Instruct | ~3 GB | 4,096 |
See docker-compose.prod.yml and docs/reference/config/env-reference.md for deployment configuration.
Source Files¶
- /ai/nemotron/AGENTS.md - Comprehensive model documentation
- /ai/nemotron/ - Model files and configuration
- /backend/services/nemotron_analyzer.py - Analysis service
- /backend/services/prompts.py - Prompt templates (5 tiers)
What It Does¶
- Receives batch of detections with context (camera name, time window)
- Enriches context with zone analysis, baselines, cross-camera data (when available)
- Formats a ChatML-structured prompt with detection details
- Generates a risk assessment as JSON
- Strips <think>...</think> reasoning blocks and validates the response
- Creates an Event record in the database
- Broadcasts via WebSocket for real-time updates
Prompt Structure (ChatML Format)¶
NVIDIA Nemotron uses ChatML format for message structuring:
<|im_start|>system
{system message}
<|im_end|>
<|im_start|>user
{user message with detection context}
<|im_end|>
<|im_start|>assistant
{model response begins here}
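Assembling a prompt in this format is straightforward string templating; a minimal sketch (illustrative, not the actual prompts.py implementation):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a ChatML prompt matching the layout shown above.

    Ends with the opened assistant turn so the model's completion
    begins the response.
    """
    return (
        "<|im_start|>system\n" + system + "\n<|im_end|>\n"
        "<|im_start|>user\n" + user + "\n<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt(
    "You are a security analyst.",
    "2 person detections at 02:15 on camera Front Door.",
)
print(prompt)
```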
The backend uses 5 prompt templates with increasing sophistication (see /backend/services/prompts.py):
| Template | When Used |
|---|---|
| basic | Fallback when no enrichment available |
| enriched | Zone/baseline/cross-camera context |
| full_enriched | Enriched + license plates/faces |
| vision | Florence-2 extraction + context enrichment |
| model_zoo | Full model zoo (violence, weather, etc.) |
Context Provided to LLM¶
The formatted prompt includes (depending on template):
Basic Context:
- Camera Name: Human-readable camera identifier (e.g., "Front Door")
- Time Window: ISO format timestamps for batch start and end
- Detection List: Timestamps, object types, confidence scores
Enriched Context Additions:
4. Zone Analysis: Entry points, high-security areas
5. Baseline Comparison: Expected vs. actual activity patterns
6. Deviation Score: Statistical anomaly measure (0=normal, 1=unusual)
7. Cross-Camera Activity: Correlated detections across cameras
Model Zoo Context Additions:
8. Violence Detection: ViT violence classifier alerts
9. Weather/Visibility: Environmental conditions
10. Clothing Analysis: FashionCLIP + SegFormer (suspicious attire, face coverings)
11. Vehicle/Pet Classification: Type identification, false positive filtering
LLM API Call¶
Endpoint: POST http://localhost:8091/completion
Request:
{
"prompt": "<ChatML formatted prompt>",
"temperature": 0.7,
"top_p": 0.95,
"max_tokens": 1536,
"stop": ["<|im_end|>", "<|im_start|>"]
}
Response:
{
"content": "<think>Analyzing detection patterns...</think>{\"risk_score\": 65, \"risk_level\": \"high\", ...}",
"model": "Nemotron-3-Nano-30B-A3B-Q4_K_M.gguf",
"tokens_predicted": 287,
"tokens_evaluated": 1245
}
Note: NVIDIA Nemotron-3-Nano outputs <think>...</think> reasoning blocks before the JSON response. The backend strips these before parsing.
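Stripping the reasoning block is a single regex pass before JSON parsing; a minimal sketch of that step (illustrative, not the actual nemotron_analyzer.py code):

```python
import json
import re

def strip_think_blocks(text: str) -> str:
    """Remove <think>...</think> reasoning blocks so the JSON can be parsed."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

raw = '<think>Analyzing detection patterns...</think>{"risk_score": 65, "risk_level": "high"}'
data = json.loads(strip_think_blocks(raw))
print(data["risk_score"])  # 65
```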
Output Format¶
The LLM produces JSON with these fields:
{
"risk_score": 65,
"risk_level": "high",
"summary": "Unknown person detected approaching front door at night",
"reasoning": "Single person detection at 2:15 AM is unusual. The person appeared to be approaching the entrance. Time of day and approach pattern warrant elevated concern.",
"recommended_action": "Review camera footage and verify identity"
}
Performance Characteristics¶
| Metric | Production (30B) | Development (4B) |
|---|---|---|
| Inference time | 2-5 seconds per batch | 1-3 seconds per batch |
| Token generation | ~50-100 tokens/second | ~100-200 tokens/second |
| Context processing | ~1000 tokens/second | ~2000 tokens/second |
| Concurrent requests | 1-2 (configured) | 2-4 (configurable) |
| VRAM usage | ~14.7 GB | ~3 GB |
| Context window | 131,072 tokens (128K) | 4,096 tokens |
The production 30B model's 128K context enables analyzing hours of detection history in a single prompt.
Risk Score Calculation¶
The risk score is determined entirely by the LLM based on the prompt guidelines. The backend validates and normalizes the output.
Risk Level Mapping¶
See Risk Levels Reference for the canonical definition.
Validation and Normalization¶
The _validate_risk_data() method ensures valid output:
def _validate_risk_data(self, data: dict) -> dict:
    # Validate risk_score (0-100, integer)
    risk_score = data.get("risk_score", 50)
    risk_score = max(0, min(100, int(risk_score)))

    # Validate risk_level
    valid_levels = ["low", "medium", "high", "critical"]
    risk_level = str(data.get("risk_level", "medium")).lower()
    if risk_level not in valid_levels:
        # Infer from risk_score
        if risk_score <= 25:
            risk_level = "low"
        elif risk_score <= 50:
            risk_level = "medium"
        elif risk_score <= 75:
            risk_level = "high"
        else:
            risk_level = "critical"

    return {
        "risk_score": risk_score,
        "risk_level": risk_level,
        "summary": data.get("summary", "Risk analysis completed"),
        "reasoning": data.get("reasoning", "No detailed reasoning provided"),
    }
JSON Extraction from LLM Output¶
LLM output may contain extra text. The analyzer uses regex to extract JSON:
json_pattern = r"\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}"
matches = re.findall(json_pattern, text, re.DOTALL)
Error Handling¶

Event feedback loop showing how user reviews and dismissed events inform future risk assessments.
The pipeline is designed for graceful degradation: failures at any stage should not crash the system.
Detection Errors Diagram¶
Flowchart showing graceful error handling through validation stages, with all errors returning empty arrays.
Diagram: Detection Error Handling Flow¶
flowchart TD
A[Detection Request] --> B{File exists?}
B -->|No| C["Log error, return []"]
B -->|Yes| D{YOLO26 reachable?}
D -->|No| E["Log connection error, return []"]
D -->|Yes| F{Request timeout?}
F -->|Yes| G["Log timeout, return []"]
F -->|No| H{HTTP 200?}
H -->|No| I["Log HTTP error, return []"]
H -->|Yes| J{Valid JSON?}
J -->|No| K["Log parse error, return []"]
J -->|Yes| L[Process detections]
L --> M{Above confidence threshold?}
M -->|No| N[Filter out, continue]
M -->|Yes| O[Create Detection record]
Error Scenarios and Responses¶
| Error | Stage | Response | Recovery |
|---|---|---|---|
| File not found | DetectorClient | Return [], log error | Skip this image |
| YOLO26 unreachable | DetectorClient | Return [], log error | Retry next image |
| YOLO26 timeout (30s) | DetectorClient | Return [], log error | Skip this image |
| HTTP error (non-200) | DetectorClient | Return [], log error | Skip this image |
| Invalid JSON response | DetectorClient | Return [], log error | Skip this image |
| Redis unavailable | FileWatcher/Dedupe | Fail open (process anyway) | Service recovers |
| Batch not found | NemotronAnalyzer | Raise ValueError | Log warning, skip batch |
| Nemotron unreachable | NemotronAnalyzer | Use fallback risk data | Event still created |
| Nemotron timeout (60s) | NemotronAnalyzer | Use fallback risk data | Event still created |
| Invalid LLM JSON | NemotronAnalyzer | Use fallback risk data | Event still created |
Fallback Risk Data¶
When LLM analysis fails, the analyzer creates an event with default values:
{
"risk_score": 50,
"risk_level": "medium",
"summary": "Analysis unavailable - LLM service error",
"reasoning": "Failed to analyze detections due to service error"
}
Dead Letter Queue (DLQ)¶
Failed jobs can be moved to a DLQ for later inspection:
DLQ job format:
{
"original_job": {...},
"error": "error message",
"attempt_count": 3,
"first_failed_at": "2025-12-28T10:30:00",
"last_failed_at": "2025-12-28T10:30:30",
"queue_name": "detection_queue"
}
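Moving a failed job into the DLQ can be sketched as below; this is illustrative, assuming a redis-py style client, and the `dlq:` key prefix and MAX_ATTEMPTS name are assumptions, not the actual worker code:

```python
import json
from datetime import datetime, timezone

MAX_ATTEMPTS = 3  # hypothetical retry limit before a job is dead-lettered

def to_dlq(r, queue_name: str, job: dict, error: str, attempt_count: int) -> None:
    """Wrap a failed job in the DLQ format above and push it to dlq:<queue>."""
    now = datetime.now(timezone.utc).isoformat()
    entry = {
        "original_job": job,
        "error": error,
        "attempt_count": attempt_count,
        "first_failed_at": job.get("first_failed_at", now),
        "last_failed_at": now,
        "queue_name": queue_name,
    }
    r.rpush(f"dlq:{queue_name}", json.dumps(entry))
```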
Data Models¶
Detection Model¶
Database table: detections
| Field | Type | Description |
|---|---|---|
| id | INTEGER | Primary key, auto-increment |
| camera_id | STRING | Foreign key to cameras |
| file_path | STRING | Path to source image |
| file_type | STRING | MIME type (e.g., "image/jpeg") |
| detected_at | DATETIME | Detection timestamp |
| object_type | STRING | Detected class (person, car, etc.) |
| confidence | FLOAT | Detection confidence (0.0-1.0) |
| bbox_x | INTEGER | Bounding box X coordinate |
| bbox_y | INTEGER | Bounding box Y coordinate |
| bbox_width | INTEGER | Bounding box width |
| bbox_height | INTEGER | Bounding box height |
| thumbnail_path | STRING | Path to thumbnail with bbox overlay |
Event Model¶
Database table: events
| Field | Type | Description |
|---|---|---|
| id | INTEGER | Primary key, auto-increment |
| batch_id | STRING | Batch identifier |
| camera_id | STRING | Foreign key to cameras |
| started_at | DATETIME | Batch start time |
| ended_at | DATETIME | Batch end time |
| risk_score | INTEGER | Risk score (0-100) |
| risk_level | STRING | Risk level (low/medium/high/critical) |
| summary | TEXT | LLM-generated summary |
| reasoning | TEXT | LLM-generated reasoning |
| detection_ids | TEXT | JSON array of detection IDs |
| reviewed | BOOLEAN | Whether event was reviewed by user |
| notes | TEXT | User-added notes |
| is_fast_path | BOOLEAN | Whether event was processed via fast path |
Detection to Event Transformation¶
Visualization showing how multiple raw detections aggregate into batches, which transform into risk-scored events through LLM analysis.
Diagram: Detection to Event Transformation¶
flowchart TB
subgraph "Raw Detections"
D1[Detection 1<br/>person, 0.95]
D2[Detection 2<br/>person, 0.92]
D3[Detection 3<br/>car, 0.87]
D4[Detection 4<br/>person, 0.89]
D5[Detection 5<br/>person, 0.91]
end
subgraph "Batch"
B["Batch<br/>batch_id: abc123<br/>camera_id: front_door<br/>started_at: 14:30:00<br/>ended_at: 14:31:30<br/>detection_ids: [1,2,3,4,5]"]
end
subgraph "Event"
E[Event<br/>id: 42<br/>batch_id: abc123<br/>risk_score: 65<br/>risk_level: high<br/>summary: Multiple persons...<br/>reasoning: Activity pattern...]
end
D1 --> B
D2 --> B
D3 --> B
D4 --> B
D5 --> B
B --> E
Configuration Reference¶
Environment Variables¶
| Variable | Default | Description |
|---|---|---|
| FOSCAM_BASE_PATH | /export/foscam | Camera FTP upload directory |
| YOLO26_URL | http://localhost:8095 | YOLO26 service URL |
| NEMOTRON_URL | http://localhost:8091 | Nemotron LLM service URL |
| FLORENCE_URL | http://localhost:8092 | Florence-2 vision-language service URL |
| CLIP_URL | http://localhost:8093 | CLIP embedding service URL |
| ENRICHMENT_URL | http://localhost:8094 | Enrichment service URL |
| DETECTION_CONFIDENCE_THRESHOLD | 0.5 | Minimum confidence to store detection |
| BATCH_WINDOW_SECONDS | 90 | Maximum batch duration |
| BATCH_IDLE_TIMEOUT_SECONDS | 30 | Idle timeout before closing batch |
| FAST_PATH_CONFIDENCE_THRESHOLD | 0.90 | Confidence threshold for fast path |
| FAST_PATH_OBJECT_TYPES | ["person"] | Object types eligible for fast path |
| DEDUPE_TTL_SECONDS | 300 | File hash deduplication TTL |
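A minimal sketch of reading these settings with their documented defaults; the backend's actual settings loader (e.g. pydantic-based) may differ:

```python
import json
import os

# Each value falls back to the default from the table above when the
# environment variable is unset. FAST_PATH_OBJECT_TYPES is a JSON list.
BATCH_WINDOW_SECONDS = int(os.getenv("BATCH_WINDOW_SECONDS", "90"))
BATCH_IDLE_TIMEOUT_SECONDS = int(os.getenv("BATCH_IDLE_TIMEOUT_SECONDS", "30"))
DETECTION_CONFIDENCE_THRESHOLD = float(os.getenv("DETECTION_CONFIDENCE_THRESHOLD", "0.5"))
FAST_PATH_CONFIDENCE_THRESHOLD = float(os.getenv("FAST_PATH_CONFIDENCE_THRESHOLD", "0.90"))
FAST_PATH_OBJECT_TYPES = json.loads(os.getenv("FAST_PATH_OBJECT_TYPES", '["person"]'))

print(BATCH_WINDOW_SECONDS, FAST_PATH_OBJECT_TYPES)
```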
Enrichment Feature Toggles¶
| Variable | Default | Description |
|---|---|---|
| VISION_EXTRACTION_ENABLED | true | Enable Florence-2 based extraction |
| REID_ENABLED | true | Enable CLIP-based re-identification |
| SCENE_CHANGE_ENABLED | true | Enable scene change detection |
AI Service Ports¶
| Service | Port | Protocol | Description |
|---|---|---|---|
| YOLO26 | 8095 | HTTP | Object detection |
| NVIDIA Nemotron | 8091 | HTTP | LLM risk analysis |
| Florence | 8092 | HTTP | Vision-language |
| CLIP | 8093 | HTTP | Embeddings / re-ID |
| Enrichment | 8094 | HTTP | Enrichment helpers |
VRAM Requirements¶
VRAM varies by model choice and enabled enrichment. For "minimal dev" vs "full prod" guidance, see docs/operator/ai-installation.md and ai/AGENTS.md.
AI Service Interaction Diagram¶
Host machine architecture showing GPU-accelerated AI services (YOLO26, Nemotron), Docker containers (Backend, Frontend, Redis), and data flow from cameras through the processing pipeline.
*Production uses NVIDIA Nemotron-3-Nano-30B-A3B (~14.7GB); development uses Nemotron Mini 4B (~3GB).
Diagram: AI Service Interaction¶
flowchart TB
subgraph "Host Machine"
subgraph "GPU (RTX A5500 24GB)"
RT[YOLO26 Server<br/>Port 8095<br/>~4GB VRAM]
NEM[NVIDIA Nemotron llama.cpp<br/>Port 8091<br/>~14.7GB VRAM*]
end
subgraph "Docker Containers"
BE[Backend FastAPI<br/>Port 8000]
FE[Frontend React<br/>Port 5173]
RD[Redis<br/>Port 6379]
end
end
CAM[Foscam Cameras] -->|FTP Upload| FTP[/export/foscam/]
FTP -->|inotify| BE
BE -->|POST /detect<br/>multipart image| RT
RT -->|JSON detections| BE
BE -->|POST /completion<br/>ChatML prompt| NEM
NEM -->|JSON risk assessment| BE
BE <-->|Queue & Cache| RD
BE -->|WebSocket| FE
style RT fill:#76B900
style NEM fill:#76B900