AI Pipeline Architecture

AI Pipeline Architecture Diagram

AI-generated diagram illustrating the end-to-end AI pipeline architecture with detection, batching, and analysis components.

AI Pipeline Hero

AI-generated visualization of the AI detection pipeline from camera input through YOLO26 and Nemotron to security events.

AI Pipeline Flow

End-to-end AI pipeline showing FileWatcher, queues, YOLO26, batch aggregation, Nemotron analysis, and event creation.

This document provides comprehensive technical documentation for the AI-powered detection and analysis pipeline in the Home Security Intelligence system. It is intended for maintainers who need to debug, extend, or optimize the AI processing flow.

Table of Contents

  1. Pipeline Overview
  2. File Watcher
  3. YOLO26 Integration
  4. Batching Logic
  5. Nemotron Analysis
  6. Risk Score Calculation
  7. Error Handling
  8. Data Models
  9. Configuration Reference

Pipeline Overview

The AI pipeline transforms raw camera images into risk-scored security events through a multi-stage process. The pipeline is designed for efficiency (batching similar detections) and responsiveness (fast-path for critical detections).

High-Level Flow

AI pipeline flow diagram showing the complete processing sequence: camera FTP upload, FileWatcher debouncing, detection queue processing, YOLO26 object detection (30-50ms inference), batch aggregation with a normal path (90-second windows) and a fast path (immediate for high-confidence persons), Nemotron LLM risk analysis producing risk scores and reasoning, and a final WebSocket broadcast to dashboard clients.

The pipeline processes images through two paths:

  • Normal Path: Detections are batched over 30-90 second windows before LLM analysis
  • Fast Path: High-confidence (≥90%) critical detections bypass batching for immediate analysis

Complete Pipeline Sequence Diagram

AI Pipeline Sequence Diagram

End-to-end sequence showing FTP upload, FileWatcher processing, YOLO26 detection, batch aggregation with fast-path logic, Nemotron LLM analysis, and WebSocket broadcast to dashboard.

Diagram: Pipeline Sequence

sequenceDiagram
    participant Camera as Foscam Camera
    participant FTP as FTP Server
    participant FW as FileWatcher
    participant DQ as detection_queue
    participant DW as DetectionQueueWorker
    participant RT as YOLO26 (8095)
    participant DB as PostgreSQL
    participant BA as BatchAggregator
    participant AQ as analysis_queue
    participant AW as AnalysisQueueWorker
    participant NEM as Nemotron LLM (8091)
    participant WS as WebSocket

    Note over Camera,WS: Normal Path (batched)
    Camera->>FTP: Upload image via FTP
    FTP->>FW: inotify/FSEvents trigger
    FW->>FW: Debounce (0.5s)
    FW->>FW: Validate image integrity
    FW->>DQ: Queue {camera_id, file_path}

    DW->>DQ: BLPOP (5s timeout)
    DQ-->>DW: Detection job

    DW->>RT: POST /detect (multipart image)
    Note right of RT: ~30-50ms inference
    RT-->>DW: JSON {detections, inference_time_ms}
    DW->>DB: INSERT Detection records

    Note over DW,NEM: Enrichment (best-effort)
    DW->>BA: add_detection(camera_id, detection_id, confidence, object_type)

    alt Fast Path (confidence >= 0.90 AND object_type in fast_path_types)
        BA->>NEM: Immediate analysis (may include enrichment/context)
        NEM-->>BA: Risk assessment JSON
        BA->>DB: INSERT Event (is_fast_path=true)
        BA->>WS: Broadcast event
    else Normal Batching
        BA->>BA: Add to batch, update last_activity
    end

    Note over BA,AQ: Batch Timeout Check (every 10s)
    BA->>BA: check_batch_timeouts()
    alt Window expired (90s) OR Idle timeout (30s)
        BA->>AQ: Push {batch_id, camera_id, detection_ids}
        BA->>BA: Cleanup Redis keys
    end

    AW->>AQ: BLPOP (5s timeout)
    AQ-->>AW: Analysis job
    AW->>DB: SELECT Detection details
    AW->>AW: Load detections + run ContextEnricher + EnrichmentPipeline (best-effort)
    AW->>NEM: POST /completion (prompt with enrichment)
    Note right of NEM: ~2-5s inference
    NEM-->>AW: JSON {risk_score, risk_level, summary, reasoning}
    AW->>DB: INSERT Event record
    AW->>WS: Broadcast new event

Pipeline Timing Characteristics

| Stage | Typical Duration | Notes |
|-------|------------------|-------|
| File upload detection | ~10ms | Native filesystem notifications (inotify/FSEvents) |
| Debounce delay | 500ms | Configurable, prevents duplicate processing |
| Image validation | ~5-10ms | PIL verify() |
| YOLO26 inference | 30-50ms | GPU accelerated, RTX A5500 |
| Database write (detections) | ~5-10ms | PostgreSQL async |
| Batch aggregation | ~1ms | Redis operations |
| Batch window | 30-90s | Collects related detections |
| Nemotron LLM inference | 2-5s | GPU accelerated |
| Event creation + broadcast | ~10ms | Database + WebSocket |

Total latency:

  • Fast path (critical detections): ~3-6 seconds
  • Normal path: 30-95 seconds (dominated by batch window)

File Watcher

The FileWatcher service monitors camera upload directories for new images and initiates the processing pipeline.

Source Files

  • /backend/services/file_watcher.py - Main watcher implementation
  • /backend/services/dedupe.py - Duplicate file prevention

How It Works

  1. Filesystem Monitoring: Uses the watchdog library with native OS backends:

     • Linux: inotify (kernel-level notifications)
     • macOS: FSEvents
     • Windows: ReadDirectoryChangesW

  2. Event Handling: Monitors on_created and on_modified events for image files (.jpg, .jpeg, .png)

  3. Debounce Logic: Waits 0.5 seconds after the last modification before processing. This handles:

     • FTP uploads that write in chunks
     • Multiple watchdog events for the same file
     • Incomplete file writes

  4. Thread-Safe Async Bridge: Watchdog runs in a separate thread, so events are scheduled onto the asyncio event loop via asyncio.run_coroutine_threadsafe()
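The debounce behavior can be sketched as follows. This is an illustrative, single-threaded reduction: the real file_watcher.py receives events on a watchdog thread and hops to the event loop via asyncio.run_coroutine_threadsafe(), while the Debouncer class and demo below are hypothetical.

```python
import asyncio

class Debouncer:
    """Per-file debounce: process a path only after it has been quiet
    for `delay` seconds (sketch, not the real FileWatcher code)."""

    def __init__(self, delay: float, callback):
        self.delay = delay
        self.callback = callback
        self._timers: dict[str, asyncio.TimerHandle] = {}

    def on_event(self, path: str) -> None:
        # Called for every create/modify event; resets the quiet timer.
        loop = asyncio.get_running_loop()
        if timer := self._timers.pop(path, None):
            timer.cancel()
        self._timers[path] = loop.call_later(self.delay, self._fire, path)

    def _fire(self, path: str) -> None:
        self._timers.pop(path, None)
        self.callback(path)

async def demo() -> list[str]:
    processed: list[str] = []
    d = Debouncer(0.05, processed.append)
    for _ in range(3):          # burst of events for the same file
        d.on_event("/export/foscam/front_door/img.jpg")
        await asyncio.sleep(0.01)
    await asyncio.sleep(0.2)    # wait out the quiet period
    return processed            # one entry despite three events
```

A burst of create/modify events for the same path collapses into a single processing call, which is what absorbs chunked FTP writes.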

Directory Structure

/export/foscam/                      # FOSCAM_BASE_PATH
  front_door/                        # Camera ID = "front_door"
    MDAlarm_20251228_120000.jpg
    MDAlarm_20251228_120030.jpg
  backyard/                          # Camera ID = "backyard"
    snap_20251228_120015.jpg

Deduplication

Files are deduplicated using SHA256 content hashes stored in Redis:

Redis Key: dedupe:{sha256_hash}
Value: file_path
TTL: 300 seconds (5 minutes, configurable)

This prevents duplicate processing from:

  • Watchdog create/modify event bursts
  • Service restarts during file processing
  • FTP upload retries
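With that key scheme, the dedupe check reduces to an atomic set-if-absent with a TTL (Redis SET NX EX). The sketch below uses an in-memory stand-in for Redis so it runs without a server; InMemoryStore and should_process are illustrative names, not the actual dedupe.py API.

```python
import hashlib
import time

DEDUPE_TTL = 300  # seconds, mirrors DEDUPE_TTL_SECONDS

class InMemoryStore:
    """Stand-in for Redis SET NX EX semantics; the real service
    talks to Redis instead."""
    def __init__(self):
        self._data: dict[str, tuple[str, float]] = {}

    def set_nx_ex(self, key: str, value: str, ttl: int) -> bool:
        now = time.monotonic()
        entry = self._data.get(key)
        if entry and entry[1] > now:
            return False            # key exists and has not expired
        self._data[key] = (value, now + ttl)
        return True

def should_process(store: InMemoryStore, file_bytes: bytes, file_path: str) -> bool:
    # First writer wins; duplicate content within the TTL is skipped.
    digest = hashlib.sha256(file_bytes).hexdigest()
    return store.set_nx_ex(f"dedupe:{digest}", file_path, DEDUPE_TTL)
```

Because the key is derived from file content rather than the filename, FTP retries that re-upload the same bytes under a new name are still caught.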

Queue Output Format

{
  "camera_id": "front_door",
  "file_path": "/export/foscam/front_door/MDAlarm_20251228_120000.jpg",
  "timestamp": "2025-12-28T12:00:00.500000",
  "file_hash": "a3f9c8b2d1e4..."
}

YOLO26 Integration

YOLO26 (Real-Time Detection Transformer v2) performs object detection on camera images, identifying security-relevant objects with bounding boxes and confidence scores.

Source Files

  • /ai/yolo26/model.py - FastAPI inference server
  • /backend/services/detector_client.py - HTTP client

What It Does

YOLO26 is a state-of-the-art transformer-based object detector that combines DETR's end-to-end detection approach with real-time inference speeds:

  1. Receives camera images via HTTP POST
  2. Preprocesses images (RGB conversion, normalization)
  3. Runs inference on GPU with PyTorch
  4. Post-processes outputs to extract bounding boxes
  5. Filters to security-relevant classes only
  6. Returns JSON with detections

API Format

Endpoint: POST http://localhost:8095/detect

Request (multipart/form-data):

file: <image binary>

Request (JSON with base64):

{
  "image_base64": "<base64 encoded image>"
}

Response:

{
  "detections": [
    {
      "class": "person",
      "confidence": 0.95,
      "bbox": {
        "x": 100,
        "y": 150,
        "width": 200,
        "height": 400
      }
    },
    {
      "class": "car",
      "confidence": 0.87,
      "bbox": {
        "x": 500,
        "y": 300,
        "width": 350,
        "height": 200
      }
    }
  ],
  "inference_time_ms": 45.2,
  "image_width": 1920,
  "image_height": 1080
}

Confidence Scores and Thresholds

| Parameter | Default | Description |
|-----------|---------|-------------|
| YOLO26_CONFIDENCE | 0.5 | Server-side minimum confidence |
| DETECTION_CONFIDENCE_THRESHOLD | 0.5 | Backend filtering threshold |

Detections below the threshold are discarded. Higher thresholds reduce false positives but may miss legitimate detections.
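A minimal sketch of the backend-side filtering step (filter_detections is a hypothetical helper; the real client applies DETECTION_CONFIDENCE_THRESHOLD while processing the /detect response):

```python
DETECTION_CONFIDENCE_THRESHOLD = 0.5

def filter_detections(detections: list[dict],
                      threshold: float = DETECTION_CONFIDENCE_THRESHOLD) -> list[dict]:
    # Keep only detections at or above the configured threshold;
    # everything below it is discarded before storage.
    return [d for d in detections if d["confidence"] >= threshold]
```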

Bounding Box Format

Bounding boxes use top-left origin with width/height:

(x, y) +-----------+
       |           |
       |  Object   |  height
       |           |
       +-----------+
            width

  • x: Top-left X coordinate (pixels from left edge)
  • y: Top-left Y coordinate (pixels from top edge)
  • width: Box width in pixels
  • height: Box height in pixels
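Converting an API bounding box into corner coordinates (the form PIL's Image.crop expects) is a one-liner; bbox_corners below is an illustrative helper, not part of the codebase.

```python
def bbox_corners(bbox: dict) -> tuple[int, int, int, int]:
    """Convert top-left x/y plus width/height into
    (left, top, right, bottom) pixel corners."""
    left, top = bbox["x"], bbox["y"]
    return left, top, left + bbox["width"], top + bbox["height"]
```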

Security-Relevant Classes

Only these 9 COCO classes are returned (all others filtered):

| Class | Description |
|-------|-------------|
| person | Human detection |
| car | Passenger vehicle |
| truck | Large vehicle |
| dog | Canine |
| cat | Feline |
| bird | Avian |
| bicycle | Bike |
| motorcycle | Motorbike |
| bus | Public transport |

Performance Characteristics

| Metric | Value |
|--------|-------|
| Inference time | 30-50ms per image |
| VRAM usage | ~4GB |
| Throughput | ~20-30 images/second |
| Model warmup | ~1-2 seconds (3 iterations) |
| Batch processing | Sequential (one image at a time) |

Health Check

curl http://localhost:8095/health
{
  "status": "healthy",
  "model_loaded": true,
  "device": "cuda:0",
  "cuda_available": true,
  "model_name": "/export/ai_models/yolo26v2/yolo26_v2_r101vd",
  "vram_used_gb": 4.2
}

Batching Logic

Batch Processing

AI-generated visualization of the 90-second batch processing timeline showing detection accumulation and LLM analysis.

The BatchAggregator groups related detections into batches before sending them to the LLM for analysis. This provides better context and reduces noise.

Source Files

  • /backend/services/batch_aggregator.py - Batch management
  • /backend/services/pipeline_workers.py - BatchTimeoutWorker

Why We Batch

  1. Better LLM Context: A "person approaching door" event might span 30 seconds across 15 images. Analyzing them together gives the LLM full context.

  2. Reduced Noise: Individual frame detections can be noisy. Batching smooths out false positives and provides a clearer picture.

  3. Efficiency: One LLM call for 10 detections is more efficient than 10 separate calls.

  4. Meaningful Events: Users want to see "Person at front door for 45 seconds" not "15 separate person detections."

Batch Lifecycle State Machine

Batch Lifecycle State Machine

State machine showing batch lifecycle from creation through collection to closure, with timeout triggers and Redis cleanup.

Diagram: Batch Lifecycle State Machine

stateDiagram-v2
    [*] --> NoActiveBatch: No batch exists

    NoActiveBatch --> BatchActive: First detection arrives
    note right of BatchActive: Create batch with UUID<br/>Set started_at = now()<br/>Set last_activity = now()

    BatchActive --> BatchActive: Detection added
    note right of BatchActive: Append detection_id<br/>Update last_activity

    BatchActive --> BatchClosed: Window timeout (90s from start)
    BatchActive --> BatchClosed: Idle timeout (30s no activity)
    BatchActive --> BatchClosed: Force close (API call)

    BatchClosed --> [*]: Push to analysis_queue<br/>Cleanup Redis keys

    state BatchActive {
        [*] --> Collecting
        Collecting --> Collecting: add_detection()
        Collecting --> [*]: Timeout check triggers
    }

Timing Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| BATCH_WINDOW_SECONDS | 90 | Maximum batch duration from first detection |
| BATCH_IDLE_TIMEOUT_SECONDS | 30 | Idle time with no new detections before the batch closes |
| Timeout check interval | 10s | How often check_batch_timeouts() runs |

Redis Key Structure

All batch keys have a 1-hour TTL for orphan cleanup if service crashes:

batch:{camera_id}:current -> batch_id (string)
batch:{batch_id}:camera_id -> camera_id (string)
batch:{batch_id}:detections -> ["det_1", "det_2", ...] (JSON array)
batch:{batch_id}:started_at -> 1703764800.123 (Unix timestamp float)
batch:{batch_id}:last_activity -> 1703764845.456 (Unix timestamp float)
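The normal-path bookkeeping against these keys can be sketched as below. A plain dict stands in for Redis (the real aggregator uses async Redis calls and applies the 1-hour TTL to every key), and add_detection here is a simplified stand-in for the BatchAggregator method.

```python
import json
import time
import uuid

def add_detection(store: dict, camera_id: str, detection_id: str) -> str:
    """Normal-path batching sketch using the key layout above."""
    batch_key = f"batch:{camera_id}:current"
    batch_id = store.get(batch_key)
    if batch_id is None:                      # first detection opens a batch
        batch_id = uuid.uuid4().hex
        store[batch_key] = batch_id
        store[f"batch:{batch_id}:camera_id"] = camera_id
        store[f"batch:{batch_id}:detections"] = json.dumps([])
        store[f"batch:{batch_id}:started_at"] = time.time()
    # Append the detection and refresh the idle-timeout clock.
    ids = json.loads(store[f"batch:{batch_id}:detections"])
    ids.append(detection_id)
    store[f"batch:{batch_id}:detections"] = json.dumps(ids)
    store[f"batch:{batch_id}:last_activity"] = time.time()
    return batch_id
```

The timeout worker then compares started_at against the 90-second window and last_activity against the 30-second idle limit to decide when to close the batch.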

Fast-Path for Critical Detections

High-confidence detections of critical object types bypass normal batching for immediate analysis:

Fast-Path Decision Flow

Decision flowchart showing how high-confidence critical detections bypass batching for immediate LLM analysis.

Diagram: Fast-Path Decision Flow

flowchart TD
    A[Detection arrives] --> B{confidence >= 0.90?}
    B -->|No| D[Normal batching]
    B -->|Yes| C{object_type in fast_path_types?}
    C -->|No| D
    C -->|Yes| E[FAST PATH]
    E --> F[Immediate LLM analysis]
    F --> G[Create Event with is_fast_path=true]
    G --> H[WebSocket broadcast]

    D --> I[Add to batch]
    I --> J[Wait for batch timeout]
    J --> K[Normal analysis flow]

Fast-Path Configuration:

| Parameter | Default | Environment Variable |
|-----------|---------|----------------------|
| Confidence threshold | 0.90 | FAST_PATH_CONFIDENCE_THRESHOLD |
| Object types | ["person"] | FAST_PATH_OBJECT_TYPES |

When Fast-Path Triggers:

  • Detection confidence >= 0.90
  • AND object_type is in the fast_path_types list
  • Creates event with is_fast_path=True flag
  • Returns fast_path_{detection_id} as batch_id
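The trigger conditions reduce to a two-clause predicate. This is a sketch with the defaults from the table above; is_fast_path is an illustrative name, not the aggregator's actual method.

```python
FAST_PATH_CONFIDENCE_THRESHOLD = 0.90
FAST_PATH_OBJECT_TYPES = ["person"]

def is_fast_path(object_type: str, confidence: float) -> bool:
    # Both conditions must hold: high confidence AND a critical type.
    return (confidence >= FAST_PATH_CONFIDENCE_THRESHOLD
            and object_type in FAST_PATH_OBJECT_TYPES)
```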

Analysis Queue Output

When a batch closes, it's pushed to the analysis queue:

{
  "batch_id": "a3f9c8b2d1e4f5g6h7i8j9k0",
  "camera_id": "front_door",
  "detection_ids": ["1", "2", "3", "4", "5"],
  "timestamp": 1703764890.123
}
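One iteration of the AnalysisQueueWorker's consume loop might look like the sketch below, with the Redis client and the analyzer injected as plain callables (run_worker_once is illustrative, not the actual worker API; the real worker loops forever and does this asynchronously).

```python
import json

def run_worker_once(blpop, handle) -> bool:
    """One loop iteration: BLPOP with a 5s timeout, decode the
    job, hand it to the analyzer. Returns False on timeout, which
    lets the loop check for shutdown before blocking again."""
    item = blpop("analysis_queue", timeout=5)
    if item is None:
        return False
    _queue, payload = item
    handle(json.loads(payload))
    return True
```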

NVIDIA Nemotron Analysis

NVIDIA Nemotron (via llama.cpp server) is the LLM that analyzes detections and generates risk assessments with natural language explanations.

Model Options

| Deployment | Model | VRAM | Context |
|------------|-------|------|---------|
| Production | NVIDIA Nemotron-3-Nano-30B-A3B | ~14.7 GB | 131,072 |
| Development | Nemotron Mini 4B Instruct | ~3 GB | 4,096 |

See docker-compose.prod.yml and docs/reference/config/env-reference.md for deployment configuration.

Source Files

  • /ai/nemotron/AGENTS.md - Comprehensive model documentation
  • /ai/nemotron/ - Model files and configuration
  • /backend/services/nemotron_analyzer.py - Analysis service
  • /backend/services/prompts.py - Prompt templates (5 tiers)

What It Does

  1. Receives batch of detections with context (camera name, time window)
  2. Enriches context with zone analysis, baselines, cross-camera data (when available)
  3. Formats a ChatML-structured prompt with detection details
  4. Generates a risk assessment as JSON
  5. Strips <think>...</think> reasoning blocks and validates response
  6. Creates an Event record in the database
  7. Broadcasts via WebSocket for real-time updates

Prompt Structure (ChatML Format)

NVIDIA Nemotron uses ChatML format for message structuring:

<|im_start|>system
{system message}
<|im_end|>
<|im_start|>user
{user message with detection context}
<|im_end|>
<|im_start|>assistant
{model response begins here}
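Assembling that scaffold is plain string concatenation; the sketch below is a minimal builder (a hypothetical helper, not the prompts.py API). Note the prompt ends right after the assistant tag so the model continues from there.

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Wrap system and user messages in ChatML delimiters; the
    trailing assistant tag cues the model to generate its reply."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )
```

The same `<|im_start|>` / `<|im_end|>` tokens are passed as stop sequences in the completion request so generation halts at the end of the assistant turn.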

The backend uses 5 prompt templates with increasing sophistication (see /backend/services/prompts.py):

| Template | When Used |
|----------|-----------|
| basic | Fallback when no enrichment available |
| enriched | Zone/baseline/cross-camera context |
| full_enriched | Enriched + license plates/faces |
| vision | Florence-2 extraction + context enrichment |
| model_zoo | Full model zoo (violence, weather, etc.) |

Context Provided to LLM

The formatted prompt includes (depending on template):

Basic Context:

  1. Camera Name: Human-readable camera identifier (e.g., "Front Door")
  2. Time Window: ISO format timestamps for batch start and end
  3. Detection List: Timestamps, object types, confidence scores

Enriched Context Additions:

  4. Zone Analysis: Entry points, high-security areas
  5. Baseline Comparison: Expected vs. actual activity patterns
  6. Deviation Score: Statistical anomaly measure (0=normal, 1=unusual)
  7. Cross-Camera Activity: Correlated detections across cameras

Model Zoo Context Additions:

  8. Violence Detection: ViT violence classifier alerts
  9. Weather/Visibility: Environmental conditions
  10. Clothing Analysis: FashionCLIP + SegFormer (suspicious attire, face coverings)
  11. Vehicle/Pet Classification: Type identification, false positive filtering

LLM API Call

Endpoint: POST http://localhost:8091/completion

Request:

{
  "prompt": "<ChatML formatted prompt>",
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 1536,
  "stop": ["<|im_end|>", "<|im_start|>"]
}

Response:

{
  "content": "<think>Analyzing detection patterns...</think>{\"risk_score\": 65, \"risk_level\": \"high\", ...}",
  "model": "Nemotron-3-Nano-30B-A3B-Q4_K_M.gguf",
  "tokens_predicted": 287,
  "tokens_evaluated": 1245
}

Note: NVIDIA Nemotron-3-Nano outputs <think>...</think> reasoning blocks before the JSON response. The backend strips these before parsing.

Output Format

The LLM produces JSON with these fields:

{
  "risk_score": 65,
  "risk_level": "high",
  "summary": "Unknown person detected approaching front door at night",
  "reasoning": "Single person detection at 2:15 AM is unusual. The person appeared to be approaching the entrance. Time of day and approach pattern warrant elevated concern.",
  "recommended_action": "Review camera footage and verify identity"
}

Performance Characteristics

| Metric | Production (30B) | Development (4B) |
|--------|------------------|------------------|
| Inference time | 2-5 seconds per batch | 1-3 seconds per batch |
| Token generation | ~50-100 tokens/second | ~100-200 tokens/second |
| Context processing | ~1000 tokens/second | ~2000 tokens/second |
| Concurrent requests | 1-2 (configurable) | 2-4 (configurable) |
| VRAM usage | ~14.7 GB | ~3 GB |
| Context window | 131,072 tokens (128K) | 4,096 tokens |

The production 30B model's 128K context enables analyzing hours of detection history in a single prompt.


Risk Score Calculation

The risk score is determined entirely by the LLM based on the prompt guidelines. The backend validates and normalizes the output.

Risk Level Mapping

See Risk Levels Reference for the canonical definition.

Validation and Normalization

The _validate_risk_data() method ensures valid output:

def _validate_risk_data(self, data: dict) -> dict:
    # Validate risk_score (0-100, integer)
    risk_score = data.get("risk_score", 50)
    risk_score = max(0, min(100, int(risk_score)))

    # Validate risk_level
    valid_levels = ["low", "medium", "high", "critical"]
    risk_level = str(data.get("risk_level", "medium")).lower()

    if risk_level not in valid_levels:
        # Infer from risk_score
        if risk_score <= 25:
            risk_level = "low"
        elif risk_score <= 50:
            risk_level = "medium"
        elif risk_score <= 75:
            risk_level = "high"
        else:
            risk_level = "critical"

    return {
        "risk_score": risk_score,
        "risk_level": risk_level,
        "summary": data.get("summary", "Risk analysis completed"),
        "reasoning": data.get("reasoning", "No detailed reasoning provided"),
    }

JSON Extraction from LLM Output

LLM output may contain extra text. The analyzer uses regex to extract JSON:

json_pattern = r"\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}"
matches = re.findall(json_pattern, text, re.DOTALL)
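Putting the two post-processing steps together (strip `<think>` blocks, then extract and parse the first JSON object) might look like this sketch; extract_risk_json is an illustrative helper, not the analyzer's actual method, and the regex handles one level of brace nesting.

```python
import json
import re

def extract_risk_json(raw: str) -> dict:
    """Remove <think>...</think> reasoning, then parse the first
    JSON object found in the remaining text."""
    text = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL)
    json_pattern = r"\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}"
    matches = re.findall(json_pattern, text, re.DOTALL)
    if not matches:
        raise ValueError("no JSON object found in LLM output")
    return json.loads(matches[0])
```

The parsed dict would then go through _validate_risk_data() to clamp the score and normalize the risk level.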

Error Handling

Event Feedback Loop

Event feedback loop showing how user reviews and dismissed events inform future risk assessments.

The pipeline is designed for graceful degradation - failures at any stage should not crash the system.

Detection Errors Diagram

Detection Error Handling Flow

Flowchart showing graceful error handling through validation stages, with all errors returning empty arrays.

Diagram: Detection Error Handling Flow

flowchart TD
    A[Detection Request] --> B{File exists?}

    B -->|No| C["Log error, return []"]
    B -->|Yes| D{YOLO26 reachable?}
    D -->|No| E["Log connection error, return []"]
    D -->|Yes| F{Request timeout?}
    F -->|Yes| G["Log timeout, return []"]
    F -->|No| H{HTTP 200?}
    H -->|No| I["Log HTTP error, return []"]
    H -->|Yes| J{Valid JSON?}
    J -->|No| K["Log parse error, return []"]
    J -->|Yes| L[Process detections]
    L --> M{Above confidence threshold?}
    M -->|No| N[Filter out, continue]
    M -->|Yes| O[Create Detection record]

Error Scenarios and Responses

| Error | Stage | Response | Recovery |
|-------|-------|----------|----------|
| File not found | DetectorClient | Return [], log error | Skip this image |
| YOLO26 unreachable | DetectorClient | Return [], log error | Retry next image |
| YOLO26 timeout (30s) | DetectorClient | Return [], log error | Skip this image |
| HTTP error (non-200) | DetectorClient | Return [], log error | Skip this image |
| Invalid JSON response | DetectorClient | Return [], log error | Skip this image |
| Redis unavailable | FileWatcher/Dedupe | Fail open (process anyway) | Service recovers |
| Batch not found | NemotronAnalyzer | Raise ValueError | Log warning, skip batch |
| Nemotron unreachable | NemotronAnalyzer | Use fallback risk data | Event still created |
| Nemotron timeout (60s) | NemotronAnalyzer | Use fallback risk data | Event still created |
| Invalid LLM JSON | NemotronAnalyzer | Use fallback risk data | Event still created |

Fallback Risk Data

When LLM analysis fails, the analyzer creates an event with default values:

{
    "risk_score": 50,
    "risk_level": "medium",
    "summary": "Analysis unavailable - LLM service error",
    "reasoning": "Failed to analyze detections due to service error"
}

Dead Letter Queue (DLQ)

Failed jobs can be moved to a DLQ for later inspection:

dlq:detection_queue  - Failed detection jobs
dlq:analysis_queue   - Failed LLM analysis jobs

DLQ job format:

{
    "original_job": {...},
    "error": "error message",
    "attempt_count": 3,
    "first_failed_at": "2025-12-28T10:30:00",
    "last_failed_at": "2025-12-28T10:30:30",
    "queue_name": "detection_queue"
}
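Building that envelope can be sketched as follows; a dict of lists stands in for Redis RPUSH, and to_dlq is a hypothetical helper (the real code would preserve first_failed_at across retries rather than overwriting it, as noted in the comment).

```python
import json
from datetime import datetime, timezone

def to_dlq(store: dict, queue_name: str, job: dict, error: str, attempts: int) -> None:
    """Wrap a failed job in the DLQ envelope and append it to
    dlq:{queue_name} for later inspection."""
    now = datetime.now(timezone.utc).isoformat()
    entry = {
        "original_job": job,
        "error": error,
        "attempt_count": attempts,
        "first_failed_at": now,   # simplified: real code keeps the original failure time
        "last_failed_at": now,
        "queue_name": queue_name,
    }
    store.setdefault(f"dlq:{queue_name}", []).append(json.dumps(entry))
```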

Data Models

Detection Model

Database table: detections

| Field | Type | Description |
|-------|------|-------------|
| id | INTEGER | Primary key, auto-increment |
| camera_id | STRING | Foreign key to cameras |
| file_path | STRING | Path to source image |
| file_type | STRING | MIME type (e.g., "image/jpeg") |
| detected_at | DATETIME | Detection timestamp |
| object_type | STRING | Detected class (person, car, etc.) |
| confidence | FLOAT | Detection confidence (0.0-1.0) |
| bbox_x | INTEGER | Bounding box X coordinate |
| bbox_y | INTEGER | Bounding box Y coordinate |
| bbox_width | INTEGER | Bounding box width |
| bbox_height | INTEGER | Bounding box height |
| thumbnail_path | STRING | Path to thumbnail with bbox overlay |

Event Model

Database table: events

| Field | Type | Description |
|-------|------|-------------|
| id | INTEGER | Primary key, auto-increment |
| batch_id | STRING | Batch identifier |
| camera_id | STRING | Foreign key to cameras |
| started_at | DATETIME | Batch start time |
| ended_at | DATETIME | Batch end time |
| risk_score | INTEGER | Risk score (0-100) |
| risk_level | STRING | Risk level (low/medium/high/critical) |
| summary | TEXT | LLM-generated summary |
| reasoning | TEXT | LLM-generated reasoning |
| detection_ids | TEXT | JSON array of detection IDs |
| reviewed | BOOLEAN | Whether event was reviewed by user |
| notes | TEXT | User-added notes |
| is_fast_path | BOOLEAN | Whether event was processed via fast path |

Detection to Event Transformation

Detection to Event Transformation

Visualization showing how multiple raw detections aggregate into batches, which transform into risk-scored events through LLM analysis.

Diagram: Detection to Event Transformation

flowchart TB
    subgraph "Raw Detections"
        D1[Detection 1<br/>person, 0.95]
        D2[Detection 2<br/>person, 0.92]
        D3[Detection 3<br/>car, 0.87]
        D4[Detection 4<br/>person, 0.89]
        D5[Detection 5<br/>person, 0.91]
    end

    subgraph "Batch"
        B["Batch<br/>batch_id: abc123<br/>camera_id: front_door<br/>started_at: 14:30:00<br/>ended_at: 14:31:30<br/>detection_ids: [1,2,3,4,5]"]
    end

    subgraph "Event"
        E[Event<br/>id: 42<br/>batch_id: abc123<br/>risk_score: 65<br/>risk_level: high<br/>summary: Multiple persons...<br/>reasoning: Activity pattern...]
    end

    D1 --> B
    D2 --> B
    D3 --> B
    D4 --> B
    D5 --> B
    B --> E

Configuration Reference

Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| FOSCAM_BASE_PATH | /export/foscam | Camera FTP upload directory |
| YOLO26_URL | http://localhost:8095 | YOLO26 service URL |
| NEMOTRON_URL | http://localhost:8091 | Nemotron LLM service URL |
| FLORENCE_URL | http://localhost:8092 | Florence-2 vision-language service URL |
| CLIP_URL | http://localhost:8093 | CLIP embedding service URL |
| ENRICHMENT_URL | http://localhost:8094 | Enrichment service URL |
| DETECTION_CONFIDENCE_THRESHOLD | 0.5 | Minimum confidence to store detection |
| BATCH_WINDOW_SECONDS | 90 | Maximum batch duration |
| BATCH_IDLE_TIMEOUT_SECONDS | 30 | Idle timeout before closing batch |
| FAST_PATH_CONFIDENCE_THRESHOLD | 0.90 | Confidence threshold for fast path |
| FAST_PATH_OBJECT_TYPES | ["person"] | Object types eligible for fast path |
| DEDUPE_TTL_SECONDS | 300 | File hash deduplication TTL |

Enrichment Feature Toggles

| Variable | Default | Description |
|----------|---------|-------------|
| VISION_EXTRACTION_ENABLED | true | Enable Florence-2 based extraction |
| REID_ENABLED | true | Enable CLIP-based re-identification |
| SCENE_CHANGE_ENABLED | true | Enable scene change detection |

AI Service Ports

| Service | Port | Protocol | Description |
|---------|------|----------|-------------|
| YOLO26 | 8095 | HTTP | Object detection |
| NVIDIA Nemotron | 8091 | HTTP | LLM risk analysis |
| Florence | 8092 | HTTP | Vision-language |
| CLIP | 8093 | HTTP | Embeddings / re-ID |
| Enrichment | 8094 | HTTP | Enrichment helpers |

VRAM Requirements

VRAM varies by model choice and enabled enrichment. For "minimal dev" vs "full prod" guidance, see docs/operator/ai-installation.md and ai/AGENTS.md.


AI Service Interaction Diagram

AI Service Interaction Diagram

Host machine architecture showing GPU-accelerated AI services (YOLO26, Nemotron), Docker containers (Backend, Frontend, Redis), and data flow from cameras through the processing pipeline.

*Production uses NVIDIA Nemotron-3-Nano-30B-A3B (~14.7GB); development uses Nemotron Mini 4B (~3GB).

Diagram: AI Service Interaction

flowchart TB
    subgraph "Host Machine"
        subgraph "GPU (RTX A5500 24GB)"
            RT[YOLO26 Server<br/>Port 8095<br/>~4GB VRAM]
            NEM[NVIDIA Nemotron llama.cpp<br/>Port 8091<br/>~14.7GB VRAM*]
        end

        subgraph "Docker Containers"
            BE[Backend FastAPI<br/>Port 8000]
            FE[Frontend React<br/>Port 5173]
            RD[Redis<br/>Port 6379]
        end
    end

    CAM[Foscam Cameras] -->|FTP Upload| FTP[/export/foscam/]
    FTP -->|inotify| BE

    BE -->|POST /detect<br/>multipart image| RT
    RT -->|JSON detections| BE

    BE -->|POST /completion<br/>ChatML prompt| NEM
    NEM -->|JSON risk assessment| BE

    BE <-->|Queue & Cache| RD
    BE -->|WebSocket| FE

    style RT fill:#76B900
    style NEM fill:#76B900