Video Analytics Guide¶
Comprehensive guide to the AI-powered video analytics features in Home Security Intelligence.
Overview¶
Home Security Intelligence provides a multi-model AI pipeline that transforms raw camera footage into actionable security insights. The video analytics system processes images through multiple specialized models to detect, classify, track, and assess security risks in real time.
Key Capabilities¶
| Feature | Description | Models Used |
|---|---|---|
| Object Detection | Detect people, vehicles, animals, and objects | YOLO26 |
| Scene Understanding | Generate captions and descriptions | Florence-2 |
| Anomaly Detection | Compare against learned baselines | CLIP ViT-L/14 |
| Threat Detection | Identify weapons and dangerous items | Threat-Detection-YOLOv8n |
| Person Analysis | Pose, demographics, clothing, re-identification | Multiple models |
| Vehicle Analysis | Vehicle type, damage, license plates | Multiple models |
| Risk Assessment | LLM-based contextual risk analysis | Nemotron-3-Nano-30B |
Architecture¶
Detection Pipeline¶
Camera Upload (1) -> File Watcher (2) -> Object Detection (3) -> Batch Aggregator (4) -> Enrichment (5) -> Risk Analysis (6) -> Event (7)
%%{init: {
'theme': 'dark',
'themeVariables': {
'primaryColor': '#3B82F6',
'primaryTextColor': '#FFFFFF',
'primaryBorderColor': '#60A5FA',
'secondaryColor': '#A855F7',
'tertiaryColor': '#009688',
'background': '#121212',
'mainBkg': '#1a1a2e',
'lineColor': '#666666'
}
}}%%
flowchart LR
subgraph Input["1. Image Input"]
CAM[Camera]
FTP[FTP Server<br/>/export/foscam/]
FW[FileWatcher<br/>inotify + debounce]
end
subgraph Detection["2. Object Detection"]
DQ[(detection_queue)]
YOLO[YOLO26<br/>port 8095<br/>~30-50ms]
end
subgraph Batching["3. Batch Aggregation"]
BA[BatchAggregator<br/>90s window / 30s idle]
AQ[(analysis_queue)]
end
subgraph Enrichment["4. Context Enrichment"]
FLOR[Florence-2<br/>Scene captions<br/>port 8092]
CLIP[CLIP ViT-L/14<br/>Anomaly detection<br/>port 8093]
MZ[Model Zoo<br/>On-demand models]
end
subgraph Analysis["5. Risk Assessment"]
NEM[Nemotron-3-Nano-30B<br/>LLM risk scoring]
end
subgraph Output["6. Event Output"]
DB[(PostgreSQL<br/>Event storage)]
WS[WebSocket<br/>Real-time broadcast]
UI[React Dashboard]
end
CAM -->|FTP upload| FTP
FTP -->|inotify| FW
FW -->|queue job| DQ
DQ --> YOLO
YOLO -->|detections| BA
BA -->|batch ready| AQ
AQ --> FLOR
AQ --> CLIP
AQ --> MZ
FLOR -->|captions| NEM
CLIP -->|embeddings| NEM
MZ -->|enrichment| NEM
NEM -->|risk score| DB
NEM -->|event| WS
WS --> UI
- Camera Upload: Cameras upload images via FTP to /export/foscam/{camera_name}/
- File Watcher: Monitors directories for new images with deduplication
- Object Detection: YOLO26 identifies objects and their bounding boxes
- Batch Aggregator: Groups detections into 90-second time windows
- Enrichment: Model Zoo extracts additional context (clothing, pose, etc.)
- Risk Analysis: Nemotron LLM evaluates the complete context
- Event Creation: Security events are created and broadcast via WebSocket
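The batching step above (90-second window, 30-second idle cutoff) can be sketched as follows. This is an illustrative model of the rule, not the actual BatchAggregator implementation; class and method names are hypothetical.

```python
from dataclasses import dataclass, field

WINDOW_SECONDS = 90.0  # hard limit on batch age
IDLE_SECONDS = 30.0    # close early if no new detections arrive

@dataclass
class BatchWindow:
    opened_at: float
    last_detection_at: float
    detections: list = field(default_factory=list)

    def add(self, detection, now: float) -> None:
        self.detections.append(detection)
        self.last_detection_at = now

    def should_close(self, now: float) -> bool:
        # Close when the 90 s window expires, or when the stream has
        # been idle for 30 s, whichever comes first.
        return (now - self.opened_at >= WINDOW_SECONDS
                or now - self.last_detection_at >= IDLE_SECONDS)
```

A closed window is what gets pushed onto the analysis queue for enrichment.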
Always-Loaded Models (~4GB VRAM)¶
These models are permanently loaded for real-time processing:
| Model | Purpose | VRAM | Port |
|---|---|---|---|
| YOLO26 | Primary object detection | ~2GB | 8095 |
| Florence-2-large | Scene understanding, captions | ~1.2GB | 8092 |
| CLIP ViT-L/14 | Anomaly detection baseline | ~800MB | 8093 |
On-Demand Models (~6.8GB Budget)¶
Loaded when needed and evicted using LRU with priority ordering:
| Model | Purpose | VRAM | Priority |
|---|---|---|---|
| Threat Detector | Weapon detection | ~400MB | CRITICAL |
| Pose Estimator | Body posture analysis | ~300MB | HIGH |
| Demographics | Age/gender estimation | ~500MB | HIGH |
| FashionCLIP | Clothing analysis | ~800MB | HIGH |
| OSNet Re-ID | Person re-identification | ~100MB | MEDIUM |
| Vehicle Classifier | Vehicle type | ~1.5GB | MEDIUM |
| Pet Classifier | Cat/dog detection | ~200MB | MEDIUM |
| Depth Anything v2 | Distance estimation | ~150MB | LOW |
| X-CLIP | Action recognition | ~1.5GB | LOW |
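The eviction policy described above (LRU with priority ordering) can be sketched like this: lowest-priority models are evicted first, with least-recently-used breaking ties, until enough VRAM is free. This is a hedged illustration of the policy, not the Model Zoo's actual code.

```python
PRIORITY_RANK = {"CRITICAL": 3, "HIGH": 2, "MEDIUM": 1, "LOW": 0}

def pick_evictions(loaded, needed_mb, budget_mb):
    """Return names of models to evict so needed_mb fits in budget_mb.

    loaded: list of dicts with "name", "vram_mb", "priority", "last_used".
    """
    free = budget_mb - sum(m["vram_mb"] for m in loaded)
    # Sort candidates: lowest priority first, then oldest use first.
    candidates = sorted(
        loaded, key=lambda m: (PRIORITY_RANK[m["priority"]], m["last_used"])
    )
    evicted = []
    for m in candidates:
        if free >= needed_mb:
            break
        evicted.append(m["name"])
        free += m["vram_mb"]
    return evicted
```

Under this ordering a CRITICAL model like the threat detector is only ever evicted as a last resort.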
Object Detection¶
YOLO26 Detection¶
The primary object detector uses YOLO26 for fast, accurate detection:
# Check detector health
curl http://localhost:8095/health
# Detection endpoint (internal use)
POST http://localhost:8095/detect
Content-Type: multipart/form-data
Detected Object Classes:
- People: person
- Vehicles: car, truck, bus, motorcycle, bicycle
- Animals: dog, cat, bird
- Objects: backpack, handbag, suitcase, umbrella
- And 80+ COCO classes
Detection Response:
{
"detections": [
{
"class": "person",
"confidence": 0.92,
"bbox": [120, 80, 280, 450],
"center": [200, 265]
}
],
"inference_time_ms": 5.76
}
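The `center` field in the response above is derivable from the bounding box (`[x1, y1, x2, y2]` in pixels). A small helper, shown here for illustration:

```python
def bbox_center(bbox):
    """Midpoint of a [x1, y1, x2, y2] pixel bounding box."""
    x1, y1, x2, y2 = bbox
    return [(x1 + x2) // 2, (y1 + y2) // 2]
```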
Detection Filtering¶
Detections are filtered by:
- Confidence threshold: Configurable minimum confidence (default: 0.5)
- Object classes: Filter to security-relevant objects
- Zone filtering: Only process detections in defined zones
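The three filters above compose naturally into one pass over the detection list. The sketch below assumes detections in the response format shown earlier, simplifies zone containment to a rectangular point-in-box test on the detection center, and uses an illustrative subset of security-relevant classes.

```python
SECURITY_CLASSES = {"person", "car", "truck", "dog"}  # illustrative subset

def in_zone(center, zone):
    x, y = center
    return zone["x1"] <= x <= zone["x2"] and zone["y1"] <= y <= zone["y2"]

def filter_detections(detections, zones, min_confidence=0.5):
    # Keep detections that clear the confidence threshold, belong to a
    # security-relevant class, and fall inside at least one zone.
    return [
        d for d in detections
        if d["confidence"] >= min_confidence
        and d["class"] in SECURITY_CLASSES
        and any(in_zone(d["center"], z) for z in zones)
    ]
```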
Scene Understanding¶
Florence-2 Captioning¶
Florence-2 provides rich scene descriptions:
# Health check
curl http://localhost:8092/health
# Caption endpoint
POST http://localhost:8092/caption
{
"image": "<base64>",
"task": "detailed_caption"
}
Available Tasks:
| Task | Description |
|---|---|
| caption | Brief scene description |
| detailed_caption | Comprehensive scene analysis |
| more_detailed_caption | Extended detailed description |
| ocr | Text detection and recognition |
| dense_region_caption | Per-region descriptions |
| object_detection | Bounding box detection |
Response Example:
{
"caption": "A person in a blue jacket approaches the front door carrying a package",
"inference_time_ms": 145.2
}
Anomaly Detection¶
CLIP Baseline Comparison¶
The system learns normal activity patterns and detects anomalies:
How It Works:
- CLIP generates embeddings for each scene
- Embeddings are compared against historical baselines
- Significant deviations trigger anomaly flags
Baseline Metrics:
| Metric | Description |
|---|---|
| hourly_pattern | Expected activity by hour (24 buckets) |
| day_of_week_pattern | Expected activity by day (7 buckets) |
| typical_dwell_time | Average time objects stay in view |
| typical_crossing_rate | Expected zone crossings per hour |
Anomaly Types:
- Unusual time: Activity outside normal hours
- Unusual frequency: Detection spike or drop (3+ std deviations)
- Unusual dwell: Object lingering 2x longer than typical
- Unusual entity: First-time visitor to sensitive zone
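The "unusual frequency" and "unusual dwell" rules above reduce to simple threshold checks against the learned baseline. The sketch below uses the thresholds stated in the list (3+ standard deviations, 2x typical dwell); the function names are illustrative.

```python
def frequency_anomaly(current_count, mean, std, threshold=3.0):
    """Flag a detection count 3+ standard deviations from the hourly mean."""
    if std == 0:
        return current_count != mean
    return abs(current_count - mean) / std >= threshold

def dwell_anomaly(dwell_seconds, typical_dwell_seconds, factor=2.0):
    """Flag an object lingering more than 2x the typical dwell time."""
    return dwell_seconds > factor * typical_dwell_seconds
```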
Baseline Visualization¶
The dashboard provides four visualization components for understanding learned activity patterns:
24-Hour Activity Pattern (HourlyPatternChart)¶
A line chart showing average detections for each hour of the day (0-23):
- Green line: Average detections per hour
- Shaded band: Confidence interval (+/- 1 standard deviation)
- Orange dot: Peak activity hour
- Point opacity: Data quality indicator (more samples = more opaque)
Interpreting the chart:
- Full opacity points have 20+ samples (high confidence)
- Faded points have fewer samples (still learning)
- Hover over points to see exact values and sample counts
Weekly Activity Pattern (DailyPatternChart)¶
A bar chart showing activity levels for each day of the week:
- Bar height: Average detections for that day
- Bar color intensity: Activity level relative to the busiest day
- Orange dot on bar: Peak hour for that day
- Weekend bars: Highlighted in blue
Interpreting the chart:
- Hover over bars to see average detections, peak hour, and total samples
- "Busiest day" and "Quietest day" badges identify patterns
Current Deviation Status (BaselineDeviationCard)¶
A color-coded card showing how current activity compares to baseline:
| Color | Interpretation | Deviation Range |
|---|---|---|
| Blue | Far below / Below normal | < -1.5 std dev |
| Green | Normal | -1.5 to +1.5 std dev |
| Yellow | Slightly above normal | +1.5 to +2.0 std dev |
| Orange | Above normal | +2.0 to +3.0 std dev |
| Red | Far above normal | > +3.0 std dev |
The card displays:
- Deviation score: Number of standard deviations from baseline
- Contributing factors: What's causing the deviation (e.g., "high_person_count")
- Last updated: When the deviation was calculated
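The color table above maps directly to a lookup on the deviation score. A minimal sketch, assuming half-open boundary handling (the table does not specify which side each boundary belongs to):

```python
def deviation_color(score):
    """Map a deviation score (std devs from baseline) to the card color."""
    if score < -1.5:
        return "blue"    # far below / below normal
    if score <= 1.5:
        return "green"   # normal
    if score <= 2.0:
        return "yellow"  # slightly above normal
    if score <= 3.0:
        return "orange"  # above normal
    return "red"         # far above normal
```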
Object Baseline Chart (ObjectBaselineChart)¶
Per-class detection statistics showing frequency patterns by object type:
- Grouped bars per class: Average hourly rate, peak hour, total detections
- Color-coded by class: Person (blue), Vehicle (green), Animal (orange), etc.
- Sort options: By frequency, total detections, peak hour, or alphabetically
Baseline Tuning¶
For per-camera baseline configuration, see the Baseline Tuning Panel in the camera settings:
- Sensitivity threshold: Adjusts how many standard deviations trigger an anomaly (0.5-5.0)
- Minimum samples: Sets how many data points are needed before detection is reliable
- Reset baseline: Clears all learned data to start fresh
API endpoints for programmatic control are documented in Baseline Configuration API.
Person Analysis¶
Pose Estimation¶
YOLOv8n-pose detects 17 COCO keypoints:
{
"keypoints": [
{ "name": "nose", "x": 0.45, "y": 0.12, "confidence": 0.95 },
{ "name": "left_shoulder", "x": 0.42, "y": 0.25, "confidence": 0.92 }
],
"posture": "standing",
"is_suspicious": false
}
Posture Classifications:
| Posture | Description |
|---|---|
| standing | Upright position |
| walking | Moving, upright |
| running | Fast movement |
| crouching | Low position (suspicious) |
| lying_down | Horizontal position |
| reaching_up | Arms raised (potential climbing) |
Suspicious Poses:
- crouching - Potential hiding/break-in behavior
- crawling - Unusual movement pattern
- hiding - Concealment attempt
- reaching_up - Potential climbing/entry
Demographics¶
ViT-based age and gender estimation:
Age Ranges: 0-10, 11-20, 21-35, 36-50, 51-65, 65+
Clothing Analysis¶
FashionCLIP zero-shot clothing classification:
{
"type": "casual",
"colors": ["blue", "black"],
"is_suspicious": false,
"description": "Blue jacket, black pants"
}
Person Re-Identification¶
OSNet generates 512-dimensional embeddings for tracking across cameras:
Use Cases:
- Track individuals across multiple cameras
- Identify repeat visitors
- Link detections to known household members
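Matching two re-identification embeddings is typically done with cosine similarity. The sketch below illustrates the idea with short vectors (real OSNet embeddings are 512-dimensional); the 0.7 match threshold is an assumed value, not taken from the source.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_person(emb_a, emb_b, threshold=0.7):
    """Treat two embeddings as the same person above a similarity cutoff."""
    return cosine_similarity(emb_a, emb_b) >= threshold
```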
Vehicle Analysis¶
Vehicle Classification¶
ViT-based vehicle type classification:
Vehicle Classes:
- articulated_truck, bus, car, motorcycle, bicycle
- pickup_truck, single_unit_truck, work_van
- non_motorized_vehicle
License Plate Detection¶
YOLO-based license plate detection paired with PaddleOCR for plate text recognition.
Threat Detection¶
Weapon Detection¶
CRITICAL priority detection for security threats:
{
"threats": [
{
"threat_type": "knife",
"confidence": 0.85,
"bbox": [150, 200, 180, 280],
"severity": "high"
}
],
"has_threat": true,
"max_severity": "high"
}
Threat Classes:
| Class | Severity |
|---|---|
| gun, rifle, pistol | CRITICAL |
| knife | HIGH |
| bat, crowbar | MEDIUM |
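The table above corresponds to a simple severity lookup, and the `max_severity` field in the example response is the highest severity among detected classes. A sketch (the LOW default for unknown classes is an assumption):

```python
THREAT_SEVERITY = {
    "gun": "CRITICAL", "rifle": "CRITICAL", "pistol": "CRITICAL",
    "knife": "HIGH",
    "bat": "MEDIUM", "crowbar": "MEDIUM",
}

SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]

def max_severity(threat_classes):
    """Highest severity among detected threat classes, or None if empty."""
    severities = [THREAT_SEVERITY.get(t, "LOW") for t in threat_classes]
    if not severities:
        return None
    return max(severities, key=SEVERITY_ORDER.index)
```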
Risk Assessment¶
Nemotron LLM Analysis¶
The Nemotron-3-Nano-30B model provides contextual risk assessment:
Input Context:
- All detections in the batch
- Florence captions and descriptions
- Zone information and types
- Historical baseline comparison
- Household member matching
- Time of day and patterns
Output:
{
"risk_score": 45,
"risk_level": "medium",
"summary": "Unknown person approached front door at unusual hour",
"reasoning": "Activity at 2:14 AM when no family members are expected...",
"recommended_actions": ["Review footage", "Check if visitor expected"]
}
Risk Score Mapping:
| Score | Level | Color | Action |
|---|---|---|---|
| 0-29 | Low | Green | Informational only |
| 30-59 | Medium | Yellow | Review when convenient |
| 60-84 | High | Orange | Prompt review |
| 85-100 | Critical | Red | Immediate attention |
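The score-to-level mapping above as a function, for reference; scores are assumed to be integers already clamped to 0-100:

```python
def risk_level(score):
    """Map a 0-100 risk score to its level per the table above."""
    if score <= 29:
        return "low"
    if score <= 59:
        return "medium"
    if score <= 84:
        return "high"
    return "critical"
```

Note this agrees with the earlier example output, where a score of 45 carried `"risk_level": "medium"`.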
API Reference¶
Analytics Endpoints¶
| Endpoint | Method | Description |
|---|---|---|
| /api/analytics/detection-trends | GET | Daily detection counts |
| /api/analytics/risk-history | GET | Risk level distribution over time |
| /api/analytics/camera-uptime | GET | Uptime percentage per camera |
| /api/analytics/object-distribution | GET | Detection counts by object type |
| /api/analytics/risk-score-distribution | GET | Risk score histogram |
| /api/analytics/risk-score-trends | GET | Average risk score over time |
Query Parameters¶
All analytics endpoints accept:
| Parameter | Type | Description |
|---|---|---|
| start_date | Date | Start date (ISO format, required) |
| end_date | Date | End date (ISO format, required) |
| camera_id | String | Filter by camera (optional) |
Example Request¶
curl "http://localhost:8000/api/analytics/detection-trends?start_date=2026-01-01&end_date=2026-01-26"
Response Format¶
{
"data_points": [
{ "date": "2026-01-01", "count": 156 },
{ "date": "2026-01-02", "count": 203 }
],
"total_detections": 4521,
"start_date": "2026-01-01",
"end_date": "2026-01-26"
}
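The response above can be consumed with the standard library alone. A small sketch using the sample payload from this section:

```python
import json

# Sample payload copied from the Response Format section above.
payload = """
{
  "data_points": [
    { "date": "2026-01-01", "count": 156 },
    { "date": "2026-01-02", "count": 203 }
  ],
  "total_detections": 4521,
  "start_date": "2026-01-01",
  "end_date": "2026-01-26"
}
"""

trends = json.loads(payload)
# Index daily counts by date for easy lookup.
daily = {p["date"]: p["count"] for p in trends["data_points"]}
```

In practice the payload would come from a GET request to the endpoint rather than a literal string.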
Model Status API¶
Check Model Status¶
# Get all model statuses
curl http://localhost:8094/models/status
# Response
{
"vram_budget_mb": 6963.2,
"vram_used_mb": 2500,
"vram_utilization_percent": 35.9,
"loaded_models": [
{"name": "pose_estimator", "vram_mb": 300, "priority": "HIGH"}
]
}
Preload Models¶
# Preload a model before use
curl -X POST "http://localhost:8094/models/preload?model_name=threat_detector"
Best Practices¶
Optimizing Detection Quality¶
- Camera Placement: Ensure cameras have clear views of entry points
- Lighting: Good lighting improves detection accuracy
- Resolution: Higher resolution enables better detail extraction
- Zone Configuration: Focus analysis on important areas
Managing VRAM¶
- Priority Models: Keep critical models (threat detection) always ready
- Preloading: Preload expected models before high-activity periods
- Monitoring: Watch VRAM utilization via /models/status
Reducing False Positives¶
- Zone Configuration: Exclude high-motion areas (trees, roads)
- Household Registration: Add known people and vehicles
- Baseline Learning: Allow system to learn normal patterns
- Feedback: Use the feedback system to improve calibration
Troubleshooting¶
No Detections¶
- Check camera is uploading to correct directory
- Verify file watcher is running: curl http://localhost:8000/api/system/pipeline
- Check YOLO26 health: curl http://localhost:8095/health
- Review detection queue depth in system telemetry
Slow Analysis¶
- Check GPU utilization: curl http://localhost:8000/api/system/gpu
- Review pipeline latency: curl http://localhost:8000/api/system/pipeline-latency
- Consider adjusting batch window settings
- Check for VRAM pressure in model status
Inaccurate Risk Scores¶
- Review recent events for patterns
- Check zone configuration is accurate
- Register household members to reduce false positives
- Allow baseline learning time (7+ days recommended)
Related Documentation¶
- Zone Configuration Guide - Configure detection zones
- Face Recognition Guide - Person identification
- Analytics Endpoints - API reference
- AI Performance - Model monitoring