Video Analytics Guide¶
Comprehensive guide to the AI-powered video analytics features in Home Security Intelligence.
Overview¶
Home Security Intelligence provides a multi-model AI pipeline that transforms raw camera footage into actionable security insights. The video analytics system processes images through multiple specialized models to detect, classify, track, and assess security risks in real time.
Key Capabilities¶
| Feature | Description | Models Used |
|---|---|---|
| Object Detection | Detect people, vehicles, animals, and objects | YOLO26 |
| Scene Understanding | Generate captions and descriptions | Florence-2 |
| Anomaly Detection | Compare against learned baselines | CLIP ViT-L/14 |
| Threat Detection | Identify weapons and dangerous items | Threat-Detection-YOLOv8n |
| Person Analysis | Pose, demographics, clothing, re-identification | Multiple models |
| Vehicle Analysis | Vehicle type, damage, license plates | Multiple models |
| Risk Assessment | LLM-based contextual risk analysis | Nemotron-3-Nano-30B |
Architecture¶
Detection Pipeline¶
Camera Upload (1) -> File Watcher (2) -> Object Detection (3) -> Batch Aggregator (4) -> Enrichment (5) -> Risk Analysis (6) -> Event (7)
%%{init: {
'theme': 'dark',
'themeVariables': {
'primaryColor': '#3B82F6',
'primaryTextColor': '#FFFFFF',
'primaryBorderColor': '#60A5FA',
'secondaryColor': '#A855F7',
'tertiaryColor': '#009688',
'background': '#121212',
'mainBkg': '#1a1a2e',
'lineColor': '#666666'
}
}}%%
flowchart LR
subgraph Input["1. Image Input"]
CAM[Camera]
FTP[FTP Server<br/>/export/foscam/]
FW[FileWatcher<br/>inotify + debounce]
end
subgraph Detection["2. Object Detection"]
DQ[(detection_queue)]
YOLO[YOLO26<br/>port 8095<br/>~30-50ms]
end
subgraph Batching["3. Batch Aggregation"]
BA[BatchAggregator<br/>90s window / 30s idle]
AQ[(analysis_queue)]
end
subgraph Enrichment["4. Context Enrichment"]
FLOR[Florence-2<br/>Scene captions<br/>port 8092]
CLIP[CLIP ViT-L/14<br/>Anomaly detection<br/>port 8093]
MZ[Model Zoo<br/>On-demand models]
end
subgraph Analysis["5. Risk Assessment"]
NEM[Nemotron-3-Nano-30B<br/>LLM risk scoring]
end
subgraph Output["6. Event Output"]
DB[(PostgreSQL<br/>Event storage)]
WS[WebSocket<br/>Real-time broadcast]
UI[React Dashboard]
end
CAM -->|FTP upload| FTP
FTP -->|inotify| FW
FW -->|queue job| DQ
DQ --> YOLO
YOLO -->|detections| BA
BA -->|batch ready| AQ
AQ --> FLOR
AQ --> CLIP
AQ --> MZ
FLOR -->|captions| NEM
CLIP -->|embeddings| NEM
MZ -->|enrichment| NEM
NEM -->|risk score| DB
NEM -->|event| WS
WS --> UI
- Camera Upload: Cameras upload images via FTP to /export/foscam/{camera_name}/
- File Watcher: Monitors directories for new images with deduplication
- Object Detection: YOLO26 identifies objects and their bounding boxes
- Batch Aggregator: Groups detections into 90-second time windows
- Enrichment: Model Zoo extracts additional context (clothing, pose, etc.)
- Risk Analysis: Nemotron LLM evaluates the complete context
- Event Creation: Security events are created and broadcast via WebSocket
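The batching step above (90-second window, 30-second idle cutoff) can be sketched as follows. This is an illustrative model of the rule, not the actual BatchAggregator implementation; class and method names are hypothetical.

```python
from dataclasses import dataclass, field

WINDOW_SECONDS = 90.0  # hard limit on batch age
IDLE_SECONDS = 30.0    # close early if no new detections arrive

@dataclass
class BatchWindow:
    opened_at: float
    last_detection_at: float
    detections: list = field(default_factory=list)

    def add(self, detection, now: float) -> None:
        self.detections.append(detection)
        self.last_detection_at = now

    def should_close(self, now: float) -> bool:
        # Close when the 90 s window expires, or when the stream has
        # been idle for 30 s, whichever comes first.
        return (now - self.opened_at >= WINDOW_SECONDS
                or now - self.last_detection_at >= IDLE_SECONDS)
```

A closed window is what gets pushed onto the analysis queue for enrichment.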
Always-Loaded Models (~4GB VRAM)¶
These models are permanently loaded for real-time processing:
| Model | Purpose | VRAM | Port |
|---|---|---|---|
| YOLO26 | Primary object detection | ~2GB | 8095 |
| Florence-2-large | Scene understanding, captions | ~1.2GB | 8092 |
| CLIP ViT-L/14 | Anomaly detection baseline | ~800MB | 8093 |
On-Demand Models (~6.8GB Budget)¶
Loaded when needed and evicted using LRU with priority ordering:
| Model | Purpose | VRAM | Priority |
|---|---|---|---|
| Threat Detector | Weapon detection | ~400MB | CRITICAL |
| Pose Estimator | Body posture analysis | ~300MB | HIGH |
| Demographics | Age/gender estimation | ~500MB | HIGH |
| FashionCLIP | Clothing analysis | ~800MB | HIGH |
| OSNet Re-ID | Person re-identification | ~100MB | MEDIUM |
| Vehicle Classifier | Vehicle type | ~1.5GB | MEDIUM |
| Pet Classifier | Cat/dog detection | ~200MB | MEDIUM |
| Depth Anything v2 | Distance estimation | ~150MB | LOW |
| X-CLIP | Action recognition | ~1.5GB | LOW |
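The eviction policy described above (LRU with priority ordering) can be sketched like this: lowest-priority models are evicted first, with least-recently-used breaking ties, until enough VRAM is free. This is a hedged illustration of the policy, not the Model Zoo's actual code.

```python
PRIORITY_RANK = {"CRITICAL": 3, "HIGH": 2, "MEDIUM": 1, "LOW": 0}

def pick_evictions(loaded, needed_mb, budget_mb):
    """Return names of models to evict so needed_mb fits in budget_mb.

    loaded: list of dicts with "name", "vram_mb", "priority", "last_used".
    """
    free = budget_mb - sum(m["vram_mb"] for m in loaded)
    # Sort candidates: lowest priority first, then oldest use first.
    candidates = sorted(
        loaded, key=lambda m: (PRIORITY_RANK[m["priority"]], m["last_used"])
    )
    evicted = []
    for m in candidates:
        if free >= needed_mb:
            break
        evicted.append(m["name"])
        free += m["vram_mb"]
    return evicted
```

Under this ordering a CRITICAL model like the threat detector is only ever evicted as a last resort.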
Object Detection¶
YOLO26 Detection¶
The primary object detector uses YOLO26 for fast, accurate detection:
# Check detector health
curl http://localhost:8095/health
# Detection endpoint (internal use)
POST http://localhost:8095/detect
Content-Type: multipart/form-data
Detected Object Classes:
- People: person
- Vehicles: car, truck, bus, motorcycle, bicycle
- Animals: dog, cat, bird
- Objects: backpack, handbag, suitcase, umbrella
- And 80+ COCO classes
Detection Response:
{
"detections": [
{
"class": "person",
"confidence": 0.92,
"bbox": [120, 80, 280, 450],
"center": [200, 265]
}
],
"inference_time_ms": 5.76
}
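The `center` field in the response above is derivable from the bounding box (`[x1, y1, x2, y2]` in pixels). A small helper, shown here for illustration:

```python
def bbox_center(bbox):
    """Midpoint of a [x1, y1, x2, y2] pixel bounding box."""
    x1, y1, x2, y2 = bbox
    return [(x1 + x2) // 2, (y1 + y2) // 2]
```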
Detection Filtering¶
Detections are filtered by:
- Confidence threshold: Configurable minimum confidence (default: 0.5)
- Object classes: Filter to security-relevant objects
- Zone filtering: Only process detections in defined zones
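The three filters above compose naturally into one pass over the detection list. The sketch below assumes detections in the response format shown earlier, simplifies zone containment to a rectangular point-in-box test on the detection center, and uses an illustrative subset of security-relevant classes.

```python
SECURITY_CLASSES = {"person", "car", "truck", "dog"}  # illustrative subset

def in_zone(center, zone):
    x, y = center
    return zone["x1"] <= x <= zone["x2"] and zone["y1"] <= y <= zone["y2"]

def filter_detections(detections, zones, min_confidence=0.5):
    # Keep detections that clear the confidence threshold, belong to a
    # security-relevant class, and fall inside at least one zone.
    return [
        d for d in detections
        if d["confidence"] >= min_confidence
        and d["class"] in SECURITY_CLASSES
        and any(in_zone(d["center"], z) for z in zones)
    ]
```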
Scene Understanding¶
Florence-2 Captioning¶
Florence-2 provides rich scene descriptions:
# Health check
curl http://localhost:8092/health
# Caption endpoint
POST http://localhost:8092/caption
{
"image": "<base64>",
"task": "detailed_caption"
}
Available Tasks:
| Task | Description |
|---|---|
| caption | Brief scene description |
| detailed_caption | Comprehensive scene analysis |
| more_detailed_caption | Extended detailed description |
| ocr | Text detection and recognition |
| dense_region_caption | Per-region descriptions |
| object_detection | Bounding box detection |
Response Example:
{
"caption": "A person in a blue jacket approaches the front door carrying a package",
"inference_time_ms": 145.2
}
Anomaly Detection¶
CLIP Baseline Comparison¶
The system learns normal activity patterns and detects anomalies:
How It Works:
- CLIP generates embeddings for each scene
- Embeddings are compared against historical baselines
- Significant deviations trigger anomaly flags
Baseline Metrics:
| Metric | Description |
|---|---|
| hourly_pattern | Expected activity by hour (24 buckets) |
| day_of_week_pattern | Expected activity by day (7 buckets) |
| typical_dwell_time | Average time objects stay in view |
| typical_crossing_rate | Expected zone crossings per hour |
Anomaly Types:
- Unusual time: Activity outside normal hours
- Unusual frequency: Detection spike or drop (3+ std deviations)
- Unusual dwell: Object lingering 2x longer than typical
- Unusual entity: First-time visitor to sensitive zone
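The "unusual frequency" and "unusual dwell" rules above reduce to simple threshold checks against the learned baseline. The sketch below uses the thresholds stated in the list (3+ standard deviations, 2x typical dwell); the function names are illustrative.

```python
def frequency_anomaly(current_count, mean, std, threshold=3.0):
    """Flag a detection count 3+ standard deviations from the hourly mean."""
    if std == 0:
        return current_count != mean
    return abs(current_count - mean) / std >= threshold

def dwell_anomaly(dwell_seconds, typical_dwell_seconds, factor=2.0):
    """Flag an object lingering more than 2x the typical dwell time."""
    return dwell_seconds > factor * typical_dwell_seconds
```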
Baseline Visualization¶
The dashboard provides four visualization components for understanding learned activity patterns:
24-Hour Activity Pattern (HourlyPatternChart)¶
A line chart showing average detections for each hour of the day (0-23):
- Green line: Average detections per hour
- Shaded band: Confidence interval (+/- 1 standard deviation)
- Orange dot: Peak activity hour
- Point opacity: Data quality indicator (more samples = more opaque)
Interpreting the chart:
- Full opacity points have 20+ samples (high confidence)
- Faded points have fewer samples (still learning)
- Hover over points to see exact values and sample counts
Weekly Activity Pattern (DailyPatternChart)¶
A bar chart showing activity levels for each day of the week:
- Bar height: Average detections for that day
- Bar color intensity: Activity level relative to the busiest day
- Orange dot on bar: Peak hour for that day
- Weekend bars: Highlighted in blue
Interpreting the chart:
- Hover over bars to see average detections, peak hour, and total samples
- "Busiest day" and "Quietest day" badges identify patterns
Current Deviation Status (BaselineDeviationCard)¶
A color-coded card showing how current activity compares to baseline:
| Color | Interpretation | Deviation Range |
|---|---|---|
| Blue | Far below / Below normal | < -1.5 std dev |
| Green | Normal | -1.5 to +1.5 std dev |
| Yellow | Slightly above normal | +1.5 to +2.0 std dev |
| Orange | Above normal | +2.0 to +3.0 std dev |
| Red | Far above normal | > +3.0 std dev |
The card displays:
- Deviation score: Number of standard deviations from baseline
- Contributing factors: What's causing the deviation (e.g., "high_person_count")
- Last updated: When the deviation was calculated
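The color table above maps directly to a lookup on the deviation score. A minimal sketch, assuming half-open boundary handling (the table does not specify which side each boundary belongs to):

```python
def deviation_color(score):
    """Map a deviation score (std devs from baseline) to the card color."""
    if score < -1.5:
        return "blue"    # far below / below normal
    if score <= 1.5:
        return "green"   # normal
    if score <= 2.0:
        return "yellow"  # slightly above normal
    if score <= 3.0:
        return "orange"  # above normal
    return "red"         # far above normal
```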
Object Baseline Chart (ObjectBaselineChart)¶
Per-class detection statistics showing frequency patterns by object type:
- Grouped bars per class: Average hourly rate, peak hour, total detections
- Color-coded by class: Person (blue), Vehicle (green), Animal (orange), etc.
- Sort options: By frequency, total detections, peak hour, or alphabetically
Baseline Tuning¶
For per-camera baseline configuration, see the Baseline Tuning Panel in the camera settings:
- Sensitivity threshold: Adjusts how many standard deviations trigger an anomaly (0.5-5.0)
- Minimum samples: Sets how many data points are needed before detection is reliable
- Reset baseline: Clears all learned data to start fresh
API endpoints for programmatic control are documented in Baseline Configuration API.
Person Analysis¶
Pose Estimation¶
YOLOv8n-pose detects 17 COCO keypoints:
{
"keypoints": [
{ "name": "nose", "x": 0.45, "y": 0.12, "confidence": 0.95 },
{ "name": "left_shoulder", "x": 0.42, "y": 0.25, "confidence": 0.92 }
],
"posture": "standing",
"is_suspicious": false
}
Posture Classifications:
| Posture | Description |
|---|---|
| standing | Upright position |
| walking | Moving, upright |
| running | Fast movement |
| crouching | Low position (suspicious) |
| lying_down | Horizontal position |
| reaching_up | Arms raised (potential climbing) |
Suspicious Poses:
- crouching - Potential hiding/break-in behavior
- crawling - Unusual movement pattern
- hiding - Concealment attempt
- reaching_up - Potential climbing/entry
Demographics¶
ViT-based age and gender estimation:
Age Ranges: 0-10, 11-20, 21-35, 36-50, 51-65, 65+
Clothing Analysis¶
FashionCLIP zero-shot clothing classification:
{
"type": "casual",
"colors": ["blue", "black"],
"is_suspicious": false,
"description": "Blue jacket, black pants"
}
Person Re-Identification¶
OSNet generates 512-dimensional embeddings for tracking across cameras:
Use Cases:
- Track individuals across multiple cameras
- Identify repeat visitors
- Link detections to known household members
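Matching two re-identification embeddings is typically done with cosine similarity. The sketch below illustrates the idea with short vectors (real OSNet embeddings are 512-dimensional); the 0.7 match threshold is an assumed value, not taken from the source.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_person(emb_a, emb_b, threshold=0.7):
    """Treat two embeddings as the same person above a similarity cutoff."""
    return cosine_similarity(emb_a, emb_b) >= threshold
```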
Vehicle Analysis¶
Vehicle Classification¶
ViT-based vehicle type classification:
Vehicle Classes:
- articulated_truck, bus, car, motorcycle, bicycle
- pickup_truck, single_unit_truck, work_van
- non_motorized_vehicle
License Plate Detection¶
YOLO-based license plate detection paired with PaddleOCR for plate text recognition.
Threat Detection¶
Weapon Detection¶
CRITICAL priority detection for security threats:
{
"threats": [
{
"threat_type": "knife",
"confidence": 0.85,
"bbox": [150, 200, 180, 280],
"severity": "high"
}
],
"has_threat": true,
"max_severity": "high"
}
Threat Classes:
| Class | Severity |
|---|---|
| gun, rifle, pistol | CRITICAL |
| knife | HIGH |
| bat, crowbar | MEDIUM |
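The table above corresponds to a simple severity lookup, and the `max_severity` field in the example response is the highest severity among detected classes. A sketch (the LOW default for unknown classes is an assumption):

```python
THREAT_SEVERITY = {
    "gun": "CRITICAL", "rifle": "CRITICAL", "pistol": "CRITICAL",
    "knife": "HIGH",
    "bat": "MEDIUM", "crowbar": "MEDIUM",
}

SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]

def max_severity(threat_classes):
    """Highest severity among detected threat classes, or None if empty."""
    severities = [THREAT_SEVERITY.get(t, "LOW") for t in threat_classes]
    if not severities:
        return None
    return max(severities, key=SEVERITY_ORDER.index)
```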
Risk Assessment¶
Nemotron LLM Analysis¶
The Nemotron-3-Nano-30B model provides contextual risk assessment:
Input Context:
- All detections in the batch
- Florence captions and descriptions
- Zone information and types
- Historical baseline comparison
- Household member matching
- Time of day and patterns
Output:
{
"risk_score": 45,
"risk_level": "medium",
"summary": "Unknown person approached front door at unusual hour",
"reasoning": "Activity at 2:14 AM when no family members are expected...",
"recommended_actions": ["Review footage", "Check if visitor expected"]
}
Risk Score Mapping:
| Score | Level | Color | Action |
|---|---|---|---|
| 0-29 | Low | Green | Informational only |
| 30-59 | Medium | Yellow | Review when convenient |
| 60-84 | High | Orange | Prompt review |
| 85-100 | Critical | Red | Immediate attention |
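The score-to-level mapping above as a function, for reference; scores are assumed to be integers already clamped to 0-100:

```python
def risk_level(score):
    """Map a 0-100 risk score to its level per the table above."""
    if score <= 29:
        return "low"
    if score <= 59:
        return "medium"
    if score <= 84:
        return "high"
    return "critical"
```

Note this agrees with the earlier example output, where a score of 45 carried `"risk_level": "medium"`.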
API Reference¶
Analytics Endpoints¶
| Endpoint | Method | Description |
|---|---|---|
| /api/analytics/detection-trends | GET | Daily detection counts |
| /api/analytics/risk-history | GET | Risk level distribution over time |
| /api/analytics/camera-uptime | GET | Uptime percentage per camera |
| /api/analytics/object-distribution | GET | Detection counts by object type |
| /api/analytics/risk-score-distribution | GET | Risk score histogram |
| /api/analytics/risk-score-trends | GET | Average risk score over time |
Query Parameters¶
All analytics endpoints accept:
| Parameter | Type | Description |
|---|---|---|
| start_date | Date | Start date (ISO format, required) |
| end_date | Date | End date (ISO format, required) |
| camera_id | String | Filter by camera (optional) |
Example Request¶
curl "http://localhost:8000/api/analytics/detection-trends?start_date=2026-01-01&end_date=2026-01-26"
Response Format¶
{
"data_points": [
{ "date": "2026-01-01", "count": 156 },
{ "date": "2026-01-02", "count": 203 }
],
"total_detections": 4521,
"start_date": "2026-01-01",
"end_date": "2026-01-26"
}
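The response above can be consumed with the standard library alone. A small sketch using the sample payload from this section:

```python
import json

# Sample payload copied from the Response Format section above.
payload = """
{
  "data_points": [
    { "date": "2026-01-01", "count": 156 },
    { "date": "2026-01-02", "count": 203 }
  ],
  "total_detections": 4521,
  "start_date": "2026-01-01",
  "end_date": "2026-01-26"
}
"""

trends = json.loads(payload)
# Index daily counts by date for easy lookup.
daily = {p["date"]: p["count"] for p in trends["data_points"]}
```

In practice the payload would come from a GET request to the endpoint rather than a literal string.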
Model Status API¶
Check Model Status¶
# Get all model statuses
curl http://localhost:8094/models/status
# Response
{
"vram_budget_mb": 6963.2,
"vram_used_mb": 2500,
"vram_utilization_percent": 35.9,
"loaded_models": [
{"name": "pose_estimator", "vram_mb": 300, "priority": "HIGH"}
]
}
Preload Models¶
# Preload a model before use
curl -X POST "http://localhost:8094/models/preload?model_name=threat_detector"
Best Practices¶
Optimizing Detection Quality¶
- Camera Placement: Ensure cameras have clear views of entry points
- Lighting: Good lighting improves detection accuracy
- Resolution: Higher resolution enables better detail extraction
- Zone Configuration: Focus analysis on important areas
Managing VRAM¶
- Priority Models: Keep critical models (threat detection) always ready
- Preloading: Preload expected models before high-activity periods
- Monitoring: Watch VRAM utilization via /models/status
Reducing False Positives¶
- Zone Configuration: Exclude high-motion areas (trees, roads)
- Household Registration: Add known people and vehicles
- Baseline Learning: Allow system to learn normal patterns
- Feedback: Use the feedback system to improve calibration
Troubleshooting¶
No Detections¶
- Check camera is uploading to correct directory
- Verify file watcher is running: curl http://localhost:8000/api/system/pipeline
- Check YOLO26 health: curl http://localhost:8095/health
- Review detection queue depth in system telemetry
Slow Analysis¶
- Check GPU utilization: curl http://localhost:8000/api/system/gpu
- Review pipeline latency: curl http://localhost:8000/api/system/pipeline-latency
- Consider adjusting batch window settings
- Check for VRAM pressure in model status
Inaccurate Risk Scores¶
- Review recent events for patterns
- Check zone configuration is accurate
- Register household members to reduce false positives
- Allow baseline learning time (7+ days recommended)
Related Documentation¶
- Zone Configuration Guide - Configure detection zones
- Face Recognition Guide - Person identification
- Analytics Endpoints - API reference
- AI Performance - Model monitoring