Operator Hub¶
Deploy, configure, and maintain Home Security Intelligence.
This hub is for sysadmins, DevOps engineers, and technically savvy users who deploy and maintain the system. For end-user documentation, see the User Hub. For development and contribution, see the Developer Hub.
Quick Navigation¶
| Section | Description | Link |
|---|---|---|
| Deployment | Docker/Podman setup, GPU passthrough, AI services | deployment/ |
| Monitoring | Health checks, GPU metrics, SLOs, alerting | monitoring/ |
| Administration | Configuration, secrets, security | admin/ |
Quick Deploy¶
Estimated deployment time: 30-45 minutes (including model downloads)
# 1. Clone and setup
git clone https://github.com/your-org/home-security-intelligence.git
cd home-security-intelligence
python setup.py # Quick mode - generates .env with secure passwords
# 2. Download AI models (~2.7GB)
./ai/download_models.sh
# 3. Start services
docker compose -f docker-compose.prod.yml up -d
# 4. Verify
curl http://localhost:8000/api/system/health/ready
System Requirements¶
| Component | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA 8GB VRAM | NVIDIA 12GB+ VRAM |
| CPU | 4 cores | 8+ cores |
| RAM | 16GB | 32GB+ |
| Storage | 50GB | 100GB+ SSD |
| CUDA | 11.8+ | 12.x |
AI VRAM Usage (Production):
Supported GPUs: RTX 30/40 series, RTX A-series, Tesla/V100/A100
Service Architecture¶
Camera uploads --> backend FileWatcher --> detection_queue
--> YOLO26 (8095) --> detections (DB)
--> batching + enrichment
--> Nemotron (8091) --> events (DB)
--> WebSocket dashboard
Deployment Architecture Diagram¶
The following diagram shows the complete container topology, network connections, and data flows:
%%{init: {
'theme': 'dark',
'themeVariables': {
'primaryColor': '#3B82F6',
'primaryTextColor': '#FFFFFF',
'primaryBorderColor': '#60A5FA',
'secondaryColor': '#A855F7',
'tertiaryColor': '#009688',
'background': '#121212',
'mainBkg': '#1a1a2e',
'lineColor': '#666666'
}
}}%%
flowchart TB
subgraph External["External Access"]
Browser["Browser<br/>:5173 / :8443"]
Camera["Foscam Cameras<br/>FTP Upload"]
end
subgraph Frontend["Frontend Layer"]
FE["frontend<br/>nginx :8080<br/>Ports: 5173, 8443"]
end
subgraph Backend["Backend Layer"]
BE["backend<br/>FastAPI :8000<br/>Port: 8000"]
end
subgraph AI["AI Services (GPU)"]
YOLO["ai-yolo26<br/>TensorRT :8095<br/>GPU: YOLO26"]
LLM["ai-llm<br/>Nemotron :8091<br/>GPU: LLM"]
FLOR["ai-florence<br/>Florence-2 :8092<br/>GPU: Florence"]
CLIP["ai-clip<br/>CLIP :8093<br/>GPU: CLIP"]
ENR["ai-enrichment<br/>Heavy Models :8094<br/>GPU: Enrichment"]
ENRL["ai-enrichment-light<br/>Light Models :8096<br/>GPU: CLIP"]
end
subgraph Data["Data Layer"]
PG[("postgres<br/>PostgreSQL :5432")]
RD[("redis<br/>Redis :6379")]
ES[("elasticsearch<br/>ES :9200")]
end
subgraph Monitoring["Monitoring Stack"]
PROM["prometheus<br/>:9090"]
GRAF["grafana<br/>:3002"]
JAEG["jaeger<br/>:16686"]
LOKI["loki<br/>:3100"]
PYRO["pyroscope<br/>:4040"]
ALLOY["alloy<br/>:12345"]
AM["alertmanager<br/>:9093"]
BB["blackbox-exporter<br/>:9115"]
RE["redis-exporter<br/>:9121"]
JE["json-exporter<br/>:7979"]
end
%% External connections
Browser --> FE
Camera --> BE
%% Frontend to Backend
FE -->|"HTTP/WS"| BE
%% Backend to Data
BE -->|"asyncpg"| PG
BE -->|"aioredis"| RD
%% Backend to AI Services
BE -->|"HTTP"| YOLO
BE -->|"HTTP"| LLM
BE -->|"HTTP"| FLOR
BE -->|"HTTP"| CLIP
BE -->|"HTTP"| ENR
BE -->|"HTTP"| ENRL
%% Monitoring connections
PROM --> BE
PROM --> YOLO
PROM --> LLM
PROM --> RE
PROM --> JE
PROM --> BB
PROM --> AM
GRAF --> PROM
GRAF --> LOKI
GRAF --> JAEG
GRAF --> PYRO
JAEG --> ES
ALLOY --> LOKI
ALLOY --> PYRO
BE -->|"OTLP"| ALLOY MQTT Integration¶
The backend publishes detection events to an MQTT broker, enabling integration with Home Assistant, Node-RED, and custom automation scripts. Home Assistant auto-discovery is supported via the homeassistant/discovery topic.
flowchart TB
subgraph Backend["Backend Services"]
PUB["MQTT Publisher"]
CMD["Command Handler"]
HA["Home Assistant<br/>Auto-Discovery"]
end
subgraph Broker["MQTT Broker"]
T1["events/detections"]
T2["commands/#"]
T3["homeassistant/discovery"]
end
subgraph External["External Consumers"]
HASS["Home Assistant"]
NR["Node-RED"]
CUSTOM["Custom Scripts"]
end
PUB --> T1
T2 --> CMD
HA --> T3
T1 --> HASS & NR & CUSTOM
T3 --> HASS
HASS -->|Commands| T2 Network: All services connect via the security-net bridge network for internal DNS resolution.
Volume Mounts:
| Service | Volume | Purpose |
|---|---|---|
| postgres | postgres_data | Database persistence |
| redis | redis_data | Cache persistence |
| elasticsearch | elasticsearch_data | Trace storage |
| prometheus | prometheus_data | Metrics storage |
| grafana | grafana_data | Dashboard persistence |
| loki | loki_data | Log storage |
| pyroscope | pyroscope_data | Profile storage |
| alertmanager | alertmanager_data | Alert state |
| frontend | frontend_certs | SSL certificates |
| ai-clip | clip-tensorrt-cache | TensorRT engine cache |
| ai-enrichment-light | enrichment-light-tensorrt-cache | TensorRT engine cache |
| ai-florence, ai-enrichment | hf_cache | HuggingFace model cache |
| backend | /cameras (bind mount) | Camera FTP directory |
| backend | /models/model-zoo (bind) | AI model files |
Ports Reference¶
| Service | Port | Purpose |
|---|---|---|
| Frontend | 80 | Web dashboard (production) |
| Frontend | 5173 | Web dashboard (development) |
| Backend | 8000 | REST API + WebSocket |
| YOLO26 | 8095 | Object detection service |
| Nemotron | 8091 | LLM risk analysis service |
| Florence-2 | 8092 | Vision extraction (optional) |
| CLIP | 8093 | Re-identification (optional) |
| Enrichment | 8094 | Vehicle/pet classification (optional) |
| Enrichment Light | 8096 | Lightweight enrichment (optional) |
| PostgreSQL | 5432 | Database |
| Redis | 6379 | Cache + message broker |
Quick Commands¶
Service Management¶
# Start all services (production)
docker compose -f docker-compose.prod.yml up -d
# Stop all services
docker compose -f docker-compose.prod.yml down
# View logs
docker compose -f docker-compose.prod.yml logs -f
docker compose -f docker-compose.prod.yml logs -f backend
# Restart a service
docker compose -f docker-compose.prod.yml restart backend
Health Checks¶
# System health
curl http://localhost:8000/api/system/health/ready
# Full health with circuit breakers
curl http://localhost:8000/api/system/health/full
# AI services
curl http://localhost:8095/health # YOLO26
curl http://localhost:8091/health # Nemotron
# Database
docker compose exec postgres pg_isready
# Redis
docker compose exec redis redis-cli ping
GPU Management¶
# GPU status
nvidia-smi
# GPU memory usage
nvidia-smi --query-gpu=memory.used,memory.total --format=csv
# Kill GPU processes (emergency)
fuser -k /dev/nvidia*
Detailed Guides¶
Deployment¶
- Complete Deployment Guide - Docker/Podman setup, compose files, GHCR images
- GPU Setup Guide - NVIDIA drivers, container toolkit, CDI
- AI Services Guide - YOLO26, Nemotron, optional services
- Deployment Modes - AI networking for different setups
Monitoring¶
- Monitoring Guide - Health checks, GPU metrics, DLQ
- Prometheus Alerting - Alert rules, Alertmanager
- Service Level Objectives - SLIs, SLOs, error budgets
Administration¶
- Administration Guide - Configuration, secrets, security
- Backup and Recovery - Database backup, disaster recovery
- Redis Setup - Authentication, persistence
Troubleshooting¶
Quick Diagnostics¶
# Comprehensive health check
curl http://localhost:8000/api/system/health/full | jq
# Container status
docker compose -f docker-compose.prod.yml ps
# Recent logs
docker compose -f docker-compose.prod.yml logs --tail=100 backend
# GPU availability
nvidia-smi
Common Issues¶
| Issue | Quick Fix |
|---|---|
| AI services unreachable | Check Deployment Modes for correct URLs |
| GPU out of memory | Close other GPU apps, restart AI services |
| Database connection failed | Verify DATABASE_URL, check PostgreSQL is running |
| Redis auth failed | Check REDIS_PASSWORD in .env, see Redis Setup |
| WebSocket won't connect | Check CORS settings, verify backend is healthy |
| Images not processing | Check FOSCAM_BASE_PATH, enable FILE_WATCHER_POLLING for Docker Desktop |
| DLQ jobs accumulating | Verify AI services healthy, check DLQ Management |
Getting Help¶
When reporting issues, collect:
# System health
curl http://localhost:8000/api/system/health | jq
# GPU info
nvidia-smi
# Container status
docker compose -f docker-compose.prod.yml ps
# Recent logs
docker compose -f docker-compose.prod.yml logs --tail=100 backend
See Also¶
- User Hub - End-user documentation
- Developer Hub - Development and contribution
- API Reference - REST and WebSocket APIs
- Architecture Overview - System design