Deployment Guide

Complete guide for deploying Home Security Intelligence with Docker/Podman and GPU-accelerated AI services.


Deployment Architecture

The following diagram shows the complete production deployment topology with all containers, their connections, ports, and GPU assignments.

%%{init: {
  'theme': 'dark',
  'themeVariables': {
    'primaryColor': '#3B82F6',
    'primaryTextColor': '#FFFFFF',
    'primaryBorderColor': '#60A5FA',
    'secondaryColor': '#A855F7',
    'tertiaryColor': '#009688',
    'background': '#121212',
    'mainBkg': '#1a1a2e',
    'lineColor': '#666666'
  }
}}%%
flowchart TB
    subgraph External["External Access Points"]
        direction LR
        USER["User Browser"]
        CAM["Foscam Cameras<br/>(FTP Upload)"]
        ADMN["Admin/Ops"]
    end

    subgraph Network["security-net (Bridge Network)"]
        subgraph FrontendLayer["Frontend Layer"]
            FE["<b>frontend</b><br/>nginx-unprivileged<br/>Internal: 8080<br/>External: 5173 (HTTP), 8443 (HTTPS)<br/>Memory: 512M"]
        end

        subgraph BackendLayer["Backend Layer"]
            BE["<b>backend</b><br/>FastAPI + Uvicorn<br/>Port: 8000<br/>Memory: 6G<br/>CPU: 2 cores"]
        end

        subgraph AILayer["AI Services Layer (GPU Required)"]
            direction LR
            subgraph GPU0["GPU 0 (Primary - High VRAM)"]
                LLM["<b>ai-llm</b><br/>Nemotron 30B<br/>Port: 8091<br/>~14.7GB VRAM"]
                FLOR["<b>ai-florence</b><br/>Florence-2<br/>Port: 8092<br/>~530MB VRAM"]
            end
            subgraph GPU1["GPU 1 (Secondary - 4GB+)"]
                YOLO["<b>ai-yolo26</b><br/>YOLO26 TensorRT<br/>Port: 8095<br/>~2GB VRAM"]
                CLIP["<b>ai-clip</b><br/>CLIP ViT-L<br/>Port: 8093<br/>~722MB VRAM"]
                ENRL["<b>ai-enrichment-light</b><br/>Pose, Threat, ReID, Pet, Depth<br/>Port: 8096<br/>~1.2GB VRAM"]
            end
            ENR["<b>ai-enrichment</b><br/>Vehicle, Fashion, Age, Gender, Action<br/>Port: 8094<br/>GPU: Configurable<br/>~4.3GB VRAM"]
        end

        subgraph DataLayer["Data Layer"]
            PG[("<b>postgres</b><br/>PostgreSQL 16-alpine<br/>Port: 5432<br/>Memory: 1G")]
            RD[("<b>redis</b><br/>Redis 7.4-alpine<br/>Port: 6379<br/>Memory: 512M")]
            ES[("<b>elasticsearch</b><br/>ES 8.12<br/>Port: 9200<br/>Memory: 4G")]
        end

        subgraph MonitoringLayer["Monitoring & Observability"]
            direction TB
            subgraph MetricsTracing["Metrics & Tracing"]
                PROM["<b>prometheus</b><br/>Port: 9090<br/>Memory: 512M"]
                JAEG["<b>jaeger</b><br/>Port: 16686<br/>Memory: 512M"]
                GRAF["<b>grafana</b><br/>Port: 3002<br/>Memory: 256M"]
            end
            subgraph LogsProfiling["Logs & Profiling"]
                LOKI["<b>loki</b><br/>Port: 3100<br/>Memory: 512M"]
                PYRO["<b>pyroscope</b><br/>Port: 4040<br/>Memory: 512M"]
                ALLOY["<b>alloy</b><br/>Port: 12345<br/>Memory: 768M"]
            end
            subgraph Exporters["Exporters & Alerting"]
                AM["<b>alertmanager</b><br/>Port: 9093<br/>Memory: 128M"]
                BB["<b>blackbox-exporter</b><br/>Port: 9115"]
                RE["<b>redis-exporter</b><br/>Port: 9121"]
                JE["<b>json-exporter</b><br/>Port: 7979"]
            end
        end
    end

    %% External connections
    USER -->|"HTTP :5173<br/>HTTPS :8443"| FE
    CAM -->|"FTP to<br/>/cameras mount"| BE
    ADMN -->|"Grafana :3002<br/>Prometheus :9090<br/>Jaeger :16686"| MonitoringLayer

    %% Frontend to Backend
    FE -->|"Proxy /api, /ws"| BE

    %% Backend to Data
    BE -->|"asyncpg"| PG
    BE -->|"aioredis"| RD

    %% Backend to AI (HTTP inference calls)
    BE -->|"POST /detect"| YOLO
    BE -->|"POST /v1/completions"| LLM
    BE -->|"POST /caption"| FLOR
    BE -->|"POST /embed"| CLIP
    BE -->|"POST /analyze"| ENR
    BE -->|"POST /analyze"| ENRL

    %% Monitoring data flows
    PROM -.->|"scrape /metrics"| BE
    PROM -.->|"scrape"| YOLO
    PROM -.->|"scrape"| LLM
    PROM -.->|"scrape"| RE
    PROM -.->|"scrape"| BB
    PROM -.->|"scrape"| JE
    PROM -->|"alert rules"| AM
    AM -->|"webhooks"| BE

    GRAF -->|"query"| PROM
    GRAF -->|"query"| LOKI
    GRAF -->|"query"| JAEG
    GRAF -->|"query"| PYRO

    JAEG -->|"store spans"| ES

    ALLOY -->|"push logs"| LOKI
    ALLOY -->|"push profiles"| PYRO
    BE -->|"OTLP traces"| ALLOY

    %% Health check dependencies (startup order)
    BE -.->|"depends_on<br/>healthy"| PG
    BE -.->|"depends_on<br/>healthy"| RD
    BE -.->|"depends_on<br/>healthy"| YOLO
    BE -.->|"depends_on<br/>healthy"| LLM
    FE -.->|"depends_on<br/>healthy"| BE
    PROM -.->|"depends_on<br/>healthy"| AM
    JAEG -.->|"depends_on<br/>healthy"| ES

Architecture Summary

| Layer | Services | Resource Profile |
|---|---|---|
| Frontend | nginx reverse proxy | 512M RAM, 1 CPU |
| Backend | FastAPI application server | 6G RAM, 2 CPUs, GPU access |
| AI Services | YOLO26, Nemotron, Florence-2, CLIP, Enrichment (light + heavy) | GPU required (~19GB total) |
| Data | PostgreSQL, Redis, Elasticsearch | 5.5G RAM total |
| Monitoring | Prometheus, Grafana, Jaeger, Loki, Pyroscope, Alloy | ~3G RAM total |

GPU Assignment Strategy

The default GPU assignment distributes models across two GPUs:

| GPU | Services | Total VRAM | Typical GPU |
|---|---|---|---|
| GPU 0 | Nemotron LLM, Florence-2 | ~15.2GB | RTX A5500/RTX 4090 |
| GPU 1 | YOLO26, CLIP, Enrichment-Light | ~2.9GB | RTX A400/RTX 3060 |

Enrichment (heavy models) defaults to GPU 1 but can be reassigned via the GPU_ENRICHMENT environment variable.
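As a sketch, the override can be persisted in `.env` (the `.env` handling below is illustrative; the guide only establishes that `GPU_ENRICHMENT` selects the device):

```shell
# Illustrative: pin the heavy enrichment service to GPU 0 instead of the
# default GPU 1. GPU_ENRICHMENT is assumed to be a CUDA device index.
export GPU_ENRICHMENT=0

# Persist the override in .env (created by setup.py); update in place if
# present, otherwise append.
grep -q '^GPU_ENRICHMENT=' .env 2>/dev/null \
  && sed -i.bak 's/^GPU_ENRICHMENT=.*/GPU_ENRICHMENT=0/' .env \
  || echo "GPU_ENRICHMENT=0" >> .env

tail -n 1 .env
```

Restart the enrichment container afterwards so the new device assignment takes effect.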


Quick Start

# 1. Clone repository
git clone https://github.com/your-org/home-security-intelligence.git
cd home-security-intelligence

# 2. Run setup (generates .env with secure passwords)
python setup.py              # Quick mode
python setup.py --guided     # Guided mode with explanations

# 3. Download AI models (~2.7GB)
./ai/download_models.sh

# 4. Start services
docker compose -f docker-compose.prod.yml up -d

# 5. Verify deployment
curl http://localhost:8000/api/system/health/ready

Prerequisites

Hardware Requirements

| Resource | Minimum | Recommended | Purpose |
|---|---|---|---|
| CPU | 4 cores | 8 cores | Backend workers, AI inference |
| RAM | 16 GB | 32 GB | Services + AI model loading |
| GPU VRAM | 8 GB | 24 GB | YOLO26 + Nemotron + optional models |
| Disk Space | 100 GB | 500 GB | Database, logs, media files |
| Camera Storage | 50 GB | 200 GB | FTP upload directory |
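A quick way to compare a Linux host against these minimums (GNU coreutils assumed for `df --output`):

```shell
# Report total RAM and free disk space against the minimums above.
mem_gb=$(awk '/MemTotal/ {printf "%d", $2/1024/1024}' /proc/meminfo)
disk_gb=$(df -BG --output=avail . | tail -n 1 | tr -dc '0-9')

echo "RAM: ${mem_gb} GB (minimum 16, recommended 32)"
echo "Free disk: ${disk_gb} GB (minimum 100, recommended 500)"
```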

Software Requirements

| Software | Version | Purpose | Installation |
|---|---|---|---|
| Docker or Podman | 20.10+ | Container runtime | See Container Runtime Setup |
| NVIDIA Driver | 535+ | GPU support | apt install nvidia-driver-535 |
| nvidia-container-toolkit | 1.13+ | GPU passthrough | See GPU Passthrough |
| PostgreSQL Client | 15+ | Database administration | apt install postgresql-client |

Network Requirements

| Port | Service | Protocol | Access |
|---|---|---|---|
| 80/5173 | Frontend | HTTP | Browser |
| 8000 | Backend API | HTTP/WS | Frontend |
| 8095 | YOLO26 | HTTP | Backend |
| 8091 | Nemotron | HTTP | Backend |
| 8092 | Florence-2 | HTTP | Backend (optional) |
| 8093 | CLIP | HTTP | Backend (optional) |
| 8094 | Enrichment | HTTP | Backend (optional) |
| 5432 | PostgreSQL | TCP | Backend |
| 6379 | Redis | TCP | Backend |
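Before deploying, it is worth confirming none of these ports are already taken. A pure-bash probe (using bash's `/dev/tcp` redirection, so no extra tools are needed):

```shell
# Pure-bash port probe: a successful /dev/tcp connect means something is
# already listening on that port.
in_use=0; free=0
for port in 5173 8000 8091 8092 8093 8094 8095 5432 6379; do
  if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
    echo "port $port: in use"; in_use=$((in_use + 1))
  else
    echo "port $port: free"; free=$((free + 1))
  fi
done
echo "checked $((in_use + free)) ports (${in_use} in use)"
```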

Container Runtime Setup

This project supports Docker Engine, Docker Desktop, and Podman.

| Runtime | Platform | License | Installation |
|---|---|---|---|
| Docker Engine | Linux | Free | apt install docker.io |
| Docker Desktop | macOS, Windows, Linux | Commercial | docker.com |
| Podman | Linux, macOS | Free (Apache 2.0) | brew install podman or dnf install podman |

Docker Setup

# Install Docker Engine (Linux)
sudo apt install docker.io docker-compose-plugin

# Verify installation
docker --version
docker compose version

Podman Setup

# macOS
brew install podman podman-compose
podman machine init
podman machine start

# Linux (Fedora/RHEL)
sudo dnf install podman podman-compose

# Verify installation
podman info

Command Equivalents

| Docker | Podman |
|---|---|
| docker compose up -d | podman-compose up -d |
| docker compose down | podman-compose down |
| docker compose logs | podman-compose logs |
| docker ps | podman ps |
| docker build | podman build |
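The equivalence above means scripts can stay runtime-agnostic by detecting whichever engine is installed. A minimal sketch (the `ENGINE`/`COMPOSE` variable names are illustrative):

```shell
# Pick whichever container engine is available so later commands can be
# written once against $ENGINE / $COMPOSE.
if command -v docker >/dev/null 2>&1; then
  ENGINE=docker COMPOSE="docker compose"
elif command -v podman >/dev/null 2>&1; then
  ENGINE=podman COMPOSE="podman-compose"
else
  ENGINE=none COMPOSE=""
fi
echo "using engine: ${ENGINE}"
```

Usage then looks like `$COMPOSE -f docker-compose.prod.yml up -d` regardless of runtime.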

GPU Passthrough

AI services require NVIDIA GPU access via Container Device Interface (CDI).

Prerequisites

  1. NVIDIA driver 535+
  2. NVIDIA Container Toolkit

Verify GPU Access

# Verify NVIDIA driver
nvidia-smi

# Test Docker GPU access
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

# Test Podman GPU access (CDI)
podman run --rm --device nvidia.com/gpu=all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

Install NVIDIA Container Toolkit

# Ubuntu/Debian
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit

# Configure Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
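For Podman, the CDI device spec must exist before `--device nvidia.com/gpu=all` resolves; a sketch of generating it with the toolkit's `nvidia-ctk cdi generate` command (output path is the standard CDI directory):

```shell
# Podman reaches GPUs through CDI; generate the device spec once per host.
# Requires nvidia-container-toolkit to be installed.
if command -v nvidia-ctk >/dev/null 2>&1; then
  sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
  cdi_status="generated"
else
  echo "nvidia-ctk not found; install nvidia-container-toolkit first"
  cdi_status="missing-toolkit"
fi
echo "CDI spec: $cdi_status"
```

Re-run the generation step after driver upgrades, since the spec records driver file paths.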

Compose Files

| File | Purpose | AI Services | Use Case |
|---|---|---|---|
| docker-compose.yml | Development | Host (native) | Local development, hot reload |
| docker-compose.prod.yml | Production | Containerized | Full deployment with GPU |
| docker-compose.ghcr.yml | Pre-built | External | Fast deploy from GHCR images |

Deployment Mode Selection Guide

Choose your deployment mode based on your needs:

| Question | Recommended Mode |
|---|---|
| First time deploying / Want simplest setup? | Production (docker-compose.prod.yml) |
| Developing locally with code hot-reload? | Development (docker-compose.yml + host AI) |
| Need GPU debugging / AI runs better on host? | Hybrid (container backend + host AI) |
| Have a dedicated GPU server? | Remote AI host mode |
| Want fastest deployment from pre-built images? | GHCR (docker-compose.ghcr.yml) |

Decision flowchart:

  1. Production deployment? Use docker-compose.prod.yml - everything containerized, no networking complexity
  2. Active development? Use docker-compose.yml with host AI for hot-reload and easier debugging
  3. GPU issues in containers? Run AI services on host, backend in container (see Deployment Modes)

Tip: If AI services are unreachable, it's usually a networking mode mismatch. See Deployment Modes & AI Networking for URL configuration by mode.

Production Deployment

# Start all services
docker compose -f docker-compose.prod.yml up -d

# View logs
docker compose -f docker-compose.prod.yml logs -f

# Stop services
docker compose -f docker-compose.prod.yml down

Development with Host AI

# Terminal 1: Start YOLO26
./ai/start_detector.sh

# Terminal 2: Start Nemotron
./ai/start_llm.sh

# Terminal 3: Start application stack
docker compose up -d

Deploy from GHCR

# Set image location
export GHCR_OWNER=your-org
export GHCR_REPO=home-security-intelligence
export IMAGE_TAG=latest

# Authenticate (requires GitHub token with read:packages)
echo $GITHUB_TOKEN | docker login ghcr.io -u YOUR_USERNAME --password-stdin

# Deploy
docker compose -f docker-compose.ghcr.yml up -d

Deployment Options

Cross-Platform Host Resolution

When backend is containerized but AI runs on the host:

| Platform | Container Runtime | Host Resolution |
|---|---|---|
| macOS | Docker Desktop | host.docker.internal (default) |
| macOS | Podman | host.containers.internal |
| Linux | Docker Engine | Host IP address |
| Linux | Podman | Host IP address |

Container Networking Resolution Flowchart

flowchart TD
    Start([Start: Resolve AI Host]) --> Platform{What platform?}

    Platform -->|macOS| MacRuntime{Container Runtime?}
    Platform -->|Linux| LinuxRuntime{Container Runtime?}

    MacRuntime -->|Docker Desktop| MacDocker[Use: host.docker.internal]
    MacRuntime -->|Podman| MacPodman[Use: host.containers.internal]

    LinuxRuntime -->|Docker Engine| LinuxDocker[Use: Host IP Address]
    LinuxRuntime -->|Podman| LinuxPodman[Use: Host IP Address]

    MacDocker --> SetEnv1["export AI_HOST=host.docker.internal"]
    MacPodman --> SetEnv2["export AI_HOST=host.containers.internal"]
    LinuxDocker --> GetIP["AI_HOST=$(hostname -I | awk '{print $1}')"]
    LinuxPodman --> GetIP

    SetEnv1 --> Verify{Test Connection}
    SetEnv2 --> Verify
    GetIP --> SetEnv3["export AI_HOST=$AI_HOST"] --> Verify

    Verify -->|Success| Done([AI Services Reachable])
    Verify -->|Fail| Debug[Check firewall and service status]
    Debug --> Verify

    style Start fill:#e1f5fe
    style Done fill:#c8e6c9
    style Debug fill:#ffecb3

# macOS with Docker Desktop (default, no action needed)
docker compose up -d

# macOS with Podman
export AI_HOST=host.containers.internal
podman-compose up -d

# Linux (Docker or Podman)
export AI_HOST=$(hostname -I | awk '{print $1}')
docker compose up -d

AI Service URLs by Deployment Mode

Production (docker-compose.prod.yml):

# AI services on compose network (internal DNS)
YOLO26_URL=http://ai-yolo26:8095
NEMOTRON_URL=http://ai-llm:8091
FLORENCE_URL=http://ai-florence:8092
CLIP_URL=http://ai-clip:8093
ENRICHMENT_URL=http://ai-enrichment:8094

Development with host AI:

YOLO26_URL=http://localhost:8095
NEMOTRON_URL=http://localhost:8091

Docker Desktop (macOS/Windows):

YOLO26_URL=http://host.docker.internal:8095
NEMOTRON_URL=http://host.docker.internal:8091
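The three modes above can be collapsed into one resolution rule: use compose-network DNS names unless `AI_HOST` is set. A sketch (the backend's actual env handling may differ):

```shell
# Resolve AI base URLs: compose-network DNS names by default, AI_HOST when
# the backend must reach services running on the host.
if [ -n "${AI_HOST:-}" ]; then
  YOLO26_URL="http://${AI_HOST}:8095"
  NEMOTRON_URL="http://${AI_HOST}:8091"
else
  YOLO26_URL="http://ai-yolo26:8095"
  NEMOTRON_URL="http://ai-llm:8091"
fi
echo "YOLO26_URL=${YOLO26_URL}"
echo "NEMOTRON_URL=${NEMOTRON_URL}"
```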

AI Services Setup

AI Architecture

The system supports a multi-service AI stack:

| Service | Port | VRAM | Purpose |
|---|---|---|---|
| YOLO26 | 8095 | ~2GB | Object detection |
| Nemotron | 8091 | ~3GB (4B) / ~14.7GB (30B) | Risk reasoning |
| Florence-2 | 8092 | ~2GB | Vision extraction (optional) |
| CLIP | 8093 | ~2GB | Re-identification (optional) |
| Enrichment | 8094 | ~4GB | Vehicle/pet/clothing (optional) |

Model Downloads

# Automated download
./ai/download_models.sh

# What it downloads:
# - Nemotron Mini 4B (~2.5GB) for development
# - YOLO26 auto-downloads on first use via HuggingFace
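After downloading, a quick file check confirms the dev model landed on disk. The `ai/models/` path below is an assumption; adjust it to wherever your start scripts look:

```shell
# Check that the dev model file exists (path is an assumption; the filename
# comes from the model specification table).
MODEL=ai/models/nemotron-mini-4b-instruct-q4_k_m.gguf
if [ -f "$MODEL" ]; then
  model_status="present ($(du -h "$MODEL" | cut -f1))"
else
  model_status="missing (run ./ai/download_models.sh)"
fi
echo "Nemotron Mini 4B: $model_status"
```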

Production Model Specifications

| Model | File | Size | VRAM | Context |
|---|---|---|---|---|
| NVIDIA Nemotron-3-Nano-30B-A3B | Nemotron-3-Nano-30B-A3B-Q4_K_M.gguf | ~18 GB | ~14.7 GB | 131,072 |
| Nemotron Mini 4B (dev) | nemotron-mini-4b-instruct-q4_k_m.gguf | ~2.5 GB | ~3 GB | 4,096 |

Verify AI Services

# Health checks
curl http://localhost:8095/health   # YOLO26
curl http://localhost:8091/health   # Nemotron
curl http://localhost:8092/health   # Florence-2 (optional)
curl http://localhost:8093/health   # CLIP (optional)
curl http://localhost:8094/health   # Enrichment (optional)
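The individual checks above can be rolled into one loop that reports up/down per service (assumes `curl`; bash is required for the associative array):

```shell
# Probe every AI health endpoint with a short timeout and report status.
declare -A ai_ports=(
  [YOLO26]=8095 [Nemotron]=8091 [Florence-2]=8092 [CLIP]=8093 [Enrichment]=8094
)
checked=0
for name in "${!ai_ports[@]}"; do
  if curl -fsS -m 2 "http://localhost:${ai_ports[$name]}/health" >/dev/null 2>&1; then
    echo "$name: up"
  else
    echo "$name: down"
  fi
  checked=$((checked + 1))
done
echo "probed $checked services"
```

Remember that the optional services (Florence-2, CLIP, Enrichment) reporting down may simply mean they are not enabled.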

Service Dependencies

Startup Order

Services start in dependency order via Docker Compose health checks.

Service Startup Sequence Diagram

sequenceDiagram
    autonumber
    participant DC as Docker Compose
    participant PG as PostgreSQL
    participant RD as Redis
    participant RT as YOLO26
    participant NM as Nemotron
    participant BE as Backend
    participant FE as Frontend

    Note over DC,FE: Phase 1: Data Infrastructure (0-15s)
    DC->>PG: Start PostgreSQL
    DC->>RD: Start Redis
    PG-->>DC: Healthy (10-15s)
    RD-->>DC: Healthy (5-10s)

    Note over DC,FE: Phase 2: AI Services (60-180s)
    DC->>RT: Start YOLO26
    DC->>NM: Start Nemotron
    RT-->>DC: Healthy (60-90s, model loading)
    NM-->>DC: Healthy (90-120s, VRAM allocation)

    Note over DC,FE: Phase 3: Application (30-60s)
    DC->>BE: Start Backend (depends: PG, RD)
    BE->>PG: Connect
    BE->>RD: Connect
    BE-->>DC: Healthy (30-60s)

    Note over DC,FE: Phase 4: Frontend (10-20s)
    DC->>FE: Start Frontend (depends: BE)
    FE->>BE: Health check
    FE-->>DC: Healthy (10-20s)

Phase 1: Data Infrastructure (0-15s)

  • PostgreSQL (~10-15s)
  • Redis (~5-10s)

Phase 2: AI Services (60-180s)

  • YOLO26 (~60-90s, model loading)
  • Nemotron (~90-120s, VRAM allocation)
  • Florence-2, CLIP, Enrichment (optional)

Phase 3: Application (30-60s)

  • Backend (~30-60s, waits for DB + Redis)

Phase 4: Frontend (10-20s)

  • Frontend (~10-20s, waits for Backend)

Health Check Configuration

# docker-compose.prod.yml example
backend:
  healthcheck:
    test:
      [
        'CMD',
        'python',
        '-c',
        "import httpx; r = httpx.get('http://localhost:8000/api/system/health/ready'); exit(0 if r.status_code == 200 else 1)",
      ]
    interval: 10s
    timeout: 5s
    retries: 3
    start_period: 30s
  depends_on:
    postgres:
      condition: service_healthy
    redis:
      condition: service_healthy

Dependency Matrix

| Service | Hard Dependencies | Soft Dependencies | Auto-Recovers |
|---|---|---|---|
| PostgreSQL | None | None | N/A |
| Redis | None | None | N/A |
| YOLO26 | GPU | None | No |
| Nemotron | GPU | None | No |
| Backend | PostgreSQL, Redis | AI Services | AI via monitor |
| Frontend | Backend | None | No |

Deployment Checklist

Pre-Deployment

  • [ ] Docker/Podman installed and running
  • [ ] NVIDIA driver and container toolkit installed (nvidia-smi works)
  • [ ] Camera FTP directory exists and is accessible
  • [ ] AI models downloaded (./ai/download_models.sh)
  • [ ] Network ports are not in use by other services
  • [ ] Firewall rules allow required traffic
  • [ ] .env file created via python setup.py
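Several of these items can be verified automatically. A preflight sketch (covers only the machine-checkable items; the camera directory and firewall checks are site-specific):

```shell
# Preflight: automate the machine-checkable parts of the checklist above.
ok=1
command -v docker >/dev/null 2>&1 || command -v podman >/dev/null 2>&1 \
  || { echo "FAIL: no container runtime found"; ok=0; }
command -v nvidia-smi >/dev/null 2>&1 \
  || { echo "FAIL: nvidia-smi not found (driver missing?)"; ok=0; }
[ -f .env ] \
  || { echo "FAIL: .env missing (run: python setup.py)"; ok=0; }
[ "$ok" -eq 1 ] && echo "preflight passed" || echo "preflight failed"
```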

Deployment Steps

  1. Start services:
docker compose -f docker-compose.prod.yml up -d
  2. Monitor startup:
docker compose -f docker-compose.prod.yml logs -f
  3. Verify health:
# Wait for services (Redis: ~5s, Postgres: ~15s, AI: ~120s, Backend: ~60s)
curl http://localhost:8000/api/system/health/ready
  4. Test AI pipeline:
# Copy test image
cp backend/data/test_images/sample.jpg /export/foscam/test_camera/test_$(date +%s).jpg

# Monitor processing
docker compose -f docker-compose.prod.yml logs -f backend | grep -E "detect|batch|analyze"
  5. Access dashboard:
  • Open browser to http://localhost:5173 (dev) or http://localhost (prod)
  • Verify WebSocket connection status
  • Check camera grid and activity feed

Post-Deployment

  • [ ] Dashboard accessible
  • [ ] Health endpoint returns healthy
  • [ ] WebSocket connection working
  • [ ] Test image processed successfully
  • [ ] GPU metrics displaying

Upgrade Procedures

Pre-Upgrade

  • [ ] Read release notes for breaking changes
  • [ ] Backup database
  • [ ] Check disk space (at least 10 GB free)
  • [ ] Review new environment variables in .env.example

Upgrade Steps

# 1. Backup
docker compose -f docker-compose.prod.yml exec -T postgres pg_dump -U security -d security -F c > backup-pre-upgrade-$(date +%Y%m%d).dump
cp .env .env.backup-$(date +%Y%m%d)

# 2. Pull updates
git fetch origin
git pull origin main

# 3. Review config changes
diff .env.example .env

# 4. Stop services
docker compose -f docker-compose.prod.yml down

# 5. Apply database migrations
docker compose -f docker-compose.prod.yml up -d postgres
until docker compose -f docker-compose.prod.yml exec postgres pg_isready -U security; do sleep 1; done
docker compose -f docker-compose.prod.yml run --rm backend alembic upgrade head

# 6. Rebuild and start
docker compose -f docker-compose.prod.yml build --no-cache
docker compose -f docker-compose.prod.yml up -d

# 7. Verify
curl http://localhost:8000/api/system/health/ready
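Before proceeding past step 4, it is worth confirming the step-1 backup actually exists and is non-trivial in size (filename pattern taken from the backup command above):

```shell
# Sanity-check the pre-upgrade dump before stopping services.
dump=$(ls -t backup-pre-upgrade-*.dump 2>/dev/null | head -n 1)
if [ -n "$dump" ]; then
  echo "latest backup: $dump ($(du -h "$dump" | cut -f1))"
  backup_check=ok
else
  echo "no backup found; run the pg_dump step first"
  backup_check=missing
fi
```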

Rollback Procedures

Application Rollback (Database Intact)

# 1. Stop current version
docker compose -f docker-compose.prod.yml down

# 2. Checkout previous version
git checkout <previous-commit-sha>

# 3. Restore config if needed
cp .env.backup-<date> .env

# 4. Restart
docker compose -f docker-compose.prod.yml up -d

# 5. Verify
curl http://localhost:8000/api/system/health/ready

Database Rollback

# 1. Stop all services
docker compose -f docker-compose.prod.yml down

# 2. Start only PostgreSQL
docker compose -f docker-compose.prod.yml up -d postgres
until docker compose -f docker-compose.prod.yml exec postgres pg_isready -U security; do sleep 1; done

# 3. Drop and recreate database
docker compose -f docker-compose.prod.yml exec postgres psql -U security -d postgres -c "DROP DATABASE security;"
docker compose -f docker-compose.prod.yml exec postgres psql -U security -d postgres -c "CREATE DATABASE security;"

# 4. Restore from backup
docker compose -f docker-compose.prod.yml exec -T postgres pg_restore -U security -d security < backup-pre-upgrade-<date>.dump

# 5. Checkout previous code and restart
git checkout <previous-commit>
docker compose -f docker-compose.prod.yml up -d

Rollback Decision Matrix

| Symptom | Action | Downtime |
|---|---|---|
| Frontend UI broken | Rollback frontend only | 1-2 min |
| API errors, DB intact | Rollback backend only | 2-5 min |
| Database corruption | Restore DB backup + rollback code | 10-30 min |
| AI service crash loop | Check GPU, restart AI services | 5-10 min |
| Complete system failure | Full rollback (all services + DB) | 15-45 min |

Troubleshooting

Service Won't Start

# Check container status
docker compose -f docker-compose.prod.yml ps

# Check logs for specific service
docker compose -f docker-compose.prod.yml logs backend
docker compose -f docker-compose.prod.yml logs ai-yolo26

# Check health endpoint
curl -v http://localhost:8000/health

AI Services Unreachable

  1. Check AI container status:
docker ps --filter name=ai-
  2. Test health endpoints directly:
curl http://localhost:8095/health
curl http://localhost:8091/health
  3. Check GPU access:
nvidia-smi
docker compose -f docker-compose.prod.yml exec ai-yolo26 nvidia-smi
  4. Verify URL configuration: see Deployment Modes for the correct URLs by mode

GPU Out of Memory

# Check GPU usage
nvidia-smi

# Kill stray GPU processes (destructive; may require sudo)
nvidia-smi --query-compute-apps=pid --format=csv,noheader | xargs -r kill

# Restart AI services
docker compose -f docker-compose.prod.yml restart ai-yolo26 ai-llm

Database Connection Failed

# Check PostgreSQL status
docker compose -f docker-compose.prod.yml exec postgres pg_isready -U security

# Check logs
docker compose -f docker-compose.prod.yml logs postgres

# Verify DATABASE_URL in .env
grep DATABASE_URL .env

Health Check Timeout

# Increase start_period for slow model loading
# Edit docker-compose.prod.yml:
# healthcheck:
#   start_period: 120s  # Increase from 60s

See Also