Design Decisions¶
This document captures the key architectural decisions in ADR (Architecture Decision Record) format, with source code citations for verification.
Table of Contents¶
- DD-001: LLM-Determined Risk Scoring
- DD-002: 90-Second Batch Windows
- DD-003: PostgreSQL with Async SQLAlchemy
- DD-004: Redis Multi-Pool Architecture
- DD-005: On-Demand Model Loading (Model Zoo)
- DD-006: WebSocket + Redis Pub/Sub
- DD-007: Single-User No-Auth MVP
DD-001: LLM-Determined Risk Scoring¶
Status: Accepted Date: 2024-12-21
Context¶
Each security event needs a risk score (0-100) and risk level (low/medium/high/critical). This could be calculated algorithmically based on rules (e.g., "person at night = high") or determined by the LLM based on contextual understanding.
Decision¶
Let the Nemotron LLM determine risk scores based on contextual analysis rather than using algorithmic rules.
Source: backend/services/nemotron_analyzer.py - NemotronAnalyzer processes batches and extracts risk scores from LLM responses.
Rationale¶
- Context-aware: "person at 2am approaching back door" scores higher than "person at 2pm on sidewalk"
- Reasoning: LLM provides human-readable explanation of WHY it scored the event
- Flexibility: No hardcoded rules to maintain as scenarios evolve
- Batch context: LLM sees multiple detections together, understanding motion sequences
Alternatives Rejected¶
| Alternative | Why Rejected |
|---|---|
| Algorithmic rules | Cannot understand context, requires extensive rule maintenance |
| ML classifier | Needs labeled training data we don't have, still a black box |
| Hybrid approach | Added complexity without clear benefit given LLM capabilities |
Consequences¶
- Positive: Contextual understanding, natural language explanations, zero rule maintenance
- Negative: 2-5s latency per batch (vs milliseconds for rules), potential inconsistency
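The extraction step can be sketched as follows. This is an illustrative sketch, not the actual `NemotronAnalyzer` code: it assumes the LLM is prompted to return structured JSON, and the level thresholds shown are hypothetical.

```python
import json

# Hypothetical risk-level bands (assumption, not the real thresholds):
# score <= bound maps to that level, checked in ascending order.
RISK_LEVELS = [(30, "low"), (60, "medium"), (85, "high"), (100, "critical")]

def parse_risk(llm_response: str) -> tuple[int, str, str]:
    """Extract (score, level, reasoning) from a structured LLM JSON reply."""
    data = json.loads(llm_response)
    score = max(0, min(100, int(data["risk_score"])))  # clamp to 0-100
    level = next(name for bound, name in RISK_LEVELS if score <= bound)
    return score, level, data.get("reasoning", "")

reply = '{"risk_score": 78, "reasoning": "person at 2am approaching back door"}'
score, level, why = parse_risk(reply)
# score == 78, level == "high"
```

Clamping and validation matter here: the LLM's output is untrusted text, so the parser, not the model, enforces the 0-100 contract.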
DD-002: 90-Second Batch Windows¶
Status: Accepted Date: 2024-12-21
Context¶
A single "person walks to door" scenario generates 15+ camera images over 30 seconds. Processing each image independently would waste GPU resources, generate noisy alerts, and miss the contextual story.
Decision¶
Batch detections into 90-second time windows with 30-second idle timeout, then analyze as a single event.
Source: backend/services/batch_aggregator.py:7-13
# Batching Logic:
# - Create new batch when first detection arrives for a camera
# - Add subsequent detections within 90-second window
# - Close batch if:
# * 90 seconds elapsed from batch start (window timeout)
# * 30 seconds with no new detections (idle timeout)
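The close conditions above can be sketched in a few lines. Class and method names here are assumptions for illustration, not the actual `BatchAggregator` implementation.

```python
BATCH_WINDOW_SECONDS = 90
BATCH_IDLE_TIMEOUT_SECONDS = 30

class Batch:
    """Minimal sketch of a per-camera detection batch."""

    def __init__(self, started_at: float) -> None:
        self.started_at = started_at
        self.last_detection_at = started_at
        self.detections: list[dict] = []

    def add(self, detection: dict, now: float) -> None:
        self.detections.append(detection)
        self.last_detection_at = now

    def should_close(self, now: float) -> bool:
        window_expired = now - self.started_at >= BATCH_WINDOW_SECONDS
        idle_expired = now - self.last_detection_at >= BATCH_IDLE_TIMEOUT_SECONDS
        return window_expired or idle_expired

batch = Batch(started_at=0.0)
batch.add({"label": "person"}, now=10.0)
batch.should_close(now=20.0)   # False: inside window, not idle
batch.should_close(now=45.0)   # True: 35s idle exceeds the 30s idle timeout
batch.should_close(now=95.0)   # True: 95s exceeds the 90s window
```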
Configuration Source: backend/core/config.py:626-634
batch_window_seconds: int = Field(
default=90,
gt=0,
description="Time window for batch processing detections",
)
batch_idle_timeout_seconds: int = Field(
default=30,
description="Idle timeout before processing incomplete batch",
)
Rationale¶
- Natural grouping: A single activity becomes one coherent event
- Reduced API calls: One LLM call per event vs per frame (~90% reduction)
- Better context: LLM reasons about sequences ("approached, paused, left")
- Configurable: Can tune via BATCH_WINDOW_SECONDS and BATCH_IDLE_TIMEOUT_SECONDS
Alternatives Rejected¶
| Alternative | Why Rejected |
|---|---|
| Immediate per-image | Noisy (15 alerts vs 1), expensive (15 LLM calls), no context |
| Fixed 60-second windows | May split natural events, delays short events |
| Event-driven clustering | Complex ML required, harder to implement and debug |
Fast Path Exception¶
High-confidence person detections (>90%) bypass batching for immediate alerts.
Source: backend/core/config.py:343-344
FAST_PATH_CONFIDENCE_THRESHOLD=${FAST_PATH_CONFIDENCE_THRESHOLD:-0.90}
FAST_PATH_OBJECT_TYPES=["person"]
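The bypass condition amounts to a one-line predicate. A sketch under the configuration above (function name is illustrative):

```python
FAST_PATH_CONFIDENCE_THRESHOLD = 0.90
FAST_PATH_OBJECT_TYPES = ["person"]

def is_fast_path(label: str, confidence: float) -> bool:
    """Should this detection skip batching and alert immediately? (sketch)"""
    return label in FAST_PATH_OBJECT_TYPES and confidence > FAST_PATH_CONFIDENCE_THRESHOLD

is_fast_path("person", 0.95)  # True  -> alert immediately
is_fast_path("person", 0.85)  # False -> goes through the batch window
is_fast_path("car", 0.99)     # False -> not a fast-path object type
```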
DD-003: PostgreSQL with Async SQLAlchemy¶
Status: Accepted Date: 2024-12-28
Context¶
The system needs a database for storing security events, detections, camera configurations, and GPU statistics. Multiple pipeline workers (FileWatcher, BatchAggregator, CleanupService) perform concurrent writes.
Decision¶
Use PostgreSQL with asyncpg async driver via SQLAlchemy 2.0.
Source: backend/core/config.py:313-316
database_url: str = Field(
default="",
description="PostgreSQL database URL (format: postgresql+asyncpg://user:pass@host:port/db)...",  # pragma: allowlist secret
)
Pool Configuration Source: backend/core/config.py:321-344
database_pool_size: int = Field(
default=20,
ge=5,
le=100,
description="Base number of database connections to maintain in pool",
)
database_pool_overflow: int = Field(
default=30,
ge=0,
le=100,
description="Additional connections beyond pool_size when under load",
)
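For reference, a configuration sketch of how these settings would typically feed SQLAlchemy 2.0's async engine. The exact wiring in the backend may differ; the DSN and option values here are placeholders.

```python
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

# Sketch only: settings mapped onto SQLAlchemy 2.0 engine options.
engine = create_async_engine(
    "postgresql+asyncpg://user:pass@host:5432/db",  # database_url
    pool_size=20,        # database_pool_size
    max_overflow=30,     # database_pool_overflow
    pool_pre_ping=True,  # recycle stale connections before use
)
session_factory = async_sessionmaker(engine, expire_on_commit=False)
```

With these defaults the engine holds up to 20 persistent connections and opens up to 30 more under load, for an effective ceiling of 50 concurrent connections.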
Rationale¶
- Concurrency: Multiple workers need concurrent writes without blocking
- Production-ready: Proven for concurrent workloads, proper transaction isolation
- Testing reliability: SQLite's concurrency issues caused test flakiness
- Modern features: JSONB for flexible data, full-text search, proper indexing
Alternatives Rejected¶
| Alternative | Why Rejected |
|---|---|
| SQLite | Single-writer limitation causes bottlenecks with parallel workers |
| MongoDB | Overkill for structured data, additional complexity |
| In-memory | No persistence, data loss on restart |
DD-004: Redis Multi-Pool Architecture¶
Status: Accepted Date: 2025-01-15
Context¶
Redis serves multiple workload types: cache operations (fast, short-lived), queue operations (may block on BLPOP), pub/sub (long-lived connections), and rate limiting (high frequency). A single connection pool cannot optimize for all patterns.
Decision¶
Implement dedicated connection pools for each workload type.
Source: backend/core/redis.py:50-66
class PoolType(str, Enum):
"""Redis connection pool types for workload isolation (NEM-3368)."""
CACHE = "cache"
"""Pool for cache operations (get/set/delete) - fast, high availability."""
QUEUE = "queue"
"""Pool for queue operations (BLPOP/RPUSH) - can block."""
PUBSUB = "pubsub"
"""Pool for pub/sub operations - long-lived connections."""
RATELIMIT = "ratelimit"
"""Pool for rate limiting operations - high frequency."""
Pool Size Configuration Source: backend/core/config.py:370-405
redis_pool_dedicated_enabled: bool = Field(
default=True,
description="Enable dedicated connection pools by workload type.",
)
redis_pool_size_cache: int = Field(
default=20,
description="Max connections for cache operations.",
)
redis_pool_size_queue: int = Field(
default=15,
description="Max connections for queue operations.",
)
redis_pool_size_pubsub: int = Field(
default=10,
description="Max connections for pub/sub operations.",
)
redis_pool_size_ratelimit: int = Field(
default=10,
description="Max connections for rate limiting operations.",
)
Rationale¶
- Workload isolation: Blocking queue ops don't starve cache operations
- Optimized sizing: Each pool sized for its access pattern
- Connection efficiency: Pub/sub doesn't compete with high-frequency cache
- Graceful fallback: Can disable with redis_pool_dedicated_enabled=False
Alternatives Rejected¶
| Alternative | Why Rejected |
|---|---|
| Single pool | Blocking BLPOP exhausts connections, starves cache |
| Per-operation pools | Too many pools, management overhead |
| No pooling | Connection overhead per operation |
DD-005: On-Demand Model Loading (Model Zoo)¶
Status: Accepted Date: 2025-01-10
Context¶
The system supports many optional AI models for enrichment (license plate detection, face detection, pose estimation, etc.). Loading all models at startup would exhaust GPU VRAM; a 24 GB GPU has little budget left once the core models are loaded.
Decision¶
Implement on-demand model loading via ModelManager with VRAM budget constraints.
Source: backend/services/model_zoo.py:1-38
"""Model Zoo for on-demand model loading.
VRAM Budget:
- Nemotron LLM: 21,700 MB (always loaded)
- YOLO26: 650 MB (always loaded)
- Available for Model Zoo: ~1,650 MB
- Models load sequentially, never concurrently
"""
Enrichment VRAM Budget Source: docker-compose.prod.yml:284
Rationale¶
- VRAM efficiency: Only loaded models consume GPU memory
- Flexible enrichment: Add models without pre-allocating VRAM
- Sequential loading: Prevents concurrent load spikes
- Auto-unload: Context managers release memory when done
VRAM Allocation¶
| Component | VRAM (MB) | Notes |
|---|---|---|
| Nemotron LLM | ~21,700 | Always loaded, Q4_K_M quantization |
| YOLO26 | ~650 | Always loaded, object detection |
| Model Zoo Budget | ~1,650 | On-demand enrichment models |
| Total | ~24,000 | Fits RTX A5500 24GB |
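The budget-and-release behavior can be sketched with a context manager. This is a simplified model, not the real `ModelManager`: a counter stands in for GPU memory so the budget logic is visible, and the model name and size are hypothetical.

```python
from contextlib import contextmanager

MODEL_ZOO_BUDGET_MB = 1650  # from the allocation table above

class ModelManager:
    """Sketch of on-demand loading under a VRAM budget."""

    def __init__(self, budget_mb: int) -> None:
        self.budget_mb = budget_mb
        self.used_mb = 0

    @contextmanager
    def load(self, name: str, vram_mb: int):
        if self.used_mb + vram_mb > self.budget_mb:
            raise RuntimeError(f"{name}: would exceed {self.budget_mb} MB budget")
        self.used_mb += vram_mb           # "load" the model
        try:
            yield name
        finally:
            self.used_mb -= vram_mb       # auto-unload on exit, even on error

zoo = ModelManager(MODEL_ZOO_BUDGET_MB)
with zoo.load("license_plate_detector", vram_mb=400):
    pass  # run enrichment here
# memory released on exit: zoo.used_mb == 0
```

The `finally` block is what makes unloading automatic: VRAM is returned even if enrichment raises.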
Alternatives Rejected¶
| Alternative | Why Rejected |
|---|---|
| Load all at startup | Exceeds 24GB VRAM budget |
| CPU fallback | Too slow for real-time processing |
| External API | Adds latency, requires network |
DD-006: WebSocket + Redis Pub/Sub¶
Status: Accepted Date: 2024-12-21
Context¶
The dashboard needs real-time updates for security events, system status (GPU, cameras), and worker health. Multiple backend instances may be deployed behind a load balancer.
Decision¶
Use WebSocket for client connections with Redis pub/sub as the event backbone.
Source: backend/services/event_broadcaster.py:1-8
"""Event broadcaster service for WebSocket real-time event distribution.
This service manages WebSocket connections and broadcasts security events
to all connected clients using Redis pub/sub as the event backbone.
"""
Channel Configuration Source: backend/core/config.py:359-362
redis_event_channel: str = Field(
default="security_events",
description="Redis pub/sub channel for security events",
)
Rationale¶
- Sub-second delivery: Events reach dashboard instantly
- Bidirectional: Clients can send messages back
- Multi-instance support: Redis pub/sub fans out to all backend instances
- Native browser support: No polyfills needed
Communication Pattern¶
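The fan-out pattern can be sketched as follows. An `asyncio.Queue` stands in for the Redis pub/sub channel and plain queues stand in for WebSocket connections; names are illustrative, not the actual `EventBroadcaster` API.

```python
import asyncio
import json

async def broadcaster(channel: asyncio.Queue, clients: list[asyncio.Queue]) -> None:
    """Relay every message on the channel to all connected clients."""
    while True:
        message = await channel.get()   # stands in for SUBSCRIBE security_events
        if message is None:             # shutdown sentinel
            break
        for client in clients:          # fan out to every WebSocket
            await client.put(message)

async def main() -> list[str]:
    channel: asyncio.Queue = asyncio.Queue()
    clients = [asyncio.Queue(), asyncio.Queue()]
    task = asyncio.create_task(broadcaster(channel, clients))
    # A pipeline worker PUBLISHes an event...
    await channel.put(json.dumps({"type": "security_event", "risk_level": "high"}))
    await channel.put(None)
    await task
    # ...and every connected client receives its own copy.
    return [clients[0].get_nowait(), clients[1].get_nowait()]

received = asyncio.run(main())
```

In production the channel is shared through Redis, so any backend instance's broadcaster sees events published by any other instance, which is what makes the pattern scale horizontally.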
Alternatives Rejected¶
| Alternative | Why Rejected |
|---|---|
| Polling | Latency, wasted requests, server load |
| SSE | No bidirectional, limited connections |
| Direct WebSocket | Doesn't scale to multiple backend instances |
DD-007: Single-User No-Auth MVP¶
Status: Accepted Date: 2024-12-21
Context¶
This is a single-user home security system deployed on a trusted local network. The system processes sensitive camera images that should never leave the local network.
Decision¶
No authentication for the MVP; the system assumes trusted network access by a single user. (Since superseded by SetupGuardMiddleware: first admin registration is required, after which the API is open.)
Source: backend/main.py:1002-1003
Auth Middleware Source: The AuthMiddleware exists but is disabled by default, allowing opt-in authentication when needed.
Rationale¶
- Simplicity: Zero authentication complexity
- Local deployment: Designed for LAN access only
- Privacy focus: All processing stays local, no cloud
- Fast iteration: No password management or recovery flows
Security Mitigations¶
| Control | Implementation |
|---|---|
| Path traversal prevention | Media endpoints validate paths |
| Local-only design | Documentation warns against internet exposure |
| Optional API keys | Can be enabled via API_KEY_ENABLED=true |
| CORS restrictions | Configurable via CORS_ORIGINS |
Future Considerations¶
For multi-user or internet-facing deployments:
- Enable API_KEY_ENABLED=true with strong keys
- Use HTTPS for all endpoints
- Add rate limiting
- Deploy behind reverse proxy with TLS
Alternatives Rejected¶
| Alternative | Why Rejected |
|---|---|
| JWT tokens | Overkill for single-user local system |
| OAuth/OIDC | Requires identity provider, massive complexity |
| Basic auth | Passwords in headers, poor UX |
Decision Dependencies¶
flowchart TB
DD001["DD-001<br/>LLM Risk Scoring"]
DD002["DD-002<br/>90s Batch Windows"]
DD003["DD-003<br/>PostgreSQL"]
DD004["DD-004<br/>Redis Multi-Pool"]
DD005["DD-005<br/>Model Zoo"]
DD006["DD-006<br/>WebSocket + Pub/Sub"]
DD007["DD-007<br/>No Auth MVP"]
DD002 --> DD001
DD003 --> DD002
DD004 --> DD002
DD004 --> DD006
DD005 --> DD001