Design Decisions¶
This document captures the key architectural decisions in ADR (Architecture Decision Record) format, with source code citations for verification.
Table of Contents¶
- DD-001: LLM-Determined Risk Scoring
- DD-002: 90-Second Batch Windows
- DD-003: PostgreSQL with Async SQLAlchemy
- DD-004: Redis Multi-Pool Architecture
- DD-005: On-Demand Model Loading (Model Zoo)
- DD-006: WebSocket + Redis Pub/Sub
- DD-007: Single-User No-Auth MVP
DD-001: LLM-Determined Risk Scoring¶
Status: Accepted Date: 2024-12-21
Context¶
Each security event needs a risk score (0-100) and risk level (low/medium/high/critical). This could be calculated algorithmically based on rules (e.g., "person at night = high") or determined by the LLM based on contextual understanding.
Decision¶
Let the Nemotron LLM determine risk scores based on contextual analysis rather than using algorithmic rules.
Source: backend/services/nemotron_analyzer.py - NemotronAnalyzer processes batches and extracts risk scores from LLM responses.
Rationale¶
- Context-aware: "person at 2am approaching back door" scores higher than "person at 2pm on sidewalk"
- Reasoning: LLM provides human-readable explanation of WHY it scored the event
- Flexibility: No hardcoded rules to maintain as scenarios evolve
- Batch context: LLM sees multiple detections together, understanding motion sequences
Alternatives Rejected¶
| Alternative | Why Rejected |
|---|---|
| Algorithmic rules | Cannot understand context, requires extensive rule maintenance |
| ML classifier | Needs labeled training data we don't have, still a black box |
| Hybrid approach | Added complexity without clear benefit given LLM capabilities |
Consequences¶
- Positive: Contextual understanding, natural language explanations, zero rule maintenance
- Negative: 2-5s latency per batch (vs milliseconds for rules), potential inconsistency
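The extraction step can be sketched as follows. This is an illustrative sketch, not the actual `NemotronAnalyzer` code: it assumes the LLM is prompted to return structured JSON, and the level thresholds shown are hypothetical.

```python
import json

# Hypothetical risk-level bands (assumption, not the real thresholds):
# score <= bound maps to that level, checked in ascending order.
RISK_LEVELS = [(30, "low"), (60, "medium"), (85, "high"), (100, "critical")]

def parse_risk(llm_response: str) -> tuple[int, str, str]:
    """Extract (score, level, reasoning) from a structured LLM JSON reply."""
    data = json.loads(llm_response)
    score = max(0, min(100, int(data["risk_score"])))  # clamp to 0-100
    level = next(name for bound, name in RISK_LEVELS if score <= bound)
    return score, level, data.get("reasoning", "")

reply = '{"risk_score": 78, "reasoning": "person at 2am approaching back door"}'
score, level, why = parse_risk(reply)
# score == 78, level == "high"
```

Clamping and validation matter here: the LLM's output is untrusted text, so the parser, not the model, enforces the 0-100 contract.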
DD-002: 90-Second Batch Windows¶
Status: Accepted Date: 2024-12-21
Context¶
A single "person walks to door" scenario generates 15+ camera images over 30 seconds. Processing each image independently would waste GPU resources, generate noisy alerts, and miss the contextual story.
Decision¶
Batch detections into 90-second time windows with 30-second idle timeout, then analyze as a single event.
Source: backend/services/batch_aggregator.py:7-13
# Batching Logic:
# - Create new batch when first detection arrives for a camera
# - Add subsequent detections within 90-second window
# - Close batch if:
# * 90 seconds elapsed from batch start (window timeout)
# * 30 seconds with no new detections (idle timeout)
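The close conditions above can be sketched in a few lines. Class and method names here are assumptions for illustration, not the actual `BatchAggregator` implementation.

```python
BATCH_WINDOW_SECONDS = 90
BATCH_IDLE_TIMEOUT_SECONDS = 30

class Batch:
    """Minimal sketch of a per-camera detection batch."""

    def __init__(self, started_at: float) -> None:
        self.started_at = started_at
        self.last_detection_at = started_at
        self.detections: list[dict] = []

    def add(self, detection: dict, now: float) -> None:
        self.detections.append(detection)
        self.last_detection_at = now

    def should_close(self, now: float) -> bool:
        window_expired = now - self.started_at >= BATCH_WINDOW_SECONDS
        idle_expired = now - self.last_detection_at >= BATCH_IDLE_TIMEOUT_SECONDS
        return window_expired or idle_expired

batch = Batch(started_at=0.0)
batch.add({"label": "person"}, now=10.0)
batch.should_close(now=20.0)   # False: inside window, not idle
batch.should_close(now=45.0)   # True: 35s idle exceeds the 30s idle timeout
batch.should_close(now=95.0)   # True: 95s exceeds the 90s window
```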
Configuration Source: backend/core/config.py:626-634
batch_window_seconds: int = Field(
default=90,
gt=0,
description="Time window for batch processing detections",
)
batch_idle_timeout_seconds: int = Field(
default=30,
description="Idle timeout before processing incomplete batch",
)
Rationale¶
- Natural grouping: A single activity becomes one coherent event
- Reduced API calls: One LLM call per event vs per frame (~90% reduction)
- Better context: LLM reasons about sequences ("approached, paused, left")
- Configurable: Can tune via BATCH_WINDOW_SECONDS and BATCH_IDLE_TIMEOUT_SECONDS
Alternatives Rejected¶
| Alternative | Why Rejected |
|---|---|
| Immediate per-image | Noisy (15 alerts vs 1), expensive (15 LLM calls), no context |
| Fixed 60-second windows | May split natural events, delays short events |
| Event-driven clustering | Complex ML required, harder to implement and debug |
Fast Path Exception¶
High-confidence person detections (>90%) bypass batching for immediate alerts.
Source: backend/core/config.py:343-344
FAST_PATH_CONFIDENCE_THRESHOLD=${FAST_PATH_CONFIDENCE_THRESHOLD:-0.90}
FAST_PATH_OBJECT_TYPES=["person"]
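The bypass condition amounts to a one-line predicate. A sketch under the configuration above (function name is illustrative):

```python
FAST_PATH_CONFIDENCE_THRESHOLD = 0.90
FAST_PATH_OBJECT_TYPES = ["person"]

def is_fast_path(label: str, confidence: float) -> bool:
    """Should this detection skip batching and alert immediately? (sketch)"""
    return label in FAST_PATH_OBJECT_TYPES and confidence > FAST_PATH_CONFIDENCE_THRESHOLD

is_fast_path("person", 0.95)  # True  -> alert immediately
is_fast_path("person", 0.85)  # False -> goes through the batch window
is_fast_path("car", 0.99)     # False -> not a fast-path object type
```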
DD-003: PostgreSQL with Async SQLAlchemy¶
Status: Accepted Date: 2024-12-28
Context¶
The system needs a database for storing security events, detections, camera configurations, and GPU statistics. Multiple pipeline workers (FileWatcher, BatchAggregator, CleanupService) perform concurrent writes.
Decision¶
Use PostgreSQL with asyncpg async driver via SQLAlchemy 2.0.
Source: backend/core/config.py:313-316
database_url: str = Field(
default="",
description="PostgreSQL database URL (format: postgresql+asyncpg://user:pass@host:port/db)...",  # pragma: allowlist secret
)
Pool Configuration Source: backend/core/config.py:321-344
database_pool_size: int = Field(
default=20,
ge=5,
le=100,
description="Base number of database connections to maintain in pool",
)
database_pool_overflow: int = Field(
default=30,
ge=0,
le=100,
description="Additional connections beyond pool_size when under load",
)
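For reference, a configuration sketch of how these settings would typically feed SQLAlchemy 2.0's async engine. The exact wiring in the backend may differ; the DSN and option values here are placeholders.

```python
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

# Sketch only: settings mapped onto SQLAlchemy 2.0 engine options.
engine = create_async_engine(
    "postgresql+asyncpg://user:pass@host:5432/db",  # database_url
    pool_size=20,        # database_pool_size
    max_overflow=30,     # database_pool_overflow
    pool_pre_ping=True,  # recycle stale connections before use
)
session_factory = async_sessionmaker(engine, expire_on_commit=False)
```

With these defaults the engine holds up to 20 persistent connections and opens up to 30 more under load, for an effective ceiling of 50 concurrent connections.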
Rationale¶
- Concurrency: Multiple workers need concurrent writes without blocking
- Production-ready: Proven for concurrent workloads, proper transaction isolation
- Testing reliability: SQLite's concurrency issues caused test flakiness
- Modern features: JSONB for flexible data, full-text search, proper indexing
Alternatives Rejected¶
| Alternative | Why Rejected |
|---|---|
| SQLite | Single-writer limitation causes bottlenecks with parallel workers |
| MongoDB | Overkill for structured data, additional complexity |
| In-memory | No persistence, data loss on restart |
DD-004: Redis Multi-Pool Architecture¶
Status: Accepted Date: 2025-01-15
Context¶
Redis serves multiple workload types: cache operations (fast, short-lived), queue operations (may block on BLPOP), pub/sub (long-lived connections), and rate limiting (high frequency). A single connection pool cannot optimize for all patterns.
Decision¶
Implement dedicated connection pools for each workload type.
Source: backend/core/redis.py:50-66
class PoolType(str, Enum):
"""Redis connection pool types for workload isolation (NEM-3368)."""
CACHE = "cache"
"""Pool for cache operations (get/set/delete) - fast, high availability."""
QUEUE = "queue"
"""Pool for queue operations (BLPOP/RPUSH) - can block."""
PUBSUB = "pubsub"
"""Pool for pub/sub operations - long-lived connections."""
RATELIMIT = "ratelimit"
"""Pool for rate limiting operations - high frequency."""
Pool Size Configuration Source: backend/core/config.py:370-405
redis_pool_dedicated_enabled: bool = Field(
default=True,
description="Enable dedicated connection pools by workload type.",
)
redis_pool_size_cache: int = Field(
default=20,
description="Max connections for cache operations.",
)
redis_pool_size_queue: int = Field(
default=15,
description="Max connections for queue operations.",
)
redis_pool_size_pubsub: int = Field(
default=10,
description="Max connections for pub/sub operations.",
)
redis_pool_size_ratelimit: int = Field(
default=10,
description="Max connections for rate limiting operations.",
)
Rationale¶
- Workload isolation: Blocking queue ops don't starve cache operations
- Optimized sizing: Each pool sized for its access pattern
- Connection efficiency: Pub/sub doesn't compete with high-frequency cache
- Graceful fallback: Can disable with redis_pool_dedicated_enabled=False
Alternatives Rejected¶
| Alternative | Why Rejected |
|---|---|
| Single pool | Blocking BLPOP exhausts connections, starves cache |
| Per-operation pools | Too many pools, management overhead |
| No pooling | Connection overhead per operation |
DD-005: On-Demand Model Loading (Model Zoo)¶
Status: Accepted Date: 2025-01-10
Context¶
The system supports many optional AI models for enrichment (license plate detection, face detection, pose estimation, etc.). Loading all models at startup would exhaust GPU VRAM; a 24 GB GPU has little budget left once the core models are loaded.
Decision¶
Implement on-demand model loading via ModelManager with VRAM budget constraints.
Source: backend/services/model_zoo.py:1-38
"""Model Zoo for on-demand model loading.
VRAM Budget:
- Nemotron LLM: 21,700 MB (always loaded)
- YOLO26: 650 MB (always loaded)
- Available for Model Zoo: ~1,650 MB
- Models load sequentially, never concurrently
"""
Enrichment VRAM Budget Source: docker-compose.prod.yml:284
Rationale¶
- VRAM efficiency: Only loaded models consume GPU memory
- Flexible enrichment: Add models without pre-allocating VRAM
- Sequential loading: Prevents concurrent load spikes
- Auto-unload: Context managers release memory when done
VRAM Allocation¶
| Component | VRAM (MB) | Notes |
|---|---|---|
| Nemotron LLM | ~21,700 | Always loaded, Q4_K_M quantization |
| YOLO26 | ~650 | Always loaded, object detection |
| Model Zoo Budget | ~1,650 | On-demand enrichment models |
| Total | ~24,000 | Fits RTX A5500 24GB |
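The budget-and-release behavior can be sketched with a context manager. This is a simplified model, not the real `ModelManager`: a counter stands in for GPU memory so the budget logic is visible, and the model name and size are hypothetical.

```python
from contextlib import contextmanager

MODEL_ZOO_BUDGET_MB = 1650  # from the allocation table above

class ModelManager:
    """Sketch of on-demand loading under a VRAM budget."""

    def __init__(self, budget_mb: int) -> None:
        self.budget_mb = budget_mb
        self.used_mb = 0

    @contextmanager
    def load(self, name: str, vram_mb: int):
        if self.used_mb + vram_mb > self.budget_mb:
            raise RuntimeError(f"{name}: would exceed {self.budget_mb} MB budget")
        self.used_mb += vram_mb           # "load" the model
        try:
            yield name
        finally:
            self.used_mb -= vram_mb       # auto-unload on exit, even on error

zoo = ModelManager(MODEL_ZOO_BUDGET_MB)
with zoo.load("license_plate_detector", vram_mb=400):
    pass  # run enrichment here
# memory released on exit: zoo.used_mb == 0
```

The `finally` block is what makes unloading automatic: VRAM is returned even if enrichment raises.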
Alternatives Rejected¶
| Alternative | Why Rejected |
|---|---|
| Load all at startup | Exceeds 24GB VRAM budget |
| CPU fallback | Too slow for real-time processing |
| External API | Adds latency, requires network |
DD-006: WebSocket + Redis Pub/Sub¶
Status: Accepted Date: 2024-12-21
Context¶
The dashboard needs real-time updates for security events, system status (GPU, cameras), and worker health. Multiple backend instances may be deployed behind a load balancer.
Decision¶
Use WebSocket for client connections with Redis pub/sub as the event backbone.
Source: backend/services/event_broadcaster.py:1-8
"""Event broadcaster service for WebSocket real-time event distribution.
This service manages WebSocket connections and broadcasts security events
to all connected clients using Redis pub/sub as the event backbone.
"""
Channel Configuration Source: backend/core/config.py:359-362
redis_event_channel: str = Field(
default="security_events",
description="Redis pub/sub channel for security events",
)
Rationale¶
- Sub-second delivery: Events reach dashboard instantly
- Bidirectional: Clients can send messages back
- Multi-instance support: Redis pub/sub fans out to all backend instances
- Native browser support: No polyfills needed
Communication Pattern¶
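The fan-out pattern can be sketched as follows. An `asyncio.Queue` stands in for the Redis pub/sub channel and plain queues stand in for WebSocket connections; names are illustrative, not the actual `EventBroadcaster` API.

```python
import asyncio
import json

async def broadcaster(channel: asyncio.Queue, clients: list[asyncio.Queue]) -> None:
    """Relay every message on the channel to all connected clients."""
    while True:
        message = await channel.get()   # stands in for SUBSCRIBE security_events
        if message is None:             # shutdown sentinel
            break
        for client in clients:          # fan out to every WebSocket
            await client.put(message)

async def main() -> list[str]:
    channel: asyncio.Queue = asyncio.Queue()
    clients = [asyncio.Queue(), asyncio.Queue()]
    task = asyncio.create_task(broadcaster(channel, clients))
    # A pipeline worker PUBLISHes an event...
    await channel.put(json.dumps({"type": "security_event", "risk_level": "high"}))
    await channel.put(None)
    await task
    # ...and every connected client receives its own copy.
    return [clients[0].get_nowait(), clients[1].get_nowait()]

received = asyncio.run(main())
```

In production the channel is shared through Redis, so any backend instance's broadcaster sees events published by any other instance, which is what makes the pattern scale horizontally.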
Alternatives Rejected¶
| Alternative | Why Rejected |
|---|---|
| Polling | Latency, wasted requests, server load |
| SSE | No bidirectional, limited connections |
| Direct WebSocket | Doesn't scale to multiple backend instances |
DD-007: Single-User No-Auth MVP¶
Status: Accepted Date: 2024-12-21
Context¶
This is a single-user home security system deployed on a trusted local network. The system processes sensitive camera images that should never leave the local network.
Decision¶
No authentication for the MVP; the system assumes trusted network access by a single user. (Since superseded by SetupGuardMiddleware: first admin registration is required, after which the API is open.)
Source: backend/main.py:1002-1003
Auth Middleware Source: The AuthMiddleware exists but is disabled by default, allowing opt-in authentication when needed.
Rationale¶
- Simplicity: Zero authentication complexity
- Local deployment: Designed for LAN access only
- Privacy focus: All processing stays local, no cloud
- Fast iteration: No password management or recovery flows
Security Mitigations¶
| Control | Implementation |
|---|---|
| Path traversal prevention | Media endpoints validate paths |
| Local-only design | Documentation warns against internet exposure |
| Optional API keys | Can be enabled via API_KEY_ENABLED=true |
| CORS restrictions | Configurable via CORS_ORIGINS |
Future Considerations¶
For multi-user or internet-facing deployments:
- Enable API_KEY_ENABLED=true with strong keys
- Use HTTPS for all endpoints
- Add rate limiting
- Deploy behind reverse proxy with TLS
Alternatives Rejected¶
| Alternative | Why Rejected |
|---|---|
| JWT tokens | Overkill for single-user local system |
| OAuth/OIDC | Requires identity provider, massive complexity |
| Basic auth | Passwords in headers, poor UX |
Decision Dependencies¶
flowchart TB
DD001["DD-001<br/>LLM Risk Scoring"]
DD002["DD-002<br/>90s Batch Windows"]
DD003["DD-003<br/>PostgreSQL"]
DD004["DD-004<br/>Redis Multi-Pool"]
DD005["DD-005<br/>Model Zoo"]
DD006["DD-006<br/>WebSocket + Pub/Sub"]
DD007["DD-007<br/>No Auth MVP"]
DD002 --> DD001
DD003 --> DD002
DD004 --> DD002
DD004 --> DD006
DD005 --> DD001