Architecture Decision Records (ADRs)¶
This document captures the key architectural decisions made during the development of the Home Security Intelligence system. Each decision follows the ADR format: Context, Decision, Alternatives Considered, and Consequences.
Table of Contents¶
- ADR-001: PostgreSQL for Database
- ADR-002: Redis for Queues and Pub/Sub
- ADR-003: Detection Batching Strategy
- ADR-004: Fully Containerized Deployment with GPU Passthrough
- ADR-005: No Authentication
- ADR-006: YOLO26 for Object Detection
- ADR-007: Nemotron for Risk Analysis
- ADR-008: FastAPI + React Stack
- ADR-009: WebSocket for Real-time Updates
- ADR-010: LLM-Determined Risk Scoring
- ADR-011: Native Tremor Charts over Grafana Embeds
- ADR-012: WebSocket Circuit Breaker Pattern
ADR-001: PostgreSQL for Database¶
Status: Accepted (Revised) Date: 2024-12-21 (Updated: 2024-12-28)
Context¶
The system requires a database for storing security events, detections, camera configurations, and GPU statistics. It is designed for single-user, local deployment on a home network with moderate data volume (30-day retention, ~5-8 cameras). Testing revealed that PostgreSQL was essential for handling the AI pipeline's parallel workers with concurrent writes.
Decision¶
Use PostgreSQL with asyncpg async driver.
Alternatives Considered¶
| Alternative | Pros | Cons |
|---|---|---|
| PostgreSQL | Battle-tested, concurrent writes, full-text search, JSONB, handles parallel workers | Requires separate process, slightly more complex deployment |
| SQLite | Zero setup, embedded, file-based backup | Single-writer limitation causes bottlenecks with multiple pipeline workers |
| MongoDB | Flexible schema, good for event-like data | Additional complexity, overkill for structured data |
| In-memory only | Fastest, simplest | No persistence, data loss on restart |
Why PostgreSQL Won¶
- Concurrency: Multiple pipeline workers (FileWatcher, BatchAggregator, CleanupService) need concurrent writes without blocking
- Production-ready: Proven for concurrent workloads, no need for WAL mode tuning or busy timeouts
- Testing reliability: SQLite's concurrency issues cause test flakiness and race conditions
- Future-proofing: Easy migration path if scaling beyond single-node
- Modern features: JSONB for flexible data, full-text search, better indexing
Consequences¶
Positive:
- True concurrent read/write access without locking issues
- No write contention between pipeline workers
- Reliable test suite without flaky database locks
- Better tooling (pgAdmin, DBeaver, psql CLI)
- More robust transaction handling
Negative:
- Requires a separate PostgreSQL process (handled by Docker Compose)
- Database must be created before first run
Mitigations:
- Docker Compose handles PostgreSQL automatically
- Database initialization happens on first app startup
- Connection pooling handles multiple workers efficiently
ADR-002: Redis for Queues and Pub/Sub¶
Status: Accepted Date: 2024-12-21
Context¶
The AI processing pipeline requires:
- A queue for passing image paths from File Watcher to YOLO26 detector
- A queue for passing detection batches to Nemotron analyzer
- Pub/Sub for real-time event broadcasting to WebSocket clients
- Temporary storage for batch aggregation state
Decision¶
Use Redis as a multi-purpose infrastructure component for queues, pub/sub, and temporary state - not just caching.
Alternatives Considered¶
| Alternative | Pros | Cons |
|---|---|---|
| RabbitMQ | Purpose-built message broker, advanced routing | Additional service, overkill for simple queues |
| Redis | Multi-purpose (queue + pub/sub + cache), simple, fast | In-memory by default, less durable than RabbitMQ |
| Kafka | High throughput, persistent, replay | Massive overkill, complex setup |
| In-process queues | No external dependency | Lost on restart, no persistence, single-process only |
Consequences¶
Positive:
- Single service handles multiple concerns (queues, pub/sub, batch state)
- Sub-millisecond operations for queue operations
- Built-in TTL for automatic key expiration (orphan cleanup)
- Connection pooling with health checks for reliability
- Simple LIST operations (RPUSH, BLPOP) for queue semantics
Negative:
- In-memory by default - queue contents lost on Redis restart
- Another service to deploy and monitor
- Must handle connection failures with retry logic
Implementation Details:
# Queue operations with automatic JSON serialization
await redis.add_to_queue("detection_queue", {"image_path": path, "camera_id": camera})
item = await redis.get_from_queue("detection_queue", timeout=5)
# Pub/Sub for real-time updates
await redis.publish("events", {"type": "new_event", "data": event})
# Batch state with TTL for orphan cleanup
await redis.set(f"batch:{batch_id}:detections", detections, expire=3600)
ADR-003: Detection Batching Strategy¶
Status: Accepted Date: 2024-12-21
Context¶
A single "person walks to door" event might generate 15+ camera images over 30 seconds. Processing each image independently through the LLM would:
- Waste GPU resources on repetitive analysis
- Generate 15 separate "person detected" alerts instead of one meaningful event
- Miss the contextual story (approached, paused, left)
Decision¶
Batch detections into time windows with idle timeout, then analyze the batch as a single event.
Alternatives Considered¶
| Alternative | Pros | Cons |
|---|---|---|
| Immediate per-image | Fastest alerts, simplest logic | Noisy, expensive, no context |
| Fixed 60-second windows | Predictable timing | May split natural events, delays short events |
| 90s window + 30s idle | Balances context and latency | More complex state management |
| Event-driven clustering | Most intelligent grouping | Complex ML, harder to implement |
Consequences¶
Positive:
- Natural grouping - a single "person approaches door" becomes one event
- LLM can reason about sequences ("person approached, paused, left")
- Reduces LLM calls by ~90% compared to per-image processing
- Configurable via BATCH_WINDOW_SECONDS and BATCH_IDLE_TIMEOUT_SECONDS
Negative:
- Maximum 90-second delay for alerts (worst case)
- Idle timeout adds complexity to batch management
- Redis keys must have TTL for orphan cleanup if service crashes
Fast Path Exception: High-confidence critical detections (person >90%) bypass batching for immediate alerts:
if confidence >= 0.90 and object_type == "person":
await analyzer.analyze_detection_fast_path(camera_id, detection_id)
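The window/idle interaction described above can be sketched as a small pure-Python helper (a hypothetical illustration, not the real BatchAggregator, and without the Redis state):

```python
BATCH_WINDOW_SECONDS = 90.0        # hard cap on batch lifetime
BATCH_IDLE_TIMEOUT_SECONDS = 30.0  # close early if detections stop arriving

class BatchWindow:
    """Tracks one camera's open batch; illustrative sketch only."""

    def __init__(self, now: float):
        self.opened_at = now
        self.last_detection_at = now
        self.detection_ids: list[int] = []

    def add(self, detection_id: int, now: float) -> None:
        self.detection_ids.append(detection_id)
        self.last_detection_at = now  # each detection resets the idle timer

    def should_close(self, now: float) -> bool:
        window_expired = now - self.opened_at >= BATCH_WINDOW_SECONDS
        idle_expired = now - self.last_detection_at >= BATCH_IDLE_TIMEOUT_SECONDS
        return window_expired or idle_expired
```

A short event thus closes ~30 seconds after its last detection, while a continuous stream is force-closed at the 90-second cap.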
flowchart TB
subgraph "Batching Logic"
A[New Detection]
A --> B{Fast Path?}
B -->|Yes: person >90%| C[Immediate Analysis]
B -->|No| D{Active Batch?}
D -->|No| E[Create Batch]
D -->|Yes| F[Add to Batch]
E --> G[Start 90s Timer]
F --> H[Reset 30s Idle Timer]
G --> I{Timeout?}
H --> I
I -->|Window: 90s| J[Close Batch]
I -->|Idle: 30s| J
J --> K[Queue for Analysis]
end

ADR-004: Fully Containerized Deployment with GPU Passthrough¶
Status: Accepted (Updated 2024-12-30) Date: 2024-12-21 (Updated 2024-12-30)
Context¶
The system requires:
- Backend API service (FastAPI)
- Frontend application (React)
- PostgreSQL database
- Redis for queues/pub/sub
- YOLO26 object detection (~4GB VRAM)
- Nemotron-3-Nano-30B risk analysis (~14.7GB VRAM, Q4_K_M quantization)
NVIDIA Container Toolkit (CDI) has matured significantly, enabling reliable GPU passthrough in containers. This allows all services to be deployed uniformly using Docker Compose.
Decision¶
Use fully containerized deployment with Docker Compose for all services, including GPU-intensive AI models using NVIDIA Container Toolkit (CDI).
Update History¶
2024-12-30: Finalized fully containerized architecture. All services run in containers with GPU passthrough via NVIDIA Container Toolkit.
Alternatives Considered¶
| Alternative | Pros | Cons |
|---|---|---|
| All Containerized | Uniform deployment, single compose file, easier ops | Requires NVIDIA Container Toolkit setup |
| All Native | Best performance, simplest GPU access | No isolation, harder dependency management |
| Kubernetes | Production-grade orchestration | Massive overkill for single-node home deployment |
Consequences¶
Positive:
- Single deployment command for all services (docker compose up -d)
- Consistent container management across all services
- GPU performance is comparable to native (minimal overhead with CDI)
- Better reproducibility - all dependencies in container images
- Standard Docker Compose files work with both Docker and Podman
Negative:
- Requires NVIDIA Container Toolkit (nvidia-container-toolkit) installation
- CDI configuration needed for GPU access in containers
- Slightly more complex initial setup than running AI services natively
Deployment Commands:
# Start ALL services (including AI with GPU)
docker compose -f docker-compose.prod.yml up -d
# Or using Podman
podman-compose -f docker-compose.prod.yml up -d
# Check AI container GPU usage
nvidia-smi --query-compute-apps=pid,name,used_memory --format=csv
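The GPU reservation itself lives in the Compose file; a hedged sketch of the relevant stanza (service and image names here are illustrative, not the project's actual file):

```yaml
services:
  yolo26:                      # illustrative service name
    image: yolo26-detector     # illustrative image name
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```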
flowchart TB
subgraph Containers["Docker/Podman Containers"]
FE[Frontend :5173]
BE[Backend :8000]
RD[Redis :6379]
PG[PostgreSQL :5432]
DET[YOLO26 :8095]
LLM[Nemotron :8091]
end
subgraph Hardware["GPU Hardware (via CDI)"]
GPU[NVIDIA RTX A5500 24GB]
end
FE -->|API/WS| BE
BE --> RD
BE --> PG
BE -->|HTTP| DET
BE -->|HTTP| LLM
DET -->|CDI| GPU
LLM -->|CDI| GPU

ADR-005: No Authentication¶
Status: Superseded — The system now requires first-time admin registration via SetupGuardMiddleware (returns 503 until first user is created). After registration, API endpoints are open. Per-route protections (verify_api_key, require_admin_access) guard admin/destructive operations.
Date: 2024-12-21
Context¶
This is a single-user home security system deployed on a trusted local network. Adding authentication would increase complexity without providing significant value for the target use case.
Decision¶
No authentication for MVP (now superseded by SetupGuardMiddleware; see status note above). The system assumes it runs on a trusted network and is accessed by a single user.
Alternatives Considered¶
| Alternative | Pros | Cons |
|---|---|---|
| No Auth | Simple, fast development, no password management | Insecure if exposed to internet |
| Basic Auth | Simple to implement | Passwords transmitted in headers, annoying UX |
| JWT Tokens | Stateless, industry standard | Implementation complexity, token management |
| OAuth/OIDC | Enterprise-grade, SSO | Massive overkill, requires identity provider |
Consequences¶
Positive:
- Zero authentication complexity
- No password management or recovery flows
- Faster development and iteration
- Simpler API client code
Negative:
- MUST NOT expose to the internet without adding auth
- No audit trail of who accessed what
- No multi-user support
Security Mitigations:
- Path traversal protection on media endpoints
- Designed for localhost/LAN access only
- Documentation warns against public exposure
WARNING: This system is designed for local/trusted network use.
DO NOT expose to the internet without adding authentication.
ADR-006: YOLO26 for Object Detection¶
Status: Accepted Date: 2024-12-21
Context¶
We need fast, accurate object detection to identify security-relevant objects (people, vehicles, animals) in camera images. The model must run locally on consumer GPU hardware.
Decision¶
Use YOLO26 loaded via HuggingFace Transformers.
Alternatives Considered¶
| Model | Accuracy (mAP) | Speed | VRAM | Pros | Cons |
|---|---|---|---|---|---|
| YOLOv8 | ~53% | 5-10ms | ~2GB | Fast, lightweight, well-documented | Older architecture, less accurate |
| YOLO26 | ~54% | 15-30ms | ~3GB | Transformer-based, good accuracy | First generation |
| YOLO26 | ~56% | 30-50ms | ~4GB | Best accuracy, end-to-end | Slightly slower, more VRAM |
| DINO | ~63% | 80-100ms | ~8GB | Highest accuracy | Too slow for real-time |
Consequences¶
Positive:
- State-of-the-art accuracy for real-time detection
- Transformer architecture - better at understanding spatial relationships
- HuggingFace integration - easy model loading and inference
- Trained on COCO + Objects365 - recognizes 80+ object classes
Negative:
- ~4GB VRAM usage (acceptable for RTX A5500)
- 30-50ms inference time (vs. 5-10ms for YOLOv8)
- Requires CUDA - no CPU fallback for production use
Security-Relevant Classes: Only these classes are returned to reduce noise:
person, car, truck, dog, cat, bird, bicycle, motorcycle, bus
All other COCO classes (chairs, bottles, etc.) are filtered out.
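The filtering step can be sketched as a minimal pure-Python helper (the actual detector service's field names may differ):

```python
# Security-relevant COCO classes kept by the detector (per the list above)
SECURITY_CLASSES = {
    "person", "car", "truck", "dog", "cat",
    "bird", "bicycle", "motorcycle", "bus",
}

def filter_detections(detections: list[dict]) -> list[dict]:
    """Drop non-security-relevant classes (chairs, bottles, ...) to reduce noise."""
    return [d for d in detections if d.get("label") in SECURITY_CLASSES]
```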
ADR-007: Nemotron for Risk Analysis¶
Status: Accepted Date: 2024-12-21
Context¶
After detecting objects, we need an AI model to analyze the context and provide risk assessment. This requires natural language understanding and reasoning capabilities.
Decision¶
Use Nemotron-3-Nano-30B-A3B via llama.cpp for local LLM inference in production. Development can use the smaller Nemotron Mini 4B for faster iteration.
Alternatives Considered¶
| Option | Pros | Cons |
|---|---|---|
| Cloud API (GPT-4, Claude) | Best quality, no GPU needed | Privacy concerns, latency, cost, internet required |
| Local Llama 2 7B | Open source, good quality | ~14GB VRAM, slower |
| Local Nemotron 30B | NVIDIA-optimized, 128K context | ~14.7GB VRAM |
| Local Nemotron 4B | NVIDIA-optimized, compact, fast | Smaller context (4K), less capable |
| Rule-based scoring | Deterministic, fast, no ML | Cannot understand context, no reasoning |
Consequences¶
Positive:
- Privacy: All data stays local - no images or events sent to cloud
- Latency: 2-5 seconds per batch vs. 5-10 seconds for cloud APIs
- Cost: Zero inference cost after initial setup
- Offline: Works without internet connectivity
- NVIDIA-optimized: Designed for efficient inference on NVIDIA GPUs
- Large context: 128K token context window (production 30B model)
Negative:
- Lower quality than GPT-4 or Claude for complex reasoning
- Q4_K_M quantization trades some accuracy for speed
- Requires ~14.7GB VRAM (Nemotron-3-Nano-30B) or ~3GB (Mini 4B dev)
Why Local over Cloud:
Camera images contain sensitive data (home interior, family members).
Sending to cloud APIs would require:
- User consent for data processing
- GDPR compliance considerations
- Trust in third-party data handling
- Internet dependency for core functionality
Local inference eliminates all these concerns.
ADR-008: FastAPI + React Stack¶
Status: Accepted Date: 2024-12-21
Context¶
We need a web framework for the backend API and a frontend framework for the dashboard. The stack should be modern, well-supported, and suitable for real-time applications.
Decision¶
Use FastAPI (Python) for backend and React + TypeScript + Tailwind + Tremor for frontend.
Alternatives Considered¶
Backend:

| Framework | Pros | Cons |
|---|---|---|
| FastAPI | Async-native, auto-docs, type hints, WebSocket support | Python ecosystem |
| Django | Batteries-included, ORM | Sync by default, heavier |
| Node.js Express | JavaScript everywhere | Different ecosystem from AI code |
| Go Fiber | Very fast, low memory | Fewer AI/ML libraries |
Frontend:

| Framework | Pros | Cons |
|---|---|---|
| React | Huge ecosystem, hooks, well-documented | Bundle size, learning curve |
| Vue | Simpler, good docs | Smaller ecosystem |
| Svelte | Smallest bundle, fast | Newer, smaller community |
| HTMX | Minimal JavaScript | Limited for complex UIs |
Consequences¶
Positive:
- FastAPI: Native async/await matches AI inference patterns
- FastAPI: Automatic OpenAPI docs at /docs
- FastAPI: Built-in WebSocket support for real-time updates
- React: Tremor provides excellent pre-built dashboard components
- React: TypeScript catches errors at compile time
- Python backend: Same language as AI models - easier integration
Negative:
- Two languages (Python + TypeScript) to maintain
- React bundle size larger than minimal alternatives
- FastAPI requires async mindset throughout
ADR-009: WebSocket for Real-time Updates¶
Status: Accepted Date: 2024-12-21
Context¶
The dashboard needs real-time updates for:
- New security events as they are analyzed
- System status (GPU utilization, temperature)
- Camera status changes
Decision¶
Use WebSocket connections for real-time, bidirectional communication.
Alternatives Considered¶
| Technique | Pros | Cons |
|---|---|---|
| Polling | Simple, works everywhere | Latency, wasted requests, server load |
| Long Polling | Better than polling | Connection overhead, complexity |
| SSE (Server-Sent Events) | Simple, one-way streaming | No bidirectional, limited browser connections |
| WebSocket | Full duplex, low latency, efficient | More complex setup, connection management |
Consequences¶
Positive:
- Sub-second event delivery to dashboard
- Bidirectional - client can send messages back
- Single persistent connection per client
- Native browser support (no polyfills)
Negative:
- Must handle reconnection logic on client
- Connection state management complexity
- Proxy/firewall considerations
Implementation:
// Client-side reconnection with exponential backoff
const { isConnected, lastMessage } = useWebSocket({
url: 'ws://localhost:8000/ws/events',
reconnect: true,
reconnectInterval: 3000,
reconnectAttempts: 5,
});
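The comment above mentions exponential backoff while the snippet uses a fixed interval; one possible backoff schedule, shown as a hypothetical helper (the 3000ms base and 5 attempts mirror the client config above, the 30s cap is an assumption):

```python
def backoff_delays(base_ms: int = 3000, attempts: int = 5, cap_ms: int = 30000) -> list[int]:
    """Capped exponential backoff: base, 2x, 4x, ... never exceeding cap_ms."""
    return [min(base_ms * (2 ** i), cap_ms) for i in range(attempts)]
```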
WebSocket Channels:
ADR-010: LLM-Determined Risk Scoring¶
Status: Accepted Date: 2024-12-21
Context¶
Each security event needs a risk score (0-100) and risk level (low/medium/high/critical). This could be calculated algorithmically or determined by the LLM.
Decision¶
Let the LLM determine risk scores based on context, rather than using algorithmic rules.
Alternatives Considered¶
| Approach | Pros | Cons |
|---|---|---|
| Algorithmic rules | Deterministic, explainable, fast | Cannot understand context, many edge cases |
| ML classifier | Learns patterns, fast inference | Needs training data, black box |
| LLM-determined | Understands context, provides reasoning | Slower, may be inconsistent |
| Hybrid | Best of both | Complex to implement and maintain |
Consequences¶
Positive:
- Context-aware: "person at 2am" scores higher than "person at 2pm"
- Reasoning: LLM explains WHY it scored the event
- Flexible: No hardcoded rules to maintain
- Natural language: Human-readable explanations
Negative:
- ~2-5 seconds per analysis (vs. milliseconds for rules)
- Potential inconsistency between runs
- Dependent on prompt engineering
Risk Scoring Guidelines (in LLM prompt):
See Risk Levels Reference for the canonical definition.
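Mapping the LLM's output onto a 0-100 score and a level can be sketched as follows (a hypothetical parsing helper; the thresholds here are illustrative, and the canonical bands live in the Risk Levels Reference):

```python
import json

def parse_risk(llm_output: str) -> tuple[int, str]:
    """Parse a JSON risk assessment, clamp the score to 0-100, derive a level."""
    data = json.loads(llm_output)
    score = max(0, min(100, int(data.get("risk_score", 0))))
    # Illustrative thresholds; see the Risk Levels Reference for canonical bands
    if score >= 80:
        level = "critical"
    elif score >= 60:
        level = "high"
    elif score >= 30:
        level = "medium"
    else:
        level = "low"
    return score, level
```

Clamping guards against the occasional out-of-range score the LLM may emit, one concrete form of the "potential inconsistency between runs" noted above.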
ADR-011: Native Tremor Charts over Grafana Embeds¶
Status: Accepted Date: 2024-12-27
Context¶
The dashboard needs to display system metrics (GPU utilization, memory, temperature). We considered embedding Grafana panels vs. using native React chart components.
Decision¶
Use native Tremor charts for dashboard metrics. Link to standalone Grafana for detailed exploration.
Alternatives Considered¶
| Option | Pros | Cons |
|---|---|---|
| Grafana iframe embeds | Rich visualization, pre-built dashboards | Auth complexity, CSP issues, cross-origin problems |
| Grafana public dashboards | Shareable, no auth | Requires Enterprise or public exposure |
| Grafana image rendering | Simple integration | Stale data, polling overhead |
| Native Tremor charts | Simple, no auth, already in stack | Less feature-rich than Grafana |
Consequences¶
Positive:
- No additional authentication complexity
- No CSP/iframe security issues
- Tremor already in frontend stack
- Faster dashboard load times
- Backend metrics API serves both dashboard and Grafana
Negative:
- Less sophisticated charting than Grafana
- No built-in alerting through dashboard
- Users wanting detailed metrics must open separate Grafana window
Implementation:
import { AreaChart } from '@tremor/react';
<AreaChart
data={gpuMetrics}
index="timestamp"
categories={["utilization", "memory_used"]}
colors={["blue", "cyan"]}
/>
ADR-012: WebSocket Circuit Breaker Pattern¶
Status: Accepted Date: 2025-01-03
Context¶
The EventBroadcaster and SystemBroadcaster services maintain persistent connections to Redis pub/sub channels and WebSocket clients. When Redis experiences failures or network issues, broadcasters can enter a failure cascade where repeated connection attempts exhaust resources and delay recovery.
The existing service-level circuit breaker (backend/services/circuit_breaker.py) is designed for external service calls (YOLO26, Nemotron), not for the specific patterns of WebSocket connection management and pub/sub subscription recovery.
Decision¶
Implement a dedicated WebSocket Circuit Breaker (backend/core/websocket_circuit_breaker.py) specifically designed for broadcaster services with:
- Configurable failure threshold (5 for broadcasters, matches MAX_RECOVERY_ATTEMPTS)
- 30-second recovery timeout for gradual recovery testing
- Single test call in half-open state to avoid overwhelming recovering services
- Integration with is_degraded() method for graceful degradation signaling
- Metrics tracking for monitoring and alerting
Alternatives Considered¶
| Alternative | Pros | Cons |
|---|---|---|
| Dedicated WS Circuit Breaker | Tailored for WebSocket patterns, degraded mode | Additional class to maintain |
| Reuse existing CircuitBreaker | Single implementation | Generic config not optimal for WS recovery |
| No circuit breaker | Simpler code | Cascading failures, resource exhaustion |
| Exponential backoff only | Simple retry logic | No state tracking, no coordinated recovery |
Consequences¶
Positive:
- Failure detection (5 consecutive failures opens circuit, matching MAX_RECOVERY_ATTEMPTS)
- Coordinated recovery via half-open state
- Graceful degradation signaling to clients via is_degraded() method
- Metrics for monitoring circuit breaker state
- Thread-safe with asyncio Lock
Negative:
- Additional class to maintain alongside service-level circuit breaker
- Different configuration from service circuit breaker (intentional but may cause confusion)
Implementation¶
from backend.core.websocket_circuit_breaker import WebSocketCircuitBreaker

class SystemBroadcaster:
    MAX_RECOVERY_ATTEMPTS = 5

    def __init__(self, ...):
        self._circuit_breaker = WebSocketCircuitBreaker(
            failure_threshold=self.MAX_RECOVERY_ATTEMPTS,  # Open after 5 failures
            recovery_timeout=30.0,   # Wait 30s before testing
            half_open_max_calls=1,   # Single test call
            success_threshold=1,     # Close after 1 success
            name="system_broadcaster",
        )

    def is_degraded(self) -> bool:
        """Check if broadcaster is in degraded mode."""
        return self._is_degraded
Circuit Breaker States¶
CLOSED (Normal) --[5 failures]--> OPEN (Blocking)
                                     |
                                     | (30s timeout)
                                     v
                              HALF_OPEN (Testing)
                                     |
                +--------------------+--------------------+
                |                                         |
            [success]                                 [failure]
                |                                         |
                v                                         v
             CLOSED                                     OPEN
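The state machine above can be condensed into a synchronous sketch (the real WebSocketCircuitBreaker is async and lock-protected; the injectable clock here is purely for testability):

```python
import time

class CircuitBreakerSketch:
    """Minimal CLOSED -> OPEN -> HALF_OPEN state machine; illustrative only."""

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.state = "closed"
        self.failure_count = 0
        self.opened_at: float | None = None

    def allow_call(self) -> bool:
        if self.state == "open":
            if self.clock() - self.opened_at >= self.recovery_timeout:
                self.state = "half_open"  # permit a single test call
                return True
            return False
        return True

    def record_success(self) -> None:
        self.state = "closed"
        self.failure_count = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failure_count += 1
        # A failure while half-open, or hitting the threshold, (re)opens the circuit
        if self.state == "half_open" or self.failure_count >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```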
Metrics Available¶
| Metric | Description |
|---|---|
| state | Current circuit state (closed/open/half_open) |
| failure_count | Consecutive failures since last success |
| total_failures | Total failures recorded |
| total_successes | Total successes recorded |
| opened_at | Timestamp when circuit was last opened |
Decision Overview Diagram¶
flowchart TB
subgraph "Data Layer Decisions"
DB["ADR-001: PostgreSQL<br/>Concurrent, reliable"]
RD["ADR-002: Redis<br/>Queues + Pub/Sub"]
end
subgraph "Processing Decisions"
BA["ADR-003: Batching<br/>90s window + 30s idle"]
RS["ADR-010: LLM Scoring<br/>Context-aware risk"]
end
subgraph "AI Model Decisions"
DET["ADR-006: YOLO26<br/>Best real-time accuracy"]
LLM["ADR-007: Nemotron<br/>Local privacy"]
end
subgraph "Architecture Decisions"
HY["ADR-004: Containerized<br/>Docker + GPU CDI"]
NA["ADR-005: No Auth<br/>Trusted network"]
end
subgraph "Frontend Decisions"
ST["ADR-008: FastAPI + React<br/>Modern async stack"]
WS["ADR-009: WebSocket<br/>Real-time updates"]
CH["ADR-011: Tremor Charts<br/>Native over Grafana"]
end
DB --> BA
RD --> BA
BA --> RS
DET --> BA
LLM --> RS
HY --> DET
HY --> LLM
ST --> WS
ST --> CH

Revision History¶
| Date | Version | Changes |
|---|---|---|
| 2024-12-21 | 1.0 | Initial architecture decisions documented |
| 2024-12-27 | 1.1 | Added ADR-011 (Grafana integration decision) |
| 2024-12-28 | 1.2 | Consolidated all ADRs into single document with diagrams |
| 2025-01-03 | 1.3 | Added ADR-012 (WebSocket Circuit Breaker pattern) |