Risk Analysis¶

NVIDIA Nemotron LLM integration for risk scoring and event generation.

Time to read: ~15 min
Prerequisites: Batching Logic

What NVIDIA Nemotron Does¶

NVIDIA Nemotron analyzes batched detections and generates:

Risk score (0-100)
Risk level (low/medium/high/critical)
Human-readable summary
Detailed reasoning explanation
Entity-level threat assessments (with enrichment)
Recommended actions

Model Options¶

Deployment	Model	File	VRAM	Context window (tokens)
Production	NVIDIA Nemotron-3-Nano-30B-A3B	`Nemotron-3-Nano-30B-A3B-Q4_K_M.gguf`	~14.7 GB	131,072
Development	Nemotron Mini 4B Instruct	`nemotron-mini-4b-instruct-q4_k_m.gguf`	~3 GB	4,096

Source Files¶

/ai/nemotron/AGENTS.md - Comprehensive NVIDIA Nemotron documentation
/ai/nemotron/ - Model files and configuration
/backend/services/nemotron_analyzer.py - Analysis service
/backend/services/prompts.py - Prompt templates (5 tiers)

Analysis Flow¶

Batch closed
      |
      v
Load detection details from DB
      |
      v
Format prompt with context
      |
      v
Call Nemotron /completion
      |
      v
Parse JSON response
      |
      v
Validate and normalize
      |
      v
Create Event record
      |
      v
Broadcast via WebSocket
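The flow above can be sketched as one async function. This is a minimal illustration, not the code in nemotron_analyzer.py: the `AnalysisStages` bundle and `analyze_batch` name are hypothetical, and each stage is injected as a callable so the ordering is visible without a database, LLM server, or WebSocket layer.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class AnalysisStages:
    """Hypothetical bundle of the stages shown in the flow diagram."""
    load_detections: Callable[[str], Awaitable[list[dict[str, Any]]]]
    format_prompt: Callable[[list[dict[str, Any]]], str]
    call_llm: Callable[[str], Awaitable[str]]
    parse_json: Callable[[str], dict[str, Any]]
    validate: Callable[[dict[str, Any]], dict[str, Any]]
    create_event: Callable[[str, dict[str, Any]], Awaitable[dict[str, Any]]]
    broadcast: Callable[[dict[str, Any]], Awaitable[None]]


async def analyze_batch(batch_id: str, s: AnalysisStages) -> dict[str, Any]:
    detections = await s.load_detections(batch_id)  # Load detection details from DB
    prompt = s.format_prompt(detections)            # Format prompt with context
    raw = await s.call_llm(prompt)                  # Call Nemotron /completion
    risk = s.validate(s.parse_json(raw))            # Parse JSON, then validate/normalize
    event = await s.create_event(batch_id, risk)    # Create Event record
    await s.broadcast({"type": "new_event", "event": event})  # Broadcast via WebSocket
    return event
```

Injecting the stages keeps each step independently testable; the real service presumably holds these as methods and collaborators on the analyzer instance instead.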

Prompt Engineering¶

ChatML Format¶

NVIDIA Nemotron uses ChatML format with special delimiters. All prompts follow this structure:

<|im_start|>system
{system message}
<|im_end|>
<|im_start|>user
{user message with detection context}
<|im_end|>
<|im_start|>assistant
{model response begins here}

Stop Tokens: ["<|im_end|>", "<|im_start|>"]. Generation halts as soon as the model emits either token, so the response cannot run on into a new ChatML turn.
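As an illustration of the structure above, a small helper can wrap a system and user message in ChatML and leave the assistant turn open for the model. The helper name `build_chatml_prompt` is hypothetical; the backend's templates in prompts.py are pre-written strings rather than assembled this way.

```python
# Delimiters and stop tokens as listed above.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"
STOP_TOKENS = [IM_END, IM_START]


def build_chatml_prompt(system: str, user: str) -> str:
    """Wrap system and user messages in ChatML and open the assistant turn."""
    return (
        f"{IM_START}system\n{system}\n{IM_END}\n"
        f"{IM_START}user\n{user}\n{IM_END}\n"
        f"{IM_START}assistant\n"
    )
```

The prompt deliberately ends right after `<|im_start|>assistant`, so the model's first generated token is the start of its reply.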

Prompt Templates (5 Tiers)¶

The backend automatically selects the appropriate prompt based on available enrichment data:

Template	Constant Name	When Used
Basic	`RISK_ANALYSIS_PROMPT`	Fallback when no enrichment available
Enriched	`ENRICHED_RISK_ANALYSIS_PROMPT`	Zone/baseline/cross-camera context available
Full Enriched	`FULL_ENRICHED_RISK_ANALYSIS_PROMPT`	Enriched + license plates/faces from pipeline
Vision	`VISION_ENHANCED_RISK_ANALYSIS_PROMPT`	Florence-2 extraction + context enrichment
Model Zoo	`MODEL_ZOO_ENHANCED_RISK_ANALYSIS_PROMPT`	Full model zoo (violence, weather, clothing, etc.)

Prompt Template Selection Decision Tree¶

flowchart TD
    A[Start: Select Prompt Template] --> B{Model Zoo<br>enrichment available?}
    B -->|Yes| C[MODEL_ZOO_ENHANCED_RISK_ANALYSIS_PROMPT]
    B -->|No| D{Florence-2 vision<br>attributes available?}
    D -->|Yes| E[VISION_ENHANCED_RISK_ANALYSIS_PROMPT]
    D -->|No| F{License plates or<br>faces detected?}
    F -->|Yes| G[FULL_ENRICHED_RISK_ANALYSIS_PROMPT]
    F -->|No| H{Zone/baseline/cross-camera<br>context available?}
    H -->|Yes| I[ENRICHED_RISK_ANALYSIS_PROMPT]
    H -->|No| J[RISK_ANALYSIS_PROMPT<br>Basic Fallback]

    style C fill:#76B900,color:#000
    style E fill:#76B900,color:#000
    style G fill:#76B900,color:#000
    style I fill:#76B900,color:#000
    style J fill:#F59E0B,color:#000
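The decision tree checks the richest enrichment tier first and falls through to the basic prompt. A minimal sketch, assuming the availability checks reduce to four booleans (the function and parameter names are hypothetical; the backend's actual selection logic may inspect the enrichment objects directly):

```python
def select_prompt_template(
    has_model_zoo: bool,
    has_vision_attributes: bool,
    has_plates_or_faces: bool,
    has_context_enrichment: bool,
) -> str:
    """Return the template constant name, checking the richest tier first."""
    if has_model_zoo:
        return "MODEL_ZOO_ENHANCED_RISK_ANALYSIS_PROMPT"
    if has_vision_attributes:
        return "VISION_ENHANCED_RISK_ANALYSIS_PROMPT"
    if has_plates_or_faces:
        return "FULL_ENRICHED_RISK_ANALYSIS_PROMPT"
    if has_context_enrichment:
        return "ENRICHED_RISK_ANALYSIS_PROMPT"
    return "RISK_ANALYSIS_PROMPT"  # basic fallback
```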

Basic Prompt (Fallback)¶

Used when enrichment services are unavailable:

RISK_ANALYSIS_PROMPT = """<|im_start|>system
You are a home security risk analyzer.
IMPORTANT: Output ONLY a valid JSON object...<|im_end|>
<|im_start|>user
Analyze these detections and output a JSON risk assessment.

Camera: {camera_name}
Time: {start_time} to {end_time}
Detections:
{detections_list}

Risk levels: low (0-29), medium (30-59), high (60-84), critical (85-100)

Output JSON:
{{"risk_score": N, "risk_level": "level", "summary": "text", "reasoning": "text"}}<|im_end|>
<|im_start|>assistant
"""

Enriched Prompt¶

Adds contextual intelligence:

Zone Analysis: Entry points, high-security areas
Baseline Comparison: Expected vs. actual activity patterns
Deviation Score: 0 (normal) to 1 (highly unusual)
Cross-Camera Correlation: Activity seen on other cameras

Full Enriched Prompt¶

All enriched context plus vision pipeline results:

License Plates: Known vs. unknown vehicles
Face Detections: Presence of identifiable faces
OCR Text: Text recognized in images

Vision Enhanced Prompt¶

Florence-2 vision-language model attributes:

Person Attributes: Clothing, carrying items, actions
Vehicle Attributes: Color, type, commercial markings
Re-Identification Context: Track entities across cameras
Scene Analysis: Environment description
Service Worker Detection: Lower risk for delivery/utility workers

Model Zoo Enhanced Prompt¶

Comprehensive enrichment from full model zoo:

Violence Detection: ViT violence classifier alerts
Weather Context: Visibility and weather conditions
Clothing Analysis: FashionCLIP + SegFormer (suspicious attire, face coverings)
Vehicle Classification: Type, commercial status, damage
Pet Detection: False positive filtering for household pets
Pose Analysis: Crouching, running, lying detection
Action Recognition: Security-relevant behaviors
Image Quality: Blur, noise, tampering indicators

Context Provided to LLM¶

Depending on the prompt template selected, NVIDIA Nemotron receives:

Basic Context:

1. Camera Name: Human-readable identifier (e.g., "front_door")
2. Time Window: ISO format timestamps
3. Detection List: Timestamps, object types, confidence scores

Enriched Context Additions:

4. Day of Week: Weekday/weekend patterns
5. Zone Analysis: Which security zones triggered
6. Baseline Comparison: Historical activity patterns
7. Deviation Score: Statistical anomaly measure
8. Cross-Camera Activity: Correlated detections

Vision Context Additions:

9. Detailed Attributes: Per-detection clothing, colors, actions
10. Re-ID Context: Entity tracking across cameras
11. Scene Analysis: Environment and lighting description

Model Zoo Context Additions:

12. Violence Alerts: Explicit violence detection flags
13. Weather/Visibility: Environmental conditions
14. Clothing Segmentation: Face covering detection
15. Vehicle Damage: Security-relevant vehicle damage
16. Pet Classifications: False positive filtering
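The basic Detection List (timestamps, object types, confidence scores) could be rendered as one line per detection. This is a sketch only: the `Detection` record and the `- <timestamp> <type> (<confidence>)` line format are hypothetical, not the backend's actual model or prompt format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Detection:
    """Hypothetical minimal detection record (timestamp, type, confidence)."""
    detected_at: datetime
    object_type: str
    confidence: float


def format_detections(detections: list[Detection]) -> str:
    """Render one line per detection, oldest first, for {detections_list}."""
    lines = []
    for d in sorted(detections, key=lambda d: d.detected_at):
        lines.append(f"- {d.detected_at.isoformat()} {d.object_type} ({d.confidence:.2f})")
    return "\n".join(lines)
```

Sorting chronologically lets the model reason about sequence (for example, approach then linger) rather than detection insertion order.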

API Call¶

Endpoint: POST http://localhost:8091/completion

Request:

{
  "prompt": "<ChatML formatted prompt>",
  "temperature": 0.7,
  "top_p": 0.95,
  "max_tokens": 1536,
  "stop": ["<|im_end|>", "<|im_start|>"]
}
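A minimal blocking client for this request, using only the standard library. The names `build_completion_request` and `call_nemotron` are hypothetical; the backend presumably uses an async HTTP client, but the body and the 60-second timeout (from the Error Handling table) are the same.

```python
import json
import urllib.request

NEMOTRON_URL = "http://localhost:8091/completion"


def build_completion_request(prompt: str) -> dict:
    """Request body with the sampling settings shown above."""
    return {
        "prompt": prompt,
        "temperature": 0.7,
        "top_p": 0.95,
        "max_tokens": 1536,
        "stop": ["<|im_end|>", "<|im_start|>"],
    }


def call_nemotron(prompt: str, url: str = NEMOTRON_URL, timeout: float = 60.0) -> dict:
    """POST the request and return the decoded JSON response (blocking)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_completion_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises URLError if the server is unreachable and a timeout
    # error if no response arrives within `timeout` seconds.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The generated text is in the response's `content` field, which still needs the reasoning-block stripping and JSON extraction described below.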

Response:

{
  "content": "<think>Analyzing detections...</think>{\"risk_score\": 65, ...}",
  "model": "Nemotron-3-Nano-30B-A3B-Q4_K_M.gguf",
  "tokens_predicted": 287,
  "tokens_evaluated": 1245
}

Note: NVIDIA Nemotron-3-Nano outputs <think>...</think> reasoning blocks before the JSON response. The backend strips these before parsing.
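One way to strip the reasoning blocks is a non-greedy regex substitution. This is a sketch; the function name is hypothetical and the backend's exact approach is not shown here.

```python
import re

# Non-greedy match so each <think>...</think> block is removed separately;
# DOTALL lets "." span the newlines inside a multi-line reasoning block.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think_blocks(content: str) -> str:
    """Remove every complete <think>...</think> block and trim whitespace."""
    return THINK_BLOCK.sub("", content).strip()
```

An unterminated `<think>` (for example, when generation hits `max_tokens` mid-reasoning) is left in place by this pattern; the downstream JSON extraction then either finds no object or falls back.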

Output Format¶

The LLM produces JSON:

{
  "risk_score": 65,
  "risk_level": "high",
  "summary": "Unknown person detected approaching front door at night",
  "reasoning": "Single person detection at 2:15 AM is unusual. The person appeared to be approaching the entrance. Time of day and approach pattern warrant elevated concern."
}

Risk Level Mapping¶

Figure: Risk Analysis Decision Tree (how the LLM analyzes detection context to determine risk scores and levels).

Score Range	Level	Description
0-29	`low`	Normal activity, no concern
30-59	`medium`	Unusual but not threatening
60-84	`high`	Suspicious, needs attention
85-100	`critical`	Potential threat, immediate action
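The table's thresholds can also be expressed as a lookup with `bisect`, which keeps the boundaries in one list. This is an equivalent alternative to an if/elif chain, not the backend's code; `score_to_level` is a hypothetical name.

```python
from bisect import bisect_right

# Lower bound of each level above "low", taken from the table above.
_THRESHOLDS = [30, 60, 85]
_LEVELS = ["low", "medium", "high", "critical"]


def score_to_level(score: int) -> str:
    """Map a 0-100 score to its risk level (clamping out-of-range scores)."""
    score = max(0, min(100, score))
    return _LEVELS[bisect_right(_THRESHOLDS, score)]
```

`bisect_right` counts how many thresholds are less than or equal to the score, so a score of exactly 30, 60, or 85 lands in the higher level, matching the table's inclusive lower bounds.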

Risk Level State Diagram¶

stateDiagram-v2
    direction LR

    [*] --> LOW: score 0-29
    [*] --> MEDIUM: score 30-59
    [*] --> HIGH: score 60-84
    [*] --> CRITICAL: score 85-100

    state LOW {
        note right of LOW
            Normal activity
            No concern
            No alert triggered
        end note
    }

    state MEDIUM {
        note right of MEDIUM
            Unusual activity
            Not threatening
            Optional notification
        end note
    }

    state HIGH {
        note right of HIGH
            Suspicious activity
            Needs attention
            Alert triggered
        end note
    }

    state CRITICAL {
        note right of CRITICAL
            Potential threat
            Immediate action
            Priority alert
        end note
    }

    LOW --> [*]: Event processed
    MEDIUM --> [*]: Event processed
    HIGH --> [*]: Event processed
    CRITICAL --> [*]: Event processed

Validation and Normalization¶

The _validate_risk_data() method ensures valid output:

def _validate_risk_data(self, data: dict) -> dict:
    # Validate risk_score: coerce to int and clamp to 0-100.
    # Non-numeric values (None, "high", NaN, inf) fall back to 50.
    try:
        risk_score = int(float(data.get("risk_score", 50)))
    except (TypeError, ValueError, OverflowError):
        risk_score = 50
    risk_score = max(0, min(100, risk_score))

    # Validate risk_level (case- and whitespace-insensitive)
    valid_levels = ["low", "medium", "high", "critical"]
    risk_level = str(data.get("risk_level", "medium")).strip().lower()

    if risk_level not in valid_levels:
        # Infer from risk_score using the Risk Level Mapping thresholds
        if risk_score < 30:
            risk_level = "low"
        elif risk_score < 60:
            risk_level = "medium"
        elif risk_score < 85:
            risk_level = "high"
        else:
            risk_level = "critical"

    return {
        "risk_score": risk_score,
        "risk_level": risk_level,
        "summary": data.get("summary", "Risk analysis completed"),
        "reasoning": data.get("reasoning", "No detailed reasoning provided"),
    }

JSON Extraction¶

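LLM output may contain extra text around the JSON object. The analyzer extracts candidate objects with a regex that matches flat objects and objects containing at most one level of nested braces:

import re
# text: raw LLM content after <think> blocks have been stripped
json_pattern = r"\{[^{}]*(?:\{[^{}]*\}[^{}]*)*\}"
matches = re.findall(json_pattern, text, re.DOTALL)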
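Because the regex stops at one level of nesting, an object such as `{"a": {"b": {"c": 1}}, "risk_score": 10}` would not be matched whole. An alternative technique, shown here as a sketch rather than the analyzer's method, is to let the standard library's `json.JSONDecoder.raw_decode` parse from each `{` it finds, which handles arbitrary nesting. Returning the last object that contains `risk_score` is a design assumption (the final answer usually follows any draft text).

```python
import json
from typing import Any, Optional


def extract_risk_json(text: str) -> Optional[dict[str, Any]]:
    """Return the last top-level JSON object in text that has "risk_score"."""
    decoder = json.JSONDecoder()
    found = None
    pos = text.find("{")
    while pos != -1:
        try:
            obj, end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            pos = text.find("{", pos + 1)  # not valid JSON here; try the next brace
            continue
        if isinstance(obj, dict) and "risk_score" in obj:
            found = obj
        pos = text.find("{", end)  # resume scanning after this object
    return found
```

A `None` result maps naturally onto the "Invalid LLM JSON" row of the error table, where the fallback assessment is used.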

Fallback Behavior¶

When LLM analysis fails, default values are used:

{
    "risk_score": 50,
    "risk_level": "medium",
    "summary": "Analysis unavailable - LLM service error",
    "reasoning": "Failed to analyze detections due to service error"
}

Error Handling¶

Error	Response	Recovery
Batch not found	Raise `ValueError`	Skip batch
Nemotron unreachable	Use fallback	Event created
Nemotron timeout (60s)	Use fallback	Event created
Invalid LLM JSON	Use fallback	Event created
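The table's policy (raise for a missing batch, fall back for any LLM failure) can be sketched as below. `analyze_or_fallback` and its `run_llm` parameter are hypothetical; because the concrete exception types depend on the HTTP client in use, this sketch catches broadly and logs the traceback.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)

# Fallback values from the "Fallback Behavior" section above.
FALLBACK_RISK = {
    "risk_score": 50,
    "risk_level": "medium",
    "summary": "Analysis unavailable - LLM service error",
    "reasoning": "Failed to analyze detections due to service error",
}


def analyze_or_fallback(batch: Any, run_llm: Callable[[Any], dict]) -> dict:
    """Return the LLM assessment, or the fallback if the LLM step fails."""
    if batch is None:
        # Missing batch is a caller error: raise so the batch is skipped.
        raise ValueError("Batch not found")
    try:
        return run_llm(batch)
    except Exception:  # unreachable host, timeout, invalid JSON, ...
        logger.exception("Nemotron analysis failed; using fallback")
        return dict(FALLBACK_RISK)  # copy so callers cannot mutate the default
```

Returning the fallback instead of raising is what lets an Event still be created in the last three rows of the table.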

Performance¶

Metric	Production (30B)	Development (4B)
Inference time	2-5 seconds per batch	1-3 seconds per batch
Token generation	~50-100 tokens/second	~100-200 tokens/second
Context processing	~1000 tokens/second	~2000 tokens/second
Concurrent requests	1-2 (configured)	2-4 (configurable)
VRAM usage	~14.7 GB	~3 GB
Context window	131,072 tokens (128K)	4,096 tokens

The production 30B model's 131,072-token context window is 32 times the 4B model's 4,096 tokens, so a single analysis can include far more detection history (hours of history vs. minutes with 4B).

Event Database Model¶

CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    batch_id VARCHAR NOT NULL,
    camera_id VARCHAR NOT NULL,
    started_at TIMESTAMP NOT NULL,
    ended_at TIMESTAMP NOT NULL,
    risk_score INTEGER NOT NULL,
    risk_level VARCHAR NOT NULL,
    summary TEXT,
    reasoning TEXT,
    detection_ids TEXT,  -- JSON array
    reviewed BOOLEAN DEFAULT FALSE,
    notes TEXT,
    is_fast_path BOOLEAN DEFAULT FALSE
);
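The `detection_ids` column stores a JSON array as text, so IDs are encoded on write and decoded on read. The sketch below shows that round trip with the standard library's `sqlite3` on a reduced version of the table; it is an illustration only (the production schema uses PostgreSQL types such as `SERIAL`), and the detection IDs are made-up values.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced SQLite version of the events table: only the columns needed here.
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY,"  # auto-assigned, standing in for SERIAL
    " camera_id TEXT NOT NULL,"
    " risk_score INTEGER NOT NULL,"
    " risk_level TEXT NOT NULL,"
    " detection_ids TEXT"  # JSON array stored as text
    ")"
)

# Encode the list of detection IDs as a JSON string on write...
cur = conn.execute(
    "INSERT INTO events (camera_id, risk_score, risk_level, detection_ids)"
    " VALUES (?, ?, ?, ?)",
    ("front_door", 65, "high", json.dumps([101, 102, 103])),
)
event_id = cur.lastrowid

# ...and decode it back into a Python list on read.
(raw_ids,) = conn.execute(
    "SELECT detection_ids FROM events WHERE id = ?", (event_id,)
).fetchone()
detection_ids = json.loads(raw_ids)
```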

WebSocket Broadcast¶

After the event is created, it is broadcast to all connected clients:

await self.broadcaster.broadcast_event({
    "type": "new_event",
    "event": event.to_dict()
})

Clients receive:

{
  "type": "new_event",
  "event": {
    "id": 42,
    "camera_id": "front_door",
    "risk_score": 65,
    "risk_level": "high",
    "summary": "Unknown person detected...",
    "started_at": "2025-12-28T14:30:00",
    "ended_at": "2025-12-28T14:31:30"
  }
}
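The broadcaster's implementation is not shown on this page. As a sketch of the fan-out pattern only, the hypothetical `EventBroadcaster` below keeps one `asyncio.Queue` per connected client; in a real server, a per-connection task would read from its queue and send each payload over that client's WebSocket.

```python
import asyncio
import json
from typing import Any


class EventBroadcaster:
    """Hypothetical in-process broadcaster: one asyncio.Queue per client."""

    def __init__(self) -> None:
        self._clients: set[asyncio.Queue] = set()

    def connect(self) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._clients.add(queue)
        return queue

    def disconnect(self, queue: asyncio.Queue) -> None:
        self._clients.discard(queue)

    async def broadcast_event(self, message: dict[str, Any]) -> None:
        payload = json.dumps(message)  # serialize once, fan out to every client
        for queue in list(self._clients):  # copy: clients may disconnect meanwhile
            await queue.put(payload)
```

The queues here are unbounded, so a slow client accumulates messages; a bounded queue that drops or disconnects lagging clients is a common alternative.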

Next Steps¶

Pipeline Overview - Full pipeline context
Batching Logic - Batch aggregation details

External Resources¶
