AI Service Troubleshooting¶
Solving AI service and pipeline problems (YOLO26, Nemotron, and optional Florence/CLIP/Enrichment).
Time to read: ~6 min. Prerequisites: see GPU Issues for hardware problems.
Service Not Running¶
Symptoms¶
- Health check shows "yolo26": "connection refused"
- Health check shows "nemotron": "connection refused"
- No detections being created
Diagnosis¶
# Check AI service status
./scripts/start-ai.sh status
# Check if processes are running
pgrep -f "model.py" # YOLO26
pgrep -f "llama-server" # Nemotron
# Check logs
tail -f /tmp/yolo26-detector.log
tail -f /tmp/nemotron-llm.log
If you are running the optional services, also check:
curl http://localhost:8092/health # Florence-2 (optional)
curl http://localhost:8093/health # CLIP (optional)
curl http://localhost:8094/health # Enrichment (optional)
Solutions¶
1. Start AI services:
2. Check for startup errors:
Common startup errors:
- Missing model files (run ./ai/download_models.sh)
- Port already in use
- CUDA initialization failure
3. Check model files exist:
Degraded Mode¶
Symptoms¶
- Health check shows "ai": "degraded"
- One service healthy, one unhealthy
- Partial functionality
Diagnosis¶
# Check overall service health (includes AI service URLs from config)
curl http://localhost:8000/api/system/health | jq .services
# Check individual AI services
curl http://localhost:8095/health # YOLO26
curl http://localhost:8091/health # Nemotron
curl http://localhost:8092/health # Florence-2 (optional)
curl http://localhost:8093/health # CLIP (optional)
curl http://localhost:8094/health # Enrichment (optional)
Solutions¶
Understand degraded behavior:
| YOLO26 | Nemotron | Result |
|---|---|---|
| Up | Up | Full functionality |
| Up | Down | Detections work, no risk analysis |
| Down | Up | No new detections, existing events can be re-analyzed |
| Down | Down | System unhealthy |
Optional enrichment services (Florence/CLIP/Enrichment) typically degrade enrichment quality rather than fully stopping event creation. The core “detections → batches → LLM → events” path can still function if YOLO26 and Nemotron are healthy.
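The degraded-behavior table above can be sketched as a small status function. This is an illustrative sketch only; the function name and return strings are assumptions, not the backend's actual API:

```python
def pipeline_status(yolo26_up: bool, nemotron_up: bool) -> str:
    """Map core AI service health to overall pipeline capability.

    Mirrors the degraded-mode table: both up = full functionality,
    one up = degraded (different partial behavior), both down = unhealthy.
    """
    if yolo26_up and nemotron_up:
        return "healthy"  # full functionality
    if yolo26_up:
        return "degraded: detections work, no risk analysis"
    if nemotron_up:
        return "degraded: no new detections, re-analysis only"
    return "unhealthy"


print(pipeline_status(True, True))    # healthy
print(pipeline_status(False, False))  # unhealthy
```

Optional enrichment services are intentionally absent from this function: per the note above, they degrade enrichment quality rather than core pipeline status.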
Optional Enrichment Issues (Florence / CLIP / Enrichment)¶
The optional enrichment services (Florence-2, CLIP, Enrichment) provide enhanced context for detections, including:
- Florence-2: Visual attributes, OCR, dense captions
- CLIP: Embedding generation for re-identification
- Enrichment: Orchestrates and aggregates enrichment data
These services are optional: the core detection and risk analysis pipeline works without them.
Symptoms¶
- Events exist, but "extra context" fields are missing (no attributes, no re-identification hints, etc.)
- Backend logs mention enrichment timeouts or connection errors
- CPU spikes on the backend when enrichment is enabled
- Circuit breakers open for enrichment services
Quick Diagnosis¶
# Confirm the backend is configured to reach the optional services
curl http://localhost:8000/api/system/config | jq '.florence_url, .clip_url, .enrichment_url'
# Check feature toggles (are enrichment features enabled?)
curl http://localhost:8000/api/system/config | jq '.vision_extraction_enabled, .reid_enabled, .scene_change_enabled'
# Check health endpoints
curl http://localhost:8092/health # Florence-2
curl http://localhost:8093/health # CLIP
curl http://localhost:8094/health # Enrichment
# Check circuit breaker status
curl http://localhost:8000/api/system/circuit-breakers | jq '.florence, .clip, .enrichment'
Understanding Feature Toggles¶
| Variable | Default | Effect When Disabled |
|---|---|---|
| VISION_EXTRACTION_ENABLED | true | No Florence-2 attributes, OCR, or dense captions |
| REID_ENABLED | true | No CLIP embeddings or re-identification |
| SCENE_CHANGE_ENABLED | true | No scene change detection between frames |
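A sketch of how these toggles might gate the enrichment steps. The env-var names come from the table above; the parsing rules and step names are illustrative assumptions:

```python
import os


def truthy(name: str, default: str = "true") -> bool:
    # Env-style boolean parsing; the accepted spellings here are an assumption.
    return os.environ.get(name, default).strip().lower() in ("1", "true", "yes", "on")


def enabled_enrichment_steps() -> list[str]:
    """Return which enrichment steps the feature toggles allow (illustrative)."""
    steps = []
    if truthy("VISION_EXTRACTION_ENABLED"):
        steps.append("florence_attributes")  # attributes, OCR, dense captions
    if truthy("REID_ENABLED"):
        steps.append("clip_reid")            # embeddings + re-identification
    if truthy("SCENE_CHANGE_ENABLED"):
        steps.append("scene_change")
    return steps


os.environ["REID_ENABLED"] = "false"
print(enabled_enrichment_steps())  # ['florence_attributes', 'scene_change']
```

All three default to enabled, matching the table's Default column.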
Common Causes¶
- Wrong URL from backend (container vs host networking)
- GPU/VRAM pressure (too many services competing for limited VRAM)
- Timeouts (services are up, but slow to respond under load)
- Circuit breakers open (service failures triggered protection)
- Feature toggles disabled (enrichment turned off in config)
Solutions¶
1. Fix container vs host networking
- Production compose: the backend should use compose DNS names (http://ai-florence:8092, http://ai-clip:8093, http://ai-enrichment:8094)
- Host-run AI: the backend should use localhost (or host.docker.internal / host.containers.internal when the backend is containerized)
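The URL choice per deployment mode can be expressed as a lookup. This is a sketch for reasoning about networking, not the backend's actual configuration code; the mode names are hypothetical:

```python
def florence_base_url(deployment: str) -> str:
    """Pick the Florence-2 base URL for a deployment mode (illustrative).

    'compose'              - backend and AI services share a compose network
    'host'                 - everything runs directly on the host
    'backend-in-container' - backend containerized, AI services on the host
    """
    urls = {
        "compose": "http://ai-florence:8092",
        "host": "http://localhost:8092",
        "backend-in-container": "http://host.docker.internal:8092",
    }
    return urls[deployment]


print(florence_base_url("compose"))  # http://ai-florence:8092
print(florence_base_url("host"))     # http://localhost:8092
```

The same pattern applies to the CLIP (8093) and Enrichment (8094) URLs; only the service name and port change.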
2. Disable optional enrichment temporarily
If you need the system running reliably while debugging, disable the optional features:
# In .env - disable all enrichment
VISION_EXTRACTION_ENABLED=false
REID_ENABLED=false
SCENE_CHANGE_ENABLED=false
# Restart backend to apply
docker compose -f docker-compose.prod.yml restart backend
Then re-enable them one by one once GPU load and latency have stabilized.
3. Adjust timeouts for slow services
If services are healthy but timing out under load:
# In .env - increase timeouts
FLORENCE_READ_TIMEOUT=60.0 # Default: 30s
CLIP_READ_TIMEOUT=30.0 # Default: 15s
ENRICHMENT_READ_TIMEOUT=120.0 # Default: 60s
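A sketch of reading these timeout variables safely, falling back to the documented defaults when a value is unset or malformed. The helper name is an assumption, not the backend's actual code:

```python
import os


def read_timeout(name: str, default: float) -> float:
    """Read a float timeout from the environment, falling back on bad values."""
    raw = os.environ.get(name)
    try:
        return float(raw) if raw is not None else default
    except ValueError:
        return default


os.environ["FLORENCE_READ_TIMEOUT"] = "60.0"
print(read_timeout("FLORENCE_READ_TIMEOUT", 30.0))  # 60.0
print(read_timeout("CLIP_READ_TIMEOUT", 15.0))      # 15.0 (unset -> default)
```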
4. Reset circuit breakers
If circuit breakers opened due to transient failures:
# Check circuit breaker status
curl http://localhost:8000/api/system/circuit-breakers | jq
# Reset specific circuit breaker (if API available)
curl -X POST http://localhost:8000/api/system/circuit-breakers/florence/reset
curl -X POST http://localhost:8000/api/system/circuit-breakers/clip/reset
curl -X POST http://localhost:8000/api/system/circuit-breakers/enrichment/reset
# Or restart backend to reset all circuit breakers
docker compose -f docker-compose.prod.yml restart backend
5. Check GPU/VRAM availability
Optional services compete for GPU memory. Check utilization:
nvidia-smi
# Expected VRAM usage per service:
# - YOLO26: ~4GB
# - Nemotron: ~3GB (Q4_K_M), ~14GB (30B)
# - Florence-2: ~2-4GB
# - CLIP: ~1-2GB
If GPU is overloaded, consider:
- Running fewer AI services simultaneously
- Using smaller model quantizations
- Disabling non-essential enrichment features
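For rough capacity planning, the per-service figures above can be summed against the GPU's VRAM. This is a back-of-the-envelope sketch; the figures are the approximate values quoted in the comments and actual usage varies with model and batch size:

```python
# Approximate VRAM footprints (GB) from the nvidia-smi comments above.
VRAM_GB = {"yolo26": 4.0, "nemotron_q4": 3.0, "florence2": 4.0, "clip": 2.0}


def fits_in_vram(services: list[str], gpu_gb: float, headroom_gb: float = 1.0) -> bool:
    """Check whether a set of services should fit on one GPU (rough planning only)."""
    return sum(VRAM_GB[s] for s in services) + headroom_gb <= gpu_gb


print(fits_in_vram(["yolo26", "nemotron_q4"], gpu_gb=12.0))                       # True
print(fits_in_vram(["yolo26", "nemotron_q4", "florence2", "clip"], gpu_gb=12.0))  # False
```

On a 12 GB card, the two core services fit comfortably, but adding both optional services does not; this is typically when disabling enrichment or using smaller quantizations pays off.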
6. Tune re-identification settings
If re-ID is slow or producing poor matches:
# Adjust similarity threshold (higher = stricter matching)
REID_SIMILARITY_THRESHOLD=0.85
# Reduce TTL if embeddings are stale
REID_TTL_HOURS=12
# Limit concurrent re-ID operations
REID_MAX_CONCURRENT=2
REID_TIMEOUT_SECONDS=10.0
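To see why raising REID_SIMILARITY_THRESHOLD makes matching stricter, here is a minimal sketch of threshold-gated matching over cosine similarity. It assumes re-ID compares embeddings by cosine similarity; the function names are illustrative:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def is_reid_match(emb_a: list[float], emb_b: list[float], threshold: float = 0.85) -> bool:
    """Higher threshold = stricter matching, fewer false re-identifications."""
    return cosine_similarity(emb_a, emb_b) >= threshold


print(is_reid_match([1.0, 0.0], [1.0, 0.0]))  # True  (identical direction)
print(is_reid_match([1.0, 0.0], [0.0, 1.0]))  # False (orthogonal embeddings)
```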
7. Restart failed services
# Just YOLO26
./ai/start_detector.sh
# Just Nemotron
./ai/start_llm.sh
# All AI services (if using docker compose)
docker compose -f docker-compose.prod.yml restart ai-florence ai-clip ai-enrichment
# Both core services
./scripts/start-ai.sh restart
Verifying Enrichment is Working¶
After enabling enrichment, verify data is being populated:
# Get a recent event with enrichment data
curl -s http://localhost:8000/api/events?limit=1 | jq '.events[0].detections[0].enrichment'
# Check for Florence attributes
curl -s http://localhost:8000/api/events?limit=1 | jq '.events[0].detections[0].enrichment.attributes'
# Check for CLIP embeddings (should show embedding exists, not the actual vector)
curl -s http://localhost:8000/api/events?limit=1 | jq '.events[0].detections[0].enrichment.has_embedding'
Batch Not Processing¶
Symptoms¶
- Detections created but no events
- Batches accumulating without completion
- Pipeline status shows stale batches
Diagnosis¶
# Check batch aggregator status
curl http://localhost:8000/api/system/pipeline | jq .batch_aggregator
# Check queue depths
curl http://localhost:8000/api/system/telemetry | jq .queues
# Check pipeline workers
curl http://localhost:8000/api/system/health/ready | jq .workers
Solutions¶
1. Check batch settings:
# Default: 90 second window, 30 second idle timeout
BATCH_WINDOW_SECONDS=90
BATCH_IDLE_TIMEOUT_SECONDS=30
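How the two settings interact can be sketched as a flush decision: a batch closes when its 90-second window elapses or when no new detections have arrived for 30 seconds. A sketch under those assumptions; the real aggregator's logic may differ:

```python
def should_flush(now: float, batch_started: float, last_detection: float,
                 window_s: float = 90.0, idle_s: float = 30.0) -> bool:
    """Flush a batch when its window elapses or detections go idle (illustrative)."""
    window_elapsed = now - batch_started >= window_s
    idle = now - last_detection >= idle_s
    return window_elapsed or idle


print(should_flush(now=100.0, batch_started=5.0, last_detection=95.0))  # True  (window elapsed)
print(should_flush(now=50.0, batch_started=40.0, last_detection=45.0))  # False (still collecting)
print(should_flush(now=80.0, batch_started=40.0, last_detection=45.0))  # True  (idle timeout)
```

If batches accumulate without completing, either this flush never fires (check the settings) or the downstream analysis worker is not consuming flushed batches.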
2. Check analysis worker:
# Worker should be "running"
curl http://localhost:8000/api/system/health/ready | jq '.workers[] | select(.name=="analysis_worker")'
3. Check Nemotron service:
Batch completion requires Nemotron for risk analysis. If Nemotron is down, batches queue up.
4. Check Redis:
Batch state is stored in Redis; verify Redis is reachable (for example, redis-cli ping should return PONG).
Analysis Failing¶
Symptoms¶
- Events created with risk_score: null and risk_level: null
- Empty reasoning field
Diagnosis¶
# Check Nemotron health
curl http://localhost:8091/health
# Check Nemotron logs
tail -f /tmp/nemotron-llm.log
# Test Nemotron directly
curl -X POST http://localhost:8091/completion \
-H "Content-Type: application/json" \
-d '{"prompt": "Test prompt", "max_tokens": 50}'
Solutions¶
1. Check Nemotron is responding:
If health check passes but analysis fails:
- Check for timeout (increase NEMOTRON_READ_TIMEOUT)
- Check that the model is fully loaded (first requests take longer)
2. Check prompt/response:
3. Restart Nemotron with ./ai/start_llm.sh.
Detection Quality Issues¶
Symptoms¶
- Too many false positives
- Missing obvious detections
- Wrong object classifications
Solutions¶
Adjust confidence threshold:
# Higher = fewer detections, fewer false positives
# Lower = more detections, more false positives
DETECTION_CONFIDENCE_THRESHOLD=0.6 # Default: 0.5
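The threshold's effect is a simple filter over raw detections, sketched below. The dictionary shape is illustrative, not the service's actual schema:

```python
def filter_detections(detections: list[dict], threshold: float = 0.5) -> list[dict]:
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d["confidence"] >= threshold]


raw = [
    {"label": "person", "confidence": 0.92},
    {"label": "dog", "confidence": 0.55},
    {"label": "person", "confidence": 0.31},
]
print(len(filter_detections(raw, 0.5)))  # 2
print(len(filter_detections(raw, 0.6)))  # 1 (raising the threshold drops the 0.55 dog)
```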
Check image quality:
Detection works best with:
- Good lighting
- Clear, unobstructed view
- Reasonable resolution (640x480 minimum)
Check camera positioning:
Objects should be:
- Not too far from camera
- Not too close (partial view)
- At a reasonable angle
Slow Inference¶
Symptoms¶
- Detection takes >100ms (expected: 30-50ms)
- LLM responses take >10s (expected: 2-5s)
- GPU utilization low during inference
Diagnosis¶
# Check latency stats
curl http://localhost:8000/api/system/pipeline-latency | jq
# Monitor GPU during inference
watch -n 1 nvidia-smi
Solutions¶
1. Verify GPU is being used:
2. Check for thermal throttling:
See GPU Issues - Thermal Throttling
3. Reduce concurrent load:
- Lower --parallel in the Nemotron startup command
- Process fewer cameras simultaneously
4. Optimize settings:
# YOLO26: ensure batching for multiple images
# Nemotron: adjust context size
--ctx-size 2048 # Smaller than default 4096
Model Loading Issues¶
Symptoms¶
- "Model file not found"
- "Failed to load model"
- Service starts but first request fails
Solutions¶
1. Download models with ./ai/download_models.sh.
2. Verify model files:
# Nemotron model
ls -la ai/nemotron/nemotron-mini-4b-instruct-q4_k_m.gguf
# Should be ~2.5GB
# YOLO26 (auto-downloads to HuggingFace cache)
ls -la ~/.cache/huggingface/
3. Check model path configuration:
Circuit Breaker Open¶
Symptoms¶
- AI service marked as "unavailable (circuit open)"
- Requests immediately rejected
- Health checks return cached error
Diagnosis¶
curl http://localhost:8000/api/system/circuit-breakers | jq
Solutions¶
1. Wait for automatic recovery:
Circuit breakers auto-reset after timeout (default: 30s).
2. Manual reset:
curl -X POST http://localhost:8000/api/system/circuit-breakers/yolo26/reset
curl -X POST http://localhost:8000/api/system/circuit-breakers/nemotron/reset
3. Fix underlying issue:
Circuit opened because service repeatedly failed. Check:
- Service health
- Network connectivity
- Resource availability
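The open/auto-reset behavior described above can be sketched as a minimal breaker. This is an illustrative state machine under the stated defaults (open after repeated failures, allow a trial request after the 30-second timeout); the backend's implementation may differ:

```python
class CircuitBreaker:
    """Minimal open/closed breaker sketch (illustrative, not the backend's code)."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.failures = 0
        self.opened_at = None  # timestamp when the breaker opened, else None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold and self.opened_at is None:
            self.opened_at = now  # trip open: reject requests immediately

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the breaker again

    def allows_request(self, now: float) -> bool:
        if self.opened_at is None:
            return True  # closed: requests flow normally
        # Auto-reset: after the timeout, let a trial request through.
        return now - self.opened_at >= self.reset_timeout_s


cb = CircuitBreaker(failure_threshold=2, reset_timeout_s=30.0)
cb.record_failure(now=0.0)
cb.record_failure(now=1.0)
print(cb.allows_request(now=5.0))   # False (open, timeout not yet elapsed)
print(cb.allows_request(now=40.0))  # True  (auto-reset window reached)
```

A manual reset via the API corresponds to record_success here: it zeroes the failure count and closes the breaker, but the breaker will trip again if the underlying service keeps failing.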
Next Steps¶
- GPU Issues - Hardware problems
- Connection Issues - Network problems
- Troubleshooting Index - Back to symptom index
See Also¶
- AI Overview - AI services architecture
- AI Configuration - Environment variables
- AI Troubleshooting (Operator) - Quick fixes
- Pipeline Overview - How the AI pipeline works