AI Service Troubleshooting¶
Solving AI service and pipeline problems (YOLO26, Nemotron, and optional Florence/CLIP/Enrichment).
Time to read: ~6 min. Prerequisites: see GPU Issues for hardware problems.
Service Not Running¶
Symptoms¶
- Health check shows "yolo26": "connection refused"
- Health check shows "nemotron": "connection refused"
- No detections being created
Diagnosis¶
# Check AI service status
./scripts/start-ai.sh status
# Check if processes are running
pgrep -f "model.py" # YOLO26
pgrep -f "llama-server" # Nemotron
# Check logs
tail -f /tmp/yolo26-detector.log
tail -f /tmp/nemotron-llm.log
If you are running the optional services, also check:
curl http://localhost:8092/health # Florence-2 (optional)
curl http://localhost:8093/health # CLIP (optional)
curl http://localhost:8094/health # Enrichment (optional)
Solutions¶
1. Start AI services:
2. Check for startup errors:
Common startup errors:
- Missing model files (run ./ai/download_models.sh)
- Port already in use
- CUDA initialization failure
3. Check model files exist:
Degraded Mode¶
Symptoms¶
- Health check shows "ai": "degraded"
- One service healthy, one unhealthy
- Partial functionality
Diagnosis¶
# Check overall service health (includes AI service URLs from config)
curl http://localhost:8000/api/system/health | jq .services
# Check individual AI services
curl http://localhost:8095/health # YOLO26
curl http://localhost:8091/health # Nemotron
curl http://localhost:8092/health # Florence-2 (optional)
curl http://localhost:8093/health # CLIP (optional)
curl http://localhost:8094/health # Enrichment (optional)
Solutions¶
Understand degraded behavior:
| YOLO26 | Nemotron | Result |
|---|---|---|
| Up | Up | Full functionality |
| Up | Down | Detections work, no risk analysis |
| Down | Up | No new detections, existing events can be re-analyzed |
| Down | Down | System unhealthy |
Optional enrichment services (Florence/CLIP/Enrichment) typically degrade enrichment quality rather than fully stopping event creation. The core “detections → batches → LLM → events” path can still function if YOLO26 and Nemotron are healthy.
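The degraded-behavior table above can be sketched as a small status function. This is an illustrative sketch only; the function name and return strings are assumptions, not the backend's actual API:

```python
def pipeline_status(yolo26_up: bool, nemotron_up: bool) -> str:
    """Map core AI service health to overall pipeline capability.

    Mirrors the degraded-mode table: both up = full functionality,
    one up = degraded (different partial behavior), both down = unhealthy.
    """
    if yolo26_up and nemotron_up:
        return "healthy"  # full functionality
    if yolo26_up:
        return "degraded: detections work, no risk analysis"
    if nemotron_up:
        return "degraded: no new detections, re-analysis only"
    return "unhealthy"


print(pipeline_status(True, True))    # healthy
print(pipeline_status(False, False))  # unhealthy
```

Optional enrichment services are intentionally absent from this function: per the note above, they degrade enrichment quality rather than core pipeline status.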
Optional Enrichment Issues (Florence / CLIP / Enrichment)¶
The optional enrichment services (Florence-2, CLIP, Enrichment) provide enhanced context for detections, including:
- Florence-2: Visual attributes, OCR, dense captions
- CLIP: Embedding generation for re-identification
- Enrichment: Orchestrates and aggregates enrichment data
These services are optional: the core detection and risk analysis pipeline works without them.
Symptoms¶
- Events exist, but "extra context" fields are missing (no attributes, no re-identification hints, etc.)
- Backend logs mention enrichment timeouts or connection errors
- CPU spikes on the backend when enrichment is enabled
- Circuit breakers open for enrichment services
Quick Diagnosis¶
# Confirm the backend is configured to reach the optional services
curl http://localhost:8000/api/system/config | jq '.florence_url, .clip_url, .enrichment_url'
# Check feature toggles (are enrichment features enabled?)
curl http://localhost:8000/api/system/config | jq '.vision_extraction_enabled, .reid_enabled, .scene_change_enabled'
# Check health endpoints
curl http://localhost:8092/health # Florence-2
curl http://localhost:8093/health # CLIP
curl http://localhost:8094/health # Enrichment
# Check circuit breaker status
curl http://localhost:8000/api/system/circuit-breakers | jq '.florence, .clip, .enrichment'
Understanding Feature Toggles¶
| Variable | Default | Effect When Disabled |
|---|---|---|
| VISION_EXTRACTION_ENABLED | true | No Florence-2 attributes, OCR, or dense captions |
| REID_ENABLED | true | No CLIP embeddings or re-identification |
| SCENE_CHANGE_ENABLED | true | No scene change detection between frames |
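A sketch of how these toggles might gate the enrichment steps. The env-var names come from the table above; the parsing rules and step names are illustrative assumptions:

```python
import os


def truthy(name: str, default: str = "true") -> bool:
    # Env-style boolean parsing; the accepted spellings here are an assumption.
    return os.environ.get(name, default).strip().lower() in ("1", "true", "yes", "on")


def enabled_enrichment_steps() -> list[str]:
    """Return which enrichment steps the feature toggles allow (illustrative)."""
    steps = []
    if truthy("VISION_EXTRACTION_ENABLED"):
        steps.append("florence_attributes")  # attributes, OCR, dense captions
    if truthy("REID_ENABLED"):
        steps.append("clip_reid")            # embeddings + re-identification
    if truthy("SCENE_CHANGE_ENABLED"):
        steps.append("scene_change")
    return steps


os.environ["REID_ENABLED"] = "false"
print(enabled_enrichment_steps())  # ['florence_attributes', 'scene_change']
```

All three default to enabled, matching the table's Default column.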
Common Causes¶
- Wrong URL from backend (container vs host networking)
- GPU/VRAM pressure (too many services competing for limited VRAM)
- Timeouts (services are up, but slow to respond under load)
- Circuit breakers open (service failures triggered protection)
- Feature toggles disabled (enrichment turned off in config)
Solutions¶
1. Fix container vs host networking
- Production compose: the backend should use compose DNS names (http://ai-florence:8092, http://ai-clip:8093, http://ai-enrichment:8094)
- Host-run AI: the backend should use localhost (or host.docker.internal / host.containers.internal when the backend is containerized)
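The URL choice per deployment mode can be expressed as a lookup. This is a sketch for reasoning about networking, not the backend's actual configuration code; the mode names are hypothetical:

```python
def florence_base_url(deployment: str) -> str:
    """Pick the Florence-2 base URL for a deployment mode (illustrative).

    'compose'              - backend and AI services share a compose network
    'host'                 - everything runs directly on the host
    'backend-in-container' - backend containerized, AI services on the host
    """
    urls = {
        "compose": "http://ai-florence:8092",
        "host": "http://localhost:8092",
        "backend-in-container": "http://host.docker.internal:8092",
    }
    return urls[deployment]


print(florence_base_url("compose"))  # http://ai-florence:8092
print(florence_base_url("host"))     # http://localhost:8092
```

The same pattern applies to the CLIP (8093) and Enrichment (8094) URLs; only the service name and port change.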
2. Disable optional enrichment temporarily
If you need the system running reliably while debugging, disable the optional features:
# In .env - disable all enrichment
VISION_EXTRACTION_ENABLED=false
REID_ENABLED=false
SCENE_CHANGE_ENABLED=false
# Restart backend to apply
docker compose -f docker-compose.prod.yml restart backend
Then re-enable them one by one once GPU load and latency have stabilized.
3. Adjust timeouts for slow services
If services are healthy but timing out under load:
# In .env - increase timeouts
FLORENCE_READ_TIMEOUT=60.0 # Default: 30s
CLIP_READ_TIMEOUT=30.0 # Default: 15s
ENRICHMENT_READ_TIMEOUT=120.0 # Default: 60s
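A sketch of reading these timeout variables safely, falling back to the documented defaults when a value is unset or malformed. The helper name is an assumption, not the backend's actual code:

```python
import os


def read_timeout(name: str, default: float) -> float:
    """Read a float timeout from the environment, falling back on bad values."""
    raw = os.environ.get(name)
    try:
        return float(raw) if raw is not None else default
    except ValueError:
        return default


os.environ["FLORENCE_READ_TIMEOUT"] = "60.0"
print(read_timeout("FLORENCE_READ_TIMEOUT", 30.0))  # 60.0
print(read_timeout("CLIP_READ_TIMEOUT", 15.0))      # 15.0 (unset -> default)
```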
4. Reset circuit breakers
If circuit breakers opened due to transient failures:
# Check circuit breaker status
curl http://localhost:8000/api/system/circuit-breakers | jq
# Reset specific circuit breaker (if API available)
curl -X POST http://localhost:8000/api/system/circuit-breakers/florence/reset
curl -X POST http://localhost:8000/api/system/circuit-breakers/clip/reset
curl -X POST http://localhost:8000/api/system/circuit-breakers/enrichment/reset
# Or restart backend to reset all circuit breakers
docker compose -f docker-compose.prod.yml restart backend
5. Check GPU/VRAM availability
Optional services compete for GPU memory. Check utilization:
nvidia-smi
# Expected VRAM usage per service:
# - YOLO26: ~4GB
# - Nemotron: ~3GB (Q4_K_M), ~14GB (30B)
# - Florence-2: ~2-4GB
# - CLIP: ~1-2GB
If GPU is overloaded, consider:
- Running fewer AI services simultaneously
- Using smaller model quantizations
- Disabling non-essential enrichment features
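For rough capacity planning, the per-service figures above can be summed against the GPU's VRAM. This is a back-of-the-envelope sketch; the figures are the approximate values quoted in the comments and actual usage varies with model and batch size:

```python
# Approximate VRAM footprints (GB) from the nvidia-smi comments above.
VRAM_GB = {"yolo26": 4.0, "nemotron_q4": 3.0, "florence2": 4.0, "clip": 2.0}


def fits_in_vram(services: list[str], gpu_gb: float, headroom_gb: float = 1.0) -> bool:
    """Check whether a set of services should fit on one GPU (rough planning only)."""
    return sum(VRAM_GB[s] for s in services) + headroom_gb <= gpu_gb


print(fits_in_vram(["yolo26", "nemotron_q4"], gpu_gb=12.0))                       # True
print(fits_in_vram(["yolo26", "nemotron_q4", "florence2", "clip"], gpu_gb=12.0))  # False
```

On a 12 GB card, the two core services fit comfortably, but adding both optional services does not; this is typically when disabling enrichment or using smaller quantizations pays off.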
6. Tune re-identification settings
If re-ID is slow or producing poor matches:
# Adjust similarity threshold (higher = stricter matching)
REID_SIMILARITY_THRESHOLD=0.85
# Reduce TTL if embeddings are stale
REID_TTL_HOURS=12
# Limit concurrent re-ID operations
REID_MAX_CONCURRENT=2
REID_TIMEOUT_SECONDS=10.0
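To see why raising REID_SIMILARITY_THRESHOLD makes matching stricter, here is a minimal sketch of threshold-gated matching over cosine similarity. It assumes re-ID compares embeddings by cosine similarity; the function names are illustrative:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def is_reid_match(emb_a: list[float], emb_b: list[float], threshold: float = 0.85) -> bool:
    """Higher threshold = stricter matching, fewer false re-identifications."""
    return cosine_similarity(emb_a, emb_b) >= threshold


print(is_reid_match([1.0, 0.0], [1.0, 0.0]))  # True  (identical direction)
print(is_reid_match([1.0, 0.0], [0.0, 1.0]))  # False (orthogonal embeddings)
```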
7. Restart failed services
# Just YOLO26
./ai/start_detector.sh
# Just Nemotron
./ai/start_llm.sh
# All AI services (if using docker compose)
docker compose -f docker-compose.prod.yml restart ai-florence ai-clip ai-enrichment
# Both core services
./scripts/start-ai.sh restart
Verifying Enrichment is Working¶
After enabling enrichment, verify data is being populated:
# Get a recent event with enrichment data
curl -s http://localhost:8000/api/events?limit=1 | jq '.events[0].detections[0].enrichment'
# Check for Florence attributes
curl -s http://localhost:8000/api/events?limit=1 | jq '.events[0].detections[0].enrichment.attributes'
# Check for CLIP embeddings (should show embedding exists, not the actual vector)
curl -s http://localhost:8000/api/events?limit=1 | jq '.events[0].detections[0].enrichment.has_embedding'
Batch Not Processing¶
Symptoms¶
- Detections created but no events
- Batches accumulating without completion
- Pipeline status shows stale batches
Diagnosis¶
# Check batch aggregator status
curl http://localhost:8000/api/system/pipeline | jq .batch_aggregator
# Check queue depths
curl http://localhost:8000/api/system/telemetry | jq .queues
# Check pipeline workers
curl http://localhost:8000/api/system/health/ready | jq .workers
Solutions¶
1. Check batch settings:
# Default: 90 second window, 30 second idle timeout
BATCH_WINDOW_SECONDS=90
BATCH_IDLE_TIMEOUT_SECONDS=30
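How the two settings interact can be sketched as a flush decision: a batch closes when its 90-second window elapses or when no new detections have arrived for 30 seconds. A sketch under those assumptions; the real aggregator's logic may differ:

```python
def should_flush(now: float, batch_started: float, last_detection: float,
                 window_s: float = 90.0, idle_s: float = 30.0) -> bool:
    """Flush a batch when its window elapses or detections go idle (illustrative)."""
    window_elapsed = now - batch_started >= window_s
    idle = now - last_detection >= idle_s
    return window_elapsed or idle


print(should_flush(now=100.0, batch_started=5.0, last_detection=95.0))  # True  (window elapsed)
print(should_flush(now=50.0, batch_started=40.0, last_detection=45.0))  # False (still collecting)
print(should_flush(now=80.0, batch_started=40.0, last_detection=45.0))  # True  (idle timeout)
```

If batches accumulate without completing, either this flush never fires (check the settings) or the downstream analysis worker is not consuming flushed batches.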
2. Check analysis worker:
# Worker should be "running"
curl http://localhost:8000/api/system/health/ready | jq '.workers[] | select(.name=="analysis_worker")'
3. Check Nemotron service:
Batch completion requires Nemotron for risk analysis. If Nemotron is down, batches queue up.
4. Check Redis:
Batch state is stored in Redis; verify Redis is reachable (for example, redis-cli ping should return PONG).
Analysis Failing¶
Symptoms¶
- Events created with risk_score: null and risk_level: null
- Empty reasoning field
Diagnosis¶
# Check Nemotron health
curl http://localhost:8091/health
# Check Nemotron logs
tail -f /tmp/nemotron-llm.log
# Test Nemotron directly
curl -X POST http://localhost:8091/completion \
-H "Content-Type: application/json" \
-d '{"prompt": "Test prompt", "max_tokens": 50}'
Solutions¶
1. Check Nemotron is responding:
If health check passes but analysis fails:
- Check for timeout (increase NEMOTRON_READ_TIMEOUT)
- Check that the model is fully loaded (first requests take longer)
2. Check prompt/response:
3. Restart Nemotron with ./ai/start_llm.sh.
Detection Quality Issues¶
Symptoms¶
- Too many false positives
- Missing obvious detections
- Wrong object classifications
Solutions¶
Adjust confidence threshold:
# Higher = fewer detections, fewer false positives
# Lower = more detections, more false positives
DETECTION_CONFIDENCE_THRESHOLD=0.6 # Default: 0.5
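The threshold's effect is a simple filter over raw detections, sketched below. The dictionary shape is illustrative, not the service's actual schema:

```python
def filter_detections(detections: list[dict], threshold: float = 0.5) -> list[dict]:
    """Keep only detections at or above the confidence threshold."""
    return [d for d in detections if d["confidence"] >= threshold]


raw = [
    {"label": "person", "confidence": 0.92},
    {"label": "dog", "confidence": 0.55},
    {"label": "person", "confidence": 0.31},
]
print(len(filter_detections(raw, 0.5)))  # 2
print(len(filter_detections(raw, 0.6)))  # 1 (raising the threshold drops the 0.55 dog)
```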
Check image quality:
Detection works best with:
- Good lighting
- Clear, unobstructed view
- Reasonable resolution (640x480 minimum)
Check camera positioning:
Objects should be:
- Not too far from camera
- Not too close (partial view)
- At a reasonable angle
Slow Inference¶
Symptoms¶
- Detection takes >100ms (expected: 30-50ms)
- LLM responses take >10s (expected: 2-5s)
- GPU utilization low during inference
Diagnosis¶
# Check latency stats
curl http://localhost:8000/api/system/pipeline-latency | jq
# Monitor GPU during inference
watch -n 1 nvidia-smi
Solutions¶
1. Verify GPU is being used:
2. Check for thermal throttling:
See GPU Issues - Thermal Throttling
3. Reduce concurrent load:
- Lower --parallel in the Nemotron startup command
- Process fewer cameras simultaneously
4. Optimize settings:
# YOLO26: ensure batching for multiple images
# Nemotron: adjust context size
--ctx-size 2048 # Smaller than default 4096
Model Loading Issues¶
Symptoms¶
- "Model file not found"
- "Failed to load model"
- Service starts but first request fails
Solutions¶
1. Download models with ./ai/download_models.sh.
2. Verify model files:
# Nemotron model
ls -la ai/nemotron/nemotron-mini-4b-instruct-q4_k_m.gguf
# Should be ~2.5GB
# YOLO26 (auto-downloads to HuggingFace cache)
ls -la ~/.cache/huggingface/
3. Check model path configuration:
Circuit Breaker Open¶
Symptoms¶
- AI service marked as "unavailable (circuit open)"
- Requests immediately rejected
- Health checks return cached error
Diagnosis¶
curl http://localhost:8000/api/system/circuit-breakers | jq
Solutions¶
1. Wait for automatic recovery:
Circuit breakers auto-reset after timeout (default: 30s).
2. Manual reset:
curl -X POST http://localhost:8000/api/system/circuit-breakers/yolo26/reset
curl -X POST http://localhost:8000/api/system/circuit-breakers/nemotron/reset
3. Fix underlying issue:
Circuit opened because service repeatedly failed. Check:
- Service health
- Network connectivity
- Resource availability
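The open/auto-reset behavior described above can be sketched as a minimal breaker. This is an illustrative state machine under the stated defaults (open after repeated failures, allow a trial request after the 30-second timeout); the backend's implementation may differ:

```python
class CircuitBreaker:
    """Minimal open/closed breaker sketch (illustrative, not the backend's code)."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.failures = 0
        self.opened_at = None  # timestamp when the breaker opened, else None

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold and self.opened_at is None:
            self.opened_at = now  # trip open: reject requests immediately

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the breaker again

    def allows_request(self, now: float) -> bool:
        if self.opened_at is None:
            return True  # closed: requests flow normally
        # Auto-reset: after the timeout, let a trial request through.
        return now - self.opened_at >= self.reset_timeout_s


cb = CircuitBreaker(failure_threshold=2, reset_timeout_s=30.0)
cb.record_failure(now=0.0)
cb.record_failure(now=1.0)
print(cb.allows_request(now=5.0))   # False (open, timeout not yet elapsed)
print(cb.allows_request(now=40.0))  # True  (auto-reset window reached)
```

A manual reset via the API corresponds to record_success here: it zeroes the failure count and closes the breaker, but the breaker will trip again if the underlying service keeps failing.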
Next Steps¶
- GPU Issues - Hardware problems
- Connection Issues - Network problems
- Troubleshooting Index - Back to symptom index
See Also¶
- AI Overview - AI services architecture
- AI Configuration - Environment variables
- AI Troubleshooting (Operator) - Quick fixes
- Pipeline Overview - How the AI pipeline works