AI Service Troubleshooting

Solving AI service and pipeline problems (YOLO26, Nemotron, and optional Florence/CLIP/Enrichment).

Time to read: ~6 min
Prerequisites: GPU Issues for hardware problems


Service Not Running

Symptoms

  • Health check: "yolo26": "connection refused"
  • Health check: "nemotron": "connection refused"
  • No detections being created

Diagnosis

# Check AI service status
./scripts/start-ai.sh status

# Check if processes are running
pgrep -f "model.py"      # YOLO26
pgrep -f "llama-server"  # Nemotron

# Check logs
tail -f /tmp/yolo26-detector.log
tail -f /tmp/nemotron-llm.log

If you are running the optional services, also check:

curl http://localhost:8092/health  # Florence-2 (optional)
curl http://localhost:8093/health  # CLIP (optional)
curl http://localhost:8094/health  # Enrichment (optional)

Solutions

1. Start AI services:

./scripts/start-ai.sh start

2. Check for startup errors:

cat /tmp/yolo26-detector.log
cat /tmp/nemotron-llm.log

Common startup errors:

  • Missing model files (run ./ai/download_models.sh)
  • Port already in use
  • CUDA initialization failure
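For the "port already in use" case, a quick loop can report whether anything is already listening on the default AI ports (8095 for YOLO26, 8091 for Nemotron; adjust if you changed them). This is a sketch using the standard ss tool:

```shell
# Report whether the default AI service ports are already bound.
# Port numbers are the defaults used elsewhere in this guide.
for port in 8095 8091; do
  if ss -ltn "sport = :$port" | grep -q LISTEN; then
    echo "port $port: in use"
  else
    echo "port $port: free"
  fi
done
```

If a port is in use by a stale process, find it with `ss -ltnp` (needs root) before killing or reconfiguring.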

3. Check model files exist:

ls -la ai/nemotron/nemotron-mini-4b-instruct-q4_k_m.gguf
# Should be ~2.5GB

Degraded Mode

Symptoms

  • Health check shows "ai": "degraded"
  • One service healthy, one unhealthy
  • Partial functionality

Diagnosis

# Check overall service health (includes AI service URLs from config)
curl http://localhost:8000/api/system/health | jq .services

# Check individual AI services
curl http://localhost:8095/health  # YOLO26
curl http://localhost:8091/health  # Nemotron
curl http://localhost:8092/health  # Florence-2 (optional)
curl http://localhost:8093/health  # CLIP (optional)
curl http://localhost:8094/health  # Enrichment (optional)

Solutions

Understand degraded behavior:

YOLO26   Nemotron   Result
Up       Up         Full functionality
Up       Down       Detections work, no risk analysis
Down     Up         No new detections; existing events can be re-analyzed
Down     Down       System unhealthy

Optional enrichment services (Florence/CLIP/Enrichment) typically degrade enrichment quality rather than fully stopping event creation. The core “detections → batches → LLM → events” path can still function if YOLO26 and Nemotron are healthy.
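To see at a glance which services are up, you can sweep all five health endpoints in one loop (ports are the defaults used in this guide; the optional services may simply not be running):

```shell
# Probe each AI service's /health endpoint and print a one-line status.
# Ports are the defaults from this guide; optional services may be absent.
for svc in yolo26:8095 nemotron:8091 florence:8092 clip:8093 enrichment:8094; do
  name=${svc%%:*}
  port=${svc##*:}
  if curl -sf --max-time 2 "http://localhost:$port/health" > /dev/null; then
    echo "$name: up"
  else
    echo "$name: DOWN"
  fi
done
```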


Optional Enrichment Issues (Florence / CLIP / Enrichment)

The optional enrichment services (Florence-2, CLIP, Enrichment) provide enhanced context for detections, including:

  • Florence-2: Visual attributes, OCR, dense captions
  • CLIP: Embedding generation for re-identification
  • Enrichment: Orchestrates and aggregates enrichment data

These services are optional: the core detection and risk analysis pipeline works without them.

Symptoms

  • Events exist, but "extra context" fields are missing (no attributes, no re-identification hints, etc.)
  • Backend logs mention enrichment timeouts or connection errors
  • CPU spikes on the backend when enrichment is enabled
  • Circuit breakers open for enrichment services

Quick Diagnosis

# Confirm the backend is configured to reach the optional services
curl http://localhost:8000/api/system/config | jq '.florence_url, .clip_url, .enrichment_url'

# Check feature toggles (are enrichment features enabled?)
curl http://localhost:8000/api/system/config | jq '.vision_extraction_enabled, .reid_enabled, .scene_change_enabled'

# Check health endpoints
curl http://localhost:8092/health  # Florence-2
curl http://localhost:8093/health  # CLIP
curl http://localhost:8094/health  # Enrichment

# Check circuit breaker status
curl http://localhost:8000/api/system/circuit-breakers | jq '.florence, .clip, .enrichment'

Understanding Feature Toggles

Variable                   Default   Effect when disabled
VISION_EXTRACTION_ENABLED  true      No Florence-2 attributes, OCR, or dense captions
REID_ENABLED               true      No CLIP embeddings or re-identification
SCENE_CHANGE_ENABLED       true      No scene change detection between frames

Common Causes

  1. Wrong URL from backend (container vs host networking)
  2. GPU/VRAM pressure (too many services competing for limited VRAM)
  3. Timeouts (services are up, but slow to respond under load)
  4. Circuit breakers open (service failures triggered protection)
  5. Feature toggles disabled (enrichment turned off in config)

Solutions

1. Fix container vs host networking

  • Production compose: backend should use compose DNS names (http://ai-florence:8092, http://ai-clip:8093, http://ai-enrichment:8094)
  • Host-run AI: backend should use localhost (or host.docker.internal / host.containers.internal when backend is containerized)

2. Disable optional enrichment temporarily

If you need the system running reliably while debugging, disable the optional features:

# In .env - disable all enrichment
VISION_EXTRACTION_ENABLED=false
REID_ENABLED=false
SCENE_CHANGE_ENABLED=false

# Restart backend to apply
docker compose -f docker-compose.prod.yml restart backend

Then re-enable one-by-one after stabilizing GPU/latency.

3. Adjust timeouts for slow services

If services are healthy but timing out under load:

# In .env - increase timeouts
FLORENCE_READ_TIMEOUT=60.0   # Default: 30s
CLIP_READ_TIMEOUT=30.0       # Default: 15s
ENRICHMENT_READ_TIMEOUT=120.0 # Default: 60s

4. Reset circuit breakers

If circuit breakers opened due to transient failures:

# Check circuit breaker status
curl http://localhost:8000/api/system/circuit-breakers | jq

# Reset specific circuit breaker (if API available)
curl -X POST http://localhost:8000/api/system/circuit-breakers/florence/reset
curl -X POST http://localhost:8000/api/system/circuit-breakers/clip/reset
curl -X POST http://localhost:8000/api/system/circuit-breakers/enrichment/reset

# Or restart backend to reset all circuit breakers
docker compose -f docker-compose.prod.yml restart backend

5. Check GPU/VRAM availability

Optional services compete for GPU memory. Check utilization:

nvidia-smi

# Expected VRAM usage per service:
# - YOLO26: ~4GB
# - Nemotron: ~3GB (Q4_K_M), ~14GB (30B)
# - Florence-2: ~2-4GB
# - CLIP: ~1-2GB

If GPU is overloaded, consider:

  • Running fewer AI services simultaneously
  • Using smaller model quantizations
  • Disabling non-essential enrichment features

6. Tune re-identification settings

If re-ID is slow or producing poor matches:

# Adjust similarity threshold (higher = stricter matching)
REID_SIMILARITY_THRESHOLD=0.85

# Reduce TTL if embeddings are stale
REID_TTL_HOURS=12

# Limit concurrent re-ID operations
REID_MAX_CONCURRENT=2
REID_TIMEOUT_SECONDS=10.0

7. Restart failed services

# Just YOLO26
./ai/start_detector.sh

# Just Nemotron
./ai/start_llm.sh

# All AI services (if using docker compose)
docker compose -f docker-compose.prod.yml restart ai-florence ai-clip ai-enrichment

# Both core services
./scripts/start-ai.sh restart

Verifying Enrichment is Working

After enabling enrichment, verify data is being populated:

# Get a recent event with enrichment data
curl -s http://localhost:8000/api/events?limit=1 | jq '.events[0].detections[0].enrichment'

# Check for Florence attributes
curl -s http://localhost:8000/api/events?limit=1 | jq '.events[0].detections[0].enrichment.attributes'

# Check for CLIP embeddings (should show embedding exists, not the actual vector)
curl -s http://localhost:8000/api/events?limit=1 | jq '.events[0].detections[0].enrichment.has_embedding'

Batch Not Processing

Symptoms

  • Detections created but no events
  • Batches accumulating without completion
  • Pipeline status shows stale batches

Diagnosis

# Check batch aggregator status
curl http://localhost:8000/api/system/pipeline | jq .batch_aggregator

# Check queue depths
curl http://localhost:8000/api/system/telemetry | jq .queues

# Check pipeline workers
curl http://localhost:8000/api/system/health/ready | jq .workers

Solutions

1. Check batch settings:

# Default: 90 second window, 30 second idle timeout
BATCH_WINDOW_SECONDS=90
BATCH_IDLE_TIMEOUT_SECONDS=30

2. Check analysis worker:

# Worker should be "running"
curl http://localhost:8000/api/system/health/ready | jq '.workers[] | select(.name=="analysis_worker")'

3. Check Nemotron service:

Batch completion requires Nemotron for risk analysis. If Nemotron is down, batches queue up.

4. Check Redis:

Batch state is stored in Redis:

redis-cli keys "batch:*"
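If batches seem stuck, listing each key together with its remaining TTL can show whether state is lingering. The `batch:*` pattern comes from the command above; the TTL semantics are standard Redis, where -1 means the key has no expiry:

```shell
# Print each batch key with its remaining TTL in seconds.
# A TTL of -1 means the key has no expiry set.
redis-cli --scan --pattern "batch:*" | while read -r key; do
  echo "$key ttl=$(redis-cli ttl "$key")s"
done
```

`--scan` is preferred over `keys` here because it does not block Redis while iterating.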

Analysis Failing

Symptoms

  • Events created with risk_score: null
  • risk_level: null
  • Empty reasoning field

Diagnosis

# Check Nemotron health
curl http://localhost:8091/health

# Check Nemotron logs
tail -f /tmp/nemotron-llm.log

# Test Nemotron directly
curl -X POST http://localhost:8091/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Test prompt", "max_tokens": 50}'

Solutions

1. Check Nemotron is responding:

If health check passes but analysis fails:

  • Check for timeout (increase NEMOTRON_READ_TIMEOUT)
  • Check model is fully loaded (first requests take longer)

2. Check prompt/response:

# Watch Nemotron logs during analysis
tail -f /tmp/nemotron-llm.log

3. Restart Nemotron:

./scripts/start-ai.sh stop
./ai/start_llm.sh

Detection Quality Issues

Symptoms

  • Too many false positives
  • Missing obvious detections
  • Wrong object classifications

Solutions

Adjust confidence threshold:

# Higher = fewer detections, fewer false positives
# Lower = more detections, more false positives
DETECTION_CONFIDENCE_THRESHOLD=0.6  # Default: 0.5

Check image quality:

Detection works best with:

  • Good lighting
  • Clear, unobstructed view
  • Reasonable resolution (640x480 minimum)

Check camera positioning:

Objects should be:

  • Not too far from the camera
  • Not so close that only part of the object is in frame
  • At a reasonable viewing angle

Slow Inference

Symptoms

  • Detection takes >100ms (expected: 30-50ms)
  • LLM responses take >10s (expected: 2-5s)
  • GPU utilization low during inference

Diagnosis

# Check latency stats
curl http://localhost:8000/api/system/pipeline-latency | jq

# Monitor GPU during inference
watch -n 1 nvidia-smi
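curl's --write-out timers give a cheap latency baseline against the detector's health endpoint (port is the default from this guide). If even /health is slow, the bottleneck is likely the host or network rather than model inference:

```shell
# Time a single request; time_connect vs time_total separates network
# latency from server-side processing. Port is the default from this guide.
curl -s -o /dev/null \
  -w 'connect=%{time_connect}s total=%{time_total}s\n' \
  http://localhost:8095/health
```

Run it a few times: the first request after startup is often slower while the model warms up.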

Solutions

1. Verify GPU is being used:

See GPU Issues - CPU Fallback

2. Check for thermal throttling:

See GPU Issues - Thermal Throttling

3. Reduce concurrent load:

  • Lower --parallel in Nemotron startup
  • Process fewer cameras simultaneously

4. Optimize settings:

# YOLO26: ensure batching for multiple images
# Nemotron: adjust context size
--ctx-size 2048  # Smaller than default 4096

Model Loading Issues

Symptoms

  • "Model file not found"
  • "Failed to load model"
  • Service starts but first request fails

Solutions

1. Download models:

./ai/download_models.sh

2. Verify model files:

# Nemotron model
ls -la ai/nemotron/nemotron-mini-4b-instruct-q4_k_m.gguf
# Should be ~2.5GB

# YOLO26 (auto-downloads to HuggingFace cache)
ls -la ~/.cache/huggingface/

3. Check model path configuration:

# For custom paths
NEMOTRON_MODEL_PATH=/path/to/model.gguf
YOLO26_MODEL_PATH=/path/to/yolo26

Circuit Breaker Open

Symptoms

  • AI service marked as "unavailable (circuit open)"
  • Requests immediately rejected
  • Health checks return cached error

Diagnosis

# Check circuit breakers
curl http://localhost:8000/api/system/circuit-breakers | jq

Solutions

1. Wait for automatic recovery:

Circuit breakers auto-reset after timeout (default: 30s).

2. Manual reset:

curl -X POST http://localhost:8000/api/system/circuit-breakers/yolo26/reset
curl -X POST http://localhost:8000/api/system/circuit-breakers/nemotron/reset

3. Fix underlying issue:

Circuit opened because service repeatedly failed. Check:

  • Service health
  • Network connectivity
  • Resource availability
