AI Enrichment Light Service¶

The ai-enrichment-light service is a lightweight GPU microservice designed for small, efficient AI models that run on a secondary GPU (e.g., NVIDIA A400 4GB). It complements the heavier ai-enrichment service by hosting models with lower VRAM requirements.

Overview¶

Property	Value
Container Name	`ai-enrichment-light`
Port	8096
Expected VRAM	~1.2GB (with TensorRT optimization)
Target GPU	GPU 1 (secondary, e.g., A400 4GB)
Source Directory	`ai/enrichment-light/`

Models Hosted¶

The service hosts five lightweight models optimized for efficient inference:

Model	VRAM	Purpose	Security Value
YOLOv8n-pose	~300MB	Human pose estimation	Detect suspicious postures (crouching, running)
Threat Detector	~400MB	Weapon detection	Identify knives, guns, bats, etc.
OSNet-x0.25	~100MB	Person re-identification	Track individuals across cameras
Pet Classifier	~200MB	Cat/dog classification	Reduce false positives from pets
Depth Anything V2	~150MB	Monocular depth estimation	Distance context for detections

API Endpoints¶

Health Check¶

GET /health

Response:

{
  "status": "healthy",
  "models_loaded": {
    "pose_estimator": true,
    "threat_detector": true,
    "person_reid": true,
    "pet_classifier": true,
    "depth_estimator": true
  },
  "gpu_available": true,
  "gpu_memory_used_gb": 1.15,
  "uptime_seconds": 3600.5
}

Pose Analysis¶

POST /pose-analyze
Content-Type: application/json

{
  "image_base64": "<base64-encoded-image>"
}

Response:

{
  "keypoints": [
    { "name": "nose", "x": 320, "y": 150, "confidence": 0.95 },
    { "name": "left_shoulder", "x": 280, "y": 200, "confidence": 0.92 }
  ],
  "posture": "standing",
  "alerts": [],
  "inference_time_ms": 15.3
}

Threat Detection¶

POST /threat-detect
Content-Type: application/json

{
  "image_base64": "<base64-encoded-image>"
}

Response:

{
  "threats_detected": [
    {
      "class": "knife",
      "confidence": 0.87,
      "bbox": { "x": 100, "y": 150, "width": 50, "height": 120 }
    }
  ],
  "is_threat": true,
  "max_confidence": 0.87,
  "inference_time_ms": 12.5
}

Person Re-identification¶

POST /person-reid
Content-Type: application/json

{
  "image_base64": "<base64-encoded-image>"
}

Response:

{
  "embedding": [0.123, -0.456, 0.789, ...],
  "embedding_dim": 512,
  "inference_time_ms": 8.2
}

Pet Classification¶

POST /pet-classify
Content-Type: application/json

{
  "image_base64": "<base64-encoded-image>"
}

Response:

{
  "pet_type": "dog",
  "breed": "unknown",
  "confidence": 0.94,
  "is_household_pet": true,
  "inference_time_ms": 10.1
}

Depth Estimation¶

POST /depth-estimate
Content-Type: application/json

{
  "image_base64": "<base64-encoded-image>"
}

Response:

{
  "depth_map_base64": "<base64-encoded-png>",
  "min_depth": 0.15,
  "max_depth": 0.95,
  "mean_depth": 0.52,
  "inference_time_ms": 18.7
}

Prometheus Metrics¶

GET /metrics

Returns Prometheus-formatted metrics for monitoring.

Configuration¶

Environment Variables¶

Variable	Default	Description
`HOST`	`0.0.0.0`	Server bind address
`PORT`	`8096`	Server port
`POSE_MODEL_PATH`	`/models/yolov8n-pose/yolov8n-pose.pt`	Pose model path
`THREAT_MODEL_PATH`	`/models/threat-detection-yolov8n/weights/best.pt`	Threat model path
`REID_MODEL_PATH`	`/models/osnet-x0-25/osnet_x0_25.pth`	Re-ID model path
`PET_MODEL_PATH`	`/models/pet-classifier`	Pet classifier path
`DEPTH_MODEL_PATH`	`/models/depth-anything-v2-small`	Depth model path
`PYROSCOPE_ENABLED`	`true`	Enable continuous profiling
`PYROSCOPE_URL`	`http://pyroscope:4040`	Pyroscope server URL

Model Assignment¶

Models are assigned to either light or heavy service via environment variables:

# In docker-compose.prod.yml or .env
ENRICHMENT_POSE_SERVICE=light      # Default: light
ENRICHMENT_THREAT_SERVICE=light    # Default: light
ENRICHMENT_REID_SERVICE=light      # Default: light
ENRICHMENT_PET_SERVICE=light       # Default: light
ENRICHMENT_DEPTH_SERVICE=light     # Default: light

Preload Configuration¶

Control which models load at startup vs on-demand:

# Preload specific models at startup (comma-separated)
ENRICHMENT_LIGHT_PRELOAD_MODELS=pose_estimator,threat_detector

# Empty = all models load on-demand (saves startup time)
ENRICHMENT_LIGHT_PRELOAD_MODELS=

Available model names for preloading:

pose_estimator
threat_detector
person_reid
pet_classifier
depth_estimator

TensorRT Acceleration¶

YOLO models support TensorRT optimization for faster inference:

POSE_USE_TENSORRT=true      # Default: true
THREAT_USE_TENSORRT=true    # Default: true

TensorRT engines are cached in /cache/tensorrt/ and persist across container restarts.

Docker Compose Configuration¶

ai-enrichment-light:
  build:
    context: .
    dockerfile: ai/enrichment-light/Dockerfile
  ports:
    - '8096:8096'
  volumes:
    - ${AI_MODELS_PATH:-/export/ai_models}/model-zoo/yolov8n-pose:/models/yolov8n-pose:ro
    - ${AI_MODELS_PATH:-/export/ai_models}/model-zoo/threat-detection-yolov8n:/models/threat-detection-yolov8n:ro
    - ${AI_MODELS_PATH:-/export/ai_models}/model-zoo/osnet-x0-25:/models/osnet-x0-25:ro
    - ${AI_MODELS_PATH:-/export/ai_models}/model-zoo/pet-classifier:/models/pet-classifier:ro
    - ${AI_MODELS_PATH:-/export/ai_models}/model-zoo/depth-anything-v2-small:/models/depth-anything-v2-small:ro
    - enrichment-light-tensorrt-cache:/cache/tensorrt
  environment:
    - SERVICE_NAME=ai-enrichment-light
    - PYROSCOPE_ENABLED=${PYROSCOPE_ENABLED:-true}
    - ENRICHMENT_PRELOAD_MODELS=${ENRICHMENT_LIGHT_PRELOAD_MODELS:-}
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['${GPU_CLIP:-1}']
            capabilities: [gpu]

Backend Integration¶

The backend routes requests to the appropriate enrichment service based on configuration:

# From backend/services/enrichment_client.py
ENRICHMENT_URL = os.environ.get("ENRICHMENT_URL", "http://ai-enrichment:8094")
ENRICHMENT_LIGHT_URL = os.environ.get("ENRICHMENT_LIGHT_URL", "http://ai-enrichment-light:8096")

# Route based on model assignment
if os.environ.get("ENRICHMENT_POSE_SERVICE", "light") == "light":
    response = await httpx.post(f"{ENRICHMENT_LIGHT_URL}/pose-analyze", ...)
else:
    response = await httpx.post(f"{ENRICHMENT_URL}/pose-analyze", ...)

Prometheus Metrics¶

Metric	Type	Description
`enrichment_light_inference_requests_total{endpoint, status}`	Counter	Total inference requests
`enrichment_light_inference_latency_seconds{endpoint}`	Histogram	Inference latency
`enrichment_light_gpu_memory_used_gb`	Gauge	GPU memory usage
`enrichment_light_model_loaded{model}`	Gauge	Model loaded status (0/1)

Health Checks¶

The service uses an HTTP health check:

healthcheck:
  test:
    [
      'CMD',
      'python',
      '-c',
      "import httpx; r = httpx.get('http://localhost:8096/health'); r.raise_for_status()",
    ]
  interval: 30s
  timeout: 10s
  start_period: 90s
  retries: 3

The 90-second start period allows time for model loading.

GPU Memory Budget¶

Typical VRAM allocation on a 4GB GPU:

Model	VRAM (TensorRT)	VRAM (PyTorch)
YOLOv8n-pose	~200MB	~300MB
Threat Detector	~250MB	~400MB
OSNet-x0.25	~100MB	~100MB
Pet Classifier	~150MB	~200MB
Depth Anything V2	~150MB	~150MB
Total	~850MB	~1.2GB

This leaves headroom for CUDA context and dynamic allocations on a 4GB GPU.

Comparison: Light vs Heavy Enrichment¶

Aspect	ai-enrichment-light	ai-enrichment
Port	8096	8094
Target GPU	GPU 1 (A400 4GB)	GPU 0 (A5500 24GB)
VRAM Budget	~1.2GB	~6.8GB
Model Types	Small, efficient	Large transformers
Models	Pose, threat, reid, pet, depth	Vehicle, fashion, age, gender, action

Troubleshooting¶

Model Not Loading¶

Check model assignment:

# Verify model is assigned to light service
echo $ENRICHMENT_POSE_SERVICE  # Should be 'light'

# Check preload configuration
echo $ENRICHMENT_LIGHT_PRELOAD_MODELS

GPU Memory Issues¶

# Check GPU memory from container
docker exec ai-enrichment-light nvidia-smi

# Check which models are loaded
curl http://localhost:8096/health | jq '.models_loaded'

TensorRT Engine Not Building¶

# Check TensorRT cache
docker exec ai-enrichment-light ls -la /cache/tensorrt/

# View container logs for TensorRT build progress
docker logs ai-enrichment-light 2>&1 | grep -i tensorrt

AI Enrichment Light Service¶

Overview¶

Models Hosted¶

API Endpoints¶

Health Check¶

Pose Analysis¶

Threat Detection¶

Person Re-identification¶

Pet Classification¶

Depth Estimation¶

Prometheus Metrics¶

Configuration¶

Environment Variables¶

Model Assignment¶

Preload Configuration¶

TensorRT Acceleration¶

Docker Compose Configuration¶

Backend Integration¶

Prometheus Metrics¶

Health Checks¶

GPU Memory Budget¶

Comparison: Light vs Heavy Enrichment¶

Troubleshooting¶

Model Not Loading¶

GPU Memory Issues¶

TensorRT Engine Not Building¶

Related Documentation¶