Service Control¶

Start, stop, restart, and monitor services through the dashboard UI.

Time to read: ~7 min Prerequisites: System running, dashboard access

Overview¶

The Services Panel provides a unified interface for monitoring and controlling all system services. It combines real-time WebSocket updates with REST API fallback for reliable service management.

Service Categories¶

Services are organized into three categories:

Category	Services	Purpose
Infrastructure	PostgreSQL, Redis	Data storage and messaging
AI	YOLO26, Nemotron	Object detection and analysis
Monitoring	File Watcher, Batch Aggregator, Cleanup	Pipeline orchestration

Service Management UI¶

Accessing the Panel¶

The Services Panel is available on the System page of the dashboard. It displays:

Category summary bar showing health counts
Individual service cards with status and controls
Real-time status updates via WebSocket

Category Summary Bar¶

At the top of the panel, badges show health status per category:

[Infrastructure 2/2] [AI Services 2/2] [Monitoring 3/3]

Green badge: All services healthy
Yellow badge: Some services degraded
Red badge: One or more services unhealthy

Service Cards¶

Each service card displays:

Element	Description
Status Icon	Green check, red X, or yellow warning
Service Name	Display name with port number if applicable
Description	Brief description of service purpose
Status Badge	Current status (Healthy, Unhealthy, etc.)
Restart Button	Manually restart the service
Toggle Button	Enable/disable the service

Service Status States¶

Status	Icon	Badge Color	Description
Healthy	Check	Green	Service running and responding
Unhealthy	X	Red	Service down or not responding
Degraded	Warning	Yellow	Service running but experiencing issues
Restarting	Spinner	Yellow	Service is restarting
Disabled	-	Gray	Service manually disabled
Unknown	Warning	Gray	Status cannot be determined

Starting and Stopping Services¶

Starting a Stopped Service¶

Via Dashboard:

Locate the stopped service in the Services Panel
The Restart button initiates a start operation
Status updates to "Starting" then "Healthy" when ready

Via API:

# Start a specific service
curl -X POST "http://localhost:8000/api/system/services/{name}/start"

Response:

{
  "success": true,
  "message": "Service 'yolo26' start initiated",
  "service": {
    "name": "yolo26",
    "display_name": "YOLO26",
    "status": "starting",
    ...
  }
}

Disabling a Service¶

Disabling prevents automatic restarts:

Via Dashboard:

Click the toggle button (green when enabled)
Service status changes to "Disabled"
Self-healing restarts are prevented

Via API:

# Disable a service
curl -X POST "http://localhost:8000/api/system/services/{name}/disable"

Enabling a Disabled Service¶

Via Dashboard:

Click the toggle button (gray when disabled)
Service becomes eligible for self-healing
May need manual restart to start immediately

Via API:

# Enable a service
curl -X POST "http://localhost:8000/api/system/services/{name}/enable"

Restarting Services¶

Manual Restart¶

Via Dashboard:

Click the Restart button on the service card
Confirm the restart in the dialog
Status changes to "Restarting"
Service recovers to "Healthy" when ready

Via API:

# Restart a specific service
curl -X POST "http://localhost:8000/api/system/services/{name}/restart"

Restart Behavior¶

When you restart a service:

Confirmation Required: Dashboard prompts for confirmation
Failure Count Reset: Manual restarts reset the failure counter
Status Broadcast: WebSocket notifies all connected clients
Temporary Interruption: Service is briefly unavailable

Restart Policies¶

The container orchestrator implements self-healing restart policies:

Scenario	Behavior
Health check fails	Automatic restart with exponential backoff
Max failures reached	Service disabled (requires manual enable)
Manual restart	Failure count reset, immediate restart
Manual disable	No automatic restarts

Service Health Monitoring¶

Real-Time Updates¶

The Services Panel receives real-time status updates via WebSocket:

Service discovered: When orchestrator finds a new container
Health recovered: When service becomes healthy after failure
Health failed: When health check fails
Restart initiated: When restart begins
Restart completed: When service is back up
Service disabled: When max failures reached or manually disabled
Service enabled: When manually re-enabled

Polling Fallback¶

If WebSocket is unavailable, the panel falls back to polling:

Default interval: 30 seconds
Configurable via pollingInterval prop

Health Endpoints¶

# Overall system health
curl http://localhost:8000/api/system/health

# List all services with status
curl http://localhost:8000/api/system/services

# Filter by category
curl "http://localhost:8000/api/system/services?category=ai"

API Reference¶

List Services¶

GET /api/system/services
GET /api/system/services?category=infrastructure
GET /api/system/services?category=ai
GET /api/system/services?category=monitoring

Response:

{
  "services": [
    {
      "name": "postgres",
      "display_name": "PostgreSQL",
      "category": "infrastructure",
      "status": "running",
      "enabled": true,
      "container_id": "abc123def456",
      "image": "postgres:16",
      "port": 5432,
      "failure_count": 0,
      "restart_count": 1,
      "last_restart_at": "2025-12-23T10:00:00Z",
      "uptime_seconds": 86400
    }
  ],
  "by_category": {
    "infrastructure": { "total": 2, "healthy": 2, "unhealthy": 0 },
    "ai": { "total": 2, "healthy": 2, "unhealthy": 0 },
    "monitoring": { "total": 3, "healthy": 3, "unhealthy": 0 }
  },
  "timestamp": "2025-12-23T10:30:00Z"
}

Service Actions¶

Endpoint	Method	Description
`/api/system/services/{name}/restart`	POST	Restart service
`/api/system/services/{name}/start`	POST	Start stopped service
`/api/system/services/{name}/enable`	POST	Enable disabled service
`/api/system/services/{name}/disable`	POST	Disable service

Error Responses¶

Status	Description
400	Service is disabled (enable first)
400	Service is already running
404	Service not found
503	Container orchestrator not available

Service Definitions¶

Infrastructure Services¶

Service	Port	Description
PostgreSQL	5432	Primary database for events and detections
Redis	6379	Cache and message queue for pipeline

AI Services¶

Service	Port	Description
YOLO26	8095	Real-time object detection model
Nemotron	8091	Risk analysis LLM for security reasoning

Monitoring Services¶

Service	Description
File Watcher	Monitors camera FTP directories for new images
Batch Aggregator	Aggregates detections into analysis batches
Cleanup Service	Removes old data based on retention policy

WebSocket Events¶

Service status changes are broadcast via the /ws/system channel:

Event Structure:

{
  "type": "service_status",
  "data": {
    "name": "yolo26",
    "display_name": "YOLO26",
    "category": "ai",
    "status": "running",
    "enabled": true,
    "failure_count": 0,
    "restart_count": 5,
    "uptime_seconds": 3600
  },
  "message": "Service recovered"
}

Event Types:

Message	Meaning
Service discovered	Container found during discovery
Service recovered	Health check passed after failure
Health check failed	Service became unhealthy
Manual restart initiated	User triggered restart
Restart completed	Restart finished successfully
Restart failed	Restart did not succeed
Service disabled	Max failures or manual disable
Service enabled	Manual re-enable
Service started	Start operation completed

Container Orchestrator¶

The backend's Container Orchestrator manages service lifecycle:

Discovery¶

On startup, the orchestrator:

Connects to Docker daemon
Discovers containers matching name patterns
Registers services in the service registry
Loads persisted state from Redis
Starts health monitoring

Health Monitoring¶

Continuous health checks:

Configurable check interval
Automatic restart on failure
Exponential backoff between restarts
Max failure threshold before disable

Self-Healing¶

When a service fails health check:

Orchestrator initiates restart
Failure count incremented
If count exceeds threshold, service disabled
WebSocket broadcast notifies clients

State Persistence¶

Service state (failure counts, restart history) is persisted to Redis:

Survives backend restarts
Provides accurate failure tracking
Enables proper backoff timing

Troubleshooting¶

Service Won't Start¶

Symptoms: Start button clicked but service remains stopped

Solutions:

Check if service is disabled (enable first)
Verify container image exists
Check Docker/Podman daemon is running
Review backend logs for errors

Service Keeps Restarting¶

Symptoms: Service cycles between running and restarting

Solutions:

Check service logs for crash reason
Verify resource availability (GPU memory, disk)
Check configuration for errors
May need to disable and investigate

Orchestrator Not Available¶

Symptoms: 503 error when accessing services API

Solutions:

Verify backend is running
Check orchestrator is enabled in settings
Verify Docker socket is accessible
Review backend startup logs

WebSocket Not Updating¶

Symptoms: Status changes not reflected in real-time

Solutions:

Check WebSocket connection status
Refresh the page
Verify backend is healthy
Polling fallback should still work

Configuration Reference¶

Variable	Default	Description
`ORCHESTRATOR_ENABLED`	true	Enable container orchestrator
`HEALTH_CHECK_INTERVAL`	30s	Seconds between health checks
`MAX_FAILURES_BEFORE_DISABLE`	5	Failures before auto-disable

AI Services - Starting, stopping AI services
Monitoring Guide - System health monitoring
WebSocket API - Real-time events
Troubleshooting - Common issues

Back to Operator Hub

Service Control¶

Overview¶

Service Categories¶

Service Management UI¶

Accessing the Panel¶

Category Summary Bar¶

Service Cards¶

Service Status States¶

Starting and Stopping Services¶

Starting a Stopped Service¶

Disabling a Service¶

Enabling a Disabled Service¶

Restarting Services¶

Manual Restart¶

Restart Behavior¶

Restart Policies¶

Service Health Monitoring¶

Real-Time Updates¶

Polling Fallback¶

Health Endpoints¶

API Reference¶

List Services¶

Service Actions¶

Error Responses¶

Service Definitions¶

Infrastructure Services¶

AI Services¶

Monitoring Services¶

WebSocket Events¶

Container Orchestrator¶

Discovery¶

Health Monitoring¶

Self-Healing¶

State Persistence¶

Troubleshooting¶

Service Won't Start¶

Service Keeps Restarting¶

Orchestrator Not Available¶

WebSocket Not Updating¶

Configuration Reference¶

Related Documentation¶