
Deployment Modes & AI Networking

[Figure: AI-generated visualization comparing Development, Production, and Hybrid deployment modes.]

A practical operator guide for choosing a deployment mode and setting YOLO26_URL / NEMOTRON_URL / FLORENCE_URL / CLIP_URL / ENRICHMENT_URL correctly.

If you're seeing "AI services unreachable" in health checks, it's almost always a networking mode mismatch: the backend is trying to reach the AI services using the wrong hostname.
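One way to narrow down a mismatch is to probe every configured URL from the same place the backend runs (a host shell, or a shell inside the backend container). The sketch below is illustrative only: `check_ai_urls` is a hypothetical helper, not part of the project, and it assumes each service answers on `/health` as in the verification commands later in this guide.

```shell
# Probe each configured AI URL and report which ones answer on /health.
# Hostname resolution differs between host and container, so run this
# where the backend itself runs.
check_ai_urls() {
  local name url failed
  failed=0
  for name in YOLO26_URL NEMOTRON_URL FLORENCE_URL CLIP_URL ENRICHMENT_URL; do
    # Look up the value of the variable whose name is stored in $name.
    eval "url=\${$name:-}"
    if [ -z "$url" ]; then
      echo "$name: not set"
      failed=1
    elif curl -fsS --max-time 5 "$url/health" >/dev/null 2>&1; then
      echo "$name: ok ($url)"
    else
      echo "$name: UNREACHABLE ($url)"
      failed=1
    fi
  done
  return "$failed"
}

# Possible usage, assuming .env contains only simple KEY=VALUE lines:
#   set -a; . ./.env; set +a; check_ai_urls
```

An `UNREACHABLE` line that names `localhost` while the backend is containerized, or a compose name such as `ai-yolo26` while the backend runs on the host, usually points to the mode mismatch described above.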


Deployment Topology Overview

[Figure: Deployment topology diagram showing the four deployment modes and their networking paths: Production (all containers), All-host development (no containers), Backend container with host AI, and Remote AI host.]


Decision Table (pick one)

[Figure: Decision flowchart for choosing between Production (recommended), Development, and Hybrid deployment modes.]

| Mode | When to choose | Backend runs | AI runs | What URLs should look like |
| --- | --- | --- | --- | --- |
| Production (recommended) | You want the simplest “it just runs” setup | Container (docker-compose.prod.yml) | Containers (ai-*) | http://ai-yolo26:8095, http://ai-llm:8091, ... |
| All-host development | You’re developing locally and want zero container networking complexity | Host (uvicorn) | Host (./scripts/start-ai.sh) | http://localhost:8095, http://localhost:8091, ... |
| Backend container + host AI | You want hot-reload containers, but AI runs on the host (GPU reasons) | Container (docker-compose.yml) | Host | http://host.docker.internal:8095 (Docker Desktop) or http://<host-ip>:8095 (Linux) |
| Remote AI host | AI runs on a separate GPU box | Host or container | Remote host | http://<gpu-host>:8095 etc. |

For authoritative ports/env defaults, see docs/reference/config/env-reference.md.
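The table can also be read as a small lookup from mode and service to URL. The sketch below encodes it that way. The mode labels (`production`, `all-host`, `container-host-ai`, `remote`) and the function names are hypothetical and exist only for this illustration. The ports and compose service names are the ones used in this guide.

```shell
# Map a service name to the port used in this guide.
ai_port() {
  case "$1" in
    yolo26)     echo 8095 ;;
    llm)        echo 8091 ;;
    florence)   echo 8092 ;;
    clip)       echo 8093 ;;
    enrichment) echo 8094 ;;
    *)          return 1 ;;
  esac
}

# ai_url <mode> <service> [host]
# Prints the URL the backend should use for that service in that mode.
ai_url() {
  local mode service host port
  mode=${1:-}; service=${2:-}; host=${3:-}
  port=$(ai_port "$service") || { echo "unknown service: $service" >&2; return 1; }
  case "$mode" in
    production) echo "http://ai-$service:$port" ;;   # compose DNS name
    all-host)   echo "http://localhost:$port" ;;     # everything on the host
    container-host-ai|remote)
      # These modes need an explicit host (host.docker.internal, an IP, ...).
      [ -n "$host" ] || { echo "mode $mode needs a host argument" >&2; return 1; }
      echo "http://$host:$port" ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
}
```

For example, `ai_url container-host-ai yolo26 host.docker.internal` prints `http://host.docker.internal:8095`.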


Mode 1: Production (docker-compose.prod.yml)

Start

docker compose -f docker-compose.prod.yml up -d

.env (backend → AI via compose DNS)

YOLO26_URL=http://ai-yolo26:8095
NEMOTRON_URL=http://ai-llm:8091
FLORENCE_URL=http://ai-florence:8092
CLIP_URL=http://ai-clip:8093
ENRICHMENT_URL=http://ai-enrichment:8094

Verify from inside the backend container

docker compose -f docker-compose.prod.yml exec -T backend curl -fsS http://ai-yolo26:8095/health
docker compose -f docker-compose.prod.yml exec -T backend curl -fsS http://ai-llm:8091/health
docker compose -f docker-compose.prod.yml exec -T backend curl -fsS http://ai-florence:8092/health
docker compose -f docker-compose.prod.yml exec -T backend curl -fsS http://ai-clip:8093/health
docker compose -f docker-compose.prod.yml exec -T backend curl -fsS http://ai-enrichment:8094/health

Mode 2: All-host development (no containers for backend/AI)

Start AI (host)

./scripts/start-ai.sh start

Start backend (host)

uvicorn backend.main:app --host 0.0.0.0 --port 8000 --reload

.env

YOLO26_URL=http://localhost:8095
NEMOTRON_URL=http://localhost:8091
FLORENCE_URL=http://localhost:8092
CLIP_URL=http://localhost:8093
ENRICHMENT_URL=http://localhost:8094

Mode 3: Backend container + host AI

This mode is the most common source of “works on my machine” failures. Because the backend runs in a container, localhost:8095 resolves to the container itself, not to your host.
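A guard for this mistake can be sketched as a startup check that flags loopback URLs when the process appears to be running in a container. The helper names below are hypothetical. The container test relies on the marker files that Docker (`/.dockerenv`) and Podman (`/run/.containerenv`) normally create, which is a heuristic rather than a guarantee. IPv6 literals such as `[::1]` are not handled.

```shell
# Extract the hostname from a URL such as http://host:8095/health.
url_host() {
  local h
  h=${1#*://}   # drop the scheme
  h=${h%%/*}    # drop the path
  h=${h%%:*}    # drop the port
  echo "$h"
}

# Succeed when the URL points at the loopback interface.
is_loopback_url() {
  case "$(url_host "$1")" in
    localhost|127.*) return 0 ;;
    *)               return 1 ;;
  esac
}

# Heuristic: Docker and Podman usually create one of these marker files.
in_container() {
  [ -e /.dockerenv ] || [ -e /run/.containerenv ]
}

# Print a warning when a loopback URL is used from inside a container.
warn_if_loopback_in_container() {
  if in_container && is_loopback_url "$1"; then
    echo "warning: $1 resolves to this container, not the host" >&2
  fi
}
```

Calling `warn_if_loopback_in_container "$YOLO26_URL"` from the backend container's entrypoint would surface the problem before the first health check fails.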

Docker Desktop (macOS/Windows)

YOLO26_URL=http://host.docker.internal:8095
NEMOTRON_URL=http://host.docker.internal:8091
FLORENCE_URL=http://host.docker.internal:8092
CLIP_URL=http://host.docker.internal:8093
ENRICHMENT_URL=http://host.docker.internal:8094

Podman on macOS

YOLO26_URL=http://host.containers.internal:8095
NEMOTRON_URL=http://host.containers.internal:8091
FLORENCE_URL=http://host.containers.internal:8092
CLIP_URL=http://host.containers.internal:8093
ENRICHMENT_URL=http://host.containers.internal:8094

Linux (Docker/Podman)

Use your host IP:

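# Take the address that follows "src"; a fixed field index breaks when the route has no gateway.
export AI_HOST=$(ip route get 1 | awk '{for (i = 1; i < NF; i++) if ($i == "src") { print $(i + 1); exit }}')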
YOLO26_URL=http://${AI_HOST}:8095
NEMOTRON_URL=http://${AI_HOST}:8091
FLORENCE_URL=http://${AI_HOST}:8092
CLIP_URL=http://${AI_HOST}:8093
ENRICHMENT_URL=http://${AI_HOST}:8094
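Some env-file loaders take values literally and do not expand `${AI_HOST}` (`docker run --env-file` is one example). If your setup behaves that way, one option is to write the resolved address into the file. The sketch below assumes a hypothetical helper, `emit_ai_env`, that prints the five lines with the host already substituted.

```shell
# Print a .env fragment with the host already substituted.
# Ports are the ones used throughout this guide.
emit_ai_env() {
  local host
  host=${1:-}
  [ -n "$host" ] || { echo "usage: emit_ai_env <host-or-ip>" >&2; return 1; }
  cat <<EOF
YOLO26_URL=http://$host:8095
NEMOTRON_URL=http://$host:8091
FLORENCE_URL=http://$host:8092
CLIP_URL=http://$host:8093
ENRICHMENT_URL=http://$host:8094
EOF
}

# Possible usage: emit_ai_env "$AI_HOST" >> .env
```

The same helper works for a remote GPU machine by passing that machine's address instead of the local host IP.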

Mode 4: Remote AI host (separate GPU machine)

export GPU_HOST=10.0.0.50
YOLO26_URL=http://${GPU_HOST}:8095
NEMOTRON_URL=http://${GPU_HOST}:8091
FLORENCE_URL=http://${GPU_HOST}:8092
CLIP_URL=http://${GPU_HOST}:8093
ENRICHMENT_URL=http://${GPU_HOST}:8094

Tips:

  • Prefer a private LAN/VPN link; don’t expose these ports to the public internet.
  • If you add TLS/reverse proxying for AI, keep the backend URLs aligned (see docs/operator/ai-tls.md).

Common Pitfalls

  • Using localhost from inside a container: it points to that container, not the host.
  • Mixing production and development assumptions: production uses compose DNS names (ai-yolo26), while all-host development uses localhost.
  • Optional services down: the system still runs and events are still created, but “extra context” may be missing. See docs/reference/troubleshooting/ai-issues.md.
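The first two pitfalls can be caught with a quick lint of the env file before starting anything. The sketch below is a heuristic under stated assumptions, not a project tool. `url_kind` and `lint_ai_env` are hypothetical names, and the file is assumed to hold plain KEY=VALUE lines. URLs are classified as compose DNS (`ai-*`), loopback, or other, and a mix of kinds is reported. A deliberate mix, such as one service on a remote host, is legitimate, so treat a failure as a prompt to double-check rather than as an error.

```shell
# Classify a URL by the kind of hostname it uses.
url_kind() {
  local h
  h=${1#*://}; h=${h%%/*}; h=${h%%:*}
  case "$h" in
    ai-*)            echo compose ;;
    localhost|127.*) echo loopback ;;
    *)               echo other ;;
  esac
}

# Print the distinct kinds found among the AI *_URL entries of an env file.
# Returns non-zero when more than one kind is present.
lint_ai_env() {
  local file kinds
  file=$1
  kinds=$(
    grep -E '^(YOLO26|NEMOTRON|FLORENCE|CLIP|ENRICHMENT)_URL=' "$file" |
    while IFS='=' read -r name value; do url_kind "$value"; done |
    sort -u
  )
  echo "$kinds"
  [ "$(printf '%s\n' "$kinds" | grep -c .)" -le 1 ]
}

# Possible usage: lint_ai_env .env || echo "mixed URL styles in .env" >&2
```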