# Inference Service
**Package:** `@nexusai/inference-service`
**Location:** `packages/inference-service`
**Deployed on:** Main PC
**Port:** 3001
## Purpose
A thin adapter layer around the local LLM runtime (Ollama). It receives
assembled context packages from the orchestration service and returns
model responses.
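As a rough illustration of that flow, the sketch below forwards a prompt to Ollama's REST `/api/generate` endpoint. The helper names (`buildGenerateRequest`, `infer`) are hypothetical, not part of the service's actual code; only the endpoints listed below are documented.

```typescript
// Hypothetical sketch of the adapter's core request flow. Helper names are
// illustrative; the Ollama /api/generate REST endpoint itself is real.

interface GenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
}

// Build the JSON body for Ollama's /api/generate endpoint.
function buildGenerateRequest(model: string, prompt: string): GenerateRequest {
  // stream: false asks Ollama for a single JSON reply instead of a stream
  return { model, prompt, stream: false };
}

// Forward an assembled context package to the configured Ollama instance.
async function infer(prompt: string): Promise<string> {
  const base = process.env.OLLAMA_URL ?? "http://localhost:11434";
  const model = process.env.DEFAULT_MODEL ?? "llama3";
  const res = await fetch(`${base}/api/generate`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGenerateRequest(model, prompt)),
  });
  const data = (await res.json()) as { response: string };
  return data.response;
}
```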
## Dependencies
- `express` — HTTP API
- `ollama` — Ollama client
- `dotenv` — environment variable loading
- `@nexusai/shared` — shared utilities
## Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| PORT | No | 3001 | Port to listen on |
| OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL |
| DEFAULT_MODEL | No | llama3 | Default model to use for inference |
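The defaults in the table above suggest a config loader along these lines. This is a minimal sketch, not the service's actual implementation; in the real service `dotenv` would populate `process.env` before this runs.

```typescript
// Sketch of config resolution matching the environment-variable table.
// All three variables are optional, so every field has a fallback.

interface Config {
  port: number;
  ollamaUrl: string;
  defaultModel: string;
}

function loadConfig(env: Record<string, string | undefined> = process.env): Config {
  return {
    port: Number(env.PORT ?? 3001),
    ollamaUrl: env.OLLAMA_URL ?? "http://localhost:11434",
    defaultModel: env.DEFAULT_MODEL ?? "llama3",
  };
}
```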
## Endpoints
| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |
> Further endpoints will be documented as the service is built out.
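For reference, a `/health` handler might look like the following. The payload shape beyond a `status` field is an assumption, since the response body is not documented.

```typescript
// Hypothetical /health payload builder; field names other than "status"
// are assumptions and may differ from the real service.
function healthPayload(): { status: string; service: string } {
  return { status: "ok", service: "@nexusai/inference-service" };
}

// Wired into the Express app it could be used as:
//   app.get("/health", (_req, res) => res.json(healthPayload()));
```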