# Orchestration Service

**Package:** `@nexusai/orchestration-service`  
**Location:** `packages/orchestration-service`  
**Deployed on:** Mini PC 2 (192.168.0.205)  
**Port:** 4000

## Purpose

The main entry point for all clients. Assembles context packages from
memory, routes prompts to inference, and writes new episodes back to
memory after each interaction. Clients never talk directly to the memory
or inference services — all traffic flows through orchestration.

## Dependencies

- `express` — HTTP API
- `node-fetch` — inter-service HTTP communication
- `dotenv` — environment variable loading
- `@nexusai/shared` — shared utilities

## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| PORT | No | 4000 | Port to listen on |
| MEMORY_SERVICE_URL | No | http://localhost:3002 | Memory service URL |
| EMBEDDING_SERVICE_URL | No | http://localhost:3003 | Embedding service URL |
| INFERENCE_SERVICE_URL | No | http://localhost:3001 | Inference service URL |
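
A minimal sketch of resolving these variables with the documented defaults. `loadConfig` and the returned key names are hypothetical and may not match what `src/index.js` actually does. Empty strings are treated as unset, which is a design choice rather than documented behavior.

```javascript
// Hypothetical helper: resolve orchestration config from environment variables.
// Defaults mirror the Environment Variables table. `||` (not `??`) means an
// empty string such as PORT="" falls back to the default.
function loadConfig(env = process.env) {
  return {
    port: Number(env.PORT || 4000),
    memoryServiceUrl: env.MEMORY_SERVICE_URL || "http://localhost:3002",
    embeddingServiceUrl: env.EMBEDDING_SERVICE_URL || "http://localhost:3003",
    inferenceServiceUrl: env.INFERENCE_SERVICE_URL || "http://localhost:3001",
  };
}

// In the real entry point, dotenv would populate process.env first:
//   require("dotenv").config();
//   const config = loadConfig();
```

For example, `loadConfig({ PORT: "4100" }).port` yields `4100`, while every URL keeps its default.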

## Internal Structure

```
src/
├── services/
│   ├── memory.js      # HTTP wrapper functions for memory service calls
│   └── inference.js   # HTTP wrapper functions for inference service calls
├── chat/
│   └── index.js       # Core pipeline logic — context assembly and coordination
├── routes/
│   └── chat.js        # Express route handlers
└── index.js           # Express app entry point
```

The `services/` layer wraps all downstream HTTP calls in named functions,
keeping the pipeline logic in `chat/index.js` readable and ensuring that
URL or endpoint changes have a single place to be updated.
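
A sketch of what a `services/memory.js`-style wrapper might look like. The endpoint paths (`/sessions`, `/sessions/by-external-id/:id`, `/sessions/:id/episodes`), the payload field names, and the choice to map a 404 to `null` are all hypothetical placeholders; check the memory service's real routes before relying on them. `fetchImpl` is injectable so `node-fetch` or Node 18+'s global `fetch` can be used.

```javascript
// Hypothetical memory-service client. All URL building, JSON encoding and
// error handling live in one helper, so endpoint changes touch one place.
function createMemoryClient(baseUrl, fetchImpl = globalThis.fetch) {
  async function request(method, path, body) {
    const res = await fetchImpl(`${baseUrl}${path}`, {
      method,
      headers: { "Content-Type": "application/json" },
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    // Treat "not found" as a normal outcome so callers can get-or-create.
    if (res.status === 404) return null;
    if (!res.ok) {
      throw new Error(`Memory service ${method} ${path} failed: ${res.status}`);
    }
    return res.json();
  }

  const enc = encodeURIComponent;
  return {
    getSessionByExternalId: (externalId) =>
      request("GET", `/sessions/by-external-id/${enc(externalId)}`),
    createSession: (externalId) => request("POST", "/sessions", { externalId }),
    getRecentEpisodes: (sessionId, limit = 10) =>
      request("GET", `/sessions/${enc(sessionId)}/episodes?limit=${limit}`),
    createEpisode: (sessionId, userMessage, aiResponse) =>
      request("POST", `/sessions/${enc(sessionId)}/episodes`, {
        userMessage,
        aiResponse,
      }),
  };
}
```

Keeping the HTTP details behind named functions like these is what lets the pipeline code read as a sequence of intent-level calls.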

## Chat Pipeline

When a request hits `POST /chat`, the following steps run in order:

1. **Session resolution** — looks up the session in the memory service, using
   the client's `sessionId` as the session's `externalId`. If no match is found,
   a new session is created automatically, so clients can generate a UUID for a
   new conversation and pass it directly; no pre-creation step is needed.

2. **Memory retrieval** — fetches the most recent episodes for the session
   (default: 10) from the memory service to use as conversational context.

3. **Prompt assembly** — combines a system prompt, the retrieved episodes, and
   the current user message into a single prompt string.

4. **Inference** — sends the assembled prompt to the inference service and
   waits for the response.

5. **Episode write** — writes the new exchange (user message + AI response)
   back to the memory service as a fire-and-forget operation. The client
   receives the response immediately without waiting for the write to complete.

6. **Response** — returns the AI response, model name, session ID, and token
   count to the client.
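
The six steps above can be sketched as a single async function with its downstream clients injected. This is an illustration, not the code in `chat/index.js`: the `memory` and `inference` method names, the inference result fields (`response`, `model`, `tokenCount`), and `SYSTEM_PROMPT` are hypothetical.

```javascript
// Placeholder system prompt; the real text is not documented here.
const SYSTEM_PROMPT = "You are a helpful companion.";

async function handleChat(
  { sessionId, message, model, temperature },
  { memory, inference, buildPrompt, logger = console }
) {
  // 1. Session resolution: get-or-create, keyed by the client's sessionId.
  let session = await memory.getSessionByExternalId(sessionId);
  if (!session) session = await memory.createSession(sessionId);

  // 2. Memory retrieval: the most recent episodes (default 10).
  const episodes = await memory.getRecentEpisodes(session.id, 10);

  // 3. Prompt assembly.
  const prompt = buildPrompt(SYSTEM_PROMPT, episodes, message);

  // 4. Inference: awaited, because the client needs the result.
  const result = await inference.generate({ prompt, model, temperature });

  // 5. Episode write: fire-and-forget. Not awaited; the .catch stops a failed
  //    write from surfacing as an unhandled promise rejection.
  memory
    .createEpisode(session.id, message, result.response)
    .catch((err) => logger.error("episode write failed:", err.message));

  // 6. Response: echo the client's sessionId, not the internal session id.
  return {
    sessionId,
    response: result.response,
    model: result.model,
    tokenCount: result.tokenCount,
  };
}
```

Because step 5 is never awaited, a slow or failing memory write cannot delay or fail the chat response; the trade-off is that a lost episode is only visible in logs.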

## Prompt Structure

The prompt sent to the inference service follows this structure:

```
[System prompt]
Here are some relevant memories from your past conversations:
User: {past user message}
Assistant: {past ai response}
... (up to 10 recent episodes)
--- End of recent memories ---
User: {current message}
Assistant:
```
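
A minimal sketch of building a prompt in this layout. Several details are assumptions: the episode field names (`userMessage`, `aiResponse`), that episodes arrive oldest-first, that sections are joined with single newlines, and that the memories block is omitted when there are no episodes.

```javascript
// Hypothetical prompt builder following the documented layout.
function buildPrompt(systemPrompt, episodes, userMessage) {
  const lines = [systemPrompt];
  // Skip the memories section entirely for a brand-new session (a design
  // choice; the layout above does not say what happens with zero episodes).
  if (episodes.length > 0) {
    lines.push("Here are some relevant memories from your past conversations:");
    for (const ep of episodes) {
      lines.push(`User: ${ep.userMessage}`);
      lines.push(`Assistant: ${ep.aiResponse}`);
    }
    lines.push("--- End of recent memories ---");
  }
  lines.push(`User: ${userMessage}`);
  // Trailing "Assistant:" cues the model to continue as the assistant.
  lines.push("Assistant:");
  return lines.join("\n");
}
```

For instance, `buildPrompt("SYS", [], "Hello")` returns `"SYS\nUser: Hello\nAssistant:"`.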

## Endpoints

### Health

| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check — reports downstream service URLs |
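
A sketch of a `/health` handler in Express's `(req, res)` style. The payload keys (`status`, `service`, `downstream`) are an assumption; the table only says the endpoint reports downstream service URLs. The `config` argument is assumed to hold the three URLs from the Environment Variables table.

```javascript
// Hypothetical health handler factory; returns an Express-style handler.
function createHealthHandler(config) {
  return (req, res) => {
    res.json({
      status: "ok",
      service: "orchestration-service",
      // Report where each downstream dependency is expected to live.
      downstream: {
        memory: config.memoryServiceUrl,
        embedding: config.embeddingServiceUrl,
        inference: config.inferenceServiceUrl,
      },
    });
  };
}

// Wiring, with express installed:
//   app.get("/health", createHealthHandler(config));
```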

### Chat

| Method | Path | Description |
|---|---|---|
| POST | /chat | Send a message and receive a response |

---

**POST /chat**

Request body:
```json
{
  "sessionId": "your-session-uuid",
  "message": "Hello, my name is Tim.",
  "model": "companion:latest",
  "temperature": 0.7
}
```

`model` and `temperature` are optional; if either is omitted, the inference
service's default is used.
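
A sketch of validating the request body before running the pipeline. The specific rules (non-empty strings, a finite numeric `temperature`) are assumptions; the documented contract only says `sessionId` and `message` are sent and `model` and `temperature` are optional. Omitted optional fields can simply stay `undefined`, since `JSON.stringify` drops `undefined` properties when the request is forwarded.

```javascript
// Hypothetical validator: returns a list of error messages (empty if valid).
function validateChatRequest(body) {
  const errors = [];
  const isNonEmptyString = (v) => typeof v === "string" && v.trim() !== "";

  if (!isNonEmptyString(body?.sessionId)) {
    errors.push("sessionId must be a non-empty string");
  }
  if (!isNonEmptyString(body?.message)) {
    errors.push("message must be a non-empty string");
  }
  // Optional fields are only checked when present.
  if (body?.model !== undefined && !isNonEmptyString(body.model)) {
    errors.push("model must be a non-empty string when provided");
  }
  if (body?.temperature !== undefined && !Number.isFinite(body.temperature)) {
    errors.push("temperature must be a finite number when provided");
  }
  return errors;
}

// In a route handler:
//   const errors = validateChatRequest(req.body);
//   if (errors.length) return res.status(400).json({ errors });
```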

Response:
```json
{
  "sessionId": "your-session-uuid",
  "response": "Hello Tim! How can I help you today?",
  "model": "companion:latest",
  "tokenCount": 87
}
```

| Field | Description |
|---|---|
| `sessionId` | Echo of the provided session ID |
| `response` | The AI's response text |
| `model` | Model name as reported by the inference service |
| `tokenCount` | Combined prompt + completion token count |

> Note: If `sessionId` does not exist in the memory service, a new session
> is automatically created. Clients can safely generate a UUID for new
> conversations and pass it on the first message.
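
A sketch of a Node client that starts a conversation this way: generate a UUID with `crypto.randomUUID()`, send it with the first message, and reuse it for later turns. `sendChat` and `startConversation` are hypothetical helper names; `fetchImpl` defaults to Node 18+'s global `fetch` and can be swapped for a stub or `node-fetch`.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical client helper: POST one message to the orchestration service.
async function sendChat(baseUrl, sessionId, message, options = {}, fetchImpl = globalThis.fetch) {
  const res = await fetchImpl(`${baseUrl}/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // options may carry model and/or temperature; both are optional.
    body: JSON.stringify({ sessionId, message, ...options }),
  });
  if (!res.ok) throw new Error(`POST /chat failed with status ${res.status}`);
  return res.json();
}

// New conversation: no pre-creation call, the session is created on first use.
async function startConversation(baseUrl, firstMessage, fetchImpl = globalThis.fetch) {
  const sessionId = randomUUID();
  const reply = await sendChat(baseUrl, sessionId, firstMessage, {}, fetchImpl);
  return { sessionId, reply };
}

// Usage against the deployment above (requires a reachable service):
//   const { sessionId } = await startConversation("http://192.168.0.205:4000", "Hello, my name is Tim.");
//   await sendChat("http://192.168.0.205:4000", sessionId, "What's my name?");
```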