# Orchestration Service

**Package:** `@nexusai/orchestration-service`
**Location:** `packages/orchestration-service`
**Deployed on:** Mini PC 2 (192.168.0.205)
**Port:** 4000

## Purpose

The main entry point for all clients. It assembles context packages from
memory, routes prompts to inference, and writes new episodes back to
memory after each interaction. Clients never talk directly to the memory
or inference services; all traffic flows through orchestration.

## Dependencies

- `express`: HTTP API
- `cors`: cross-origin resource sharing middleware
- `dotenv`: environment variable loading
- `@nexusai/shared`: shared utilities


## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| PORT | No | 4000 | Port to listen on |
| MEMORY_SERVICE_URL | No | http://localhost:3002 | Memory service URL |
| EMBEDDING_SERVICE_URL | No | http://localhost:3003 | Embedding service URL |
| INFERENCE_SERVICE_URL | No | http://localhost:3001 | Inference service URL |
| QDRANT_URL | No | http://localhost:6333 | Qdrant URL for semantic search |
| CORS_ORIGIN | No | http://localhost:5173 | Allowed origin for CORS requests |
| MODELS_MANIFEST_PATH | Yes | (none) | Path to `models.json` manifest file |

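
The table above can be read as a small config loader. The following is a minimal sketch, not the service's actual startup code: `loadConfig` and the camel-cased keys it returns are hypothetical names, and only the variable names and defaults come from the table.

```js
// Defaults copied from the environment variable table.
const DEFAULTS = {
  PORT: '4000',
  MEMORY_SERVICE_URL: 'http://localhost:3002',
  EMBEDDING_SERVICE_URL: 'http://localhost:3003',
  INFERENCE_SERVICE_URL: 'http://localhost:3001',
  QDRANT_URL: 'http://localhost:6333',
  CORS_ORIGIN: 'http://localhost:5173',
};

// Hypothetical helper: merges an env object over the defaults.
function loadConfig(env = process.env) {
  // MODELS_MANIFEST_PATH is the only required variable, so fail fast without it.
  if (!env.MODELS_MANIFEST_PATH) {
    throw new Error('MODELS_MANIFEST_PATH is required');
  }
  const merged = { ...DEFAULTS };
  for (const key of Object.keys(DEFAULTS)) {
    if (env[key]) merged[key] = env[key];
  }
  return {
    port: Number(merged.PORT),
    memoryServiceUrl: merged.MEMORY_SERVICE_URL,
    embeddingServiceUrl: merged.EMBEDDING_SERVICE_URL,
    inferenceServiceUrl: merged.INFERENCE_SERVICE_URL,
    qdrantUrl: merged.QDRANT_URL,
    corsOrigin: merged.CORS_ORIGIN,
    modelsManifestPath: env.MODELS_MANIFEST_PATH,
  };
}
```

Failing at startup when the manifest path is missing is a design choice of this sketch; the real service may instead surface the problem on the first `GET /models` request.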

## Internal Structure

```
src/
├── services/
│   ├── memory.js       # HTTP client for memory service
│   ├── inference.js    # HTTP client for inference service
│   ├── embedding.js    # HTTP client for embedding service
│   └── qdrant.js       # HTTP client for Qdrant (direct vector search)
├── chat/
│   └── index.js        # Core pipeline: context assembly, isolation, auto-naming
├── routes/
│   ├── chat.js         # POST /chat and POST /chat/stream
│   ├── sessions.js     # Session CRUD proxy
│   ├── projects.js     # Project CRUD proxy
│   └── models.js       # GET /models: reads models.json from disk
└── index.js            # Express app entry point
```

The `services/` layer wraps all downstream HTTP calls in named functions,
so a URL or endpoint change is made in a single place.

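
To illustrate the "named functions over HTTP" idea, here is a hedged sketch of what a client such as `services/memory.js` might look like. The factory name, the function names, and the endpoint paths (`/sessions/:id`, `/episodes`) are assumptions made for the example and are not taken from the real memory service API; `fetch` is assumed to be the global available in recent Node versions.

```js
// Hypothetical sketch: endpoint paths and function names are assumptions.
function createMemoryClient(baseUrl, fetchImpl = fetch) {
  // Single place where URLs are built and HTTP errors are turned into exceptions.
  async function request(path, options = {}) {
    const res = await fetchImpl(`${baseUrl}${path}`, {
      ...options,
      headers: { 'Content-Type': 'application/json', ...options.headers },
    });
    if (!res.ok) {
      throw new Error(`memory service responded ${res.status} for ${path}`);
    }
    return res.json();
  }

  return {
    getSession: (externalId) =>
      request(`/sessions/${encodeURIComponent(externalId)}`),
    createEpisode: (episode) =>
      request('/episodes', { method: 'POST', body: JSON.stringify(episode) }),
  };
}
```

Callers in the pipeline would then depend on `getSession` or `createEpisode` rather than on URLs, which is what keeps endpoint changes local to this layer.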

## Chat Pipeline

Both `POST /chat` and `POST /chat/stream` share the same steps. The only
difference is how the inference response is delivered to the client.

### Steps

1. **Session resolution**: look up the session by `externalId` and auto-create
   it if not found. Clients generate a UUID for new conversations, so no
   pre-creation step is needed.

2. **Project context resolution**: if the session has a `project_id`, fetch
   the project and all its session IDs. These are used to scope semantic
   search. See `memory-isolation.md` for full behaviour.

3. **Recent episode retrieval**: fetch the most recent episodes for the
   session (`RECENT_EPISODE_LIMIT`, default 5).

4. **Semantic search**: embed the user message, then query Qdrant for the
   top 5 most similar past episodes (`SCORE_THRESHOLD` 0.75). Results are
   deduplicated against recent episodes. This step is non-critical: if it
   fails, the pipeline continues with recency-only context.

5. **Prompt assembly**: combine system prompt, semantic episodes, recent
   episodes, and user message.

6. **Inference**: send to inference service. `/chat` awaits the full
   response; `/chat/stream` pipes SSE chunks to the client.

7. **Episode write**: write the exchange back to memory. Fire-and-forget
   for `/chat`; awaited for `/chat/stream` to ensure the full text is
   accumulated before saving.

8. **Auto-naming**: when `isFirstMessage && !session.name`, fire a secondary
   inference call with a naming prompt (max 20 tokens, temperature 0.3) and
   write the result back as `session.name`. Fully fire-and-forget.

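
Step 4 combines three behaviours: score filtering, deduplication against recent episodes, and graceful degradation on failure. The sketch below shows one way they could fit together. The function names, the `id` and `score` fields, and the inclusive `>=` comparison are assumptions; the threshold may equally be applied inside the Qdrant query itself.

```js
// Values mirror the step descriptions; names and field shapes are assumptions.
const SCORE_THRESHOLD = 0.75;
const SEMANTIC_LIMIT = 5;

// Keeps the strongest hits that are not already part of the recent context.
function selectSemanticEpisodes(hits, recentEpisodes) {
  const recentIds = new Set(recentEpisodes.map((ep) => ep.id));
  return hits
    .filter((hit) => hit.score >= SCORE_THRESHOLD) // drop weak matches
    .filter((hit) => !recentIds.has(hit.id))       // dedupe against recent episodes
    .sort((a, b) => b.score - a.score)             // best match first
    .slice(0, SEMANTIC_LIMIT);
}

// Non-critical step: any failure degrades to recency-only context.
// `search` is a hypothetical async function returning scored hits.
async function semanticSearchOrEmpty(search, message, recentEpisodes) {
  try {
    return selectSemanticEpisodes(await search(message), recentEpisodes);
  } catch (err) {
    console.warn('semantic search failed, using recent episodes only:', err.message);
    return [];
  }
}
```

Returning an empty array on failure lets prompt assembly run unchanged, which matches the "pipeline continues" behaviour described in step 4.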

### Prompt Structure

```
[System prompt]

Here are some relevant memories from earlier conversations:
User: {past user message}
Assistant: {past ai response}
... (up to 5 semantic episodes)
---
Here are some relevant memories from your past conversations:
User: {past user message}
Assistant: {past ai response}
... (up to 5 recent episodes)
--- End of recent memories ---

User: {current message}
Assistant:
```

Semantic episodes appear before recent episodes so the model sees
long-range context before the immediate conversation flow.

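
The template can be produced by a small pure function. This is a sketch under stated assumptions: the episode field names `userMessage` and `aiResponse` are hypothetical, and omitting a section when it has no episodes is a choice made here rather than documented behaviour.

```js
// Field names (userMessage, aiResponse) are assumptions about the episode shape.
function formatEpisodes(episodes) {
  return episodes
    .map((ep) => `User: ${ep.userMessage}\nAssistant: ${ep.aiResponse}`)
    .join('\n');
}

function assemblePrompt({ systemPrompt, semanticEpisodes = [], recentEpisodes = [], userMessage }) {
  const lines = [systemPrompt, ''];
  // Semantic (long-range) memories come first.
  if (semanticEpisodes.length > 0) {
    lines.push(
      'Here are some relevant memories from earlier conversations:',
      formatEpisodes(semanticEpisodes),
      '---'
    );
  }
  // Recent episodes follow, closest to the current message.
  if (recentEpisodes.length > 0) {
    lines.push(
      'Here are some relevant memories from your past conversations:',
      formatEpisodes(recentEpisodes),
      '--- End of recent memories ---'
    );
  }
  // Blank line between the memory sections and the live turn.
  if (lines.length > 2) lines.push('');
  lines.push(`User: ${userMessage}`, 'Assistant:');
  return lines.join('\n');
}
```

Ending the prompt on a bare `Assistant:` line leaves the completion to the model, as in the template.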

## SSE Stream Format

Inference service → orchestration:

```
data: {"response":"Hello","done":false}
data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":42}
data: [DONE]
```

Orchestration → client:

```
data: {"text":"Hello"}
data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":42}
```

The `[DONE]` sentinel is consumed internally and not forwarded. The stream
is terminated by `res.end()` after the done event.

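
The translation between the two formats can be sketched as a per-line mapping. This is an illustrative reading of the examples above, not the service's actual code: `translateSseLine`, `toClientFrame`, and `relay` are hypothetical names, and real upstream chunks may need buffering into whole lines before this step.

```js
// Maps one upstream `data:` line to a client event, or null if nothing is forwarded.
function translateSseLine(line) {
  if (!line.startsWith('data:')) return null;   // ignore blank lines and comments
  const payload = line.slice('data:'.length).trim();
  if (payload === '[DONE]') return null;        // sentinel is consumed, not forwarded
  const event = JSON.parse(payload);
  if (event.done) {
    const { model, tokenCount } = event;
    return { done: true, model, tokenCount };
  }
  return { text: event.response };              // rename `response` to `text`
}

// Serialises a client event as a single SSE frame.
function toClientFrame(event) {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Forwards translated frames through `write` and returns the accumulated text,
// which is what the episode write needs once streaming has finished.
function relay(upstreamLines, write) {
  let fullText = '';
  for (const line of upstreamLines) {
    const event = translateSseLine(line);
    if (!event) continue;
    if (event.text) fullText += event.text;
    write(toClientFrame(event));
  }
  return fullText;
}
```

In an Express handler, `write` would typically be `(frame) => res.write(frame)`, followed by `res.end()` once the done event has been sent.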

## Models Manifest

`GET /models` reads `models.json` fresh on each request from
`MODELS_MANIFEST_PATH`. The file lives on the main PC alongside the model
files and is accessible via an SMB mount at `/mnt/nexus-models`.

```json
[
  { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
]
```

`value` must match the model name as reported by `llama-server` (including
the `.gguf` extension). Because the file is read on every request, no
service restart is needed when models are added or removed.

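
A route that re-reads the manifest on every request might be structured as follows. This is a hedged sketch: `parseManifest` and `createModelsHandler` are hypothetical names, the validation rules are inferred from the example entry, the 500 response is an assumption, and `readText` stands in for whatever file read the real route performs (for example a wrapper around `fs.promises.readFile`).

```js
// Validates the manifest shape shown above: an array of { value, label } entries.
function parseManifest(jsonText) {
  const entries = JSON.parse(jsonText);
  if (!Array.isArray(entries)) {
    throw new Error('models manifest must be a JSON array');
  }
  return entries.map((entry, i) => {
    if (!entry || typeof entry.value !== 'string' || typeof entry.label !== 'string') {
      throw new Error(`entry ${i} needs string "value" and "label" fields`);
    }
    if (!entry.value.endsWith('.gguf')) {
      throw new Error(`entry ${i}: "value" should include the .gguf extension`);
    }
    return { value: entry.value, label: entry.label };
  });
}

// `readText(path)` is injected and must return a promise of the file contents.
function createModelsHandler(manifestPath, readText) {
  return async function getModels(req, res) {
    try {
      // No caching: the file is read again on every request.
      res.json(parseManifest(await readText(manifestPath)));
    } catch (err) {
      res.status(500).json({ error: 'models manifest unavailable' });
    }
  };
}
```

Validating on read means a bad hand edit to `models.json` shows up as an error on the next request instead of as a broken model list in the client.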

## Sessions Route Behaviour

`PATCH /sessions/:sessionId` accepts `name`, `projectId`, or both.
The validation guard only rejects requests where neither is provided:

```js
if (!name?.trim() && projectId === undefined) {
  return res.status(400).json({ error: 'name or projectId is required' });
}
```

This allows `useChat` to write project assignment separately from rename
operations.

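
Restating the guard as a pure predicate makes its edge cases easier to see. The function name `isValidSessionPatch` is hypothetical; the condition is the one shown above.

```js
// Same condition as the route guard, inverted to answer "is this body acceptable?".
function isValidSessionPatch({ name, projectId } = {}) {
  return !(!name?.trim() && projectId === undefined);
}

// Which request bodies pass the guard:
const cases = [
  [{ name: 'Trip planning' }, true],         // rename only
  [{ projectId: 'p1' }, true],               // project assignment only
  [{ projectId: null }, true],               // null is not undefined, so it passes
  [{ name: '   ' }, false],                  // whitespace-only name, no projectId
  [{}, false],                               // neither field provided
  [{ name: '   ', projectId: 'p1' }, true],  // blank name rides along with projectId
];

for (const [body, expected] of cases) {
  console.log(JSON.stringify(body), isValidSessionPatch(body) === expected ? 'ok' : 'MISMATCH');
}
```

Two consequences follow from the condition itself: `projectId: null` is accepted (plausibly to detach a session from its project, though that reading is an assumption), and a whitespace-only `name` passes whenever `projectId` is present, so the handler still has to decide whether to apply it.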

## Caddy Configuration

Each route prefix needs a handle block in the Caddyfile on Mini PC 2:

```
handle /chat* { reverse_proxy localhost:4000 }
handle /sessions* { reverse_proxy localhost:4000 }
handle /models* { reverse_proxy localhost:4000 }
handle /projects* { reverse_proxy localhost:4000 }
```

After updating: `caddy reload --config /path/to/Caddyfile`

For all HTTP endpoints, see `api-routes.md`.