updated documentation

2026-04-13 03:42:14 -07:00
parent 5f024093d1
commit 045da0d7f4
5 changed files with 464 additions and 112 deletions
--- a/docs/services/orchestration-service.md
+++ b/docs/services/orchestration-service.md
@@ -14,14 +14,10 @@ or inference services — all traffic flows through orchestration.

 ## Dependencies

- `express` : HTTP API
- `cors` : cross-origin resource sharing middleware
- `node-fetch` : inter-service HTTP communication (memory service client only)
- `dotenv` : environment variable loading
- `@nexusai/shared` : shared utilities
-
-> `memory.js` uses `node-fetch` v2 (pinned) because it is CommonJS. All other
-> service clients use Node.js built-in `fetch`.
+- `express` — HTTP API
+- `cors` — cross-origin resource sharing middleware
+- `dotenv` — environment variable loading
+- `@nexusai/shared` — shared utilities

 ## Environment Variables

@@ -33,6 +29,7 @@ or inference services — all traffic flows through orchestration.
 | INFERENCE_SERVICE_URL | No | http://localhost:3001 | Inference service URL |
 | QDRANT_URL | No | http://localhost:6333 | Qdrant URL for semantic search |
 | CORS_ORIGIN | No | http://localhost:5173 | Allowed origin for CORS requests |
+| MODELS_MANIFEST_PATH | Yes | — | Path to `models.json` manifest file |

 ## Internal Structure
 ```
@@ -46,7 +43,8 @@ src/
 │   └── index.js       # Core pipeline logic — context assembly and coordination
 ├── routes/
 │   ├── chat.js        # POST /chat and POST /chat/stream route handlers
-│   └── sessions.js    # GET /sessions/:sessionId/history route handler
+│   ├── sessions.js    # Session list, history, rename, and delete routes
+│   └── models.js      # GET /models — reads models.json manifest from disk
 └── index.js           # Express app entry point
 ```

@@ -65,7 +63,7 @@ the client.
   UUID for new conversations and pass it directly — no pre-creation step needed.

 2. **Recent episode retrieval** — fetches the most recent episodes for the session
-   (default: 10) from the memory service.
+   (default: 5) from the memory service.

 3. **Semantic search** — embeds the user message via the embedding service, then
   queries Qdrant for the top-5 most similar past episodes (score threshold: 0.75).
@@ -89,37 +87,68 @@ the client.
   count to the client.
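The Qdrant part of step 3 can be sketched as a single REST call. This is a minimal sketch, not the service's actual client. It assumes Node.js 18+ built-in `fetch` and Qdrant's `POST /collections/{collection}/points/search` endpoint. It also assumes a hypothetical collection named `episodes` whose point payloads hold the stored episodes. The embedding call is out of scope here, so the query vector is passed in as an argument.

```javascript
// Sketch only: the collection name "episodes" and all helper names are hypothetical.
const SEMANTIC_TOP_K = 5; // top-5 most similar past episodes
const SEMANTIC_SCORE_THRESHOLD = 0.75; // hits scoring below 0.75 are dropped by Qdrant

// Build the URL and JSON body for a Qdrant vector search.
function buildSemanticSearchRequest(qdrantUrl, collection, vector) {
  return {
    url: `${qdrantUrl}/collections/${encodeURIComponent(collection)}/points/search`,
    body: {
      vector,
      limit: SEMANTIC_TOP_K,
      score_threshold: SEMANTIC_SCORE_THRESHOLD,
      with_payload: true, // return the stored episode alongside each hit
    },
  };
}

// Qdrant wraps hits (sorted by descending score) in a `result` array.
function extractEpisodes(searchResponse) {
  return (searchResponse.result ?? []).map((hit) => ({ score: hit.score, ...hit.payload }));
}

// Run the search against QDRANT_URL (default http://localhost:6333).
async function semanticSearch(vector, collection = "episodes") {
  const qdrantUrl = process.env.QDRANT_URL ?? "http://localhost:6333";
  const { url, body } = buildSemanticSearchRequest(qdrantUrl, collection, vector);
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Qdrant search failed: HTTP ${res.status}`);
  return extractEpisodes(await res.json());
}
```

Keeping the request builder separate from the network call means you can check the top-5 and 0.75 parameters without a running Qdrant instance.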

 ## Prompt Structure
+```
 [System prompt]
+
 Here are some relevant memories from earlier conversations:
 User: {past user message}
 Assistant: {past ai response}
 ... (up to 5 semantic episodes)
-Here is the recent conversation history:
+---
+Here are some relevant memories from your past conversations:
 User: {past user message}
 Assistant: {past ai response}
-... (up to 10 recent episodes)
--- End of memories ---
+... (up to 5 recent episodes)
+--- End of recent memories ---
+
 User: {current message}
 Assistant:
+```

 Semantic episodes appear before recent episodes so the model encounters
 long-range relevant context before the immediate conversation flow.
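The layout above can be sketched as a small string builder. This is a minimal sketch: `buildPrompt`, `formatEpisodes`, and the `userMessage`/`aiResponse` episode fields are hypothetical names. Leaving out a memory section when it has no episodes is also an assumption, because the layout above shows only the case where both sections are populated.

```javascript
// Sketch only: buildPrompt, formatEpisodes, and the userMessage/aiResponse
// field names are hypothetical.
const MAX_EPISODES = 5; // up to 5 semantic and up to 5 recent episodes

// Render episodes as alternating "User:" / "Assistant:" lines.
function formatEpisodes(episodes) {
  return episodes
    .slice(0, MAX_EPISODES)
    .flatMap((ep) => [`User: ${ep.userMessage}`, `Assistant: ${ep.aiResponse}`]);
}

function buildPrompt({ systemPrompt, semanticEpisodes = [], recentEpisodes = [], message }) {
  const sections = [];
  if (semanticEpisodes.length > 0) {
    // Long-range context first, so the model sees it before the recent flow.
    sections.push([
      "Here are some relevant memories from earlier conversations:",
      ...formatEpisodes(semanticEpisodes),
    ]);
  }
  if (recentEpisodes.length > 0) {
    // Assumes recent episodes are already in the order they should appear.
    sections.push([
      "Here are some relevant memories from your past conversations:",
      ...formatEpisodes(recentEpisodes),
      "--- End of recent memories ---",
    ]);
  }
  // Separate the memory sections with a "---" line.
  const memoryLines = sections.flatMap((lines, i) => (i === 0 ? lines : ["---", ...lines]));
  const head = memoryLines.length > 0 ? [systemPrompt, "", ...memoryLines, ""] : [systemPrompt, ""];
  return [...head, `User: ${message}`, "Assistant:"].join("\n");
}
```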

 ## SSE Stream Format

-The inference service emits chunks in this format:
-data: {"model":"companion:latest","response":"Hello","done":false}
-data: {"model":"companion:latest","response":"!","done":true,"eval_count":3,...}
+When using the llama.cpp provider, the inference service emits chunks in this format:
+```
+data: {"response":"Hello","done":false}
+data: {"response":"!","done":false}
+data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
 data: [DONE]
+```

 The orchestration service re-emits to the client as:
+```
 data: {"text":"Hello"}
 data: {"text":"!"}
-data: {"done":true}
+data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
+```

 The `[DONE]` sentinel from the inference service is consumed internally
 and not forwarded. The client stream is terminated by `res.end()` after
-the `{"done":true}` event.
+the done event, which also carries the model name and token count so the
+client can display them in the UI.
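The re-emission rules above can be sketched as a line-by-line translator. Text deltas become `{"text":...}`, the done chunk keeps `done`, `model`, and `tokenCount`, and `[DONE]` is swallowed. This is a minimal sketch, assuming the inference stream has already been split into lines. `translateInferenceLine` and `relayLines` are hypothetical names. Ending each event with a blank line (`\n\n`) follows standard SSE framing and is not stated in this document.

```javascript
// Sketch only: translateInferenceLine and relayLines are hypothetical helpers.
const DATA_PREFIX = "data: ";

// Translate one SSE line from the inference service into the client format.
// Returns { event, done } to forward, or null if nothing should be sent.
function translateInferenceLine(line) {
  if (!line.startsWith(DATA_PREFIX)) return null; // blank separators, comments
  const payload = line.slice(DATA_PREFIX.length).trim();
  if (payload === "[DONE]") return null; // consumed internally, never forwarded
  const chunk = JSON.parse(payload);
  if (chunk.done) {
    const doneEvent = { done: true, model: chunk.model, tokenCount: chunk.tokenCount };
    return { event: `${DATA_PREFIX}${JSON.stringify(doneEvent)}\n\n`, done: true };
  }
  if (typeof chunk.response === "string" && chunk.response !== "") {
    return { event: `${DATA_PREFIX}${JSON.stringify({ text: chunk.response })}\n\n`, done: false };
  }
  return null;
}

// Forward translated events; stop and end the client stream after the done event.
function relayLines(lines, write, end) {
  for (const line of lines) {
    const out = translateInferenceLine(line);
    if (!out) continue;
    write(out.event);
    if (out.done) {
      end();
      return;
    }
  }
}
```

In an Express handler, `write` and `end` would wrap `res.write` and `res.end`. Passing them in as callbacks keeps the translation logic testable without an HTTP server.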
+
+## Models Manifest
+
+The `/models` endpoint reads a `models.json` file from disk at the path
+specified by `MODELS_MANIFEST_PATH`. The file lives on the main PC alongside
+the model files and is accessible to orchestration via a network share
+mounted at `/mnt/nexus-models`.
+
+The manifest is read fresh on each request — no restart needed when models
+are added or removed.
+
+**models.json format:**
+```json
+[
+  { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
+]
+```
+
+- `value` — must match the model name as reported by `llama-server` (including the `.gguf` extension)
+- `label` — display name shown in the UI
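The manifest handling can be sketched as follows: the file is read fresh on each request, and any read or parse failure becomes a `500`. This is a minimal sketch assuming CommonJS modules. It uses `fs.promises.readFile` so that a slow network share does not block the event loop. The shape checks on `value` and `label` go beyond what the endpoint is documented to do. `parseModelsManifest`, `loadModelsManifest`, `modelsHandler`, and the `{ error }` response body are all hypothetical.

```javascript
// Sketch only: parseModelsManifest, loadModelsManifest, modelsHandler, and the
// { error } response body are hypothetical.
const fs = require("node:fs/promises");

// Validate the parsed manifest: an array of { value, label } string pairs.
function parseModelsManifest(raw) {
  const models = JSON.parse(raw); // throws SyntaxError on malformed JSON
  if (!Array.isArray(models)) throw new Error("models.json must contain a JSON array");
  for (const entry of models) {
    if (typeof entry?.value !== "string" || typeof entry?.label !== "string") {
      throw new Error("each models.json entry needs string 'value' and 'label' fields");
    }
  }
  return models;
}

// Read the manifest from disk on every call, so no restart is needed.
async function loadModelsManifest(manifestPath = process.env.MODELS_MANIFEST_PATH) {
  if (!manifestPath) throw new Error("MODELS_MANIFEST_PATH is not set");
  const raw = await fs.readFile(manifestPath, "utf8");
  return parseModelsManifest(raw);
}

// Express-style handler for GET /models: any read or parse failure becomes a 500.
async function modelsHandler(req, res) {
  try {
    res.json(await loadModelsManifest());
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
}
```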

 ## Endpoints

@@ -142,6 +171,14 @@ the `{"done":true}` event.
 |---|---|---|
 | GET | /sessions | Get paginated list of all sessions |
 | GET | /sessions/:sessionId/history | Get paginated episode history for a session |
+| PATCH | /sessions/:sessionId | Rename a session |
+| DELETE | /sessions/:sessionId | Delete a session and all its episodes |
+
+### Models
+
+| Method | Path | Description |
+|---|---|---|
+| GET | /models | Get list of available models from manifest file |

 ---

@@ -152,7 +189,7 @@ Request body:
 {
  "sessionId": "your-session-uuid",
  "message": "Hello, my name is Tim.",
-  "model": "companion:latest",
+  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "temperature": 0.7
 }
 ```
@@ -165,7 +202,7 @@ Response:
 {
  "sessionId": "your-session-uuid",
  "response": "Hello Tim! How can I help you today?",
-  "model": "companion:latest",
+  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "tokenCount": 87
 }
 ```
@@ -176,23 +213,34 @@ Response:

 Same request body as `POST /chat`.

-Response is a stream of Server-Sent Events. Each event contains a text
-delta. The stream ends with a `done` event.
+Response is a stream of Server-Sent Events:
+```
 data: {"text":"Hello"}
 data: {"text":" Tim"}
-data: {"text":"!"}
-data: {"done":true}
+data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87}
+```

-Clients should read the `text` field from each chunk and accumulate them
-to build the full response string. The connection is closed by the server
-after the `{"done":true}` event.
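On the client, each `text` delta is appended to build the full response, and the done event supplies the model name and token count. This is a minimal sketch, assuming a browser or Node.js 18+ `fetch` with a readable `response.body`. `ChatStreamReader` and `streamChat` are hypothetical names. Network chunks do not necessarily align with SSE lines, so the reader buffers any partial line until its newline arrives.

```javascript
// Sketch only: ChatStreamReader and streamChat are hypothetical names.
class ChatStreamReader {
  constructor() {
    this.buffer = ""; // holds a partial line split across network chunks
    this.text = ""; // accumulated assistant response
    this.meta = null; // { model, tokenCount } once the done event arrives
  }

  // Feed decoded text; complete "data: " lines are parsed, the rest is buffered.
  push(chunk) {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop();
    for (const rawLine of lines) {
      const line = rawLine.replace(/\r$/, ""); // tolerate CRLF line endings
      if (!line.startsWith("data: ")) continue;
      const event = JSON.parse(line.slice("data: ".length));
      if (event.done) this.meta = { model: event.model, tokenCount: event.tokenCount };
      else if (typeof event.text === "string") this.text += event.text;
    }
  }
}

// POST to /chat/stream and read the body incrementally.
async function streamChat(baseUrl, request, onText = () => {}) {
  const res = await fetch(`${baseUrl}/chat/stream`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(request),
  });
  if (!res.ok) throw new Error(`stream failed: HTTP ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  const stream = new ChatStreamReader();
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    stream.push(decoder.decode(value, { stream: true }));
    onText(stream.text); // e.g. re-render the partial response
  }
  stream.push(decoder.decode() + "\n"); // flush the decoder and any final line
  return { text: stream.text, ...stream.meta };
}
```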
+---
+
+**PATCH /sessions/:sessionId**
+
+Request body:
+```json
+{ "name": "My Renamed Session" }
+```
+
+Returns the updated session object. `name` is required; leading and trailing whitespace is trimmed.
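The `name` rule can be sketched as a small validation step that runs before the session is updated. This is a minimal sketch. `parseRenameBody` is a hypothetical name, and treating a whitespace-only name as missing is an assumption, because the text only says that `name` is required and trimmed.

```javascript
// Sketch only: parseRenameBody is hypothetical, and rejecting a
// whitespace-only name is an assumption.
function parseRenameBody(body) {
  // Non-string or absent names are treated as missing.
  const name = typeof body?.name === "string" ? body.name.trim() : "";
  if (name === "") return { ok: false, error: "name is required" };
  return { ok: true, name };
}
```

In a route handler, a failed result would typically map to a client error status. The status code the service actually returns in that case is not specified here.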
+
+---
+
+**DELETE /sessions/:sessionId**
+
+Returns `204 No Content`. Cascades to delete all episodes for the session.
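Calling the two session-management endpoints from a client can be sketched as follows. This is a minimal sketch with these assumptions:
- `renameSession` and `deleteSession` are hypothetical helpers.
- `baseUrl` stands for wherever orchestration is reachable.
- `fetchImpl` is injectable only so the helpers can be exercised without a server.
- Session IDs are URL-encoded as a precaution.

```javascript
// Sketch only: renameSession/deleteSession are hypothetical client helpers;
// fetchImpl defaults to the built-in fetch (Node 18+ or a browser).
async function renameSession(baseUrl, sessionId, name, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/sessions/${encodeURIComponent(sessionId)}`, {
    method: "PATCH",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ name }),
  });
  if (!res.ok) throw new Error(`rename failed: HTTP ${res.status}`);
  return res.json(); // the updated session object
}

async function deleteSession(baseUrl, sessionId, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/sessions/${encodeURIComponent(sessionId)}`, {
    method: "DELETE",
  });
  // 204 No Content: there is no body to parse.
  if (res.status !== 204) throw new Error(`delete failed: HTTP ${res.status}`);
}
```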

 ---

 **GET /sessions/:sessionId/history**

-Returns paginated episode history for a session identified by its external ID.
-
 Query parameters:

 | Parameter | Default | Description |
@@ -218,30 +266,17 @@ Response:
 }
 ```

+Episodes are ordered newest first.
+
 ---

-**GET /sessions**
+**GET /models**

-Returns a paginated list of all sessions, ordered by most recently active.
-
-Query parameters:
-
-| Parameter | Default | Description |
-|---|---|---|
-| limit | 20 | Maximum number of sessions to return |
-| offset | 0 | Number of sessions to skip (for pagination) |
-
-Response:
+Returns the parsed contents of `models.json`:
 ```json
 [
-  {
-    "id": 1,
-    "external_id": "test-semantic",
-    "metadata": null,
-    "created_at": 1712345678,
-    "updated_at": 1712345999
-  }
+  { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
 ]
 ```

-Episodes are ordered newest first. Returns `404` if the session does not exist.
+Returns `500` if the manifest file cannot be read or parsed.