updated documentation
@@ -14,14 +14,10 @@ or inference services — all traffic flows through orchestration.

## Dependencies

- `express` : HTTP API
- `cors` : cross-origin resource sharing middleware
- `node-fetch` : inter-service HTTP communication (memory service client only)
- `dotenv` : environment variable loading
- `@nexusai/shared` : shared utilities

> `memory.js` uses `node-fetch` v2 (pinned) because it is CommonJS. All other
> service clients use Node.js built-in `fetch`.
- `express` — HTTP API
- `cors` — cross-origin resource sharing middleware
- `dotenv` — environment variable loading
- `@nexusai/shared` — shared utilities

## Environment Variables

@@ -33,6 +29,7 @@ or inference services — all traffic flows through orchestration.
| INFERENCE_SERVICE_URL | No | http://localhost:3001 | Inference service URL |
| QDRANT_URL | No | http://localhost:6333 | Qdrant URL for semantic search |
| CORS_ORIGIN | No | http://localhost:5173 | Allowed origin for CORS requests |
| MODELS_MANIFEST_PATH | Yes | — | Path to `models.json` manifest file |

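As a rough illustration of how the defaults in the rows above could be applied at startup, the sketch below resolves those four variables from an environment object. The `loadConfig` helper and its property names are hypothetical and not taken from the service's source; only the variable names, the defaults, and the required status of `MODELS_MANIFEST_PATH` come from the table.

```javascript
// Hypothetical helper: resolves the variables listed in the table rows above.
// Defaults mirror the table; MODELS_MANIFEST_PATH has no default and is required.
function loadConfig(env = process.env) {
  if (!env.MODELS_MANIFEST_PATH) {
    throw new Error("MODELS_MANIFEST_PATH is required");
  }
  return {
    inferenceServiceUrl: env.INFERENCE_SERVICE_URL || "http://localhost:3001",
    qdrantUrl: env.QDRANT_URL || "http://localhost:6333",
    corsOrigin: env.CORS_ORIGIN || "http://localhost:5173",
    modelsManifestPath: env.MODELS_MANIFEST_PATH,
  };
}
```

Failing fast on the one required variable keeps a missing manifest path from surfacing later as a `500` on `GET /models`.
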
## Internal Structure
```
@@ -46,7 +43,8 @@ src/
│   └── index.js # Core pipeline logic — context assembly and coordination
├── routes/
│   ├── chat.js # POST /chat and POST /chat/stream route handlers
│   └── sessions.js # GET /sessions/:sessionId/history route handler
│   ├── sessions.js # Session list, history, rename, and delete routes
│   └── models.js # GET /models — reads models.json manifest from disk
└── index.js # Express app entry point
```

@@ -65,7 +63,7 @@ the client.
UUID for new conversations and pass it directly — no pre-creation step needed.

2. **Recent episode retrieval** — fetches the most recent episodes for the session
(default: 10) from the memory service.
(default: 5) from the memory service.

3. **Semantic search** — embeds the user message via the embedding service, then
queries Qdrant for the top-5 most similar past episodes (score threshold: 0.75).
@@ -89,37 +87,68 @@ the client.
count to the client.

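The selection rule in the semantic search step (top 5, score threshold 0.75) can be sketched in plain JavaScript as follows. This is only an illustration of the rule: Qdrant's search API accepts `limit` and `score_threshold` and can apply both server-side, so the service may never run code like this. The `selectSemanticEpisodes` name, the `{ id, score }` hit shape, and the inclusive threshold are assumptions.

```javascript
const TOP_K = 5; // "top-5" from the semantic search step
const SCORE_THRESHOLD = 0.75; // score threshold from the same step

// hits: [{ id, score }] in any order (hypothetical shape, for illustration).
// Keeps hits at or above the threshold, best first, capped at topK.
function selectSemanticEpisodes(hits, topK = TOP_K, threshold = SCORE_THRESHOLD) {
  return hits
    .filter((hit) => hit.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```
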
## Prompt Structure
```
[System prompt]

Here are some relevant memories from earlier conversations:
User: {past user message}
Assistant: {past ai response}
... (up to 5 semantic episodes)
Here is the recent conversation history:
---
Here are some relevant memories from your past conversations:
User: {past user message}
Assistant: {past ai response}
... (up to 10 recent episodes)
--- End of memories ---
... (up to 5 recent episodes)
--- End of recent memories ---

User: {current message}
Assistant:
```

Semantic episodes appear before recent episodes so the model encounters
long-range relevant context before the immediate conversation flow.

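A minimal sketch of the ordering described above: system prompt, then semantic episodes, then recent episodes, then the current message. The `buildPrompt` and `formatEpisodes` names, the `{ user, assistant }` episode shape, and the choice to skip empty sections are assumptions. The two section headers are copied from the template, but the diff does not make clear which wording belongs to the current revision, so treat them as placeholders.

```javascript
// Hypothetical prompt assembler illustrating the section order only.
function formatEpisodes(episodes) {
  return episodes
    .map((ep) => `User: ${ep.user}\nAssistant: ${ep.assistant}`)
    .join("\n");
}

function buildPrompt({ systemPrompt, semantic, recent, message }) {
  const parts = [systemPrompt];
  // Semantic (long-range) memories come first.
  if (semantic.length > 0) {
    parts.push(
      "Here are some relevant memories from earlier conversations:\n" +
        formatEpisodes(semantic)
    );
  }
  // Recent conversation history follows, closest to the current message.
  if (recent.length > 0) {
    parts.push(
      "Here is the recent conversation history:\n" + formatEpisodes(recent)
    );
  }
  parts.push(`User: ${message}\nAssistant:`);
  return parts.join("\n\n");
}
```
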
## SSE Stream Format

The inference service emits chunks in this format:
data: {"model":"companion:latest","response":"Hello","done":false}
data: {"model":"companion:latest","response":"!","done":true,"eval_count":3,...}
The inference service emits chunks from the llama.cpp provider in this format:
```
data: {"response":"Hello","done":false}
data: {"response":"!","done":false}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
data: [DONE]
```

The orchestration service re-emits to the client as:
```
data: {"text":"Hello"}
data: {"text":"!"}
data: {"done":true}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
```

The `[DONE]` sentinel from the inference service is consumed internally
and not forwarded. The client stream is terminated by `res.end()` after
the `{"done":true}` event.
the done event. Model name and token count are included on the done event
so the client can display them in the UI.

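The mapping between the two formats could look roughly like the function below, which translates one upstream line at a time. The `translateSseLine` name is hypothetical, and the sketch assumes each `data:` line arrives whole; a real handler would also need to buffer partial lines and append the blank line that terminates each SSE event.

```javascript
// Translates one SSE line from the inference service into the client-facing
// line, or returns null when nothing should be forwarded.
function translateSseLine(line) {
  if (!line.startsWith("data: ")) return null; // skip blank or non-data lines
  const payload = line.slice("data: ".length).trim();
  if (payload === "[DONE]") return null; // sentinel is consumed, not forwarded

  const chunk = JSON.parse(payload);
  if (chunk.done) {
    // Carry model name and token count through on the done event.
    const done = { done: true, model: chunk.model, tokenCount: chunk.tokenCount };
    return `data: ${JSON.stringify(done)}`;
  }
  return `data: ${JSON.stringify({ text: chunk.response })}`;
}
```

A caller would write each non-null result followed by `"\n\n"` and call `res.end()` once the done event has been sent.
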
## Models Manifest

The `/models` endpoint reads a `models.json` file from disk at the path
specified by `MODELS_MANIFEST_PATH`. The file lives on the main PC alongside
the model files, and is accessible to orchestration via a network share
mounted at `/mnt/nexus-models`.

The manifest is read fresh on each request — no restart needed when models
are added or removed.

**models.json format:**
```json
[
  { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
]
```

- `value` — must match the model name as reported by `llama-server` (including `.gguf` extension)
- `label` — display name shown in the UI

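Reading the manifest on every request might be sketched like this. The `readModelsManifest` name is hypothetical, and the shape check on `value` and `label` goes beyond what is documented here (the endpoint is only described as returning the parsed file). The synchronous read keeps the example short; `fs.promises.readFile` would avoid blocking the event loop in a real route handler.

```javascript
const fs = require("node:fs");

// Reads and parses the manifest on every call, so edits to the file are
// picked up without a restart. Throws if the file is missing or malformed.
function readModelsManifest(manifestPath) {
  const entries = JSON.parse(fs.readFileSync(manifestPath, "utf8"));
  if (!Array.isArray(entries)) {
    throw new Error("models.json must contain an array");
  }
  for (const entry of entries) {
    // Assumed validation: both fields described above must be strings.
    if (typeof entry.value !== "string" || typeof entry.label !== "string") {
      throw new Error("each entry needs string `value` and `label` fields");
    }
  }
  return entries;
}
```

A route handler could wrap this call in `try`/`catch` and answer with `500` on any thrown error.
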
## Endpoints

@@ -142,6 +171,14 @@ the `{"done":true}` event.
|---|---|---|
| GET | /sessions | Get paginated list of all sessions |
| GET | /sessions/:sessionId/history | Get paginated episode history for a session |
| PATCH | /sessions/:sessionId | Rename a session |
| DELETE | /sessions/:sessionId | Delete a session and all its episodes |

### Models

| Method | Path | Description |
|---|---|---|
| GET | /models | Get list of available models from manifest file |

---

@@ -152,7 +189,7 @@ Request body:
{
  "sessionId": "your-session-uuid",
  "message": "Hello, my name is Tim.",
  "model": "companion:latest",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "temperature": 0.7
}
```
@@ -165,7 +202,7 @@ Response:
{
  "sessionId": "your-session-uuid",
  "response": "Hello Tim! How can I help you today?",
  "model": "companion:latest",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "tokenCount": 87
}
```
@@ -176,23 +213,34 @@ Response:

Same request body as `POST /chat`.

Response is a stream of Server-Sent Events. Each event contains a text
delta. The stream ends with a `done` event.
Response is a stream of Server-Sent Events:
```
data: {"text":"Hello"}
data: {"text":" Tim"}
data: {"text":"!"}
data: {"done":true}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87}
```

Clients should read the `text` field from each chunk and accumulate them
to build the full response string. The connection is closed by the server
after the `{"done":true}` event.
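
On the client side, the accumulation described above might look like the following sketch. It works on a fully buffered response body to stay short, whereas a real client would read the stream incrementally; the `collectStream` name and the returned object shape are assumptions.

```javascript
// Accumulates `text` deltas from a complete SSE body and captures the
// metadata carried on the final done event.
function collectStream(sseBody) {
  let text = "";
  let meta = null;
  for (const line of sseBody.split("\n")) {
    if (!line.startsWith("data: ")) continue; // skip blank separator lines
    const event = JSON.parse(line.slice("data: ".length));
    if (event.done) {
      meta = { model: event.model, tokenCount: event.tokenCount };
      break; // the server closes the connection after this event
    }
    if (typeof event.text === "string") text += event.text;
  }
  return { text, meta };
}
```
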
---

**PATCH /sessions/:sessionId**

Request body:
```json
{ "name": "My Renamed Session" }
```

Returns the updated session object. `name` is required and trimmed of whitespace.

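The "required and trimmed" rule for `name` could be checked along these lines. The `parseRenameBody` helper is hypothetical, and treating a whitespace-only name as missing is an assumption; the status code returned on rejection is not specified here.

```javascript
// Returns the trimmed name, or null when the body has no usable `name`.
function parseRenameBody(body) {
  if (!body || typeof body.name !== "string") return null;
  const name = body.name.trim();
  return name.length > 0 ? name : null;
}
```
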
---

**DELETE /sessions/:sessionId**

Returns `204 No Content`. Cascades to delete all episodes for the session.

---

**GET /sessions/:sessionId/history**

Returns paginated episode history for a session identified by its external ID.

Query parameters:

| Parameter | Default | Description |
@@ -218,30 +266,17 @@ Response:
}
```

Episodes are ordered newest first.

---

**GET /sessions**
**GET /models**

Returns a paginated list of all sessions, ordered by most recently active.

Query parameters:

| Parameter | Default | Description |
|---|---|---|
| limit | 20 | Maximum number of sessions to return |
| offset | 0 | Number of sessions to skip (for pagination) |

Response:
Returns the parsed contents of `models.json`:
```json
[
  {
    "id": 1,
    "external_id": "test-semantic",
    "metadata": null,
    "created_at": 1712345678,
    "updated_at": 1712345999
  }
  { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
]
```

Episodes are ordered newest first. Returns `404` if the session does not exist.
Returns `500` if the manifest file cannot be read or parsed.