From 045da0d7f419c4d5890f9560369112e539a5bf19 Mon Sep 17 00:00:00 2001 From: Storme-bit Date: Mon, 13 Apr 2026 03:42:14 -0700 Subject: [PATCH] updated documentation --- docs/services/chat-client.md | 150 ++++++++++++++++++++----- docs/services/inference-service.md | 143 ++++++++++++++++++----- docs/services/memory-service.md | 42 ++++++- docs/services/orchestration-service.md | 131 +++++++++++++-------- docs/services/shared.md | 110 +++++++++++++++++- 5 files changed, 464 insertions(+), 112 deletions(-) diff --git a/docs/services/chat-client.md b/docs/services/chat-client.md index 13287d8..480a293 100644 --- a/docs/services/chat-client.md +++ b/docs/services/chat-client.md @@ -27,33 +27,46 @@ npm run dev # local dev server on port 5173 Vite bakes environment variables into the bundle at build time. The `.env` file is only needed on the machine running the build, not where files are served. +After building, copy `dist/` contents to `/srv/nexusai` on Mini PC 2 for Caddy to serve. + ## Environment Variables | Variable | Required | Default | Description | |---|---|---|---| -| VITE_ORCHESTRATION_URL | No | `''` (empty) | Orchestration base URL. Empty string uses Vite proxy in dev, Caddy proxy in production. | +| VITE_ORCHESTRATION_URL | No | `''` (empty) | Orchestration base URL. Must be set to the HTTPS domain in production to avoid mixed content errors. 
| + +Production value: +``` +VITE_ORCHESTRATION_URL=https://nexus.jellystorm.com +``` ## Internal Structure ``` src/ ├── api/ -│ └── orchestration.js # All fetch calls to the orchestration service +│ └── orchestration.js # All fetch calls to the orchestration service +├── config/ +│ └── constants.js # FALLBACK_MODELS, DEFAULT_MODEL, API_DEFAULTS ├── hooks/ -│ ├── useSession.js # Session list, history loading, active session state -│ └── useChat.js # Message sending, SSE streaming, message state +│ ├── useSession.js # Session list, history loading, active session state +│ ├── useChat.js # Message sending, SSE streaming, message state +│ ├── useModels.js # Dynamic model list fetched from /models endpoint +│ └── useContextMenu.js # Right-click context menu position and visibility ├── components/ -│ ├── App.jsx # Root component — layout and shared state -│ ├── SessionList.jsx # Left sidebar — session list and new chat button -│ ├── ChatWindow.jsx # Centre panel — message thread and input bar -│ ├── MessageBubble.jsx # Individual message bubble (user or assistant) -│ └── InfoPanel.jsx # Right panel — model selector and session metadata -├── index.css # Global reset and CSS variables -└── main.jsx # React entry point +│ ├── App.jsx # Root component — layout and shared state +│ ├── SessionList.jsx # Left sidebar — session list, rename, delete +│ ├── ChatWindow.jsx # Centre panel — message thread and input bar +│ ├── MessageBubble.jsx # Individual message bubble (user or assistant) +│ ├── InfoPanel.jsx # Right panel — model selector and session metadata +│ └── SessionModal.jsx # Modal dialog for session settings (rename) +├── index.css # Global reset, CSS variables, utility classes +└── main.jsx # React entry point ``` ## Layout Three-panel layout with collapsible sidebars: +``` ┌─────────────────┬──────────────────────────┬─────────────┐ │ Session List │ Chat Window │ Info Panel │ │ (collapsible) │ │ (collapsible)│ @@ -64,9 +77,54 @@ Three-panel layout with collapsible 
sidebars: │ Session 2 │ │ │ │ │ [input bar] │ │ └─────────────────┴──────────────────────────┴─────────────┘ +``` -On mobile, sidebars collapse to a 56px icon rail. The centre chat window -always fills the remaining space. +Sidebars collapse to a 56px icon rail. The centre chat window always +fills the remaining space. + +## CSS Architecture + +Styles follow a hybrid approach — CSS utility classes for static reusable +rules, inline styles for dynamic prop-driven values. + +### CSS Variables (`:root`) + +| Variable | Value | Description | +|---|---|---| +| `--bg-base` | `#0f1117` | Page background | +| `--bg-surface` | `#1a1d27` | Panel backgrounds | +| `--bg-elevated` | `#222536` | Elevated elements (inputs, cards) | +| `--border` | `#2e3150` | Border colour | +| `--accent` | `#6c63ff` | Primary accent (buttons, highlights) | +| `--accent-hover` | `#574fd6` | Accent hover state | +| `--text-primary` | `#e8e8f0` | Primary text | +| `--text-secondary` | `#8b8fa8` | Secondary text | +| `--text-muted` | `#555870` | Muted / placeholder text | +| `--bubble-user` | `#6c63ff` | User message bubble background | +| `--bubble-ai` | `#222536` | AI message bubble background | +| `--sidebar-width` | `280px` | Expanded sidebar width | +| `--panel-width` | `260px` | Expanded info panel width | +| `--header-height` | `56px` | Shared header height across all panels | +| `--radius-sm` | `6px` | Small border radius | +| `--radius-md` | `8px` | Medium border radius | +| `--radius-lg` | `12px` | Large border radius | + +### Utility Classes + +| Class | Description | +|---|---| +| `.panel-header` | Shared header row — used in all three panels | +| `.btn-reset` | Resets button styles (no border, bg, cursor pointer) | +| `.btn-icon` | Icon button with hover state | +| `.btn-primary` | Accent-coloured action button with `:hover` and `:disabled` states | +| `.flex` / `.flex-col` | Flex layout helpers | +| `.flex-1` / `.flex-shrink` | Flex sizing helpers | +| `.items-center` / 
`.justify-center` / `.justify-between` | Alignment helpers | +| `.overflow-hidden` / `.scroll-y` | Overflow helpers | +| `.text-xs` / `.text-sm` / `.text-base` | Font size helpers | +| `.text-muted` / `.text-secondary` / `.text-accent` | Colour helpers | +| `.label-upper` | Uppercase section label style | +| `.truncate` | Text overflow ellipsis | ## API Layer @@ -78,39 +136,71 @@ All orchestration calls are centralised in `src/api/orchestration.js`: | `fetchSessionHistory` | GET | /sessions/:id/history | Load episode history on session select | | `sendMessage` | POST | /chat | Send message, await full response | | `streamMessage` | POST | /chat/stream | Send message, receive SSE token stream | +| `fetchModels` | GET | /models | Load available models from manifest | +| `renameSession` | PATCH | /sessions/:id | Rename a session | +| `deleteSession` | DELETE | /sessions/:id | Delete a session | `streamMessage` returns an abort function — call it to cancel a stream mid-flight. -It uses a buffer pattern to handle SSE chunks that may span multiple network packets. +Uses a buffer pattern to handle SSE chunks that may span multiple network packets. ## Streaming The chat input sends messages via `POST /chat/stream`. Tokens arrive as SSE events: +``` data: {"text":"Hello"} data: {"text":" Tim"} -data: {"done":true} +data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87} +``` An empty assistant bubble is appended immediately when the stream opens, then updated token by token using `updateLastMessage`. The blinking cursor in `MessageBubble` is shown while `message.streaming === true` and disappears -when `done` is received. +when the done event is received. Model name and token count from the done +event are stored in `useChat` state and displayed in the InfoPanel. 
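The buffering described for `streamMessage` could look roughly like the sketch below. It is an illustration only, not the code in `src/api/orchestration.js`: `createSseParser` and its `onEvent` callback are hypothetical names, and it assumes each event is a single `data:` line of JSON, as in the examples above.

```javascript
// Hypothetical helper: buffers raw text chunks and emits one parsed object
// per complete `data:` line, even when a line is split across chunks.
function createSseParser(onEvent) {
  let buffer = '';

  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    // The last element is either '' or an incomplete line; keep it for the next chunk.
    buffer = lines.pop();

    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith('data:')) continue; // blank separators, comments
      const payload = trimmed.slice(5).trim();
      if (payload === '' || payload === '[DONE]') continue;
      try {
        onEvent(JSON.parse(payload));
      } catch {
        // Malformed JSON is skipped in this sketch; real code may want to log it.
      }
    }
  };
}
```

A caller would pass each decoded chunk from the response body to `feed` and, in `onEvent`, append `text` values to the last message until an event with `done: true` arrives.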
-## Model Selector +## Dynamic Model Selector -Available models are defined in `InfoPanel.jsx`: +Available models are fetched from `GET /models` on mount via the `useModels` hook. +The hook initialises with `FALLBACK_MODELS` from `constants.js` and replaces them +with the server response on success. If the fetch fails, the fallback list is used +silently — a warning is logged to the console. -| Label | Value | -|---|---| -| Companion | `companion:latest` | -| Mistral Nemo | `mistral-nemo:latest` | -| Coder | `coder:latest` | -| Qwen 2.5 Coder 14B | `qwen2.5-coder:14b` | +```js +// constants.js +export const FALLBACK_MODELS = [ + { value: 'companion:latest', label: 'Companion' }, + // ... +]; +``` -The selected model is passed with every chat request. To add a new model, -update the `MODELS` array in `InfoPanel.jsx`. +The selected model is passed with every chat request. To add a model, update +`models.json` on the main PC — no client rebuild needed. ## Session Management -Sessions are identified by a `external_id` — a human-readable string or UUID -generated client-side. New sessions are created locally with `uuid` and auto-registered -in the memory service on the first message. The session list refreshes after each -completed response to surface newly created sessions. \ No newline at end of file +Sessions are identified by `external_id` — a UUID generated client-side via the +`uuid` package. New sessions are created locally and auto-registered in the memory +service on the first message. The session list refreshes after each completed +response to surface newly created sessions. + +### Session Actions + +The session list supports rename and delete: + +- **Hover** — reveals ✎ (rename) and ✕ (delete) icon buttons on the session row +- **Right-click** — opens a context menu with the same actions + +Rename opens a `SessionModal` dialog. 
The modal is designed to expand into a full +session settings panel in future — the title is already "Session Settings" to +reflect this intent. + +Delete is immediate with no confirmation dialog (planned for a future update). + +Actions are disabled on unsaved (new) sessions that haven't had a message sent yet. + +### Context Menu + +Implemented via `useContextMenu` hook — tracks `{ x, y, session }` state and +attaches a `window` click listener to dismiss on any outside click. Rendered +outside the sidebar div (via React fragment) to avoid being clipped by +`overflow: hidden`. \ No newline at end of file diff --git a/docs/services/inference-service.md b/docs/services/inference-service.md index 2f96a09..49d668c 100644 --- a/docs/services/inference-service.md +++ b/docs/services/inference-service.md @@ -2,7 +2,7 @@ **Package:** `@nexusai/inference-service` **Location:** `packages/inference-service` -**Deployed on:** Main PC +**Deployed on:** Main PC (192.168.0.79) **Port:** 3001 ## Purpose @@ -15,7 +15,7 @@ to switch inference backends without changes to the rest of the system. ## Dependencies - `express` — HTTP API -- `ollama` — Ollama client (used by the Ollama provider) +- `ollama` — Ollama client (used by the Ollama provider, kept as fallback) - `dotenv` — environment variable loading - `@nexusai/shared` — shared utilities @@ -24,9 +24,13 @@ to switch inference backends without changes to the rest of the system. 
| Variable | Required | Default | Description | |---|---|---|---| | PORT | No | 3001 | Port to listen on | -| INFERENCE_PROVIDER | No | ollama | Active inference provider (ollama, llamacpp) | -| INFERENCE_URL | No | http://localhost:11434 | URL of the inference runtime | -| DEFAULT_MODEL | No | llama3.2 | Default model name passed to the provider | +| INFERENCE_PROVIDER | No | llamacpp | Active inference provider (`ollama` or `llamacpp`) | +| INFERENCE_URL | No | http://localhost:8080 | URL of the inference runtime | +| DEFAULT_MODEL | No | local-model | Default model name passed to the provider | + +> `INFERENCE_URL` points to `llama-server` directly (port 8080), not to this +> service itself. The orchestration service uses `INFERENCE_SERVICE_URL` to +> reach this service on port 3001. ## Provider Architecture @@ -39,14 +43,87 @@ signatures, so the rest of the service is unaware of which backend is active. | Provider | Value | Runtime | |---|---|---| -| Ollama | `ollama` | Ollama via the `ollama` npm package | -| llama.cpp | `llamacpp` | llama.cpp server (OpenAI-compatible API) | +| llama.cpp | `llamacpp` | llama.cpp server (OpenAI-compatible API) — **current default** | +| Ollama | `ollama` | Ollama via the `ollama` npm package — available as fallback | -Switching providers requires only a `.env` change — no code modifications needed. +Switching providers requires only a `.env` change — no code modifications needed: +``` INFERENCE_PROVIDER=llamacpp INFERENCE_URL=http://localhost:8080 +``` + +### Provider Validation + +The provider loader validates `INFERENCE_PROVIDER` at startup and throws immediately +if an unknown value is set — prevents silent misconfiguration: +``` +Error: Unknown inference provider: "foo". Valid options: ollama, llamacpp +``` + +## llama.cpp Provider + +The llama.cpp provider uses the OpenAI-compatible REST API exposed by `llama-server`. 
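As a rough illustration of what a request body for that API might contain, the sketch below maps NexusAI options to the API fields and defaults listed in the Inference Parameters table of this document. `buildCompletionBody` is a hypothetical name, and sending the text in a `prompt` field (completions-style rather than chat-style) is an assumption; the actual provider may differ.

```javascript
// Defaults copied from the Inference Parameters table in this document.
const DEFAULTS = {
  temperature: 0.7,
  maxTokens: 1024,
  topP: 0.9,
  topK: 40,
  repeatPenalty: 1.1,
};

// Hypothetical helper: converts camelCase NexusAI options into the
// snake_case fields of an OpenAI-compatible request body.
function buildCompletionBody(prompt, model, options = {}, stream = false) {
  const body = {
    model,
    prompt, // assumption: completions-style request
    stream,
    temperature: options.temperature ?? DEFAULTS.temperature,
    max_tokens: options.maxTokens ?? DEFAULTS.maxTokens,
    top_p: options.topP ?? DEFAULTS.topP,
    top_k: options.topK ?? DEFAULTS.topK,
    repeat_penalty: options.repeatPenalty ?? DEFAULTS.repeatPenalty,
  };
  // A null or missing seed means "random", so the field is simply omitted.
  if (options.seed !== null && options.seed !== undefined) {
    body.seed = options.seed;
  }
  return body;
}
```

Keeping the mapping in one pure function makes the defaults easy to check against the table without a running `llama-server`.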
+ +### Starting llama-server + +`llama-server` must be started manually on the main PC before the inference service +can handle requests. It loads a single model at startup: + +```powershell +.\llama-gpu\llama-server.exe ` + -m .\models\gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf ` + -ngl 99 ` + --reasoning off ` + --host 0.0.0.0 ` + --port 8080 ` + -c 64000 +``` + +Key flags: + +| Flag | Description | +|---|---| +| `-m` | Path to the `.gguf` model file | +| `-ngl 99` | Offload as many layers as possible to GPU | +| `--reasoning off` | Disables thinking/reasoning delay on Gemma 4 models | +| `--host 0.0.0.0` | Allows connections from other machines on the LAN | +| `--port 8080` | Port for the llama-server HTTP API | +| `-c 64000` | Context window size in tokens | + +> `-c 64000` is intentionally large. Monitor VRAM usage — if pressure builds, +> reduce this value. The NexusAI memory architecture handles context injection +> so a smaller window (6–8K) is often sufficient. + +### Model Naming + +The model name sent in API requests must match the name as reported by +`llama-server` — including the `.gguf` extension. 
The reported name can be +verified with: + +```powershell +Invoke-RestMethod -Uri "http://192.168.0.79:8080/v1/models" +``` + +Set `DEFAULT_MODEL` in `.env` to the exact reported name: +``` +DEFAULT_MODEL=gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf +``` + +### Inference Parameters + +The llamacpp provider maps NexusAI options to OpenAI-compatible fields: + +| NexusAI option | API field | Default | +|---|---|---| +| `temperature` | `temperature` | 0.7 | +| `maxTokens` | `max_tokens` | 1024 | +| `topP` | `top_p` | 0.9 | +| `topK` | `top_k` | 40 | +| `repeatPenalty` | `repeat_penalty` | 1.1 | +| `seed` | `seed` | null (random) | ## Internal Structure +``` src/ ├── providers/ │ ├── ollama.js # Ollama provider — uses ollama npm package @@ -55,6 +132,27 @@ src/ │ └── inference.js # /complete and /complete/stream route handlers ├── infer.js # Provider loader — selects and re-exports active provider └── index.js # Express app + route definitions +``` + +## Streaming Response Format + +The llama.cpp provider yields chunks in this shape: +```js +{ response: "token text", done: false } +// final chunk: +{ response: '', done: true, model: "model-name.gguf", tokenCount: 42 } +``` + +The inference route re-emits these as SSE events: +``` +data: {"response":"token text"} +data: {"done":true,"model":"model-name.gguf","tokenCount":42} +data: [DONE] +``` + +`model` and `tokenCount` are captured from the llama.cpp `finish_reason: stop` +chunk (`usage.completion_tokens`) and emitted on the done event so the +orchestration layer can forward them to the client. 
## Endpoints @@ -79,7 +177,7 @@ Request body: ```json { "prompt": "What is the capital of France?", - "model": "companion:latest", + "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "temperature": 0.7, "maxTokens": 1024 } @@ -93,33 +191,26 @@ Response: ```json { "text": "The capital of France is Paris.", - "model": "companion:latest", + "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "done": true, "evalCount": 8, "promptEvalCount": 41 } ``` -| Field | Description | -|---|---| -| `text` | The model's response | -| `model` | Model name as reported by the provider | -| `done` | Whether generation completed normally | -| `evalCount` | Number of tokens generated | -| `promptEvalCount` | Number of tokens in the prompt | - --- **POST /complete/stream** -Same request body as `/complete` (`maxTokens` not applicable for streaming). +Same request body as `/complete`. -Response is a stream of Server-Sent Events. Each event contains a partial -response chunk as JSON. The stream closes with a final `data: [DONE]` event. -data: {"model":"companion:latest","response":"The","done":false} -data: {"model":"companion:latest","response":" capital","done":false} -data: {"model":"companion:latest","response":" of France is Paris.","done":false} +Response is a stream of Server-Sent Events: +``` +data: {"response":"The"} +data: {"response":" capital of France is Paris."} +data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":8} data: [DONE] +``` -Clients should read the `response` field from each chunk and accumulate -them to build the full response string. \ No newline at end of file +Clients should accumulate `response` fields to build the full response string. +The `done` event carries `model` and `tokenCount` for display in the UI. 
\ No newline at end of file diff --git a/docs/services/memory-service.md b/docs/services/memory-service.md index c1f152b..b2ae6f5 100644 --- a/docs/services/memory-service.md +++ b/docs/services/memory-service.md @@ -34,7 +34,7 @@ service to generate and store a vector in Qdrant. ``` src/ ├── db/ -│ ├── index.js # SQLite connection + initialization +│ ├── index.js # SQLite connection + initialization + migrations │ └── schema.js # Table definitions, indexes, FTS5, triggers ├── episodic/ │ └── index.js # Session + episode CRUD, FTS search, embedding write path @@ -49,12 +49,29 @@ src/ Five core tables: -- **sessions** — top-level conversation containers, identified by an `external_id` +- **sessions** — top-level conversation containers, identified by an `external_id` and optional `name` - **episodes** — individual exchanges (user message + AI response) tied to a session - **entities** — named things the system learns about (people, places, concepts) - **relationships** — directional labeled links between entities - **summaries** — condensed episode groups for efficient context retrieval +### Migrations + +Schema changes that cannot be expressed in `CREATE TABLE IF NOT EXISTS` are applied +as migrations in `db/index.js` at startup, wrapped in try/catch to safely ignore +already-applied changes: + +```js +try { + db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`); +} catch { + // Column already exists — safe to ignore on subsequent startups +} +``` + +Current migrations: +- `ALTER TABLE sessions ADD COLUMN name TEXT` — adds display name to sessions + ### FTS5 Full-Text Search An `episodes_fts` virtual table enables keyword search across all episodes. 
@@ -144,9 +161,14 @@ Entities and relationships are stored in SQLite with two key constraints: | Method | Path | Description | |---|---|---| | POST | /sessions | Create a new session | +| GET | /sessions | Get paginated list of all sessions | | GET | /sessions/:id | Get session by internal ID | | GET | /sessions/by-external/:externalId | Get session by external ID | -| DELETE | /sessions/:id | Delete session (cascades to episodes + summaries) | +| PATCH | /sessions/by-external/:externalId | Update session name | +| DELETE | /sessions/by-external/:externalId | Delete session (cascades to episodes + summaries) | + +> Route ordering matters in Express: `by-external/:externalId` must be defined before +> `/:id` to prevent the literal string `by-external` being captured as an ID parameter. **POST /sessions body:** ```json @@ -156,6 +178,20 @@ Entities and relationships are stored in SQLite with two key constraints: } ``` +**PATCH /sessions/by-external/:externalId body:** +```json +{ + "name": "My Renamed Session" +} +``` + +Returns the updated session object. `name` is required and must be non-empty. + +**DELETE /sessions/by-external/:externalId** + +Returns `204 No Content` on success. Cascades to delete all associated episodes +and summaries via SQLite `ON DELETE CASCADE`. + ### Episodes | Method | Path | Description | diff --git a/docs/services/orchestration-service.md b/docs/services/orchestration-service.md index 56c796a..5346492 100644 --- a/docs/services/orchestration-service.md +++ b/docs/services/orchestration-service.md @@ -14,14 +14,10 @@ or inference services — all traffic flows through orchestration. ## Dependencies -- `express` : HTTP API -- `cors` : cross-origin resource sharing middleware -- `node-fetch` : inter-service HTTP communication (memory service client only) -- `dotenv` : environment variable loading -- `@nexusai/shared` : shared utilities - -> `memory.js` uses `node-fetch` v2 (pinned) because it is CommonJS. 
All other -> service clients use Node.js built-in `fetch`. +- `express` — HTTP API +- `cors` — cross-origin resource sharing middleware +- `dotenv` — environment variable loading +- `@nexusai/shared` — shared utilities ## Environment Variables @@ -33,6 +29,7 @@ or inference services — all traffic flows through orchestration. | INFERENCE_SERVICE_URL | No | http://localhost:3001 | Inference service URL | | QDRANT_URL | No | http://localhost:6333 | Qdrant URL for semantic search | | CORS_ORIGIN | No | http://localhost:5173 | Allowed origin for CORS requests | +| MODELS_MANIFEST_PATH | Yes | — | Path to `models.json` manifest file | ## Internal Structure ``` @@ -46,7 +43,8 @@ src/ │ └── index.js # Core pipeline logic — context assembly and coordination ├── routes/ │ ├── chat.js # POST /chat and POST /chat/stream route handlers -│ └── sessions.js # GET /sessions/:sessionId/history route handler +│ ├── sessions.js # Session list, history, rename, and delete routes +│ └── models.js # GET /models — reads models.json manifest from disk └── index.js # Express app entry point ``` @@ -65,7 +63,7 @@ the client. UUID for new conversations and pass it directly — no pre-creation step needed. 2. **Recent episode retrieval** — fetches the most recent episodes for the session - (default: 10) from the memory service. + (default: 5) from the memory service. 3. **Semantic search** — embeds the user message via the embedding service, then queries Qdrant for the top-5 most similar past episodes (score threshold: 0.75). @@ -89,37 +87,68 @@ the client. count to the client. ## Prompt Structure +``` [System prompt] + Here are some relevant memories from earlier conversations: User: {past user message} Assistant: {past ai response} ... (up to 5 semantic episodes) -Here is the recent conversation history: +--- +Here are some relevant memories from your past conversations: User: {past user message} Assistant: {past ai response} -... (up to 10 recent episodes) ---- End of memories --- +... 
(up to 5 recent episodes) +--- End of recent memories --- + User: {current message} Assistant: +``` Semantic episodes appear before recent episodes so the model encounters long-range relevant context before the immediate conversation flow. ## SSE Stream Format -The inference service emits chunks in this format: -data: {"model":"companion:latest","response":"Hello","done":false} -data: {"model":"companion:latest","response":"!","done":true,"eval_count":3,...} +The inference service emits chunks from the llama.cpp provider in this format: +``` +data: {"response":"Hello","done":false} +data: {"response":"!","done":false} +data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42} data: [DONE] +``` The orchestration service re-emits to the client as: +``` data: {"text":"Hello"} data: {"text":"!"} -data: {"done":true} +data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42} +``` The `[DONE]` sentinel from the inference service is consumed internally and not forwarded. The client stream is terminated by `res.end()` after -the `{"done":true}` event. +the done event. Model name and token count are included on the done event +so the client can display them in the UI. + +## Models Manifest + +The `/models` endpoint reads a `models.json` file from disk at the path +specified by `MODELS_MANIFEST_PATH`. The file lives on the main PC alongside +the model files, and is accessible to orchestration via a network share +mounted at `/mnt/nexus-models`. + +The manifest is read fresh on each request — no restart needed when models +are added or removed. + +**models.json format:** +```json +[ + { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" } +] +``` + +- `value` — must match the model name as reported by `llama-server` (including `.gguf` extension) +- `label` — display name shown in the UI ## Endpoints @@ -142,6 +171,14 @@ the `{"done":true}` event. 
|---|---|---| | GET | /sessions | Get paginated list of all sessions | | GET | /sessions/:sessionId/history | Get paginated episode history for a session | +| PATCH | /sessions/:sessionId | Rename a session | +| DELETE | /sessions/:sessionId | Delete a session and all its episodes | + +### Models + +| Method | Path | Description | +|---|---|---| +| GET | /models | Get list of available models from manifest file | --- @@ -152,7 +189,7 @@ Request body: { "sessionId": "your-session-uuid", "message": "Hello, my name is Tim.", - "model": "companion:latest", + "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "temperature": 0.7 } ``` @@ -165,7 +202,7 @@ Response: { "sessionId": "your-session-uuid", "response": "Hello Tim! How can I help you today?", - "model": "companion:latest", + "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "tokenCount": 87 } ``` @@ -176,23 +213,34 @@ Response: Same request body as `POST /chat`. -Response is a stream of Server-Sent Events. Each event contains a text -delta. The stream ends with a `done` event. +Response is a stream of Server-Sent Events: +``` data: {"text":"Hello"} data: {"text":" Tim"} -data: {"text":"!"} -data: {"done":true} +data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87} +``` -Clients should read the `text` field from each chunk and accumulate them -to build the full response string. The connection is closed by the server -after the `{"done":true}` event. +--- + +**PATCH /sessions/:sessionId** + +Request body: +```json +{ "name": "My Renamed Session" } +``` + +Returns the updated session object. `name` is required and trimmed of whitespace. + +--- + +**DELETE /sessions/:sessionId** + +Returns `204 No Content`. Cascades to delete all episodes for the session. --- **GET /sessions/:sessionId/history** -Returns paginated episode history for a session identified by its external ID. 
- Query parameters: | Parameter | Default | Description | @@ -218,30 +266,17 @@ Response: } ``` +Episodes are ordered newest first. + --- -**GET /sessions** +**GET /models** -Returns a paginated list of all sessions, ordered by most recently active. - -Query parameters: - -| Parameter | Default | Description | -|---|---|---| -| limit | 20 | Maximum number of sessions to return | -| offset | 0 | Number of sessions to skip (for pagination) | - -Response: +Returns the parsed contents of `models.json`: ```json [ - { - "id": 1, - "external_id": "test-semantic", - "metadata": null, - "created_at": 1712345678, - "updated_at": 1712345999 - } + { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" } ] ``` -Episodes are ordered newest first. Returns `404` if the session does not exist. \ No newline at end of file +Returns `500` if the manifest file cannot be read or parsed. \ No newline at end of file diff --git a/docs/services/shared.md b/docs/services/shared.md index 098f066..3ffbd34 100644 --- a/docs/services/shared.md +++ b/docs/services/shared.md @@ -24,13 +24,40 @@ const DB = getEnv('SQLITE_PATH'); // required — throws if missing --- +### `parseRow(row)` + +Parses a SQLite row object, deserialising any JSON-encoded `metadata` fields +into plain objects. Returns `null` if the row is `null` or `undefined`. + +```js +const { parseRow } = require('@nexusai/shared'); +const session = parseRow(db.prepare('SELECT * FROM sessions WHERE id = ?').get(id)); +``` + +--- + +### `formatEpisodeText(userMessage, aiResponse)` + +Combines a user message and AI response into the canonical text format used +for embedding: + +``` +User: {userMessage} +Assistant: {aiResponse} +``` + +Used by the memory service's embedding write path to ensure consistent +vector representations across all episodes. + +--- + ### Constants Tuneable values and shared identifiers are centralised in `constants.js` rather than hardcoded across services. 
Import the relevant group by name. ```js -const { QDRANT, COLLECTIONS, EPISODIC } = require('@nexusai/shared'); +const { QDRANT, COLLECTIONS, EPISODIC, LLAMACPP } = require('@nexusai/shared'); ``` #### `QDRANT` @@ -40,15 +67,14 @@ embedding model and Qdrant collection setup. | Key | Value | Description | |---|---|---| -| `DEFAULT_URL` | `http://localhost:6333` | Fallback Qdrant URL if `QDRANT_URL` env var is not set | +| `DEFAULT_URL` | `http://localhost:6333` | Fallback Qdrant URL | | `VECTOR_SIZE` | `768` | Output dimensions of `nomic-embed-text` | | `DISTANCE_METRIC` | `'Cosine'` | Similarity metric used for all collections | | `DEFAULT_LIMIT` | `10` | Default top-k for vector searches | #### `COLLECTIONS` -Canonical Qdrant collection names. Used by both the semantic layer and -any service that constructs Qdrant queries directly. +Canonical Qdrant collection names. | Key | Value | |---|---| @@ -65,6 +91,8 @@ Default pagination and result limits for SQLite episode queries. | `DEFAULT_RECENT_LIMIT` | `10` | Default number of recent episodes to retrieve | | `DEFAULT_PAGE_SIZE` | `20` | Default episodes per page for paginated queries | | `DEFAULT_SEARCH_LIMIT` | `10` | Default number of FTS search results to return | +| `DEFAULT_OFFSET` | `0` | Default pagination offset | +| `DEFAULT_SESSIONS_LIMIT` | `20` | Default number of sessions to return | #### `SERVICES` @@ -73,4 +101,76 @@ when the corresponding environment variable is not set. | Key | Value | Description | |---|---|---| -| `EMBEDDING_URL` | `http://localhost:3003` | Fallback embedding service URL | \ No newline at end of file +| `EMBEDDING_URL` | `http://localhost:3003` | Fallback embedding service URL | +| `MEMORY_URL` | `http://localhost:3002` | Fallback memory service URL | +| `INFERENCE_URL` | `http://localhost:3001` | Fallback inference service URL | + +#### `PORTS` + +Default port numbers for each service. 
+ +| Key | Value | +|---|---| +| `INFERENCE` | `'3001'` | +| `MEMORY` | `'3002'` | +| `EMBEDDING` | `'3003'` | +| `ORCHESTRATION` | `'4000'` | + +#### `OLLAMA` + +Ollama runtime defaults — used by the Ollama inference provider. + +| Key | Value | Description | +|---|---|---| +| `DEFAULT_URL` | `http://localhost:11434` | Fallback Ollama URL | +| `EMBED_MODEL` | `'nomic-embed-text'` | Default embedding model | +| `OLLAMA_MODEL` | `'companion:latest'` | Default chat model | + +#### `LLAMACPP` + +llama.cpp runtime defaults — used by the llama.cpp inference provider. + +| Key | Value | Description | +|---|---|---| +| `DEFAULT_URL` | `http://localhost:8080` | Fallback llama-server URL | +| `DEFAULT_MODEL` | `'local-model'` | Fallback model name (override via `DEFAULT_MODEL` env var) | + +> Always set `DEFAULT_MODEL` in the inference service `.env` to the exact model +> name reported by `llama-server` (including `.gguf` extension). The shared +> constant is a last-resort fallback only. + +#### `INFERENCE_DEFAULTS` + +Default inference parameters applied when not specified in a request. + +| Key | Value | Description | +|---|---|---| +| `TEMPERATURE` | `0.7` | Controls randomness (0 = deterministic, 1 = creative) | +| `MAX_TOKENS` | `1024` | Maximum tokens to generate | +| `TOP_P` | `0.9` | Nucleus sampling probability mass | +| `TOP_K` | `40` | Top-K candidates at each step | +| `REPEAT_PENALTY` | `1.1` | Penalty for recently used tokens | +| `SEED` | `null` | null = random; set integer for reproducible outputs | + +#### `ORCHESTRATION` + +Orchestration pipeline defaults. 
+ +| Key | Value | Description | +|---|---|---| +| `RECENT_EPISODE_LIMIT` | `5` | Recent episodes to inject into prompt | +| `SEMANTIC_LIMIT` | `5` | Semantic search results to inject into prompt | +| `SCORE_THRESHOLD` | `0.75` | Minimum similarity score for semantic results | +| `CORS_ORIGIN` | `'http://localhost:5173'` | Fallback allowed CORS origin | +| `SYSTEM_PROMPT` | *(see below)* | Default system prompt | + +Default system prompt: +> "You are a helpful, context-aware AI assistant. You have access to memories +> of past conversations with the user. Use them to provide consistent, +> personalised responses." + +#### `SQLITE` + +| Key | Value | Description | +|---|---|---| +| `DEFAULT_PATH` | `'./data/nexusai.db'` | Fallback SQLite database path | \ No newline at end of file