updated documentation

@@ -27,33 +27,46 @@ npm run dev # local dev server on port 5173

Vite bakes environment variables into the bundle at build time. The `.env`
file is only needed on the machine running the build, not where files are served.

After building, copy `dist/` contents to `/srv/nexusai` on Mini PC 2 for Caddy to serve.

## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| VITE_ORCHESTRATION_URL | No | `''` (empty) | Orchestration base URL. Must be set to the HTTPS domain in production to avoid mixed content errors. |

Production value:

```
VITE_ORCHESTRATION_URL=https://nexus.jellystorm.com
```
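
This is a minimal sketch of how a client might combine this base URL with endpoint paths and detect the misconfiguration described above. `apiUrl` and `isMixedContent` are hypothetical helpers and are not taken from `orchestration.js`. In the Vite build the base would come from `import.meta.env.VITE_ORCHESTRATION_URL`. Here it is passed in as a parameter so the functions also run under plain Node.

```javascript
// Hypothetical helper: joins the build-time base URL with an API path.
// An empty base yields a same-origin relative URL.
function apiUrl(base, path) {
  const cleanPath = path.startsWith('/') ? path : `/${path}`;
  if (!base) return cleanPath;
  return base.replace(/\/+$/, '') + cleanPath; // drop trailing slashes on the base
}

// True for the mixed content case: an HTTPS page calling a plain-HTTP API base.
function isMixedContent(pageProtocol, base) {
  return pageProtocol === 'https:' && typeof base === 'string' && base.startsWith('http://');
}
```

In the browser, `pageProtocol` would be `window.location.protocol`. For example, `apiUrl('https://nexus.jellystorm.com', '/chat/stream')` yields `https://nexus.jellystorm.com/chat/stream`.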

## Internal Structure

```
src/
├── api/
│   └── orchestration.js # All fetch calls to the orchestration service
├── config/
│   └── constants.js # FALLBACK_MODELS, DEFAULT_MODEL, API_DEFAULTS
├── hooks/
│   ├── useSession.js # Session list, history loading, active session state
│   ├── useChat.js # Message sending, SSE streaming, message state
│   ├── useModels.js # Dynamic model list fetched from /models endpoint
│   └── useContextMenu.js # Right-click context menu position and visibility
├── components/
│   ├── App.jsx # Root component — layout and shared state
│   ├── SessionList.jsx # Left sidebar — session list, rename, delete
│   ├── ChatWindow.jsx # Centre panel — message thread and input bar
│   ├── MessageBubble.jsx # Individual message bubble (user or assistant)
│   ├── InfoPanel.jsx # Right panel — model selector and session metadata
│   └── SessionModal.jsx # Modal dialog for session settings (rename)
├── index.css # Global reset, CSS variables, utility classes
└── main.jsx # React entry point
```

## Layout

Three-panel layout with collapsible sidebars:

```
┌─────────────────┬──────────────────────────┬──────────────┐
│ Session List    │       Chat Window        │ Info Panel   │
│ (collapsible)   │                          │ (collapsible)│
@@ -64,9 +77,54 @@ Three-panel layout with collapsible sidebars:
│ Session 2       │                          │              │
│                 │       [input bar]        │              │
└─────────────────┴──────────────────────────┴──────────────┘
```

Sidebars collapse to a 56px icon rail. The centre chat window always
fills the remaining space.

## CSS Architecture

Styles follow a hybrid approach — CSS utility classes for static reusable
rules, inline styles for dynamic prop-driven values.

### CSS Variables (`:root`)

| Variable | Value | Description |
|---|---|---|
| `--bg-base` | `#0f1117` | Page background |
| `--bg-surface` | `#1a1d27` | Panel backgrounds |
| `--bg-elevated` | `#222536` | Elevated elements (inputs, cards) |
| `--border` | `#2e3150` | Border colour |
| `--accent` | `#6c63ff` | Primary accent (buttons, highlights) |
| `--accent-hover` | `#574fd6` | Accent hover state |
| `--text-primary` | `#e8e8f0` | Primary text |
| `--text-secondary` | `#8b8fa8` | Secondary text |
| `--text-muted` | `#555870` | Muted / placeholder text |
| `--bubble-user` | `#6c63ff` | User message bubble background |
| `--bubble-ai` | `#222536` | AI message bubble background |
| `--sidebar-width` | `280px` | Expanded sidebar width |
| `--panel-width` | `260px` | Expanded info panel width |
| `--header-height` | `56px` | Shared header height across all panels |
| `--radius-sm` | `6px` | Small border radius |
| `--radius-md` | `8px` | Medium border radius |
| `--radius-lg` | `12px` | Large border radius |

### Utility Classes

| Class | Description |
|---|---|
| `.panel-header` | Shared header row — used in all three panels |
| `.btn-reset` | Resets button styles (no border or background, `cursor: pointer`) |
| `.btn-icon` | Icon button with hover state |
| `.btn-primary` | Accent-coloured action button with `:hover` and `:disabled` states |
| `.flex` / `.flex-col` | Flex layout helpers |
| `.flex-1` / `.flex-shrink` | Flex sizing helpers |
| `.items-center` / `.justify-center` / `.justify-between` | Alignment helpers |
| `.overflow-hidden` / `.scroll-y` | Overflow helpers |
| `.text-xs` / `.text-sm` / `.text-base` | Font size helpers |
| `.text-muted` / `.text-secondary` / `.text-accent` | Colour helpers |
| `.label-upper` | Uppercase section label style |
| `.truncate` | Text overflow ellipsis |

## API Layer

@@ -78,39 +136,71 @@ All orchestration calls are centralised in `src/api/orchestration.js`:

| `fetchSessionHistory` | GET | /sessions/:id/history | Load episode history on session select |
| `sendMessage` | POST | /chat | Send message, await full response |
| `streamMessage` | POST | /chat/stream | Send message, receive SSE token stream |
| `fetchModels` | GET | /models | Load available models from manifest |
| `renameSession` | PATCH | /sessions/:id | Rename a session |
| `deleteSession` | DELETE | /sessions/:id | Delete a session |

`streamMessage` returns an abort function; call it to cancel a stream mid-flight.
It uses a buffer pattern to handle SSE events that may be split across multiple network chunks.
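
This is a minimal sketch of both behaviours. It assumes the client reads the response body through `fetch` and a `ReadableStream` reader, and that each `data:` line carries one JSON object, as in the examples in this document. `createSseParser` is a hypothetical helper, and this `streamMessage` only illustrates the shape described above. The real `orchestration.js` may differ.

```javascript
// Hypothetical helper: keeps any incomplete line in a buffer until the rest
// arrives, then calls onData with each parsed `data:` payload.
function createSseParser(onData) {
  let buffer = '';
  return function push(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith('data:')) continue; // blank separators, comments
      const payload = trimmed.slice('data:'.length).trim();
      if (payload === '[DONE]') continue;
      onData(JSON.parse(payload));
    }
  };
}

// Hypothetical streamMessage: the returned function aborts the request
// through an AbortController.
function streamMessage(url, body, onData) {
  const controller = new AbortController();
  const push = createSseParser(onData);
  const decoder = new TextDecoder();

  fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
    signal: controller.signal,
  })
    .then(async (res) => {
      if (!res.ok) throw new Error(`Stream request failed: ${res.status}`);
      const reader = res.body.getReader();
      for (;;) {
        const { value, done } = await reader.read();
        if (done) break;
        push(decoder.decode(value, { stream: true }));
      }
      push(decoder.decode() + '\n'); // flush decoder state and any final unterminated line
    })
    .catch((err) => {
      if (err.name !== 'AbortError') console.error(err);
    });

  return () => controller.abort();
}
```

A caller would keep the return value, for example `const abort = streamMessage('/chat/stream', payload, handleEvent)`, and call `abort()` when the user cancels. `payload` and `handleEvent` are placeholders here.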

## Streaming

The chat input sends messages via `POST /chat/stream`. Tokens arrive as SSE events:

```
data: {"text":"Hello"}
data: {"text":" Tim"}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87}
```

An empty assistant bubble is appended immediately when the stream opens, then
updated token by token using `updateLastMessage`. The blinking cursor in
`MessageBubble` is shown while `message.streaming === true` and disappears
when the done event is received. Model name and token count from the done
event are stored in `useChat` state and displayed in the InfoPanel.
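
The flow above can be expressed as pure state updates. This sketch assumes a state object shaped like `{ messages, model, tokenCount }`. That shape and the function names are hypothetical, and the real `useChat` hook keeps this in React state. Each text event extends the last assistant message, and the done event clears `streaming` (which hides the cursor) and records the model name and token count.

```javascript
// Hypothetical message shape: { role, content, streaming }.
function appendAssistantPlaceholder(messages) {
  return [...messages, { role: 'assistant', content: '', streaming: true }];
}

// Applies one parsed SSE event ({ text } or { done, model, tokenCount }) to the
// last assistant message, in the spirit of updateLastMessage.
function applyStreamEvent(state, event) {
  const last = state.messages[state.messages.length - 1];
  if (!last || last.role !== 'assistant') return state;
  const rest = state.messages.slice(0, -1);

  if (event.done) {
    return {
      messages: [...rest, { ...last, streaming: false }], // cursor disappears
      model: event.model ?? state.model,
      tokenCount: event.tokenCount ?? state.tokenCount,
    };
  }
  if (typeof event.text === 'string') {
    return { ...state, messages: [...rest, { ...last, content: last.content + event.text }] };
  }
  return state; // unknown event shape: no change
}
```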

## Dynamic Model Selector

Available models are fetched from `GET /models` on mount via the `useModels` hook.
The hook initialises with `FALLBACK_MODELS` from `constants.js` and replaces them
with the server response on success. If the fetch fails, the fallback list stays
in place without an error in the UI; a warning is logged to the console.

```js
// constants.js
export const FALLBACK_MODELS = [
  { value: 'companion:latest', label: 'Companion' },
  // ...
];
```

The selected model is passed with every chat request. To add a model, update
`models.json` on the main PC — no client rebuild needed.
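
This sketch shows the fallback logic a hook like `useModels` could run on mount. It is written as a plain async function so it runs outside React. `loadModels`, the injected `fetchImpl` parameter and the one-entry `FALLBACK_MODELS` stand-in are hypothetical, and treating an empty list as a failure is an assumption.

```javascript
// Stand-in for the constants.js fallback list.
const FALLBACK_MODELS = [{ value: 'companion:latest', label: 'Companion' }];

// Returns the server's model list, or FALLBACK_MODELS (with a console
// warning) if the request or the response is unusable.
async function loadModels(baseUrl, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(`${baseUrl}/models`);
    if (!res.ok) throw new Error(`GET /models failed with status ${res.status}`);
    const models = await res.json();
    if (!Array.isArray(models) || models.length === 0) {
      throw new Error('GET /models returned no models');
    }
    return models;
  } catch (err) {
    console.warn('Falling back to FALLBACK_MODELS:', err.message);
    return FALLBACK_MODELS;
  }
}
```

Inside the hook, the resolved list would be written to state in an effect, replacing the initial `FALLBACK_MODELS` value.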

## Session Management

Sessions are identified by `external_id` — a UUID generated client-side via the
`uuid` package. New sessions are created locally and auto-registered in the memory
service on the first message. The session list refreshes after each completed
response to surface newly created sessions.

### Session Actions

The session list supports rename and delete:

- **Hover** — reveals ✎ (rename) and ✕ (delete) icon buttons on the session row
- **Right-click** — opens a context menu with the same actions

Rename opens a `SessionModal` dialog. The modal is designed to expand into a full
session settings panel in future — the title is already "Session Settings" to
reflect this intent.

Delete is immediate; a confirmation dialog is planned for a future update.

Actions are disabled on unsaved (new) sessions that haven't had a message sent yet.
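
This sketch shows how a local session might be created and when its actions become available. Several parts are assumptions: the session shape, the `saved` flag, and the idea of marking a session saved once it appears in the refreshed server list. The client uses the `uuid` package. Node's built-in `crypto.randomUUID()` is swapped in here so the sketch has no dependencies.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical local session: exists only in the client until the first
// message auto-registers it in the memory service.
function createLocalSession() {
  return { external_id: randomUUID(), name: null, saved: false };
}

// After the session list refreshes, a session the server reports is saved.
function markSavedFromServerList(session, serverSessions) {
  const known = serverSessions.some((s) => s.external_id === session.external_id);
  return known ? { ...session, saved: true } : session;
}

// Rename and delete are only offered for sessions the server already knows.
function canModifySession(session) {
  return session.saved === true;
}
```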

### Context Menu

Implemented via the `useContextMenu` hook, which tracks `{ x, y, session }` state and
attaches a `window` click listener that dismisses the menu on any outside click. The
menu is rendered outside the sidebar div (via a React fragment) to avoid being
clipped by `overflow: hidden`.
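
This is a framework-free sketch of the same behaviour. `createContextMenu` is a hypothetical name. The `target` parameter stands in for `window`, and any `EventTarget` works, which keeps the sketch runnable in Node 15 or later. The real hook stores the state with React instead of a closure variable.

```javascript
function createContextMenu(target) {
  let state = null; // { x, y, session } while open, null while closed

  function close() {
    state = null;
    target.removeEventListener('click', close);
  }

  function open(x, y, session) {
    state = { x, y, session };
    // EventTarget ignores a second registration of the same listener,
    // so repeated opens never stack click handlers.
    target.addEventListener('click', close);
  }

  return { open, close, getState: () => state };
}
```

In the browser the menu opens from a `contextmenu` event, not a `click`, so the listener added in `open` does not immediately close it.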

@@ -2,7 +2,7 @@

**Package:** `@nexusai/inference-service`
**Location:** `packages/inference-service`
**Deployed on:** Main PC (192.168.0.79)
**Port:** 3001

## Purpose

@@ -15,7 +15,7 @@ to switch inference backends without changes to the rest of the system.

## Dependencies

- `express` — HTTP API
- `ollama` — Ollama client (used by the Ollama provider, kept as fallback)
- `dotenv` — environment variable loading
- `@nexusai/shared` — shared utilities

@@ -24,9 +24,13 @@ to switch inference backends without changes to the rest of the system.

| Variable | Required | Default | Description |
|---|---|---|---|
| PORT | No | 3001 | Port to listen on |
| INFERENCE_PROVIDER | No | llamacpp | Active inference provider (`ollama` or `llamacpp`) |
| INFERENCE_URL | No | http://localhost:8080 | URL of the inference runtime |
| DEFAULT_MODEL | No | local-model | Default model name passed to the provider |

> `INFERENCE_URL` points to `llama-server` directly (port 8080), not to this
> service itself. The orchestration service uses `INFERENCE_SERVICE_URL` to
> reach this service on port 3001.

## Provider Architecture

@@ -39,14 +43,87 @@ signatures, so the rest of the service is unaware of which backend is active.

| Provider | Value | Runtime |
|---|---|---|
| llama.cpp | `llamacpp` | llama.cpp server (OpenAI-compatible API) — **current default** |
| Ollama | `ollama` | Ollama via the `ollama` npm package — available as fallback |

Switching providers requires only a `.env` change — no code modifications needed:

```
INFERENCE_PROVIDER=llamacpp
INFERENCE_URL=http://localhost:8080
```

### Provider Validation

The provider loader validates `INFERENCE_PROVIDER` at startup and throws immediately
if an unknown value is set, which prevents silent misconfiguration:

```
Error: Unknown inference provider: "foo". Valid options: ollama, llamacpp
```
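
This is a minimal sketch of such a loader that reproduces the error message above. The `PROVIDERS` registry and its placeholder objects are hypothetical. In the service, each entry would be a provider module exposing the shared function signatures. An own-property check is used so that inherited names such as `constructor` are also rejected.

```javascript
// Hypothetical registry; real entries would be provider modules.
const PROVIDERS = {
  ollama: { name: 'ollama' },
  llamacpp: { name: 'llamacpp' },
};

// Fails fast on unknown values instead of silently using a default provider.
function loadProvider(name, providers = PROVIDERS) {
  if (!Object.prototype.hasOwnProperty.call(providers, name)) {
    const valid = Object.keys(providers).join(', ');
    throw new Error(`Unknown inference provider: "${name}". Valid options: ${valid}`);
  }
  return providers[name];
}
```

At startup this would be called as `loadProvider(process.env.INFERENCE_PROVIDER || 'llamacpp')`, with `llamacpp` as the documented default.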

## llama.cpp Provider

The llama.cpp provider uses the OpenAI-compatible REST API exposed by `llama-server`.

### Starting llama-server

`llama-server` must be started manually on the main PC before the inference service
can handle requests. It loads a single model at startup:

```powershell
.\llama-gpu\llama-server.exe `
  -m .\models\gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf `
  -ngl 99 `
  --reasoning off `
  --host 0.0.0.0 `
  --port 8080 `
  -c 64000
```

Key flags:

| Flag | Description |
|---|---|
| `-m` | Path to the `.gguf` model file |
| `-ngl 99` | Offload as many layers as possible to GPU |
| `--reasoning off` | Disables thinking/reasoning delay on Gemma 4 models |
| `--host 0.0.0.0` | Allows connections from other machines on the LAN |
| `--port 8080` | Port for the llama-server HTTP API |
| `-c 64000` | Context window size in tokens |

> `-c 64000` is intentionally large. Monitor VRAM usage — if pressure builds,
> reduce this value. The NexusAI memory architecture handles context injection,
> so a smaller window (6–8K) is often sufficient.

### Model Naming

The model name sent in API requests must match the name as reported by
`llama-server` — including the `.gguf` extension. The reported name can be
verified with:

```powershell
Invoke-RestMethod -Uri "http://192.168.0.79:8080/v1/models"
```

Set `DEFAULT_MODEL` in `.env` to the exact reported name:
```
DEFAULT_MODEL=gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf
```
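
The same check can run from Node at startup. This sketch assumes `/v1/models` returns the OpenAI-style list shape `{ data: [{ id: "..." }] }`. `findReportedModel` and `checkDefaultModel` are hypothetical helpers, and the network call needs a Node version with global `fetch` (18 or later).

```javascript
// Extracts model ids from an OpenAI-style list and checks for an exact match.
function findReportedModel(modelsResponse, wanted) {
  const data = modelsResponse && Array.isArray(modelsResponse.data) ? modelsResponse.data : [];
  const ids = data.map((m) => m.id);
  return { found: ids.includes(wanted), ids };
}

// Hypothetical startup check against a running llama-server.
async function checkDefaultModel(baseUrl, wanted) {
  const res = await fetch(`${baseUrl}/v1/models`);
  if (!res.ok) throw new Error(`GET /v1/models failed with status ${res.status}`);
  const { found, ids } = findReportedModel(await res.json(), wanted);
  if (!found) {
    throw new Error(`DEFAULT_MODEL "${wanted}" is not reported by llama-server (got: ${ids.join(', ')})`);
  }
}
```

Because the match is exact, a `DEFAULT_MODEL` that omits the `.gguf` extension would be reported as missing.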

### Inference Parameters

The llamacpp provider maps NexusAI options to fields of the OpenAI-compatible API
(`top_k` and `repeat_penalty` are llama.cpp extensions to the OpenAI schema):

| NexusAI option | API field | Default |
|---|---|---|
| `temperature` | `temperature` | 0.7 |
| `maxTokens` | `max_tokens` | 1024 |
| `topP` | `top_p` | 0.9 |
| `topK` | `top_k` | 40 |
| `repeatPenalty` | `repeat_penalty` | 1.1 |
| `seed` | `seed` | null (random) |
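
This sketch implements the mapping in the table. `toLlamaCppParams` is a hypothetical name. Leaving `seed` out of the request when it is null, so the server picks a random seed, is an assumption. The sketch uses `??` rather than `||` so that valid falsy values such as `temperature: 0` are kept.

```javascript
// Defaults from the table above.
const DEFAULTS = { temperature: 0.7, maxTokens: 1024, topP: 0.9, topK: 40, repeatPenalty: 1.1, seed: null };

function toLlamaCppParams(options = {}) {
  // Missing or undefined options fall back to the defaults.
  const pick = (key) => options[key] ?? DEFAULTS[key];
  const params = {
    temperature: pick('temperature'),
    max_tokens: pick('maxTokens'),
    top_p: pick('topP'),
    top_k: pick('topK'),
    repeat_penalty: pick('repeatPenalty'),
  };
  const seed = pick('seed');
  if (seed !== null) params.seed = seed; // omitted when null (random seed)
  return params;
}
```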

## Internal Structure

```
src/
├── providers/
│   ├── ollama.js # Ollama provider — uses ollama npm package
@@ -55,6 +132,27 @@ src/
│   └── inference.js # /complete and /complete/stream route handlers
├── infer.js # Provider loader — selects and re-exports active provider
└── index.js # Express app + route definitions
```

## Streaming Response Format

The llama.cpp provider yields chunks in this shape:

```js
{ response: "token text", done: false }
// final chunk:
{ response: "", done: true, model: "model-name.gguf", tokenCount: 42 }
```

The inference route re-emits these as SSE events:

```
data: {"response":"token text"}
data: {"done":true,"model":"model-name.gguf","tokenCount":42}
data: [DONE]
```

`model` and `tokenCount` are captured from the llama.cpp `finish_reason: stop`
chunk (`usage.completion_tokens`) and emitted on the done event so the
orchestration layer can forward them to the client.
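
This sketch covers the capture and re-emission steps. It assumes llama-server streams OpenAI-style chat-completion chunks (`choices[0].delta.content` and `finish_reason`, plus `usage` on the final chunk, as described above). `toProviderChunk` and `toSseLines` are hypothetical names. The sketch also treats any non-null `finish_reason` (for example `length`) as the end of the stream, which goes beyond the `stop` case the text mentions.

```javascript
// Converts one parsed OpenAI-style stream chunk into the provider's shape.
function toProviderChunk(chunk, fallbackModel) {
  const choice = (chunk.choices && chunk.choices[0]) || {};
  const text = (choice.delta && choice.delta.content) || '';
  if (choice.finish_reason != null) {
    return {
      response: text,
      done: true,
      model: chunk.model || fallbackModel,
      tokenCount: chunk.usage ? chunk.usage.completion_tokens : undefined,
    };
  }
  return { response: text, done: false };
}

// SSE lines the route would write for one provider chunk; each line would be
// written followed by a blank line to terminate the event.
function toSseLines(providerChunk) {
  const lines = [];
  if (providerChunk.response) {
    lines.push(`data: ${JSON.stringify({ response: providerChunk.response })}`);
  }
  if (providerChunk.done) {
    const { model, tokenCount } = providerChunk;
    lines.push(`data: ${JSON.stringify({ done: true, model, tokenCount })}`, 'data: [DONE]');
  }
  return lines;
}
```

If the final chunk still carries text, `toSseLines` emits that text before the done event, so no tokens are lost.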

## Endpoints

@@ -79,7 +177,7 @@ Request body:
```json
{
  "prompt": "What is the capital of France?",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "temperature": 0.7,
  "maxTokens": 1024
}
```

@@ -93,33 +191,26 @@ Response:
```json
{
  "text": "The capital of France is Paris.",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "done": true,
  "evalCount": 8,
  "promptEvalCount": 41
}
```

| Field | Description |
|---|---|
| `text` | The model's response |
| `model` | Model name as reported by the provider |
| `done` | Whether generation completed normally |
| `evalCount` | Number of tokens generated |
| `promptEvalCount` | Number of tokens in the prompt |

---

**POST /complete/stream**

Same request body as `/complete`.

Response is a stream of Server-Sent Events:
```
data: {"response":"The"}
data: {"response":" capital of France is Paris."}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":8}
data: [DONE]
```

Clients should accumulate `response` fields to build the full response string.
The `done` event carries `model` and `tokenCount` for display in the UI.

@@ -34,7 +34,7 @@ service to generate and store a vector in Qdrant.
```
src/
├── db/
│   ├── index.js # SQLite connection + initialization + migrations
│   └── schema.js # Table definitions, indexes, FTS5, triggers
├── episodic/
│   └── index.js # Session + episode CRUD, FTS search, embedding write path
```

@@ -49,12 +49,29 @@ src/

Five core tables:

- **sessions** — top-level conversation containers, identified by an `external_id` and optional `name`
- **episodes** — individual exchanges (user message + AI response) tied to a session
- **entities** — named things the system learns about (people, places, concepts)
- **relationships** — directional labeled links between entities
- **summaries** — condensed episode groups for efficient context retrieval

### Migrations

Schema changes that cannot be expressed in `CREATE TABLE IF NOT EXISTS` are applied
as migrations in `db/index.js` at startup, wrapped in try/catch to safely ignore
already-applied changes:

```js
try {
  db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`);
} catch {
  // Column already exists — safe to ignore on subsequent startups
}
```

Current migrations:
- `ALTER TABLE sessions ADD COLUMN name TEXT` — adds display name to sessions
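
The catch block above ignores every error, including ones unrelated to an already-applied migration, such as a locked database or an SQL typo. This sketch shows a narrower variant. It assumes the SQLite driver passes SQLite's own message through, which for re-adding an existing column is `duplicate column name: <column>`. `runMigrations` and the `MIGRATIONS` list are hypothetical.

```javascript
const MIGRATIONS = [
  'ALTER TABLE sessions ADD COLUMN name TEXT',
];

// Applies each migration, ignoring only the "already applied" case.
function runMigrations(db, migrations = MIGRATIONS) {
  for (const sql of migrations) {
    try {
      db.exec(sql);
    } catch (err) {
      if (!/duplicate column name/i.test(String(err && err.message))) {
        throw err; // any other failure should stop startup
      }
    }
  }
}
```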

### FTS5 Full-Text Search

An `episodes_fts` virtual table enables keyword search across all episodes.

@@ -144,9 +161,14 @@ Entities and relationships are stored in SQLite with two key constraints:

| Method | Path | Description |
|---|---|---|
| POST | /sessions | Create a new session |
| GET | /sessions | Get paginated list of all sessions |
| GET | /sessions/:id | Get session by internal ID |
| GET | /sessions/by-external/:externalId | Get session by external ID |
| PATCH | /sessions/by-external/:externalId | Update session name |
| DELETE | /sessions/by-external/:externalId | Delete session (cascades to episodes + summaries) |

> Route ordering matters in Express: `by-external/:externalId` must be defined before
> `/:id` to prevent the literal string `by-external` from being captured as an ID parameter.

**POST /sessions body:**
```json
@@ -156,6 +178,20 @@ Entities and relationships are stored in SQLite with two key constraints:
}
```

**PATCH /sessions/by-external/:externalId body:**
```json
{
  "name": "My Renamed Session"
}
```

Returns the updated session object. `name` is required and must be non-empty.

**DELETE /sessions/by-external/:externalId**

Returns `204 No Content` on success. Cascades to delete all associated episodes
and summaries via SQLite `ON DELETE CASCADE`.

### Episodes

| Method | Path | Description |

@@ -14,14 +14,10 @@ or inference services — all traffic flows through orchestration.

## Dependencies

- `express` — HTTP API
- `cors` — cross-origin resource sharing middleware
- `dotenv` — environment variable loading
- `@nexusai/shared` — shared utilities

## Environment Variables

@@ -33,6 +29,7 @@ or inference services — all traffic flows through orchestration.
| INFERENCE_SERVICE_URL | No | http://localhost:3001 | Inference service URL |
| QDRANT_URL | No | http://localhost:6333 | Qdrant URL for semantic search |
| CORS_ORIGIN | No | http://localhost:5173 | Allowed origin for CORS requests |
| MODELS_MANIFEST_PATH | Yes | — | Path to `models.json` manifest file |

## Internal Structure
```
@@ -46,7 +43,8 @@ src/
│   └── index.js # Core pipeline logic — context assembly and coordination
├── routes/
│   ├── chat.js # POST /chat and POST /chat/stream route handlers
│   ├── sessions.js # Session list, history, rename, and delete routes
│   └── models.js # GET /models — reads models.json manifest from disk
└── index.js # Express app entry point
```

@@ -65,7 +63,7 @@ the client.
UUID for new conversations and pass it directly — no pre-creation step needed.

2. **Recent episode retrieval** — fetches the most recent episodes for the session
   (default: 5) from the memory service.

3. **Semantic search** — embeds the user message via the embedding service, then
   queries Qdrant for the top-5 most similar past episodes (score threshold: 0.75).

@@ -89,37 +87,68 @@ the client.
count to the client.

## Prompt Structure
```
[System prompt]

Here are some relevant memories from earlier conversations:
User: {past user message}
Assistant: {past ai response}
... (up to 5 semantic episodes)
Here is the recent conversation history:
---
User: {past user message}
Assistant: {past ai response}
... (up to 5 recent episodes)
--- End of recent memories ---

User: {current message}
Assistant:
```

Semantic episodes appear before recent episodes so the model encounters
long-range relevant context before the immediate conversation flow.
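
This sketch assembles a prompt following the layout above. It assumes episode objects shaped like `{ user_message, ai_response }`; those field names are hypothetical. It also assumes that a section with no episodes is omitted entirely. The orchestration service's actual assembly code may differ.

```javascript
// Renders episodes as consecutive "User:/Assistant:" line pairs.
function formatEpisodes(episodes) {
  return episodes.map((e) => `User: ${e.user_message}\nAssistant: ${e.ai_response}`).join('\n');
}

function buildPrompt({ systemPrompt, semantic = [], recent = [], message }) {
  const lines = [systemPrompt, ''];
  if (semantic.length > 0) {
    // Semantic matches come first: long-range context before recent flow.
    lines.push('Here are some relevant memories from earlier conversations:', formatEpisodes(semantic));
  }
  if (recent.length > 0) {
    lines.push('Here is the recent conversation history:', '---', formatEpisodes(recent), '--- End of recent memories ---');
  }
  if (semantic.length > 0 || recent.length > 0) lines.push('');
  lines.push(`User: ${message}`, 'Assistant:');
  return lines.join('\n');
}
```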

## SSE Stream Format

The inference service emits chunks from the llama.cpp provider in this format:
```
data: {"response":"Hello","done":false}
data: {"response":"!","done":false}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
data: [DONE]
```

The orchestration service re-emits to the client as:
```
data: {"text":"Hello"}
data: {"text":"!"}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
```

The `[DONE]` sentinel from the inference service is consumed internally
and not forwarded. The client stream is terminated by `res.end()` after
the done event. Model name and token count are included on the done event
so the client can display them in the UI.
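
This sketch shows the translation step in isolation. It takes one inference-service `data:` payload and returns the client payload to forward, or `null` when nothing should be sent. `translateInferenceEvent` and `toSse` are hypothetical names.

```javascript
function translateInferenceEvent(data) {
  if (data === '[DONE]') return null; // sentinel is consumed, not forwarded
  const event = JSON.parse(data);
  if (event.done) {
    return { done: true, model: event.model, tokenCount: event.tokenCount };
  }
  if (typeof event.response === 'string' && event.response !== '') {
    return { text: event.response };
  }
  return null;
}

// Formats a client payload as one SSE event (a blank line ends the event).
function toSse(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}
```

In an Express handler, each non-null result would be written with `res.write(toSse(payload))`, and `res.end()` would be called after the done payload has been written.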

## Models Manifest

The `/models` endpoint reads a `models.json` file from disk at the path
specified by `MODELS_MANIFEST_PATH`. The file lives on the main PC alongside
the model files and is accessible to orchestration via a network share
mounted at `/mnt/nexus-models`.

The manifest is read fresh on each request — no restart needed when models
are added or removed.

**models.json format:**
```json
[
  { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
]
```

- `value` — must match the model name as reported by `llama-server` (including `.gguf` extension)
- `label` — display name shown in the UI
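
This sketch reads the manifest on every call. That matches "read fresh on each request": an edited file takes effect without a restart. `readModelsManifest` is a hypothetical name, and the per-entry validation is an assumption that only checks the two documented fields. The synchronous read keeps the sketch short. `fs.promises.readFile` would avoid blocking the event loop.

```javascript
const fs = require('fs');

function readModelsManifest(manifestPath) {
  const models = JSON.parse(fs.readFileSync(manifestPath, 'utf8'));
  if (!Array.isArray(models)) {
    throw new Error('models.json must contain a JSON array');
  }
  for (const m of models) {
    if (!m || typeof m.value !== 'string' || typeof m.label !== 'string') {
      throw new Error('Each model entry needs string "value" and "label" fields');
    }
  }
  return models;
}
```

A route could call `res.json(readModelsManifest(process.env.MODELS_MANIFEST_PATH))` inside a `try` block and answer with status 500 from the `catch` block. That matches the documented error behaviour of `GET /models`.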

## Endpoints

@@ -142,6 +171,14 @@ the `{"done":true}` event.

| Method | Path | Description |
|---|---|---|
| GET | /sessions | Get paginated list of all sessions |
| GET | /sessions/:sessionId/history | Get paginated episode history for a session |
| PATCH | /sessions/:sessionId | Rename a session |
| DELETE | /sessions/:sessionId | Delete a session and all its episodes |

### Models

| Method | Path | Description |
|---|---|---|
| GET | /models | Get list of available models from manifest file |

---

@@ -152,7 +189,7 @@ Request body:
```json
{
  "sessionId": "your-session-uuid",
  "message": "Hello, my name is Tim.",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "temperature": 0.7
}
```

@@ -165,7 +202,7 @@ Response:
```json
{
  "sessionId": "your-session-uuid",
  "response": "Hello Tim! How can I help you today?",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "tokenCount": 87
}
```

@@ -176,23 +213,34 @@ Response:

Same request body as `POST /chat`.

Response is a stream of Server-Sent Events:
```
data: {"text":"Hello"}
data: {"text":" Tim"}
data: {"text":"!"}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87}
```

Clients should read the `text` field from each chunk and accumulate them
to build the full response string. The connection is closed by the server
after the done event.

---

**PATCH /sessions/:sessionId**

Request body:
```json
{ "name": "My Renamed Session" }
```

Returns the updated session object. `name` is required and trimmed of whitespace.
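
This sketch shows the body check described above: `name` must be a string that is non-empty after trimming. `validateRenameBody` and its `{ error }` / `{ name }` return shape are hypothetical and are not taken from the orchestration service's handler.

```javascript
function validateRenameBody(body) {
  const raw = body && body.name;
  if (typeof raw !== 'string') {
    return { error: '`name` is required and must be a string' };
  }
  const name = raw.trim();
  if (name === '') {
    return { error: '`name` must not be empty' };
  }
  return { name };
}
```

A handler would respond with status 400 when `error` is set. Otherwise it would forward the trimmed `name`, presumably to the memory service's rename route.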

---

**DELETE /sessions/:sessionId**

Returns `204 No Content`. Cascades to delete all episodes for the session.

---

**GET /sessions/:sessionId/history**

Returns paginated episode history for a session identified by its external ID.

Query parameters:

| Parameter | Default | Description |

@@ -218,30 +266,17 @@ Response:
```json
}
```

Episodes are ordered newest first.

---

**GET /models**

Returns the parsed contents of `models.json`:

```json
[
  { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
]
```

Returns `500` if the manifest file cannot be read or parsed.

@@ -24,13 +24,40 @@ const DB = getEnv('SQLITE_PATH'); // required — throws if missing

---

### `parseRow(row)`

Parses a SQLite row object, deserialising any JSON-encoded `metadata` fields
into plain objects. Returns `null` if the row is `null` or `undefined`.

```js
const { parseRow } = require('@nexusai/shared');
const session = parseRow(db.prepare('SELECT * FROM sessions WHERE id = ?').get(id));
```
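
This sketch reproduces the behaviour described above. It is not the package's source and is named `parseRowSketch` to keep it distinct from the real export. It assumes the JSON-encoded column is called `metadata`, and leaving malformed JSON as the raw string is also an assumption.

```javascript
function parseRowSketch(row) {
  if (row === null || row === undefined) return null;
  const parsed = { ...row }; // do not mutate the driver's row object
  if (typeof parsed.metadata === 'string') {
    try {
      parsed.metadata = JSON.parse(parsed.metadata);
    } catch {
      // not valid JSON: keep the original string
    }
  }
  return parsed;
}
```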

---

### `formatEpisodeText(userMessage, aiResponse)`

Combines a user message and AI response into the canonical text format used
for embedding:

```
User: {userMessage}
Assistant: {aiResponse}
```

Used by the memory service's embedding write path to ensure consistent
vector representations across all episodes.

---

### Constants

Tuneable values and shared identifiers are centralised in `constants.js`
rather than hardcoded across services. Import the relevant group by name.

```js
const { QDRANT, COLLECTIONS, EPISODIC, LLAMACPP } = require('@nexusai/shared');
```
|
||||
|
||||
#### `QDRANT`
|
||||
@@ -40,15 +67,14 @@ embedding model and Qdrant collection setup.
|
||||
|
||||
| Key | Value | Description |
|
||||
|---|---|---|
|
||||
| `DEFAULT_URL` | `http://localhost:6333` | Fallback Qdrant URL if `QDRANT_URL` env var is not set |
|
||||
| `DEFAULT_URL` | `http://localhost:6333` | Fallback Qdrant URL |
|
||||
| `VECTOR_SIZE` | `768` | Output dimensions of `nomic-embed-text` |
|
||||
| `DISTANCE_METRIC` | `'Cosine'` | Similarity metric used for all collections |
|
||||
| `DEFAULT_LIMIT` | `10` | Default top-k for vector searches |
|
||||
|
#### `COLLECTIONS`

Canonical Qdrant collection names, used by both the semantic layer and any
service that constructs Qdrant queries directly.

| Key | Value |
|---|---|
@@ -65,6 +91,8 @@ Default pagination and result limits for SQLite episode queries.
| `DEFAULT_RECENT_LIMIT` | `10` | Default number of recent episodes to retrieve |
| `DEFAULT_PAGE_SIZE` | `20` | Default episodes per page for paginated queries |
| `DEFAULT_SEARCH_LIMIT` | `10` | Default number of full-text search (FTS) results to return |
| `DEFAULT_OFFSET` | `0` | Default pagination offset |
| `DEFAULT_SESSIONS_LIMIT` | `20` | Default number of sessions to return |
|
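These episode-query defaults would typically be applied when a caller omits paging parameters. The sketch below is an assumption about how that could look, not code from the memory service: `paginationParams` and the query-string field names `limit` and `offset` are hypothetical, and only the two values it needs are mirrored locally.

```javascript
// Local mirror of two documented defaults (in the services these come from
// '@nexusai/shared').
const DEFAULT_PAGE_SIZE = 20;
const DEFAULT_OFFSET = 0;

// Parse a value as a non-negative integer, otherwise use the fallback.
function toNonNegativeInt(value, fallback) {
  const n = Number.parseInt(value, 10);
  return Number.isInteger(n) && n >= 0 ? n : fallback;
}

// Hypothetical helper: turn optional query-string values into LIMIT/OFFSET.
function paginationParams(query = {}) {
  return {
    limit: toNonNegativeInt(query.limit, DEFAULT_PAGE_SIZE),
    offset: toNonNegativeInt(query.offset, DEFAULT_OFFSET),
  };
}

// The pair would be bound to a query such as
//   SELECT * FROM episodes ORDER BY created_at DESC LIMIT ? OFFSET ?
// (table and column names are illustrative only).
console.log(paginationParams({ limit: '50' })); // { limit: 50, offset: 0 }
console.log(paginationParams({ offset: '-5' })); // { limit: 20, offset: 0 }
```

Falling back on bad input keeps the endpoint forgiving; a stricter service could answer `400` instead.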
#### `SERVICES`

@@ -74,3 +102,75 @@ when the corresponding environment variable is not set.
| Key | Value | Description |
|---|---|---|
| `EMBEDDING_URL` | `http://localhost:3003` | Fallback embedding service URL |
| `MEMORY_URL` | `http://localhost:3002` | Fallback memory service URL |
| `INFERENCE_URL` | `http://localhost:3001` | Fallback inference service URL |

#### `PORTS`

Default port numbers for each service.

| Key | Value |
|---|---|
| `INFERENCE` | `'3001'` |
| `MEMORY` | `'3002'` |
| `EMBEDDING` | `'3003'` |
| `ORCHESTRATION` | `'4000'` |
|
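The `SERVICES` fallback URLs and the `PORTS` values describe the same local layout, so a small consistency check can catch drift between the two tables. The sketch mirrors both tables locally (the real objects live in `@nexusai/shared`). It also illustrates why string ports fit environment-based resolution: environment variables are always strings. The `PORT` variable name and both helper functions are assumptions for illustration.

```javascript
// Local mirrors of the documented SERVICES and PORTS tables.
const SERVICES = {
  EMBEDDING_URL: 'http://localhost:3003',
  MEMORY_URL: 'http://localhost:3002',
  INFERENCE_URL: 'http://localhost:3001',
};
const PORTS = { INFERENCE: '3001', MEMORY: '3002', EMBEDDING: '3003', ORCHESTRATION: '4000' };

// Each fallback URL should point at the matching default port.
// URL#port is a string, so it compares directly with the string PORTS values.
function portsMatch() {
  return (
    new URL(SERVICES.INFERENCE_URL).port === PORTS.INFERENCE &&
    new URL(SERVICES.MEMORY_URL).port === PORTS.MEMORY &&
    new URL(SERVICES.EMBEDDING_URL).port === PORTS.EMBEDDING
  );
}

// Hypothetical listen-port resolution: the env value and the fallback are both
// strings, so convert once before handing the port to a server.
function listenPort(env = process.env, fallback = PORTS.ORCHESTRATION) {
  return Number(env.PORT || fallback);
}

console.log(portsMatch()); // true
console.log(listenPort({})); // 4000
```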
#### `OLLAMA`
|
Ollama runtime defaults, used by the Ollama inference provider.
|
| Key | Value | Description |
|---|---|---|
| `DEFAULT_URL` | `http://localhost:11434` | Fallback Ollama URL |
| `EMBED_MODEL` | `'nomic-embed-text'` | Default embedding model |
| `OLLAMA_MODEL` | `'companion:latest'` | Default chat model |
|
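As an illustration of how these defaults fit together, the sketch below builds (but does not send) an embedding request for Ollama. It assumes Ollama's `/api/embed` endpoint, which accepts `model` and `input`. The provider in this repository may call a different endpoint, and the `embedRequest` helper and `OLLAMA_URL` variable are hypothetical.

```javascript
// Local mirror of the documented OLLAMA defaults.
const OLLAMA = {
  DEFAULT_URL: 'http://localhost:11434',
  EMBED_MODEL: 'nomic-embed-text',
  OLLAMA_MODEL: 'companion:latest',
};

// Hypothetical helper: describe a POST to Ollama's /api/embed endpoint.
// OLLAMA_URL is an assumed env var name.
function embedRequest(text, env = process.env) {
  const baseUrl = env.OLLAMA_URL || OLLAMA.DEFAULT_URL;
  return {
    url: `${baseUrl}/api/embed`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model: OLLAMA.EMBED_MODEL, input: text }),
    },
  };
}

const embedReq = embedRequest('User: hi\nAssistant: hello', {});
console.log(embedReq.url); // http://localhost:11434/api/embed
// To send it (Node 18+): const res = await fetch(embedReq.url, embedReq.init);
```

Because `EMBED_MODEL` determines the vector dimensions, it has to stay in step with `QDRANT.VECTOR_SIZE`.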
#### `LLAMACPP`
|
llama.cpp runtime defaults, used by the llama.cpp inference provider.
|
| Key | Value | Description |
|---|---|---|
| `DEFAULT_URL` | `http://localhost:8080` | Fallback llama-server URL |
| `DEFAULT_MODEL` | `'local-model'` | Fallback model name (override via `DEFAULT_MODEL` env var) |
|
> Always set `DEFAULT_MODEL` in the inference service `.env` to the exact model
> name reported by `llama-server` (including the `.gguf` extension). The shared
> constant is a last-resort fallback only.
|
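The advice above can be backed up in code by warning whenever the fallback is used. This is a sketch under stated assumptions. `resolveModel` and `chatRequest` are hypothetical helpers, and `LLAMACPP_URL` is an invented variable name. The request targets llama-server's OpenAI-compatible `/v1/chat/completions` endpoint, which may not be the route the provider actually uses.

```javascript
// Local mirror of the documented LLAMACPP defaults.
const LLAMACPP = { DEFAULT_URL: 'http://localhost:8080', DEFAULT_MODEL: 'local-model' };

// Prefer DEFAULT_MODEL from the environment; warn loudly when falling back.
function resolveModel(env = process.env) {
  if (env.DEFAULT_MODEL) return env.DEFAULT_MODEL;
  console.warn(`DEFAULT_MODEL is not set; using fallback "${LLAMACPP.DEFAULT_MODEL}"`);
  return LLAMACPP.DEFAULT_MODEL;
}

// Build (but do not send) an OpenAI-style chat request for llama-server.
// LLAMACPP_URL is an assumed env var name.
function chatRequest(messages, env = process.env) {
  const baseUrl = env.LLAMACPP_URL || LLAMACPP.DEFAULT_URL;
  return {
    url: `${baseUrl}/v1/chat/completions`,
    body: { model: resolveModel(env), messages },
  };
}

// 'example-model.gguf' is a placeholder for the name llama-server reports.
const chatReq = chatRequest([{ role: 'user', content: 'Hello' }], { DEFAULT_MODEL: 'example-model.gguf' });
console.log(chatReq.body.model); // example-model.gguf
```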
#### `INFERENCE_DEFAULTS`

Default inference parameters applied when not specified in a request.
|
| Key | Value | Description |
|---|---|---|
| `TEMPERATURE` | `0.7` | Sampling randomness (lower = more deterministic, higher = more varied) |
| `MAX_TOKENS` | `1024` | Maximum number of tokens to generate per response |
| `TOP_P` | `0.9` | Nucleus sampling: sample from the smallest token set whose cumulative probability reaches this value |
| `TOP_K` | `40` | Sample only from the K most likely tokens at each step |
| `REPEAT_PENALTY` | `1.1` | Penalty applied to recently generated tokens (`1.0` = no penalty) |
| `SEED` | `null` | `null` = random seed; set an integer for reproducible outputs |
|
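A sketch of applying these defaults to an incoming request. It uses `??` rather than `||` so that legitimate falsy values such as `temperature: 0` survive. The lower-case request field names and the `samplingParams` helper are assumptions for illustration. So is the choice to omit `seed` entirely when it is `null`, leaving the backend to pick a random one.

```javascript
// Local mirror of the documented INFERENCE_DEFAULTS.
const INFERENCE_DEFAULTS = {
  TEMPERATURE: 0.7,
  MAX_TOKENS: 1024,
  TOP_P: 0.9,
  TOP_K: 40,
  REPEAT_PENALTY: 1.1,
  SEED: null,
};

// Hypothetical helper: fill in any parameter the request leaves undefined or null.
// `??` keeps explicit falsy values such as temperature: 0.
function samplingParams(request = {}) {
  const params = {
    temperature: request.temperature ?? INFERENCE_DEFAULTS.TEMPERATURE,
    max_tokens: request.max_tokens ?? INFERENCE_DEFAULTS.MAX_TOKENS,
    top_p: request.top_p ?? INFERENCE_DEFAULTS.TOP_P,
    top_k: request.top_k ?? INFERENCE_DEFAULTS.TOP_K,
    repeat_penalty: request.repeat_penalty ?? INFERENCE_DEFAULTS.REPEAT_PENALTY,
  };
  // A null seed means "random": leave the field out (assumed backend behaviour).
  const seed = request.seed ?? INFERENCE_DEFAULTS.SEED;
  if (seed !== null) params.seed = seed;
  return params;
}

console.log(samplingParams({ temperature: 0 }).temperature); // 0
console.log('seed' in samplingParams()); // false
```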
#### `ORCHESTRATION`

Orchestration pipeline defaults.

| Key | Value | Description |
|---|---|---|
| `RECENT_EPISODE_LIMIT` | `5` | Recent episodes to inject into prompt |
| `SEMANTIC_LIMIT` | `5` | Semantic search results to inject into prompt |
| `SCORE_THRESHOLD` | `0.75` | Minimum similarity score for semantic results |
| `CORS_ORIGIN` | `'http://localhost:5173'` | Fallback allowed CORS origin |
| `SYSTEM_PROMPT` | *(see below)* | Default system prompt |

Default system prompt:
> "You are a helpful, context-aware AI assistant. You have access to memories
> of past conversations with the user. Use them to provide consistent,
> personalised responses."
|
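The retrieval limits above combine roughly as follows. Semantic hits at or above `SCORE_THRESHOLD` are kept, the best `SEMANTIC_LIMIT` of those are taken, and the newest `RECENT_EPISODE_LIMIT` episodes are taken. This is a hypothetical sketch of that selection step, and the result shapes `{ score, text }` and `{ createdAt, text }` are invented. Qdrant can also apply the threshold server-side through its `score_threshold` search parameter.

```javascript
// Local mirror of the relevant ORCHESTRATION defaults.
const ORCHESTRATION = { RECENT_EPISODE_LIMIT: 5, SEMANTIC_LIMIT: 5, SCORE_THRESHOLD: 0.75 };

// Keep hits that meet the minimum score, best first, capped at SEMANTIC_LIMIT.
// filter() returns a new array, so sorting it does not reorder the input.
function selectSemantic(hits) {
  return hits
    .filter((hit) => hit.score >= ORCHESTRATION.SCORE_THRESHOLD)
    .sort((a, b) => b.score - a.score)
    .slice(0, ORCHESTRATION.SEMANTIC_LIMIT);
}

// Newest episodes first (createdAt assumed numeric), capped at RECENT_EPISODE_LIMIT.
function selectRecent(episodes) {
  return [...episodes]
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, ORCHESTRATION.RECENT_EPISODE_LIMIT);
}

const hits = [
  { score: 0.91, text: 'prefers dark mode' },
  { score: 0.62, text: 'asked about weather' },
  { score: 0.78, text: 'uses llama.cpp' },
];
console.log(selectSemantic(hits).map((h) => h.text)); // [ 'prefers dark mode', 'uses llama.cpp' ]
```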
#### `SQLITE`
|
| Key | Value | Description |
|---|---|---|
| `DEFAULT_PATH` | `'./data/nexusai.db'` | Fallback SQLite database path |