updated documentation

This commit is contained in:
Storme-bit
2026-04-13 03:42:14 -07:00
parent 5f024093d1
commit 045da0d7f4
5 changed files with 464 additions and 112 deletions

@@ -27,33 +27,46 @@ npm run dev # local dev server on port 5173
Vite bakes environment variables into the bundle at build time. The `.env`
file is only needed on the machine running the build, not where files are served.
After building, copy `dist/` contents to `/srv/nexusai` on Mini PC 2 for Caddy to serve.
## Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| VITE_ORCHESTRATION_URL | No | `''` (empty) | Orchestration base URL. Empty string uses Vite proxy in dev, Caddy proxy in production. |
| VITE_ORCHESTRATION_URL | No | `''` (empty) | Orchestration base URL. Must be set to the HTTPS domain in production to avoid mixed content errors. |
Production value:
```
VITE_ORCHESTRATION_URL=https://nexus.jellystorm.com
```
## Internal Structure
```
src/
├── api/
│ └── orchestration.js # All fetch calls to the orchestration service
├── config/
│ └── constants.js # FALLBACK_MODELS, DEFAULT_MODEL, API_DEFAULTS
├── hooks/
│ ├── useSession.js # Session list, history loading, active session state
│ ├── useChat.js # Message sending, SSE streaming, message state
│ ├── useModels.js # Dynamic model list fetched from /models endpoint
│ └── useContextMenu.js # Right-click context menu position and visibility
├── components/
│ ├── App.jsx # Root component — layout and shared state
│ ├── SessionList.jsx # Left sidebar — session list and new chat button
│ ├── SessionList.jsx # Left sidebar — session list, rename, delete
│ ├── ChatWindow.jsx # Centre panel — message thread and input bar
│ ├── MessageBubble.jsx # Individual message bubble (user or assistant)
│ └── InfoPanel.jsx # Right panel — model selector and session metadata
├── index.css # Global reset and CSS variables
│ ├── InfoPanel.jsx # Right panel — model selector and session metadata
│ └── SessionModal.jsx # Modal dialog for session settings (rename)
├── index.css # Global reset, CSS variables, utility classes
└── main.jsx # React entry point
```
## Layout
Three-panel layout with collapsible sidebars:
```
┌─────────────────┬──────────────────────────┬─────────────┐
│ Session List │ Chat Window │ Info Panel │
│ (collapsible) │ │ (collapsible)│
@@ -64,9 +77,54 @@ Three-panel layout with collapsible sidebars:
│ Session 2 │ │ │
│ │ [input bar] │ │
└─────────────────┴──────────────────────────┴─────────────┘
```
On mobile, sidebars collapse to a 56px icon rail. The centre chat window
always fills the remaining space.
Sidebars collapse to a 56px icon rail. The centre chat window always
fills the remaining space.
## CSS Architecture
Styles follow a hybrid approach — CSS utility classes for static reusable
rules, inline styles for dynamic prop-driven values.
### CSS Variables (`:root`)
| Variable | Value | Description |
|---|---|---|
| `--bg-base` | `#0f1117` | Page background |
| `--bg-surface` | `#1a1d27` | Panel backgrounds |
| `--bg-elevated` | `#222536` | Elevated elements (inputs, cards) |
| `--border` | `#2e3150` | Border colour |
| `--accent` | `#6c63ff` | Primary accent (buttons, highlights) |
| `--accent-hover` | `#574fd6` | Accent hover state |
| `--text-primary` | `#e8e8f0` | Primary text |
| `--text-secondary` | `#8b8fa8` | Secondary text |
| `--text-muted` | `#555870` | Muted / placeholder text |
| `--bubble-user` | `#6c63ff` | User message bubble background |
| `--bubble-ai` | `#222536` | AI message bubble background |
| `--sidebar-width` | `280px` | Expanded sidebar width |
| `--panel-width` | `260px` | Expanded info panel width |
| `--header-height` | `56px` | Shared header height across all panels |
| `--radius-sm` | `6px` | Small border radius |
| `--radius-md` | `8px` | Medium border radius |
| `--radius-lg` | `12px` | Large border radius |
### Utility Classes
| Class | Description |
|---|---|
| `.panel-header` | Shared header row — used in all three panels |
| `.btn-reset` | Resets button styles (no border, bg, cursor pointer) |
| `.btn-icon` | Icon button with hover state |
| `.btn-primary` | Accent-coloured action button with `:hover` and `:disabled` states |
| `.flex` / `.flex-col` | Flex layout helpers |
| `.flex-1` / `.flex-shrink` | Flex sizing helpers |
| `.items-center` / `.justify-center` / `.justify-between` | Alignment helpers |
| `.overflow-hidden` / `.scroll-y` | Overflow helpers |
| `.text-xs` / `.text-sm` / `.text-base` | Font size helpers |
| `.text-muted` / `.text-secondary` / `.text-accent` | Colour helpers |
| `.label-upper` | Uppercase section label style |
| `.truncate` | Text overflow ellipsis |
## API Layer
@@ -78,39 +136,71 @@ All orchestration calls are centralised in `src/api/orchestration.js`:
| `fetchSessionHistory` | GET | /sessions/:id/history | Load episode history on session select |
| `sendMessage` | POST | /chat | Send message, await full response |
| `streamMessage` | POST | /chat/stream | Send message, receive SSE token stream |
| `fetchModels` | GET | /models | Load available models from manifest |
| `renameSession` | PATCH | /sessions/:id | Rename a session |
| `deleteSession` | DELETE | /sessions/:id | Delete a session |
`streamMessage` returns an abort function — call it to cancel a stream mid-flight.
It uses a buffer pattern to handle SSE chunks that may span multiple network packets.
Uses a buffer pattern to handle SSE chunks that may span multiple network packets.
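The buffer pattern can be sketched as follows. This is a minimal illustration, not the code in `orchestration.js`: `createSseParser` is a hypothetical name, and the sketch assumes every `data:` payload on the client stream is JSON (the `[DONE]` sentinel is consumed by orchestration and never reaches the client).

```javascript
// Hypothetical helper: accumulates raw text chunks and returns the complete
// SSE payloads found so far. Incomplete trailing text stays in the buffer.
function createSseParser() {
  let buffer = '';
  return function push(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    // The last element is '' when the chunk ended on a newline,
    // otherwise it is a partial line that must wait for more data.
    buffer = lines.pop();
    const payloads = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith('data:')) continue; // blank separators, comments
      payloads.push(JSON.parse(trimmed.slice(5).trim()));
    }
    return payloads;
  };
}
```

Each call returns only the events completed by that chunk, so an event split across two reads is emitted once, on the second read.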
## Streaming
The chat input sends messages via `POST /chat/stream`. Tokens arrive as SSE events:
```
data: {"text":"Hello"}
data: {"text":" Tim"}
data: {"done":true}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87}
```
An empty assistant bubble is appended immediately when the stream opens, then
updated token by token using `updateLastMessage`. The blinking cursor in
`MessageBubble` is shown while `message.streaming === true` and disappears
when `done` is received.
when the done event is received. Model name and token count from the done
event are stored in `useChat` state and displayed in the InfoPanel.
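A rough sketch of how the streamed events might be folded into chat state. `applyStreamEvent` and the `{ messages, meta }` state shape are assumptions for illustration; the real `useChat` hook may organise its state differently.

```javascript
// Hypothetical pure reducer: applies one client-side SSE event to chat state.
// Token events append text to the last (assistant) message; the done event
// clears the streaming flag and records model name and token count.
function applyStreamEvent(state, event) {
  const last = state.messages[state.messages.length - 1];
  const rest = state.messages.slice(0, -1);
  if (event.done) {
    return {
      messages: [...rest, { ...last, streaming: false }],
      meta: { model: event.model ?? null, tokenCount: event.tokenCount ?? null },
    };
  }
  return {
    ...state,
    messages: [...rest, { ...last, content: last.content + event.text }],
  };
}
```

Keeping the update pure makes the token-by-token behaviour easy to test without rendering any component.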
## Model Selector
## Dynamic Model Selector
Available models are defined in `InfoPanel.jsx`:
Available models are fetched from `GET /models` on mount via the `useModels` hook.
The hook initialises with `FALLBACK_MODELS` from `constants.js` and replaces them
with the server response on success. If the fetch fails, the fallback list is kept
and no error is shown in the UI; a warning is logged to the console.
| Label | Value |
|---|---|
| Companion | `companion:latest` |
| Mistral Nemo | `mistral-nemo:latest` |
| Coder | `coder:latest` |
| Qwen 2.5 Coder 14B | `qwen2.5-coder:14b` |
```js
// constants.js
export const FALLBACK_MODELS = [
{ value: 'companion:latest', label: 'Companion' },
// ...
];
```
The selected model is passed with every chat request. To add a new model,
update the `MODELS` array in `InfoPanel.jsx`.
The selected model is passed with every chat request. To add a model, update
`models.json` on the main PC — no client rebuild needed.
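The fallback behaviour described in this section could look roughly like this outside React. `loadModels` and its injected `fetchImpl` parameter are hypothetical names, used so the sketch runs without a network; the actual `useModels` hook is not shown in this document.

```javascript
// Single-entry fallback list, mirroring the shape used in constants.js.
const FALLBACK_MODELS = [{ value: 'companion:latest', label: 'Companion' }];

// Hypothetical loader: returns the server list on success, the fallback otherwise.
async function loadModels(fetchImpl, baseUrl = '') {
  try {
    const res = await fetchImpl(`${baseUrl}/models`);
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const models = await res.json();
    if (!Array.isArray(models)) throw new Error('unexpected response shape');
    return models;
  } catch (err) {
    // No error reaches the UI; the failure is only logged.
    console.warn('Model fetch failed, using fallback list:', err.message);
    return FALLBACK_MODELS;
  }
}
```

In the browser, `fetchImpl` would simply be the global `fetch`.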
## Session Management
Sessions are identified by a `external_id` — a human-readable string or UUID
generated client-side. New sessions are created locally with `uuid` and auto-registered
in the memory service on the first message. The session list refreshes after each
completed response to surface newly created sessions.
Sessions are identified by `external_id` — a UUID generated client-side via the
`uuid` package. New sessions are created locally and auto-registered in the memory
service on the first message. The session list refreshes after each completed
response to surface newly created sessions.
### Session Actions
The session list supports rename and delete:
- **Hover** — reveals ✎ (rename) and ✕ (delete) icon buttons on the session row
- **Right-click** — opens a context menu with the same actions
Rename opens a `SessionModal` dialog. The modal is designed to expand into a full
session settings panel in future — the title is already "Session Settings" to
reflect this intent.
Delete is immediate; a confirmation dialog is planned for a future update.
Actions are disabled on unsaved (new) sessions that haven't had a message sent yet.
### Context Menu
The context menu is implemented via the `useContextMenu` hook, which tracks
`{ x, y, session }` state and attaches a `window` click listener that dismisses
the menu on any outside click. The menu is rendered outside the sidebar div
(via a React fragment) so that it is not clipped by `overflow: hidden`.

@@ -2,7 +2,7 @@
**Package:** `@nexusai/inference-service`
**Location:** `packages/inference-service`
**Deployed on:** Main PC
**Deployed on:** Main PC (192.168.0.79)
**Port:** 3001
## Purpose
@@ -15,7 +15,7 @@ to switch inference backends without changes to the rest of the system.
## Dependencies
- `express` — HTTP API
- `ollama` — Ollama client (used by the Ollama provider)
- `ollama` — Ollama client (used by the Ollama provider, kept as fallback)
- `dotenv` — environment variable loading
- `@nexusai/shared` — shared utilities
@@ -24,9 +24,13 @@ to switch inference backends without changes to the rest of the system.
| Variable | Required | Default | Description |
|---|---|---|---|
| PORT | No | 3001 | Port to listen on |
| INFERENCE_PROVIDER | No | ollama | Active inference provider (ollama, llamacpp) |
| INFERENCE_URL | No | http://localhost:11434 | URL of the inference runtime |
| DEFAULT_MODEL | No | llama3.2 | Default model name passed to the provider |
| INFERENCE_PROVIDER | No | llamacpp | Active inference provider (`ollama` or `llamacpp`) |
| INFERENCE_URL | No | http://localhost:8080 | URL of the inference runtime |
| DEFAULT_MODEL | No | local-model | Default model name passed to the provider |
> `INFERENCE_URL` points to `llama-server` directly (port 8080), not to this
> service itself. The orchestration service uses `INFERENCE_SERVICE_URL` to
> reach this service on port 3001.
## Provider Architecture
@@ -39,14 +43,87 @@ signatures, so the rest of the service is unaware of which backend is active.
| Provider | Value | Runtime |
|---|---|---|
| Ollama | `ollama` | Ollama via the `ollama` npm package |
| llama.cpp | `llamacpp` | llama.cpp server (OpenAI-compatible API) |
| llama.cpp | `llamacpp` | llama.cpp server (OpenAI-compatible API) — **current default** |
| Ollama | `ollama` | Ollama via the `ollama` npm package — available as fallback |
Switching providers requires only a `.env` change — no code modifications needed.
Switching providers requires only a `.env` change — no code modifications needed:
```
INFERENCE_PROVIDER=llamacpp
INFERENCE_URL=http://localhost:8080
```
### Provider Validation
The provider loader validates `INFERENCE_PROVIDER` at startup and throws immediately
if an unknown value is set — prevents silent misconfiguration:
```
Error: Unknown inference provider: "foo". Valid options: ollama, llamacpp
```
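A minimal sketch of this kind of startup check. `resolveProvider` and `VALID_PROVIDERS` are hypothetical names; the real loader in `infer.js` is not reproduced here, only the error message format shown above.

```javascript
// Provider values accepted by the service, as listed in the provider table.
const VALID_PROVIDERS = ['ollama', 'llamacpp'];

// Hypothetical validator: returns the provider name or throws on unknown values.
// The default mirrors the documented INFERENCE_PROVIDER default.
function resolveProvider(name = 'llamacpp') {
  if (!VALID_PROVIDERS.includes(name)) {
    throw new Error(
      `Unknown inference provider: "${name}". Valid options: ${VALID_PROVIDERS.join(', ')}`
    );
  }
  return name;
}
```

Calling it once at module load, for example with `process.env.INFERENCE_PROVIDER`, makes a typo in `.env` fail at boot rather than on the first request.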
## llama.cpp Provider
The llama.cpp provider uses the OpenAI-compatible REST API exposed by `llama-server`.
### Starting llama-server
`llama-server` must be started manually on the main PC before the inference service
can handle requests. It loads a single model at startup:
```powershell
.\llama-gpu\llama-server.exe `
-m .\models\gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf `
-ngl 99 `
--reasoning off `
--host 0.0.0.0 `
--port 8080 `
-c 64000
```
Key flags:
| Flag | Description |
|---|---|
| `-m` | Path to the `.gguf` model file |
| `-ngl 99` | Offload as many layers as possible to GPU |
| `--reasoning off` | Disables thinking/reasoning delay on Gemma 4 models |
| `--host 0.0.0.0` | Allows connections from other machines on the LAN |
| `--port 8080` | Port for the llama-server HTTP API |
| `-c 64000` | Context window size in tokens |
> `-c 64000` is intentionally large. Monitor VRAM usage — if pressure builds,
> reduce this value. The NexusAI memory architecture handles context injection
> so a smaller window (6-8K) is often sufficient.
### Model Naming
The model name sent in API requests must match the name as reported by
`llama-server` — including the `.gguf` extension. The reported name can be
verified with:
```powershell
Invoke-RestMethod -Uri "http://192.168.0.79:8080/v1/models"
```
Set `DEFAULT_MODEL` in `.env` to the exact reported name:
```
DEFAULT_MODEL=gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf
```
### Inference Parameters
The llamacpp provider maps NexusAI options to OpenAI-compatible fields:
| NexusAI option | API field | Default |
|---|---|---|
| `temperature` | `temperature` | 0.7 |
| `maxTokens` | `max_tokens` | 1024 |
| `topP` | `top_p` | 0.9 |
| `topK` | `top_k` | 40 |
| `repeatPenalty` | `repeat_penalty` | 1.1 |
| `seed` | `seed` | null (random) |
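The mapping in the table could be expressed along these lines. `toLlamaCppBody` is a hypothetical helper, and leaving `seed` out of the request when it is `null` is an assumption about how "random" is signalled.

```javascript
// Defaults taken from the table above.
const DEFAULTS = {
  temperature: 0.7,
  maxTokens: 1024,
  topP: 0.9,
  topK: 40,
  repeatPenalty: 1.1,
  seed: null,
};

// Hypothetical mapper from NexusAI option names to OpenAI-compatible fields.
function toLlamaCppBody(options = {}) {
  const merged = { ...DEFAULTS };
  for (const [key, value] of Object.entries(options)) {
    if (value !== undefined) merged[key] = value; // undefined keeps the default
  }
  const body = {
    temperature: merged.temperature,
    max_tokens: merged.maxTokens,
    top_p: merged.topP,
    top_k: merged.topK,
    repeat_penalty: merged.repeatPenalty,
  };
  if (merged.seed !== null) body.seed = merged.seed; // assumption: omit for random
  return body;
}
```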
## Internal Structure
```
src/
├── providers/
│ ├── ollama.js # Ollama provider — uses ollama npm package
@@ -55,6 +132,27 @@ src/
│ └── inference.js # /complete and /complete/stream route handlers
├── infer.js # Provider loader — selects and re-exports active provider
└── index.js # Express app + route definitions
```
## Streaming Response Format
The llama.cpp provider yields chunks in this shape:
```js
{ response: "token text", done: false }
// final chunk:
{ response: '', done: true, model: "model-name.gguf", tokenCount: 42 }
```
The inference route re-emits these as SSE events:
```
data: {"response":"token text"}
data: {"done":true,"model":"model-name.gguf","tokenCount":42}
data: [DONE]
```
`model` and `tokenCount` are captured from the llama.cpp `finish_reason: stop`
chunk (`usage.completion_tokens`) and emitted on the done event so the
orchestration layer can forward them to the client.
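The re-emission step can be sketched like this. `toSseEvent` and `toSseStream` are hypothetical names, and the sketch takes an in-memory array of chunks, whereas the real route presumably consumes an async stream from the provider.

```javascript
// Hypothetical serializer: turns one provider chunk into an SSE event string.
function toSseEvent(chunk) {
  const payload = chunk.done
    ? { done: true, model: chunk.model, tokenCount: chunk.tokenCount }
    : { response: chunk.response };
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Emits one event per chunk, then the closing [DONE] sentinel.
function* toSseStream(chunks) {
  for (const chunk of chunks) yield toSseEvent(chunk);
  yield 'data: [DONE]\n\n';
}
```

In an Express handler, each yielded string would be passed to `res.write()` before the response is ended.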
## Endpoints
@@ -79,7 +177,7 @@ Request body:
```json
{
"prompt": "What is the capital of France?",
"model": "companion:latest",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"temperature": 0.7,
"maxTokens": 1024
}
@@ -93,33 +191,26 @@ Response:
```json
{
"text": "The capital of France is Paris.",
"model": "companion:latest",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"done": true,
"evalCount": 8,
"promptEvalCount": 41
}
```
| Field | Description |
|---|---|
| `text` | The model's response |
| `model` | Model name as reported by the provider |
| `done` | Whether generation completed normally |
| `evalCount` | Number of tokens generated |
| `promptEvalCount` | Number of tokens in the prompt |
---
**POST /complete/stream**
Same request body as `/complete` (`maxTokens` not applicable for streaming).
Same request body as `/complete`.
Response is a stream of Server-Sent Events. Each event contains a partial
response chunk as JSON. The stream closes with a final `data: [DONE]` event.
data: {"model":"companion:latest","response":"The","done":false}
data: {"model":"companion:latest","response":" capital","done":false}
data: {"model":"companion:latest","response":" of France is Paris.","done":false}
Response is a stream of Server-Sent Events:
```
data: {"response":"The"}
data: {"response":" capital of France is Paris."}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":8}
data: [DONE]
```
Clients should read the `response` field from each chunk and accumulate
them to build the full response string.
Clients should accumulate `response` fields to build the full response string.
The `done` event carries `model` and `tokenCount` for display in the UI.

@@ -34,7 +34,7 @@ service to generate and store a vector in Qdrant.
```
src/
├── db/
│ ├── index.js # SQLite connection + initialization
│ ├── index.js # SQLite connection + initialization + migrations
│ └── schema.js # Table definitions, indexes, FTS5, triggers
├── episodic/
│ └── index.js # Session + episode CRUD, FTS search, embedding write path
@@ -49,12 +49,29 @@ src/
Five core tables:
- **sessions** — top-level conversation containers, identified by an `external_id`
- **sessions** — top-level conversation containers, identified by an `external_id` and optional `name`
- **episodes** — individual exchanges (user message + AI response) tied to a session
- **entities** — named things the system learns about (people, places, concepts)
- **relationships** — directional labeled links between entities
- **summaries** — condensed episode groups for efficient context retrieval
### Migrations
Schema changes that cannot be expressed in `CREATE TABLE IF NOT EXISTS` are applied
as migrations in `db/index.js` at startup, wrapped in try/catch to safely ignore
already-applied changes:
```js
try {
db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`);
} catch {
// Column already exists — safe to ignore on subsequent startups
}
```
Current migrations:
- `ALTER TABLE sessions ADD COLUMN name TEXT` — adds display name to sessions
### FTS5 Full-Text Search
An `episodes_fts` virtual table enables keyword search across all episodes.
@@ -144,9 +161,14 @@ Entities and relationships are stored in SQLite with two key constraints:
| Method | Path | Description |
|---|---|---|
| POST | /sessions | Create a new session |
| GET | /sessions | Get paginated list of all sessions |
| GET | /sessions/:id | Get session by internal ID |
| GET | /sessions/by-external/:externalId | Get session by external ID |
| DELETE | /sessions/:id | Delete session (cascades to episodes + summaries) |
| PATCH | /sessions/by-external/:externalId | Update session name |
| DELETE | /sessions/by-external/:externalId | Delete session (cascades to episodes + summaries) |
> Route ordering matters in Express: `by-external/:externalId` must be defined before
> `/:id` to prevent the literal string `by-external` being captured as an ID parameter.
**POST /sessions body:**
```json
@@ -156,6 +178,20 @@ Entities and relationships are stored in SQLite with two key constraints:
}
```
**PATCH /sessions/by-external/:externalId body:**
```json
{
"name": "My Renamed Session"
}
```
Returns the updated session object. `name` is required and must be non-empty.
**DELETE /sessions/by-external/:externalId**
Returns `204 No Content` on success. Cascades to delete all associated episodes
and summaries via SQLite `ON DELETE CASCADE`.
### Episodes
| Method | Path | Description |

@@ -14,14 +14,10 @@ or inference services — all traffic flows through orchestration.
## Dependencies
- `express` : HTTP API
- `cors` : cross-origin resource sharing middleware
- `node-fetch` : inter-service HTTP communication (memory service client only)
- `dotenv` : environment variable loading
- `@nexusai/shared` : shared utilities
> `memory.js` uses `node-fetch` v2 (pinned) because it is CommonJS. All other
> service clients use Node.js built-in `fetch`.
- `express` — HTTP API
- `cors` — cross-origin resource sharing middleware
- `dotenv` — environment variable loading
- `@nexusai/shared` — shared utilities
## Environment Variables
@@ -33,6 +29,7 @@ or inference services — all traffic flows through orchestration.
| INFERENCE_SERVICE_URL | No | http://localhost:3001 | Inference service URL |
| QDRANT_URL | No | http://localhost:6333 | Qdrant URL for semantic search |
| CORS_ORIGIN | No | http://localhost:5173 | Allowed origin for CORS requests |
| MODELS_MANIFEST_PATH | Yes | — | Path to `models.json` manifest file |
## Internal Structure
```
@@ -46,7 +43,8 @@ src/
│ └── index.js # Core pipeline logic — context assembly and coordination
├── routes/
│ ├── chat.js # POST /chat and POST /chat/stream route handlers
│ └── sessions.js # GET /sessions/:sessionId/history route handler
│ ├── sessions.js # Session list, history, rename, and delete routes
│ └── models.js # GET /models — reads models.json manifest from disk
└── index.js # Express app entry point
```
@@ -65,7 +63,7 @@ the client.
UUID for new conversations and pass it directly — no pre-creation step needed.
2. **Recent episode retrieval** — fetches the most recent episodes for the session
(default: 10) from the memory service.
(default: 5) from the memory service.
3. **Semantic search** — embeds the user message via the embedding service, then
queries Qdrant for the top-5 most similar past episodes (score threshold: 0.75).
@@ -89,37 +87,68 @@ the client.
count to the client.
## Prompt Structure
```
[System prompt]
Here are some relevant memories from earlier conversations:
User: {past user message}
Assistant: {past ai response}
... (up to 5 semantic episodes)
Here is the recent conversation history:
---
Here are some relevant memories from your past conversations:
User: {past user message}
Assistant: {past ai response}
... (up to 10 recent episodes)
--- End of memories ---
... (up to 5 recent episodes)
--- End of recent memories ---
User: {current message}
Assistant:
```
Semantic episodes appear before recent episodes so the model encounters
long-range relevant context before the immediate conversation flow.
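One way the prompt layout might be assembled, as a sketch only. `buildPrompt`, `formatEpisode`, and the `userMessage` / `aiResponse` field names are assumptions, as is skipping a section when it has no episodes. The header strings and limits are taken from the structure shown above.

```javascript
// Hypothetical episode formatter matching the "User: / Assistant:" layout.
function formatEpisode(ep) {
  return `User: ${ep.userMessage}\nAssistant: ${ep.aiResponse}`;
}

// Hypothetical prompt builder: semantic memories first, then recent history,
// then the current message.
function buildPrompt({ systemPrompt, semantic = [], recent = [], message }) {
  const parts = [systemPrompt];
  if (semantic.length > 0) {
    parts.push('Here are some relevant memories from earlier conversations:');
    parts.push(...semantic.slice(0, 5).map(formatEpisode));
  }
  if (recent.length > 0) {
    parts.push('Here is the recent conversation history:');
    parts.push(...recent.slice(0, 5).map(formatEpisode));
    parts.push('--- End of recent memories ---');
  }
  parts.push(`User: ${message}`, 'Assistant:');
  return parts.join('\n');
}
```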
## SSE Stream Format
The inference service emits chunks in this format:
data: {"model":"companion:latest","response":"Hello","done":false}
data: {"model":"companion:latest","response":"!","done":true,"eval_count":3,...}
The inference service emits chunks from the llama.cpp provider in this format:
```
data: {"response":"Hello","done":false}
data: {"response":"!","done":false}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
data: [DONE]
```
The orchestration service re-emits to the client as:
```
data: {"text":"Hello"}
data: {"text":"!"}
data: {"done":true}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
```
The `[DONE]` sentinel from the inference service is consumed internally
and not forwarded. The client stream is terminated by `res.end()` after
the `{"done":true}` event.
the done event. Model name and token count are included on the done event
so the client can display them in the UI.
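The translation between the two formats can be sketched as a small pure function. `translateInferenceData` is a hypothetical name; it takes the text after `data: ` on one inference event and returns the JSON string to forward, or `null` when nothing should be forwarded.

```javascript
// Hypothetical translator from inference-service events to client events.
function translateInferenceData(data) {
  if (data === '[DONE]') return null; // sentinel is consumed, not forwarded
  const chunk = JSON.parse(data);
  if (chunk.done) {
    return JSON.stringify({
      done: true,
      model: chunk.model,
      tokenCount: chunk.tokenCount,
    });
  }
  // Token chunks: rename `response` to `text` and drop the `done` flag.
  return JSON.stringify({ text: chunk.response });
}
```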
## Models Manifest
The `/models` endpoint reads a `models.json` file from disk at the path
specified by `MODELS_MANIFEST_PATH`. The file lives on the main PC alongside
the model files, and is accessible to orchestration via a network share
mounted at `/mnt/nexus-models`.
The manifest is read fresh on each request — no restart needed when models
are added or removed.
**models.json format:**
```json
[
{ "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
]
```
- `value` — must match the model name as reported by `llama-server` (including `.gguf` extension)
- `label` — display name shown in the UI
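Reading the manifest fresh on each request might look like the following sketch. `readModelsManifest` is a hypothetical name, and the array check is an assumption about validation; a route handler would catch the thrown error and respond with `500`.

```javascript
const fs = require('node:fs');

// Hypothetical reader: loads and parses the manifest on every call (no caching),
// so edits to models.json are visible on the next request.
function readModelsManifest(manifestPath) {
  const raw = fs.readFileSync(manifestPath, 'utf8');
  const models = JSON.parse(raw); // throws on malformed JSON
  if (!Array.isArray(models)) {
    throw new Error('models.json must contain an array');
  }
  // Keep only the two documented fields.
  return models.map(({ value, label }) => ({ value, label }));
}
```

The synchronous read keeps the sketch short; an async `fs.promises.readFile` would avoid blocking the event loop on a slow network share.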
## Endpoints
@@ -142,6 +171,14 @@ the `{"done":true}` event.
|---|---|---|
| GET | /sessions | Get paginated list of all sessions |
| GET | /sessions/:sessionId/history | Get paginated episode history for a session |
| PATCH | /sessions/:sessionId | Rename a session |
| DELETE | /sessions/:sessionId | Delete a session and all its episodes |
### Models
| Method | Path | Description |
|---|---|---|
| GET | /models | Get list of available models from manifest file |
---
@@ -152,7 +189,7 @@ Request body:
{
"sessionId": "your-session-uuid",
"message": "Hello, my name is Tim.",
"model": "companion:latest",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"temperature": 0.7
}
```
@@ -165,7 +202,7 @@ Response:
{
"sessionId": "your-session-uuid",
"response": "Hello Tim! How can I help you today?",
"model": "companion:latest",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"tokenCount": 87
}
```
@@ -176,23 +213,34 @@ Response:
Same request body as `POST /chat`.
Response is a stream of Server-Sent Events. Each event contains a text
delta. The stream ends with a `done` event.
Response is a stream of Server-Sent Events:
```
data: {"text":"Hello"}
data: {"text":" Tim"}
data: {"text":"!"}
data: {"done":true}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87}
```
Clients should read the `text` field from each chunk and accumulate them
to build the full response string. The connection is closed by the server
after the `{"done":true}` event.
---
**PATCH /sessions/:sessionId**
Request body:
```json
{ "name": "My Renamed Session" }
```
Returns the updated session object. `name` is required and trimmed of whitespace.
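The validation rule could be sketched as below. `parseRenameBody` is a hypothetical helper, and the `400` status for a missing or blank name is an assumption; the source only states that `name` is required and trimmed.

```javascript
// Hypothetical request-body validator for the rename endpoint.
function parseRenameBody(body) {
  const name = typeof body?.name === 'string' ? body.name.trim() : '';
  if (name === '') {
    return { ok: false, status: 400, error: 'name is required' };
  }
  return { ok: true, name };
}
```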
---
**DELETE /sessions/:sessionId**
Returns `204 No Content`. Cascades to delete all episodes for the session.
---
**GET /sessions/:sessionId/history**
Returns paginated episode history for a session identified by its external ID.
Query parameters:
| Parameter | Default | Description |
@@ -218,30 +266,17 @@ Response:
}
```
Episodes are ordered newest first.
---
**GET /sessions**
**GET /models**
Returns a paginated list of all sessions, ordered by most recently active.
Query parameters:
| Parameter | Default | Description |
|---|---|---|
| limit | 20 | Maximum number of sessions to return |
| offset | 0 | Number of sessions to skip (for pagination) |
Response:
Returns the parsed contents of `models.json`:
```json
[
{
"id": 1,
"external_id": "test-semantic",
"metadata": null,
"created_at": 1712345678,
"updated_at": 1712345999
}
{ "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
]
```
Episodes are ordered newest first. Returns `404` if the session does not exist.
Returns `500` if the manifest file cannot be read or parsed.

@@ -24,13 +24,40 @@ const DB = getEnv('SQLITE_PATH'); // required — throws if missing
---
### `parseRow(row)`
Parses a SQLite row object, deserialising any JSON-encoded `metadata` fields
into plain objects. Returns `null` if the row is `null` or `undefined`.
```js
const { parseRow } = require('@nexusai/shared');
const session = parseRow(db.prepare('SELECT * FROM sessions WHERE id = ?').get(id));
```
---
### `formatEpisodeText(userMessage, aiResponse)`
Combines a user message and AI response into the canonical text format used
for embedding:
```
User: {userMessage}
Assistant: {aiResponse}
```
Used by the memory service's embedding write path to ensure consistent
vector representations across all episodes.
---
### Constants
Tuneable values and shared identifiers are centralised in `constants.js`
rather than hardcoded across services. Import the relevant group by name.
```js
const { QDRANT, COLLECTIONS, EPISODIC } = require('@nexusai/shared');
const { QDRANT, COLLECTIONS, EPISODIC, LLAMACPP } = require('@nexusai/shared');
```
#### `QDRANT`
@@ -40,15 +67,14 @@ embedding model and Qdrant collection setup.
| Key | Value | Description |
|---|---|---|
| `DEFAULT_URL` | `http://localhost:6333` | Fallback Qdrant URL if `QDRANT_URL` env var is not set |
| `DEFAULT_URL` | `http://localhost:6333` | Fallback Qdrant URL |
| `VECTOR_SIZE` | `768` | Output dimensions of `nomic-embed-text` |
| `DISTANCE_METRIC` | `'Cosine'` | Similarity metric used for all collections |
| `DEFAULT_LIMIT` | `10` | Default top-k for vector searches |
#### `COLLECTIONS`
Canonical Qdrant collection names. Used by both the semantic layer and
any service that constructs Qdrant queries directly.
Canonical Qdrant collection names.
| Key | Value |
|---|---|
@@ -65,6 +91,8 @@ Default pagination and result limits for SQLite episode queries.
| `DEFAULT_RECENT_LIMIT` | `10` | Default number of recent episodes to retrieve |
| `DEFAULT_PAGE_SIZE` | `20` | Default episodes per page for paginated queries |
| `DEFAULT_SEARCH_LIMIT` | `10` | Default number of FTS search results to return |
| `DEFAULT_OFFSET` | `0` | Default pagination offset |
| `DEFAULT_SESSIONS_LIMIT` | `20` | Default number of sessions to return |
#### `SERVICES`
@@ -74,3 +102,75 @@ when the corresponding environment variable is not set.
| Key | Value | Description |
|---|---|---|
| `EMBEDDING_URL` | `http://localhost:3003` | Fallback embedding service URL |
| `MEMORY_URL` | `http://localhost:3002` | Fallback memory service URL |
| `INFERENCE_URL` | `http://localhost:3001` | Fallback inference service URL |
#### `PORTS`
Default port numbers for each service.
| Key | Value |
|---|---|
| `INFERENCE` | `'3001'` |
| `MEMORY` | `'3002'` |
| `EMBEDDING` | `'3003'` |
| `ORCHESTRATION` | `'4000'` |
#### `OLLAMA`
Ollama runtime defaults — used by the Ollama inference provider.
| Key | Value | Description |
|---|---|---|
| `DEFAULT_URL` | `http://localhost:11434` | Fallback Ollama URL |
| `EMBED_MODEL` | `'nomic-embed-text'` | Default embedding model |
| `OLLAMA_MODEL` | `'companion:latest'` | Default chat model |
#### `LLAMACPP`
llama.cpp runtime defaults — used by the llama.cpp inference provider.
| Key | Value | Description |
|---|---|---|
| `DEFAULT_URL` | `http://localhost:8080` | Fallback llama-server URL |
| `DEFAULT_MODEL` | `'local-model'` | Fallback model name (override via `DEFAULT_MODEL` env var) |
> Always set `DEFAULT_MODEL` in the inference service `.env` to the exact model
> name reported by `llama-server` (including `.gguf` extension). The shared
> constant is a last-resort fallback only.
#### `INFERENCE_DEFAULTS`
Default inference parameters applied when not specified in a request.
| Key | Value | Description |
|---|---|---|
| `TEMPERATURE` | `0.7` | Controls randomness (0 = deterministic, 1 = creative) |
| `MAX_TOKENS` | `1024` | Maximum tokens to generate |
| `TOP_P` | `0.9` | Nucleus sampling probability mass |
| `TOP_K` | `40` | Top-K candidates at each step |
| `REPEAT_PENALTY` | `1.1` | Penalty for recently used tokens |
| `SEED` | `null` | null = random; set integer for reproducible outputs |
#### `ORCHESTRATION`
Orchestration pipeline defaults.
| Key | Value | Description |
|---|---|---|
| `RECENT_EPISODE_LIMIT` | `5` | Recent episodes to inject into prompt |
| `SEMANTIC_LIMIT` | `5` | Semantic search results to inject into prompt |
| `SCORE_THRESHOLD` | `0.75` | Minimum similarity score for semantic results |
| `CORS_ORIGIN` | `'http://localhost:5173'` | Fallback allowed CORS origin |
| `SYSTEM_PROMPT` | *(see below)* | Default system prompt |
Default system prompt:
> "You are a helpful, context-aware AI assistant. You have access to memories
> of past conversations with the user. Use them to provide consistent,
> personalised responses."
#### `SQLITE`
| Key | Value | Description |
|---|---|---|
| `DEFAULT_PATH` | `'./data/nexusai.db'` | Fallback SQLite database path |