updated documentation
@@ -27,33 +27,46 @@ npm run dev # local dev server on port 5173
Vite bakes environment variables into the bundle at build time, so the `.env`
file is only needed on the machine running the build, not on the machine serving the files.

After building, copy the contents of `dist/` to `/srv/nexusai` on Mini PC 2 for Caddy to serve.

## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| VITE_ORCHESTRATION_URL | No | `''` (empty) | Orchestration base URL. Must be set to the HTTPS domain in production to avoid mixed-content errors. |

Production value:

```
VITE_ORCHESTRATION_URL=https://nexus.jellystorm.com
```

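Because the base URL is baked in at build time, each API call has to join it with an endpoint path. The sketch below shows one way to do that join. It assumes a hypothetical helper named `buildApiUrl`. In the Vite client the base would come from `import.meta.env.VITE_ORCHESTRATION_URL`; here it is passed in as a parameter so the sketch runs under plain Node. An empty base produces a relative, same-origin path, which a development proxy can then forward.

```javascript
// Hypothetical helper: joins the orchestration base URL with an endpoint path.
// In the client, baseUrl would be import.meta.env.VITE_ORCHESTRATION_URL.
function buildApiUrl(baseUrl, path) {
  // Drop trailing slashes so 'https://host/' + '/models' does not become '//models'.
  const base = (baseUrl || '').replace(/\/+$/, '');
  const suffix = path.startsWith('/') ? path : `/${path}`;
  // With an empty base the result is a relative, same-origin path such as '/models'.
  return `${base}${suffix}`;
}

console.log(buildApiUrl('', '/models'));                             // /models
console.log(buildApiUrl('https://nexus.jellystorm.com/', 'models')); // https://nexus.jellystorm.com/models
```

A page served over HTTPS that tries to fetch an `http://` URL is blocked by the browser as mixed content. That is why the production build needs the HTTPS domain.
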
## Internal Structure

```
src/
├── api/
│   └── orchestration.js    # All fetch calls to the orchestration service
├── config/
│   └── constants.js        # FALLBACK_MODELS, DEFAULT_MODEL, API_DEFAULTS
├── hooks/
│   ├── useSession.js       # Session list, history loading, active session state
│   ├── useChat.js          # Message sending, SSE streaming, message state
│   ├── useModels.js        # Dynamic model list fetched from /models endpoint
│   └── useContextMenu.js   # Right-click context menu position and visibility
├── components/
│   ├── App.jsx             # Root component — layout and shared state
│   ├── SessionList.jsx     # Left sidebar — session list, rename, delete
│   ├── ChatWindow.jsx      # Centre panel — message thread and input bar
│   ├── MessageBubble.jsx   # Individual message bubble (user or assistant)
│   ├── InfoPanel.jsx       # Right panel — model selector and session metadata
│   └── SessionModal.jsx    # Modal dialog for session settings (rename)
├── index.css               # Global reset, CSS variables, utility classes
└── main.jsx                # React entry point
```

## Layout

Three-panel layout with collapsible sidebars:

```
┌─────────────────┬──────────────────────────┬─────────────┐
│  Session List   │       Chat Window        │  Info Panel │
│  (collapsible)  │                          │(collapsible)│
@@ -64,9 +77,54 @@ Three-panel layout with collapsible sidebars:
│  Session 2      │                          │             │
│                 │  [input bar]             │             │
└─────────────────┴──────────────────────────┴─────────────┘
```

When collapsed, each sidebar shrinks to a 56px icon rail. The centre chat window always
fills the remaining space.

## CSS Architecture

Styles follow a hybrid approach: CSS utility classes for static, reusable
rules and inline styles for dynamic, prop-driven values.

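The split can be illustrated with a hypothetical helper that does not come from the codebase. It returns React-style props for a sidebar. The fixed layout rules come from the utility classes in `index.css`, and only the width, which depends on the collapsed state, is set as an inline style. The 56px collapsed width comes from the Layout section, and `--sidebar-width` comes from the `:root` variables.

```javascript
// Hypothetical helper illustrating the hybrid approach; not taken from the codebase.
function sidebarProps(collapsed) {
  return {
    // Static, reusable rules: utility classes defined in index.css.
    className: 'flex flex-col overflow-hidden',
    // Dynamic, prop-driven value: an inline style object (the shape React's `style` prop expects).
    style: { width: collapsed ? '56px' : 'var(--sidebar-width)' },
  };
}

console.log(sidebarProps(true).style.width);  // 56px
console.log(sidebarProps(false).style.width); // var(--sidebar-width)
```

In JSX the result would be spread onto the element, for example `<aside {...sidebarProps(collapsed)}>`.
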
### CSS Variables (`:root`)

| Variable | Value | Description |
|---|---|---|
| `--bg-base` | `#0f1117` | Page background |
| `--bg-surface` | `#1a1d27` | Panel backgrounds |
| `--bg-elevated` | `#222536` | Elevated elements (inputs, cards) |
| `--border` | `#2e3150` | Border colour |
| `--accent` | `#6c63ff` | Primary accent (buttons, highlights) |
| `--accent-hover` | `#574fd6` | Accent hover state |
| `--text-primary` | `#e8e8f0` | Primary text |
| `--text-secondary` | `#8b8fa8` | Secondary text |
| `--text-muted` | `#555870` | Muted / placeholder text |
| `--bubble-user` | `#6c63ff` | User message bubble background |
| `--bubble-ai` | `#222536` | AI message bubble background |
| `--sidebar-width` | `280px` | Expanded sidebar width |
| `--panel-width` | `260px` | Expanded info panel width |
| `--header-height` | `56px` | Shared header height across all panels |
| `--radius-sm` | `6px` | Small border radius |
| `--radius-md` | `8px` | Medium border radius |
| `--radius-lg` | `12px` | Large border radius |

### Utility Classes

| Class | Description |
|---|---|
| `.panel-header` | Shared header row used in all three panels |
| `.btn-reset` | Resets default button styles (no border or background, pointer cursor) |
| `.btn-icon` | Icon button with hover state |
| `.btn-primary` | Accent-coloured action button with `:hover` and `:disabled` states |
| `.flex` / `.flex-col` | Flex layout helpers |
| `.flex-1` / `.flex-shrink` | Flex sizing helpers |
| `.items-center` / `.justify-center` / `.justify-between` | Alignment helpers |
| `.overflow-hidden` / `.scroll-y` | Overflow helpers |
| `.text-xs` / `.text-sm` / `.text-base` | Font size helpers |
| `.text-muted` / `.text-secondary` / `.text-accent` | Colour helpers |
| `.label-upper` | Uppercase section label style |
| `.truncate` | Truncates overflowing text with an ellipsis |

## API Layer

@@ -78,39 +136,71 @@ All orchestration calls are centralised in `src/api/orchestration.js`:
| `fetchSessionHistory` | GET | /sessions/:id/history | Load episode history on session select |
| `sendMessage` | POST | /chat | Send message, await full response |
| `streamMessage` | POST | /chat/stream | Send message, receive SSE token stream |
| `fetchModels` | GET | /models | Load available models from manifest |
| `renameSession` | PATCH | /sessions/:id | Rename a session |
| `deleteSession` | DELETE | /sessions/:id | Delete a session |

`streamMessage` returns an abort function; call it to cancel a stream mid-flight.
It buffers incoming text so that an SSE event split across several network reads is parsed only once it is complete.

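The sketch below shows both ideas. It assumes `fetch` with a streaming response body, as in Node 18+ or a browser. An `AbortController` backs the returned abort function. A text buffer holds any partial event until its terminating blank line arrives. `createSSEParser`, the `onData` callback and this `streamMessage` signature are illustrative assumptions, not the actual `orchestration.js` code. The parser also assumes `\n` line endings.

```javascript
// Buffer pattern: keep unparsed text between network reads and only parse
// events terminated by a blank line ("\n\n"). Assumes "\n" line endings.
function createSSEParser(onData) {
  let buffer = '';
  return function push(text) {
    buffer += text;
    const events = buffer.split('\n\n');
    buffer = events.pop(); // '' or an incomplete event, kept for the next read
    for (const event of events) {
      for (const line of event.split('\n')) {
        if (!line.startsWith('data:')) continue;
        const value = line.slice(5);
        onData(value.startsWith(' ') ? value.slice(1) : value); // drop one optional space
      }
    }
  };
}

// Hypothetical streamMessage: starts the request and returns an abort function.
function streamMessage(url, body, onData) {
  const controller = new AbortController();
  (async () => {
    const res = await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
      signal: controller.signal,
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    const push = createSSEParser(onData);
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      push(decoder.decode(value, { stream: true }));
    }
  })().catch((err) => {
    // Aborting rejects with an AbortError, which is expected and ignored here.
    if (err.name !== 'AbortError') console.error(err);
  });
  return () => controller.abort();
}

// The parser on its own: an event split across two reads is emitted once.
const received = [];
const feed = createSSEParser((data) => received.push(JSON.parse(data)));
feed('data: {"text":"Hel');
feed('lo"}\n\ndata: {"done":true}\n\n');
console.log(received); // [ { text: 'Hello' }, { done: true } ]
```

The parser keeps at most one partial event in memory. The `TextDecoder` option `{ stream: true }` plays a similar role for multi-byte UTF-8 characters, which can also be split across reads.
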
## Streaming

The chat input sends messages via `POST /chat/stream`. Tokens arrive as SSE events:

```
data: {"text":"Hello"}
data: {"text":" Tim"}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87}
```

An empty assistant bubble is appended as soon as the stream opens and is then
updated token by token using `updateLastMessage`. The blinking cursor in
`MessageBubble` is shown while `message.streaming === true` and disappears
when the done event is received. The model name and token count from the done
event are stored in `useChat` state and displayed in the InfoPanel.

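The state transitions described above can be written as pure functions. This is an illustration, not the `useChat` code. The message shape (`role`, `content`, `streaming`) and the function names are assumptions. `updateLast` stands in for what `updateLastMessage` presumably does.

```javascript
// When the stream opens, append an empty assistant bubble flagged as streaming.
function openAssistantBubble(state) {
  return { ...state, messages: [...state.messages, { role: 'assistant', content: '', streaming: true }] };
}

// Replace the last message immutably (a stand-in for updateLastMessage).
function updateLast(state, update) {
  const messages = state.messages.slice();
  const last = messages[messages.length - 1];
  messages[messages.length - 1] = { ...last, ...update(last) };
  return { ...state, messages };
}

function applyStreamEvent(state, event) {
  if (event.done) {
    // streaming: false hides the blinking cursor; model/tokenCount feed the InfoPanel.
    const next = updateLast(state, () => ({ streaming: false }));
    return { ...next, model: event.model, tokenCount: event.tokenCount };
  }
  if (typeof event.text === 'string') {
    return updateLast(state, (last) => ({ content: last.content + event.text }));
  }
  return state;
}

let chat = openAssistantBubble({ messages: [{ role: 'user', content: 'Hi' }], model: null, tokenCount: null });
for (const event of [{ text: 'Hello' }, { text: ' Tim' }, { done: true, model: 'model-name.gguf', tokenCount: 2 }]) {
  chat = applyStreamEvent(chat, event);
}
console.log(chat.messages[1].content, chat.tokenCount); // Hello Tim 2
```

Because every update returns new objects rather than mutating, the same functions could be used as React state updaters.
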
## Dynamic Model Selector

Available models are fetched from `GET /models` on mount via the `useModels` hook.
The hook initialises with `FALLBACK_MODELS` from `constants.js` and replaces them
with the server response on success. If the fetch fails, the fallback list is kept
without any error in the UI; a warning is logged to the console.

```js
// constants.js
export const FALLBACK_MODELS = [
  { value: 'companion:latest', label: 'Companion' },
  // ...
];
```

The selected model is passed with every chat request. To add a model, update
`models.json` on the main PC; no client rebuild is needed.

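The fallback behaviour of `useModels` can be written as a plain async function, outside React, so it runs in Node 18+. `loadModels` and the injected `fetchImpl` parameter are assumptions made for illustration. The `FALLBACK_MODELS` entry mirrors the one shown above. The array check on the response is an added safeguard.

```javascript
// Mirrors the FALLBACK_MODELS entry shown above (the real list has more entries).
const FALLBACK_MODELS = [{ value: 'companion:latest', label: 'Companion' }];

// Hypothetical sketch: fetch GET /models and fall back quietly on any failure.
async function loadModels(url, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(url);
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
    const models = await res.json();
    if (!Array.isArray(models)) throw new Error('expected an array of models');
    return models;
  } catch (err) {
    // No error in the UI; the only signal is a console warning.
    console.warn('Using FALLBACK_MODELS:', err.message);
    return FALLBACK_MODELS;
  }
}

// Usage with a stubbed fetch that simulates the orchestration service being down.
const offlineFetch = async () => { throw new Error('connection refused'); };
loadModels('/models', offlineFetch).then((models) => console.log(models[0].label)); // Companion
```

Inside the hook, the resolved list would be passed to the state setter in a mount-only effect.
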
## Session Management

Sessions are identified by `external_id`, a UUID generated client-side via the
`uuid` package. New sessions are created locally and auto-registered in the memory
service on the first message. The session list refreshes after each completed
response to surface newly created sessions.

### Session Actions

The session list supports rename and delete:

- **Hover** — reveals ✎ (rename) and ✕ (delete) icon buttons on the session row
- **Right-click** — opens a context menu with the same actions

Rename opens a `SessionModal` dialog. The modal is intended to grow into a full
session settings panel, which is why its title is already "Session Settings".

Delete takes effect immediately; there is no confirmation dialog yet (one is planned for a future update).

Actions are disabled on new, unsaved sessions (those with no message sent yet).

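The sketch below shows how a new session might be represented before and after its first message. The client uses the `uuid` package. Here, Node's built-in `crypto.randomUUID()`, which also generates version 4 UUIDs, stands in so the example has no dependencies. The `persisted` flag and both function names are hypothetical.

```javascript
const { randomUUID } = require('node:crypto');

// Hypothetical shape of a session created locally, before the first message.
function createLocalSession() {
  return { external_id: randomUUID(), name: null, persisted: false };
}

// Rename/delete are only offered once the memory service knows the session.
function canModifySession(session) {
  return session.persisted === true;
}

const draft = createLocalSession();
console.log(canModifySession(draft)); // false

// After the first message the memory service auto-registers the session.
const saved = { ...draft, persisted: true };
console.log(canModifySession(saved)); // true
```
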
### Context Menu

Implemented via the `useContextMenu` hook, which tracks `{ x, y, session }` state and
attaches a `window` click listener that dismisses the menu on any outside click. The menu is
rendered outside the sidebar div (via a React fragment) so that the sidebar's
`overflow: hidden` does not clip it.

@@ -2,7 +2,7 @@

**Package:** `@nexusai/inference-service`
**Location:** `packages/inference-service`
**Deployed on:** Main PC (192.168.0.79)
**Port:** 3001

## Purpose
@@ -15,7 +15,7 @@ to switch inference backends without changes to the rest of the system.
## Dependencies

- `express` — HTTP API
- `ollama` — Ollama client (used by the Ollama provider, kept as fallback)
- `dotenv` — environment variable loading
- `@nexusai/shared` — shared utilities

@@ -24,9 +24,13 @@ to switch inference backends without changes to the rest of the system.
| Variable | Required | Default | Description |
|---|---|---|---|
| PORT | No | 3001 | Port to listen on |
| INFERENCE_PROVIDER | No | llamacpp | Active inference provider (`ollama` or `llamacpp`) |
| INFERENCE_URL | No | http://localhost:8080 | URL of the inference runtime |
| DEFAULT_MODEL | No | local-model | Default model name passed to the provider |

> With the default `llamacpp` provider, `INFERENCE_URL` points directly at `llama-server`
> (port 8080), not at this service. The orchestration service uses `INFERENCE_SERVICE_URL`
> to reach this service on port 3001.

## Provider Architecture

@@ -39,14 +43,87 @@ signatures, so the rest of the service is unaware of which backend is active.

| Provider | Value | Runtime |
|---|---|---|
| llama.cpp | `llamacpp` | llama.cpp server (OpenAI-compatible API) — **current default** |
| Ollama | `ollama` | Ollama via the `ollama` npm package — available as fallback |

Switching providers requires only a `.env` change; no code modifications are needed:

```
INFERENCE_PROVIDER=llamacpp
INFERENCE_URL=http://localhost:8080
```

### Provider Validation

The provider loader validates `INFERENCE_PROVIDER` at startup and throws immediately
if the value is unknown, which prevents silent misconfiguration:

```
Error: Unknown inference provider: "foo". Valid options: ollama, llamacpp
```

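A minimal version of such a loader could look like the sketch below. It assumes a registry object keyed by provider name. The stub objects stand in for `providers/ollama.js` and the llama.cpp provider module, and only the error message is taken from above. The own-property check stops inherited keys such as `toString` from passing validation.

```javascript
// Stub registry; in the service each entry would be a provider module exposing
// the same function signatures.
const PROVIDERS = {
  ollama: { name: 'ollama' },
  llamacpp: { name: 'llamacpp' },
};

// Resolve the provider named by INFERENCE_PROVIDER (default: llamacpp) or fail fast.
function loadProvider(name = process.env.INFERENCE_PROVIDER || 'llamacpp') {
  if (!Object.prototype.hasOwnProperty.call(PROVIDERS, name)) {
    throw new Error(
      `Unknown inference provider: "${name}". Valid options: ${Object.keys(PROVIDERS).join(', ')}`
    );
  }
  return PROVIDERS[name];
}

try {
  loadProvider('foo');
} catch (err) {
  console.log(err.message); // Unknown inference provider: "foo". Valid options: ollama, llamacpp
}
```

Calling the loader at module load time, rather than lazily on the first request, is what makes a bad value fail at startup.
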
## llama.cpp Provider

The llama.cpp provider uses the OpenAI-compatible REST API exposed by `llama-server`.

### Starting llama-server

`llama-server` must be started manually on the main PC before the inference service
can handle requests. It loads a single model at startup:

```powershell
.\llama-gpu\llama-server.exe `
  -m .\models\gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf `
  -ngl 99 `
  --reasoning off `
  --host 0.0.0.0 `
  --port 8080 `
  -c 64000
```

Key flags:

| Flag | Description |
|---|---|
| `-m` | Path to the `.gguf` model file |
| `-ngl 99` | Offloads up to 99 layers to the GPU, which in practice covers every layer of most models |
| `--reasoning off` | Disables the thinking/reasoning phase that otherwise delays responses on Gemma 4 models |
| `--host 0.0.0.0` | Accepts connections from other machines on the LAN |
| `--port 8080` | Port for the llama-server HTTP API |
| `-c 64000` | Context window size in tokens |

> `-c 64000` is intentionally large. Monitor VRAM usage and reduce this value if memory
> pressure builds. Because the NexusAI memory architecture handles context injection,
> a smaller window (6–8K tokens) is often sufficient.

### Model Naming

The model name sent in API requests must match the name reported by
`llama-server`, including the `.gguf` extension. Verify the reported name with:

```powershell
Invoke-RestMethod -Uri "http://192.168.0.79:8080/v1/models"
```

Set `DEFAULT_MODEL` in `.env` to the exact reported name:

```
DEFAULT_MODEL=gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf
```

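A small startup check can catch a mismatched name before the first request fails. This sketch assumes the `/v1/models` response follows the OpenAI list shape (`{ data: [{ id }] }`). Compare it with the real output of the `Invoke-RestMethod` command before relying on it. Both function names are hypothetical.

```javascript
// Collect the model ids from an OpenAI-style /v1/models response.
function reportedModelIds(modelsResponse) {
  return (modelsResponse.data || []).map((model) => model.id);
}

// Fail fast if DEFAULT_MODEL is not exactly one of the reported ids.
function checkDefaultModel(defaultModel, modelsResponse) {
  const ids = reportedModelIds(modelsResponse);
  if (!ids.includes(defaultModel)) {
    throw new Error(`DEFAULT_MODEL "${defaultModel}" is not reported by llama-server (got: ${ids.join(', ')})`);
  }
  return true;
}

// Assumed, trimmed-down response body; real responses carry more fields.
const sample = {
  object: 'list',
  data: [{ id: 'gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf', object: 'model' }],
};
console.log(checkDefaultModel('gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf', sample)); // true
```

Dropping the `.gguf` suffix makes the check throw, which is the exact mismatch this section warns about.
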
### Inference Parameters

The llamacpp provider maps NexusAI options to OpenAI-compatible fields:

| NexusAI option | API field | Default |
|---|---|---|
| `temperature` | `temperature` | 0.7 |
| `maxTokens` | `max_tokens` | 1024 |
| `topP` | `top_p` | 0.9 |
| `topK` | `top_k` | 40 |
| `repeatPenalty` | `repeat_penalty` | 1.1 |
| `seed` | `seed` | null (random) |

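The mapping can be written as a function that merges caller options over the documented defaults. `top_k` and `repeat_penalty` are not part of the standard OpenAI request schema; `llama-server` accepts them as extensions. The function name is hypothetical, and the sketch covers only the sampling fields, not `model` or the prompt. Omitting `seed` when it is `null` is an assumption about how "random" is expressed.

```javascript
// Documented defaults for the llamacpp provider's sampling options.
const SAMPLING_DEFAULTS = {
  temperature: 0.7,
  maxTokens: 1024,
  topP: 0.9,
  topK: 40,
  repeatPenalty: 1.1,
  seed: null,
};

// Map NexusAI camelCase options to request fields; `??` keeps explicit zeros.
function toSamplingParams(options = {}) {
  const pick = (key) => options[key] ?? SAMPLING_DEFAULTS[key];
  const params = {
    temperature: pick('temperature'),
    max_tokens: pick('maxTokens'),
    top_p: pick('topP'),
    top_k: pick('topK'),                   // llama.cpp extension
    repeat_penalty: pick('repeatPenalty'), // llama.cpp extension
  };
  const seed = pick('seed');
  // Assumed: a null seed ("random") is expressed by omitting the field.
  if (seed !== null) params.seed = seed;
  return params;
}

console.log(toSamplingParams({ temperature: 0.2, seed: 42 }));
```

Using `??` rather than `||` matters for options such as `temperature: 0`, which must not be replaced by the default.
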
## Internal Structure

```
src/
├── providers/
│   ├── ollama.js           # Ollama provider — uses ollama npm package
@@ -55,6 +132,27 @@ src/
│   └── inference.js        # /complete and /complete/stream route handlers
├── infer.js                # Provider loader — selects and re-exports active provider
└── index.js                # Express app + route definitions
```

## Streaming Response Format

The llama.cpp provider yields chunks in this shape:

```js
// token chunk:
{ response: 'token text', done: false }
// final chunk:
{ response: '', done: true, model: 'model-name.gguf', tokenCount: 42 }
```

The inference route re-emits these as SSE events:

```
data: {"response":"token text"}
data: {"done":true,"model":"model-name.gguf","tokenCount":42}
data: [DONE]
```

`model` and `tokenCount` are captured from the llama.cpp chunk with `finish_reason: stop`
(`tokenCount` is taken from `usage.completion_tokens`) and emitted on the done event so
the orchestration layer can forward them to the client.

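The translation can be sketched under a few assumptions:

- Each llama-server stream chunk is an OpenAI-style object. Its text is in `choices[0].delta.content` (chat endpoint) or `choices[0].text` (completion endpoint).
- The final chunk carries a `finish_reason` plus `usage`.
- Any non-null `finish_reason` counts as final, including `"length"` when `max_tokens` is reached. This goes slightly beyond the `stop` case described above.

`toNexusChunk`, `toSSE` and `sseLines` are hypothetical names.

```javascript
// Convert one OpenAI-style stream chunk into the provider's chunk shape.
function toNexusChunk(chunk) {
  const choice = (chunk.choices && chunk.choices[0]) || {};
  if (choice.finish_reason) {
    return {
      response: '',
      done: true,
      model: chunk.model,
      tokenCount: chunk.usage?.completion_tokens,
    };
  }
  return { response: choice.delta?.content ?? choice.text ?? '', done: false };
}

// Serialise a provider chunk as the SSE event the inference route emits.
function toSSE(chunk) {
  const payload = chunk.done
    ? { done: true, model: chunk.model, tokenCount: chunk.tokenCount }
    : { response: chunk.response };
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Full event sequence for a list of upstream chunks, ending with the [DONE] sentinel.
function* sseLines(chunks) {
  for (const chunk of chunks) yield toSSE(toNexusChunk(chunk));
  yield 'data: [DONE]\n\n';
}

const upstream = [
  { model: 'model-name.gguf', choices: [{ delta: { content: 'Hi' }, finish_reason: null }] },
  { model: 'model-name.gguf', choices: [{ delta: {}, finish_reason: 'stop' }], usage: { completion_tokens: 1 } },
];
for (const line of sseLines(upstream)) process.stdout.write(line);
```

In the route, each string would go to `res.write()` instead of `process.stdout.write()`.
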
## Endpoints

@@ -79,7 +177,7 @@ Request body:
```json
{
  "prompt": "What is the capital of France?",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "temperature": 0.7,
  "maxTokens": 1024
}
@@ -93,33 +191,26 @@ Response:
```json
{
  "text": "The capital of France is Paris.",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "done": true,
  "evalCount": 8,
  "promptEvalCount": 41
}
```

---

**POST /complete/stream**

Same request body as `/complete`.

Response is a stream of Server-Sent Events:

```
data: {"response":"The"}
data: {"response":" capital of France is Paris."}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":8}
data: [DONE]
```

Clients should concatenate the `response` fields in order to build the full response string.
The done event carries `model` and `tokenCount` for display in the UI.

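Client-side accumulation can be written as a short loop. The sketch assumes the `data:` payloads have already been split out of the stream, for example by an SSE line parser. `accumulateStream` is a hypothetical name. `[DONE]` is a plain sentinel, not JSON, so it has to be checked before parsing.

```javascript
// Build the full text and done metadata from /complete/stream payloads.
function accumulateStream(payloads) {
  const result = { text: '', model: null, tokenCount: null };
  for (const payload of payloads) {
    if (payload === '[DONE]') break; // end-of-stream sentinel, not JSON
    const event = JSON.parse(payload);
    if (event.done) {
      result.model = event.model;
      result.tokenCount = event.tokenCount;
    } else if (typeof event.response === 'string') {
      result.text += event.response;
    }
  }
  return result;
}

const out = accumulateStream([
  '{"response":"The"}',
  '{"response":" capital of France is Paris."}',
  '{"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":8}',
  '[DONE]',
]);
console.log(out.text); // The capital of France is Paris.
```
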
@@ -34,7 +34,7 @@ service to generate and store a vector in Qdrant.
```
src/
├── db/
│   ├── index.js            # SQLite connection + initialization + migrations
│   └── schema.js           # Table definitions, indexes, FTS5, triggers
├── episodic/
│   └── index.js            # Session + episode CRUD, FTS search, embedding write path
@@ -49,12 +49,29 @@ src/

Five core tables:

- **sessions** — top-level conversation containers, identified by an `external_id` and optional `name`
- **episodes** — individual exchanges (user message + AI response) tied to a session
- **entities** — named things the system learns about (people, places, concepts)
- **relationships** — directional labeled links between entities
- **summaries** — condensed episode groups for efficient context retrieval

### Migrations

Schema changes that cannot be expressed with `CREATE TABLE IF NOT EXISTS` are applied
as migrations in `db/index.js` at startup. Each migration is wrapped in try/catch so that
a change already applied on a previous startup is safely ignored:

```js
try {
  db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`);
} catch (err) {
  // Column already exists: safe to ignore on subsequent startups.
  // Any other error is a real schema problem, so re-throw it.
  if (!/duplicate column name/i.test(err.message)) throw err;
}
```

Current migrations:

- `ALTER TABLE sessions ADD COLUMN name TEXT` — adds display name to sessions

### FTS5 Full-Text Search

An `episodes_fts` virtual table enables keyword search across all episodes.

@@ -144,9 +161,14 @@ Entities and relationships are stored in SQLite with two key constraints:
| Method | Path | Description |
|---|---|---|
| POST | /sessions | Create a new session |
| GET | /sessions | Get paginated list of all sessions |
| GET | /sessions/:id | Get session by internal ID |
| GET | /sessions/by-external/:externalId | Get session by external ID |
| PATCH | /sessions/by-external/:externalId | Update session name |
| DELETE | /sessions/by-external/:externalId | Delete session (cascades to episodes + summaries) |

> Route ordering matters in Express: `by-external/:externalId` must be defined before
> `/:id` to prevent the literal string `by-external` being captured as an ID parameter.

**POST /sessions body:**
```json
@@ -156,6 +178,20 @@ Entities and relationships are stored in SQLite with two key constraints:
}
```

**PATCH /sessions/by-external/:externalId body:**

```json
{
  "name": "My Renamed Session"
}
```

Returns the updated session object. `name` is required and must be non-empty.

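The rule can be enforced by a small validator, written as a plain function so an Express handler can call it. This one is hypothetical. Trimming whitespace and answering with status 400 are assumptions; the docs only state that `name` is required and must be non-empty.

```javascript
// Hypothetical validator for PATCH /sessions/by-external/:externalId bodies.
function validateRenameBody(body) {
  const name = body && typeof body.name === 'string' ? body.name.trim() : '';
  if (name === '') {
    // Assumed: reject with 400 Bad Request.
    return { ok: false, status: 400, error: 'name is required and must be non-empty' };
  }
  return { ok: true, name };
}

console.log(validateRenameBody({ name: '  My Renamed Session ' }).name); // My Renamed Session
console.log(validateRenameBody({ name: '   ' }).status);                 // 400
```

In a route handler the failure branch would become `res.status(result.status).json({ error: result.error })`.
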
**DELETE /sessions/by-external/:externalId**

Returns `204 No Content` on success. All associated episodes and summaries are
deleted with it via SQLite `ON DELETE CASCADE`.

### Episodes

| Method | Path | Description |

@@ -14,14 +14,10 @@ or inference services — all traffic flows through orchestration.

## Dependencies

- `express` — HTTP API
- `cors` — cross-origin resource sharing middleware
- `dotenv` — environment variable loading
- `@nexusai/shared` — shared utilities

## Environment Variables

@@ -33,6 +29,7 @@ or inference services — all traffic flows through orchestration.
| INFERENCE_SERVICE_URL | No | http://localhost:3001 | Inference service URL |
| QDRANT_URL | No | http://localhost:6333 | Qdrant URL for semantic search |
| CORS_ORIGIN | No | http://localhost:5173 | Allowed origin for CORS requests |
| MODELS_MANIFEST_PATH | Yes | — | Path to `models.json` manifest file |

## Internal Structure

```
@@ -46,7 +43,8 @@ src/
│   └── index.js            # Core pipeline logic — context assembly and coordination
├── routes/
│   ├── chat.js             # POST /chat and POST /chat/stream route handlers
│   ├── sessions.js         # Session list, history, rename, and delete routes
│   └── models.js           # GET /models — reads models.json manifest from disk
└── index.js                # Express app entry point
```

@@ -65,7 +63,7 @@ the client.
   UUID for new conversations and pass it directly — no pre-creation step needed.

2. **Recent episode retrieval** — fetches the most recent episodes for the session
   (default: 5) from the memory service.

3. **Semantic search** — embeds the user message via the embedding service, then
   queries Qdrant for the top-5 most similar past episodes (score threshold: 0.75).
@@ -89,37 +87,68 @@ the client.
   count to the client.

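The selection rule in step 3 (top 5, score at least 0.75) can be expressed locally as shown below. The sketch assumes Qdrant-style hits shaped like `{ score, payload }`. In practice, Qdrant's search request can apply the same rule server-side through its `limit` and `score_threshold` parameters. `selectSemanticEpisodes` is a hypothetical name.

```javascript
// Keep hits at or above the threshold, best first, at most `limit` of them.
function selectSemanticEpisodes(hits, { limit = 5, threshold = 0.75 } = {}) {
  return hits
    .filter((hit) => hit.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((hit) => hit.payload);
}

const hits = [
  { score: 0.91, payload: 'episode A' },
  { score: 0.74, payload: 'episode B' }, // below threshold: dropped
  { score: 0.82, payload: 'episode C' },
];
console.log(selectSemanticEpisodes(hits)); // [ 'episode A', 'episode C' ]
```
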
## Prompt Structure

```
[System prompt]

Here are some relevant memories from earlier conversations:
User: {past user message}
Assistant: {past ai response}
... (up to 5 semantic episodes)
---
Here are some relevant memories from your past conversations:
User: {past user message}
Assistant: {past ai response}
... (up to 5 recent episodes)
--- End of recent memories ---

User: {current message}
Assistant:
```

Semantic episodes appear before recent episodes so the model encounters
long-range relevant context before the immediate conversation flow.

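A small string builder can reproduce the template. This is not the pipeline's code. Three details are assumptions: the episode field names (`user`, `assistant`), the exact whitespace, and leaving out a section when it has no episodes.

```javascript
// Format episodes as alternating User/Assistant lines (field names are assumed).
function formatEpisodes(episodes) {
  return episodes.map((e) => `User: ${e.user}\nAssistant: ${e.assistant}`).join('\n');
}

// Semantic episodes first, then recent ones, then the current message.
function buildPrompt({ systemPrompt, semantic = [], recent = [], message }) {
  const parts = [systemPrompt, ''];
  if (semantic.length > 0) {
    parts.push(
      'Here are some relevant memories from earlier conversations:',
      formatEpisodes(semantic.slice(0, 5)),
      '---'
    );
  }
  if (recent.length > 0) {
    parts.push(
      'Here are some relevant memories from your past conversations:',
      formatEpisodes(recent.slice(0, 5)),
      '--- End of recent memories ---',
      ''
    );
  }
  parts.push(`User: ${message}`, 'Assistant:');
  return parts.join('\n');
}

console.log(buildPrompt({
  systemPrompt: 'You are a helpful assistant.',
  semantic: [{ user: 'My dog is called Rex.', assistant: 'Noted!' }],
  recent: [{ user: 'Hi', assistant: 'Hello Tim' }],
  message: "What is my dog's name?",
}));
```

Ending the prompt with a bare `Assistant:` line cues a completion-style model to continue as the assistant.
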
## SSE Stream Format

The inference service emits chunks from the llama.cpp provider in this format:
```
data: {"response":"Hello","done":false}
data: {"response":"!","done":false}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
data: [DONE]
```

The orchestration service re-emits to the client as:
```
data: {"text":"Hello"}
data: {"text":"!"}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
```

The `[DONE]` sentinel from the inference service is consumed internally
and not forwarded. The client stream is terminated by `res.end()` after
the `done` event. The `done` event also carries the model name and token count
so the client can display them in the UI.

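The re-emit step can be sketched as a pure function. It maps one inference `data: ` line to the line sent to the client, or to `null` when nothing should be forwarded. `translateInferenceLine` is a hypothetical name and this is not the orchestration service's code. The sketch assumes each input line holds exactly one complete event.

```js
// Hypothetical helper: translate one SSE line from the inference service into
// the line orchestration forwards to the client, or null to forward nothing.
function translateInferenceLine(line) {
  if (!line.startsWith('data: ')) return null;   // skip anything that is not a "data: " line
  const payload = line.slice('data: '.length).trim();
  if (payload === '[DONE]') return null;         // sentinel is consumed, not forwarded

  const chunk = JSON.parse(payload);
  if (chunk.done) {
    // Carry the model name and token count on the final event.
    return `data: ${JSON.stringify({ done: true, model: chunk.model, tokenCount: chunk.tokenCount })}`;
  }
  if (typeof chunk.response === 'string' && chunk.response !== '') {
    return `data: ${JSON.stringify({ text: chunk.response })}`;
  }
  return null;
}
```

On the wire, each SSE event must be followed by a blank line. A caller would therefore write `translated + '\n\n'` for every non-null result and call `res.end()` once the done event has been sent.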
|
## Models Manifest

The `/models` endpoint reads a `models.json` file from disk at the path
specified by `MODELS_MANIFEST_PATH`. The file lives on the main PC alongside
the model files and is accessible to orchestration via a network share
mounted at `/mnt/nexus-models`.

The manifest is read fresh on each request, so no restart is needed when models
are added or removed.

**models.json format:**
```json
[
  { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
]
```

- `value` — must match the model name as reported by `llama-server` (including `.gguf` extension)
- `label` — display name shown in the UI

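This sketch shows the read-on-every-request behaviour using Node's built-in `fs/promises`. The `getModels` name and the `{ status, body }` result are hypothetical stand-ins for whatever the route handler does. The per-entry `value`/`label` check is an extra safeguard this document does not promise. Any read, parse or shape error maps to `500`, matching the `GET /models` description.

```js
const fs = require('fs/promises');

// Hypothetical handler core: read models.json fresh on every call (no cache),
// so adding or removing models needs no restart.
async function getModels(manifestPath = process.env.MODELS_MANIFEST_PATH) {
  try {
    const raw = await fs.readFile(manifestPath, 'utf8');
    const models = JSON.parse(raw);
    if (!Array.isArray(models)) throw new Error('manifest must be a JSON array');
    for (const m of models) {
      if (typeof m.value !== 'string' || typeof m.label !== 'string') {
        throw new Error('each entry needs string "value" and "label" fields');
      }
    }
    return { status: 200, body: models };
  } catch (err) {
    // Unreadable file, invalid JSON or a malformed entry all map to 500.
    return { status: 500, body: { error: err.message } };
  }
}
```

Because nothing is cached, editing `models.json` on the share takes effect on the next request. The cost is one small file read per call.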
## Endpoints

@@ -142,6 +171,14 @@ the `{"done":true}` event.
|---|---|---|
| GET | /sessions | Get paginated list of all sessions |
| GET | /sessions/:sessionId/history | Get paginated episode history for a session |
| PATCH | /sessions/:sessionId | Rename a session |
| DELETE | /sessions/:sessionId | Delete a session and all its episodes |

### Models

| Method | Path | Description |
|---|---|---|
| GET | /models | Get list of available models from manifest file |

---

@@ -152,7 +189,7 @@ Request body:
{
  "sessionId": "your-session-uuid",
  "message": "Hello, my name is Tim.",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "temperature": 0.7
}
```
@@ -165,7 +202,7 @@ Response:
{
  "sessionId": "your-session-uuid",
  "response": "Hello Tim! How can I help you today?",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "tokenCount": 87
}
```
@@ -176,23 +213,34 @@ Response:

Same request body as `POST /chat`.

Response is a stream of Server-Sent Events:
```
data: {"text":"Hello"}
data: {"text":" Tim"}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87}
```

Concatenate the `text` fields in order to build the full response. The final
event carries `done: true` along with the model name and token count. The server
closes the connection after that event.

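This client-side sketch consumes the stream with `fetch` and a stream reader (Node 18+ or a modern browser). `streamChat`, its `onText` callback and the injectable `fetchImpl` are hypothetical. `url` is the full URL of this streaming endpoint. Partial lines are buffered because a network chunk can end in the middle of an event. The sketch assumes every event ends with a newline, as in the examples.

```js
// Hypothetical client helper: POST a chat request and accumulate the streamed
// `text` deltas until the `done` event arrives.
async function streamChat(url, body, { onText = () => {}, fetchImpl = fetch } = {}) {
  const res = await fetchImpl(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`stream request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  let text = '';
  let meta = null;

  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });

    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep a trailing partial line for the next read
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue; // blank separators between events
      const event = JSON.parse(line.slice('data: '.length));
      if (event.done) {
        meta = { model: event.model, tokenCount: event.tokenCount };
      } else if (typeof event.text === 'string') {
        text += event.text;
        onText(event.text); // e.g. update the message bubble incrementally
      }
    }
  }
  return { text, ...meta };
}
```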
---

**PATCH /sessions/:sessionId**

Request body:
```json
{ "name": "My Renamed Session" }
```

Returns the updated session object. `name` is required and trimmed of whitespace.

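This sketch shows the rename call from the client side. `renameSession`, its parameters and the injectable `fetchImpl` are hypothetical. `baseUrl` would be the orchestration base URL. Trimming on the client only duplicates the server's check, so that an empty name fails before making a round trip.

```js
// Hypothetical client helper for PATCH /sessions/:sessionId.
async function renameSession(baseUrl, sessionId, name, fetchImpl = fetch) {
  const trimmed = String(name ?? '').trim();
  // The server also trims and requires `name`; checking here saves a request.
  if (!trimmed) throw new Error('name is required');

  const res = await fetchImpl(`${baseUrl}/sessions/${encodeURIComponent(sessionId)}`, {
    method: 'PATCH',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ name: trimmed }),
  });
  if (!res.ok) throw new Error(`rename failed: ${res.status}`);
  return res.json(); // the updated session object
}
```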
|
---

**DELETE /sessions/:sessionId**

Returns `204 No Content`. Cascades to delete all episodes for the session.

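A successful delete returns `204 No Content`, so a client should check the status instead of calling `res.json()` on the empty body. This is a hypothetical sketch; `deleteSession` and `fetchImpl` are illustrative names. Treating any other status as an error is a choice made here, not documented behaviour. Episode cleanup happens on the server, so no follow-up requests are needed.

```js
// Hypothetical client helper for DELETE /sessions/:sessionId.
// Success is 204 No Content, so there is no body to parse.
async function deleteSession(baseUrl, sessionId, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/sessions/${encodeURIComponent(sessionId)}`, {
    method: 'DELETE',
  });
  if (res.status !== 204) throw new Error(`delete failed: ${res.status}`);
}
```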
---

**GET /sessions/:sessionId/history**

Query parameters:

| Parameter | Default | Description |
@@ -218,30 +266,17 @@ Response:
}
```

Episodes are ordered newest first.

---

**GET /models**

Returns the parsed contents of `models.json`:

```json
[
  { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
]
```

Returns `500` if the manifest file cannot be read or parsed.

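This client-side sketch falls back to a static list when the endpoint returns `500` or the request fails. The frontend structure lists a `FALLBACK_MODELS` constant and a `useModels.js` hook, but whether that hook behaves this way is an assumption. The fallback entry and the `fetchModels` name below are hypothetical.

```js
// Hypothetical fallback list; the real values would live in the frontend config.
const FALLBACK_MODELS = [{ value: 'local-model', label: 'Local model' }];

async function fetchModels(baseUrl, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(`${baseUrl}/models`);
    if (!res.ok) return FALLBACK_MODELS; // e.g. 500 when models.json is unreadable
    const models = await res.json();
    return Array.isArray(models) && models.length > 0 ? models : FALLBACK_MODELS;
  } catch {
    return FALLBACK_MODELS; // network error: keep the model picker usable
  }
}
```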
@@ -24,13 +24,40 @@ const DB = getEnv('SQLITE_PATH'); // required — throws if missing

---

### `parseRow(row)`

Parses a SQLite row object, deserialising any JSON-encoded `metadata` fields
into plain objects. Returns `null` if the row is `null` or `undefined`.

```js
const { parseRow } = require('@nexusai/shared');
// `db` is an open SQLite connection (e.g. a better-sqlite3 Database); `id` is a session's numeric id
const session = parseRow(db.prepare('SELECT * FROM sessions WHERE id = ?').get(id));
```

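For illustration, a function with the documented behaviour could look like the sketch below. It is hypothetical, not the `@nexusai/shared` source. It assumes `metadata` is stored as a JSON string or `NULL`. It keeps malformed JSON as the raw string rather than throwing; the documentation does not specify either behaviour.

```js
// Hypothetical re-implementation of the documented parseRow behaviour.
function parseRow(row) {
  if (row === null || row === undefined) return null;
  const parsed = { ...row }; // shallow copy: don't mutate the driver's row object
  if (typeof parsed.metadata === 'string') {
    try {
      parsed.metadata = JSON.parse(parsed.metadata);
    } catch {
      // Assumption: leave malformed JSON as the raw string.
    }
  }
  return parsed;
}
```

Returning a copy means callers can modify the parsed object without affecting anything the SQLite driver may reuse.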
|
---

### `formatEpisodeText(userMessage, aiResponse)`

Combines a user message and AI response into the canonical text format used
for embedding:

```
User: {userMessage}
Assistant: {aiResponse}
```

Used by the memory service's embedding write path to ensure consistent
vector representations across all episodes.

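Here is a one-line sketch of the documented format. It assumes a single newline separates the two lines and is illustrative rather than the package source. Routing every episode through one helper is what keeps the format stable. A stray label or spacing change in one call site would make new embeddings inconsistent with those already stored.

```js
// Hypothetical sketch of the canonical episode text (newline separator assumed).
function formatEpisodeText(userMessage, aiResponse) {
  return `User: ${userMessage}\nAssistant: ${aiResponse}`;
}

// Example: the text that would be embedded for one exchange.
const text = formatEpisodeText('Hello, my name is Tim.', 'Hello Tim! How can I help you today?');
```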
|
---

### Constants

Tuneable values and shared identifiers are centralised in `constants.js`
rather than hardcoded across services. Import the relevant group by name.

```js
const { QDRANT, COLLECTIONS, EPISODIC, LLAMACPP } = require('@nexusai/shared');
```

#### `QDRANT`
@@ -40,15 +67,14 @@ embedding model and Qdrant collection setup.

| Key | Value | Description |
|---|---|---|
| `DEFAULT_URL` | `http://localhost:6333` | Fallback Qdrant URL when `QDRANT_URL` is not set |
| `VECTOR_SIZE` | `768` | Output dimensions of `nomic-embed-text` |
| `DISTANCE_METRIC` | `'Cosine'` | Similarity metric used for all collections |
| `DEFAULT_LIMIT` | `10` | Default top-k for vector searches |

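As an example of these constants in use, this sketch creates a collection through Qdrant's REST API (`PUT /collections/{name}` with a `vectors` size and distance). `createCollection` and the injectable `fetchImpl` are hypothetical. The collection name in a real call would come from `COLLECTIONS`. `QDRANT_URL` is read before falling back to `DEFAULT_URL`. Real setup code would also need to handle a collection that already exists.

```js
// Mirrors the QDRANT constants above; in the services these come from @nexusai/shared.
const QDRANT = { DEFAULT_URL: 'http://localhost:6333', VECTOR_SIZE: 768, DISTANCE_METRIC: 'Cosine' };

// Hypothetical helper: create a collection sized for nomic-embed-text vectors.
async function createCollection(name, { baseUrl = process.env.QDRANT_URL || QDRANT.DEFAULT_URL, fetchImpl = fetch } = {}) {
  const res = await fetchImpl(`${baseUrl}/collections/${encodeURIComponent(name)}`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      vectors: { size: QDRANT.VECTOR_SIZE, distance: QDRANT.DISTANCE_METRIC },
    }),
  });
  if (!res.ok) throw new Error(`Qdrant returned ${res.status}`);
  return res.json();
}
```

If `VECTOR_SIZE` does not match the embedding model's output, Qdrant will reject upserts. This is why the size is kept next to the model name rather than repeated in each service.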
#### `COLLECTIONS`

Canonical Qdrant collection names.

| Key | Value |
|---|---|
@@ -65,6 +91,8 @@ Default pagination and result limits for SQLite episode queries.
| `DEFAULT_RECENT_LIMIT` | `10` | Default number of recent episodes to retrieve |
| `DEFAULT_PAGE_SIZE` | `20` | Default episodes per page for paginated queries |
| `DEFAULT_SEARCH_LIMIT` | `10` | Default number of FTS search results to return |
| `DEFAULT_OFFSET` | `0` | Default pagination offset |
| `DEFAULT_SESSIONS_LIMIT` | `20` | Default number of sessions to return |

#### `SERVICES`

@@ -74,3 +102,75 @@ when the corresponding environment variable is not set.
| Key | Value | Description |
|---|---|---|
| `EMBEDDING_URL` | `http://localhost:3003` | Fallback embedding service URL |
| `MEMORY_URL` | `http://localhost:3002` | Fallback memory service URL |
| `INFERENCE_URL` | `http://localhost:3001` | Fallback inference service URL |

#### `PORTS`

Default port numbers for each service.

| Key | Value |
|---|---|
| `INFERENCE` | `'3001'` |
| `MEMORY` | `'3002'` |
| `EMBEDDING` | `'3003'` |
| `ORCHESTRATION` | `'4000'` |

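The fallback pattern behind `SERVICES` and `PORTS` can be sketched as follows. The environment variable names `MEMORY_URL`, `INFERENCE_URL` and `PORT`, and the `orchestrationConfig` helper, are assumptions for illustration. Treating an empty string as unset is a design choice, not documented behaviour.

```js
// Mirrors the SERVICES and PORTS tables above (only the entries used here).
const SERVICES = { MEMORY_URL: 'http://localhost:3002', INFERENCE_URL: 'http://localhost:3001' };
const PORTS = { ORCHESTRATION: '4000' };

// Prefer the environment variable; fall back to the shared default.
// Empty strings count as "not set".
function fromEnv(env, name, fallback) {
  const value = env[name];
  return value !== undefined && value !== '' ? value : fallback;
}

function orchestrationConfig(env = process.env) {
  return {
    memoryUrl: fromEnv(env, 'MEMORY_URL', SERVICES.MEMORY_URL),
    inferenceUrl: fromEnv(env, 'INFERENCE_URL', SERVICES.INFERENCE_URL),
    // Env values and the PORTS defaults are both strings; convert once here.
    port: Number(fromEnv(env, 'PORT', PORTS.ORCHESTRATION)),
  };
}
```

Because environment values are always strings, string defaults let both paths go through the same code. The conversion to a number happens once at startup.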
|
#### `OLLAMA`

Ollama runtime defaults — used by the Ollama inference provider.

| Key | Value | Description |
|---|---|---|
| `DEFAULT_URL` | `http://localhost:11434` | Fallback Ollama URL |
| `EMBED_MODEL` | `'nomic-embed-text'` | Default embedding model |
| `OLLAMA_MODEL` | `'companion:latest'` | Default chat model |

#### `LLAMACPP`

llama.cpp runtime defaults — used by the llama.cpp inference provider.

| Key | Value | Description |
|---|---|---|
| `DEFAULT_URL` | `http://localhost:8080` | Fallback llama-server URL |
| `DEFAULT_MODEL` | `'local-model'` | Fallback model name (override via `DEFAULT_MODEL` env var) |

> Always set `DEFAULT_MODEL` in the inference service `.env` to the exact model
> name reported by `llama-server` (including `.gguf` extension). The shared
> constant is a last-resort fallback only.

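This sketch shows the startup check the note above recommends. It prefers the `DEFAULT_MODEL` environment variable and warns when the shared placeholder is used. `resolveDefaultModel` and the `.gguf` suffix warning are hypothetical additions, not part of the inference service as documented.

```js
// Mirrors LLAMACPP.DEFAULT_MODEL above.
const LLAMACPP = { DEFAULT_MODEL: 'local-model' };

// Hypothetical startup check: prefer the DEFAULT_MODEL env var and warn when
// falling back to the shared placeholder or when the name looks incomplete.
function resolveDefaultModel(env = process.env, warn = console.warn) {
  const model = env.DEFAULT_MODEL && env.DEFAULT_MODEL.trim();
  if (!model) {
    warn(`DEFAULT_MODEL is not set; falling back to '${LLAMACPP.DEFAULT_MODEL}'`);
    return LLAMACPP.DEFAULT_MODEL;
  }
  if (!model.endsWith('.gguf')) {
    warn(`DEFAULT_MODEL '${model}' has no .gguf extension; check the name llama-server reports`);
  }
  return model;
}
```

Passing `warn` in as a parameter keeps the check testable and lets a deployment route the message to its own logger.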
|
#### `INFERENCE_DEFAULTS`

Default inference parameters applied when not specified in a request.

| Key | Value | Description |
|---|---|---|
| `TEMPERATURE` | `0.7` | Controls randomness (0 = deterministic, higher = more varied) |
| `MAX_TOKENS` | `1024` | Maximum tokens to generate |
| `TOP_P` | `0.9` | Nucleus sampling probability mass |
| `TOP_K` | `40` | Number of highest-probability candidates considered at each step |
| `REPEAT_PENALTY` | `1.1` | Penalty for recently used tokens |
| `SEED` | `null` | `null` = random seed; set an integer for reproducible outputs |

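This sketch shows one way to apply these defaults to a request. Only `temperature` appears in the documented request body. The other field names (`maxTokens`, `topP`, `topK`, `repeatPenalty`, `seed`) and the `resolveInferenceParams` helper are assumptions. The nullish-coalescing operator `??` keeps a deliberate `temperature: 0`, which `||` would silently replace with `0.7`.

```js
// Mirrors INFERENCE_DEFAULTS above.
const INFERENCE_DEFAULTS = {
  TEMPERATURE: 0.7, MAX_TOKENS: 1024, TOP_P: 0.9, TOP_K: 40, REPEAT_PENALTY: 1.1, SEED: null,
};

// Hypothetical merge step: request values win, defaults fill the gaps.
function resolveInferenceParams(req = {}) {
  return {
    temperature: req.temperature ?? INFERENCE_DEFAULTS.TEMPERATURE,
    maxTokens: req.maxTokens ?? INFERENCE_DEFAULTS.MAX_TOKENS,
    topP: req.topP ?? INFERENCE_DEFAULTS.TOP_P,
    topK: req.topK ?? INFERENCE_DEFAULTS.TOP_K,
    repeatPenalty: req.repeatPenalty ?? INFERENCE_DEFAULTS.REPEAT_PENALTY,
    seed: req.seed ?? INFERENCE_DEFAULTS.SEED, // null means "random"
  };
}
```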
|
#### `ORCHESTRATION`

Orchestration pipeline defaults.

| Key | Value | Description |
|---|---|---|
| `RECENT_EPISODE_LIMIT` | `5` | Recent episodes to inject into prompt |
| `SEMANTIC_LIMIT` | `5` | Semantic search results to inject into prompt |
| `SCORE_THRESHOLD` | `0.75` | Minimum similarity score for semantic results |
| `CORS_ORIGIN` | `'http://localhost:5173'` | Fallback allowed CORS origin |
| `SYSTEM_PROMPT` | *(see below)* | Default system prompt |

Default system prompt:

> "You are a helpful, context-aware AI assistant. You have access to memories
> of past conversations with the user. Use them to provide consistent,
> personalised responses."

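This sketch shows how `SCORE_THRESHOLD` and `SEMANTIC_LIMIT` could be applied to vector-search hits. It assumes each hit has a numeric `score` where higher means more similar, as with cosine similarity in Qdrant. `selectSemantic` is a hypothetical name. Qdrant's search request also accepts a `score_threshold` parameter, so the filter could be pushed to the server instead.

```js
// Mirrors the relevant ORCHESTRATION constants above.
const ORCHESTRATION = { SEMANTIC_LIMIT: 5, SCORE_THRESHOLD: 0.75 };

// Hypothetical post-processing of hits shaped like { id, score }:
// drop weak matches, then keep the best SEMANTIC_LIMIT by score.
function selectSemantic(hits) {
  return hits
    .filter((h) => h.score >= ORCHESTRATION.SCORE_THRESHOLD) // new array, input untouched
    .sort((a, b) => b.score - a.score)
    .slice(0, ORCHESTRATION.SEMANTIC_LIMIT);
}
```

The threshold is applied before the limit. As a result, a conversation with few strong matches injects fewer than five episodes rather than padding the prompt with weak ones.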
|
#### `SQLITE`

| Key | Value | Description |
|---|---|---|
| `DEFAULT_PATH` | `'./data/nexusai.db'` | Fallback SQLite database path |