documentation updated for model inference settings
@@ -30,7 +30,10 @@ here for reference and direct debugging use.
|
|||||||
"temperature": 0.7
|
"temperature": 0.7
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
`model` and `temperature` are optional.
|
`model` and `temperature` are optional. Inference parameters (temperature,
|
||||||
|
topP, topK, repeatPenalty) are read from `settings.json` on every request —
|
||||||
|
the request body values are not used for these; they are controlled via
|
||||||
|
`PATCH /settings`.
|
||||||
|
|
||||||
**POST /chat — response:**
|
**POST /chat — response:**
|
||||||
```json
|
```json
|
||||||
@@ -110,9 +113,74 @@ Returns `201` with the created project object.
|
|||||||
|
|
||||||
| Method | Path | Description |
|
| Method | Path | Description |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
| GET | /models | Available models from `models.json` manifest |
|
| GET | /models | Available models scanned live from models folder |
|
||||||
|
| GET | /models/props | Live model props from llama-server (context window, loaded model) |
|
||||||
|
|
||||||
Returns array: `[{ "value": "model-name.gguf", "label": "Display Name" }]`
|
**GET /models** — returns array:
|
||||||
|
```json
[{ "value": "model-name.gguf", "label": "Display Name", "description": null, "size": "19.7 GB" }]
```
Scans `.gguf` files live from `modelsFolderPath` (set in settings). Merges
with `models.json` in the same folder for label and description metadata.

**GET /models/props** — returns:
```json
{ "contextWindow": 64000, "modelAlias": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf" }
```
Fetches directly from llama-server `/props`. Returns `503` if llama-server
is unreachable.
|
||||||
|
|
||||||
|
### Settings

| Method | Path | Description |
|---|---|---|
| GET | /settings | Get all current settings |
| PATCH | /settings | Update one or more settings |

**GET /settings — response:**
```json
{
  "recentEpisodeLimit": 9,
  "semanticLimit": 5,
  "scoreThreshold": 0.6,
  "modelsFolderPath": "/mnt/nexus-models",
  "temperature": 0.65,
  "repeatPenalty": 1.3,
  "topP": 0.9,
  "topK": 41
}
```

**PATCH /settings — body:** any subset of the above fields.

| Field | Type | Range | Description |
|---|---|---|---|
| `recentEpisodeLimit` | integer | 1–20 | Recent episodes injected into prompt |
| `semanticLimit` | integer | 1–20 | Max semantic search results |
| `scoreThreshold` | float | 0–1 | Minimum similarity score |
| `modelsFolderPath` | string | — | Path to folder containing .gguf files |
| `temperature` | float | 0–2 | Inference randomness |
| `repeatPenalty` | float | 1–2 | Repeat token penalty |
| `topP` | float | 0–1 | Nucleus sampling probability mass |
| `topK` | integer | 1–100 | Top-K token candidates per step |

Settings are persisted to `data/settings.json` and read on every request —
changes take effect immediately without a service restart.
|
||||||
|
|
||||||
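The field ranges in the table above can be checked before a `PATCH /settings` body is accepted or sent. The following is a minimal sketch, assuming a hypothetical `validateSettingsPatch` helper and `SETTING_RULES` table; neither name comes from the project, and whether the service itself enforces these ranges is not stated in this document.

```javascript
// Hypothetical rules table: the ranges mirror the PATCH /settings field table,
// but the constant and function names are illustrative assumptions.
const SETTING_RULES = {
  recentEpisodeLimit: { type: 'integer', min: 1, max: 20 },
  semanticLimit: { type: 'integer', min: 1, max: 20 },
  scoreThreshold: { type: 'float', min: 0, max: 1 },
  modelsFolderPath: { type: 'string' },
  temperature: { type: 'float', min: 0, max: 2 },
  repeatPenalty: { type: 'float', min: 1, max: 2 },
  topP: { type: 'float', min: 0, max: 1 },
  topK: { type: 'integer', min: 1, max: 100 },
};

// Returns a list of human-readable problems; an empty list means the patch is valid.
function validateSettingsPatch(patch) {
  const errors = [];
  for (const [key, value] of Object.entries(patch)) {
    const rule = SETTING_RULES[key];
    if (!rule) {
      errors.push(`${key}: unknown setting`);
      continue;
    }
    if (rule.type === 'string') {
      if (typeof value !== 'string' || value.length === 0) {
        errors.push(`${key}: expected a non-empty string`);
      }
      continue;
    }
    if (typeof value !== 'number' || !Number.isFinite(value)) {
      errors.push(`${key}: expected a number`);
      continue;
    }
    if (rule.type === 'integer' && !Number.isInteger(value)) {
      errors.push(`${key}: expected an integer`);
      continue;
    }
    if (value < rule.min || value > rule.max) {
      errors.push(`${key}: must be between ${rule.min} and ${rule.max}`);
    }
  }
  return errors;
}
```

Because the body may be any subset of the fields, the sketch only inspects keys that are present and never requires a key.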
|
### Episodes

| Method | Path | Description |
|---|---|---|
| GET | /episodes | Paginated episode list across all sessions |
| DELETE | /episodes/:id | Delete an episode (SQLite + Qdrant) |

**GET /episodes — query params:**

| Param | Default | Description |
|---|---|---|
| limit | 20 | Episodes per page |
| offset | 0 | Pagination offset |
| q | — | Keyword search (FTS) |
|
||||||
|
|
||||||
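A client might build the `GET /episodes` query string from the parameters above roughly as follows. This is a sketch only: `buildEpisodesUrl` and `offsetForPage` are hypothetical helper names, and the defaults are taken from the query params table.

```javascript
// Hypothetical helper: builds the GET /episodes URL using the documented
// defaults (limit 20, offset 0) and adds q only when a search term is given.
function buildEpisodesUrl({ limit = 20, offset = 0, q } = {}) {
  const params = new URLSearchParams({
    limit: String(limit),
    offset: String(offset),
  });
  if (q) params.set('q', q);
  return `/episodes?${params.toString()}`;
}

// Converts a 1-based page number into the offset expected by the endpoint.
function offsetForPage(page, limit = 20) {
  return (page - 1) * limit;
}

console.log(buildEpisodesUrl());
// prints "/episodes?limit=20&offset=0"
console.log(buildEpisodesUrl({ offset: offsetForPage(3), q: 'capital of France' }));
// prints "/episodes?limit=20&offset=40&q=capital+of+France"
```

`URLSearchParams` handles the encoding of the search term, so the caller does not need to escape spaces or punctuation by hand.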
---
|
---
|
||||||
|
|
||||||
@@ -158,10 +226,11 @@ are not touched.
|
|||||||
| Method | Path | Description |
|
| Method | Path | Description |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
| POST | /episodes | Create episode + auto-embed into Qdrant |
|
| POST | /episodes | Create episode + auto-embed into Qdrant |
|
||||||
|
| GET | /episodes | Paginated episode list across all sessions |
|
||||||
| GET | /episodes/search?q=&limit= | FTS keyword search across all episodes |
|
| GET | /episodes/search?q=&limit= | FTS keyword search across all episodes |
|
||||||
| GET | /episodes/:id | Get episode by ID |
|
| GET | /episodes/:id | Get episode by ID |
|
||||||
| GET | /sessions/:id/episodes?limit=&offset= | Paginated episodes for a session |
|
| GET | /sessions/:id/episodes?limit=&offset= | Paginated episodes for a session |
|
||||||
| DELETE | /episodes/:id | Delete an episode |
|
| DELETE | /episodes/:id | Delete episode (SQLite + Qdrant cleanup) |
|
||||||
|
|
||||||
> Route ordering: `/episodes/search` must be defined before `/episodes/:id`.
|
> Route ordering: `/episodes/search` must be defined before `/episodes/:id`.
|
||||||
|
|
||||||
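The route ordering note follows from first-match routing: handlers are tried in registration order, so a parameterised path registered first also captures the literal `search` segment. The sketch below illustrates this with a tiny stand-in router written for this example; it is not Express and is not taken from the project.

```javascript
// Minimal first-match router, written only to illustrate the ordering rule.
// It mimics "the first registered route that matches wins".
function createRouter() {
  const routes = [];
  return {
    get(pattern, name) {
      // Turn ":id"-style segments into a wildcard for one path segment.
      const regex = new RegExp('^' + pattern.replace(/:[^/]+/g, '[^/]+') + '$');
      routes.push({ regex, name });
    },
    match(path) {
      const hit = routes.find((route) => route.regex.test(path));
      return hit ? hit.name : null;
    },
  };
}

// Literal route first: /episodes/search reaches the search handler.
const correct = createRouter();
correct.get('/episodes/search', 'search');
correct.get('/episodes/:id', 'getById');

// Parameterised route first: "search" is treated as an episode ID.
const wrong = createRouter();
wrong.get('/episodes/:id', 'getById');
wrong.get('/episodes/search', 'search');

console.log(correct.match('/episodes/search')); // prints "search"
console.log(wrong.match('/episodes/search'));   // prints "getById"
```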
@@ -266,10 +335,14 @@ is awkward to encode in a path.
|
|||||||
"prompt": "What is the capital of France?",
|
"prompt": "What is the capital of France?",
|
||||||
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
|
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
|
||||||
"temperature": 0.7,
|
"temperature": 0.7,
|
||||||
"maxTokens": 1024
|
"maxTokens": 1024,
|
||||||
|
"topP": 0.9,
|
||||||
|
"topK": 40,
|
||||||
|
"repeatPenalty": 1.1
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
All fields except `prompt` are optional.
|
All fields except `prompt` are optional. In normal usage these are forwarded
|
||||||
|
from orchestration, which reads them from `settings.json`.
|
||||||
|
|
||||||
**POST /complete — response:**
|
**POST /complete — response:**
|
||||||
```json
|
```json
|
||||||
|
|||||||
@@ -14,6 +14,7 @@ inference services. Served as static files by Caddy on Mini PC 2.
|
|||||||
## Dependencies
|
## Dependencies
|
||||||
|
|
||||||
- `react` + `react-dom` — UI framework
|
- `react` + `react-dom` — UI framework
|
||||||
|
- `react-markdown` — Markdown rendering in message bubbles and memory viewer
|
||||||
- `uuid` — session ID generation
|
- `uuid` — session ID generation
|
||||||
- `vite` + `@vitejs/plugin-react` — build tooling
|
- `vite` + `@vitejs/plugin-react` — build tooling
|
||||||
|
|
||||||
@@ -63,13 +64,16 @@ export default defineConfig({
|
|||||||
'/sessions': 'http://192.168.0.205:4000',
|
'/sessions': 'http://192.168.0.205:4000',
|
||||||
'/chat': 'http://192.168.0.205:4000',
|
'/chat': 'http://192.168.0.205:4000',
|
||||||
'/projects': 'http://192.168.0.205:4000',
|
'/projects': 'http://192.168.0.205:4000',
|
||||||
|
'/episodes': 'http://192.168.0.205:4000',
|
||||||
|
'/settings': 'http://192.168.0.205:4000',
|
||||||
|
'/health': 'http://192.168.0.205:4000',
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
```
|
```
|
||||||
|
|
||||||
When adding new top-level routes to the orchestration service, add a matching
|
When adding new top-level routes to the orchestration service, add a matching
|
||||||
entry here too.
|
entry here and in the Caddy config.
|
||||||
|
|
||||||
## Internal Structure
|
## Internal Structure
|
||||||
|
|
||||||
@@ -84,19 +88,22 @@ src/
|
|||||||
│ ├── useChat.js # Message sending, SSE streaming, message state
|
│ ├── useChat.js # Message sending, SSE streaming, message state
|
||||||
│ ├── useModels.js # Dynamic model list fetched from /models endpoint
|
│ ├── useModels.js # Dynamic model list fetched from /models endpoint
|
||||||
│ ├── useProjects.js # Project list fetched from /projects endpoint
|
│ ├── useProjects.js # Project list fetched from /projects endpoint
|
||||||
|
│ ├── useSettings.js # Settings fetch + saveSetting helper
|
||||||
│ └── useContextMenu.js # Right-click context menu position and visibility
|
│ └── useContextMenu.js # Right-click context menu position and visibility
|
||||||
├── components/
|
├── components/
|
||||||
│ ├── App.jsx # Root component — layout, shared state, view routing
|
│ ├── App.jsx # Root component — layout, shared state, view routing
|
||||||
│ ├── Sidebar.jsx # Left sidebar — projects, recent chats, navigation
|
│ ├── Sidebar.jsx # Left sidebar — projects, recent chats, navigation
|
||||||
│ ├── ChatWindow.jsx # Centre panel — message thread and input bar
|
│ ├── ChatWindow.jsx # Centre panel — message thread and input bar
|
||||||
│ ├── MessageBubble.jsx # Individual message bubble (user or assistant)
|
│ ├── MessageBubble.jsx # Individual message bubble — renders markdown via react-markdown
|
||||||
│ ├── InfoPanel.jsx # Right panel — model selector and session metadata (slide-in)
|
│ ├── InfoPanel.jsx # Right panel — model selector and session metadata (slide-in)
|
||||||
│ ├── SessionModal.jsx # Modal for session rename, project assignment, delete
|
│ ├── SessionModal.jsx # Modal for session rename, project assignment, delete
|
||||||
│ ├── ProjectModal.jsx # Modal for project create, edit, delete
|
│ ├── ProjectModal.jsx # Modal for project create, edit, delete
|
||||||
│ ├── AllChatsView.jsx # Full paginated session list with multi-select bulk delete
|
│ ├── AllChatsView.jsx # Full paginated session list with multi-select bulk delete
|
||||||
│ ├── AllProjectsView.jsx # Project tile grid with create/edit/delete
|
│ ├── AllProjectsView.jsx # Project tile grid with create/edit/delete
|
||||||
│ ├── ProjectView.jsx # Individual project — session list, new chat button
|
│ ├── ProjectView.jsx # Individual project — session list, new chat button
|
||||||
│ └── SettingsView.jsx # Settings placeholder (Appearance, Memory, Models, About)
|
│ ├── MemoryView.jsx # Paginated, searchable, expandable, deletable episode viewer
|
||||||
|
│ └── SettingsView.jsx # Settings — Memory limits, Models (inference params, active
|
||||||
|
│ # model, context window), Service Health, Appearance placeholder
|
||||||
├── index.css # Global reset, CSS variables, utility classes
|
├── index.css # Global reset, CSS variables, utility classes
|
||||||
└── main.jsx # React entry point
|
└── main.jsx # React entry point
|
||||||
```
|
```
|
||||||
@@ -118,7 +125,7 @@ panel are persistent across all views.
|
|||||||
│ ⊞ View Projects │ all-projects → AllProjectsView│
|
│ ⊞ View Projects │ all-projects → AllProjectsView│
|
||||||
│ │ project → ProjectView │
|
│ │ project → ProjectView │
|
||||||
│ PROJECTS ▾ │ settings → SettingsView │
|
│ PROJECTS ▾ │ settings → SettingsView │
|
||||||
│ [tile] [tile] │ │
|
│ [tile] [tile] │ memory → MemoryView │
|
||||||
│ All Projects → │ │
|
│ All Projects → │ │
|
||||||
│ │ │
|
│ │ │
|
||||||
│ RECENT CHATS ▾ │ │
|
│ RECENT CHATS ▾ │ │
|
||||||
@@ -143,6 +150,7 @@ via the `⊹` button in the `ChatWindow` header.
|
|||||||
| `'all-projects'` | `AllProjectsView` | "View Projects" button or ⊞ icon |
|
| `'all-projects'` | `AllProjectsView` | "View Projects" button or ⊞ icon |
|
||||||
| `'project'` | `ProjectView` | Clicking a project tile in the sidebar |
|
| `'project'` | `ProjectView` | Clicking a project tile in the sidebar |
|
||||||
| `'settings'` | `SettingsView` | Settings button or ⚙ icon |
|
| `'settings'` | `SettingsView` | Settings button or ⚙ icon |
|
||||||
|
| `'memory'` | `MemoryView` | "Open →" button in Settings → Memory section |
|
||||||
|
|
||||||
`activeProject` state in `App.jsx` tracks which project `ProjectView` is
|
`activeProject` state in `App.jsx` tracks which project `ProjectView` is
|
||||||
displaying. Set via `onSelectProject` before navigating to `'project'`.
|
displaying. Set via `onSelectProject` before navigating to `'project'`.
|
||||||
@@ -262,3 +270,18 @@ and a filtered session list. The "+ New Chat" button creates a new session,
|
|||||||
navigates to `'chat'`, and writes the project assignment after the first message.
|
navigates to `'chat'`, and writes the project assignment after the first message.
|
||||||
|
|
||||||
For memory isolation behaviour, see `memory-isolation.md`.
|
For memory isolation behaviour, see `memory-isolation.md`.
|
||||||
|
|
||||||
|
## Settings

`useSettings` fetches from `GET /settings` on mount and exposes a `saveSetting(key, value)`
helper that issues a `PATCH /settings` with a single key-value pair. The `saving`
boolean is exposed for disabling save buttons during in-flight requests.

`SettingsView` is organised into sections:

- **Memory** — recent episode limit, semantic limit, score threshold, link to MemoryView
- **Models** — models folder path, temperature, repeat penalty, Top-P, Top-K,
  active model dropdown, read-only model info panel (file, size, context window,
  loaded model from llama-server)
- **About** — service health check panel, version
- **Appearance** — theme (coming soon)
|
||||||
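The `saveSetting(key, value)` helper described in this section could issue its single-pair `PATCH /settings` request along these lines. This is a sketch under assumptions: `buildSaveSettingRequest` is a hypothetical name, the injected `fetchImpl` parameter exists only to make the example testable, the response is assumed to be JSON, and the React state handling (including the `saving` flag) is left out.

```javascript
// Hypothetical request builder: one key-value pair per PATCH, as described above.
function buildSaveSettingRequest(key, value) {
  return {
    url: '/settings',
    init: {
      method: 'PATCH',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ [key]: value }),
    },
  };
}

// Sketch of the save helper. fetchImpl defaults to the global fetch and can be
// replaced with a stub; treating the response body as JSON is an assumption.
async function saveSetting(key, value, fetchImpl = fetch) {
  const { url, init } = buildSaveSettingRequest(key, value);
  const res = await fetchImpl(url, init);
  if (!res.ok) {
    throw new Error(`PATCH /settings failed with status ${res.status}`);
  }
  return res.json();
}
```

In a hook, the `saving` boolean would typically be set to `true` before the call and reset in a `finally` block so that buttons re-enable after failures as well.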
@@ -54,6 +54,11 @@ INFERENCE_URL=http://localhost:8080
|
|||||||
The provider loader throws immediately on an unknown value, preventing silent
|
The provider loader throws immediately on an unknown value, preventing silent
|
||||||
misconfiguration.
|
misconfiguration.
|
||||||
|
|
||||||
|
> **LM Studio compatibility note:** LM Studio exposes an OpenAI-compatible
|
||||||
|
> `/v1/chat/completions` endpoint with the same request shape as llama.cpp.
|
||||||
|
> A future `lmstudio.js` provider would be nearly identical to `llamacpp.js` —
|
||||||
|
> only the `BASE_URL` would differ. No architectural changes required.
|
||||||
|
|
||||||
## Internal Structure
|
## Internal Structure
|
||||||
|
|
||||||
```
|
```
|
||||||
@@ -109,14 +114,19 @@ Set `DEFAULT_MODEL` in `.env` to the exact reported name.
|
|||||||
|
|
||||||
### Inference Parameters
|
### Inference Parameters
|
||||||
|
|
||||||
| NexusAI option | API field | Default |
|
All parameters are resolved in `resolveOptions()` — falling back to
|
||||||
|---|---|---|
|
`INFERENCE_DEFAULTS` from `@nexusai/shared` if not provided in the request.
|
||||||
| `temperature` | `temperature` | 0.7 |
|
In normal usage, orchestration reads these from `settings.json` and forwards
|
||||||
| `maxTokens` | `max_tokens` | 1024 |
|
them on every request.
|
||||||
| `topP` | `top_p` | 0.9 |
|
|
||||||
| `topK` | `top_k` | 40 |
|
| NexusAI option | API field | Default | Description |
|
||||||
| `repeatPenalty` | `repeat_penalty` | 1.1 |
|
|---|---|---|---|
|
||||||
| `seed` | `seed` | null (random) |
|
| `temperature` | `temperature` | 0.7 | Response randomness (0 = deterministic) |
|
||||||
|
| `maxTokens` | `max_tokens` | 1024 | Max tokens to generate |
|
||||||
|
| `topP` | `top_p` | 0.9 | Nucleus sampling probability mass |
|
||||||
|
| `topK` | `top_k` | 40 | Top-K token candidates per step |
|
||||||
|
| `repeatPenalty` | `repeat_penalty` | 1.1 | Penalty for recently used tokens |
|
||||||
|
| `seed` | `seed` | null | null = random; integer for reproducible output |
|
||||||
|
|
||||||
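The fallback and field mapping in the table above can be sketched as a small pure function. The defaults and API field names are copied from the table; the body of `resolveOptions` shown here is an illustration rather than the provider's actual code, and omitting a `null` seed from the payload is an assumption.

```javascript
// Fallback values copied from the Inference Parameters table. In the project
// these are said to come from INFERENCE_DEFAULTS in @nexusai/shared.
const INFERENCE_DEFAULTS = {
  temperature: 0.7,
  maxTokens: 1024,
  topP: 0.9,
  topK: 40,
  repeatPenalty: 1.1,
  seed: null,
};

// NexusAI option name -> API field name, as listed in the table.
const API_FIELD = {
  temperature: 'temperature',
  maxTokens: 'max_tokens',
  topP: 'top_p',
  topK: 'top_k',
  repeatPenalty: 'repeat_penalty',
  seed: 'seed',
};

// Illustrative resolver: request values win, defaults fill the gaps.
function resolveOptions(options = {}) {
  const resolved = {};
  for (const [key, apiField] of Object.entries(API_FIELD)) {
    // ?? keeps explicit falsy values such as temperature: 0.
    const value = options[key] ?? INFERENCE_DEFAULTS[key];
    // Assumption: a null seed means "random", so the field is left out.
    if (value !== null) resolved[apiField] = value;
  }
  return resolved;
}
```

Using `??` instead of `||` matters here because `temperature: 0` is a valid, deterministic setting and must not be replaced by the default.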
## Streaming Response Format
|
## Streaming Response Format
|
||||||
|
|
||||||
|
|||||||
@@ -27,9 +27,10 @@ or inference services — all traffic flows through orchestration.
|
|||||||
| MEMORY_SERVICE_URL | No | http://localhost:3002 | Memory service URL |
|
| MEMORY_SERVICE_URL | No | http://localhost:3002 | Memory service URL |
|
||||||
| EMBEDDING_SERVICE_URL | No | http://localhost:3003 | Embedding service URL |
|
| EMBEDDING_SERVICE_URL | No | http://localhost:3003 | Embedding service URL |
|
||||||
| INFERENCE_SERVICE_URL | No | http://localhost:3001 | Inference service URL |
|
| INFERENCE_SERVICE_URL | No | http://localhost:3001 | Inference service URL |
|
||||||
|
| LLAMA_SERVER_URL | No | http://localhost:8080 | Direct llama-server URL for /models/props |
|
||||||
| QDRANT_URL | No | http://localhost:6333 | Qdrant URL for semantic search |
|
| QDRANT_URL | No | http://localhost:6333 | Qdrant URL for semantic search |
|
||||||
| CORS_ORIGIN | No | http://localhost:5173 | Allowed origin for CORS requests |
|
| CORS_ORIGIN | No | http://localhost:5173 | Allowed origin for CORS requests |
|
||||||
| MODELS_MANIFEST_PATH | Yes | — | Path to `models.json` manifest file |
|
| MODELS_MANIFEST_PATH | No | — | Legacy — superseded by `modelsFolderPath` in settings.json |
|
||||||
|
|
||||||
## Internal Structure
|
## Internal Structure
|
||||||
|
|
||||||
@@ -42,17 +43,42 @@ src/
|
|||||||
│ └── qdrant.js # HTTP client for Qdrant (direct vector search)
|
│ └── qdrant.js # HTTP client for Qdrant (direct vector search)
|
||||||
├── chat/
|
├── chat/
|
||||||
│ └── index.js # Core pipeline — context assembly, isolation, auto-naming
|
│ └── index.js # Core pipeline — context assembly, isolation, auto-naming
|
||||||
|
├── config/
|
||||||
|
│ └── settings.js # Settings load/save — reads/writes data/settings.json
|
||||||
├── routes/
|
├── routes/
|
||||||
│ ├── chat.js # POST /chat and POST /chat/stream
|
│ ├── chat.js # POST /chat and POST /chat/stream
|
||||||
│ ├── sessions.js # Session CRUD proxy
|
│ ├── sessions.js # Session CRUD proxy
|
||||||
│ ├── projects.js # Project CRUD proxy
|
│ ├── projects.js # Project CRUD proxy
|
||||||
│ └── models.js # GET /models — reads models.json from disk
|
│ ├── episodes.js # Episode list and delete proxy
|
||||||
|
│ ├── settings.js # GET /settings and PATCH /settings
|
||||||
|
│ ├── health.js # GET /health — pings all four services
|
||||||
|
│ └── models.js # GET /models — scans .gguf files live, merges with models.json
|
||||||
|
# GET /models/props — context window + loaded model from llama-server
|
||||||
└── index.js # Express app entry point
|
└── index.js # Express app entry point
|
||||||
```
|
```
|
||||||
|
|
||||||
The `services/` layer wraps all downstream HTTP calls in named functions.
|
The `services/` layer wraps all downstream HTTP calls in named functions.
|
||||||
URL or endpoint changes have a single place to be updated.
|
URL or endpoint changes have a single place to be updated.
|
||||||
|
|
||||||
|
## Settings

Settings are persisted to `data/settings.json` and loaded on every request
via `appSettings.load()` — changes apply immediately without a service restart.

| Setting | Default | Description |
|---|---|---|
| `recentEpisodeLimit` | 5 | Recent episodes injected into prompt |
| `semanticLimit` | 5 | Semantic search results injected into prompt |
| `scoreThreshold` | 0.75 | Minimum similarity score for semantic results |
| `modelsFolderPath` | `/mnt/nexus-models` | Path to folder containing .gguf files |
| `temperature` | 0.7 | Inference temperature |
| `repeatPenalty` | 1.1 | Repeat token penalty |
| `topP` | 0.9 | Nucleus sampling probability mass |
| `topK` | 40 | Top-K token candidates per step |

Defaults are defined in `config/settings.js` and fall back to constants in
`@nexusai/shared`. Values saved in `settings.json` take precedence.
|
||||||
|
|
||||||
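The precedence rule for settings (saved values over defaults) amounts to a shallow merge. The sketch below uses a hypothetical `mergeSettings` function that takes the raw file contents as a string, so it runs without touching the disk; the real `appSettings.load()` is not shown in this document, and falling back to defaults on unreadable JSON is an assumption.

```javascript
// Defaults copied from the settings table above.
const DEFAULT_SETTINGS = {
  recentEpisodeLimit: 5,
  semanticLimit: 5,
  scoreThreshold: 0.75,
  modelsFolderPath: '/mnt/nexus-models',
  temperature: 0.7,
  repeatPenalty: 1.1,
  topP: 0.9,
  topK: 40,
};

// Hypothetical stand-in for the load step: values saved in settings.json
// override the defaults, and missing keys keep their default value.
function mergeSettings(savedJson) {
  let saved = {};
  try {
    saved = savedJson ? JSON.parse(savedJson) : {};
  } catch {
    // Assumption: a corrupt or unreadable file falls back to the defaults.
    saved = {};
  }
  return { ...DEFAULT_SETTINGS, ...saved };
}

console.log(mergeSettings('{"topK": 41, "scoreThreshold": 0.6}').topK); // prints 41
```

Running this merge on every request is what lets a `PATCH /settings` change apply without a restart: nothing is cached between requests.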
## Chat Pipeline
|
## Chat Pipeline
|
||||||
|
|
||||||
Both `POST /chat` and `POST /chat/stream` share the same steps. The only
|
Both `POST /chat` and `POST /chat/stream` share the same steps. The only
|
||||||
@@ -69,11 +95,11 @@ difference is how the inference response is delivered to the client.
|
|||||||
`memory-isolation.md` for full behaviour.
|
`memory-isolation.md` for full behaviour.
|
||||||
|
|
||||||
3. **Recent episode retrieval** — fetch the most recent episodes for the
|
3. **Recent episode retrieval** — fetch the most recent episodes for the
|
||||||
session (`RECENT_EPISODE_LIMIT`, default 5).
|
session (`recentEpisodeLimit`, default 5).
|
||||||
|
|
||||||
4. **Semantic search** — embed the user message, query Qdrant for the top-5
|
4. **Semantic search** — embed the user message, query Qdrant for the top
|
||||||
most similar past episodes (`SCORE_THRESHOLD` 0.75). Deduplicated against
|
most similar past episodes (`semanticLimit`, `scoreThreshold`). Deduplicated
|
||||||
recent episodes. Non-critical — if it fails, pipeline continues with
|
against recent episodes. Non-critical — if it fails, pipeline continues with
|
||||||
recency-only context.
|
recency-only context.
|
||||||
|
|
||||||
5. **Entity search** — reuse the embedded user message vector to query the
|
5. **Entity search** — reuse the embedded user message vector to query the
|
||||||
@@ -84,7 +110,8 @@ difference is how the inference response is delivered to the client.
|
|||||||
6. **Prompt assembly** — combine system prompt, entity context, semantic
|
6. **Prompt assembly** — combine system prompt, entity context, semantic
|
||||||
episodes, recent episodes, and user message.
|
episodes, recent episodes, and user message.
|
||||||
|
|
||||||
7. **Inference** — send to inference service. `/chat` awaits full response;
|
7. **Inference** — send to inference service with settings-derived parameters
|
||||||
|
(temperature, topP, topK, repeatPenalty). `/chat` awaits full response;
|
||||||
`/chat/stream` pipes SSE chunks to the client.
|
`/chat/stream` pipes SSE chunks to the client.
|
||||||
|
|
||||||
8. **Episode write** — write the exchange back to memory. Fire-and-forget
|
8. **Episode write** — write the exchange back to memory. Fire-and-forget
|
||||||
@@ -107,12 +134,12 @@ Here is what you know about entities relevant to this conversation:
|
|||||||
Here are some relevant memories from earlier conversations:
|
Here are some relevant memories from earlier conversations:
|
||||||
User: {past user message}
|
User: {past user message}
|
||||||
Assistant: {past ai response}
|
Assistant: {past ai response}
|
||||||
... (up to 5 semantic episodes)
|
... (up to semanticLimit semantic episodes)
|
||||||
---
|
---
|
||||||
Here are some relevant memories from your past conversations:
|
Here are some relevant memories from your past conversations:
|
||||||
User: {past user message}
|
User: {past user message}
|
||||||
Assistant: {past ai response}
|
Assistant: {past ai response}
|
||||||
... (up to 5 recent episodes)
|
... (up to recentEpisodeLimit recent episodes)
|
||||||
--- End of recent memories ---
|
--- End of recent memories ---
|
||||||
|
|
||||||
User: {current message}
|
User: {current message}
|
||||||
@@ -141,20 +168,16 @@ data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":42}
|
|||||||
The `[DONE]` sentinel is consumed internally and not forwarded. The stream
|
The `[DONE]` sentinel is consumed internally and not forwarded. The stream
|
||||||
is terminated by `res.end()` after the done event.
|
is terminated by `res.end()` after the done event.
|
||||||
|
|
||||||
## Models Manifest
|
## Models Route
|
||||||
|
|
||||||
`GET /models` reads `models.json` fresh on each request from
|
`GET /models` scans `.gguf` files live on each request from `modelsFolderPath`
|
||||||
`MODELS_MANIFEST_PATH`. The file lives on the main PC alongside model files,
|
(read from settings). Merges results with a `models.json` file in the same
|
||||||
accessible via an SMB mount at `/mnt/nexus-models`.
|
folder for richer metadata (label, description). Returns file size in GB.
|
||||||
|
|
||||||
```json
|
`GET /models/props` fetches directly from llama-server via `LLAMA_SERVER_URL`.
|
||||||
[
|
Returns `{ contextWindow, modelAlias }`. Used by the client to display
|
||||||
{ "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
|
read-only context window size and the currently loaded model in the settings
|
||||||
]
|
panel. Returns `503` if llama-server is unreachable.
|
||||||
```
|
|
||||||
|
|
||||||
`value` must match the model name as reported by `llama-server` (including
|
|
||||||
`.gguf` extension). No service restart needed when models are added or removed.
|
|
||||||
|
|
||||||
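The scan-and-merge behaviour of `GET /models` described in this section can be sketched as a pure function over a directory listing. Everything here is illustrative: `buildModelList` and `formatSizeGB` are hypothetical names, the input shape (`name`, `sizeBytes`) is assumed, sizes are assumed to use decimal gigabytes, and the label fallback to the file name is a guess at reasonable behaviour when `models.json` has no entry.

```javascript
// Formats a byte count as decimal gigabytes with one decimal place.
// Assumption: the service reports GB (10^9 bytes) rather than GiB.
function formatSizeGB(bytes) {
  return `${(bytes / 1e9).toFixed(1)} GB`;
}

// files: [{ name, sizeBytes }] from a folder scan (shape is an assumption).
// manifest: parsed models.json entries of the form { value, label, description }.
function buildModelList(files, manifest = []) {
  const metaByValue = new Map(manifest.map((entry) => [entry.value, entry]));
  return files
    .filter((file) => file.name.endsWith('.gguf'))
    .map((file) => {
      const meta = metaByValue.get(file.name) || {};
      return {
        value: file.name,
        // Fall back to the file name without its extension when no label exists.
        label: meta.label || file.name.replace(/\.gguf$/, ''),
        description: meta.description ?? null,
        size: formatSizeGB(file.sizeBytes),
      };
    });
}
```

Because the list is rebuilt from the folder contents on each call, a model file without a manifest entry still appears, only with less metadata.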
## Sessions Route Behaviour
|
## Sessions Route Behaviour
|
||||||
|
|
||||||
@@ -179,6 +202,9 @@ handle /chat* { reverse_proxy localhost:4000 }
|
|||||||
handle /sessions* { reverse_proxy localhost:4000 }
|
handle /sessions* { reverse_proxy localhost:4000 }
|
||||||
handle /models* { reverse_proxy localhost:4000 }
|
handle /models* { reverse_proxy localhost:4000 }
|
||||||
handle /projects* { reverse_proxy localhost:4000 }
|
handle /projects* { reverse_proxy localhost:4000 }
|
||||||
|
handle /episodes* { reverse_proxy localhost:4000 }
|
||||||
|
handle /settings* { reverse_proxy localhost:4000 }
|
||||||
|
handle /health* { reverse_proxy localhost:4000 }
|
||||||
```
|
```
|
||||||
|
|
||||||
After updating: `caddy reload --config /path/to/Caddyfile`
|
After updating: `caddy reload --config /path/to/Caddyfile`
|
||||||
|
|||||||
@@ -142,6 +142,9 @@ llama.cpp runtime defaults — used by the llama.cpp inference provider.
|
|||||||
#### `INFERENCE_DEFAULTS`
|
#### `INFERENCE_DEFAULTS`
|
||||||
|
|
||||||
Default inference parameters applied when not specified in a request.
|
Default inference parameters applied when not specified in a request.
|
||||||
|
These are used as fallbacks in `resolveOptions()` in both providers.
|
||||||
|
Orchestration reads live values from `settings.json` and forwards them
|
||||||
|
on every request — these constants are the fallback layer only.
|
||||||
|
|
||||||
| Key | Value | Description |
|
| Key | Value | Description |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
@@ -154,16 +157,22 @@ Default inference parameters applied when not specified in a request.
|
|||||||
|
|
||||||
#### `ORCHESTRATION`
|
#### `ORCHESTRATION`
|
||||||
|
|
||||||
Orchestration pipeline defaults.
|
Orchestration pipeline defaults. Used as fallback values in
|
||||||
|
`config/settings.js` when `settings.json` doesn't contain a key.
|
||||||
|
|
||||||
| Key | Value | Description |
|
| Key | Value | Description |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
| `RECENT_EPISODE_LIMIT` | `5` | Recent episodes to inject into prompt |
|
| `RECENT_EPISODE_LIMIT` | `5` | Recent episodes to inject into prompt |
|
||||||
| `SEMANTIC_LIMIT` | `5` | Semantic search results to inject into prompt |
|
| `SEMANTIC_LIMIT` | `5` | Semantic search results to inject into prompt |
|
||||||
| `SCORE_THRESHOLD` | `0.75` | Minimum similarity score for semantic results |
|
| `SCORE_THRESHOLD` | `0.75` | Minimum similarity score for semantic results |
|
||||||
|
| `TEMPERATURE` | `0.7` | Default inference temperature |
|
||||||
| `CORS_ORIGIN` | `'http://localhost:5173'` | Fallback allowed CORS origin |
|
| `CORS_ORIGIN` | `'http://localhost:5173'` | Fallback allowed CORS origin |
|
||||||
| `SYSTEM_PROMPT` | *(see below)* | Default system prompt |
|
| `SYSTEM_PROMPT` | *(see below)* | Default system prompt |
|
||||||
|
|
||||||
|
> `repeatPenalty`, `topP`, and `topK` defaults are sourced from
|
||||||
|
> `INFERENCE_DEFAULTS` in `config/settings.js` rather than `ORCHESTRATION`,
|
||||||
|
> since those constants already define the canonical values.
|
||||||
|
|
||||||
Default system prompt:
|
Default system prompt:
|
||||||
> "You are a helpful, context-aware AI assistant. You have access to memories
|
> "You are a helpful, context-aware AI assistant. You have access to memories
|
||||||
> of past conversations with the user. Use them to provide consistent,
|
> of past conversations with the user. Use them to provide consistent,
|
||||||
|
|||||||