documentation updated for model inference settings

Storme-bit
2026-04-18 06:41:50 -07:00
parent c198a00dde
commit 44989a2b8b
5 changed files with 182 additions and 41 deletions

@@ -30,7 +30,10 @@ here for reference and direct debugging use.
"temperature": 0.7
}
```
`model` and `temperature` are optional.
`model` and `temperature` are optional. Inference parameters (temperature,
topP, topK, repeatPenalty) are read from `settings.json` on every request —
the request body values are not used for these; they are controlled via
`PATCH /settings`.
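As a rough client-side illustration of the rule above: a caller that used to put sampling values in the chat body can move them into a `PATCH /settings` body instead. The helper name `split_inference_params` is hypothetical and not part of the service; only the four parameter names come from this document.

```python
# Inference parameters that the docs say are read from settings.json,
# not from the POST /chat request body.
INFERENCE_KEYS = ("temperature", "topP", "topK", "repeatPenalty")


def split_inference_params(body: dict) -> tuple:
    """Separate a would-be chat body into (chat_body, settings_patch).

    Hypothetical helper: chat_body is what remains for POST /chat,
    settings_patch is what would be sent to PATCH /settings beforehand.
    """
    chat_body = {k: v for k, v in body.items() if k not in INFERENCE_KEYS}
    settings_patch = {k: v for k, v in body.items() if k in INFERENCE_KEYS}
    return chat_body, settings_patch


chat_body, settings_patch = split_inference_params(
    {"model": "model-name.gguf", "temperature": 0.7, "topK": 40}
)
print(chat_body)       # {'model': 'model-name.gguf'}
print(settings_patch)  # {'temperature': 0.7, 'topK': 40}
```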
**POST /chat — response:**
```json
@@ -110,9 +113,74 @@ Returns `201` with the created project object.
| Method | Path | Description |
|---|---|---|
| GET | /models | Available models from `models.json` manifest |
| GET | /models | Available models scanned live from models folder |
| GET | /models/props | Live model props from llama-server (context window, loaded model) |
Returns array: `[{ "value": "model-name.gguf", "label": "Display Name" }]`
**GET /models** — returns array:
```json
[{ "value": "model-name.gguf", "label": "Display Name", "description": null, "size": "19.7 GB" }]
```
Scans `.gguf` files live from `modelsFolderPath` (set in settings). Merges
with `models.json` in the same folder for label and description metadata.
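The scan-and-merge behaviour could look roughly like the sketch below. Several details are assumptions rather than documented facts: that `models.json` is an array of objects keyed by `value`, that the label falls back to the file stem when no metadata exists, and that `size` is reported in decimal gigabytes with one decimal place.

```python
import json
from pathlib import Path


def list_models(models_folder: Path) -> list:
    """Scan *.gguf files and merge optional metadata from models.json."""
    manifest_path = models_folder / "models.json"
    metadata = {}
    if manifest_path.is_file():
        # Assumed manifest shape: [{"value": ..., "label": ..., "description": ...}]
        for entry in json.loads(manifest_path.read_text(encoding="utf-8")):
            metadata[entry["value"]] = entry

    models = []
    for path in sorted(models_folder.glob("*.gguf")):
        meta = metadata.get(path.name, {})
        size_gb = path.stat().st_size / 1e9  # decimal GB is an assumption
        models.append({
            "value": path.name,
            "label": meta.get("label", path.stem),  # fallback label is an assumption
            "description": meta.get("description"),
            "size": f"{size_gb:.1f} GB",
        })
    return models
```

Because the folder is scanned on each call, a newly copied `.gguf` file would appear in the list without an entry in `models.json`.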
**GET /models/props** — returns:
```json
{ "contextWindow": 64000, "modelAlias": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf" }
```
Fetches directly from llama-server `/props`. Returns `503` if llama-server
is unreachable.
### Settings
| Method | Path | Description |
|---|---|---|
| GET | /settings | Get all current settings |
| PATCH | /settings | Update one or more settings |
**GET /settings — response:**
```json
{
"recentEpisodeLimit": 9,
"semanticLimit": 5,
"scoreThreshold": 0.6,
"modelsFolderPath": "/mnt/nexus-models",
"temperature": 0.65,
"repeatPenalty": 1.3,
"topP": 0.9,
"topK": 41
}
```
**PATCH /settings — body:** any subset of the above fields.
| Field | Type | Range | Description |
|---|---|---|---|
| `recentEpisodeLimit` | integer | 1–20 | Recent episodes injected into prompt |
| `semanticLimit` | integer | 1–20 | Max semantic search results |
| `scoreThreshold` | float | 0–1 | Minimum similarity score |
| `modelsFolderPath` | string | — | Path to folder containing .gguf files |
| `temperature` | float | 0–2 | Inference randomness |
| `repeatPenalty` | float | 1–2 | Repeat token penalty |
| `topP` | float | 0–1 | Nucleus sampling probability mass |
| `topK` | integer | 1–100 | Top-K token candidates per step |
Settings are persisted to `data/settings.json` and read on every request —
changes take effect immediately without a service restart.
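A client may want to check a patch before sending it. The sketch below reads the Range column as inclusive lower and upper bounds (for example 1 to 20 for `recentEpisodeLimit`, 0 to 2 for `temperature`). Whether the service rejects or clamps out-of-range values is not stated here, so `validate_patch` is a hypothetical client-side helper, not the server's logic.

```python
# (type, min, max) per field, taken from the settings table; None means unbounded.
SETTINGS_SCHEMA = {
    "recentEpisodeLimit": (int, 1, 20),
    "semanticLimit": (int, 1, 20),
    "scoreThreshold": (float, 0, 1),
    "modelsFolderPath": (str, None, None),
    "temperature": (float, 0, 2),
    "repeatPenalty": (float, 1, 2),
    "topP": (float, 0, 1),
    "topK": (int, 1, 100),
}


def validate_patch(patch: dict) -> list:
    """Return a list of error strings; an empty list means the patch looks valid."""
    errors = []
    for key, value in patch.items():
        if key not in SETTINGS_SCHEMA:
            errors.append(f"{key}: unknown setting")
            continue
        kind, low, high = SETTINGS_SCHEMA[key]
        is_bool = isinstance(value, bool)  # bool is an int subclass; reject it
        if kind is float:
            ok = isinstance(value, (int, float)) and not is_bool
        elif kind is int:
            ok = isinstance(value, int) and not is_bool
        else:
            ok = isinstance(value, str)
        if not ok:
            errors.append(f"{key}: expected {kind.__name__}")
            continue
        if low is not None and not (low <= value <= high):
            errors.append(f"{key}: must be between {low} and {high}")
    return errors


print(validate_patch({"temperature": 0.65, "topK": 41}))  # []
print(validate_patch({"topK": 0}))  # ['topK: must be between 1 and 100']
```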
### Episodes
| Method | Path | Description |
|---|---|---|
| GET | /episodes | Paginated episode list across all sessions |
| DELETE | /episodes/:id | Delete an episode (SQLite + Qdrant) |
**GET /episodes — query params:**
| Param | Default | Description |
|---|---|---|
| limit | 20 | Episodes per page |
| offset | 0 | Pagination offset |
| q | — | Keyword search (FTS) |
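For illustration, the query string for a given page could be built as follows. The helper `episodes_path` and its zero-based `page` argument are assumptions made for this sketch; only `limit`, `offset`, and `q` come from the table above.

```python
from typing import Optional
from urllib.parse import urlencode


def episodes_path(page: int, page_size: int = 20, q: Optional[str] = None) -> str:
    """Build a GET /episodes path for a zero-based page number."""
    params = {"limit": page_size, "offset": page * page_size}
    if q:
        params["q"] = q  # omitted entirely when no keyword search is wanted
    return "/episodes?" + urlencode(params)


print(episodes_path(0))                    # /episodes?limit=20&offset=0
print(episodes_path(2, q="llama server"))  # /episodes?limit=20&offset=40&q=llama+server
```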
---
@@ -158,10 +226,11 @@ are not touched.
| Method | Path | Description |
|---|---|---|
| POST | /episodes | Create episode + auto-embed into Qdrant |
| GET | /episodes | Paginated episode list across all sessions |
| GET | /episodes/search?q=&limit= | FTS keyword search across all episodes |
| GET | /episodes/:id | Get episode by ID |
| GET | /sessions/:id/episodes?limit=&offset= | Paginated episodes for a session |
| DELETE | /episodes/:id | Delete an episode |
| DELETE | /episodes/:id | Delete episode (SQLite + Qdrant cleanup) |
> Route ordering: `/episodes/search` must be defined before `/episodes/:id`.
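The reason for this ordering can be shown with a toy first-match router. This is a sketch of the general mechanism only; the framework used by the service is not named in this document, and `compile_route` and `first_match` are hypothetical names.

```python
import re
from typing import Optional


def compile_route(pattern: str) -> "re.Pattern":
    # Turn "/episodes/:id" into a regex with one named group per ":param".
    regex = re.sub(r":(\w+)", r"(?P<\1>[^/]+)", pattern)
    return re.compile(f"^{regex}$")


def first_match(routes: list, path: str) -> Optional[str]:
    """Return the first route pattern that matches, in definition order."""
    for pattern in routes:
        if compile_route(pattern).match(path):
            return pattern
    return None


wrong = ["/episodes/:id", "/episodes/search"]
right = ["/episodes/search", "/episodes/:id"]

print(first_match(wrong, "/episodes/search"))  # /episodes/:id
print(first_match(right, "/episodes/search"))  # /episodes/search
print(first_match(right, "/episodes/42"))      # /episodes/:id
```

With the parameterised route first, the literal segment `search` is captured as an `id`, so the search handler is never reached.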
@@ -266,10 +335,14 @@ is awkward to encode in a path.
"prompt": "What is the capital of France?",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"temperature": 0.7,
"maxTokens": 1024
"maxTokens": 1024,
"topP": 0.9,
"topK": 40,
"repeatPenalty": 1.1
}
```
All fields except `prompt` are optional.
All fields except `prompt` are optional. In normal usage these are forwarded
from orchestration, which reads them from `settings.json`.
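A minimal sketch of that forwarding step, assuming orchestration has already loaded the settings into a dict. The function name `build_complete_request` is hypothetical, and how orchestration chooses `model` and `maxTokens` is not described here, so both are left as optional arguments.

```python
from typing import Optional

# Keys copied from settings into the POST /complete body when present.
FORWARDED_KEYS = ("temperature", "topP", "topK", "repeatPenalty")


def build_complete_request(
    prompt: str,
    settings: dict,
    model: Optional[str] = None,
    max_tokens: Optional[int] = None,
) -> dict:
    """Assemble a POST /complete body; only `prompt` is required."""
    body = {"prompt": prompt}
    for key in FORWARDED_KEYS:
        if key in settings:
            body[key] = settings[key]
    if model is not None:
        body["model"] = model
    if max_tokens is not None:
        body["maxTokens"] = max_tokens
    return body


settings = {"temperature": 0.65, "repeatPenalty": 1.3, "topP": 0.9, "topK": 41}
request_body = build_complete_request(
    "What is the capital of France?", settings, max_tokens=1024
)
print(request_body)
```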
**POST /complete — response:**
```json