documentation updated for model inference settings

2026-04-18 06:41:50 -07:00
parent c198a00dde
commit 44989a2b8b
5 changed files with 182 additions and 41 deletions
--- a/docs/reference/API-routes.md
+++ b/docs/reference/API-routes.md
@@ -30,7 +30,10 @@ here for reference and direct debugging use.
  "temperature": 0.7
 }
 ```
-`model` and `temperature` are optional.
+`model` and `temperature` are optional. Inference parameters (temperature,
+topP, topK, repeatPenalty) are read from `settings.json` on every request;
+any values supplied for them in the request body are ignored. Change them
+via `PATCH /settings`.

 **POST /chat — response:**
 ```json
@@ -110,9 +113,74 @@ Returns `201` with the created project object.

 | Method | Path | Description |
 |---|---|---|
-| GET | /models | Available models from `models.json` manifest |
+| GET | /models | Available models, scanned live from the models folder |
+| GET | /models/props | Live model props from llama-server (context window, loaded model) |

-Returns array: `[{ "value": "model-name.gguf", "label": "Display Name" }]`
+**GET /models** — returns array:
+```json
+[{ "value": "model-name.gguf", "label": "Display Name", "description": null, "size": "19.7 GB" }]
+```
+Scans `.gguf` files live from `modelsFolderPath` (set in settings). Merges
+with `models.json` in the same folder for label and description metadata.
+
+**GET /models/props** — returns:
+```json
+{ "contextWindow": 64000, "modelAlias": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf" }
+```
+Fetches directly from llama-server `/props`. Returns `503` if llama-server
+is unreachable.
+
+### Settings
+
+| Method | Path | Description |
+|---|---|---|
+| GET | /settings | Get all current settings |
+| PATCH | /settings | Update one or more settings |
+
+**GET /settings — response:**
+```json
+{
+  "recentEpisodeLimit": 9,
+  "semanticLimit": 5,
+  "scoreThreshold": 0.6,
+  "modelsFolderPath": "/mnt/nexus-models",
+  "temperature": 0.65,
+  "repeatPenalty": 1.3,
+  "topP": 0.9,
+  "topK": 41
+}
+```
+
+**PATCH /settings — body:** any subset of the above fields.
+
+| Field | Type | Range | Description |
+|---|---|---|---|
+| `recentEpisodeLimit` | integer | 1–20 | Recent episodes injected into prompt |
+| `semanticLimit` | integer | 1–20 | Max semantic search results |
+| `scoreThreshold` | float | 0–1 | Minimum similarity score |
+| `modelsFolderPath` | string | — | Path to folder containing .gguf files |
+| `temperature` | float | 0–2 | Inference randomness |
+| `repeatPenalty` | float | 1–2 | Repeat token penalty |
+| `topP` | float | 0–1 | Nucleus sampling probability mass |
+| `topK` | integer | 1–100 | Top-K token candidates per step |
+
+Settings are persisted to `data/settings.json` and read on every request, so
+changes take effect immediately without a service restart.
+
+### Episodes
+
+| Method | Path | Description |
+|---|---|---|
+| GET | /episodes | Paginated episode list across all sessions |
+| DELETE | /episodes/:id | Delete an episode (SQLite + Qdrant) |
+
+**GET /episodes — query params:**
+
+| Param | Default | Description |
+|---|---|---|
+| limit | 20 | Episodes per page |
+| offset | 0 | Pagination offset |
+| q | — | Keyword search (FTS) |

 ---

@@ -158,10 +226,11 @@ are not touched.
 | Method | Path | Description |
 |---|---|---|
 | POST | /episodes | Create episode + auto-embed into Qdrant |
+| GET | /episodes | Paginated episode list across all sessions |
 | GET | /episodes/search?q=&limit= | FTS keyword search across all episodes |
 | GET | /episodes/:id | Get episode by ID |
 | GET | /sessions/:id/episodes?limit=&offset= | Paginated episodes for a session |
-| DELETE | /episodes/:id | Delete an episode |
+| DELETE | /episodes/:id | Delete episode (SQLite + Qdrant cleanup) |

 > Route ordering: `/episodes/search` must be defined before `/episodes/:id`.

@@ -266,10 +335,14 @@ is awkward to encode in a path.
  "prompt": "What is the capital of France?",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "temperature": 0.7,
-  "maxTokens": 1024
+  "maxTokens": 1024,
+  "topP": 0.9,
+  "topK": 40,
+  "repeatPenalty": 1.1
 }
 ```
-All fields except `prompt` are optional.
+All fields except `prompt` are optional. In normal usage the inference
+parameters are forwarded by orchestration, which reads them from `settings.json`.

 **POST /complete — response:**
 ```json