documentation updated for model inference settings
@@ -30,7 +30,10 @@ here for reference and direct debugging use.
   "temperature": 0.7
 }
 ```
-`model` and `temperature` are optional.
+`model` and `temperature` are optional. Inference parameters (temperature,
+topP, topK, repeatPenalty) are read from `settings.json` on every request —
+the request body values are not used for these; they are controlled via
+`PATCH /settings`.
 
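The override rule described in this hunk can be illustrated with a minimal sketch. It assumes the settings file is plain JSON holding the four inference keys; `load_settings` and `effective_inference_params` are hypothetical names, not the service's actual functions.

```python
import json

# Key names are taken from the documentation; everything else is illustrative.
INFERENCE_KEYS = ("temperature", "topP", "topK", "repeatPenalty")


def load_settings(path):
    """Re-read the settings file. Called once per request, never cached."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def effective_inference_params(request_body, settings_path):
    """Return the inference parameters to use for one request.

    Values in request_body are deliberately ignored for these keys, so a
    client-supplied "temperature" has no effect.
    """
    settings = load_settings(settings_path)
    return {key: settings[key] for key in INFERENCE_KEYS if key in settings}
```

Because the file is read inside the function rather than at import time, an edit to the file is visible on the next call.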
 **POST /chat — response:**
 ```json
@@ -110,9 +113,74 @@ Returns `201` with the created project object.
 
 | Method | Path | Description |
 |---|---|---|
-| GET | /models | Available models from `models.json` manifest |
+| GET | /models | Available models scanned live from models folder |
+| GET | /models/props | Live model props from llama-server (context window, loaded model) |
 
-Returns array: `[{ "value": "model-name.gguf", "label": "Display Name" }]`
+**GET /models** — returns array:
+```json
+[{ "value": "model-name.gguf", "label": "Display Name", "description": null, "size": "19.7 GB" }]
+```
+Scans `.gguf` files live from `modelsFolderPath` (set in settings). Merges
+with `models.json` in the same folder for label and description metadata.
+
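The scan-and-merge behaviour of `GET /models` might look roughly like the sketch below. Several points are assumptions, not facts from this commit: the shape of `models.json` (a list of objects keyed by `value`), the fallback label (the file name without its extension), and the size unit (decimal gigabytes).

```python
import json
from pathlib import Path


def format_size(num_bytes):
    """Format a byte count as decimal gigabytes, e.g. '19.7 GB' (assumed unit)."""
    return f"{num_bytes / 1e9:.1f} GB"


def list_models(models_folder):
    """Scan the folder for .gguf files and merge in optional manifest metadata."""
    folder = Path(models_folder)
    manifest_path = folder / "models.json"
    metadata = {}
    if manifest_path.is_file():
        # Assumed manifest shape: [{"value": ..., "label": ..., "description": ...}]
        for entry in json.loads(manifest_path.read_text(encoding="utf-8")):
            metadata[entry["value"]] = entry

    models = []
    for gguf in sorted(folder.glob("*.gguf")):
        meta = metadata.get(gguf.name, {})
        models.append({
            "value": gguf.name,
            "label": meta.get("label", gguf.stem),    # hypothetical fallback
            "description": meta.get("description"),   # None when not listed
            "size": format_size(gguf.stat().st_size),
        })
    return models
```

A file that exists on disk but is missing from the manifest still appears in the result, which is the practical difference from the manifest-only listing this hunk replaces.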
+**GET /models/props** — returns:
+```json
+{ "contextWindow": 64000, "modelAlias": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf" }
+```
+Fetches directly from llama-server `/props`. Returns `503` if llama-server
+is unreachable.
+
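A hedged sketch of the `503` fallback follows. The upstream field names (`default_generation_settings.n_ctx`, `model_alias`) and the error body are assumptions and may not match the llama-server version in use. The `fetch` parameter is a hypothetical seam for testing without a live server.

```python
import json
import urllib.error
import urllib.request


def fetch_llama_props(base_url, timeout=2.0):
    """GET <base_url>/props from llama-server and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/props", timeout=timeout) as resp:
        return json.load(resp)


def get_model_props(base_url, fetch=fetch_llama_props):
    """Return (status_code, body) for GET /models/props."""
    try:
        props = fetch(base_url)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: report unreachable.
        return 503, {"error": "llama-server unreachable"}  # hypothetical body
    # Assumed upstream field names; verify against the running server.
    generation = props.get("default_generation_settings", {})
    return 200, {
        "contextWindow": generation.get("n_ctx"),
        "modelAlias": props.get("model_alias"),
    }
```

Nothing is cached here, so the response reflects whichever model llama-server has loaded at the time of the call.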
+### Settings
+
+| Method | Path | Description |
+|---|---|---|
+| GET | /settings | Get all current settings |
+| PATCH | /settings | Update one or more settings |
+
+**GET /settings — response:**
+```json
+{
+  "recentEpisodeLimit": 9,
+  "semanticLimit": 5,
+  "scoreThreshold": 0.6,
+  "modelsFolderPath": "/mnt/nexus-models",
+  "temperature": 0.65,
+  "repeatPenalty": 1.3,
+  "topP": 0.9,
+  "topK": 41
+}
+```
+
+**PATCH /settings — body:** any subset of the above fields.
+
+| Field | Type | Range | Description |
+|---|---|---|---|
+| `recentEpisodeLimit` | integer | 1–20 | Recent episodes injected into prompt |
+| `semanticLimit` | integer | 1–20 | Max semantic search results |
+| `scoreThreshold` | float | 0–1 | Minimum similarity score |
+| `modelsFolderPath` | string | — | Path to folder containing .gguf files |
+| `temperature` | float | 0–2 | Inference randomness |
+| `repeatPenalty` | float | 1–2 | Repeat token penalty |
+| `topP` | float | 0–1 | Nucleus sampling probability mass |
+| `topK` | integer | 1–100 | Top-K token candidates per step |
+
+Settings are persisted to `data/settings.json` and read on every request —
+changes take effect immediately without a service restart.
+
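One way to handle a partial `PATCH /settings` body is sketched below, using the field table above. It assumes the ranges are inclusive and that an invalid patch is rejected as a whole rather than clamped; the commit states neither. `validate_patch` and `apply_patch` are hypothetical names.

```python
import json
from pathlib import Path

# (type, min, max) per field, copied from the field table above.
FIELD_RULES = {
    "recentEpisodeLimit": (int, 1, 20),
    "semanticLimit": (int, 1, 20),
    "scoreThreshold": (float, 0, 1),
    "modelsFolderPath": (str, None, None),
    "temperature": (float, 0, 2),
    "repeatPenalty": (float, 1, 2),
    "topP": (float, 0, 1),
    "topK": (int, 1, 100),
}


def validate_patch(patch):
    """Return a list of error strings; an empty list means the patch is valid."""
    errors = []
    for key, value in patch.items():
        if key not in FIELD_RULES:
            errors.append(f"unknown field: {key}")
            continue
        kind, low, high = FIELD_RULES[key]
        if kind is str:
            ok = isinstance(value, str)
        elif isinstance(value, bool):
            ok = False  # bool is a subclass of int in Python; reject it
        elif kind is int:
            ok = isinstance(value, int)
        else:
            ok = isinstance(value, (int, float))  # floats also accept integers
        if not ok:
            errors.append(f"{key}: expected {kind.__name__}")
        elif low is not None and not low <= value <= high:
            errors.append(f"{key}: must be between {low} and {high}")
    return errors


def apply_patch(settings_path, patch):
    """Validate, merge into the stored settings, persist, and return the result."""
    errors = validate_patch(patch)
    if errors:
        raise ValueError("; ".join(errors))
    path = Path(settings_path)
    current = json.loads(path.read_text(encoding="utf-8"))
    current.update(patch)  # fields absent from the patch keep their values
    path.write_text(json.dumps(current, indent=2), encoding="utf-8")
    return current
```

Writing the merged result straight back to the file is what lets the next request pick up the change without a restart.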
+### Episodes
+
+| Method | Path | Description |
+|---|---|---|
+| GET | /episodes | Paginated episode list across all sessions |
+| DELETE | /episodes/:id | Delete an episode (SQLite + Qdrant) |
+
+**GET /episodes — query params:**
+
+| Param | Default | Description |
+|---|---|---|
+| limit | 20 | Episodes per page |
+| offset | 0 | Pagination offset |
+| q | — | Keyword search (FTS) |
 
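The query parameters above can be illustrated with a small in-memory sketch. A case-insensitive substring match stands in for the real FTS query. The `content` field and the clamping of negative values are assumptions made for the example.

```python
def parse_episode_query(params):
    """Apply the documented defaults: limit=20, offset=0, q unset."""
    limit = max(0, int(params.get("limit", 20)))    # clamping is an assumption
    offset = max(0, int(params.get("offset", 0)))
    q = params.get("q") or None
    return limit, offset, q


def page_episodes(episodes, params):
    """Filter by keyword if q is given, then return one page of results."""
    limit, offset, q = parse_episode_query(params)
    if q is not None:
        # Substring match is only a stand-in for full-text search.
        episodes = [e for e in episodes if q.lower() in e["content"].lower()]
    return episodes[offset:offset + limit]
```

The filter runs before the slice, so `offset` counts positions within the matching episodes rather than within the full list.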
 ---
 
@@ -158,10 +226,11 @@ are not touched.
 | Method | Path | Description |
 |---|---|---|
 | POST | /episodes | Create episode + auto-embed into Qdrant |
+| GET | /episodes | Paginated episode list across all sessions |
 | GET | /episodes/search?q=&limit= | FTS keyword search across all episodes |
 | GET | /episodes/:id | Get episode by ID |
 | GET | /sessions/:id/episodes?limit=&offset= | Paginated episodes for a session |
-| DELETE | /episodes/:id | Delete an episode |
+| DELETE | /episodes/:id | Delete episode (SQLite + Qdrant cleanup) |
 
 > Route ordering: `/episodes/search` must be defined before `/episodes/:id`.
 
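The route-ordering note follows from first-match routing: a `:id` segment matches any single path segment, including the literal `search`. The toy router below is a sketch assuming the service's router picks the first registered pattern that matches. The commit does not name the framework, and `compile_route` and `match_route` are hypothetical helpers.

```python
import re


def compile_route(pattern):
    """Turn '/episodes/:id' into a regex with one named group per ':param'."""
    regex = re.sub(r":(\w+)", r"(?P<\1>[^/]+)", pattern)
    return re.compile(f"^{regex}$")


def match_route(routes, path):
    """Return (handler_name, params) for the first route that matches."""
    for pattern, handler_name in routes:
        found = compile_route(pattern).match(path)
        if found:
            return handler_name, found.groupdict()
    return None, {}


correct_order = [("/episodes/search", "search"), ("/episodes/:id", "get_by_id")]
wrong_order = [("/episodes/:id", "get_by_id"), ("/episodes/search", "search")]

print(match_route(correct_order, "/episodes/search"))  # ('search', {})
print(match_route(wrong_order, "/episodes/search"))    # ('get_by_id', {'id': 'search'})
```

With the wrong order, a search request is handled as a lookup for an episode whose ID is the string `search`.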
@@ -266,10 +335,14 @@ is awkward to encode in a path.
   "prompt": "What is the capital of France?",
   "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
   "temperature": 0.7,
-  "maxTokens": 1024
+  "maxTokens": 1024,
+  "topP": 0.9,
+  "topK": 40,
+  "repeatPenalty": 1.1
 }
 ```
-All fields except `prompt` are optional.
+All fields except `prompt` are optional. In normal usage these are forwarded
+from orchestration, which reads them from `settings.json`.
 
 **POST /complete — response:**
 ```json