documentation updated for model inference settings

2026-04-18 06:41:50 -07:00
parent c198a00dde
commit 44989a2b8b
5 changed files with 182 additions and 41 deletions
--- a/docs/services/shared.md
+++ b/docs/services/shared.md
@@ -142,6 +142,9 @@ llama.cpp runtime defaults — used by the llama.cpp inference provider.
 #### `INFERENCE_DEFAULTS`

 Default inference parameters applied when not specified in a request.
+These constants are the fallbacks used by `resolveOptions()` in both providers.
+Orchestration reads live values from `settings.json` and forwards them
+on every request, so the constants apply only when a request omits a value.

 | Key | Value | Description |
 |---|---|---|
@@ -154,16 +157,22 @@ Default inference parameters applied when not specified in a request.

 #### `ORCHESTRATION`

-Orchestration pipeline defaults.
+Orchestration pipeline defaults. Used as fallback values in
+`config/settings.js` when `settings.json` doesn't contain a key.

 | Key | Value | Description |
 |---|---|---|
 | `RECENT_EPISODE_LIMIT` | `5` | Recent episodes to inject into prompt |
 | `SEMANTIC_LIMIT` | `5` | Semantic search results to inject into prompt |
 | `SCORE_THRESHOLD` | `0.75` | Minimum similarity score for semantic results |
+| `TEMPERATURE` | `0.7` | Default inference temperature |
 | `CORS_ORIGIN` | `'http://localhost:5173'` | Fallback allowed CORS origin |
 | `SYSTEM_PROMPT` | *(see below)* | Default system prompt |

+> In `config/settings.js`, the `repeatPenalty`, `topP`, and `topK` defaults
+> come from `INFERENCE_DEFAULTS` rather than `ORCHESTRATION`, since
+> `INFERENCE_DEFAULTS` already defines the canonical values.
+
 Default system prompt:
 > "You are a helpful, context-aware AI assistant. You have access to memories
 > of past conversations with the user. Use them to provide consistent,