documentation updated for model inference settings

2026-04-18 06:41:50 -07:00
parent c198a00dde
commit 44989a2b8b
5 changed files with 182 additions and 41 deletions
--- a/docs/services/inference-service.md
+++ b/docs/services/inference-service.md
@@ -54,6 +54,11 @@ INFERENCE_URL=http://localhost:8080
 The provider loader throws immediately on an unknown value, preventing silent
 misconfiguration.

+> **LM Studio compatibility note:** LM Studio exposes an OpenAI-compatible
+> `/v1/chat/completions` endpoint with the same request shape as llama.cpp.
+> A future `lmstudio.js` provider would be nearly identical to `llamacpp.js`;
+> only the `BASE_URL` would differ, so no architectural changes are required.
+
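Reviewer sketch for the compatibility note above: a minimal illustration of why only the base URL would need to change, assuming both servers accept the same OpenAI-style request body. `createOpenAICompatProvider` and `buildRequest` are hypothetical names (the real `llamacpp.js` is not shown in this diff), and port 1234 for LM Studio is an assumption about its default local server address.

```javascript
// Hypothetical factory: NOT the actual llamacpp.js implementation.
// It only shows that the endpoint path and request shape are shared,
// so a provider differs in nothing but its base URL.
function createOpenAICompatProvider(baseUrl) {
  // Resolve the shared endpoint path against the provider-specific base URL.
  const endpoint = new URL('/v1/chat/completions', baseUrl).toString();

  return {
    endpoint,
    // Build (but do not send) an OpenAI-style chat completion request.
    buildRequest(model, messages, apiFields = {}) {
      return {
        url: endpoint,
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ model, messages, ...apiFields }),
      };
    },
  };
}

// Base URL taken from INFERENCE_URL in the doc.
const llamacpp = createOpenAICompatProvider('http://localhost:8080');
// Assumed LM Studio address; adjust to the actual server setting.
const lmstudio = createOpenAICompatProvider('http://localhost:1234');
```

Both objects produce identical request bodies for the same arguments; only `url` differs, which is the claim the note makes.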
 ## Internal Structure

 ```
@@ -109,14 +114,19 @@ Set `DEFAULT_MODEL` in `.env` to the exact reported name.

 ### Inference Parameters

-| NexusAI option | API field | Default |
-|---|---|---|
-| `temperature` | `temperature` | 0.7 |
-| `maxTokens` | `max_tokens` | 1024 |
-| `topP` | `top_p` | 0.9 |
-| `topK` | `top_k` | 40 |
-| `repeatPenalty` | `repeat_penalty` | 1.1 |
-| `seed` | `seed` | null (random) |
+`resolveOptions()` resolves every parameter: a value missing from the request
+falls back to `INFERENCE_DEFAULTS` from `@nexusai/shared`. In normal usage,
+orchestration reads these values from `settings.json` and forwards
+them on every request.
+
+| NexusAI option | API field | Default | Description |
+|---|---|---|---|
+| `temperature` | `temperature` | 0.7 | Response randomness (0 = deterministic) |
+| `maxTokens` | `max_tokens` | 1024 | Max tokens to generate |
+| `topP` | `top_p` | 0.9 | Nucleus sampling probability mass |
+| `topK` | `top_k` | 40 | Top-K token candidates per step |
+| `repeatPenalty` | `repeat_penalty` | 1.1 | Penalty for recently used tokens |
+| `seed` | `seed` | null | null = random; integer for reproducible output |
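
Reviewer sketch of the fallback described above, assuming `resolveOptions()` takes a plain options object and fills in missing keys. The real signature is not shown in this diff; `INFERENCE_DEFAULTS` here is a local stand-in with values copied from the table, and `API_FIELD_NAMES` and `toApiFields` are hypothetical helpers added only for illustration.

```javascript
// Local stand-in for INFERENCE_DEFAULTS from @nexusai/shared (assumed shape);
// values are copied from the table above.
const INFERENCE_DEFAULTS = {
  temperature: 0.7,
  maxTokens: 1024,
  topP: 0.9,
  topK: 40,
  repeatPenalty: 1.1,
  seed: null,
};

// Hypothetical mapping from NexusAI option names to API field names,
// following the first two table columns.
const API_FIELD_NAMES = {
  temperature: 'temperature',
  maxTokens: 'max_tokens',
  topP: 'top_p',
  topK: 'top_k',
  repeatPenalty: 'repeat_penalty',
  seed: 'seed',
};

// Use the request value when one is provided, otherwise the default.
// Checking for `undefined` (rather than falsiness) keeps explicit values
// such as temperature 0 or seed 0 intact.
function resolveOptions(requestOptions = {}) {
  const resolved = {};
  for (const [key, fallback] of Object.entries(INFERENCE_DEFAULTS)) {
    const value = requestOptions[key];
    resolved[key] = value === undefined ? fallback : value;
  }
  return resolved;
}

// Rename resolved options to the field names the API expects.
function toApiFields(resolved) {
  const body = {};
  for (const [key, value] of Object.entries(resolved)) {
    body[API_FIELD_NAMES[key]] = value;
  }
  return body;
}
```

For example, `toApiFields(resolveOptions({ temperature: 0 }))` keeps `temperature` at 0 and fills the other five fields from the defaults. This sketch passes a `null` seed through unchanged; whether the actual provider sends or omits it is not stated in the diff.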

 ## Streaming Response Format