documentation updated for model inference settings

Storme-bit
2026-04-18 06:41:50 -07:00
parent c198a00dde
commit 44989a2b8b
5 changed files with 182 additions and 41 deletions


@@ -54,6 +54,11 @@ INFERENCE_URL=http://localhost:8080
The provider loader throws immediately on an unknown value, preventing silent
misconfiguration.
> **LM Studio compatibility note:** LM Studio exposes an OpenAI-compatible
> `/v1/chat/completions` endpoint with the same request shape as llama.cpp.
> A future `lmstudio.js` provider would be nearly identical to `llamacpp.js`;
> only the `BASE_URL` would differ, so no architectural changes would be required.
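To illustrate the note above, here is a minimal sketch of how two OpenAI-compatible providers could share one implementation and differ only in base URL, with a loader that throws on an unknown provider name. Everything in it is hypothetical: `createChatProvider`, `loadProvider`, `PROVIDER_URLS`, and the `LMSTUDIO_URL` variable are illustrative names, not the project's actual code, and `http://localhost:1234` is assumed to be the LM Studio local server address.

```javascript
// Hypothetical sketch: these names are illustrative, not the project's real modules.

// Build a minimal OpenAI-compatible chat provider bound to one base URL.
function createChatProvider(baseUrl) {
  const base = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  return {
    baseUrl: base,
    // Return the URL and fetch() init object for a chat completion call.
    buildRequest(messages, options = {}) {
      return {
        url: `${base}/v1/chat/completions`,
        init: {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ messages, ...options }),
        },
      };
    },
  };
}

// Per-provider base URL lookup. The 8080 default matches INFERENCE_URL above;
// the LM Studio variable name and port are assumptions.
const PROVIDER_URLS = {
  llamacpp: (env) => env.INFERENCE_URL || 'http://localhost:8080',
  lmstudio: (env) => env.LMSTUDIO_URL || 'http://localhost:1234',
};

// Fail fast on an unknown provider name instead of silently falling back.
function loadProvider(name, env = process.env) {
  if (!Object.prototype.hasOwnProperty.call(PROVIDER_URLS, name)) {
    throw new Error(`Unknown inference provider: "${name}"`);
  }
  return createChatProvider(PROVIDER_URLS[name](env));
}
```

Under these assumptions, `loadProvider('lmstudio')` and `loadProvider('llamacpp')` return objects with the same request shape, which is the point the note makes: only the base URL changes.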
## Internal Structure
```
@@ -109,14 +114,19 @@ Set `DEFAULT_MODEL` in `.env` to the exact reported name.
### Inference Parameters
| NexusAI option | API field | Default |
|---|---|---|
| `temperature` | `temperature` | 0.7 |
| `maxTokens` | `max_tokens` | 1024 |
| `topP` | `top_p` | 0.9 |
| `topK` | `top_k` | 40 |
| `repeatPenalty` | `repeat_penalty` | 1.1 |
| `seed` | `seed` | null (random) |
Every parameter is resolved in `resolveOptions()`, which falls back to
`INFERENCE_DEFAULTS` from `@nexusai/shared` for any value the request omits.
In normal usage, orchestration reads these values from `settings.json` and
forwards them on every request.
| NexusAI option | API field | Default | Description |
|---|---|---|---|
| `temperature` | `temperature` | 0.7 | Response randomness (0 = deterministic) |
| `maxTokens` | `max_tokens` | 1024 | Max tokens to generate |
| `topP` | `top_p` | 0.9 | Nucleus sampling probability mass |
| `topK` | `top_k` | 40 | Top-K token candidates per step |
| `repeatPenalty` | `repeat_penalty` | 1.1 | Penalty for recently used tokens |
| `seed` | `seed` | null | null = random; integer for reproducible output |
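The fallback behaviour described above can be sketched roughly as follows. This is not the actual `resolveOptions()` implementation: the local `INFERENCE_DEFAULTS` object stands in for the export from `@nexusai/shared` (values copied from the table), `API_FIELDS` is a hypothetical helper, and omitting a `null` seed from the payload is an assumption about how "random" is signalled to the server.

```javascript
// Stand-in for INFERENCE_DEFAULTS from @nexusai/shared (values from the table above).
const INFERENCE_DEFAULTS = {
  temperature: 0.7,
  maxTokens: 1024,
  topP: 0.9,
  topK: 40,
  repeatPenalty: 1.1,
  seed: null,
};

// Hypothetical mapping from NexusAI option names to API field names.
const API_FIELDS = {
  temperature: 'temperature',
  maxTokens: 'max_tokens',
  topP: 'top_p',
  topK: 'top_k',
  repeatPenalty: 'repeat_penalty',
  seed: 'seed',
};

// Merge request options over the defaults and rename keys for the API.
function resolveOptions(requestOptions = {}) {
  const payload = {};
  for (const [option, apiField] of Object.entries(API_FIELDS)) {
    // `??` only falls back on null/undefined, so an explicit 0 is preserved.
    const value = requestOptions[option] ?? INFERENCE_DEFAULTS[option];
    // Assumption: a null seed is left out so the server picks a random one.
    if (value !== null) payload[apiField] = value;
  }
  return payload;
}
```

Using `??` rather than `||` matters here because `temperature: 0` is a valid setting (deterministic output) and should not be replaced by the 0.7 default.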
## Streaming Response Format