documentation updated for model inference settings
@@ -54,6 +54,11 @@ INFERENCE_URL=http://localhost:8080
The provider loader throws immediately on an unknown value, preventing silent
misconfiguration.
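
The fail-fast behaviour described above can be sketched as follows. This is a minimal illustration, not the project's actual loader: the `PROVIDERS` registry, the `loadProvider` name, and the error wording are all hypothetical.

```javascript
// Hypothetical registry of provider factories. The real loader may resolve
// provider modules differently (for example by requiring files by name).
const PROVIDERS = {
  llamacpp: () => ({ name: 'llamacpp' }),
};

// Return a provider instance, or throw at startup if the configured name is
// not registered. Failing here avoids silently running with a wrong backend.
function loadProvider(name) {
  // Own-property check so inherited keys such as "toString" are rejected.
  const isKnown = Object.prototype.hasOwnProperty.call(PROVIDERS, name);
  if (!isKnown) {
    const known = Object.keys(PROVIDERS).join(', ');
    throw new Error(`Unknown inference provider "${name}". Known providers: ${known}`);
  }
  return PROVIDERS[name]();
}
```

With this shape, a typo in the configured provider name surfaces as an exception when the service starts rather than as a confusing failure on the first request.
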
> **LM Studio compatibility note:** LM Studio exposes an OpenAI-compatible
> `/v1/chat/completions` endpoint with the same request shape as llama.cpp.
> A future `lmstudio.js` provider would be nearly identical to `llamacpp.js`;
> only the `BASE_URL` would differ. No architectural changes required.
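
To illustrate why only the base URL would need to change, here is a rough sketch of a shared factory for OpenAI-compatible chat providers. The `createChatProvider` helper and its return shape are hypothetical and not taken from `llamacpp.js`. The llama.cpp URL matches the `INFERENCE_URL` shown earlier; `http://localhost:1234` is assumed to be LM Studio's local server address and should be checked against your installation.

```javascript
// Hypothetical factory: builds requests for any server that exposes an
// OpenAI-compatible /v1/chat/completions endpoint.
function createChatProvider(baseUrl) {
  const endpoint = new URL('/v1/chat/completions', baseUrl).toString();
  return {
    endpoint,
    // Returns the arguments for fetch() without sending anything, so the
    // request shape can be inspected or tested in isolation.
    buildRequest(model, messages, options = {}) {
      return {
        url: endpoint,
        init: {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({ model, messages, ...options }),
        },
      };
    },
  };
}

// The two providers differ only in the base URL passed to the factory.
const llamacpp = createChatProvider('http://localhost:8080');
const lmstudio = createChatProvider('http://localhost:1234'); // assumed address
```

A caller would pass the result to `fetch(request.url, request.init)`; nothing else in the calling code depends on which backend is in use.
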
## Internal Structure
```
@@ -109,14 +114,19 @@ Set `DEFAULT_MODEL` in `.env` to the exact reported name.
### Inference Parameters

| NexusAI option | API field | Default |
|---|---|---|
| `temperature` | `temperature` | 0.7 |
| `maxTokens` | `max_tokens` | 1024 |
| `topP` | `top_p` | 0.9 |
| `topK` | `top_k` | 40 |
| `repeatPenalty` | `repeat_penalty` | 1.1 |
| `seed` | `seed` | null (random) |

All parameters are resolved in `resolveOptions()`, which falls back to
`INFERENCE_DEFAULTS` from `@nexusai/shared` for any option not provided in the
request. In normal usage, orchestration reads these from `settings.json` and
forwards them on every request.

| NexusAI option | API field | Default | Description |
|---|---|---|---|
| `temperature` | `temperature` | 0.7 | Response randomness (0 = deterministic) |
| `maxTokens` | `max_tokens` | 1024 | Max tokens to generate |
| `topP` | `top_p` | 0.9 | Nucleus sampling probability mass |
| `topK` | `top_k` | 40 | Top-K token candidates per step |
| `repeatPenalty` | `repeat_penalty` | 1.1 | Penalty for recently used tokens |
| `seed` | `seed` | null | null = random; integer for reproducible output |

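
The fallback and field mapping in this section could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual `resolveOptions()` implementation: the defaults are copied from the table rather than imported from `@nexusai/shared`, the `API_FIELDS` map is hypothetical, and omitting a `null` seed from the payload is an assumed way of requesting a random seed.

```javascript
// Defaults copied from the table; the real values live in @nexusai/shared.
const INFERENCE_DEFAULTS = {
  temperature: 0.7,
  maxTokens: 1024,
  topP: 0.9,
  topK: 40,
  repeatPenalty: 1.1,
  seed: null,
};

// Hypothetical map from NexusAI option names to API field names.
const API_FIELDS = {
  temperature: 'temperature',
  maxTokens: 'max_tokens',
  topP: 'top_p',
  topK: 'top_k',
  repeatPenalty: 'repeat_penalty',
  seed: 'seed',
};

function resolveOptions(requestOptions = {}) {
  const payload = {};
  for (const [option, apiField] of Object.entries(API_FIELDS)) {
    // ?? falls back only on null/undefined, so an explicit 0 is preserved.
    const value = requestOptions[option] ?? INFERENCE_DEFAULTS[option];
    // Assumption: a null value (random seed) is left out of the payload.
    if (value !== null) payload[apiField] = value;
  }
  return payload;
}
```

For example, `resolveOptions({ temperature: 0, seed: 42 })` returns `{ temperature: 0, max_tokens: 1024, top_p: 0.9, top_k: 40, repeat_penalty: 1.1, seed: 42 }`. Using `??` instead of `||` matters here because `temperature: 0` is a valid setting that must not be replaced by the default.
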
## Streaming Response Format