summary system backend implementation
```diff
@@ -48,7 +48,7 @@ src/
 ├── routes/
 │ ├── chat.js # POST /chat and POST /chat/stream
 │ ├── sessions.js # Session CRUD proxy
-│ ├── projects.js # Project CRUD proxy
+│ ├── projects.js # Project CRUD proxy — passes req.body straight through
 │ ├── episodes.js # Episode list and delete proxy
 │ ├── settings.js # GET /settings and PATCH /settings
 │ ├── health.js # GET /health — pings all four services
```
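`health.js` is only described as pinging all four services. The sketch below shows one way to aggregate such pings so that a single slow or failed service marks the result unhealthy without blocking the others. The service names, the ping functions, and the 2-second timeout are hypothetical; nothing here is taken from the actual route.

```javascript
// Hypothetical sketch: aggregate health pings for GET /health.
// `services` maps a service name to an async ping function; the real
// service names and ping logic in health.js are not shown in this diff.
async function pingAll(services, timeoutMs = 2000) {
  const names = Object.keys(services);
  const results = await Promise.allSettled(
    // Promise.resolve().then(...) turns a synchronous throw into a rejection.
    names.map((name) => withTimeout(Promise.resolve().then(() => services[name]()), timeoutMs))
  );
  const status = {};
  results.forEach((result, i) => {
    status[names[i]] = result.status === "fulfilled" ? "ok" : "down";
  });
  return { healthy: results.every((r) => r.status === "fulfilled"), status };
}

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

`Promise.allSettled` is used instead of `Promise.all` so that one rejected ping does not hide the status of the remaining services. How `healthy: false` maps to an HTTP status code is left to the route.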
```diff
@@ -75,6 +75,7 @@ via `appSettings.load()` — changes apply immediately without a service restart
 | `repeatPenalty` | 1.1 | Repeat token penalty |
 | `topP` | 0.9 | Nucleus sampling probability mass |
 | `topK` | 40 | Top-K token candidates per step |
+| `systemPrompt` | *(ORCHESTRATION.SYSTEM_PROMPT)* | Global system prompt. `null` reverts to hardcoded constant. |

 Defaults are defined in `config/settings.js` and fall back to constants in
 `@nexusai/shared`. Values saved in `settings.json` take precedence.
```
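The precedence in this hunk (saved values over defaults, with a saved `null` for `systemPrompt` meaning "use the hardcoded constant") can be sketched as a key-by-key override. This is a minimal sketch assuming `settings.json` has already been parsed into a plain object. `DEFAULTS` is a hypothetical stand-in for `config/settings.js` and lists only the keys visible here, and ignoring unknown keys is an assumption.

```javascript
// Hypothetical sketch of the precedence described above: values saved in
// settings.json override defaults. DEFAULTS stands in for config/settings.js
// and lists only the keys visible in this hunk.
const DEFAULTS = Object.freeze({
  repeatPenalty: 1.1,
  topP: 0.9,
  topK: 40,
  // null means "not set": system prompt resolution then falls back to
  // ORCHESTRATION.SYSTEM_PROMPT from @nexusai/shared.
  systemPrompt: null,
});

function mergeSettings(saved) {
  const merged = { ...DEFAULTS };
  for (const [key, value] of Object.entries(saved ?? {})) {
    // Assumption: unknown keys are ignored rather than passed through.
    // hasOwnProperty avoids matching inherited names such as "toString".
    if (Object.prototype.hasOwnProperty.call(DEFAULTS, key)) merged[key] = value;
  }
  return merged;
}
```

A key present in the saved file always wins, even when its value is `null`. That is what lets a saved `null` survive the merge and reach the system prompt fallback instead of being replaced by a default.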
````diff
@@ -91,41 +92,43 @@ difference is how the inference response is delivered to the client.
 step needed.

 2. **Project context resolution** — if the session has a `project_id`, fetch
-the project and all its session IDs. Used to scope semantic search. See
-`memory-isolation.md` for full behaviour.
+the project and all its session IDs. Used to scope semantic search. The
+project's `system_prompt` is also read at this step if set.

-3. **Recent episode retrieval** — fetch the most recent episodes for the
+3. **System prompt resolution** — three-tier hierarchy:
+- `project.system_prompt` — if the session is in a project and it's set (highest priority)
+- `settings.systemPrompt` — global setting from `settings.json`
+- `ORCHESTRATION.SYSTEM_PROMPT` — hardcoded constant in `@nexusai/shared` (last resort)
+
+4. **Recent episode retrieval** — fetch the most recent episodes for the
 session (`recentEpisodeLimit`, default 5).

-4. **Semantic search** — embed the user message, query Qdrant for the top
+5. **Semantic search** — embed the user message, query Qdrant for the top
 most similar past episodes (`semanticLimit`, `scoreThreshold`). Deduplicated
 against recent episodes. Non-critical — if it fails, pipeline continues with
 recency-only context.

-5. **Entity search** — reuse the embedded user message vector to query the
-`entities` Qdrant collection (score threshold 0.6, limit 5). Returns
-entity payloads (`name`, `type`, `notes`) directly — no SQLite roundtrip
-needed. Non-critical — if it fails, pipeline continues without entity context.
+6. **Entity search** — query the `entities` Qdrant collection filtered by
+`projectId`. Non-project sessions receive no entity context. Non-critical.

-6. **Prompt assembly** — combine system prompt, entity context, semantic
-episodes, recent episodes, and user message.
+7. **Prompt assembly** — combine resolved system prompt, entity context,
+semantic episodes, recent episodes, and user message.

-7. **Inference** — send to inference service with settings-derived parameters
+8. **Inference** — send to inference service with settings-derived parameters
 (temperature, topP, topK, repeatPenalty). `/chat` awaits full response;
 `/chat/stream` pipes SSE chunks to the client.

-8. **Episode write** — write the exchange back to memory. Fire-and-forget
-for `/chat`; awaited for `/chat/stream` to ensure the full text is
-accumulated before saving.
+9. **Episode write** — write the exchange back to memory with `projectId`.
+Fire-and-forget for `/chat`; awaited for `/chat/stream`.

-9. **Auto-naming** — on `isFirstMessage && !session.name`, fire a secondary
-inference call with a naming prompt (max 20 tokens, temperature 0.3) and
-write the result back as `session.name`. Fully fire-and-forget.
+10. **Auto-naming** — on `isFirstMessage && !session.name`, fire a secondary
+inference call with a naming prompt (max 20 tokens, temperature 0.3) and
+write the result back as `session.name`. Fully fire-and-forget.

 ### Prompt Structure

 ```
-[System prompt]
+[Resolved system prompt]

 Here is what you know about entities relevant to this conversation:
 - {name} ({type}): {notes}
````
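Step 3 in the hunk above introduces a three-tier system prompt hierarchy. A minimal sketch of that lookup follows. `FALLBACK_SYSTEM_PROMPT` is a hypothetical placeholder for `ORCHESTRATION.SYSTEM_PROMPT`, and treating `null`, missing, and whitespace-only strings all as "not set" is an assumption; the diff only says that `null` reverts to the constant.

```javascript
// Hypothetical sketch of the three-tier system prompt resolution (step 3).
// FALLBACK_SYSTEM_PROMPT is a placeholder for ORCHESTRATION.SYSTEM_PROMPT
// in @nexusai/shared; its text here is invented.
const FALLBACK_SYSTEM_PROMPT = "You are a helpful assistant.";

// Assumption: null, undefined, and whitespace-only strings count as "not set".
function isSet(value) {
  return typeof value === "string" && value.trim() !== "";
}

function resolveSystemPrompt(project, settings) {
  if (project && isSet(project.system_prompt)) return project.system_prompt; // highest priority
  if (settings && isSet(settings.systemPrompt)) return settings.systemPrompt; // global setting
  return FALLBACK_SYSTEM_PROMPT; // last resort
}
```

Passing `null` for `project` covers sessions that are not in a project, which then start at the global tier.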
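Steps 5 and 6 in the hunk above are both non-critical lookups: a failure degrades the context instead of failing the request, and entity search is skipped entirely for non-project sessions. The sketch below shows that control flow. `searchEpisodes` and `searchEntities` are hypothetical stand-ins for the Qdrant queries, episodes are assumed to carry an `id` for deduplication, and reusing the message vector for the entity query follows the older wording of the step rather than the new one. Project scoping of the semantic search (step 2) is omitted.

```javascript
// Hypothetical sketch of steps 5 and 6. searchEpisodes and searchEntities are
// placeholders for the Qdrant queries; their signatures are assumptions.
async function gatherContext({ queryVector, projectId, recentEpisodes, searchEpisodes, searchEntities }) {
  let semanticEpisodes = [];
  try {
    const hits = await searchEpisodes(queryVector);
    // Deduplicate against recent episodes (assumes each episode has an `id`).
    const recentIds = new Set(recentEpisodes.map((e) => e.id));
    semanticEpisodes = hits.filter((e) => !recentIds.has(e.id));
  } catch {
    // Non-critical: continue with recency-only context.
  }

  let entities = [];
  if (projectId != null) {
    try {
      // Entity search is filtered by projectId.
      entities = await searchEntities(queryVector, { projectId });
    } catch {
      // Non-critical: continue without entity context.
    }
  }
  // Non-project sessions fall through with an empty entity list.
  return { semanticEpisodes, entities };
}
```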
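Only the opening of the prompt layout is visible in this hunk: the resolved system prompt, then the entity header followed by one `- {name} ({type}): {notes}` line per entity. The sketch below reproduces that part and appends the remaining step 7 sections in the listed order. The `User:`/`Assistant:` labels, the `{ user, assistant }` episode fields, and the blank-line separators are assumptions, since the rest of the template lies outside the diff.

```javascript
// Hypothetical sketch of step 7. The entity block matches the template in
// this hunk; the episode and user-message formatting is assumed.
function formatEntities(entities) {
  if (entities.length === 0) return null; // omit the block entirely when empty
  const lines = entities.map((e) => `- ${e.name} (${e.type}): ${e.notes}`);
  return ["Here is what you know about entities relevant to this conversation:", ...lines].join("\n");
}

function assemblePrompt({ systemPrompt, entities, semanticEpisodes, recentEpisodes, userMessage }) {
  const sections = [systemPrompt, formatEntities(entities)];
  // Assumed episode shape: { user, assistant } text fields.
  const fmt = (e) => `User: ${e.user}\nAssistant: ${e.assistant}`;
  if (semanticEpisodes.length) sections.push(semanticEpisodes.map(fmt).join("\n\n"));
  if (recentEpisodes.length) sections.push(recentEpisodes.map(fmt).join("\n\n"));
  sections.push(`User: ${userMessage}`);
  return sections.filter((s) => s !== null).join("\n\n");
}
```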
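Step 9 in the hunk above writes the episode fire-and-forget for `/chat` but awaits it for `/chat/stream`; the removed wording gives the reason: the full text must be accumulated before saving. A minimal sketch of both modes follows. `writeEpisode` and `send` are hypothetical placeholders (the latter standing in for whatever emits an SSE chunk), `chunks` is assumed to be an async iterable of text, and the `assistant` field name is invented.

```javascript
// Hypothetical sketch of step 9's two write modes. writeEpisode and send
// are placeholders; their signatures are assumptions.
function writeEpisodeInBackground(writeEpisode, episode, log = console.error) {
  // Fire-and-forget (/chat): not awaited, but a .catch handler keeps a failed
  // write from becoming an unhandled rejection, which terminates Node by
  // default since v15.
  Promise.resolve()
    .then(() => writeEpisode(episode))
    .catch((err) => log("episode write failed:", err.message));
}

async function streamThenWrite(chunks, send, writeEpisode, meta) {
  // Awaited (/chat/stream): forward each chunk while accumulating it, then
  // save the complete text once the stream has ended.
  let text = "";
  for await (const chunk of chunks) {
    text += chunk;
    send(chunk);
  }
  await writeEpisode({ ...meta, assistant: text }); // meta might carry sessionId and projectId
  return text;
}
```

The same `.catch` pattern would apply to the fire-and-forget auto-naming call in step 10.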
```diff
@@ -175,9 +178,9 @@ is terminated by `res.end()` after the done event.
 folder for richer metadata (label, description). Returns file size in GB.

 `GET /models/props` fetches directly from llama-server via `LLAMA_SERVER_URL`.
-Returns `{ contextWindow, modelAlias }`. Used by the client to display
-read-only context window size and the currently loaded model in the settings
-panel. Returns `503` if llama-server is unreachable.
+Returns `{ contextWindow, modelAlias }`. `n_ctx` is at
+`data.default_generation_settings.n_ctx` in the llama-server response.
+Returns `503` if llama-server is unreachable.

 ## Sessions Route Behaviour

```
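The `/models/props` hunk pins down where `n_ctx` lives in the llama-server response and that an unreachable server yields `503`. The sketch below shows that mapping. It assumes the llama-server `/props` endpoint, the global `fetch` available in Node 18+ (injectable for testing), and a hypothetical `model_alias` field for the alias, since the diff does not say where `modelAlias` comes from. Treating a non-2xx reply as unreachable is also an assumption.

```javascript
// Hypothetical sketch of GET /models/props. The n_ctx path comes from the
// diff; the /props endpoint, the model_alias field, and mapping non-2xx
// replies to 503 are assumptions.
async function getModelProps(baseUrl, fetchImpl = fetch) {
  let data;
  try {
    const res = await fetchImpl(new URL("/props", baseUrl));
    if (!res.ok) throw new Error(`llama-server responded ${res.status}`);
    data = await res.json();
  } catch (err) {
    // Network failure or bad response: report llama-server as unavailable.
    return { status: 503, body: { error: "llama-server unreachable" } };
  }
  return {
    status: 200,
    body: {
      contextWindow: data.default_generation_settings?.n_ctx ?? null,
      modelAlias: data.model_alias ?? null, // assumed field name
    },
  };
}
```

Returning `{ status, body }` keeps the logic independent of the web framework; an Express handler could finish with `res.status(r.status).json(r.body)`, with `baseUrl` taken from `LLAMA_SERVER_URL`.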