retrieval fusion
@@ -72,7 +72,9 @@ via `appSettings.load()` — changes apply immediately without a service restart
 |---|---|---|
 | `recentEpisodeLimit` | 5 | Recent episodes injected into prompt |
 | `semanticLimit` | 5 | Semantic search results injected into prompt |
-| `scoreThreshold` | 0.5 | Minimum similarity score for semantic results |
+| `scoreThreshold` | 0.5 | Minimum similarity score for Qdrant semantic results |
+| `semanticWeight` | 1.0 | RRF weight for Qdrant semantic results |
+| `keywordWeight` | 0 | RRF weight for FTS5 keyword results (`0` = disabled) |
 | `modelsFolderPath` | `/mnt/nexus-models` | Path to folder containing .gguf files |
 | `temperature` | 0.7 | Inference temperature |
 | `repeatPenalty` | 1.1 | Repeat token penalty |
@@ -101,8 +103,12 @@ difference is how the inference response is delivered to the client.
 
 4. **Recent episode retrieval** — fetch most recent episodes (`recentEpisodeLimit`).
 
-5. **Semantic search** — embed user message, query Qdrant for similar past
-episodes. Deduplicated against recent episodes. Non-critical.
+5. **Fused episode retrieval** — runs semantic (Qdrant) and keyword (FTS5)
+search in parallel, then merges results via Reciprocal Rank Fusion (RRF).
+Both paths are filtered against `recentIds` before fusion. FTS is scoped
+to the current session or all project sessions. If `keywordWeight` is `0`,
+the FTS call is skipped entirely. Non-critical — failures fall back to
+whichever strategy succeeded.
 
 6. **Entity search** — query `entities` Qdrant collection filtered by
 `projectId`. Returns entity IDs alongside Qdrant payload data (the Qdrant
@@ -114,8 +120,8 @@ difference is how the inference response is delivered to the client.
 If no entities were found or the graph call fails, falls back to flat entity
 list (no edges). Non-critical.
 
-8. **Prompt assembly** — combine system prompt, graph context, semantic
-episodes, recent episodes, and user message.
+8. **Prompt assembly** — combine system prompt, graph context, fused episodes,
+recent episodes, and user message.
 
 9. **Inference** — send to inference service. `/chat` awaits full response;
 `/chat/stream` pipes SSE chunks to the client.
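
The fused retrieval step described in this change can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the function names, the stub searches, the smoothing constant `k = 60`, and the way the weights enter the score (`weight / (k + rank)`) are all assumptions, since the diff only states that results are merged via weighted RRF.

```python
import asyncio

RRF_K = 60  # conventional RRF smoothing constant; assumed, not stated in the diff


def rrf_fuse(ranked_lists, weights, k=RRF_K):
    """Merge ranked ID lists: score(id) = sum(weight / (k + rank)), rank from 1."""
    scores = {}
    for ids, weight in zip(ranked_lists, weights):
        if weight <= 0:
            continue  # a zero weight contributes nothing
        for rank, episode_id in enumerate(ids, start=1):
            scores[episode_id] = scores.get(episode_id, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda eid: (-scores[eid], eid))


async def fused_retrieval(semantic_search, keyword_search, recent_ids,
                          semantic_weight=1.0, keyword_weight=0.0, limit=5):
    """Run both searches concurrently, drop recent episodes, then fuse."""
    tasks = [semantic_search()]
    weights = [semantic_weight]
    if keyword_weight > 0:  # a keywordWeight of 0 skips the FTS call entirely
        tasks.append(keyword_search())
        weights.append(keyword_weight)
    # return_exceptions=True lets one strategy fail without losing the other.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    ranked = [
        [] if isinstance(r, Exception) else [i for i in r if i not in recent_ids]
        for r in results
    ]
    return rrf_fuse(ranked, weights)[:limit]


# Hypothetical stand-ins for the Qdrant and FTS5 queries, best match first.
async def demo_semantic():
    return ["e1", "e2", "e3"]


async def demo_keyword():
    return ["e3", "e4", "e1"]


async def demo_failing():
    raise RuntimeError("search backend unavailable")


if __name__ == "__main__":
    print(asyncio.run(fused_retrieval(demo_semantic, demo_keyword, {"e2"}, 1.0, 1.0)))
```

With both weights at `1.0` and `e2` already in the recent set, the demo prints `['e3', 'e1', 'e4']`: `e3` ranks well in both lists, so it overtakes `e1`. With `keyword_weight=0` the keyword coroutine is never created, which mirrors the "FTS call is skipped entirely" behaviour in the updated step 5.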