retrieval fusion

Storme-bit
2026-04-27 07:03:46 -07:00
parent 27ad614130
commit 055683424d
6 changed files with 188 additions and 14 deletions


@@ -72,7 +72,9 @@ via `appSettings.load()` — changes apply immediately without a service restart
|---|---|---|
| `recentEpisodeLimit` | 5 | Recent episodes injected into prompt |
| `semanticLimit` | 5 | Semantic search results injected into prompt |
-| `scoreThreshold` | 0.5 | Minimum similarity score for semantic results |
+| `scoreThreshold` | 0.5 | Minimum similarity score for Qdrant semantic results |
+| `semanticWeight` | 1.0 | RRF weight for Qdrant semantic results |
+| `keywordWeight` | 0 | RRF weight for FTS5 keyword results (`0` = disabled) |
| `modelsFolderPath` | `/mnt/nexus-models` | Path to folder containing .gguf files |
| `temperature` | 0.7 | Inference temperature |
| `repeatPenalty` | 1.1 | Repeat token penalty |
@@ -101,8 +103,12 @@ difference is how the inference response is delivered to the client.
4. **Recent episode retrieval** — fetch most recent episodes (`recentEpisodeLimit`).
-5. **Semantic search** — embed user message, query Qdrant for similar past
-episodes. Deduplicated against recent episodes. Non-critical.
+5. **Fused episode retrieval** — runs semantic (Qdrant) and keyword (FTS5)
+search in parallel, then merges results via Reciprocal Rank Fusion (RRF).
+Both paths are filtered against `recentIds` before fusion. FTS is scoped
+to the current session or all project sessions. If `keywordWeight` is `0`,
+the FTS call is skipped entirely. Non-critical — failures fall back to
+whichever strategy succeeded.
6. **Entity search** — query `entities` Qdrant collection filtered by
`projectId`. Returns entity IDs alongside Qdrant payload data (the Qdrant
@@ -114,8 +120,8 @@ difference is how the inference response is delivered to the client.
If no entities were found or the graph call fails, falls back to flat entity
list (no edges). Non-critical.
-8. **Prompt assembly** — combine system prompt, graph context, semantic
-episodes, recent episodes, and user message.
+8. **Prompt assembly** — combine system prompt, graph context, fused episodes,
+recent episodes, and user message.
9. **Inference** — send to inference service. `/chat` awaits full response;
`/chat/stream` pipes SSE chunks to the client.
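
The fused episode retrieval step (step 5) can be sketched as follows. This is a minimal illustration, not the code from this commit. The function names `rrf_fuse` and `fused_episode_ids`, the search callables, the constant `k = 60` (a common RRF default that the source does not state), and the use of `semanticLimit` as the cap on fused results are all assumptions made for the example. Only the setting names `semanticWeight`, `keywordWeight`, and `semanticLimit` and the described behaviour (parallel search, filtering against recent IDs, skipping FTS at weight `0`, falling back on failure) come from the text above.

```python
import asyncio
from collections import defaultdict

RRF_K = 60  # assumed smoothing constant; the source does not specify one


def rrf_fuse(ranked_lists, weights, limit, k=RRF_K):
    """Weighted Reciprocal Rank Fusion over lists of episode IDs.

    Each list contributes weight / (k + rank) to an ID's score, with
    ranks starting at 1. Lists with a non-positive weight are ignored.
    """
    scores = defaultdict(float)
    for ids, weight in zip(ranked_lists, weights):
        if weight <= 0:
            continue
        for rank, episode_id in enumerate(ids, start=1):
            scores[episode_id] += weight / (k + rank)
    # Highest score first; ties broken by ID so the order is deterministic.
    ordered = sorted(scores, key=lambda eid: (-scores[eid], eid))
    return ordered[:limit]


async def fused_episode_ids(query, recent_ids, settings,
                            semantic_search, keyword_search):
    """Run both searches concurrently and fuse whatever succeeded.

    semantic_search and keyword_search are hypothetical async callables
    that return episode IDs ordered from best to worst match.
    """
    searches = [semantic_search(query)]
    weights = [settings["semanticWeight"]]
    if settings["keywordWeight"] > 0:  # weight 0 skips the FTS call entirely
        searches.append(keyword_search(query))
        weights.append(settings["keywordWeight"])

    # return_exceptions=True keeps one failing search from failing the step.
    results = await asyncio.gather(*searches, return_exceptions=True)

    ranked, used_weights = [], []
    for result, weight in zip(results, weights):
        if isinstance(result, BaseException):
            continue  # non-critical: fall back to the strategy that worked
        # Drop episodes already present in the recent-episode context.
        ranked.append([eid for eid in result if eid not in recent_ids])
        used_weights.append(weight)

    return rrf_fuse(ranked, used_weights, settings["semanticLimit"])
```

With the default `keywordWeight` of `0`, this sketch reduces to semantic-only retrieval, because a single ranked list fused with itself keeps its original order. Raising `keywordWeight` lets episodes that rank well in both lists move ahead of episodes that appear in only one.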