retrieval fusion

2026-04-27 07:03:46 -07:00
parent 27ad614130
commit 055683424d
6 changed files with 188 additions and 14 deletions
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -73,7 +73,7 @@ service by ID after the vector search.

 The core four-service architecture is complete and operational. Key capabilities:

- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
+- **Retrieval fusion** — Reciprocal Rank Fusion (RRF) merges semantic (Qdrant vector search) and keyword (SQLite FTS5) episode retrieval into a single ranked result set. Weights are configurable per strategy via settings; keyword search is off by default (`keywordWeight: 0`) and can be enabled without a service restart
 - **Entity layer + Knowledge graph** — automatic extraction of named entities and relationships from conversations via qwen2.5:3b. Entities and relationships are stored in SQLite with `mention_count` tracking. A graph traversal layer expands Qdrant entity search hits into a 1-hop neighborhood subgraph, injecting structured connected knowledge into every prompt
 - **Projects** — sessions grouped with shared or isolated memory pools
 - **Auto-naming** — sessions named automatically from first exchange via inference
--- a/docs/reference/API-routes.md
+++ b/docs/reference/API-routes.md
@@ -202,7 +202,9 @@ Returns `503` if llama-server is unreachable.
 |---|---|---|---|
 | `recentEpisodeLimit` | integer | 1–20 | Recent episodes injected into prompt |
 | `semanticLimit` | integer | 1–20 | Max semantic search results |
-| `scoreThreshold` | float | 0–1 | Minimum similarity score |
+| `scoreThreshold` | float | 0–1 | Minimum similarity score for Qdrant results |
+| `semanticWeight` | float | 0–5 | RRF weight for Qdrant semantic results |
+| `keywordWeight` | float | 0–5 | RRF weight for FTS5 keyword results (`0` = disabled) |
 | `modelsFolderPath` | string | — | Path to folder containing .gguf files |
 | `temperature` | float | 0–2 | Inference randomness |
 | `repeatPenalty` | float | 1–2 | Repeat token penalty |
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -64,11 +64,11 @@ The highest-leverage memory upgrade. Transforms NexusAI from "remembers conversa
 - [x] Relationship traversal queries
 - [x] Graph-aware context assembly in orchestration

-### 2. Retrieval Fusion + Full-Text Search
+### 2. Retrieval Fusion + Full-Text Search ✅
 Multi-strategy retrieval merged into a single ranked result set.
- [ ] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
- [ ] Configurable weights per retrieval strategy
- [ ] Score threshold tuning per collection
+- [x] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
+- [x] Configurable weights per retrieval strategy (`semanticWeight`, `keywordWeight` via `PATCH /settings`)
+- [x] Score threshold retained per strategy; FTS scoped to the current session or, for project sessions, all sessions in the project; `keywordWeight: 0` default (disabled until tuned)

 ### 3. Memory Consolidation Lifecycle
 Prevents long-term memory degradation and enables compression.
--- a/docs/services/orchestration-service.md
+++ b/docs/services/orchestration-service.md
@@ -72,7 +72,9 @@ via `appSettings.load()` — changes apply immediately without a service restart
 |---|---|---|
 | `recentEpisodeLimit` | 5 | Recent episodes injected into prompt |
 | `semanticLimit` | 5 | Semantic search results injected into prompt |
-| `scoreThreshold` | 0.5 | Minimum similarity score for semantic results |
+| `scoreThreshold` | 0.5 | Minimum similarity score for Qdrant semantic results |
+| `semanticWeight` | 1.0 | RRF weight for Qdrant semantic results |
+| `keywordWeight` | 0 | RRF weight for FTS5 keyword results (`0` = disabled) |
 | `modelsFolderPath` | `/mnt/nexus-models` | Path to folder containing .gguf files |
 | `temperature` | 0.7 | Inference temperature |
 | `repeatPenalty` | 1.1 | Repeat token penalty |
@@ -101,8 +103,12 @@ difference is how the inference response is delivered to the client.

 4. **Recent episode retrieval** — fetch most recent episodes (`recentEpisodeLimit`).

-5. **Semantic search** — embed user message, query Qdrant for similar past
-   episodes. Deduplicated against recent episodes. Non-critical.
+5. **Fused episode retrieval** — runs semantic (Qdrant) and keyword (FTS5)
+   search in parallel, then merges results via Reciprocal Rank Fusion (RRF).
+   Both paths are filtered against `recentIds` before fusion. FTS is scoped
+   to the current session or all project sessions. If `keywordWeight` is `0`,
+   the FTS call is skipped entirely. Non-critical — failures fall back to
+   whichever strategy succeeded.

 6. **Entity search** — query `entities` Qdrant collection filtered by
   `projectId`. Returns entity IDs alongside Qdrant payload data (the Qdrant
@@ -114,8 +120,8 @@ difference is how the inference response is delivered to the client.
   If no entities were found or the graph call fails, falls back to flat entity
   list (no edges). Non-critical.

-8. **Prompt assembly** — combine system prompt, graph context, semantic
-   episodes, recent episodes, and user message.
+8. **Prompt assembly** — combine system prompt, graph context, fused episodes,
+   recent episodes, and user message.

 9. **Inference** — send to inference service. `/chat` awaits full response;
   `/chat/stream` pipes SSE chunks to the client.
--- a/docs/services/retrieval-fusion.md
+++ b/docs/services/retrieval-fusion.md
@@ -0,0 +1,153 @@
+# Retrieval Fusion
+
+**Implementation:** `packages/orchestration-service/src/chat/index.js`  
+**FTS scoping:** `packages/memory-service/src/episodic/index.js`, `src/index.js`  
+**Settings:** `semanticWeight`, `keywordWeight` via `PATCH /settings`
+
+## Purpose
+
+Rather than relying solely on Qdrant vector similarity (which finds semantically
+related content but misses exact keyword matches) or FTS5 keyword search alone
+(which finds exact matches but not paraphrases), Reciprocal Rank Fusion (RRF)
+merges the ranked results from both strategies into a single better-ranked list.
+
+Episodes that rank highly in **both** lists score highest. An episode that is
+the top semantic match but irrelevant by keyword, or vice versa, scores lower
+than one that satisfies both.
+
+## How RRF Works
+
+For each episode `d`, its fused score is:
+
+```
+RRF(d) = w_semantic / (k + rank_semantic(d))
+        + w_keyword  / (k + rank_keyword(d))
+```
+
+- `rank_i(d)` — 1-based position in that strategy's result list (episode absent from a list contributes 0 for that term)
+- `k = 60` — smoothing constant (standard; not exposed in settings)
+- `w_semantic`, `w_keyword` — user-tunable weights (`semanticWeight`, `keywordWeight`); both defaults come from the `RETRIEVAL` constants
+
+Setting a weight to `0` removes that strategy's contribution entirely. Setting
+`keywordWeight` to `0` also short-circuits the FTS network call.
+
+## Architecture
+
+Fusion lives in orchestration — the service already coordinates multiple data
+sources, and fusion is a retrieval strategy, not a storage concern.
+
+```
+getFusedEpisodes()
+├── getSemanticEpisodes()     — Qdrant embed+search → fetch full rows by ID
+│   (existing path, unchanged)
+└── getFTSResults()           — memory-service /episodes/search → full rows directly
+    (skipped entirely if keywordWeight == 0)
+         ↓
+fuseEpisodeResults()          — pure RRF, no I/O
+         ↓
+fusedEpisodes[]               — top semanticLimit episodes by RRF score
+```
+
+### Data Shape Consistency
+
+Both sides must enter fusion as `Episode[]` — full SQLite row objects with
+the same shape — and both must be filtered against `recentIds` first:
+
+- **Semantic path**: `recentIds` filter applied before `getEpisodeById` fetch (existing behaviour)
+- **FTS path**: full rows returned directly; `recentIds` filter applied in `getFusedEpisodes` after receiving them
+
+The FTS call requests `semanticLimit * 2` results so that enough candidates
+remain for fusion after the `recentIds` filter removes recent episodes.
+
+## FTS Session Scoping
+
+Without scoping, FTS5 searches across all episodes in the database. For
+context assembly, results must be constrained to the current session or
+project session pool — the same scope used for Qdrant semantic search.
+
+`searchEpisodes(query, limit, sessionIds)` in memory-service accepts an
+optional `sessionIds` array. When provided, the SQL becomes:
+
+```sql
+SELECT e.* FROM episodes e
+JOIN episodes_fts fts ON e.id = fts.rowid
+WHERE episodes_fts MATCH ?
+AND e.session_id IN (?, ?, ...)
+ORDER BY rank
+LIMIT ?
+```
+
+The HTTP endpoint `GET /episodes/search` accepts `sessionIds` as a
+comma-separated query param: `?q=hello&sessionIds=1,2,3`.
+
+In orchestration, `ftsSessionIds` is set to:
+- `projectSessionIds` (all sessions in the project) — if the session belongs to a project
+- `[session.id]` — otherwise (single session only)
+
+This mirrors the Qdrant scoping logic exactly.
+
+## `fuseEpisodeResults` — Implementation Detail
+
+```js
+function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
+    const k = RETRIEVAL.RRF_K; // 60
+    const scores = new Map();  // episode.id → { episode, score }
+
+    // Score semantic results (already filtered against recentIds)
+    semanticEps.forEach((ep, i) => {
+        scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
+    });
+
+    // Score + merge keyword results (already filtered against recentIds)
+    keywordEps.forEach((ep, i) => {
+        const contrib = keywordWeight / (k + i + 1);
+        if (scores.has(ep.id)) {
+            scores.get(ep.id).score += contrib;   // appears in both — sum scores
+        } else if (contrib > 0) {
+            scores.set(ep.id, { episode: ep, score: contrib });  // FTS-only episode
+        }
+        // contrib == 0 (keywordWeight: 0) → episode not added (guard prevents score-0 bleed-through)
+    });
+
+    return [...scores.values()]
+        .sort((a, b) => b.score - a.score)
+        .slice(0, limit)
+        .map(({ episode }) => episode);
+}
+```
+
+The `else if (contrib > 0)` guard keeps FTS-only episodes out of the result
+set when `keywordWeight` is `0`, where they would otherwise enter with a score
+of 0. The test suite verifies this behaviour.
+
+## Settings
+
+| Setting | Default | Range | Description |
+|---|---|---|---|
+| `semanticWeight` | 1.0 | 0–5 | Weight applied to Qdrant semantic results |
+| `keywordWeight` | 0 | 0–5 | Weight applied to FTS5 keyword results. `0` = disabled |
+
+Both are readable via `GET /settings` and writable via `PATCH /settings`
+without a service restart. Changes take effect on the next chat request.
+
+**To enable keyword search:**
+```bash
+curl -X PATCH http://localhost:4000/settings \
+  -H "Content-Type: application/json" \
+  -d '{"keywordWeight": 1.0}'
+```
+
+**To favour keyword matches over semantic:**
+```bash
+curl -X PATCH http://localhost:4000/settings \
+  -H "Content-Type: application/json" \
+  -d '{"semanticWeight": 0.5, "keywordWeight": 2.0}'
+```
+
+## Constants (`packages/shared/src/config/constants.js`)
+
+| Constant | Value | Description |
+|---|---|---|
+| `RETRIEVAL.RRF_K` | 60 | RRF smoothing constant — not exposed in settings |
+| `RETRIEVAL.SEMANTIC_WEIGHT` | 1.0 | Default semantic weight |
+| `RETRIEVAL.KEYWORD_WEIGHT` | 0 | Default keyword weight (off) |
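
The `fuseEpisodeResults` listing in `docs/services/retrieval-fusion.md` can be exercised on its own. The sketch below is a minimal standalone run under stated assumptions: `RETRIEVAL` is stubbed locally with only `RRF_K`, and the sample episodes are hypothetical objects carrying just an `id`. It is meant to illustrate how the weights change the ranking, not to stand in for the project's test suite.

```javascript
// Assumption: local stub. Per the doc, the real constants live in
// packages/shared/src/config/constants.js.
const RETRIEVAL = { RRF_K: 60 };

// Same logic as the listing in retrieval-fusion.md, copied so this runs standalone.
function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
    const k = RETRIEVAL.RRF_K;
    const scores = new Map(); // episode.id -> { episode, score }

    // Semantic results: rank is the 1-based list position.
    semanticEps.forEach((ep, i) => {
        scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
    });

    // Keyword results: sum into an existing entry, or add an FTS-only entry.
    keywordEps.forEach((ep, i) => {
        const contrib = keywordWeight / (k + i + 1);
        if (scores.has(ep.id)) {
            scores.get(ep.id).score += contrib;
        } else if (contrib > 0) {
            scores.set(ep.id, { episode: ep, score: contrib });
        }
    });

    return [...scores.values()]
        .sort((a, b) => b.score - a.score)
        .slice(0, limit)
        .map(({ episode }) => episode);
}

// Hypothetical sample data: episode 3 appears in both lists, episode 4 only in FTS.
const semantic = [{ id: 1 }, { id: 2 }, { id: 3 }];
const keyword = [{ id: 3 }, { id: 4 }];

const fusedIds = (weights) =>
    fuseEpisodeResults(semantic, keyword, { ...weights, limit: 3 }).map((ep) => ep.id);

const keywordOff = fusedIds({ semanticWeight: 1.0, keywordWeight: 0 });
const keywordOn = fusedIds({ semanticWeight: 1.0, keywordWeight: 2.0 });

console.log(keywordOff); // [ 1, 2, 3 ]
console.log(keywordOn);  // [ 3, 4, 1 ]
```

With `keywordWeight: 0` the keyword-only episode 4 never enters the result and the semantic order is unchanged. With `keywordWeight: 2.0`, episode 3 scores `1/63 + 2/61` and ranks first, and episode 4 (`2/62`) overtakes the semantic-only episodes 1 (`1/61`) and 2 (`1/62`).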