# Retrieval Fusion
**Implementation:** `packages/orchestration-service/src/chat/index.js`
**FTS scoping:** `packages/memory-service/src/episodic/index.js`, `src/index.js`
**Settings:** `semanticWeight`, `keywordWeight` via `PATCH /settings`
## Purpose
Rather than relying solely on Qdrant vector similarity (which finds semantically
related content but misses exact keyword matches) or FTS5 keyword search alone
(which finds exact matches but not paraphrases), Reciprocal Rank Fusion (RRF)
merges the ranked results from both strategies into a single better-ranked list.
Episodes that rank highly in **both** lists score highest. An episode that is
the top semantic match but irrelevant by keyword, or vice versa, scores lower
than one that satisfies both.
## How RRF Works
For each episode `d`, its fused score is:
```
RRF(d) = w_semantic / (k + rank_semantic(d))
       + w_keyword  / (k + rank_keyword(d))
```
- `rank_semantic(d)`, `rank_keyword(d)`: 1-based position of `d` in that strategy's result list. If the episode is absent from a list, that term contributes 0.
- `k = 60`: smoothing constant (the standard value; not exposed in settings)
- `w_semantic`, `w_keyword`: user-tunable weights whose defaults come from the `RETRIEVAL` constants
Setting a weight to `0` removes that strategy's contribution entirely. Setting
`keywordWeight` to `0` also short-circuits the FTS network call.
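As a worked illustration of the formula above, the following sketch computes the fused score for a single episode from its two ranks. The helper names `rrfTerm` and `rrfScore` are hypothetical and exist only for this example; they are not part of the orchestration service.

```javascript
// Illustrative only: computes the fused score for one episode from its ranks.
const RRF_K = 60; // smoothing constant described above

// rank is 1-based; pass null when the episode is absent from that list.
function rrfTerm(weight, rank, k = RRF_K) {
  return rank == null ? 0 : weight / (k + rank);
}

function rrfScore({ semanticRank, keywordRank }, { semanticWeight, keywordWeight }) {
  return rrfTerm(semanticWeight, semanticRank) + rrfTerm(keywordWeight, keywordRank);
}

const weights = { semanticWeight: 1.0, keywordWeight: 1.0 };

// Ranked 3rd semantically and 2nd by keyword: 1/63 + 1/62
const both = rrfScore({ semanticRank: 3, keywordRank: 2 }, weights);
// Top semantic match that never appears in the keyword list: 1/61
const semanticOnly = rrfScore({ semanticRank: 1, keywordRank: null }, weights);

console.log(both.toFixed(4), semanticOnly.toFixed(4)); // prints "0.0320 0.0164"
```

With equal weights, an episode that is merely near the top of both lists outscores the best result of a single list by roughly a factor of two.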
## Architecture
Fusion lives in orchestration — the service already coordinates multiple data
sources, and fusion is a retrieval strategy, not a storage concern.
```
getFusedEpisodes()
├── getSemanticEpisodes()   - Qdrant embed+search → fetch full rows by ID
│                             (existing path, unchanged)
└── getFTSResults()         - memory-service /episodes/search → full rows directly
                              (skipped entirely if keywordWeight == 0)
fuseEpisodeResults()        - pure RRF, no I/O
fusedEpisodes[]             - top semanticLimit episodes by RRF score
```
### Data Shape Consistency
Both sides must enter fusion as `Episode[]` — full SQLite row objects with
the same shape — and both must be filtered against `recentIds` first:
- **Semantic path**: `recentIds` filter applied before `getEpisodeById` fetch (existing behaviour)
- **FTS path**: full rows returned directly; `recentIds` filter applied in `getFusedEpisodes` after receiving them
FTS requests `semanticLimit * 2` results so that fusion still has enough
keyword candidates after the `recentIds` filter removes some of them.
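The flow described in this section can be sketched roughly as follows. This is not the actual `getFusedEpisodes` code: `fetchSemantic`, `fetchFTS`, and `fuse` are injected stand-ins for `getSemanticEpisodes`, `getFTSResults`, and `fuseEpisodeResults`, `recentIds` is assumed to be a `Set`, and running the two lookups concurrently with `Promise.all` is an assumption rather than documented behaviour.

```javascript
// Sketch only: dependencies are injected so the example runs without services.
async function getFusedEpisodesSketch(query, opts, deps) {
  const { semanticLimit, semanticWeight, keywordWeight, recentIds } = opts;
  const { fetchSemantic, fetchFTS, fuse } = deps;

  // Skip the FTS call entirely when keyword search is disabled.
  const ftsPromise = keywordWeight > 0
    ? fetchFTS(query, semanticLimit * 2) // 2x headroom for the recentIds filter
    : Promise.resolve([]);

  const [semanticEps, ftsRows] = await Promise.all([
    fetchSemantic(query, semanticLimit, recentIds), // filters recentIds itself
    ftsPromise,
  ]);

  // FTS returns full rows, so the recentIds filter is applied here.
  const keywordEps = ftsRows.filter((ep) => !recentIds.has(ep.id));

  return fuse(semanticEps, keywordEps, {
    semanticWeight,
    keywordWeight,
    limit: semanticLimit,
  });
}
```

Injecting the three functions keeps the sketch testable in isolation; the real implementation presumably calls them directly.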
## FTS Session Scoping
Without scoping, FTS5 searches across all episodes in the database. For
context assembly, results must be constrained to the current session or
project session pool — the same scope used for Qdrant semantic search.
`searchEpisodes(query, limit, sessionIds)` in memory-service accepts an
optional `sessionIds` array. When provided, the SQL becomes:
```sql
SELECT e.* FROM episodes e
JOIN episodes_fts fts ON e.id = fts.rowid
WHERE episodes_fts MATCH ?
AND e.session_id IN (?, ?, ...)
ORDER BY rank
LIMIT ?
```
The HTTP endpoint `GET /episodes/search` accepts `sessionIds` as a
comma-separated query param: `?q=hello&sessionIds=1,2,3`.
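One way to turn the query param into the parameterised SQL above is sketched below. `parseSessionIds` and `buildSearchQuery` are hypothetical helper names, and the sketch assumes session IDs are integers, as the `1,2,3` example suggests; the real memory-service code may differ.

```javascript
// Hypothetical helpers; names are illustrative, not taken from memory-service.
function parseSessionIds(param) {
  if (!param) return [];
  return param
    .split(',')
    .map((s) => Number.parseInt(s.trim(), 10))
    .filter((n) => Number.isInteger(n)); // drop anything that is not a number
}

function buildSearchQuery(query, limit, sessionIds = []) {
  const scoped = sessionIds.length > 0;
  // One "?" per session ID keeps the statement fully parameterised.
  const placeholders = sessionIds.map(() => '?').join(', ');
  const sql = [
    'SELECT e.* FROM episodes e',
    'JOIN episodes_fts fts ON e.id = fts.rowid',
    'WHERE episodes_fts MATCH ?',
    scoped ? `AND e.session_id IN (${placeholders})` : null,
    'ORDER BY rank',
    'LIMIT ?',
  ].filter(Boolean).join('\n');
  // Parameter order must match placeholder order: MATCH, session IDs, LIMIT.
  return { sql, params: [query, ...sessionIds, limit] };
}

const { sql, params } = buildSearchQuery('hello', 10, parseSessionIds('1,2,3'));
console.log(sql);
console.log(params);
```

Generating placeholders instead of interpolating the IDs into the string avoids SQL injection through the `sessionIds` param.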
In orchestration, `ftsSessionIds` is set to:
- `projectSessionIds` (all sessions in the project) — if the session belongs to a project
- `[session.id]` — otherwise (single session only)
This mirrors the Qdrant scoping logic exactly.
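The scope selection can be sketched as a small pure function. The field name `session.projectId` is an assumption made for this example (the source does not say how project membership is stored), and `resolveFtsSessionIds` is a hypothetical name.

```javascript
// Assumed shape: session.projectId is null/undefined when the session has no project.
function resolveFtsSessionIds(session, projectSessionIds) {
  const inProject = session.projectId != null
    && Array.isArray(projectSessionIds)
    && projectSessionIds.length > 0;
  return inProject ? projectSessionIds : [session.id];
}

// Build the query string for GET /episodes/search.
const ids = resolveFtsSessionIds({ id: 7, projectId: 2 }, [1, 2, 3]);
const qs = new URLSearchParams({ q: 'hello', sessionIds: ids.join(',') }).toString();
console.log(qs); // prints "q=hello&sessionIds=1%2C2%2C3"
```

`URLSearchParams` percent-encodes the commas as `%2C`; a standard query-string parser on the receiving side decodes them back before the value is split.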
## `fuseEpisodeResults` — Implementation Detail
```js
function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
  const k = RETRIEVAL.RRF_K; // 60
  const scores = new Map(); // episode.id → { episode, score }

  // Score semantic results (already filtered against recentIds)
  semanticEps.forEach((ep, i) => {
    scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
  });

  // Score + merge keyword results (already filtered against recentIds)
  keywordEps.forEach((ep, i) => {
    const contrib = keywordWeight / (k + i + 1);
    if (scores.has(ep.id)) {
      scores.get(ep.id).score += contrib; // appears in both lists: sum scores
    } else if (contrib > 0) {
      scores.set(ep.id, { episode: ep, score: contrib }); // FTS-only episode
    }
    // contrib == 0 (keywordWeight: 0): episode not added, so no score-0 bleed-through
  });

  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(({ episode }) => episode);
}
```
The `else if (contrib > 0)` guard prevents FTS-only episodes from entering
the result set with a score of 0 when `keywordWeight` is 0 — verified by
the test suite.
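The guard's effect can be seen in a small standalone run. The function below is a condensed restatement of `fuseEpisodeResults` with `k` hard-coded to 60 so the snippet runs on its own; the sample episodes are made up for illustration and are not taken from the project's test suite.

```javascript
// Condensed restatement of the function above; k is hard-coded instead of
// reading RETRIEVAL.RRF_K so that the example is self-contained.
function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
  const k = 60;
  const scores = new Map();
  semanticEps.forEach((ep, i) => {
    scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
  });
  keywordEps.forEach((ep, i) => {
    const contrib = keywordWeight / (k + i + 1);
    if (scores.has(ep.id)) scores.get(ep.id).score += contrib;
    else if (contrib > 0) scores.set(ep.id, { episode: ep, score: contrib });
  });
  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(({ episode }) => episode);
}

const semantic = [{ id: 'a' }, { id: 'b' }];
const keyword = [{ id: 'b' }, { id: 'c' }];

// keywordWeight 0: the FTS-only episode "c" never enters the result set.
const off = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 0, limit: 5 });
// keywordWeight 1: "b" appears in both lists and moves to the top.
const on = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 1, limit: 5 });

console.log(off.map((e) => e.id)); // [ 'a', 'b' ]
console.log(on.map((e) => e.id)); // [ 'b', 'a', 'c' ]
```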
## Settings
| Setting | Default | Range | Description |
|---|---|---|---|
| `semanticWeight` | 1.0 | 0–5 | Weight applied to Qdrant semantic results |
| `keywordWeight` | 0 | 0–5 | Weight applied to FTS5 keyword results. `0` = disabled |
Both are readable via `GET /settings` and writable via `PATCH /settings`
without a service restart. Changes take effect on the next chat request.
**To enable keyword search:**
```bash
curl -X PATCH http://localhost:4000/settings \
-H "Content-Type: application/json" \
-d '{"keywordWeight": 1.0}'
```
**To favour keyword matches over semantic:**
```bash
curl -X PATCH http://localhost:4000/settings \
-H "Content-Type: application/json" \
-d '{"semanticWeight": 0.5, "keywordWeight": 2.0}'
```
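The same updates can be made from JavaScript. The sketch below assumes Node 18+ (for the global `fetch`) and the same `http://localhost:4000` base URL as the curl examples; `buildSettingsPatch` and `updateWeights` are hypothetical helpers, and the client-side validation only checks for non-negative numbers rather than mirroring whatever the service enforces.

```javascript
// Hypothetical client-side helpers; not part of the service codebase.
function buildSettingsPatch(weights) {
  for (const [name, value] of Object.entries(weights)) {
    if (name !== 'semanticWeight' && name !== 'keywordWeight') {
      throw new Error(`Unknown setting: ${name}`);
    }
    if (typeof value !== 'number' || !Number.isFinite(value) || value < 0) {
      throw new Error(`${name} must be a non-negative number`);
    }
  }
  return {
    method: 'PATCH',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(weights),
  };
}

async function updateWeights(weights, baseUrl = 'http://localhost:4000') {
  const res = await fetch(`${baseUrl}/settings`, buildSettingsPatch(weights));
  if (!res.ok) throw new Error(`PATCH /settings failed: ${res.status}`);
  return res; // response body shape is not documented here, so it is returned as-is
}

// Example (requires the service to be running):
// updateWeights({ semanticWeight: 0.5, keywordWeight: 2.0 });
```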
## Constants (`packages/shared/src/config/constants.js`)
| Constant | Value | Description |
|---|---|---|
| `RETRIEVAL.RRF_K` | 60 | RRF smoothing constant — not exposed in settings |
| `RETRIEVAL.SEMANTIC_WEIGHT` | 1.0 | Default semantic weight |
| `RETRIEVAL.KEYWORD_WEIGHT` | 0 | Default keyword weight (off) |