retrieval fusion
153
docs/services/retrieval-fusion.md
Normal file
@@ -0,0 +1,153 @@
# Retrieval Fusion

**Implementation:** `packages/orchestration-service/src/chat/index.js`
**FTS scoping:** `packages/memory-service/src/episodic/index.js`, `src/index.js`
**Settings:** `semanticWeight`, `keywordWeight` via `PATCH /settings`

## Purpose

Qdrant vector similarity finds semantically related content but misses exact
keyword matches; FTS5 keyword search finds exact matches but misses paraphrases.
Reciprocal Rank Fusion (RRF) merges the ranked results from both strategies
into a single list instead of relying on either one alone.

Episodes that rank highly in **both** lists score highest. An episode that is
the top semantic match but irrelevant by keyword, or vice versa, scores lower
than one that satisfies both.

## How RRF Works

For each episode `d`, its fused score is:

```
RRF(d) = w_semantic / (k + rank_semantic(d))
       + w_keyword  / (k + rank_keyword(d))
```

- `rank_semantic(d)`, `rank_keyword(d)`: 1-based position of `d` in that strategy's result list (an episode absent from a list contributes 0 for that term)
- `k = 60`: smoothing constant (the standard value; not exposed in settings)
- `w_semantic`, `w_keyword`: user-tunable weights (defaults come from the `RETRIEVAL` constants)

Setting a weight to `0` removes that strategy's contribution to the score. Setting
`keywordWeight` to `0` also skips the FTS network call entirely.
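
To make the formula concrete, here is a small worked example. `rrfScore` is a hypothetical helper written for this illustration, not part of the codebase; it takes 1-based ranks (or `null` for "absent from that list") and applies the formula with `k = 60`.

```js
// Hypothetical helper: fused score for one episode, given its 1-based rank in
// each list (null when the episode is absent from that list).
function rrfScore({ semanticRank, keywordRank }, { semanticWeight, keywordWeight }, k = 60) {
  const term = (weight, rank) => (rank == null ? 0 : weight / (k + rank));
  return term(semanticWeight, semanticRank) + term(keywordWeight, keywordRank);
}

const weights = { semanticWeight: 1.0, keywordWeight: 1.0 };

// Ranked 3rd semantically and 2nd by keyword: 1/63 + 1/62
const both = rrfScore({ semanticRank: 3, keywordRank: 2 }, weights);

// Top semantic hit, absent from the keyword list: 1/61 + 0
const semanticOnly = rrfScore({ semanticRank: 1, keywordRank: null }, weights);

console.log(both.toFixed(4), semanticOnly.toFixed(4)); // 0.0320 0.0164
```

With equal weights, an episode that appears mid-list in both strategies scores roughly twice as high as one that tops a single list.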

## Architecture

Fusion lives in orchestration: the service already coordinates multiple data
sources, and fusion is a retrieval strategy, not a storage concern.

```
getFusedEpisodes()
  ├── getSemanticEpisodes()  - Qdrant embed+search → fetch full rows by ID
  │                            (existing path, unchanged)
  └── getFTSResults()        - memory-service /episodes/search → full rows directly
                               (skipped entirely if keywordWeight == 0)
        ↓
  fuseEpisodeResults()       - pure RRF, no I/O
        ↓
  fusedEpisodes[]            - top semanticLimit episodes by RRF score
```

### Data Shape Consistency

Both sides must enter fusion as `Episode[]` (full SQLite row objects with
the same shape), and both must be filtered against `recentIds` first:

- **Semantic path**: `recentIds` filter applied before the `getEpisodeById` fetch (existing behaviour)
- **FTS path**: full rows returned directly; `recentIds` filter applied in `getFusedEpisodes` after receiving them

The FTS path requests `semanticLimit * 2` results so that enough candidates
remain for fusion after the `recentIds` filter removes some of them.
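
The FTS-side preparation can be sketched as follows. `ftsFetchLimit` and `dropRecent` are hypothetical names, and `recentIds` is assumed here to be a `Set` of episode IDs; the real code in `getFusedEpisodes` may structure this differently.

```js
// Hypothetical helpers illustrating the FTS-side preparation.
const ftsFetchLimit = (semanticLimit) => semanticLimit * 2;

// Assumes recentIds is a Set of episode IDs already present in the context.
function dropRecent(rows, recentIds) {
  return rows.filter((ep) => !recentIds.has(ep.id));
}

const recentIds = new Set([7, 9]);
const ftsRows = [{ id: 7 }, { id: 3 }, { id: 9 }, { id: 5 }, { id: 1 }, { id: 8 }];

// semanticLimit = 3, so up to 6 rows are requested; 2 are dropped as recent.
const keywordEps = dropRecent(ftsRows.slice(0, ftsFetchLimit(3)), recentIds);
console.log(keywordEps.map((ep) => ep.id)); // [ 3, 5, 1, 8 ]
```

Four candidates survive the filter, which is still more than the three that fusion will return. Fetching only `semanticLimit` rows would have left a single candidate.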

## FTS Session Scoping

Without scoping, FTS5 searches across all episodes in the database. For
context assembly, results must be constrained to the current session or
project session pool, the same scope used for Qdrant semantic search.

`searchEpisodes(query, limit, sessionIds)` in memory-service accepts an
optional `sessionIds` array. When provided, the SQL becomes:

```sql
SELECT e.* FROM episodes e
JOIN episodes_fts fts ON e.id = fts.rowid
WHERE episodes_fts MATCH ?
  AND e.session_id IN (?, ?, ...)
ORDER BY rank
LIMIT ?
```

The HTTP endpoint `GET /episodes/search` accepts `sessionIds` as a
comma-separated query param: `?q=hello&sessionIds=1,2,3`.
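
As a rough sketch of how the query parameter and the `IN (?, ?, ...)` clause could fit together: both functions below are hypothetical and are not taken from memory-service. Session IDs are assumed to be integers, as in the `1,2,3` example; string IDs would need a different parser.

```js
// Hypothetical: parse "sessionIds=1,2,3" into an array of integers,
// silently discarding anything that is not a valid integer.
function parseSessionIds(raw) {
  if (!raw) return [];
  return raw
    .split(',')
    .map((s) => Number.parseInt(s.trim(), 10))
    .filter((n) => Number.isInteger(n));
}

// Hypothetical: build the statement text and its bound parameters.
// One "?" placeholder is emitted per session ID; values are never inlined.
function buildSearchQuery(query, limit, sessionIds = []) {
  const scope = sessionIds.length
    ? `AND e.session_id IN (${sessionIds.map(() => '?').join(', ')})`
    : '';
  const sql = [
    'SELECT e.* FROM episodes e',
    'JOIN episodes_fts fts ON e.id = fts.rowid',
    'WHERE episodes_fts MATCH ?',
    scope,
    'ORDER BY rank',
    'LIMIT ?',
  ].filter(Boolean).join('\n');
  return { sql, params: [query, ...sessionIds, limit] };
}

const { sql, params } = buildSearchQuery('hello', 10, parseSessionIds('1,2,3'));
console.log(params); // [ 'hello', 1, 2, 3, 10 ]
```

The parameter order matters: the match term first, then the session IDs, then the limit, matching the order of the placeholders in the statement.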

In orchestration, `ftsSessionIds` is set to:

- `projectSessionIds` (all sessions in the project) if the session belongs to a project
- `[session.id]` otherwise (single session only)

This mirrors the Qdrant scoping logic exactly.
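
The scope selection above amounts to a single branch. In this sketch the `projectId` field on the session object is an assumption used to express "belongs to a project"; the real session shape may differ.

```js
// Hypothetical: `session.projectId` is an assumed field name.
function resolveFtsSessionIds(session, projectSessionIds) {
  return session.projectId != null ? projectSessionIds : [session.id];
}

// Session inside a project: search the whole project pool.
console.log(resolveFtsSessionIds({ id: 4, projectId: 2 }, [3, 4, 5])); // [ 3, 4, 5 ]

// Standalone session: search only itself.
console.log(resolveFtsSessionIds({ id: 9, projectId: null }, [])); // [ 9 ]
```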

## `fuseEpisodeResults`: Implementation Detail

```js
function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
  const k = RETRIEVAL.RRF_K; // 60
  const scores = new Map(); // episode.id → { episode, score }

  // Score semantic results (already filtered against recentIds)
  semanticEps.forEach((ep, i) => {
    scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
  });

  // Score + merge keyword results (already filtered against recentIds)
  keywordEps.forEach((ep, i) => {
    const contrib = keywordWeight / (k + i + 1);
    if (scores.has(ep.id)) {
      scores.get(ep.id).score += contrib; // appears in both: sum scores
    } else if (contrib > 0) {
      scores.set(ep.id, { episode: ep, score: contrib }); // FTS-only episode
    }
    // contrib == 0 (keywordWeight: 0) → episode not added (guard prevents score-0 bleed-through)
  });

  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(({ episode }) => episode);
}
```

The `else if (contrib > 0)` guard prevents FTS-only episodes from entering
the result set with a score of 0 when `keywordWeight` is 0. This behaviour is
verified by the test suite.
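
The guard's effect is easiest to see with a tiny input. The sketch below uses a condensed copy of the function above, renamed `fuse` and with `k` inlined as 60 so that it runs standalone; the episode objects are made-up stand-ins for full rows.

```js
// Condensed copy of fuseEpisodeResults with k inlined, so this runs standalone.
function fuse(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
  const k = 60;
  const scores = new Map();
  semanticEps.forEach((ep, i) => {
    scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
  });
  keywordEps.forEach((ep, i) => {
    const contrib = keywordWeight / (k + i + 1);
    if (scores.has(ep.id)) scores.get(ep.id).score += contrib;
    else if (contrib > 0) scores.set(ep.id, { episode: ep, score: contrib });
  });
  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(({ episode }) => episode);
}

const semantic = [{ id: 'a' }, { id: 'b' }];
const keyword = [{ id: 'b' }, { id: 'c' }];

// keywordWeight 0: 'c' (FTS-only) is kept out; order is purely semantic.
const off = fuse(semantic, keyword, { semanticWeight: 1, keywordWeight: 0, limit: 5 });
console.log(off.map((e) => e.id)); // [ 'a', 'b' ]

// keywordWeight 1: 'b' appears in both lists and moves to the top.
const on = fuse(semantic, keyword, { semanticWeight: 1, keywordWeight: 1, limit: 5 });
console.log(on.map((e) => e.id)); // [ 'b', 'a', 'c' ]
```

Without the guard, `'c'` would appear in the first result with a score of 0, even though keyword search is nominally disabled.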

## Settings

| Setting | Default | Range | Description |
|---|---|---|---|
| `semanticWeight` | 1.0 | 0–5 | Weight applied to Qdrant semantic results |
| `keywordWeight` | 0 | 0–5 | Weight applied to FTS5 keyword results. `0` = disabled |

Both are readable via `GET /settings` and writable via `PATCH /settings`
without a service restart. Changes take effect on the next chat request.

**To enable keyword search:**
```bash
curl -X PATCH http://localhost:4000/settings \
  -H "Content-Type: application/json" \
  -d '{"keywordWeight": 1.0}'
```

**To favour keyword matches over semantic:**
```bash
curl -X PATCH http://localhost:4000/settings \
  -H "Content-Type: application/json" \
  -d '{"semanticWeight": 0.5, "keywordWeight": 2.0}'
```
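
For reference, a range check matching the 0 to 5 bounds in the settings table might look like the sketch below. `validateWeights` is a hypothetical function; whether and how the service validates a `PATCH /settings` body is not specified here.

```js
// Hypothetical validator for the two weight settings.
const WEIGHT_KEYS = ['semanticWeight', 'keywordWeight'];

function validateWeights(patch) {
  const errors = [];
  for (const key of WEIGHT_KEYS) {
    if (!(key in patch)) continue; // partial updates: absent keys are left alone
    const value = patch[key];
    if (typeof value !== 'number' || !Number.isFinite(value) || value < 0 || value > 5) {
      errors.push(`${key} must be a number between 0 and 5`);
    }
  }
  return errors;
}

console.log(validateWeights({ keywordWeight: 1.0 })); // []
console.log(validateWeights({ semanticWeight: 7 }));
// [ 'semanticWeight must be a number between 0 and 5' ]
```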

## Constants (`packages/shared/src/config/constants.js`)

| Constant | Value | Description |
|---|---|---|
| `RETRIEVAL.RRF_K` | 60 | RRF smoothing constant (not exposed in settings) |
| `RETRIEVAL.SEMANTIC_WEIGHT` | 1.0 | Default semantic weight |
| `RETRIEVAL.KEYWORD_WEIGHT` | 0 | Default keyword weight (off) |
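
One way the constants and the user settings could combine is sketched below. The object shape is assumed from the table, and `resolveWeights` is a hypothetical helper, not code from `constants.js`.

```js
// Assumed shape of the RETRIEVAL constants, using the values from the table.
const RETRIEVAL = Object.freeze({ RRF_K: 60, SEMANTIC_WEIGHT: 1.0, KEYWORD_WEIGHT: 0 });

// Hypothetical: a stored setting wins; otherwise fall back to the constant.
function resolveWeights(settings = {}) {
  return {
    semanticWeight: settings.semanticWeight ?? RETRIEVAL.SEMANTIC_WEIGHT,
    keywordWeight: settings.keywordWeight ?? RETRIEVAL.KEYWORD_WEIGHT,
  };
}

console.log(resolveWeights({})); // { semanticWeight: 1, keywordWeight: 0 }
console.log(resolveWeights({ semanticWeight: 0, keywordWeight: 2 }));
// { semanticWeight: 0, keywordWeight: 2 }
```

The sketch uses `??` rather than `||` so that an explicit `0`, which is a meaningful value for both weights, is not replaced by the default.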