nexusAI/docs/services/retrieval-fusion.md
2026-04-27 07:03:46 -07:00


Retrieval Fusion

Implementation: packages/orchestration-service/src/chat/index.js
FTS scoping: packages/memory-service/src/episodic/index.js, src/index.js
Settings: semanticWeight, keywordWeight via PATCH /settings

Purpose

Qdrant vector similarity finds semantically related content but misses exact keyword matches; FTS5 keyword search finds exact matches but misses paraphrases. Rather than relying on either alone, Reciprocal Rank Fusion (RRF) merges the ranked results from both strategies into a single ranked list.

Episodes that rank highly in both lists score highest. An episode that is the top semantic match but irrelevant by keyword, or vice versa, scores lower than one that satisfies both.

How RRF Works

For each episode d, its fused score is:

RRF(d) = w_semantic / (k + rank_semantic(d))
        + w_keyword  / (k + rank_keyword(d))
  • rank_semantic(d), rank_keyword(d): 1-based position of d in that strategy's result list (an episode absent from a list contributes 0 for that term)
  • k = 60: smoothing constant (the standard value; not exposed in settings)
  • w_semantic, w_keyword: user-tunable weights (defaults come from the RETRIEVAL constants)

Setting a weight to 0 removes that strategy's contribution entirely. Setting keywordWeight to 0 also short-circuits the FTS network call.
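As a worked illustration of the formula, the sketch below scores two episodes by hand. The helper names rrfTerm and rrfScore are hypothetical and exist only for this example; a missing rank is represented as null and contributes 0, as described above.

```javascript
// Hypothetical helpers for illustration; not part of the service code.
const K = 60; // same smoothing constant as in the formula above

// One term of the RRF sum. A null rank means "absent from that list".
function rrfTerm(weight, rank) {
    return rank == null ? 0 : weight / (K + rank);
}

function rrfScore({ semanticRank, keywordRank }, { semanticWeight, keywordWeight }) {
    return rrfTerm(semanticWeight, semanticRank) + rrfTerm(keywordWeight, keywordRank);
}

const weights = { semanticWeight: 1.0, keywordWeight: 1.0 };

// Ranked 3rd semantically and 2nd by keyword: 1/63 + 1/62
const inBothLists = rrfScore({ semanticRank: 3, keywordRank: 2 }, weights);

// Top semantic hit, absent from the keyword list: 1/61
const semanticOnly = rrfScore({ semanticRank: 1, keywordRank: null }, weights);

console.log(inBothLists > semanticOnly); // true
```

With equal weights, an episode that appears moderately high in both lists outscores the top hit of a single list, which is the behaviour described in the Purpose section.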

Architecture

Fusion lives in orchestration — the service already coordinates multiple data sources, and fusion is a retrieval strategy, not a storage concern.

getFusedEpisodes()
├── getSemanticEpisodes()     — Qdrant embed+search → fetch full rows by ID
│   (existing path, unchanged)
└── getFTSResults()           — memory-service /episodes/search → full rows directly
    (skipped entirely if keywordWeight == 0)
         ↓
fuseEpisodeResults()          — pure RRF, no I/O
         ↓
fusedEpisodes[]               — top semanticLimit episodes by RRF score
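The flow in the diagram can be sketched roughly as follows. This is an assumption-laden outline, not the actual implementation: the three helpers are injected through a hypothetical deps parameter so the control flow runs without real I/O, their signatures are guesses, and running the two lookups concurrently via Promise.all is a choice made for this sketch.

```javascript
// Sketch only. The deps object and helper signatures are hypothetical.
async function getFusedEpisodes(query, settings, deps) {
    const { semanticWeight, keywordWeight, semanticLimit } = settings;
    const { getSemanticEpisodes, getFTSResults, fuseEpisodeResults } = deps;

    const [semanticEps, keywordEps] = await Promise.all([
        getSemanticEpisodes(query, semanticLimit),
        // keywordWeight of 0 skips the FTS call entirely
        keywordWeight === 0
            ? Promise.resolve([])
            : getFTSResults(query, semanticLimit * 2),
    ]);

    // Pure RRF step, no I/O; returns the top semanticLimit episodes
    return fuseEpisodeResults(semanticEps, keywordEps, {
        semanticWeight,
        keywordWeight,
        limit: semanticLimit,
    });
}
```

Passing the helpers in keeps the fusion step testable in isolation, which matches the "pure RRF, no I/O" note in the diagram.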

Data Shape Consistency

Both sides must enter fusion as Episode[] — full SQLite row objects with the same shape — and both must be filtered against recentIds first:

  • Semantic path: recentIds filter applied before getEpisodeById fetch (existing behaviour)
  • FTS path: full rows returned directly; recentIds filter applied in getFusedEpisodes after receiving them

The FTS call requests semanticLimit * 2 results, so that enough rows survive the recentIds filter to fill the fused list.
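A minimal sketch of the FTS-side filter, assuming recentIds is a Set of episode ids (the actual container type is not stated here) and using a hypothetical helper name dropRecent:

```javascript
// Hypothetical helper: remove episodes already present in recent context.
function dropRecent(episodes, recentIds) {
    return episodes.filter((ep) => !recentIds.has(ep.id));
}

const semanticLimit = 3;

// Pretend FTS returned semanticLimit * 2 = 6 rows (only ids shown).
const ftsRows = [1, 2, 3, 4, 5, 6].map((id) => ({ id }));
const recentIds = new Set([2, 4]);

const keywordEps = dropRecent(ftsRows, recentIds);
console.log(keywordEps.map((ep) => ep.id)); // [ 1, 3, 5, 6 ]
```

Even after two rows are dropped, four remain, which is still more than semanticLimit.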

FTS Session Scoping

Without scoping, FTS5 searches across all episodes in the database. For context assembly, results must be constrained to the current session or project session pool — the same scope used for Qdrant semantic search.

searchEpisodes(query, limit, sessionIds) in memory-service accepts an optional sessionIds array. When provided, the SQL becomes:

SELECT e.* FROM episodes e
JOIN episodes_fts fts ON e.id = fts.rowid
WHERE episodes_fts MATCH ?
AND e.session_id IN (?, ?, ...)
ORDER BY rank
LIMIT ?
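One way to build that statement with the right number of placeholders is sketched below. The function name buildSearchQuery and its return shape are hypothetical; only the SQL text comes from the query above, and the sketch assumes the scope clause is simply omitted when no sessionIds are given.

```javascript
// Hypothetical builder: returns SQL text plus positional parameters.
function buildSearchQuery(query, limit, sessionIds) {
    const scoped = Array.isArray(sessionIds) && sessionIds.length > 0;
    const placeholders = scoped ? sessionIds.map(() => '?').join(', ') : '';

    const sql = [
        'SELECT e.* FROM episodes e',
        'JOIN episodes_fts fts ON e.id = fts.rowid',
        'WHERE episodes_fts MATCH ?',
        scoped ? `AND e.session_id IN (${placeholders})` : null,
        'ORDER BY rank',
        'LIMIT ?',
    ].filter(Boolean).join('\n');

    // Parameter order must match placeholder order: query, ids, limit.
    const params = scoped ? [query, ...sessionIds, limit] : [query, limit];
    return { sql, params };
}

const scopedQuery = buildSearchQuery('hello', 10, [1, 2, 3]);
console.log(scopedQuery.sql);
console.log(scopedQuery.params); // [ 'hello', 1, 2, 3, 10 ]
```

Binding the ids as parameters rather than interpolating them keeps the query safe from injection regardless of where the ids originate.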

The HTTP endpoint GET /episodes/search accepts sessionIds as a comma-separated query param: ?q=hello&sessionIds=1,2,3.
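Parsing the comma-separated parameter might look like the sketch below. It assumes session ids are integers, as in the example URL; if ids are strings or UUIDs in practice, the numeric conversion would need to go. The name parseSessionIds is hypothetical.

```javascript
// Hypothetical parser for the sessionIds query parameter.
// Returns undefined when the parameter is absent or has no usable ids,
// so the caller can fall back to an unscoped search.
function parseSessionIds(raw) {
    if (typeof raw !== 'string' || raw.trim() === '') return undefined;
    const ids = raw
        .split(',')
        .map((part) => Number.parseInt(part.trim(), 10))
        .filter(Number.isInteger); // drops NaN from malformed parts
    return ids.length > 0 ? ids : undefined;
}

const searchParams = new URLSearchParams('q=hello&sessionIds=1,2,3');
console.log(parseSessionIds(searchParams.get('sessionIds'))); // [ 1, 2, 3 ]
```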

In orchestration, ftsSessionIds is set to:

  • projectSessionIds (all sessions in the project) — if the session belongs to a project
  • [session.id] — otherwise (single session only)

This mirrors the Qdrant scoping logic exactly.
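The scoping rule above reduces to a single branch. In this sketch the field name session.projectId and the function name resolveFtsSessionIds are assumptions made for illustration; the real session object may expose project membership differently.

```javascript
// Hypothetical: session.projectId is null/undefined for standalone sessions.
function resolveFtsSessionIds(session, projectSessionIds) {
    return session.projectId != null ? projectSessionIds : [session.id];
}

console.log(resolveFtsSessionIds({ id: 7, projectId: 2 }, [5, 6, 7])); // [ 5, 6, 7 ]
console.log(resolveFtsSessionIds({ id: 9, projectId: null }, []));     // [ 9 ]
```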

fuseEpisodeResults — Implementation Detail

function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
    const k = RETRIEVAL.RRF_K; // 60
    const scores = new Map();  // episode.id → { episode, score }

    // Score semantic results (already filtered against recentIds)
    semanticEps.forEach((ep, i) => {
        scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
    });

    // Score + merge keyword results (already filtered against recentIds)
    keywordEps.forEach((ep, i) => {
        const contrib = keywordWeight / (k + i + 1);
        if (scores.has(ep.id)) {
            scores.get(ep.id).score += contrib;   // appears in both — sum scores
        } else if (contrib > 0) {
            scores.set(ep.id, { episode: ep, score: contrib });  // FTS-only episode
        }
        // contrib == 0 (keywordWeight: 0) → episode not added (guard prevents score-0 bleed-through)
    });

    return [...scores.values()]
        .sort((a, b) => b.score - a.score)
        .slice(0, limit)
        .map(({ episode }) => episode);
}

The else if (contrib > 0) guard prevents FTS-only episodes from entering the result set with a score of 0 when keywordWeight is 0 — verified by the test suite.

Settings

Setting          Default   Range   Description
semanticWeight   1.0       0-5     Weight applied to Qdrant semantic results
keywordWeight    0         0-5     Weight applied to FTS5 keyword results. 0 = disabled

Both are readable via GET /settings and writable via PATCH /settings without a service restart. Changes take effect on the next chat request.

To enable keyword search:

curl -X PATCH http://localhost:4000/settings \
  -H "Content-Type: application/json" \
  -d '{"keywordWeight": 1.0}'

To favour keyword matches over semantic:

curl -X PATCH http://localhost:4000/settings \
  -H "Content-Type: application/json" \
  -d '{"semanticWeight": 0.5, "keywordWeight": 2.0}'

Constants (packages/shared/src/config/constants.js)

Constant                    Value   Description
RETRIEVAL.RRF_K             60      RRF smoothing constant (not exposed in settings)
RETRIEVAL.SEMANTIC_WEIGHT   1.0     Default semantic weight
RETRIEVAL.KEYWORD_WEIGHT    0       Default keyword weight (off)