retrieval fusion
@@ -73,7 +73,7 @@ service by ID after the vector search.
 
 The core four-service architecture is complete and operational. Key capabilities:
 
-- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
+- **Retrieval fusion** — Reciprocal Rank Fusion (RRF) merges semantic (Qdrant vector search) and keyword (SQLite FTS5) episode retrieval into a single ranked result set. Weights are configurable per strategy via settings; keyword search is off by default (`keywordWeight: 0`) and can be enabled without a service restart
 - **Entity layer + Knowledge graph** — automatic extraction of named entities and relationships from conversations via qwen2.5:3b. Entities and relationships are stored in SQLite with `mention_count` tracking. A graph traversal layer expands Qdrant entity search hits into a 1-hop neighborhood subgraph, injecting structured connected knowledge into every prompt
 - **Projects** — sessions grouped with shared or isolated memory pools
 - **Auto-naming** — sessions named automatically from first exchange via inference
@@ -202,7 +202,9 @@ Returns `503` if llama-server is unreachable.
 |---|---|---|---|
 | `recentEpisodeLimit` | integer | 1–20 | Recent episodes injected into prompt |
 | `semanticLimit` | integer | 1–20 | Max semantic search results |
-| `scoreThreshold` | float | 0–1 | Minimum similarity score |
+| `scoreThreshold` | float | 0–1 | Minimum similarity score for Qdrant results |
+| `semanticWeight` | float | 0–5 | RRF weight for Qdrant semantic results |
+| `keywordWeight` | float | 0–5 | RRF weight for FTS5 keyword results (`0` = disabled) |
 | `modelsFolderPath` | string | — | Path to folder containing .gguf files |
 | `temperature` | float | 0–2 | Inference randomness |
 | `repeatPenalty` | float | 1–2 | Repeat token penalty |
@@ -64,11 +64,11 @@ The highest-leverage memory upgrade. Transforms NexusAI from "remembers conversa
 - [x] Relationship traversal queries
 - [x] Graph-aware context assembly in orchestration
 
-### 2. Retrieval Fusion + Full-Text Search
+### 2. Retrieval Fusion + Full-Text Search ✅
 Multi-strategy retrieval merged into a single ranked result set.
-- [ ] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
-- [ ] Configurable weights per retrieval strategy
-- [ ] Score threshold tuning per collection
+- [x] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
+- [x] Configurable weights per retrieval strategy (`semanticWeight`, `keywordWeight` via `PATCH /settings`)
+- [x] Score threshold retained per-strategy; FTS scoped to session/project sessions; `keywordWeight: 0` default (disabled until tuned)
 
 ### 3. Memory Consolidation Lifecycle
 Prevents long-term memory degradation and enables compression.
@@ -72,7 +72,9 @@ via `appSettings.load()` — changes apply immediately without a service restart
 |---|---|---|
 | `recentEpisodeLimit` | 5 | Recent episodes injected into prompt |
 | `semanticLimit` | 5 | Semantic search results injected into prompt |
-| `scoreThreshold` | 0.5 | Minimum similarity score for semantic results |
+| `scoreThreshold` | 0.5 | Minimum similarity score for Qdrant semantic results |
+| `semanticWeight` | 1.0 | RRF weight for Qdrant semantic results |
+| `keywordWeight` | 0 | RRF weight for FTS5 keyword results (`0` = disabled) |
 | `modelsFolderPath` | `/mnt/nexus-models` | Path to folder containing .gguf files |
 | `temperature` | 0.7 | Inference temperature |
 | `repeatPenalty` | 1.1 | Repeat token penalty |
@@ -101,8 +103,12 @@ difference is how the inference response is delivered to the client.
 
 4. **Recent episode retrieval** — fetch most recent episodes (`recentEpisodeLimit`).
 
-5. **Semantic search** — embed user message, query Qdrant for similar past
-episodes. Deduplicated against recent episodes. Non-critical.
+5. **Fused episode retrieval** — runs semantic (Qdrant) and keyword (FTS5)
+search in parallel, then merges results via Reciprocal Rank Fusion (RRF).
+Both paths are filtered against `recentIds` before fusion. FTS is scoped
+to the current session or all project sessions. If `keywordWeight` is `0`,
+the FTS call is skipped entirely. Non-critical — failures fall back to
+whichever strategy succeeded.
 
 6. **Entity search** — query `entities` Qdrant collection filtered by
 `projectId`. Returns entity IDs alongside Qdrant payload data (the Qdrant
@@ -114,8 +120,8 @@ difference is how the inference response is delivered to the client.
 If no entities were found or the graph call fails, falls back to flat entity
 list (no edges). Non-critical.
 
-8. **Prompt assembly** — combine system prompt, graph context, semantic
-episodes, recent episodes, and user message.
+8. **Prompt assembly** — combine system prompt, graph context, fused episodes,
+recent episodes, and user message.
 
 9. **Inference** — send to inference service. `/chat` awaits full response;
 `/chat/stream` pipes SSE chunks to the client.
153
docs/services/retrieval-fusion.md
Normal file
@@ -0,0 +1,153 @@
# Retrieval Fusion

**Implementation:** `packages/orchestration-service/src/chat/index.js`
**FTS scoping:** `packages/memory-service/src/episodic/index.js`, `src/index.js`
**Settings:** `semanticWeight`, `keywordWeight` via `PATCH /settings`

## Purpose

Rather than relying solely on Qdrant vector similarity (which finds semantically
related content but misses exact keyword matches) or FTS5 keyword search alone
(which finds exact matches but not paraphrases), Reciprocal Rank Fusion (RRF)
merges the ranked results from both strategies into a single better-ranked list.

Episodes that rank highly in **both** lists score highest. An episode that is
the top semantic match but irrelevant by keyword, or vice versa, scores lower
than one that satisfies both.

## How RRF Works

For each episode `d`, its fused score is:

```
RRF(d) = w_semantic / (k + rank_semantic(d))
       + w_keyword  / (k + rank_keyword(d))
```

- `rank_i(d)` — 1-based position in that strategy's result list (episode absent from a list contributes 0 for that term)
- `k = 60` — smoothing constant (standard; not exposed in settings)
- `w_semantic`, `w_keyword` — user-tunable weights (defaults come from the `RETRIEVAL` constants)

Setting a weight to `0` removes that strategy's contribution entirely. Setting
`keywordWeight` to `0` also short-circuits the FTS network call.
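
To make the arithmetic concrete, here is a small worked example of the formula above. The ranks are hypothetical and `rrfScore` is an illustrative helper, not a function from the codebase; only `k = 60` and the weight semantics come from this document.

```js
// Worked RRF arithmetic with hypothetical ranks (illustration only).
const K = 60; // smoothing constant from the formula above

// Ranks are 1-based; null means the episode is absent from that list,
// so that term contributes 0.
function rrfScore({ semanticRank, keywordRank }, { semanticWeight, keywordWeight }) {
  const term = (weight, rank) => (rank === null ? 0 : weight / (K + rank));
  return term(semanticWeight, semanticRank) + term(keywordWeight, keywordRank);
}

const weights = { semanticWeight: 1.0, keywordWeight: 1.0 };

// Episode ranked 3rd semantically and 2nd by keyword: 1/63 + 1/62
const inBoth = rrfScore({ semanticRank: 3, keywordRank: 2 }, weights);

// Episode that is the top semantic hit but absent from the keyword list: 1/61
const semanticOnly = rrfScore({ semanticRank: 1, keywordRank: null }, weights);

console.log(inBoth.toFixed(4));       // 0.0320
console.log(semanticOnly.toFixed(4)); // 0.0164
```

With equal weights, the episode that appears in both lists outranks the top hit of a single list, which is the behaviour described under Purpose.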

## Architecture

Fusion lives in orchestration — the service already coordinates multiple data
sources, and fusion is a retrieval strategy, not a storage concern.

```
getFusedEpisodes()
  ├── getSemanticEpisodes()   — Qdrant embed+search → fetch full rows by ID
  │                             (existing path, unchanged)
  └── getFTSResults()         — memory-service /episodes/search → full rows directly
                                (skipped entirely if keywordWeight == 0)
          ↓
  fuseEpisodeResults()        — pure RRF, no I/O
          ↓
  fusedEpisodes[]             — top semanticLimit episodes by RRF score
```

### Data Shape Consistency

Both sides must enter fusion as `Episode[]` — full SQLite row objects with
the same shape — and both must be filtered against `recentIds` first:

- **Semantic path**: `recentIds` filter applied before `getEpisodeById` fetch (existing behaviour)
- **FTS path**: full rows returned directly; `recentIds` filter applied in `getFusedEpisodes` after receiving them

FTS requests `semanticLimit * 2` results to provide headroom for the
`recentIds` filter without under-serving the fusion.
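
The flow above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the actual `getFusedEpisodes`: the search functions and the fusion step are injected as hypothetical `deps`, `recentIds` is assumed to be a `Set`, and error handling is reduced to "a failed path yields an empty list".

```js
// Sketch only: `searchSemantic`, `searchKeyword`, and `fuse` are hypothetical
// stand-ins for the real Qdrant path, the memory-service FTS call, and
// fuseEpisodeResults. `recentIds` is assumed to be a Set of episode IDs.
async function getFusedEpisodesSketch(userMessage, recentIds, settings, deps) {
  const { semanticWeight, keywordWeight, semanticLimit } = settings;
  const { searchSemantic, searchKeyword, fuse } = deps;

  // Non-critical: a failing path degrades to an empty list instead of
  // failing the whole chat request.
  const safely = (promise) => promise.catch(() => []);

  const [semanticEps, rawKeywordEps] = await Promise.all([
    // The semantic path filters against recentIds before fetching full rows.
    safely(searchSemantic(userMessage, { limit: semanticLimit, recentIds })),
    keywordWeight === 0
      ? Promise.resolve([]) // short-circuit: no FTS call at all
      : safely(searchKeyword(userMessage, { limit: semanticLimit * 2 })), // headroom
  ]);

  // FTS returns full rows directly, so the recentIds filter is applied here.
  const keywordEps = rawKeywordEps.filter((ep) => !recentIds.has(ep.id));

  return fuse(semanticEps, keywordEps, {
    semanticWeight,
    keywordWeight,
    limit: semanticLimit,
  });
}
```

Injecting the search functions keeps the sketch free of network calls, so the short-circuit and fallback behaviour can be exercised with plain stubs.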

## FTS Session Scoping

Without scoping, FTS5 searches across all episodes in the database. For
context assembly, results must be constrained to the current session or
project session pool — the same scope used for Qdrant semantic search.

`searchEpisodes(query, limit, sessionIds)` in memory-service accepts an
optional `sessionIds` array. When provided, the SQL becomes:

```sql
SELECT e.* FROM episodes e
JOIN episodes_fts fts ON e.id = fts.rowid
WHERE episodes_fts MATCH ?
  AND e.session_id IN (?, ?, ...)
ORDER BY rank
LIMIT ?
```

The HTTP endpoint `GET /episodes/search` accepts `sessionIds` as a
comma-separated query param: `?q=hello&sessionIds=1,2,3`.

In orchestration, `ftsSessionIds` is set to:
- `projectSessionIds` (all sessions in the project) — if the session belongs to a project
- `[session.id]` — otherwise (single session only)

This mirrors the Qdrant scoping logic exactly.
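
As a rough illustration of the scoping rules above, the helpers below pick the session scope, build the query string, and generate the `IN (...)` placeholders. All three function names and the `session.projectId` field are assumptions made for this sketch; only the endpoint path and the `q` / `sessionIds` parameters come from the text.

```js
// Hypothetical helpers; names and the `session.projectId` field are assumptions.

// Pick the FTS scope: the whole project pool, or just the current session.
function resolveFtsSessionIds(session, projectSessionIds) {
  const inProject = session.projectId !== null && session.projectId !== undefined;
  return inProject ? projectSessionIds : [session.id];
}

// Build the request path for GET /episodes/search.
// URLSearchParams percent-encodes the commas (1%2C2%2C3); a standard query
// parser on the server side decodes that back to "1,2,3".
function buildSearchPath(query, sessionIds) {
  const params = new URLSearchParams({ q: query });
  if (sessionIds.length > 0) params.set('sessionIds', sessionIds.join(','));
  return `/episodes/search?${params}`;
}

// One "?" placeholder per session ID for the IN (...) clause.
function sessionPlaceholders(sessionIds) {
  return sessionIds.map(() => '?').join(', ');
}

const scope = resolveFtsSessionIds({ id: 7, projectId: null }, []);
console.log(buildSearchPath('hello', scope));
// /episodes/search?q=hello&sessionIds=7
```

Generating placeholders instead of interpolating IDs keeps the query parameterised regardless of how many sessions a project contains.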

## `fuseEpisodeResults` — Implementation Detail

```js
function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
  const k = RETRIEVAL.RRF_K; // 60
  const scores = new Map();  // episode.id → { episode, score }

  // Score semantic results (already filtered against recentIds)
  semanticEps.forEach((ep, i) => {
    scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
  });

  // Score + merge keyword results (already filtered against recentIds)
  keywordEps.forEach((ep, i) => {
    const contrib = keywordWeight / (k + i + 1);
    if (scores.has(ep.id)) {
      scores.get(ep.id).score += contrib; // appears in both — sum scores
    } else if (contrib > 0) {
      scores.set(ep.id, { episode: ep, score: contrib }); // FTS-only episode
    }
    // contrib == 0 (keywordWeight: 0) → episode not added (guard prevents score-0 bleed-through)
  });

  return [...scores.values()]
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map(({ episode }) => episode);
}
```

The `else if (contrib > 0)` guard prevents FTS-only episodes from entering
the result set with a score of 0 when `keywordWeight` is 0 — verified by
the test suite.

## Settings

| Setting | Default | Range | Description |
|---|---|---|---|
| `semanticWeight` | 1.0 | 0–5 | Weight applied to Qdrant semantic results |
| `keywordWeight` | 0 | 0–5 | Weight applied to FTS5 keyword results. `0` = disabled |

Both are readable via `GET /settings` and writable via `PATCH /settings`
without a service restart. Changes take effect on the next chat request.

**To enable keyword search:**
```bash
curl -X PATCH http://localhost:4000/settings \
  -H "Content-Type: application/json" \
  -d '{"keywordWeight": 1.0}'
```

**To favour keyword matches over semantic:**
```bash
curl -X PATCH http://localhost:4000/settings \
  -H "Content-Type: application/json" \
  -d '{"semanticWeight": 0.5, "keywordWeight": 2.0}'
```
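
The documented 0–5 range can be checked before a patch is sent or applied. The validator below is a hypothetical sketch: `validateRetrievalWeights` is not part of the service, and the real `PATCH /settings` handler may validate differently.

```js
// Hypothetical validator for a PATCH /settings body (illustration only).
const WEIGHT_RANGE = { min: 0, max: 5 }; // range from the Settings table

function validateRetrievalWeights(patch) {
  const errors = [];
  for (const key of ['semanticWeight', 'keywordWeight']) {
    if (!(key in patch)) continue; // PATCH semantics: absent keys stay unchanged
    const value = patch[key];
    if (typeof value !== 'number' || !Number.isFinite(value)) {
      errors.push(`${key} must be a finite number`);
    } else if (value < WEIGHT_RANGE.min || value > WEIGHT_RANGE.max) {
      errors.push(`${key} must be between ${WEIGHT_RANGE.min} and ${WEIGHT_RANGE.max}`);
    }
  }
  return errors; // an empty array means the patch is acceptable
}

console.log(validateRetrievalWeights({ semanticWeight: 0.5, keywordWeight: 2.0 })); // []
console.log(validateRetrievalWeights({ keywordWeight: 7 }));
// [ 'keywordWeight must be between 0 and 5' ]
```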

## Constants (`packages/shared/src/config/constants.js`)

| Constant | Value | Description |
|---|---|---|
| `RETRIEVAL.RRF_K` | 60 | RRF smoothing constant — not exposed in settings |
| `RETRIEVAL.SEMANTIC_WEIGHT` | 1.0 | Default semantic weight |
| `RETRIEVAL.KEYWORD_WEIGHT` | 0 | Default keyword weight (off) |
@@ -24,9 +24,10 @@ Default port: **4000**. Depends on memory-service, embedding-service, inference-
 - No project: `must: [sessionId == this session]`
 - Project: `should: [sessionId == s1, sessionId == s2, ...]` across all project sessions
 - Dedup against recent episode IDs before including.
-5. Embed and search Qdrant ENTITIES (filtered by `projectId` if in a project). Returns entity IDs alongside payload — the Qdrant point ID equals the SQLite entity ID.
-6. Expand matched entities into a 1-hop graph neighborhood via `POST /graph/neighbors` on the memory-service. Returns `{ nodes, edges }` — the full entity objects plus connecting relationships. Falls back to flat entity list (no edges) if the graph call fails.
-7. Build prompt in this fixed order: **system prompt → graph context → semantic episodes → recent episodes → user message → "Assistant:"**
+5. Run **fused episode retrieval** via `getFusedEpisodes` — Qdrant semantic search and FTS5 keyword search run in parallel, both filtered against `recentIds`, then merged via Reciprocal Rank Fusion (RRF). If `keywordWeight` is `0`, the FTS call is skipped. Returns top `semanticLimit` episodes by fused score.
+6. Embed and search Qdrant ENTITIES (filtered by `projectId` if in a project). Returns entity IDs alongside payload — the Qdrant point ID equals the SQLite entity ID.
+7. Expand matched entities into a 1-hop graph neighborhood via `POST /graph/neighbors` on the memory-service. Returns `{ nodes, edges }` — the full entity objects plus connecting relationships. Falls back to flat entity list (no edges) if the graph call fails.
+8. Build prompt in this fixed order: **system prompt → graph context → fused episodes → recent episodes → user message → "Assistant:"**
 
 The ordering prioritizes established facts (graph context) and relevant past context (semantic) over pure recency.
 
@@ -100,6 +101,18 @@ When the existing summary's token count exceeds `SUMMARIES.MAX_SUMMARY_TOKENS`,
 
 `searchEntities` checks `projectId !== null && projectId !== undefined` before applying the filter — a session with no project skips the filter entirely and searches globally.
 
+## Retrieval Fusion (`src/chat/index.js`)
+
+Three functions handle fusion — all pure or lightly async, all non-critical:
+
+- **`getFTSResults(userMessage, { limit, sessionIds })`** — calls `memory.searchEpisodes`; returns `[]` and logs a warning on failure
+- **`fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit })`** — pure RRF implementation. Key guard: FTS-only episodes are only added to the scores Map if `contrib > 0` (prevents score-0 bleed-through when `keywordWeight: 0`)
+- **`getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, settings)`** — orchestrates both paths in `Promise.all`, applies `recentIds` filter to FTS results, calls fusion. Short-circuits FTS call entirely if `keywordWeight === 0`
+
+FTS is scoped to `projectSessionIds` if in a project, otherwise `[session.id]` — mirrors Qdrant scoping exactly.
+
+> For RRF formula, weight semantics, and enabling keyword search, see `docs/services/retrieval-fusion.md`.
+
 ## Graph Service Client (`src/services/graph.js`)
 
 Thin HTTP client for memory-service graph endpoints. One function: