# Knowledge Graph

**Location:** `packages/memory-service/src/graph/index.js`
**Schema additions:** `entity_episodes` table; new columns on `entities` and `relationships`
**Exposed via:** `GET /graph/neighborhood/:entityId`, `POST /graph/neighbors`
**Consumed by:** Orchestration service context assembly

## Purpose

The knowledge graph transforms NexusAI from "remembers conversations" to
"understands relationships between things." Rather than injecting a flat
list of entity facts into every prompt, orchestration now retrieves a
1-hop subgraph of connected entities and their relationships, giving the
model structured, linked knowledge about people, projects, technologies,
and concepts that have appeared across conversations.

## Schema

### `entity_episodes` (join table)

Tracks which episodes contributed to each entity's knowledge. Defined in
`schema.js`, so it exists on all installs.

```sql
CREATE TABLE IF NOT EXISTS entity_episodes (
  entity_id  INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
  episode_id INTEGER NOT NULL REFERENCES episodes(id) ON DELETE CASCADE,
  PRIMARY KEY (entity_id, episode_id)
);
```

Both foreign keys cascade on delete: removing an entity or episode
automatically cleans up its join rows.

### New columns on `entities`

Added via migration in `db/index.js`:

| Column | Type | Default | Description |
|---|---|---|---|
| `mention_count` | INTEGER | 1 | How many times this entity has been extracted across conversations |
| `confidence` | REAL | 1.0 | Reserved for future confidence scoring |
| `source` | TEXT | `'extraction'` | `'extraction'` (auto) or `'manual'` |
| `last_seen_at` | INTEGER | NULL | Unix timestamp of most recent extraction hit |

### New columns on `relationships`

| Column | Type | Default | Description |
|---|---|---|---|
| `mention_count` | INTEGER | 1 | How many times this edge has been extracted |
| `notes` | TEXT | NULL | Relationship context sentence from extraction |

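The column tables above could be applied by an additive migration along the following lines. This is a sketch under stated assumptions, not the code in `db/index.js`: the `NEW_COLUMNS` map and the `pendingMigrations` helper are hypothetical names, and the caller is assumed to have already read the existing column names (for example from `PRAGMA table_info`).

```javascript
// Hypothetical description of the columns listed in the tables above.
const NEW_COLUMNS = {
  entities: [
    ['mention_count', 'INTEGER DEFAULT 1'],
    ['confidence', 'REAL DEFAULT 1.0'],
    ['source', "TEXT DEFAULT 'extraction'"],
    ['last_seen_at', 'INTEGER'], // no default clause, so the column defaults to NULL
  ],
  relationships: [
    ['mention_count', 'INTEGER DEFAULT 1'],
    ['notes', 'TEXT'],
  ],
};

// `existing` maps a table name to the column names already present
// (assumed to be gathered by the caller, e.g. via PRAGMA table_info).
function pendingMigrations(existing) {
  const statements = [];
  for (const [table, columns] of Object.entries(NEW_COLUMNS)) {
    const present = new Set(existing[table] || []);
    for (const [name, definition] of columns) {
      // Only add columns that are missing, so the migration can be re-run safely.
      if (!present.has(name)) {
        statements.push(`ALTER TABLE ${table} ADD COLUMN ${name} ${definition}`);
      }
    }
  }
  return statements;
}
```

Generating only the missing `ALTER TABLE` statements keeps the migration idempotent: running it against an already-migrated database yields an empty list.
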
## Entity Promotion Model

Entities are not created equal: some are mentioned once in passing, others
recur across many conversations. `mention_count` is the signal:

- Every time `upsertEntity` is called for an existing `(name, type)` pair, `mention_count` is incremented and `last_seen_at` is updated.
- `ENTITIES.PROMOTION_THRESHOLD` (default: **3**) is the `mention_count` at which an entity is considered "well-established". It is referenced in the codebase for future filtering and scoring logic.
- Currently `mention_count` is stored and incremented but not yet used to gate retrieval. It provides the foundation for future features such as orphan cleanup (entities never re-extracted) and confidence-weighted graph traversal.

The same pattern applies to relationships: `mention_count` rises each time
the same `(from_id, to_id, label)` triple is extracted.

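The counting behavior described in this section can be illustrated in memory. This is a minimal sketch, not the service's `upsertEntity`: the `Map`-backed store, the `createEntityStore` factory, and the `isPromoted` check are hypothetical. Only the `(name, type)` key, the `mention_count` and `last_seen_at` fields, and the threshold of 3 come from this document; the timestamp unit (seconds) and leaving `last_seen_at` at its default on first insert are assumptions.

```javascript
const PROMOTION_THRESHOLD = 3; // mirrors ENTITIES.PROMOTION_THRESHOLD

// Hypothetical in-memory stand-in for the `entities` table.
function createEntityStore() {
  const rows = new Map(); // keyed by (type, name)
  let nextId = 1;

  function upsertEntity(name, type, now = Math.floor(Date.now() / 1000)) {
    const key = `${type}\u0000${name}`;
    const existing = rows.get(key);
    if (existing) {
      // Existing (name, type) pair: bump the counter and refresh the timestamp.
      existing.mention_count += 1;
      existing.last_seen_at = now;
      return existing;
    }
    // New rows start from the column defaults (assumption for last_seen_at).
    const row = { id: nextId++, name, type, mention_count: 1, last_seen_at: null };
    rows.set(key, row);
    return row;
  }

  return { upsertEntity };
}

// Hypothetical helper: the document notes the threshold is not yet used to gate retrieval.
const isPromoted = (entity) => entity.mention_count >= PROMOTION_THRESHOLD;
```

With this sketch, three extractions of `('Alice', 'person')` produce a single row with `mention_count` 3, while `('Alice', 'project')` is tracked as a separate entity.
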
## Graph Traversal

`src/graph/index.js` exports two functions built on SQLite's `WITH RECURSIVE`
CTE support. No external graph database is needed.

### `getNeighborhood(entityId, depth)`

Traverses the graph from a single entity, following edges in **both directions**,
up to `depth` hops. Returns `{ nodes: [...entities], edges: [...relationships] }`.

Default depth: `ENTITIES.GRAPH_HOP_DEPTH` (1). Maximum enforced at HTTP layer: 3.

**SQLite query:**

```sql
WITH RECURSIVE traverse(entity_id, depth) AS (
  SELECT ?, 0
  UNION
  SELECT
    CASE WHEN r.from_id = t.entity_id THEN r.to_id ELSE r.from_id END,
    t.depth + 1
  FROM relationships r
  JOIN traverse t ON (r.from_id = t.entity_id OR r.to_id = t.entity_id)
  WHERE t.depth < ?
)
SELECT DISTINCT entity_id FROM traverse
```

`UNION` (not `UNION ALL`) discards duplicate `(entity_id, depth)` rows, which
keeps the working set small. Cycles cannot recurse forever because the
`t.depth < ?` guard bounds the recursion, and the final `SELECT DISTINCT`
collapses any node that was reached at more than one depth.

After collecting node IDs, two follow-up queries fetch:

- All entity rows for those IDs
- All relationship rows where both `from_id` and `to_id` are in the node set

This ensures edges between neighbors are included even if they aren't on the
traversal path from the seed.

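To make the traversal semantics concrete, the same result can be computed in memory. This is an illustrative sketch only; the real implementation runs the CTE in SQLite. The `neighborhoodInMemory` name and the plain-array `relationships` input are assumptions.

```javascript
// `relationships` is assumed to be an array of { id, from_id, to_id, label } rows.
function neighborhoodInMemory(relationships, seedId, depth = 1) {
  const visited = new Set([seedId]);
  let frontier = new Set([seedId]);

  for (let hop = 0; hop < depth && frontier.size > 0; hop++) {
    const next = new Set();
    for (const r of relationships) {
      // Follow each edge in both directions, like the CASE expression in the CTE.
      for (const [a, b] of [[r.from_id, r.to_id], [r.to_id, r.from_id]]) {
        if (frontier.has(a) && !visited.has(b)) {
          visited.add(b);
          next.add(b);
        }
      }
    }
    frontier = next;
  }

  // Keep every edge whose endpoints are both in the node set, including
  // edges between neighbors that were not on a path from the seed.
  const edges = relationships.filter(
    (r) => visited.has(r.from_id) && visited.has(r.to_id)
  );
  return { nodeIds: [...visited], edges };
}
```

The explicit `visited` set plays the role that the depth bound plus `SELECT DISTINCT` play in the SQL version: each node ends up in the result once, however many paths lead to it.
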
### `getEntityNeighbors(entityIds[])`

Bulk 1-hop version designed for orchestration. Given multiple seed entity IDs
(the results of Qdrant semantic search), returns the combined 1-hop subgraph.

1. Finds all neighbor IDs via one query using `IN (...)` on both `from_id` and `to_id`
2. Deduplicates seeds + neighbors using a JavaScript `Set`
3. Fetches all entity rows and all relationship rows within the combined node set

This is intentionally simpler than the recursive version: orchestration always
uses depth=1, and the bulk query avoids N separate CTE calls.

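Steps 1 and 2 above can be sketched as follows. The exact SQL used by `getEntityNeighbors` is not shown in this document, so the query text here is an assumption, and `buildNeighborQuery` and `collectNodeIds` are hypothetical helper names.

```javascript
// Step 1: one parameterized query matching the seeds on either side of an edge.
function buildNeighborQuery(entityIds) {
  const placeholders = entityIds.map(() => '?').join(', ');
  const sql =
    'SELECT from_id, to_id FROM relationships ' +
    `WHERE from_id IN (${placeholders}) OR to_id IN (${placeholders})`;
  // The ID list is bound twice, once per IN (...) clause.
  return { sql, params: [...entityIds, ...entityIds] };
}

// Step 2: merge seeds and both endpoints of every matched edge into one Set.
function collectNodeIds(entityIds, edgeRows) {
  const ids = new Set(entityIds);
  for (const row of edgeRows) {
    ids.add(row.from_id);
    ids.add(row.to_id);
  }
  return [...ids];
}
```

Binding the IDs as parameters rather than interpolating them keeps the query safe to build from request input, and the `Set` ensures a node that is both a seed and a neighbor appears once.
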
## Graph-Aware Context Assembly

Orchestration's `assembleContext` (in `src/chat/index.js`) integrates the
graph at step 7 of the chat pipeline:

1. Qdrant entity search returns up to `ORCHESTRATION.ENTITIES_LIMIT` results, each including `r.id` (the SQLite entity ID) alongside the Qdrant payload
2. `graph.getNeighbors(entityIds)` is called with those IDs, which issues `POST /graph/neighbors` to memory-service
3. The returned `{ nodes, edges }` is passed to `formatGraphContext()`
4. On failure, orchestration falls back to using the Qdrant payload data directly as flat nodes with no edges

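The fallback in step 4 might look roughly like the sketch below. It is not the code in `assembleContext`: `loadGraphContext` and `flatNodesFromHits` are hypothetical names, `fetchNeighbors` stands in for whatever client call sits behind `POST /graph/neighbors`, and the payload field names and the empty-seed shortcut are assumptions.

```javascript
// Hypothetical helper: turn Qdrant hits into flat nodes with no edges.
// Assumes each hit looks like { id, payload: { name, type, notes } }.
function flatNodesFromHits(hits) {
  return {
    nodes: hits.map((r) => ({ ...r.payload, id: r.id })),
    edges: [],
  };
}

// `fetchNeighbors(entityIds)` is assumed to resolve to { nodes, edges }.
async function loadGraphContext(hits, fetchNeighbors) {
  const entityIds = hits.map((r) => r.id);
  // Assumption: skip the call when there are no seeds, since the
  // endpoint rejects an empty `entityIds` list with a 400.
  if (entityIds.length === 0) return { nodes: [], edges: [] };
  try {
    return await fetchNeighbors(entityIds);
  } catch (err) {
    // Graph lookup failed: degrade to the flat Qdrant data.
    return flatNodesFromHits(hits);
  }
}
```

Either branch returns the same `{ nodes, edges }` shape, so the formatting step does not need to know whether the graph lookup succeeded.
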
### Prompt Format

`formatGraphContext(nodes, edges)` in `chat/index.js` formats the subgraph as:

```
Here is what you know about entities relevant to this conversation and their connections:
- Alice (person): software engineer working on NexusAI
  → works_on NexusAI (project)
  → knows Bob (person)
- NexusAI (project): AI assistant framework
- Bob (person): Alice's colleague
```

- One line per node: `- {name} ({type}): {notes}`
- Outbound edges indented below: ` → {label} {target_name} ({target_type})`
- Nodes with only inbound edges (pulled in as neighbors) appear without connection lines
- Only outbound edges are shown, so each relationship appears once, from the `from_id` side

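A formatter that follows the rules above could be written as in this sketch. It is not the actual `formatGraphContext`: the function name carries a `Sketch` suffix to make that clear, and the two-space indent, the handling of empty `notes`, and the skipping of edges whose target is missing from `nodes` are assumptions.

```javascript
function formatGraphContextSketch(nodes, edges) {
  const byId = new Map(nodes.map((n) => [n.id, n]));
  const lines = [
    'Here is what you know about entities relevant to this conversation and their connections:',
  ];

  for (const node of nodes) {
    // Assumption: omit the ": notes" suffix when an entity has no notes.
    const notes = node.notes ? `: ${node.notes}` : '';
    lines.push(`- ${node.name} (${node.type})${notes}`);

    for (const edge of edges) {
      if (edge.from_id !== node.id) continue; // outbound edges only
      const target = byId.get(edge.to_id);
      if (!target) continue; // assumption: ignore edges leaving the node set
      lines.push(`  → ${edge.label} ${target.name} (${target.type})`);
    }
  }
  return lines.join('\n');
}
```

Because edges are listed only under their `from_id` node, a relationship is never printed twice, and nodes that are only edge targets appear as bare lines.
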
## Project Scoping

The knowledge graph respects project boundaries at the **entry point**, not
during traversal:

- Qdrant entity search is filtered by `projectId`, so only entities tagged with this project are returned as seeds
- Graph traversal in SQLite is unfiltered, so neighbors can be from any project or no project
- This is intentional: the graph entry is project-scoped, but traversal follows the global relationship graph to discover connected knowledge

Entities are tagged with `projectId` in the Qdrant payload at extraction time.
Entities extracted from non-project sessions have `projectId: null` and only
appear in unfiltered global searches.

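The entry-point filter can be pictured as a Qdrant search request that carries a payload condition on `projectId`. The sketch below builds such a request body in the shape used by Qdrant's REST search endpoint; whether orchestration uses the REST API or a client library is not stated in this document, and `buildEntitySearchBody` is a hypothetical helper.

```javascript
// Values taken from the constants table in this document.
const ORCHESTRATION = { ENTITIES_LIMIT: 5, ENTITIES_THRESHOLD: 0.55 };

// Builds a search body; the filter is added only for project-scoped searches.
function buildEntitySearchBody(vector, projectId) {
  const body = {
    vector,
    limit: ORCHESTRATION.ENTITIES_LIMIT,
    score_threshold: ORCHESTRATION.ENTITIES_THRESHOLD,
    with_payload: true,
  };
  if (projectId != null) {
    // Match only points whose payload `projectId` equals the requested project.
    body.filter = { must: [{ key: 'projectId', match: { value: projectId } }] };
  }
  return body;
}
```

Omitting the filter gives the unfiltered global search described above, which is the only kind of search that can return entities whose `projectId` is `null`.
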
## API Reference

### `GET /graph/neighborhood/:entityId`

Returns the neighborhood of a single entity.

**Query params:**

| Param | Default | Max | Description |
|---|---|---|---|
| `depth` | `ENTITIES.GRAPH_HOP_DEPTH` (1) | 3 | Traversal depth |

**Response:**
```json
{
  "entity": { "id": 5, "name": "Alice", "type": "person", "notes": "...", "mention_count": 4 },
  "neighborhood": {
    "nodes": [
      { "id": 5, "name": "Alice", "type": "person", "notes": "..." },
      { "id": 8, "name": "NexusAI", "type": "project", "notes": "..." }
    ],
    "edges": [
      { "id": 2, "from_id": 5, "to_id": 8, "label": "works_on", "notes": "...", "mention_count": 3 }
    ]
  }
}
```

Returns 404 if the entity does not exist.

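One way the `depth` query parameter could be resolved is sketched below. The document states only the default (1) and the maximum (3); whether an over-limit value is clamped or rejected is not specified, so the clamping here, the treatment of invalid input, and the `resolveDepth` name are all assumptions.

```javascript
const GRAPH_HOP_DEPTH = 1; // ENTITIES.GRAPH_HOP_DEPTH
const MAX_DEPTH = 3; // maximum described for the HTTP layer

// `rawDepth` is the query-string value, which may be undefined or non-numeric.
function resolveDepth(rawDepth) {
  const parsed = Number.parseInt(rawDepth, 10);
  // Assumption: missing, non-numeric, or non-positive values use the default.
  if (!Number.isInteger(parsed) || parsed < 1) return GRAPH_HOP_DEPTH;
  // Assumption: values above the maximum are clamped rather than rejected.
  return Math.min(parsed, MAX_DEPTH);
}
```

Under these assumptions a request with `?depth=10` is served as a 3-hop traversal, and a request without `depth` behaves like `?depth=1`.
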
### `POST /graph/neighbors`

Bulk 1-hop neighborhood for a set of entity IDs. Used internally by
orchestration and not intended for direct client use.

**Request body:**
```json
{ "entityIds": [5, 8, 12] }
```

**Response:**
```json
{
  "nodes": [ ...entity objects... ],
  "edges": [ ...relationship objects... ]
}
```

Returns 400 if `entityIds` is missing or empty.

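The 400 rule can be expressed as a small validation step. This is a sketch, not the route handler itself: `validateNeighborsBody`, its return shape, and the error message are hypothetical, and rejecting non-array values is an assumption beyond the "missing or empty" rule stated above.

```javascript
// Returns a status plus either the validated IDs or an error message.
function validateNeighborsBody(body) {
  const ids = body && body.entityIds;
  // Missing, empty, or (by assumption) non-array values are rejected.
  if (!Array.isArray(ids) || ids.length === 0) {
    return { status: 400, error: 'entityIds must be a non-empty array' };
  }
  return { status: 200, entityIds: ids };
}
```

A handler would return the error immediately on a 400 result and otherwise pass `entityIds` on to `getEntityNeighbors`.
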
## Constants (`packages/shared/src/config/constants.js`)

| Constant | Value | Description |
|---|---|---|
| `ENTITIES.PROMOTION_THRESHOLD` | 3 | `mention_count` at which an entity is considered well-established |
| `ENTITIES.GRAPH_HOP_DEPTH` | 1 | Default traversal depth for neighborhood queries |
| `ORCHESTRATION.ENTITIES_LIMIT` | 5 | Max entity seeds returned from Qdrant search |
| `ORCHESTRATION.ENTITIES_THRESHOLD` | 0.55 | Minimum similarity score for entity Qdrant search |