# Knowledge Graph

**Location:** `packages/memory-service/src/graph/index.js`
**Schema additions:** `entity_episodes` table; new columns on `entities` and `relationships`
**Exposed via:** `GET /graph/neighborhood/:entityId`, `POST /graph/neighbors`
**Consumed by:** Orchestration service context assembly

## Purpose

The knowledge graph transforms NexusAI from "remembers conversations" to "understands relationships between things." Rather than injecting a flat list of entity facts into every prompt, orchestration now retrieves a 1-hop subgraph of connected entities and their relationships. This gives the model structured, linked knowledge about the people, projects, technologies, and concepts that have appeared across conversations.

## Schema

### `entity_episodes` (join table)

Tracks which episodes contributed to each entity's knowledge. It is defined in `schema.js`, so it exists on all installs.

```sql
CREATE TABLE IF NOT EXISTS entity_episodes (
  entity_id  INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
  episode_id INTEGER NOT NULL REFERENCES episodes(id) ON DELETE CASCADE,
  PRIMARY KEY (entity_id, episode_id)
);
```

Both foreign keys cascade on delete: removing an entity or an episode automatically removes its join rows. This relies on foreign-key enforcement being active on the connection. SQLite leaves it off by default unless `PRAGMA foreign_keys = ON` is issued or the build or driver enables it.
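The two guarantees of the join table are one row per `(entity_id, episode_id)` pair and automatic cleanup when either side is deleted. The following in-memory model illustrates both. It is hypothetical: `EntityEpisodeIndex` and its methods are invented names that do not exist in memory-service. A `Map` keyed by the composite pair stands in for the SQLite primary key, and the delete methods stand in for the cascades.

```javascript
// Illustrative only: a hypothetical in-memory stand-in for entity_episodes.
class EntityEpisodeIndex {
  constructor() {
    // Key "entityId:episodeId" -> row. The key plays the role of the
    // composite PRIMARY KEY (entity_id, episode_id).
    this.rows = new Map();
  }

  // Re-linking an existing pair leaves a single row. Against SQLite, a plain
  // INSERT of a duplicate pair would fail the PRIMARY KEY constraint, while
  // INSERT OR IGNORE (or ON CONFLICT DO NOTHING) would skip it instead.
  link(entityId, episodeId) {
    this.rows.set(`${entityId}:${episodeId}`, { entityId, episodeId });
  }

  // Episodes that contributed to one entity, in ascending ID order.
  episodesFor(entityId) {
    return [...this.rows.values()]
      .filter((r) => r.entityId === entityId)
      .map((r) => r.episodeId)
      .sort((a, b) => a - b);
  }

  // Mirrors ON DELETE CASCADE on entity_id: deleting an entity drops its links.
  deleteEntity(entityId) {
    for (const [key, r] of this.rows) {
      if (r.entityId === entityId) this.rows.delete(key);
    }
  }

  // Mirrors ON DELETE CASCADE on episode_id.
  deleteEpisode(episodeId) {
    for (const [key, r] of this.rows) {
      if (r.episodeId === episodeId) this.rows.delete(key);
    }
  }
}

const idx = new EntityEpisodeIndex();
idx.link(5, 101);
idx.link(5, 101); // same pair again: still a single row
idx.link(5, 102);
idx.link(8, 101);
idx.deleteEpisode(101); // removes both links that point at episode 101
// idx.episodesFor(5) → [102]; idx.episodesFor(8) → []
```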
### New columns on `entities`

Added via migration in `db/index.js`:

| Column | Type | Default | Description |
|---|---|---|---|
| `mention_count` | INTEGER | 1 | How many times this entity has been extracted across conversations |
| `confidence` | REAL | 1.0 | Reserved for future confidence scoring |
| `source` | TEXT | `'extraction'` | `'extraction'` (auto) or `'manual'` |
| `last_seen_at` | INTEGER | NULL | Unix timestamp of the most recent extraction hit |

### New columns on `relationships`

| Column | Type | Default | Description |
|---|---|---|---|
| `mention_count` | INTEGER | 1 | How many times this edge has been extracted |
| `notes` | TEXT | NULL | Relationship context sentence from extraction |

## Entity Promotion Model

Entities are not created equal. Some are mentioned once in passing, while others recur across many conversations. `mention_count` is the signal:

- Every time `upsertEntity` is called for an existing `(name, type)` pair, `mention_count` is incremented and `last_seen_at` is updated.
- `ENTITIES.PROMOTION_THRESHOLD` (default: **3**) is the `mention_count` at which an entity is considered "well-established". It is referenced in the codebase for future filtering and scoring logic.
- `mention_count` is currently stored and incremented but not yet used to gate retrieval. It provides the foundation for future features such as orphan cleanup (entities never re-extracted) and confidence-weighted graph traversal.

The same pattern applies to relationships: `mention_count` rises each time the same `(from_id, to_id, label)` triple is extracted.

## Graph Traversal

`src/graph/index.js` exports two functions built on SQLite's `WITH RECURSIVE` common table expression (CTE) support. No external graph database is needed.

### `getNeighborhood(entityId, depth)`

Traverses the graph from a single entity, following edges in **both directions**, up to `depth` hops. Returns `{ nodes: [...entities], edges: [...relationships] }`. Default depth: `ENTITIES.GRAPH_HOP_DEPTH` (1).
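The node and edge selection can be sketched in plain JavaScript as a breadth-first search over arrays of rows. This is useful for reasoning about expected results, for example in unit tests. It is a hypothetical in-memory equivalent, not the memory-service implementation, which uses the recursive SQL query. `neighborhoodInMemory` is an invented name, and the row shapes follow the API examples in this document.

```javascript
// Hypothetical in-memory equivalent of getNeighborhood's node/edge selection.
// entities: [{ id, name, type }], relationships: [{ id, from_id, to_id, label }]
function neighborhoodInMemory(entities, relationships, seedId, depth = 1) {
  // Breadth-first search that ignores edge direction, like the CTE's
  // `r.from_id = t.entity_id OR r.to_id = t.entity_id` join.
  const visited = new Set([seedId]);
  let frontier = [seedId];
  for (let hop = 0; hop < depth && frontier.length > 0; hop++) {
    const next = [];
    for (const id of frontier) {
      for (const r of relationships) {
        // The endpoint on the far side of this edge, or null if the edge
        // does not touch the current node.
        const other =
          r.from_id === id ? r.to_id : r.to_id === id ? r.from_id : null;
        if (other !== null && !visited.has(other)) {
          visited.add(other);
          next.push(other);
        }
      }
    }
    frontier = next;
  }
  // Same two follow-up selections as the SQL version: entity rows for the
  // collected IDs, and every edge whose endpoints are both in the node set.
  const nodes = entities.filter((e) => visited.has(e.id));
  const edges = relationships.filter(
    (r) => visited.has(r.from_id) && visited.has(r.to_id)
  );
  return { nodes, edges };
}

const entities = [
  { id: 5, name: "Alice", type: "person" },
  { id: 8, name: "NexusAI", type: "project" },
  { id: 12, name: "Bob", type: "person" },
  { id: 20, name: "SQLite", type: "technology" },
];
const relationships = [
  { id: 2, from_id: 5, to_id: 8, label: "works_on" },
  { id: 3, from_id: 5, to_id: 12, label: "knows" },
  { id: 4, from_id: 12, to_id: 8, label: "contributes_to" },
  { id: 7, from_id: 8, to_id: 20, label: "uses" },
];
const oneHop = neighborhoodInMemory(entities, relationships, 5, 1);
const twoHop = neighborhoodInMemory(entities, relationships, 5, 2);
```

At depth 1, `oneHop` contains Alice, NexusAI and Bob. It also includes edge 4 (Bob → NexusAI), even though that edge is not on a path from Alice, because both of its endpoints are in the node set. At depth 2, `twoHop` also pulls in SQLite and edge 7.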
The HTTP layer enforces a maximum depth of 3.

**SQLite query:**

```sql
WITH RECURSIVE traverse(entity_id, depth) AS (
  SELECT ?, 0
  UNION
  SELECT
    CASE WHEN r.from_id = t.entity_id THEN r.to_id ELSE r.from_id END,
    t.depth + 1
  FROM relationships r
  JOIN traverse t ON (r.from_id = t.entity_id OR r.to_id = t.entity_id)
  WHERE t.depth < ?
)
SELECT DISTINCT entity_id FROM traverse
```

`UNION` (rather than `UNION ALL`) discards duplicate `(entity_id, depth)` rows, so a node that several paths reach at the same depth is expanded only once. Because `depth` is part of each row, a node can still reappear at a greater depth (for example A → B → A). The `t.depth < ?` bound is what guarantees termination on cyclic graphs, and the final `SELECT DISTINCT entity_id` collapses the repeats.

After collecting node IDs, two follow-up queries fetch:

- All entity rows for those IDs
- All relationship rows where both `from_id` and `to_id` are in the node set

This ensures that edges between neighbors are included even if they are not on the traversal path from the seed.

### `getEntityNeighbors(entityIds[])`

Bulk 1-hop version designed for orchestration. Given multiple seed entity IDs (the results of Qdrant semantic search), it returns the combined 1-hop subgraph:

1. Finds all neighbor IDs via one query using `IN (...)` on both `from_id` and `to_id`
2. Deduplicates seeds and neighbors using a JavaScript `Set`
3. Fetches all entity rows and all relationship rows within the combined node set

This is intentionally simpler than the recursive version. Orchestration always uses depth 1, and the bulk query avoids N separate CTE calls.

## Graph-Aware Context Assembly

Orchestration's `assembleContext` (in `src/chat/index.js`) integrates the graph at step 7 of the chat pipeline:

1. Qdrant entity search returns up to `ORCHESTRATION.ENTITIES_LIMIT` results, each including `r.id` (the SQLite entity ID) alongside the Qdrant payload
2. `graph.getNeighbors(entityIds)` is called with those IDs, which calls `POST /graph/neighbors` on memory-service
3. The returned `{ nodes, edges }` is passed to `formatGraphContext()`
4. On failure, it falls back to using the Qdrant payload data directly as flat nodes with no edges

### Prompt Format

`formatGraphContext(nodes, edges)` in `chat/index.js` formats the subgraph as:

```
Here is what you know about entities relevant to this conversation and their connections:
- Alice (person): software engineer working on NexusAI
  → works_on NexusAI (project)
  → knows Bob (person)
- NexusAI (project): AI assistant framework
- Bob (person): Alice's colleague
```

- One line per node: `- {name} ({type}): {notes}`
- Outbound edges indented below their node: `→ {label} {target_name} ({target_type})`
- Nodes with only inbound edges (pulled in as neighbors) appear without connection lines
- Only outbound edges are shown, so each relationship appears once, under its `from_id` node

## Project Scoping

The knowledge graph respects project boundaries at the **entry point**, not during traversal:

- Qdrant entity search is filtered by `projectId`, so only entities tagged with the current project are returned as seeds
- Graph traversal in SQLite is unfiltered, so neighbors can belong to any project or to no project
- This is intentional: the graph entry is project-scoped, but traversal follows the global relationship graph to discover connected knowledge

Entities are tagged with `projectId` in the Qdrant payload at extraction time. Entities extracted from non-project sessions have `projectId: null` and only appear in unfiltered global searches.

## API Reference

### `GET /graph/neighborhood/:entityId`

Returns the neighborhood of a single entity.

**Query params:**

| Param | Default | Max | Description |
|---|---|---|---|
| `depth` | `ENTITIES.GRAPH_HOP_DEPTH` (1) | 3 | Traversal depth |

**Response:**

```json
{
  "entity": { "id": 5, "name": "Alice", "type": "person", "notes": "...", "mention_count": 4 },
  "neighborhood": {
    "nodes": [
      { "id": 5, "name": "Alice", "type": "person", "notes": "..." },
      { "id": 8, "name": "NexusAI", "type": "project", "notes": "..." }
    ],
    "edges": [
      { "id": 2, "from_id": 5, "to_id": 8, "label": "works_on", "notes": "...", "mention_count": 3 }
    ]
  }
}
```

Returns 404 if the entity does not exist.

### `POST /graph/neighbors`

Bulk 1-hop neighborhood for a set of entity IDs. Used internally by orchestration and not intended for direct client use.

**Request body:**

```json
{ "entityIds": [5, 8, 12] }
```

**Response:**

```json
{ "nodes": [ ...entity objects... ], "edges": [ ...relationship objects... ] }
```

Returns 400 if `entityIds` is missing or empty.

## Constants (`packages/shared/src/config/constants.js`)

| Constant | Value | Description |
|---|---|---|
| `ENTITIES.PROMOTION_THRESHOLD` | 3 | `mention_count` at which an entity is considered well-established |
| `ENTITIES.GRAPH_HOP_DEPTH` | 1 | Default traversal depth for neighborhood queries |
| `ORCHESTRATION.ENTITIES_LIMIT` | 5 | Max entity seeds returned from Qdrant search |
| `ORCHESTRATION.ENTITIES_THRESHOLD` | 0.55 | Minimum similarity score for entity Qdrant search |
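The following is one way these constants might fit together. The sketch clamps the `depth` query parameter (default `ENTITIES.GRAPH_HOP_DEPTH`, maximum 3) and selects Qdrant seeds with `ORCHESTRATION.ENTITIES_THRESHOLD` and `ORCHESTRATION.ENTITIES_LIMIT`. Only the constant values come from the table. `parseDepth`, `selectSeeds`, `isWellEstablished`, `MAX_DEPTH`, the `{ id, score }` hit shape, and the choice to clamp rather than reject out-of-range depths are all assumptions, not the services' actual code. In practice, the threshold and limit may be passed to Qdrant's search rather than applied client-side.

```javascript
// Values copied from the constants table; the object layout is a sketch.
const ENTITIES = { PROMOTION_THRESHOLD: 3, GRAPH_HOP_DEPTH: 1 };
const ORCHESTRATION = { ENTITIES_LIMIT: 5, ENTITIES_THRESHOLD: 0.55 };
const MAX_DEPTH = 3; // hypothetical name for the HTTP-layer cap

// Hypothetical: turn the raw ?depth= string into a usable depth. Missing,
// non-numeric or sub-1 input falls back to the default, and values above
// the cap are clamped (the real route might return 400 instead).
function parseDepth(raw) {
  const n = Number.parseInt(raw, 10);
  if (!Number.isFinite(n) || n < 1) return ENTITIES.GRAPH_HOP_DEPTH;
  return Math.min(n, MAX_DEPTH);
}

// Hypothetical client-side equivalent of the score threshold plus limit.
// hits: [{ id, score }], assumed sorted by descending score.
function selectSeeds(hits) {
  return hits
    .filter((h) => h.score >= ORCHESTRATION.ENTITIES_THRESHOLD)
    .slice(0, ORCHESTRATION.ENTITIES_LIMIT)
    .map((h) => h.id);
}

// An entity counts as well-established once mention_count reaches the threshold.
function isWellEstablished(entity) {
  return entity.mention_count >= ENTITIES.PROMOTION_THRESHOLD;
}

const hits = [
  { id: 5, score: 0.91 }, { id: 8, score: 0.8 }, { id: 12, score: 0.7 },
  { id: 20, score: 0.66 }, { id: 21, score: 0.6 }, { id: 22, score: 0.58 },
  { id: 30, score: 0.4 },
];
// selectSeeds(hits) → [5, 8, 12, 20, 21]: 0.4 falls below 0.55, and the
// limit of 5 drops id 22. parseDepth("10") → 3, parseDepth(undefined) → 1.
```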