Storme-bit 1a97b19280 roadmap phase 1 complete

2026-04-27 03:10:39 -07:00

Knowledge Graph

Location: packages/memory-service/src/graph/index.js
Schema additions: entity_episodes table; new columns on entities and relationships
Exposed via: GET /graph/neighborhood/:entityId, POST /graph/neighbors
Consumed by: Orchestration service context assembly

Purpose

The knowledge graph transforms NexusAI from "remembers conversations" to "understands relationships between things." Rather than injecting a flat list of entity facts into every prompt, orchestration now retrieves a 1-hop subgraph of connected entities and their relationships, giving the model structured, linked knowledge about people, projects, technologies, and concepts that have appeared across conversations.

Schema

`entity_episodes` (join table)

Tracks which episodes contributed to each entity's knowledge. Defined in schema.js — exists on all installs.

CREATE TABLE IF NOT EXISTS entity_episodes (
  entity_id  INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
  episode_id INTEGER NOT NULL REFERENCES episodes(id) ON DELETE CASCADE,
  PRIMARY KEY (entity_id, episode_id)
);

Both FKs cascade on delete — removing an entity or episode automatically cleans up its join rows.

New columns on `entities`

Added via migration in db/index.js:

| Column | Type | Default | Description |
| --- | --- | --- | --- |
| `mention_count` | INTEGER | 1 | How many times this entity has been extracted across conversations |
| `confidence` | REAL | 1.0 | Reserved for future confidence scoring |
| `source` | TEXT | `'extraction'` | `'extraction'` (auto) or `'manual'` |
| `last_seen_at` | INTEGER | NULL | Unix timestamp of most recent extraction hit |

New columns on `relationships`

| Column | Type | Default | Description |
| --- | --- | --- | --- |
| `mention_count` | INTEGER | 1 | How many times this edge has been extracted |
| `notes` | TEXT | NULL | Relationship context sentence from extraction |

Entity Promotion Model

Not all entities carry equal weight: some are mentioned once in passing, while others recur across many conversations. `mention_count` is the signal that distinguishes them:

- Every time `upsertEntity` is called for an existing `(name, type)` pair, `mention_count` is incremented and `last_seen_at` is updated.
- `ENTITIES.PROMOTION_THRESHOLD` (default: 3) is the `mention_count` at which an entity is considered "well-established". The constant is referenced in the codebase but reserved for future filtering and scoring logic.
- `mention_count` is currently stored and incremented but does not yet gate retrieval. It lays the groundwork for future features such as orphan cleanup (removing entities that are never re-extracted) and confidence-weighted graph traversal.

The same pattern applies to relationships — mention_count rises each time the same (from_id, to_id, label) triple is extracted.
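
The counting rule can be illustrated with a small in-memory sketch. This is not the service's `upsertEntity`, which writes to SQLite; the `Map`-based store, the `isWellEstablished` helper, and the choice to stamp `last_seen_at` on first insert are all assumptions made for illustration.

```javascript
// Hypothetical in-memory stand-in for the entities table, keyed by "name|type".
const PROMOTION_THRESHOLD = 3; // mirrors ENTITIES.PROMOTION_THRESHOLD

function upsertEntity(store, { name, type, notes = null }, now = Math.floor(Date.now() / 1000)) {
  const key = `${name}|${type}`;
  const existing = store.get(key);
  if (existing) {
    // Repeat extraction of the same (name, type): bump the counter, refresh the timestamp.
    existing.mention_count += 1;
    existing.last_seen_at = now;
    return existing;
  }
  // First sighting: start from the column defaults listed in the schema section.
  // Assumption: the first extraction also sets last_seen_at.
  const row = { name, type, notes, mention_count: 1, confidence: 1.0, source: 'extraction', last_seen_at: now };
  store.set(key, row);
  return row;
}

// Not used to gate retrieval today; shown only as the kind of check the threshold enables.
function isWellEstablished(entity) {
  return entity.mention_count >= PROMOTION_THRESHOLD;
}
```

Calling `upsertEntity` three times with the same name and type leaves a single row whose `mention_count` is 3, which is the point at which `isWellEstablished` starts returning `true`.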

Graph Traversal

`src/graph/index.js` exports two functions that query SQLite directly: `getNeighborhood` uses a `WITH RECURSIVE` CTE, and `getEntityNeighbors` uses a plain bulk query. No external graph database is needed.

`getNeighborhood(entityId, depth)`

Traverses the graph from a single entity, following edges in both directions, up to depth hops. Returns { nodes: [...entities], edges: [...relationships] }.

Default depth: ENTITIES.GRAPH_HOP_DEPTH (1). Maximum enforced at HTTP layer: 3.

SQLite query:

WITH RECURSIVE traverse(entity_id, depth) AS (
    SELECT ?, 0
    UNION
    SELECT
        CASE WHEN r.from_id = t.entity_id THEN r.to_id ELSE r.from_id END,
        t.depth + 1
    FROM relationships r
    JOIN traverse t ON (r.from_id = t.entity_id OR r.to_id = t.entity_id)
    WHERE t.depth < ?
)
SELECT DISTINCT entity_id FROM traverse

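UNION (not UNION ALL) discards duplicate (entity_id, depth) rows, so an identical row is never expanded twice. Because each row carries its depth, a node on a cycle can still reappear at a greater depth; the `t.depth < ?` bound is what guarantees termination, and the final SELECT DISTINCT collapses the result to unique entity IDs.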

After collecting node IDs, two follow-up queries fetch:

- All entity rows for those IDs
- All relationship rows where both `from_id` and `to_id` are in the node set

This ensures edges between neighbors are included even if they aren't on the traversal path from the seed.
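
To reason about what the traversal returns without a database, the same logic can be mirrored over plain arrays. The sketch below is an illustration only: `neighborhoodInMemory` and its array inputs are hypothetical, and it uses a breadth-first walk with a visited set rather than a recursive CTE, which yields the same node set for a given depth.

```javascript
// In-memory sketch of a depth-bounded traversal that follows edges in both directions.
// `entities` rows look like { id, ... }; `relationships` rows like { id, from_id, to_id, label }.
function neighborhoodInMemory(seedId, entities, relationships, maxDepth = 1) {
  const visited = new Set([seedId]);
  let frontier = new Set([seedId]);

  for (let depth = 0; depth < maxDepth && frontier.size > 0; depth += 1) {
    const next = new Set();
    for (const r of relationships) {
      // Step across the edge from whichever endpoint is on the current frontier.
      if (frontier.has(r.from_id) && !visited.has(r.to_id)) next.add(r.to_id);
      if (frontier.has(r.to_id) && !visited.has(r.from_id)) next.add(r.from_id);
    }
    for (const id of next) visited.add(id);
    frontier = next;
  }

  return {
    nodes: entities.filter((e) => visited.has(e.id)),
    // Keep every edge whose endpoints are both in the node set, including
    // neighbor-to-neighbor edges that were never walked from the seed.
    edges: relationships.filter((r) => visited.has(r.from_id) && visited.has(r.to_id)),
  };
}
```

With edges 1→2, 3→1, and 2→3, a depth-1 call seeded at entity 1 returns all three edges, even though 2→3 does not touch the seed.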

`getEntityNeighbors(entityIds[])`

Bulk 1-hop version designed for orchestration. Given multiple seed entity IDs (the results of Qdrant semantic search), returns the combined 1-hop subgraph.

1. Finds all neighbor IDs with a single query using `IN (...)` on both `from_id` and `to_id`
2. Deduplicates seeds and neighbors using a JavaScript `Set`
3. Fetches all entity rows and all relationship rows within the combined node set

This is intentionally simpler than the recursive version — orchestration always uses depth=1, and the bulk query avoids N separate CTE calls.
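
The first two steps can be sketched as a pair of pure helpers. The table and column names come from the schema above, but the exact query text in the service may differ, and both function names (`buildNeighborQuery`, `mergeNodeIds`) are hypothetical.

```javascript
// Builds a parameterized query that finds every edge touching any seed ID.
function buildNeighborQuery(entityIds) {
  const placeholders = entityIds.map(() => '?').join(', ');
  const sql =
    'SELECT from_id, to_id FROM relationships ' +
    `WHERE from_id IN (${placeholders}) OR to_id IN (${placeholders})`;
  // The ID list is bound twice, once for each IN clause.
  return { sql, params: [...entityIds, ...entityIds] };
}

// Merges the seeds with both endpoints of every matching edge row.
// A Set drops duplicates and keeps first-seen order.
function mergeNodeIds(seedIds, edgeRows) {
  const ids = new Set(seedIds);
  for (const { from_id, to_id } of edgeRows) {
    ids.add(from_id);
    ids.add(to_id);
  }
  return [...ids];
}
```

Using placeholders rather than interpolating IDs into the SQL string keeps the query parameterized regardless of how many seeds are passed.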

Graph-Aware Context Assembly

Orchestration's assembleContext (in src/chat/index.js) integrates the graph at step 7 of the chat pipeline:

1. Qdrant entity search returns up to `ORCHESTRATION.ENTITIES_LIMIT` results. Each result includes `r.id` (the SQLite entity ID) alongside the Qdrant payload.
2. `graph.getNeighbors(entityIds)` is called with those IDs, which sends `POST /graph/neighbors` to memory-service.
3. The returned `{ nodes, edges }` is passed to `formatGraphContext()`.
4. On failure, orchestration falls back to using the Qdrant payload data directly as flat nodes with no edges.
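
A minimal sketch of this flow, including the fallback, might look like the following. The function name `buildEntityContext` is hypothetical, the graph client and formatter are passed in as arguments so the sketch runs standalone, and the payload shape (`name`, `type`, `notes`) and the early return on empty results are assumptions.

```javascript
// searchResults: [{ id, payload }] as returned by the entity search step.
// graphClient:   object exposing an async getNeighbors(entityIds).
// format:        a function with the same (nodes, edges) signature as formatGraphContext.
async function buildEntityContext(searchResults, graphClient, format) {
  // Assumption: with no seeds there is nothing to look up, so skip the HTTP call.
  if (searchResults.length === 0) return '';

  const entityIds = searchResults.map((r) => r.id);
  try {
    const { nodes, edges } = await graphClient.getNeighbors(entityIds);
    return format(nodes, edges);
  } catch (err) {
    // Graph lookup failed: degrade to the flat Qdrant payloads with no edges.
    const flatNodes = searchResults.map((r) => ({ id: r.id, ...r.payload }));
    return format(flatNodes, []);
  }
}
```

The fallback keeps the prompt useful when memory-service is unavailable: the model still sees the matched entities, just without their connections.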

Prompt Format

formatGraphContext(nodes, edges) in chat/index.js formats the subgraph as:

Here is what you know about entities relevant to this conversation and their connections:
- Alice (person): software engineer working on NexusAI
  → works_on NexusAI (project)
  → knows Bob (person)
- NexusAI (project): AI assistant framework
- Bob (person): Alice's colleague

- One line per node: `- {name} ({type}): {notes}`
- Outbound edges indented below: `→ {label} {target_name} ({target_type})`
- Nodes with only inbound edges (pulled in as neighbors) appear without connection lines
- Only outbound edges are shown, so each relationship appears once, from the `from_id` side
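
The formatting rules above are simple enough to sketch directly. This is an illustrative reimplementation, not the code in `chat/index.js`: the name `formatGraphContextSketch`, the empty-string return for an empty node list, and the handling of missing `notes` are assumptions.

```javascript
const HEADER =
  'Here is what you know about entities relevant to this conversation and their connections:';

function formatGraphContextSketch(nodes, edges) {
  if (nodes.length === 0) return ''; // assumption: nothing to say without nodes
  const byId = new Map(nodes.map((n) => [n.id, n]));
  const lines = [HEADER];

  for (const node of nodes) {
    // Assumption: omit the ": notes" suffix when an entity has no notes.
    const suffix = node.notes ? `: ${node.notes}` : '';
    lines.push(`- ${node.name} (${node.type})${suffix}`);

    // List only edges that start at this node, so each relationship prints once.
    for (const edge of edges) {
      const target = byId.get(edge.to_id);
      if (edge.from_id !== node.id || !target) continue;
      lines.push(`  → ${edge.label} ${target.name} (${target.type})`);
    }
  }
  return lines.join('\n');
}
```

Passing the Alice, NexusAI, and Bob nodes with the `works_on` and `knows` edges reproduces the sample prompt shown above.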

Project Scoping

The knowledge graph respects project boundaries at the entry point, not during traversal:

- Qdrant entity search is filtered by `projectId`, so only entities tagged with this project are returned as seeds
- Graph traversal in SQLite is unfiltered, so neighbors can be from any project or no project
- This is intentional: the graph entry is project-scoped, but traversal follows the global relationship graph to discover connected knowledge

Entities are tagged with projectId in the Qdrant payload at extraction time. Entities extracted from non-project sessions have projectId: null and only appear in unfiltered global searches.
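
The entry-point scoping could be expressed as a Qdrant payload filter along these lines. This is a sketch under assumptions: the helper name `buildEntityFilter` is hypothetical, and how the orchestration service actually constructs its search request is not shown in this document. The `must` / `key` / `match.value` structure is Qdrant's standard filter syntax.

```javascript
// Returns the `filter` portion of a Qdrant search request,
// or undefined when the search should be global (no project in scope).
function buildEntityFilter(projectId) {
  if (projectId == null) return undefined;
  return {
    must: [{ key: 'projectId', match: { value: projectId } }],
  };
}
```

No equivalent filter is applied on the SQLite side, which is what lets traversal cross project boundaries once a seed has been selected.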

API Reference

`GET /graph/neighborhood/:entityId`

Returns the neighborhood of a single entity.

Query params:

| Param | Default | Max | Description |
| --- | --- | --- | --- |
| `depth` | `ENTITIES.GRAPH_HOP_DEPTH` (1) | 3 | Traversal depth |

Response:

{
  "entity": { "id": 5, "name": "Alice", "type": "person", "notes": "...", "mention_count": 4 },
  "neighborhood": {
    "nodes": [
      { "id": 5, "name": "Alice", "type": "person", "notes": "..." },
      { "id": 8, "name": "NexusAI", "type": "project", "notes": "..." }
    ],
    "edges": [
      { "id": 2, "from_id": 5, "to_id": 8, "label": "works_on", "notes": "...", "mention_count": 3 }
    ]
  }
}

Returns 404 if the entity does not exist.
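
One way the `depth` parameter could be resolved at the HTTP layer is sketched below. The document states only that a maximum of 3 is enforced; whether the route clamps oversized values or rejects them is not specified, so clamping here is an assumption, as is the helper name `resolveDepth`.

```javascript
const GRAPH_HOP_DEPTH = 1; // mirrors ENTITIES.GRAPH_HOP_DEPTH
const MAX_DEPTH = 3;       // cap stated in the query params table

// Turns the raw `depth` query string value into a usable integer.
function resolveDepth(raw) {
  const parsed = Number.parseInt(raw, 10);
  // Missing, non-numeric, or non-positive values fall back to the default.
  if (!Number.isInteger(parsed) || parsed < 1) return GRAPH_HOP_DEPTH;
  // Assumption: oversized values are clamped rather than rejected.
  return Math.min(parsed, MAX_DEPTH);
}
```

Capping the depth keeps the recursive query bounded even on densely connected graphs.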

`POST /graph/neighbors`

Bulk 1-hop neighborhood for a set of entity IDs. Used internally by orchestration — not intended for direct client use.

Request body:

{ "entityIds": [5, 8, 12] }

Response:

{
  "nodes": [ ...entity objects... ],
  "edges": [ ...relationship objects... ]
}

Returns 400 if entityIds is missing or empty.
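
The 400 condition amounts to a small guard on the request body. The sketch below is illustrative: the function name `validateNeighborsBody`, the result shape, and the error message are hypothetical, and the real route may validate more (or less) than this.

```javascript
// Checks the parsed JSON body of POST /graph/neighbors.
function validateNeighborsBody(body) {
  const ids = body && body.entityIds;
  if (!Array.isArray(ids) || ids.length === 0) {
    // Covers both a missing field and an empty list.
    return { ok: false, status: 400, error: 'entityIds must be a non-empty array' };
  }
  return { ok: true, entityIds: ids };
}
```

Rejecting an empty list up front also avoids generating an `IN ()` clause with no placeholders, which SQLite accepts but which would return nothing useful.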

Constants (`packages/shared/src/config/constants.js`)

| Constant | Value | Description |
| --- | --- | --- |
| `ENTITIES.PROMOTION_THRESHOLD` | 3 | `mention_count` at which an entity is considered well-established |
| `ENTITIES.GRAPH_HOP_DEPTH` | 1 | Default traversal depth for neighborhood queries |
| `ORCHESTRATION.ENTITIES_LIMIT` | 5 | Max entity seeds returned from Qdrant search |
| `ORCHESTRATION.ENTITIES_THRESHOLD` | 0.55 | Minimum similarity score for entity Qdrant search |
