roadmap phase 1 complete

This commit is contained in:
Storme-bit
2026-04-27 03:10:39 -07:00
parent 9fe8e568cf
commit 1a97b19280
19 changed files with 759 additions and 281 deletions

View File

@@ -1,178 +1,140 @@
# Memory Service
# Entity Extraction
**Package:** `@nexusai/memory-service`
**Location:** `packages/memory-service`
**Deployed on:** Mini PC 1 (192.168.0.81)
**Port:** 3002
**Location:** `packages/memory-service/src/entities/extraction.js`
**Triggered by:** Episode creation (`POST /episodes`)
**Model:** `qwen2.5:3b` via Ollama (configurable via `EXTRACTION_MODEL` env var)
## Purpose
Responsible for all reading and writing of long-term memory. Acts as the
sole interface to both SQLite and Qdrant — no other service accesses these
stores directly. On episode creation, automatically calls the embedding
service to generate and store a vector in Qdrant.
After each episode is saved to SQLite, the extraction pipeline runs
asynchronously in the background to identify named entities and the
relationships between them. Results are written back to SQLite and
embedded into Qdrant — the episode response is never delayed.
## Dependencies
## Trigger
- `express` — HTTP API
- `better-sqlite3` — SQLite driver
- `@qdrant/js-client-rest` — Qdrant vector store client
- `dotenv` — environment variable loading
- `@nexusai/shared` — shared utilities and constants
## Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| PORT | No | 3002 | Port to listen on |
| SQLITE_PATH | Yes | — | Path to SQLite database file |
| QDRANT_URL | No | http://localhost:6333 | Qdrant instance URL |
| EMBEDDING_SERVICE_URL | No | http://localhost:3003 | Embedding service URL |
| EXTRACTION_URL | No | http://localhost:11434 | Ollama URL for entity extraction |
| EXTRACTION_MODEL | No | qwen2.5:3b | Ollama model used for entity extraction |
## Internal Structure
```
src/
├── db/
│ ├── index.js # SQLite connection + initialization + migrations
│ ├── schema.js # Table definitions, indexes, FTS5, triggers
│ ├── projects.js # Project CRUD functions
│ └── summaries.js # Summary CRUD functions
├── episodic/
│ └── index.js # Session + episode CRUD, FTS search, embedding write path
├── semantic/
│ └── index.js # Qdrant collection management, upsert, search, delete
├── entities/
│ ├── index.js # Entity + relationship CRUD
│ └── extraction.js # Automatic entity extraction via qwen2.5:3b on Ollama
└── index.js # Express app + all route definitions
```
## SQLite Schema
Seven core tables:
- **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
- **episodes** — individual exchanges (user message + AI response) tied to a session
- **entities** — named things the system learns about (people, places, concepts)
- **relationships** — directional labeled links between entities
- **summaries** — condensed episode groups for efficient context retrieval
- **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`, `notes`, `system_prompt`
### Migrations
Schema changes that cannot use `CREATE TABLE IF NOT EXISTS` are applied as
idempotent migrations in `db/index.js` at startup:
`createEpisode()` in `episodic/index.js` calls `extractAndStoreEntities()`
immediately after the SQLite insert, without awaiting it:
```js
try { db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`); } catch {}
try { db.exec(`ALTER TABLE sessions ADD COLUMN project_id INTEGER REFERENCES projects(id)`); } catch {}
try { db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(project_id)`); } catch {}
try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
try { db.exec(`ALTER TABLE projects ADD COLUMN notes TEXT`); } catch {}
try { db.exec(`ALTER TABLE projects ADD COLUMN system_prompt TEXT`); } catch {}
extractAndStoreEntities(userMessage, aiResponse, episode.id, projectId)
.catch(err => logger.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
```
New migrations are always appended here — never modify the schema file for
existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
If extraction throws, the episode is unaffected — the error is logged and
swallowed.
### FTS5 Full-Text Search
## Model Settings
An `episodes_fts` virtual table enables keyword search across all episodes.
Three triggers (`episodes_fts_insert`, `episodes_fts_update`, `episodes_fts_delete`)
keep the FTS index automatically in sync with the episodes table.
| Setting | Value | Notes |
|---|---|---|
| Model | `qwen2.5:3b` | Ollama, configurable via `EXTRACTION_MODEL` |
| Temperature | 0.1 | Low for consistent, deterministic output |
| `num_predict` | 1500 | Higher ceiling to accommodate entity + relationship JSON |
| `format` | `'json'` | Ollama constrained decoding — enforces valid JSON output |
| Prompt format | ChatML | `<\|im_start\|>` / `<\|im_end\|>` tokens |
### SQLite Configuration
## Prompt Structure
- `journal_mode = WAL` — non-blocking reads during writes
- `foreign_keys = ON` — enforces referential integrity and cascade deletes
- PRAGMAs set via `db.pragma()`, not `db.exec()`
The prompt is built by `buildExtractionPrompt()`. It includes:
### Dynamic Updates
1. **System message** — declares the model's role as an entity and relationship extractor
2. **Instructions** — entity types, field rules, relationship label format, required JSON schema
3. **Known entities block** — last 20 entities from SQLite, by `rowid DESC`, used to encourage consistent name/type pairs across conversations
4. **Conversation** — the raw user message and AI response, delimited clearly
Both `updateSession` and `updateProject` build their `SET` clause dynamically
from only the fields passed — prevents partial updates from overwriting fields
that weren't touched.
```
<|im_start|>system
You are a named entity and relationship extractor. You output only valid JSON.
<|im_end|>
<|im_start|>user
Read the conversation below and extract all named entities and the relationships between them.
Entity types: person, place, project, technology, concept, organization
...
Return this exact JSON structure:
{ "entities": [...], "relationships": [...] }
`updateProject` allowlist:
```js
const allowed = ['name', 'description', 'colour', 'icon', 'isolated', 'notes', 'system_prompt'];
Already known entities (use these exact name and type values if the same entity appears):
- "NexusAI" (project)
- "Alice" (person)
--- CONVERSATION ---
User: ...
Assistant: ...
--- END CONVERSATION ---
<|im_end|>
<|im_start|>assistant
```
## Qdrant / Semantic Layer
## Expected JSON Output
Three Qdrant collections are initialized on service startup via `semantic.initCollections()`:
| Collection | Purpose |
|---|---|
| `episodes` | Embeddings for individual conversation exchanges |
| `entities` | Embeddings for named entities |
| `summaries` | Embeddings for condensed episode summaries |
All collections use **768-dimension vectors** with **Cosine similarity**,
matching `nomic-embed-text` via Ollama. Vector size and distance metric are
defined in `@nexusai/shared` — not hardcoded here.
`initCollections()` iterates `Object.values(COLLECTIONS)` and creates any
collection that doesn't already exist at startup — all three collections are
guaranteed to exist before any requests are handled, avoiding race conditions
between the first entity embed and an entity search.
Each collection exposes upsert, search (with optional Qdrant filter), and
delete operations. The `wait: true` flag is used on all writes.
## Embedding Write Path
When a new episode is created:
1. Episode saved to SQLite synchronously — response returned immediately
2. User message + AI response combined: `User: ...\nAssistant: ...`
3. Text sent to embedding service (`POST /embed`)
4. Vector upserted into `episodes` Qdrant collection with payload `{ sessionId, createdAt }`
This step is **fire-and-forget** — if embedding fails, the episode is still
saved and searchable via FTS. The error is logged but not surfaced.
> The Qdrant payload stores `sessionId` (the internal integer ID). See
> `memory-isolation.md` for how project-level filtering works.
## Entity Layer
Entities and relationships use upsert semantics with composite unique
constraints to prevent duplicates:
- `UNIQUE(name, type)` on entities
- `UNIQUE(from_id, to_id, label)` on relationships
- `ON DELETE CASCADE` on relationship foreign keys
After each episode is saved, `extraction.js` automatically extracts named
entities from the conversation using `qwen2.5:3b` on Ollama — fire-and-forget.
> For full details on the extraction pipeline, prompt format, constrained
> decoding, stoplist, and Qdrant storage, see `entity-extraction.md`.
## Summaries Layer
Session summaries are generated by `orchestration-service/src/services/summarization.js`
after each episode write and stored here via `POST /summaries`. The memory
service is responsible only for CRUD — generation logic lives in orchestration.
> For full details on trigger conditions, prompt format, cumulative updates,
> and ChatML token stripping, see `summarization.md`.
## Project Delete Behaviour
Deleting a project runs as a transaction — it first nulls out `project_id`
on all assigned sessions, then deletes the project. This avoids a foreign
key constraint failure since `sessions.project_id` has no `ON DELETE` rule:
```js
const doDelete = db.transaction(() => {
db.prepare(`UPDATE sessions SET project_id = NULL WHERE project_id = ?`).run(id);
db.prepare(`DELETE FROM projects WHERE id = ?`).run(id);
});
```json
{
"entities": [
{ "name": "Alice", "type": "person", "notes": "Software engineer working on NexusAI." },
{ "name": "NexusAI", "type": "project", "notes": "A modular AI assistant with persistent memory." }
],
"relationships": [
{
"from": "Alice", "fromType": "person",
"to": "NexusAI", "toType": "project",
"label": "works_on",
"notes": "Alice is the primary developer."
}
]
}
```
For all HTTP endpoints, see `api-routes.md`.
Relationship labels use **snake_case verbs** (e.g. `works_on`, `manages`, `uses`,
`knows`, `located_in`, `part_of`, `created_by`).
## JSON Parsing
The raw model response is matched with `/\{[\s\S]*\}/` before parsing — this
tolerates any preamble or trailing prose the model emits alongside the JSON.
If the match fails or `JSON.parse` throws, the function logs a warning and
returns without writing anything.
## Entity Processing
For each entity in `parsed.entities`:
1. Validate `name`, `type` (must be in `ENTITY_TYPES`), and not in `IGNORED_NAMES`
2. Call `upsertEntity(name, type, notes)`:
- **Insert**: creates new row with `mention_count = 1`, `source = 'extraction'`
- **Conflict** on `(name, type)`: increments `mention_count`, updates `last_seen_at`, preserves existing `notes` if new extraction returns null
3. Add to `entityMap` keyed by `"${name}::${type}"` — used for relationship resolution below
4. Call `linkEntityToEpisode(entity.id, episodeId)` — writes to `entity_episodes` join table
5. Fire-and-forget: embed as `"${name} (${type}): ${notes}"` → store to Qdrant `entities` collection with `{ name, type, notes, projectId }` in payload
**Valid entity types:** `person`, `place`, `project`, `technology`, `concept`, `organization`
**Stoplist (ignored names):** `good morning`, `good night`, `hello`, `goodbye`, `thanks`, `thank you`
## Relationship Processing
After all entities are saved, relationships are processed:
1. For each entry in `parsed.relationships`, look up both endpoints in `entityMap` using `"${from}::${fromType}"` and `"${to}::${toType}"` as keys
2. If either endpoint is missing (filtered out, invalid type, or not in this extraction), the relationship is silently skipped
3. Call `upsertRelationship(fromId, toId, label, notes)`:
- **Insert**: creates new row with `mention_count = 1`
- **Conflict** on `(from_id, to_id, label)`: increments `mention_count`, preserves existing `notes` if new is null
Relationships are unidirectional in storage. Bidirectionality is handled at
query time by the graph traversal layer.
## Project Scoping
`projectId` is threaded through from the episode creation call. It is stored
in the Qdrant entity payload, which enables project-scoped entity search in
orchestration. SQLite entities and relationships are global — scoping only
applies at the Qdrant retrieval layer.
## Error Behaviour
All steps after the initial model call are wrapped in a single outer try/catch.
If Ollama is unreachable, returns a non-200 status, or the JSON cannot be
parsed, the function logs at `warn` level and returns. There is no retry logic.
Individual entity embedding failures are caught per-entity and logged at `warn`
level without affecting other entities in the same batch.