documentation updates for entity extraction and summarization

2026-04-21 03:50:38 -07:00
parent 32365e67f4
commit acda21317b
6 changed files with 540 additions and 107 deletions
--- a/docs/services/memory-service.md
+++ b/docs/services/memory-service.md
@@ -38,7 +38,8 @@ src/
 ├── db/
 │   ├── index.js       # SQLite connection + initialization + migrations
 │   ├── schema.js      # Table definitions, indexes, FTS5, triggers
-│   └── projects.js    # Project CRUD functions
+│   ├── projects.js    # Project CRUD functions
+│   └── summaries.js   # Summary CRUD functions
 ├── episodic/
 │   └── index.js       # Session + episode CRUD, FTS search, embedding write path
 ├── semantic/
@@ -51,7 +52,7 @@ src/

 ## SQLite Schema

-Six core tables:
+Seven core tables:

 - **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
 - **episodes** — individual exchanges (user message + AI response) tied to a session
@@ -100,12 +101,9 @@ that weren't touched.
 const allowed = ['name', 'description', 'colour', 'icon', 'isolated', 'notes', 'system_prompt'];
 ```

-This means saving just `{ notes: "..." }` or `{ system_prompt: "..." }` won't
-touch any other field.
-
 ## Qdrant / Semantic Layer

-Three Qdrant collections are initialized on service startup:
+Three Qdrant collections are initialized on service startup via `semantic.initCollections()`:

 | Collection | Purpose |
 |---|---|
@@ -117,9 +115,13 @@ All collections use **768-dimension vectors** with **Cosine similarity**,
 matching `nomic-embed-text` via Ollama. Vector size and distance metric are
 defined in `@nexusai/shared` — not hardcoded here.

-Each collection exposes three operations in `src/semantic/index.js`:
-upsert, search (with optional Qdrant filter), and delete. The `wait: true`
-flag is used on all writes.
+`initCollections()` iterates `Object.values(COLLECTIONS)` at startup and creates
+any collection that doesn't already exist. All three collections therefore
+exist before any request is handled, which avoids a race between the first
+entity embed and an entity search.
+
+Each collection exposes upsert, search (with optional Qdrant filter), and
+delete operations. The `wait: true` flag is used on all writes.

 ## Embedding Write Path

@@ -133,8 +135,7 @@ When a new episode is created:
 This step is **fire-and-forget** — if embedding fails, the episode is still
 saved and searchable via FTS. The error is logged but not surfaced.

-> The Qdrant payload stores `sessionId` (the internal integer ID). This is
-> used for per-session and per-project filtering during semantic search. See
+> The Qdrant payload stores `sessionId` (the internal integer ID). See
 > `memory-isolation.md` for how project-level filtering works.

 ## Entity Layer
@@ -146,34 +147,20 @@ constraints to prevent duplicates:
 - `UNIQUE(from_id, to_id, label)` on relationships
 - `ON DELETE CASCADE` on relationship foreign keys

-### Automatic Entity Extraction
-
 After each episode is saved, `extraction.js` automatically extracts named
-entities from the conversation using `qwen2.5:3b` running on Ollama (Mini PC 1).
-This runs **fire-and-forget** — the episode is already saved and returned
-before extraction begins.
+entities from the conversation using `qwen2.5:3b` on Ollama. Extraction is fire-and-forget.

-**Entity types extracted:** `person`, `place`, `project`, `technology`,
-`concept`, `organization`
+> For full details on the extraction pipeline, prompt format, constrained
+> decoding, stoplist, and Qdrant storage, see `entity-extraction.md`.

-The extraction prompt uses ChatML format (native to qwen2.5) and primes the
-response by ending with `[` to steer the model directly into JSON array output.
-A list of already-known entities is injected into the prompt so the model
-reuses existing `(name, type)` pairs rather than creating duplicates with
-different types.
+## Summaries Layer

-After extraction, each entity is:
-1. Upserted into SQLite via `upsertEntity` — notes are only written if
-   the entity is new (`COALESCE(entities.notes, excluded.notes)` prevents
-   overwriting existing notes with speculative updates)
-2. Embedded via the embedding service and upserted into the `entities`
-   Qdrant collection with `{ name, type, notes, projectId }` as payload —
-   `projectId` scopes entities to their project for isolated retrieval
+Session summaries are generated by `orchestration-service/src/services/summarization.js`
+after each episode write and stored here via `POST /summaries`. The memory
+service is responsible only for CRUD — generation logic lives in orchestration.

-`extractAndStoreEntities` receives `projectId` from `createEpisode`, which
-receives it from the episode route, which receives it from orchestration's
-`createEpisode` call. This ensures entities are tagged with the correct
-project scope at extraction time.
+> For full details on trigger conditions, prompt format, cumulative updates,
+> and ChatML token stripping, see `summarization.md`.

 ## Project Delete Behaviour