Compare commits
11 Commits
aac0923351
...
main
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
e4908193bd | ||
|
|
b58a4e4692 | ||
|
|
055683424d | ||
|
|
27ad614130 | ||
|
|
8ade5c68ca | ||
|
|
49982a85de | ||
|
|
9c6c5c9a42 | ||
|
|
c9cbac87ac | ||
|
|
1a97b19280 | ||
|
|
9fe8e568cf | ||
|
|
5ad01c6ad8 |
108
CLAUDE.md
Normal file
108
CLAUDE.md
Normal file
@@ -0,0 +1,108 @@
|
|||||||
|
# CLAUDE.md
|
||||||
|
|
||||||
|
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||||
|
|
||||||
|
## Development Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Start individual services
|
||||||
|
npm run memory # Memory Service (port 3002)
|
||||||
|
npm run embedding # Embedding Service (port 3003)
|
||||||
|
npm run inference # Inference Service (port 3001)
|
||||||
|
npm run orchestration # Orchestration Service (port 4000)
|
||||||
|
npm run mini1 # Start memory + embedding concurrently
|
||||||
|
|
||||||
|
# Per-service dev mode (with --watch)
|
||||||
|
npm -w packages/<service-name> run dev
|
||||||
|
|
||||||
|
# Chat client
|
||||||
|
npm -w packages/chat-client run dev # Vite dev server (port 5173)
|
||||||
|
npm -w packages/chat-client run build # Production build
|
||||||
|
```
|
||||||
|
|
||||||
|
No test framework or linter is configured.
|
||||||
|
|
||||||
|
## Architecture Overview
|
||||||
|
|
||||||
|
NexusAI is a **modular AI assistant** with persistent, project-scoped memory. It's a Node.js monorepo (`npm workspaces`) with 4 independent backend services, 1 React frontend, and 1 shared package.
|
||||||
|
|
||||||
|
### Services
|
||||||
|
|
||||||
|
| Package | Port | Role |
|
||||||
|
|---|---|---|
|
||||||
|
| `orchestration-service` | 4000 | Central gateway; coordinates all others |
|
||||||
|
| `memory-service` | 3002 | SQLite + Qdrant hybrid memory |
|
||||||
|
| `embedding-service` | 3003 | Text embeddings via Ollama (`nomic-embed-text`, 768-dim) |
|
||||||
|
| `inference-service` | 3001 | LLM inference (Ollama or llama.cpp) |
|
||||||
|
| `chat-client` | 5173 | React/Vite frontend |
|
||||||
|
| `shared` | — | Constants, env helpers, logger, formatters |
|
||||||
|
|
||||||
|
All inter-service communication is **REST HTTP only** — no message queues or WebSockets.
|
||||||
|
|
||||||
|
### Chat Request Flow
|
||||||
|
|
||||||
|
1. Client POSTs to orchestration `/chat/stream`
|
||||||
|
2. Orchestration resolves session, fetches **recent episodes** (SQLite) + **semantic episodes** (Qdrant vector search) + **entities** (Qdrant, scoped by project)
|
||||||
|
3. Embedding computed for user message (embedding-service)
|
||||||
|
4. Prompt assembled: system message → entities → semantic memories → recent episodes → user message
|
||||||
|
5. Inference streams response (inference-service)
|
||||||
|
6. Episode stored in SQLite + Qdrant (fire-and-forget embedding)
|
||||||
|
7. Entity extraction triggered async (qwen2.5:3b via inference-service)
|
||||||
|
8. Auto-summarization checked (threshold: 200+ tokens, re-triggers every 5 episodes)
|
||||||
|
9. Auto-naming on first message (temp 0.3, 20 tokens max)
|
||||||
|
|
||||||
|
### Memory Model
|
||||||
|
|
||||||
|
**Dual store — neither works alone:**
|
||||||
|
- **SQLite** (`better-sqlite3`, synchronous) — Full content: sessions, episodes, entities, relationships, projects, summaries, FTS5 index
|
||||||
|
- **Qdrant** — Vector embeddings for semantic search; IDs used to fetch full content from SQLite afterward
|
||||||
|
|
||||||
|
Orchestration queries Qdrant directly (bypasses memory-service) for performance, then fetches full episode content from memory-service by ID.
|
||||||
|
|
||||||
|
**Project-scoped isolation:** Sessions grouped into projects; Qdrant queries use `should` filter on session IDs to enforce memory boundaries. Non-project sessions share a common pool.
|
||||||
|
|
||||||
|
### Key File Locations
|
||||||
|
|
||||||
|
**Orchestration** (`packages/orchestration-service/src/`):
|
||||||
|
- `chat/index.js` — Core prompt building and memory assembly
|
||||||
|
- `routes/` — HTTP endpoints: chat, sessions, projects, episodes, models, settings, summaries
|
||||||
|
- `services/` — Thin HTTP clients for memory, embedding, inference, and direct Qdrant access
|
||||||
|
- `config/settings.js` — Loads/saves `data/settings.json` (user-tunable: model params, thresholds, system prompt)
|
||||||
|
|
||||||
|
**Memory** (`packages/memory-service/src/`):
|
||||||
|
- `db/schema.js` — SQLite table definitions (source of truth for data model)
|
||||||
|
- `episodic/` — Episode CRUD
|
||||||
|
- `semantic/` — Qdrant operations
|
||||||
|
- `entities/` — Entity extraction + CRUD
|
||||||
|
- `summarization/` — Project summary generation
|
||||||
|
|
||||||
|
**Shared** (`packages/shared/src/`):
|
||||||
|
- `config/constants.js` — All tunables (ports, thresholds, model names, vector size)
|
||||||
|
- `config/env.js` — `getEnv()` helper with fallback to constants
|
||||||
|
- `utils.js` — `parseRow()`, `formatEpisodeText()`, `logger`
|
||||||
|
|
||||||
|
**Frontend** (`packages/chat-client/src/`):
|
||||||
|
- `App.jsx` — View router and top-level state (views: home, chat, all-chats, all-projects, project, memory, summaries, settings)
|
||||||
|
- `hooks/` — `useChat`, `useSession`, `useModels`, `useProjects`, `useSettings`, `useContextMenu`
|
||||||
|
- `api/orchestration.js` — Fetch wrapper for all API calls
|
||||||
|
- Vite proxy points to `192.168.0.205:4000` (Mini PC 2 / orchestration)
|
||||||
|
|
||||||
|
### Configuration
|
||||||
|
|
||||||
|
Each service uses `.env` via `dotenv`, falling back to `packages/shared/src/config/constants.js`. The orchestration service also serves `data/settings.json` to the frontend via `/settings` — this is the single source of truth for user-facing inference parameters and system prompt.
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Home lab across 3 nodes, managed with Docker Compose:
|
||||||
|
- **Main PC** — RTX A4000 (inference via llama.cpp)
|
||||||
|
- **Mini PC 1** — memory + embedding services, Qdrant, Ollama
|
||||||
|
- **Mini PC 2** — orchestration + chat client, Caddy reverse proxy + Authelia SSO
|
||||||
|
|
||||||
|
Docker Compose files: `docker-compose.mini1.yml`, `docker-compose.mini2.yml`. All services expose `/health`. Deployment docs: `docs/deployment/homelab.md`.
|
||||||
|
|
||||||
|
## Key Development Principles
|
||||||
|
|
||||||
|
- **Layer-by-layer validation** — always build and test backend → orchestration → frontend in sequence, curl-testing each layer before proceeding
|
||||||
|
- **New orchestration routes require changes in four places**: route file, `orchestration-service/src/index.js`, Caddyfile on Mini PC 2 (`192.168.0.205`), and `vite.config.js` in the chat client
|
||||||
|
- **All services read settings on every request** — no restart required for config changes
|
||||||
|
- **Backend-first development** — data layer → service endpoints → orchestration proxy → frontend
|
||||||
@@ -73,8 +73,8 @@ service by ID after the vector search.
|
|||||||
|
|
||||||
The core four-service architecture is complete and operational. Key capabilities:
|
The core four-service architecture is complete and operational. Key capabilities:
|
||||||
|
|
||||||
- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
|
- **Retrieval fusion** — Reciprocal Rank Fusion (RRF) merges semantic (Qdrant vector search) and keyword (SQLite FTS5) episode retrieval into a single ranked result set. Weights are configurable per strategy via settings; keyword search is off by default (`keywordWeight: 0`) and can be enabled without a service restart
|
||||||
- **Entity layer** — automatic extraction of named entities from conversations via qwen2.5:3b, stored in SQLite and Qdrant, injected into every prompt as structured knowledge
|
- **Entity layer + Knowledge graph** — automatic extraction of named entities and relationships from conversations via qwen2.5:3b. Entities and relationships are stored in SQLite with `mention_count` tracking. A graph traversal layer expands Qdrant entity search hits into a 1-hop neighborhood subgraph, injecting structured connected knowledge into every prompt
|
||||||
- **Projects** — sessions grouped with shared or isolated memory pools
|
- **Projects** — sessions grouped with shared or isolated memory pools
|
||||||
- **Auto-naming** — sessions named automatically from first exchange via inference
|
- **Auto-naming** — sessions named automatically from first exchange via inference
|
||||||
- **Project-scoped semantic search** — Qdrant filtered by project session IDs
|
- **Project-scoped semantic search** — Qdrant filtered by project session IDs
|
||||||
|
|||||||
@@ -202,7 +202,9 @@ Returns `503` if llama-server is unreachable.
|
|||||||
|---|---|---|---|
|
|---|---|---|---|
|
||||||
| `recentEpisodeLimit` | integer | 1–20 | Recent episodes injected into prompt |
|
| `recentEpisodeLimit` | integer | 1–20 | Recent episodes injected into prompt |
|
||||||
| `semanticLimit` | integer | 1–20 | Max semantic search results |
|
| `semanticLimit` | integer | 1–20 | Max semantic search results |
|
||||||
| `scoreThreshold` | float | 0–1 | Minimum similarity score |
|
| `scoreThreshold` | float | 0–1 | Minimum similarity score for Qdrant results |
|
||||||
|
| `semanticWeight` | float | 0–5 | RRF weight for Qdrant semantic results |
|
||||||
|
| `keywordWeight` | float | 0–5 | RRF weight for FTS5 keyword results (`0` = disabled) |
|
||||||
| `modelsFolderPath` | string | — | Path to folder containing .gguf files |
|
| `modelsFolderPath` | string | — | Path to folder containing .gguf files |
|
||||||
| `temperature` | float | 0–2 | Inference randomness |
|
| `temperature` | float | 0–2 | Inference randomness |
|
||||||
| `repeatPenalty` | float | 1–2 | Repeat token penalty |
|
| `repeatPenalty` | float | 1–2 | Repeat token penalty |
|
||||||
@@ -360,13 +362,34 @@ Same request/response shape as orchestration `/projects` above.
|
|||||||
|
|
||||||
**DELETE /relationships — body:**
|
**DELETE /relationships — body:**
|
||||||
```json
|
```json
|
||||||
{ "fromId": 1, "toId": 2, "label": "uses" }
|
{ "fromId": 1, "toId": 2, "label": "works_on", "notes": "Alice is the primary developer.", "metadata": {} }
|
||||||
```
|
```
|
||||||
|
notes is optional. label should be a snake_case verb. Relationship is identified by the composite key (fromId, toId, label) — re-submitting with the same key increments mention_count and preserves existing notes if the new value is null.
|
||||||
|
|
||||||
Relationships are identified by the composite key `(fromId, toId, label)`.
|
Relationships are identified by the composite key `(fromId, toId, label)`.
|
||||||
Delete uses request body rather than URL params since this three-part key
|
Delete uses request body rather than URL params since this three-part key
|
||||||
is awkward to encode in a path.
|
is awkward to encode in a path.
|
||||||
|
|
||||||
|
### Graph
|
||||||
|
|
||||||
|
| Method | Path | Description |
|
||||||
|
|---|---|---|
|
||||||
|
| GET | /graph/neighborhood/:entityId | Entity neighborhood — nodes + edges within N hops |
|
||||||
|
| POST | /graph/neighbors | Bulk 1-hop neighborhood for a set of entity IDs |
|
||||||
|
|
||||||
|
**GET /graph/neighborhood/:entityId — query params:**
|
||||||
|
|
||||||
|
| Param | Default | Max | Description |
|
||||||
|
|---|---|---|---|
|
||||||
|
| depth | 1 | 3 | Traversal depth |
|
||||||
|
|
||||||
|
Returns `{ entity, neighborhood: { nodes, edges } }`. Returns `404` if entity not found.
|
||||||
|
|
||||||
|
**POST /graph/neighbors — body:**
|
||||||
|
```json
|
||||||
|
{ "entityIds": [5, 8, 12] }
|
||||||
|
Returns { nodes: [...], edges: [...] }. Used internally by orchestration — not a client-facing endpoint.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Embedding Service — port 3003
|
## Embedding Service — port 3003
|
||||||
|
|||||||
@@ -39,36 +39,36 @@
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Phase 1 — Loose Ends & Stability
|
## Phase 1 — Loose Ends & Stability - COMPLETE ✅
|
||||||
*Target: Next development session (Saturday)*
|
*Target: Next development session (Saturday)*
|
||||||
|
|
||||||
### Bug Fixes
|
### Bug Fixes
|
||||||
- [x] **Entity extraction JSON parsing** — robustify response parser in `extraction.js` to handle model returning markdown fences or preamble around JSON
|
✅ **Entity extraction JSON parsing** — robustify response parser in `extraction.js` to handle model returning markdown fences or preamble around JSON
|
||||||
- [ ] **Qdrant entity search empty results** — verify entities embedded post-isolation-fix are surfacing correctly in project session searches
|
✅ **Qdrant entity search empty results** — verify entities embedded post-isolation-fix are surfacing correctly in project session searches
|
||||||
|
|
||||||
### Tech Debt
|
### Tech Debt
|
||||||
- [x] **Logging** — introduce `LOG_LEVEL` env var across all services; reduce noise in production
|
✅ **Logging** — introduce `LOG_LEVEL` env var across all services; reduce noise in production
|
||||||
- [x] **Error response consistency** — audit all endpoints for uniform `{ error, detail }` shape
|
✅ **Error response consistency** — audit all endpoints for uniform `{ error, detail }` shape
|
||||||
- [x] **Constants audit** — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
|
✅ **Constants audit** — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
|
||||||
- [x] **Orchestration `chat/index.js` review** — extract any logic that has grown beyond its intended scope into dedicated modules
|
✅ **Orchestration `chat/index.js` review** — extract any logic that has grown beyond its intended scope into dedicated modules
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Phase 2 — Memory System Upgrades
|
## Phase 2 — Memory System Upgrades
|
||||||
*The core intelligence layer*
|
*The core intelligence layer*
|
||||||
|
|
||||||
### 1. Knowledge Graph (SQLite)
|
### 1. Knowledge Graph (SQLite) ✅
|
||||||
The highest-leverage memory upgrade. Transforms NexusAI from "remembers conversations" to "understands relationships between things."
|
The highest-leverage memory upgrade. Transforms NexusAI from "remembers conversations" to "understands relationships between things."
|
||||||
- [ ] Graph schema — `nodes` and `edges` tables with typed relationships
|
- [x] Graph schema — `nodes` and `edges` tables with typed relationships
|
||||||
- [ ] Entity → node promotion pipeline
|
- [x] Entity → node promotion pipeline (`mention_count` tracked; threshold gating deferred to Phase 2)
|
||||||
- [ ] Relationship traversal queries
|
- [x] Relationship traversal queries
|
||||||
- [ ] Graph-aware context assembly in orchestration
|
- [x] Graph-aware context assembly in orchestration
|
||||||
|
|
||||||
### 2. Retrieval Fusion + Full-Text Search
|
### 2. Retrieval Fusion + Full-Text Search ✅
|
||||||
Multi-strategy retrieval merged into a single ranked result set.
|
Multi-strategy retrieval merged into a single ranked result set.
|
||||||
- [ ] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
|
- [x] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
|
||||||
- [ ] Configurable weights per retrieval strategy
|
- [x] Configurable weights per retrieval strategy (`semanticWeight`, `keywordWeight` via `PATCH /settings`)
|
||||||
- [ ] Score threshold tuning per collection
|
- [x] Score threshold retained per-strategy; FTS scoped to session/project sessions; `keywordWeight: 0` default (disabled until tuned)
|
||||||
|
|
||||||
### 3. Memory Consolidation Lifecycle
|
### 3. Memory Consolidation Lifecycle
|
||||||
Prevents long-term memory degradation and enables compression.
|
Prevents long-term memory degradation and enables compression.
|
||||||
|
|||||||
@@ -1,178 +1,140 @@
|
|||||||
# Memory Service
|
# Entity Extraction
|
||||||
|
|
||||||
**Package:** `@nexusai/memory-service`
|
**Location:** `packages/memory-service/src/entities/extraction.js`
|
||||||
**Location:** `packages/memory-service`
|
**Triggered by:** Episode creation (`POST /episodes`)
|
||||||
**Deployed on:** Mini PC 1 (192.168.0.81)
|
**Model:** `qwen2.5:3b` via Ollama (configurable via `EXTRACTION_MODEL` env var)
|
||||||
**Port:** 3002
|
|
||||||
|
|
||||||
## Purpose
|
## Purpose
|
||||||
|
|
||||||
Responsible for all reading and writing of long-term memory. Acts as the
|
After each episode is saved to SQLite, the extraction pipeline runs
|
||||||
sole interface to both SQLite and Qdrant — no other service accesses these
|
asynchronously in the background to identify named entities and the
|
||||||
stores directly. On episode creation, automatically calls the embedding
|
relationships between them. Results are written back to SQLite and
|
||||||
service to generate and store a vector in Qdrant.
|
embedded into Qdrant — the episode response is never delayed.
|
||||||
|
|
||||||
## Dependencies
|
## Trigger
|
||||||
|
|
||||||
- `express` — HTTP API
|
`createEpisode()` in `episodic/index.js` calls `extractAndStoreEntities()`
|
||||||
- `better-sqlite3` — SQLite driver
|
immediately after the SQLite insert, without awaiting it:
|
||||||
- `@qdrant/js-client-rest` — Qdrant vector store client
|
|
||||||
- `dotenv` — environment variable loading
|
|
||||||
- `@nexusai/shared` — shared utilities and constants
|
|
||||||
|
|
||||||
## Environment Variables
|
|
||||||
|
|
||||||
| Variable | Required | Default | Description |
|
|
||||||
|---|---|---|---|
|
|
||||||
| PORT | No | 3002 | Port to listen on |
|
|
||||||
| SQLITE_PATH | Yes | — | Path to SQLite database file |
|
|
||||||
| QDRANT_URL | No | http://localhost:6333 | Qdrant instance URL |
|
|
||||||
| EMBEDDING_SERVICE_URL | No | http://localhost:3003 | Embedding service URL |
|
|
||||||
| EXTRACTION_URL | No | http://localhost:11434 | Ollama URL for entity extraction |
|
|
||||||
| EXTRACTION_MODEL | No | qwen2.5:3b | Ollama model used for entity extraction |
|
|
||||||
|
|
||||||
## Internal Structure
|
|
||||||
|
|
||||||
```
|
|
||||||
src/
|
|
||||||
├── db/
|
|
||||||
│ ├── index.js # SQLite connection + initialization + migrations
|
|
||||||
│ ├── schema.js # Table definitions, indexes, FTS5, triggers
|
|
||||||
│ ├── projects.js # Project CRUD functions
|
|
||||||
│ └── summaries.js # Summary CRUD functions
|
|
||||||
├── episodic/
|
|
||||||
│ └── index.js # Session + episode CRUD, FTS search, embedding write path
|
|
||||||
├── semantic/
|
|
||||||
│ └── index.js # Qdrant collection management, upsert, search, delete
|
|
||||||
├── entities/
|
|
||||||
│ ├── index.js # Entity + relationship CRUD
|
|
||||||
│ └── extraction.js # Automatic entity extraction via qwen2.5:3b on Ollama
|
|
||||||
└── index.js # Express app + all route definitions
|
|
||||||
```
|
|
||||||
|
|
||||||
## SQLite Schema
|
|
||||||
|
|
||||||
Seven core tables:
|
|
||||||
|
|
||||||
- **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
|
|
||||||
- **episodes** — individual exchanges (user message + AI response) tied to a session
|
|
||||||
- **entities** — named things the system learns about (people, places, concepts)
|
|
||||||
- **relationships** — directional labeled links between entities
|
|
||||||
- **summaries** — condensed episode groups for efficient context retrieval
|
|
||||||
- **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`, `notes`, `system_prompt`
|
|
||||||
|
|
||||||
### Migrations
|
|
||||||
|
|
||||||
Schema changes that cannot use `CREATE TABLE IF NOT EXISTS` are applied as
|
|
||||||
idempotent migrations in `db/index.js` at startup:
|
|
||||||
|
|
||||||
```js
|
```js
|
||||||
try { db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`); } catch {}
|
extractAndStoreEntities(userMessage, aiResponse, episode.id, projectId)
|
||||||
try { db.exec(`ALTER TABLE sessions ADD COLUMN project_id INTEGER REFERENCES projects(id)`); } catch {}
|
.catch(err => logger.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
|
||||||
try { db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(project_id)`); } catch {}
|
|
||||||
try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
|
|
||||||
try { db.exec(`ALTER TABLE projects ADD COLUMN notes TEXT`); } catch {}
|
|
||||||
try { db.exec(`ALTER TABLE projects ADD COLUMN system_prompt TEXT`); } catch {}
|
|
||||||
```
|
```
|
||||||
|
|
||||||
New migrations are always appended here — never modify the schema file for
|
If extraction throws, the episode is unaffected — the error is logged and
|
||||||
existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
|
swallowed.
|
||||||
|
|
||||||
### FTS5 Full-Text Search
|
## Model Settings
|
||||||
|
|
||||||
An `episodes_fts` virtual table enables keyword search across all episodes.
|
| Setting | Value | Notes |
|
||||||
Three triggers (`episodes_fts_insert`, `episodes_fts_update`, `episodes_fts_delete`)
|
|---|---|---|
|
||||||
keep the FTS index automatically in sync with the episodes table.
|
| Model | `qwen2.5:3b` | Ollama, configurable via `EXTRACTION_MODEL` |
|
||||||
|
| Temperature | 0.1 | Low for consistent, deterministic output |
|
||||||
|
| `num_predict` | 1500 | Higher ceiling to accommodate entity + relationship JSON |
|
||||||
|
| `format` | `'json'` | Ollama constrained decoding — enforces valid JSON output |
|
||||||
|
| Prompt format | ChatML | `<\|im_start\|>` / `<\|im_end\|>` tokens |
|
||||||
|
|
||||||
### SQLite Configuration
|
## Prompt Structure
|
||||||
|
|
||||||
- `journal_mode = WAL` — non-blocking reads during writes
|
The prompt is built by `buildExtractionPrompt()`. It includes:
|
||||||
- `foreign_keys = ON` — enforces referential integrity and cascade deletes
|
|
||||||
- PRAGMAs set via `db.pragma()`, not `db.exec()`
|
|
||||||
|
|
||||||
### Dynamic Updates
|
1. **System message** — declares the model's role as an entity and relationship extractor
|
||||||
|
2. **Instructions** — entity types, field rules, relationship label format, required JSON schema
|
||||||
|
3. **Known entities block** — last 20 entities from SQLite, by `rowid DESC`, used to encourage consistent name/type pairs across conversations
|
||||||
|
4. **Conversation** — the raw user message and AI response, delimited clearly
|
||||||
|
|
||||||
Both `updateSession` and `updateProject` build their `SET` clause dynamically
|
```
|
||||||
from only the fields passed — prevents partial updates from overwriting fields
|
<|im_start|>system
|
||||||
that weren't touched.
|
You are a named entity and relationship extractor. You output only valid JSON.
|
||||||
|
<|im_end|>
|
||||||
|
<|im_start|>user
|
||||||
|
Read the conversation below and extract all named entities and the relationships between them.
|
||||||
|
Entity types: person, place, project, technology, concept, organization
|
||||||
|
...
|
||||||
|
Return this exact JSON structure:
|
||||||
|
{ "entities": [...], "relationships": [...] }
|
||||||
|
|
||||||
`updateProject` allowlist:
|
Already known entities (use these exact name and type values if the same entity appears):
|
||||||
```js
|
- "NexusAI" (project)
|
||||||
const allowed = ['name', 'description', 'colour', 'icon', 'isolated', 'notes', 'system_prompt'];
|
- "Alice" (person)
|
||||||
|
|
||||||
|
--- CONVERSATION ---
|
||||||
|
User: ...
|
||||||
|
Assistant: ...
|
||||||
|
--- END CONVERSATION ---
|
||||||
|
<|im_end|>
|
||||||
|
<|im_start|>assistant
|
||||||
```
|
```
|
||||||
|
|
||||||
## Qdrant / Semantic Layer
|
## Expected JSON Output
|
||||||
|
|
||||||
Three Qdrant collections are initialized on service startup via `semantic.initCollections()`:
|
```json
|
||||||
|
{
|
||||||
| Collection | Purpose |
|
"entities": [
|
||||||
|---|---|
|
{ "name": "Alice", "type": "person", "notes": "Software engineer working on NexusAI." },
|
||||||
| `episodes` | Embeddings for individual conversation exchanges |
|
{ "name": "NexusAI", "type": "project", "notes": "A modular AI assistant with persistent memory." }
|
||||||
| `entities` | Embeddings for named entities |
|
],
|
||||||
| `summaries` | Embeddings for condensed episode summaries |
|
"relationships": [
|
||||||
|
{
|
||||||
All collections use **768-dimension vectors** with **Cosine similarity**,
|
"from": "Alice", "fromType": "person",
|
||||||
matching `nomic-embed-text` via Ollama. Vector size and distance metric are
|
"to": "NexusAI", "toType": "project",
|
||||||
defined in `@nexusai/shared` — not hardcoded here.
|
"label": "works_on",
|
||||||
|
"notes": "Alice is the primary developer."
|
||||||
`initCollections()` iterates `Object.values(COLLECTIONS)` and creates any
|
}
|
||||||
collection that doesn't already exist at startup — all three collections are
|
]
|
||||||
guaranteed to exist before any requests are handled, avoiding race conditions
|
}
|
||||||
between the first entity embed and an entity search.
|
|
||||||
|
|
||||||
Each collection exposes upsert, search (with optional Qdrant filter), and
|
|
||||||
delete operations. The `wait: true` flag is used on all writes.
|
|
||||||
|
|
||||||
## Embedding Write Path
|
|
||||||
|
|
||||||
When a new episode is created:
|
|
||||||
|
|
||||||
1. Episode saved to SQLite synchronously — response returned immediately
|
|
||||||
2. User message + AI response combined: `User: ...\nAssistant: ...`
|
|
||||||
3. Text sent to embedding service (`POST /embed`)
|
|
||||||
4. Vector upserted into `episodes` Qdrant collection with payload `{ sessionId, createdAt }`
|
|
||||||
|
|
||||||
This step is **fire-and-forget** — if embedding fails, the episode is still
|
|
||||||
saved and searchable via FTS. The error is logged but not surfaced.
|
|
||||||
|
|
||||||
> The Qdrant payload stores `sessionId` (the internal integer ID). See
|
|
||||||
> `memory-isolation.md` for how project-level filtering works.
|
|
||||||
|
|
||||||
## Entity Layer
|
|
||||||
|
|
||||||
Entities and relationships use upsert semantics with composite unique
|
|
||||||
constraints to prevent duplicates:
|
|
||||||
|
|
||||||
- `UNIQUE(name, type)` on entities
|
|
||||||
- `UNIQUE(from_id, to_id, label)` on relationships
|
|
||||||
- `ON DELETE CASCADE` on relationship foreign keys
|
|
||||||
|
|
||||||
After each episode is saved, `extraction.js` automatically extracts named
|
|
||||||
entities from the conversation using `qwen2.5:3b` on Ollama — fire-and-forget.
|
|
||||||
|
|
||||||
> For full details on the extraction pipeline, prompt format, constrained
|
|
||||||
> decoding, stoplist, and Qdrant storage, see `entity-extraction.md`.
|
|
||||||
|
|
||||||
## Summaries Layer
|
|
||||||
|
|
||||||
Session summaries are generated by `orchestration-service/src/services/summarization.js`
|
|
||||||
after each episode write and stored here via `POST /summaries`. The memory
|
|
||||||
service is responsible only for CRUD — generation logic lives in orchestration.
|
|
||||||
|
|
||||||
> For full details on trigger conditions, prompt format, cumulative updates,
|
|
||||||
> and ChatML token stripping, see `summarization.md`.
|
|
||||||
|
|
||||||
## Project Delete Behaviour
|
|
||||||
|
|
||||||
Deleting a project runs as a transaction — it first nulls out `project_id`
|
|
||||||
on all assigned sessions, then deletes the project. This avoids a foreign
|
|
||||||
key constraint failure since `sessions.project_id` has no `ON DELETE` rule:
|
|
||||||
|
|
||||||
```js
|
|
||||||
const doDelete = db.transaction(() => {
|
|
||||||
db.prepare(`UPDATE sessions SET project_id = NULL WHERE project_id = ?`).run(id);
|
|
||||||
db.prepare(`DELETE FROM projects WHERE id = ?`).run(id);
|
|
||||||
});
|
|
||||||
```
|
```
|
||||||
|
|
||||||
For all HTTP endpoints, see `api-routes.md`.
|
Relationship labels use **snake_case verbs** (e.g. `works_on`, `manages`, `uses`,
|
||||||
|
`knows`, `located_in`, `part_of`, `created_by`).
|
||||||
|
|
||||||
|
## JSON Parsing
|
||||||
|
|
||||||
|
The raw model response is matched with `/\{[\s\S]*\}/` before parsing — this
|
||||||
|
tolerates any preamble or trailing prose the model emits alongside the JSON.
|
||||||
|
If the match fails or `JSON.parse` throws, the function logs a warning and
|
||||||
|
returns without writing anything.
|
||||||
|
|
||||||
|
## Entity Processing
|
||||||
|
|
||||||
|
For each entity in `parsed.entities`:
|
||||||
|
|
||||||
|
1. Validate `name`, `type` (must be in `ENTITY_TYPES`), and not in `IGNORED_NAMES`
|
||||||
|
2. Call `upsertEntity(name, type, notes)`:
|
||||||
|
- **Insert**: creates new row with `mention_count = 1`, `source = 'extraction'`
|
||||||
|
- **Conflict** on `(name, type)`: increments `mention_count`, updates `last_seen_at`, preserves existing `notes` if new extraction returns null
|
||||||
|
3. Add to `entityMap` keyed by `"${name}::${type}"` — used for relationship resolution below
|
||||||
|
4. Call `linkEntityToEpisode(entity.id, episodeId)` — writes to `entity_episodes` join table
|
||||||
|
5. Fire-and-forget: embed as `"${name} (${type}): ${notes}"` → store to Qdrant `entities` collection with `{ name, type, notes, projectId }` in payload
|
||||||
|
|
||||||
|
**Valid entity types:** `person`, `place`, `project`, `technology`, `concept`, `organization`
|
||||||
|
|
||||||
|
**Stoplist (ignored names):** `good morning`, `good night`, `hello`, `goodbye`, `thanks`, `thank you`
|
||||||
|
|
||||||
|
## Relationship Processing
|
||||||
|
|
||||||
|
After all entities are saved, relationships are processed:
|
||||||
|
|
||||||
|
1. For each entry in `parsed.relationships`, look up both endpoints in `entityMap` using `"${from}::${fromType}"` and `"${to}::${toType}"` as keys
|
||||||
|
2. If either endpoint is missing (filtered out, invalid type, or not in this extraction), the relationship is silently skipped
|
||||||
|
3. Call `upsertRelationship(fromId, toId, label, notes)`:
|
||||||
|
- **Insert**: creates new row with `mention_count = 1`
|
||||||
|
- **Conflict** on `(from_id, to_id, label)`: increments `mention_count`, preserves existing `notes` if new is null
|
||||||
|
|
||||||
|
Relationships are unidirectional in storage. Bidirectionality is handled at
|
||||||
|
query time by the graph traversal layer.
|
||||||
|
|
||||||
|
## Project Scoping
|
||||||
|
|
||||||
|
`projectId` is threaded through from the episode creation call. It is stored
|
||||||
|
in the Qdrant entity payload, which enables project-scoped entity search in
|
||||||
|
orchestration. SQLite entities and relationships are global — scoping only
|
||||||
|
applies at the Qdrant retrieval layer.
|
||||||
|
|
||||||
|
## Error Behaviour
|
||||||
|
|
||||||
|
All steps after the initial model call are wrapped in a single outer try/catch.
|
||||||
|
If Ollama is unreachable, returns a non-200 status, or the JSON cannot be
|
||||||
|
parsed, the function logs at `warn` level and returns. There is no retry logic.
|
||||||
|
Individual entity embedding failures are caught per-entity and logged at `warn`
|
||||||
|
level without affecting other entities in the same batch.
|
||||||
|
|||||||
213
docs/services/knowledge-graph.md
Normal file
213
docs/services/knowledge-graph.md
Normal file
@@ -0,0 +1,213 @@
|
|||||||
|
# Knowledge Graph
|
||||||
|
|
||||||
|
**Location:** `packages/memory-service/src/graph/index.js`
|
||||||
|
**Schema additions:** `entity_episodes` table; new columns on `entities` and `relationships`
|
||||||
|
**Exposed via:** `GET /graph/neighborhood/:entityId`, `POST /graph/neighbors`
|
||||||
|
**Consumed by:** Orchestration service context assembly
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
The knowledge graph transforms NexusAI from "remembers conversations" to
|
||||||
|
"understands relationships between things." Rather than injecting a flat
|
||||||
|
list of entity facts into every prompt, orchestration now retrieves a
|
||||||
|
1-hop subgraph of connected entities and their relationships, giving the
|
||||||
|
model structured, linked knowledge about people, projects, technologies,
|
||||||
|
and concepts that have appeared across conversations.
|
||||||
|
|
||||||
|
## Schema
|
||||||
|
|
||||||
|
### `entity_episodes` (join table)
|
||||||
|
|
||||||
|
Tracks which episodes contributed to each entity's knowledge. Defined in
|
||||||
|
`schema.js` — exists on all installs.
|
||||||
|
|
||||||
|
```sql
|
||||||
|
CREATE TABLE IF NOT EXISTS entity_episodes (
|
||||||
|
entity_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
|
||||||
|
episode_id INTEGER NOT NULL REFERENCES episodes(id) ON DELETE CASCADE,
|
||||||
|
PRIMARY KEY (entity_id, episode_id)
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
Both FKs cascade on delete — removing an entity or episode automatically
|
||||||
|
cleans up its join rows.
|
||||||
|
|
||||||
|
### New columns on `entities`
|
||||||
|
|
||||||
|
Added via migration in `db/index.js`:
|
||||||
|
|
||||||
|
| Column | Type | Default | Description |
|
||||||
|
|---|---|---|---|
|
||||||
|
| `mention_count` | INTEGER | 1 | How many times this entity has been extracted across conversations |
|
||||||
|
| `confidence` | REAL | 1.0 | Reserved for future confidence scoring |
|
||||||
|
| `source` | TEXT | `'extraction'` | `'extraction'` (auto) or `'manual'` |
|
||||||
|
| `last_seen_at` | INTEGER | NULL | Unix timestamp of most recent extraction hit |
|
||||||
|
|
||||||
|
### New columns on `relationships`
|
||||||
|
|
||||||
|
| Column | Type | Default | Description |
|
||||||
|
|---|---|---|---|
|
||||||
|
| `mention_count` | INTEGER | 1 | How many times this edge has been extracted |
|
||||||
|
| `notes` | TEXT | NULL | Relationship context sentence from extraction |
|
||||||
|
|
||||||
|
## Entity Promotion Model
|
||||||
|
|
||||||
|
Entities are not created equal — some are mentioned once in passing, others
|
||||||
|
recur across many conversations. `mention_count` is the signal:
|
||||||
|
|
||||||
|
- Every time `upsertEntity` is called for an existing `(name, type)` pair, `mention_count` is incremented and `last_seen_at` is updated.
|
||||||
|
- `ENTITIES.PROMOTION_THRESHOLD` (default: **3**) is the `mention_count` at which an entity is considered "well-established" — referenced in the codebase for future filtering and scoring logic.
|
||||||
|
- Currently `mention_count` is stored and incremented but not yet used to gate retrieval. It provides the foundation for future features such as orphan cleanup (entities never re-extracted) and confidence-weighted graph traversal.
|
||||||
|
|
||||||
|
The same pattern applies to relationships — `mention_count` rises each time
|
||||||
|
the same `(from_id, to_id, label)` triple is extracted.
|
||||||
|
|
||||||
|
## Graph Traversal
|
||||||
|
|
||||||
|
`src/graph/index.js` exports two functions built on SQLite's `WITH RECURSIVE`
|
||||||
|
CTE support. No external graph database is needed.
|
||||||
|
|
||||||
|
### `getNeighborhood(entityId, depth)`
|
||||||
|
|
||||||
|
Traverses the graph from a single entity, following edges in **both directions**,
|
||||||
|
up to `depth` hops. Returns `{ nodes: [...entities], edges: [...relationships] }`.
|
||||||
|
|
||||||
|
Default depth: `ENTITIES.GRAPH_HOP_DEPTH` (1). Maximum enforced at HTTP layer: 3.
|
||||||
|
|
||||||
|
**SQLite query:**
|
||||||
|
|
||||||
|
```sql
|
||||||
|
WITH RECURSIVE traverse(entity_id, depth) AS (
|
||||||
|
SELECT ?, 0
|
||||||
|
UNION
|
||||||
|
SELECT
|
||||||
|
CASE WHEN r.from_id = t.entity_id THEN r.to_id ELSE r.from_id END,
|
||||||
|
t.depth + 1
|
||||||
|
FROM relationships r
|
||||||
|
JOIN traverse t ON (r.from_id = t.entity_id OR r.to_id = t.entity_id)
|
||||||
|
WHERE t.depth < ?
|
||||||
|
)
|
||||||
|
SELECT DISTINCT entity_id FROM traverse
|
||||||
|
```
|
||||||
|
|
||||||
|
`UNION` (not `UNION ALL`) eliminates duplicate visits and naturally handles
|
||||||
|
cycles — a node already in the traversal set is not re-visited.
|
||||||
|
|
||||||
|
After collecting node IDs, two follow-up queries fetch:
|
||||||
|
- All entity rows for those IDs
|
||||||
|
- All relationship rows where both `from_id` and `to_id` are in the node set
|
||||||
|
|
||||||
|
This ensures edges between neighbors are included even if they aren't on the
|
||||||
|
traversal path from the seed.
|
||||||
|
|
||||||
|
### `getEntityNeighbors(entityIds[])`
|
||||||
|
|
||||||
|
Bulk 1-hop version designed for orchestration. Given multiple seed entity IDs
|
||||||
|
(the results of Qdrant semantic search), returns the combined 1-hop subgraph.
|
||||||
|
|
||||||
|
1. Finds all neighbor IDs via one query using `IN (...)` on both `from_id` and `to_id`
|
||||||
|
2. Deduplicates seeds + neighbors using a JavaScript `Set`
|
||||||
|
3. Fetches all entity rows and all relationship rows within the combined node set
|
||||||
|
|
||||||
|
This is intentionally simpler than the recursive version — orchestration always
|
||||||
|
uses depth=1, and the bulk query avoids N separate CTE calls.
|
||||||
|
|
||||||
|
## Graph-Aware Context Assembly
|
||||||
|
|
||||||
|
Orchestration's `assembleContext` (in `src/chat/index.js`) integrates the
|
||||||
|
graph at step 7 of the chat pipeline:
|
||||||
|
|
||||||
|
1. Qdrant entity search returns up to `ORCHESTRATION.ENTITIES_LIMIT` results, each including `r.id` (the SQLite entity ID) alongside the Qdrant payload
|
||||||
|
2. `graph.getNeighbors(entityIds)` is called with those IDs → `POST /graph/neighbors` on memory-service
|
||||||
|
3. The returned `{ nodes, edges }` is passed to `formatGraphContext()`
|
||||||
|
4. On failure, falls back to using the Qdrant payload data directly as flat nodes with no edges
|
||||||
|
|
||||||
|
### Prompt Format
|
||||||
|
|
||||||
|
`formatGraphContext(nodes, edges)` in `chat/index.js` formats the subgraph as:
|
||||||
|
|
||||||
|
```
|
||||||
|
Here is what you know about entities relevant to this conversation and their connections:
|
||||||
|
- Alice (person): software engineer working on NexusAI
|
||||||
|
→ works_on NexusAI (project)
|
||||||
|
→ knows Bob (person)
|
||||||
|
- NexusAI (project): AI assistant framework
|
||||||
|
- Bob (person): Alice's colleague
|
||||||
|
```
|
||||||
|
|
||||||
|
- One line per node: `- {name} ({type}): {notes}`
|
||||||
|
- Outbound edges indented below: ` → {label} {target_name} ({target_type})`
|
||||||
|
- Nodes with only inbound edges (pulled in as neighbors) appear without connection lines
|
||||||
|
- Only outbound edges are shown — each relationship appears once, from the `from_id` side
|
||||||
|
|
||||||
|
## Project Scoping
|
||||||
|
|
||||||
|
The knowledge graph respects project boundaries at the **entry point**, not
|
||||||
|
during traversal:
|
||||||
|
|
||||||
|
- Qdrant entity search is filtered by `projectId` — only entities tagged with this project are returned as seeds
|
||||||
|
- Graph traversal in SQLite is unfiltered — neighbors can be from any project or no project
|
||||||
|
- This is intentional: the graph entry is project-scoped, but traversal follows the global relationship graph to discover connected knowledge
|
||||||
|
|
||||||
|
Entities are tagged with `projectId` in the Qdrant payload at extraction time.
|
||||||
|
Entities extracted from non-project sessions have `projectId: null` and only
|
||||||
|
appear in unfiltered global searches.
|
||||||
|
|
||||||
|
## API Reference
|
||||||
|
|
||||||
|
### `GET /graph/neighborhood/:entityId`
|
||||||
|
|
||||||
|
Returns the neighborhood of a single entity.
|
||||||
|
|
||||||
|
**Query params:**
|
||||||
|
|
||||||
|
| Param | Default | Max | Description |
|
||||||
|
|---|---|---|---|
|
||||||
|
| `depth` | `ENTITIES.GRAPH_HOP_DEPTH` (1) | 3 | Traversal depth |
|
||||||
|
|
||||||
|
**Response:**
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"entity": { "id": 5, "name": "Alice", "type": "person", "notes": "...", "mention_count": 4 },
|
||||||
|
"neighborhood": {
|
||||||
|
"nodes": [
|
||||||
|
{ "id": 5, "name": "Alice", "type": "person", "notes": "..." },
|
||||||
|
{ "id": 8, "name": "NexusAI", "type": "project", "notes": "..." }
|
||||||
|
],
|
||||||
|
"edges": [
|
||||||
|
{ "id": 2, "from_id": 5, "to_id": 8, "label": "works_on", "notes": "...", "mention_count": 3 }
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Returns 404 if the entity does not exist.
|
||||||
|
|
||||||
|
### `POST /graph/neighbors`
|
||||||
|
|
||||||
|
Bulk 1-hop neighborhood for a set of entity IDs. Used internally by
|
||||||
|
orchestration — not intended for direct client use.
|
||||||
|
|
||||||
|
**Request body:**
|
||||||
|
```json
|
||||||
|
{ "entityIds": [5, 8, 12] }
|
||||||
|
```
|
||||||
|
|
||||||
|
**Response:**
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"nodes": [ ...entity objects... ],
|
||||||
|
"edges": [ ...relationship objects... ]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Returns 400 if `entityIds` is missing or empty.
|
||||||
|
|
||||||
|
## Constants (`packages/shared/src/config/constants.js`)
|
||||||
|
|
||||||
|
| Constant | Value | Description |
|
||||||
|
|---|---|---|
|
||||||
|
| `ENTITIES.PROMOTION_THRESHOLD` | 3 | `mention_count` at which an entity is considered well-established |
|
||||||
|
| `ENTITIES.GRAPH_HOP_DEPTH` | 1 | Default traversal depth for neighborhood queries |
|
||||||
|
| `ORCHESTRATION.ENTITIES_LIMIT` | 5 | Max entity seeds returned from Qdrant search |
|
||||||
|
| `ORCHESTRATION.ENTITIES_THRESHOLD` | 0.55 | Minimum similarity score for entity Qdrant search |
|
||||||
@@ -9,8 +9,8 @@
|
|||||||
|
|
||||||
Responsible for all reading and writing of long-term memory. Acts as the
|
Responsible for all reading and writing of long-term memory. Acts as the
|
||||||
sole interface to both SQLite and Qdrant — no other service accesses these
|
sole interface to both SQLite and Qdrant — no other service accesses these
|
||||||
stores directly. On episode creation, automatically calls the embedding
|
stores directly. On episode creation, automatically triggers entity and
|
||||||
service to generate and store a vector in Qdrant.
|
relationship extraction and embeds results into Qdrant.
|
||||||
|
|
||||||
## Dependencies
|
## Dependencies
|
||||||
|
|
||||||
@@ -45,19 +45,22 @@ src/
|
|||||||
├── semantic/
|
├── semantic/
|
||||||
│ └── index.js # Qdrant collection management, upsert, search, delete
|
│ └── index.js # Qdrant collection management, upsert, search, delete
|
||||||
├── entities/
|
├── entities/
|
||||||
│ ├── index.js # Entity + relationship CRUD
|
│ ├── index.js # Entity + relationship CRUD (upsert, mention tracking)
|
||||||
│ └── extraction.js # Automatic entity extraction via qwen2.5:3b on Ollama
|
│ └── extraction.js # Automatic entity + relationship extraction via qwen2.5:3b
|
||||||
|
├── graph/
|
||||||
|
│ └── index.js # Knowledge graph traversal (neighborhood queries, recursive CTE)
|
||||||
└── index.js # Express app + all route definitions
|
└── index.js # Express app + all route definitions
|
||||||
```
|
```
|
||||||
|
|
||||||
## SQLite Schema
|
## SQLite Schema
|
||||||
|
|
||||||
Seven core tables:
|
Eight core tables:
|
||||||
|
|
||||||
- **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
|
- **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
|
||||||
- **episodes** — individual exchanges (user message + AI response) tied to a session
|
- **episodes** — individual exchanges (user message + AI response) tied to a session
|
||||||
- **entities** — named things the system learns about (people, places, concepts)
|
- **entities** — named things the system learns about (people, places, concepts, etc.). Fields include `mention_count`, `confidence`, `source`, `last_seen_at`
|
||||||
- **relationships** — directional labeled links between entities
|
- **relationships** — directional labeled links between entities (`from_id`, `to_id`, `label`). Fields include `mention_count`, `notes`
|
||||||
|
- **entity_episodes** — join table linking entities to the episodes where they were extracted. Used for provenance and orphan cleanup
|
||||||
- **summaries** — condensed episode groups for efficient context retrieval
|
- **summaries** — condensed episode groups for efficient context retrieval
|
||||||
- **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`, `notes`, `system_prompt`
|
- **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`, `notes`, `system_prompt`
|
||||||
|
|
||||||
@@ -73,10 +76,18 @@ try { db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(proje
|
|||||||
try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
|
try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
|
||||||
try { db.exec(`ALTER TABLE projects ADD COLUMN notes TEXT`); } catch {}
|
try { db.exec(`ALTER TABLE projects ADD COLUMN notes TEXT`); } catch {}
|
||||||
try { db.exec(`ALTER TABLE projects ADD COLUMN system_prompt TEXT`); } catch {}
|
try { db.exec(`ALTER TABLE projects ADD COLUMN system_prompt TEXT`); } catch {}
|
||||||
|
// Knowledge graph columns:
|
||||||
|
try { db.exec(`ALTER TABLE entities ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
|
||||||
|
try { db.exec(`ALTER TABLE entities ADD COLUMN confidence REAL NOT NULL DEFAULT 1.0`) } catch {}
|
||||||
|
try { db.exec(`ALTER TABLE entities ADD COLUMN source TEXT NOT NULL DEFAULT 'extraction'`) } catch {}
|
||||||
|
try { db.exec(`ALTER TABLE entities ADD COLUMN last_seen_at INTEGER`) } catch {}
|
||||||
|
try { db.exec(`ALTER TABLE relationships ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
|
||||||
|
try { db.exec(`ALTER TABLE relationships ADD COLUMN notes TEXT`) } catch {}
|
||||||
```
|
```
|
||||||
|
|
||||||
New migrations are always appended here — never modify the schema file for
|
`entity_episodes` is defined in `schema.js` itself (not a migration) since it is a new table.
|
||||||
existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
|
|
||||||
|
New migrations are always appended — never modify the schema file for existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
|
||||||
|
|
||||||
### FTS5 Full-Text Search
|
### FTS5 Full-Text Search
|
||||||
|
|
||||||
@@ -117,8 +128,7 @@ defined in `@nexusai/shared` — not hardcoded here.
|
|||||||
|
|
||||||
`initCollections()` iterates `Object.values(COLLECTIONS)` and creates any
|
`initCollections()` iterates `Object.values(COLLECTIONS)` and creates any
|
||||||
collection that doesn't already exist at startup — all three collections are
|
collection that doesn't already exist at startup — all three collections are
|
||||||
guaranteed to exist before any requests are handled, avoiding race conditions
|
guaranteed to exist before any requests are handled.
|
||||||
between the first entity embed and an entity search.
|
|
||||||
|
|
||||||
Each collection exposes upsert, search (with optional Qdrant filter), and
|
Each collection exposes upsert, search (with optional Qdrant filter), and
|
||||||
delete operations. The `wait: true` flag is used on all writes.
|
delete operations. The `wait: true` flag is used on all writes.
|
||||||
@@ -143,15 +153,27 @@ saved and searchable via FTS. The error is logged but not surfaced.
|
|||||||
Entities and relationships use upsert semantics with composite unique
|
Entities and relationships use upsert semantics with composite unique
|
||||||
constraints to prevent duplicates:
|
constraints to prevent duplicates:
|
||||||
|
|
||||||
- `UNIQUE(name, type)` on entities
|
- `UNIQUE(name, type)` on entities — conflict increments `mention_count` and updates `last_seen_at`
|
||||||
- `UNIQUE(from_id, to_id, label)` on relationships
|
- `UNIQUE(from_id, to_id, label)` on relationships — conflict increments `mention_count` and preserves existing `notes`
|
||||||
- `ON DELETE CASCADE` on relationship foreign keys
|
- `ON DELETE CASCADE` on relationship foreign keys
|
||||||
|
|
||||||
After each episode is saved, `extraction.js` automatically extracts named
|
After each episode is saved, `extraction.js` automatically extracts named
|
||||||
entities from the conversation using `qwen2.5:3b` on Ollama — fire-and-forget.
|
entities **and relationships** from the conversation using `qwen2.5:3b` on
|
||||||
|
Ollama — fire-and-forget. Each saved entity is also linked to the episode
|
||||||
|
via the `entity_episodes` join table.
|
||||||
|
|
||||||
> For full details on the extraction pipeline, prompt format, constrained
|
> For full details on the extraction pipeline and JSON format, see `entity-extraction.md`.
|
||||||
> decoding, stoplist, and Qdrant storage, see `entity-extraction.md`.
|
> For the knowledge graph traversal layer, see `knowledge-graph.md`.
|
||||||
|
|
||||||
|
## Knowledge Graph Layer
|
||||||
|
|
||||||
|
`src/graph/index.js` provides SQLite-based graph traversal over the entities
|
||||||
|
and relationships tables. Two functions are exposed via HTTP:
|
||||||
|
|
||||||
|
- **`getNeighborhood(entityId, depth)`** — recursive CTE traversal, bidirectional, returns `{ nodes, edges }`
|
||||||
|
- **`getEntityNeighbors(entityIds[])`** — bulk 1-hop traversal for orchestration context assembly
|
||||||
|
|
||||||
|
> For design rationale, traversal queries, and integration with orchestration, see `knowledge-graph.md`.
|
||||||
|
|
||||||
## Summaries Layer
|
## Summaries Layer
|
||||||
|
|
||||||
|
|||||||
@@ -42,9 +42,10 @@ src/
|
|||||||
│ ├── inference.js # HTTP client for inference service
|
│ ├── inference.js # HTTP client for inference service
|
||||||
│ ├── embedding.js # HTTP client for embedding service
|
│ ├── embedding.js # HTTP client for embedding service
|
||||||
│ ├── qdrant.js # HTTP client for Qdrant (direct vector search)
|
│ ├── qdrant.js # HTTP client for Qdrant (direct vector search)
|
||||||
|
│ ├── graph.js # HTTP client for memory-service graph endpoints
|
||||||
│ └── summarization.js # Session summarisation — triggers after each episode
|
│ └── summarization.js # Session summarisation — triggers after each episode
|
||||||
├── chat/
|
├── chat/
|
||||||
│ └── index.js # Core pipeline — context assembly, isolation, auto-naming
|
│ └── index.js # Core pipeline — context assembly, graph expansion, auto-naming
|
||||||
├── config/
|
├── config/
|
||||||
│ └── settings.js # Settings load/save — reads/writes data/settings.json
|
│ └── settings.js # Settings load/save — reads/writes data/settings.json
|
||||||
├── routes/
|
├── routes/
|
||||||
@@ -71,7 +72,9 @@ via `appSettings.load()` — changes apply immediately without a service restart
|
|||||||
|---|---|---|
|
|---|---|---|
|
||||||
| `recentEpisodeLimit` | 5 | Recent episodes injected into prompt |
|
| `recentEpisodeLimit` | 5 | Recent episodes injected into prompt |
|
||||||
| `semanticLimit` | 5 | Semantic search results injected into prompt |
|
| `semanticLimit` | 5 | Semantic search results injected into prompt |
|
||||||
| `scoreThreshold` | 0.75 | Minimum similarity score for semantic results |
|
| `scoreThreshold` | 0.5 | Minimum similarity score for Qdrant semantic results |
|
||||||
|
| `semanticWeight` | 1.0 | RRF weight for Qdrant semantic results |
|
||||||
|
| `keywordWeight` | 0 | RRF weight for FTS5 keyword results (`0` = disabled) |
|
||||||
| `modelsFolderPath` | `/mnt/nexus-models` | Path to folder containing .gguf files |
|
| `modelsFolderPath` | `/mnt/nexus-models` | Path to folder containing .gguf files |
|
||||||
| `temperature` | 0.7 | Inference temperature |
|
| `temperature` | 0.7 | Inference temperature |
|
||||||
| `repeatPenalty` | 1.1 | Repeat token penalty |
|
| `repeatPenalty` | 1.1 | Repeat token penalty |
|
||||||
@@ -100,24 +103,35 @@ difference is how the inference response is delivered to the client.
|
|||||||
|
|
||||||
4. **Recent episode retrieval** — fetch most recent episodes (`recentEpisodeLimit`).
|
4. **Recent episode retrieval** — fetch most recent episodes (`recentEpisodeLimit`).
|
||||||
|
|
||||||
5. **Semantic search** — embed user message, query Qdrant for similar past
|
5. **Fused episode retrieval** — runs semantic (Qdrant) and keyword (FTS5)
|
||||||
episodes. Deduplicated against recent episodes. Non-critical.
|
search in parallel, then merges results via Reciprocal Rank Fusion (RRF).
|
||||||
|
Both paths are filtered against `recentIds` before fusion. FTS is scoped
|
||||||
|
to the current session or all project sessions. If `keywordWeight` is `0`,
|
||||||
|
the FTS call is skipped entirely. Non-critical — failures fall back to
|
||||||
|
whichever strategy succeeded.
|
||||||
|
|
||||||
6. **Entity search** — query `entities` Qdrant collection filtered by
|
6. **Entity search** — query `entities` Qdrant collection filtered by
|
||||||
`projectId`. Non-project sessions receive no entity context. Non-critical.
|
`projectId`. Returns entity IDs alongside Qdrant payload data (the Qdrant
|
||||||
|
point ID equals the SQLite entity ID). Non-critical.
|
||||||
|
|
||||||
7. **Prompt assembly** — combine system prompt, entity context, semantic
|
7. **Graph neighborhood expansion** — call `POST /graph/neighbors` on
|
||||||
episodes, recent episodes, and user message.
|
memory-service with the entity IDs from step 6. Returns a 1-hop subgraph
|
||||||
|
`{ nodes, edges }` — entity objects plus the relationships connecting them.
|
||||||
|
If no entities were found or the graph call fails, falls back to flat entity
|
||||||
|
list (no edges). Non-critical.
|
||||||
|
|
||||||
8. **Inference** — send to inference service. `/chat` awaits full response;
|
8. **Prompt assembly** — combine system prompt, graph context, fused episodes,
|
||||||
|
recent episodes, and user message.
|
||||||
|
|
||||||
|
9. **Inference** — send to inference service. `/chat` awaits full response;
|
||||||
`/chat/stream` pipes SSE chunks to the client.
|
`/chat/stream` pipes SSE chunks to the client.
|
||||||
|
|
||||||
9. **Episode write** — write exchange back to memory with `projectId`.
|
10. **Episode write** — write exchange back to memory with `projectId`.
|
||||||
|
|
||||||
10. **Summarisation trigger** — `triggerSummary(session, allEpisodes)` called
|
11. **Summarisation trigger** — `triggerSummary(session, allEpisodes)` called
|
||||||
fire-and-forget. See `summarization.md` for full details.
|
fire-and-forget. See `summarization.md` for full details.
|
||||||
|
|
||||||
11. **Auto-naming** — on first message with no session name, fires a secondary
|
12. **Auto-naming** — on first message with no session name, fires a secondary
|
||||||
inference call (max 20 tokens, temperature 0.3) to generate a session name.
|
inference call (max 20 tokens, temperature 0.3) to generate a session name.
|
||||||
|
|
||||||
### Prompt Structure
|
### Prompt Structure
|
||||||
@@ -125,8 +139,9 @@ difference is how the inference response is delivered to the client.
|
|||||||
```
|
```
|
||||||
[Resolved system prompt]
|
[Resolved system prompt]
|
||||||
|
|
||||||
Here is what you know about entities relevant to this conversation:
|
Here is what you know about entities relevant to this conversation and their connections:
|
||||||
- {name} ({type}): {notes}
|
- {name} ({type}): {notes}
|
||||||
|
→ {label} {neighbor_name} ({neighbor_type})
|
||||||
---
|
---
|
||||||
Here are some relevant memories from earlier conversations:
|
Here are some relevant memories from earlier conversations:
|
||||||
User: {past user message}
|
User: {past user message}
|
||||||
@@ -141,6 +156,12 @@ User: {current message}
|
|||||||
Assistant:
|
Assistant:
|
||||||
```
|
```
|
||||||
|
|
||||||
|
The entity block renders the full graph neighborhood — seed entities matched
|
||||||
|
by Qdrant search plus any neighbors pulled in by 1-hop traversal. Each entity
|
||||||
|
shows its `notes` and any outbound relationships with their targets. Neighbor
|
||||||
|
nodes that have no outbound edges within the subgraph appear without connection
|
||||||
|
lines.
|
||||||
|
|
||||||
## Summarisation
|
## Summarisation
|
||||||
|
|
||||||
After each episode write, `triggerSummary` is called fire-and-forget. It
|
After each episode write, `triggerSummary` is called fire-and-forget. It
|
||||||
@@ -199,4 +220,7 @@ handle /health* { reverse_proxy localhost:4000 }
|
|||||||
|
|
||||||
After updating: `caddy reload --config /path/to/Caddyfile`
|
After updating: `caddy reload --config /path/to/Caddyfile`
|
||||||
|
|
||||||
|
> Note: `/graph` routes are on the memory-service (port 3002) and are called
|
||||||
|
> internally by orchestration — they do not need a Caddy entry.
|
||||||
|
|
||||||
For all HTTP endpoints, see `api-routes.md`.
|
For all HTTP endpoints, see `api-routes.md`.
|
||||||
153
docs/services/retrieval-fusion.md
Normal file
153
docs/services/retrieval-fusion.md
Normal file
@@ -0,0 +1,153 @@
|
|||||||
|
# Retrieval Fusion
|
||||||
|
|
||||||
|
**Implementation:** `packages/orchestration-service/src/chat/index.js`
|
||||||
|
**FTS scoping:** `packages/memory-service/src/episodic/index.js`, `src/index.js`
|
||||||
|
**Settings:** `semanticWeight`, `keywordWeight` via `PATCH /settings`
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
Rather than relying solely on Qdrant vector similarity (which finds semantically
|
||||||
|
related content but misses exact keyword matches) or FTS5 keyword search alone
|
||||||
|
(which finds exact matches but not paraphrases), Reciprocal Rank Fusion (RRF)
|
||||||
|
merges the ranked results from both strategies into a single better-ranked list.
|
||||||
|
|
||||||
|
Episodes that rank highly in **both** lists score highest. An episode that is
|
||||||
|
the top semantic match but irrelevant by keyword, or vice versa, scores lower
|
||||||
|
than one that satisfies both.
|
||||||
|
|
||||||
|
## How RRF Works
|
||||||
|
|
||||||
|
For each episode `d`, its fused score is:
|
||||||
|
|
||||||
|
```
|
||||||
|
RRF(d) = w_semantic / (k + rank_semantic(d))
|
||||||
|
+ w_keyword / (k + rank_keyword(d))
|
||||||
|
```
|
||||||
|
|
||||||
|
- `rank_i(d)` — 1-based position in that strategy's result list (episode absent from a list contributes 0 for that term)
|
||||||
|
- `k = 60` — smoothing constant (standard; not exposed in settings)
|
||||||
|
- `w_semantic`, `w_keyword` — user-tunable weights (both default-sourced from `RETRIEVAL` constants)
|
||||||
|
|
||||||
|
Setting a weight to `0` removes that strategy's contribution entirely. Setting
|
||||||
|
`keywordWeight` to `0` also short-circuits the FTS network call.
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
Fusion lives in orchestration — the service already coordinates multiple data
|
||||||
|
sources, and fusion is a retrieval strategy, not a storage concern.
|
||||||
|
|
||||||
|
```
|
||||||
|
getFusedEpisodes()
|
||||||
|
├── getSemanticEpisodes() — Qdrant embed+search → fetch full rows by ID
|
||||||
|
│ (existing path, unchanged)
|
||||||
|
└── getFTSResults() — memory-service /episodes/search → full rows directly
|
||||||
|
(skipped entirely if keywordWeight == 0)
|
||||||
|
↓
|
||||||
|
fuseEpisodeResults() — pure RRF, no I/O
|
||||||
|
↓
|
||||||
|
fusedEpisodes[] — top semanticLimit episodes by RRF score
|
||||||
|
```
|
||||||
|
|
||||||
|
### Data Shape Consistency
|
||||||
|
|
||||||
|
Both sides must enter fusion as `Episode[]` — full SQLite row objects with
|
||||||
|
the same shape — and both must be filtered against `recentIds` first:
|
||||||
|
|
||||||
|
- **Semantic path**: `recentIds` filter applied before `getEpisodeById` fetch (existing behaviour)
|
||||||
|
- **FTS path**: full rows returned directly; `recentIds` filter applied in `getFusedEpisodes` after receiving them
|
||||||
|
|
||||||
|
FTS requests `semanticLimit * 2` results to provide headroom for the
|
||||||
|
`recentIds` filter without under-serving the fusion.
|
||||||
|
|
||||||
|
## FTS Session Scoping
|
||||||
|
|
||||||
|
Without scoping, FTS5 searches across all episodes in the database. For
|
||||||
|
context assembly, results must be constrained to the current session or
|
||||||
|
project session pool — the same scope used for Qdrant semantic search.
|
||||||
|
|
||||||
|
`searchEpisodes(query, limit, sessionIds)` in memory-service accepts an
|
||||||
|
optional `sessionIds` array. When provided, the SQL becomes:
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT e.* FROM episodes e
|
||||||
|
JOIN episodes_fts fts ON e.id = fts.rowid
|
||||||
|
WHERE episodes_fts MATCH ?
|
||||||
|
AND e.session_id IN (?, ?, ...)
|
||||||
|
ORDER BY rank
|
||||||
|
LIMIT ?
|
||||||
|
```
|
||||||
|
|
||||||
|
The HTTP endpoint `GET /episodes/search` accepts `sessionIds` as a
|
||||||
|
comma-separated query param: `?q=hello&sessionIds=1,2,3`.
|
||||||
|
|
||||||
|
In orchestration, `ftsSessionIds` is set to:
|
||||||
|
- `projectSessionIds` (all sessions in the project) — if the session belongs to a project
|
||||||
|
- `[session.id]` — otherwise (single session only)
|
||||||
|
|
||||||
|
This mirrors the Qdrant scoping logic exactly.
|
||||||
|
|
||||||
|
## `fuseEpisodeResults` — Implementation Detail
|
||||||
|
|
||||||
|
```js
|
||||||
|
function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
|
||||||
|
const k = RETRIEVAL.RRF_K; // 60
|
||||||
|
const scores = new Map(); // episode.id → { episode, score }
|
||||||
|
|
||||||
|
// Score semantic results (already filtered against recentIds)
|
||||||
|
semanticEps.forEach((ep, i) => {
|
||||||
|
scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
|
||||||
|
});
|
||||||
|
|
||||||
|
// Score + merge keyword results (already filtered against recentIds)
|
||||||
|
keywordEps.forEach((ep, i) => {
|
||||||
|
const contrib = keywordWeight / (k + i + 1);
|
||||||
|
if (scores.has(ep.id)) {
|
||||||
|
scores.get(ep.id).score += contrib; // appears in both — sum scores
|
||||||
|
} else if (contrib > 0) {
|
||||||
|
scores.set(ep.id, { episode: ep, score: contrib }); // FTS-only episode
|
||||||
|
}
|
||||||
|
// contrib == 0 (keywordWeight: 0) → episode not added (guard prevents score-0 bleed-through)
|
||||||
|
});
|
||||||
|
|
||||||
|
return [...scores.values()]
|
||||||
|
.sort((a, b) => b.score - a.score)
|
||||||
|
.slice(0, limit)
|
||||||
|
.map(({ episode }) => episode);
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
The `else if (contrib > 0)` guard prevents FTS-only episodes from entering
|
||||||
|
the result set with a score of 0 when `keywordWeight` is 0 — verified by
|
||||||
|
the test suite.
|
||||||
|
|
||||||
|
## Settings
|
||||||
|
|
||||||
|
| Setting | Default | Range | Description |
|
||||||
|
|---|---|---|---|
|
||||||
|
| `semanticWeight` | 1.0 | 0–5 | Weight applied to Qdrant semantic results |
|
||||||
|
| `keywordWeight` | 0 | 0–5 | Weight applied to FTS5 keyword results. `0` = disabled |
|
||||||
|
|
||||||
|
Both are readable via `GET /settings` and writable via `PATCH /settings`
|
||||||
|
without a service restart. Changes take effect on the next chat request.
|
||||||
|
|
||||||
|
**To enable keyword search:**
|
||||||
|
```bash
|
||||||
|
curl -X PATCH http://localhost:4000/settings \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"keywordWeight": 1.0}'
|
||||||
|
```
|
||||||
|
|
||||||
|
**To favour keyword matches over semantic:**
|
||||||
|
```bash
|
||||||
|
curl -X PATCH http://localhost:4000/settings \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"semanticWeight": 0.5, "keywordWeight": 2.0}'
|
||||||
|
```
|
||||||
|
|
||||||
|
## Constants (`packages/shared/src/config/constants.js`)
|
||||||
|
|
||||||
|
| Constant | Value | Description |
|
||||||
|
|---|---|---|
|
||||||
|
| `RETRIEVAL.RRF_K` | 60 | RRF smoothing constant — not exposed in settings |
|
||||||
|
| `RETRIEVAL.SEMANTIC_WEIGHT` | 1.0 | Default semantic weight |
|
||||||
|
| `RETRIEVAL.KEYWORD_WEIGHT` | 0 | Default keyword weight (off) |
|
||||||
3
package-lock.json
generated
3
package-lock.json
generated
@@ -4224,8 +4224,7 @@
|
|||||||
"dependencies": {
|
"dependencies": {
|
||||||
"@nexusai/shared": "^1.0.0",
|
"@nexusai/shared": "^1.0.0",
|
||||||
"dotenv": "^17.4.0",
|
"dotenv": "^17.4.0",
|
||||||
"express": "^5.2.1",
|
"express": "^5.2.1"
|
||||||
"ollama": "^0.6.3"
|
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"packages/inference-service": {
|
"packages/inference-service": {
|
||||||
|
|||||||
64
packages/embedding-service/CLAUDE.md
Normal file
64
packages/embedding-service/CLAUDE.md
Normal file
@@ -0,0 +1,64 @@
|
|||||||
|
# CLAUDE.md
|
||||||
|
|
||||||
|
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||||
|
|
||||||
|
See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and deployment layout.
|
||||||
|
|
||||||
|
## Running This Service
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm run embedding # From repo root
|
||||||
|
npm -w packages/embedding-service run dev # With --watch
|
||||||
|
```
|
||||||
|
|
||||||
|
Default port: **3003**. Requires Ollama to be reachable at `OLLAMA_URL`.
|
||||||
|
|
||||||
|
## Single-File Service
|
||||||
|
|
||||||
|
The entire service is `src/index.js` — no subdirectory structure. All routes, the Ollama helper, and startup are in one file.
|
||||||
|
|
||||||
|
## Environment Variables
|
||||||
|
|
||||||
|
| Variable | Default | Description |
|
||||||
|
|---|---|---|
|
||||||
|
| `PORT` | `3003` | Port to listen on |
|
||||||
|
| `OLLAMA_URL` | `http://localhost:11434` | Ollama instance URL |
|
||||||
|
| `EMBEDDING_MODEL` | `nomic-embed-text` | Model passed to Ollama `/api/embed` |
|
||||||
|
|
||||||
|
Note: the env var name is `EMBEDDING_MODEL`, not `EMBED_MODEL` — the internal constant is `EMBED_MODEL` but the lookup key is different.
|
||||||
|
|
||||||
|
## Ollama API Details
|
||||||
|
|
||||||
|
Uses Ollama's `/api/embed` endpoint (not `/api/embeddings`). Request shape:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{ "model": "nomic-embed-text", "input": "text to embed" }
|
||||||
|
```
|
||||||
|
|
||||||
|
Ollama returns `{ "embeddings": [[...]] }` — an array of arrays even for a single input. The helper takes `data.embeddings[0]` to return the single vector.
|
||||||
|
|
||||||
|
The `ollama` npm package is listed as a dependency but is **not used** — all calls are raw `fetch`. Do not refactor to use the package without checking the API shape matches.
|
||||||
|
|
||||||
|
## Batch Endpoint
|
||||||
|
|
||||||
|
`POST /embed/batch` embeds items **sequentially** in a for-loop, not in parallel. The comment explains this: Ollama doesn't parallelise embedding calls, so parallel requests would queue internally anyway. Do not change to `Promise.all` without verifying Ollama behaviour.
|
||||||
|
|
||||||
|
## Error Responses
|
||||||
|
|
||||||
|
| Condition | Status | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| Missing/empty `text` | 400 | |
|
||||||
|
| Ollama call fails | 502 | Upstream failure — correct status |
|
||||||
|
| Empty `texts` array | 400 | |
|
||||||
|
|
||||||
|
## Known Issue
|
||||||
|
|
||||||
|
The 400 error message for `/embed` reads `"text is required and must be empty"` — the word "not" is missing. Should read `"must not be empty"`.
|
||||||
|
|
||||||
|
## API Endpoints
|
||||||
|
|
||||||
|
| Method | Path | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| GET | `/health` | Static response — does not verify Ollama is reachable |
|
||||||
|
| POST | `/embed` | Body: `{ text: string }`. Returns `{ embedding, model, dimensions }` |
|
||||||
|
| POST | `/embed/batch` | Body: `{ texts: string[] }`. Returns `{ embeddings, model, dimensions, count }` |
|
||||||
@@ -9,7 +9,6 @@
|
|||||||
"dependencies": {
|
"dependencies": {
|
||||||
"@nexusai/shared": "^1.0.0",
|
"@nexusai/shared": "^1.0.0",
|
||||||
"dotenv": "^17.4.0",
|
"dotenv": "^17.4.0",
|
||||||
"express": "^5.2.1",
|
"express": "^5.2.1"
|
||||||
"ollama": "^0.6.3"
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -3,7 +3,7 @@ const express = require('express');
|
|||||||
const {getEnv, OLLAMA, PORTS, logger} = require('@nexusai/shared');
|
const {getEnv, OLLAMA, PORTS, logger} = require('@nexusai/shared');
|
||||||
|
|
||||||
const app = express();
|
const app = express();
|
||||||
app.use(express.json());
|
app.use(express.json({ limit: '1mb' })); // limit request body to 1mb to prevent abuse - embedding requests should be small
|
||||||
|
|
||||||
const PORT = getEnv('PORT', PORTS.EMBEDDING);
|
const PORT = getEnv('PORT', PORTS.EMBEDDING);
|
||||||
const OLLAMA_URL = getEnv('OLLAMA_URL', OLLAMA.DEFAULT_URL);
|
const OLLAMA_URL = getEnv('OLLAMA_URL', OLLAMA.DEFAULT_URL);
|
||||||
@@ -14,7 +14,8 @@ async function embedText(text) {
|
|||||||
const res = await fetch(`${OLLAMA_URL}/api/embed`, {
|
const res = await fetch(`${OLLAMA_URL}/api/embed`, {
|
||||||
method: 'POST',
|
method: 'POST',
|
||||||
headers: { 'Content-Type': 'application/json' },
|
headers: { 'Content-Type': 'application/json' },
|
||||||
body: JSON.stringify({ model: EMBED_MODEL, input: text })
|
body: JSON.stringify({ model: EMBED_MODEL, input: text }),
|
||||||
|
signal: AbortSignal.timeout(30_000),
|
||||||
});
|
});
|
||||||
|
|
||||||
if (!res.ok) {
|
if (!res.ok) {
|
||||||
@@ -37,7 +38,7 @@ app.get('/health', (req,res) => {
|
|||||||
app.post('/embed', async (req, res) => {
|
app.post('/embed', async (req, res) => {
|
||||||
const { text } = req.body;
|
const { text } = req.body;
|
||||||
if (!text || typeof text !== 'string' || text.trim() === '') {
|
if (!text || typeof text !== 'string' || text.trim() === '') {
|
||||||
return res.status(400).json({ error: 'text is required and must be empty' });
|
return res.status(400).json({ error: 'text is required and must not be empty' });
|
||||||
}
|
}
|
||||||
|
|
||||||
try {
|
try {
|
||||||
@@ -60,7 +61,10 @@ app.post('/embed/batch', async (req, res) => {
|
|||||||
}
|
}
|
||||||
|
|
||||||
try {
|
try {
|
||||||
//sequential embedding for now, Ollama doesn't natively parallize embeddings
|
const invalid = texts.findIndex(t => !t || typeof t !== 'string' || t.trim() === '');
|
||||||
|
if (invalid !== -1)
|
||||||
|
return res.status(400).json({ error: `texts[${invalid}] is empty or not a string` });
|
||||||
|
|
||||||
const embeddings = [];
|
const embeddings = [];
|
||||||
for (const text of texts) {
|
for (const text of texts) {
|
||||||
embeddings.push(await embedText(text.trim()));
|
embeddings.push(await embedText(text.trim()));
|
||||||
|
|||||||
75
packages/inference-service/CLAUDE.md
Normal file
75
packages/inference-service/CLAUDE.md
Normal file
@@ -0,0 +1,75 @@
|
|||||||
|
# CLAUDE.md
|
||||||
|
|
||||||
|
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||||
|
|
||||||
|
See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and deployment layout.
|
||||||
|
|
||||||
|
## Running This Service
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm run inference # From repo root
|
||||||
|
npm -w packages/inference-service run dev # With --watch
|
||||||
|
```
|
||||||
|
|
||||||
|
Default port: **3001**. Set `INFERENCE_PROVIDER` to select the backend.
|
||||||
|
|
||||||
|
## Provider Pattern
|
||||||
|
|
||||||
|
`src/infer.js` reads `INFERENCE_PROVIDER` at startup and loads one of two providers:
|
||||||
|
|
||||||
|
| `INFERENCE_PROVIDER` | Module | Backend |
|
||||||
|
|---|---|---|
|
||||||
|
| `ollama` (default) | `src/providers/ollama.js` | Ollama npm client → `/api/generate` |
|
||||||
|
| `llamacpp` | `src/providers/llamacpp.js` | Raw fetch → `/v1/chat/completions` (OpenAI-compatible) |
|
||||||
|
|
||||||
|
An unknown provider throws immediately at startup — fail-fast, not at request time.
|
||||||
|
|
||||||
|
Both providers export the same interface: `complete(prompt, options)` and `completeStream(prompt, options)`.
|
||||||
|
|
||||||
|
## Environment Variables
|
||||||
|
|
||||||
|
| Variable | Default | Description |
|
||||||
|
|---|---|---|
|
||||||
|
| `PORT` | `3001` | Port to listen on |
|
||||||
|
| `INFERENCE_PROVIDER` | `ollama` | `ollama` or `llamacpp` |
|
||||||
|
| `INFERENCE_URL` | `http://localhost:11434` (Ollama) / `http://localhost:8080` (llama.cpp) | Backend URL |
|
||||||
|
| `DEFAULT_MODEL` | Provider-specific | Model name passed to backend |
|
||||||
|
|
||||||
|
`INFERENCE_URL` defaults differ per provider — Ollama uses the Ollama default URL, llama.cpp uses the llama-server default.
|
||||||
|
|
||||||
|
## Options Resolution
|
||||||
|
|
||||||
|
Both providers use `resolveOptions(options)` to merge caller-supplied options with `INFERENCE_DEFAULTS` from shared constants. Any option not supplied by the caller falls back to the constant.
|
||||||
|
|
||||||
|
## Streaming Chunk Format
|
||||||
|
|
||||||
|
The two providers yield differently shaped chunks — the route in `src/routes/inference.js` normalises them:
|
||||||
|
|
||||||
|
**Ollama** yields raw Ollama generate chunks: `{ response, done, model, eval_count, prompt_eval_count, ... }`
|
||||||
|
|
||||||
|
**llama.cpp** yields:
|
||||||
|
- Per-token: `{ response: delta, done: false }`
|
||||||
|
- Final: `{ response: '', done: true, model, tokenCount }` — token count is the sum of `completion_tokens + prompt_tokens` from the usage chunk
|
||||||
|
|
||||||
|
The route checks `chunk.response` to stream text and `chunk.done` to capture metadata. For Ollama streaming, **token count is not captured** — the done chunk from Ollama contains `eval_count`/`prompt_eval_count` but the route only reads `chunk.tokenCount` (a llama.cpp field). Ollama streaming calls always report `tokenCount: 0` to the client.
|
||||||
|
|
||||||
|
## Known Issue: `maxTokens` Missing from Streaming Route
|
||||||
|
|
||||||
|
`POST /complete` correctly destructures `maxTokens` from the request body and passes it through. `POST /complete/stream` does **not** — it omits `maxTokens` from its destructuring, so streaming completions always use `INFERENCE_DEFAULTS.MAX_TOKENS` regardless of what the caller sends. This means `/chat/stream` has a different effective token ceiling than `/chat`.
|
||||||
|
|
||||||
|
## SSE Format (route → caller)
|
||||||
|
|
||||||
|
```
|
||||||
|
data: {"response":"Hello"} ← per token
|
||||||
|
data: {"response":" world"}
|
||||||
|
data: {"done":true,"model":"...","tokenCount":42} ← final metadata
|
||||||
|
data: [DONE] ← sentinel
|
||||||
|
```
|
||||||
|
|
||||||
|
## API Endpoints
|
||||||
|
|
||||||
|
| Method | Path | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| GET | `/health` | Returns `{ service, status, provider, model }` |
|
||||||
|
| POST | `/complete` | Body: `{ prompt, model?, temperature?, maxTokens?, topP?, topK?, repeatPenalty? }` |
|
||||||
|
| POST | `/complete/stream` | Same body as `/complete` except `maxTokens` is silently ignored |
|
||||||
@@ -4,7 +4,7 @@ const {getEnv, PORTS, OLLAMA, logger} = require('@nexusai/shared');
|
|||||||
const inferenceRouter = require('./routes/inference');
|
const inferenceRouter = require('./routes/inference');
|
||||||
|
|
||||||
const app = express();
|
const app = express();
|
||||||
app.use(express.json());
|
app.use(express.json({ limit: '8mb' })); // prompts include full context window
|
||||||
|
|
||||||
const PORT = getEnv('PORT', PORTS.INFERENCE);
|
const PORT = getEnv('PORT', PORTS.INFERENCE);
|
||||||
const PROVIDER = getEnv('INFERENCE_PROVIDER', 'ollama');
|
const PROVIDER = getEnv('INFERENCE_PROVIDER', 'ollama');
|
||||||
|
|||||||
@@ -57,8 +57,17 @@ async function* completeStream(prompt, options = {} ) {
|
|||||||
});
|
});
|
||||||
|
|
||||||
for await (const chunk of stream) {
|
for await (const chunk of stream) {
|
||||||
|
if (chunk.done) {
|
||||||
|
yield {
|
||||||
|
response: '',
|
||||||
|
done: true,
|
||||||
|
model: chunk.model,
|
||||||
|
tokenCount: (chunk.eval_count ?? 0) + (chunk.prompt_eval_count ?? 0),
|
||||||
|
};
|
||||||
|
} else {
|
||||||
yield chunk;
|
yield chunk;
|
||||||
}
|
}
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
module.exports = { complete, completeStream };
|
module.exports = { complete, completeStream };
|
||||||
@@ -23,7 +23,7 @@ router.post('/complete', async (req, res) => {
|
|||||||
|
|
||||||
// Streaming completion endpoint - sends partial responses as they arrive
|
// Streaming completion endpoint - sends partial responses as they arrive
|
||||||
router.post('/complete/stream', async (req, res) => {
|
router.post('/complete/stream', async (req, res) => {
|
||||||
const { prompt, model, temperature, topP, topK, repeatPenalty } = req.body;
|
const { prompt, model, temperature, maxTokens, topP, topK, repeatPenalty } = req.body;
|
||||||
|
|
||||||
if (!prompt) return res.status(400).json({ error: 'prompt is required' });
|
if (!prompt) return res.status(400).json({ error: 'prompt is required' });
|
||||||
|
|
||||||
@@ -35,7 +35,7 @@ router.post('/complete/stream', async (req, res) => {
|
|||||||
let lastModel = model;
|
let lastModel = model;
|
||||||
let tokenCount = 0;
|
let tokenCount = 0;
|
||||||
|
|
||||||
for await (const chunk of completeStream(prompt, { model, temperature, topP, topK, repeatPenalty })) {
|
for await (const chunk of completeStream(prompt, { model, temperature, maxTokens,topP, topK, repeatPenalty })) {
|
||||||
if (chunk.response) {
|
if (chunk.response) {
|
||||||
res.write(`data: ${JSON.stringify({ response: chunk.response })}\n\n`);
|
res.write(`data: ${JSON.stringify({ response: chunk.response })}\n\n`);
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -19,15 +19,26 @@ Default port: **3002**. Requires Qdrant and the embedding-service to be reachabl
|
|||||||
|
|
||||||
- `sessions` and `episodes` are linked by FK with cascade delete — deleting a session removes all its episodes automatically.
|
- `sessions` and `episodes` are linked by FK with cascade delete — deleting a session removes all its episodes automatically.
|
||||||
- `episodes_fts` is an FTS5 virtual table that mirrors `user_message` and `ai_response`. It is kept in sync via SQL triggers on INSERT/UPDATE/DELETE. On service startup, the FTS index is fully rebuilt from live episode data.
|
- `episodes_fts` is an FTS5 virtual table that mirrors `user_message` and `ai_response`. It is kept in sync via SQL triggers on INSERT/UPDATE/DELETE. On service startup, the FTS index is fully rebuilt from live episode data.
|
||||||
- Several columns (`sessions.name`, `sessions.project_id`, `projects.isolated`, etc.) were added as migrations using `ALTER TABLE` wrapped in individual try-catch blocks. Failures are silently swallowed — if a column already exists, the alter fails and the service continues. The `idx_summaries_project` index is defined twice (benign duplicate).
|
- Several columns (`sessions.name`, `sessions.project_id`, `entities.mention_count`, etc.) were added as migrations using `ALTER TABLE` wrapped in individual try-catch blocks. Failures are silently swallowed — if a column already exists, the alter fails and the service continues. The `idx_summaries_project` index is defined twice (benign duplicate).
|
||||||
- `summaries` rows with `session_id IS NULL` and a `project_id` represent project-level overviews, not session summaries. This distinction is how `GET /projects/:id/overview` works.
|
- `summaries` rows with `session_id IS NULL` and a `project_id` represent project-level overviews, not session summaries. This distinction is how `GET /projects/:id/overview` works.
|
||||||
|
- `entity_episodes` is a join table linking entities to the episodes where they were first extracted. Used for provenance tracking and future orphan cleanup. Defined in `schema.js` (not a migration), so it exists on all installs.
|
||||||
|
|
||||||
|
**New columns on `entities` (added via migration):**
|
||||||
|
- `mention_count INTEGER DEFAULT 1` — incremented every time this entity is re-extracted
|
||||||
|
- `confidence REAL DEFAULT 1.0` — reserved for future confidence scoring
|
||||||
|
- `source TEXT DEFAULT 'extraction'` — `'extraction'` or `'manual'`
|
||||||
|
- `last_seen_at INTEGER` — Unix timestamp of most recent extraction hit
|
||||||
|
|
||||||
|
**New columns on `relationships` (added via migration):**
|
||||||
|
- `mention_count INTEGER DEFAULT 1` — incremented every time this edge is re-extracted
|
||||||
|
- `notes TEXT` — relationship context sentence from extraction
|
||||||
|
|
||||||
## Async Pipeline: Episode Creation
|
## Async Pipeline: Episode Creation
|
||||||
|
|
||||||
`POST /episodes` returns a 201 as soon as the SQLite insert succeeds. Two background tasks run after without blocking the response:
|
`POST /episodes` returns a 201 as soon as the SQLite insert succeeds. Two background tasks run after without blocking the response:
|
||||||
|
|
||||||
1. **Embedding** — Fetches a vector from embedding-service, stores to Qdrant with `{sessionId, createdAt}` as payload metadata.
|
1. **Embedding** — Fetches a vector from embedding-service, stores to Qdrant with `{sessionId, createdAt}` as payload metadata.
|
||||||
2. **Entity extraction** — Sends the episode text to Ollama (`qwen2.5:3b`, temp 0.1, 200 tokens) and upserts any recognized entities to both SQLite and Qdrant.
|
2. **Entity + relationship extraction** — Sends the episode text to Ollama (`qwen2.5:3b`, temp 0.1, 1500 tokens) and upserts any recognized entities and relationships to both SQLite and Qdrant. Also links each entity to the episode via `entity_episodes`.
|
||||||
|
|
||||||
Both tasks catch and log errors silently. An episode can exist in SQLite with no corresponding Qdrant point if either step fails.
|
Both tasks catch and log errors silently. An episode can exist in SQLite with no corresponding Qdrant point if either step fails.
|
||||||
|
|
||||||
@@ -36,26 +47,39 @@ Both tasks catch and log errors silently. An episode can exist in SQLite with no
|
|||||||
`src/entities/extraction.js`:
|
`src/entities/extraction.js`:
|
||||||
|
|
||||||
- Fetches the last 20 known entities from SQLite before prompting the model, so the prompt can ask for name/type consistency with existing entries.
|
- Fetches the last 20 known entities from SQLite before prompting the model, so the prompt can ask for name/type consistency with existing entries.
|
||||||
- Recognized types: `person`, `place`, `project`, `technology`, `concept`, `organization` — anything else is discarded.
|
- Recognized entity types: `person`, `place`, `project`, `technology`, `concept`, `organization` — anything else is discarded.
|
||||||
- Ignores a hardcoded list of low-value names (`hello`, `thanks`, `good morning`, etc.).
|
- Ignores a hardcoded list of low-value names (`hello`, `thanks`, `good morning`, etc.).
|
||||||
- Extracts JSON using a regex (`{...}`) applied to raw model output, so surrounding prose doesn't break parsing.
|
- Extracts JSON using a regex (`{...}`) applied to raw model output, so surrounding prose doesn't break parsing.
|
||||||
- Entity upsert uses `ON CONFLICT(name, type) DO UPDATE` — preserves existing `notes` if the new extraction returns null (`COALESCE(entities.notes, excluded.notes)`).
|
- The model is asked to return both entities and relationships in a single JSON response: `{ "entities": [...], "relationships": [...] }`.
|
||||||
|
- Entity upsert uses `ON CONFLICT(name, type) DO UPDATE` — preserves existing `notes` if the new extraction returns null, increments `mention_count`, updates `last_seen_at`.
|
||||||
|
- Relationship upsert uses `ON CONFLICT(from_id, to_id, label) DO UPDATE` — increments `mention_count`, preserves existing `notes` if new is null.
|
||||||
|
- Relationships are resolved by looking up both endpoints in the `entityMap` built during entity processing — if either entity wasn't saved (filtered out or invalid type), the relationship is silently dropped.
|
||||||
- After upsert, embeds each entity as `"${name} (${type}): ${notes}"` and stores to Qdrant with `projectId` in the payload for project-scoped filtering.
|
- After upsert, embeds each entity as `"${name} (${type}): ${notes}"` and stores to Qdrant with `projectId` in the payload for project-scoped filtering.
|
||||||
|
|
||||||
|
> For full details see `docs/services/entity-extraction.md` and `docs/services/knowledge-graph.md`.
|
||||||
|
|
||||||
|
## Knowledge Graph
|
||||||
|
|
||||||
|
`src/graph/index.js` provides two SQLite traversal functions:
|
||||||
|
|
||||||
|
- **`getNeighborhood(entityId, depth)`** — Single-entity recursive CTE traversal. Bidirectional (follows edges in both directions). Returns `{ nodes: [...entities], edges: [...relationships] }`. Depth defaults to `ENTITIES.GRAPH_HOP_DEPTH` (1), max enforced to 3 at the HTTP layer.
|
||||||
|
|
||||||
|
- **`getEntityNeighbors(entityIds[])`** — Bulk 1-hop version for orchestration. Given a set of seed entity IDs, returns their immediate neighbors plus all edges within the combined node set.
|
||||||
|
|
||||||
|
The recursive CTE uses `UNION` (not `UNION ALL`) to eliminate cycles and duplicate visits automatically.
|
||||||
|
|
||||||
|
> For full design rationale and usage see `docs/services/knowledge-graph.md`.
|
||||||
|
|
||||||
## Summarization Strategy
|
## Summarization Strategy
|
||||||
|
|
||||||
`src/summarization/project.js`:
|
`src/summarization/project.js`:
|
||||||
|
|
||||||
- Preferred path: generate a project overview from existing **session-level summaries** (higher-level abstraction, shorter input).
|
- Preferred path: generate a project overview from existing **session-level summaries** (higher-level abstraction, shorter input).
|
||||||
- Fallback path: if no session summaries exist, summarize raw episodes directly (up to `SUMMARIES.MAX_PROJECT_EPISODE_LIMIT`).
|
- Fallback path: if no session summaries exist, summarize raw episodes directly (up to `SUMMARIES.MAX_PROJECT_EPISODE_LIMIT`).
|
||||||
- Both paths truncate input at `SUMMARIES.MAX_SUMMARY_CHARS` (30,000 chars) by slicing from the end (most recent content wins).
|
- Both paths truncate input at `SUMMARIES.MAX_SUMMARY_CHARS` (8,000 chars) by slicing from the end (most recent content wins).
|
||||||
- Strips ChatML tokens from the Ollama response (`<|im_start|>`, `<|im_end|>`).
|
- Strips ChatML tokens from the Ollama response (`<|im_start|>`, `<|im_end|>`).
|
||||||
- Uses temp 0.2 and `num_predict 1200`.
|
- Uses temp 0.2 and `num_predict 1200`.
|
||||||
|
|
||||||
## Known Quirk: `getRecentEpisodes`
|
|
||||||
|
|
||||||
`src/episodic/index.js` `getRecentEpisodes(sessionId, limit)` has a parameter mismatch — the SQLite query binds only `limit`, not `sessionId`, so it returns recent episodes across **all sessions**. Orchestration-service uses `getEpisodesBySession()` (the paginated route) instead, so this bug is not visible in normal operation. Don't rely on `getRecentEpisodes` when you need session-scoped results.
|
|
||||||
|
|
||||||
## Qdrant Client
|
## Qdrant Client
|
||||||
|
|
||||||
`src/semantic/index.js` creates the Qdrant client lazily on first use and reuses it. All three collections (`episodes`, `entities`, `summaries`) are created at startup if missing. There is no connection health check — if Qdrant is unreachable, semantic operations throw at call time.
|
`src/semantic/index.js` creates the Qdrant client lazily on first use and reuses it. All three collections (`episodes`, `entities`, `summaries`) are created at startup if missing. There is no connection health check — if Qdrant is unreachable, semantic operations throw at call time.
|
||||||
@@ -68,16 +92,18 @@ Both tasks catch and log errors silently. An episode can exist in SQLite with no
|
|||||||
| GET/POST | `/sessions` | POST requires `externalId`; duplicate → 409 |
|
| GET/POST | `/sessions` | POST requires `externalId`; duplicate → 409 |
|
||||||
| GET/PATCH | `/sessions/by-external/:externalId` | PATCH accepts `name`, `projectId` |
|
| GET/PATCH | `/sessions/by-external/:externalId` | PATCH accepts `name`, `projectId` |
|
||||||
| DELETE | `/sessions/by-external/:externalId` | Cascades to episodes, summaries, relationships |
|
| DELETE | `/sessions/by-external/:externalId` | Cascades to episodes, summaries, relationships |
|
||||||
| GET/POST | `/episodes` | POST triggers async embedding + entity extraction |
|
| GET/POST | `/episodes` | POST triggers async embedding + entity/relationship extraction |
|
||||||
| GET | `/episodes/search` | FTS5 search; route must precede `/:id` |
|
| GET | `/episodes/search` | FTS5 search; route must precede `/:id` |
|
||||||
| GET | `/sessions/:id/episodes` | Paginated, ordered `created_at DESC` |
|
| GET | `/sessions/:id/episodes` | Paginated, ordered `created_at DESC` |
|
||||||
| DELETE | `/episodes/:id` | Removes from SQLite + async Qdrant delete |
|
| DELETE | `/episodes/:id` | Removes from SQLite + async Qdrant delete |
|
||||||
| POST | `/entities` | Upsert by `(name, type)` |
|
| POST | `/entities` | Upsert by `(name, type)`; increments `mention_count` on conflict |
|
||||||
| GET | `/entities/by-type/:type` | All entities of given type |
|
| GET | `/entities/by-type/:type` | All entities of given type |
|
||||||
| GET/DELETE | `/entities/:id` | |
|
| GET/DELETE | `/entities/:id` | |
|
||||||
| POST | `/relationships` | Upsert by `(fromId, toId, label)`; conflict = no-op |
|
| POST | `/relationships` | Upsert by `(fromId, toId, label)`; increments `mention_count` on conflict. Body: `fromId`, `toId`, `label`, `notes` (optional) |
|
||||||
| GET | `/entities/:id/relationships` | Outbound only |
|
| GET | `/entities/:id/relationships` | Outbound only |
|
||||||
| DELETE | `/relationships` | Body: `fromId`, `toId`, `label` |
|
| DELETE | `/relationships` | Body: `fromId`, `toId`, `label` |
|
||||||
|
| GET | `/graph/neighborhood/:entityId` | Single-entity neighborhood; `?depth=` (default 1, max 3) |
|
||||||
|
| POST | `/graph/neighbors` | Bulk 1-hop neighborhood; body: `{ entityIds: [...] }` |
|
||||||
| GET/POST | `/projects` | POST requires non-empty `name` |
|
| GET/POST | `/projects` | POST requires non-empty `name` |
|
||||||
| GET/PATCH/DELETE | `/projects/:id` | |
|
| GET/PATCH/DELETE | `/projects/:id` | |
|
||||||
| POST | `/projects/:id/summarize` | On-demand overview generation; 422 if no data |
|
| POST | `/projects/:id/summarize` | On-demand overview generation; 422 if no data |
|
||||||
|
|||||||
@@ -54,9 +54,14 @@ function getDB() {
|
|||||||
db.exec(`CREATE INDEX IF NOT EXISTS idx_summaries_session ON summaries(session_id)`);
|
db.exec(`CREATE INDEX IF NOT EXISTS idx_summaries_session ON summaries(session_id)`);
|
||||||
} catch {}
|
} catch {}
|
||||||
|
|
||||||
try {
|
try { db.exec(`ALTER TABLE entities ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
|
||||||
db.exec(`CREATE INDEX IF NOT EXISTS idx_summaries_project ON summaries(project_id)`);
|
try { db.exec(`ALTER TABLE entities ADD COLUMN confidence REAL NOT NULL DEFAULT 1.0`) } catch {}
|
||||||
} catch {}
|
try { db.exec(`ALTER TABLE entities ADD COLUMN source TEXT NOT NULL DEFAULT 'extraction'`) } catch {}
|
||||||
|
try { db.exec(`ALTER TABLE entities ADD COLUMN last_seen_at INTEGER`) } catch {}
|
||||||
|
|
||||||
|
try { db.exec(`ALTER TABLE relationships ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
|
||||||
|
try { db.exec(`ALTER TABLE relationships ADD COLUMN notes TEXT`) } catch {}
|
||||||
|
|
||||||
|
|
||||||
// Sync FTS index with any existing episodes data
|
// Sync FTS index with any existing episodes data
|
||||||
db.exec(`INSERT OR REPLACE INTO episodes_fts(rowid, user_message, ai_response)
|
db.exec(`INSERT OR REPLACE INTO episodes_fts(rowid, user_message, ai_response)
|
||||||
|
|||||||
@@ -38,6 +38,20 @@ const schema = `
|
|||||||
UNIQUE(from_id, to_id, label)
|
UNIQUE(from_id, to_id, label)
|
||||||
);
|
);
|
||||||
|
|
||||||
|
CREATE INDEX IF NOT EXISTS idx_relationships_from ON relationships(from_id);
|
||||||
|
CREATE INDEX IF NOT EXISTS idx_relationships_to ON relationships(to_id);
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS entity_episodes (
|
||||||
|
entity_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
|
||||||
|
episode_id INTEGER NOT NULL REFERENCES episodes(id) ON DELETE CASCADE,
|
||||||
|
PRIMARY KEY (entity_id, episode_id)
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE INDEX IF NOT EXISTS idx_entity_episodes_entity ON entity_episodes(entity_id);
|
||||||
|
CREATE INDEX IF NOT EXISTS idx_entity_episodes_episode ON entity_episodes(episode_id);
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
CREATE TABLE IF NOT EXISTS projects (
|
CREATE TABLE IF NOT EXISTS projects (
|
||||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||||
name TEXT NOT NULL,
|
name TEXT NOT NULL,
|
||||||
|
|||||||
@@ -1,14 +1,18 @@
|
|||||||
const semantic = require('../semantic')
|
const semantic = require('../semantic')
|
||||||
const { getEnv, SERVICES, formatEpisodeText, ENTITIES, logger } = require('@nexusai/shared');
|
const { getEnv, SERVICES, formatEpisodeText, ENTITIES, logger } = require('@nexusai/shared');
|
||||||
const { upsertEntity } = require('./index');
|
const { upsertEntity, upsertRelationship, linkEntityToEpisode } = require('./index');
|
||||||
|
|
||||||
const EXTRACTION_URL = getEnv('EXTRACTION_URL', 'http://localhost:11434');
|
const EXTRACTION_URL = getEnv('EXTRACTION_URL', 'http://localhost:11434');
|
||||||
const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b');
|
const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b'); // ChatML format — see buildExtractionPrompt
|
||||||
const EMBEDDING_SERVICE_URL = getEnv('EMBEDDING_SERVICE_URL', SERVICES.EMBEDDING_URL);
|
const EMBEDDING_SERVICE_URL = getEnv('EMBEDDING_SERVICE_URL', SERVICES.EMBEDDING_URL);
|
||||||
|
|
||||||
const ENTITY_TYPES = ['person', 'place', 'project', 'technology', 'concept', 'organization'];
|
const ENTITY_TYPES = ENTITIES.TYPES;
|
||||||
const IGNORED_NAMES = ['good morning', 'good night', 'hello', 'goodbye', 'thanks', 'thank you'];
|
const IGNORED_NAMES = ['good morning', 'good night', 'hello', 'goodbye', 'thanks', 'thank you'];
|
||||||
|
|
||||||
|
// NOTE: This prompt uses ChatML format (<|im_start|> / <|im_end|> tags), which is
|
||||||
|
// specific to qwen-family models. If EXTRACTION_MODEL is changed to a Llama-family
|
||||||
|
// or other model, this format will need to change — most alternatives use either
|
||||||
|
// plain text or [INST] / <<SYS>> tags. Silent degradation is likely if mismatched.
|
||||||
function buildExtractionPrompt(userMessage, aiResponse, knownEntities = []) {
|
function buildExtractionPrompt(userMessage, aiResponse, knownEntities = []) {
|
||||||
const knownBlock = knownEntities.length > 0
|
const knownBlock = knownEntities.length > 0
|
||||||
? [
|
? [
|
||||||
@@ -20,21 +24,24 @@ function buildExtractionPrompt(userMessage, aiResponse, knownEntities = []) {
|
|||||||
|
|
||||||
return [
|
return [
|
||||||
'<|im_start|>system',
|
'<|im_start|>system',
|
||||||
'You are a named entity extractor. You output only valid JSON.',
|
'You are a named entity and relationship extractor. You output only valid JSON.',
|
||||||
'<|im_end|>',
|
'<|im_end|>',
|
||||||
'<|im_start|>user',
|
'<|im_start|>user',
|
||||||
'Read the conversation below and extract every named entity mentioned.',
|
'Read the conversation below and extract all named entities and the relationships between them.',
|
||||||
`Entity types to extract: ${ENTITY_TYPES.join(', ')}`,
|
`Entity types: ${ENTITY_TYPES.join(', ')}`,
|
||||||
'For each entity found, provide: name, type, and a one-sentence notes field.',
|
'Use "character" for any fictional, game, or media characters (e.g. characters from anime, games, books, TV shows, movies)',
|
||||||
'Return your answer as: { "entities": [ ... ] }',
|
'Use "person" only for real people',
|
||||||
'For each entity found, you MUST provide a non-empty notes field describing it based on the conversation.',
|
'For each entity provide:',
|
||||||
'For each entity found, provide:',
|
' "name": short proper noun only (max 4 words)',
|
||||||
' "name": short proper noun only (max 4 words, e.g. "Sydney", "NexusAI", "Tim")',
|
|
||||||
' "type": one of the valid types',
|
' "type": one of the valid types',
|
||||||
' "notes": one specific sentence about this entity based on the conversation (not generic)',
|
' "notes": one specific sentence about this entity based on the conversation',
|
||||||
|
'For relationships, use snake_case verb labels (e.g. works_on, manages, uses, knows, located_in, part_of, created_by).',
|
||||||
|
'Only include relationships between entities you have listed above.',
|
||||||
|
'Return this exact JSON structure:',
|
||||||
|
'{ "entities": [{"name": "...", "type": "...", "notes": "..."}], "relationships": [{"from": "...", "fromType": "...", "to": "...", "toType": "...", "label": "...", "notes": "..."}] }',
|
||||||
'',
|
'',
|
||||||
knownBlock,
|
knownBlock,
|
||||||
'--- CONVERSATION ---', // clear delimiter helps smaller models
|
'--- CONVERSATION ---',
|
||||||
`User: ${userMessage}`,
|
`User: ${userMessage}`,
|
||||||
`Assistant: ${aiResponse}`,
|
`Assistant: ${aiResponse}`,
|
||||||
'--- END CONVERSATION ---',
|
'--- END CONVERSATION ---',
|
||||||
@@ -58,7 +65,7 @@ async function embedEntity(entity) {
|
|||||||
return data.embedding;
|
return data.embedding;
|
||||||
}
|
}
|
||||||
|
|
||||||
async function extractAndStoreEntities(userMessage, aiResponse, projectId=null) {
|
async function extractAndStoreEntities(userMessage, aiResponse, episodeId=null, projectId=null) {
|
||||||
logger.info('[entities] Extraction triggered')
|
logger.info('[entities] Extraction triggered')
|
||||||
try {
|
try {
|
||||||
// Fetch existing entities to guide the model toward consistent name/type pairs
|
// Fetch existing entities to guide the model toward consistent name/type pairs
|
||||||
@@ -80,6 +87,7 @@ async function extractAndStoreEntities(userMessage, aiResponse, projectId=null)
|
|||||||
num_predict: ENTITIES.NUM_PREDICT,
|
num_predict: ENTITIES.NUM_PREDICT,
|
||||||
},
|
},
|
||||||
}),
|
}),
|
||||||
|
signal: AbortSignal.timeout(60_000),
|
||||||
});
|
});
|
||||||
|
|
||||||
if (!res.ok) throw new Error(`Ollama responded ${res.status}`);
|
if (!res.ok) throw new Error(`Ollama responded ${res.status}`);
|
||||||
@@ -105,21 +113,23 @@ async function extractAndStoreEntities(userMessage, aiResponse, projectId=null)
|
|||||||
const entities = Array.isArray(parsed.entities) ? parsed.entities : [];
|
const entities = Array.isArray(parsed.entities) ? parsed.entities : [];
|
||||||
if (entities.length === 0) {
|
if (entities.length === 0) {
|
||||||
logger.debug('[entities] No entities found in this exchange — skipping');
|
logger.debug('[entities] No entities found in this exchange — skipping');
|
||||||
return; // not an error, just nothing to extract
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Map of "name::type" → saved entity, used for relationship resolution below
|
||||||
|
const entityMap = new Map();
|
||||||
let saved = 0;
|
let saved = 0;
|
||||||
|
|
||||||
|
|
||||||
for (const { name, type, notes } of entities) {
|
for (const { name, type, notes } of entities) {
|
||||||
|
|
||||||
if (!name || !type || !ENTITY_TYPES.includes(type)) continue;
|
if (!name || !type || !ENTITY_TYPES.includes(type)) continue;
|
||||||
if (IGNORED_NAMES.includes(name.toLowerCase())) continue;
|
if (IGNORED_NAMES.includes(name.toLowerCase())) continue;
|
||||||
|
|
||||||
const entity = upsertEntity(name, type, notes ?? null);
|
const entity = upsertEntity(name, type, notes ?? null);
|
||||||
|
entityMap.set(`${name}::${type}`, entity);
|
||||||
logger.info('[entities] Upserted entity:', entity);
|
logger.info('[entities] Upserted entity:', entity);
|
||||||
|
|
||||||
// Embed and upsert to Qdrant fire-and-forget
|
if (episodeId) linkEntityToEpisode(entity.id, episodeId);
|
||||||
|
|
||||||
embedEntity(entity)
|
embedEntity(entity)
|
||||||
.then(vector => semantic.upsertEntity(entity.id, vector, {
|
.then(vector => semantic.upsertEntity(entity.id, vector, {
|
||||||
name: entity.name,
|
name: entity.name,
|
||||||
@@ -136,6 +146,23 @@ async function extractAndStoreEntities(userMessage, aiResponse, projectId=null)
|
|||||||
|
|
||||||
if (saved > 0) logger.info(`[entities] Extracted and stored ${saved} entities`);
|
if (saved > 0) logger.info(`[entities] Extracted and stored ${saved} entities`);
|
||||||
|
|
||||||
|
// Process extracted relationships — both entities must have been saved above
|
||||||
|
const relationships = Array.isArray(parsed.relationships) ? parsed.relationships : [];
|
||||||
|
let relSaved = 0;
|
||||||
|
|
||||||
|
for (const { from, fromType, to, toType, label, notes } of relationships) {
|
||||||
|
if (!from || !fromType || !to || !toType || !label) continue;
|
||||||
|
|
||||||
|
const fromEntity = entityMap.get(`${from}::${fromType}`);
|
||||||
|
const toEntity = entityMap.get(`${to}::${toType}`);
|
||||||
|
if (!fromEntity || !toEntity) continue;
|
||||||
|
|
||||||
|
upsertRelationship(fromEntity.id, toEntity.id, label, notes ?? null);
|
||||||
|
relSaved++;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (relSaved > 0) logger.info(`[entities] Extracted and stored ${relSaved} relationships`);
|
||||||
|
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
// Non-critical — log and move on, episode is already saved
|
// Non-critical — log and move on, episode is already saved
|
||||||
logger.warn('[entities] Extraction failed:', err.message);
|
logger.warn('[entities] Extraction failed:', err.message);
|
||||||
|
|||||||
@@ -4,18 +4,23 @@ const { parseRow } = require ('@nexusai/shared')
|
|||||||
/******* Entities ********/
|
/******* Entities ********/
|
||||||
|
|
||||||
// Upsert an entity - insert or update if (name, type) already exists
|
// Upsert an entity - insert or update if (name, type) already exists
|
||||||
function upsertEntity(name, type, notes = null, metadata = null) {
|
function upsertEntity(name, type, notes = null, metadata = null, source = 'extraction') {
|
||||||
const db = getDB();
|
const db = getDB();
|
||||||
const stmt = db.prepare(`
|
const stmt = db.prepare(`
|
||||||
INSERT INTO entities (name, type, notes, metadata)
|
INSERT INTO entities (name, type, notes, metadata, source, last_seen_at)
|
||||||
VALUES (?, ?, ?, ?)
|
VALUES (?, ?, ?, ?, ?, unixepoch())
|
||||||
ON CONFLICT(name, type) DO UPDATE SET
|
ON CONFLICT(name, type) DO UPDATE SET
|
||||||
|
-- First extraction wins: notes are never overwritten once set.
|
||||||
|
-- Revisit during Memory Consolidation Lifecycle (Phase 2) — once entity
|
||||||
|
-- quality scoring exists, a higher-confidence extraction should be able
|
||||||
|
-- to replace stale notes rather than being silently dropped.
|
||||||
notes = COALESCE(entities.notes, excluded.notes),
|
notes = COALESCE(entities.notes, excluded.notes),
|
||||||
metadata = excluded.metadata,
|
metadata = excluded.metadata,
|
||||||
|
mention_count = entities.mention_count + 1,
|
||||||
|
last_seen_at = unixepoch(),
|
||||||
updated_at = unixepoch()
|
updated_at = unixepoch()
|
||||||
`);
|
`);
|
||||||
const result = stmt.run(name, type, notes, metadata ? JSON.stringify(metadata) : null);
|
stmt.run(name, type, notes, metadata ? JSON.stringify(metadata) : null, source);
|
||||||
|
|
||||||
return getEntityByNameType(name, type);
|
return getEntityByNameType(name, type);
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -40,15 +45,17 @@ function deleteEntity(id) {
|
|||||||
/********* Relationships *********/
|
/********* Relationships *********/
|
||||||
|
|
||||||
// Upsert a relationship, insert or ignore if (from_id, to_id, label) already exists
|
// Upsert a relationship, insert or ignore if (from_id, to_id, label) already exists
|
||||||
function upsertRelationship(fromId, toId, label, metadata = null){
|
function upsertRelationship(fromId, toId, label, notes = null, metadata = null) {
|
||||||
const db = getDB();
|
const db = getDB();
|
||||||
const stmt = db.prepare(`
|
const stmt = db.prepare(`
|
||||||
INSERT INTO relationships (from_id, to_id, label, metadata)
|
INSERT INTO relationships (from_id, to_id, label, notes, metadata)
|
||||||
VALUES (?, ?, ?, ?)
|
VALUES (?, ?, ?, ?, ?)
|
||||||
ON CONFLICT(from_id, to_id, label) DO NOTHING
|
ON CONFLICT(from_id, to_id, label) DO UPDATE SET
|
||||||
|
mention_count = relationships.mention_count + 1,
|
||||||
|
-- First extraction wins for notes — same policy as entities.
|
||||||
|
notes = COALESCE(relationships.notes, excluded.notes)
|
||||||
`);
|
`);
|
||||||
|
stmt.run(fromId, toId, label, notes, metadata ? JSON.stringify(metadata) : null);
|
||||||
const result = stmt.run(fromId, toId, label, metadata ?JSON.stringify(metadata) : null);
|
|
||||||
return getRelationship(fromId, toId, label);
|
return getRelationship(fromId, toId, label);
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -69,7 +76,7 @@ function getEntityByNameType(name, type) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Retrive all relationships originating from a given entity
|
// Retrive all relationships originating from a given entity
|
||||||
function getRelationshipsByEntity(entityId) {
|
function getOutboundRelationships(entityId) {
|
||||||
const db = getDB();
|
const db = getDB();
|
||||||
return db.prepare(`SELECT * FROM relationships WHERE from_id = ?`).all(entityId).map(parseRow);
|
return db.prepare(`SELECT * FROM relationships WHERE from_id = ?`).all(entityId).map(parseRow);
|
||||||
}
|
}
|
||||||
@@ -81,14 +88,23 @@ function deleteRelationship(fromId, toId, label) {
|
|||||||
db.prepare(`DELETE FROM relationships WHERE from_id = ? AND to_id = ? AND label = ?`).run(fromId, toId, label);
|
db.prepare(`DELETE FROM relationships WHERE from_id = ? AND to_id = ? AND label = ?`).run(fromId, toId, label);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function linkEntityToEpisode(entityId, episodeId) {
|
||||||
|
const db = getDB();
|
||||||
|
db.prepare(`
|
||||||
|
INSERT OR IGNORE INTO entity_episodes (entity_id, episode_id)
|
||||||
|
VALUES (?, ?)
|
||||||
|
`).run(entityId, episodeId);
|
||||||
|
}
|
||||||
|
|
||||||
module.exports = {
|
module.exports = {
|
||||||
upsertEntity,
|
upsertEntity,
|
||||||
getEntity,
|
getEntity,
|
||||||
getEntitiesByType,
|
getEntitiesByType,
|
||||||
getEntityByNameType,
|
getEntityByNameType,
|
||||||
deleteEntity,
|
deleteEntity,
|
||||||
|
linkEntityToEpisode,
|
||||||
upsertRelationship,
|
upsertRelationship,
|
||||||
getRelationship,
|
getRelationship,
|
||||||
getRelationshipsByEntity,
|
getOutboundRelationships,
|
||||||
deleteRelationship
|
deleteRelationship
|
||||||
}
|
}
|
||||||
@@ -127,7 +127,7 @@ async function createEpisode(sessionId, userMessage, aiResponse, tokenCount = nu
|
|||||||
}))
|
}))
|
||||||
.catch(err => logger.error(`Failed to embed episode ${episode.id}:`, err.message));
|
.catch(err => logger.error(`Failed to embed episode ${episode.id}:`, err.message));
|
||||||
|
|
||||||
extractAndStoreEntities(userMessage, aiResponse, projectId)
|
extractAndStoreEntities(userMessage, aiResponse, episode.id, projectId)
|
||||||
.catch(err => logger.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
|
.catch(err => logger.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
|
||||||
|
|
||||||
|
|
||||||
@@ -168,17 +168,27 @@ function getRecentEpisodes(sessionId, limit = EPISODIC.DEFAULT_RECENT_LIMIT) {
|
|||||||
|
|
||||||
|
|
||||||
// Searches episodes using FTS5 full-text search, ordered by relevance, with a limit
|
// Searches episodes using FTS5 full-text search, ordered by relevance, with a limit
|
||||||
function searchEpisodes(query, limit = EPISODIC.DEFAULT_SEARCH_LIMIT) {
|
function searchEpisodes(query, limit = EPISODIC.DEFAULT_SEARCH_LIMIT, sessionIds = null) {
|
||||||
// FTS5 full-text search across all episodes
|
|
||||||
const db = getDB();
|
const db = getDB();
|
||||||
const stmt = db.prepare(`
|
const safeQuery = `"${query.replace(/"/g, '""')}"`;
|
||||||
|
if (sessionIds && sessionIds.length > 0) {
|
||||||
|
const ph = sessionIds.map(() => '?').join(',');
|
||||||
|
return db.prepare(`
|
||||||
|
SELECT e.* FROM episodes e
|
||||||
|
JOIN episodes_fts fts ON e.id = fts.rowid
|
||||||
|
WHERE episodes_fts MATCH ?
|
||||||
|
AND e.session_id IN (${ph})
|
||||||
|
ORDER BY rank
|
||||||
|
LIMIT ?
|
||||||
|
`).all(safeQuery, ...sessionIds, limit).map(parseRow);
|
||||||
|
}
|
||||||
|
return db.prepare(`
|
||||||
SELECT e.* FROM episodes e
|
SELECT e.* FROM episodes e
|
||||||
JOIN episodes_fts fts ON e.id = fts.rowid
|
JOIN episodes_fts fts ON e.id = fts.rowid
|
||||||
WHERE episodes_fts MATCH ?
|
WHERE episodes_fts MATCH ?
|
||||||
ORDER BY rank
|
ORDER BY rank
|
||||||
LIMIT ?
|
LIMIT ?
|
||||||
`);
|
`).all(safeQuery, limit).map(parseRow);
|
||||||
return stmt.all(query, limit).map(parseRow);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
// Deletes an episode by its ID
|
// Deletes an episode by its ID
|
||||||
@@ -197,7 +207,8 @@ async function getEpisodeEmbedding(userMessage, aiResponse){
|
|||||||
const res = await fetch(`${url}/embed`, {
|
const res = await fetch(`${url}/embed`, {
|
||||||
method: 'POST',
|
method: 'POST',
|
||||||
headers: { 'Content-Type': 'application/json' },
|
headers: { 'Content-Type': 'application/json' },
|
||||||
body: JSON.stringify({ text })
|
body: JSON.stringify({ text }),
|
||||||
|
signal: AbortSignal.timeout(30_000),
|
||||||
})
|
})
|
||||||
|
|
||||||
if (!res.ok) {
|
if (!res.ok) {
|
||||||
|
|||||||
77
packages/memory-service/src/graph/index.js
Normal file
77
packages/memory-service/src/graph/index.js
Normal file
@@ -0,0 +1,77 @@
|
|||||||
|
const { getDB } = require('../db');
|
||||||
|
const { parseRow, ENTITIES } = require('@nexusai/shared');
|
||||||
|
|
||||||
|
// Single-entity neighborhood via recursive CTE — bidirectional, configurable depth
|
||||||
|
function getNeighborhood(entityId, depth = ENTITIES.GRAPH_HOP_DEPTH) {
|
||||||
|
const db = getDB();
|
||||||
|
|
||||||
|
const nodeRows = db.prepare(`
|
||||||
|
WITH RECURSIVE traverse(entity_id, depth) AS (
|
||||||
|
SELECT ?, 0
|
||||||
|
UNION
|
||||||
|
SELECT
|
||||||
|
CASE WHEN r.from_id = t.entity_id THEN r.to_id ELSE r.from_id END,
|
||||||
|
t.depth + 1
|
||||||
|
FROM relationships r
|
||||||
|
JOIN traverse t ON (r.from_id = t.entity_id OR r.to_id = t.entity_id)
|
||||||
|
WHERE t.depth < ?
|
||||||
|
)
|
||||||
|
SELECT DISTINCT entity_id FROM traverse
|
||||||
|
`).all(entityId, depth);
|
||||||
|
|
||||||
|
const nodeIds = nodeRows.map(r => r.entity_id);
|
||||||
|
if (nodeIds.length === 0) return { nodes: [], edges: [] };
|
||||||
|
|
||||||
|
const ph = nodeIds.map(() => '?').join(',');
|
||||||
|
const nodes = db.prepare(
|
||||||
|
`SELECT * FROM entities WHERE id IN (${ph})`
|
||||||
|
).all(...nodeIds).map(parseRow);
|
||||||
|
|
||||||
|
const edges = db.prepare(
|
||||||
|
`SELECT * FROM relationships WHERE from_id IN (${ph}) AND to_id IN (${ph})`
|
||||||
|
).all(...nodeIds, ...nodeIds).map(parseRow);
|
||||||
|
|
||||||
|
return { nodes, edges };
|
||||||
|
}
|
||||||
|
|
||||||
|
// Bulk 1-hop neighborhood for orchestration — seeds are entity IDs from Qdrant search
|
||||||
|
function getEntityNeighbors(entityIds) {
|
||||||
|
if (!entityIds.length) return { nodes: [], edges: [] };
|
||||||
|
const db = getDB();
|
||||||
|
|
||||||
|
const ph = entityIds.map(() => '?').join(',');
|
||||||
|
|
||||||
|
// entityIds appears three times — once for the CASE (finding the neighbor),
|
||||||
|
// and once each for the FROM and TO sides of the WHERE clause
|
||||||
|
const neighborRows = db.prepare(`
|
||||||
|
SELECT DISTINCT
|
||||||
|
CASE WHEN from_id IN (${ph}) THEN to_id ELSE from_id END AS entity_id
|
||||||
|
FROM relationships
|
||||||
|
WHERE from_id IN (${ph}) OR to_id IN (${ph})
|
||||||
|
`).all(...entityIds, ...entityIds, ...entityIds);
|
||||||
|
|
||||||
|
const allIds = [...new Set([...entityIds, ...neighborRows.map(r => r.entity_id)])];
|
||||||
|
const allPh = allIds.map(() => '?').join(',');
|
||||||
|
|
||||||
|
const nodes = db.prepare(
|
||||||
|
`SELECT * FROM entities WHERE id IN (${allPh})`
|
||||||
|
).all(...allIds).map(parseRow);
|
||||||
|
|
||||||
|
const edges = db.prepare(
|
||||||
|
`SELECT * FROM relationships WHERE from_id IN (${allPh}) AND to_id IN (${allPh})`
|
||||||
|
).all(...allIds, ...allIds).map(parseRow);
|
||||||
|
|
||||||
|
return { nodes, edges };
|
||||||
|
}
|
||||||
|
|
||||||
|
// Returns episode IDs linked to any of the given entity IDs via entity_episodes
|
||||||
|
function getEpisodeIdsByEntities(entityIds) {
|
||||||
|
if (!entityIds.length) return [];
|
||||||
|
const db = getDB();
|
||||||
|
const ph = entityIds.map(() => '?').join(',');
|
||||||
|
return db.prepare(
|
||||||
|
`SELECT DISTINCT episode_id FROM entity_episodes WHERE entity_id IN (${ph})`
|
||||||
|
).all(...entityIds).map(r => r.episode_id);
|
||||||
|
}
|
||||||
|
|
||||||
|
module.exports = { getNeighborhood, getEntityNeighbors, getEpisodeIdsByEntities };
|
||||||
@@ -5,13 +5,14 @@ const { getDB } = require('./db');
|
|||||||
const { createProject, getProjects, getProject, updateProject, deleteProject } = require('./db/projects');
|
const { createProject, getProjects, getProject, updateProject, deleteProject } = require('./db/projects');
|
||||||
const { createSummary, getSummary, getSummariesBySession, getSummariesByProject, updateSummary, deleteSummary } = require('./db/summaries');
|
const { createSummary, getSummary, getSummariesBySession, getSummariesByProject, updateSummary, deleteSummary } = require('./db/summaries');
|
||||||
const { generateAndStoreProjectSummary } = require('./summarization/project');
|
const { generateAndStoreProjectSummary } = require('./summarization/project');
|
||||||
|
const graph = require('./graph');
|
||||||
|
|
||||||
const episodic = require('./episodic');
|
const episodic = require('./episodic');
|
||||||
const semantic = require('./semantic');
|
const semantic = require('./semantic');
|
||||||
const entities = require('./entities');
|
const entities = require('./entities');
|
||||||
|
|
||||||
const app = express();
|
const app = express();
|
||||||
app.use(express.json());
|
app.use(express.json({ limit: '2mb' }));
|
||||||
|
|
||||||
const PORT = getEnv('PORT', PORTS.MEMORY);
|
const PORT = getEnv('PORT', PORTS.MEMORY);
|
||||||
|
|
||||||
@@ -130,10 +131,12 @@ app.get('/episodes', (req, res) => {
|
|||||||
|
|
||||||
// Search MUST come before /:id — otherwise 'search' gets captured as an id
|
// Search MUST come before /:id — otherwise 'search' gets captured as an id
|
||||||
app.get('/episodes/search', (req, res) => {
|
app.get('/episodes/search', (req, res) => {
|
||||||
const { q, limit = EPISODIC.DEFAULT_PAGE_SIZE } = req.query;
|
const { q, limit = EPISODIC.DEFAULT_PAGE_SIZE, sessionIds } = req.query;
|
||||||
if (!q) return res.status(400).json({ error: 'q (query) parameter is required' });
|
if (!q) return res.status(400).json({ error: 'q (query) parameter is required' });
|
||||||
const results = episodic.searchEpisodes(q, Number(limit));
|
const parsedSessionIds = sessionIds
|
||||||
res.json(results);
|
? sessionIds.split(',').map(Number).filter(Boolean)
|
||||||
|
: null;
|
||||||
|
res.json(episodic.searchEpisodes(q, Number(limit), parsedSessionIds));
|
||||||
});
|
});
|
||||||
|
|
||||||
app.get('/episodes/:id', (req, res) => {
|
app.get('/episodes/:id', (req, res) => {
|
||||||
@@ -202,17 +205,17 @@ app.delete('/entities/:id', (req, res) => {
|
|||||||
|
|
||||||
// Upsert a relationship between two entities
|
// Upsert a relationship between two entities
|
||||||
app.post('/relationships', (req, res) => {
|
app.post('/relationships', (req, res) => {
|
||||||
const {fromId, toId, label, metadata } = req.body;
|
const { fromId, toId, label, notes, metadata } = req.body;
|
||||||
if (!fromId || !toId || !label) {
|
if (!fromId || !toId || !label) {
|
||||||
return res.status(400).json({ error: 'fromId, toId and label are required' });
|
return res.status(400).json({ error: 'fromId, toId and label are required' });
|
||||||
}
|
}
|
||||||
const relationship = entities.upsertRelationship(fromId, toId, label, metadata);
|
const relationship = entities.upsertRelationship(fromId, toId, label, notes, metadata);
|
||||||
res.status(201).json(relationship);
|
res.status(201).json(relationship);
|
||||||
});
|
});
|
||||||
|
|
||||||
// Get all relationships for a given entity ID
|
// Get all relationships for a given entity ID
|
||||||
app.get('/entities/:id/relationships', (req, res) => {
|
app.get('/entities/:id/relationships', (req, res) => {
|
||||||
res.json(entities.getRelationshipsByEntity(req.params.id));
|
res.json(entities.getOutboundRelationships(req.params.id));
|
||||||
});
|
});
|
||||||
|
|
||||||
// Delete a specific relationship
|
// Delete a specific relationship
|
||||||
@@ -225,6 +228,37 @@ app.delete('/relationships', (req, res) => {
|
|||||||
res.status(204).send();
|
res.status(204).send();
|
||||||
})
|
})
|
||||||
|
|
||||||
|
/********************************* */
|
||||||
|
/********** Graph Routes ********** */
|
||||||
|
/********************************* */
|
||||||
|
|
||||||
|
// Single-entity neighborhood — depth defaults to ENTITIES.GRAPH_HOP_DEPTH
|
||||||
|
app.get('/graph/neighborhood/:entityId', (req, res) => {
|
||||||
|
const entity = entities.getEntity(req.params.entityId);
|
||||||
|
if (!entity) return res.status(404).json({ error: 'Entity not found' });
|
||||||
|
|
||||||
|
const depth = req.query.depth ? Math.min(Number(req.query.depth), 3) : undefined;
|
||||||
|
const neighborhood = graph.getNeighborhood(Number(req.params.entityId), depth);
|
||||||
|
res.json({ entity, neighborhood });
|
||||||
|
});
|
||||||
|
|
||||||
|
// Bulk 1-hop neighborhood — body: { entityIds: [...] }
|
||||||
|
app.post('/graph/neighbors', (req, res) => {
|
||||||
|
const { entityIds } = req.body;
|
||||||
|
if (!Array.isArray(entityIds) || entityIds.length === 0) {
|
||||||
|
return res.status(400).json({ error: 'entityIds array is required' });
|
||||||
|
}
|
||||||
|
res.json(graph.getEntityNeighbors(entityIds.map(Number)));
|
||||||
|
});
|
||||||
|
|
||||||
|
app.post('/episodes/by-entities', (req, res) => {
|
||||||
|
const { entityIds } = req.body;
|
||||||
|
if (!Array.isArray(entityIds) || entityIds.length === 0) {
|
||||||
|
return res.status(400).json({ error: 'entityIds array is required' });
|
||||||
|
}
|
||||||
|
res.json({ episodeIds: graph.getEpisodeIdsByEntities(entityIds.map(Number)) });
|
||||||
|
});
|
||||||
|
|
||||||
/*********************************** */
|
/*********************************** */
|
||||||
/********** Project Routes ********** */
|
/********** Project Routes ********** */
|
||||||
/*********************************** */
|
/*********************************** */
|
||||||
|
|||||||
156
packages/orchestration-service/CLAUDE.md
Normal file
156
packages/orchestration-service/CLAUDE.md
Normal file
@@ -0,0 +1,156 @@
|
|||||||
|
# CLAUDE.md
|
||||||
|
|
||||||
|
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||||
|
|
||||||
|
See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and the end-to-end chat flow.
|
||||||
|
|
||||||
|
## Running This Service
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm run orchestration # From repo root (node src/index.js)
|
||||||
|
npm -w packages/orchestration-service run dev # With --watch
|
||||||
|
```
|
||||||
|
|
||||||
|
Default port: **4000**. Depends on memory-service, embedding-service, inference-service, and Qdrant.
|
||||||
|
|
||||||
|
## Context Assembly (`src/chat/index.js`)
|
||||||
|
|
||||||
|
`assembleContext(externalId, userMessage)` is the core function that builds the inference prompt. Order of operations:
|
||||||
|
|
||||||
|
1. Resolve session by `externalId` (creates it if missing — every chat call is self-healing).
|
||||||
|
2. If session has a `project_id`, load the project and fetch all sibling sessions (via `getProjectSessions`, hardcoded `limit=200`).
|
||||||
|
3. Fetch `recentEpisodeLimit` recent episodes from memory-service.
|
||||||
|
4. Embed the user message; search Qdrant EPISODES with `scoreThreshold`:
|
||||||
|
- No project: `must: [sessionId == this session]`
|
||||||
|
- Project: `should: [sessionId == s1, sessionId == s2, ...]` across all project sessions
|
||||||
|
- Dedup against recent episode IDs before including.
|
||||||
|
5. Run **fused episode retrieval** via `getFusedEpisodes` — Qdrant semantic search and FTS5 keyword search run in parallel, both filtered against `recentIds`, then merged via Reciprocal Rank Fusion (RRF). If `keywordWeight` is `0`, the FTS call is skipped. Returns top `semanticLimit` episodes by fused score.
|
||||||
|
6. Embed and search Qdrant ENTITIES (filtered by `projectId` if in a project). Returns entity IDs alongside payload — the Qdrant point ID equals the SQLite entity ID.
|
||||||
|
7. Expand matched entities into a 1-hop graph neighborhood via `POST /graph/neighbors` on the memory-service. Returns `{ nodes, edges }` — the full entity objects plus connecting relationships. Falls back to flat entity list (no edges) if the graph call fails.
|
||||||
|
8. Build prompt in this fixed order: **system prompt → graph context → fused episodes → recent episodes → user message → "Assistant:"**
|
||||||
|
|
||||||
|
The ordering prioritizes established facts (graph context) and relevant past context (semantic) over pure recency.
|
||||||
|
|
||||||
|
## Graph Context Format
|
||||||
|
|
||||||
|
`formatGraphContext(nodes, edges)` in `src/chat/index.js` formats the neighborhood as:
|
||||||
|
|
||||||
|
```
|
||||||
|
- Alice (person): software engineer working on NexusAI
|
||||||
|
→ works_on NexusAI (project)
|
||||||
|
→ knows Bob (person)
|
||||||
|
- NexusAI (project): AI assistant framework
|
||||||
|
- Bob (person): Alice's colleague
|
||||||
|
```
|
||||||
|
|
||||||
|
Each node shows its notes on the first line. Outbound edges are indented below with `→ label target (type)`. Nodes with only inbound edges (neighbors pulled in by traversal) appear without connection lines.
|
||||||
|
|
||||||
|
## System Prompt Resolution
|
||||||
|
|
||||||
|
Priority from highest to lowest:
|
||||||
|
1. `project.system_prompt` (stored on the project row in memory-service)
|
||||||
|
2. `settings.systemPrompt` (saved in `data/settings.json`)
|
||||||
|
3. `ORCHESTRATION.SYSTEM_PROMPT` (shared constants fallback)
|
||||||
|
|
||||||
|
## Settings (`src/config/settings.js`)
|
||||||
|
|
||||||
|
Settings are loaded from `data/settings.json` merged with defaults at every `GET /settings` call. `PATCH /settings` validates each field individually with specific constraints:
|
||||||
|
|
||||||
|
| Field | Constraint |
|
||||||
|
|---|---|
|
||||||
|
| `recentEpisodeLimit` | integer, 1–20 |
|
||||||
|
| `semanticLimit` | integer, 1–20 |
|
||||||
|
| `scoreThreshold` | number, 0–1 |
|
||||||
|
| `temperature` | number, 0–2 |
|
||||||
|
| `repeatPenalty` | number, 1–2 |
|
||||||
|
| `topP` | number, 0–1 |
|
||||||
|
| `topK` | integer, 1–100 |
|
||||||
|
| `modelsFolderPath` | path must exist and be readable |
|
||||||
|
| `systemPrompt` | string (trimmed); `null` reverts to shared default |
|
||||||
|
|
||||||
|
`data/settings.json` is created on first save. Parent directories are created if missing.
|
||||||
|
|
||||||
|
## Streaming SSE (`src/chat/index.js` — `chatStream`)
|
||||||
|
|
||||||
|
The route sets SSE headers and delegates to `chatStream`, which:
|
||||||
|
1. Calls `inference.completeStream()` → receives a raw HTTP Response with a readable body.
|
||||||
|
2. Reads the body in chunks, buffers across chunk boundaries, splits on `\n\n`.
|
||||||
|
3. For each event line starting with `data: `, parses the JSON and calls `onChunk(data.response)`.
|
||||||
|
4. The `[DONE]` sentinel (used by some llama-server versions) is explicitly ignored.
|
||||||
|
5. After stream ends, saves the assembled full response as an episode (same as non-streaming).
|
||||||
|
|
||||||
|
If a chunk parse fails the error is logged and the stream continues. If the response body closes with no text accumulated, the episode is not saved (logged as warning).
|
||||||
|
|
||||||
|
## Fire-and-Forget Tasks
|
||||||
|
|
||||||
|
After every successful chat turn:
|
||||||
|
- **Summarization** (`services/summarization.js` → `triggerSummary`): checks token threshold → recency guard → calls Ollama → POSTs to memory-service. Only runs if `SUMMARIES.THRESHOLD_TOKENS` is exceeded AND at least `SUMMARIES.MIN_EPISODES_SINCE` new episodes have occurred since the last summary.
|
||||||
|
- **Auto-naming** (`chat/index.js` → `autoNameSession`): only fires on the first message of a session. Uses temp 0.3, `maxTokens=20`, prompts for a ≤5-word title.
|
||||||
|
|
||||||
|
Both tasks catch all errors and log warnings without surfacing to the client.
|
||||||
|
|
||||||
|
## Summarization Recency Guard
|
||||||
|
|
||||||
|
`src/services/summarization.js` reads the `episode_range` field of the latest existing summary (format: `"<startId>-<endId>"`). It counts SQLite episodes with `id > endId`; if fewer than `SUMMARIES.MIN_EPISODES_SINCE`, it skips. This prevents rapid re-summarization on high-traffic sessions.
|
||||||
|
|
||||||
|
When the existing summary's token count exceeds `SUMMARIES.MAX_SUMMARY_TOKENS`, it is treated as "expired" — a fresh summary is generated instead of an incremental update.
|
||||||
|
|
||||||
|
## Qdrant Calls (Direct, Not Via Memory-Service)
|
||||||
|
|
||||||
|
`src/services/qdrant.js` makes REST calls to Qdrant directly at `QDRANT_URL`. This bypasses memory-service for semantic search performance. Orchestration fetches episode/entity content from memory-service by ID *after* getting vector search results from Qdrant.
|
||||||
|
|
||||||
|
`searchEntities` checks `projectId !== null && projectId !== undefined` before applying the filter — a session with no project skips the filter entirely and searches globally.
|
||||||
|
|
||||||
|
## Retrieval Fusion (`src/chat/index.js`)
|
||||||
|
|
||||||
|
Three functions handle fusion — all pure or lightly async, all non-critical:
|
||||||
|
|
||||||
|
- **`getFTSResults(userMessage, { limit, sessionIds })`** — calls `memory.searchEpisodes`; returns `[]` and logs a warning on failure
|
||||||
|
- **`fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit })`** — pure RRF implementation. Key guard: FTS-only episodes are only added to the scores Map if `contrib > 0` (prevents score-0 bleed-through when `keywordWeight: 0`)
|
||||||
|
- **`getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, settings)`** — orchestrates both paths in `Promise.all`, applies `recentIds` filter to FTS results, calls fusion. Short-circuits FTS call entirely if `keywordWeight === 0`
|
||||||
|
|
||||||
|
FTS is scoped to `projectSessionIds` if in a project, otherwise `[session.id]` — mirrors Qdrant scoping exactly.
|
||||||
|
|
||||||
|
> For RRF formula, weight semantics, and enabling keyword search, see `docs/services/retrieval-fusion.md`.
|
||||||
|
|
||||||
|
## Graph Service Client (`src/services/graph.js`)
|
||||||
|
|
||||||
|
Thin HTTP client for memory-service graph endpoints. One function:
|
||||||
|
|
||||||
|
- **`getNeighbors(entityIds[])`** — POSTs to `memory-service/graph/neighbors` with the entity IDs from Qdrant entity search. Returns `{ nodes, edges }`. Throws on non-2xx — caller wraps in try/catch with graceful fallback.
|
||||||
|
|
||||||
|
## Models Endpoint
|
||||||
|
|
||||||
|
`GET /models` scans `modelsFolderPath` for `.gguf` files and optionally reads a `models.json` manifest (keyed by filename) for labels and descriptions. File size is reported in GB. Returns 500 if the folder is inaccessible.
|
||||||
|
|
||||||
|
`GET /models/props` proxies `/props` from llama-server and returns `{contextWindow, modelAlias}`. Returns 503 if llama-server is unreachable.
|
||||||
|
|
||||||
|
## Health Check
|
||||||
|
|
||||||
|
`GET /health/services` runs parallel fetch calls to all four dependent services with a 3-second `AbortSignal.timeout` each. Results are returned as an array — the endpoint never returns a non-2xx itself regardless of downstream status.
|
||||||
|
|
||||||
|
## Background Model (qwen2.5:3b)
|
||||||
|
Used for entity/relationship extraction and summarization via Ollama on Mini PC 1. Uses **ChatML format** (`<|im_start|>` / `<|im_end|>`) — not Phi3 format. Use `format: 'json'` only for structured extraction, never for free-text summarization.
|
||||||
|
|
||||||
|
## API Endpoints Quick Reference
|
||||||
|
|
||||||
|
| Method | Path | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| GET | `/health` | Returns service URLs |
|
||||||
|
| GET | `/health/services` | Parallel status of all dependencies |
|
||||||
|
| POST | `/chat` | Blocking completion |
|
||||||
|
| POST | `/chat/stream` | SSE streaming |
|
||||||
|
| GET/PATCH | `/settings` | Persistent settings |
|
||||||
|
| GET | `/models` | `.gguf` file scan |
|
||||||
|
| GET | `/models/props` | llama-server model info |
|
||||||
|
| GET | `/sessions` | Delegates to memory-service |
|
||||||
|
| GET | `/sessions/:sessionId/history` | Paginated episodes by external ID |
|
||||||
|
| PATCH | `/sessions/:sessionId` | `name` and/or `projectId` |
|
||||||
|
| DELETE | `/sessions/:sessionId` | |
|
||||||
|
| GET | `/episodes` | Delegates; supports `q` for FTS |
|
||||||
|
| DELETE | `/episodes/:id` | Delegates |
|
||||||
|
| GET/POST/PATCH/DELETE | `/projects` and `/projects/:id` | Delegates |
|
||||||
|
| POST | `/summaries/project/:projectId/generate` | On-demand; 422 if no data |
|
||||||
|
| GET | `/summaries/project/:projectId/overview` | |
|
||||||
|
| GET | `/summaries/session/:sessionId` | Resolves external ID first |
|
||||||
|
| GET | `/summaries/project/:projectId` | |
|
||||||
@@ -2,34 +2,32 @@ const memory = require("../services/memory");
|
|||||||
const inference = require("../services/inference");
|
const inference = require("../services/inference");
|
||||||
const embedding = require("../services/embedding");
|
const embedding = require("../services/embedding");
|
||||||
const qdrant = require("../services/qdrant");
|
const qdrant = require("../services/qdrant");
|
||||||
const { ORCHESTRATION, logger } = require("@nexusai/shared");
|
const { ORCHESTRATION, RETRIEVAL, logger } = require("@nexusai/shared");
|
||||||
const appSettings = require("../config/settings");
|
const appSettings = require("../config/settings");
|
||||||
const {triggerSummary} = require('../services/summarization')
|
const {triggerSummary} = require('../services/summarization')
|
||||||
|
const graph = require('../services/graph');
|
||||||
|
|
||||||
function buildPrompt(recentEpisodes, semanticEpisodes, entities, userMessage, systemPrompt) {
|
function buildPrompt(guaranteed, selected, neighborhood, userMessage, systemPrompt) {
|
||||||
const parts = [systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT];
|
const parts = [systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT];
|
||||||
|
|
||||||
if (entities.length > 0) {
|
const graphText = formatGraphContext(neighborhood.nodes ?? [], neighborhood.edges ?? []);
|
||||||
parts.push(
|
if (graphText) {
|
||||||
"Here is what you know about entities relevant to this conversation:",
|
parts.push("Here is what you know about entities relevant to this conversation and their connections:");
|
||||||
);
|
parts.push(graphText);
|
||||||
for (const e of entities) {
|
|
||||||
parts.push(`- ${e.name} (${e.type}): ${e.notes}`);
|
|
||||||
}
|
|
||||||
parts.push("---");
|
parts.push("---");
|
||||||
}
|
}
|
||||||
|
|
||||||
if (semanticEpisodes.length > 0) {
|
if (selected.length > 0) {
|
||||||
parts.push("Here are some relevant memories from earlier conversations:");
|
parts.push("Relevant memories from earlier conversations:");
|
||||||
for (const ep of semanticEpisodes) {
|
for (const ep of selected) {
|
||||||
parts.push(`User: ${ep.user_message}\nAssistant: ${ep.ai_response}`);
|
parts.push(`User: ${ep.user_message}\nAssistant: ${ep.ai_response}`);
|
||||||
}
|
}
|
||||||
parts.push("---");
|
parts.push("---");
|
||||||
}
|
}
|
||||||
|
|
||||||
if (recentEpisodes.length > 0) {
|
if (guaranteed.length > 0) {
|
||||||
parts.push(`Here are some relevant memories from your past conversations:`);
|
parts.push("Recent conversation history (most recent exchanges):");
|
||||||
for (const ep of recentEpisodes) {
|
for (const ep of guaranteed) {
|
||||||
parts.push(`User: ${ep.user_message}\nAssistant: ${ep.ai_response}`);
|
parts.push(`User: ${ep.user_message}\nAssistant: ${ep.ai_response}`);
|
||||||
}
|
}
|
||||||
parts.push("--- End of recent memories ---\n");
|
parts.push("--- End of recent memories ---\n");
|
||||||
@@ -54,6 +52,28 @@ function buildNamingPrompt(userMessage, aiResponse) {
|
|||||||
].join("\n");
|
].join("\n");
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function formatGraphContext(nodes, edges) {
|
||||||
|
if (!nodes.length) return null;
|
||||||
|
|
||||||
|
const nodeMap = new Map(nodes.map(n => [n.id, n]));
|
||||||
|
|
||||||
|
// Build outbound adjacency
|
||||||
|
const outbound = new Map(nodes.map(n => [n.id, []]));
|
||||||
|
for (const edge of edges) {
|
||||||
|
if (outbound.has(edge.from_id) && nodeMap.has(edge.to_id)) {
|
||||||
|
const target = nodeMap.get(edge.to_id);
|
||||||
|
outbound.get(edge.from_id).push(`${edge.label} ${target.name} (${target.type})`);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return nodes.map(n => {
|
||||||
|
const lines = [`- ${n.name} (${n.type}): ${n.notes ?? '(no notes)'}`];
|
||||||
|
for (const conn of outbound.get(n.id) ?? []) lines.push(` → ${conn}`);
|
||||||
|
return lines.join('\n');
|
||||||
|
}).join('\n');
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
async function autoNameSession(externalId, userMessage, aiResponse) {
|
async function autoNameSession(externalId, userMessage, aiResponse) {
|
||||||
try {
|
try {
|
||||||
const prompt = buildNamingPrompt(userMessage, aiResponse);
|
const prompt = buildNamingPrompt(userMessage, aiResponse);
|
||||||
@@ -107,28 +127,136 @@ async function getSemanticEpisodes(
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
async function getRelevantEntities(userMessage, projectId=null) {
|
async function getRelevantEntities(userMessage, projectId = null) {
|
||||||
try {
|
try {
|
||||||
const vector = await embedding.embed(userMessage);
|
const vector = await embedding.embed(userMessage);
|
||||||
const results = await qdrant.searchEntities(vector, { projectId });
|
const results = await qdrant.searchEntities(vector, { projectId });
|
||||||
logger.info(
|
logger.info(
|
||||||
"[orchestration] Entity search results:",
|
'[orchestration] Entity search results:',
|
||||||
results.map((r) => ({ name: r.payload?.name, score: r.score })),
|
results.map((r) => ({ name: r.payload?.name, score: r.score })),
|
||||||
);
|
);
|
||||||
return results.map((r) => r.payload).filter(Boolean);
|
// Include the Qdrant point ID (== SQLite entity ID) for graph traversal
|
||||||
|
return results.map((r) => r.payload ? { id: r.id, ...r.payload } : null).filter(Boolean);
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
logger.debug(
|
logger.debug('[orchestration] Entity search failed, continuing without:', err.message);
|
||||||
"[orchestration] Entity search failed, continuing without:",
|
|
||||||
err.message,
|
|
||||||
);
|
|
||||||
return [];
|
return [];
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
async function getFTSResults(userMessage, { limit, sessionIds }) {
|
||||||
|
try {
|
||||||
|
return await memory.searchEpisodes(userMessage, { limit, sessionIds });
|
||||||
|
} catch (err) {
|
||||||
|
logger.warn('[orchestration] FTS search failed, continuing without:', err.message);
|
||||||
|
return [];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Returns {episode, score}[] — scores needed for buildScoredPool downstream
|
||||||
|
function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
|
||||||
|
const k = RETRIEVAL.RRF_K;
|
||||||
|
const scores = new Map();
|
||||||
|
|
||||||
|
semanticEps.forEach((ep, i) => {
|
||||||
|
scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
|
||||||
|
});
|
||||||
|
|
||||||
|
keywordEps.forEach((ep, i) => {
|
||||||
|
const contrib = keywordWeight / (k + i + 1);
|
||||||
|
if (scores.has(ep.id)) {
|
||||||
|
scores.get(ep.id).score += contrib;
|
||||||
|
} else if (contrib > 0) {
|
||||||
|
scores.set(ep.id, { episode: ep, score: contrib });
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
return [...scores.values()]
|
||||||
|
.sort((a, b) => b.score - a.score)
|
||||||
|
.slice(0, limit);
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
function estimateTokens(episode) {
|
||||||
|
return episode.token_count
|
||||||
|
?? Math.ceil((episode.user_message.length + episode.ai_response.length) / 4);
|
||||||
|
}
|
||||||
|
|
||||||
|
function buildScoredPool(fusedWithScores, recentEpisodes, entityBoostedIds, { entityWeight }) {
|
||||||
|
const k = RETRIEVAL.RRF_K;
|
||||||
|
const pool = new Map(); // episode.id → {episode, score}
|
||||||
|
|
||||||
|
for (const { episode, score } of fusedWithScores) {
|
||||||
|
pool.set(episode.id, { episode, score });
|
||||||
|
}
|
||||||
|
|
||||||
|
recentEpisodes.forEach((ep, i) => {
|
||||||
|
const recencyScore = 1.0 / (k + i + 1);
|
||||||
|
if (pool.has(ep.id)) {
|
||||||
|
pool.get(ep.id).score += recencyScore;
|
||||||
|
} else {
|
||||||
|
pool.set(ep.id, { episode: ep, score: recencyScore });
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
for (const id of entityBoostedIds) {
|
||||||
|
if (pool.has(id)) pool.get(id).score += entityWeight;
|
||||||
|
}
|
||||||
|
|
||||||
|
return [...pool.values()].sort((a, b) => b.score - a.score);
|
||||||
|
}
|
||||||
|
|
||||||
|
function selectWithinBudget(scoredPool, contextBudget, minRecentEpisodes, recentEpisodes) {
|
||||||
|
let budget = contextBudget;
|
||||||
|
const sortByTime = (a, b) => a.created_at - b.created_at;
|
||||||
|
|
||||||
|
// Guarantee floor: always include the N most recent episodes
|
||||||
|
const guaranteed = recentEpisodes.slice(0, minRecentEpisodes);
|
||||||
|
const guaranteedIds = new Set(guaranteed.map(ep => ep.id));
|
||||||
|
for (const ep of guaranteed) budget -= estimateTokens(ep);
|
||||||
|
|
||||||
|
// Fill remaining budget from scored pool, highest-priority first
|
||||||
|
const selected = [];
|
||||||
|
for (const { episode } of scoredPool) {
|
||||||
|
if (guaranteedIds.has(episode.id)) continue;
|
||||||
|
const cost = estimateTokens(episode);
|
||||||
|
|
||||||
|
// // Break rather than skip — lower-priority episodes aren't worth fitting over higher-priority ones
|
||||||
|
if (budget - cost < 0) break;
|
||||||
|
selected.push(episode);
|
||||||
|
budget -= cost;
|
||||||
|
}
|
||||||
|
|
||||||
|
return {
|
||||||
|
guaranteed: [...guaranteed].sort(sortByTime),
|
||||||
|
selected: selected.sort(sortByTime),
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
async function getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, settings) {
|
||||||
|
const { semanticLimit, scoreThreshold, semanticWeight, keywordWeight } = settings;
|
||||||
|
const ftsSessionIds = projectSessionIds ?? [session.id];
|
||||||
|
|
||||||
|
const ftsPromise = keywordWeight > 0
|
||||||
|
// FTS and semantic may have significant overlap, so fetching more from FTS gives the fusion step more to work with before deduplication.
|
||||||
|
? getFTSResults(userMessage, { limit: semanticLimit * 2, sessionIds: ftsSessionIds })
|
||||||
|
: Promise.resolve([]);
|
||||||
|
|
||||||
|
const [semanticEps, rawKeywordEps] = await Promise.all([
|
||||||
|
getSemanticEpisodes(userMessage, session.id, recentIds, projectSessionIds, { semanticLimit, scoreThreshold }),
|
||||||
|
ftsPromise,
|
||||||
|
]);
|
||||||
|
|
||||||
|
const keywordEps = rawKeywordEps.filter(ep => !recentIds.has(ep.id));
|
||||||
|
return fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit: semanticLimit });
|
||||||
|
}
|
||||||
|
|
||||||
async function assembleContext(externalId, userMessage) {
|
async function assembleContext(externalId, userMessage) {
|
||||||
const settings = appSettings.load();
|
const settings = appSettings.load();
|
||||||
const { recentEpisodeLimit, semanticLimit, scoreThreshold,
|
const { recentEpisodeLimit, semanticLimit, scoreThreshold,
|
||||||
temperature, repeatPenalty, topP, topK, systemPrompt } = settings;
|
temperature, repeatPenalty, topP, topK, systemPrompt,
|
||||||
|
semanticWeight, keywordWeight,
|
||||||
|
contextBudget, entityWeight, minRecentEpisodes } = settings;
|
||||||
|
|
||||||
// 1. Resolve or create session
|
// 1. Resolve or create session
|
||||||
let session = await memory.getSessionByExternalId(externalId);
|
let session = await memory.getSessionByExternalId(externalId);
|
||||||
@@ -155,14 +283,41 @@ async function assembleContext(externalId, userMessage) {
|
|||||||
const isFirstMessage = recentEpisodes.length === 0;
|
const isFirstMessage = recentEpisodes.length === 0;
|
||||||
const recentIds = new Set(recentEpisodes.map(e => e.id));
|
const recentIds = new Set(recentEpisodes.map(e => e.id));
|
||||||
|
|
||||||
// 4. Semantic + entity search
|
// 4. Fused retrieval + entity search in parallel (both are independent)
|
||||||
const semanticEpisodes = await getSemanticEpisodes(
|
const [fusedWithScores, entityResults] = await Promise.all([
|
||||||
userMessage, session.id, recentIds, projectSessionIds, { semanticLimit, scoreThreshold }
|
getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, { semanticLimit, scoreThreshold, semanticWeight, keywordWeight }),
|
||||||
);
|
getRelevantEntities(userMessage, session.project_id ?? null),
|
||||||
const entities = await getRelevantEntities(userMessage, session.project_id ?? null);
|
]);
|
||||||
|
|
||||||
// 5. Assemble prompt
|
// 5. Entity-linked episode IDs for scoring bonus
|
||||||
const prompt = buildPrompt(recentEpisodes, semanticEpisodes, entities, userMessage, activeSystemPrompt);
|
const entityIds = entityResults.map(e => e.id);
|
||||||
|
let entityBoostedIds = new Set();
|
||||||
|
if (entityIds.length > 0) {
|
||||||
|
try {
|
||||||
|
const result = await memory.getEpisodesByEntities(entityIds);
|
||||||
|
entityBoostedIds = new Set(result.episodeIds);
|
||||||
|
} catch (err) {
|
||||||
|
logger.debug('[orchestration] Entity-episode lookup failed, skipping bonus:', err.message);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// 6. Build unified scored pool and select within token budget
|
||||||
|
const scoredPool = buildScoredPool(fusedWithScores, recentEpisodes, entityBoostedIds, { entityWeight });
|
||||||
|
const { guaranteed, selected } = selectWithinBudget(scoredPool, contextBudget, minRecentEpisodes, recentEpisodes);
|
||||||
|
|
||||||
|
// 7. Graph neighborhood expansion
|
||||||
|
let neighborhood = { nodes: [], edges: [] };
|
||||||
|
if (entityIds.length > 0) {
|
||||||
|
try {
|
||||||
|
neighborhood = await graph.getNeighbors(entityIds);
|
||||||
|
} catch (err) {
|
||||||
|
logger.warn('[orchestration] Graph neighborhood fetch failed, falling back to flat entities:', err.message);
|
||||||
|
neighborhood = { nodes: entityResults, edges: [] };
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// 8. Assemble prompt
|
||||||
|
const prompt = buildPrompt(guaranteed, selected, neighborhood, userMessage, activeSystemPrompt);
|
||||||
|
|
||||||
return {
|
return {
|
||||||
session,
|
session,
|
||||||
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
const fs = require('fs');
|
const fs = require('fs');
|
||||||
const path = require('path');
|
const path = require('path');
|
||||||
const { getEnv, ORCHESTRATION, INFERENCE_DEFAULTS } = require('@nexusai/shared');
|
const { getEnv, ORCHESTRATION, INFERENCE_DEFAULTS, RETRIEVAL } = require('@nexusai/shared');
|
||||||
|
|
||||||
const SETTINGS_PATH = path.join(__dirname, '../../data/settings.json');
|
const SETTINGS_PATH = path.join(__dirname, '../../data/settings.json');
|
||||||
|
|
||||||
@@ -14,6 +14,11 @@ const DEFAULTS = {
|
|||||||
topP: INFERENCE_DEFAULTS.TOP_P,
|
topP: INFERENCE_DEFAULTS.TOP_P,
|
||||||
topK: INFERENCE_DEFAULTS.TOP_K,
|
topK: INFERENCE_DEFAULTS.TOP_K,
|
||||||
systemPrompt: ORCHESTRATION.SYSTEM_PROMPT,
|
systemPrompt: ORCHESTRATION.SYSTEM_PROMPT,
|
||||||
|
semanticWeight: RETRIEVAL.SEMANTIC_WEIGHT,
|
||||||
|
keywordWeight: RETRIEVAL.KEYWORD_WEIGHT,
|
||||||
|
contextBudget: ORCHESTRATION.CONTEXT_BUDGET,
|
||||||
|
entityWeight: ORCHESTRATION.ENTITY_WEIGHT,
|
||||||
|
minRecentEpisodes: ORCHESTRATION.MIN_RECENT_EPISODES,
|
||||||
};
|
};
|
||||||
|
|
||||||
function load() {
|
function load() {
|
||||||
|
|||||||
@@ -15,7 +15,7 @@ const summariesRouter = require('./routes/summaries')
|
|||||||
const cors = require('cors');
|
const cors = require('cors');
|
||||||
|
|
||||||
const app = express();
|
const app = express();
|
||||||
app.use(express.json());
|
app.use(express.json({ limit: '2mb' }));
|
||||||
|
|
||||||
app.use(cors({
|
app.use(cors({
|
||||||
origin: [
|
origin: [
|
||||||
|
|||||||
@@ -80,6 +80,41 @@ if (req.body.systemPrompt !== undefined) {
|
|||||||
updates.systemPrompt = val.trim() || null; // null reverts to default
|
updates.systemPrompt = val.trim() || null; // null reverts to default
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if (req.body.semanticWeight !== undefined) {
|
||||||
|
const val = Number(req.body.semanticWeight);
|
||||||
|
if (isNaN(val) || val < 0 || val > 5)
|
||||||
|
return res.status(400).json({ error: 'semanticWeight must be 0–5' });
|
||||||
|
updates.semanticWeight = val;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (req.body.keywordWeight !== undefined) {
|
||||||
|
const val = Number(req.body.keywordWeight);
|
||||||
|
if (isNaN(val) || val < 0 || val > 5)
|
||||||
|
return res.status(400).json({ error: 'keywordWeight must be 0–5' });
|
||||||
|
updates.keywordWeight = val;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (req.body.contextBudget !== undefined) {
|
||||||
|
const val = Number(req.body.contextBudget);
|
||||||
|
if (!Number.isInteger(val) || val < 512 || val > 32768)
|
||||||
|
return res.status(400).json({ error: 'contextBudget must be 512–32768' });
|
||||||
|
updates.contextBudget = val;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (req.body.entityWeight !== undefined) {
|
||||||
|
const val = Number(req.body.entityWeight);
|
||||||
|
if (isNaN(val) || val < 0 || val > 2)
|
||||||
|
return res.status(400).json({ error: 'entityWeight must be 0–2' });
|
||||||
|
updates.entityWeight = val;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (req.body.minRecentEpisodes !== undefined) {
|
||||||
|
const val = Number(req.body.minRecentEpisodes);
|
||||||
|
if (!Number.isInteger(val) || val < 0 || val > 10)
|
||||||
|
return res.status(400).json({ error: 'minRecentEpisodes must be 0–10' });
|
||||||
|
updates.minRecentEpisodes = val;
|
||||||
|
}
|
||||||
|
|
||||||
res.json(settings.save(updates));
|
res.json(settings.save(updates));
|
||||||
});
|
});
|
||||||
|
|
||||||
|
|||||||
15
packages/orchestration-service/src/services/graph.js
Normal file
15
packages/orchestration-service/src/services/graph.js
Normal file
@@ -0,0 +1,15 @@
|
|||||||
|
const { getEnv, SERVICES } = require('@nexusai/shared');
|
||||||
|
|
||||||
|
const MEMORY_URL = getEnv('MEMORY_SERVICE_URL', SERVICES.MEMORY_URL);
|
||||||
|
|
||||||
|
async function getNeighbors(entityIds) {
|
||||||
|
const res = await fetch(`${MEMORY_URL}/graph/neighbors`, {
|
||||||
|
method: 'POST',
|
||||||
|
headers: { 'Content-Type': 'application/json' },
|
||||||
|
body: JSON.stringify({ entityIds }),
|
||||||
|
});
|
||||||
|
if (!res.ok) throw new Error(`Graph neighbors error: ${res.status}`);
|
||||||
|
return res.json();
|
||||||
|
}
|
||||||
|
|
||||||
|
module.exports = { getNeighbors };
|
||||||
@@ -196,6 +196,26 @@ async function getProjectOverviewSummary(projectId) {
|
|||||||
return res.json(); // null if none exists yet
|
return res.json(); // null if none exists yet
|
||||||
}
|
}
|
||||||
|
|
||||||
|
async function searchEpisodes(query, { limit = 10, sessionIds = null } = {}) {
|
||||||
|
const url = new URL(`${BASE_URL}/episodes/search`);
|
||||||
|
url.searchParams.set('q', query);
|
||||||
|
url.searchParams.set('limit', limit);
|
||||||
|
if (sessionIds?.length) url.searchParams.set('sessionIds', sessionIds.join(','));
|
||||||
|
const res = await fetch(url.toString());
|
||||||
|
if (!res.ok) throw new Error(`FTS search error: ${res.status}`);
|
||||||
|
return res.json();
|
||||||
|
}
|
||||||
|
|
||||||
|
async function getEpisodesByEntities(entityIds) {
|
||||||
|
const res = await fetch(`${BASE_URL}/episodes/by-entities`, {
|
||||||
|
method: 'POST',
|
||||||
|
headers: { 'Content-Type': 'application/json' },
|
||||||
|
body: JSON.stringify({ entityIds }),
|
||||||
|
});
|
||||||
|
if (!res.ok) throw new Error(`Episodes-by-entities error: ${res.status}`);
|
||||||
|
return res.json(); // { episodeIds: [...] }
|
||||||
|
}
|
||||||
|
|
||||||
module.exports = {
|
module.exports = {
|
||||||
getSessionByExternalId,
|
getSessionByExternalId,
|
||||||
createSession,
|
createSession,
|
||||||
@@ -220,4 +240,6 @@ module.exports = {
|
|||||||
getSummariesByProject,
|
getSummariesByProject,
|
||||||
generateProjectSummary,
|
generateProjectSummary,
|
||||||
getProjectOverviewSummary,
|
getProjectOverviewSummary,
|
||||||
|
searchEpisodes,
|
||||||
|
getEpisodesByEntities,
|
||||||
}
|
}
|
||||||
@@ -28,6 +28,9 @@ const ORCHESTRATION = {
|
|||||||
ENTITIES_LIMIT: 5,
|
ENTITIES_LIMIT: 5,
|
||||||
ENTITIES_THRESHOLD: 0.55,
|
ENTITIES_THRESHOLD: 0.55,
|
||||||
TEMPERATURE: 0.7,
|
TEMPERATURE: 0.7,
|
||||||
|
CONTEXT_BUDGET: 4096,
|
||||||
|
ENTITY_WEIGHT: 0.5,
|
||||||
|
MIN_RECENT_EPISODES: 2,
|
||||||
CORS_ORIGIN: 'http://localhost:5173',
|
CORS_ORIGIN: 'http://localhost:5173',
|
||||||
SYSTEM_PROMPT: `You are a helpful, context-aware AI assistant. You have access to memories of past conversations with the user. Use them to provide consistent, personalised responses.`
|
SYSTEM_PROMPT: `You are a helpful, context-aware AI assistant. You have access to memories of past conversations with the user. Use them to provide consistent, personalised responses.`
|
||||||
}
|
}
|
||||||
@@ -79,8 +82,27 @@ const SUMMARIES = {
|
|||||||
|
|
||||||
const ENTITIES = {
|
const ENTITIES = {
|
||||||
TEMPERATURE: 0.1, // Low temperature, more precise extraction, less creative
|
TEMPERATURE: 0.1, // Low temperature, more precise extraction, less creative
|
||||||
NUM_PREDICT: 1024, // Max tokens to consider for entity extraction (e.g. recent conversation)
|
NUM_PREDICT: 1500, // Max tokens to consider for entity extraction (e.g. recent conversation)
|
||||||
THRESHOLD: 0.55, // Minimum confidence score for an extracted entity to be included in the results
|
THRESHOLD: 0.55, // Minimum confidence score for an extracted entity to be included in the results
|
||||||
|
PROMOTION_THRESHOLD: 3, // mention_count threshold before entity is considered well-established
|
||||||
|
GRAPH_HOP_DEPTH: 1, // Default traversal depth for neighborhood queries
|
||||||
|
TYPES: [
|
||||||
|
'person',
|
||||||
|
'place',
|
||||||
|
'project',
|
||||||
|
'technology',
|
||||||
|
'concept',
|
||||||
|
'organization',
|
||||||
|
'character',
|
||||||
|
'event',
|
||||||
|
'topic'
|
||||||
|
],
|
||||||
|
}
|
||||||
|
|
||||||
|
const RETRIEVAL = {
|
||||||
|
RRF_K: 60, // Reciprocal Rank Fusion smoothing constant, softens rank-1 advantage, not exposed in settings
|
||||||
|
SEMANTIC_WEIGHT: 1.0, // Weight applied to semantic (QDrant) results
|
||||||
|
KEYWORD_WEIGHT: 0, // Weight applied to keyword (SQLite) results, 0 = disables, set >0 to enable and tune balance between semantic vs keyword matches
|
||||||
}
|
}
|
||||||
|
|
||||||
module.exports = {
|
module.exports = {
|
||||||
@@ -95,5 +117,6 @@ module.exports = {
|
|||||||
SQLITE,
|
SQLITE,
|
||||||
ORCHESTRATION,
|
ORCHESTRATION,
|
||||||
SUMMARIES,
|
SUMMARIES,
|
||||||
ENTITIES
|
ENTITIES,
|
||||||
|
RETRIEVAL,
|
||||||
};
|
};
|
||||||
@@ -1,5 +1,5 @@
|
|||||||
const {getEnv} = require('./config/env');
|
const {getEnv} = require('./config/env');
|
||||||
const {QDRANT, COLLECTIONS, EPISODIC, SERVICES, OLLAMA, PORTS, LLAMACPP, INFERENCE_DEFAULTS, SQLITE, ORCHESTRATION, SUMMARIES, ENTITIES } = require('./config/constants');
|
const {QDRANT, COLLECTIONS, EPISODIC, SERVICES, OLLAMA, PORTS, LLAMACPP, INFERENCE_DEFAULTS, SQLITE, ORCHESTRATION, SUMMARIES, ENTITIES, RETRIEVAL } = require('./config/constants');
|
||||||
const {parseRow, formatEpisodeText} = require('./utils')
|
const {parseRow, formatEpisodeText} = require('./utils')
|
||||||
const logger = require('./utils/logger');
|
const logger = require('./utils/logger');
|
||||||
|
|
||||||
@@ -20,4 +20,5 @@ module.exports = {
|
|||||||
SUMMARIES,
|
SUMMARIES,
|
||||||
ENTITIES,
|
ENTITIES,
|
||||||
logger,
|
logger,
|
||||||
|
RETRIEVAL,
|
||||||
};
|
};
|
||||||
67
test-fusion.js
Normal file
67
test-fusion.js
Normal file
@@ -0,0 +1,67 @@
|
|||||||
|
// test-fusion.js
|
||||||
|
const { RETRIEVAL } = require('./packages/shared/src/config/constants');
|
||||||
|
|
||||||
|
function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
|
||||||
|
const k = RETRIEVAL.RRF_K;
|
||||||
|
const scores = new Map();
|
||||||
|
semanticEps.forEach((ep, i) => {
|
||||||
|
scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
|
||||||
|
});
|
||||||
|
keywordEps.forEach((ep, i) => {
|
||||||
|
const contrib = keywordWeight / (k + i + 1);
|
||||||
|
if (scores.has(ep.id)) {
|
||||||
|
scores.get(ep.id).score += contrib;
|
||||||
|
} else if (contrib > 0) {
|
||||||
|
scores.set(ep.id, { episode: ep, score: contrib });
|
||||||
|
}
|
||||||
|
});
|
||||||
|
return [...scores.values()]
|
||||||
|
.sort((a, b) => b.score - a.score)
|
||||||
|
.slice(0, limit)
|
||||||
|
.map(({ episode }) => episode);
|
||||||
|
}
|
||||||
|
|
||||||
|
// --- Test 1: episodes in both lists rank highest ---
|
||||||
|
const semantic = [
|
||||||
|
{ id: 1, user_message: 'ep1 — semantic only, rank 1' },
|
||||||
|
{ id: 2, user_message: 'ep2 — in both lists, rank 2 semantic' },
|
||||||
|
{ id: 3, user_message: 'ep3 — in both lists, rank 3 semantic' },
|
||||||
|
];
|
||||||
|
const keyword = [
|
||||||
|
{ id: 3, user_message: 'ep3 — rank 1 FTS' },
|
||||||
|
{ id: 2, user_message: 'ep2 — rank 2 FTS' },
|
||||||
|
{ id: 4, user_message: 'ep4 — FTS only, rank 3' },
|
||||||
|
];
|
||||||
|
|
||||||
|
const result = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 1, limit: 5 });
|
||||||
|
console.log('Test 1 — equal weights, episodes in both lists should rank highest:');
|
||||||
|
result.forEach((ep, i) => console.log(` ${i + 1}. id=${ep.id} "${ep.user_message}"`));
|
||||||
|
console.assert(result[0].id === 2 || result[0].id === 3, 'FAIL: ep2 or ep3 should be rank 1');
|
||||||
|
console.assert(!result.find(e => e.id === 1) || result.indexOf(result.find(e => e.id === 1)) > result.indexOf(result.find(e => e.id === 2)), 'FAIL: ep1 (semantic only) should rank below ep2');
|
||||||
|
console.log(' PASS\n');
|
||||||
|
|
||||||
|
// --- Test 2: keywordWeight:0 → pure semantic passthrough ---
|
||||||
|
const result2 = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 0, limit: 5 });
|
||||||
|
console.log('Test 2 — keywordWeight:0 should return only semantic results in original order:');
|
||||||
|
result2.forEach((ep, i) => console.log(` ${i + 1}. id=${ep.id}`));
|
||||||
|
console.assert(result2.length === 3, `FAIL: expected 3, got ${result2.length}`);
|
||||||
|
console.assert(result2[0].id === 1, 'FAIL: ep1 should be rank 1');
|
||||||
|
console.assert(result2[1].id === 2, 'FAIL: ep2 should be rank 2');
|
||||||
|
console.log(' PASS\n');
|
||||||
|
|
||||||
|
// --- Test 3: limit is respected ---
|
||||||
|
const result3 = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 1, limit: 2 });
|
||||||
|
console.log('Test 3 — limit:2 should return exactly 2 results:');
|
||||||
|
console.assert(result3.length === 2, `FAIL: expected 2, got ${result3.length}`);
|
||||||
|
console.log(' PASS\n');
|
||||||
|
|
||||||
|
// --- Test 4: no overlap → all unique episodes, ordered by individual contribution ---
|
||||||
|
const semOnly = [{ id: 10, user_message: 'sem' }];
|
||||||
|
const ftsOnly = [{ id: 20, user_message: 'fts' }];
|
||||||
|
const result4 = fuseEpisodeResults(semOnly, ftsOnly, { semanticWeight: 1, keywordWeight: 1, limit: 5 });
|
||||||
|
console.log('Test 4 — no overlap, both should appear:');
|
||||||
|
console.assert(result4.length === 2, `FAIL: expected 2, got ${result4.length}`);
|
||||||
|
console.assert(result4[0].id === 10, 'FAIL: semantic rank-1 should beat fts rank-1 (same weight, both rank 1, but semantic inserted first — tie goes to semantic)');
|
||||||
|
console.log(' PASS\n');
|
||||||
|
|
||||||
|
console.log('All tests passed.');
|
||||||
Reference in New Issue
Block a user