Compare commits
20 Commits
c86b565eed
...
main
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
e4908193bd | ||
|
|
b58a4e4692 | ||
|
|
055683424d | ||
|
|
27ad614130 | ||
|
|
8ade5c68ca | ||
|
|
49982a85de | ||
|
|
9c6c5c9a42 | ||
|
|
c9cbac87ac | ||
|
|
1a97b19280 | ||
|
|
9fe8e568cf | ||
|
|
5ad01c6ad8 | ||
|
|
aac0923351 | ||
|
|
54218894c0 | ||
|
|
66a95f4479 | ||
| 78476e166f | |||
|
|
696ead29f8 | ||
|
|
45db47a584 | ||
|
|
095c9a623e | ||
|
|
f5011fddca | ||
|
|
86e78cc4c6 |
1
.gitignore
vendored
1
.gitignore
vendored
@@ -5,4 +5,5 @@ data/
|
||||
.env
|
||||
.env.*
|
||||
*.db
|
||||
.claude/settings.local.json
|
||||
EOF
|
||||
2
.vscode/settings.json
vendored
2
.vscode/settings.json
vendored
@@ -1,2 +0,0 @@
|
||||
{
|
||||
}
|
||||
108
CLAUDE.md
Normal file
108
CLAUDE.md
Normal file
@@ -0,0 +1,108 @@
|
||||
# CLAUDE.md
|
||||
|
||||
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||
|
||||
## Development Commands
|
||||
|
||||
```bash
|
||||
# Start individual services
|
||||
npm run memory # Memory Service (port 3002)
|
||||
npm run embedding # Embedding Service (port 3003)
|
||||
npm run inference # Inference Service (port 3001)
|
||||
npm run orchestration # Orchestration Service (port 4000)
|
||||
npm run mini1 # Start memory + embedding concurrently
|
||||
|
||||
# Per-service dev mode (with --watch)
|
||||
npm -w packages/<service-name> run dev
|
||||
|
||||
# Chat client
|
||||
npm -w packages/chat-client run dev # Vite dev server (port 5173)
|
||||
npm -w packages/chat-client run build # Production build
|
||||
```
|
||||
|
||||
No test framework or linter is configured.
|
||||
|
||||
## Architecture Overview
|
||||
|
||||
NexusAI is a **modular AI assistant** with persistent, project-scoped memory. It's a Node.js monorepo (`npm workspaces`) with 4 independent backend services, 1 React frontend, and 1 shared package.
|
||||
|
||||
### Services
|
||||
|
||||
| Package | Port | Role |
|
||||
|---|---|---|
|
||||
| `orchestration-service` | 4000 | Central gateway; coordinates all others |
|
||||
| `memory-service` | 3002 | SQLite + Qdrant hybrid memory |
|
||||
| `embedding-service` | 3003 | Text embeddings via Ollama (`nomic-embed-text`, 768-dim) |
|
||||
| `inference-service` | 3001 | LLM inference (Ollama or llama.cpp) |
|
||||
| `chat-client` | 5173 | React/Vite frontend |
|
||||
| `shared` | — | Constants, env helpers, logger, formatters |
|
||||
|
||||
All inter-service communication is **REST HTTP only** — no message queues or WebSockets.
|
||||
|
||||
### Chat Request Flow
|
||||
|
||||
1. Client POSTs to orchestration `/chat/stream`
|
||||
2. Orchestration resolves session, fetches **recent episodes** (SQLite) + **semantic episodes** (Qdrant vector search) + **entities** (Qdrant, scoped by project)
|
||||
3. Embedding computed for user message (embedding-service)
|
||||
4. Prompt assembled: system message → entities → semantic memories → recent episodes → user message
|
||||
5. Inference streams response (inference-service)
|
||||
6. Episode stored in SQLite + Qdrant (fire-and-forget embedding)
|
||||
7. Entity extraction triggered async (qwen2.5:3b via inference-service)
|
||||
8. Auto-summarization checked (threshold: 200+ tokens, re-triggers every 5 episodes)
|
||||
9. Auto-naming on first message (temp 0.3, 20 tokens max)
|
||||
|
||||
### Memory Model
|
||||
|
||||
**Dual store — neither works alone:**
|
||||
- **SQLite** (`better-sqlite3`, synchronous) — Full content: sessions, episodes, entities, relationships, projects, summaries, FTS5 index
|
||||
- **Qdrant** — Vector embeddings for semantic search; IDs used to fetch full content from SQLite afterward
|
||||
|
||||
Orchestration queries Qdrant directly (bypasses memory-service) for performance, then fetches full episode content from memory-service by ID.
|
||||
|
||||
**Project-scoped isolation:** Sessions grouped into projects; Qdrant queries use `should` filter on session IDs to enforce memory boundaries. Non-project sessions share a common pool.
|
||||
|
||||
### Key File Locations
|
||||
|
||||
**Orchestration** (`packages/orchestration-service/src/`):
|
||||
- `chat/index.js` — Core prompt building and memory assembly
|
||||
- `routes/` — HTTP endpoints: chat, sessions, projects, episodes, models, settings, summaries
|
||||
- `services/` — Thin HTTP clients for memory, embedding, inference, and direct Qdrant access
|
||||
- `config/settings.js` — Loads/saves `data/settings.json` (user-tunable: model params, thresholds, system prompt)
|
||||
|
||||
**Memory** (`packages/memory-service/src/`):
|
||||
- `db/schema.js` — SQLite table definitions (source of truth for data model)
|
||||
- `episodic/` — Episode CRUD
|
||||
- `semantic/` — Qdrant operations
|
||||
- `entities/` — Entity extraction + CRUD
|
||||
- `summarization/` — Project summary generation
|
||||
|
||||
**Shared** (`packages/shared/src/`):
|
||||
- `config/constants.js` — All tunables (ports, thresholds, model names, vector size)
|
||||
- `config/env.js` — `getEnv()` helper with fallback to constants
|
||||
- `utils.js` — `parseRow()`, `formatEpisodeText()`, `logger`
|
||||
|
||||
**Frontend** (`packages/chat-client/src/`):
|
||||
- `App.jsx` — View router and top-level state (views: home, chat, all-chats, all-projects, project, memory, summaries, settings)
|
||||
- `hooks/` — `useChat`, `useSession`, `useModels`, `useProjects`, `useSettings`, `useContextMenu`
|
||||
- `api/orchestration.js` — Fetch wrapper for all API calls
|
||||
- Vite proxy points to `192.168.0.205:4000` (Mini PC 2 / orchestration)
|
||||
|
||||
### Configuration
|
||||
|
||||
Each service uses `.env` via `dotenv`, falling back to `packages/shared/src/config/constants.js`. The orchestration service also serves `data/settings.json` to the frontend via `/settings` — this is the single source of truth for user-facing inference parameters and system prompt.
|
||||
|
||||
### Deployment
|
||||
|
||||
Home lab across 3 nodes, managed with Docker Compose:
|
||||
- **Main PC** — RTX A4000 (inference via llama.cpp)
|
||||
- **Mini PC 1** — memory + embedding services, Qdrant, Ollama
|
||||
- **Mini PC 2** — orchestration + chat client, Caddy reverse proxy + Authelia SSO
|
||||
|
||||
Docker Compose files: `docker-compose.mini1.yml`, `docker-compose.mini2.yml`. All services expose `/health`. Deployment docs: `docs/deployment/homelab.md`.
|
||||
|
||||
## Key Development Principles
|
||||
|
||||
- **Layer-by-layer validation** — always build and test backend → orchestration → frontend in sequence, curl-testing each layer before proceeding
|
||||
- **New orchestration routes require changes in four places**: route file, `orchestration-service/src/index.js`, Caddyfile on Mini PC 2 (`192.168.0.205`), and `vite.config.js` in the chat client
|
||||
- **All services read settings on every request** — no restart required for config changes
|
||||
- **Backend-first development** — data layer → service endpoints → orchestration proxy → frontend
|
||||
@@ -73,8 +73,8 @@ service by ID after the vector search.
|
||||
|
||||
The core four-service architecture is complete and operational. Key capabilities:
|
||||
|
||||
- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
|
||||
- **Entity layer** — automatic extraction of named entities from conversations via qwen2.5:3b, stored in SQLite and Qdrant, injected into every prompt as structured knowledge
|
||||
- **Retrieval fusion** — Reciprocal Rank Fusion (RRF) merges semantic (Qdrant vector search) and keyword (SQLite FTS5) episode retrieval into a single ranked result set. Weights are configurable per strategy via settings; keyword search is off by default (`keywordWeight: 0`) and can be enabled without a service restart
|
||||
- **Entity layer + Knowledge graph** — automatic extraction of named entities and relationships from conversations via qwen2.5:3b. Entities and relationships are stored in SQLite with `mention_count` tracking. A graph traversal layer expands Qdrant entity search hits into a 1-hop neighborhood subgraph, injecting structured connected knowledge into every prompt
|
||||
- **Projects** — sessions grouped with shared or isolated memory pools
|
||||
- **Auto-naming** — sessions named automatically from first exchange via inference
|
||||
- **Project-scoped semantic search** — Qdrant filtered by project session IDs
|
||||
|
||||
@@ -202,7 +202,9 @@ Returns `503` if llama-server is unreachable.
|
||||
|---|---|---|---|
|
||||
| `recentEpisodeLimit` | integer | 1–20 | Recent episodes injected into prompt |
|
||||
| `semanticLimit` | integer | 1–20 | Max semantic search results |
|
||||
| `scoreThreshold` | float | 0–1 | Minimum similarity score |
|
||||
| `scoreThreshold` | float | 0–1 | Minimum similarity score for Qdrant results |
|
||||
| `semanticWeight` | float | 0–5 | RRF weight for Qdrant semantic results |
|
||||
| `keywordWeight` | float | 0–5 | RRF weight for FTS5 keyword results (`0` = disabled) |
|
||||
| `modelsFolderPath` | string | — | Path to folder containing .gguf files |
|
||||
| `temperature` | float | 0–2 | Inference randomness |
|
||||
| `repeatPenalty` | float | 1–2 | Repeat token penalty |
|
||||
@@ -360,13 +362,34 @@ Same request/response shape as orchestration `/projects` above.
|
||||
|
||||
**DELETE /relationships — body:**
|
||||
```json
|
||||
{ "fromId": 1, "toId": 2, "label": "uses" }
|
||||
{ "fromId": 1, "toId": 2, "label": "works_on", "notes": "Alice is the primary developer.", "metadata": {} }
|
||||
```
|
||||
notes is optional. label should be a snake_case verb. Relationship is identified by the composite key (fromId, toId, label) — re-submitting with the same key increments mention_count and preserves existing notes if the new value is null.
|
||||
|
||||
Relationships are identified by the composite key `(fromId, toId, label)`.
|
||||
Delete uses request body rather than URL params since this three-part key
|
||||
is awkward to encode in a path.
|
||||
|
||||
### Graph
|
||||
|
||||
| Method | Path | Description |
|
||||
|---|---|---|
|
||||
| GET | /graph/neighborhood/:entityId | Entity neighborhood — nodes + edges within N hops |
|
||||
| POST | /graph/neighbors | Bulk 1-hop neighborhood for a set of entity IDs |
|
||||
|
||||
**GET /graph/neighborhood/:entityId — query params:**
|
||||
|
||||
| Param | Default | Max | Description |
|
||||
|---|---|---|---|
|
||||
| depth | 1 | 3 | Traversal depth |
|
||||
|
||||
Returns `{ entity, neighborhood: { nodes, edges } }`. Returns `404` if entity not found.
|
||||
|
||||
**POST /graph/neighbors — body:**
|
||||
```json
|
||||
{ "entityIds": [5, 8, 12] }
|
||||
Returns { nodes: [...], edges: [...] }. Used internally by orchestration — not a client-facing endpoint.
|
||||
|
||||
---
|
||||
|
||||
## Embedding Service — port 3003
|
||||
|
||||
@@ -39,36 +39,36 @@
|
||||
|
||||
---
|
||||
|
||||
## Phase 1 — Loose Ends & Stability
|
||||
## Phase 1 — Loose Ends & Stability - COMPLETE ✅
|
||||
*Target: Next development session (Saturday)*
|
||||
|
||||
### Bug Fixes
|
||||
- [ ] **Entity extraction JSON parsing** — robustify response parser in `extraction.js` to handle model returning markdown fences or preamble around JSON
|
||||
- [ ] **Qdrant entity search empty results** — verify entities embedded post-isolation-fix are surfacing correctly in project session searches
|
||||
✅ **Entity extraction JSON parsing** — robustify response parser in `extraction.js` to handle model returning markdown fences or preamble around JSON
|
||||
✅ **Qdrant entity search empty results** — verify entities embedded post-isolation-fix are surfacing correctly in project session searches
|
||||
|
||||
### Tech Debt
|
||||
- [ ] **Logging** — introduce `LOG_LEVEL` env var across all services; reduce noise in production
|
||||
- [ ] **Error response consistency** — audit all endpoints for uniform `{ error, detail }` shape
|
||||
- [ ] **Constants audit** — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
|
||||
- [ ] **Orchestration `chat/index.js` review** — extract any logic that has grown beyond its intended scope into dedicated modules
|
||||
✅ **Logging** — introduce `LOG_LEVEL` env var across all services; reduce noise in production
|
||||
✅ **Error response consistency** — audit all endpoints for uniform `{ error, detail }` shape
|
||||
✅ **Constants audit** — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
|
||||
✅ **Orchestration `chat/index.js` review** — extract any logic that has grown beyond its intended scope into dedicated modules
|
||||
|
||||
---
|
||||
|
||||
## Phase 2 — Memory System Upgrades
|
||||
*The core intelligence layer*
|
||||
|
||||
### 1. Knowledge Graph (SQLite)
|
||||
### 1. Knowledge Graph (SQLite) ✅
|
||||
The highest-leverage memory upgrade. Transforms NexusAI from "remembers conversations" to "understands relationships between things."
|
||||
- [ ] Graph schema — `nodes` and `edges` tables with typed relationships
|
||||
- [ ] Entity → node promotion pipeline
|
||||
- [ ] Relationship traversal queries
|
||||
- [ ] Graph-aware context assembly in orchestration
|
||||
- [x] Graph schema — `nodes` and `edges` tables with typed relationships
|
||||
- [x] Entity → node promotion pipeline (`mention_count` tracked; threshold gating deferred to Phase 2)
|
||||
- [x] Relationship traversal queries
|
||||
- [x] Graph-aware context assembly in orchestration
|
||||
|
||||
### 2. Retrieval Fusion + Full-Text Search
|
||||
### 2. Retrieval Fusion + Full-Text Search ✅
|
||||
Multi-strategy retrieval merged into a single ranked result set.
|
||||
- [ ] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
|
||||
- [ ] Configurable weights per retrieval strategy
|
||||
- [ ] Score threshold tuning per collection
|
||||
- [x] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
|
||||
- [x] Configurable weights per retrieval strategy (`semanticWeight`, `keywordWeight` via `PATCH /settings`)
|
||||
- [x] Score threshold retained per-strategy; FTS scoped to session/project sessions; `keywordWeight: 0` default (disabled until tuned)
|
||||
|
||||
### 3. Memory Consolidation Lifecycle
|
||||
Prevents long-term memory degradation and enables compression.
|
||||
|
||||
@@ -1,178 +1,140 @@
|
||||
# Memory Service
|
||||
# Entity Extraction
|
||||
|
||||
**Package:** `@nexusai/memory-service`
|
||||
**Location:** `packages/memory-service`
|
||||
**Deployed on:** Mini PC 1 (192.168.0.81)
|
||||
**Port:** 3002
|
||||
**Location:** `packages/memory-service/src/entities/extraction.js`
|
||||
**Triggered by:** Episode creation (`POST /episodes`)
|
||||
**Model:** `qwen2.5:3b` via Ollama (configurable via `EXTRACTION_MODEL` env var)
|
||||
|
||||
## Purpose
|
||||
|
||||
Responsible for all reading and writing of long-term memory. Acts as the
|
||||
sole interface to both SQLite and Qdrant — no other service accesses these
|
||||
stores directly. On episode creation, automatically calls the embedding
|
||||
service to generate and store a vector in Qdrant.
|
||||
After each episode is saved to SQLite, the extraction pipeline runs
|
||||
asynchronously in the background to identify named entities and the
|
||||
relationships between them. Results are written back to SQLite and
|
||||
embedded into Qdrant — the episode response is never delayed.
|
||||
|
||||
## Dependencies
|
||||
## Trigger
|
||||
|
||||
- `express` — HTTP API
|
||||
- `better-sqlite3` — SQLite driver
|
||||
- `@qdrant/js-client-rest` — Qdrant vector store client
|
||||
- `dotenv` — environment variable loading
|
||||
- `@nexusai/shared` — shared utilities and constants
|
||||
|
||||
## Environment Variables
|
||||
|
||||
| Variable | Required | Default | Description |
|
||||
|---|---|---|---|
|
||||
| PORT | No | 3002 | Port to listen on |
|
||||
| SQLITE_PATH | Yes | — | Path to SQLite database file |
|
||||
| QDRANT_URL | No | http://localhost:6333 | Qdrant instance URL |
|
||||
| EMBEDDING_SERVICE_URL | No | http://localhost:3003 | Embedding service URL |
|
||||
| EXTRACTION_URL | No | http://localhost:11434 | Ollama URL for entity extraction |
|
||||
| EXTRACTION_MODEL | No | qwen2.5:3b | Ollama model used for entity extraction |
|
||||
|
||||
## Internal Structure
|
||||
|
||||
```
|
||||
src/
|
||||
├── db/
|
||||
│ ├── index.js # SQLite connection + initialization + migrations
|
||||
│ ├── schema.js # Table definitions, indexes, FTS5, triggers
|
||||
│ ├── projects.js # Project CRUD functions
|
||||
│ └── summaries.js # Summary CRUD functions
|
||||
├── episodic/
|
||||
│ └── index.js # Session + episode CRUD, FTS search, embedding write path
|
||||
├── semantic/
|
||||
│ └── index.js # Qdrant collection management, upsert, search, delete
|
||||
├── entities/
|
||||
│ ├── index.js # Entity + relationship CRUD
|
||||
│ └── extraction.js # Automatic entity extraction via qwen2.5:3b on Ollama
|
||||
└── index.js # Express app + all route definitions
|
||||
```
|
||||
|
||||
## SQLite Schema
|
||||
|
||||
Seven core tables:
|
||||
|
||||
- **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
|
||||
- **episodes** — individual exchanges (user message + AI response) tied to a session
|
||||
- **entities** — named things the system learns about (people, places, concepts)
|
||||
- **relationships** — directional labeled links between entities
|
||||
- **summaries** — condensed episode groups for efficient context retrieval
|
||||
- **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`, `notes`, `system_prompt`
|
||||
|
||||
### Migrations
|
||||
|
||||
Schema changes that cannot use `CREATE TABLE IF NOT EXISTS` are applied as
|
||||
idempotent migrations in `db/index.js` at startup:
|
||||
`createEpisode()` in `episodic/index.js` calls `extractAndStoreEntities()`
|
||||
immediately after the SQLite insert, without awaiting it:
|
||||
|
||||
```js
|
||||
try { db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`); } catch {}
|
||||
try { db.exec(`ALTER TABLE sessions ADD COLUMN project_id INTEGER REFERENCES projects(id)`); } catch {}
|
||||
try { db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(project_id)`); } catch {}
|
||||
try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
|
||||
try { db.exec(`ALTER TABLE projects ADD COLUMN notes TEXT`); } catch {}
|
||||
try { db.exec(`ALTER TABLE projects ADD COLUMN system_prompt TEXT`); } catch {}
|
||||
extractAndStoreEntities(userMessage, aiResponse, episode.id, projectId)
|
||||
.catch(err => logger.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
|
||||
```
|
||||
|
||||
New migrations are always appended here — never modify the schema file for
|
||||
existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
|
||||
If extraction throws, the episode is unaffected — the error is logged and
|
||||
swallowed.
|
||||
|
||||
### FTS5 Full-Text Search
|
||||
## Model Settings
|
||||
|
||||
An `episodes_fts` virtual table enables keyword search across all episodes.
|
||||
Three triggers (`episodes_fts_insert`, `episodes_fts_update`, `episodes_fts_delete`)
|
||||
keep the FTS index automatically in sync with the episodes table.
|
||||
| Setting | Value | Notes |
|
||||
|---|---|---|
|
||||
| Model | `qwen2.5:3b` | Ollama, configurable via `EXTRACTION_MODEL` |
|
||||
| Temperature | 0.1 | Low for consistent, deterministic output |
|
||||
| `num_predict` | 1500 | Higher ceiling to accommodate entity + relationship JSON |
|
||||
| `format` | `'json'` | Ollama constrained decoding — enforces valid JSON output |
|
||||
| Prompt format | ChatML | `<\|im_start\|>` / `<\|im_end\|>` tokens |
|
||||
|
||||
### SQLite Configuration
|
||||
## Prompt Structure
|
||||
|
||||
- `journal_mode = WAL` — non-blocking reads during writes
|
||||
- `foreign_keys = ON` — enforces referential integrity and cascade deletes
|
||||
- PRAGMAs set via `db.pragma()`, not `db.exec()`
|
||||
The prompt is built by `buildExtractionPrompt()`. It includes:
|
||||
|
||||
### Dynamic Updates
|
||||
1. **System message** — declares the model's role as an entity and relationship extractor
|
||||
2. **Instructions** — entity types, field rules, relationship label format, required JSON schema
|
||||
3. **Known entities block** — last 20 entities from SQLite, by `rowid DESC`, used to encourage consistent name/type pairs across conversations
|
||||
4. **Conversation** — the raw user message and AI response, delimited clearly
|
||||
|
||||
Both `updateSession` and `updateProject` build their `SET` clause dynamically
|
||||
from only the fields passed — prevents partial updates from overwriting fields
|
||||
that weren't touched.
|
||||
```
|
||||
<|im_start|>system
|
||||
You are a named entity and relationship extractor. You output only valid JSON.
|
||||
<|im_end|>
|
||||
<|im_start|>user
|
||||
Read the conversation below and extract all named entities and the relationships between them.
|
||||
Entity types: person, place, project, technology, concept, organization
|
||||
...
|
||||
Return this exact JSON structure:
|
||||
{ "entities": [...], "relationships": [...] }
|
||||
|
||||
`updateProject` allowlist:
|
||||
```js
|
||||
const allowed = ['name', 'description', 'colour', 'icon', 'isolated', 'notes', 'system_prompt'];
|
||||
Already known entities (use these exact name and type values if the same entity appears):
|
||||
- "NexusAI" (project)
|
||||
- "Alice" (person)
|
||||
|
||||
--- CONVERSATION ---
|
||||
User: ...
|
||||
Assistant: ...
|
||||
--- END CONVERSATION ---
|
||||
<|im_end|>
|
||||
<|im_start|>assistant
|
||||
```
|
||||
|
||||
## Qdrant / Semantic Layer
|
||||
## Expected JSON Output
|
||||
|
||||
Three Qdrant collections are initialized on service startup via `semantic.initCollections()`:
|
||||
|
||||
| Collection | Purpose |
|
||||
|---|---|
|
||||
| `episodes` | Embeddings for individual conversation exchanges |
|
||||
| `entities` | Embeddings for named entities |
|
||||
| `summaries` | Embeddings for condensed episode summaries |
|
||||
|
||||
All collections use **768-dimension vectors** with **Cosine similarity**,
|
||||
matching `nomic-embed-text` via Ollama. Vector size and distance metric are
|
||||
defined in `@nexusai/shared` — not hardcoded here.
|
||||
|
||||
`initCollections()` iterates `Object.values(COLLECTIONS)` and creates any
|
||||
collection that doesn't already exist at startup — all three collections are
|
||||
guaranteed to exist before any requests are handled, avoiding race conditions
|
||||
between the first entity embed and an entity search.
|
||||
|
||||
Each collection exposes upsert, search (with optional Qdrant filter), and
|
||||
delete operations. The `wait: true` flag is used on all writes.
|
||||
|
||||
## Embedding Write Path
|
||||
|
||||
When a new episode is created:
|
||||
|
||||
1. Episode saved to SQLite synchronously — response returned immediately
|
||||
2. User message + AI response combined: `User: ...\nAssistant: ...`
|
||||
3. Text sent to embedding service (`POST /embed`)
|
||||
4. Vector upserted into `episodes` Qdrant collection with payload `{ sessionId, createdAt }`
|
||||
|
||||
This step is **fire-and-forget** — if embedding fails, the episode is still
|
||||
saved and searchable via FTS. The error is logged but not surfaced.
|
||||
|
||||
> The Qdrant payload stores `sessionId` (the internal integer ID). See
|
||||
> `memory-isolation.md` for how project-level filtering works.
|
||||
|
||||
## Entity Layer
|
||||
|
||||
Entities and relationships use upsert semantics with composite unique
|
||||
constraints to prevent duplicates:
|
||||
|
||||
- `UNIQUE(name, type)` on entities
|
||||
- `UNIQUE(from_id, to_id, label)` on relationships
|
||||
- `ON DELETE CASCADE` on relationship foreign keys
|
||||
|
||||
After each episode is saved, `extraction.js` automatically extracts named
|
||||
entities from the conversation using `qwen2.5:3b` on Ollama — fire-and-forget.
|
||||
|
||||
> For full details on the extraction pipeline, prompt format, constrained
|
||||
> decoding, stoplist, and Qdrant storage, see `entity-extraction.md`.
|
||||
|
||||
## Summaries Layer
|
||||
|
||||
Session summaries are generated by `orchestration-service/src/services/summarization.js`
|
||||
after each episode write and stored here via `POST /summaries`. The memory
|
||||
service is responsible only for CRUD — generation logic lives in orchestration.
|
||||
|
||||
> For full details on trigger conditions, prompt format, cumulative updates,
|
||||
> and ChatML token stripping, see `summarization.md`.
|
||||
|
||||
## Project Delete Behaviour
|
||||
|
||||
Deleting a project runs as a transaction — it first nulls out `project_id`
|
||||
on all assigned sessions, then deletes the project. This avoids a foreign
|
||||
key constraint failure since `sessions.project_id` has no `ON DELETE` rule:
|
||||
|
||||
```js
|
||||
const doDelete = db.transaction(() => {
|
||||
db.prepare(`UPDATE sessions SET project_id = NULL WHERE project_id = ?`).run(id);
|
||||
db.prepare(`DELETE FROM projects WHERE id = ?`).run(id);
|
||||
});
|
||||
```json
|
||||
{
|
||||
"entities": [
|
||||
{ "name": "Alice", "type": "person", "notes": "Software engineer working on NexusAI." },
|
||||
{ "name": "NexusAI", "type": "project", "notes": "A modular AI assistant with persistent memory." }
|
||||
],
|
||||
"relationships": [
|
||||
{
|
||||
"from": "Alice", "fromType": "person",
|
||||
"to": "NexusAI", "toType": "project",
|
||||
"label": "works_on",
|
||||
"notes": "Alice is the primary developer."
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
For all HTTP endpoints, see `api-routes.md`.
|
||||
Relationship labels use **snake_case verbs** (e.g. `works_on`, `manages`, `uses`,
|
||||
`knows`, `located_in`, `part_of`, `created_by`).
|
||||
|
||||
## JSON Parsing
|
||||
|
||||
The raw model response is matched with `/\{[\s\S]*\}/` before parsing — this
|
||||
tolerates any preamble or trailing prose the model emits alongside the JSON.
|
||||
If the match fails or `JSON.parse` throws, the function logs a warning and
|
||||
returns without writing anything.
|
||||
|
||||
## Entity Processing
|
||||
|
||||
For each entity in `parsed.entities`:
|
||||
|
||||
1. Validate `name`, `type` (must be in `ENTITY_TYPES`), and not in `IGNORED_NAMES`
|
||||
2. Call `upsertEntity(name, type, notes)`:
|
||||
- **Insert**: creates new row with `mention_count = 1`, `source = 'extraction'`
|
||||
- **Conflict** on `(name, type)`: increments `mention_count`, updates `last_seen_at`, preserves existing `notes` if new extraction returns null
|
||||
3. Add to `entityMap` keyed by `"${name}::${type}"` — used for relationship resolution below
|
||||
4. Call `linkEntityToEpisode(entity.id, episodeId)` — writes to `entity_episodes` join table
|
||||
5. Fire-and-forget: embed as `"${name} (${type}): ${notes}"` → store to Qdrant `entities` collection with `{ name, type, notes, projectId }` in payload
|
||||
|
||||
**Valid entity types:** `person`, `place`, `project`, `technology`, `concept`, `organization`
|
||||
|
||||
**Stoplist (ignored names):** `good morning`, `good night`, `hello`, `goodbye`, `thanks`, `thank you`
|
||||
|
||||
## Relationship Processing
|
||||
|
||||
After all entities are saved, relationships are processed:
|
||||
|
||||
1. For each entry in `parsed.relationships`, look up both endpoints in `entityMap` using `"${from}::${fromType}"` and `"${to}::${toType}"` as keys
|
||||
2. If either endpoint is missing (filtered out, invalid type, or not in this extraction), the relationship is silently skipped
|
||||
3. Call `upsertRelationship(fromId, toId, label, notes)`:
|
||||
- **Insert**: creates new row with `mention_count = 1`
|
||||
- **Conflict** on `(from_id, to_id, label)`: increments `mention_count`, preserves existing `notes` if new is null
|
||||
|
||||
Relationships are unidirectional in storage. Bidirectionality is handled at
|
||||
query time by the graph traversal layer.
|
||||
|
||||
## Project Scoping
|
||||
|
||||
`projectId` is threaded through from the episode creation call. It is stored
|
||||
in the Qdrant entity payload, which enables project-scoped entity search in
|
||||
orchestration. SQLite entities and relationships are global — scoping only
|
||||
applies at the Qdrant retrieval layer.
|
||||
|
||||
## Error Behaviour
|
||||
|
||||
All steps after the initial model call are wrapped in a single outer try/catch.
|
||||
If Ollama is unreachable, returns a non-200 status, or the JSON cannot be
|
||||
parsed, the function logs at `warn` level and returns. There is no retry logic.
|
||||
Individual entity embedding failures are caught per-entity and logged at `warn`
|
||||
level without affecting other entities in the same batch.
|
||||
|
||||
213
docs/services/knowledge-graph.md
Normal file
213
docs/services/knowledge-graph.md
Normal file
@@ -0,0 +1,213 @@
|
||||
# Knowledge Graph
|
||||
|
||||
**Location:** `packages/memory-service/src/graph/index.js`
|
||||
**Schema additions:** `entity_episodes` table; new columns on `entities` and `relationships`
|
||||
**Exposed via:** `GET /graph/neighborhood/:entityId`, `POST /graph/neighbors`
|
||||
**Consumed by:** Orchestration service context assembly
|
||||
|
||||
## Purpose
|
||||
|
||||
The knowledge graph transforms NexusAI from "remembers conversations" to
|
||||
"understands relationships between things." Rather than injecting a flat
|
||||
list of entity facts into every prompt, orchestration now retrieves a
|
||||
1-hop subgraph of connected entities and their relationships, giving the
|
||||
model structured, linked knowledge about people, projects, technologies,
|
||||
and concepts that have appeared across conversations.
|
||||
|
||||
## Schema
|
||||
|
||||
### `entity_episodes` (join table)
|
||||
|
||||
Tracks which episodes contributed to each entity's knowledge. Defined in
|
||||
`schema.js` — exists on all installs.
|
||||
|
||||
```sql
|
||||
CREATE TABLE IF NOT EXISTS entity_episodes (
|
||||
entity_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
|
||||
episode_id INTEGER NOT NULL REFERENCES episodes(id) ON DELETE CASCADE,
|
||||
PRIMARY KEY (entity_id, episode_id)
|
||||
);
|
||||
```
|
||||
|
||||
Both FKs cascade on delete — removing an entity or episode automatically
|
||||
cleans up its join rows.
|
||||
|
||||
### New columns on `entities`
|
||||
|
||||
Added via migration in `db/index.js`:
|
||||
|
||||
| Column | Type | Default | Description |
|
||||
|---|---|---|---|
|
||||
| `mention_count` | INTEGER | 1 | How many times this entity has been extracted across conversations |
|
||||
| `confidence` | REAL | 1.0 | Reserved for future confidence scoring |
|
||||
| `source` | TEXT | `'extraction'` | `'extraction'` (auto) or `'manual'` |
|
||||
| `last_seen_at` | INTEGER | NULL | Unix timestamp of most recent extraction hit |
|
||||
|
||||
### New columns on `relationships`
|
||||
|
||||
| Column | Type | Default | Description |
|
||||
|---|---|---|---|
|
||||
| `mention_count` | INTEGER | 1 | How many times this edge has been extracted |
|
||||
| `notes` | TEXT | NULL | Relationship context sentence from extraction |
|
||||
|
||||
## Entity Promotion Model
|
||||
|
||||
Entities are not created equal — some are mentioned once in passing, others
|
||||
recur across many conversations. `mention_count` is the signal:
|
||||
|
||||
- Every time `upsertEntity` is called for an existing `(name, type)` pair, `mention_count` is incremented and `last_seen_at` is updated.
|
||||
- `ENTITIES.PROMOTION_THRESHOLD` (default: **3**) is the `mention_count` at which an entity is considered "well-established" — referenced in the codebase for future filtering and scoring logic.
|
||||
- Currently `mention_count` is stored and incremented but not yet used to gate retrieval. It provides the foundation for future features such as orphan cleanup (entities never re-extracted) and confidence-weighted graph traversal.
|
||||
|
||||
The same pattern applies to relationships — `mention_count` rises each time
|
||||
the same `(from_id, to_id, label)` triple is extracted.
|
||||
|
||||
## Graph Traversal
|
||||
|
||||
`src/graph/index.js` exports two functions built on SQLite's `WITH RECURSIVE`
|
||||
CTE support. No external graph database is needed.
|
||||
|
||||
### `getNeighborhood(entityId, depth)`
|
||||
|
||||
Traverses the graph from a single entity, following edges in **both directions**,
|
||||
up to `depth` hops. Returns `{ nodes: [...entities], edges: [...relationships] }`.
|
||||
|
||||
Default depth: `ENTITIES.GRAPH_HOP_DEPTH` (1). Maximum enforced at HTTP layer: 3.
|
||||
|
||||
**SQLite query:**
|
||||
|
||||
```sql
|
||||
WITH RECURSIVE traverse(entity_id, depth) AS (
|
||||
SELECT ?, 0
|
||||
UNION
|
||||
SELECT
|
||||
CASE WHEN r.from_id = t.entity_id THEN r.to_id ELSE r.from_id END,
|
||||
t.depth + 1
|
||||
FROM relationships r
|
||||
JOIN traverse t ON (r.from_id = t.entity_id OR r.to_id = t.entity_id)
|
||||
WHERE t.depth < ?
|
||||
)
|
||||
SELECT DISTINCT entity_id FROM traverse
|
||||
```
|
||||
|
||||
`UNION` (not `UNION ALL`) eliminates duplicate visits and naturally handles
|
||||
cycles — a node already in the traversal set is not re-visited.
|
||||
|
||||
After collecting node IDs, two follow-up queries fetch:
|
||||
- All entity rows for those IDs
|
||||
- All relationship rows where both `from_id` and `to_id` are in the node set
|
||||
|
||||
This ensures edges between neighbors are included even if they aren't on the
|
||||
traversal path from the seed.
|
||||
|
||||
### `getEntityNeighbors(entityIds[])`
|
||||
|
||||
Bulk 1-hop version designed for orchestration. Given multiple seed entity IDs
|
||||
(the results of Qdrant semantic search), returns the combined 1-hop subgraph.
|
||||
|
||||
1. Finds all neighbor IDs via one query using `IN (...)` on both `from_id` and `to_id`
|
||||
2. Deduplicates seeds + neighbors using a JavaScript `Set`
|
||||
3. Fetches all entity rows and all relationship rows within the combined node set
|
||||
|
||||
This is intentionally simpler than the recursive version — orchestration always
|
||||
uses depth=1, and the bulk query avoids N separate CTE calls.
|
||||
|
||||
## Graph-Aware Context Assembly
|
||||
|
||||
Orchestration's `assembleContext` (in `src/chat/index.js`) integrates the
|
||||
graph at step 7 of the chat pipeline:
|
||||
|
||||
1. Qdrant entity search returns up to `ORCHESTRATION.ENTITIES_LIMIT` results, each including `r.id` (the SQLite entity ID) alongside the Qdrant payload
|
||||
2. `graph.getNeighbors(entityIds)` is called with those IDs → `POST /graph/neighbors` on memory-service
|
||||
3. The returned `{ nodes, edges }` is passed to `formatGraphContext()`
|
||||
4. On failure, falls back to using the Qdrant payload data directly as flat nodes with no edges
|
||||
|
||||
### Prompt Format
|
||||
|
||||
`formatGraphContext(nodes, edges)` in `chat/index.js` formats the subgraph as:
|
||||
|
||||
```
|
||||
Here is what you know about entities relevant to this conversation and their connections:
|
||||
- Alice (person): software engineer working on NexusAI
|
||||
→ works_on NexusAI (project)
|
||||
→ knows Bob (person)
|
||||
- NexusAI (project): AI assistant framework
|
||||
- Bob (person): Alice's colleague
|
||||
```
|
||||
|
||||
- One line per node: `- {name} ({type}): {notes}`
|
||||
- Outbound edges indented below: ` → {label} {target_name} ({target_type})`
|
||||
- Nodes with only inbound edges (pulled in as neighbors) appear without connection lines
|
||||
- Only outbound edges are shown — each relationship appears once, from the `from_id` side
|
||||
|
||||
## Project Scoping
|
||||
|
||||
The knowledge graph respects project boundaries at the **entry point**, not
|
||||
during traversal:
|
||||
|
||||
- Qdrant entity search is filtered by `projectId` — only entities tagged with this project are returned as seeds
|
||||
- Graph traversal in SQLite is unfiltered — neighbors can be from any project or no project
|
||||
- This is intentional: the graph entry is project-scoped, but traversal follows the global relationship graph to discover connected knowledge
|
||||
|
||||
Entities are tagged with `projectId` in the Qdrant payload at extraction time.
|
||||
Entities extracted from non-project sessions have `projectId: null` and only
|
||||
appear in unfiltered global searches.
|
||||
|
||||
## API Reference
|
||||
|
||||
### `GET /graph/neighborhood/:entityId`
|
||||
|
||||
Returns the neighborhood of a single entity.
|
||||
|
||||
**Query params:**
|
||||
|
||||
| Param | Default | Max | Description |
|
||||
|---|---|---|---|
|
||||
| `depth` | `ENTITIES.GRAPH_HOP_DEPTH` (1) | 3 | Traversal depth |
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"entity": { "id": 5, "name": "Alice", "type": "person", "notes": "...", "mention_count": 4 },
|
||||
"neighborhood": {
|
||||
"nodes": [
|
||||
{ "id": 5, "name": "Alice", "type": "person", "notes": "..." },
|
||||
{ "id": 8, "name": "NexusAI", "type": "project", "notes": "..." }
|
||||
],
|
||||
"edges": [
|
||||
{ "id": 2, "from_id": 5, "to_id": 8, "label": "works_on", "notes": "...", "mention_count": 3 }
|
||||
]
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Returns 404 if the entity does not exist.
|
||||
|
||||
### `POST /graph/neighbors`
|
||||
|
||||
Bulk 1-hop neighborhood for a set of entity IDs. Used internally by
|
||||
orchestration — not intended for direct client use.
|
||||
|
||||
**Request body:**
|
||||
```json
|
||||
{ "entityIds": [5, 8, 12] }
|
||||
```
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"nodes": [ ...entity objects... ],
|
||||
"edges": [ ...relationship objects... ]
|
||||
}
|
||||
```
|
||||
|
||||
Returns 400 if `entityIds` is missing or empty.
|
||||
|
||||
## Constants (`packages/shared/src/config/constants.js`)
|
||||
|
||||
| Constant | Value | Description |
|
||||
|---|---|---|
|
||||
| `ENTITIES.PROMOTION_THRESHOLD` | 3 | `mention_count` at which an entity is considered well-established |
|
||||
| `ENTITIES.GRAPH_HOP_DEPTH` | 1 | Default traversal depth for neighborhood queries |
|
||||
| `ORCHESTRATION.ENTITIES_LIMIT` | 5 | Max entity seeds returned from Qdrant search |
|
||||
| `ORCHESTRATION.ENTITIES_THRESHOLD` | 0.55 | Minimum similarity score for entity Qdrant search |
|
||||
@@ -9,8 +9,8 @@
|
||||
|
||||
Responsible for all reading and writing of long-term memory. Acts as the
|
||||
sole interface to both SQLite and Qdrant — no other service accesses these
|
||||
stores directly. On episode creation, automatically calls the embedding
|
||||
service to generate and store a vector in Qdrant.
|
||||
stores directly. On episode creation, automatically triggers entity and
|
||||
relationship extraction and embeds results into Qdrant.
|
||||
|
||||
## Dependencies
|
||||
|
||||
@@ -45,19 +45,22 @@ src/
|
||||
├── semantic/
|
||||
│ └── index.js # Qdrant collection management, upsert, search, delete
|
||||
├── entities/
|
||||
│ ├── index.js # Entity + relationship CRUD
|
||||
│ └── extraction.js # Automatic entity extraction via qwen2.5:3b on Ollama
|
||||
│ ├── index.js # Entity + relationship CRUD (upsert, mention tracking)
|
||||
│ └── extraction.js # Automatic entity + relationship extraction via qwen2.5:3b
|
||||
├── graph/
|
||||
│ └── index.js # Knowledge graph traversal (neighborhood queries, recursive CTE)
|
||||
└── index.js # Express app + all route definitions
|
||||
```
|
||||
|
||||
## SQLite Schema
|
||||
|
||||
Seven core tables:
|
||||
Eight core tables:
|
||||
|
||||
- **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
|
||||
- **episodes** — individual exchanges (user message + AI response) tied to a session
|
||||
- **entities** — named things the system learns about (people, places, concepts)
|
||||
- **relationships** — directional labeled links between entities
|
||||
- **entities** — named things the system learns about (people, places, concepts, etc.). Fields include `mention_count`, `confidence`, `source`, `last_seen_at`
|
||||
- **relationships** — directional labeled links between entities (`from_id`, `to_id`, `label`). Fields include `mention_count`, `notes`
|
||||
- **entity_episodes** — join table linking entities to the episodes where they were extracted. Used for provenance and orphan cleanup
|
||||
- **summaries** — condensed episode groups for efficient context retrieval
|
||||
- **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`, `notes`, `system_prompt`
|
||||
|
||||
@@ -73,10 +76,18 @@ try { db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(proje
|
||||
try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
|
||||
try { db.exec(`ALTER TABLE projects ADD COLUMN notes TEXT`); } catch {}
|
||||
try { db.exec(`ALTER TABLE projects ADD COLUMN system_prompt TEXT`); } catch {}
|
||||
// Knowledge graph columns:
|
||||
try { db.exec(`ALTER TABLE entities ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
|
||||
try { db.exec(`ALTER TABLE entities ADD COLUMN confidence REAL NOT NULL DEFAULT 1.0`) } catch {}
|
||||
try { db.exec(`ALTER TABLE entities ADD COLUMN source TEXT NOT NULL DEFAULT 'extraction'`) } catch {}
|
||||
try { db.exec(`ALTER TABLE entities ADD COLUMN last_seen_at INTEGER`) } catch {}
|
||||
try { db.exec(`ALTER TABLE relationships ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
|
||||
try { db.exec(`ALTER TABLE relationships ADD COLUMN notes TEXT`) } catch {}
|
||||
```
|
||||
|
||||
New migrations are always appended here — never modify the schema file for
|
||||
existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
|
||||
`entity_episodes` is defined in `schema.js` itself (not a migration) since it is a new table.
|
||||
|
||||
New migrations are always appended — never modify the schema file for existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
|
||||
|
||||
### FTS5 Full-Text Search
|
||||
|
||||
@@ -117,8 +128,7 @@ defined in `@nexusai/shared` — not hardcoded here.
|
||||
|
||||
`initCollections()` iterates `Object.values(COLLECTIONS)` and creates any
|
||||
collection that doesn't already exist at startup — all three collections are
|
||||
guaranteed to exist before any requests are handled, avoiding race conditions
|
||||
between the first entity embed and an entity search.
|
||||
guaranteed to exist before any requests are handled.
|
||||
|
||||
Each collection exposes upsert, search (with optional Qdrant filter), and
|
||||
delete operations. The `wait: true` flag is used on all writes.
|
||||
@@ -143,15 +153,27 @@ saved and searchable via FTS. The error is logged but not surfaced.
|
||||
Entities and relationships use upsert semantics with composite unique
|
||||
constraints to prevent duplicates:
|
||||
|
||||
- `UNIQUE(name, type)` on entities
|
||||
- `UNIQUE(from_id, to_id, label)` on relationships
|
||||
- `UNIQUE(name, type)` on entities — conflict increments `mention_count` and updates `last_seen_at`
|
||||
- `UNIQUE(from_id, to_id, label)` on relationships — conflict increments `mention_count` and preserves existing `notes`
|
||||
- `ON DELETE CASCADE` on relationship foreign keys
|
||||
|
||||
After each episode is saved, `extraction.js` automatically extracts named
|
||||
entities from the conversation using `qwen2.5:3b` on Ollama — fire-and-forget.
|
||||
entities **and relationships** from the conversation using `qwen2.5:3b` on
|
||||
Ollama — fire-and-forget. Each saved entity is also linked to the episode
|
||||
via the `entity_episodes` join table.
|
||||
|
||||
> For full details on the extraction pipeline, prompt format, constrained
|
||||
> decoding, stoplist, and Qdrant storage, see `entity-extraction.md`.
|
||||
> For full details on the extraction pipeline and JSON format, see `entity-extraction.md`.
|
||||
> For the knowledge graph traversal layer, see `knowledge-graph.md`.
|
||||
|
||||
## Knowledge Graph Layer
|
||||
|
||||
`src/graph/index.js` provides SQLite-based graph traversal over the entities
|
||||
and relationships tables. Two functions are exposed via HTTP:
|
||||
|
||||
- **`getNeighborhood(entityId, depth)`** — recursive CTE traversal, bidirectional, returns `{ nodes, edges }`
|
||||
- **`getEntityNeighbors(entityIds[])`** — bulk 1-hop traversal for orchestration context assembly
|
||||
|
||||
> For design rationale, traversal queries, and integration with orchestration, see `knowledge-graph.md`.
|
||||
|
||||
## Summaries Layer
|
||||
|
||||
@@ -175,4 +197,4 @@ const doDelete = db.transaction(() => {
|
||||
});
|
||||
```
|
||||
|
||||
For all HTTP endpoints, see `api-routes.md`.
|
||||
For all HTTP endpoints, see `api-routes.md`.
|
||||
|
||||
@@ -42,9 +42,10 @@ src/
|
||||
│ ├── inference.js # HTTP client for inference service
|
||||
│ ├── embedding.js # HTTP client for embedding service
|
||||
│ ├── qdrant.js # HTTP client for Qdrant (direct vector search)
|
||||
│ ├── graph.js # HTTP client for memory-service graph endpoints
|
||||
│ └── summarization.js # Session summarisation — triggers after each episode
|
||||
├── chat/
|
||||
│ └── index.js # Core pipeline — context assembly, isolation, auto-naming
|
||||
│ └── index.js # Core pipeline — context assembly, graph expansion, auto-naming
|
||||
├── config/
|
||||
│ └── settings.js # Settings load/save — reads/writes data/settings.json
|
||||
├── routes/
|
||||
@@ -71,7 +72,9 @@ via `appSettings.load()` — changes apply immediately without a service restart
|
||||
|---|---|---|
|
||||
| `recentEpisodeLimit` | 5 | Recent episodes injected into prompt |
|
||||
| `semanticLimit` | 5 | Semantic search results injected into prompt |
|
||||
| `scoreThreshold` | 0.75 | Minimum similarity score for semantic results |
|
||||
| `scoreThreshold` | 0.5 | Minimum similarity score for Qdrant semantic results |
|
||||
| `semanticWeight` | 1.0 | RRF weight for Qdrant semantic results |
|
||||
| `keywordWeight` | 0 | RRF weight for FTS5 keyword results (`0` = disabled) |
|
||||
| `modelsFolderPath` | `/mnt/nexus-models` | Path to folder containing .gguf files |
|
||||
| `temperature` | 0.7 | Inference temperature |
|
||||
| `repeatPenalty` | 1.1 | Repeat token penalty |
|
||||
@@ -100,24 +103,35 @@ difference is how the inference response is delivered to the client.
|
||||
|
||||
4. **Recent episode retrieval** — fetch most recent episodes (`recentEpisodeLimit`).
|
||||
|
||||
5. **Semantic search** — embed user message, query Qdrant for similar past
|
||||
episodes. Deduplicated against recent episodes. Non-critical.
|
||||
5. **Fused episode retrieval** — runs semantic (Qdrant) and keyword (FTS5)
|
||||
search in parallel, then merges results via Reciprocal Rank Fusion (RRF).
|
||||
Both paths are filtered against `recentIds` before fusion. FTS is scoped
|
||||
to the current session or all project sessions. If `keywordWeight` is `0`,
|
||||
the FTS call is skipped entirely. Non-critical — failures fall back to
|
||||
whichever strategy succeeded.
|
||||
|
||||
6. **Entity search** — query `entities` Qdrant collection filtered by
|
||||
`projectId`. Non-project sessions receive no entity context. Non-critical.
|
||||
`projectId`. Returns entity IDs alongside Qdrant payload data (the Qdrant
|
||||
point ID equals the SQLite entity ID). Non-critical.
|
||||
|
||||
7. **Prompt assembly** — combine system prompt, entity context, semantic
|
||||
episodes, recent episodes, and user message.
|
||||
7. **Graph neighborhood expansion** — call `POST /graph/neighbors` on
|
||||
memory-service with the entity IDs from step 6. Returns a 1-hop subgraph
|
||||
`{ nodes, edges }` — entity objects plus the relationships connecting them.
|
||||
If no entities were found or the graph call fails, falls back to flat entity
|
||||
list (no edges). Non-critical.
|
||||
|
||||
8. **Inference** — send to inference service. `/chat` awaits full response;
|
||||
8. **Prompt assembly** — combine system prompt, graph context, fused episodes,
|
||||
recent episodes, and user message.
|
||||
|
||||
9. **Inference** — send to inference service. `/chat` awaits full response;
|
||||
`/chat/stream` pipes SSE chunks to the client.
|
||||
|
||||
9. **Episode write** — write exchange back to memory with `projectId`.
|
||||
10. **Episode write** — write exchange back to memory with `projectId`.
|
||||
|
||||
10. **Summarisation trigger** — `triggerSummary(session, allEpisodes)` called
|
||||
11. **Summarisation trigger** — `triggerSummary(session, allEpisodes)` called
|
||||
fire-and-forget. See `summarization.md` for full details.
|
||||
|
||||
11. **Auto-naming** — on first message with no session name, fires a secondary
|
||||
12. **Auto-naming** — on first message with no session name, fires a secondary
|
||||
inference call (max 20 tokens, temperature 0.3) to generate a session name.
|
||||
|
||||
### Prompt Structure
|
||||
@@ -125,8 +139,9 @@ difference is how the inference response is delivered to the client.
|
||||
```
|
||||
[Resolved system prompt]
|
||||
|
||||
Here is what you know about entities relevant to this conversation:
|
||||
Here is what you know about entities relevant to this conversation and their connections:
|
||||
- {name} ({type}): {notes}
|
||||
→ {label} {neighbor_name} ({neighbor_type})
|
||||
---
|
||||
Here are some relevant memories from earlier conversations:
|
||||
User: {past user message}
|
||||
@@ -141,6 +156,12 @@ User: {current message}
|
||||
Assistant:
|
||||
```
|
||||
|
||||
The entity block renders the full graph neighborhood — seed entities matched
|
||||
by Qdrant search plus any neighbors pulled in by 1-hop traversal. Each entity
|
||||
shows its `notes` and any outbound relationships with their targets. Neighbor
|
||||
nodes that have no outbound edges within the subgraph appear without connection
|
||||
lines.
|
||||
|
||||
## Summarisation
|
||||
|
||||
After each episode write, `triggerSummary` is called fire-and-forget. It
|
||||
@@ -199,4 +220,7 @@ handle /health* { reverse_proxy localhost:4000 }
|
||||
|
||||
After updating: `caddy reload --config /path/to/Caddyfile`
|
||||
|
||||
For all HTTP endpoints, see `api-routes.md`.
|
||||
> Note: `/graph` routes are on the memory-service (port 3002) and are called
|
||||
> internally by orchestration — they do not need a Caddy entry.
|
||||
|
||||
For all HTTP endpoints, see `api-routes.md`.
|
||||
|
||||
153
docs/services/retrieval-fusion.md
Normal file
153
docs/services/retrieval-fusion.md
Normal file
@@ -0,0 +1,153 @@
|
||||
# Retrieval Fusion
|
||||
|
||||
**Implementation:** `packages/orchestration-service/src/chat/index.js`
|
||||
**FTS scoping:** `packages/memory-service/src/episodic/index.js`, `src/index.js`
|
||||
**Settings:** `semanticWeight`, `keywordWeight` via `PATCH /settings`
|
||||
|
||||
## Purpose
|
||||
|
||||
Rather than relying solely on Qdrant vector similarity (which finds semantically
|
||||
related content but misses exact keyword matches) or FTS5 keyword search alone
|
||||
(which finds exact matches but not paraphrases), Reciprocal Rank Fusion (RRF)
|
||||
merges the ranked results from both strategies into a single better-ranked list.
|
||||
|
||||
Episodes that rank highly in **both** lists score highest. An episode that is
|
||||
the top semantic match but irrelevant by keyword, or vice versa, scores lower
|
||||
than one that satisfies both.
|
||||
|
||||
## How RRF Works
|
||||
|
||||
For each episode `d`, its fused score is:
|
||||
|
||||
```
|
||||
RRF(d) = w_semantic / (k + rank_semantic(d))
|
||||
+ w_keyword / (k + rank_keyword(d))
|
||||
```
|
||||
|
||||
- `rank_i(d)` — 1-based position in that strategy's result list (episode absent from a list contributes 0 for that term)
|
||||
- `k = 60` — smoothing constant (standard; not exposed in settings)
|
||||
- `w_semantic`, `w_keyword` — user-tunable weights (both default-sourced from `RETRIEVAL` constants)
|
||||
|
||||
Setting a weight to `0` removes that strategy's contribution entirely. Setting
|
||||
`keywordWeight` to `0` also short-circuits the FTS network call.
|
||||
|
||||
## Architecture
|
||||
|
||||
Fusion lives in orchestration — the service already coordinates multiple data
|
||||
sources, and fusion is a retrieval strategy, not a storage concern.
|
||||
|
||||
```
|
||||
getFusedEpisodes()
|
||||
├── getSemanticEpisodes() — Qdrant embed+search → fetch full rows by ID
|
||||
│ (existing path, unchanged)
|
||||
└── getFTSResults() — memory-service /episodes/search → full rows directly
|
||||
(skipped entirely if keywordWeight == 0)
|
||||
↓
|
||||
fuseEpisodeResults() — pure RRF, no I/O
|
||||
↓
|
||||
fusedEpisodes[] — top semanticLimit episodes by RRF score
|
||||
```
|
||||
|
||||
### Data Shape Consistency
|
||||
|
||||
Both sides must enter fusion as `Episode[]` — full SQLite row objects with
|
||||
the same shape — and both must be filtered against `recentIds` first:
|
||||
|
||||
- **Semantic path**: `recentIds` filter applied before `getEpisodeById` fetch (existing behaviour)
|
||||
- **FTS path**: full rows returned directly; `recentIds` filter applied in `getFusedEpisodes` after receiving them
|
||||
|
||||
FTS requests `semanticLimit * 2` results to provide headroom for the
|
||||
`recentIds` filter without under-serving the fusion.
|
||||
|
||||
## FTS Session Scoping
|
||||
|
||||
Without scoping, FTS5 searches across all episodes in the database. For
|
||||
context assembly, results must be constrained to the current session or
|
||||
project session pool — the same scope used for Qdrant semantic search.
|
||||
|
||||
`searchEpisodes(query, limit, sessionIds)` in memory-service accepts an
|
||||
optional `sessionIds` array. When provided, the SQL becomes:
|
||||
|
||||
```sql
|
||||
SELECT e.* FROM episodes e
|
||||
JOIN episodes_fts fts ON e.id = fts.rowid
|
||||
WHERE episodes_fts MATCH ?
|
||||
AND e.session_id IN (?, ?, ...)
|
||||
ORDER BY rank
|
||||
LIMIT ?
|
||||
```
|
||||
|
||||
The HTTP endpoint `GET /episodes/search` accepts `sessionIds` as a
|
||||
comma-separated query param: `?q=hello&sessionIds=1,2,3`.
|
||||
|
||||
In orchestration, `ftsSessionIds` is set to:
|
||||
- `projectSessionIds` (all sessions in the project) — if the session belongs to a project
|
||||
- `[session.id]` — otherwise (single session only)
|
||||
|
||||
This mirrors the Qdrant scoping logic exactly.
|
||||
|
||||
## `fuseEpisodeResults` — Implementation Detail
|
||||
|
||||
```js
|
||||
function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
|
||||
const k = RETRIEVAL.RRF_K; // 60
|
||||
const scores = new Map(); // episode.id → { episode, score }
|
||||
|
||||
// Score semantic results (already filtered against recentIds)
|
||||
semanticEps.forEach((ep, i) => {
|
||||
scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
|
||||
});
|
||||
|
||||
// Score + merge keyword results (already filtered against recentIds)
|
||||
keywordEps.forEach((ep, i) => {
|
||||
const contrib = keywordWeight / (k + i + 1);
|
||||
if (scores.has(ep.id)) {
|
||||
scores.get(ep.id).score += contrib; // appears in both — sum scores
|
||||
} else if (contrib > 0) {
|
||||
scores.set(ep.id, { episode: ep, score: contrib }); // FTS-only episode
|
||||
}
|
||||
// contrib == 0 (keywordWeight: 0) → episode not added (guard prevents score-0 bleed-through)
|
||||
});
|
||||
|
||||
return [...scores.values()]
|
||||
.sort((a, b) => b.score - a.score)
|
||||
.slice(0, limit)
|
||||
.map(({ episode }) => episode);
|
||||
}
|
||||
```
|
||||
|
||||
The `else if (contrib > 0)` guard prevents FTS-only episodes from entering
|
||||
the result set with a score of 0 when `keywordWeight` is 0 — verified by
|
||||
the test suite.
|
||||
|
||||
## Settings
|
||||
|
||||
| Setting | Default | Range | Description |
|
||||
|---|---|---|---|
|
||||
| `semanticWeight` | 1.0 | 0–5 | Weight applied to Qdrant semantic results |
|
||||
| `keywordWeight` | 0 | 0–5 | Weight applied to FTS5 keyword results. `0` = disabled |
|
||||
|
||||
Both are readable via `GET /settings` and writable via `PATCH /settings`
|
||||
without a service restart. Changes take effect on the next chat request.
|
||||
|
||||
**To enable keyword search:**
|
||||
```bash
|
||||
curl -X PATCH http://localhost:4000/settings \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"keywordWeight": 1.0}'
|
||||
```
|
||||
|
||||
**To favour keyword matches over semantic:**
|
||||
```bash
|
||||
curl -X PATCH http://localhost:4000/settings \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"semanticWeight": 0.5, "keywordWeight": 2.0}'
|
||||
```
|
||||
|
||||
## Constants (`packages/shared/src/config/constants.js`)
|
||||
|
||||
| Constant | Value | Description |
|
||||
|---|---|---|
|
||||
| `RETRIEVAL.RRF_K` | 60 | RRF smoothing constant — not exposed in settings |
|
||||
| `RETRIEVAL.SEMANTIC_WEIGHT` | 1.0 | Default semantic weight |
|
||||
| `RETRIEVAL.KEYWORD_WEIGHT` | 0 | Default keyword weight (off) |
|
||||
3
package-lock.json
generated
3
package-lock.json
generated
@@ -4224,8 +4224,7 @@
|
||||
"dependencies": {
|
||||
"@nexusai/shared": "^1.0.0",
|
||||
"dotenv": "^17.4.0",
|
||||
"express": "^5.2.1",
|
||||
"ollama": "^0.6.3"
|
||||
"express": "^5.2.1"
|
||||
}
|
||||
},
|
||||
"packages/inference-service": {
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
import { API_DEFAULTS } from "../config/constants";
|
||||
|
||||
|
||||
const BASE_URL = import.meta.env.VITE_ORCHESTRATION_URL ?? '';
|
||||
|
||||
// ── Sessions ────────────────────────────────────────────────
|
||||
|
||||
@@ -2,6 +2,7 @@ import React, { useState, useEffect } from 'react';
|
||||
import { fetchSessions, deleteSession } from '../api/orchestration';
|
||||
import { CLIENT_DEFAULTS } from '../config/constants';
|
||||
|
||||
|
||||
const PAGE_SIZE = CLIENT_DEFAULTS.PAGE_SIZE;
|
||||
|
||||
export default function AllChatsView({ onSelectSession, onBack, projects }) {
|
||||
|
||||
@@ -2,6 +2,7 @@ import React, { useState, useEffect } from 'react';
|
||||
import ProjectModal from './ProjectModal';
|
||||
import { fetchProjects, createProject, updateProject, deleteProject } from '../api/orchestration';
|
||||
|
||||
|
||||
export default function AllProjectsView({ onProjectsChange, onBack, onSelectProject, onNavigate }) {
|
||||
const [projects, setProjects] = useState([]);
|
||||
const [loading, setLoading] = useState(true);
|
||||
|
||||
@@ -3,6 +3,7 @@ import { useSettings } from '../hooks/useSettings';
|
||||
import { useModels } from '../hooks/useModels';
|
||||
import { getServiceHealth } from '../api/orchestration';
|
||||
|
||||
|
||||
export default function SettingsView({ onNavigate, onBack, modelProps }) {
|
||||
const { settings, saveSetting, saving } = useSettings();
|
||||
|
||||
|
||||
@@ -1,6 +1,7 @@
|
||||
import { useState, useEffect, useCallback } from 'react';
|
||||
import { fetchProjects } from '../api/orchestration';
|
||||
|
||||
|
||||
export function useProjects() {
|
||||
const [projects, setProjects] = useState([]);
|
||||
|
||||
|
||||
64
packages/embedding-service/CLAUDE.md
Normal file
64
packages/embedding-service/CLAUDE.md
Normal file
@@ -0,0 +1,64 @@
|
||||
# CLAUDE.md
|
||||
|
||||
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||
|
||||
See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and deployment layout.
|
||||
|
||||
## Running This Service
|
||||
|
||||
```bash
|
||||
npm run embedding # From repo root
|
||||
npm -w packages/embedding-service run dev # With --watch
|
||||
```
|
||||
|
||||
Default port: **3003**. Requires Ollama to be reachable at `OLLAMA_URL`.
|
||||
|
||||
## Single-File Service
|
||||
|
||||
The entire service is `src/index.js` — no subdirectory structure. All routes, the Ollama helper, and startup are in one file.
|
||||
|
||||
## Environment Variables
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `PORT` | `3003` | Port to listen on |
|
||||
| `OLLAMA_URL` | `http://localhost:11434` | Ollama instance URL |
|
||||
| `EMBEDDING_MODEL` | `nomic-embed-text` | Model passed to Ollama `/api/embed` |
|
||||
|
||||
Note: the env var name is `EMBEDDING_MODEL`, not `EMBED_MODEL` — the internal constant is `EMBED_MODEL` but the lookup key is different.
|
||||
|
||||
## Ollama API Details
|
||||
|
||||
Uses Ollama's `/api/embed` endpoint (not `/api/embeddings`). Request shape:
|
||||
|
||||
```json
|
||||
{ "model": "nomic-embed-text", "input": "text to embed" }
|
||||
```
|
||||
|
||||
Ollama returns `{ "embeddings": [[...]] }` — an array of arrays even for a single input. The helper takes `data.embeddings[0]` to return the single vector.
|
||||
|
||||
The `ollama` npm package is listed as a dependency but is **not used** — all calls are raw `fetch`. Do not refactor to use the package without checking the API shape matches.
|
||||
|
||||
## Batch Endpoint
|
||||
|
||||
`POST /embed/batch` embeds items **sequentially** in a for-loop, not in parallel. The comment explains this: Ollama doesn't parallelise embedding calls, so parallel requests would queue internally anyway. Do not change to `Promise.all` without verifying Ollama behaviour.
|
||||
|
||||
## Error Responses
|
||||
|
||||
| Condition | Status | Notes |
|
||||
|---|---|---|
|
||||
| Missing/empty `text` | 400 | |
|
||||
| Ollama call fails | 502 | Upstream failure — correct status |
|
||||
| Empty `texts` array | 400 | |
|
||||
|
||||
## Known Issue
|
||||
|
||||
The 400 error message for `/embed` reads `"text is required and must be empty"` — the word "not" is missing. Should read `"must not be empty"`.
|
||||
|
||||
## API Endpoints
|
||||
|
||||
| Method | Path | Notes |
|
||||
|---|---|---|
|
||||
| GET | `/health` | Static response — does not verify Ollama is reachable |
|
||||
| POST | `/embed` | Body: `{ text: string }`. Returns `{ embedding, model, dimensions }` |
|
||||
| POST | `/embed/batch` | Body: `{ texts: string[] }`. Returns `{ embeddings, model, dimensions, count }` |
|
||||
@@ -9,7 +9,6 @@
|
||||
"dependencies": {
|
||||
"@nexusai/shared": "^1.0.0",
|
||||
"dotenv": "^17.4.0",
|
||||
"express": "^5.2.1",
|
||||
"ollama": "^0.6.3"
|
||||
"express": "^5.2.1"
|
||||
}
|
||||
}
|
||||
|
||||
@@ -3,7 +3,7 @@ const express = require('express');
|
||||
const {getEnv, OLLAMA, PORTS, logger} = require('@nexusai/shared');
|
||||
|
||||
const app = express();
|
||||
app.use(express.json());
|
||||
app.use(express.json({ limit: '1mb' })); // limit request body to 1mb to prevent abuse - embedding requests should be small
|
||||
|
||||
const PORT = getEnv('PORT', PORTS.EMBEDDING);
|
||||
const OLLAMA_URL = getEnv('OLLAMA_URL', OLLAMA.DEFAULT_URL);
|
||||
@@ -14,7 +14,8 @@ async function embedText(text) {
|
||||
const res = await fetch(`${OLLAMA_URL}/api/embed`, {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({ model: EMBED_MODEL, input: text })
|
||||
body: JSON.stringify({ model: EMBED_MODEL, input: text }),
|
||||
signal: AbortSignal.timeout(30_000),
|
||||
});
|
||||
|
||||
if (!res.ok) {
|
||||
@@ -37,7 +38,7 @@ app.get('/health', (req,res) => {
|
||||
app.post('/embed', async (req, res) => {
|
||||
const { text } = req.body;
|
||||
if (!text || typeof text !== 'string' || text.trim() === '') {
|
||||
return res.status(400).json({ error: 'text is required and must be empty' });
|
||||
return res.status(400).json({ error: 'text is required and must not be empty' });
|
||||
}
|
||||
|
||||
try {
|
||||
@@ -60,7 +61,10 @@ app.post('/embed/batch', async (req, res) => {
|
||||
}
|
||||
|
||||
try {
|
||||
//sequential embedding for now, Ollama doesn't natively parallize embeddings
|
||||
const invalid = texts.findIndex(t => !t || typeof t !== 'string' || t.trim() === '');
|
||||
if (invalid !== -1)
|
||||
return res.status(400).json({ error: `texts[${invalid}] is empty or not a string` });
|
||||
|
||||
const embeddings = [];
|
||||
for (const text of texts) {
|
||||
embeddings.push(await embedText(text.trim()));
|
||||
|
||||
75
packages/inference-service/CLAUDE.md
Normal file
75
packages/inference-service/CLAUDE.md
Normal file
@@ -0,0 +1,75 @@
|
||||
# CLAUDE.md
|
||||
|
||||
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||
|
||||
See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and deployment layout.
|
||||
|
||||
## Running This Service
|
||||
|
||||
```bash
|
||||
npm run inference # From repo root
|
||||
npm -w packages/inference-service run dev # With --watch
|
||||
```
|
||||
|
||||
Default port: **3001**. Set `INFERENCE_PROVIDER` to select the backend.
|
||||
|
||||
## Provider Pattern
|
||||
|
||||
`src/infer.js` reads `INFERENCE_PROVIDER` at startup and loads one of two providers:
|
||||
|
||||
| `INFERENCE_PROVIDER` | Module | Backend |
|
||||
|---|---|---|
|
||||
| `ollama` (default) | `src/providers/ollama.js` | Ollama npm client → `/api/generate` |
|
||||
| `llamacpp` | `src/providers/llamacpp.js` | Raw fetch → `/v1/chat/completions` (OpenAI-compatible) |
|
||||
|
||||
An unknown provider throws immediately at startup — fail-fast, not at request time.
|
||||
|
||||
Both providers export the same interface: `complete(prompt, options)` and `completeStream(prompt, options)`.
|
||||
|
||||
## Environment Variables
|
||||
|
||||
| Variable | Default | Description |
|
||||
|---|---|---|
|
||||
| `PORT` | `3001` | Port to listen on |
|
||||
| `INFERENCE_PROVIDER` | `ollama` | `ollama` or `llamacpp` |
|
||||
| `INFERENCE_URL` | `http://localhost:11434` (Ollama) / `http://localhost:8080` (llama.cpp) | Backend URL |
|
||||
| `DEFAULT_MODEL` | Provider-specific | Model name passed to backend |
|
||||
|
||||
`INFERENCE_URL` defaults differ per provider — Ollama uses the Ollama default URL, llama.cpp uses the llama-server default.
|
||||
|
||||
## Options Resolution
|
||||
|
||||
Both providers use `resolveOptions(options)` to merge caller-supplied options with `INFERENCE_DEFAULTS` from shared constants. Any option not supplied by the caller falls back to the constant.
|
||||
|
||||
## Streaming Chunk Format
|
||||
|
||||
The two providers yield differently shaped chunks — the route in `src/routes/inference.js` normalises them:
|
||||
|
||||
**Ollama** yields raw Ollama generate chunks: `{ response, done, model, eval_count, prompt_eval_count, ... }`
|
||||
|
||||
**llama.cpp** yields:
|
||||
- Per-token: `{ response: delta, done: false }`
|
||||
- Final: `{ response: '', done: true, model, tokenCount }` — token count is the sum of `completion_tokens + prompt_tokens` from the usage chunk
|
||||
|
||||
The route checks `chunk.response` to stream text and `chunk.done` to capture metadata. For Ollama streaming, **token count is not captured** — the done chunk from Ollama contains `eval_count`/`prompt_eval_count` but the route only reads `chunk.tokenCount` (a llama.cpp field). Ollama streaming calls always report `tokenCount: 0` to the client.
|
||||
|
||||
## Known Issue: `maxTokens` Missing from Streaming Route
|
||||
|
||||
`POST /complete` correctly destructures `maxTokens` from the request body and passes it through. `POST /complete/stream` does **not** — it omits `maxTokens` from its destructuring, so streaming completions always use `INFERENCE_DEFAULTS.MAX_TOKENS` regardless of what the caller sends. This means `/chat/stream` has a different effective token ceiling than `/chat`.
|
||||
|
||||
## SSE Format (route → caller)
|
||||
|
||||
```
|
||||
data: {"response":"Hello"} ← per token
|
||||
data: {"response":" world"}
|
||||
data: {"done":true,"model":"...","tokenCount":42} ← final metadata
|
||||
data: [DONE] ← sentinel
|
||||
```
|
||||
|
||||
## API Endpoints
|
||||
|
||||
| Method | Path | Notes |
|
||||
|---|---|---|
|
||||
| GET | `/health` | Returns `{ service, status, provider, model }` |
|
||||
| POST | `/complete` | Body: `{ prompt, model?, temperature?, maxTokens?, topP?, topK?, repeatPenalty? }` |
|
||||
| POST | `/complete/stream` | Same body as `/complete` except `maxTokens` is silently ignored |
|
||||
@@ -1,10 +1,10 @@
|
||||
require ('dotenv').config();
|
||||
const express = require('express');
|
||||
const {getEnv, PORTS, OLLAMA} = require('@nexusai/shared');
|
||||
const {getEnv, PORTS, OLLAMA, logger} = require('@nexusai/shared');
|
||||
const inferenceRouter = require('./routes/inference');
|
||||
|
||||
const app = express();
|
||||
app.use(express.json());
|
||||
app.use(express.json({ limit: '8mb' })); // prompts include full context window
|
||||
|
||||
const PORT = getEnv('PORT', PORTS.INFERENCE);
|
||||
const PROVIDER = getEnv('INFERENCE_PROVIDER', 'ollama');
|
||||
@@ -24,5 +24,5 @@ app.use('/', inferenceRouter);
|
||||
|
||||
// Start the server
|
||||
app.listen(PORT, () => {
|
||||
console.log(`Inference Service is running on port ${PORT}`);
|
||||
logger.info(`Inference Service is running on port ${PORT}`);
|
||||
});
|
||||
@@ -1,4 +1,4 @@
|
||||
const { getEnv, LLAMACPP, INFERENCE_DEFAULTS } = require("@nexusai/shared");
|
||||
const { getEnv, LLAMACPP, INFERENCE_DEFAULTS, logger } = require("@nexusai/shared");
|
||||
|
||||
const BASE_URL = getEnv("INFERENCE_URL", LLAMACPP.DEFAULT_URL);
|
||||
const DEFAULT_MODEL = getEnv("DEFAULT_MODEL", LLAMACPP.DEFAULT_MODEL);
|
||||
@@ -89,7 +89,7 @@ async function* completeStream(prompt, options = {}) {
|
||||
}
|
||||
}
|
||||
|
||||
console.log('[llamacpp] finalTokenCount:', finalTokenCount);
|
||||
logger.info('[llamacpp] finalTokenCount:', finalTokenCount);
|
||||
|
||||
yield { response: '', done: true, model: finalModel, tokenCount: finalTokenCount };
|
||||
}
|
||||
|
||||
@@ -57,7 +57,16 @@ async function* completeStream(prompt, options = {} ) {
|
||||
});
|
||||
|
||||
for await (const chunk of stream) {
|
||||
yield chunk;
|
||||
if (chunk.done) {
|
||||
yield {
|
||||
response: '',
|
||||
done: true,
|
||||
model: chunk.model,
|
||||
tokenCount: (chunk.eval_count ?? 0) + (chunk.prompt_eval_count ?? 0),
|
||||
};
|
||||
} else {
|
||||
yield chunk;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
const { Router } = require('express');
|
||||
const { complete, completeStream } = require('../infer');
|
||||
const { logger } = require('@nexusai/shared');
|
||||
|
||||
const router = Router();
|
||||
|
||||
@@ -15,14 +16,14 @@ router.post('/complete', async (req, res) => {
|
||||
const result = await complete (prompt, {model, temperature, maxTokens, topP, topK, repeatPenalty});
|
||||
res.json(result);
|
||||
} catch (error) {
|
||||
console.error('[Inference] Completion error:', error.message);
|
||||
res.status(500).json({ error: error.message });
|
||||
logger.error('[Inference] Completion error:', error.message);
|
||||
res.status(500).json({ error: 'Inference failed', detail: error.message });
|
||||
}
|
||||
});
|
||||
|
||||
// Streaming completion endpoint - sends partial responses as they arrive
|
||||
router.post('/complete/stream', async (req, res) => {
|
||||
const { prompt, model, temperature, topP, topK, repeatPenalty } = req.body;
|
||||
const { prompt, model, temperature, maxTokens, topP, topK, repeatPenalty } = req.body;
|
||||
|
||||
if (!prompt) return res.status(400).json({ error: 'prompt is required' });
|
||||
|
||||
@@ -34,7 +35,7 @@ router.post('/complete/stream', async (req, res) => {
|
||||
let lastModel = model;
|
||||
let tokenCount = 0;
|
||||
|
||||
for await (const chunk of completeStream(prompt, { model, temperature, topP, topK, repeatPenalty })) {
|
||||
for await (const chunk of completeStream(prompt, { model, temperature, maxTokens,topP, topK, repeatPenalty })) {
|
||||
if (chunk.response) {
|
||||
res.write(`data: ${JSON.stringify({ response: chunk.response })}\n\n`);
|
||||
}
|
||||
@@ -42,7 +43,7 @@ router.post('/complete/stream', async (req, res) => {
|
||||
// capture final metadata from the done signal
|
||||
lastModel = chunk.model ?? lastModel;
|
||||
tokenCount = chunk.tokenCount ?? tokenCount;
|
||||
console.log('[inference router] tokenCount from chunk:', chunk.tokenCount, '→', tokenCount);
|
||||
logger.info('[inference router] tokenCount from chunk:', chunk.tokenCount, '→', tokenCount);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -51,7 +52,7 @@ router.post('/complete/stream', async (req, res) => {
|
||||
res.write('data: [DONE]\n\n');
|
||||
|
||||
} catch (err) {
|
||||
console.error('[Inference] Streaming error:', err.message);
|
||||
logger.error('[Inference] Streaming error:', err.message);
|
||||
res.write(`data: ${JSON.stringify({ error: err.message })}\n\n`);
|
||||
} finally {
|
||||
res.end();
|
||||
|
||||
114
packages/memory-service/CLAUDE.md
Normal file
114
packages/memory-service/CLAUDE.md
Normal file
@@ -0,0 +1,114 @@
|
||||
# CLAUDE.md
|
||||
|
||||
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||
|
||||
See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and the dual-store memory model.
|
||||
|
||||
## Running This Service
|
||||
|
||||
```bash
|
||||
npm run memory # From repo root (node src/index.js)
|
||||
npm -w packages/memory-service run dev # With --watch
|
||||
```
|
||||
|
||||
Default port: **3002**. Requires Qdrant and the embedding-service to be reachable on startup.
|
||||
|
||||
## SQLite Schema
|
||||
|
||||
`src/db/schema.js` is the source of truth for the data model. Key schema facts:
|
||||
|
||||
- `sessions` and `episodes` are linked by FK with cascade delete — deleting a session removes all its episodes automatically.
|
||||
- `episodes_fts` is an FTS5 virtual table that mirrors `user_message` and `ai_response`. It is kept in sync via SQL triggers on INSERT/UPDATE/DELETE. On service startup, the FTS index is fully rebuilt from live episode data.
|
||||
- Several columns (`sessions.name`, `sessions.project_id`, `entities.mention_count`, etc.) were added as migrations using `ALTER TABLE` wrapped in individual try-catch blocks. Failures are silently swallowed — if a column already exists, the alter fails and the service continues. The `idx_summaries_project` index is defined twice (benign duplicate).
|
||||
- `summaries` rows with `session_id IS NULL` and a `project_id` represent project-level overviews, not session summaries. This distinction is how `GET /projects/:id/overview` works.
|
||||
- `entity_episodes` is a join table linking entities to the episodes where they were first extracted. Used for provenance tracking and future orphan cleanup. Defined in `schema.js` (not a migration), so it exists on all installs.
|
||||
|
||||
**New columns on `entities` (added via migration):**
|
||||
- `mention_count INTEGER DEFAULT 1` — incremented every time this entity is re-extracted
|
||||
- `confidence REAL DEFAULT 1.0` — reserved for future confidence scoring
|
||||
- `source TEXT DEFAULT 'extraction'` — `'extraction'` or `'manual'`
|
||||
- `last_seen_at INTEGER` — Unix timestamp of most recent extraction hit
|
||||
|
||||
**New columns on `relationships` (added via migration):**
|
||||
- `mention_count INTEGER DEFAULT 1` — incremented every time this edge is re-extracted
|
||||
- `notes TEXT` — relationship context sentence from extraction
|
||||
|
||||
## Async Pipeline: Episode Creation
|
||||
|
||||
`POST /episodes` returns a 201 as soon as the SQLite insert succeeds. Two background tasks run after without blocking the response:
|
||||
|
||||
1. **Embedding** — Fetches a vector from embedding-service, stores to Qdrant with `{sessionId, createdAt}` as payload metadata.
|
||||
2. **Entity + relationship extraction** — Sends the episode text to Ollama (`qwen2.5:3b`, temp 0.1, 1500 tokens) and upserts any recognized entities and relationships to both SQLite and Qdrant. Also links each entity to the episode via `entity_episodes`.
|
||||
|
||||
Both tasks catch and log errors silently. An episode can exist in SQLite with no corresponding Qdrant point if either step fails.
|
||||
|
||||
## Entity Extraction Details
|
||||
|
||||
`src/entities/extraction.js`:
|
||||
|
||||
- Fetches the last 20 known entities from SQLite before prompting the model, so the prompt can ask for name/type consistency with existing entries.
|
||||
- Recognized entity types: `person`, `place`, `project`, `technology`, `concept`, `organization` — anything else is discarded.
|
||||
- Ignores a hardcoded list of low-value names (`hello`, `thanks`, `good morning`, etc.).
|
||||
- Extracts JSON using a regex (`{...}`) applied to raw model output, so surrounding prose doesn't break parsing.
|
||||
- The model is asked to return both entities and relationships in a single JSON response: `{ "entities": [...], "relationships": [...] }`.
|
||||
- Entity upsert uses `ON CONFLICT(name, type) DO UPDATE` — preserves existing `notes` if the new extraction returns null, increments `mention_count`, updates `last_seen_at`.
|
||||
- Relationship upsert uses `ON CONFLICT(from_id, to_id, label) DO UPDATE` — increments `mention_count`, preserves existing `notes` if new is null.
|
||||
- Relationships are resolved by looking up both endpoints in the `entityMap` built during entity processing — if either entity wasn't saved (filtered out or invalid type), the relationship is silently dropped.
|
||||
- After upsert, embeds each entity as `"${name} (${type}): ${notes}"` and stores to Qdrant with `projectId` in the payload for project-scoped filtering.
|
||||
|
||||
> For full details see `docs/services/entity-extraction.md` and `docs/services/knowledge-graph.md`.
|
||||
|
||||
## Knowledge Graph
|
||||
|
||||
`src/graph/index.js` provides two SQLite traversal functions:
|
||||
|
||||
- **`getNeighborhood(entityId, depth)`** — Single-entity recursive CTE traversal. Bidirectional (follows edges in both directions). Returns `{ nodes: [...entities], edges: [...relationships] }`. Depth defaults to `ENTITIES.GRAPH_HOP_DEPTH` (1), max enforced to 3 at the HTTP layer.
|
||||
|
||||
- **`getEntityNeighbors(entityIds[])`** — Bulk 1-hop version for orchestration. Given a set of seed entity IDs, returns their immediate neighbors plus all edges within the combined node set.
|
||||
|
||||
The recursive CTE uses `UNION` (not `UNION ALL`) to eliminate cycles and duplicate visits automatically.
|
||||
|
||||
> For full design rationale and usage see `docs/services/knowledge-graph.md`.
|
||||
|
||||
## Summarization Strategy
|
||||
|
||||
`src/summarization/project.js`:
|
||||
|
||||
- Preferred path: generate a project overview from existing **session-level summaries** (higher-level abstraction, shorter input).
|
||||
- Fallback path: if no session summaries exist, summarize raw episodes directly (up to `SUMMARIES.MAX_PROJECT_EPISODE_LIMIT`).
|
||||
- Both paths truncate input at `SUMMARIES.MAX_SUMMARY_CHARS` (8,000 chars) by slicing from the end (most recent content wins).
|
||||
- Strips ChatML tokens from the Ollama response (`<|im_start|>`, `<|im_end|>`).
|
||||
- Uses temp 0.2 and `num_predict 1200`.
|
||||
|
||||
## Qdrant Client
|
||||
|
||||
`src/semantic/index.js` creates the Qdrant client lazily on first use and reuses it. All three collections (`episodes`, `entities`, `summaries`) are created at startup if missing. There is no connection health check — if Qdrant is unreachable, semantic operations throw at call time.
|
||||
|
||||
## API Endpoints Quick Reference
|
||||
|
||||
| Method | Path | Notes |
|
||||
|---|---|---|
|
||||
| GET | `/health` | Static response, no dependency checks |
|
||||
| GET/POST | `/sessions` | POST requires `externalId`; duplicate → 409 |
|
||||
| GET/PATCH | `/sessions/by-external/:externalId` | PATCH accepts `name`, `projectId` |
|
||||
| DELETE | `/sessions/by-external/:externalId` | Cascades to episodes, summaries, relationships |
|
||||
| GET/POST | `/episodes` | POST triggers async embedding + entity/relationship extraction |
|
||||
| GET | `/episodes/search` | FTS5 search; route must precede `/:id` |
|
||||
| GET | `/sessions/:id/episodes` | Paginated, ordered `created_at DESC` |
|
||||
| DELETE | `/episodes/:id` | Removes from SQLite + async Qdrant delete |
|
||||
| POST | `/entities` | Upsert by `(name, type)`; increments `mention_count` on conflict |
|
||||
| GET | `/entities/by-type/:type` | All entities of given type |
|
||||
| GET/DELETE | `/entities/:id` | |
|
||||
| POST | `/relationships` | Upsert by `(fromId, toId, label)`; increments `mention_count` on conflict. Body: `fromId`, `toId`, `label`, `notes` (optional) |
|
||||
| GET | `/entities/:id/relationships` | Outbound only |
|
||||
| DELETE | `/relationships` | Body: `fromId`, `toId`, `label` |
|
||||
| GET | `/graph/neighborhood/:entityId` | Single-entity neighborhood; `?depth=` (default 1, max 3) |
|
||||
| POST | `/graph/neighbors` | Bulk 1-hop neighborhood; body: `{ entityIds: [...] }` |
|
||||
| GET/POST | `/projects` | POST requires non-empty `name` |
|
||||
| GET/PATCH/DELETE | `/projects/:id` | |
|
||||
| POST | `/projects/:id/summarize` | On-demand overview generation; 422 if no data |
|
||||
| GET | `/projects/:id/overview` | Returns null (not 404) if no overview exists |
|
||||
| GET | `/projects/:id/summaries` | All summaries for project |
|
||||
| POST | `/summaries` | Requires `content` + at least one of `sessionId`/`projectId` |
|
||||
| GET | `/sessions/:id/summaries` | |
|
||||
| PATCH/DELETE | `/summaries/:id` | |
|
||||
@@ -1,6 +1,6 @@
|
||||
const Database = require('better-sqlite3');
|
||||
const schema = require('./schema');
|
||||
const {getEnv, SQLITE } = require('@nexusai/shared');
|
||||
const {getEnv, SQLITE, logger } = require('@nexusai/shared');
|
||||
|
||||
let db; // Declare db variable in a scope accessible to all functions
|
||||
|
||||
@@ -54,15 +54,20 @@ function getDB() {
|
||||
db.exec(`CREATE INDEX IF NOT EXISTS idx_summaries_session ON summaries(session_id)`);
|
||||
} catch {}
|
||||
|
||||
try {
|
||||
db.exec(`CREATE INDEX IF NOT EXISTS idx_summaries_project ON summaries(project_id)`);
|
||||
} catch {}
|
||||
try { db.exec(`ALTER TABLE entities ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
|
||||
try { db.exec(`ALTER TABLE entities ADD COLUMN confidence REAL NOT NULL DEFAULT 1.0`) } catch {}
|
||||
try { db.exec(`ALTER TABLE entities ADD COLUMN source TEXT NOT NULL DEFAULT 'extraction'`) } catch {}
|
||||
try { db.exec(`ALTER TABLE entities ADD COLUMN last_seen_at INTEGER`) } catch {}
|
||||
|
||||
try { db.exec(`ALTER TABLE relationships ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
|
||||
try { db.exec(`ALTER TABLE relationships ADD COLUMN notes TEXT`) } catch {}
|
||||
|
||||
|
||||
// Sync FTS index with any existing episodes data
|
||||
db.exec(`INSERT OR REPLACE INTO episodes_fts(rowid, user_message, ai_response)
|
||||
SELECT id, user_message, ai_response FROM episodes`);
|
||||
|
||||
console.log(`Connected to SQLite database at ${path}`);
|
||||
logger.info(`Connected to SQLite database at ${path}`);
|
||||
}
|
||||
return db;
|
||||
}
|
||||
|
||||
@@ -38,6 +38,20 @@ const schema = `
|
||||
UNIQUE(from_id, to_id, label)
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_relationships_from ON relationships(from_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_relationships_to ON relationships(to_id);
|
||||
|
||||
CREATE TABLE IF NOT EXISTS entity_episodes (
|
||||
entity_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
|
||||
episode_id INTEGER NOT NULL REFERENCES episodes(id) ON DELETE CASCADE,
|
||||
PRIMARY KEY (entity_id, episode_id)
|
||||
);
|
||||
|
||||
CREATE INDEX IF NOT EXISTS idx_entity_episodes_entity ON entity_episodes(entity_id);
|
||||
CREATE INDEX IF NOT EXISTS idx_entity_episodes_episode ON entity_episodes(episode_id);
|
||||
|
||||
|
||||
|
||||
CREATE TABLE IF NOT EXISTS projects (
|
||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||
name TEXT NOT NULL,
|
||||
|
||||
@@ -1,14 +1,18 @@
|
||||
const semantic = require('../semantic')
|
||||
const { getEnv, SERVICES, formatEpisodeText, ENTITIES } = require('@nexusai/shared');
|
||||
const { upsertEntity } = require('./index');
|
||||
const { getEnv, SERVICES, formatEpisodeText, ENTITIES, logger } = require('@nexusai/shared');
|
||||
const { upsertEntity, upsertRelationship, linkEntityToEpisode } = require('./index');
|
||||
|
||||
const EXTRACTION_URL = getEnv('EXTRACTION_URL', 'http://localhost:11434');
|
||||
const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b');
|
||||
const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b'); // ChatML format — see buildExtractionPrompt
|
||||
const EMBEDDING_SERVICE_URL = getEnv('EMBEDDING_SERVICE_URL', SERVICES.EMBEDDING_URL);
|
||||
|
||||
const ENTITY_TYPES = ['person', 'place', 'project', 'technology', 'concept', 'organization'];
|
||||
const ENTITY_TYPES = ENTITIES.TYPES;
|
||||
const IGNORED_NAMES = ['good morning', 'good night', 'hello', 'goodbye', 'thanks', 'thank you'];
|
||||
|
||||
// NOTE: This prompt uses ChatML format (<|im_start|> / <|im_end|> tags), which is
|
||||
// specific to qwen-family models. If EXTRACTION_MODEL is changed to a Llama-family
|
||||
// or other model, this format will need to change — most alternatives use either
|
||||
// plain text or [INST] / <<SYS>> tags. Silent degradation is likely if mismatched.
|
||||
function buildExtractionPrompt(userMessage, aiResponse, knownEntities = []) {
|
||||
const knownBlock = knownEntities.length > 0
|
||||
? [
|
||||
@@ -20,21 +24,24 @@ function buildExtractionPrompt(userMessage, aiResponse, knownEntities = []) {
|
||||
|
||||
return [
|
||||
'<|im_start|>system',
|
||||
'You are a named entity extractor. You output only valid JSON.',
|
||||
'You are a named entity and relationship extractor. You output only valid JSON.',
|
||||
'<|im_end|>',
|
||||
'<|im_start|>user',
|
||||
'Read the conversation below and extract every named entity mentioned.',
|
||||
`Entity types to extract: ${ENTITY_TYPES.join(', ')}`,
|
||||
'For each entity found, provide: name, type, and a one-sentence notes field.',
|
||||
'Return your answer as: { "entities": [ ... ] }',
|
||||
'For each entity found, you MUST provide a non-empty notes field describing it based on the conversation.',
|
||||
'For each entity found, provide:',
|
||||
' "name": short proper noun only (max 4 words, e.g. "Sydney", "NexusAI", "Tim")',
|
||||
'Read the conversation below and extract all named entities and the relationships between them.',
|
||||
`Entity types: ${ENTITY_TYPES.join(', ')}`,
|
||||
'Use "character" for any fictional, game, or media characters (e.g. characters from anime, games, books, TV shows, movies)',
|
||||
'Use "person" only for real people',
|
||||
'For each entity provide:',
|
||||
' "name": short proper noun only (max 4 words)',
|
||||
' "type": one of the valid types',
|
||||
' "notes": one specific sentence about this entity based on the conversation (not generic)',
|
||||
' "notes": one specific sentence about this entity based on the conversation',
|
||||
'For relationships, use snake_case verb labels (e.g. works_on, manages, uses, knows, located_in, part_of, created_by).',
|
||||
'Only include relationships between entities you have listed above.',
|
||||
'Return this exact JSON structure:',
|
||||
'{ "entities": [{"name": "...", "type": "...", "notes": "..."}], "relationships": [{"from": "...", "fromType": "...", "to": "...", "toType": "...", "label": "...", "notes": "..."}] }',
|
||||
'',
|
||||
knownBlock,
|
||||
'--- CONVERSATION ---', // clear delimiter helps smaller models
|
||||
'--- CONVERSATION ---',
|
||||
`User: ${userMessage}`,
|
||||
`Assistant: ${aiResponse}`,
|
||||
'--- END CONVERSATION ---',
|
||||
@@ -58,8 +65,8 @@ async function embedEntity(entity) {
|
||||
return data.embedding;
|
||||
}
|
||||
|
||||
async function extractAndStoreEntities(userMessage, aiResponse, projectId=null) {
|
||||
console.log('[entities] Extraction triggered')
|
||||
async function extractAndStoreEntities(userMessage, aiResponse, episodeId=null, projectId=null) {
|
||||
logger.info('[entities] Extraction triggered')
|
||||
try {
|
||||
// Fetch existing entities to guide the model toward consistent name/type pairs
|
||||
const db = require('../db').getDB();
|
||||
@@ -80,6 +87,7 @@ async function extractAndStoreEntities(userMessage, aiResponse, projectId=null)
|
||||
num_predict: ENTITIES.NUM_PREDICT,
|
||||
},
|
||||
}),
|
||||
signal: AbortSignal.timeout(60_000),
|
||||
});
|
||||
|
||||
if (!res.ok) throw new Error(`Ollama responded ${res.status}`);
|
||||
@@ -87,27 +95,41 @@ async function extractAndStoreEntities(userMessage, aiResponse, projectId=null)
|
||||
const data = await res.json();
|
||||
const raw = data.response?.trim() ?? '';
|
||||
|
||||
const parsed = JSON.parse(raw);
|
||||
const entities = Array.isArray(parsed.entities) ? parsed.entities : [];
|
||||
if (entities.length === 0) {
|
||||
console.log('[entities] No entities found in this exchange — skipping');
|
||||
return; // not an error, just nothing to extract
|
||||
const jsonMatch = raw.match(/\{[\s\S]*\}/);
|
||||
if (!jsonMatch) {
|
||||
logger.warn('[entities] No JSON object found in response');
|
||||
logger.debug('[entities] Raw response was:', raw);
|
||||
return;
|
||||
}
|
||||
|
||||
if (!Array.isArray(entities)) throw new Error('Response was not a JSON array');
|
||||
let parsed;
|
||||
try {
|
||||
parsed = JSON.parse(jsonMatch[0]);
|
||||
} catch (err) {
|
||||
logger.warn('[entities] Failed to parse extraction response:', err.message);
|
||||
logger.debug('[entities] Raw response was:', raw);
|
||||
return;
|
||||
}
|
||||
const entities = Array.isArray(parsed.entities) ? parsed.entities : [];
|
||||
if (entities.length === 0) {
|
||||
logger.debug('[entities] No entities found in this exchange — skipping');
|
||||
return;
|
||||
}
|
||||
|
||||
// Map of "name::type" → saved entity, used for relationship resolution below
|
||||
const entityMap = new Map();
|
||||
let saved = 0;
|
||||
|
||||
|
||||
for (const { name, type, notes } of entities) {
|
||||
|
||||
if (!name || !type || !ENTITY_TYPES.includes(type)) continue;
|
||||
if (IGNORED_NAMES.includes(name.toLowerCase())) continue;
|
||||
|
||||
const entity = upsertEntity(name, type, notes ?? null);
|
||||
console.log('[entities] Upserted entity:', entity);
|
||||
entityMap.set(`${name}::${type}`, entity);
|
||||
logger.info('[entities] Upserted entity:', entity);
|
||||
|
||||
if (episodeId) linkEntityToEpisode(entity.id, episodeId);
|
||||
|
||||
// Embed and upsert to Qdrant fire-and-forget
|
||||
embedEntity(entity)
|
||||
.then(vector => semantic.upsertEntity(entity.id, vector, {
|
||||
name: entity.name,
|
||||
@@ -116,17 +138,34 @@ async function extractAndStoreEntities(userMessage, aiResponse, projectId=null)
|
||||
projectId: projectId ?? null,
|
||||
}))
|
||||
.catch(err => {
|
||||
console.warn(`[entities] Failed to embed entity "${entity.name}":`, err.message);
|
||||
logger.warn(`[entities] Failed to embed entity "${entity.name}":`, err.message);
|
||||
});
|
||||
|
||||
saved++;
|
||||
}
|
||||
|
||||
if (saved > 0) console.log(`[entities] Extracted and stored ${saved} entities`);
|
||||
if (saved > 0) logger.info(`[entities] Extracted and stored ${saved} entities`);
|
||||
|
||||
// Process extracted relationships — both entities must have been saved above
|
||||
const relationships = Array.isArray(parsed.relationships) ? parsed.relationships : [];
|
||||
let relSaved = 0;
|
||||
|
||||
for (const { from, fromType, to, toType, label, notes } of relationships) {
|
||||
if (!from || !fromType || !to || !toType || !label) continue;
|
||||
|
||||
const fromEntity = entityMap.get(`${from}::${fromType}`);
|
||||
const toEntity = entityMap.get(`${to}::${toType}`);
|
||||
if (!fromEntity || !toEntity) continue;
|
||||
|
||||
upsertRelationship(fromEntity.id, toEntity.id, label, notes ?? null);
|
||||
relSaved++;
|
||||
}
|
||||
|
||||
if (relSaved > 0) logger.info(`[entities] Extracted and stored ${relSaved} relationships`);
|
||||
|
||||
} catch (err) {
|
||||
// Non-critical — log and move on, episode is already saved
|
||||
console.warn('[entities] Extraction failed:', err.message);
|
||||
logger.warn('[entities] Extraction failed:', err.message);
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -4,18 +4,23 @@ const { parseRow } = require ('@nexusai/shared')
|
||||
/******* Entities ********/
|
||||
|
||||
// Upsert an entity - insert or update if (name, type) already exists
|
||||
function upsertEntity(name, type, notes = null, metadata = null) {
|
||||
function upsertEntity(name, type, notes = null, metadata = null, source = 'extraction') {
|
||||
const db = getDB();
|
||||
const stmt = db.prepare(`
|
||||
INSERT INTO entities (name, type, notes, metadata)
|
||||
VALUES (?, ?, ?, ?)
|
||||
ON CONFLICT(name, type) DO UPDATE SET
|
||||
notes = COALESCE(entities.notes, excluded.notes),
|
||||
metadata = excluded.metadata,
|
||||
updated_at = unixepoch()
|
||||
`);
|
||||
const result = stmt.run(name, type, notes, metadata ? JSON.stringify(metadata) : null);
|
||||
|
||||
const stmt = db.prepare(`
|
||||
INSERT INTO entities (name, type, notes, metadata, source, last_seen_at)
|
||||
VALUES (?, ?, ?, ?, ?, unixepoch())
|
||||
ON CONFLICT(name, type) DO UPDATE SET
|
||||
-- First extraction wins: notes are never overwritten once set.
|
||||
-- Revisit during Memory Consolidation Lifecycle (Phase 2) — once entity
|
||||
-- quality scoring exists, a higher-confidence extraction should be able
|
||||
-- to replace stale notes rather than being silently dropped.
|
||||
notes = COALESCE(entities.notes, excluded.notes),
|
||||
metadata = excluded.metadata,
|
||||
mention_count = entities.mention_count + 1,
|
||||
last_seen_at = unixepoch(),
|
||||
updated_at = unixepoch()
|
||||
`);
|
||||
stmt.run(name, type, notes, metadata ? JSON.stringify(metadata) : null, source);
|
||||
return getEntityByNameType(name, type);
|
||||
}
|
||||
|
||||
@@ -40,15 +45,17 @@ function deleteEntity(id) {
|
||||
/********* Relationships *********/
|
||||
|
||||
// Upsert a relationship, insert or ignore if (from_id, to_id, label) already exists
|
||||
function upsertRelationship(fromId, toId, label, metadata = null){
|
||||
function upsertRelationship(fromId, toId, label, notes = null, metadata = null) {
|
||||
const db = getDB();
|
||||
const stmt = db.prepare(`
|
||||
INSERT INTO relationships (from_id, to_id, label, metadata)
|
||||
VALUES (?, ?, ?, ?)
|
||||
ON CONFLICT(from_id, to_id, label) DO NOTHING
|
||||
INSERT INTO relationships (from_id, to_id, label, notes, metadata)
|
||||
VALUES (?, ?, ?, ?, ?)
|
||||
ON CONFLICT(from_id, to_id, label) DO UPDATE SET
|
||||
mention_count = relationships.mention_count + 1,
|
||||
-- First extraction wins for notes — same policy as entities.
|
||||
notes = COALESCE(relationships.notes, excluded.notes)
|
||||
`);
|
||||
|
||||
const result = stmt.run(fromId, toId, label, metadata ?JSON.stringify(metadata) : null);
|
||||
stmt.run(fromId, toId, label, notes, metadata ? JSON.stringify(metadata) : null);
|
||||
return getRelationship(fromId, toId, label);
|
||||
}
|
||||
|
||||
@@ -69,7 +76,7 @@ function getEntityByNameType(name, type) {
|
||||
}
|
||||
|
||||
// Retrive all relationships originating from a given entity
|
||||
function getRelationshipsByEntity(entityId) {
|
||||
function getOutboundRelationships(entityId) {
|
||||
const db = getDB();
|
||||
return db.prepare(`SELECT * FROM relationships WHERE from_id = ?`).all(entityId).map(parseRow);
|
||||
}
|
||||
@@ -81,14 +88,23 @@ function deleteRelationship(fromId, toId, label) {
|
||||
db.prepare(`DELETE FROM relationships WHERE from_id = ? AND to_id = ? AND label = ?`).run(fromId, toId, label);
|
||||
}
|
||||
|
||||
function linkEntityToEpisode(entityId, episodeId) {
|
||||
const db = getDB();
|
||||
db.prepare(`
|
||||
INSERT OR IGNORE INTO entity_episodes (entity_id, episode_id)
|
||||
VALUES (?, ?)
|
||||
`).run(entityId, episodeId);
|
||||
}
|
||||
|
||||
module.exports = {
|
||||
upsertEntity,
|
||||
getEntity,
|
||||
getEntitiesByType,
|
||||
getEntityByNameType,
|
||||
deleteEntity,
|
||||
linkEntityToEpisode,
|
||||
upsertRelationship,
|
||||
getRelationship,
|
||||
getRelationshipsByEntity,
|
||||
getOutboundRelationships,
|
||||
deleteRelationship
|
||||
}
|
||||
@@ -1,5 +1,5 @@
|
||||
const {getDB} = require('../db');
|
||||
const { EPISODIC, getEnv, SERVICES, parseRow, formatEpisodeText, SUMMARIES } = require('@nexusai/shared');
|
||||
const { EPISODIC, getEnv, SERVICES, parseRow, formatEpisodeText, SUMMARIES, logger } = require('@nexusai/shared');
|
||||
const semantic = require('../semantic');
|
||||
const { extractAndStoreEntities } = require('../entities/extraction')
|
||||
|
||||
@@ -125,10 +125,10 @@ async function createEpisode(sessionId, userMessage, aiResponse, tokenCount = nu
|
||||
sessionId: episode.session_id,
|
||||
createdAt: episode.created_at
|
||||
}))
|
||||
.catch(err => console.error(`Failed to embed episode ${episode.id}:`, err.message));
|
||||
.catch(err => logger.error(`Failed to embed episode ${episode.id}:`, err.message));
|
||||
|
||||
extractAndStoreEntities(userMessage, aiResponse, projectId)
|
||||
.catch(err => console.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
|
||||
extractAndStoreEntities(userMessage, aiResponse, episode.id, projectId)
|
||||
.catch(err => logger.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
|
||||
|
||||
|
||||
return episode;
|
||||
@@ -163,22 +163,32 @@ function getRecentEpisodes(sessionId, limit = EPISODIC.DEFAULT_RECENT_LIMIT) {
|
||||
ORDER BY created_at DESC
|
||||
LIMIT ?
|
||||
`);
|
||||
return stmt.all(limit).map(parseRow);
|
||||
return stmt.all(sessionId, limit).map(parseRow);
|
||||
}
|
||||
|
||||
|
||||
// Searches episodes using FTS5 full-text search, ordered by relevance, with a limit
|
||||
function searchEpisodes(query, limit = EPISODIC.DEFAULT_SEARCH_LIMIT) {
|
||||
// FTS5 full-text search across all episodes
|
||||
function searchEpisodes(query, limit = EPISODIC.DEFAULT_SEARCH_LIMIT, sessionIds = null) {
|
||||
const db = getDB();
|
||||
const stmt = db.prepare(`
|
||||
const safeQuery = `"${query.replace(/"/g, '""')}"`;
|
||||
if (sessionIds && sessionIds.length > 0) {
|
||||
const ph = sessionIds.map(() => '?').join(',');
|
||||
return db.prepare(`
|
||||
SELECT e.* FROM episodes e
|
||||
JOIN episodes_fts fts ON e.id = fts.rowid
|
||||
WHERE episodes_fts MATCH ?
|
||||
AND e.session_id IN (${ph})
|
||||
ORDER BY rank
|
||||
LIMIT ?
|
||||
`).all(safeQuery, ...sessionIds, limit).map(parseRow);
|
||||
}
|
||||
return db.prepare(`
|
||||
SELECT e.* FROM episodes e
|
||||
JOIN episodes_fts fts ON e.id = fts.rowid
|
||||
WHERE episodes_fts MATCH ?
|
||||
ORDER BY rank
|
||||
LIMIT ?
|
||||
`);
|
||||
return stmt.all(query, limit).map(parseRow);
|
||||
`).all(safeQuery, limit).map(parseRow);
|
||||
}
|
||||
|
||||
// Deletes an episode by its ID
|
||||
@@ -197,7 +207,8 @@ async function getEpisodeEmbedding(userMessage, aiResponse){
|
||||
const res = await fetch(`${url}/embed`, {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({ text })
|
||||
body: JSON.stringify({ text }),
|
||||
signal: AbortSignal.timeout(30_000),
|
||||
})
|
||||
|
||||
if (!res.ok) {
|
||||
|
||||
77
packages/memory-service/src/graph/index.js
Normal file
77
packages/memory-service/src/graph/index.js
Normal file
@@ -0,0 +1,77 @@
|
||||
const { getDB } = require('../db');
|
||||
const { parseRow, ENTITIES } = require('@nexusai/shared');
|
||||
|
||||
// Single-entity neighborhood via recursive CTE — bidirectional, configurable depth
|
||||
function getNeighborhood(entityId, depth = ENTITIES.GRAPH_HOP_DEPTH) {
|
||||
const db = getDB();
|
||||
|
||||
const nodeRows = db.prepare(`
|
||||
WITH RECURSIVE traverse(entity_id, depth) AS (
|
||||
SELECT ?, 0
|
||||
UNION
|
||||
SELECT
|
||||
CASE WHEN r.from_id = t.entity_id THEN r.to_id ELSE r.from_id END,
|
||||
t.depth + 1
|
||||
FROM relationships r
|
||||
JOIN traverse t ON (r.from_id = t.entity_id OR r.to_id = t.entity_id)
|
||||
WHERE t.depth < ?
|
||||
)
|
||||
SELECT DISTINCT entity_id FROM traverse
|
||||
`).all(entityId, depth);
|
||||
|
||||
const nodeIds = nodeRows.map(r => r.entity_id);
|
||||
if (nodeIds.length === 0) return { nodes: [], edges: [] };
|
||||
|
||||
const ph = nodeIds.map(() => '?').join(',');
|
||||
const nodes = db.prepare(
|
||||
`SELECT * FROM entities WHERE id IN (${ph})`
|
||||
).all(...nodeIds).map(parseRow);
|
||||
|
||||
const edges = db.prepare(
|
||||
`SELECT * FROM relationships WHERE from_id IN (${ph}) AND to_id IN (${ph})`
|
||||
).all(...nodeIds, ...nodeIds).map(parseRow);
|
||||
|
||||
return { nodes, edges };
|
||||
}
|
||||
|
||||
// Bulk 1-hop neighborhood for orchestration — seeds are entity IDs from Qdrant search
|
||||
function getEntityNeighbors(entityIds) {
|
||||
if (!entityIds.length) return { nodes: [], edges: [] };
|
||||
const db = getDB();
|
||||
|
||||
const ph = entityIds.map(() => '?').join(',');
|
||||
|
||||
// entityIds appears three times — once for the CASE (finding the neighbor),
|
||||
// and once each for the FROM and TO sides of the WHERE clause
|
||||
const neighborRows = db.prepare(`
|
||||
SELECT DISTINCT
|
||||
CASE WHEN from_id IN (${ph}) THEN to_id ELSE from_id END AS entity_id
|
||||
FROM relationships
|
||||
WHERE from_id IN (${ph}) OR to_id IN (${ph})
|
||||
`).all(...entityIds, ...entityIds, ...entityIds);
|
||||
|
||||
const allIds = [...new Set([...entityIds, ...neighborRows.map(r => r.entity_id)])];
|
||||
const allPh = allIds.map(() => '?').join(',');
|
||||
|
||||
const nodes = db.prepare(
|
||||
`SELECT * FROM entities WHERE id IN (${allPh})`
|
||||
).all(...allIds).map(parseRow);
|
||||
|
||||
const edges = db.prepare(
|
||||
`SELECT * FROM relationships WHERE from_id IN (${allPh}) AND to_id IN (${allPh})`
|
||||
).all(...allIds, ...allIds).map(parseRow);
|
||||
|
||||
return { nodes, edges };
|
||||
}
|
||||
|
||||
// Returns episode IDs linked to any of the given entity IDs via entity_episodes
|
||||
function getEpisodeIdsByEntities(entityIds) {
|
||||
if (!entityIds.length) return [];
|
||||
const db = getDB();
|
||||
const ph = entityIds.map(() => '?').join(',');
|
||||
return db.prepare(
|
||||
`SELECT DISTINCT episode_id FROM entity_episodes WHERE entity_id IN (${ph})`
|
||||
).all(...entityIds).map(r => r.episode_id);
|
||||
}
|
||||
|
||||
module.exports = { getNeighborhood, getEntityNeighbors, getEpisodeIdsByEntities };
|
||||
@@ -1,17 +1,18 @@
|
||||
require ('dotenv').config();
|
||||
const express = require('express');
|
||||
const {getEnv, PORTS, EPISODIC} = require('@nexusai/shared');
|
||||
const {getEnv, PORTS, EPISODIC, logger} = require('@nexusai/shared');
|
||||
const { getDB } = require('./db');
|
||||
const { createProject, getProjects, getProject, updateProject, deleteProject } = require('./db/projects');
|
||||
const { createSummary, getSummary, getSummariesBySession, getSummariesByProject, updateSummary, deleteSummary } = require('./db/summaries');
|
||||
const { generateAndStoreProjectSummary } = require('./summarization/project');
|
||||
const graph = require('./graph');
|
||||
|
||||
const episodic = require('./episodic');
|
||||
const semantic = require('./semantic');
|
||||
const entities = require('./entities');
|
||||
|
||||
const app = express();
|
||||
app.use(express.json());
|
||||
app.use(express.json({ limit: '2mb' }));
|
||||
|
||||
const PORT = getEnv('PORT', PORTS.MEMORY);
|
||||
|
||||
@@ -19,8 +20,8 @@ const PORT = getEnv('PORT', PORTS.MEMORY);
|
||||
const db = getDB();
|
||||
|
||||
semantic.initCollections()
|
||||
.then(() => console.log(`QDrant collections ready`))
|
||||
.catch(err => console.error(`QDrant initialization error:`, err.message));
|
||||
.then(() => logger.info(`QDrant collections ready`))
|
||||
.catch(err => logger.error(`QDrant initialization error:`, err.message));
|
||||
|
||||
// Health check endpoint
|
||||
app.get('/health', (req, res) => {
|
||||
@@ -80,7 +81,7 @@ app.patch('/sessions/by-external/:externalId', (req, res) => {
|
||||
const session = episodic.updateSessionByExternalId(req.params.externalId, {name, projectId });
|
||||
res.json(session);
|
||||
} catch (err) {
|
||||
res.status(500).json({error: err.message });
|
||||
res.status(500).json({ error: 'Failed to update session', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
@@ -130,10 +131,12 @@ app.get('/episodes', (req, res) => {
|
||||
|
||||
// Search MUST come before /:id — otherwise 'search' gets captured as an id
|
||||
app.get('/episodes/search', (req, res) => {
|
||||
const { q, limit = EPISODIC.DEFAULT_PAGE_SIZE } = req.query;
|
||||
const { q, limit = EPISODIC.DEFAULT_PAGE_SIZE, sessionIds } = req.query;
|
||||
if (!q) return res.status(400).json({ error: 'q (query) parameter is required' });
|
||||
const results = episodic.searchEpisodes(q, Number(limit));
|
||||
res.json(results);
|
||||
const parsedSessionIds = sessionIds
|
||||
? sessionIds.split(',').map(Number).filter(Boolean)
|
||||
: null;
|
||||
res.json(episodic.searchEpisodes(q, Number(limit), parsedSessionIds));
|
||||
});
|
||||
|
||||
app.get('/episodes/:id', (req, res) => {
|
||||
@@ -158,7 +161,7 @@ app.delete('/episodes/:id', (req, res) => {
|
||||
episodic.deleteEpisode(id);
|
||||
|
||||
semantic.deleteEpisode(id) // fire-and-forget
|
||||
.catch(err => console.error(`[Memory] Qdrant delete failed for episode ${id}:`, err.message));
|
||||
.catch(err => logger.error(`[Memory] Qdrant delete failed for episode ${id}:`, err.message));
|
||||
|
||||
res.status(204).send();
|
||||
});
|
||||
@@ -202,17 +205,17 @@ app.delete('/entities/:id', (req, res) => {
|
||||
|
||||
// Upsert a relationship between two entities
|
||||
app.post('/relationships', (req, res) => {
|
||||
const {fromId, toId, label, metadata } = req.body;
|
||||
const { fromId, toId, label, notes, metadata } = req.body;
|
||||
if (!fromId || !toId || !label) {
|
||||
return res.status(400).json({ error: 'fromId, toId and label are required' });
|
||||
}
|
||||
const relationship = entities.upsertRelationship(fromId, toId, label, metadata);
|
||||
const relationship = entities.upsertRelationship(fromId, toId, label, notes, metadata);
|
||||
res.status(201).json(relationship);
|
||||
});
|
||||
|
||||
// Get all relationships for a given entity ID
|
||||
app.get('/entities/:id/relationships', (req, res) => {
|
||||
res.json(entities.getRelationshipsByEntity(req.params.id));
|
||||
res.json(entities.getOutboundRelationships(req.params.id));
|
||||
});
|
||||
|
||||
// Delete a specific relationship
|
||||
@@ -225,6 +228,37 @@ app.delete('/relationships', (req, res) => {
|
||||
res.status(204).send();
|
||||
})
|
||||
|
||||
/********************************* */
|
||||
/********** Graph Routes ********** */
|
||||
/********************************* */
|
||||
|
||||
// Single-entity neighborhood — depth defaults to ENTITIES.GRAPH_HOP_DEPTH
|
||||
app.get('/graph/neighborhood/:entityId', (req, res) => {
|
||||
const entity = entities.getEntity(req.params.entityId);
|
||||
if (!entity) return res.status(404).json({ error: 'Entity not found' });
|
||||
|
||||
const depth = req.query.depth ? Math.min(Number(req.query.depth), 3) : undefined;
|
||||
const neighborhood = graph.getNeighborhood(Number(req.params.entityId), depth);
|
||||
res.json({ entity, neighborhood });
|
||||
});
|
||||
|
||||
// Bulk 1-hop neighborhood — body: { entityIds: [...] }
|
||||
app.post('/graph/neighbors', (req, res) => {
|
||||
const { entityIds } = req.body;
|
||||
if (!Array.isArray(entityIds) || entityIds.length === 0) {
|
||||
return res.status(400).json({ error: 'entityIds array is required' });
|
||||
}
|
||||
res.json(graph.getEntityNeighbors(entityIds.map(Number)));
|
||||
});
|
||||
|
||||
app.post('/episodes/by-entities', (req, res) => {
|
||||
const { entityIds } = req.body;
|
||||
if (!Array.isArray(entityIds) || entityIds.length === 0) {
|
||||
return res.status(400).json({ error: 'entityIds array is required' });
|
||||
}
|
||||
res.json({ episodeIds: graph.getEpisodeIdsByEntities(entityIds.map(Number)) });
|
||||
});
|
||||
|
||||
/*********************************** */
|
||||
/********** Project Routes ********** */
|
||||
/*********************************** */
|
||||
@@ -235,7 +269,7 @@ app.post('/projects', (req, res) => {
|
||||
try {
|
||||
res.status(201).json(createProject({ name: name.trim(), description, colour, icon }));
|
||||
} catch (err) {
|
||||
res.status(500).json({ error: err.message });
|
||||
res.status(500).json({ error: 'Failed to create project', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
@@ -255,7 +289,7 @@ app.post('/projects/:id/summarize', async (req, res) => {
|
||||
if (err.message.includes('No session summaries or episodes')) {
|
||||
return res.status(422).json({ error: err.message });
|
||||
}
|
||||
res.status(500).json({ error: err.message });
|
||||
res.status(500).json({ error: 'Failed to generate project summary', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
@@ -310,7 +344,7 @@ app.post('/summaries', (req, res) => {
|
||||
const summary = createSummary({ sessionId, projectId, content, tokenCount, episodeRange, metadata });
|
||||
res.status(201).json(summary);
|
||||
} catch (err) {
|
||||
res.status(500).json({ error: err.message });
|
||||
res.status(500).json({ error: 'Failed to create summary', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
@@ -338,5 +372,5 @@ app.delete('/summaries/:id', (req, res) => {
|
||||
/********** Start Server ********** */
|
||||
/********************************** */
|
||||
app.listen(PORT, () => {
|
||||
console.log(`Memory Service is running on port ${PORT}`);
|
||||
logger.info(`Memory Service is running on port ${PORT}`);
|
||||
});
|
||||
@@ -1,5 +1,5 @@
|
||||
const {QdrantClient} = require('@qdrant/js-client-rest');
|
||||
const {QDRANT, COLLECTIONS, getEnv} = require('@nexusai/shared');
|
||||
const {QDRANT, COLLECTIONS, getEnv, logger} = require('@nexusai/shared');
|
||||
|
||||
let client;
|
||||
|
||||
@@ -24,9 +24,9 @@ async function initCollections() {
|
||||
distance: QDRANT.DISTANCE_METRIC
|
||||
}
|
||||
});
|
||||
console.log(`Created Qdrant collection: ${name}`);
|
||||
logger.info(`Created Qdrant collection: ${name}`);
|
||||
} else {
|
||||
console.log(`Qdrant collection already exists: ${name}`);
|
||||
logger.info(`Qdrant collection already exists: ${name}`);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
156
packages/orchestration-service/CLAUDE.md
Normal file
156
packages/orchestration-service/CLAUDE.md
Normal file
@@ -0,0 +1,156 @@
|
||||
# CLAUDE.md
|
||||
|
||||
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||
|
||||
See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and the end-to-end chat flow.
|
||||
|
||||
## Running This Service
|
||||
|
||||
```bash
|
||||
npm run orchestration # From repo root (node src/index.js)
|
||||
npm -w packages/orchestration-service run dev # With --watch
|
||||
```
|
||||
|
||||
Default port: **4000**. Depends on memory-service, embedding-service, inference-service, and Qdrant.
|
||||
|
||||
## Context Assembly (`src/chat/index.js`)
|
||||
|
||||
`assembleContext(externalId, userMessage)` is the core function that builds the inference prompt. Order of operations:
|
||||
|
||||
1. Resolve session by `externalId` (creates it if missing — every chat call is self-healing).
|
||||
2. If session has a `project_id`, load the project and fetch all sibling sessions (via `getProjectSessions`, hardcoded `limit=200`).
|
||||
3. Fetch `recentEpisodeLimit` recent episodes from memory-service.
|
||||
4. Embed the user message; search Qdrant EPISODES with `scoreThreshold`:
|
||||
- No project: `must: [sessionId == this session]`
|
||||
- Project: `should: [sessionId == s1, sessionId == s2, ...]` across all project sessions
|
||||
- Dedup against recent episode IDs before including.
|
||||
5. Run **fused episode retrieval** via `getFusedEpisodes` — Qdrant semantic search and FTS5 keyword search run in parallel, both filtered against `recentIds`, then merged via Reciprocal Rank Fusion (RRF). If `keywordWeight` is `0`, the FTS call is skipped. Returns top `semanticLimit` episodes by fused score.
|
||||
6. Embed and search Qdrant ENTITIES (filtered by `projectId` if in a project). Returns entity IDs alongside payload — the Qdrant point ID equals the SQLite entity ID.
|
||||
7. Expand matched entities into a 1-hop graph neighborhood via `POST /graph/neighbors` on the memory-service. Returns `{ nodes, edges }` — the full entity objects plus connecting relationships. Falls back to flat entity list (no edges) if the graph call fails.
|
||||
8. Build prompt in this fixed order: **system prompt → graph context → fused episodes → recent episodes → user message → "Assistant:"**
|
||||
|
||||
The ordering prioritizes established facts (graph context) and relevant past context (semantic) over pure recency.
|
||||
|
||||
## Graph Context Format
|
||||
|
||||
`formatGraphContext(nodes, edges)` in `src/chat/index.js` formats the neighborhood as:
|
||||
|
||||
```
|
||||
- Alice (person): software engineer working on NexusAI
|
||||
→ works_on NexusAI (project)
|
||||
→ knows Bob (person)
|
||||
- NexusAI (project): AI assistant framework
|
||||
- Bob (person): Alice's colleague
|
||||
```
|
||||
|
||||
Each node shows its notes on the first line. Outbound edges are indented below with `→ label target (type)`. Nodes with only inbound edges (neighbors pulled in by traversal) appear without connection lines.
|
||||
|
||||
## System Prompt Resolution
|
||||
|
||||
Priority from highest to lowest:
|
||||
1. `project.system_prompt` (stored on the project row in memory-service)
|
||||
2. `settings.systemPrompt` (saved in `data/settings.json`)
|
||||
3. `ORCHESTRATION.SYSTEM_PROMPT` (shared constants fallback)
|
||||
|
||||
## Settings (`src/config/settings.js`)
|
||||
|
||||
Settings are loaded from `data/settings.json` merged with defaults at every `GET /settings` call. `PATCH /settings` validates each field individually with specific constraints:
|
||||
|
||||
| Field | Constraint |
|
||||
|---|---|
|
||||
| `recentEpisodeLimit` | integer, 1–20 |
|
||||
| `semanticLimit` | integer, 1–20 |
|
||||
| `scoreThreshold` | number, 0–1 |
|
||||
| `temperature` | number, 0–2 |
|
||||
| `repeatPenalty` | number, 1–2 |
|
||||
| `topP` | number, 0–1 |
|
||||
| `topK` | integer, 1–100 |
|
||||
| `modelsFolderPath` | path must exist and be readable |
|
||||
| `systemPrompt` | string (trimmed); `null` reverts to shared default |
|
||||
|
||||
`data/settings.json` is created on first save. Parent directories are created if missing.
|
||||
|
||||
## Streaming SSE (`src/chat/index.js` — `chatStream`)
|
||||
|
||||
The route sets SSE headers and delegates to `chatStream`, which:
|
||||
1. Calls `inference.completeStream()` → receives a raw HTTP Response with a readable body.
|
||||
2. Reads the body in chunks, buffers across chunk boundaries, splits on `\n\n`.
|
||||
3. For each event line starting with `data: `, parses the JSON and calls `onChunk(data.response)`.
|
||||
4. The `[DONE]` sentinel (used by some llama-server versions) is explicitly ignored.
|
||||
5. After stream ends, saves the assembled full response as an episode (same as non-streaming).
|
||||
|
||||
If a chunk parse fails the error is logged and the stream continues. If the response body closes with no text accumulated, the episode is not saved (logged as warning).
|
||||
|
||||
## Fire-and-Forget Tasks
|
||||
|
||||
After every successful chat turn:
|
||||
- **Summarization** (`services/summarization.js` → `triggerSummary`): checks token threshold → recency guard → calls Ollama → POSTs to memory-service. Only runs if `SUMMARIES.THRESHOLD_TOKENS` is exceeded AND at least `SUMMARIES.MIN_EPISODES_SINCE` new episodes have occurred since the last summary.
|
||||
- **Auto-naming** (`chat/index.js` → `autoNameSession`): only fires on the first message of a session. Uses temp 0.3, `maxTokens=20`, prompts for a ≤5-word title.
|
||||
|
||||
Both tasks catch all errors and log warnings without surfacing to the client.
|
||||
|
||||
## Summarization Recency Guard
|
||||
|
||||
`src/services/summarization.js` reads the `episode_range` field of the latest existing summary (format: `"<startId>-<endId>"`). It counts SQLite episodes with `id > endId`; if fewer than `SUMMARIES.MIN_EPISODES_SINCE`, it skips. This prevents rapid re-summarization on high-traffic sessions.
|
||||
|
||||
When the existing summary's token count exceeds `SUMMARIES.MAX_SUMMARY_TOKENS`, it is treated as "expired" — a fresh summary is generated instead of an incremental update.
|
||||
|
||||
## Qdrant Calls (Direct, Not Via Memory-Service)
|
||||
|
||||
`src/services/qdrant.js` makes REST calls to Qdrant directly at `QDRANT_URL`. This bypasses memory-service for semantic search performance. Orchestration fetches episode/entity content from memory-service by ID *after* getting vector search results from Qdrant.
|
||||
|
||||
`searchEntities` checks `projectId !== null && projectId !== undefined` before applying the filter — a session with no project skips the filter entirely and searches globally.
|
||||
|
||||
## Retrieval Fusion (`src/chat/index.js`)
|
||||
|
||||
Three functions handle fusion — all pure or lightly async, all non-critical:
|
||||
|
||||
- **`getFTSResults(userMessage, { limit, sessionIds })`** — calls `memory.searchEpisodes`; returns `[]` and logs a warning on failure
|
||||
- **`fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit })`** — pure RRF implementation. Key guard: FTS-only episodes are only added to the scores Map if `contrib > 0` (prevents score-0 bleed-through when `keywordWeight: 0`)
|
||||
- **`getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, settings)`** — orchestrates both paths in `Promise.all`, applies `recentIds` filter to FTS results, calls fusion. Short-circuits FTS call entirely if `keywordWeight === 0`
|
||||
|
||||
FTS is scoped to `projectSessionIds` if in a project, otherwise `[session.id]` — mirrors Qdrant scoping exactly.
|
||||
|
||||
> For RRF formula, weight semantics, and enabling keyword search, see `docs/services/retrieval-fusion.md`.
|
||||
|
||||
## Graph Service Client (`src/services/graph.js`)
|
||||
|
||||
Thin HTTP client for memory-service graph endpoints. One function:
|
||||
|
||||
- **`getNeighbors(entityIds[])`** — POSTs to `memory-service/graph/neighbors` with the entity IDs from Qdrant entity search. Returns `{ nodes, edges }`. Throws on non-2xx — caller wraps in try/catch with graceful fallback.
|
||||
|
||||
## Models Endpoint
|
||||
|
||||
`GET /models` scans `modelsFolderPath` for `.gguf` files and optionally reads a `models.json` manifest (keyed by filename) for labels and descriptions. File size is reported in GB. Returns 500 if the folder is inaccessible.
|
||||
|
||||
`GET /models/props` proxies `/props` from llama-server and returns `{contextWindow, modelAlias}`. Returns 503 if llama-server is unreachable.
|
||||
|
||||
## Health Check
|
||||
|
||||
`GET /health/services` runs parallel fetch calls to all four dependent services with a 3-second `AbortSignal.timeout` each. Results are returned as an array — the endpoint never returns a non-2xx itself regardless of downstream status.
|
||||
|
||||
## Background Model (qwen2.5:3b)
|
||||
Used for entity/relationship extraction and summarization via Ollama on Mini PC 1. Uses **ChatML format** (`<|im_start|>` / `<|im_end|>`) — not Phi3 format. Use `format: 'json'` only for structured extraction, never for free-text summarization.
|
||||
|
||||
## API Endpoints Quick Reference
|
||||
|
||||
| Method | Path | Notes |
|
||||
|---|---|---|
|
||||
| GET | `/health` | Returns service URLs |
|
||||
| GET | `/health/services` | Parallel status of all dependencies |
|
||||
| POST | `/chat` | Blocking completion |
|
||||
| POST | `/chat/stream` | SSE streaming |
|
||||
| GET/PATCH | `/settings` | Persistent settings |
|
||||
| GET | `/models` | `.gguf` file scan |
|
||||
| GET | `/models/props` | llama-server model info |
|
||||
| GET | `/sessions` | Delegates to memory-service |
|
||||
| GET | `/sessions/:sessionId/history` | Paginated episodes by external ID |
|
||||
| PATCH | `/sessions/:sessionId` | `name` and/or `projectId` |
|
||||
| DELETE | `/sessions/:sessionId` | |
|
||||
| GET | `/episodes` | Delegates; supports `q` for FTS |
|
||||
| DELETE | `/episodes/:id` | Delegates |
|
||||
| GET/POST/PATCH/DELETE | `/projects` and `/projects/:id` | Delegates |
|
||||
| POST | `/summaries/project/:projectId/generate` | On-demand; 422 if no data |
|
||||
| GET | `/summaries/project/:projectId/overview` | |
|
||||
| GET | `/summaries/session/:sessionId` | Resolves external ID first |
|
||||
| GET | `/summaries/project/:projectId` | |
|
||||
@@ -2,34 +2,32 @@ const memory = require("../services/memory");
|
||||
const inference = require("../services/inference");
|
||||
const embedding = require("../services/embedding");
|
||||
const qdrant = require("../services/qdrant");
|
||||
const { ORCHESTRATION } = require("@nexusai/shared");
|
||||
const { ORCHESTRATION, RETRIEVAL, logger } = require("@nexusai/shared");
|
||||
const appSettings = require("../config/settings");
|
||||
const {triggerSummary} = require('../services/summarization')
|
||||
const graph = require('../services/graph');
|
||||
|
||||
function buildPrompt(recentEpisodes, semanticEpisodes, entities, userMessage, systemPrompt) {
|
||||
function buildPrompt(guaranteed, selected, neighborhood, userMessage, systemPrompt) {
|
||||
const parts = [systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT];
|
||||
|
||||
if (entities.length > 0) {
|
||||
parts.push(
|
||||
"Here is what you know about entities relevant to this conversation:",
|
||||
);
|
||||
for (const e of entities) {
|
||||
parts.push(`- ${e.name} (${e.type}): ${e.notes}`);
|
||||
const graphText = formatGraphContext(neighborhood.nodes ?? [], neighborhood.edges ?? []);
|
||||
if (graphText) {
|
||||
parts.push("Here is what you know about entities relevant to this conversation and their connections:");
|
||||
parts.push(graphText);
|
||||
parts.push("---");
|
||||
}
|
||||
parts.push("---");
|
||||
}
|
||||
|
||||
if (semanticEpisodes.length > 0) {
|
||||
parts.push("Here are some relevant memories from earlier conversations:");
|
||||
for (const ep of semanticEpisodes) {
|
||||
if (selected.length > 0) {
|
||||
parts.push("Relevant memories from earlier conversations:");
|
||||
for (const ep of selected) {
|
||||
parts.push(`User: ${ep.user_message}\nAssistant: ${ep.ai_response}`);
|
||||
}
|
||||
parts.push("---");
|
||||
}
|
||||
|
||||
if (recentEpisodes.length > 0) {
|
||||
parts.push(`Here are some relevant memories from your past conversations:`);
|
||||
for (const ep of recentEpisodes) {
|
||||
if (guaranteed.length > 0) {
|
||||
parts.push("Recent conversation history (most recent exchanges):");
|
||||
for (const ep of guaranteed) {
|
||||
parts.push(`User: ${ep.user_message}\nAssistant: ${ep.ai_response}`);
|
||||
}
|
||||
parts.push("--- End of recent memories ---\n");
|
||||
@@ -54,6 +52,28 @@ function buildNamingPrompt(userMessage, aiResponse) {
|
||||
].join("\n");
|
||||
}
|
||||
|
||||
function formatGraphContext(nodes, edges) {
|
||||
if (!nodes.length) return null;
|
||||
|
||||
const nodeMap = new Map(nodes.map(n => [n.id, n]));
|
||||
|
||||
// Build outbound adjacency
|
||||
const outbound = new Map(nodes.map(n => [n.id, []]));
|
||||
for (const edge of edges) {
|
||||
if (outbound.has(edge.from_id) && nodeMap.has(edge.to_id)) {
|
||||
const target = nodeMap.get(edge.to_id);
|
||||
outbound.get(edge.from_id).push(`${edge.label} ${target.name} (${target.type})`);
|
||||
}
|
||||
}
|
||||
|
||||
return nodes.map(n => {
|
||||
const lines = [`- ${n.name} (${n.type}): ${n.notes ?? '(no notes)'}`];
|
||||
for (const conn of outbound.get(n.id) ?? []) lines.push(` → ${conn}`);
|
||||
return lines.join('\n');
|
||||
}).join('\n');
|
||||
}
|
||||
|
||||
|
||||
async function autoNameSession(externalId, userMessage, aiResponse) {
|
||||
try {
|
||||
const prompt = buildNamingPrompt(userMessage, aiResponse);
|
||||
@@ -64,12 +84,12 @@ async function autoNameSession(externalId, userMessage, aiResponse) {
|
||||
const name = result.text?.trim().replace(/^["']|["']$/g, ""); // strip any quotes the model adds
|
||||
if (name) {
|
||||
await memory.updateSession(externalId, { name });
|
||||
console.log(
|
||||
logger.info(
|
||||
`[orchestration] Auto-named session "${externalId}": "${name}"`,
|
||||
);
|
||||
}
|
||||
} catch (err) {
|
||||
console.warn(
|
||||
logger.warn(
|
||||
"[orchestration] Auto-naming failed (non-critical):",
|
||||
err.message,
|
||||
);
|
||||
@@ -99,7 +119,7 @@ async function getSemanticEpisodes(
|
||||
);
|
||||
return fetched.filter(Boolean);
|
||||
} catch (err) {
|
||||
console.warn(
|
||||
logger.warn(
|
||||
`[orchestration] Semantic search failed, continuing without: `,
|
||||
err.message,
|
||||
);
|
||||
@@ -107,233 +127,287 @@ async function getSemanticEpisodes(
|
||||
}
|
||||
}
|
||||
|
||||
async function getRelevantEntities(userMessage, projectId=null) {
|
||||
try {
|
||||
const vector = await embedding.embed(userMessage);
|
||||
const results = await qdrant.searchEntities(vector, { projectId });
|
||||
console.log(
|
||||
"[orchestration] Entity search results:",
|
||||
results.map((r) => ({ name: r.payload?.name, score: r.score })),
|
||||
);
|
||||
return results.map((r) => r.payload).filter(Boolean);
|
||||
} catch (err) {
|
||||
console.warn(
|
||||
"[orchestration] Entity search failed, continuing without:",
|
||||
err.message,
|
||||
);
|
||||
return [];
|
||||
}
|
||||
}
|
||||
|
||||
async function chat(externalId, userMessage, options = {}) {
|
||||
const { recentEpisodeLimit, semanticLimit, scoreThreshold, temperature, repeatPenalty, topP, topK, systemPrompt} =
|
||||
appSettings.load();
|
||||
// 1. Resolve or create session
|
||||
let session = await memory.getSessionByExternalId(externalId);
|
||||
if (!session) session = await memory.createSession(externalId);
|
||||
|
||||
let projectSessionIds = null;
|
||||
let activeSystemPrompt = systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT;
|
||||
if (session.project_id) {
|
||||
async function getRelevantEntities(userMessage, projectId = null) {
|
||||
try {
|
||||
const project = await memory.getProject(session.project_id);
|
||||
if (project) {
|
||||
const projectSessions = await memory.getProjectSessions(session.project_id);
|
||||
if (project?.system_prompt) activeSystemPrompt = project.system_prompt;
|
||||
projectSessionIds = projectSessions.map((s) => s.id);
|
||||
}
|
||||
const vector = await embedding.embed(userMessage);
|
||||
const results = await qdrant.searchEntities(vector, { projectId });
|
||||
logger.info(
|
||||
'[orchestration] Entity search results:',
|
||||
results.map((r) => ({ name: r.payload?.name, score: r.score })),
|
||||
);
|
||||
// Include the Qdrant point ID (== SQLite entity ID) for graph traversal
|
||||
return results.map((r) => r.payload ? { id: r.id, ...r.payload } : null).filter(Boolean);
|
||||
} catch (err) {
|
||||
console.warn(
|
||||
"[orchestration] Failed to resolve project context:",
|
||||
err.message,
|
||||
);
|
||||
logger.debug('[orchestration] Entity search failed, continuing without:', err.message);
|
||||
return [];
|
||||
}
|
||||
}
|
||||
// 2. Fetch recent episodes for context
|
||||
const recentEpisodes = await memory.getRecentEpisodes(
|
||||
session.id,
|
||||
recentEpisodeLimit,
|
||||
);
|
||||
const isFirstMessage = recentEpisodes.length === 0;
|
||||
const recentIds = new Set(recentEpisodes.map((e) => e.id));
|
||||
|
||||
// 3. Semantic Search
|
||||
const semanticEpisodes = await getSemanticEpisodes(
|
||||
userMessage,
|
||||
session.id,
|
||||
recentIds,
|
||||
projectSessionIds,
|
||||
{ semanticLimit, scoreThreshold },
|
||||
);
|
||||
|
||||
// 3b. Entity Search
|
||||
const entities = await getRelevantEntities(userMessage, session.project_id ?? null);
|
||||
|
||||
// 4. Assemble prompt
|
||||
const prompt = buildPrompt(
|
||||
recentEpisodes,
|
||||
semanticEpisodes,
|
||||
entities,
|
||||
userMessage,
|
||||
activeSystemPrompt,
|
||||
);
|
||||
|
||||
// 5. Run inference
|
||||
const result = await inference.complete(prompt, {...options, temperature, repeatPenalty, topP, topK});
|
||||
|
||||
// 6. Write episode back to memory
|
||||
try {
|
||||
await memory.createEpisode(
|
||||
session.id, userMessage, result.text,
|
||||
(result.evalCount || 0) + (result.promptEvalCount || 0),
|
||||
session.project_id ?? null,
|
||||
);
|
||||
} catch (err) {
|
||||
console.error('[orchestration] Failed to save episode:', err.message);
|
||||
}
|
||||
const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
|
||||
triggerSummary(session, allEpisodes);
|
||||
|
||||
|
||||
// 8. Auto-name on first message
|
||||
if (isFirstMessage && !session.name) {
|
||||
autoNameSession(externalId, userMessage, result.text).catch(() => {}); // already logged inside autoNameSession
|
||||
}
|
||||
|
||||
// 9. Return response
|
||||
return {
|
||||
sessionId: externalId,
|
||||
response: result.text,
|
||||
model: result.model,
|
||||
tokenCount: (result.evalCount || 0) + (result.promptEvalCount || 0),
|
||||
};
|
||||
}
|
||||
|
||||
async function chatStream(externalId, userMessage, onChunk, options = {}) {
|
||||
async function getFTSResults(userMessage, { limit, sessionIds }) {
|
||||
try {
|
||||
return await memory.searchEpisodes(userMessage, { limit, sessionIds });
|
||||
} catch (err) {
|
||||
logger.warn('[orchestration] FTS search failed, continuing without:', err.message);
|
||||
return [];
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
try {
|
||||
const { recentEpisodeLimit, semanticLimit, scoreThreshold, temperature, repeatPenalty, topP, topK, systemPrompt } = appSettings.load();
|
||||
// Returns {episode, score}[] — scores needed for buildScoredPool downstream
|
||||
function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
|
||||
const k = RETRIEVAL.RRF_K;
|
||||
const scores = new Map();
|
||||
|
||||
semanticEps.forEach((ep, i) => {
|
||||
scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
|
||||
});
|
||||
|
||||
keywordEps.forEach((ep, i) => {
|
||||
const contrib = keywordWeight / (k + i + 1);
|
||||
if (scores.has(ep.id)) {
|
||||
scores.get(ep.id).score += contrib;
|
||||
} else if (contrib > 0) {
|
||||
scores.set(ep.id, { episode: ep, score: contrib });
|
||||
}
|
||||
});
|
||||
|
||||
return [...scores.values()]
|
||||
.sort((a, b) => b.score - a.score)
|
||||
.slice(0, limit);
|
||||
|
||||
}
|
||||
|
||||
function estimateTokens(episode) {
|
||||
return episode.token_count
|
||||
?? Math.ceil((episode.user_message.length + episode.ai_response.length) / 4);
|
||||
}
|
||||
|
||||
function buildScoredPool(fusedWithScores, recentEpisodes, entityBoostedIds, { entityWeight }) {
|
||||
const k = RETRIEVAL.RRF_K;
|
||||
const pool = new Map(); // episode.id → {episode, score}
|
||||
|
||||
for (const { episode, score } of fusedWithScores) {
|
||||
pool.set(episode.id, { episode, score });
|
||||
}
|
||||
|
||||
recentEpisodes.forEach((ep, i) => {
|
||||
const recencyScore = 1.0 / (k + i + 1);
|
||||
if (pool.has(ep.id)) {
|
||||
pool.get(ep.id).score += recencyScore;
|
||||
} else {
|
||||
pool.set(ep.id, { episode: ep, score: recencyScore });
|
||||
}
|
||||
});
|
||||
|
||||
for (const id of entityBoostedIds) {
|
||||
if (pool.has(id)) pool.get(id).score += entityWeight;
|
||||
}
|
||||
|
||||
return [...pool.values()].sort((a, b) => b.score - a.score);
|
||||
}
|
||||
|
||||
function selectWithinBudget(scoredPool, contextBudget, minRecentEpisodes, recentEpisodes) {
|
||||
let budget = contextBudget;
|
||||
const sortByTime = (a, b) => a.created_at - b.created_at;
|
||||
|
||||
// Guarantee floor: always include the N most recent episodes
|
||||
const guaranteed = recentEpisodes.slice(0, minRecentEpisodes);
|
||||
const guaranteedIds = new Set(guaranteed.map(ep => ep.id));
|
||||
for (const ep of guaranteed) budget -= estimateTokens(ep);
|
||||
|
||||
// Fill remaining budget from scored pool, highest-priority first
|
||||
const selected = [];
|
||||
for (const { episode } of scoredPool) {
|
||||
if (guaranteedIds.has(episode.id)) continue;
|
||||
const cost = estimateTokens(episode);
|
||||
|
||||
// // Break rather than skip — lower-priority episodes aren't worth fitting over higher-priority ones
|
||||
if (budget - cost < 0) break;
|
||||
selected.push(episode);
|
||||
budget -= cost;
|
||||
}
|
||||
|
||||
return {
|
||||
guaranteed: [...guaranteed].sort(sortByTime),
|
||||
selected: selected.sort(sortByTime),
|
||||
};
|
||||
}
|
||||
|
||||
|
||||
async function getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, settings) {
|
||||
const { semanticLimit, scoreThreshold, semanticWeight, keywordWeight } = settings;
|
||||
const ftsSessionIds = projectSessionIds ?? [session.id];
|
||||
|
||||
const ftsPromise = keywordWeight > 0
|
||||
// FTS and semantic may have significant overlap, so fetching more from FTS gives the fusion step more to work with before deduplication.
|
||||
? getFTSResults(userMessage, { limit: semanticLimit * 2, sessionIds: ftsSessionIds })
|
||||
: Promise.resolve([]);
|
||||
|
||||
const [semanticEps, rawKeywordEps] = await Promise.all([
|
||||
getSemanticEpisodes(userMessage, session.id, recentIds, projectSessionIds, { semanticLimit, scoreThreshold }),
|
||||
ftsPromise,
|
||||
]);
|
||||
|
||||
const keywordEps = rawKeywordEps.filter(ep => !recentIds.has(ep.id));
|
||||
return fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit: semanticLimit });
|
||||
}
|
||||
|
||||
async function assembleContext(externalId, userMessage) {
|
||||
const settings = appSettings.load();
|
||||
const { recentEpisodeLimit, semanticLimit, scoreThreshold,
|
||||
temperature, repeatPenalty, topP, topK, systemPrompt,
|
||||
semanticWeight, keywordWeight,
|
||||
contextBudget, entityWeight, minRecentEpisodes } = settings;
|
||||
|
||||
// 1. Resolve or create session
|
||||
let session = await memory.getSessionByExternalId(externalId);
|
||||
if (!session) session = await memory.createSession(externalId);
|
||||
|
||||
// 2. Resolve project context
|
||||
let projectSessionIds = null;
|
||||
let activeSystemPrompt = systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT;
|
||||
if (session.project_id) {
|
||||
try {
|
||||
const project = await memory.getProject(session.project_id);
|
||||
if (project) {
|
||||
const projectSessions = await memory.getProjectSessions(
|
||||
session.project_id,
|
||||
);
|
||||
projectSessionIds = projectSessions.map((s) => s.id);
|
||||
if (project?.system_prompt) activeSystemPrompt = project.system_prompt;
|
||||
}
|
||||
|
||||
} catch (err) {
|
||||
console.warn(
|
||||
"[orchestration] Failed to resolve project context:",
|
||||
err.message,
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
const recentEpisodes = await memory.getRecentEpisodes(
|
||||
session.id,
|
||||
recentEpisodeLimit,
|
||||
);
|
||||
const isFirstMessage = recentEpisodes.length === 0;
|
||||
const recentIds = new Set(recentEpisodes.map((e) => e.id));
|
||||
const semanticEpisodes = await getSemanticEpisodes(
|
||||
userMessage,
|
||||
session.id,
|
||||
recentIds,
|
||||
projectSessionIds,
|
||||
{semanticLimit, scoreThreshold }
|
||||
);
|
||||
|
||||
const entities = await getRelevantEntities(userMessage, session.project_id ?? null);
|
||||
|
||||
const prompt = buildPrompt(
|
||||
recentEpisodes,
|
||||
semanticEpisodes,
|
||||
entities,
|
||||
userMessage,
|
||||
activeSystemPrompt,
|
||||
);
|
||||
const res = await inference.completeStream(prompt, {...options, temperature, repeatPenalty, topP, topK});
|
||||
|
||||
let fullText = "";
|
||||
let model = "";
|
||||
let tokenCount = 0;
|
||||
let buffer = "";
|
||||
|
||||
for await (const chunk of res.body) {
|
||||
buffer += Buffer.from(chunk).toString("utf8");
|
||||
|
||||
const events = buffer.split("\n\n");
|
||||
buffer = events.pop() || "";
|
||||
|
||||
for (const event of events) {
|
||||
const lines = event.split("\n");
|
||||
const dataLines = lines
|
||||
.filter((line) => line.startsWith("data: "))
|
||||
.map((line) => line.slice(6));
|
||||
|
||||
if (dataLines.length === 0) continue;
|
||||
|
||||
const raw = dataLines.join("\n").trim();
|
||||
if (raw === "[DONE]") continue;
|
||||
|
||||
try {
|
||||
const data = JSON.parse(raw);
|
||||
|
||||
if (data.response) {
|
||||
fullText += data.response;
|
||||
onChunk(data.response);
|
||||
}
|
||||
|
||||
if (data.model) model = data.model;
|
||||
if (data.done && data.tokenCount !== undefined) {
|
||||
tokenCount = data.tokenCount;
|
||||
}
|
||||
|
||||
if (data.error) {
|
||||
throw new Error(data.error);
|
||||
}
|
||||
const project = await memory.getProject(session.project_id);
|
||||
if (project) {
|
||||
const projectSessions = await memory.getProjectSessions(session.project_id);
|
||||
if (project.system_prompt) activeSystemPrompt = project.system_prompt;
|
||||
projectSessionIds = projectSessions.map(s => s.id);
|
||||
}
|
||||
} catch (err) {
|
||||
console.error(
|
||||
"[orchestration] Failed to parse inference SSE event:",
|
||||
raw,
|
||||
err.message,
|
||||
);
|
||||
logger.warn('[orchestration] Failed to resolve project context:', err.message);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if (fullText.trim()) {
|
||||
await memory.createEpisode(session.id, userMessage, fullText, tokenCount, session.project_id ?? null);
|
||||
const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
|
||||
triggerSummary(session, allEpisodes);
|
||||
} else {
|
||||
console.warn(
|
||||
"[orchestration] Stream finished with no assistant text; episode not saved",
|
||||
);
|
||||
// 3. Fetch recent episodes
|
||||
const recentEpisodes = await memory.getRecentEpisodes(session.id, recentEpisodeLimit);
|
||||
const isFirstMessage = recentEpisodes.length === 0;
|
||||
const recentIds = new Set(recentEpisodes.map(e => e.id));
|
||||
|
||||
// 4. Fused retrieval + entity search in parallel (both are independent)
|
||||
const [fusedWithScores, entityResults] = await Promise.all([
|
||||
getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, { semanticLimit, scoreThreshold, semanticWeight, keywordWeight }),
|
||||
getRelevantEntities(userMessage, session.project_id ?? null),
|
||||
]);
|
||||
|
||||
// 5. Entity-linked episode IDs for scoring bonus
|
||||
const entityIds = entityResults.map(e => e.id);
|
||||
let entityBoostedIds = new Set();
|
||||
if (entityIds.length > 0) {
|
||||
try {
|
||||
const result = await memory.getEpisodesByEntities(entityIds);
|
||||
entityBoostedIds = new Set(result.episodeIds);
|
||||
} catch (err) {
|
||||
logger.debug('[orchestration] Entity-episode lookup failed, skipping bonus:', err.message);
|
||||
}
|
||||
}
|
||||
|
||||
// 6. Build unified scored pool and select within token budget
|
||||
const scoredPool = buildScoredPool(fusedWithScores, recentEpisodes, entityBoostedIds, { entityWeight });
|
||||
const { guaranteed, selected } = selectWithinBudget(scoredPool, contextBudget, minRecentEpisodes, recentEpisodes);
|
||||
|
||||
// 7. Graph neighborhood expansion
|
||||
let neighborhood = { nodes: [], edges: [] };
|
||||
if (entityIds.length > 0) {
|
||||
try {
|
||||
neighborhood = await graph.getNeighbors(entityIds);
|
||||
} catch (err) {
|
||||
logger.warn('[orchestration] Graph neighborhood fetch failed, falling back to flat entities:', err.message);
|
||||
neighborhood = { nodes: entityResults, edges: [] };
|
||||
}
|
||||
}
|
||||
|
||||
// 8. Assemble prompt
|
||||
const prompt = buildPrompt(guaranteed, selected, neighborhood, userMessage, activeSystemPrompt);
|
||||
|
||||
return {
|
||||
session,
|
||||
prompt,
|
||||
isFirstMessage,
|
||||
inferenceOptions: { temperature, repeatPenalty, topP, topK },
|
||||
};
|
||||
}
|
||||
|
||||
async function chat(externalId, userMessage, options = {}) {
|
||||
const { session, prompt, isFirstMessage, inferenceOptions } = await assembleContext(externalId, userMessage);
|
||||
|
||||
const result = await inference.complete(prompt, { ...options, ...inferenceOptions });
|
||||
|
||||
try {
|
||||
await memory.createEpisode(
|
||||
session.id, userMessage, result.text,
|
||||
(result.evalCount || 0) + (result.promptEvalCount || 0),
|
||||
session.project_id ?? null,
|
||||
);
|
||||
} catch (err) {
|
||||
logger.error('[orchestration] Failed to save episode:', err.message);
|
||||
}
|
||||
|
||||
const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
|
||||
triggerSummary(session, allEpisodes);
|
||||
|
||||
if (isFirstMessage && !session.name) {
|
||||
autoNameSession(externalId, userMessage, fullText).catch(() => {});
|
||||
autoNameSession(externalId, userMessage, result.text).catch(() => {});
|
||||
}
|
||||
|
||||
return { model, tokenCount };
|
||||
} catch (err) {
|
||||
console.error(
|
||||
"[orchestration] chatStream fatal error:",
|
||||
err.message,
|
||||
err.stack,
|
||||
);
|
||||
throw err;
|
||||
}
|
||||
return {
|
||||
sessionId: externalId,
|
||||
response: result.text,
|
||||
model: result.model,
|
||||
tokenCount: (result.evalCount || 0) + (result.promptEvalCount || 0),
|
||||
};
|
||||
}
|
||||
|
||||
async function chatStream(externalId, userMessage, onChunk, options = {}) {
|
||||
try {
|
||||
const { session, prompt, isFirstMessage, inferenceOptions } = await assembleContext(externalId, userMessage);
|
||||
|
||||
const res = await inference.completeStream(prompt, { ...options, ...inferenceOptions });
|
||||
|
||||
let fullText = '', model = '', tokenCount = 0, buffer = '';
|
||||
|
||||
for await (const chunk of res.body) {
|
||||
buffer += Buffer.from(chunk).toString('utf8');
|
||||
const events = buffer.split('\n\n');
|
||||
buffer = events.pop() || '';
|
||||
|
||||
for (const event of events) {
|
||||
const dataLines = event.split('\n')
|
||||
.filter(line => line.startsWith('data: '))
|
||||
.map(line => line.slice(6));
|
||||
|
||||
if (!dataLines.length) continue;
|
||||
const raw = dataLines.join('\n').trim();
|
||||
if (raw === '[DONE]') continue;
|
||||
|
||||
try {
|
||||
const data = JSON.parse(raw);
|
||||
if (data.response) { fullText += data.response; onChunk(data.response); }
|
||||
if (data.model) model = data.model;
|
||||
if (data.done && data.tokenCount !== undefined) tokenCount = data.tokenCount;
|
||||
if (data.error) throw new Error(data.error);
|
||||
} catch (err) {
|
||||
logger.error('[orchestration] Failed to parse SSE event:', raw, err.message);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if (fullText.trim()) {
|
||||
await memory.createEpisode(session.id, userMessage, fullText, tokenCount, session.project_id ?? null);
|
||||
const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
|
||||
triggerSummary(session, allEpisodes);
|
||||
} else {
|
||||
logger.warn('[orchestration] Stream finished with no assistant text; episode not saved');
|
||||
}
|
||||
|
||||
if (isFirstMessage && !session.name) {
|
||||
autoNameSession(externalId, userMessage, fullText).catch(() => {});
|
||||
}
|
||||
|
||||
return { model, tokenCount };
|
||||
} catch (err) {
|
||||
logger.error('[orchestration] chatStream fatal error:', err.message, err.stack);
|
||||
throw err;
|
||||
}
|
||||
}
|
||||
|
||||
module.exports = { chat, chatStream };
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
const fs = require('fs');
|
||||
const path = require('path');
|
||||
const { getEnv, ORCHESTRATION, INFERENCE_DEFAULTS } = require('@nexusai/shared');
|
||||
const { getEnv, ORCHESTRATION, INFERENCE_DEFAULTS, RETRIEVAL } = require('@nexusai/shared');
|
||||
|
||||
const SETTINGS_PATH = path.join(__dirname, '../../data/settings.json');
|
||||
|
||||
@@ -14,6 +14,11 @@ const DEFAULTS = {
|
||||
topP: INFERENCE_DEFAULTS.TOP_P,
|
||||
topK: INFERENCE_DEFAULTS.TOP_K,
|
||||
systemPrompt: ORCHESTRATION.SYSTEM_PROMPT,
|
||||
semanticWeight: RETRIEVAL.SEMANTIC_WEIGHT,
|
||||
keywordWeight: RETRIEVAL.KEYWORD_WEIGHT,
|
||||
contextBudget: ORCHESTRATION.CONTEXT_BUDGET,
|
||||
entityWeight: ORCHESTRATION.ENTITY_WEIGHT,
|
||||
minRecentEpisodes: ORCHESTRATION.MIN_RECENT_EPISODES,
|
||||
};
|
||||
|
||||
function load() {
|
||||
|
||||
@@ -15,7 +15,7 @@ const summariesRouter = require('./routes/summaries')
|
||||
const cors = require('cors');
|
||||
|
||||
const app = express();
|
||||
app.use(express.json());
|
||||
app.use(express.json({ limit: '2mb' }));
|
||||
|
||||
app.use(cors({
|
||||
origin: [
|
||||
|
||||
@@ -1,6 +1,8 @@
|
||||
const { Router } = require('express')
|
||||
const { chat, chatStream } = require('../chat/index');
|
||||
const memory = require('../services/memory')
|
||||
const logger = require('@nexusai/shared');
|
||||
|
||||
|
||||
const router = Router();
|
||||
|
||||
@@ -17,8 +19,8 @@ router.post('/', async (req, res) => {
|
||||
});
|
||||
res.json(result)
|
||||
} catch (err) {
|
||||
console.error(`[orchestration] chat error: `, err.message)
|
||||
res.status(500).json ({ error: err.message})
|
||||
logger.error(`[orchestration] chat error: `, err.message)
|
||||
res.status(500).json ({ error: 'Chat failed', detail: err.message })
|
||||
}
|
||||
});
|
||||
|
||||
|
||||
@@ -9,7 +9,7 @@ router.get('/', async (req, res) => {
|
||||
const result = await memory.getEpisodes({ limit, offset, sessionId, q });
|
||||
res.json(result);
|
||||
} catch (err) {
|
||||
res.status(500).json({ error: err.message });
|
||||
res.status(500).json({ error: 'Failed to fetch episodes', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
@@ -18,7 +18,7 @@ router.delete('/:id', async (req, res) => {
|
||||
await memory.deleteEpisode(req.params.id);
|
||||
res.status(204).send();
|
||||
} catch (err) {
|
||||
res.status(500).json({ error: err.message });
|
||||
res.status(500).json({ error: 'Failed to delete episode', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
|
||||
@@ -4,7 +4,7 @@ const fs = require('fs');
|
||||
const path = require('path');
|
||||
const appSettings = require('../config/settings');
|
||||
|
||||
const { getEnv, LLAMACPP } = require('@nexusai/shared');
|
||||
const { getEnv, LLAMACPP, logger } = require('@nexusai/shared');
|
||||
const LLAMA_URL = getEnv('LLAMA_SERVER_URL', LLAMACPP.DEFAULT_URL);
|
||||
|
||||
router.get('/', (req, res) => {
|
||||
@@ -38,7 +38,7 @@ router.get('/', (req, res) => {
|
||||
|
||||
res.json(models);
|
||||
} catch (err) {
|
||||
console.error('[models] Failed to scan folder:', err.message);
|
||||
logger.error('[models] Failed to scan folder:', err.message);
|
||||
res.status(500).json({ error: `Could not read models folder: ${modelsFolderPath}` });
|
||||
}
|
||||
});
|
||||
@@ -53,7 +53,7 @@ router.get('/props', async (req, res) => {
|
||||
modelAlias: data.model_alias,
|
||||
});
|
||||
} catch (err) {
|
||||
console.error('[models/props]', err.message);
|
||||
logger.error('[models/props]', err.message);
|
||||
res.status(503).json({ error: 'Could not reach llama-server' });
|
||||
}
|
||||
});
|
||||
|
||||
@@ -7,7 +7,7 @@ router.get('/', async (req, res) => {
|
||||
try {
|
||||
res.json(await memory.getProjects());
|
||||
} catch (err) {
|
||||
res.status(500).json({ error: err.message });
|
||||
res.status(500).json({ error: 'Failed to fetch projects', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
@@ -17,7 +17,7 @@ router.post('/', async (req, res) => {
|
||||
try {
|
||||
res.status(201).json(await memory.createProject({ name: name.trim(), description, colour, icon, isolated }));
|
||||
} catch (err) {
|
||||
res.status(500).json({ error: err.message });
|
||||
res.status(500).json({ error: 'Failed to create project', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
@@ -25,7 +25,7 @@ router.patch('/:id', async (req, res) => {
|
||||
try {
|
||||
res.json(await memory.updateProject(req.params.id, req.body));
|
||||
} catch (err) {
|
||||
res.status(500).json({ error: err.message });
|
||||
res.status(500).json({ error: 'Failed to update project', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
@@ -34,7 +34,7 @@ router.delete('/:id', async (req, res) => {
|
||||
await memory.deleteProject(req.params.id);
|
||||
res.status(204).send();
|
||||
} catch (err) {
|
||||
res.status(500).json({ error: err.message });
|
||||
res.status(500).json({ error: 'Failed to delete project', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
|
||||
@@ -15,7 +15,7 @@ router.get('/:sessionId/history', async (req, res) => {
|
||||
const history = await memory.getSessionHistory(session.id, Number(limit), Number(offset));
|
||||
res.json({ sessionId, episodes: history });
|
||||
} catch (err) {
|
||||
res.status(500).json({ error: err.message });
|
||||
res.status(500).json({ error: 'Failed to fetch session history', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
@@ -26,7 +26,7 @@ router.get('/', async (req, res) => {
|
||||
const sessions = await memory.getSessions(Number(limit), Number(offset), parsedProjectId);
|
||||
res.json(sessions);
|
||||
} catch (err) {
|
||||
res.status(500).json({ error: err.message });
|
||||
res.status(500).json({ error: 'Failed to fetch sessions', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
@@ -45,7 +45,7 @@ router.patch('/:sessionId', async (req, res) => {
|
||||
});
|
||||
res.json(session);
|
||||
} catch (err) {
|
||||
res.status(500).json({ error: err.message });
|
||||
res.status(500).json({ error: 'Failed to update session', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
@@ -54,7 +54,7 @@ router.delete('/:sessionId', async (req, res) => {
|
||||
await memory.deleteSession(req.params.sessionId);
|
||||
res.status(204).send();
|
||||
} catch (err) {
|
||||
res.status(500).json({ error: err.message });
|
||||
res.status(500).json({ error: 'Failed to delete session', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
|
||||
@@ -80,6 +80,41 @@ if (req.body.systemPrompt !== undefined) {
|
||||
updates.systemPrompt = val.trim() || null; // null reverts to default
|
||||
}
|
||||
|
||||
if (req.body.semanticWeight !== undefined) {
|
||||
const val = Number(req.body.semanticWeight);
|
||||
if (isNaN(val) || val < 0 || val > 5)
|
||||
return res.status(400).json({ error: 'semanticWeight must be 0–5' });
|
||||
updates.semanticWeight = val;
|
||||
}
|
||||
|
||||
if (req.body.keywordWeight !== undefined) {
|
||||
const val = Number(req.body.keywordWeight);
|
||||
if (isNaN(val) || val < 0 || val > 5)
|
||||
return res.status(400).json({ error: 'keywordWeight must be 0–5' });
|
||||
updates.keywordWeight = val;
|
||||
}
|
||||
|
||||
if (req.body.contextBudget !== undefined) {
|
||||
const val = Number(req.body.contextBudget);
|
||||
if (!Number.isInteger(val) || val < 512 || val > 32768)
|
||||
return res.status(400).json({ error: 'contextBudget must be 512–32768' });
|
||||
updates.contextBudget = val;
|
||||
}
|
||||
|
||||
if (req.body.entityWeight !== undefined) {
|
||||
const val = Number(req.body.entityWeight);
|
||||
if (isNaN(val) || val < 0 || val > 2)
|
||||
return res.status(400).json({ error: 'entityWeight must be 0–2' });
|
||||
updates.entityWeight = val;
|
||||
}
|
||||
|
||||
if (req.body.minRecentEpisodes !== undefined) {
|
||||
const val = Number(req.body.minRecentEpisodes);
|
||||
if (!Number.isInteger(val) || val < 0 || val > 10)
|
||||
return res.status(400).json({ error: 'minRecentEpisodes must be 0–10' });
|
||||
updates.minRecentEpisodes = val;
|
||||
}
|
||||
|
||||
res.json(settings.save(updates));
|
||||
});
|
||||
|
||||
|
||||
@@ -21,7 +21,7 @@ router.get('/project/:projectId/overview', async (req, res) => {
|
||||
const summary = await memory.getProjectOverviewSummary(req.params.projectId);
|
||||
res.json(summary);
|
||||
} catch (err) {
|
||||
res.status(500).json({ error: err.message });
|
||||
res.status(500).json({ error: 'Failed to fetch project overview summary', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
@@ -32,7 +32,7 @@ router.get('/session/:sessionId', async (req, res) => {
|
||||
const summaries = await memory.getSummariesBySession(session.id);
|
||||
res.json(summaries);
|
||||
} catch (err) {
|
||||
res.status(500).json({ error: err.message });
|
||||
res.status(500).json({ error: 'Failed to fetch session summaries', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
@@ -41,7 +41,7 @@ router.get('/project/:projectId', async (req, res) => {
|
||||
const summaries = await memory.getSummariesByProject(req.params.projectId);
|
||||
res.json(summaries);
|
||||
} catch (err) {
|
||||
res.status(500).json({ error: err.message });
|
||||
res.status(500).json({ error: 'Failed to fetch project summaries', detail: err.message });
|
||||
}
|
||||
});
|
||||
|
||||
|
||||
15
packages/orchestration-service/src/services/graph.js
Normal file
15
packages/orchestration-service/src/services/graph.js
Normal file
@@ -0,0 +1,15 @@
|
||||
const { getEnv, SERVICES } = require('@nexusai/shared');
|
||||
|
||||
const MEMORY_URL = getEnv('MEMORY_SERVICE_URL', SERVICES.MEMORY_URL);
|
||||
|
||||
async function getNeighbors(entityIds) {
|
||||
const res = await fetch(`${MEMORY_URL}/graph/neighbors`, {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({ entityIds }),
|
||||
});
|
||||
if (!res.ok) throw new Error(`Graph neighbors error: ${res.status}`);
|
||||
return res.json();
|
||||
}
|
||||
|
||||
module.exports = { getNeighbors };
|
||||
@@ -196,6 +196,26 @@ async function getProjectOverviewSummary(projectId) {
|
||||
return res.json(); // null if none exists yet
|
||||
}
|
||||
|
||||
async function searchEpisodes(query, { limit = 10, sessionIds = null } = {}) {
|
||||
const url = new URL(`${BASE_URL}/episodes/search`);
|
||||
url.searchParams.set('q', query);
|
||||
url.searchParams.set('limit', limit);
|
||||
if (sessionIds?.length) url.searchParams.set('sessionIds', sessionIds.join(','));
|
||||
const res = await fetch(url.toString());
|
||||
if (!res.ok) throw new Error(`FTS search error: ${res.status}`);
|
||||
return res.json();
|
||||
}
|
||||
|
||||
async function getEpisodesByEntities(entityIds) {
|
||||
const res = await fetch(`${BASE_URL}/episodes/by-entities`, {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({ entityIds }),
|
||||
});
|
||||
if (!res.ok) throw new Error(`Episodes-by-entities error: ${res.status}`);
|
||||
return res.json(); // { episodeIds: [...] }
|
||||
}
|
||||
|
||||
module.exports = {
|
||||
getSessionByExternalId,
|
||||
createSession,
|
||||
@@ -220,4 +240,6 @@ module.exports = {
|
||||
getSummariesByProject,
|
||||
generateProjectSummary,
|
||||
getProjectOverviewSummary,
|
||||
searchEpisodes,
|
||||
getEpisodesByEntities,
|
||||
}
|
||||
@@ -1,4 +1,4 @@
|
||||
const { getEnv, SERVICES, SUMMARIES } = require('@nexusai/shared');
|
||||
const { getEnv, SERVICES, SUMMARIES, logger } = require('@nexusai/shared');
|
||||
|
||||
const EXTRACTION_URL = getEnv('EXTRACTION_URL', 'http://localhost:11434');
|
||||
const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b');
|
||||
@@ -104,7 +104,7 @@ async function maybeSummarize(session, allEpisodes) {
|
||||
const totalEpisodeTokens = allEpisodes.reduce((sum, ep) => sum + (ep.token_count || 0), 0);
|
||||
|
||||
// add temporarily before the generateSummary call
|
||||
console.log('[summarization] episodes to summarize:', episodesToSummarize.length);
|
||||
logger.debug('[summarization] episodes to summarize:', episodesToSummarize.length);
|
||||
|
||||
const content = await generateSummary(
|
||||
episodesToSummarize,
|
||||
@@ -126,7 +126,7 @@ async function maybeSummarize(session, allEpisodes) {
|
||||
episodeRange,
|
||||
}),
|
||||
});
|
||||
console.log(`[summarization] Created new summary for session ${session.id}`);
|
||||
logger.debug(`[summarization] Created new summary for session ${session.id}`);
|
||||
} else {
|
||||
await fetch(`${MEMORY_URL}/summaries/${latest.id}`, {
|
||||
method: 'PATCH',
|
||||
@@ -137,14 +137,14 @@ async function maybeSummarize(session, allEpisodes) {
|
||||
episodeRange,
|
||||
}),
|
||||
});
|
||||
console.log(`[summarization] Updated summary ${latest.id} for session ${session.id}`);
|
||||
logger.debug(`[summarization] Updated summary ${latest.id} for session ${session.id}`);
|
||||
}
|
||||
}
|
||||
|
||||
async function triggerSummary(session, allEpisodes) {
|
||||
// Intentionally fire-and-forget — caller doesn't await this
|
||||
maybeSummarize(session, allEpisodes).catch(err =>
|
||||
console.warn('[summarization] Summary failed (non-critical):', err.message)
|
||||
logger.warn('[summarization] Summary failed (non-critical):', err.message)
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
@@ -28,6 +28,9 @@ const ORCHESTRATION = {
|
||||
ENTITIES_LIMIT: 5,
|
||||
ENTITIES_THRESHOLD: 0.55,
|
||||
TEMPERATURE: 0.7,
|
||||
CONTEXT_BUDGET: 4096,
|
||||
ENTITY_WEIGHT: 0.5,
|
||||
MIN_RECENT_EPISODES: 2,
|
||||
CORS_ORIGIN: 'http://localhost:5173',
|
||||
SYSTEM_PROMPT: `You are a helpful, context-aware AI assistant. You have access to memories of past conversations with the user. Use them to provide consistent, personalised responses.`
|
||||
}
|
||||
@@ -79,8 +82,27 @@ const SUMMARIES = {
|
||||
|
||||
const ENTITIES = {
|
||||
TEMPERATURE: 0.1, // Low temperature, more precise extraction, less creative
|
||||
NUM_PREDICT: 1024, // Max tokens to consider for entity extraction (e.g. recent conversation)
|
||||
NUM_PREDICT: 1500, // Max tokens to consider for entity extraction (e.g. recent conversation)
|
||||
THRESHOLD: 0.55, // Minimum confidence score for an extracted entity to be included in the results
|
||||
PROMOTION_THRESHOLD: 3, // mention_count threshold before entity is considered well-established
|
||||
GRAPH_HOP_DEPTH: 1, // Default traversal depth for neighborhood queries
|
||||
TYPES: [
|
||||
'person',
|
||||
'place',
|
||||
'project',
|
||||
'technology',
|
||||
'concept',
|
||||
'organization',
|
||||
'character',
|
||||
'event',
|
||||
'topic'
|
||||
],
|
||||
}
|
||||
|
||||
const RETRIEVAL = {
|
||||
RRF_K: 60, // Reciprocal Rank Fusion smoothing constant, softens rank-1 advantage, not exposed in settings
|
||||
SEMANTIC_WEIGHT: 1.0, // Weight applied to semantic (QDrant) results
|
||||
KEYWORD_WEIGHT: 0, // Weight applied to keyword (SQLite) results, 0 = disables, set >0 to enable and tune balance between semantic vs keyword matches
|
||||
}
|
||||
|
||||
module.exports = {
|
||||
@@ -95,5 +117,6 @@ module.exports = {
|
||||
SQLITE,
|
||||
ORCHESTRATION,
|
||||
SUMMARIES,
|
||||
ENTITIES
|
||||
ENTITIES,
|
||||
RETRIEVAL,
|
||||
};
|
||||
@@ -1,5 +1,5 @@
|
||||
const {getEnv} = require('./config/env');
|
||||
const {QDRANT, COLLECTIONS, EPISODIC, SERVICES, OLLAMA, PORTS, LLAMACPP, INFERENCE_DEFAULTS, SQLITE, ORCHESTRATION, SUMMARIES, ENTITIES } = require('./config/constants');
|
||||
const {QDRANT, COLLECTIONS, EPISODIC, SERVICES, OLLAMA, PORTS, LLAMACPP, INFERENCE_DEFAULTS, SQLITE, ORCHESTRATION, SUMMARIES, ENTITIES, RETRIEVAL } = require('./config/constants');
|
||||
const {parseRow, formatEpisodeText} = require('./utils')
|
||||
const logger = require('./utils/logger');
|
||||
|
||||
@@ -20,4 +20,5 @@ module.exports = {
|
||||
SUMMARIES,
|
||||
ENTITIES,
|
||||
logger,
|
||||
RETRIEVAL,
|
||||
};
|
||||
67
test-fusion.js
Normal file
67
test-fusion.js
Normal file
@@ -0,0 +1,67 @@
|
||||
// test-fusion.js
|
||||
const { RETRIEVAL } = require('./packages/shared/src/config/constants');
|
||||
|
||||
function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
|
||||
const k = RETRIEVAL.RRF_K;
|
||||
const scores = new Map();
|
||||
semanticEps.forEach((ep, i) => {
|
||||
scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
|
||||
});
|
||||
keywordEps.forEach((ep, i) => {
|
||||
const contrib = keywordWeight / (k + i + 1);
|
||||
if (scores.has(ep.id)) {
|
||||
scores.get(ep.id).score += contrib;
|
||||
} else if (contrib > 0) {
|
||||
scores.set(ep.id, { episode: ep, score: contrib });
|
||||
}
|
||||
});
|
||||
return [...scores.values()]
|
||||
.sort((a, b) => b.score - a.score)
|
||||
.slice(0, limit)
|
||||
.map(({ episode }) => episode);
|
||||
}
|
||||
|
||||
// --- Test 1: episodes in both lists rank highest ---
|
||||
const semantic = [
|
||||
{ id: 1, user_message: 'ep1 — semantic only, rank 1' },
|
||||
{ id: 2, user_message: 'ep2 — in both lists, rank 2 semantic' },
|
||||
{ id: 3, user_message: 'ep3 — in both lists, rank 3 semantic' },
|
||||
];
|
||||
const keyword = [
|
||||
{ id: 3, user_message: 'ep3 — rank 1 FTS' },
|
||||
{ id: 2, user_message: 'ep2 — rank 2 FTS' },
|
||||
{ id: 4, user_message: 'ep4 — FTS only, rank 3' },
|
||||
];
|
||||
|
||||
const result = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 1, limit: 5 });
|
||||
console.log('Test 1 — equal weights, episodes in both lists should rank highest:');
|
||||
result.forEach((ep, i) => console.log(` ${i + 1}. id=${ep.id} "${ep.user_message}"`));
|
||||
console.assert(result[0].id === 2 || result[0].id === 3, 'FAIL: ep2 or ep3 should be rank 1');
|
||||
console.assert(!result.find(e => e.id === 1) || result.indexOf(result.find(e => e.id === 1)) > result.indexOf(result.find(e => e.id === 2)), 'FAIL: ep1 (semantic only) should rank below ep2');
|
||||
console.log(' PASS\n');
|
||||
|
||||
// --- Test 2: keywordWeight:0 → pure semantic passthrough ---
|
||||
const result2 = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 0, limit: 5 });
|
||||
console.log('Test 2 — keywordWeight:0 should return only semantic results in original order:');
|
||||
result2.forEach((ep, i) => console.log(` ${i + 1}. id=${ep.id}`));
|
||||
console.assert(result2.length === 3, `FAIL: expected 3, got ${result2.length}`);
|
||||
console.assert(result2[0].id === 1, 'FAIL: ep1 should be rank 1');
|
||||
console.assert(result2[1].id === 2, 'FAIL: ep2 should be rank 2');
|
||||
console.log(' PASS\n');
|
||||
|
||||
// --- Test 3: limit is respected ---
|
||||
const result3 = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 1, limit: 2 });
|
||||
console.log('Test 3 — limit:2 should return exactly 2 results:');
|
||||
console.assert(result3.length === 2, `FAIL: expected 2, got ${result3.length}`);
|
||||
console.log(' PASS\n');
|
||||
|
||||
// --- Test 4: no overlap → all unique episodes, ordered by individual contribution ---
|
||||
const semOnly = [{ id: 10, user_message: 'sem' }];
|
||||
const ftsOnly = [{ id: 20, user_message: 'fts' }];
|
||||
const result4 = fuseEpisodeResults(semOnly, ftsOnly, { semanticWeight: 1, keywordWeight: 1, limit: 5 });
|
||||
console.log('Test 4 — no overlap, both should appear:');
|
||||
console.assert(result4.length === 2, `FAIL: expected 2, got ${result4.length}`);
|
||||
console.assert(result4[0].id === 10, 'FAIL: semantic rank-1 should beat fts rank-1 (same weight, both rank 1, but semantic inserted first — tie goes to semantic)');
|
||||
console.log(' PASS\n');
|
||||
|
||||
console.log('All tests passed.');
|
||||
Reference in New Issue
Block a user