Compare commits

...

20 Commits

Author SHA1 Message Date
Storme-bit
e4908193bd smarter context assembly implementation 2026-04-27 21:41:32 -07:00
Storme-bit
b58a4e4692 minor clean up 2026-04-27 20:17:05 -07:00
Storme-bit
055683424d retrieval fusion 2026-04-27 07:03:46 -07:00
Storme-bit
27ad614130 retrieval fusion 2026-04-27 05:56:23 -07:00
Storme-bit
8ade5c68ca retrieval fusion 2026-04-27 05:46:01 -07:00
Storme-bit
49982a85de retrieval fusion 2026-04-27 05:21:43 -07:00
Storme-bit
9c6c5c9a42 entity extraction prompt 2026-04-27 03:50:13 -07:00
Storme-bit
c9cbac87ac knowledge graph entity fixes 2026-04-27 03:41:56 -07:00
Storme-bit
1a97b19280 roadmap phase 1 complete 2026-04-27 03:10:39 -07:00
Storme-bit
9fe8e568cf roadmap phase 1 complete 2026-04-27 00:28:42 -07:00
Storme-bit
5ad01c6ad8 clean up 2026-04-27 00:14:51 -07:00
Storme-bit
aac0923351 Merge branch 'main' of http://192.168.0.205:3100/storme/nexusai 2026-04-27 00:10:16 -07:00
Storme-bit
54218894c0 logger clean up 2026-04-27 00:09:16 -07:00
Storme-bit
66a95f4479 logger clean up 2026-04-27 00:07:51 -07:00
78476e166f Delete .claude/settings.local.json 2026-04-27 06:57:49 +00:00
Storme-bit
696ead29f8 chat/index.js cleanup 2026-04-26 23:04:31 -07:00
Storme-bit
45db47a584 error response consistency, human readible1 2026-04-26 23:00:55 -07:00
Storme-bit
095c9a623e error response consistency, human readible1 2026-04-26 23:00:18 -07:00
Storme-bit
f5011fddca logger updates 2026-04-26 22:29:57 -07:00
Storme-bit
86e78cc4c6 logger updates 2026-04-26 22:28:54 -07:00
51 changed files with 1933 additions and 563 deletions

1
.gitignore vendored
View File

@@ -5,4 +5,5 @@ data/
.env
.env.*
*.db
.claude/settings.local.json
EOF

View File

@@ -1,2 +0,0 @@
{
}

108
CLAUDE.md Normal file
View File

@@ -0,0 +1,108 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Development Commands
```bash
# Start individual services
npm run memory # Memory Service (port 3002)
npm run embedding # Embedding Service (port 3003)
npm run inference # Inference Service (port 3001)
npm run orchestration # Orchestration Service (port 4000)
npm run mini1 # Start memory + embedding concurrently
# Per-service dev mode (with --watch)
npm -w packages/<service-name> run dev
# Chat client
npm -w packages/chat-client run dev # Vite dev server (port 5173)
npm -w packages/chat-client run build # Production build
```
No test framework or linter is configured.
## Architecture Overview
NexusAI is a **modular AI assistant** with persistent, project-scoped memory. It's a Node.js monorepo (`npm workspaces`) with 4 independent backend services, 1 React frontend, and 1 shared package.
### Services
| Package | Port | Role |
|---|---|---|
| `orchestration-service` | 4000 | Central gateway; coordinates all others |
| `memory-service` | 3002 | SQLite + Qdrant hybrid memory |
| `embedding-service` | 3003 | Text embeddings via Ollama (`nomic-embed-text`, 768-dim) |
| `inference-service` | 3001 | LLM inference (Ollama or llama.cpp) |
| `chat-client` | 5173 | React/Vite frontend |
| `shared` | — | Constants, env helpers, logger, formatters |
All inter-service communication is **REST HTTP only** — no message queues or WebSockets.
### Chat Request Flow
1. Client POSTs to orchestration `/chat/stream`
2. Orchestration resolves session, fetches **recent episodes** (SQLite) + **semantic episodes** (Qdrant vector search) + **entities** (Qdrant, scoped by project)
3. Embedding computed for user message (embedding-service)
4. Prompt assembled: system message → entities → semantic memories → recent episodes → user message
5. Inference streams response (inference-service)
6. Episode stored in SQLite + Qdrant (fire-and-forget embedding)
7. Entity extraction triggered async (qwen2.5:3b via inference-service)
8. Auto-summarization checked (threshold: 200+ tokens, re-triggers every 5 episodes)
9. Auto-naming on first message (temp 0.3, 20 tokens max)
### Memory Model
**Dual store — neither works alone:**
- **SQLite** (`better-sqlite3`, synchronous) — Full content: sessions, episodes, entities, relationships, projects, summaries, FTS5 index
- **Qdrant** — Vector embeddings for semantic search; IDs used to fetch full content from SQLite afterward
Orchestration queries Qdrant directly (bypasses memory-service) for performance, then fetches full episode content from memory-service by ID.
**Project-scoped isolation:** Sessions grouped into projects; Qdrant queries use `should` filter on session IDs to enforce memory boundaries. Non-project sessions share a common pool.
### Key File Locations
**Orchestration** (`packages/orchestration-service/src/`):
- `chat/index.js` — Core prompt building and memory assembly
- `routes/` — HTTP endpoints: chat, sessions, projects, episodes, models, settings, summaries
- `services/` — Thin HTTP clients for memory, embedding, inference, and direct Qdrant access
- `config/settings.js` — Loads/saves `data/settings.json` (user-tunable: model params, thresholds, system prompt)
**Memory** (`packages/memory-service/src/`):
- `db/schema.js` — SQLite table definitions (source of truth for data model)
- `episodic/` — Episode CRUD
- `semantic/` — Qdrant operations
- `entities/` — Entity extraction + CRUD
- `summarization/` — Project summary generation
**Shared** (`packages/shared/src/`):
- `config/constants.js` — All tunables (ports, thresholds, model names, vector size)
- `config/env.js``getEnv()` helper with fallback to constants
- `utils.js``parseRow()`, `formatEpisodeText()`, `logger`
**Frontend** (`packages/chat-client/src/`):
- `App.jsx` — View router and top-level state (views: home, chat, all-chats, all-projects, project, memory, summaries, settings)
- `hooks/``useChat`, `useSession`, `useModels`, `useProjects`, `useSettings`, `useContextMenu`
- `api/orchestration.js` — Fetch wrapper for all API calls
- Vite proxy points to `192.168.0.205:4000` (Mini PC 2 / orchestration)
### Configuration
Each service uses `.env` via `dotenv`, falling back to `packages/shared/src/config/constants.js`. The orchestration service also serves `data/settings.json` to the frontend via `/settings` — this is the single source of truth for user-facing inference parameters and system prompt.
### Deployment
Home lab across 3 nodes, managed with Docker Compose:
- **Main PC** — RTX A4000 (inference via llama.cpp)
- **Mini PC 1** — memory + embedding services, Qdrant, Ollama
- **Mini PC 2** — orchestration + chat client, Caddy reverse proxy + Authelia SSO
Docker Compose files: `docker-compose.mini1.yml`, `docker-compose.mini2.yml`. All services expose `/health`. Deployment docs: `docs/deployment/homelab.md`.
## Key Development Principles
- **Layer-by-layer validation** — always build and test backend → orchestration → frontend in sequence, curl-testing each layer before proceeding
- **New orchestration routes require changes in four places**: route file, `orchestration-service/src/index.js`, Caddyfile on Mini PC 2 (`192.168.0.205`), and `vite.config.js` in the chat client
- **All services read settings on every request** — no restart required for config changes
- **Backend-first development** — data layer → service endpoints → orchestration proxy → frontend

View File

@@ -73,8 +73,8 @@ service by ID after the vector search.
The core four-service architecture is complete and operational. Key capabilities:
- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
- **Entity layer** — automatic extraction of named entities from conversations via qwen2.5:3b, stored in SQLite and Qdrant, injected into every prompt as structured knowledge
- **Retrieval fusion** — Reciprocal Rank Fusion (RRF) merges semantic (Qdrant vector search) and keyword (SQLite FTS5) episode retrieval into a single ranked result set. Weights are configurable per strategy via settings; keyword search is off by default (`keywordWeight: 0`) and can be enabled without a service restart
- **Entity layer + Knowledge graph** — automatic extraction of named entities and relationships from conversations via qwen2.5:3b. Entities and relationships are stored in SQLite with `mention_count` tracking. A graph traversal layer expands Qdrant entity search hits into a 1-hop neighborhood subgraph, injecting structured connected knowledge into every prompt
- **Projects** — sessions grouped with shared or isolated memory pools
- **Auto-naming** — sessions named automatically from first exchange via inference
- **Project-scoped semantic search** — Qdrant filtered by project session IDs

View File

@@ -202,7 +202,9 @@ Returns `503` if llama-server is unreachable.
|---|---|---|---|
| `recentEpisodeLimit` | integer | 120 | Recent episodes injected into prompt |
| `semanticLimit` | integer | 120 | Max semantic search results |
| `scoreThreshold` | float | 01 | Minimum similarity score |
| `scoreThreshold` | float | 01 | Minimum similarity score for Qdrant results |
| `semanticWeight` | float | 05 | RRF weight for Qdrant semantic results |
| `keywordWeight` | float | 05 | RRF weight for FTS5 keyword results (`0` = disabled) |
| `modelsFolderPath` | string | — | Path to folder containing .gguf files |
| `temperature` | float | 02 | Inference randomness |
| `repeatPenalty` | float | 12 | Repeat token penalty |
@@ -360,13 +362,34 @@ Same request/response shape as orchestration `/projects` above.
**DELETE /relationships — body:**
```json
{ "fromId": 1, "toId": 2, "label": "uses" }
{ "fromId": 1, "toId": 2, "label": "works_on", "notes": "Alice is the primary developer.", "metadata": {} }
```
notes is optional. label should be a snake_case verb. Relationship is identified by the composite key (fromId, toId, label) — re-submitting with the same key increments mention_count and preserves existing notes if the new value is null.
Relationships are identified by the composite key `(fromId, toId, label)`.
Delete uses request body rather than URL params since this three-part key
is awkward to encode in a path.
### Graph
| Method | Path | Description |
|---|---|---|
| GET | /graph/neighborhood/:entityId | Entity neighborhood — nodes + edges within N hops |
| POST | /graph/neighbors | Bulk 1-hop neighborhood for a set of entity IDs |
**GET /graph/neighborhood/:entityId — query params:**
| Param | Default | Max | Description |
|---|---|---|---|
| depth | 1 | 3 | Traversal depth |
Returns `{ entity, neighborhood: { nodes, edges } }`. Returns `404` if entity not found.
**POST /graph/neighbors — body:**
```json
{ "entityIds": [5, 8, 12] }
Returns { nodes: [...], edges: [...] }. Used internally by orchestration not a client-facing endpoint.
---
## Embedding Service port 3003

View File

@@ -39,36 +39,36 @@
---
## Phase 1 — Loose Ends & Stability
## Phase 1 — Loose Ends & Stability - COMPLETE ✅
*Target: Next development session (Saturday)*
### Bug Fixes
- [ ] **Entity extraction JSON parsing** — robustify response parser in `extraction.js` to handle model returning markdown fences or preamble around JSON
- [ ] **Qdrant entity search empty results** — verify entities embedded post-isolation-fix are surfacing correctly in project session searches
**Entity extraction JSON parsing** — robustify response parser in `extraction.js` to handle model returning markdown fences or preamble around JSON
**Qdrant entity search empty results** — verify entities embedded post-isolation-fix are surfacing correctly in project session searches
### Tech Debt
- [ ] **Logging** — introduce `LOG_LEVEL` env var across all services; reduce noise in production
- [ ] **Error response consistency** — audit all endpoints for uniform `{ error, detail }` shape
- [ ] **Constants audit** — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
- [ ] **Orchestration `chat/index.js` review** — extract any logic that has grown beyond its intended scope into dedicated modules
**Logging** — introduce `LOG_LEVEL` env var across all services; reduce noise in production
**Error response consistency** — audit all endpoints for uniform `{ error, detail }` shape
**Constants audit** — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
**Orchestration `chat/index.js` review** — extract any logic that has grown beyond its intended scope into dedicated modules
---
## Phase 2 — Memory System Upgrades
*The core intelligence layer*
### 1. Knowledge Graph (SQLite)
### 1. Knowledge Graph (SQLite)
The highest-leverage memory upgrade. Transforms NexusAI from "remembers conversations" to "understands relationships between things."
- [ ] Graph schema — `nodes` and `edges` tables with typed relationships
- [ ] Entity → node promotion pipeline
- [ ] Relationship traversal queries
- [ ] Graph-aware context assembly in orchestration
- [x] Graph schema — `nodes` and `edges` tables with typed relationships
- [x] Entity → node promotion pipeline (`mention_count` tracked; threshold gating deferred to Phase 2)
- [x] Relationship traversal queries
- [x] Graph-aware context assembly in orchestration
### 2. Retrieval Fusion + Full-Text Search
### 2. Retrieval Fusion + Full-Text Search
Multi-strategy retrieval merged into a single ranked result set.
- [ ] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
- [ ] Configurable weights per retrieval strategy
- [ ] Score threshold tuning per collection
- [x] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
- [x] Configurable weights per retrieval strategy (`semanticWeight`, `keywordWeight` via `PATCH /settings`)
- [x] Score threshold retained per-strategy; FTS scoped to session/project sessions; `keywordWeight: 0` default (disabled until tuned)
### 3. Memory Consolidation Lifecycle
Prevents long-term memory degradation and enables compression.

View File

@@ -1,178 +1,140 @@
# Memory Service
# Entity Extraction
**Package:** `@nexusai/memory-service`
**Location:** `packages/memory-service`
**Deployed on:** Mini PC 1 (192.168.0.81)
**Port:** 3002
**Location:** `packages/memory-service/src/entities/extraction.js`
**Triggered by:** Episode creation (`POST /episodes`)
**Model:** `qwen2.5:3b` via Ollama (configurable via `EXTRACTION_MODEL` env var)
## Purpose
Responsible for all reading and writing of long-term memory. Acts as the
sole interface to both SQLite and Qdrant — no other service accesses these
stores directly. On episode creation, automatically calls the embedding
service to generate and store a vector in Qdrant.
After each episode is saved to SQLite, the extraction pipeline runs
asynchronously in the background to identify named entities and the
relationships between them. Results are written back to SQLite and
embedded into Qdrant — the episode response is never delayed.
## Dependencies
## Trigger
- `express` — HTTP API
- `better-sqlite3` — SQLite driver
- `@qdrant/js-client-rest` — Qdrant vector store client
- `dotenv` — environment variable loading
- `@nexusai/shared` — shared utilities and constants
## Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| PORT | No | 3002 | Port to listen on |
| SQLITE_PATH | Yes | — | Path to SQLite database file |
| QDRANT_URL | No | http://localhost:6333 | Qdrant instance URL |
| EMBEDDING_SERVICE_URL | No | http://localhost:3003 | Embedding service URL |
| EXTRACTION_URL | No | http://localhost:11434 | Ollama URL for entity extraction |
| EXTRACTION_MODEL | No | qwen2.5:3b | Ollama model used for entity extraction |
## Internal Structure
```
src/
├── db/
│ ├── index.js # SQLite connection + initialization + migrations
│ ├── schema.js # Table definitions, indexes, FTS5, triggers
│ ├── projects.js # Project CRUD functions
│ └── summaries.js # Summary CRUD functions
├── episodic/
│ └── index.js # Session + episode CRUD, FTS search, embedding write path
├── semantic/
│ └── index.js # Qdrant collection management, upsert, search, delete
├── entities/
│ ├── index.js # Entity + relationship CRUD
│ └── extraction.js # Automatic entity extraction via qwen2.5:3b on Ollama
└── index.js # Express app + all route definitions
```
## SQLite Schema
Seven core tables:
- **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
- **episodes** — individual exchanges (user message + AI response) tied to a session
- **entities** — named things the system learns about (people, places, concepts)
- **relationships** — directional labeled links between entities
- **summaries** — condensed episode groups for efficient context retrieval
- **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`, `notes`, `system_prompt`
### Migrations
Schema changes that cannot use `CREATE TABLE IF NOT EXISTS` are applied as
idempotent migrations in `db/index.js` at startup:
`createEpisode()` in `episodic/index.js` calls `extractAndStoreEntities()`
immediately after the SQLite insert, without awaiting it:
```js
try { db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`); } catch {}
try { db.exec(`ALTER TABLE sessions ADD COLUMN project_id INTEGER REFERENCES projects(id)`); } catch {}
try { db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(project_id)`); } catch {}
try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
try { db.exec(`ALTER TABLE projects ADD COLUMN notes TEXT`); } catch {}
try { db.exec(`ALTER TABLE projects ADD COLUMN system_prompt TEXT`); } catch {}
extractAndStoreEntities(userMessage, aiResponse, episode.id, projectId)
.catch(err => logger.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
```
New migrations are always appended here — never modify the schema file for
existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
If extraction throws, the episode is unaffected — the error is logged and
swallowed.
### FTS5 Full-Text Search
## Model Settings
An `episodes_fts` virtual table enables keyword search across all episodes.
Three triggers (`episodes_fts_insert`, `episodes_fts_update`, `episodes_fts_delete`)
keep the FTS index automatically in sync with the episodes table.
| Setting | Value | Notes |
|---|---|---|
| Model | `qwen2.5:3b` | Ollama, configurable via `EXTRACTION_MODEL` |
| Temperature | 0.1 | Low for consistent, deterministic output |
| `num_predict` | 1500 | Higher ceiling to accommodate entity + relationship JSON |
| `format` | `'json'` | Ollama constrained decoding — enforces valid JSON output |
| Prompt format | ChatML | `<\|im_start\|>` / `<\|im_end\|>` tokens |
### SQLite Configuration
## Prompt Structure
- `journal_mode = WAL` — non-blocking reads during writes
- `foreign_keys = ON` — enforces referential integrity and cascade deletes
- PRAGMAs set via `db.pragma()`, not `db.exec()`
The prompt is built by `buildExtractionPrompt()`. It includes:
### Dynamic Updates
1. **System message** — declares the model's role as an entity and relationship extractor
2. **Instructions** — entity types, field rules, relationship label format, required JSON schema
3. **Known entities block** — last 20 entities from SQLite, by `rowid DESC`, used to encourage consistent name/type pairs across conversations
4. **Conversation** — the raw user message and AI response, delimited clearly
Both `updateSession` and `updateProject` build their `SET` clause dynamically
from only the fields passed — prevents partial updates from overwriting fields
that weren't touched.
```
<|im_start|>system
You are a named entity and relationship extractor. You output only valid JSON.
<|im_end|>
<|im_start|>user
Read the conversation below and extract all named entities and the relationships between them.
Entity types: person, place, project, technology, concept, organization
...
Return this exact JSON structure:
{ "entities": [...], "relationships": [...] }
`updateProject` allowlist:
```js
const allowed = ['name', 'description', 'colour', 'icon', 'isolated', 'notes', 'system_prompt'];
Already known entities (use these exact name and type values if the same entity appears):
- "NexusAI" (project)
- "Alice" (person)
--- CONVERSATION ---
User: ...
Assistant: ...
--- END CONVERSATION ---
<|im_end|>
<|im_start|>assistant
```
## Qdrant / Semantic Layer
## Expected JSON Output
Three Qdrant collections are initialized on service startup via `semantic.initCollections()`:
| Collection | Purpose |
|---|---|
| `episodes` | Embeddings for individual conversation exchanges |
| `entities` | Embeddings for named entities |
| `summaries` | Embeddings for condensed episode summaries |
All collections use **768-dimension vectors** with **Cosine similarity**,
matching `nomic-embed-text` via Ollama. Vector size and distance metric are
defined in `@nexusai/shared` — not hardcoded here.
`initCollections()` iterates `Object.values(COLLECTIONS)` and creates any
collection that doesn't already exist at startup — all three collections are
guaranteed to exist before any requests are handled, avoiding race conditions
between the first entity embed and an entity search.
Each collection exposes upsert, search (with optional Qdrant filter), and
delete operations. The `wait: true` flag is used on all writes.
## Embedding Write Path
When a new episode is created:
1. Episode saved to SQLite synchronously — response returned immediately
2. User message + AI response combined: `User: ...\nAssistant: ...`
3. Text sent to embedding service (`POST /embed`)
4. Vector upserted into `episodes` Qdrant collection with payload `{ sessionId, createdAt }`
This step is **fire-and-forget** — if embedding fails, the episode is still
saved and searchable via FTS. The error is logged but not surfaced.
> The Qdrant payload stores `sessionId` (the internal integer ID). See
> `memory-isolation.md` for how project-level filtering works.
## Entity Layer
Entities and relationships use upsert semantics with composite unique
constraints to prevent duplicates:
- `UNIQUE(name, type)` on entities
- `UNIQUE(from_id, to_id, label)` on relationships
- `ON DELETE CASCADE` on relationship foreign keys
After each episode is saved, `extraction.js` automatically extracts named
entities from the conversation using `qwen2.5:3b` on Ollama — fire-and-forget.
> For full details on the extraction pipeline, prompt format, constrained
> decoding, stoplist, and Qdrant storage, see `entity-extraction.md`.
## Summaries Layer
Session summaries are generated by `orchestration-service/src/services/summarization.js`
after each episode write and stored here via `POST /summaries`. The memory
service is responsible only for CRUD — generation logic lives in orchestration.
> For full details on trigger conditions, prompt format, cumulative updates,
> and ChatML token stripping, see `summarization.md`.
## Project Delete Behaviour
Deleting a project runs as a transaction — it first nulls out `project_id`
on all assigned sessions, then deletes the project. This avoids a foreign
key constraint failure since `sessions.project_id` has no `ON DELETE` rule:
```js
const doDelete = db.transaction(() => {
db.prepare(`UPDATE sessions SET project_id = NULL WHERE project_id = ?`).run(id);
db.prepare(`DELETE FROM projects WHERE id = ?`).run(id);
});
```json
{
"entities": [
{ "name": "Alice", "type": "person", "notes": "Software engineer working on NexusAI." },
{ "name": "NexusAI", "type": "project", "notes": "A modular AI assistant with persistent memory." }
],
"relationships": [
{
"from": "Alice", "fromType": "person",
"to": "NexusAI", "toType": "project",
"label": "works_on",
"notes": "Alice is the primary developer."
}
]
}
```
For all HTTP endpoints, see `api-routes.md`.
Relationship labels use **snake_case verbs** (e.g. `works_on`, `manages`, `uses`,
`knows`, `located_in`, `part_of`, `created_by`).
## JSON Parsing
The raw model response is matched with `/\{[\s\S]*\}/` before parsing — this
tolerates any preamble or trailing prose the model emits alongside the JSON.
If the match fails or `JSON.parse` throws, the function logs a warning and
returns without writing anything.
## Entity Processing
For each entity in `parsed.entities`:
1. Validate `name`, `type` (must be in `ENTITY_TYPES`), and not in `IGNORED_NAMES`
2. Call `upsertEntity(name, type, notes)`:
- **Insert**: creates new row with `mention_count = 1`, `source = 'extraction'`
- **Conflict** on `(name, type)`: increments `mention_count`, updates `last_seen_at`, preserves existing `notes` if new extraction returns null
3. Add to `entityMap` keyed by `"${name}::${type}"` — used for relationship resolution below
4. Call `linkEntityToEpisode(entity.id, episodeId)` — writes to `entity_episodes` join table
5. Fire-and-forget: embed as `"${name} (${type}): ${notes}"` → store to Qdrant `entities` collection with `{ name, type, notes, projectId }` in payload
**Valid entity types:** `person`, `place`, `project`, `technology`, `concept`, `organization`
**Stoplist (ignored names):** `good morning`, `good night`, `hello`, `goodbye`, `thanks`, `thank you`
## Relationship Processing
After all entities are saved, relationships are processed:
1. For each entry in `parsed.relationships`, look up both endpoints in `entityMap` using `"${from}::${fromType}"` and `"${to}::${toType}"` as keys
2. If either endpoint is missing (filtered out, invalid type, or not in this extraction), the relationship is silently skipped
3. Call `upsertRelationship(fromId, toId, label, notes)`:
- **Insert**: creates new row with `mention_count = 1`
- **Conflict** on `(from_id, to_id, label)`: increments `mention_count`, preserves existing `notes` if new is null
Relationships are unidirectional in storage. Bidirectionality is handled at
query time by the graph traversal layer.
## Project Scoping
`projectId` is threaded through from the episode creation call. It is stored
in the Qdrant entity payload, which enables project-scoped entity search in
orchestration. SQLite entities and relationships are global — scoping only
applies at the Qdrant retrieval layer.
## Error Behaviour
All steps after the initial model call are wrapped in a single outer try/catch.
If Ollama is unreachable, returns a non-200 status, or the JSON cannot be
parsed, the function logs at `warn` level and returns. There is no retry logic.
Individual entity embedding failures are caught per-entity and logged at `warn`
level without affecting other entities in the same batch.

View File

@@ -0,0 +1,213 @@
# Knowledge Graph
**Location:** `packages/memory-service/src/graph/index.js`
**Schema additions:** `entity_episodes` table; new columns on `entities` and `relationships`
**Exposed via:** `GET /graph/neighborhood/:entityId`, `POST /graph/neighbors`
**Consumed by:** Orchestration service context assembly
## Purpose
The knowledge graph transforms NexusAI from "remembers conversations" to
"understands relationships between things." Rather than injecting a flat
list of entity facts into every prompt, orchestration now retrieves a
1-hop subgraph of connected entities and their relationships, giving the
model structured, linked knowledge about people, projects, technologies,
and concepts that have appeared across conversations.
## Schema
### `entity_episodes` (join table)
Tracks which episodes contributed to each entity's knowledge. Defined in
`schema.js` — exists on all installs.
```sql
CREATE TABLE IF NOT EXISTS entity_episodes (
entity_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
episode_id INTEGER NOT NULL REFERENCES episodes(id) ON DELETE CASCADE,
PRIMARY KEY (entity_id, episode_id)
);
```
Both FKs cascade on delete — removing an entity or episode automatically
cleans up its join rows.
### New columns on `entities`
Added via migration in `db/index.js`:
| Column | Type | Default | Description |
|---|---|---|---|
| `mention_count` | INTEGER | 1 | How many times this entity has been extracted across conversations |
| `confidence` | REAL | 1.0 | Reserved for future confidence scoring |
| `source` | TEXT | `'extraction'` | `'extraction'` (auto) or `'manual'` |
| `last_seen_at` | INTEGER | NULL | Unix timestamp of most recent extraction hit |
### New columns on `relationships`
| Column | Type | Default | Description |
|---|---|---|---|
| `mention_count` | INTEGER | 1 | How many times this edge has been extracted |
| `notes` | TEXT | NULL | Relationship context sentence from extraction |
## Entity Promotion Model
Entities are not created equal — some are mentioned once in passing, others
recur across many conversations. `mention_count` is the signal:
- Every time `upsertEntity` is called for an existing `(name, type)` pair, `mention_count` is incremented and `last_seen_at` is updated.
- `ENTITIES.PROMOTION_THRESHOLD` (default: **3**) is the `mention_count` at which an entity is considered "well-established" — referenced in the codebase for future filtering and scoring logic.
- Currently `mention_count` is stored and incremented but not yet used to gate retrieval. It provides the foundation for future features such as orphan cleanup (entities never re-extracted) and confidence-weighted graph traversal.
The same pattern applies to relationships — `mention_count` rises each time
the same `(from_id, to_id, label)` triple is extracted.
## Graph Traversal
`src/graph/index.js` exports two functions built on SQLite's `WITH RECURSIVE`
CTE support. No external graph database is needed.
### `getNeighborhood(entityId, depth)`
Traverses the graph from a single entity, following edges in **both directions**,
up to `depth` hops. Returns `{ nodes: [...entities], edges: [...relationships] }`.
Default depth: `ENTITIES.GRAPH_HOP_DEPTH` (1). Maximum enforced at HTTP layer: 3.
**SQLite query:**
```sql
WITH RECURSIVE traverse(entity_id, depth) AS (
SELECT ?, 0
UNION
SELECT
CASE WHEN r.from_id = t.entity_id THEN r.to_id ELSE r.from_id END,
t.depth + 1
FROM relationships r
JOIN traverse t ON (r.from_id = t.entity_id OR r.to_id = t.entity_id)
WHERE t.depth < ?
)
SELECT DISTINCT entity_id FROM traverse
```
`UNION` (not `UNION ALL`) eliminates duplicate visits and naturally handles
cycles — a node already in the traversal set is not re-visited.
After collecting node IDs, two follow-up queries fetch:
- All entity rows for those IDs
- All relationship rows where both `from_id` and `to_id` are in the node set
This ensures edges between neighbors are included even if they aren't on the
traversal path from the seed.
### `getEntityNeighbors(entityIds[])`
Bulk 1-hop version designed for orchestration. Given multiple seed entity IDs
(the results of Qdrant semantic search), returns the combined 1-hop subgraph.
1. Finds all neighbor IDs via one query using `IN (...)` on both `from_id` and `to_id`
2. Deduplicates seeds + neighbors using a JavaScript `Set`
3. Fetches all entity rows and all relationship rows within the combined node set
This is intentionally simpler than the recursive version — orchestration always
uses depth=1, and the bulk query avoids N separate CTE calls.
## Graph-Aware Context Assembly
Orchestration's `assembleContext` (in `src/chat/index.js`) integrates the
graph at step 7 of the chat pipeline:
1. Qdrant entity search returns up to `ORCHESTRATION.ENTITIES_LIMIT` results, each including `r.id` (the SQLite entity ID) alongside the Qdrant payload
2. `graph.getNeighbors(entityIds)` is called with those IDs → `POST /graph/neighbors` on memory-service
3. The returned `{ nodes, edges }` is passed to `formatGraphContext()`
4. On failure, falls back to using the Qdrant payload data directly as flat nodes with no edges
### Prompt Format
`formatGraphContext(nodes, edges)` in `chat/index.js` formats the subgraph as:
```
Here is what you know about entities relevant to this conversation and their connections:
- Alice (person): software engineer working on NexusAI
→ works_on NexusAI (project)
→ knows Bob (person)
- NexusAI (project): AI assistant framework
- Bob (person): Alice's colleague
```
- One line per node: `- {name} ({type}): {notes}`
- Outbound edges indented below: ` → {label} {target_name} ({target_type})`
- Nodes with only inbound edges (pulled in as neighbors) appear without connection lines
- Only outbound edges are shown — each relationship appears once, from the `from_id` side
## Project Scoping
The knowledge graph respects project boundaries at the **entry point**, not
during traversal:
- Qdrant entity search is filtered by `projectId` — only entities tagged with this project are returned as seeds
- Graph traversal in SQLite is unfiltered — neighbors can be from any project or no project
- This is intentional: the graph entry is project-scoped, but traversal follows the global relationship graph to discover connected knowledge
Entities are tagged with `projectId` in the Qdrant payload at extraction time.
Entities extracted from non-project sessions have `projectId: null` and only
appear in unfiltered global searches.
## API Reference
### `GET /graph/neighborhood/:entityId`
Returns the neighborhood of a single entity.
**Query params:**
| Param | Default | Max | Description |
|---|---|---|---|
| `depth` | `ENTITIES.GRAPH_HOP_DEPTH` (1) | 3 | Traversal depth |
**Response:**
```json
{
"entity": { "id": 5, "name": "Alice", "type": "person", "notes": "...", "mention_count": 4 },
"neighborhood": {
"nodes": [
{ "id": 5, "name": "Alice", "type": "person", "notes": "..." },
{ "id": 8, "name": "NexusAI", "type": "project", "notes": "..." }
],
"edges": [
{ "id": 2, "from_id": 5, "to_id": 8, "label": "works_on", "notes": "...", "mention_count": 3 }
]
}
}
```
Returns 404 if the entity does not exist.
### `POST /graph/neighbors`
Bulk 1-hop neighborhood for a set of entity IDs. Used internally by
orchestration — not intended for direct client use.
**Request body:**
```json
{ "entityIds": [5, 8, 12] }
```
**Response:**
```json
{
"nodes": [ ...entity objects... ],
"edges": [ ...relationship objects... ]
}
```
Returns 400 if `entityIds` is missing or empty.
## Constants (`packages/shared/src/config/constants.js`)
| Constant | Value | Description |
|---|---|---|
| `ENTITIES.PROMOTION_THRESHOLD` | 3 | `mention_count` at which an entity is considered well-established |
| `ENTITIES.GRAPH_HOP_DEPTH` | 1 | Default traversal depth for neighborhood queries |
| `ORCHESTRATION.ENTITIES_LIMIT` | 5 | Max entity seeds returned from Qdrant search |
| `ORCHESTRATION.ENTITIES_THRESHOLD` | 0.55 | Minimum similarity score for entity Qdrant search |

View File

@@ -9,8 +9,8 @@
Responsible for all reading and writing of long-term memory. Acts as the
sole interface to both SQLite and Qdrant — no other service accesses these
stores directly. On episode creation, automatically calls the embedding
service to generate and store a vector in Qdrant.
stores directly. On episode creation, automatically triggers entity and
relationship extraction and embeds results into Qdrant.
## Dependencies
@@ -45,19 +45,22 @@ src/
├── semantic/
│ └── index.js # Qdrant collection management, upsert, search, delete
├── entities/
│ ├── index.js # Entity + relationship CRUD
│ └── extraction.js # Automatic entity extraction via qwen2.5:3b on Ollama
│ ├── index.js # Entity + relationship CRUD (upsert, mention tracking)
│ └── extraction.js # Automatic entity + relationship extraction via qwen2.5:3b
├── graph/
│ └── index.js # Knowledge graph traversal (neighborhood queries, recursive CTE)
└── index.js # Express app + all route definitions
```
## SQLite Schema
Seven core tables:
Eight core tables:
- **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
- **episodes** — individual exchanges (user message + AI response) tied to a session
- **entities** — named things the system learns about (people, places, concepts)
- **relationships** — directional labeled links between entities
- **entities** — named things the system learns about (people, places, concepts, etc.). Fields include `mention_count`, `confidence`, `source`, `last_seen_at`
- **relationships** — directional labeled links between entities (`from_id`, `to_id`, `label`). Fields include `mention_count`, `notes`
- **entity_episodes** — join table linking entities to the episodes where they were extracted. Used for provenance and orphan cleanup
- **summaries** — condensed episode groups for efficient context retrieval
- **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`, `notes`, `system_prompt`
@@ -73,10 +76,18 @@ try { db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(proje
try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
try { db.exec(`ALTER TABLE projects ADD COLUMN notes TEXT`); } catch {}
try { db.exec(`ALTER TABLE projects ADD COLUMN system_prompt TEXT`); } catch {}
// Knowledge graph columns:
try { db.exec(`ALTER TABLE entities ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
try { db.exec(`ALTER TABLE entities ADD COLUMN confidence REAL NOT NULL DEFAULT 1.0`) } catch {}
try { db.exec(`ALTER TABLE entities ADD COLUMN source TEXT NOT NULL DEFAULT 'extraction'`) } catch {}
try { db.exec(`ALTER TABLE entities ADD COLUMN last_seen_at INTEGER`) } catch {}
try { db.exec(`ALTER TABLE relationships ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
try { db.exec(`ALTER TABLE relationships ADD COLUMN notes TEXT`) } catch {}
```
New migrations are always appended here — never modify the schema file for
existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
`entity_episodes` is defined in `schema.js` itself (not a migration) since it is a new table.
New migrations are always appended — never modify the schema file for existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
### FTS5 Full-Text Search
@@ -117,8 +128,7 @@ defined in `@nexusai/shared` — not hardcoded here.
`initCollections()` iterates `Object.values(COLLECTIONS)` and creates any
collection that doesn't already exist at startup — all three collections are
guaranteed to exist before any requests are handled, avoiding race conditions
between the first entity embed and an entity search.
guaranteed to exist before any requests are handled.
Each collection exposes upsert, search (with optional Qdrant filter), and
delete operations. The `wait: true` flag is used on all writes.
@@ -143,15 +153,27 @@ saved and searchable via FTS. The error is logged but not surfaced.
Entities and relationships use upsert semantics with composite unique
constraints to prevent duplicates:
- `UNIQUE(name, type)` on entities
- `UNIQUE(from_id, to_id, label)` on relationships
- `UNIQUE(name, type)` on entities — conflict increments `mention_count` and updates `last_seen_at`
- `UNIQUE(from_id, to_id, label)` on relationships — conflict increments `mention_count` and preserves existing `notes`
- `ON DELETE CASCADE` on relationship foreign keys
After each episode is saved, `extraction.js` automatically extracts named
entities from the conversation using `qwen2.5:3b` on Ollama — fire-and-forget.
entities **and relationships** from the conversation using `qwen2.5:3b` on
Ollama — fire-and-forget. Each saved entity is also linked to the episode
via the `entity_episodes` join table.
> For full details on the extraction pipeline, prompt format, constrained
> decoding, stoplist, and Qdrant storage, see `entity-extraction.md`.
> For full details on the extraction pipeline and JSON format, see `entity-extraction.md`.
> For the knowledge graph traversal layer, see `knowledge-graph.md`.
## Knowledge Graph Layer
`src/graph/index.js` provides SQLite-based graph traversal over the entities
and relationships tables. Two functions are exposed via HTTP:
- **`getNeighborhood(entityId, depth)`** — recursive CTE traversal, bidirectional, returns `{ nodes, edges }`
- **`getEntityNeighbors(entityIds[])`** — bulk 1-hop traversal for orchestration context assembly
> For design rationale, traversal queries, and integration with orchestration, see `knowledge-graph.md`.
## Summaries Layer
@@ -175,4 +197,4 @@ const doDelete = db.transaction(() => {
});
```
For all HTTP endpoints, see `api-routes.md`.
For all HTTP endpoints, see `api-routes.md`.

View File

@@ -42,9 +42,10 @@ src/
│ ├── inference.js # HTTP client for inference service
│ ├── embedding.js # HTTP client for embedding service
│ ├── qdrant.js # HTTP client for Qdrant (direct vector search)
│ ├── graph.js # HTTP client for memory-service graph endpoints
│ └── summarization.js # Session summarisation — triggers after each episode
├── chat/
│ └── index.js # Core pipeline — context assembly, isolation, auto-naming
│ └── index.js # Core pipeline — context assembly, graph expansion, auto-naming
├── config/
│ └── settings.js # Settings load/save — reads/writes data/settings.json
├── routes/
@@ -71,7 +72,9 @@ via `appSettings.load()` — changes apply immediately without a service restart
|---|---|---|
| `recentEpisodeLimit` | 5 | Recent episodes injected into prompt |
| `semanticLimit` | 5 | Semantic search results injected into prompt |
| `scoreThreshold` | 0.75 | Minimum similarity score for semantic results |
| `scoreThreshold` | 0.5 | Minimum similarity score for Qdrant semantic results |
| `semanticWeight` | 1.0 | RRF weight for Qdrant semantic results |
| `keywordWeight` | 0 | RRF weight for FTS5 keyword results (`0` = disabled) |
| `modelsFolderPath` | `/mnt/nexus-models` | Path to folder containing .gguf files |
| `temperature` | 0.7 | Inference temperature |
| `repeatPenalty` | 1.1 | Repeat token penalty |
@@ -100,24 +103,35 @@ difference is how the inference response is delivered to the client.
4. **Recent episode retrieval** — fetch most recent episodes (`recentEpisodeLimit`).
5. **Semantic search** — embed user message, query Qdrant for similar past
episodes. Deduplicated against recent episodes. Non-critical.
5. **Fused episode retrieval** — runs semantic (Qdrant) and keyword (FTS5)
search in parallel, then merges results via Reciprocal Rank Fusion (RRF).
Both paths are filtered against `recentIds` before fusion. FTS is scoped
to the current session or all project sessions. If `keywordWeight` is `0`,
the FTS call is skipped entirely. Non-critical — failures fall back to
whichever strategy succeeded.
6. **Entity search** — query `entities` Qdrant collection filtered by
`projectId`. Non-project sessions receive no entity context. Non-critical.
`projectId`. Returns entity IDs alongside Qdrant payload data (the Qdrant
point ID equals the SQLite entity ID). Non-critical.
7. **Prompt assembly** — combine system prompt, entity context, semantic
episodes, recent episodes, and user message.
7. **Graph neighborhood expansion** — call `POST /graph/neighbors` on
memory-service with the entity IDs from step 6. Returns a 1-hop subgraph
`{ nodes, edges }` — entity objects plus the relationships connecting them.
If no entities were found or the graph call fails, falls back to flat entity
list (no edges). Non-critical.
8. **Inference** — send to inference service. `/chat` awaits full response;
8. **Prompt assembly** — combine system prompt, graph context, fused episodes,
recent episodes, and user message.
9. **Inference** — send to inference service. `/chat` awaits full response;
`/chat/stream` pipes SSE chunks to the client.
9. **Episode write** — write exchange back to memory with `projectId`.
10. **Episode write** — write exchange back to memory with `projectId`.
10. **Summarisation trigger**`triggerSummary(session, allEpisodes)` called
11. **Summarisation trigger**`triggerSummary(session, allEpisodes)` called
fire-and-forget. See `summarization.md` for full details.
11. **Auto-naming** — on first message with no session name, fires a secondary
12. **Auto-naming** — on first message with no session name, fires a secondary
inference call (max 20 tokens, temperature 0.3) to generate a session name.
### Prompt Structure
@@ -125,8 +139,9 @@ difference is how the inference response is delivered to the client.
```
[Resolved system prompt]
Here is what you know about entities relevant to this conversation:
Here is what you know about entities relevant to this conversation and their connections:
- {name} ({type}): {notes}
→ {label} {neighbor_name} ({neighbor_type})
---
Here are some relevant memories from earlier conversations:
User: {past user message}
@@ -141,6 +156,12 @@ User: {current message}
Assistant:
```
The entity block renders the full graph neighborhood — seed entities matched
by Qdrant search plus any neighbors pulled in by 1-hop traversal. Each entity
shows its `notes` and any outbound relationships with their targets. Neighbor
nodes that have no outbound edges within the subgraph appear without connection
lines.
## Summarisation
After each episode write, `triggerSummary` is called fire-and-forget. It
@@ -199,4 +220,7 @@ handle /health* { reverse_proxy localhost:4000 }
After updating: `caddy reload --config /path/to/Caddyfile`
For all HTTP endpoints, see `api-routes.md`.
> Note: `/graph` routes are on the memory-service (port 3002) and are called
> internally by orchestration — they do not need a Caddy entry.
For all HTTP endpoints, see `api-routes.md`.

View File

@@ -0,0 +1,153 @@
# Retrieval Fusion
**Implementation:** `packages/orchestration-service/src/chat/index.js`
**FTS scoping:** `packages/memory-service/src/episodic/index.js`, `src/index.js`
**Settings:** `semanticWeight`, `keywordWeight` via `PATCH /settings`
## Purpose
Rather than relying solely on Qdrant vector similarity (which finds semantically
related content but misses exact keyword matches) or FTS5 keyword search alone
(which finds exact matches but not paraphrases), Reciprocal Rank Fusion (RRF)
merges the ranked results from both strategies into a single better-ranked list.
Episodes that rank highly in **both** lists score highest. An episode that is
the top semantic match but irrelevant by keyword, or vice versa, scores lower
than one that satisfies both.
## How RRF Works
For each episode `d`, its fused score is:
```
RRF(d) = w_semantic / (k + rank_semantic(d))
+ w_keyword / (k + rank_keyword(d))
```
- `rank_i(d)` — 1-based position in that strategy's result list (episode absent from a list contributes 0 for that term)
- `k = 60` — smoothing constant (standard; not exposed in settings)
- `w_semantic`, `w_keyword` — user-tunable weights (both default-sourced from `RETRIEVAL` constants)
Setting a weight to `0` removes that strategy's contribution entirely. Setting
`keywordWeight` to `0` also short-circuits the FTS network call.
## Architecture
Fusion lives in orchestration — the service already coordinates multiple data
sources, and fusion is a retrieval strategy, not a storage concern.
```
getFusedEpisodes()
├── getSemanticEpisodes() — Qdrant embed+search → fetch full rows by ID
│ (existing path, unchanged)
└── getFTSResults() — memory-service /episodes/search → full rows directly
(skipped entirely if keywordWeight == 0)
fuseEpisodeResults() — pure RRF, no I/O
fusedEpisodes[] — top semanticLimit episodes by RRF score
```
### Data Shape Consistency
Both sides must enter fusion as `Episode[]` — full SQLite row objects with
the same shape — and both must be filtered against `recentIds` first:
- **Semantic path**: `recentIds` filter applied before `getEpisodeById` fetch (existing behaviour)
- **FTS path**: full rows returned directly; `recentIds` filter applied in `getFusedEpisodes` after receiving them
FTS requests `semanticLimit * 2` results to provide headroom for the
`recentIds` filter without under-serving the fusion.
## FTS Session Scoping
Without scoping, FTS5 searches across all episodes in the database. For
context assembly, results must be constrained to the current session or
project session pool — the same scope used for Qdrant semantic search.
`searchEpisodes(query, limit, sessionIds)` in memory-service accepts an
optional `sessionIds` array. When provided, the SQL becomes:
```sql
SELECT e.* FROM episodes e
JOIN episodes_fts fts ON e.id = fts.rowid
WHERE episodes_fts MATCH ?
AND e.session_id IN (?, ?, ...)
ORDER BY rank
LIMIT ?
```
The HTTP endpoint `GET /episodes/search` accepts `sessionIds` as a
comma-separated query param: `?q=hello&sessionIds=1,2,3`.
In orchestration, `ftsSessionIds` is set to:
- `projectSessionIds` (all sessions in the project) — if the session belongs to a project
- `[session.id]` — otherwise (single session only)
This mirrors the Qdrant scoping logic exactly.
## `fuseEpisodeResults` — Implementation Detail
```js
function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
const k = RETRIEVAL.RRF_K; // 60
const scores = new Map(); // episode.id → { episode, score }
// Score semantic results (already filtered against recentIds)
semanticEps.forEach((ep, i) => {
scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
});
// Score + merge keyword results (already filtered against recentIds)
keywordEps.forEach((ep, i) => {
const contrib = keywordWeight / (k + i + 1);
if (scores.has(ep.id)) {
scores.get(ep.id).score += contrib; // appears in both — sum scores
} else if (contrib > 0) {
scores.set(ep.id, { episode: ep, score: contrib }); // FTS-only episode
}
// contrib == 0 (keywordWeight: 0) → episode not added (guard prevents score-0 bleed-through)
});
return [...scores.values()]
.sort((a, b) => b.score - a.score)
.slice(0, limit)
.map(({ episode }) => episode);
}
```
The `else if (contrib > 0)` guard prevents FTS-only episodes from entering
the result set with a score of 0 when `keywordWeight` is 0 — verified by
the test suite.
## Settings
| Setting | Default | Range | Description |
|---|---|---|---|
| `semanticWeight` | 1.0 | 05 | Weight applied to Qdrant semantic results |
| `keywordWeight` | 0 | 05 | Weight applied to FTS5 keyword results. `0` = disabled |
Both are readable via `GET /settings` and writable via `PATCH /settings`
without a service restart. Changes take effect on the next chat request.
**To enable keyword search:**
```bash
curl -X PATCH http://localhost:4000/settings \
-H "Content-Type: application/json" \
-d '{"keywordWeight": 1.0}'
```
**To favour keyword matches over semantic:**
```bash
curl -X PATCH http://localhost:4000/settings \
-H "Content-Type: application/json" \
-d '{"semanticWeight": 0.5, "keywordWeight": 2.0}'
```
## Constants (`packages/shared/src/config/constants.js`)
| Constant | Value | Description |
|---|---|---|
| `RETRIEVAL.RRF_K` | 60 | RRF smoothing constant — not exposed in settings |
| `RETRIEVAL.SEMANTIC_WEIGHT` | 1.0 | Default semantic weight |
| `RETRIEVAL.KEYWORD_WEIGHT` | 0 | Default keyword weight (off) |

3
package-lock.json generated
View File

@@ -4224,8 +4224,7 @@
"dependencies": {
"@nexusai/shared": "^1.0.0",
"dotenv": "^17.4.0",
"express": "^5.2.1",
"ollama": "^0.6.3"
"express": "^5.2.1"
}
},
"packages/inference-service": {

View File

@@ -1,5 +1,6 @@
import { API_DEFAULTS } from "../config/constants";
const BASE_URL = import.meta.env.VITE_ORCHESTRATION_URL ?? '';
// ── Sessions ────────────────────────────────────────────────

View File

@@ -2,6 +2,7 @@ import React, { useState, useEffect } from 'react';
import { fetchSessions, deleteSession } from '../api/orchestration';
import { CLIENT_DEFAULTS } from '../config/constants';
const PAGE_SIZE = CLIENT_DEFAULTS.PAGE_SIZE;
export default function AllChatsView({ onSelectSession, onBack, projects }) {

View File

@@ -2,6 +2,7 @@ import React, { useState, useEffect } from 'react';
import ProjectModal from './ProjectModal';
import { fetchProjects, createProject, updateProject, deleteProject } from '../api/orchestration';
export default function AllProjectsView({ onProjectsChange, onBack, onSelectProject, onNavigate }) {
const [projects, setProjects] = useState([]);
const [loading, setLoading] = useState(true);

View File

@@ -3,6 +3,7 @@ import { useSettings } from '../hooks/useSettings';
import { useModels } from '../hooks/useModels';
import { getServiceHealth } from '../api/orchestration';
export default function SettingsView({ onNavigate, onBack, modelProps }) {
const { settings, saveSetting, saving } = useSettings();

View File

@@ -1,6 +1,7 @@
import { useState, useEffect, useCallback } from 'react';
import { fetchProjects } from '../api/orchestration';
export function useProjects() {
const [projects, setProjects] = useState([]);

View File

@@ -0,0 +1,64 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and deployment layout.
## Running This Service
```bash
npm run embedding # From repo root
npm -w packages/embedding-service run dev # With --watch
```
Default port: **3003**. Requires Ollama to be reachable at `OLLAMA_URL`.
## Single-File Service
The entire service is `src/index.js` — no subdirectory structure. All routes, the Ollama helper, and startup are in one file.
## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `PORT` | `3003` | Port to listen on |
| `OLLAMA_URL` | `http://localhost:11434` | Ollama instance URL |
| `EMBEDDING_MODEL` | `nomic-embed-text` | Model passed to Ollama `/api/embed` |
Note: the env var name is `EMBEDDING_MODEL`, not `EMBED_MODEL` — the internal constant is `EMBED_MODEL` but the lookup key is different.
## Ollama API Details
Uses Ollama's `/api/embed` endpoint (not `/api/embeddings`). Request shape:
```json
{ "model": "nomic-embed-text", "input": "text to embed" }
```
Ollama returns `{ "embeddings": [[...]] }` — an array of arrays even for a single input. The helper takes `data.embeddings[0]` to return the single vector.
The `ollama` npm package is listed as a dependency but is **not used** — all calls are raw `fetch`. Do not refactor to use the package without checking the API shape matches.
## Batch Endpoint
`POST /embed/batch` embeds items **sequentially** in a for-loop, not in parallel. The comment explains this: Ollama doesn't parallelise embedding calls, so parallel requests would queue internally anyway. Do not change to `Promise.all` without verifying Ollama behaviour.
## Error Responses
| Condition | Status | Notes |
|---|---|---|
| Missing/empty `text` | 400 | |
| Ollama call fails | 502 | Upstream failure — correct status |
| Empty `texts` array | 400 | |
## Known Issue
The 400 error message for `/embed` reads `"text is required and must be empty"` — the word "not" is missing. Should read `"must not be empty"`.
## API Endpoints
| Method | Path | Notes |
|---|---|---|
| GET | `/health` | Static response — does not verify Ollama is reachable |
| POST | `/embed` | Body: `{ text: string }`. Returns `{ embedding, model, dimensions }` |
| POST | `/embed/batch` | Body: `{ texts: string[] }`. Returns `{ embeddings, model, dimensions, count }` |

View File

@@ -9,7 +9,6 @@
"dependencies": {
"@nexusai/shared": "^1.0.0",
"dotenv": "^17.4.0",
"express": "^5.2.1",
"ollama": "^0.6.3"
"express": "^5.2.1"
}
}

View File

@@ -3,7 +3,7 @@ const express = require('express');
const {getEnv, OLLAMA, PORTS, logger} = require('@nexusai/shared');
const app = express();
app.use(express.json());
app.use(express.json({ limit: '1mb' })); // limit request body to 1mb to prevent abuse - embedding requests should be small
const PORT = getEnv('PORT', PORTS.EMBEDDING);
const OLLAMA_URL = getEnv('OLLAMA_URL', OLLAMA.DEFAULT_URL);
@@ -14,7 +14,8 @@ async function embedText(text) {
const res = await fetch(`${OLLAMA_URL}/api/embed`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ model: EMBED_MODEL, input: text })
body: JSON.stringify({ model: EMBED_MODEL, input: text }),
signal: AbortSignal.timeout(30_000),
});
if (!res.ok) {
@@ -37,7 +38,7 @@ app.get('/health', (req,res) => {
app.post('/embed', async (req, res) => {
const { text } = req.body;
if (!text || typeof text !== 'string' || text.trim() === '') {
return res.status(400).json({ error: 'text is required and must be empty' });
return res.status(400).json({ error: 'text is required and must not be empty' });
}
try {
@@ -60,7 +61,10 @@ app.post('/embed/batch', async (req, res) => {
}
try {
//sequential embedding for now, Ollama doesn't natively parallize embeddings
const invalid = texts.findIndex(t => !t || typeof t !== 'string' || t.trim() === '');
if (invalid !== -1)
return res.status(400).json({ error: `texts[${invalid}] is empty or not a string` });
const embeddings = [];
for (const text of texts) {
embeddings.push(await embedText(text.trim()));

View File

@@ -0,0 +1,75 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and deployment layout.
## Running This Service
```bash
npm run inference # From repo root
npm -w packages/inference-service run dev # With --watch
```
Default port: **3001**. Set `INFERENCE_PROVIDER` to select the backend.
## Provider Pattern
`src/infer.js` reads `INFERENCE_PROVIDER` at startup and loads one of two providers:
| `INFERENCE_PROVIDER` | Module | Backend |
|---|---|---|
| `ollama` (default) | `src/providers/ollama.js` | Ollama npm client → `/api/generate` |
| `llamacpp` | `src/providers/llamacpp.js` | Raw fetch → `/v1/chat/completions` (OpenAI-compatible) |
An unknown provider throws immediately at startup — fail-fast, not at request time.
Both providers export the same interface: `complete(prompt, options)` and `completeStream(prompt, options)`.
## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `PORT` | `3001` | Port to listen on |
| `INFERENCE_PROVIDER` | `ollama` | `ollama` or `llamacpp` |
| `INFERENCE_URL` | `http://localhost:11434` (Ollama) / `http://localhost:8080` (llama.cpp) | Backend URL |
| `DEFAULT_MODEL` | Provider-specific | Model name passed to backend |
`INFERENCE_URL` defaults differ per provider — Ollama uses the Ollama default URL, llama.cpp uses the llama-server default.
## Options Resolution
Both providers use `resolveOptions(options)` to merge caller-supplied options with `INFERENCE_DEFAULTS` from shared constants. Any option not supplied by the caller falls back to the constant.
## Streaming Chunk Format
The two providers yield differently shaped chunks — the route in `src/routes/inference.js` normalises them:
**Ollama** yields raw Ollama generate chunks: `{ response, done, model, eval_count, prompt_eval_count, ... }`
**llama.cpp** yields:
- Per-token: `{ response: delta, done: false }`
- Final: `{ response: '', done: true, model, tokenCount }` — token count is the sum of `completion_tokens + prompt_tokens` from the usage chunk
The route checks `chunk.response` to stream text and `chunk.done` to capture metadata. For Ollama streaming, **token count is not captured** — the done chunk from Ollama contains `eval_count`/`prompt_eval_count` but the route only reads `chunk.tokenCount` (a llama.cpp field). Ollama streaming calls always report `tokenCount: 0` to the client.
## Known Issue: `maxTokens` Missing from Streaming Route
`POST /complete` correctly destructures `maxTokens` from the request body and passes it through. `POST /complete/stream` does **not** — it omits `maxTokens` from its destructuring, so streaming completions always use `INFERENCE_DEFAULTS.MAX_TOKENS` regardless of what the caller sends. This means `/chat/stream` has a different effective token ceiling than `/chat`.
## SSE Format (route → caller)
```
data: {"response":"Hello"} ← per token
data: {"response":" world"}
data: {"done":true,"model":"...","tokenCount":42} ← final metadata
data: [DONE] ← sentinel
```
## API Endpoints
| Method | Path | Notes |
|---|---|---|
| GET | `/health` | Returns `{ service, status, provider, model }` |
| POST | `/complete` | Body: `{ prompt, model?, temperature?, maxTokens?, topP?, topK?, repeatPenalty? }` |
| POST | `/complete/stream` | Same body as `/complete` except `maxTokens` is silently ignored |

View File

@@ -1,10 +1,10 @@
require ('dotenv').config();
const express = require('express');
const {getEnv, PORTS, OLLAMA} = require('@nexusai/shared');
const {getEnv, PORTS, OLLAMA, logger} = require('@nexusai/shared');
const inferenceRouter = require('./routes/inference');
const app = express();
app.use(express.json());
app.use(express.json({ limit: '8mb' })); // prompts include full context window
const PORT = getEnv('PORT', PORTS.INFERENCE);
const PROVIDER = getEnv('INFERENCE_PROVIDER', 'ollama');
@@ -24,5 +24,5 @@ app.use('/', inferenceRouter);
// Start the server
app.listen(PORT, () => {
console.log(`Inference Service is running on port ${PORT}`);
logger.info(`Inference Service is running on port ${PORT}`);
});

View File

@@ -1,4 +1,4 @@
const { getEnv, LLAMACPP, INFERENCE_DEFAULTS } = require("@nexusai/shared");
const { getEnv, LLAMACPP, INFERENCE_DEFAULTS, logger } = require("@nexusai/shared");
const BASE_URL = getEnv("INFERENCE_URL", LLAMACPP.DEFAULT_URL);
const DEFAULT_MODEL = getEnv("DEFAULT_MODEL", LLAMACPP.DEFAULT_MODEL);
@@ -89,7 +89,7 @@ async function* completeStream(prompt, options = {}) {
}
}
console.log('[llamacpp] finalTokenCount:', finalTokenCount);
logger.info('[llamacpp] finalTokenCount:', finalTokenCount);
yield { response: '', done: true, model: finalModel, tokenCount: finalTokenCount };
}

View File

@@ -57,7 +57,16 @@ async function* completeStream(prompt, options = {} ) {
});
for await (const chunk of stream) {
yield chunk;
if (chunk.done) {
yield {
response: '',
done: true,
model: chunk.model,
tokenCount: (chunk.eval_count ?? 0) + (chunk.prompt_eval_count ?? 0),
};
} else {
yield chunk;
}
}
}

View File

@@ -1,5 +1,6 @@
const { Router } = require('express');
const { complete, completeStream } = require('../infer');
const { logger } = require('@nexusai/shared');
const router = Router();
@@ -15,14 +16,14 @@ router.post('/complete', async (req, res) => {
const result = await complete (prompt, {model, temperature, maxTokens, topP, topK, repeatPenalty});
res.json(result);
} catch (error) {
console.error('[Inference] Completion error:', error.message);
res.status(500).json({ error: error.message });
logger.error('[Inference] Completion error:', error.message);
res.status(500).json({ error: 'Inference failed', detail: error.message });
}
});
// Streaming completion endpoint - sends partial responses as they arrive
router.post('/complete/stream', async (req, res) => {
const { prompt, model, temperature, topP, topK, repeatPenalty } = req.body;
const { prompt, model, temperature, maxTokens, topP, topK, repeatPenalty } = req.body;
if (!prompt) return res.status(400).json({ error: 'prompt is required' });
@@ -34,7 +35,7 @@ router.post('/complete/stream', async (req, res) => {
let lastModel = model;
let tokenCount = 0;
for await (const chunk of completeStream(prompt, { model, temperature, topP, topK, repeatPenalty })) {
for await (const chunk of completeStream(prompt, { model, temperature, maxTokens,topP, topK, repeatPenalty })) {
if (chunk.response) {
res.write(`data: ${JSON.stringify({ response: chunk.response })}\n\n`);
}
@@ -42,7 +43,7 @@ router.post('/complete/stream', async (req, res) => {
// capture final metadata from the done signal
lastModel = chunk.model ?? lastModel;
tokenCount = chunk.tokenCount ?? tokenCount;
console.log('[inference router] tokenCount from chunk:', chunk.tokenCount, '→', tokenCount);
logger.info('[inference router] tokenCount from chunk:', chunk.tokenCount, '→', tokenCount);
}
}
@@ -51,7 +52,7 @@ router.post('/complete/stream', async (req, res) => {
res.write('data: [DONE]\n\n');
} catch (err) {
console.error('[Inference] Streaming error:', err.message);
logger.error('[Inference] Streaming error:', err.message);
res.write(`data: ${JSON.stringify({ error: err.message })}\n\n`);
} finally {
res.end();

View File

@@ -0,0 +1,114 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and the dual-store memory model.
## Running This Service
```bash
npm run memory # From repo root (node src/index.js)
npm -w packages/memory-service run dev # With --watch
```
Default port: **3002**. Requires Qdrant and the embedding-service to be reachable on startup.
## SQLite Schema
`src/db/schema.js` is the source of truth for the data model. Key schema facts:
- `sessions` and `episodes` are linked by FK with cascade delete — deleting a session removes all its episodes automatically.
- `episodes_fts` is an FTS5 virtual table that mirrors `user_message` and `ai_response`. It is kept in sync via SQL triggers on INSERT/UPDATE/DELETE. On service startup, the FTS index is fully rebuilt from live episode data.
- Several columns (`sessions.name`, `sessions.project_id`, `entities.mention_count`, etc.) were added as migrations using `ALTER TABLE` wrapped in individual try-catch blocks. Failures are silently swallowed — if a column already exists, the alter fails and the service continues. The `idx_summaries_project` index is defined twice (benign duplicate).
- `summaries` rows with `session_id IS NULL` and a `project_id` represent project-level overviews, not session summaries. This distinction is how `GET /projects/:id/overview` works.
- `entity_episodes` is a join table linking entities to the episodes where they were first extracted. Used for provenance tracking and future orphan cleanup. Defined in `schema.js` (not a migration), so it exists on all installs.
**New columns on `entities` (added via migration):**
- `mention_count INTEGER DEFAULT 1` — incremented every time this entity is re-extracted
- `confidence REAL DEFAULT 1.0` — reserved for future confidence scoring
- `source TEXT DEFAULT 'extraction'``'extraction'` or `'manual'`
- `last_seen_at INTEGER` — Unix timestamp of most recent extraction hit
**New columns on `relationships` (added via migration):**
- `mention_count INTEGER DEFAULT 1` — incremented every time this edge is re-extracted
- `notes TEXT` — relationship context sentence from extraction
## Async Pipeline: Episode Creation
`POST /episodes` returns a 201 as soon as the SQLite insert succeeds. Two background tasks run after without blocking the response:
1. **Embedding** — Fetches a vector from embedding-service, stores to Qdrant with `{sessionId, createdAt}` as payload metadata.
2. **Entity + relationship extraction** — Sends the episode text to Ollama (`qwen2.5:3b`, temp 0.1, 1500 tokens) and upserts any recognized entities and relationships to both SQLite and Qdrant. Also links each entity to the episode via `entity_episodes`.
Both tasks catch and log errors silently. An episode can exist in SQLite with no corresponding Qdrant point if either step fails.
## Entity Extraction Details
`src/entities/extraction.js`:
- Fetches the last 20 known entities from SQLite before prompting the model, so the prompt can ask for name/type consistency with existing entries.
- Recognized entity types: `person`, `place`, `project`, `technology`, `concept`, `organization` — anything else is discarded.
- Ignores a hardcoded list of low-value names (`hello`, `thanks`, `good morning`, etc.).
- Extracts JSON using a regex (`{...}`) applied to raw model output, so surrounding prose doesn't break parsing.
- The model is asked to return both entities and relationships in a single JSON response: `{ "entities": [...], "relationships": [...] }`.
- Entity upsert uses `ON CONFLICT(name, type) DO UPDATE` — preserves existing `notes` if the new extraction returns null, increments `mention_count`, updates `last_seen_at`.
- Relationship upsert uses `ON CONFLICT(from_id, to_id, label) DO UPDATE` — increments `mention_count`, preserves existing `notes` if new is null.
- Relationships are resolved by looking up both endpoints in the `entityMap` built during entity processing — if either entity wasn't saved (filtered out or invalid type), the relationship is silently dropped.
- After upsert, embeds each entity as `"${name} (${type}): ${notes}"` and stores to Qdrant with `projectId` in the payload for project-scoped filtering.
> For full details see `docs/services/entity-extraction.md` and `docs/services/knowledge-graph.md`.
## Knowledge Graph
`src/graph/index.js` provides two SQLite traversal functions:
- **`getNeighborhood(entityId, depth)`** — Single-entity recursive CTE traversal. Bidirectional (follows edges in both directions). Returns `{ nodes: [...entities], edges: [...relationships] }`. Depth defaults to `ENTITIES.GRAPH_HOP_DEPTH` (1), max enforced to 3 at the HTTP layer.
- **`getEntityNeighbors(entityIds[])`** — Bulk 1-hop version for orchestration. Given a set of seed entity IDs, returns their immediate neighbors plus all edges within the combined node set.
The recursive CTE uses `UNION` (not `UNION ALL`) to eliminate cycles and duplicate visits automatically.
> For full design rationale and usage see `docs/services/knowledge-graph.md`.
## Summarization Strategy
`src/summarization/project.js`:
- Preferred path: generate a project overview from existing **session-level summaries** (higher-level abstraction, shorter input).
- Fallback path: if no session summaries exist, summarize raw episodes directly (up to `SUMMARIES.MAX_PROJECT_EPISODE_LIMIT`).
- Both paths truncate input at `SUMMARIES.MAX_SUMMARY_CHARS` (8,000 chars) by slicing from the end (most recent content wins).
- Strips ChatML tokens from the Ollama response (`<|im_start|>`, `<|im_end|>`).
- Uses temp 0.2 and `num_predict 1200`.
## Qdrant Client
`src/semantic/index.js` creates the Qdrant client lazily on first use and reuses it. All three collections (`episodes`, `entities`, `summaries`) are created at startup if missing. There is no connection health check — if Qdrant is unreachable, semantic operations throw at call time.
## API Endpoints Quick Reference
| Method | Path | Notes |
|---|---|---|
| GET | `/health` | Static response, no dependency checks |
| GET/POST | `/sessions` | POST requires `externalId`; duplicate → 409 |
| GET/PATCH | `/sessions/by-external/:externalId` | PATCH accepts `name`, `projectId` |
| DELETE | `/sessions/by-external/:externalId` | Cascades to episodes, summaries, relationships |
| GET/POST | `/episodes` | POST triggers async embedding + entity/relationship extraction |
| GET | `/episodes/search` | FTS5 search; route must precede `/:id` |
| GET | `/sessions/:id/episodes` | Paginated, ordered `created_at DESC` |
| DELETE | `/episodes/:id` | Removes from SQLite + async Qdrant delete |
| POST | `/entities` | Upsert by `(name, type)`; increments `mention_count` on conflict |
| GET | `/entities/by-type/:type` | All entities of given type |
| GET/DELETE | `/entities/:id` | |
| POST | `/relationships` | Upsert by `(fromId, toId, label)`; increments `mention_count` on conflict. Body: `fromId`, `toId`, `label`, `notes` (optional) |
| GET | `/entities/:id/relationships` | Outbound only |
| DELETE | `/relationships` | Body: `fromId`, `toId`, `label` |
| GET | `/graph/neighborhood/:entityId` | Single-entity neighborhood; `?depth=` (default 1, max 3) |
| POST | `/graph/neighbors` | Bulk 1-hop neighborhood; body: `{ entityIds: [...] }` |
| GET/POST | `/projects` | POST requires non-empty `name` |
| GET/PATCH/DELETE | `/projects/:id` | |
| POST | `/projects/:id/summarize` | On-demand overview generation; 422 if no data |
| GET | `/projects/:id/overview` | Returns null (not 404) if no overview exists |
| GET | `/projects/:id/summaries` | All summaries for project |
| POST | `/summaries` | Requires `content` + at least one of `sessionId`/`projectId` |
| GET | `/sessions/:id/summaries` | |
| PATCH/DELETE | `/summaries/:id` | |

View File

@@ -1,6 +1,6 @@
const Database = require('better-sqlite3');
const schema = require('./schema');
const {getEnv, SQLITE } = require('@nexusai/shared');
const {getEnv, SQLITE, logger } = require('@nexusai/shared');
let db; // Declare db variable in a scope accessible to all functions
@@ -54,15 +54,20 @@ function getDB() {
db.exec(`CREATE INDEX IF NOT EXISTS idx_summaries_session ON summaries(session_id)`);
} catch {}
try {
db.exec(`CREATE INDEX IF NOT EXISTS idx_summaries_project ON summaries(project_id)`);
} catch {}
try { db.exec(`ALTER TABLE entities ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
try { db.exec(`ALTER TABLE entities ADD COLUMN confidence REAL NOT NULL DEFAULT 1.0`) } catch {}
try { db.exec(`ALTER TABLE entities ADD COLUMN source TEXT NOT NULL DEFAULT 'extraction'`) } catch {}
try { db.exec(`ALTER TABLE entities ADD COLUMN last_seen_at INTEGER`) } catch {}
try { db.exec(`ALTER TABLE relationships ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
try { db.exec(`ALTER TABLE relationships ADD COLUMN notes TEXT`) } catch {}
// Sync FTS index with any existing episodes data
db.exec(`INSERT OR REPLACE INTO episodes_fts(rowid, user_message, ai_response)
SELECT id, user_message, ai_response FROM episodes`);
console.log(`Connected to SQLite database at ${path}`);
logger.info(`Connected to SQLite database at ${path}`);
}
return db;
}

View File

@@ -38,6 +38,20 @@ const schema = `
UNIQUE(from_id, to_id, label)
);
CREATE INDEX IF NOT EXISTS idx_relationships_from ON relationships(from_id);
CREATE INDEX IF NOT EXISTS idx_relationships_to ON relationships(to_id);
CREATE TABLE IF NOT EXISTS entity_episodes (
entity_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
episode_id INTEGER NOT NULL REFERENCES episodes(id) ON DELETE CASCADE,
PRIMARY KEY (entity_id, episode_id)
);
CREATE INDEX IF NOT EXISTS idx_entity_episodes_entity ON entity_episodes(entity_id);
CREATE INDEX IF NOT EXISTS idx_entity_episodes_episode ON entity_episodes(episode_id);
CREATE TABLE IF NOT EXISTS projects (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,

View File

@@ -1,14 +1,18 @@
const semantic = require('../semantic')
const { getEnv, SERVICES, formatEpisodeText, ENTITIES } = require('@nexusai/shared');
const { upsertEntity } = require('./index');
const { getEnv, SERVICES, formatEpisodeText, ENTITIES, logger } = require('@nexusai/shared');
const { upsertEntity, upsertRelationship, linkEntityToEpisode } = require('./index');
const EXTRACTION_URL = getEnv('EXTRACTION_URL', 'http://localhost:11434');
const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b');
const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b'); // ChatML format — see buildExtractionPrompt
const EMBEDDING_SERVICE_URL = getEnv('EMBEDDING_SERVICE_URL', SERVICES.EMBEDDING_URL);
const ENTITY_TYPES = ['person', 'place', 'project', 'technology', 'concept', 'organization'];
const ENTITY_TYPES = ENTITIES.TYPES;
const IGNORED_NAMES = ['good morning', 'good night', 'hello', 'goodbye', 'thanks', 'thank you'];
// NOTE: This prompt uses ChatML format (<|im_start|> / <|im_end|> tags), which is
// specific to qwen-family models. If EXTRACTION_MODEL is changed to a Llama-family
// or other model, this format will need to change — most alternatives use either
// plain text or [INST] / <<SYS>> tags. Silent degradation is likely if mismatched.
function buildExtractionPrompt(userMessage, aiResponse, knownEntities = []) {
const knownBlock = knownEntities.length > 0
? [
@@ -20,21 +24,24 @@ function buildExtractionPrompt(userMessage, aiResponse, knownEntities = []) {
return [
'<|im_start|>system',
'You are a named entity extractor. You output only valid JSON.',
'You are a named entity and relationship extractor. You output only valid JSON.',
'<|im_end|>',
'<|im_start|>user',
'Read the conversation below and extract every named entity mentioned.',
`Entity types to extract: ${ENTITY_TYPES.join(', ')}`,
'For each entity found, provide: name, type, and a one-sentence notes field.',
'Return your answer as: { "entities": [ ... ] }',
'For each entity found, you MUST provide a non-empty notes field describing it based on the conversation.',
'For each entity found, provide:',
' "name": short proper noun only (max 4 words, e.g. "Sydney", "NexusAI", "Tim")',
'Read the conversation below and extract all named entities and the relationships between them.',
`Entity types: ${ENTITY_TYPES.join(', ')}`,
'Use "character" for any fictional, game, or media characters (e.g. characters from anime, games, books, TV shows, movies)',
'Use "person" only for real people',
'For each entity provide:',
' "name": short proper noun only (max 4 words)',
' "type": one of the valid types',
' "notes": one specific sentence about this entity based on the conversation (not generic)',
' "notes": one specific sentence about this entity based on the conversation',
'For relationships, use snake_case verb labels (e.g. works_on, manages, uses, knows, located_in, part_of, created_by).',
'Only include relationships between entities you have listed above.',
'Return this exact JSON structure:',
'{ "entities": [{"name": "...", "type": "...", "notes": "..."}], "relationships": [{"from": "...", "fromType": "...", "to": "...", "toType": "...", "label": "...", "notes": "..."}] }',
'',
knownBlock,
'--- CONVERSATION ---', // clear delimiter helps smaller models
'--- CONVERSATION ---',
`User: ${userMessage}`,
`Assistant: ${aiResponse}`,
'--- END CONVERSATION ---',
@@ -58,8 +65,8 @@ async function embedEntity(entity) {
return data.embedding;
}
async function extractAndStoreEntities(userMessage, aiResponse, projectId=null) {
console.log('[entities] Extraction triggered')
async function extractAndStoreEntities(userMessage, aiResponse, episodeId=null, projectId=null) {
logger.info('[entities] Extraction triggered')
try {
// Fetch existing entities to guide the model toward consistent name/type pairs
const db = require('../db').getDB();
@@ -80,6 +87,7 @@ async function extractAndStoreEntities(userMessage, aiResponse, projectId=null)
num_predict: ENTITIES.NUM_PREDICT,
},
}),
signal: AbortSignal.timeout(60_000),
});
if (!res.ok) throw new Error(`Ollama responded ${res.status}`);
@@ -87,27 +95,41 @@ async function extractAndStoreEntities(userMessage, aiResponse, projectId=null)
const data = await res.json();
const raw = data.response?.trim() ?? '';
const parsed = JSON.parse(raw);
const entities = Array.isArray(parsed.entities) ? parsed.entities : [];
if (entities.length === 0) {
console.log('[entities] No entities found in this exchange — skipping');
return; // not an error, just nothing to extract
const jsonMatch = raw.match(/\{[\s\S]*\}/);
if (!jsonMatch) {
logger.warn('[entities] No JSON object found in response');
logger.debug('[entities] Raw response was:', raw);
return;
}
if (!Array.isArray(entities)) throw new Error('Response was not a JSON array');
let parsed;
try {
parsed = JSON.parse(jsonMatch[0]);
} catch (err) {
logger.warn('[entities] Failed to parse extraction response:', err.message);
logger.debug('[entities] Raw response was:', raw);
return;
}
const entities = Array.isArray(parsed.entities) ? parsed.entities : [];
if (entities.length === 0) {
logger.debug('[entities] No entities found in this exchange — skipping');
return;
}
// Map of "name::type" → saved entity, used for relationship resolution below
const entityMap = new Map();
let saved = 0;
for (const { name, type, notes } of entities) {
if (!name || !type || !ENTITY_TYPES.includes(type)) continue;
if (IGNORED_NAMES.includes(name.toLowerCase())) continue;
const entity = upsertEntity(name, type, notes ?? null);
console.log('[entities] Upserted entity:', entity);
entityMap.set(`${name}::${type}`, entity);
logger.info('[entities] Upserted entity:', entity);
if (episodeId) linkEntityToEpisode(entity.id, episodeId);
// Embed and upsert to Qdrant fire-and-forget
embedEntity(entity)
.then(vector => semantic.upsertEntity(entity.id, vector, {
name: entity.name,
@@ -116,17 +138,34 @@ async function extractAndStoreEntities(userMessage, aiResponse, projectId=null)
projectId: projectId ?? null,
}))
.catch(err => {
console.warn(`[entities] Failed to embed entity "${entity.name}":`, err.message);
logger.warn(`[entities] Failed to embed entity "${entity.name}":`, err.message);
});
saved++;
}
if (saved > 0) console.log(`[entities] Extracted and stored ${saved} entities`);
if (saved > 0) logger.info(`[entities] Extracted and stored ${saved} entities`);
// Process extracted relationships — both entities must have been saved above
const relationships = Array.isArray(parsed.relationships) ? parsed.relationships : [];
let relSaved = 0;
for (const { from, fromType, to, toType, label, notes } of relationships) {
if (!from || !fromType || !to || !toType || !label) continue;
const fromEntity = entityMap.get(`${from}::${fromType}`);
const toEntity = entityMap.get(`${to}::${toType}`);
if (!fromEntity || !toEntity) continue;
upsertRelationship(fromEntity.id, toEntity.id, label, notes ?? null);
relSaved++;
}
if (relSaved > 0) logger.info(`[entities] Extracted and stored ${relSaved} relationships`);
} catch (err) {
// Non-critical — log and move on, episode is already saved
console.warn('[entities] Extraction failed:', err.message);
logger.warn('[entities] Extraction failed:', err.message);
}
}

View File

@@ -4,18 +4,23 @@ const { parseRow } = require ('@nexusai/shared')
/******* Entities ********/
// Upsert an entity - insert or update if (name, type) already exists
function upsertEntity(name, type, notes = null, metadata = null) {
function upsertEntity(name, type, notes = null, metadata = null, source = 'extraction') {
const db = getDB();
const stmt = db.prepare(`
INSERT INTO entities (name, type, notes, metadata)
VALUES (?, ?, ?, ?)
ON CONFLICT(name, type) DO UPDATE SET
notes = COALESCE(entities.notes, excluded.notes),
metadata = excluded.metadata,
updated_at = unixepoch()
`);
const result = stmt.run(name, type, notes, metadata ? JSON.stringify(metadata) : null);
const stmt = db.prepare(`
INSERT INTO entities (name, type, notes, metadata, source, last_seen_at)
VALUES (?, ?, ?, ?, ?, unixepoch())
ON CONFLICT(name, type) DO UPDATE SET
-- First extraction wins: notes are never overwritten once set.
-- Revisit during Memory Consolidation Lifecycle (Phase 2) — once entity
-- quality scoring exists, a higher-confidence extraction should be able
-- to replace stale notes rather than being silently dropped.
notes = COALESCE(entities.notes, excluded.notes),
metadata = excluded.metadata,
mention_count = entities.mention_count + 1,
last_seen_at = unixepoch(),
updated_at = unixepoch()
`);
stmt.run(name, type, notes, metadata ? JSON.stringify(metadata) : null, source);
return getEntityByNameType(name, type);
}
@@ -40,15 +45,17 @@ function deleteEntity(id) {
/********* Relationships *********/
// Upsert a relationship, insert or ignore if (from_id, to_id, label) already exists
function upsertRelationship(fromId, toId, label, metadata = null){
function upsertRelationship(fromId, toId, label, notes = null, metadata = null) {
const db = getDB();
const stmt = db.prepare(`
INSERT INTO relationships (from_id, to_id, label, metadata)
VALUES (?, ?, ?, ?)
ON CONFLICT(from_id, to_id, label) DO NOTHING
INSERT INTO relationships (from_id, to_id, label, notes, metadata)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT(from_id, to_id, label) DO UPDATE SET
mention_count = relationships.mention_count + 1,
-- First extraction wins for notes — same policy as entities.
notes = COALESCE(relationships.notes, excluded.notes)
`);
const result = stmt.run(fromId, toId, label, metadata ?JSON.stringify(metadata) : null);
stmt.run(fromId, toId, label, notes, metadata ? JSON.stringify(metadata) : null);
return getRelationship(fromId, toId, label);
}
@@ -69,7 +76,7 @@ function getEntityByNameType(name, type) {
}
// Retrive all relationships originating from a given entity
function getRelationshipsByEntity(entityId) {
function getOutboundRelationships(entityId) {
const db = getDB();
return db.prepare(`SELECT * FROM relationships WHERE from_id = ?`).all(entityId).map(parseRow);
}
@@ -81,14 +88,23 @@ function deleteRelationship(fromId, toId, label) {
db.prepare(`DELETE FROM relationships WHERE from_id = ? AND to_id = ? AND label = ?`).run(fromId, toId, label);
}
function linkEntityToEpisode(entityId, episodeId) {
const db = getDB();
db.prepare(`
INSERT OR IGNORE INTO entity_episodes (entity_id, episode_id)
VALUES (?, ?)
`).run(entityId, episodeId);
}
module.exports = {
upsertEntity,
getEntity,
getEntitiesByType,
getEntityByNameType,
deleteEntity,
linkEntityToEpisode,
upsertRelationship,
getRelationship,
getRelationshipsByEntity,
getOutboundRelationships,
deleteRelationship
}

View File

@@ -1,5 +1,5 @@
const {getDB} = require('../db');
const { EPISODIC, getEnv, SERVICES, parseRow, formatEpisodeText, SUMMARIES } = require('@nexusai/shared');
const { EPISODIC, getEnv, SERVICES, parseRow, formatEpisodeText, SUMMARIES, logger } = require('@nexusai/shared');
const semantic = require('../semantic');
const { extractAndStoreEntities } = require('../entities/extraction')
@@ -125,10 +125,10 @@ async function createEpisode(sessionId, userMessage, aiResponse, tokenCount = nu
sessionId: episode.session_id,
createdAt: episode.created_at
}))
.catch(err => console.error(`Failed to embed episode ${episode.id}:`, err.message));
.catch(err => logger.error(`Failed to embed episode ${episode.id}:`, err.message));
extractAndStoreEntities(userMessage, aiResponse, projectId)
.catch(err => console.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
extractAndStoreEntities(userMessage, aiResponse, episode.id, projectId)
.catch(err => logger.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
return episode;
@@ -163,22 +163,32 @@ function getRecentEpisodes(sessionId, limit = EPISODIC.DEFAULT_RECENT_LIMIT) {
ORDER BY created_at DESC
LIMIT ?
`);
return stmt.all(limit).map(parseRow);
return stmt.all(sessionId, limit).map(parseRow);
}
// Searches episodes using FTS5 full-text search, ordered by relevance, with a limit
function searchEpisodes(query, limit = EPISODIC.DEFAULT_SEARCH_LIMIT) {
// FTS5 full-text search across all episodes
function searchEpisodes(query, limit = EPISODIC.DEFAULT_SEARCH_LIMIT, sessionIds = null) {
const db = getDB();
const stmt = db.prepare(`
const safeQuery = `"${query.replace(/"/g, '""')}"`;
if (sessionIds && sessionIds.length > 0) {
const ph = sessionIds.map(() => '?').join(',');
return db.prepare(`
SELECT e.* FROM episodes e
JOIN episodes_fts fts ON e.id = fts.rowid
WHERE episodes_fts MATCH ?
AND e.session_id IN (${ph})
ORDER BY rank
LIMIT ?
`).all(safeQuery, ...sessionIds, limit).map(parseRow);
}
return db.prepare(`
SELECT e.* FROM episodes e
JOIN episodes_fts fts ON e.id = fts.rowid
WHERE episodes_fts MATCH ?
ORDER BY rank
LIMIT ?
`);
return stmt.all(query, limit).map(parseRow);
`).all(safeQuery, limit).map(parseRow);
}
// Deletes an episode by its ID
@@ -197,7 +207,8 @@ async function getEpisodeEmbedding(userMessage, aiResponse){
const res = await fetch(`${url}/embed`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ text })
body: JSON.stringify({ text }),
signal: AbortSignal.timeout(30_000),
})
if (!res.ok) {

View File

@@ -0,0 +1,77 @@
const { getDB } = require('../db');
const { parseRow, ENTITIES } = require('@nexusai/shared');
// Single-entity neighborhood via recursive CTE — bidirectional, configurable depth
function getNeighborhood(entityId, depth = ENTITIES.GRAPH_HOP_DEPTH) {
const db = getDB();
const nodeRows = db.prepare(`
WITH RECURSIVE traverse(entity_id, depth) AS (
SELECT ?, 0
UNION
SELECT
CASE WHEN r.from_id = t.entity_id THEN r.to_id ELSE r.from_id END,
t.depth + 1
FROM relationships r
JOIN traverse t ON (r.from_id = t.entity_id OR r.to_id = t.entity_id)
WHERE t.depth < ?
)
SELECT DISTINCT entity_id FROM traverse
`).all(entityId, depth);
const nodeIds = nodeRows.map(r => r.entity_id);
if (nodeIds.length === 0) return { nodes: [], edges: [] };
const ph = nodeIds.map(() => '?').join(',');
const nodes = db.prepare(
`SELECT * FROM entities WHERE id IN (${ph})`
).all(...nodeIds).map(parseRow);
const edges = db.prepare(
`SELECT * FROM relationships WHERE from_id IN (${ph}) AND to_id IN (${ph})`
).all(...nodeIds, ...nodeIds).map(parseRow);
return { nodes, edges };
}
// Bulk 1-hop neighborhood for orchestration — seeds are entity IDs from Qdrant search
function getEntityNeighbors(entityIds) {
if (!entityIds.length) return { nodes: [], edges: [] };
const db = getDB();
const ph = entityIds.map(() => '?').join(',');
// entityIds appears three times — once for the CASE (finding the neighbor),
// and once each for the FROM and TO sides of the WHERE clause
const neighborRows = db.prepare(`
SELECT DISTINCT
CASE WHEN from_id IN (${ph}) THEN to_id ELSE from_id END AS entity_id
FROM relationships
WHERE from_id IN (${ph}) OR to_id IN (${ph})
`).all(...entityIds, ...entityIds, ...entityIds);
const allIds = [...new Set([...entityIds, ...neighborRows.map(r => r.entity_id)])];
const allPh = allIds.map(() => '?').join(',');
const nodes = db.prepare(
`SELECT * FROM entities WHERE id IN (${allPh})`
).all(...allIds).map(parseRow);
const edges = db.prepare(
`SELECT * FROM relationships WHERE from_id IN (${allPh}) AND to_id IN (${allPh})`
).all(...allIds, ...allIds).map(parseRow);
return { nodes, edges };
}
// Returns episode IDs linked to any of the given entity IDs via entity_episodes
function getEpisodeIdsByEntities(entityIds) {
if (!entityIds.length) return [];
const db = getDB();
const ph = entityIds.map(() => '?').join(',');
return db.prepare(
`SELECT DISTINCT episode_id FROM entity_episodes WHERE entity_id IN (${ph})`
).all(...entityIds).map(r => r.episode_id);
}
module.exports = { getNeighborhood, getEntityNeighbors, getEpisodeIdsByEntities };

View File

@@ -1,17 +1,18 @@
require ('dotenv').config();
const express = require('express');
const {getEnv, PORTS, EPISODIC} = require('@nexusai/shared');
const {getEnv, PORTS, EPISODIC, logger} = require('@nexusai/shared');
const { getDB } = require('./db');
const { createProject, getProjects, getProject, updateProject, deleteProject } = require('./db/projects');
const { createSummary, getSummary, getSummariesBySession, getSummariesByProject, updateSummary, deleteSummary } = require('./db/summaries');
const { generateAndStoreProjectSummary } = require('./summarization/project');
const graph = require('./graph');
const episodic = require('./episodic');
const semantic = require('./semantic');
const entities = require('./entities');
const app = express();
app.use(express.json());
app.use(express.json({ limit: '2mb' }));
const PORT = getEnv('PORT', PORTS.MEMORY);
@@ -19,8 +20,8 @@ const PORT = getEnv('PORT', PORTS.MEMORY);
const db = getDB();
semantic.initCollections()
.then(() => console.log(`QDrant collections ready`))
.catch(err => console.error(`QDrant initialization error:`, err.message));
.then(() => logger.info(`QDrant collections ready`))
.catch(err => logger.error(`QDrant initialization error:`, err.message));
// Health check endpoint
app.get('/health', (req, res) => {
@@ -80,7 +81,7 @@ app.patch('/sessions/by-external/:externalId', (req, res) => {
const session = episodic.updateSessionByExternalId(req.params.externalId, {name, projectId });
res.json(session);
} catch (err) {
res.status(500).json({error: err.message });
res.status(500).json({ error: 'Failed to update session', detail: err.message });
}
});
@@ -130,10 +131,12 @@ app.get('/episodes', (req, res) => {
// Search MUST come before /:id — otherwise 'search' gets captured as an id
app.get('/episodes/search', (req, res) => {
const { q, limit = EPISODIC.DEFAULT_PAGE_SIZE } = req.query;
const { q, limit = EPISODIC.DEFAULT_PAGE_SIZE, sessionIds } = req.query;
if (!q) return res.status(400).json({ error: 'q (query) parameter is required' });
const results = episodic.searchEpisodes(q, Number(limit));
res.json(results);
const parsedSessionIds = sessionIds
? sessionIds.split(',').map(Number).filter(Boolean)
: null;
res.json(episodic.searchEpisodes(q, Number(limit), parsedSessionIds));
});
app.get('/episodes/:id', (req, res) => {
@@ -158,7 +161,7 @@ app.delete('/episodes/:id', (req, res) => {
episodic.deleteEpisode(id);
semantic.deleteEpisode(id) // fire-and-forget
.catch(err => console.error(`[Memory] Qdrant delete failed for episode ${id}:`, err.message));
.catch(err => logger.error(`[Memory] Qdrant delete failed for episode ${id}:`, err.message));
res.status(204).send();
});
@@ -202,17 +205,17 @@ app.delete('/entities/:id', (req, res) => {
// Upsert a relationship between two entities
app.post('/relationships', (req, res) => {
const {fromId, toId, label, metadata } = req.body;
const { fromId, toId, label, notes, metadata } = req.body;
if (!fromId || !toId || !label) {
return res.status(400).json({ error: 'fromId, toId and label are required' });
}
const relationship = entities.upsertRelationship(fromId, toId, label, metadata);
const relationship = entities.upsertRelationship(fromId, toId, label, notes, metadata);
res.status(201).json(relationship);
});
// Get all relationships for a given entity ID
app.get('/entities/:id/relationships', (req, res) => {
res.json(entities.getRelationshipsByEntity(req.params.id));
res.json(entities.getOutboundRelationships(req.params.id));
});
// Delete a specific relationship
@@ -225,6 +228,37 @@ app.delete('/relationships', (req, res) => {
res.status(204).send();
})
/********************************* */
/********** Graph Routes ********** */
/********************************* */
// Single-entity neighborhood — depth defaults to ENTITIES.GRAPH_HOP_DEPTH
app.get('/graph/neighborhood/:entityId', (req, res) => {
const entity = entities.getEntity(req.params.entityId);
if (!entity) return res.status(404).json({ error: 'Entity not found' });
const depth = req.query.depth ? Math.min(Number(req.query.depth), 3) : undefined;
const neighborhood = graph.getNeighborhood(Number(req.params.entityId), depth);
res.json({ entity, neighborhood });
});
// Bulk 1-hop neighborhood — body: { entityIds: [...] }
app.post('/graph/neighbors', (req, res) => {
const { entityIds } = req.body;
if (!Array.isArray(entityIds) || entityIds.length === 0) {
return res.status(400).json({ error: 'entityIds array is required' });
}
res.json(graph.getEntityNeighbors(entityIds.map(Number)));
});
app.post('/episodes/by-entities', (req, res) => {
const { entityIds } = req.body;
if (!Array.isArray(entityIds) || entityIds.length === 0) {
return res.status(400).json({ error: 'entityIds array is required' });
}
res.json({ episodeIds: graph.getEpisodeIdsByEntities(entityIds.map(Number)) });
});
/*********************************** */
/********** Project Routes ********** */
/*********************************** */
@@ -235,7 +269,7 @@ app.post('/projects', (req, res) => {
try {
res.status(201).json(createProject({ name: name.trim(), description, colour, icon }));
} catch (err) {
res.status(500).json({ error: err.message });
res.status(500).json({ error: 'Failed to create project', detail: err.message });
}
});
@@ -255,7 +289,7 @@ app.post('/projects/:id/summarize', async (req, res) => {
if (err.message.includes('No session summaries or episodes')) {
return res.status(422).json({ error: err.message });
}
res.status(500).json({ error: err.message });
res.status(500).json({ error: 'Failed to generate project summary', detail: err.message });
}
});
@@ -310,7 +344,7 @@ app.post('/summaries', (req, res) => {
const summary = createSummary({ sessionId, projectId, content, tokenCount, episodeRange, metadata });
res.status(201).json(summary);
} catch (err) {
res.status(500).json({ error: err.message });
res.status(500).json({ error: 'Failed to create summary', detail: err.message });
}
});
@@ -338,5 +372,5 @@ app.delete('/summaries/:id', (req, res) => {
/********** Start Server ********** */
/********************************** */
app.listen(PORT, () => {
console.log(`Memory Service is running on port ${PORT}`);
logger.info(`Memory Service is running on port ${PORT}`);
});

View File

@@ -1,5 +1,5 @@
const {QdrantClient} = require('@qdrant/js-client-rest');
const {QDRANT, COLLECTIONS, getEnv} = require('@nexusai/shared');
const {QDRANT, COLLECTIONS, getEnv, logger} = require('@nexusai/shared');
let client;
@@ -24,9 +24,9 @@ async function initCollections() {
distance: QDRANT.DISTANCE_METRIC
}
});
console.log(`Created Qdrant collection: ${name}`);
logger.info(`Created Qdrant collection: ${name}`);
} else {
console.log(`Qdrant collection already exists: ${name}`);
logger.info(`Qdrant collection already exists: ${name}`);
}
}
}

View File

@@ -0,0 +1,156 @@
# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and the end-to-end chat flow.
## Running This Service
```bash
npm run orchestration # From repo root (node src/index.js)
npm -w packages/orchestration-service run dev # With --watch
```
Default port: **4000**. Depends on memory-service, embedding-service, inference-service, and Qdrant.
## Context Assembly (`src/chat/index.js`)
`assembleContext(externalId, userMessage)` is the core function that builds the inference prompt. Order of operations:
1. Resolve session by `externalId` (creates it if missing — every chat call is self-healing).
2. If session has a `project_id`, load the project and fetch all sibling sessions (via `getProjectSessions`, hardcoded `limit=200`).
3. Fetch `recentEpisodeLimit` recent episodes from memory-service.
4. Embed the user message; search Qdrant EPISODES with `scoreThreshold`:
- No project: `must: [sessionId == this session]`
- Project: `should: [sessionId == s1, sessionId == s2, ...]` across all project sessions
- Dedup against recent episode IDs before including.
5. Run **fused episode retrieval** via `getFusedEpisodes` — Qdrant semantic search and FTS5 keyword search run in parallel, both filtered against `recentIds`, then merged via Reciprocal Rank Fusion (RRF). If `keywordWeight` is `0`, the FTS call is skipped. Returns top `semanticLimit` episodes by fused score.
6. Embed and search Qdrant ENTITIES (filtered by `projectId` if in a project). Returns entity IDs alongside payload — the Qdrant point ID equals the SQLite entity ID.
7. Expand matched entities into a 1-hop graph neighborhood via `POST /graph/neighbors` on the memory-service. Returns `{ nodes, edges }` — the full entity objects plus connecting relationships. Falls back to flat entity list (no edges) if the graph call fails.
8. Build prompt in this fixed order: **system prompt → graph context → fused episodes → recent episodes → user message → "Assistant:"**
The ordering prioritizes established facts (graph context) and relevant past context (semantic) over pure recency.
## Graph Context Format
`formatGraphContext(nodes, edges)` in `src/chat/index.js` formats the neighborhood as:
```
- Alice (person): software engineer working on NexusAI
→ works_on NexusAI (project)
→ knows Bob (person)
- NexusAI (project): AI assistant framework
- Bob (person): Alice's colleague
```
Each node shows its notes on the first line. Outbound edges are indented below with `→ label target (type)`. Nodes with only inbound edges (neighbors pulled in by traversal) appear without connection lines.
## System Prompt Resolution
Priority from highest to lowest:
1. `project.system_prompt` (stored on the project row in memory-service)
2. `settings.systemPrompt` (saved in `data/settings.json`)
3. `ORCHESTRATION.SYSTEM_PROMPT` (shared constants fallback)
## Settings (`src/config/settings.js`)
Settings are loaded from `data/settings.json` merged with defaults at every `GET /settings` call. `PATCH /settings` validates each field individually with specific constraints:
| Field | Constraint |
|---|---|
| `recentEpisodeLimit` | integer, 120 |
| `semanticLimit` | integer, 120 |
| `scoreThreshold` | number, 01 |
| `temperature` | number, 02 |
| `repeatPenalty` | number, 12 |
| `topP` | number, 01 |
| `topK` | integer, 1100 |
| `modelsFolderPath` | path must exist and be readable |
| `systemPrompt` | string (trimmed); `null` reverts to shared default |
`data/settings.json` is created on first save. Parent directories are created if missing.
## Streaming SSE (`src/chat/index.js` — `chatStream`)
The route sets SSE headers and delegates to `chatStream`, which:
1. Calls `inference.completeStream()` → receives a raw HTTP Response with a readable body.
2. Reads the body in chunks, buffers across chunk boundaries, splits on `\n\n`.
3. For each event line starting with `data: `, parses the JSON and calls `onChunk(data.response)`.
4. The `[DONE]` sentinel (used by some llama-server versions) is explicitly ignored.
5. After stream ends, saves the assembled full response as an episode (same as non-streaming).
If a chunk parse fails the error is logged and the stream continues. If the response body closes with no text accumulated, the episode is not saved (logged as warning).
## Fire-and-Forget Tasks
After every successful chat turn:
- **Summarization** (`services/summarization.js``triggerSummary`): checks token threshold → recency guard → calls Ollama → POSTs to memory-service. Only runs if `SUMMARIES.THRESHOLD_TOKENS` is exceeded AND at least `SUMMARIES.MIN_EPISODES_SINCE` new episodes have occurred since the last summary.
- **Auto-naming** (`chat/index.js``autoNameSession`): only fires on the first message of a session. Uses temp 0.3, `maxTokens=20`, prompts for a ≤5-word title.
Both tasks catch all errors and log warnings without surfacing to the client.
## Summarization Recency Guard
`src/services/summarization.js` reads the `episode_range` field of the latest existing summary (format: `"<startId>-<endId>"`). It counts SQLite episodes with `id > endId`; if fewer than `SUMMARIES.MIN_EPISODES_SINCE`, it skips. This prevents rapid re-summarization on high-traffic sessions.
When the existing summary's token count exceeds `SUMMARIES.MAX_SUMMARY_TOKENS`, it is treated as "expired" — a fresh summary is generated instead of an incremental update.
## Qdrant Calls (Direct, Not Via Memory-Service)
`src/services/qdrant.js` makes REST calls to Qdrant directly at `QDRANT_URL`. This bypasses memory-service for semantic search performance. Orchestration fetches episode/entity content from memory-service by ID *after* getting vector search results from Qdrant.
`searchEntities` checks `projectId !== null && projectId !== undefined` before applying the filter — a session with no project skips the filter entirely and searches globally.
## Retrieval Fusion (`src/chat/index.js`)
Three functions handle fusion — all pure or lightly async, all non-critical:
- **`getFTSResults(userMessage, { limit, sessionIds })`** — calls `memory.searchEpisodes`; returns `[]` and logs a warning on failure
- **`fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit })`** — pure RRF implementation. Key guard: FTS-only episodes are only added to the scores Map if `contrib > 0` (prevents score-0 bleed-through when `keywordWeight: 0`)
- **`getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, settings)`** — orchestrates both paths in `Promise.all`, applies `recentIds` filter to FTS results, calls fusion. Short-circuits FTS call entirely if `keywordWeight === 0`
FTS is scoped to `projectSessionIds` if in a project, otherwise `[session.id]` — mirrors Qdrant scoping exactly.
> For RRF formula, weight semantics, and enabling keyword search, see `docs/services/retrieval-fusion.md`.
## Graph Service Client (`src/services/graph.js`)
Thin HTTP client for memory-service graph endpoints. One function:
- **`getNeighbors(entityIds[])`** — POSTs to `memory-service/graph/neighbors` with the entity IDs from Qdrant entity search. Returns `{ nodes, edges }`. Throws on non-2xx — caller wraps in try/catch with graceful fallback.
## Models Endpoint
`GET /models` scans `modelsFolderPath` for `.gguf` files and optionally reads a `models.json` manifest (keyed by filename) for labels and descriptions. File size is reported in GB. Returns 500 if the folder is inaccessible.
`GET /models/props` proxies `/props` from llama-server and returns `{contextWindow, modelAlias}`. Returns 503 if llama-server is unreachable.
## Health Check
`GET /health/services` runs parallel fetch calls to all four dependent services with a 3-second `AbortSignal.timeout` each. Results are returned as an array — the endpoint never returns a non-2xx itself regardless of downstream status.
## Background Model (qwen2.5:3b)
Used for entity/relationship extraction and summarization via Ollama on Mini PC 1. Uses **ChatML format** (`<|im_start|>` / `<|im_end|>`) — not Phi3 format. Use `format: 'json'` only for structured extraction, never for free-text summarization.
## API Endpoints Quick Reference
| Method | Path | Notes |
|---|---|---|
| GET | `/health` | Returns service URLs |
| GET | `/health/services` | Parallel status of all dependencies |
| POST | `/chat` | Blocking completion |
| POST | `/chat/stream` | SSE streaming |
| GET/PATCH | `/settings` | Persistent settings |
| GET | `/models` | `.gguf` file scan |
| GET | `/models/props` | llama-server model info |
| GET | `/sessions` | Delegates to memory-service |
| GET | `/sessions/:sessionId/history` | Paginated episodes by external ID |
| PATCH | `/sessions/:sessionId` | `name` and/or `projectId` |
| DELETE | `/sessions/:sessionId` | |
| GET | `/episodes` | Delegates; supports `q` for FTS |
| DELETE | `/episodes/:id` | Delegates |
| GET/POST/PATCH/DELETE | `/projects` and `/projects/:id` | Delegates |
| POST | `/summaries/project/:projectId/generate` | On-demand; 422 if no data |
| GET | `/summaries/project/:projectId/overview` | |
| GET | `/summaries/session/:sessionId` | Resolves external ID first |
| GET | `/summaries/project/:projectId` | |

View File

@@ -2,34 +2,32 @@ const memory = require("../services/memory");
const inference = require("../services/inference");
const embedding = require("../services/embedding");
const qdrant = require("../services/qdrant");
const { ORCHESTRATION } = require("@nexusai/shared");
const { ORCHESTRATION, RETRIEVAL, logger } = require("@nexusai/shared");
const appSettings = require("../config/settings");
const {triggerSummary} = require('../services/summarization')
const graph = require('../services/graph');
function buildPrompt(recentEpisodes, semanticEpisodes, entities, userMessage, systemPrompt) {
function buildPrompt(guaranteed, selected, neighborhood, userMessage, systemPrompt) {
const parts = [systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT];
if (entities.length > 0) {
parts.push(
"Here is what you know about entities relevant to this conversation:",
);
for (const e of entities) {
parts.push(`- ${e.name} (${e.type}): ${e.notes}`);
const graphText = formatGraphContext(neighborhood.nodes ?? [], neighborhood.edges ?? []);
if (graphText) {
parts.push("Here is what you know about entities relevant to this conversation and their connections:");
parts.push(graphText);
parts.push("---");
}
parts.push("---");
}
if (semanticEpisodes.length > 0) {
parts.push("Here are some relevant memories from earlier conversations:");
for (const ep of semanticEpisodes) {
if (selected.length > 0) {
parts.push("Relevant memories from earlier conversations:");
for (const ep of selected) {
parts.push(`User: ${ep.user_message}\nAssistant: ${ep.ai_response}`);
}
parts.push("---");
}
if (recentEpisodes.length > 0) {
parts.push(`Here are some relevant memories from your past conversations:`);
for (const ep of recentEpisodes) {
if (guaranteed.length > 0) {
parts.push("Recent conversation history (most recent exchanges):");
for (const ep of guaranteed) {
parts.push(`User: ${ep.user_message}\nAssistant: ${ep.ai_response}`);
}
parts.push("--- End of recent memories ---\n");
@@ -54,6 +52,28 @@ function buildNamingPrompt(userMessage, aiResponse) {
].join("\n");
}
function formatGraphContext(nodes, edges) {
if (!nodes.length) return null;
const nodeMap = new Map(nodes.map(n => [n.id, n]));
// Build outbound adjacency
const outbound = new Map(nodes.map(n => [n.id, []]));
for (const edge of edges) {
if (outbound.has(edge.from_id) && nodeMap.has(edge.to_id)) {
const target = nodeMap.get(edge.to_id);
outbound.get(edge.from_id).push(`${edge.label} ${target.name} (${target.type})`);
}
}
return nodes.map(n => {
const lines = [`- ${n.name} (${n.type}): ${n.notes ?? '(no notes)'}`];
for (const conn of outbound.get(n.id) ?? []) lines.push(`${conn}`);
return lines.join('\n');
}).join('\n');
}
async function autoNameSession(externalId, userMessage, aiResponse) {
try {
const prompt = buildNamingPrompt(userMessage, aiResponse);
@@ -64,12 +84,12 @@ async function autoNameSession(externalId, userMessage, aiResponse) {
const name = result.text?.trim().replace(/^["']|["']$/g, ""); // strip any quotes the model adds
if (name) {
await memory.updateSession(externalId, { name });
console.log(
logger.info(
`[orchestration] Auto-named session "${externalId}": "${name}"`,
);
}
} catch (err) {
console.warn(
logger.warn(
"[orchestration] Auto-naming failed (non-critical):",
err.message,
);
@@ -99,7 +119,7 @@ async function getSemanticEpisodes(
);
return fetched.filter(Boolean);
} catch (err) {
console.warn(
logger.warn(
`[orchestration] Semantic search failed, continuing without: `,
err.message,
);
@@ -107,233 +127,287 @@ async function getSemanticEpisodes(
}
}
async function getRelevantEntities(userMessage, projectId=null) {
try {
const vector = await embedding.embed(userMessage);
const results = await qdrant.searchEntities(vector, { projectId });
console.log(
"[orchestration] Entity search results:",
results.map((r) => ({ name: r.payload?.name, score: r.score })),
);
return results.map((r) => r.payload).filter(Boolean);
} catch (err) {
console.warn(
"[orchestration] Entity search failed, continuing without:",
err.message,
);
return [];
}
}
async function chat(externalId, userMessage, options = {}) {
const { recentEpisodeLimit, semanticLimit, scoreThreshold, temperature, repeatPenalty, topP, topK, systemPrompt} =
appSettings.load();
// 1. Resolve or create session
let session = await memory.getSessionByExternalId(externalId);
if (!session) session = await memory.createSession(externalId);
let projectSessionIds = null;
let activeSystemPrompt = systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT;
if (session.project_id) {
async function getRelevantEntities(userMessage, projectId = null) {
try {
const project = await memory.getProject(session.project_id);
if (project) {
const projectSessions = await memory.getProjectSessions(session.project_id);
if (project?.system_prompt) activeSystemPrompt = project.system_prompt;
projectSessionIds = projectSessions.map((s) => s.id);
}
const vector = await embedding.embed(userMessage);
const results = await qdrant.searchEntities(vector, { projectId });
logger.info(
'[orchestration] Entity search results:',
results.map((r) => ({ name: r.payload?.name, score: r.score })),
);
// Include the Qdrant point ID (== SQLite entity ID) for graph traversal
return results.map((r) => r.payload ? { id: r.id, ...r.payload } : null).filter(Boolean);
} catch (err) {
console.warn(
"[orchestration] Failed to resolve project context:",
err.message,
);
logger.debug('[orchestration] Entity search failed, continuing without:', err.message);
return [];
}
}
// 2. Fetch recent episodes for context
const recentEpisodes = await memory.getRecentEpisodes(
session.id,
recentEpisodeLimit,
);
const isFirstMessage = recentEpisodes.length === 0;
const recentIds = new Set(recentEpisodes.map((e) => e.id));
// 3. Semantic Search
const semanticEpisodes = await getSemanticEpisodes(
userMessage,
session.id,
recentIds,
projectSessionIds,
{ semanticLimit, scoreThreshold },
);
// 3b. Entity Search
const entities = await getRelevantEntities(userMessage, session.project_id ?? null);
// 4. Assemble prompt
const prompt = buildPrompt(
recentEpisodes,
semanticEpisodes,
entities,
userMessage,
activeSystemPrompt,
);
// 5. Run inference
const result = await inference.complete(prompt, {...options, temperature, repeatPenalty, topP, topK});
// 6. Write episode back to memory
try {
await memory.createEpisode(
session.id, userMessage, result.text,
(result.evalCount || 0) + (result.promptEvalCount || 0),
session.project_id ?? null,
);
} catch (err) {
console.error('[orchestration] Failed to save episode:', err.message);
}
const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
triggerSummary(session, allEpisodes);
// 8. Auto-name on first message
if (isFirstMessage && !session.name) {
autoNameSession(externalId, userMessage, result.text).catch(() => {}); // already logged inside autoNameSession
}
// 9. Return response
return {
sessionId: externalId,
response: result.text,
model: result.model,
tokenCount: (result.evalCount || 0) + (result.promptEvalCount || 0),
};
}
async function chatStream(externalId, userMessage, onChunk, options = {}) {
async function getFTSResults(userMessage, { limit, sessionIds }) {
try {
return await memory.searchEpisodes(userMessage, { limit, sessionIds });
} catch (err) {
logger.warn('[orchestration] FTS search failed, continuing without:', err.message);
return [];
}
}
try {
const { recentEpisodeLimit, semanticLimit, scoreThreshold, temperature, repeatPenalty, topP, topK, systemPrompt } = appSettings.load();
// Returns {episode, score}[] — scores needed for buildScoredPool downstream
function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
const k = RETRIEVAL.RRF_K;
const scores = new Map();
semanticEps.forEach((ep, i) => {
scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
});
keywordEps.forEach((ep, i) => {
const contrib = keywordWeight / (k + i + 1);
if (scores.has(ep.id)) {
scores.get(ep.id).score += contrib;
} else if (contrib > 0) {
scores.set(ep.id, { episode: ep, score: contrib });
}
});
return [...scores.values()]
.sort((a, b) => b.score - a.score)
.slice(0, limit);
}
function estimateTokens(episode) {
return episode.token_count
?? Math.ceil((episode.user_message.length + episode.ai_response.length) / 4);
}
function buildScoredPool(fusedWithScores, recentEpisodes, entityBoostedIds, { entityWeight }) {
const k = RETRIEVAL.RRF_K;
const pool = new Map(); // episode.id → {episode, score}
for (const { episode, score } of fusedWithScores) {
pool.set(episode.id, { episode, score });
}
recentEpisodes.forEach((ep, i) => {
const recencyScore = 1.0 / (k + i + 1);
if (pool.has(ep.id)) {
pool.get(ep.id).score += recencyScore;
} else {
pool.set(ep.id, { episode: ep, score: recencyScore });
}
});
for (const id of entityBoostedIds) {
if (pool.has(id)) pool.get(id).score += entityWeight;
}
return [...pool.values()].sort((a, b) => b.score - a.score);
}
function selectWithinBudget(scoredPool, contextBudget, minRecentEpisodes, recentEpisodes) {
let budget = contextBudget;
const sortByTime = (a, b) => a.created_at - b.created_at;
// Guarantee floor: always include the N most recent episodes
const guaranteed = recentEpisodes.slice(0, minRecentEpisodes);
const guaranteedIds = new Set(guaranteed.map(ep => ep.id));
for (const ep of guaranteed) budget -= estimateTokens(ep);
// Fill remaining budget from scored pool, highest-priority first
const selected = [];
for (const { episode } of scoredPool) {
if (guaranteedIds.has(episode.id)) continue;
const cost = estimateTokens(episode);
// // Break rather than skip — lower-priority episodes aren't worth fitting over higher-priority ones
if (budget - cost < 0) break;
selected.push(episode);
budget -= cost;
}
return {
guaranteed: [...guaranteed].sort(sortByTime),
selected: selected.sort(sortByTime),
};
}
async function getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, settings) {
const { semanticLimit, scoreThreshold, semanticWeight, keywordWeight } = settings;
const ftsSessionIds = projectSessionIds ?? [session.id];
const ftsPromise = keywordWeight > 0
// FTS and semantic may have significant overlap, so fetching more from FTS gives the fusion step more to work with before deduplication.
? getFTSResults(userMessage, { limit: semanticLimit * 2, sessionIds: ftsSessionIds })
: Promise.resolve([]);
const [semanticEps, rawKeywordEps] = await Promise.all([
getSemanticEpisodes(userMessage, session.id, recentIds, projectSessionIds, { semanticLimit, scoreThreshold }),
ftsPromise,
]);
const keywordEps = rawKeywordEps.filter(ep => !recentIds.has(ep.id));
return fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit: semanticLimit });
}
async function assembleContext(externalId, userMessage) {
const settings = appSettings.load();
const { recentEpisodeLimit, semanticLimit, scoreThreshold,
temperature, repeatPenalty, topP, topK, systemPrompt,
semanticWeight, keywordWeight,
contextBudget, entityWeight, minRecentEpisodes } = settings;
// 1. Resolve or create session
let session = await memory.getSessionByExternalId(externalId);
if (!session) session = await memory.createSession(externalId);
// 2. Resolve project context
let projectSessionIds = null;
let activeSystemPrompt = systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT;
if (session.project_id) {
try {
const project = await memory.getProject(session.project_id);
if (project) {
const projectSessions = await memory.getProjectSessions(
session.project_id,
);
projectSessionIds = projectSessions.map((s) => s.id);
if (project?.system_prompt) activeSystemPrompt = project.system_prompt;
}
} catch (err) {
console.warn(
"[orchestration] Failed to resolve project context:",
err.message,
);
}
}
const recentEpisodes = await memory.getRecentEpisodes(
session.id,
recentEpisodeLimit,
);
const isFirstMessage = recentEpisodes.length === 0;
const recentIds = new Set(recentEpisodes.map((e) => e.id));
const semanticEpisodes = await getSemanticEpisodes(
userMessage,
session.id,
recentIds,
projectSessionIds,
{semanticLimit, scoreThreshold }
);
const entities = await getRelevantEntities(userMessage, session.project_id ?? null);
const prompt = buildPrompt(
recentEpisodes,
semanticEpisodes,
entities,
userMessage,
activeSystemPrompt,
);
const res = await inference.completeStream(prompt, {...options, temperature, repeatPenalty, topP, topK});
let fullText = "";
let model = "";
let tokenCount = 0;
let buffer = "";
for await (const chunk of res.body) {
buffer += Buffer.from(chunk).toString("utf8");
const events = buffer.split("\n\n");
buffer = events.pop() || "";
for (const event of events) {
const lines = event.split("\n");
const dataLines = lines
.filter((line) => line.startsWith("data: "))
.map((line) => line.slice(6));
if (dataLines.length === 0) continue;
const raw = dataLines.join("\n").trim();
if (raw === "[DONE]") continue;
try {
const data = JSON.parse(raw);
if (data.response) {
fullText += data.response;
onChunk(data.response);
}
if (data.model) model = data.model;
if (data.done && data.tokenCount !== undefined) {
tokenCount = data.tokenCount;
}
if (data.error) {
throw new Error(data.error);
}
const project = await memory.getProject(session.project_id);
if (project) {
const projectSessions = await memory.getProjectSessions(session.project_id);
if (project.system_prompt) activeSystemPrompt = project.system_prompt;
projectSessionIds = projectSessions.map(s => s.id);
}
} catch (err) {
console.error(
"[orchestration] Failed to parse inference SSE event:",
raw,
err.message,
);
logger.warn('[orchestration] Failed to resolve project context:', err.message);
}
}
}
if (fullText.trim()) {
await memory.createEpisode(session.id, userMessage, fullText, tokenCount, session.project_id ?? null);
const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
triggerSummary(session, allEpisodes);
} else {
console.warn(
"[orchestration] Stream finished with no assistant text; episode not saved",
);
// 3. Fetch recent episodes
const recentEpisodes = await memory.getRecentEpisodes(session.id, recentEpisodeLimit);
const isFirstMessage = recentEpisodes.length === 0;
const recentIds = new Set(recentEpisodes.map(e => e.id));
// 4. Fused retrieval + entity search in parallel (both are independent)
const [fusedWithScores, entityResults] = await Promise.all([
getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, { semanticLimit, scoreThreshold, semanticWeight, keywordWeight }),
getRelevantEntities(userMessage, session.project_id ?? null),
]);
// 5. Entity-linked episode IDs for scoring bonus
const entityIds = entityResults.map(e => e.id);
let entityBoostedIds = new Set();
if (entityIds.length > 0) {
try {
const result = await memory.getEpisodesByEntities(entityIds);
entityBoostedIds = new Set(result.episodeIds);
} catch (err) {
logger.debug('[orchestration] Entity-episode lookup failed, skipping bonus:', err.message);
}
}
// 6. Build unified scored pool and select within token budget
const scoredPool = buildScoredPool(fusedWithScores, recentEpisodes, entityBoostedIds, { entityWeight });
const { guaranteed, selected } = selectWithinBudget(scoredPool, contextBudget, minRecentEpisodes, recentEpisodes);
// 7. Graph neighborhood expansion
let neighborhood = { nodes: [], edges: [] };
if (entityIds.length > 0) {
try {
neighborhood = await graph.getNeighbors(entityIds);
} catch (err) {
logger.warn('[orchestration] Graph neighborhood fetch failed, falling back to flat entities:', err.message);
neighborhood = { nodes: entityResults, edges: [] };
}
}
// 8. Assemble prompt
const prompt = buildPrompt(guaranteed, selected, neighborhood, userMessage, activeSystemPrompt);
return {
session,
prompt,
isFirstMessage,
inferenceOptions: { temperature, repeatPenalty, topP, topK },
};
}
async function chat(externalId, userMessage, options = {}) {
const { session, prompt, isFirstMessage, inferenceOptions } = await assembleContext(externalId, userMessage);
const result = await inference.complete(prompt, { ...options, ...inferenceOptions });
try {
await memory.createEpisode(
session.id, userMessage, result.text,
(result.evalCount || 0) + (result.promptEvalCount || 0),
session.project_id ?? null,
);
} catch (err) {
logger.error('[orchestration] Failed to save episode:', err.message);
}
const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
triggerSummary(session, allEpisodes);
if (isFirstMessage && !session.name) {
autoNameSession(externalId, userMessage, fullText).catch(() => {});
autoNameSession(externalId, userMessage, result.text).catch(() => {});
}
return { model, tokenCount };
} catch (err) {
console.error(
"[orchestration] chatStream fatal error:",
err.message,
err.stack,
);
throw err;
}
return {
sessionId: externalId,
response: result.text,
model: result.model,
tokenCount: (result.evalCount || 0) + (result.promptEvalCount || 0),
};
}
async function chatStream(externalId, userMessage, onChunk, options = {}) {
try {
const { session, prompt, isFirstMessage, inferenceOptions } = await assembleContext(externalId, userMessage);
const res = await inference.completeStream(prompt, { ...options, ...inferenceOptions });
let fullText = '', model = '', tokenCount = 0, buffer = '';
for await (const chunk of res.body) {
buffer += Buffer.from(chunk).toString('utf8');
const events = buffer.split('\n\n');
buffer = events.pop() || '';
for (const event of events) {
const dataLines = event.split('\n')
.filter(line => line.startsWith('data: '))
.map(line => line.slice(6));
if (!dataLines.length) continue;
const raw = dataLines.join('\n').trim();
if (raw === '[DONE]') continue;
try {
const data = JSON.parse(raw);
if (data.response) { fullText += data.response; onChunk(data.response); }
if (data.model) model = data.model;
if (data.done && data.tokenCount !== undefined) tokenCount = data.tokenCount;
if (data.error) throw new Error(data.error);
} catch (err) {
logger.error('[orchestration] Failed to parse SSE event:', raw, err.message);
}
}
}
if (fullText.trim()) {
await memory.createEpisode(session.id, userMessage, fullText, tokenCount, session.project_id ?? null);
const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
triggerSummary(session, allEpisodes);
} else {
logger.warn('[orchestration] Stream finished with no assistant text; episode not saved');
}
if (isFirstMessage && !session.name) {
autoNameSession(externalId, userMessage, fullText).catch(() => {});
}
return { model, tokenCount };
} catch (err) {
logger.error('[orchestration] chatStream fatal error:', err.message, err.stack);
throw err;
}
}
module.exports = { chat, chatStream };

View File

@@ -1,6 +1,6 @@
const fs = require('fs');
const path = require('path');
const { getEnv, ORCHESTRATION, INFERENCE_DEFAULTS } = require('@nexusai/shared');
const { getEnv, ORCHESTRATION, INFERENCE_DEFAULTS, RETRIEVAL } = require('@nexusai/shared');
const SETTINGS_PATH = path.join(__dirname, '../../data/settings.json');
@@ -14,6 +14,11 @@ const DEFAULTS = {
topP: INFERENCE_DEFAULTS.TOP_P,
topK: INFERENCE_DEFAULTS.TOP_K,
systemPrompt: ORCHESTRATION.SYSTEM_PROMPT,
semanticWeight: RETRIEVAL.SEMANTIC_WEIGHT,
keywordWeight: RETRIEVAL.KEYWORD_WEIGHT,
contextBudget: ORCHESTRATION.CONTEXT_BUDGET,
entityWeight: ORCHESTRATION.ENTITY_WEIGHT,
minRecentEpisodes: ORCHESTRATION.MIN_RECENT_EPISODES,
};
function load() {

View File

@@ -15,7 +15,7 @@ const summariesRouter = require('./routes/summaries')
const cors = require('cors');
const app = express();
app.use(express.json());
app.use(express.json({ limit: '2mb' }));
app.use(cors({
origin: [

View File

@@ -1,6 +1,8 @@
const { Router } = require('express')
const { chat, chatStream } = require('../chat/index');
const memory = require('../services/memory')
const logger = require('@nexusai/shared');
const router = Router();
@@ -17,8 +19,8 @@ router.post('/', async (req, res) => {
});
res.json(result)
} catch (err) {
console.error(`[orchestration] chat error: `, err.message)
res.status(500).json ({ error: err.message})
logger.error(`[orchestration] chat error: `, err.message)
res.status(500).json ({ error: 'Chat failed', detail: err.message })
}
});

View File

@@ -9,7 +9,7 @@ router.get('/', async (req, res) => {
const result = await memory.getEpisodes({ limit, offset, sessionId, q });
res.json(result);
} catch (err) {
res.status(500).json({ error: err.message });
res.status(500).json({ error: 'Failed to fetch episodes', detail: err.message });
}
});
@@ -18,7 +18,7 @@ router.delete('/:id', async (req, res) => {
await memory.deleteEpisode(req.params.id);
res.status(204).send();
} catch (err) {
res.status(500).json({ error: err.message });
res.status(500).json({ error: 'Failed to delete episode', detail: err.message });
}
});

View File

@@ -4,7 +4,7 @@ const fs = require('fs');
const path = require('path');
const appSettings = require('../config/settings');
const { getEnv, LLAMACPP } = require('@nexusai/shared');
const { getEnv, LLAMACPP, logger } = require('@nexusai/shared');
const LLAMA_URL = getEnv('LLAMA_SERVER_URL', LLAMACPP.DEFAULT_URL);
router.get('/', (req, res) => {
@@ -38,7 +38,7 @@ router.get('/', (req, res) => {
res.json(models);
} catch (err) {
console.error('[models] Failed to scan folder:', err.message);
logger.error('[models] Failed to scan folder:', err.message);
res.status(500).json({ error: `Could not read models folder: ${modelsFolderPath}` });
}
});
@@ -53,7 +53,7 @@ router.get('/props', async (req, res) => {
modelAlias: data.model_alias,
});
} catch (err) {
console.error('[models/props]', err.message);
logger.error('[models/props]', err.message);
res.status(503).json({ error: 'Could not reach llama-server' });
}
});

View File

@@ -7,7 +7,7 @@ router.get('/', async (req, res) => {
try {
res.json(await memory.getProjects());
} catch (err) {
res.status(500).json({ error: err.message });
res.status(500).json({ error: 'Failed to fetch projects', detail: err.message });
}
});
@@ -17,7 +17,7 @@ router.post('/', async (req, res) => {
try {
res.status(201).json(await memory.createProject({ name: name.trim(), description, colour, icon, isolated }));
} catch (err) {
res.status(500).json({ error: err.message });
res.status(500).json({ error: 'Failed to create project', detail: err.message });
}
});
@@ -25,7 +25,7 @@ router.patch('/:id', async (req, res) => {
try {
res.json(await memory.updateProject(req.params.id, req.body));
} catch (err) {
res.status(500).json({ error: err.message });
res.status(500).json({ error: 'Failed to update project', detail: err.message });
}
});
@@ -34,7 +34,7 @@ router.delete('/:id', async (req, res) => {
await memory.deleteProject(req.params.id);
res.status(204).send();
} catch (err) {
res.status(500).json({ error: err.message });
res.status(500).json({ error: 'Failed to delete project', detail: err.message });
}
});

View File

@@ -15,7 +15,7 @@ router.get('/:sessionId/history', async (req, res) => {
const history = await memory.getSessionHistory(session.id, Number(limit), Number(offset));
res.json({ sessionId, episodes: history });
} catch (err) {
res.status(500).json({ error: err.message });
res.status(500).json({ error: 'Failed to fetch session history', detail: err.message });
}
});
@@ -26,7 +26,7 @@ router.get('/', async (req, res) => {
const sessions = await memory.getSessions(Number(limit), Number(offset), parsedProjectId);
res.json(sessions);
} catch (err) {
res.status(500).json({ error: err.message });
res.status(500).json({ error: 'Failed to fetch sessions', detail: err.message });
}
});
@@ -45,7 +45,7 @@ router.patch('/:sessionId', async (req, res) => {
});
res.json(session);
} catch (err) {
res.status(500).json({ error: err.message });
res.status(500).json({ error: 'Failed to update session', detail: err.message });
}
});
@@ -54,7 +54,7 @@ router.delete('/:sessionId', async (req, res) => {
await memory.deleteSession(req.params.sessionId);
res.status(204).send();
} catch (err) {
res.status(500).json({ error: err.message });
res.status(500).json({ error: 'Failed to delete session', detail: err.message });
}
});

View File

@@ -80,6 +80,41 @@ if (req.body.systemPrompt !== undefined) {
updates.systemPrompt = val.trim() || null; // null reverts to default
}
if (req.body.semanticWeight !== undefined) {
const val = Number(req.body.semanticWeight);
if (isNaN(val) || val < 0 || val > 5)
return res.status(400).json({ error: 'semanticWeight must be 05' });
updates.semanticWeight = val;
}
if (req.body.keywordWeight !== undefined) {
const val = Number(req.body.keywordWeight);
if (isNaN(val) || val < 0 || val > 5)
return res.status(400).json({ error: 'keywordWeight must be 05' });
updates.keywordWeight = val;
}
if (req.body.contextBudget !== undefined) {
const val = Number(req.body.contextBudget);
if (!Number.isInteger(val) || val < 512 || val > 32768)
return res.status(400).json({ error: 'contextBudget must be 51232768' });
updates.contextBudget = val;
}
if (req.body.entityWeight !== undefined) {
const val = Number(req.body.entityWeight);
if (isNaN(val) || val < 0 || val > 2)
return res.status(400).json({ error: 'entityWeight must be 02' });
updates.entityWeight = val;
}
if (req.body.minRecentEpisodes !== undefined) {
const val = Number(req.body.minRecentEpisodes);
if (!Number.isInteger(val) || val < 0 || val > 10)
return res.status(400).json({ error: 'minRecentEpisodes must be 010' });
updates.minRecentEpisodes = val;
}
res.json(settings.save(updates));
});

View File

@@ -21,7 +21,7 @@ router.get('/project/:projectId/overview', async (req, res) => {
const summary = await memory.getProjectOverviewSummary(req.params.projectId);
res.json(summary);
} catch (err) {
res.status(500).json({ error: err.message });
res.status(500).json({ error: 'Failed to fetch project overview summary', detail: err.message });
}
});
@@ -32,7 +32,7 @@ router.get('/session/:sessionId', async (req, res) => {
const summaries = await memory.getSummariesBySession(session.id);
res.json(summaries);
} catch (err) {
res.status(500).json({ error: err.message });
res.status(500).json({ error: 'Failed to fetch session summaries', detail: err.message });
}
});
@@ -41,7 +41,7 @@ router.get('/project/:projectId', async (req, res) => {
const summaries = await memory.getSummariesByProject(req.params.projectId);
res.json(summaries);
} catch (err) {
res.status(500).json({ error: err.message });
res.status(500).json({ error: 'Failed to fetch project summaries', detail: err.message });
}
});

View File

@@ -0,0 +1,15 @@
const { getEnv, SERVICES } = require('@nexusai/shared');
const MEMORY_URL = getEnv('MEMORY_SERVICE_URL', SERVICES.MEMORY_URL);
async function getNeighbors(entityIds) {
const res = await fetch(`${MEMORY_URL}/graph/neighbors`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ entityIds }),
});
if (!res.ok) throw new Error(`Graph neighbors error: ${res.status}`);
return res.json();
}
module.exports = { getNeighbors };

View File

@@ -196,6 +196,26 @@ async function getProjectOverviewSummary(projectId) {
return res.json(); // null if none exists yet
}
async function searchEpisodes(query, { limit = 10, sessionIds = null } = {}) {
const url = new URL(`${BASE_URL}/episodes/search`);
url.searchParams.set('q', query);
url.searchParams.set('limit', limit);
if (sessionIds?.length) url.searchParams.set('sessionIds', sessionIds.join(','));
const res = await fetch(url.toString());
if (!res.ok) throw new Error(`FTS search error: ${res.status}`);
return res.json();
}
async function getEpisodesByEntities(entityIds) {
const res = await fetch(`${BASE_URL}/episodes/by-entities`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ entityIds }),
});
if (!res.ok) throw new Error(`Episodes-by-entities error: ${res.status}`);
return res.json(); // { episodeIds: [...] }
}
module.exports = {
getSessionByExternalId,
createSession,
@@ -220,4 +240,6 @@ module.exports = {
getSummariesByProject,
generateProjectSummary,
getProjectOverviewSummary,
searchEpisodes,
getEpisodesByEntities,
}

View File

@@ -1,4 +1,4 @@
const { getEnv, SERVICES, SUMMARIES } = require('@nexusai/shared');
const { getEnv, SERVICES, SUMMARIES, logger } = require('@nexusai/shared');
const EXTRACTION_URL = getEnv('EXTRACTION_URL', 'http://localhost:11434');
const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b');
@@ -104,7 +104,7 @@ async function maybeSummarize(session, allEpisodes) {
const totalEpisodeTokens = allEpisodes.reduce((sum, ep) => sum + (ep.token_count || 0), 0);
// add temporarily before the generateSummary call
console.log('[summarization] episodes to summarize:', episodesToSummarize.length);
logger.debug('[summarization] episodes to summarize:', episodesToSummarize.length);
const content = await generateSummary(
episodesToSummarize,
@@ -126,7 +126,7 @@ async function maybeSummarize(session, allEpisodes) {
episodeRange,
}),
});
console.log(`[summarization] Created new summary for session ${session.id}`);
logger.debug(`[summarization] Created new summary for session ${session.id}`);
} else {
await fetch(`${MEMORY_URL}/summaries/${latest.id}`, {
method: 'PATCH',
@@ -137,14 +137,14 @@ async function maybeSummarize(session, allEpisodes) {
episodeRange,
}),
});
console.log(`[summarization] Updated summary ${latest.id} for session ${session.id}`);
logger.debug(`[summarization] Updated summary ${latest.id} for session ${session.id}`);
}
}
async function triggerSummary(session, allEpisodes) {
// Intentionally fire-and-forget — caller doesn't await this
maybeSummarize(session, allEpisodes).catch(err =>
console.warn('[summarization] Summary failed (non-critical):', err.message)
logger.warn('[summarization] Summary failed (non-critical):', err.message)
);
}

View File

@@ -28,6 +28,9 @@ const ORCHESTRATION = {
ENTITIES_LIMIT: 5,
ENTITIES_THRESHOLD: 0.55,
TEMPERATURE: 0.7,
CONTEXT_BUDGET: 4096,
ENTITY_WEIGHT: 0.5,
MIN_RECENT_EPISODES: 2,
CORS_ORIGIN: 'http://localhost:5173',
SYSTEM_PROMPT: `You are a helpful, context-aware AI assistant. You have access to memories of past conversations with the user. Use them to provide consistent, personalised responses.`
}
@@ -79,8 +82,27 @@ const SUMMARIES = {
const ENTITIES = {
TEMPERATURE: 0.1, // Low temperature, more precise extraction, less creative
NUM_PREDICT: 1024, // Max tokens to consider for entity extraction (e.g. recent conversation)
NUM_PREDICT: 1500, // Max tokens to consider for entity extraction (e.g. recent conversation)
THRESHOLD: 0.55, // Minimum confidence score for an extracted entity to be included in the results
PROMOTION_THRESHOLD: 3, // mention_count threshold before entity is considered well-established
GRAPH_HOP_DEPTH: 1, // Default traversal depth for neighborhood queries
TYPES: [
'person',
'place',
'project',
'technology',
'concept',
'organization',
'character',
'event',
'topic'
],
}
const RETRIEVAL = {
RRF_K: 60, // Reciprocal Rank Fusion smoothing constant, softens rank-1 advantage, not exposed in settings
SEMANTIC_WEIGHT: 1.0, // Weight applied to semantic (QDrant) results
KEYWORD_WEIGHT: 0, // Weight applied to keyword (SQLite) results, 0 = disables, set >0 to enable and tune balance between semantic vs keyword matches
}
module.exports = {
@@ -95,5 +117,6 @@ module.exports = {
SQLITE,
ORCHESTRATION,
SUMMARIES,
ENTITIES
ENTITIES,
RETRIEVAL,
};

View File

@@ -1,5 +1,5 @@
const {getEnv} = require('./config/env');
const {QDRANT, COLLECTIONS, EPISODIC, SERVICES, OLLAMA, PORTS, LLAMACPP, INFERENCE_DEFAULTS, SQLITE, ORCHESTRATION, SUMMARIES, ENTITIES } = require('./config/constants');
const {QDRANT, COLLECTIONS, EPISODIC, SERVICES, OLLAMA, PORTS, LLAMACPP, INFERENCE_DEFAULTS, SQLITE, ORCHESTRATION, SUMMARIES, ENTITIES, RETRIEVAL } = require('./config/constants');
const {parseRow, formatEpisodeText} = require('./utils')
const logger = require('./utils/logger');
@@ -20,4 +20,5 @@ module.exports = {
SUMMARIES,
ENTITIES,
logger,
RETRIEVAL,
};

67
test-fusion.js Normal file
View File

@@ -0,0 +1,67 @@
// test-fusion.js
const { RETRIEVAL } = require('./packages/shared/src/config/constants');
function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
const k = RETRIEVAL.RRF_K;
const scores = new Map();
semanticEps.forEach((ep, i) => {
scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
});
keywordEps.forEach((ep, i) => {
const contrib = keywordWeight / (k + i + 1);
if (scores.has(ep.id)) {
scores.get(ep.id).score += contrib;
} else if (contrib > 0) {
scores.set(ep.id, { episode: ep, score: contrib });
}
});
return [...scores.values()]
.sort((a, b) => b.score - a.score)
.slice(0, limit)
.map(({ episode }) => episode);
}
// --- Test 1: episodes in both lists rank highest ---
const semantic = [
{ id: 1, user_message: 'ep1 — semantic only, rank 1' },
{ id: 2, user_message: 'ep2 — in both lists, rank 2 semantic' },
{ id: 3, user_message: 'ep3 — in both lists, rank 3 semantic' },
];
const keyword = [
{ id: 3, user_message: 'ep3 — rank 1 FTS' },
{ id: 2, user_message: 'ep2 — rank 2 FTS' },
{ id: 4, user_message: 'ep4 — FTS only, rank 3' },
];
const result = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 1, limit: 5 });
console.log('Test 1 — equal weights, episodes in both lists should rank highest:');
result.forEach((ep, i) => console.log(` ${i + 1}. id=${ep.id} "${ep.user_message}"`));
console.assert(result[0].id === 2 || result[0].id === 3, 'FAIL: ep2 or ep3 should be rank 1');
console.assert(!result.find(e => e.id === 1) || result.indexOf(result.find(e => e.id === 1)) > result.indexOf(result.find(e => e.id === 2)), 'FAIL: ep1 (semantic only) should rank below ep2');
console.log(' PASS\n');
// --- Test 2: keywordWeight:0 → pure semantic passthrough ---
const result2 = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 0, limit: 5 });
console.log('Test 2 — keywordWeight:0 should return only semantic results in original order:');
result2.forEach((ep, i) => console.log(` ${i + 1}. id=${ep.id}`));
console.assert(result2.length === 3, `FAIL: expected 3, got ${result2.length}`);
console.assert(result2[0].id === 1, 'FAIL: ep1 should be rank 1');
console.assert(result2[1].id === 2, 'FAIL: ep2 should be rank 2');
console.log(' PASS\n');
// --- Test 3: limit is respected ---
const result3 = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 1, limit: 2 });
console.log('Test 3 — limit:2 should return exactly 2 results:');
console.assert(result3.length === 2, `FAIL: expected 2, got ${result3.length}`);
console.log(' PASS\n');
// --- Test 4: no overlap → all unique episodes, ordered by individual contribution ---
const semOnly = [{ id: 10, user_message: 'sem' }];
const ftsOnly = [{ id: 20, user_message: 'fts' }];
const result4 = fuseEpisodeResults(semOnly, ftsOnly, { semanticWeight: 1, keywordWeight: 1, limit: 5 });
console.log('Test 4 — no overlap, both should appear:');
console.assert(result4.length === 2, `FAIL: expected 2, got ${result4.length}`);
console.assert(result4[0].id === 10, 'FAIL: semantic rank-1 should beat fts rank-1 (same weight, both rank 1, but semantic inserted first — tie goes to semantic)');
console.log(' PASS\n');
console.log('All tests passed.');