smarter context assembly implementation

minor clean up
retrieval fusion
2026-04-27 21:41:32 -07:00 · 2026-04-27 20:17:05 -07:00 · 2026-04-27 07:03:46 -07:00 · 2026-04-27 05:56:23 -07:00 · 2026-04-27 05:46:01 -07:00 · 2026-04-27 05:21:43 -07:00
51 changed files with 1933 additions and 563 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -5,4 +5,5 @@ data/
 .env
 .env.*
 *.db
+.claude/settings.local.json
 EOF
--- a/.vscode/settings.json
+++ b/.vscode/settings.json
@@ -1,2 +0,0 @@
-{
-}
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,108 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Development Commands
+
+```bash
+# Start individual services
+npm run memory           # Memory Service (port 3002)
+npm run embedding        # Embedding Service (port 3003)
+npm run inference        # Inference Service (port 3001)
+npm run orchestration    # Orchestration Service (port 4000)
+npm run mini1            # Start memory + embedding concurrently
+
+# Per-service dev mode (with --watch)
+npm -w packages/<service-name> run dev
+
+# Chat client
+npm -w packages/chat-client run dev      # Vite dev server (port 5173)
+npm -w packages/chat-client run build    # Production build
+```
+
+No test framework or linter is configured.
+
+## Architecture Overview
+
+NexusAI is a **modular AI assistant** with persistent, project-scoped memory. It's a Node.js monorepo (`npm workspaces`) with 4 independent backend services, 1 React frontend, and 1 shared package.
+
+### Services
+
+| Package | Port | Role |
+|---|---|---|
+| `orchestration-service` | 4000 | Central gateway; coordinates all others |
+| `memory-service` | 3002 | SQLite + Qdrant hybrid memory |
+| `embedding-service` | 3003 | Text embeddings via Ollama (`nomic-embed-text`, 768-dim) |
+| `inference-service` | 3001 | LLM inference (Ollama or llama.cpp) |
+| `chat-client` | 5173 | React/Vite frontend |
+| `shared` | — | Constants, env helpers, logger, formatters |
+
+All inter-service communication is **REST HTTP only** — no message queues or WebSockets.
+
+### Chat Request Flow
+
+1. Client POSTs to orchestration `/chat/stream`
+2. Orchestration resolves session, fetches **recent episodes** (SQLite) + **semantic episodes** (Qdrant vector search) + **entities** (Qdrant, scoped by project)
+3. Embedding computed for user message (embedding-service)
+4. Prompt assembled: system message → entities → semantic memories → recent episodes → user message
+5. Inference streams response (inference-service)
+6. Episode stored in SQLite + Qdrant (fire-and-forget embedding)
+7. Entity extraction triggered async (qwen2.5:3b via inference-service)
+8. Auto-summarization checked (threshold: 200+ tokens, re-triggers every 5 episodes)
+9. Auto-naming on first message (temp 0.3, 20 tokens max)
+
+### Memory Model
+
+**Dual store — neither works alone:**
+- **SQLite** (`better-sqlite3`, synchronous) — Full content: sessions, episodes, entities, relationships, projects, summaries, FTS5 index
+- **Qdrant** — Vector embeddings for semantic search; IDs used to fetch full content from SQLite afterward
+
+Orchestration queries Qdrant directly (bypasses memory-service) for performance, then fetches full episode content from memory-service by ID.
+
+**Project-scoped isolation:** Sessions grouped into projects; Qdrant queries use `should` filter on session IDs to enforce memory boundaries. Non-project sessions share a common pool.
+
+### Key File Locations
+
+**Orchestration** (`packages/orchestration-service/src/`):
+- `chat/index.js` — Core prompt building and memory assembly
+- `routes/` — HTTP endpoints: chat, sessions, projects, episodes, models, settings, summaries
+- `services/` — Thin HTTP clients for memory, embedding, inference, and direct Qdrant access
+- `config/settings.js` — Loads/saves `data/settings.json` (user-tunable: model params, thresholds, system prompt)
+
+**Memory** (`packages/memory-service/src/`):
+- `db/schema.js` — SQLite table definitions (source of truth for data model)
+- `episodic/` — Episode CRUD
+- `semantic/` — Qdrant operations
+- `entities/` — Entity extraction + CRUD
+- `summarization/` — Project summary generation
+
+**Shared** (`packages/shared/src/`):
+- `config/constants.js` — All tunables (ports, thresholds, model names, vector size)
+- `config/env.js` — `getEnv()` helper with fallback to constants
+- `utils.js` — `parseRow()`, `formatEpisodeText()`, `logger`
+
+**Frontend** (`packages/chat-client/src/`):
+- `App.jsx` — View router and top-level state (views: home, chat, all-chats, all-projects, project, memory, summaries, settings)
+- `hooks/` — `useChat`, `useSession`, `useModels`, `useProjects`, `useSettings`, `useContextMenu`
+- `api/orchestration.js` — Fetch wrapper for all API calls
+- Vite proxy points to `192.168.0.205:4000` (Mini PC 2 / orchestration)
+
+### Configuration
+
+Each service uses `.env` via `dotenv`, falling back to `packages/shared/src/config/constants.js`. The orchestration service also serves `data/settings.json` to the frontend via `/settings` — this is the single source of truth for user-facing inference parameters and system prompt.
+
+### Deployment
+
+Home lab across 3 nodes, managed with Docker Compose:
+- **Main PC** — RTX A4000 (inference via llama.cpp)
+- **Mini PC 1** — memory + embedding services, Qdrant, Ollama
+- **Mini PC 2** — orchestration + chat client, Caddy reverse proxy + Authelia SSO
+
+Docker Compose files: `docker-compose.mini1.yml`, `docker-compose.mini2.yml`. All services expose `/health`. Deployment docs: `docs/deployment/homelab.md`.
+
+## Key Development Principles
+
+- **Layer-by-layer validation** — always build and test backend → orchestration → frontend in sequence, curl-testing each layer before proceeding
+- **New orchestration routes require changes in four places**: route file, `orchestration-service/src/index.js`, Caddyfile on Mini PC 2 (`192.168.0.205`), and `vite.config.js` in the chat client
+- **All services read settings on every request** — no restart required for config changes
+- **Backend-first development** — data layer → service endpoints → orchestration proxy → frontend
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -73,8 +73,8 @@ service by ID after the vector search.

 The core four-service architecture is complete and operational. Key capabilities:

- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
- **Entity layer** — automatic extraction of named entities from conversations via qwen2.5:3b, stored in SQLite and Qdrant, injected into every prompt as structured knowledge
+- **Retrieval fusion** — Reciprocal Rank Fusion (RRF) merges semantic (Qdrant vector search) and keyword (SQLite FTS5) episode retrieval into a single ranked result set. Weights are configurable per strategy via settings; keyword search is off by default (`keywordWeight: 0`) and can be enabled without a service restart
+- **Entity layer + Knowledge graph** — automatic extraction of named entities and relationships from conversations via qwen2.5:3b. Entities and relationships are stored in SQLite with `mention_count` tracking. A graph traversal layer expands Qdrant entity search hits into a 1-hop neighborhood subgraph, injecting structured connected knowledge into every prompt
 - **Projects** — sessions grouped with shared or isolated memory pools
 - **Auto-naming** — sessions named automatically from first exchange via inference
 - **Project-scoped semantic search** — Qdrant filtered by project session IDs
--- a/docs/reference/API-routes.md
+++ b/docs/reference/API-routes.md
@@ -202,7 +202,9 @@ Returns `503` if llama-server is unreachable.
 |---|---|---|---|
 | `recentEpisodeLimit` | integer | 1–20 | Recent episodes injected into prompt |
 | `semanticLimit` | integer | 1–20 | Max semantic search results |
-| `scoreThreshold` | float | 0–1 | Minimum similarity score |
+| `scoreThreshold` | float | 0–1 | Minimum similarity score for Qdrant results |
+| `semanticWeight` | float | 0–5 | RRF weight for Qdrant semantic results |
+| `keywordWeight` | float | 0–5 | RRF weight for FTS5 keyword results (`0` = disabled) |
 | `modelsFolderPath` | string | — | Path to folder containing .gguf files |
 | `temperature` | float | 0–2 | Inference randomness |
 | `repeatPenalty` | float | 1–2 | Repeat token penalty |
@@ -360,13 +362,34 @@ Same request/response shape as orchestration `/projects` above.

 **DELETE /relationships — body:**
 ```json
-{ "fromId": 1, "toId": 2, "label": "uses" }
+{ "fromId": 1, "toId": 2, "label": "works_on", "notes": "Alice is the primary developer.", "metadata": {} }
 ```
+notes is optional. label should be a snake_case verb. Relationship is identified by the composite key (fromId, toId, label) — re-submitting with the same key increments mention_count and preserves existing notes if the new value is null.

 Relationships are identified by the composite key `(fromId, toId, label)`.
 Delete uses request body rather than URL params since this three-part key
 is awkward to encode in a path.

+### Graph
+
+| Method | Path | Description |
+|---|---|---|
+| GET | /graph/neighborhood/:entityId | Entity neighborhood — nodes + edges within N hops |
+| POST | /graph/neighbors | Bulk 1-hop neighborhood for a set of entity IDs |
+
+**GET /graph/neighborhood/:entityId — query params:**
+
+| Param | Default | Max | Description |
+|---|---|---|---|
+| depth | 1 | 3 | Traversal depth |
+
+Returns `{ entity, neighborhood: { nodes, edges } }`. Returns `404` if entity not found.
+
+**POST /graph/neighbors — body:**
+```json
+{ "entityIds": [5, 8, 12] }
+Returns { nodes: [...], edges: [...] }. Used internally by orchestration — not a client-facing endpoint.
+
 ---

 ## Embedding Service — port 3003
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -39,36 +39,36 @@

 ---

-## Phase 1 — Loose Ends & Stability
+## Phase 1 — Loose Ends & Stability - COMPLETE ✅
 *Target: Next development session (Saturday)*

 ### Bug Fixes
- [ ] **Entity extraction JSON parsing** — robustify response parser in `extraction.js` to handle model returning markdown fences or preamble around JSON
- [ ] **Qdrant entity search empty results** — verify entities embedded post-isolation-fix are surfacing correctly in project session searches
+✅ **Entity extraction JSON parsing** — robustify response parser in `extraction.js` to handle model returning markdown fences or preamble around JSON
+✅ **Qdrant entity search empty results** — verify entities embedded post-isolation-fix are surfacing correctly in project session searches

 ### Tech Debt
- [ ] **Logging** — introduce `LOG_LEVEL` env var across all services; reduce noise in production
- [ ] **Error response consistency** — audit all endpoints for uniform `{ error, detail }` shape
- [ ] **Constants audit** — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
- [ ] **Orchestration `chat/index.js` review** — extract any logic that has grown beyond its intended scope into dedicated modules
+✅ **Logging** — introduce `LOG_LEVEL` env var across all services; reduce noise in production
+✅ **Error response consistency** — audit all endpoints for uniform `{ error, detail }` shape
+✅ **Constants audit** — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
+✅ **Orchestration `chat/index.js` review** — extract any logic that has grown beyond its intended scope into dedicated modules

 ---

 ## Phase 2 — Memory System Upgrades
 *The core intelligence layer*

-### 1. Knowledge Graph (SQLite)
+### 1. Knowledge Graph (SQLite) ✅
 The highest-leverage memory upgrade. Transforms NexusAI from "remembers conversations" to "understands relationships between things."
- [ ] Graph schema — `nodes` and `edges` tables with typed relationships
- [ ] Entity → node promotion pipeline
- [ ] Relationship traversal queries
- [ ] Graph-aware context assembly in orchestration
+- [x] Graph schema — `nodes` and `edges` tables with typed relationships
+- [x] Entity → node promotion pipeline (`mention_count` tracked; threshold gating deferred to Phase 2)
+- [x] Relationship traversal queries
+- [x] Graph-aware context assembly in orchestration

-### 2. Retrieval Fusion + Full-Text Search
+### 2. Retrieval Fusion + Full-Text Search ✅
 Multi-strategy retrieval merged into a single ranked result set.
- [ ] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
- [ ] Configurable weights per retrieval strategy
- [ ] Score threshold tuning per collection
+- [x] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
+- [x] Configurable weights per retrieval strategy (`semanticWeight`, `keywordWeight` via `PATCH /settings`)
+- [x] Score threshold retained per-strategy; FTS scoped to session/project sessions; `keywordWeight: 0` default (disabled until tuned)

 ### 3. Memory Consolidation Lifecycle
 Prevents long-term memory degradation and enables compression.
--- a/docs/services/entity-extraction.md
+++ b/docs/services/entity-extraction.md
@@ -1,178 +1,140 @@
-# Memory Service
+# Entity Extraction

-**Package:** `@nexusai/memory-service`  
-**Location:** `packages/memory-service`  
-**Deployed on:** Mini PC 1 (192.168.0.81)  
-**Port:** 3002
+**Location:** `packages/memory-service/src/entities/extraction.js`  
+**Triggered by:** Episode creation (`POST /episodes`)  
+**Model:** `qwen2.5:3b` via Ollama (configurable via `EXTRACTION_MODEL` env var)

 ## Purpose

-Responsible for all reading and writing of long-term memory. Acts as the
-sole interface to both SQLite and Qdrant — no other service accesses these
-stores directly. On episode creation, automatically calls the embedding
-service to generate and store a vector in Qdrant.
+After each episode is saved to SQLite, the extraction pipeline runs
+asynchronously in the background to identify named entities and the
+relationships between them. Results are written back to SQLite and
+embedded into Qdrant — the episode response is never delayed.

-## Dependencies
+## Trigger

- `express` — HTTP API
- `better-sqlite3` — SQLite driver
- `@qdrant/js-client-rest` — Qdrant vector store client
- `dotenv` — environment variable loading
- `@nexusai/shared` — shared utilities and constants
-
-## Environment Variables
-
-| Variable | Required | Default | Description |
-|---|---|---|---|
-| PORT | No | 3002 | Port to listen on |
-| SQLITE_PATH | Yes | — | Path to SQLite database file |
-| QDRANT_URL | No | http://localhost:6333 | Qdrant instance URL |
-| EMBEDDING_SERVICE_URL | No | http://localhost:3003 | Embedding service URL |
-| EXTRACTION_URL | No | http://localhost:11434 | Ollama URL for entity extraction |
-| EXTRACTION_MODEL | No | qwen2.5:3b | Ollama model used for entity extraction |
-
-## Internal Structure
-
-```
-src/
-├── db/
-│   ├── index.js       # SQLite connection + initialization + migrations
-│   ├── schema.js      # Table definitions, indexes, FTS5, triggers
-│   ├── projects.js    # Project CRUD functions
-│   └── summaries.js   # Summary CRUD functions
-├── episodic/
-│   └── index.js       # Session + episode CRUD, FTS search, embedding write path
-├── semantic/
-│   └── index.js       # Qdrant collection management, upsert, search, delete
-├── entities/
-│   ├── index.js       # Entity + relationship CRUD
-│   └── extraction.js  # Automatic entity extraction via qwen2.5:3b on Ollama
-└── index.js           # Express app + all route definitions
-```
-
-## SQLite Schema
-
-Seven core tables:
-
- **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
- **episodes** — individual exchanges (user message + AI response) tied to a session
- **entities** — named things the system learns about (people, places, concepts)
- **relationships** — directional labeled links between entities
- **summaries** — condensed episode groups for efficient context retrieval
- **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`, `notes`, `system_prompt`
-
-### Migrations
-
-Schema changes that cannot use `CREATE TABLE IF NOT EXISTS` are applied as
-idempotent migrations in `db/index.js` at startup:
+`createEpisode()` in `episodic/index.js` calls `extractAndStoreEntities()`
+immediately after the SQLite insert, without awaiting it:

 ```js
-try { db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`); } catch {}
-try { db.exec(`ALTER TABLE sessions ADD COLUMN project_id INTEGER REFERENCES projects(id)`); } catch {}
-try { db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(project_id)`); } catch {}
-try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
-try { db.exec(`ALTER TABLE projects ADD COLUMN notes TEXT`); } catch {}
-try { db.exec(`ALTER TABLE projects ADD COLUMN system_prompt TEXT`); } catch {}
+extractAndStoreEntities(userMessage, aiResponse, episode.id, projectId)
+  .catch(err => logger.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
 ```

-New migrations are always appended here — never modify the schema file for
-existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
+If extraction throws, the episode is unaffected — the error is logged and
+swallowed.

-### FTS5 Full-Text Search
+## Model Settings

-An `episodes_fts` virtual table enables keyword search across all episodes.
-Three triggers (`episodes_fts_insert`, `episodes_fts_update`, `episodes_fts_delete`)
-keep the FTS index automatically in sync with the episodes table.
+| Setting | Value | Notes |
+|---|---|---|
+| Model | `qwen2.5:3b` | Ollama, configurable via `EXTRACTION_MODEL` |
+| Temperature | 0.1 | Low for consistent, deterministic output |
+| `num_predict` | 1500 | Higher ceiling to accommodate entity + relationship JSON |
+| `format` | `'json'` | Ollama constrained decoding — enforces valid JSON output |
+| Prompt format | ChatML | `<\|im_start\|>` / `<\|im_end\|>` tokens |

-### SQLite Configuration
+## Prompt Structure

- `journal_mode = WAL` — non-blocking reads during writes
- `foreign_keys = ON` — enforces referential integrity and cascade deletes
- PRAGMAs set via `db.pragma()`, not `db.exec()`
+The prompt is built by `buildExtractionPrompt()`. It includes:

-### Dynamic Updates
+1. **System message** — declares the model's role as an entity and relationship extractor
+2. **Instructions** — entity types, field rules, relationship label format, required JSON schema
+3. **Known entities block** — last 20 entities from SQLite, by `rowid DESC`, used to encourage consistent name/type pairs across conversations
+4. **Conversation** — the raw user message and AI response, delimited clearly

-Both `updateSession` and `updateProject` build their `SET` clause dynamically
-from only the fields passed — prevents partial updates from overwriting fields
-that weren't touched.
+```
+<|im_start|>system
+You are a named entity and relationship extractor. You output only valid JSON.
+<|im_end|>
+<|im_start|>user
+Read the conversation below and extract all named entities and the relationships between them.
+Entity types: person, place, project, technology, concept, organization
+...
+Return this exact JSON structure:
+{ "entities": [...], "relationships": [...] }

-`updateProject` allowlist:
-```js
-const allowed = ['name', 'description', 'colour', 'icon', 'isolated', 'notes', 'system_prompt'];
+Already known entities (use these exact name and type values if the same entity appears):
+- "NexusAI" (project)
+- "Alice" (person)
+
+--- CONVERSATION ---
+User: ...
+Assistant: ...
+--- END CONVERSATION ---
+<|im_end|>
+<|im_start|>assistant
 ```

-## Qdrant / Semantic Layer
+## Expected JSON Output

-Three Qdrant collections are initialized on service startup via `semantic.initCollections()`:
-
-| Collection | Purpose |
-|---|---|
-| `episodes` | Embeddings for individual conversation exchanges |
-| `entities` | Embeddings for named entities |
-| `summaries` | Embeddings for condensed episode summaries |
-
-All collections use **768-dimension vectors** with **Cosine similarity**,
-matching `nomic-embed-text` via Ollama. Vector size and distance metric are
-defined in `@nexusai/shared` — not hardcoded here.
-
-`initCollections()` iterates `Object.values(COLLECTIONS)` and creates any
-collection that doesn't already exist at startup — all three collections are
-guaranteed to exist before any requests are handled, avoiding race conditions
-between the first entity embed and an entity search.
-
-Each collection exposes upsert, search (with optional Qdrant filter), and
-delete operations. The `wait: true` flag is used on all writes.
-
-## Embedding Write Path
-
-When a new episode is created:
-
-1. Episode saved to SQLite synchronously — response returned immediately
-2. User message + AI response combined: `User: ...\nAssistant: ...`
-3. Text sent to embedding service (`POST /embed`)
-4. Vector upserted into `episodes` Qdrant collection with payload `{ sessionId, createdAt }`
-
-This step is **fire-and-forget** — if embedding fails, the episode is still
-saved and searchable via FTS. The error is logged but not surfaced.
-
-> The Qdrant payload stores `sessionId` (the internal integer ID). See
-> `memory-isolation.md` for how project-level filtering works.
-
-## Entity Layer
-
-Entities and relationships use upsert semantics with composite unique
-constraints to prevent duplicates:
-
- `UNIQUE(name, type)` on entities
- `UNIQUE(from_id, to_id, label)` on relationships
- `ON DELETE CASCADE` on relationship foreign keys
-
-After each episode is saved, `extraction.js` automatically extracts named
-entities from the conversation using `qwen2.5:3b` on Ollama — fire-and-forget.
-
-> For full details on the extraction pipeline, prompt format, constrained
-> decoding, stoplist, and Qdrant storage, see `entity-extraction.md`.
-
-## Summaries Layer
-
-Session summaries are generated by `orchestration-service/src/services/summarization.js`
-after each episode write and stored here via `POST /summaries`. The memory
-service is responsible only for CRUD — generation logic lives in orchestration.
-
-> For full details on trigger conditions, prompt format, cumulative updates,
-> and ChatML token stripping, see `summarization.md`.
-
-## Project Delete Behaviour
-
-Deleting a project runs as a transaction — it first nulls out `project_id`
-on all assigned sessions, then deletes the project. This avoids a foreign
-key constraint failure since `sessions.project_id` has no `ON DELETE` rule:
-
-```js
-const doDelete = db.transaction(() => {
-  db.prepare(`UPDATE sessions SET project_id = NULL WHERE project_id = ?`).run(id);
-  db.prepare(`DELETE FROM projects WHERE id = ?`).run(id);
-});
+```json
+{
+  "entities": [
+    { "name": "Alice", "type": "person", "notes": "Software engineer working on NexusAI." },
+    { "name": "NexusAI", "type": "project", "notes": "A modular AI assistant with persistent memory." }
+  ],
+  "relationships": [
+    {
+      "from": "Alice", "fromType": "person",
+      "to": "NexusAI", "toType": "project",
+      "label": "works_on",
+      "notes": "Alice is the primary developer."
+    }
+  ]
+}
 ```

-For all HTTP endpoints, see `api-routes.md`.
+Relationship labels use **snake_case verbs** (e.g. `works_on`, `manages`, `uses`,
+`knows`, `located_in`, `part_of`, `created_by`).
+
+## JSON Parsing
+
+The raw model response is matched with `/\{[\s\S]*\}/` before parsing — this
+tolerates any preamble or trailing prose the model emits alongside the JSON.
+If the match fails or `JSON.parse` throws, the function logs a warning and
+returns without writing anything.
+
+## Entity Processing
+
+For each entity in `parsed.entities`:
+
+1. Validate `name`, `type` (must be in `ENTITY_TYPES`), and not in `IGNORED_NAMES`
+2. Call `upsertEntity(name, type, notes)`:
+   - **Insert**: creates new row with `mention_count = 1`, `source = 'extraction'`
+   - **Conflict** on `(name, type)`: increments `mention_count`, updates `last_seen_at`, preserves existing `notes` if new extraction returns null
+3. Add to `entityMap` keyed by `"${name}::${type}"` — used for relationship resolution below
+4. Call `linkEntityToEpisode(entity.id, episodeId)` — writes to `entity_episodes` join table
+5. Fire-and-forget: embed as `"${name} (${type}): ${notes}"` → store to Qdrant `entities` collection with `{ name, type, notes, projectId }` in payload
+
+**Valid entity types:** `person`, `place`, `project`, `technology`, `concept`, `organization`
+
+**Stoplist (ignored names):** `good morning`, `good night`, `hello`, `goodbye`, `thanks`, `thank you`
+
+## Relationship Processing
+
+After all entities are saved, relationships are processed:
+
+1. For each entry in `parsed.relationships`, look up both endpoints in `entityMap` using `"${from}::${fromType}"` and `"${to}::${toType}"` as keys
+2. If either endpoint is missing (filtered out, invalid type, or not in this extraction), the relationship is silently skipped
+3. Call `upsertRelationship(fromId, toId, label, notes)`:
+   - **Insert**: creates new row with `mention_count = 1`
+   - **Conflict** on `(from_id, to_id, label)`: increments `mention_count`, preserves existing `notes` if new is null
+
+Relationships are unidirectional in storage. Bidirectionality is handled at
+query time by the graph traversal layer.
+
+## Project Scoping
+
+`projectId` is threaded through from the episode creation call. It is stored
+in the Qdrant entity payload, which enables project-scoped entity search in
+orchestration. SQLite entities and relationships are global — scoping only
+applies at the Qdrant retrieval layer.
+
+## Error Behaviour
+
+All steps after the initial model call are wrapped in a single outer try/catch.
+If Ollama is unreachable, returns a non-200 status, or the JSON cannot be
+parsed, the function logs at `warn` level and returns. There is no retry logic.
+Individual entity embedding failures are caught per-entity and logged at `warn`
+level without affecting other entities in the same batch.
--- a/docs/services/knowledge-graph.md
+++ b/docs/services/knowledge-graph.md
@@ -0,0 +1,213 @@
+# Knowledge Graph
+
+**Location:** `packages/memory-service/src/graph/index.js`  
+**Schema additions:** `entity_episodes` table; new columns on `entities` and `relationships`  
+**Exposed via:** `GET /graph/neighborhood/:entityId`, `POST /graph/neighbors`  
+**Consumed by:** Orchestration service context assembly
+
+## Purpose
+
+The knowledge graph transforms NexusAI from "remembers conversations" to
+"understands relationships between things." Rather than injecting a flat
+list of entity facts into every prompt, orchestration now retrieves a
+1-hop subgraph of connected entities and their relationships, giving the
+model structured, linked knowledge about people, projects, technologies,
+and concepts that have appeared across conversations.
+
+## Schema
+
+### `entity_episodes` (join table)
+
+Tracks which episodes contributed to each entity's knowledge. Defined in
+`schema.js` — exists on all installs.
+
+```sql
+CREATE TABLE IF NOT EXISTS entity_episodes (
+  entity_id  INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
+  episode_id INTEGER NOT NULL REFERENCES episodes(id) ON DELETE CASCADE,
+  PRIMARY KEY (entity_id, episode_id)
+);
+```
+
+Both FKs cascade on delete — removing an entity or episode automatically
+cleans up its join rows.
+
+### New columns on `entities`
+
+Added via migration in `db/index.js`:
+
+| Column | Type | Default | Description |
+|---|---|---|---|
+| `mention_count` | INTEGER | 1 | How many times this entity has been extracted across conversations |
+| `confidence` | REAL | 1.0 | Reserved for future confidence scoring |
+| `source` | TEXT | `'extraction'` | `'extraction'` (auto) or `'manual'` |
+| `last_seen_at` | INTEGER | NULL | Unix timestamp of most recent extraction hit |
+
+### New columns on `relationships`
+
+| Column | Type | Default | Description |
+|---|---|---|---|
+| `mention_count` | INTEGER | 1 | How many times this edge has been extracted |
+| `notes` | TEXT | NULL | Relationship context sentence from extraction |
+
+## Entity Promotion Model
+
+Entities are not created equal — some are mentioned once in passing, others
+recur across many conversations. `mention_count` is the signal:
+
+- Every time `upsertEntity` is called for an existing `(name, type)` pair, `mention_count` is incremented and `last_seen_at` is updated.
+- `ENTITIES.PROMOTION_THRESHOLD` (default: **3**) is the `mention_count` at which an entity is considered "well-established" — referenced in the codebase for future filtering and scoring logic.
+- Currently `mention_count` is stored and incremented but not yet used to gate retrieval. It provides the foundation for future features such as orphan cleanup (entities never re-extracted) and confidence-weighted graph traversal.
+
+The same pattern applies to relationships — `mention_count` rises each time
+the same `(from_id, to_id, label)` triple is extracted.
+
+## Graph Traversal
+
+`src/graph/index.js` exports two functions built on SQLite's `WITH RECURSIVE`
+CTE support. No external graph database is needed.
+
+### `getNeighborhood(entityId, depth)`
+
+Traverses the graph from a single entity, following edges in **both directions**,
+up to `depth` hops. Returns `{ nodes: [...entities], edges: [...relationships] }`.
+
+Default depth: `ENTITIES.GRAPH_HOP_DEPTH` (1). Maximum enforced at HTTP layer: 3.
+
+**SQLite query:**
+
+```sql
+WITH RECURSIVE traverse(entity_id, depth) AS (
+    SELECT ?, 0
+    UNION
+    SELECT
+        CASE WHEN r.from_id = t.entity_id THEN r.to_id ELSE r.from_id END,
+        t.depth + 1
+    FROM relationships r
+    JOIN traverse t ON (r.from_id = t.entity_id OR r.to_id = t.entity_id)
+    WHERE t.depth < ?
+)
+SELECT DISTINCT entity_id FROM traverse
+```
+
+`UNION` (not `UNION ALL`) eliminates duplicate visits and naturally handles
+cycles — a node already in the traversal set is not re-visited.
+
+After collecting node IDs, two follow-up queries fetch:
+- All entity rows for those IDs
+- All relationship rows where both `from_id` and `to_id` are in the node set
+
+This ensures edges between neighbors are included even if they aren't on the
+traversal path from the seed.
+
+### `getEntityNeighbors(entityIds[])`
+
+Bulk 1-hop version designed for orchestration. Given multiple seed entity IDs
+(the results of Qdrant semantic search), returns the combined 1-hop subgraph.
+
+1. Finds all neighbor IDs via one query using `IN (...)` on both `from_id` and `to_id`
+2. Deduplicates seeds + neighbors using a JavaScript `Set`
+3. Fetches all entity rows and all relationship rows within the combined node set
+
+This is intentionally simpler than the recursive version — orchestration always
+uses depth=1, and the bulk query avoids N separate CTE calls.
+
+## Graph-Aware Context Assembly
+
+Orchestration's `assembleContext` (in `src/chat/index.js`) integrates the
+graph at step 7 of the chat pipeline:
+
+1. Qdrant entity search returns up to `ORCHESTRATION.ENTITIES_LIMIT` results, each including `r.id` (the SQLite entity ID) alongside the Qdrant payload
+2. `graph.getNeighbors(entityIds)` is called with those IDs → `POST /graph/neighbors` on memory-service
+3. The returned `{ nodes, edges }` is passed to `formatGraphContext()`
+4. On failure, falls back to using the Qdrant payload data directly as flat nodes with no edges
+
+### Prompt Format
+
+`formatGraphContext(nodes, edges)` in `chat/index.js` formats the subgraph as:
+
+```
+Here is what you know about entities relevant to this conversation and their connections:
+- Alice (person): software engineer working on NexusAI
+  → works_on NexusAI (project)
+  → knows Bob (person)
+- NexusAI (project): AI assistant framework
+- Bob (person): Alice's colleague
+```
+
+- One line per node: `- {name} ({type}): {notes}`
+- Outbound edges indented below: `  → {label} {target_name} ({target_type})`
+- Nodes with only inbound edges (pulled in as neighbors) appear without connection lines
+- Only outbound edges are shown — each relationship appears once, from the `from_id` side
+
+## Project Scoping
+
+The knowledge graph respects project boundaries at the **entry point**, not
+during traversal:
+
+- Qdrant entity search is filtered by `projectId` — only entities tagged with this project are returned as seeds
+- Graph traversal in SQLite is unfiltered — neighbors can be from any project or no project
+- This is intentional: the graph entry is project-scoped, but traversal follows the global relationship graph to discover connected knowledge
+
+Entities are tagged with `projectId` in the Qdrant payload at extraction time.
+Entities extracted from non-project sessions have `projectId: null` and only
+appear in unfiltered global searches.
+
+## API Reference
+
+### `GET /graph/neighborhood/:entityId`
+
+Returns the neighborhood of a single entity.
+
+**Query params:**
+
+| Param | Default | Max | Description |
+|---|---|---|---|
+| `depth` | `ENTITIES.GRAPH_HOP_DEPTH` (1) | 3 | Traversal depth |
+
+**Response:**
+```json
+{
+  "entity": { "id": 5, "name": "Alice", "type": "person", "notes": "...", "mention_count": 4 },
+  "neighborhood": {
+    "nodes": [
+      { "id": 5, "name": "Alice", "type": "person", "notes": "..." },
+      { "id": 8, "name": "NexusAI", "type": "project", "notes": "..." }
+    ],
+    "edges": [
+      { "id": 2, "from_id": 5, "to_id": 8, "label": "works_on", "notes": "...", "mention_count": 3 }
+    ]
+  }
+}
+```
+
+Returns 404 if the entity does not exist.
+
+### `POST /graph/neighbors`
+
+Bulk 1-hop neighborhood for a set of entity IDs. Used internally by
+orchestration — not intended for direct client use.
+
+**Request body:**
+```json
+{ "entityIds": [5, 8, 12] }
+```
+
+**Response:**
+```json
+{
+  "nodes": [ ...entity objects... ],
+  "edges": [ ...relationship objects... ]
+}
+```
+
+Returns 400 if `entityIds` is missing or empty.
+
+## Constants (`packages/shared/src/config/constants.js`)
+
+| Constant | Value | Description |
+|---|---|---|
+| `ENTITIES.PROMOTION_THRESHOLD` | 3 | `mention_count` at which an entity is considered well-established |
+| `ENTITIES.GRAPH_HOP_DEPTH` | 1 | Default traversal depth for neighborhood queries |
+| `ORCHESTRATION.ENTITIES_LIMIT` | 5 | Max entity seeds returned from Qdrant search |
+| `ORCHESTRATION.ENTITIES_THRESHOLD` | 0.55 | Minimum similarity score for entity Qdrant search |
--- a/docs/services/memory-service.md
+++ b/docs/services/memory-service.md
@@ -9,8 +9,8 @@

 Responsible for all reading and writing of long-term memory. Acts as the
 sole interface to both SQLite and Qdrant — no other service accesses these
-stores directly. On episode creation, automatically calls the embedding
-service to generate and store a vector in Qdrant.
+stores directly. On episode creation, automatically triggers entity and
+relationship extraction and embeds results into Qdrant.

 ## Dependencies

@@ -45,19 +45,22 @@ src/
 ├── semantic/
 │   └── index.js       # Qdrant collection management, upsert, search, delete
 ├── entities/
-│   ├── index.js       # Entity + relationship CRUD
-│   └── extraction.js  # Automatic entity extraction via qwen2.5:3b on Ollama
+│   ├── index.js       # Entity + relationship CRUD (upsert, mention tracking)
+│   └── extraction.js  # Automatic entity + relationship extraction via qwen2.5:3b
+├── graph/
+│   └── index.js       # Knowledge graph traversal (neighborhood queries, recursive CTE)
 └── index.js           # Express app + all route definitions
 ```

 ## SQLite Schema

-Seven core tables:
+Eight core tables:

 - **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
 - **episodes** — individual exchanges (user message + AI response) tied to a session
- **entities** — named things the system learns about (people, places, concepts)
- **relationships** — directional labeled links between entities
+- **entities** — named things the system learns about (people, places, concepts, etc.). Fields include `mention_count`, `confidence`, `source`, `last_seen_at`
+- **relationships** — directional labeled links between entities (`from_id`, `to_id`, `label`). Fields include `mention_count`, `notes`
+- **entity_episodes** — join table linking entities to the episodes where they were extracted. Used for provenance and orphan cleanup
 - **summaries** — condensed episode groups for efficient context retrieval
 - **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`, `notes`, `system_prompt`

@@ -73,10 +76,18 @@ try { db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(proje
 try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
 try { db.exec(`ALTER TABLE projects ADD COLUMN notes TEXT`); } catch {}
 try { db.exec(`ALTER TABLE projects ADD COLUMN system_prompt TEXT`); } catch {}
+// Knowledge graph columns:
+try { db.exec(`ALTER TABLE entities ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
+try { db.exec(`ALTER TABLE entities ADD COLUMN confidence REAL NOT NULL DEFAULT 1.0`) } catch {}
+try { db.exec(`ALTER TABLE entities ADD COLUMN source TEXT NOT NULL DEFAULT 'extraction'`) } catch {}
+try { db.exec(`ALTER TABLE entities ADD COLUMN last_seen_at INTEGER`) } catch {}
+try { db.exec(`ALTER TABLE relationships ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
+try { db.exec(`ALTER TABLE relationships ADD COLUMN notes TEXT`) } catch {}
 ```

-New migrations are always appended here — never modify the schema file for
-existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
+`entity_episodes` is defined in `schema.js` itself (not a migration) since it is a new table.
+
+New migrations are always appended — never modify the schema file for existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.

 ### FTS5 Full-Text Search

@@ -117,8 +128,7 @@ defined in `@nexusai/shared` — not hardcoded here.

 `initCollections()` iterates `Object.values(COLLECTIONS)` and creates any
 collection that doesn't already exist at startup — all three collections are
-guaranteed to exist before any requests are handled, avoiding race conditions
-between the first entity embed and an entity search.
+guaranteed to exist before any requests are handled.

 Each collection exposes upsert, search (with optional Qdrant filter), and
 delete operations. The `wait: true` flag is used on all writes.
@@ -143,15 +153,27 @@ saved and searchable via FTS. The error is logged but not surfaced.
 Entities and relationships use upsert semantics with composite unique
 constraints to prevent duplicates:

- `UNIQUE(name, type)` on entities
- `UNIQUE(from_id, to_id, label)` on relationships
+- `UNIQUE(name, type)` on entities — conflict increments `mention_count` and updates `last_seen_at`
+- `UNIQUE(from_id, to_id, label)` on relationships — conflict increments `mention_count` and preserves existing `notes`
 - `ON DELETE CASCADE` on relationship foreign keys

 After each episode is saved, `extraction.js` automatically extracts named
-entities from the conversation using `qwen2.5:3b` on Ollama — fire-and-forget.
+entities **and relationships** from the conversation using `qwen2.5:3b` on
+Ollama — fire-and-forget. Each saved entity is also linked to the episode
+via the `entity_episodes` join table.

-> For full details on the extraction pipeline, prompt format, constrained
-> decoding, stoplist, and Qdrant storage, see `entity-extraction.md`.
+> For full details on the extraction pipeline and JSON format, see `entity-extraction.md`.  
+> For the knowledge graph traversal layer, see `knowledge-graph.md`.
+
+## Knowledge Graph Layer
+
+`src/graph/index.js` provides SQLite-based graph traversal over the entities
+and relationships tables. Two functions are exposed via HTTP:
+
+- **`getNeighborhood(entityId, depth)`** — recursive CTE traversal, bidirectional, returns `{ nodes, edges }`
+- **`getEntityNeighbors(entityIds[])`** — bulk 1-hop traversal for orchestration context assembly
+
+> For design rationale, traversal queries, and integration with orchestration, see `knowledge-graph.md`.

 ## Summaries Layer

@@ -175,4 +197,4 @@ const doDelete = db.transaction(() => {
 });
 ```

-For all HTTP endpoints, see `api-routes.md`.
+For all HTTP endpoints, see `api-routes.md`.
--- a/docs/services/orchestration-service.md
+++ b/docs/services/orchestration-service.md
@@ -42,9 +42,10 @@ src/
 │   ├── inference.js      # HTTP client for inference service
 │   ├── embedding.js      # HTTP client for embedding service
 │   ├── qdrant.js         # HTTP client for Qdrant (direct vector search)
+│   ├── graph.js          # HTTP client for memory-service graph endpoints
 │   └── summarization.js  # Session summarisation — triggers after each episode
 ├── chat/
-│   └── index.js          # Core pipeline — context assembly, isolation, auto-naming
+│   └── index.js          # Core pipeline — context assembly, graph expansion, auto-naming
 ├── config/
 │   └── settings.js       # Settings load/save — reads/writes data/settings.json
 ├── routes/
@@ -71,7 +72,9 @@ via `appSettings.load()` — changes apply immediately without a service restart
 |---|---|---|
 | `recentEpisodeLimit` | 5 | Recent episodes injected into prompt |
 | `semanticLimit` | 5 | Semantic search results injected into prompt |
-| `scoreThreshold` | 0.75 | Minimum similarity score for semantic results |
+| `scoreThreshold` | 0.5 | Minimum similarity score for Qdrant semantic results |
+| `semanticWeight` | 1.0 | RRF weight for Qdrant semantic results |
+| `keywordWeight` | 0 | RRF weight for FTS5 keyword results (`0` = disabled) |
 | `modelsFolderPath` | `/mnt/nexus-models` | Path to folder containing .gguf files |
 | `temperature` | 0.7 | Inference temperature |
 | `repeatPenalty` | 1.1 | Repeat token penalty |
@@ -100,24 +103,35 @@ difference is how the inference response is delivered to the client.

 4. **Recent episode retrieval** — fetch most recent episodes (`recentEpisodeLimit`).

-5. **Semantic search** — embed user message, query Qdrant for similar past
-   episodes. Deduplicated against recent episodes. Non-critical.
+5. **Fused episode retrieval** — runs semantic (Qdrant) and keyword (FTS5)
+   search in parallel, then merges results via Reciprocal Rank Fusion (RRF).
+   Both paths are filtered against `recentIds` before fusion. FTS is scoped
+   to the current session or all project sessions. If `keywordWeight` is `0`,
+   the FTS call is skipped entirely. Non-critical — failures fall back to
+   whichever strategy succeeded.

 6. **Entity search** — query `entities` Qdrant collection filtered by
-   `projectId`. Non-project sessions receive no entity context. Non-critical.
+   `projectId`. Returns entity IDs alongside Qdrant payload data (the Qdrant
+   point ID equals the SQLite entity ID). Non-critical.

-7. **Prompt assembly** — combine system prompt, entity context, semantic
-   episodes, recent episodes, and user message.
+7. **Graph neighborhood expansion** — call `POST /graph/neighbors` on
+   memory-service with the entity IDs from step 6. Returns a 1-hop subgraph
+   `{ nodes, edges }` — entity objects plus the relationships connecting them.
+   If no entities were found or the graph call fails, falls back to flat entity
+   list (no edges). Non-critical.

-8. **Inference** — send to inference service. `/chat` awaits full response;
+8. **Prompt assembly** — combine system prompt, graph context, fused episodes,
+   recent episodes, and user message.
+
+9. **Inference** — send to inference service. `/chat` awaits full response;
   `/chat/stream` pipes SSE chunks to the client.

-9. **Episode write** — write exchange back to memory with `projectId`.
+10. **Episode write** — write exchange back to memory with `projectId`.

-10. **Summarisation trigger** — `triggerSummary(session, allEpisodes)` called
+11. **Summarisation trigger** — `triggerSummary(session, allEpisodes)` called
    fire-and-forget. See `summarization.md` for full details.

-11. **Auto-naming** — on first message with no session name, fires a secondary
+12. **Auto-naming** — on first message with no session name, fires a secondary
    inference call (max 20 tokens, temperature 0.3) to generate a session name.

 ### Prompt Structure
@@ -125,8 +139,9 @@ difference is how the inference response is delivered to the client.
 ```
 [Resolved system prompt]

-Here is what you know about entities relevant to this conversation:
+Here is what you know about entities relevant to this conversation and their connections:
 - {name} ({type}): {notes}
+  → {label} {neighbor_name} ({neighbor_type})
 ---
 Here are some relevant memories from earlier conversations:
 User: {past user message}
@@ -141,6 +156,12 @@ User: {current message}
 Assistant:
 ```

+The entity block renders the full graph neighborhood — seed entities matched
+by Qdrant search plus any neighbors pulled in by 1-hop traversal. Each entity
+shows its `notes` and any outbound relationships with their targets. Neighbor
+nodes that have no outbound edges within the subgraph appear without connection
+lines.
+
 ## Summarisation

 After each episode write, `triggerSummary` is called fire-and-forget. It
@@ -199,4 +220,7 @@ handle /health*    { reverse_proxy localhost:4000 }

 After updating: `caddy reload --config /path/to/Caddyfile`

-For all HTTP endpoints, see `api-routes.md`.
+> Note: `/graph` routes are on the memory-service (port 3002) and are called
+> internally by orchestration — they do not need a Caddy entry.
+
+For all HTTP endpoints, see `api-routes.md`.
--- a/docs/services/retrieval-fusion.md
+++ b/docs/services/retrieval-fusion.md
@@ -0,0 +1,153 @@
+# Retrieval Fusion
+
+**Implementation:** `packages/orchestration-service/src/chat/index.js`  
+**FTS scoping:** `packages/memory-service/src/episodic/index.js`, `src/index.js`  
+**Settings:** `semanticWeight`, `keywordWeight` via `PATCH /settings`
+
+## Purpose
+
+Rather than relying solely on Qdrant vector similarity (which finds semantically
+related content but misses exact keyword matches) or FTS5 keyword search alone
+(which finds exact matches but not paraphrases), Reciprocal Rank Fusion (RRF)
+merges the ranked results from both strategies into a single better-ranked list.
+
+Episodes that rank highly in **both** lists score highest. An episode that is
+the top semantic match but irrelevant by keyword, or vice versa, scores lower
+than one that satisfies both.
+
+## How RRF Works
+
+For each episode `d`, its fused score is:
+
+```
+RRF(d) = w_semantic / (k + rank_semantic(d))
+        + w_keyword  / (k + rank_keyword(d))
+```
+
+- `rank_i(d)` — 1-based position in that strategy's result list (episode absent from a list contributes 0 for that term)
+- `k = 60` — smoothing constant (standard; not exposed in settings)
+- `w_semantic`, `w_keyword` — user-tunable weights (both default-sourced from `RETRIEVAL` constants)
+
+Setting a weight to `0` removes that strategy's contribution entirely. Setting
+`keywordWeight` to `0` also short-circuits the FTS network call.
+
+## Architecture
+
+Fusion lives in orchestration — the service already coordinates multiple data
+sources, and fusion is a retrieval strategy, not a storage concern.
+
+```
+getFusedEpisodes()
+├── getSemanticEpisodes()     — Qdrant embed+search → fetch full rows by ID
+│   (existing path, unchanged)
+└── getFTSResults()           — memory-service /episodes/search → full rows directly
+    (skipped entirely if keywordWeight == 0)
+         ↓
+fuseEpisodeResults()          — pure RRF, no I/O
+         ↓
+fusedEpisodes[]               — top semanticLimit episodes by RRF score
+```
+
+### Data Shape Consistency
+
+Both sides must enter fusion as `Episode[]` — full SQLite row objects with
+the same shape — and both must be filtered against `recentIds` first:
+
+- **Semantic path**: `recentIds` filter applied before `getEpisodeById` fetch (existing behaviour)
+- **FTS path**: full rows returned directly; `recentIds` filter applied in `getFusedEpisodes` after receiving them
+
+FTS requests `semanticLimit * 2` results to provide headroom for the
+`recentIds` filter without under-serving the fusion.
+
+## FTS Session Scoping
+
+Without scoping, FTS5 searches across all episodes in the database. For
+context assembly, results must be constrained to the current session or
+project session pool — the same scope used for Qdrant semantic search.
+
+`searchEpisodes(query, limit, sessionIds)` in memory-service accepts an
+optional `sessionIds` array. When provided, the SQL becomes:
+
+```sql
+SELECT e.* FROM episodes e
+JOIN episodes_fts fts ON e.id = fts.rowid
+WHERE episodes_fts MATCH ?
+AND e.session_id IN (?, ?, ...)
+ORDER BY rank
+LIMIT ?
+```
+
+The HTTP endpoint `GET /episodes/search` accepts `sessionIds` as a
+comma-separated query param: `?q=hello&sessionIds=1,2,3`.
+
+In orchestration, `ftsSessionIds` is set to:
+- `projectSessionIds` (all sessions in the project) — if the session belongs to a project
+- `[session.id]` — otherwise (single session only)
+
+This mirrors the Qdrant scoping logic exactly.
+
+## `fuseEpisodeResults` — Implementation Detail
+
+```js
+function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
+    const k = RETRIEVAL.RRF_K; // 60
+    const scores = new Map();  // episode.id → { episode, score }
+
+    // Score semantic results (already filtered against recentIds)
+    semanticEps.forEach((ep, i) => {
+        scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
+    });
+
+    // Score + merge keyword results (already filtered against recentIds)
+    keywordEps.forEach((ep, i) => {
+        const contrib = keywordWeight / (k + i + 1);
+        if (scores.has(ep.id)) {
+            scores.get(ep.id).score += contrib;   // appears in both — sum scores
+        } else if (contrib > 0) {
+            scores.set(ep.id, { episode: ep, score: contrib });  // FTS-only episode
+        }
+        // contrib == 0 (keywordWeight: 0) → episode not added (guard prevents score-0 bleed-through)
+    });
+
+    return [...scores.values()]
+        .sort((a, b) => b.score - a.score)
+        .slice(0, limit)
+        .map(({ episode }) => episode);
+}
+```
+
+The `else if (contrib > 0)` guard prevents FTS-only episodes from entering
+the result set with a score of 0 when `keywordWeight` is 0 — verified by
+the test suite.
+
+## Settings
+
+| Setting | Default | Range | Description |
+|---|---|---|---|
+| `semanticWeight` | 1.0 | 0–5 | Weight applied to Qdrant semantic results |
+| `keywordWeight` | 0 | 0–5 | Weight applied to FTS5 keyword results. `0` = disabled |
+
+Both are readable via `GET /settings` and writable via `PATCH /settings`
+without a service restart. Changes take effect on the next chat request.
+
+**To enable keyword search:**
+```bash
+curl -X PATCH http://localhost:4000/settings \
+  -H "Content-Type: application/json" \
+  -d '{"keywordWeight": 1.0}'
+```
+
+**To favour keyword matches over semantic:**
+```bash
+curl -X PATCH http://localhost:4000/settings \
+  -H "Content-Type: application/json" \
+  -d '{"semanticWeight": 0.5, "keywordWeight": 2.0}'
+```
+
+## Constants (`packages/shared/src/config/constants.js`)
+
+| Constant | Value | Description |
+|---|---|---|
+| `RETRIEVAL.RRF_K` | 60 | RRF smoothing constant — not exposed in settings |
+| `RETRIEVAL.SEMANTIC_WEIGHT` | 1.0 | Default semantic weight |
+| `RETRIEVAL.KEYWORD_WEIGHT` | 0 | Default keyword weight (off) |
--- a/package-lock.json
+++ b/package-lock.json
@@ -4224,8 +4224,7 @@
      "dependencies": {
        "@nexusai/shared": "^1.0.0",
        "dotenv": "^17.4.0",
-        "express": "^5.2.1",
-        "ollama": "^0.6.3"
+        "express": "^5.2.1"
      }
    },
    "packages/inference-service": {
--- a/packages/chat-client/src/api/orchestration.js
+++ b/packages/chat-client/src/api/orchestration.js
@@ -1,5 +1,6 @@
 import { API_DEFAULTS } from "../config/constants";

+
 const BASE_URL = import.meta.env.VITE_ORCHESTRATION_URL ?? '';

 // ── Sessions ────────────────────────────────────────────────
--- a/packages/chat-client/src/components/AllChatsView.jsx
+++ b/packages/chat-client/src/components/AllChatsView.jsx
@@ -2,6 +2,7 @@ import React, { useState, useEffect } from 'react';
 import { fetchSessions, deleteSession } from '../api/orchestration';
 import { CLIENT_DEFAULTS } from '../config/constants';

+
 const PAGE_SIZE = CLIENT_DEFAULTS.PAGE_SIZE;

 export default function AllChatsView({ onSelectSession, onBack, projects }) {
--- a/packages/chat-client/src/components/AllProjectsView.jsx
+++ b/packages/chat-client/src/components/AllProjectsView.jsx
@@ -2,6 +2,7 @@ import React, { useState, useEffect } from 'react';
 import ProjectModal from './ProjectModal';
 import { fetchProjects, createProject, updateProject, deleteProject } from '../api/orchestration';

+
 export default function AllProjectsView({ onProjectsChange, onBack, onSelectProject, onNavigate }) {
  const [projects, setProjects] = useState([]);
  const [loading, setLoading] = useState(true);
--- a/packages/chat-client/src/components/SettingsView.jsx
+++ b/packages/chat-client/src/components/SettingsView.jsx
@@ -3,6 +3,7 @@ import { useSettings } from '../hooks/useSettings';
 import { useModels } from '../hooks/useModels';
 import { getServiceHealth } from '../api/orchestration';

+
 export default function SettingsView({ onNavigate, onBack, modelProps }) {
  const { settings, saveSetting, saving } = useSettings();

--- a/packages/chat-client/src/hooks/useProjects.js
+++ b/packages/chat-client/src/hooks/useProjects.js
@@ -1,6 +1,7 @@
 import { useState, useEffect, useCallback } from 'react';
 import { fetchProjects } from '../api/orchestration';

+
 export function useProjects() {
  const [projects, setProjects] = useState([]);

--- a/packages/embedding-service/CLAUDE.md
+++ b/packages/embedding-service/CLAUDE.md
@@ -0,0 +1,64 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and deployment layout.
+
+## Running This Service
+
+```bash
+npm run embedding                          # From repo root
+npm -w packages/embedding-service run dev  # With --watch
+```
+
+Default port: **3003**. Requires Ollama to be reachable at `OLLAMA_URL`.
+
+## Single-File Service
+
+The entire service is `src/index.js` — no subdirectory structure. All routes, the Ollama helper, and startup are in one file.
+
+## Environment Variables
+
+| Variable | Default | Description |
+|---|---|---|
+| `PORT` | `3003` | Port to listen on |
+| `OLLAMA_URL` | `http://localhost:11434` | Ollama instance URL |
+| `EMBEDDING_MODEL` | `nomic-embed-text` | Model passed to Ollama `/api/embed` |
+
+Note: the env var name is `EMBEDDING_MODEL`, not `EMBED_MODEL` — the internal constant is `EMBED_MODEL` but the lookup key is different.
+
+## Ollama API Details
+
+Uses Ollama's `/api/embed` endpoint (not `/api/embeddings`). Request shape:
+
+```json
+{ "model": "nomic-embed-text", "input": "text to embed" }
+```
+
+Ollama returns `{ "embeddings": [[...]] }` — an array of arrays even for a single input. The helper takes `data.embeddings[0]` to return the single vector.
+
+The `ollama` npm package is listed as a dependency but is **not used** — all calls are raw `fetch`. Do not refactor to use the package without checking the API shape matches.
+
+## Batch Endpoint
+
+`POST /embed/batch` embeds items **sequentially** in a for-loop, not in parallel. The comment explains this: Ollama doesn't parallelise embedding calls, so parallel requests would queue internally anyway. Do not change to `Promise.all` without verifying Ollama behaviour.
+
+## Error Responses
+
+| Condition | Status | Notes |
+|---|---|---|
+| Missing/empty `text` | 400 | |
+| Ollama call fails | 502 | Upstream failure — correct status |
+| Empty `texts` array | 400 | |
+
+## Known Issue
+
+The 400 error message for `/embed` reads `"text is required and must be empty"` — the word "not" is missing. Should read `"must not be empty"`.
+
+## API Endpoints
+
+| Method | Path | Notes |
+|---|---|---|
+| GET | `/health` | Static response — does not verify Ollama is reachable |
+| POST | `/embed` | Body: `{ text: string }`. Returns `{ embedding, model, dimensions }` |
+| POST | `/embed/batch` | Body: `{ texts: string[] }`. Returns `{ embeddings, model, dimensions, count }` |
--- a/packages/embedding-service/package.json
+++ b/packages/embedding-service/package.json
@@ -9,7 +9,6 @@
  "dependencies": {
    "@nexusai/shared": "^1.0.0",
    "dotenv": "^17.4.0",
-    "express": "^5.2.1",
-    "ollama": "^0.6.3"
+    "express": "^5.2.1"
  }
 }
--- a/packages/embedding-service/src/index.js
+++ b/packages/embedding-service/src/index.js
@@ -3,7 +3,7 @@ const express = require('express');
 const {getEnv, OLLAMA, PORTS, logger} = require('@nexusai/shared');

 const app = express();
-app.use(express.json());
+app.use(express.json({ limit: '1mb' }));    // limit request body to 1mb to prevent abuse - embedding requests should be small

 const PORT          = getEnv('PORT',            PORTS.EMBEDDING);  
 const OLLAMA_URL    = getEnv('OLLAMA_URL',      OLLAMA.DEFAULT_URL); 
@@ -14,7 +14,8 @@ async function embedText(text) {
    const res = await fetch(`${OLLAMA_URL}/api/embed`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
-        body: JSON.stringify({ model: EMBED_MODEL, input: text })
+        body: JSON.stringify({ model: EMBED_MODEL, input: text }),
+        signal: AbortSignal.timeout(30_000),
    });

    if (!res.ok) {
@@ -37,7 +38,7 @@ app.get('/health', (req,res) => {
 app.post('/embed', async (req, res) => {
    const { text } = req.body;
    if (!text || typeof text !== 'string' || text.trim() === '') {
-        return res.status(400).json({ error: 'text is required and must be empty' });
+        return res.status(400).json({ error: 'text is required and must not be empty' });
    }

    try {
@@ -60,7 +61,10 @@ app.post('/embed/batch', async (req, res) => {
    }

    try {
-        //sequential embedding for now, Ollama doesn't natively parallize embeddings
+        const invalid = texts.findIndex(t => !t || typeof t !== 'string' || t.trim() === '');
+        if (invalid !== -1)
+            return res.status(400).json({ error: `texts[${invalid}] is empty or not a string` });
+
        const embeddings = [];
        for (const text of texts) {
            embeddings.push(await embedText(text.trim()));
--- a/packages/inference-service/CLAUDE.md
+++ b/packages/inference-service/CLAUDE.md
@@ -0,0 +1,75 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and deployment layout.
+
+## Running This Service
+
+```bash
+npm run inference                          # From repo root
+npm -w packages/inference-service run dev  # With --watch
+```
+
+Default port: **3001**. Set `INFERENCE_PROVIDER` to select the backend.
+
+## Provider Pattern
+
+`src/infer.js` reads `INFERENCE_PROVIDER` at startup and loads one of two providers:
+
+| `INFERENCE_PROVIDER` | Module | Backend |
+|---|---|---|
+| `ollama` (default) | `src/providers/ollama.js` | Ollama npm client → `/api/generate` |
+| `llamacpp` | `src/providers/llamacpp.js` | Raw fetch → `/v1/chat/completions` (OpenAI-compatible) |
+
+An unknown provider throws immediately at startup — fail-fast, not at request time.
+
+Both providers export the same interface: `complete(prompt, options)` and `completeStream(prompt, options)`.
+
+## Environment Variables
+
+| Variable | Default | Description |
+|---|---|---|
+| `PORT` | `3001` | Port to listen on |
+| `INFERENCE_PROVIDER` | `ollama` | `ollama` or `llamacpp` |
+| `INFERENCE_URL` | `http://localhost:11434` (Ollama) / `http://localhost:8080` (llama.cpp) | Backend URL |
+| `DEFAULT_MODEL` | Provider-specific | Model name passed to backend |
+
+`INFERENCE_URL` defaults differ per provider — Ollama uses the Ollama default URL, llama.cpp uses the llama-server default.
+
+## Options Resolution
+
+Both providers use `resolveOptions(options)` to merge caller-supplied options with `INFERENCE_DEFAULTS` from shared constants. Any option not supplied by the caller falls back to the constant.
+
+## Streaming Chunk Format
+
+The two providers yield differently shaped chunks — the route in `src/routes/inference.js` normalises them:
+
+**Ollama** yields raw Ollama generate chunks: `{ response, done, model, eval_count, prompt_eval_count, ... }`
+
+**llama.cpp** yields:
+- Per-token: `{ response: delta, done: false }`
+- Final: `{ response: '', done: true, model, tokenCount }` — token count is the sum of `completion_tokens + prompt_tokens` from the usage chunk
+
+The route checks `chunk.response` to stream text and `chunk.done` to capture metadata. For Ollama streaming, **token count is not captured** — the done chunk from Ollama contains `eval_count`/`prompt_eval_count` but the route only reads `chunk.tokenCount` (a llama.cpp field). Ollama streaming calls always report `tokenCount: 0` to the client.
+
+## Known Issue: `maxTokens` Missing from Streaming Route
+
+`POST /complete` correctly destructures `maxTokens` from the request body and passes it through. `POST /complete/stream` does **not** — it omits `maxTokens` from its destructuring, so streaming completions always use `INFERENCE_DEFAULTS.MAX_TOKENS` regardless of what the caller sends. This means `/chat/stream` has a different effective token ceiling than `/chat`.
+
+## SSE Format (route → caller)
+
+```
+data: {"response":"Hello"}        ← per token
+data: {"response":" world"}
+data: {"done":true,"model":"...","tokenCount":42}  ← final metadata
+data: [DONE]                       ← sentinel
+```
+
+## API Endpoints
+
+| Method | Path | Notes |
+|---|---|---|
+| GET | `/health` | Returns `{ service, status, provider, model }` |
+| POST | `/complete` | Body: `{ prompt, model?, temperature?, maxTokens?, topP?, topK?, repeatPenalty? }` |
+| POST | `/complete/stream` | Same body as `/complete` except `maxTokens` is silently ignored |
--- a/packages/inference-service/src/index.js
+++ b/packages/inference-service/src/index.js
@@ -1,10 +1,10 @@
 require ('dotenv').config();
 const express = require('express');
-const {getEnv, PORTS, OLLAMA} = require('@nexusai/shared');
+const {getEnv, PORTS, OLLAMA, logger} = require('@nexusai/shared');
 const inferenceRouter = require('./routes/inference');

 const app = express();
-app.use(express.json());
+app.use(express.json({ limit: '8mb' }));  // prompts include full context window

 const PORT      = getEnv('PORT', PORTS.INFERENCE);
 const PROVIDER  = getEnv('INFERENCE_PROVIDER',   'ollama');
@@ -24,5 +24,5 @@ app.use('/', inferenceRouter);

 // Start the server
 app.listen(PORT, () => {
-    console.log(`Inference Service is running on port ${PORT}`);
+    logger.info(`Inference Service is running on port ${PORT}`);
 });
--- a/packages/inference-service/src/providers/llamacpp.js
+++ b/packages/inference-service/src/providers/llamacpp.js
@@ -1,4 +1,4 @@
-const { getEnv, LLAMACPP, INFERENCE_DEFAULTS } = require("@nexusai/shared");
+const { getEnv, LLAMACPP, INFERENCE_DEFAULTS, logger } = require("@nexusai/shared");

 const BASE_URL = getEnv("INFERENCE_URL", LLAMACPP.DEFAULT_URL);
 const DEFAULT_MODEL = getEnv("DEFAULT_MODEL", LLAMACPP.DEFAULT_MODEL);
@@ -89,7 +89,7 @@ async function* completeStream(prompt, options = {}) {
    }
  }

-  console.log('[llamacpp] finalTokenCount:', finalTokenCount);
+  logger.info('[llamacpp] finalTokenCount:', finalTokenCount);

  yield { response: '', done: true, model: finalModel, tokenCount: finalTokenCount };
 }
--- a/packages/inference-service/src/providers/ollama.js
+++ b/packages/inference-service/src/providers/ollama.js
@@ -57,7 +57,16 @@ async function* completeStream(prompt, options = {} ) {
    });

    for await (const chunk of stream) {
-        yield chunk;
+        if (chunk.done) {
+            yield {
+                response:   '',
+                done:       true,
+                model:      chunk.model,
+                tokenCount: (chunk.eval_count ?? 0) + (chunk.prompt_eval_count ?? 0),
+            };
+        } else {
+            yield chunk;
+        }
    }
 }

--- a/packages/inference-service/src/routes/inference.js
+++ b/packages/inference-service/src/routes/inference.js
@@ -1,5 +1,6 @@
 const { Router } = require('express');
 const { complete, completeStream } = require('../infer');
+const { logger } = require('@nexusai/shared');

 const router = Router();

@@ -15,14 +16,14 @@ router.post('/complete', async (req, res) => {
        const result = await complete (prompt, {model, temperature, maxTokens, topP, topK, repeatPenalty});
        res.json(result);
    } catch (error) {
-        console.error('[Inference] Completion error:', error.message);
-        res.status(500).json({ error: error.message });
+        logger.error('[Inference] Completion error:', error.message);
+        res.status(500).json({ error: 'Inference failed', detail: error.message });
    }
 });

 // Streaming completion endpoint - sends partial responses as they arrive
 router.post('/complete/stream', async (req, res) => {
-    const { prompt, model, temperature, topP, topK, repeatPenalty } = req.body;
+    const { prompt, model, temperature, maxTokens, topP, topK, repeatPenalty } = req.body;

    if (!prompt) return res.status(400).json({ error: 'prompt is required' });

@@ -34,7 +35,7 @@ router.post('/complete/stream', async (req, res) => {
        let lastModel = model;
        let tokenCount = 0;

-        for await (const chunk of completeStream(prompt, { model, temperature, topP, topK, repeatPenalty })) {
+        for await (const chunk of completeStream(prompt, { model, temperature, maxTokens,topP, topK, repeatPenalty })) {
            if (chunk.response) {
                res.write(`data: ${JSON.stringify({ response: chunk.response })}\n\n`);
            }
@@ -42,7 +43,7 @@ router.post('/complete/stream', async (req, res) => {
                // capture final metadata from the done signal
                lastModel  = chunk.model      ?? lastModel;
                tokenCount = chunk.tokenCount ?? tokenCount;
-                console.log('[inference router] tokenCount from chunk:', chunk.tokenCount, '→', tokenCount);
+                logger.info('[inference router] tokenCount from chunk:', chunk.tokenCount, '→', tokenCount);
            }
        }

@@ -51,7 +52,7 @@ router.post('/complete/stream', async (req, res) => {
        res.write('data: [DONE]\n\n');

    } catch (err) {
-        console.error('[Inference] Streaming error:', err.message);
+        logger.error('[Inference] Streaming error:', err.message);
        res.write(`data: ${JSON.stringify({ error: err.message })}\n\n`);
    } finally {
        res.end();
--- a/packages/memory-service/CLAUDE.md
+++ b/packages/memory-service/CLAUDE.md
@@ -0,0 +1,114 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and the dual-store memory model.
+
+## Running This Service
+
+```bash
+npm run memory             # From repo root (node src/index.js)
+npm -w packages/memory-service run dev   # With --watch
+```
+
+Default port: **3002**. Requires Qdrant and the embedding-service to be reachable on startup.
+
+## SQLite Schema
+
+`src/db/schema.js` is the source of truth for the data model. Key schema facts:
+
+- `sessions` and `episodes` are linked by FK with cascade delete — deleting a session removes all its episodes automatically.
+- `episodes_fts` is an FTS5 virtual table that mirrors `user_message` and `ai_response`. It is kept in sync via SQL triggers on INSERT/UPDATE/DELETE. On service startup, the FTS index is fully rebuilt from live episode data.
+- Several columns (`sessions.name`, `sessions.project_id`, `entities.mention_count`, etc.) were added as migrations using `ALTER TABLE` wrapped in individual try-catch blocks. Failures are silently swallowed — if a column already exists, the alter fails and the service continues. The `idx_summaries_project` index is defined twice (benign duplicate).
+- `summaries` rows with `session_id IS NULL` and a `project_id` represent project-level overviews, not session summaries. This distinction is how `GET /projects/:id/overview` works.
+- `entity_episodes` is a join table linking entities to the episodes where they were first extracted. Used for provenance tracking and future orphan cleanup. Defined in `schema.js` (not a migration), so it exists on all installs.
+
+**New columns on `entities` (added via migration):**
+- `mention_count INTEGER DEFAULT 1` — incremented every time this entity is re-extracted
+- `confidence REAL DEFAULT 1.0` — reserved for future confidence scoring
+- `source TEXT DEFAULT 'extraction'` — `'extraction'` or `'manual'`
+- `last_seen_at INTEGER` — Unix timestamp of most recent extraction hit
+
+**New columns on `relationships` (added via migration):**
+- `mention_count INTEGER DEFAULT 1` — incremented every time this edge is re-extracted
+- `notes TEXT` — relationship context sentence from extraction
+
+## Async Pipeline: Episode Creation
+
+`POST /episodes` returns a 201 as soon as the SQLite insert succeeds. Two background tasks run after without blocking the response:
+
+1. **Embedding** — Fetches a vector from embedding-service, stores to Qdrant with `{sessionId, createdAt}` as payload metadata.
+2. **Entity + relationship extraction** — Sends the episode text to Ollama (`qwen2.5:3b`, temp 0.1, 1500 tokens) and upserts any recognized entities and relationships to both SQLite and Qdrant. Also links each entity to the episode via `entity_episodes`.
+
+Both tasks catch and log errors silently. An episode can exist in SQLite with no corresponding Qdrant point if either step fails.
+
+## Entity Extraction Details
+
+`src/entities/extraction.js`:
+
+- Fetches the last 20 known entities from SQLite before prompting the model, so the prompt can ask for name/type consistency with existing entries.
+- Recognized entity types: `person`, `place`, `project`, `technology`, `concept`, `organization` — anything else is discarded.
+- Ignores a hardcoded list of low-value names (`hello`, `thanks`, `good morning`, etc.).
+- Extracts JSON using a regex (`{...}`) applied to raw model output, so surrounding prose doesn't break parsing.
+- The model is asked to return both entities and relationships in a single JSON response: `{ "entities": [...], "relationships": [...] }`.
+- Entity upsert uses `ON CONFLICT(name, type) DO UPDATE` — preserves existing `notes` if the new extraction returns null, increments `mention_count`, updates `last_seen_at`.
+- Relationship upsert uses `ON CONFLICT(from_id, to_id, label) DO UPDATE` — increments `mention_count`, preserves existing `notes` if new is null.
+- Relationships are resolved by looking up both endpoints in the `entityMap` built during entity processing — if either entity wasn't saved (filtered out or invalid type), the relationship is silently dropped.
+- After upsert, embeds each entity as `"${name} (${type}): ${notes}"` and stores to Qdrant with `projectId` in the payload for project-scoped filtering.
+
+> For full details see `docs/services/entity-extraction.md` and `docs/services/knowledge-graph.md`.
+
+## Knowledge Graph
+
+`src/graph/index.js` provides two SQLite traversal functions:
+
+- **`getNeighborhood(entityId, depth)`** — Single-entity recursive CTE traversal. Bidirectional (follows edges in both directions). Returns `{ nodes: [...entities], edges: [...relationships] }`. Depth defaults to `ENTITIES.GRAPH_HOP_DEPTH` (1), max enforced to 3 at the HTTP layer.
+
+- **`getEntityNeighbors(entityIds[])`** — Bulk 1-hop version for orchestration. Given a set of seed entity IDs, returns their immediate neighbors plus all edges within the combined node set.
+
+The recursive CTE uses `UNION` (not `UNION ALL`) to eliminate cycles and duplicate visits automatically.
+
+> For full design rationale and usage see `docs/services/knowledge-graph.md`.
+
+## Summarization Strategy
+
+`src/summarization/project.js`:
+
+- Preferred path: generate a project overview from existing **session-level summaries** (higher-level abstraction, shorter input).
+- Fallback path: if no session summaries exist, summarize raw episodes directly (up to `SUMMARIES.MAX_PROJECT_EPISODE_LIMIT`).
+- Both paths truncate input at `SUMMARIES.MAX_SUMMARY_CHARS` (8,000 chars) by slicing from the end (most recent content wins).
+- Strips ChatML tokens from the Ollama response (`<|im_start|>`, `<|im_end|>`).
+- Uses temp 0.2 and `num_predict 1200`.
+
+## Qdrant Client
+
+`src/semantic/index.js` creates the Qdrant client lazily on first use and reuses it. All three collections (`episodes`, `entities`, `summaries`) are created at startup if missing. There is no connection health check — if Qdrant is unreachable, semantic operations throw at call time.
+
+## API Endpoints Quick Reference
+
+| Method | Path | Notes |
+|---|---|---|
+| GET | `/health` | Static response, no dependency checks |
+| GET/POST | `/sessions` | POST requires `externalId`; duplicate → 409 |
+| GET/PATCH | `/sessions/by-external/:externalId` | PATCH accepts `name`, `projectId` |
+| DELETE | `/sessions/by-external/:externalId` | Cascades to episodes, summaries, relationships |
+| GET/POST | `/episodes` | POST triggers async embedding + entity/relationship extraction |
+| GET | `/episodes/search` | FTS5 search; route must precede `/:id` |
+| GET | `/sessions/:id/episodes` | Paginated, ordered `created_at DESC` |
+| DELETE | `/episodes/:id` | Removes from SQLite + async Qdrant delete |
+| POST | `/entities` | Upsert by `(name, type)`; increments `mention_count` on conflict |
+| GET | `/entities/by-type/:type` | All entities of given type |
+| GET/DELETE | `/entities/:id` | |
+| POST | `/relationships` | Upsert by `(fromId, toId, label)`; increments `mention_count` on conflict. Body: `fromId`, `toId`, `label`, `notes` (optional) |
+| GET | `/entities/:id/relationships` | Outbound only |
+| DELETE | `/relationships` | Body: `fromId`, `toId`, `label` |
+| GET | `/graph/neighborhood/:entityId` | Single-entity neighborhood; `?depth=` (default 1, max 3) |
+| POST | `/graph/neighbors` | Bulk 1-hop neighborhood; body: `{ entityIds: [...] }` |
+| GET/POST | `/projects` | POST requires non-empty `name` |
+| GET/PATCH/DELETE | `/projects/:id` | |
+| POST | `/projects/:id/summarize` | On-demand overview generation; 422 if no data |
+| GET | `/projects/:id/overview` | Returns null (not 404) if no overview exists |
+| GET | `/projects/:id/summaries` | All summaries for project |
+| POST | `/summaries` | Requires `content` + at least one of `sessionId`/`projectId` |
+| GET | `/sessions/:id/summaries` | |
+| PATCH/DELETE | `/summaries/:id` | |
--- a/packages/memory-service/src/db/index.js
+++ b/packages/memory-service/src/db/index.js
@@ -1,6 +1,6 @@
 const Database = require('better-sqlite3');
 const schema = require('./schema');
-const {getEnv, SQLITE } = require('@nexusai/shared');
+const {getEnv, SQLITE, logger } = require('@nexusai/shared');

 let db;  // Declare db variable in a scope accessible to all functions

@@ -54,15 +54,20 @@ function getDB() {
            db.exec(`CREATE INDEX IF NOT EXISTS idx_summaries_session ON summaries(session_id)`);
        } catch {}

-        try {
-            db.exec(`CREATE INDEX IF NOT EXISTS idx_summaries_project ON summaries(project_id)`);
-        } catch {}
+        try { db.exec(`ALTER TABLE entities ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
+        try { db.exec(`ALTER TABLE entities ADD COLUMN confidence REAL NOT NULL DEFAULT 1.0`) } catch {}
+        try { db.exec(`ALTER TABLE entities ADD COLUMN source TEXT NOT NULL DEFAULT 'extraction'`) } catch {}
+        try { db.exec(`ALTER TABLE entities ADD COLUMN last_seen_at INTEGER`) } catch {}
+
+        try { db.exec(`ALTER TABLE relationships ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
+        try { db.exec(`ALTER TABLE relationships ADD COLUMN notes TEXT`) } catch {}
+
        
        // Sync FTS index with any existing episodes data
        db.exec(`INSERT OR REPLACE INTO episodes_fts(rowid, user_message, ai_response) 
            SELECT id, user_message, ai_response FROM episodes`);

-        console.log(`Connected to SQLite database at ${path}`);
+        logger.info(`Connected to SQLite database at ${path}`);
    }
    return db;
 }
--- a/packages/memory-service/src/db/schema.js
+++ b/packages/memory-service/src/db/schema.js
@@ -38,6 +38,20 @@ const schema = `
    UNIQUE(from_id, to_id, label)
  );

+  CREATE INDEX IF NOT EXISTS idx_relationships_from ON relationships(from_id);
+  CREATE INDEX IF NOT EXISTS idx_relationships_to   ON relationships(to_id);
+
+  CREATE TABLE IF NOT EXISTS entity_episodes (
+    entity_id  INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
+    episode_id INTEGER NOT NULL REFERENCES episodes(id) ON DELETE CASCADE,
+    PRIMARY KEY (entity_id, episode_id)
+  );
+
+  CREATE INDEX IF NOT EXISTS idx_entity_episodes_entity  ON entity_episodes(entity_id);
+  CREATE INDEX IF NOT EXISTS idx_entity_episodes_episode ON entity_episodes(episode_id);
+  
+
+
  CREATE TABLE IF NOT EXISTS projects (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    name        TEXT NOT NULL,
--- a/packages/memory-service/src/entities/extraction.js
+++ b/packages/memory-service/src/entities/extraction.js
@@ -1,14 +1,18 @@
 const semantic = require('../semantic')
-const { getEnv, SERVICES, formatEpisodeText, ENTITIES } = require('@nexusai/shared');
-const { upsertEntity } = require('./index');
+const { getEnv, SERVICES, formatEpisodeText, ENTITIES, logger } = require('@nexusai/shared');
+const { upsertEntity, upsertRelationship, linkEntityToEpisode } = require('./index');

 const EXTRACTION_URL = getEnv('EXTRACTION_URL', 'http://localhost:11434');
-const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b');
+const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b'); // ChatML format — see buildExtractionPrompt
 const EMBEDDING_SERVICE_URL = getEnv('EMBEDDING_SERVICE_URL', SERVICES.EMBEDDING_URL);

-const ENTITY_TYPES = ['person', 'place', 'project', 'technology', 'concept', 'organization'];
+const ENTITY_TYPES = ENTITIES.TYPES;
 const IGNORED_NAMES = ['good morning', 'good night', 'hello', 'goodbye', 'thanks', 'thank you'];

+// NOTE: This prompt uses ChatML format (<|im_start|> / <|im_end|> tags), which is
+// specific to qwen-family models. If EXTRACTION_MODEL is changed to a Llama-family
+// or other model, this format will need to change — most alternatives use either
+// plain text or [INST] / <<SYS>> tags. Silent degradation is likely if mismatched.
 function buildExtractionPrompt(userMessage, aiResponse, knownEntities = []) {
    const knownBlock = knownEntities.length > 0
        ? [
@@ -20,21 +24,24 @@ function buildExtractionPrompt(userMessage, aiResponse, knownEntities = []) {

    return [
        '<|im_start|>system',
-        'You are a named entity extractor. You output only valid JSON.',
+        'You are a named entity and relationship extractor. You output only valid JSON.',
        '<|im_end|>',
        '<|im_start|>user',
-        'Read the conversation below and extract every named entity mentioned.',
-        `Entity types to extract: ${ENTITY_TYPES.join(', ')}`,
-        'For each entity found, provide: name, type, and a one-sentence notes field.',
-        'Return your answer as: { "entities": [ ... ] }',
-        'For each entity found, you MUST provide a non-empty notes field describing it based on the conversation.',
-        'For each entity found, provide:',
-        '  "name": short proper noun only (max 4 words, e.g. "Sydney", "NexusAI", "Tim")',
+        'Read the conversation below and extract all named entities and the relationships between them.',
+        `Entity types: ${ENTITY_TYPES.join(', ')}`,
+        'Use "character" for any fictional, game, or media characters (e.g. characters from anime, games, books, TV shows, movies)',
+        'Use "person" only for real people',
+        'For each entity provide:',
+        '  "name": short proper noun only (max 4 words)',
        '  "type": one of the valid types',
-        '  "notes": one specific sentence about this entity based on the conversation (not generic)',
+        '  "notes": one specific sentence about this entity based on the conversation',
+        'For relationships, use snake_case verb labels (e.g. works_on, manages, uses, knows, located_in, part_of, created_by).',
+        'Only include relationships between entities you have listed above.',
+        'Return this exact JSON structure:',
+        '{ "entities": [{"name": "...", "type": "...", "notes": "..."}], "relationships": [{"from": "...", "fromType": "...", "to": "...", "toType": "...", "label": "...", "notes": "..."}] }',
        '',
        knownBlock,
-        '--- CONVERSATION ---',   // clear delimiter helps smaller models
+        '--- CONVERSATION ---',
        `User: ${userMessage}`,
        `Assistant: ${aiResponse}`,
        '--- END CONVERSATION ---',
@@ -58,8 +65,8 @@ async function embedEntity(entity) {
    return data.embedding;
 }

-async function extractAndStoreEntities(userMessage, aiResponse, projectId=null) {
-    console.log('[entities] Extraction triggered')
+async function extractAndStoreEntities(userMessage, aiResponse, episodeId=null, projectId=null) {
+    logger.info('[entities] Extraction triggered')
    try {
        // Fetch existing entities to guide the model toward consistent name/type pairs
        const db = require('../db').getDB();
@@ -80,6 +87,7 @@ async function extractAndStoreEntities(userMessage, aiResponse, projectId=null)
                    num_predict: ENTITIES.NUM_PREDICT,
                },
            }),
+            signal: AbortSignal.timeout(60_000),
        });

        if (!res.ok) throw new Error(`Ollama responded ${res.status}`);
@@ -87,27 +95,41 @@ async function extractAndStoreEntities(userMessage, aiResponse, projectId=null)
        const data = await res.json();
        const raw = data.response?.trim() ?? '';

-        const parsed = JSON.parse(raw);
-        const entities = Array.isArray(parsed.entities) ? parsed.entities : [];
-        if (entities.length === 0) {
-            console.log('[entities] No entities found in this exchange — skipping');
-            return;  // not an error, just nothing to extract
+        const jsonMatch = raw.match(/\{[\s\S]*\}/);
+        if (!jsonMatch) {
+            logger.warn('[entities] No JSON object found in response');
+            logger.debug('[entities] Raw response was:', raw);
+            return;
        }

-        if (!Array.isArray(entities)) throw new Error('Response was not a JSON array');
+        let parsed;
+        try {
+            parsed = JSON.parse(jsonMatch[0]);
+        } catch (err) {
+            logger.warn('[entities] Failed to parse extraction response:', err.message);
+            logger.debug('[entities] Raw response was:', raw);
+            return;
+        }
+        const entities = Array.isArray(parsed.entities) ? parsed.entities : [];
+        if (entities.length === 0) {
+            logger.debug('[entities] No entities found in this exchange — skipping');
+            return;
+        }

+        // Map of "name::type" → saved entity, used for relationship resolution below
+        const entityMap = new Map();
        let saved = 0;
-        

        for (const { name, type, notes } of entities) {
-            
            if (!name || !type || !ENTITY_TYPES.includes(type)) continue;
            if (IGNORED_NAMES.includes(name.toLowerCase())) continue;

            const entity = upsertEntity(name, type, notes ?? null);
-            console.log('[entities] Upserted entity:', entity);
+            entityMap.set(`${name}::${type}`, entity);
+            logger.info('[entities] Upserted entity:', entity);
+
+            if (episodeId) linkEntityToEpisode(entity.id, episodeId);

-            // Embed and upsert to Qdrant fire-and-forget
            embedEntity(entity)
                .then(vector => semantic.upsertEntity(entity.id, vector, {
                    name: entity.name,
@@ -116,17 +138,34 @@ async function extractAndStoreEntities(userMessage, aiResponse, projectId=null)
                    projectId: projectId ?? null,
                }))
                .catch(err => {
-                    console.warn(`[entities] Failed to embed entity "${entity.name}":`, err.message);
+                    logger.warn(`[entities] Failed to embed entity "${entity.name}":`, err.message);
                });

            saved++;
        }

-        if (saved > 0) console.log(`[entities] Extracted and stored ${saved} entities`);
+        if (saved > 0) logger.info(`[entities] Extracted and stored ${saved} entities`);
+
+        // Process extracted relationships — both entities must have been saved above
+        const relationships = Array.isArray(parsed.relationships) ? parsed.relationships : [];
+        let relSaved = 0;
+
+        for (const { from, fromType, to, toType, label, notes } of relationships) {
+            if (!from || !fromType || !to || !toType || !label) continue;
+
+            const fromEntity = entityMap.get(`${from}::${fromType}`);
+            const toEntity   = entityMap.get(`${to}::${toType}`);
+            if (!fromEntity || !toEntity) continue;
+
+            upsertRelationship(fromEntity.id, toEntity.id, label, notes ?? null);
+            relSaved++;
+        }
+
+        if (relSaved > 0) logger.info(`[entities] Extracted and stored ${relSaved} relationships`);

    } catch (err) {
        // Non-critical — log and move on, episode is already saved
-        console.warn('[entities] Extraction failed:', err.message);
+        logger.warn('[entities] Extraction failed:', err.message);
    }
 }

--- a/packages/memory-service/src/entities/index.js
+++ b/packages/memory-service/src/entities/index.js
@@ -4,18 +4,23 @@ const { parseRow } = require ('@nexusai/shared')
 /******* Entities ********/

 // Upsert an entity - insert or update if (name, type) already exists
-function upsertEntity(name, type, notes = null, metadata = null) {
+function upsertEntity(name, type, notes = null, metadata = null, source = 'extraction') {
    const db = getDB();
-    const stmt = db.prepare(`
-        INSERT INTO entities (name, type, notes, metadata)
-        VALUES (?, ?, ?, ?)
-        ON CONFLICT(name, type) DO UPDATE SET
-            notes      = COALESCE(entities.notes, excluded.notes),
-            metadata   = excluded.metadata,
-            updated_at = unixepoch()
-    `);
-    const result = stmt.run(name, type, notes, metadata ? JSON.stringify(metadata) : null);
-
+const stmt = db.prepare(`
+    INSERT INTO entities (name, type, notes, metadata, source, last_seen_at)
+    VALUES (?, ?, ?, ?, ?, unixepoch())
+    ON CONFLICT(name, type) DO UPDATE SET
+        -- First extraction wins: notes are never overwritten once set.
+        -- Revisit during Memory Consolidation Lifecycle (Phase 2) — once entity
+        -- quality scoring exists, a higher-confidence extraction should be able
+        -- to replace stale notes rather than being silently dropped.
+        notes = COALESCE(entities.notes, excluded.notes),
+        metadata      = excluded.metadata,
+        mention_count = entities.mention_count + 1,
+        last_seen_at  = unixepoch(),
+        updated_at    = unixepoch()
+`);
+    stmt.run(name, type, notes, metadata ? JSON.stringify(metadata) : null, source);
    return getEntityByNameType(name, type);
 }

@@ -40,15 +45,17 @@ function deleteEntity(id) {
 /********* Relationships *********/

 // Upsert a relationship, insert or ignore if (from_id, to_id, label) already exists
-function upsertRelationship(fromId, toId, label, metadata = null){
+function upsertRelationship(fromId, toId, label, notes = null, metadata = null) {
    const db = getDB();
    const stmt = db.prepare(`
-        INSERT INTO relationships (from_id, to_id, label, metadata)
-        VALUES (?, ?, ?, ?)
-        ON CONFLICT(from_id, to_id, label) DO NOTHING
+        INSERT INTO relationships (from_id, to_id, label, notes, metadata)
+        VALUES (?, ?, ?, ?, ?)
+        ON CONFLICT(from_id, to_id, label) DO UPDATE SET
+            mention_count = relationships.mention_count + 1,
+            -- First extraction wins for notes — same policy as entities.
+            notes         = COALESCE(relationships.notes, excluded.notes)
    `);
-
-    const result = stmt.run(fromId, toId, label, metadata ?JSON.stringify(metadata) : null);
+    stmt.run(fromId, toId, label, notes, metadata ? JSON.stringify(metadata) : null);
    return getRelationship(fromId, toId, label);
 }

@@ -69,7 +76,7 @@ function getEntityByNameType(name, type) {
 }

 // Retrive all relationships originating from a given entity
-function getRelationshipsByEntity(entityId) {
+function getOutboundRelationships(entityId) {
    const db = getDB();
    return db.prepare(`SELECT * FROM relationships WHERE from_id = ?`).all(entityId).map(parseRow);
 }
@@ -81,14 +88,23 @@ function deleteRelationship(fromId, toId, label) {
    db.prepare(`DELETE FROM relationships WHERE from_id = ? AND to_id = ? AND label = ?`).run(fromId, toId, label);
 }   

+function linkEntityToEpisode(entityId, episodeId) {
+    const db = getDB();
+    db.prepare(`
+        INSERT OR IGNORE INTO entity_episodes (entity_id, episode_id)
+        VALUES (?, ?)
+    `).run(entityId, episodeId);
+}
+
 module.exports = {
    upsertEntity,
    getEntity,
    getEntitiesByType,
    getEntityByNameType,
    deleteEntity,
+    linkEntityToEpisode,
    upsertRelationship,
    getRelationship,
-    getRelationshipsByEntity,
+    getOutboundRelationships,
    deleteRelationship
 }
--- a/packages/memory-service/src/episodic/index.js
+++ b/packages/memory-service/src/episodic/index.js
@@ -1,5 +1,5 @@
 const {getDB} = require('../db');
-const { EPISODIC, getEnv, SERVICES, parseRow, formatEpisodeText, SUMMARIES } = require('@nexusai/shared');
+const { EPISODIC, getEnv, SERVICES, parseRow, formatEpisodeText, SUMMARIES, logger } = require('@nexusai/shared');
 const semantic = require('../semantic');
 const { extractAndStoreEntities } = require('../entities/extraction')

@@ -125,10 +125,10 @@ async function createEpisode(sessionId, userMessage, aiResponse, tokenCount = nu
      sessionId: episode.session_id,
      createdAt: episode.created_at
    }))
-    .catch(err => console.error(`Failed to embed episode ${episode.id}:`, err.message));
+    .catch(err => logger.error(`Failed to embed episode ${episode.id}:`, err.message));

-  extractAndStoreEntities(userMessage, aiResponse, projectId)
-    .catch(err => console.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
+  extractAndStoreEntities(userMessage, aiResponse, episode.id, projectId)
+    .catch(err => logger.error(`Failed to extract entities for episode ${episode.id}:`, err.message));


  return episode;
@@ -163,22 +163,32 @@ function getRecentEpisodes(sessionId, limit = EPISODIC.DEFAULT_RECENT_LIMIT) {
    ORDER BY created_at DESC
    LIMIT ?
  `);
-  return stmt.all(limit).map(parseRow);
+  return stmt.all(sessionId, limit).map(parseRow);
 }


 // Searches episodes using FTS5 full-text search, ordered by relevance, with a limit
-function searchEpisodes(query, limit = EPISODIC.DEFAULT_SEARCH_LIMIT) {
-  // FTS5 full-text search across all episodes
+function searchEpisodes(query, limit = EPISODIC.DEFAULT_SEARCH_LIMIT, sessionIds = null) {
  const db = getDB();
-  const stmt = db.prepare(`
+  const safeQuery = `"${query.replace(/"/g, '""')}"`;
+  if (sessionIds && sessionIds.length > 0) {
+    const ph = sessionIds.map(() => '?').join(',');
+    return db.prepare(`
+      SELECT e.* FROM episodes e
+      JOIN episodes_fts fts ON e.id = fts.rowid
+      WHERE episodes_fts MATCH ?
+      AND e.session_id IN (${ph})
+      ORDER BY rank
+      LIMIT ?
+    `).all(safeQuery, ...sessionIds, limit).map(parseRow);
+  }
+  return db.prepare(`
    SELECT e.* FROM episodes e
    JOIN episodes_fts fts ON e.id = fts.rowid
    WHERE episodes_fts MATCH ?
    ORDER BY rank
    LIMIT ?
-  `);
-  return stmt.all(query, limit).map(parseRow);
+  `).all(safeQuery, limit).map(parseRow);
 }

 // Deletes an episode by its ID
@@ -197,7 +207,8 @@ async function getEpisodeEmbedding(userMessage, aiResponse){
  const res = await fetch(`${url}/embed`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
-    body: JSON.stringify({ text })  
+    body: JSON.stringify({ text }),  
+    signal: AbortSignal.timeout(30_000),
  })

  if (!res.ok) {
--- a/packages/memory-service/src/graph/index.js
+++ b/packages/memory-service/src/graph/index.js
@@ -0,0 +1,77 @@
+const { getDB } = require('../db');
+const { parseRow, ENTITIES } = require('@nexusai/shared');
+
+// Single-entity neighborhood via recursive CTE — bidirectional, configurable depth
+function getNeighborhood(entityId, depth = ENTITIES.GRAPH_HOP_DEPTH) {
+    const db = getDB();
+
+    const nodeRows = db.prepare(`
+        WITH RECURSIVE traverse(entity_id, depth) AS (
+            SELECT ?, 0
+            UNION
+            SELECT
+                CASE WHEN r.from_id = t.entity_id THEN r.to_id ELSE r.from_id END,
+                t.depth + 1
+            FROM relationships r
+            JOIN traverse t ON (r.from_id = t.entity_id OR r.to_id = t.entity_id)
+            WHERE t.depth < ?
+        )
+        SELECT DISTINCT entity_id FROM traverse
+    `).all(entityId, depth);
+
+    const nodeIds = nodeRows.map(r => r.entity_id);
+    if (nodeIds.length === 0) return { nodes: [], edges: [] };
+
+    const ph = nodeIds.map(() => '?').join(',');
+    const nodes = db.prepare(
+        `SELECT * FROM entities WHERE id IN (${ph})`
+    ).all(...nodeIds).map(parseRow);
+
+    const edges = db.prepare(
+        `SELECT * FROM relationships WHERE from_id IN (${ph}) AND to_id IN (${ph})`
+    ).all(...nodeIds, ...nodeIds).map(parseRow);
+
+    return { nodes, edges };
+}
+
+// Bulk 1-hop neighborhood for orchestration — seeds are entity IDs from Qdrant search
+function getEntityNeighbors(entityIds) {
+    if (!entityIds.length) return { nodes: [], edges: [] };
+    const db = getDB();
+
+    const ph = entityIds.map(() => '?').join(',');
+
+    // entityIds appears three times — once for the CASE (finding the neighbor),
+    // and once each for the FROM and TO sides of the WHERE clause
+    const neighborRows = db.prepare(`
+        SELECT DISTINCT
+            CASE WHEN from_id IN (${ph}) THEN to_id ELSE from_id END AS entity_id
+        FROM relationships
+        WHERE from_id IN (${ph}) OR to_id IN (${ph})
+    `).all(...entityIds, ...entityIds, ...entityIds);
+
+    const allIds = [...new Set([...entityIds, ...neighborRows.map(r => r.entity_id)])];
+    const allPh = allIds.map(() => '?').join(',');
+
+    const nodes = db.prepare(
+        `SELECT * FROM entities WHERE id IN (${allPh})`
+    ).all(...allIds).map(parseRow);
+
+    const edges = db.prepare(
+        `SELECT * FROM relationships WHERE from_id IN (${allPh}) AND to_id IN (${allPh})`
+    ).all(...allIds, ...allIds).map(parseRow);
+
+    return { nodes, edges };
+}
+
+// Returns episode IDs linked to any of the given entity IDs via entity_episodes
+function getEpisodeIdsByEntities(entityIds) {
+    if (!entityIds.length) return [];
+    const db = getDB();
+    const ph = entityIds.map(() => '?').join(',');
+    return db.prepare(
+        `SELECT DISTINCT episode_id FROM entity_episodes WHERE entity_id IN (${ph})`
+    ).all(...entityIds).map(r => r.episode_id);
+}
+
+module.exports = { getNeighborhood, getEntityNeighbors, getEpisodeIdsByEntities };
--- a/packages/memory-service/src/index.js
+++ b/packages/memory-service/src/index.js
@@ -1,17 +1,18 @@
 require ('dotenv').config();
 const express = require('express');
-const {getEnv, PORTS, EPISODIC} = require('@nexusai/shared');
+const {getEnv, PORTS, EPISODIC, logger} = require('@nexusai/shared');
 const { getDB } = require('./db');
 const { createProject, getProjects, getProject, updateProject, deleteProject } = require('./db/projects');
 const { createSummary, getSummary, getSummariesBySession, getSummariesByProject, updateSummary, deleteSummary } = require('./db/summaries');
 const { generateAndStoreProjectSummary } = require('./summarization/project');
+const graph = require('./graph');

 const episodic = require('./episodic');
 const semantic = require('./semantic');
 const entities = require('./entities');

 const app = express();
-app.use(express.json());
+app.use(express.json({ limit: '2mb' }));

 const  PORT = getEnv('PORT', PORTS.MEMORY);

@@ -19,8 +20,8 @@ const  PORT = getEnv('PORT', PORTS.MEMORY);
 const db = getDB();

 semantic.initCollections()
-    .then(() => console.log(`QDrant collections ready`))
-    .catch(err => console.error(`QDrant initialization error:`, err.message));
+    .then(() => logger.info(`QDrant collections ready`))
+    .catch(err => logger.error(`QDrant initialization error:`, err.message));

 // Health check endpoint
 app.get('/health', (req, res) => {
@@ -80,7 +81,7 @@ app.patch('/sessions/by-external/:externalId', (req, res) => {
    const session = episodic.updateSessionByExternalId(req.params.externalId, {name, projectId });
    res.json(session);
  } catch (err) {
-    res.status(500).json({error: err.message });
+    res.status(500).json({ error: 'Failed to update session', detail: err.message });
  }
 });

@@ -130,10 +131,12 @@ app.get('/episodes', (req, res) => {

 // Search MUST come before /:id — otherwise 'search' gets captured as an id
 app.get('/episodes/search', (req, res) => {
-  const { q, limit = EPISODIC.DEFAULT_PAGE_SIZE } = req.query;
+  const { q, limit = EPISODIC.DEFAULT_PAGE_SIZE, sessionIds } = req.query;
  if (!q) return res.status(400).json({ error: 'q (query) parameter is required' });
-  const results = episodic.searchEpisodes(q, Number(limit));
-  res.json(results);
+  const parsedSessionIds = sessionIds
+    ? sessionIds.split(',').map(Number).filter(Boolean)
+    : null;
+  res.json(episodic.searchEpisodes(q, Number(limit), parsedSessionIds));
 });

 app.get('/episodes/:id', (req, res) => {
@@ -158,7 +161,7 @@ app.delete('/episodes/:id', (req, res) => {
  episodic.deleteEpisode(id);

  semantic.deleteEpisode(id)  // fire-and-forget
-    .catch(err => console.error(`[Memory] Qdrant delete failed for episode ${id}:`, err.message));
+    .catch(err => logger.error(`[Memory] Qdrant delete failed for episode ${id}:`, err.message));

  res.status(204).send();
 });
@@ -202,17 +205,17 @@ app.delete('/entities/:id', (req, res) => {

 // Upsert a relationship between two entities
 app.post('/relationships', (req, res) => {
-  const {fromId, toId, label, metadata } = req.body;
+  const { fromId, toId, label, notes, metadata } = req.body;
  if (!fromId || !toId || !label) {
    return res.status(400).json({ error: 'fromId, toId and label are required' });
  }
-  const relationship = entities.upsertRelationship(fromId, toId, label, metadata);
+  const relationship = entities.upsertRelationship(fromId, toId, label, notes, metadata);
  res.status(201).json(relationship);
 });

 // Get all relationships for a given entity ID
 app.get('/entities/:id/relationships', (req, res) => {
-  res.json(entities.getRelationshipsByEntity(req.params.id));
+  res.json(entities.getOutboundRelationships(req.params.id));
 });

 // Delete a specific relationship
@@ -225,6 +228,37 @@ app.delete('/relationships', (req, res) => {
  res.status(204).send();
 })

+/********************************* */
+/********** Graph Routes ********** */
+/********************************* */
+
+// Single-entity neighborhood — depth defaults to ENTITIES.GRAPH_HOP_DEPTH
+app.get('/graph/neighborhood/:entityId', (req, res) => {
+    const entity = entities.getEntity(req.params.entityId);
+    if (!entity) return res.status(404).json({ error: 'Entity not found' });
+
+    const depth = req.query.depth ? Math.min(Number(req.query.depth), 3) : undefined;
+    const neighborhood = graph.getNeighborhood(Number(req.params.entityId), depth);
+    res.json({ entity, neighborhood });
+});
+
+// Bulk 1-hop neighborhood — body: { entityIds: [...] }
+app.post('/graph/neighbors', (req, res) => {
+    const { entityIds } = req.body;
+    if (!Array.isArray(entityIds) || entityIds.length === 0) {
+        return res.status(400).json({ error: 'entityIds array is required' });
+    }
+    res.json(graph.getEntityNeighbors(entityIds.map(Number)));
+});
+
+app.post('/episodes/by-entities', (req, res) => {
+    const { entityIds } = req.body;
+    if (!Array.isArray(entityIds) || entityIds.length === 0) {
+        return res.status(400).json({ error: 'entityIds array is required' });
+    }
+    res.json({ episodeIds: graph.getEpisodeIdsByEntities(entityIds.map(Number)) });
+});
+
 /*********************************** */
 /********** Project Routes ********** */
 /*********************************** */
@@ -235,7 +269,7 @@ app.post('/projects', (req, res) => {
  try {
    res.status(201).json(createProject({ name: name.trim(), description, colour, icon }));
  } catch (err) {
-    res.status(500).json({ error: err.message });
+    res.status(500).json({ error: 'Failed to create project', detail: err.message });
  }
 });

@@ -255,7 +289,7 @@ app.post('/projects/:id/summarize', async (req, res) => {
        if (err.message.includes('No session summaries or episodes')) {
            return res.status(422).json({ error: err.message });
        }
-        res.status(500).json({ error: err.message });
+        res.status(500).json({ error: 'Failed to generate project summary', detail: err.message });
    }
 });

@@ -310,7 +344,7 @@ app.post('/summaries', (req, res) => {
        const summary = createSummary({ sessionId, projectId, content, tokenCount, episodeRange, metadata });
        res.status(201).json(summary);
    } catch (err) {
-        res.status(500).json({ error: err.message });
+        res.status(500).json({ error: 'Failed to create summary', detail: err.message });
    }
 });

@@ -338,5 +372,5 @@ app.delete('/summaries/:id', (req, res) => {
 /********** Start Server ********** */
 /********************************** */
 app.listen(PORT, () => {
-    console.log(`Memory Service is running on port ${PORT}`);
+    logger.info(`Memory Service is running on port ${PORT}`);
 });
--- a/packages/memory-service/src/semantic/index.js
+++ b/packages/memory-service/src/semantic/index.js
@@ -1,5 +1,5 @@
 const {QdrantClient} = require('@qdrant/js-client-rest');
-const {QDRANT, COLLECTIONS, getEnv} = require('@nexusai/shared');
+const {QDRANT, COLLECTIONS, getEnv, logger} = require('@nexusai/shared');

 let client;

@@ -24,9 +24,9 @@ async function initCollections() {
            distance: QDRANT.DISTANCE_METRIC
        }
      });
-      console.log(`Created Qdrant collection: ${name}`);
+      logger.info(`Created Qdrant collection: ${name}`);
    } else {
-      console.log(`Qdrant collection already exists: ${name}`);
+      logger.info(`Qdrant collection already exists: ${name}`);
    }
  }
 }
--- a/packages/orchestration-service/CLAUDE.md
+++ b/packages/orchestration-service/CLAUDE.md
@@ -0,0 +1,156 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and the end-to-end chat flow.
+
+## Running This Service
+
+```bash
+npm run orchestration             # From repo root (node src/index.js)
+npm -w packages/orchestration-service run dev   # With --watch
+```
+
+Default port: **4000**. Depends on memory-service, embedding-service, inference-service, and Qdrant.
+
+## Context Assembly (`src/chat/index.js`)
+
+`assembleContext(externalId, userMessage)` is the core function that builds the inference prompt. Order of operations:
+
+1. Resolve session by `externalId` (creates it if missing — every chat call is self-healing).
+2. If session has a `project_id`, load the project and fetch all sibling sessions (via `getProjectSessions`, hardcoded `limit=200`).
+3. Fetch `recentEpisodeLimit` recent episodes from memory-service.
+4. Embed the user message; search Qdrant EPISODES with `scoreThreshold`:
+   - No project: `must: [sessionId == this session]`
+   - Project: `should: [sessionId == s1, sessionId == s2, ...]` across all project sessions
+   - Dedup against recent episode IDs before including.
+5. Run **fused episode retrieval** via `getFusedEpisodes` — Qdrant semantic search and FTS5 keyword search run in parallel, both filtered against `recentIds`, then merged via Reciprocal Rank Fusion (RRF). If `keywordWeight` is `0`, the FTS call is skipped. Returns top `semanticLimit` episodes by fused score.
+6. Embed and search Qdrant ENTITIES (filtered by `projectId` if in a project). Returns entity IDs alongside payload — the Qdrant point ID equals the SQLite entity ID.
+7. Expand matched entities into a 1-hop graph neighborhood via `POST /graph/neighbors` on the memory-service. Returns `{ nodes, edges }` — the full entity objects plus connecting relationships. Falls back to flat entity list (no edges) if the graph call fails.
+8. Build prompt in this fixed order: **system prompt → graph context → fused episodes → recent episodes → user message → "Assistant:"**
+
+The ordering prioritizes established facts (graph context) and relevant past context (semantic) over pure recency.
+
+## Graph Context Format
+
+`formatGraphContext(nodes, edges)` in `src/chat/index.js` formats the neighborhood as:
+
+```
+- Alice (person): software engineer working on NexusAI
+  → works_on NexusAI (project)
+  → knows Bob (person)
+- NexusAI (project): AI assistant framework
+- Bob (person): Alice's colleague
+```
+
+Each node shows its notes on the first line. Outbound edges are indented below with `→ label target (type)`. Nodes with only inbound edges (neighbors pulled in by traversal) appear without connection lines.
+
+## System Prompt Resolution
+
+Priority from highest to lowest:
+1. `project.system_prompt` (stored on the project row in memory-service)
+2. `settings.systemPrompt` (saved in `data/settings.json`)
+3. `ORCHESTRATION.SYSTEM_PROMPT` (shared constants fallback)
+
+## Settings (`src/config/settings.js`)
+
+Settings are loaded from `data/settings.json` merged with defaults at every `GET /settings` call. `PATCH /settings` validates each field individually with specific constraints:
+
+| Field | Constraint |
+|---|---|
+| `recentEpisodeLimit` | integer, 1–20 |
+| `semanticLimit` | integer, 1–20 |
+| `scoreThreshold` | number, 0–1 |
+| `temperature` | number, 0–2 |
+| `repeatPenalty` | number, 1–2 |
+| `topP` | number, 0–1 |
+| `topK` | integer, 1–100 |
+| `modelsFolderPath` | path must exist and be readable |
+| `systemPrompt` | string (trimmed); `null` reverts to shared default |
+
+`data/settings.json` is created on first save. Parent directories are created if missing.
+
+## Streaming SSE (`src/chat/index.js` — `chatStream`)
+
+The route sets SSE headers and delegates to `chatStream`, which:
+1. Calls `inference.completeStream()` → receives a raw HTTP Response with a readable body.
+2. Reads the body in chunks, buffers across chunk boundaries, splits on `\n\n`.
+3. For each event line starting with `data: `, parses the JSON and calls `onChunk(data.response)`.
+4. The `[DONE]` sentinel (used by some llama-server versions) is explicitly ignored.
+5. After stream ends, saves the assembled full response as an episode (same as non-streaming).
+
+If a chunk parse fails the error is logged and the stream continues. If the response body closes with no text accumulated, the episode is not saved (logged as warning).
+
+## Fire-and-Forget Tasks
+
+After every successful chat turn:
+- **Summarization** (`services/summarization.js` → `triggerSummary`): checks token threshold → recency guard → calls Ollama → POSTs to memory-service. Only runs if `SUMMARIES.THRESHOLD_TOKENS` is exceeded AND at least `SUMMARIES.MIN_EPISODES_SINCE` new episodes have occurred since the last summary.
+- **Auto-naming** (`chat/index.js` → `autoNameSession`): only fires on the first message of a session. Uses temp 0.3, `maxTokens=20`, prompts for a ≤5-word title.
+
+Both tasks catch all errors and log warnings without surfacing to the client.
+
+## Summarization Recency Guard
+
+`src/services/summarization.js` reads the `episode_range` field of the latest existing summary (format: `"<startId>-<endId>"`). It counts SQLite episodes with `id > endId`; if fewer than `SUMMARIES.MIN_EPISODES_SINCE`, it skips. This prevents rapid re-summarization on high-traffic sessions.
+
+When the existing summary's token count exceeds `SUMMARIES.MAX_SUMMARY_TOKENS`, it is treated as "expired" — a fresh summary is generated instead of an incremental update.
+
+## Qdrant Calls (Direct, Not Via Memory-Service)
+
+`src/services/qdrant.js` makes REST calls to Qdrant directly at `QDRANT_URL`. This bypasses memory-service for semantic search performance. Orchestration fetches episode/entity content from memory-service by ID *after* getting vector search results from Qdrant.
+
+`searchEntities` checks `projectId !== null && projectId !== undefined` before applying the filter — a session with no project skips the filter entirely and searches globally.
+
+## Retrieval Fusion (`src/chat/index.js`)
+
+Three functions handle fusion — all pure or lightly async, all non-critical:
+
+- **`getFTSResults(userMessage, { limit, sessionIds })`** — calls `memory.searchEpisodes`; returns `[]` and logs a warning on failure
+- **`fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit })`** — pure RRF implementation. Key guard: FTS-only episodes are only added to the scores Map if `contrib > 0` (prevents score-0 bleed-through when `keywordWeight: 0`)
+- **`getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, settings)`** — orchestrates both paths in `Promise.all`, applies `recentIds` filter to FTS results, calls fusion. Short-circuits FTS call entirely if `keywordWeight === 0`
+
+FTS is scoped to `projectSessionIds` if in a project, otherwise `[session.id]` — mirrors Qdrant scoping exactly.
+
+> For RRF formula, weight semantics, and enabling keyword search, see `docs/services/retrieval-fusion.md`.
+
+## Graph Service Client (`src/services/graph.js`)
+
+Thin HTTP client for memory-service graph endpoints. One function:
+
+- **`getNeighbors(entityIds[])`** — POSTs to `memory-service/graph/neighbors` with the entity IDs from Qdrant entity search. Returns `{ nodes, edges }`. Throws on non-2xx — caller wraps in try/catch with graceful fallback.
+
+## Models Endpoint
+
+`GET /models` scans `modelsFolderPath` for `.gguf` files and optionally reads a `models.json` manifest (keyed by filename) for labels and descriptions. File size is reported in GB. Returns 500 if the folder is inaccessible.
+
+`GET /models/props` proxies `/props` from llama-server and returns `{contextWindow, modelAlias}`. Returns 503 if llama-server is unreachable.
+
+## Health Check
+
+`GET /health/services` runs parallel fetch calls to all four dependent services with a 3-second `AbortSignal.timeout` each. Results are returned as an array — the endpoint never returns a non-2xx itself regardless of downstream status.
+
+## Background Model (qwen2.5:3b)
+Used for entity/relationship extraction and summarization via Ollama on Mini PC 1. Uses **ChatML format** (`<|im_start|>` / `<|im_end|>`) — not Phi3 format. Use `format: 'json'` only for structured extraction, never for free-text summarization.
+
+## API Endpoints Quick Reference
+
+| Method | Path | Notes |
+|---|---|---|
+| GET | `/health` | Returns service URLs |
+| GET | `/health/services` | Parallel status of all dependencies |
+| POST | `/chat` | Blocking completion |
+| POST | `/chat/stream` | SSE streaming |
+| GET/PATCH | `/settings` | Persistent settings |
+| GET | `/models` | `.gguf` file scan |
+| GET | `/models/props` | llama-server model info |
+| GET | `/sessions` | Delegates to memory-service |
+| GET | `/sessions/:sessionId/history` | Paginated episodes by external ID |
+| PATCH | `/sessions/:sessionId` | `name` and/or `projectId` |
+| DELETE | `/sessions/:sessionId` | |
+| GET | `/episodes` | Delegates; supports `q` for FTS |
+| DELETE | `/episodes/:id` | Delegates |
+| GET/POST/PATCH/DELETE | `/projects` and `/projects/:id` | Delegates |
+| POST | `/summaries/project/:projectId/generate` | On-demand; 422 if no data |
+| GET | `/summaries/project/:projectId/overview` | |
+| GET | `/summaries/session/:sessionId` | Resolves external ID first |
+| GET | `/summaries/project/:projectId` | |
--- a/packages/orchestration-service/src/chat/index.js
+++ b/packages/orchestration-service/src/chat/index.js
@@ -2,34 +2,32 @@ const memory = require("../services/memory");
 const inference = require("../services/inference");
 const embedding = require("../services/embedding");
 const qdrant = require("../services/qdrant");
-const { ORCHESTRATION } = require("@nexusai/shared");
+const { ORCHESTRATION, RETRIEVAL, logger } = require("@nexusai/shared");
 const appSettings = require("../config/settings");
 const {triggerSummary} = require('../services/summarization')
+const graph = require('../services/graph');

-function buildPrompt(recentEpisodes, semanticEpisodes, entities, userMessage, systemPrompt) {
+function buildPrompt(guaranteed, selected, neighborhood, userMessage, systemPrompt) {
    const parts = [systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT];

-  if (entities.length > 0) {
-    parts.push(
-      "Here is what you know about entities relevant to this conversation:",
-    );
-    for (const e of entities) {
-      parts.push(`- ${e.name} (${e.type}): ${e.notes}`);
+    const graphText = formatGraphContext(neighborhood.nodes ?? [], neighborhood.edges ?? []);
+    if (graphText) {
+        parts.push("Here is what you know about entities relevant to this conversation and their connections:");
+        parts.push(graphText);
+        parts.push("---");
    }
-    parts.push("---");
-  }

-  if (semanticEpisodes.length > 0) {
-    parts.push("Here are some relevant memories from earlier conversations:");
-    for (const ep of semanticEpisodes) {
+  if (selected.length > 0) {
+    parts.push("Relevant memories from earlier conversations:");
+    for (const ep of selected) {
      parts.push(`User: ${ep.user_message}\nAssistant: ${ep.ai_response}`);
    }
    parts.push("---");
  }

-  if (recentEpisodes.length > 0) {
-    parts.push(`Here are some relevant memories from your past conversations:`);
-    for (const ep of recentEpisodes) {
+  if (guaranteed.length > 0) {
+    parts.push("Recent conversation history (most recent exchanges):");
+    for (const ep of guaranteed) {
      parts.push(`User: ${ep.user_message}\nAssistant: ${ep.ai_response}`);
    }
    parts.push("--- End of recent memories ---\n");
@@ -54,6 +52,28 @@ function buildNamingPrompt(userMessage, aiResponse) {
  ].join("\n");
 }

+function formatGraphContext(nodes, edges) {
+    if (!nodes.length) return null;
+
+    const nodeMap = new Map(nodes.map(n => [n.id, n]));
+
+    // Build outbound adjacency
+    const outbound = new Map(nodes.map(n => [n.id, []]));
+    for (const edge of edges) {
+        if (outbound.has(edge.from_id) && nodeMap.has(edge.to_id)) {
+            const target = nodeMap.get(edge.to_id);
+            outbound.get(edge.from_id).push(`${edge.label} ${target.name} (${target.type})`);
+        }
+    }
+
+    return nodes.map(n => {
+        const lines = [`- ${n.name} (${n.type}): ${n.notes ?? '(no notes)'}`];
+        for (const conn of outbound.get(n.id) ?? []) lines.push(`  → ${conn}`);
+        return lines.join('\n');
+    }).join('\n');
+}
+
+
 async function autoNameSession(externalId, userMessage, aiResponse) {
  try {
    const prompt = buildNamingPrompt(userMessage, aiResponse);
@@ -64,12 +84,12 @@ async function autoNameSession(externalId, userMessage, aiResponse) {
    const name = result.text?.trim().replace(/^["']|["']$/g, ""); // strip any quotes the model adds
    if (name) {
      await memory.updateSession(externalId, { name });
-      console.log(
+      logger.info(
        `[orchestration] Auto-named session "${externalId}": "${name}"`,
      );
    }
  } catch (err) {
-    console.warn(
+    logger.warn(
      "[orchestration] Auto-naming failed (non-critical):",
      err.message,
    );
@@ -99,7 +119,7 @@ async function getSemanticEpisodes(
    );
    return fetched.filter(Boolean);
  } catch (err) {
-    console.warn(
+    logger.warn(
      `[orchestration] Semantic search failed, continuing without: `,
      err.message,
    );
@@ -107,233 +127,287 @@ async function getSemanticEpisodes(
  }
 }

-async function getRelevantEntities(userMessage, projectId=null) {
-  try {
-    const vector = await embedding.embed(userMessage);
-    const results = await qdrant.searchEntities(vector, { projectId });
-    console.log(
-      "[orchestration] Entity search results:",
-      results.map((r) => ({ name: r.payload?.name, score: r.score })),
-    );
-    return results.map((r) => r.payload).filter(Boolean);
-  } catch (err) {
-    console.warn(
-      "[orchestration] Entity search failed, continuing without:",
-      err.message,
-    );
-    return [];
-  }
-}
-
-async function chat(externalId, userMessage, options = {}) {
-  const { recentEpisodeLimit, semanticLimit, scoreThreshold, temperature, repeatPenalty, topP, topK, systemPrompt} =
-    appSettings.load();
-  // 1. Resolve or create session
-  let session = await memory.getSessionByExternalId(externalId);
-  if (!session) session = await memory.createSession(externalId);
-
-  let projectSessionIds = null;
-  let activeSystemPrompt = systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT;
-  if (session.project_id) {
+async function getRelevantEntities(userMessage, projectId = null) {
    try {
-      const project = await memory.getProject(session.project_id);
-      if (project) {
-        const projectSessions = await memory.getProjectSessions(session.project_id);
-        if (project?.system_prompt) activeSystemPrompt = project.system_prompt;
-        projectSessionIds = projectSessions.map((s) => s.id);
-      }
+        const vector = await embedding.embed(userMessage);
+        const results = await qdrant.searchEntities(vector, { projectId });
+        logger.info(
+            '[orchestration] Entity search results:',
+            results.map((r) => ({ name: r.payload?.name, score: r.score })),
+        );
+        // Include the Qdrant point ID (== SQLite entity ID) for graph traversal
+        return results.map((r) => r.payload ? { id: r.id, ...r.payload } : null).filter(Boolean);
    } catch (err) {
-      console.warn(
-        "[orchestration] Failed to resolve project context:",
-        err.message,
-      );
+        logger.debug('[orchestration] Entity search failed, continuing without:', err.message);
+        return [];
    }
-  }
-  // 2. Fetch recent episodes for context
-  const recentEpisodes = await memory.getRecentEpisodes(
-    session.id,
-    recentEpisodeLimit,
-  );
-  const isFirstMessage = recentEpisodes.length === 0;
-  const recentIds = new Set(recentEpisodes.map((e) => e.id));
-
-  // 3. Semantic Search
-  const semanticEpisodes = await getSemanticEpisodes(
-    userMessage,
-    session.id,
-    recentIds,
-    projectSessionIds,
-    { semanticLimit, scoreThreshold },
-  );
-
-  // 3b. Entity Search
-  const entities = await getRelevantEntities(userMessage, session.project_id ?? null);
-
-  // 4. Assemble prompt
-  const prompt = buildPrompt(
-    recentEpisodes,
-    semanticEpisodes,
-    entities,
-    userMessage,
-    activeSystemPrompt,
-  );
-
-  // 5. Run inference
-  const result = await inference.complete(prompt, {...options, temperature, repeatPenalty, topP, topK});
-
-  // 6. Write episode back to memory
-  try {
-    await memory.createEpisode(
-        session.id, userMessage, result.text,
-        (result.evalCount || 0) + (result.promptEvalCount || 0),
-        session.project_id ?? null,
-    );
-  } catch (err) {
-      console.error('[orchestration] Failed to save episode:', err.message);
-  }
-  const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
-  triggerSummary(session, allEpisodes);
-  
-  
-    // 8. Auto-name on first message
-  if (isFirstMessage && !session.name) {
-    autoNameSession(externalId, userMessage, result.text).catch(() => {}); // already logged inside autoNameSession
-  }
-
-  // 9. Return response
-  return {
-    sessionId: externalId,
-    response: result.text,
-    model: result.model,
-    tokenCount: (result.evalCount || 0) + (result.promptEvalCount || 0),
-  };
 }

-async function chatStream(externalId, userMessage, onChunk, options = {}) {
+async function getFTSResults(userMessage, { limit, sessionIds }) {
+    try {
+        return await memory.searchEpisodes(userMessage, { limit, sessionIds });
+    } catch (err) {
+        logger.warn('[orchestration] FTS search failed, continuing without:', err.message);
+        return [];
+    }
+}

-  
-  try {
-    const { recentEpisodeLimit, semanticLimit, scoreThreshold, temperature, repeatPenalty, topP, topK, systemPrompt } = appSettings.load();
+// Returns {episode, score}[] — scores needed for buildScoredPool downstream
+function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
+    const k = RETRIEVAL.RRF_K;
+    const scores = new Map();
+
+    semanticEps.forEach((ep, i) => {
+        scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
+    });
+
+    keywordEps.forEach((ep, i) => {
+        const contrib = keywordWeight / (k + i + 1);
+        if (scores.has(ep.id)) {
+            scores.get(ep.id).score += contrib;
+        } else if (contrib > 0) {
+            scores.set(ep.id, { episode: ep, score: contrib });
+        }
+    });
+
+    return [...scores.values()]
+        .sort((a, b) => b.score - a.score)
+        .slice(0, limit);
+
+}
+
+function estimateTokens(episode) {
+    return episode.token_count
+        ?? Math.ceil((episode.user_message.length + episode.ai_response.length) / 4);
+}
+
+function buildScoredPool(fusedWithScores, recentEpisodes, entityBoostedIds, { entityWeight }) {
+    const k = RETRIEVAL.RRF_K;
+    const pool = new Map(); // episode.id → {episode, score}
+
+    for (const { episode, score } of fusedWithScores) {
+        pool.set(episode.id, { episode, score });
+    }
+
+    recentEpisodes.forEach((ep, i) => {
+        const recencyScore = 1.0 / (k + i + 1);
+        if (pool.has(ep.id)) {
+            pool.get(ep.id).score += recencyScore;
+        } else {
+            pool.set(ep.id, { episode: ep, score: recencyScore });
+        }
+    });
+
+    for (const id of entityBoostedIds) {
+        if (pool.has(id)) pool.get(id).score += entityWeight;
+    }
+
+    return [...pool.values()].sort((a, b) => b.score - a.score);
+}
+
+function selectWithinBudget(scoredPool, contextBudget, minRecentEpisodes, recentEpisodes) {
+    let budget = contextBudget;
+    const sortByTime = (a, b) => a.created_at - b.created_at;
+
+    // Guarantee floor: always include the N most recent episodes
+    const guaranteed = recentEpisodes.slice(0, minRecentEpisodes);
+    const guaranteedIds = new Set(guaranteed.map(ep => ep.id));
+    for (const ep of guaranteed) budget -= estimateTokens(ep);
+
+    // Fill remaining budget from scored pool, highest-priority first
+    const selected = [];
+    for (const { episode } of scoredPool) {
+        if (guaranteedIds.has(episode.id)) continue;
+        const cost = estimateTokens(episode);
+
+        // // Break rather than skip — lower-priority episodes aren't worth fitting over higher-priority ones
+        if (budget - cost < 0) break;
+        selected.push(episode);
+        budget -= cost;
+    }
+
+    return {
+        guaranteed: [...guaranteed].sort(sortByTime),
+        selected:   selected.sort(sortByTime),
+    };
+}
+
+
+async function getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, settings) {
+    const { semanticLimit, scoreThreshold, semanticWeight, keywordWeight } = settings;
+    const ftsSessionIds = projectSessionIds ?? [session.id];
+
+    const ftsPromise = keywordWeight > 0
+        //  FTS and semantic may have significant overlap, so fetching more from FTS gives the fusion step more to work with before deduplication.
+        ? getFTSResults(userMessage, { limit: semanticLimit * 2, sessionIds: ftsSessionIds })
+        : Promise.resolve([]);
+
+    const [semanticEps, rawKeywordEps] = await Promise.all([
+        getSemanticEpisodes(userMessage, session.id, recentIds, projectSessionIds, { semanticLimit, scoreThreshold }),
+        ftsPromise,
+    ]);
+
+    const keywordEps = rawKeywordEps.filter(ep => !recentIds.has(ep.id));
+    return fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit: semanticLimit });
+}
+
+async function assembleContext(externalId, userMessage) {
+    const settings = appSettings.load();
+    const { recentEpisodeLimit, semanticLimit, scoreThreshold,
+            temperature, repeatPenalty, topP, topK, systemPrompt,
+            semanticWeight, keywordWeight,
+            contextBudget, entityWeight, minRecentEpisodes } = settings;
+
+    // 1. Resolve or create session
    let session = await memory.getSessionByExternalId(externalId);
    if (!session) session = await memory.createSession(externalId);

+    // 2. Resolve project context
    let projectSessionIds = null;
    let activeSystemPrompt = systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT;
    if (session.project_id) {
-      try {
-        const project = await memory.getProject(session.project_id);
-        if (project) {
-          const projectSessions = await memory.getProjectSessions(
-            session.project_id,
-          );
-          projectSessionIds = projectSessions.map((s) => s.id);
-          if (project?.system_prompt) activeSystemPrompt = project.system_prompt;
-        }
-        
-      } catch (err) {
-        console.warn(
-          "[orchestration] Failed to resolve project context:",
-          err.message,
-        );
-      }
-    }
-
-    const recentEpisodes = await memory.getRecentEpisodes(
-      session.id,
-      recentEpisodeLimit,
-    );
-    const isFirstMessage = recentEpisodes.length === 0;
-    const recentIds = new Set(recentEpisodes.map((e) => e.id));
-    const semanticEpisodes = await getSemanticEpisodes(
-      userMessage,
-      session.id,
-      recentIds,
-      projectSessionIds,
-      {semanticLimit, scoreThreshold }
-    );
-
-    const entities = await getRelevantEntities(userMessage, session.project_id ?? null);
-
-    const prompt = buildPrompt(
-      recentEpisodes,
-      semanticEpisodes,
-      entities,
-      userMessage,
-      activeSystemPrompt,
-    );
-    const res = await inference.completeStream(prompt, {...options, temperature, repeatPenalty, topP, topK});
-
-    let fullText = "";
-    let model = "";
-    let tokenCount = 0;
-    let buffer = "";
-
-    for await (const chunk of res.body) {
-      buffer += Buffer.from(chunk).toString("utf8");
-
-      const events = buffer.split("\n\n");
-      buffer = events.pop() || "";
-
-      for (const event of events) {
-        const lines = event.split("\n");
-        const dataLines = lines
-          .filter((line) => line.startsWith("data: "))
-          .map((line) => line.slice(6));
-
-        if (dataLines.length === 0) continue;
-
-        const raw = dataLines.join("\n").trim();
-        if (raw === "[DONE]") continue;
-
        try {
-          const data = JSON.parse(raw);
-
-          if (data.response) {
-            fullText += data.response;
-            onChunk(data.response);
-          }
-
-          if (data.model) model = data.model;
-          if (data.done && data.tokenCount !== undefined) {
-            tokenCount = data.tokenCount;
-          }
-
-          if (data.error) {
-            throw new Error(data.error);
-          }
+            const project = await memory.getProject(session.project_id);
+            if (project) {
+                const projectSessions = await memory.getProjectSessions(session.project_id);
+                if (project.system_prompt) activeSystemPrompt = project.system_prompt;
+                projectSessionIds = projectSessions.map(s => s.id);
+            }
        } catch (err) {
-          console.error(
-            "[orchestration] Failed to parse inference SSE event:",
-            raw,
-            err.message,
-          );
+            logger.warn('[orchestration] Failed to resolve project context:', err.message);
        }
-      }
    }

-    if (fullText.trim()) {
-      await memory.createEpisode(session.id, userMessage, fullText, tokenCount, session.project_id ?? null);
-      const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
-      triggerSummary(session, allEpisodes);
-    } else {
-      console.warn(
-        "[orchestration] Stream finished with no assistant text; episode not saved",
-      );
+    // 3. Fetch recent episodes
+    const recentEpisodes = await memory.getRecentEpisodes(session.id, recentEpisodeLimit);
+    const isFirstMessage = recentEpisodes.length === 0;
+    const recentIds = new Set(recentEpisodes.map(e => e.id));
+
+    // 4. Fused retrieval + entity search in parallel (both are independent)
+    const [fusedWithScores, entityResults] = await Promise.all([
+        getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, { semanticLimit, scoreThreshold, semanticWeight, keywordWeight }),
+        getRelevantEntities(userMessage, session.project_id ?? null),
+    ]);
+
+    // 5. Entity-linked episode IDs for scoring bonus
+    const entityIds = entityResults.map(e => e.id);
+    let entityBoostedIds = new Set();
+    if (entityIds.length > 0) {
+        try {
+            const result = await memory.getEpisodesByEntities(entityIds);
+            entityBoostedIds = new Set(result.episodeIds);
+        } catch (err) {
+            logger.debug('[orchestration] Entity-episode lookup failed, skipping bonus:', err.message);
+        }
    }

+    // 6. Build unified scored pool and select within token budget
+    const scoredPool = buildScoredPool(fusedWithScores, recentEpisodes, entityBoostedIds, { entityWeight });
+    const { guaranteed, selected } = selectWithinBudget(scoredPool, contextBudget, minRecentEpisodes, recentEpisodes);
+
+    // 7. Graph neighborhood expansion
+    let neighborhood = { nodes: [], edges: [] };
+    if (entityIds.length > 0) {
+        try {
+            neighborhood = await graph.getNeighbors(entityIds);
+        } catch (err) {
+            logger.warn('[orchestration] Graph neighborhood fetch failed, falling back to flat entities:', err.message);
+            neighborhood = { nodes: entityResults, edges: [] };
+        }
+    }
+
+    // 8. Assemble prompt
+    const prompt = buildPrompt(guaranteed, selected, neighborhood, userMessage, activeSystemPrompt);
+
+    return {
+        session,
+        prompt,
+        isFirstMessage,
+        inferenceOptions: { temperature, repeatPenalty, topP, topK },
+    };
+}
+
+async function chat(externalId, userMessage, options = {}) {
+    const { session, prompt, isFirstMessage, inferenceOptions } = await assembleContext(externalId, userMessage);
+
+    const result = await inference.complete(prompt, { ...options, ...inferenceOptions });
+
+    try {
+        await memory.createEpisode(
+            session.id, userMessage, result.text,
+            (result.evalCount || 0) + (result.promptEvalCount || 0),
+            session.project_id ?? null,
+        );
+    } catch (err) {
+        logger.error('[orchestration] Failed to save episode:', err.message);
+    }
+
+    const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
+    triggerSummary(session, allEpisodes);
+
    if (isFirstMessage && !session.name) {
-      autoNameSession(externalId, userMessage, fullText).catch(() => {});
+        autoNameSession(externalId, userMessage, result.text).catch(() => {});
    }

-    return { model, tokenCount };
-  } catch (err) {
-    console.error(
-      "[orchestration] chatStream fatal error:",
-      err.message,
-      err.stack,
-    );
-    throw err;
-  }
+    return {
+        sessionId: externalId,
+        response: result.text,
+        model: result.model,
+        tokenCount: (result.evalCount || 0) + (result.promptEvalCount || 0),
+    };
+}
+
+async function chatStream(externalId, userMessage, onChunk, options = {}) {
+    try {
+        const { session, prompt, isFirstMessage, inferenceOptions } = await assembleContext(externalId, userMessage);
+
+        const res = await inference.completeStream(prompt, { ...options, ...inferenceOptions });
+
+        let fullText = '', model = '', tokenCount = 0, buffer = '';
+
+        for await (const chunk of res.body) {
+            buffer += Buffer.from(chunk).toString('utf8');
+            const events = buffer.split('\n\n');
+            buffer = events.pop() || '';
+
+            for (const event of events) {
+                const dataLines = event.split('\n')
+                    .filter(line => line.startsWith('data: '))
+                    .map(line => line.slice(6));
+
+                if (!dataLines.length) continue;
+                const raw = dataLines.join('\n').trim();
+                if (raw === '[DONE]') continue;
+
+                try {
+                    const data = JSON.parse(raw);
+                    if (data.response) { fullText += data.response; onChunk(data.response); }
+                    if (data.model) model = data.model;
+                    if (data.done && data.tokenCount !== undefined) tokenCount = data.tokenCount;
+                    if (data.error) throw new Error(data.error);
+                } catch (err) {
+                    logger.error('[orchestration] Failed to parse SSE event:', raw, err.message);
+                }
+            }
+        }
+
+        if (fullText.trim()) {
+            await memory.createEpisode(session.id, userMessage, fullText, tokenCount, session.project_id ?? null);
+            const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
+            triggerSummary(session, allEpisodes);
+        } else {
+            logger.warn('[orchestration] Stream finished with no assistant text; episode not saved');
+        }
+
+        if (isFirstMessage && !session.name) {
+            autoNameSession(externalId, userMessage, fullText).catch(() => {});
+        }
+
+        return { model, tokenCount };
+    } catch (err) {
+        logger.error('[orchestration] chatStream fatal error:', err.message, err.stack);
+        throw err;
+    }
 }

 module.exports = { chat, chatStream };
--- a/packages/orchestration-service/src/config/settings.js
+++ b/packages/orchestration-service/src/config/settings.js
@@ -1,6 +1,6 @@
 const fs = require('fs');
 const path = require('path');
-const { getEnv, ORCHESTRATION, INFERENCE_DEFAULTS } = require('@nexusai/shared');
+const { getEnv, ORCHESTRATION, INFERENCE_DEFAULTS, RETRIEVAL } = require('@nexusai/shared');

 const SETTINGS_PATH = path.join(__dirname, '../../data/settings.json');

@@ -14,6 +14,11 @@ const DEFAULTS = {
  topP:                 INFERENCE_DEFAULTS.TOP_P,
  topK:                 INFERENCE_DEFAULTS.TOP_K,
  systemPrompt:         ORCHESTRATION.SYSTEM_PROMPT,
+  semanticWeight:       RETRIEVAL.SEMANTIC_WEIGHT,
+  keywordWeight:        RETRIEVAL.KEYWORD_WEIGHT,
+  contextBudget:        ORCHESTRATION.CONTEXT_BUDGET,
+  entityWeight:         ORCHESTRATION.ENTITY_WEIGHT,
+  minRecentEpisodes:    ORCHESTRATION.MIN_RECENT_EPISODES,
 };

 function load() {
--- a/packages/orchestration-service/src/index.js
+++ b/packages/orchestration-service/src/index.js
@@ -15,7 +15,7 @@ const summariesRouter = require('./routes/summaries')
 const cors = require('cors');

 const app = express();
-app.use(express.json());
+app.use(express.json({ limit: '2mb' }));

 app.use(cors({
    origin: [
--- a/packages/orchestration-service/src/routes/chat.js
+++ b/packages/orchestration-service/src/routes/chat.js
@@ -1,6 +1,8 @@
 const { Router } = require('express')
 const { chat, chatStream } = require('../chat/index');
 const memory = require('../services/memory')
+const logger = require('@nexusai/shared');
+

 const router = Router();

@@ -17,8 +19,8 @@ router.post('/', async (req, res) => {
        });
        res.json(result)
    } catch (err) {
-        console.error(`[orchestration] chat error: `, err.message)
-        res.status(500).json ({ error: err.message})
+        logger.error(`[orchestration] chat error: `, err.message)
+        res.status(500).json ({ error: 'Chat failed', detail: err.message })
    }
 });

--- a/packages/orchestration-service/src/routes/episodes.js
+++ b/packages/orchestration-service/src/routes/episodes.js
@@ -9,7 +9,7 @@ router.get('/', async (req, res) => {
    const result = await memory.getEpisodes({ limit, offset, sessionId, q });
    res.json(result);
  } catch (err) {
-    res.status(500).json({ error: err.message });
+    res.status(500).json({ error: 'Failed to fetch episodes', detail: err.message });
  }
 });

@@ -18,7 +18,7 @@ router.delete('/:id', async (req, res) => {
    await memory.deleteEpisode(req.params.id);
    res.status(204).send();
  } catch (err) {
-    res.status(500).json({ error: err.message });
+    res.status(500).json({ error: 'Failed to delete episode', detail: err.message });
  }
 });

--- a/packages/orchestration-service/src/routes/models.js
+++ b/packages/orchestration-service/src/routes/models.js
@@ -4,7 +4,7 @@ const fs = require('fs');
 const path = require('path');
 const appSettings = require('../config/settings');

-const { getEnv, LLAMACPP } = require('@nexusai/shared');
+const { getEnv, LLAMACPP, logger } = require('@nexusai/shared');
 const LLAMA_URL = getEnv('LLAMA_SERVER_URL', LLAMACPP.DEFAULT_URL);

 router.get('/', (req, res) => {
@@ -38,7 +38,7 @@ router.get('/', (req, res) => {

    res.json(models);
  } catch (err) {
-    console.error('[models] Failed to scan folder:', err.message);
+    logger.error('[models] Failed to scan folder:', err.message);
    res.status(500).json({ error: `Could not read models folder: ${modelsFolderPath}` });
  }
 });
@@ -53,7 +53,7 @@ router.get('/props', async (req, res) => {
      modelAlias: data.model_alias,
    });
  } catch (err) {
-    console.error('[models/props]', err.message);
+    logger.error('[models/props]', err.message);
    res.status(503).json({ error: 'Could not reach llama-server' });
  }
 });
--- a/packages/orchestration-service/src/routes/projects.js
+++ b/packages/orchestration-service/src/routes/projects.js
@@ -7,7 +7,7 @@ router.get('/', async (req, res) => {
    try {
        res.json(await memory.getProjects());
    } catch (err) {
-        res.status(500).json({ error: err.message });
+        res.status(500).json({ error: 'Failed to fetch projects', detail: err.message });
    }
 });

@@ -17,7 +17,7 @@ router.post('/', async (req, res) => {
    try {
        res.status(201).json(await memory.createProject({ name: name.trim(), description, colour, icon, isolated }));
    } catch (err) {
-        res.status(500).json({ error: err.message });
+        res.status(500).json({ error: 'Failed to create project', detail: err.message });
    }
 });

@@ -25,7 +25,7 @@ router.patch('/:id', async (req, res) => {
  try {
    res.json(await memory.updateProject(req.params.id, req.body));
  } catch (err) {
-    res.status(500).json({ error: err.message });
+    res.status(500).json({ error: 'Failed to update project', detail: err.message });
  }
 });

@@ -34,7 +34,7 @@ router.delete('/:id', async (req, res) => {
        await memory.deleteProject(req.params.id);
        res.status(204).send();
    } catch (err) {
-        res.status(500).json({ error: err.message });
+        res.status(500).json({ error: 'Failed to delete project', detail: err.message });
    }
 });

--- a/packages/orchestration-service/src/routes/sessions.js
+++ b/packages/orchestration-service/src/routes/sessions.js
@@ -15,7 +15,7 @@ router.get('/:sessionId/history', async (req, res) => {
    const history = await memory.getSessionHistory(session.id, Number(limit), Number(offset));
    res.json({ sessionId, episodes: history });
  } catch (err) {
-    res.status(500).json({ error: err.message });
+    res.status(500).json({ error: 'Failed to fetch session history', detail: err.message });
  }
 });

@@ -26,7 +26,7 @@ router.get('/', async (req, res) => {
    const sessions = await memory.getSessions(Number(limit), Number(offset), parsedProjectId);
    res.json(sessions);
  } catch (err) {
-    res.status(500).json({ error: err.message });
+    res.status(500).json({ error: 'Failed to fetch sessions', detail: err.message });
  }
 });

@@ -45,7 +45,7 @@ router.patch('/:sessionId', async (req, res) => {
    });
    res.json(session);
  } catch (err) {
-    res.status(500).json({ error: err.message });
+    res.status(500).json({ error: 'Failed to update session', detail: err.message });
  }
 });

@@ -54,7 +54,7 @@ router.delete('/:sessionId', async (req, res) => {
        await memory.deleteSession(req.params.sessionId);
        res.status(204).send();
    } catch (err) {
-        res.status(500).json({ error: err.message });
+        res.status(500).json({ error: 'Failed to delete session', detail: err.message });
    }
 });

--- a/packages/orchestration-service/src/routes/settings.js
+++ b/packages/orchestration-service/src/routes/settings.js
@@ -80,6 +80,41 @@ if (req.body.systemPrompt !== undefined) {
  updates.systemPrompt = val.trim() || null; // null reverts to default
 }

+  if (req.body.semanticWeight !== undefined) {
+    const val = Number(req.body.semanticWeight);
+    if (isNaN(val) || val < 0 || val > 5)
+      return res.status(400).json({ error: 'semanticWeight must be 0–5' });
+    updates.semanticWeight = val;
+  }
+
+  if (req.body.keywordWeight !== undefined) {
+    const val = Number(req.body.keywordWeight);
+    if (isNaN(val) || val < 0 || val > 5)
+      return res.status(400).json({ error: 'keywordWeight must be 0–5' });
+    updates.keywordWeight = val;
+  }
+
+  if (req.body.contextBudget !== undefined) {
+    const val = Number(req.body.contextBudget);
+    if (!Number.isInteger(val) || val < 512 || val > 32768)
+        return res.status(400).json({ error: 'contextBudget must be 512–32768' });
+    updates.contextBudget = val;
+  }
+
+  if (req.body.entityWeight !== undefined) {
+      const val = Number(req.body.entityWeight);
+      if (isNaN(val) || val < 0 || val > 2)
+          return res.status(400).json({ error: 'entityWeight must be 0–2' });
+      updates.entityWeight = val;
+  }
+
+  if (req.body.minRecentEpisodes !== undefined) {
+      const val = Number(req.body.minRecentEpisodes);
+      if (!Number.isInteger(val) || val < 0 || val > 10)
+          return res.status(400).json({ error: 'minRecentEpisodes must be 0–10' });
+      updates.minRecentEpisodes = val;
+  }
+
  res.json(settings.save(updates));
 });

--- a/packages/orchestration-service/src/routes/summaries.js
+++ b/packages/orchestration-service/src/routes/summaries.js
@@ -21,7 +21,7 @@ router.get('/project/:projectId/overview', async (req, res) => {
        const summary = await memory.getProjectOverviewSummary(req.params.projectId);
        res.json(summary);
    } catch (err) {
-        res.status(500).json({ error: err.message });
+        res.status(500).json({ error: 'Failed to fetch project overview summary', detail: err.message });
    }
 });

@@ -32,7 +32,7 @@ router.get('/session/:sessionId', async (req, res) => {
        const summaries = await memory.getSummariesBySession(session.id);
        res.json(summaries);
    } catch (err) {
-        res.status(500).json({ error: err.message });
+        res.status(500).json({ error: 'Failed to fetch session summaries', detail: err.message });
    }
 });

@@ -41,7 +41,7 @@ router.get('/project/:projectId', async (req, res) => {
        const summaries = await memory.getSummariesByProject(req.params.projectId);
        res.json(summaries);
    } catch (err) {
-        res.status(500).json({ error: err.message });
+        res.status(500).json({ error: 'Failed to fetch project summaries', detail: err.message });
    }
 });

--- a/packages/orchestration-service/src/services/graph.js
+++ b/packages/orchestration-service/src/services/graph.js
@@ -0,0 +1,15 @@
+const { getEnv, SERVICES } = require('@nexusai/shared');
+
+const MEMORY_URL = getEnv('MEMORY_SERVICE_URL', SERVICES.MEMORY_URL);
+
+async function getNeighbors(entityIds) {
+    const res = await fetch(`${MEMORY_URL}/graph/neighbors`, {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ entityIds }),
+    });
+    if (!res.ok) throw new Error(`Graph neighbors error: ${res.status}`);
+    return res.json();
+}
+
+module.exports = { getNeighbors };
--- a/packages/orchestration-service/src/services/memory.js
+++ b/packages/orchestration-service/src/services/memory.js
@@ -196,6 +196,26 @@ async function getProjectOverviewSummary(projectId) {
    return res.json(); // null if none exists yet
 }

+async function searchEpisodes(query, { limit = 10, sessionIds = null } = {}) {
+    const url = new URL(`${BASE_URL}/episodes/search`);
+    url.searchParams.set('q', query);
+    url.searchParams.set('limit', limit);
+    if (sessionIds?.length) url.searchParams.set('sessionIds', sessionIds.join(','));
+    const res = await fetch(url.toString());
+    if (!res.ok) throw new Error(`FTS search error: ${res.status}`);
+    return res.json();
+}
+
+async function getEpisodesByEntities(entityIds) {
+    const res = await fetch(`${BASE_URL}/episodes/by-entities`, {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ entityIds }),
+    });
+    if (!res.ok) throw new Error(`Episodes-by-entities error: ${res.status}`);
+    return res.json(); // { episodeIds: [...] }
+}
+
 module.exports = {
    getSessionByExternalId,
    createSession,
@@ -220,4 +240,6 @@ module.exports = {
    getSummariesByProject,
    generateProjectSummary,
    getProjectOverviewSummary,
+    searchEpisodes,
+    getEpisodesByEntities,
 }
--- a/packages/orchestration-service/src/services/summarization.js
+++ b/packages/orchestration-service/src/services/summarization.js
@@ -1,4 +1,4 @@
-const { getEnv, SERVICES, SUMMARIES } = require('@nexusai/shared');
+const { getEnv, SERVICES, SUMMARIES, logger } = require('@nexusai/shared');

 const EXTRACTION_URL  = getEnv('EXTRACTION_URL', 'http://localhost:11434');
 const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b');
@@ -104,7 +104,7 @@ async function maybeSummarize(session, allEpisodes) {
    const totalEpisodeTokens = allEpisodes.reduce((sum, ep) => sum + (ep.token_count || 0), 0);

    // add temporarily before the generateSummary call
-    console.log('[summarization] episodes to summarize:', episodesToSummarize.length);
+    logger.debug('[summarization] episodes to summarize:', episodesToSummarize.length);

    const content = await generateSummary(
        episodesToSummarize,
@@ -126,7 +126,7 @@ async function maybeSummarize(session, allEpisodes) {
                episodeRange,
            }),
        });
-        console.log(`[summarization] Created new summary for session ${session.id}`);
+        logger.debug(`[summarization] Created new summary for session ${session.id}`);
    } else {
        await fetch(`${MEMORY_URL}/summaries/${latest.id}`, {
            method: 'PATCH',
@@ -137,14 +137,14 @@ async function maybeSummarize(session, allEpisodes) {
                episodeRange,
            }),
        });
-        console.log(`[summarization] Updated summary ${latest.id} for session ${session.id}`);
+        logger.debug(`[summarization] Updated summary ${latest.id} for session ${session.id}`);
    }
 }

 async function triggerSummary(session, allEpisodes) {
    // Intentionally fire-and-forget — caller doesn't await this
    maybeSummarize(session, allEpisodes).catch(err =>
-        console.warn('[summarization] Summary failed (non-critical):', err.message)
+        logger.warn('[summarization] Summary failed (non-critical):', err.message)
    );
 }

--- a/packages/shared/src/config/constants.js
+++ b/packages/shared/src/config/constants.js
@@ -28,6 +28,9 @@ const ORCHESTRATION = {
    ENTITIES_LIMIT:         5,
    ENTITIES_THRESHOLD:     0.55,
    TEMPERATURE:            0.7,
+    CONTEXT_BUDGET:         4096,
+    ENTITY_WEIGHT:          0.5,
+    MIN_RECENT_EPISODES:    2,
    CORS_ORIGIN:            'http://localhost:5173',
    SYSTEM_PROMPT:          `You are a helpful, context-aware AI assistant. You have access to memories of past conversations with the user. Use them to provide consistent, personalised responses.`
 }
@@ -79,8 +82,27 @@ const SUMMARIES = {

 const ENTITIES = {
    TEMPERATURE:    0.1,    // Low temperature, more precise extraction, less creative
-    NUM_PREDICT:    1024,   // Max tokens to consider for entity extraction (e.g. recent conversation)
+    NUM_PREDICT:    1500,   // Max tokens to consider for entity extraction (e.g. recent conversation)
    THRESHOLD:      0.55,   // Minimum confidence score for an extracted entity to be included in the results
+    PROMOTION_THRESHOLD: 3, // mention_count threshold before entity is considered well-established
+    GRAPH_HOP_DEPTH: 1,     // Default traversal depth for neighborhood queries
+    TYPES: [
+        'person', 
+        'place', 
+        'project', 
+        'technology', 
+        'concept', 
+        'organization', 
+        'character', 
+        'event', 
+        'topic'
+    ],
+}
+
+const RETRIEVAL = {
+    RRF_K:              60,     // Reciprocal Rank Fusion smoothing constant, softens rank-1 advantage, not exposed in settings
+    SEMANTIC_WEIGHT:    1.0,    // Weight applied to semantic (QDrant) results
+    KEYWORD_WEIGHT:     0,    // Weight applied to keyword (SQLite) results, 0 = disables, set >0 to enable and tune balance between semantic vs keyword matches
 }

 module.exports = {
@@ -95,5 +117,6 @@ module.exports = {
    SQLITE,
    ORCHESTRATION,
    SUMMARIES,
-    ENTITIES
+    ENTITIES,
+    RETRIEVAL,
 };
--- a/packages/shared/src/index.js
+++ b/packages/shared/src/index.js
@@ -1,5 +1,5 @@
 const {getEnv} = require('./config/env');
-const {QDRANT, COLLECTIONS, EPISODIC, SERVICES, OLLAMA, PORTS, LLAMACPP, INFERENCE_DEFAULTS, SQLITE, ORCHESTRATION, SUMMARIES, ENTITIES } = require('./config/constants');
+const {QDRANT, COLLECTIONS, EPISODIC, SERVICES, OLLAMA, PORTS, LLAMACPP, INFERENCE_DEFAULTS, SQLITE, ORCHESTRATION, SUMMARIES, ENTITIES, RETRIEVAL } = require('./config/constants');
 const {parseRow, formatEpisodeText} = require('./utils')
 const logger = require('./utils/logger');

@@ -20,4 +20,5 @@ module.exports = {
    SUMMARIES,
    ENTITIES,
    logger,
+    RETRIEVAL,
 };
--- a/test-fusion.js
+++ b/test-fusion.js
@@ -0,0 +1,67 @@
+// test-fusion.js
+const { RETRIEVAL } = require('./packages/shared/src/config/constants');
+
+function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
+    const k = RETRIEVAL.RRF_K;
+    const scores = new Map();
+    semanticEps.forEach((ep, i) => {
+        scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
+    });
+    keywordEps.forEach((ep, i) => {
+        const contrib = keywordWeight / (k + i + 1);
+        if (scores.has(ep.id)) {
+            scores.get(ep.id).score += contrib;
+        } else if (contrib > 0) {
+            scores.set(ep.id, { episode: ep, score: contrib });
+        }
+    });
+    return [...scores.values()]
+        .sort((a, b) => b.score - a.score)
+        .slice(0, limit)
+        .map(({ episode }) => episode);
+}
+
+// --- Test 1: episodes in both lists rank highest ---
+const semantic = [
+    { id: 1, user_message: 'ep1 — semantic only, rank 1' },
+    { id: 2, user_message: 'ep2 — in both lists, rank 2 semantic' },
+    { id: 3, user_message: 'ep3 — in both lists, rank 3 semantic' },
+];
+const keyword = [
+    { id: 3, user_message: 'ep3 — rank 1 FTS' },
+    { id: 2, user_message: 'ep2 — rank 2 FTS' },
+    { id: 4, user_message: 'ep4 — FTS only, rank 3' },
+];
+
+const result = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 1, limit: 5 });
+console.log('Test 1 — equal weights, episodes in both lists should rank highest:');
+result.forEach((ep, i) => console.log(`  ${i + 1}. id=${ep.id} "${ep.user_message}"`));
+console.assert(result[0].id === 2 || result[0].id === 3, 'FAIL: ep2 or ep3 should be rank 1');
+console.assert(!result.find(e => e.id === 1) || result.indexOf(result.find(e => e.id === 1)) > result.indexOf(result.find(e => e.id === 2)), 'FAIL: ep1 (semantic only) should rank below ep2');
+console.log('  PASS\n');
+
+// --- Test 2: keywordWeight:0 → pure semantic passthrough ---
+const result2 = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 0, limit: 5 });
+console.log('Test 2 — keywordWeight:0 should return only semantic results in original order:');
+result2.forEach((ep, i) => console.log(`  ${i + 1}. id=${ep.id}`));
+console.assert(result2.length === 3, `FAIL: expected 3, got ${result2.length}`);
+console.assert(result2[0].id === 1, 'FAIL: ep1 should be rank 1');
+console.assert(result2[1].id === 2, 'FAIL: ep2 should be rank 2');
+console.log('  PASS\n');
+
+// --- Test 3: limit is respected ---
+const result3 = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 1, limit: 2 });
+console.log('Test 3 — limit:2 should return exactly 2 results:');
+console.assert(result3.length === 2, `FAIL: expected 2, got ${result3.length}`);
+console.log('  PASS\n');
+
+// --- Test 4: no overlap → all unique episodes, ordered by individual contribution ---
+const semOnly = [{ id: 10, user_message: 'sem' }];
+const ftsOnly = [{ id: 20, user_message: 'fts' }];
+const result4 = fuseEpisodeResults(semOnly, ftsOnly, { semanticWeight: 1, keywordWeight: 1, limit: 5 });
+console.log('Test 4 — no overlap, both should appear:');
+console.assert(result4.length === 2, `FAIL: expected 2, got ${result4.length}`);
+console.assert(result4[0].id === 10, 'FAIL: semantic rank-1 should beat fts rank-1 (same weight, both rank 1, but semantic inserted first — tie goes to semantic)');
+console.log('  PASS\n');
+
+console.log('All tests passed.');
Author	SHA1	Message	Date
Storme-bit	e4908193bd	smarter context assembly implementation	2026-04-27 21:41:32 -07:00
Storme-bit	b58a4e4692	minor clean up	2026-04-27 20:17:05 -07:00
Storme-bit	055683424d	retrieval fusion	2026-04-27 07:03:46 -07:00
Storme-bit	27ad614130	retrieval fusion	2026-04-27 05:56:23 -07:00
Storme-bit	8ade5c68ca	retrieval fusion	2026-04-27 05:46:01 -07:00
Storme-bit	49982a85de	retrieval fusion	2026-04-27 05:21:43 -07:00
Storme-bit	9c6c5c9a42	entity extraction prompt	2026-04-27 03:50:13 -07:00
Storme-bit	c9cbac87ac	knowledge graph entity fixes	2026-04-27 03:41:56 -07:00
Storme-bit	1a97b19280	roadmap phase 1 complete	2026-04-27 03:10:39 -07:00
Storme-bit	9fe8e568cf	roadmap phase 1 complete	2026-04-27 00:28:42 -07:00
Storme-bit	5ad01c6ad8	clean up	2026-04-27 00:14:51 -07:00
Storme-bit	aac0923351	Merge branch 'main' of http://192.168.0.205:3100/storme/nexusai	2026-04-27 00:10:16 -07:00
Storme-bit	54218894c0	logger clean up	2026-04-27 00:09:16 -07:00
Storme-bit	66a95f4479	logger clean up	2026-04-27 00:07:51 -07:00
storme	78476e166f	Delete .claude/settings.local.json	2026-04-27 06:57:49 +00:00
Storme-bit	696ead29f8	chat/index.js cleanup	2026-04-26 23:04:31 -07:00
Storme-bit	45db47a584	error response consistency, human readible1	2026-04-26 23:00:55 -07:00
Storme-bit	095c9a623e	error response consistency, human readible1	2026-04-26 23:00:18 -07:00
Storme-bit	f5011fddca	logger updates	2026-04-26 22:29:57 -07:00
Storme-bit	86e78cc4c6	logger updates	2026-04-26 22:28:54 -07:00