smarter context assembly implementation

minor clean up
retrieval fusion
2026-04-27 21:41:32 -07:00 · 2026-04-27 20:17:05 -07:00 · 2026-04-27 07:03:46 -07:00 · 2026-04-27 05:56:23 -07:00 · 2026-04-27 05:46:01 -07:00 · 2026-04-27 05:21:43 -07:00
64 changed files with 3216 additions and 584 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -5,4 +5,5 @@ data/
 .env
 .env.*
 *.db
+.claude/settings.local.json
 EOF
--- a/.vscode/settings.json
+++ b/.vscode/settings.json
@@ -1,2 +0,0 @@
-{
-}
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -0,0 +1,108 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Development Commands
+
+```bash
+# Start individual services
+npm run memory           # Memory Service (port 3002)
+npm run embedding        # Embedding Service (port 3003)
+npm run inference        # Inference Service (port 3001)
+npm run orchestration    # Orchestration Service (port 4000)
+npm run mini1            # Start memory + embedding concurrently
+
+# Per-service dev mode (with --watch)
+npm -w packages/<service-name> run dev
+
+# Chat client
+npm -w packages/chat-client run dev      # Vite dev server (port 5173)
+npm -w packages/chat-client run build    # Production build
+```
+
+No test framework or linter is configured.
+
+## Architecture Overview
+
+NexusAI is a **modular AI assistant** with persistent, project-scoped memory. It's a Node.js monorepo (`npm workspaces`) with 4 independent backend services, 1 React frontend, and 1 shared package.
+
+### Services
+
+| Package | Port | Role |
+|---|---|---|
+| `orchestration-service` | 4000 | Central gateway; coordinates all others |
+| `memory-service` | 3002 | SQLite + Qdrant hybrid memory |
+| `embedding-service` | 3003 | Text embeddings via Ollama (`nomic-embed-text`, 768-dim) |
+| `inference-service` | 3001 | LLM inference (Ollama or llama.cpp) |
+| `chat-client` | 5173 | React/Vite frontend |
+| `shared` | — | Constants, env helpers, logger, formatters |
+
+All inter-service communication is **REST HTTP only** — no message queues or WebSockets.
+
+### Chat Request Flow
+
+1. Client POSTs to orchestration `/chat/stream`
+2. Orchestration resolves the session and computes an embedding for the user message (embedding-service)
+3. Orchestration fetches **recent episodes** (SQLite) + **semantic episodes** (Qdrant vector search using that embedding) + **entities** (Qdrant, scoped by project)
+4. Prompt assembled: system message → entities → semantic memories → recent episodes → user message
+5. Inference streams response (inference-service)
+6. Episode stored in SQLite + Qdrant (fire-and-forget embedding)
+7. Entity extraction triggered async by memory-service on episode creation (qwen2.5:3b via Ollama)
+8. Auto-summarization checked (threshold: 200+ tokens, re-triggers every 5 episodes)
+9. Auto-naming on first message (temp 0.3, 20 tokens max)
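Step 4's ordering can be sketched as a small function. This is a hypothetical illustration, not the code in `chat/index.js`: the name `assemblePrompt`, the input field names (`userMessage`, `aiResponse`, `notes`), the section headings, and the choice to replay recent episodes as chat turns are all assumptions.

```javascript
// Hypothetical sketch: assemble context in the documented order
// system message -> entities -> semantic memories -> recent episodes -> user message.
function assemblePrompt({ systemPrompt, entities = [], semantic = [], recent = [], userMessage }) {
  const sections = [systemPrompt];

  if (entities.length > 0) {
    // One line per entity: "name (type): notes"
    sections.push('Known entities:\n' + entities
      .map(e => `- ${e.name} (${e.type})${e.notes ? `: ${e.notes}` : ''}`)
      .join('\n'));
  }
  if (semantic.length > 0) {
    sections.push('Relevant memories:\n' + semantic
      .map(ep => `User: ${ep.userMessage}\nAssistant: ${ep.aiResponse}`)
      .join('\n\n'));
  }

  const messages = [{ role: 'system', content: sections.join('\n\n') }];
  // Recent episodes replayed as prior turns, oldest first.
  for (const ep of recent) {
    messages.push({ role: 'user', content: ep.userMessage });
    messages.push({ role: 'assistant', content: ep.aiResponse });
  }
  messages.push({ role: 'user', content: userMessage });
  return messages;
}
```

Keeping entities and semantic memories in the system message while replaying recent episodes as turns is one common layout; a single flat prompt string would preserve the same order.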
+
+### Memory Model
+
+**Dual store — neither works alone:**
+- **SQLite** (`better-sqlite3`, synchronous) — Full content: sessions, episodes, entities, relationships, projects, summaries, FTS5 index
+- **Qdrant** — Vector embeddings for semantic search; IDs used to fetch full content from SQLite afterward
+
+Orchestration queries Qdrant directly (bypasses memory-service) for performance, then fetches full episode content from memory-service by ID.
+
+**Project-scoped isolation:** Sessions grouped into projects; Qdrant queries use `should` filter on session IDs to enforce memory boundaries. Non-project sessions share a common pool.
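A hypothetical sketch of the project-scoped filter described above. The body follows Qdrant's REST filter syntax (`should` conditions with `match.value`) for a `points/search` request; the payload key `sessionId`, the default `limit` and `score_threshold`, and the helper names are assumptions. How the shared non-project pool is filtered is not covered here.

```javascript
// Hypothetical sketch: restrict a Qdrant search to one project's sessions.
// The payload key 'sessionId' is an assumption; use the key the episode payload actually stores.
function buildSessionFilter(sessionIds, payloadKey = 'sessionId') {
  // An empty scope must not fall back to an unfiltered search, or isolation would leak.
  if (!Array.isArray(sessionIds) || sessionIds.length === 0) return null;
  // 'should' is a logical OR: a point matches if any condition matches.
  return { should: sessionIds.map(id => ({ key: payloadKey, match: { value: id } })) };
}

// Body for POST /collections/<collection>/points/search; the defaults here are illustrative.
function buildSearchBody(vector, sessionIds, { limit = 5, scoreThreshold = 0.5 } = {}) {
  const filter = buildSessionFilter(sessionIds);
  if (filter === null) return null; // caller should skip the search entirely
  return { vector, filter, limit, with_payload: true, score_threshold: scoreThreshold };
}
```

Returning `null` for an empty scope, rather than dropping the filter, is a deliberate choice in this sketch. A single `match: { any: [...] }` condition is an equivalent alternative in Qdrant.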
+
+### Key File Locations
+
+**Orchestration** (`packages/orchestration-service/src/`):
+- `chat/index.js` — Core prompt building and memory assembly
+- `routes/` — HTTP endpoints: chat, sessions, projects, episodes, models, settings, summaries
+- `services/` — Thin HTTP clients for memory, embedding, inference, and direct Qdrant access
+- `config/settings.js` — Loads/saves `data/settings.json` (user-tunable: model params, thresholds, system prompt)
+
+**Memory** (`packages/memory-service/src/`):
+- `db/schema.js` — SQLite table definitions (source of truth for data model)
+- `episodic/` — Episode CRUD
+- `semantic/` — Qdrant operations
+- `entities/` — Entity extraction + CRUD
+- `summarization/` — Project summary generation
+
+**Shared** (`packages/shared/src/`):
+- `config/constants.js` — All tunables (ports, thresholds, model names, vector size)
+- `config/env.js` — `getEnv()` helper with fallback to constants
+- `utils.js` — `parseRow()`, `formatEpisodeText()`, `logger`
+
+**Frontend** (`packages/chat-client/src/`):
+- `App.jsx` — View router and top-level state (views: home, chat, all-chats, all-projects, project, memory, summaries, settings)
+- `hooks/` — `useChat`, `useSession`, `useModels`, `useProjects`, `useSettings`, `useContextMenu`
+- `api/orchestration.js` — Fetch wrapper for all API calls
+- Vite proxy points to `192.168.0.205:4000` (Mini PC 2 / orchestration)
+
+### Configuration
+
+Each service uses `.env` via `dotenv`, falling back to `packages/shared/src/config/constants.js`. The orchestration service also serves `data/settings.json` to the frontend via `/settings` — this is the single source of truth for user-facing inference parameters and system prompt.
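The `.env`-then-constants fallback might look like the following sketch. The real `getEnv()` in `packages/shared/src/config/env.js` may have a different signature; coercing by the fallback's type is an assumption added here because `.env` values always arrive as strings.

```javascript
// Hypothetical sketch of an env-with-fallback helper; not the actual shared implementation.
function getEnv(name, fallback, env = process.env) {
  const raw = env[name];
  if (raw === undefined || raw === '') return fallback;
  // Coerce to the fallback's type so numeric constants stay numeric.
  if (typeof fallback === 'number') {
    const n = Number(raw);
    return Number.isNaN(n) ? fallback : n;
  }
  if (typeof fallback === 'boolean') return raw === 'true' || raw === '1';
  return raw;
}
```

For example, `getEnv('EXTRACTION_MODEL', 'qwen2.5:3b')` would return the env value when it is set and the constant otherwise.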
+
+### Deployment
+
+Home lab across 3 nodes, managed with Docker Compose:
+- **Main PC** — RTX A4000 (inference via llama.cpp)
+- **Mini PC 1** — memory + embedding services, Qdrant, Ollama
+- **Mini PC 2** — orchestration + chat client, Caddy reverse proxy + Authelia SSO
+
+Docker Compose files: `docker-compose.mini1.yml`, `docker-compose.mini2.yml`. All services expose `/health`. Deployment docs: `docs/deployment/homelab.md`.
+
+## Key Development Principles
+
+- **Layer-by-layer validation** — always build and test backend → orchestration → frontend in sequence, curl-testing each layer before proceeding
+- **New orchestration routes require changes in four places**: route file, `orchestration-service/src/index.js`, Caddyfile on Mini PC 2 (`192.168.0.205`), and `vite.config.js` in the chat client
+- **All services read settings on every request** — no restart required for config changes
+- **Backend-first development** — data layer → service endpoints → orchestration proxy → frontend
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -73,8 +73,8 @@ service by ID after the vector search.

 The core four-service architecture is complete and operational. Key capabilities:

-- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
-- **Entity layer** — automatic extraction of named entities from conversations via qwen2.5:3b, stored in SQLite and Qdrant, injected into every prompt as structured knowledge
+- **Retrieval fusion** — Reciprocal Rank Fusion (RRF) merges semantic (Qdrant vector search) and keyword (SQLite FTS5) episode retrieval into a single ranked result set. Weights are configurable per strategy via settings; keyword search is off by default (`keywordWeight: 0`) and can be enabled without a service restart
+- **Entity layer + Knowledge graph** — automatic extraction of named entities and relationships from conversations via qwen2.5:3b. Entities and relationships are stored in SQLite with `mention_count` tracking. A graph traversal layer expands Qdrant entity search hits into a 1-hop neighborhood subgraph, injecting structured connected knowledge into every prompt
 - **Projects** — sessions grouped with shared or isolated memory pools
 - **Auto-naming** — sessions named automatically from first exchange via inference
 - **Project-scoped semantic search** — Qdrant filtered by project session IDs
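A minimal sketch of weighted Reciprocal Rank Fusion, assuming each strategy supplies an ordered list of episode IDs. Each document scores the sum over strategies of `weight / (k + rank)`; `k = 60` is the constant from the original RRF paper and is an assumption here, as is the function name. A weight of `0` drops that strategy, matching the `keywordWeight: 0` default.

```javascript
// Weighted Reciprocal Rank Fusion (sketch). rankedLists: [{ ids: [...], weight }]
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const { ids, weight } of rankedLists) {
    if (!weight) continue; // weight 0 disables a strategy (e.g. keywordWeight: 0)
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}
```

Because RRF uses only ranks, Qdrant similarity scores and FTS5 `bm25()` values never need to be normalised against each other.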
--- a/docs/reference/API-routes.md
+++ b/docs/reference/API-routes.md
@@ -120,6 +120,38 @@ all projects use isolated memory. Returns `201` with the created project object.

 Only provided fields are updated — omitted fields are not touched.

+### Summaries
+
+| Method | Path | Description |
+|---|---|---|
+| GET | /summaries/session/:sessionId | Get all summaries for a session (by external UUID) |
+| GET | /summaries/project/:projectId | Get all summaries for a project |
+
+**GET /summaries/session/:sessionId** — resolves the external UUID to an
+internal session ID, then fetches summaries from the memory service.
+Returns an array of summary objects ordered by `created_at` ascending.
+
+**GET /summaries/project/:projectId** — proxies directly to the memory
+service project summaries endpoint.
+
+**Summary object shape:**
+```json
+{
+  "id": 8,
+  "session_id": 72,
+  "project_id": null,
+  "content": "The user asked about...",
+  "token_count": 579,
+  "episode_range": "246-251",
+  "created_at": 1776766518,
+  "updated_at": 1776766518
+}
+```
+
+> **Proxy requirement:** `/summaries` must be added to both the Caddyfile
+> reverse proxy and the Vite dev proxy config alongside the other route
+> prefixes. See `orchestration-service.md` for the Caddy block pattern.
+
 ### Models

 | Method | Path | Description |
@@ -170,7 +202,9 @@ Returns `503` if llama-server is unreachable.
 |---|---|---|---|
 | `recentEpisodeLimit` | integer | 1–20 | Recent episodes injected into prompt |
 | `semanticLimit` | integer | 1–20 | Max semantic search results |
-| `scoreThreshold` | float | 0–1 | Minimum similarity score |
+| `scoreThreshold` | float | 0–1 | Minimum similarity score for Qdrant results |
+| `semanticWeight` | float | 0–5 | RRF weight for Qdrant semantic results |
+| `keywordWeight` | float | 0–5 | RRF weight for FTS5 keyword results (`0` = disabled) |
 | `modelsFolderPath` | string | — | Path to folder containing .gguf files |
 | `temperature` | float | 0–2 | Inference randomness |
 | `repeatPenalty` | float | 1–2 | Repeat token penalty |
@@ -269,6 +303,29 @@ Both fields are optional. Only provided fields are updated.

 Same request/response shape as orchestration `/projects` above.

+### Summaries
+
+| Method | Path | Description |
+|---|---|---|
+| POST | /summaries | Create a new summary |
+| GET | /sessions/:id/summaries | Get all summaries for a session (internal ID) |
+| GET | /projects/:id/summaries | Get all summaries for a project |
+| PATCH | /summaries/:id | Update a summary (content, tokenCount, episodeRange) |
+| DELETE | /summaries/:id | Delete a summary |
+
+**POST /summaries — body:**
+```json
+{
+  "sessionId": 72,
+  "content": "The user discussed...",
+  "tokenCount": 579,
+  "episodeRange": "246-251"
+}
+```
+`content` is required. Either `sessionId` or `projectId` is required.
+
+**PATCH /summaries/:id — body:** any subset of `content`, `tokenCount`, `episodeRange`.
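The POST rules above can be checked before proxying with a small validator. This is a hypothetical sketch (`validateSummaryBody` is not a documented function); it reads "either `sessionId` or `projectId`" as "at least one of them".

```javascript
// Hypothetical validator for the documented POST /summaries rules.
// Returns a list of problems; an empty list means the body is acceptable.
function validateSummaryBody(body) {
  const errors = [];
  if (typeof body?.content !== 'string' || body.content.trim() === '') {
    errors.push('content is required');
  }
  // == null matches both null and undefined.
  if (body?.sessionId == null && body?.projectId == null) {
    errors.push('either sessionId or projectId is required');
  }
  return errors;
}
```

A caller would respond `400` with the `{ error, detail }` shape whenever the list is non-empty.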
+
 ### Entities

 | Method | Path | Description |
@@ -305,13 +362,34 @@ Same request/response shape as orchestration `/projects` above.

 **DELETE /relationships — body:**
 ```json
-{ "fromId": 1, "toId": 2, "label": "uses" }
+{ "fromId": 1, "toId": 2, "label": "works_on", "notes": "Alice is the primary developer.", "metadata": {} }
 ```
+`notes` is optional. `label` should be a snake_case verb. Re-submitting an existing `(fromId, toId, label)` key increments `mention_count` and preserves the existing `notes` if the new value is null.

 Relationships are identified by the composite key `(fromId, toId, label)`.
 Delete uses request body rather than URL params since this three-part key
 is awkward to encode in a path.

+### Graph
+
+| Method | Path | Description |
+|---|---|---|
+| GET | /graph/neighborhood/:entityId | Entity neighborhood — nodes + edges within N hops |
+| POST | /graph/neighbors | Bulk 1-hop neighborhood for a set of entity IDs |
+
+**GET /graph/neighborhood/:entityId — query params:**
+
+| Param | Default | Max | Description |
+|---|---|---|---|
+| depth | 1 | 3 | Traversal depth |
+
+Returns `{ entity, neighborhood: { nodes, edges } }`. Returns `404` if entity not found.
+
+**POST /graph/neighbors — body:**
+```json
+{ "entityIds": [5, 8, 12] }
+```
+Returns `{ nodes: [...], edges: [...] }`. Used internally by orchestration; not a client-facing endpoint.
+
 ---

 ## Embedding Service — port 3003
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -0,0 +1,228 @@
+# NexusAI — Master Roadmap
+
+> A modular, memory-centric AI assistant and personal second brain.  
+> Built on Node.js, React/Vite, SQLite, Qdrant, and llama.cpp.  
+> Repo: `https://gitea.jellystorm.com/storme/nexusAI`
+
+---
+
+## Current State (Completed)
+
+### Backend — Core Four Services
+- ✅ **Shared package** — `getEnv`, constants (`QDRANT`, `COLLECTIONS`, `EPISODIC`, `SERVICES`)
+- ✅ **Memory service** (port 3002, Mini PC 1) — SQLite schema (sessions, episodes, entities, relationships, summaries), FTS5 search, full CRUD endpoints, Qdrant semantic layer (3 collections), embedding write path
+- ✅ **Embedding service** (port 3003, Mini PC 1) — `nomic-embed-text` via Ollama, 768-dim vectors, `/embed` and `/embed/batch`
+- ✅ **Inference service** (port 3001, Main PC) — provider pattern (`INFERENCE_PROVIDER`), llama.cpp provider, `/complete` and `/complete/stream` (SSE)
+- ✅ **Orchestration service** (port 4000, Mini PC 2) — `/chat` and `/chat/stream`, session auto-create, dual-layer context assembly (recency + semantic), episode write-back
+
+### Memory System
+- ✅ Episodic memory — full conversation history in SQLite
+- ✅ Semantic memory — Qdrant vector search across episodes and entities
+- ✅ Entity extraction — background inference pass after each episode (qwen2.5:3b via Ollama)
+- ✅ Automatic summarization — triggered at context threshold, cumulative summary updates
+- ✅ Project memory isolation — project sessions fully isolated from each other and from non-project sessions
+
+### Chat Client
+- ✅ React/Vite frontend served via Caddy
+- ✅ Sidebar navigation — recent chats, projects, settings
+- ✅ Project management — CRUD, colour coding, isolated flag, ProjectView
+- ✅ Session management — auto-naming, project assignment, SessionModal
+- ✅ Streaming chat interface — SSE token-by-token rendering
+- ✅ Memory viewer — episode browsing, deletion, health panel
+- ✅ Settings panel — models section, configuration
+
+### Infrastructure
+- ✅ Caddy reverse proxy with Authelia SSO
+- ✅ Prometheus + Grafana monitoring (VRAM, CPU, RAM)
+- ✅ npm workspaces monorepo
+- ✅ Gitea self-hosted repo
+
+---
+
+## Phase 1 — Loose Ends & Stability - COMPLETE ✅
+*Target: Next development session (Saturday)*
+
+### Bug Fixes
+- ✅ **Entity extraction JSON parsing** — harden the response parser in `extraction.js` to handle the model returning markdown fences or preamble around the JSON
+- ✅ **Qdrant entity search empty results** — verify that entities embedded after the isolation fix surface correctly in project session searches
+
+### Tech Debt
+- ✅ **Logging** — introduce `LOG_LEVEL` env var across all services; reduce noise in production
+- ✅ **Error response consistency** — audit all endpoints for uniform `{ error, detail }` shape
+- ✅ **Constants audit** — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
+- ✅ **Orchestration `chat/index.js` review** — extract any logic that has grown beyond its intended scope into dedicated modules
+
+---
+
+## Phase 2 — Memory System Upgrades
+*The core intelligence layer*
+
+### 1. Knowledge Graph (SQLite) ✅
+The highest-leverage memory upgrade. Transforms NexusAI from "remembers conversations" to "understands relationships between things."
+- [x] Graph schema — `entity_episodes` join table plus `mention_count` and related columns on the existing `entities` and `relationships` tables
+- [x] Entity → node promotion pipeline (`mention_count` tracked; threshold gating not yet implemented)
+- [x] Relationship traversal queries
+- [x] Graph-aware context assembly in orchestration
+
+### 2. Retrieval Fusion + Full-Text Search ✅
+Multi-strategy retrieval merged into a single ranked result set.
+- [x] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
+- [x] Configurable weights per retrieval strategy (`semanticWeight`, `keywordWeight` via `PATCH /settings`)
+- [x] Score threshold retained per-strategy; FTS scoped to session/project sessions; `keywordWeight: 0` default (disabled until tuned)
+
+### 3. Memory Consolidation Lifecycle
+Prevents long-term memory degradation and enables compression.
+- [ ] Episode aging — score/weight episodes by recency and access frequency
+- [ ] Consolidation pass — merge related low-weight episodes into summary nodes
+- [ ] Orphan cleanup — remove entities no longer referenced by active episodes
+
+### 4. User Preference Model
+Automatically maintained profile injected into every system prompt.
+- [ ] Preference schema — communication style, interests, known facts, tone preferences
+- [ ] Auto-update from conversation history
+- [ ] Manual override / review UI
+
+### 5. Confidence-Based Routing *(inspired by acid2lake)*
+Short-circuit simple requests before they reach the LLM.
+- [ ] Intent classifier in orchestration — categorise incoming messages
+- [ ] Confidence bands — FAST PATH (memory lookup only) vs FULL (LLM + context)
+- [ ] Fast-path handlers — direct memory queries, session lookups, factual recalls
+
+### 6. Smarter Context Assembly *(inspired by acid2lake)*
+Budget-aware context selection instead of dumping all relevant memory into the prompt.
+- [ ] Token budget manager in orchestration
+- [ ] Priority scoring — recency × relevance × entity weight
+- [ ] Configurable context budget via env var
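Since the budget manager is still unchecked on this roadmap, the following is only a sketch of one way it could work. The priority formula comes from the item above; everything else is an assumption: exponential recency decay with a one-day half-life, a rough 4-characters-per-token estimate, Unix-second `createdAt` timestamps, and greedy highest-priority-first filling.

```javascript
// Hypothetical budget-aware context selection: priority = recency x relevance x entityWeight.
const estimateTokens = text => Math.ceil(text.length / 4); // crude heuristic, not a tokenizer

function selectContext(candidates, { budgetTokens, now = Date.now() / 1000, halfLifeSeconds = 86400 }) {
  const scored = candidates.map(c => {
    const ageSeconds = Math.max(0, now - c.createdAt);
    const recency = Math.pow(0.5, ageSeconds / halfLifeSeconds); // 1.0 when fresh, halves each half-life
    const priority = recency * c.relevance * (c.entityWeight ?? 1);
    return { ...c, priority, tokens: estimateTokens(c.text) };
  });

  scored.sort((a, b) => b.priority - a.priority);

  const selected = [];
  let used = 0;
  for (const item of scored) {
    // Greedy fill: skip items that would overflow, keep trying lower-priority ones.
    if (used + item.tokens > budgetTokens) continue;
    selected.push(item);
    used += item.tokens;
  }
  return { selected, usedTokens: used };
}
```

Greedy filling is not optimal (budgeted selection is a knapsack problem), but it is predictable and cheap enough to run on every request.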
+
+### 7. Procedural Memory Store *(inspired by acid2lake)*
+Learns "how NexusAI has successfully handled this type of request before."
+- [ ] Procedural memory schema — trigger pattern, steps, success count, confidence
+- [ ] Auto-population from successful interaction traces
+- [ ] Procedural context injection for matched request types
+
+### 8. Reflection / Self-Summarization
+NexusAI periodically reviews and synthesises its own memory.
+- [ ] Scheduled reflection pass — background job, configurable interval
+- [ ] Cross-session insight extraction
+- [ ] Summary nodes written back to knowledge graph
+- *Requires: Knowledge graph + consolidation lifecycle*
+
+### 9. Proactive Agent Loop
+The JARVIS moment — NexusAI reasons, plans, and acts across multiple steps.
+- [ ] Tool calling framework in orchestration
+- [ ] Built-in tools — memory search, entity lookup, summarize, web fetch
+- [ ] Reasoning loop — think → act → observe → respond
+- [ ] Agent mode toggle per session
+- *Requires: All Phase 2 items above*
+
+---
+
+## Phase 3 — Client Features
+*Making the daily driver experience excellent*
+
+### Core Chat Enhancements
+- [ ] Message regeneration — re-roll last AI response
+- [ ] Edit & resend — edit a previous message, clear subsequent history
+- [ ] Copy message button — hover icon per message
+- [ ] Message timestamps — subtle, toggleable
+- [ ] Token count display — per-response usage indicator
+
+### Memory Visibility
+- [ ] **"What I remember" panel** — show which episodes/entities were injected into context
+- [ ] Memory pinning — mark episodes as always-include
+- [x] Session summary view — on-demand or auto-generated session summary
+- [ ] Memory attribution — subtle indicator on responses that were memory-informed
+
+### Session & Project Management
+- [ ] Session search — full-text search across all sessions
+- [ ] Session tagging — freeform tags beyond project assignment
+- [ ] Session export — download as markdown or JSON
+- [ ] Pinned sessions — pin frequently used sessions to sidebar top
+- [ ] Bulk session actions — delete, move to project
+
+### Model & Persona Controls *(high priority — circles back to companion origins)*
+- [ ] Per-session model switching — override default model per session
+- [x] System prompt editor — per-project custom prompts
+- [ ] System prompt editor — per-session custom prompts
+- [ ] Persona profiles — saved configurations (model + system prompt + temperature)
+  - Examples: "Daily Driver", "Creative Mode", "Concise Mode", "Coding Mode"
+- [ ] Temperature / parameter sliders — collapsible panel for power users
+
+### Second Brain Features
+- [ ] **Quick capture** — minimal input to save a thought directly to memory without starting a chat
+- [ ] **Knowledge graph visualiser** — interactive node/edge view of entities and relationships
+- [ ] Memory search page — dedicated search UI across all episodes and entities
+- [ ] Daily digest — generated summary of recent activity and learned facts
+
+### Quality of Life
+- [ ] Keyboard shortcuts — `Ctrl+K` command palette, `Ctrl+Enter` to send
+- [ ] Dark/light theme toggle
+- [ ] Mobile layout polish — collapsible sidebar, touch-friendly inputs
+- [ ] Notification support — browser notifications for long completions
+
+---
+
+## Phase 4 — Coding Copilot
+*After core is feature-complete*
+
+### Project Directory Awareness
+- [ ] Directory watcher service — monitors a VS Code workspace for changes
+- [ ] Symbol indexer — AST parsing via Tree-sitter, file → symbol map in SQLite
+- [ ] Diagnostic indexer — compiler errors/warnings per file, triggered on save
+- [ ] Maps to existing project isolation — coding project = NexusAI project with `indexedDirectory` flag
+
+### Coding-Specific Memory
+- [ ] Procedural patterns per language/framework — stored in procedural memory layer
+- [ ] Skill compilation — successful coding solutions abstracted into reusable patterns
+- [ ] Codebase semantic search — embed code chunks into Qdrant, search by intent
+
+---
+
+## Phase 5 — Stretch Goals
+
+### Voice Layer
+- [ ] TTS output — text-to-speech for AI responses
+- [ ] STT input — speech-to-text for voice messages
+- [ ] Hardware-dependent — deferred until appropriate hardware available
+- *Architecturally clean addition — new input/output modality only*
+
+### Homelab Enhancements
+- [ ] Backup improvements — automated, verified backups of SQLite + Qdrant data
+- [ ] Security hardening — network segmentation, service-level auth
+- [ ] IP webcam integration
+- [ ] Home Assistant integration
+
+---
+
+## Architecture Reference
+
+### Services & Nodes
+
+| Service | Host | Port | Role |
+|---|---|---|---|
+| Inference | Main PC `192.168.0.79` | 3001 | llama.cpp provider, `/complete`, `/complete/stream` |
+| Memory | Mini PC 1 `192.168.0.81` | 3002 | SQLite, episode/entity/summary CRUD |
+| Embedding | Mini PC 1 `192.168.0.81` | 3003 | nomic-embed-text via Ollama, vector generation |
+| Qdrant | Mini PC 1 `192.168.0.81` | 6333 | Vector store — episodes, entities, summaries collections |
+| Orchestration | Hub `192.168.0.205` | 4000 | Chat pipeline, context assembly, session management |
+| Chat Client | Hub `192.168.0.205` | — | React/Vite, served via Caddy |
+| Caddy + Authelia | Hub `192.168.0.205` | 443 | Reverse proxy, SSO |
+
+### Primary Models
+
+| Role | Model | Notes |
+|---|---|---|
+| Daily driver | Gemma 4 26B Claude Distill APEX I-Mini | `--reasoning off` flag critical |
+| Creative/worldbuilding | Gemma 4 21B REAP Q5_K_M | |
+| Coding | DeepSeek Coder V2 Lite Instruct Q6_K | |
+| Background tasks | qwen2.5:3b via Ollama | Entity extraction, summarization |
+
+### Key Design Principles
+- **Layer-by-layer validation** — backend → orchestration → frontend, curl-test each layer
+- **Fire-and-forget async** — embedding and entity extraction never block the chat response
+- **All services read settings on every request** — no restart required for config changes
+- **Backend-first development** — data layer → endpoints → orchestration proxy → frontend
+
+---
+
+*Last updated: April 2026*
--- a/docs/services/entity-extraction.md
+++ b/docs/services/entity-extraction.md
@@ -0,0 +1,140 @@
+# Entity Extraction
+
+**Location:** `packages/memory-service/src/entities/extraction.js`  
+**Triggered by:** Episode creation (`POST /episodes`)  
+**Model:** `qwen2.5:3b` via Ollama (configurable via `EXTRACTION_MODEL` env var)
+
+## Purpose
+
+After each episode is saved to SQLite, the extraction pipeline runs
+asynchronously in the background to identify named entities and the
+relationships between them. Results are written back to SQLite and
+embedded into Qdrant — the episode response is never delayed.
+
+## Trigger
+
+`createEpisode()` in `episodic/index.js` calls `extractAndStoreEntities()`
+immediately after the SQLite insert, without awaiting it:
+
+```js
+// Not awaited: extraction runs in the background so the episode response is never delayed.
+extractAndStoreEntities(userMessage, aiResponse, episode.id, projectId)
+  .catch(err => logger.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
+```
+
+If extraction throws, the episode is unaffected — the error is logged and
+swallowed.
+
+## Model Settings
+
+| Setting | Value | Notes |
+|---|---|---|
+| Model | `qwen2.5:3b` | Ollama, configurable via `EXTRACTION_MODEL` |
+| Temperature | 0.1 | Low for consistent, deterministic output |
+| `num_predict` | 1500 | Higher ceiling to accommodate entity + relationship JSON |
+| `format` | `'json'` | Ollama constrained decoding — enforces valid JSON output |
+| Prompt format | ChatML | `<\|im_start\|>` / `<\|im_end\|>` tokens |
+
+## Prompt Structure
+
+The prompt is built by `buildExtractionPrompt()`. It includes:
+
+1. **System message** — declares the model's role as an entity and relationship extractor
+2. **Instructions** — entity types, field rules, relationship label format, required JSON schema
+3. **Known entities block** — last 20 entities from SQLite, by `rowid DESC`, used to encourage consistent name/type pairs across conversations
+4. **Conversation** — the raw user message and AI response, wrapped in `--- CONVERSATION ---` / `--- END CONVERSATION ---` delimiters
+
+```
+<|im_start|>system
+You are a named entity and relationship extractor. You output only valid JSON.
+<|im_end|>
+<|im_start|>user
+Read the conversation below and extract all named entities and the relationships between them.
+Entity types: person, place, project, technology, concept, organization
+...
+Return this exact JSON structure:
+{ "entities": [...], "relationships": [...] }
+
+Already known entities (use these exact name and type values if the same entity appears):
+- "NexusAI" (project)
+- "Alice" (person)
+
+--- CONVERSATION ---
+User: ...
+Assistant: ...
+--- END CONVERSATION ---
+<|im_end|>
+<|im_start|>assistant
+```
+
+## Expected JSON Output
+
+```json
+{
+  "entities": [
+    { "name": "Alice", "type": "person", "notes": "Software engineer working on NexusAI." },
+    { "name": "NexusAI", "type": "project", "notes": "A modular AI assistant with persistent memory." }
+  ],
+  "relationships": [
+    {
+      "from": "Alice", "fromType": "person",
+      "to": "NexusAI", "toType": "project",
+      "label": "works_on",
+      "notes": "Alice is the primary developer."
+    }
+  ]
+}
+```
+
+Relationship labels use **snake_case verbs** (e.g. `works_on`, `manages`, `uses`,
+`knows`, `located_in`, `part_of`, `created_by`).
+
+## JSON Parsing
+
+The raw model response is matched with `/\{[\s\S]*\}/` before parsing — this
+tolerates any preamble or trailing prose the model emits alongside the JSON.
+If the match fails or `JSON.parse` throws, the function logs a warning and
+returns without writing anything.
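The parsing step can be sketched as follows. Returning `null` on failure mirrors the "log a warning and return" behaviour; the function name and the defaulting of missing arrays to `[]` are assumptions.

```javascript
// Sketch of the documented parse: take the outermost {...} span, then JSON.parse it.
// Returns null instead of throwing so the caller can log a warning and skip all writes.
function parseExtractionResponse(raw) {
  const match = /\{[\s\S]*\}/.exec(raw ?? '');
  if (!match) return null;
  try {
    const parsed = JSON.parse(match[0]);
    return {
      entities: Array.isArray(parsed.entities) ? parsed.entities : [],
      relationships: Array.isArray(parsed.relationships) ? parsed.relationships : [],
    };
  } catch {
    return null;
  }
}
```

Because the pattern is greedy, it spans from the first `{` to the last `}`; a response containing two separate JSON objects would therefore fail to parse rather than yield the first one.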
+
+## Entity Processing
+
+For each entity in `parsed.entities`:
+
+1. Validate that `name` is present, `type` is in `ENTITY_TYPES`, and `name` is not in `IGNORED_NAMES`
+2. Call `upsertEntity(name, type, notes)`:
+   - **Insert**: creates new row with `mention_count = 1`, `source = 'extraction'`
+   - **Conflict** on `(name, type)`: increments `mention_count`, updates `last_seen_at`, preserves existing `notes` if new extraction returns null
+3. Add to `entityMap` keyed by `"${name}::${type}"` — used for relationship resolution below
+4. Call `linkEntityToEpisode(entity.id, episodeId)` — writes to `entity_episodes` join table
+5. Fire-and-forget: embed as `"${name} (${type}): ${notes}"` → store to Qdrant `entities` collection with `{ name, type, notes, projectId }` in payload
+
+**Valid entity types:** `person`, `place`, `project`, `technology`, `concept`, `organization`
+
+**Stoplist (ignored names):** `good morning`, `good night`, `hello`, `goodbye`, `thanks`, `thank you`
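The entity pass can be modelled in memory to show the upsert semantics. In this sketch a `Map` stands in for the SQLite `entities` table and `processEntities` is a hypothetical name; the case-insensitive stoplist match and the `id` assignment are assumptions, and the Qdrant embedding step is omitted.

```javascript
// In-memory model of the entity pass: validate, upsert with mention_count, build entityMap.
// `store` (a Map keyed "name::type") stands in for SQLite; this is not the real upsertEntity().
const ENTITY_TYPES = ['person', 'place', 'project', 'technology', 'concept', 'organization'];
const IGNORED_NAMES = ['good morning', 'good night', 'hello', 'goodbye', 'thanks', 'thank you'];

function processEntities(parsedEntities, store, now = Math.floor(Date.now() / 1000)) {
  const entityMap = new Map();
  for (const e of parsedEntities) {
    const name = typeof e.name === 'string' ? e.name.trim() : '';
    if (!name || !ENTITY_TYPES.includes(e.type)) continue;
    if (IGNORED_NAMES.includes(name.toLowerCase())) continue; // case-insensitive match is assumed

    const key = `${name}::${e.type}`;
    const existing = store.get(key);
    if (existing) {
      // Conflict on (name, type): bump mention_count, refresh last_seen_at, keep old notes if new are null.
      existing.mention_count += 1;
      existing.last_seen_at = now;
      existing.notes = e.notes ?? existing.notes;
    } else {
      store.set(key, {
        id: store.size + 1, name, type: e.type, notes: e.notes ?? null,
        mention_count: 1, source: 'extraction',
        last_seen_at: null, // column default; only the conflict path is documented as setting it
      });
    }
    entityMap.set(key, store.get(key));
  }
  return entityMap;
}
```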
+
+## Relationship Processing
+
+After all entities are saved, relationships are processed:
+
+1. For each entry in `parsed.relationships`, look up both endpoints in `entityMap` using `"${from}::${fromType}"` and `"${to}::${toType}"` as keys
+2. If either endpoint is missing (filtered out, invalid type, or not in this extraction), the relationship is silently skipped
+3. Call `upsertRelationship(fromId, toId, label, notes)`:
+   - **Insert**: creates new row with `mention_count = 1`
+   - **Conflict** on `(from_id, to_id, label)`: increments `mention_count`, preserves existing `notes` if new is null
+
+Relationships are unidirectional in storage. Bidirectionality is handled at
+query time by the graph traversal layer.
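Relationship resolution against `entityMap` can be sketched the same way. `processRelationships`, the string key format, and the extra check for a missing `label` are assumptions; the skip-on-missing-endpoint and notes-preserving conflict behaviour follow the steps above.

```javascript
// In-memory model of relationship processing. `relStore` stands in for the SQLite
// `relationships` table, keyed by the composite (from_id, to_id, label).
function processRelationships(parsedRelationships, entityMap, relStore) {
  let skipped = 0;
  for (const r of parsedRelationships) {
    const from = entityMap.get(`${r.from}::${r.fromType}`);
    const to = entityMap.get(`${r.to}::${r.toType}`);
    // Missing endpoint (filtered, invalid type, or not extracted this pass): skip silently.
    if (!from || !to || !r.label) { skipped += 1; continue; }

    const key = `${from.id}|${to.id}|${r.label}`; // stored one-directionally: from -> to
    const existing = relStore.get(key);
    if (existing) {
      existing.mention_count += 1;
      existing.notes = r.notes ?? existing.notes;
    } else {
      relStore.set(key, { from_id: from.id, to_id: to.id, label: r.label,
        notes: r.notes ?? null, mention_count: 1 });
    }
  }
  return skipped;
}
```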
+
+## Project Scoping
+
+`projectId` is threaded through from the episode creation call. It is stored
+in the Qdrant entity payload, which enables project-scoped entity search in
+orchestration. SQLite entities and relationships are global — scoping only
+applies at the Qdrant retrieval layer.
+
+## Error Behaviour
+
+All steps after the initial model call are wrapped in a single outer try/catch.
+If Ollama is unreachable, returns a non-200 status, or the JSON cannot be
+parsed, the function logs at `warn` level and returns. There is no retry logic.
+Individual entity embedding failures are caught per-entity and logged at `warn`
+level without affecting other entities in the same batch.
--- a/docs/services/knowledge-graph.md
+++ b/docs/services/knowledge-graph.md
@@ -0,0 +1,213 @@
+# Knowledge Graph
+
+**Location:** `packages/memory-service/src/graph/index.js`  
+**Schema additions:** `entity_episodes` table; new columns on `entities` and `relationships`  
+**Exposed via:** `GET /graph/neighborhood/:entityId`, `POST /graph/neighbors`  
+**Consumed by:** Orchestration service context assembly
+
+## Purpose
+
+The knowledge graph transforms NexusAI from "remembers conversations" to
+"understands relationships between things." Rather than injecting a flat
+list of entity facts into every prompt, orchestration now retrieves a
+1-hop subgraph of connected entities and their relationships, giving the
+model structured, linked knowledge about people, projects, technologies,
+and concepts that have appeared across conversations.
+
+## Schema
+
+### `entity_episodes` (join table)
+
+Tracks which episodes contributed to each entity's knowledge. Defined in
+`schema.js` — exists on all installs.
+
+```sql
+CREATE TABLE IF NOT EXISTS entity_episodes (
+  entity_id  INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
+  episode_id INTEGER NOT NULL REFERENCES episodes(id) ON DELETE CASCADE,
+  PRIMARY KEY (entity_id, episode_id)
+);
+```
+
+Both FKs cascade on delete — removing an entity or episode automatically
+cleans up its join rows.
+
+### New columns on `entities`
+
+Added via migration in `db/index.js`:
+
+| Column | Type | Default | Description |
+|---|---|---|---|
+| `mention_count` | INTEGER | 1 | How many times this entity has been extracted across conversations |
+| `confidence` | REAL | 1.0 | Reserved for future confidence scoring |
+| `source` | TEXT | `'extraction'` | `'extraction'` (auto) or `'manual'` |
+| `last_seen_at` | INTEGER | NULL | Unix timestamp of most recent extraction hit |
+
+### New columns on `relationships`
+
+| Column | Type | Default | Description |
+|---|---|---|---|
+| `mention_count` | INTEGER | 1 | How many times this edge has been extracted |
+| `notes` | TEXT | NULL | Relationship context sentence from extraction |
+
+## Entity Promotion Model
+
+Entities are not created equal — some are mentioned once in passing, others
+recur across many conversations. `mention_count` is the signal:
+
+- Every time `upsertEntity` is called for an existing `(name, type)` pair, `mention_count` is incremented and `last_seen_at` is updated.
+- `ENTITIES.PROMOTION_THRESHOLD` (default: **3**) is the `mention_count` at which an entity is considered "well-established" — referenced in the codebase for future filtering and scoring logic.
+- Currently `mention_count` is stored and incremented but not yet used to gate retrieval. It provides the foundation for future features such as orphan cleanup (entities never re-extracted) and confidence-weighted graph traversal.
+
+The same pattern applies to relationships — `mention_count` rises each time
+the same `(from_id, to_id, label)` triple is extracted.
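+
+The counting rules above can be modelled in memory to reason about them. This is a minimal sketch, not the `upsertEntity` implementation: a `Map` stands in for the SQLite table and its `UNIQUE(name, type)` constraint, `upsertEntityInMemory` and `isWellEstablished` are hypothetical names, and it assumes existing notes are kept on conflict (a `COALESCE(entities.notes, excluded.notes)`-style upsert) and that `last_seen_at` is also set on insert.
+
+```javascript
+// Hypothetical in-memory model of the entities upsert.
+function upsertEntityInMemory(store, { name, type, notes = null }, now = Date.now()) {
+    const key = `${name}\u0000${type}`; // composite key mirrors UNIQUE(name, type)
+    const existing = store.get(key);
+    if (!existing) {
+        // New entity: mention_count starts at the column default of 1.
+        // Setting last_seen_at on insert is an assumption.
+        const row = { id: store.size + 1, name, type, notes, mention_count: 1, last_seen_at: now };
+        store.set(key, row);
+        return row;
+    }
+    // Conflict: bump the counter, refresh last_seen_at,
+    // and only fill notes if none were stored yet (COALESCE-style).
+    existing.mention_count += 1;
+    existing.last_seen_at = now;
+    existing.notes = existing.notes ?? notes;
+    return existing;
+}
+
+// "Well-established" once mention_count reaches ENTITIES.PROMOTION_THRESHOLD (3).
+const isWellEstablished = (row, threshold = 3) => row.mention_count >= threshold;
+```
+
+Because the key includes `type`, the same name extracted as a `person` and as a `project` yields two separate rows, each with its own counter.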
+
+## Graph Traversal
+
+`src/graph/index.js` exports two functions built on SQLite's `WITH RECURSIVE`
+CTE support. No external graph database is needed.
+
+### `getNeighborhood(entityId, depth)`
+
+Traverses the graph from a single entity, following edges in **both directions**,
+up to `depth` hops. Returns `{ nodes: [...entities], edges: [...relationships] }`.
+
+Default depth: `ENTITIES.GRAPH_HOP_DEPTH` (1). Maximum enforced at HTTP layer: 3.
+
+**SQLite query:**
+
+```sql
+WITH RECURSIVE traverse(entity_id, depth) AS (
+    SELECT ?, 0
+    UNION
+    SELECT
+        CASE WHEN r.from_id = t.entity_id THEN r.to_id ELSE r.from_id END,
+        t.depth + 1
+    FROM relationships r
+    JOIN traverse t ON (r.from_id = t.entity_id OR r.to_id = t.entity_id)
+    WHERE t.depth < ?
+)
+SELECT DISTINCT entity_id FROM traverse
+```
+
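+`UNION` (not `UNION ALL`) discards duplicate `(entity_id, depth)` rows, so several
+paths that reach the same node at the same depth collapse into one row. Cycles
+terminate because of the `WHERE t.depth < ?` bound: a node can reappear at a
+greater depth, and the outer `SELECT DISTINCT entity_id` removes those repeats.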
+
+After collecting node IDs, two follow-up queries fetch:
+- All entity rows for those IDs
+- All relationship rows where both `from_id` and `to_id` are in the node set
+
+This ensures edges between neighbors are included even if they aren't on the
+traversal path from the seed.
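+
+For intuition, the same traversal can be written as a breadth-first search over an in-memory edge list. This is a hypothetical equivalent of `getNeighborhood`, not the SQLite code: `entities` is assumed to be a `Map` from ID to entity row and `relationships` an array of `{ from_id, to_id, label }` rows.
+
+```javascript
+// Hypothetical in-memory equivalent of getNeighborhood(entityId, depth).
+function neighborhoodInMemory(entities, relationships, entityId, depth = 1) {
+    const visited = new Set([entityId]);
+    let frontier = [entityId];
+    for (let hop = 0; hop < depth && frontier.length > 0; hop++) {
+        const next = [];
+        for (const id of frontier) {
+            for (const r of relationships) {
+                // Bidirectional: step to the other endpoint of any edge touching `id`
+                const other = r.from_id === id ? r.to_id : r.to_id === id ? r.from_id : null;
+                if (other !== null && !visited.has(other)) {
+                    visited.add(other);
+                    next.push(other);
+                }
+            }
+        }
+        frontier = next;
+    }
+    // Equivalent of the two follow-up queries: node rows, then every edge inside the set
+    const nodes = [...visited].map(id => entities.get(id)).filter(Boolean);
+    const edges = relationships.filter(r => visited.has(r.from_id) && visited.has(r.to_id));
+    return { nodes, edges };
+}
+```
+
+Seeding from an entity whose two neighbors are also linked to each other returns that neighbor-to-neighbor edge too, even though the traversal never walked it.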
+
+### `getEntityNeighbors(entityIds[])`
+
+Bulk 1-hop version designed for orchestration. Given multiple seed entity IDs
+(the results of Qdrant semantic search), returns the combined 1-hop subgraph.
+
+1. Finds all neighbor IDs via one query using `IN (...)` on both `from_id` and `to_id`
+2. Deduplicates seeds + neighbors using a JavaScript `Set`
+3. Fetches all entity rows and all relationship rows within the combined node set
+
+This is intentionally simpler than the recursive version — orchestration always
+uses depth=1, and the bulk query avoids N separate CTE calls.
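+
+Step 1 needs one `?` placeholder per ID in each `IN (...)` list, and step 2 is a plain `Set` merge. The helpers below are a hypothetical sketch of both; the SQL text and function names are illustrative, not copied from `src/graph/index.js`.
+
+```javascript
+// Hypothetical: build the single neighbor query for a list of seed IDs.
+function buildNeighborQuery(entityIds) {
+    const placeholders = entityIds.map(() => '?').join(', ');
+    const sql =
+        'SELECT from_id, to_id FROM relationships ' +
+        `WHERE from_id IN (${placeholders}) OR to_id IN (${placeholders})`;
+    // The ID list is bound twice, once per IN clause
+    return { sql, params: [...entityIds, ...entityIds] };
+}
+
+// Merge seeds with every endpoint of the returned rows; the Set drops duplicates.
+function collectNodeIds(seedIds, rows) {
+    const ids = new Set(seedIds);
+    for (const { from_id, to_id } of rows) {
+        ids.add(from_id);
+        ids.add(to_id);
+    }
+    return [...ids];
+}
+```
+
+With `ORCHESTRATION.ENTITIES_LIMIT` at 5, the doubled parameter list stays far below SQLite's bound-parameter limit.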
+
+## Graph-Aware Context Assembly
+
+Orchestration's `assembleContext` (in `src/chat/index.js`) integrates the
+graph at step 7 of the chat pipeline:
+
+1. Qdrant entity search returns up to `ORCHESTRATION.ENTITIES_LIMIT` results, each including `r.id` (the SQLite entity ID) alongside the Qdrant payload
+2. `graph.getNeighbors(entityIds)` is called with those IDs → `POST /graph/neighbors` on memory-service
+3. The returned `{ nodes, edges }` is passed to `formatGraphContext()`
+4. On failure, falls back to using the Qdrant payload data directly as flat nodes with no edges
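+
+Steps 2 to 4 can be sketched as a single async helper. Assumptions: Qdrant hits look like `{ id, payload: { name, type, notes } }`, `getNeighbors` is an injected async function standing in for `graph.getNeighbors`, and `expandEntityContext` is a hypothetical name.
+
+```javascript
+// Hypothetical sketch of the graph step in assembleContext.
+async function expandEntityContext(entityHits, getNeighbors) {
+    if (entityHits.length === 0) return { nodes: [], edges: [] };
+    try {
+        // Qdrant point IDs double as SQLite entity IDs
+        return await getNeighbors(entityHits.map(hit => hit.id));
+    } catch {
+        // Fallback: flat nodes built from the Qdrant payload, no edges
+        const nodes = entityHits.map(hit => ({ id: hit.id, ...hit.payload }));
+        return { nodes, edges: [] };
+    }
+}
+```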
+
+### Prompt Format
+
+`formatGraphContext(nodes, edges)` in `chat/index.js` formats the subgraph as:
+
+```
+Here is what you know about entities relevant to this conversation and their connections:
+- Alice (person): software engineer working on NexusAI
+  → works_on NexusAI (project)
+  → knows Bob (person)
+- NexusAI (project): AI assistant framework
+- Bob (person): Alice's colleague
+```
+
+- One line per node: `- {name} ({type}): {notes}`
+- Outbound edges indented below: `  → {label} {target_name} ({target_type})`
+- Nodes with only inbound edges (pulled in as neighbors) appear without connection lines
+- Only outbound edges are shown — each relationship appears once, from the `from_id` side
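+
+The formatting rules above fit in a small pure function. This is a hypothetical reimplementation for illustration, not the code in `chat/index.js`. It assumes nodes carry `id`, `name`, `type`, `notes` and edges carry `from_id`, `to_id`, `label`. Dropping the `: {notes}` suffix when notes are empty is also an assumption.
+
+```javascript
+// Hypothetical reimplementation of formatGraphContext(nodes, edges).
+function formatGraphContext(nodes, edges) {
+    if (nodes.length === 0) return '';
+    const byId = new Map(nodes.map(n => [n.id, n]));
+    const lines = ['Here is what you know about entities relevant to this conversation and their connections:'];
+    for (const node of nodes) {
+        lines.push(node.notes ? `- ${node.name} (${node.type}): ${node.notes}` : `- ${node.name} (${node.type})`);
+        // Outbound edges only, so each relationship is printed exactly once
+        for (const edge of edges) {
+            const target = edge.from_id === node.id ? byId.get(edge.to_id) : undefined;
+            if (target) lines.push(`  → ${edge.label} ${target.name} (${target.type})`);
+        }
+    }
+    return lines.join('\n');
+}
+```
+
+Fed the Alice / NexusAI / Bob subgraph in that node order, this produces the example block above line for line.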
+
+## Project Scoping
+
+The knowledge graph respects project boundaries at the **entry point**, not
+during traversal:
+
+- Qdrant entity search is filtered by `projectId` — only entities tagged with this project are returned as seeds
+- Graph traversal in SQLite is unfiltered — neighbors can be from any project or no project
+- This is intentional: the graph entry is project-scoped, but traversal follows the global relationship graph to discover connected knowledge
+
+Entities are tagged with `projectId` in the Qdrant payload at extraction time.
+Entities extracted from non-project sessions have `projectId: null` and only
+appear in unfiltered global searches.
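+
+A project-scoped seed search maps onto Qdrant's search request with a `must` / `match` filter on the `projectId` payload key. The body below is a sketch that assumes the REST endpoint `POST /collections/entities/points/search`. `buildEntitySearchBody` is hypothetical, and returning `null` for non-project sessions (skipping the search) is an assumption, not something this document confirms.
+
+```javascript
+// Hypothetical request body for POST /collections/entities/points/search
+function buildEntitySearchBody(vector, projectId, { limit = 5, scoreThreshold = 0.55 } = {}) {
+    // Assumption: sessions without a project skip entity search entirely
+    if (projectId == null) return null;
+    return {
+        vector,
+        limit,                           // ORCHESTRATION.ENTITIES_LIMIT
+        score_threshold: scoreThreshold, // ORCHESTRATION.ENTITIES_THRESHOLD
+        with_payload: true,
+        // Only seeds are project-scoped; SQLite traversal ignores projectId
+        filter: { must: [{ key: 'projectId', match: { value: projectId } }] },
+    };
+}
+```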
+
+## API Reference
+
+### `GET /graph/neighborhood/:entityId`
+
+Returns the neighborhood of a single entity.
+
+**Query params:**
+
+| Param | Default | Max | Description |
+|---|---|---|---|
+| `depth` | `ENTITIES.GRAPH_HOP_DEPTH` (1) | 3 | Traversal depth |
+
+**Response:**
+```json
+{
+  "entity": { "id": 5, "name": "Alice", "type": "person", "notes": "...", "mention_count": 4 },
+  "neighborhood": {
+    "nodes": [
+      { "id": 5, "name": "Alice", "type": "person", "notes": "..." },
+      { "id": 8, "name": "NexusAI", "type": "project", "notes": "..." }
+    ],
+    "edges": [
+      { "id": 2, "from_id": 5, "to_id": 8, "label": "works_on", "notes": "...", "mention_count": 3 }
+    ]
+  }
+}
+```
+
+Returns 404 if the entity does not exist.
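+
+The `depth` param needs a default and an upper bound before it reaches the CTE. A minimal sketch, assuming out-of-range values are clamped rather than rejected (this document only says the maximum of 3 is enforced) and that values below 1 are treated as missing. `parseDepth` is a hypothetical helper.
+
+```javascript
+// Hypothetical: normalise ?depth= for GET /graph/neighborhood/:entityId
+function parseDepth(raw, { fallback = 1, max = 3 } = {}) {
+    const n = Number.parseInt(raw, 10);
+    // Missing, non-numeric, or < 1 (assumption) → ENTITIES.GRAPH_HOP_DEPTH
+    if (!Number.isInteger(n) || n < 1) return fallback;
+    return Math.min(n, max);
+}
+```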
+
+### `POST /graph/neighbors`
+
+Bulk 1-hop neighborhood for a set of entity IDs. Used internally by
+orchestration — not intended for direct client use.
+
+**Request body:**
+```json
+{ "entityIds": [5, 8, 12] }
+```
+
+**Response:**
+```json
+{
+  "nodes": [ ...entity objects... ],
+  "edges": [ ...relationship objects... ]
+}
+```
+
+Returns 400 if `entityIds` is missing or empty.
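+
+On the orchestration side, a client for this endpoint only needs to POST the ID list and surface non-2xx responses. A sketch using the global `fetch` (Node 18+). `baseUrl` (for example `http://localhost:3002`) and the function name are assumptions, and `fetchImpl` is injectable for testing.
+
+```javascript
+// Hypothetical client for POST /graph/neighbors on memory-service
+async function getNeighbors(baseUrl, entityIds, fetchImpl = fetch) {
+    // Short-circuit locally instead of provoking the server's 400
+    if (!Array.isArray(entityIds) || entityIds.length === 0) return { nodes: [], edges: [] };
+    const res = await fetchImpl(`${baseUrl}/graph/neighbors`, {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ entityIds }),
+    });
+    if (!res.ok) throw new Error(`POST /graph/neighbors failed: ${res.status}`);
+    return res.json(); // { nodes, edges }
+}
+```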
+
+## Constants (`packages/shared/src/config/constants.js`)
+
+| Constant | Value | Description |
+|---|---|---|
+| `ENTITIES.PROMOTION_THRESHOLD` | 3 | `mention_count` at which an entity is considered well-established |
+| `ENTITIES.GRAPH_HOP_DEPTH` | 1 | Default traversal depth for neighborhood queries |
+| `ORCHESTRATION.ENTITIES_LIMIT` | 5 | Max entity seeds returned from Qdrant search |
+| `ORCHESTRATION.ENTITIES_THRESHOLD` | 0.55 | Minimum similarity score for entity Qdrant search |
--- a/docs/services/memory-service.md
+++ b/docs/services/memory-service.md
@@ -9,8 +9,8 @@

 Responsible for all reading and writing of long-term memory. Acts as the
 sole interface to both SQLite and Qdrant — no other service accesses these
-stores directly. On episode creation, automatically calls the embedding
-service to generate and store a vector in Qdrant.
+stores directly. On episode creation, automatically embeds the episode into
+Qdrant and triggers entity and relationship extraction; extracted entities are
+embedded into Qdrant as well.

 ## Dependencies

@@ -38,25 +38,29 @@ src/
 ├── db/
 │   ├── index.js       # SQLite connection + initialization + migrations
 │   ├── schema.js      # Table definitions, indexes, FTS5, triggers
-│   └── projects.js    # Project CRUD functions
+│   ├── projects.js    # Project CRUD functions
+│   └── summaries.js   # Summary CRUD functions
 ├── episodic/
 │   └── index.js       # Session + episode CRUD, FTS search, embedding write path
 ├── semantic/
 │   └── index.js       # Qdrant collection management, upsert, search, delete
 ├── entities/
-│   ├── index.js       # Entity + relationship CRUD
-│   └── extraction.js  # Automatic entity extraction via qwen2.5:3b on Ollama
+│   ├── index.js       # Entity + relationship CRUD (upsert, mention tracking)
+│   └── extraction.js  # Automatic entity + relationship extraction via qwen2.5:3b
+├── graph/
+│   └── index.js       # Knowledge graph traversal (neighborhood queries, recursive CTE)
 └── index.js           # Express app + all route definitions
 ```

 ## SQLite Schema

-Six core tables:
+Seven core tables:

 - **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
 - **episodes** — individual exchanges (user message + AI response) tied to a session
- **entities** — named things the system learns about (people, places, concepts)
- **relationships** — directional labeled links between entities
+- **entities** — named things the system learns about (people, places, concepts, etc.). Fields include `mention_count`, `confidence`, `source`, `last_seen_at`
+- **relationships** — directional labeled links between entities (`from_id`, `to_id`, `label`). Fields include `mention_count`, `notes`
+- **entity_episodes** — join table linking entities to the episodes where they were extracted. Used for provenance and (planned) orphan cleanup
 - **summaries** — condensed episode groups for efficient context retrieval
 - **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`, `notes`, `system_prompt`

@@ -72,10 +76,18 @@ try { db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(proje
 try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
 try { db.exec(`ALTER TABLE projects ADD COLUMN notes TEXT`); } catch {}
 try { db.exec(`ALTER TABLE projects ADD COLUMN system_prompt TEXT`); } catch {}
+// Knowledge graph columns:
+try { db.exec(`ALTER TABLE entities ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`); } catch {}
+try { db.exec(`ALTER TABLE entities ADD COLUMN confidence REAL NOT NULL DEFAULT 1.0`); } catch {}
+try { db.exec(`ALTER TABLE entities ADD COLUMN source TEXT NOT NULL DEFAULT 'extraction'`); } catch {}
+try { db.exec(`ALTER TABLE entities ADD COLUMN last_seen_at INTEGER`); } catch {}
+try { db.exec(`ALTER TABLE relationships ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`); } catch {}
+try { db.exec(`ALTER TABLE relationships ADD COLUMN notes TEXT`); } catch {}
 ```

-New migrations are always appended here — never modify the schema file for
-existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
+`entity_episodes` is defined in `schema.js` itself rather than as a migration: it is a new table, so `CREATE TABLE IF NOT EXISTS` already makes it safe to run on existing installs.
+
+New migrations are always appended — never modify the schema file for existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.

 ### FTS5 Full-Text Search

@@ -100,12 +112,9 @@ that weren't touched.
 const allowed = ['name', 'description', 'colour', 'icon', 'isolated', 'notes', 'system_prompt'];
 ```

-This means saving just `{ notes: "..." }` or `{ system_prompt: "..." }` won't
-touch any other field.
-
 ## Qdrant / Semantic Layer

-Three Qdrant collections are initialized on service startup:
+Three Qdrant collections are initialized on service startup via `semantic.initCollections()`:

 | Collection | Purpose |
 |---|---|
@@ -117,9 +126,12 @@ All collections use **768-dimension vectors** with **Cosine similarity**,
 matching `nomic-embed-text` via Ollama. Vector size and distance metric are
 defined in `@nexusai/shared` — not hardcoded here.

-Each collection exposes three operations in `src/semantic/index.js`:
-upsert, search (with optional Qdrant filter), and delete. The `wait: true`
-flag is used on all writes.
+`initCollections()` iterates `Object.values(COLLECTIONS)` and creates any
+collection that doesn't already exist at startup — all three collections are
+guaranteed to exist before any requests are handled.
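+
+The create-if-missing loop can be expressed against Qdrant's REST API: `GET /collections/{name}` returns 404 for an unknown collection, and `PUT /collections/{name}` creates one with the given vector config. This is a sketch, not `semantic.initCollections()` itself. The collection names passed in and the injected `fetchImpl` are assumptions.
+
+```javascript
+// Hypothetical create-if-missing loop; size and distance mirror the shared constants
+async function initCollections(qdrantUrl, names, { size = 768, distance = 'Cosine' } = {}, fetchImpl = fetch) {
+    for (const name of names) {
+        const info = await fetchImpl(`${qdrantUrl}/collections/${name}`);
+        if (info.ok) continue; // already exists
+        if (info.status !== 404) throw new Error(`Qdrant check failed for ${name}: ${info.status}`);
+        const created = await fetchImpl(`${qdrantUrl}/collections/${name}`, {
+            method: 'PUT',
+            headers: { 'Content-Type': 'application/json' },
+            body: JSON.stringify({ vectors: { size, distance } }),
+        });
+        if (!created.ok) throw new Error(`Failed to create ${name}: ${created.status}`);
+    }
+}
+```
+
+Awaiting a loop like this before `app.listen()` is what would provide the "exists before any requests" guarantee.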
+
+Each collection exposes upsert, search (with optional Qdrant filter), and
+delete operations. The `wait: true` flag is used on all writes.

 ## Embedding Write Path

@@ -133,8 +145,7 @@ When a new episode is created:
 This step is **fire-and-forget** — if embedding fails, the episode is still
 saved and searchable via FTS. The error is logged but not surfaced.

-> The Qdrant payload stores `sessionId` (the internal integer ID). This is
-> used for per-session and per-project filtering during semantic search. See
+> The Qdrant payload stores `sessionId` (the internal integer ID). See
 > `memory-isolation.md` for how project-level filtering works.

 ## Entity Layer
@@ -142,38 +153,36 @@ saved and searchable via FTS. The error is logged but not surfaced.
 Entities and relationships use upsert semantics with composite unique
 constraints to prevent duplicates:

- `UNIQUE(name, type)` on entities
- `UNIQUE(from_id, to_id, label)` on relationships
+- `UNIQUE(name, type)` on entities — conflict increments `mention_count` and updates `last_seen_at`
+- `UNIQUE(from_id, to_id, label)` on relationships — conflict increments `mention_count` and preserves existing `notes`
 - `ON DELETE CASCADE` on relationship foreign keys

-### Automatic Entity Extraction
-
 After each episode is saved, `extraction.js` automatically extracts named
-entities from the conversation using `qwen2.5:3b` running on Ollama (Mini PC 1).
-This runs **fire-and-forget** — the episode is already saved and returned
-before extraction begins.
+entities **and relationships** from the conversation using `qwen2.5:3b` on
+Ollama — fire-and-forget. Each saved entity is also linked to the episode
+via the `entity_episodes` join table.
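+
+"Fire-and-forget" means the episode write resolves before extraction finishes, and extraction errors are logged rather than propagated. A minimal sketch with injected functions: `saveEpisode`, `extractAndStore`, and `log` are hypothetical stand-ins for the real calls, and passing `projectId` through mirrors the scoping described above.
+
+```javascript
+// Hypothetical shape of the episode write path with background extraction
+async function createEpisodeWithExtraction(saveEpisode, extractAndStore, input, log = console.error) {
+    const episode = await saveEpisode(input);
+    // Not awaited: the caller gets the episode immediately.
+    // The .catch keeps a failed extraction from becoming an unhandled rejection.
+    extractAndStore(episode, input.projectId).catch(err => log('entity extraction failed', err));
+    return episode;
+}
+```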

-**Entity types extracted:** `person`, `place`, `project`, `technology`,
-`concept`, `organization`
+> For full details on the extraction pipeline and JSON format, see `entity-extraction.md`.  
+> For the knowledge graph traversal layer, see `knowledge-graph.md`.

-The extraction prompt uses ChatML format (native to qwen2.5) and primes the
-response by ending with `[` to steer the model directly into JSON array output.
-A list of already-known entities is injected into the prompt so the model
-reuses existing `(name, type)` pairs rather than creating duplicates with
-different types.
+## Knowledge Graph Layer

-After extraction, each entity is:
-1. Upserted into SQLite via `upsertEntity` — notes are only written if
-   the entity is new (`COALESCE(entities.notes, excluded.notes)` prevents
-   overwriting existing notes with speculative updates)
-2. Embedded via the embedding service and upserted into the `entities`
-   Qdrant collection with `{ name, type, notes, projectId }` as payload —
-   `projectId` scopes entities to their project for isolated retrieval
+`src/graph/index.js` provides SQLite-based graph traversal over the entities
+and relationships tables. Two functions are exposed via HTTP:

-`extractAndStoreEntities` receives `projectId` from `createEpisode`, which
-receives it from the episode route, which receives it from orchestration's
-`createEpisode` call. This ensures entities are tagged with the correct
-project scope at extraction time.
+- **`getNeighborhood(entityId, depth)`** — recursive CTE traversal, bidirectional, returns `{ nodes, edges }`
+- **`getEntityNeighbors(entityIds[])`** — bulk 1-hop traversal for orchestration context assembly
+
+> For design rationale, traversal queries, and integration with orchestration, see `knowledge-graph.md`.
+
+## Summaries Layer
+
+Session summaries are generated by `orchestration-service/src/services/summarization.js`,
+which is triggered after each episode write, and are stored here via `POST /summaries`.
+The memory service is responsible only for CRUD; generation logic lives in orchestration.
+
+> For full details on trigger conditions, prompt format, cumulative updates,
+> and ChatML token stripping, see `summarization.md`.

 ## Project Delete Behaviour

--- a/docs/services/orchestration-service.md
+++ b/docs/services/orchestration-service.md
@@ -30,7 +30,8 @@ or inference services — all traffic flows through orchestration.
 | LLAMA_SERVER_URL | No | http://localhost:8080 | Direct llama-server URL for /models/props |
 | QDRANT_URL | No | http://localhost:6333 | Qdrant URL for semantic search |
 | CORS_ORIGIN | No | http://localhost:5173 | Allowed origin for CORS requests |
-| MODELS_MANIFEST_PATH | No | — | Legacy — superseded by `modelsFolderPath` in settings.json |
+| EXTRACTION_URL | No | http://localhost:11434 | Ollama URL for summarisation |
+| EXTRACTION_MODEL | No | qwen2.5:3b | Ollama model used for summarisation |

 ## Internal Structure

@@ -40,20 +41,22 @@ src/
 │   ├── memory.js         # HTTP client for memory service
 │   ├── inference.js      # HTTP client for inference service
 │   ├── embedding.js      # HTTP client for embedding service
-│   └── qdrant.js      # HTTP client for Qdrant (direct vector search)
+│   ├── qdrant.js         # HTTP client for Qdrant (direct vector search)
+│   ├── graph.js          # HTTP client for memory-service graph endpoints
+│   └── summarization.js  # Session summarisation — triggers after each episode
 ├── chat/
-│   └── index.js       # Core pipeline — context assembly, isolation, auto-naming
+│   └── index.js          # Core pipeline — context assembly, graph expansion, auto-naming
 ├── config/
 │   └── settings.js       # Settings load/save — reads/writes data/settings.json
 ├── routes/
 │   ├── chat.js           # POST /chat and POST /chat/stream
 │   ├── sessions.js       # Session CRUD proxy
-│   ├── projects.js    # Project CRUD proxy — passes req.body straight through
+│   ├── projects.js       # Project CRUD proxy
 │   ├── episodes.js       # Episode list and delete proxy
+│   ├── summaries.js      # GET /summaries/session/:id and /summaries/project/:id
 │   ├── settings.js       # GET /settings and PATCH /settings
-│   ├── health.js      # GET /health — pings all four services
-│   └── models.js      # GET /models — scans .gguf files live, merges with models.json
-                       # GET /models/props — context window + loaded model from llama-server
+│   ├── health.js         # GET /health/services — pings all four services
+│   └── models.js         # GET /models and GET /models/props
 └── index.js              # Express app entry point
 ```

@@ -69,7 +72,9 @@ via `appSettings.load()` — changes apply immediately without a service restart
 |---|---|---|
 | `recentEpisodeLimit` | 5 | Recent episodes injected into prompt |
 | `semanticLimit` | 5 | Semantic search results injected into prompt |
-| `scoreThreshold` | 0.75 | Minimum similarity score for semantic results |
+| `scoreThreshold` | 0.5 | Minimum similarity score for Qdrant semantic results |
+| `semanticWeight` | 1.0 | RRF weight for Qdrant semantic results |
+| `keywordWeight` | 0 | RRF weight for FTS5 keyword results (`0` = disabled) |
 | `modelsFolderPath` | `/mnt/nexus-models` | Path to folder containing .gguf files |
 | `temperature` | 0.7 | Inference temperature |
 | `repeatPenalty` | 1.1 | Repeat token penalty |
@@ -77,9 +82,6 @@ via `appSettings.load()` — changes apply immediately without a service restart
 | `topK` | 40 | Top-K token candidates per step |
 | `systemPrompt` | *(ORCHESTRATION.SYSTEM_PROMPT)* | Global system prompt. `null` reverts to hardcoded constant. |

-Defaults are defined in `config/settings.js` and fall back to constants in
-`@nexusai/shared`. Values saved in `settings.json` take precedence.
-
 ## Chat Pipeline

 Both `POST /chat` and `POST /chat/stream` share the same steps. The only
@@ -88,70 +90,86 @@ difference is how the inference response is delivered to the client.
 ### Steps

 1. **Session resolution** — look up session by `externalId`. Auto-create if
-   not found. Clients generate a UUID for new conversations — no pre-creation
-   step needed.
+   not found.

 2. **Project context resolution** — if the session has a `project_id`, fetch
   the project and all its session IDs. Used to scope semantic search. The
   project's `system_prompt` is also read at this step if set.

 3. **System prompt resolution** — three-tier hierarchy:
-   - `project.system_prompt` — if the session is in a project and it's set (highest priority)
+   - `project.system_prompt` — highest priority
   - `settings.systemPrompt` — global setting from `settings.json`
-   - `ORCHESTRATION.SYSTEM_PROMPT` — hardcoded constant in `@nexusai/shared` (last resort)
+   - `ORCHESTRATION.SYSTEM_PROMPT` — hardcoded constant (last resort)

-4. **Recent episode retrieval** — fetch the most recent episodes for the
-   session (`recentEpisodeLimit`, default 5).
+4. **Recent episode retrieval** — fetch most recent episodes (`recentEpisodeLimit`).

-5. **Semantic search** — embed the user message, query Qdrant for the top
-   most similar past episodes (`semanticLimit`, `scoreThreshold`). Deduplicated
-   against recent episodes. Non-critical — if it fails, pipeline continues with
-   recency-only context.
+5. **Fused episode retrieval** — runs semantic (Qdrant) and keyword (FTS5)
+   search in parallel, then merges results via Reciprocal Rank Fusion (RRF).
+   Both paths are filtered against `recentIds` before fusion. FTS is scoped
+   to the current session or all project sessions. If `keywordWeight` is `0`,
+   the FTS call is skipped entirely. Non-critical — failures fall back to
+   whichever strategy succeeded.

-6. **Entity search** — query the `entities` Qdrant collection filtered by
-   `projectId`. Non-project sessions receive no entity context. Non-critical.
+6. **Entity search** — query `entities` Qdrant collection filtered by
+   `projectId`. Returns entity IDs alongside Qdrant payload data (the Qdrant
+   point ID equals the SQLite entity ID). Non-critical.

-7. **Prompt assembly** — combine resolved system prompt, entity context,
-   semantic episodes, recent episodes, and user message.
+7. **Graph neighborhood expansion** — call `POST /graph/neighbors` on
+   memory-service with the entity IDs from step 6. Returns a 1-hop subgraph
+   `{ nodes, edges }` — entity objects plus the relationships connecting them.
+   If no entities were found or the graph call fails, falls back to flat entity
+   list (no edges). Non-critical.

-8. **Inference** — send to inference service with settings-derived parameters
-   (temperature, topP, topK, repeatPenalty). `/chat` awaits full response;
+8. **Prompt assembly** — combine system prompt, graph context, fused episodes,
+   recent episodes, and user message.
+
+9. **Inference** — send to inference service. `/chat` awaits full response;
   `/chat/stream` pipes SSE chunks to the client.

-9. **Episode write** — write the exchange back to memory with `projectId`.
-   Fire-and-forget for `/chat`; awaited for `/chat/stream`.
+10. **Episode write** — write exchange back to memory with `projectId`.

-10. **Auto-naming** — on `isFirstMessage && !session.name`, fire a secondary
-    inference call with a naming prompt (max 20 tokens, temperature 0.3) and
-    write the result back as `session.name`. Fully fire-and-forget.
+11. **Summarisation trigger** — `triggerSummary(session, allEpisodes)` called
+    fire-and-forget. See `summarization.md` for full details.
+
+12. **Auto-naming** — on first message with no session name, fires a secondary
+    inference call (max 20 tokens, temperature 0.3) to generate a session name.
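+
+Step 3's three-tier fallback reduces to a short chain. A sketch, assuming a blank or whitespace-only string counts as "not set" (this document only says `null` reverts to the constant); `resolveSystemPrompt` and `isSet` are hypothetical names.
+
+```javascript
+// Hypothetical three-tier system prompt resolution (step 3)
+const isSet = value => typeof value === 'string' && value.trim() !== '';
+
+function resolveSystemPrompt(project, settings, defaultPrompt) {
+    if (isSet(project?.system_prompt)) return project.system_prompt; // project override
+    if (isSet(settings?.systemPrompt)) return settings.systemPrompt; // global setting
+    return defaultPrompt;                                            // ORCHESTRATION.SYSTEM_PROMPT
+}
+```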

 ### Prompt Structure

 ```
 [Resolved system prompt]

-Here is what you know about entities relevant to this conversation:
+Here is what you know about entities relevant to this conversation and their connections:
 - {name} ({type}): {notes}
-... (up to 5 entity results)
+  → {label} {neighbor_name} ({neighbor_type})
 ---
 Here are some relevant memories from earlier conversations:
 User: {past user message}
 Assistant: {past ai response}
-... (up to semanticLimit semantic episodes)
 ---
 Here are some relevant memories from your past conversations:
 User: {past user message}
 Assistant: {past ai response}
-... (up to recentEpisodeLimit recent episodes)
 --- End of recent memories ---

 User: {current message}
 Assistant:
 ```

-Entity context appears first — before episodic memory — because structured
-facts about known entities are the most stable and reliable context. Semantic
-episodes follow, then recent episodes as the immediate conversation flow.
+The entity block renders the full graph neighborhood — seed entities matched
+by Qdrant search plus any neighbors pulled in by 1-hop traversal. Each entity
+shows its `notes` and any outbound relationships with their targets. Neighbor
+nodes that have no outbound edges within the subgraph appear without connection
+lines.
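+
+Step 8 can be sketched as a function that produces the layout above. Several details here are assumptions not stated in this document: empty sections are omitted, the graph block arrives already formatted (as `formatGraphContext` returns it), and episodes are rows with `user_message` and `ai_response` fields. `buildPrompt` and those field names are hypothetical.
+
+```javascript
+// Hypothetical prompt assembly matching the structure above
+function buildPrompt({ systemPrompt, graphContext = '', fusedEpisodes = [], recentEpisodes = [], userMessage }) {
+    const toLines = eps => eps.map(ep => `User: ${ep.user_message}\nAssistant: ${ep.ai_response}`).join('\n');
+    const blocks = [];
+    if (graphContext) blocks.push(graphContext);
+    if (fusedEpisodes.length) {
+        blocks.push(`Here are some relevant memories from earlier conversations:\n${toLines(fusedEpisodes)}`);
+    }
+    if (recentEpisodes.length) {
+        blocks.push(`Here are some relevant memories from your past conversations:\n${toLines(recentEpisodes)}\n--- End of recent memories ---`);
+    }
+    // Sections are separated by "---" lines
+    const context = blocks.join('\n---\n');
+    return `${systemPrompt}\n\n${context ? context + '\n\n' : ''}User: ${userMessage}\nAssistant:`;
+}
+```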
+
+## Summarisation
+
+After each episode write, `triggerSummary` is called fire-and-forget. It
+checks token thresholds and episode counts before generating, then stores
+the result in the memory service.
+
+> For full details on trigger conditions, prompt format, cumulative updates,
+> ChatML token stripping, and episode range tracking, see `summarization.md`.

 ## SSE Stream Format

@@ -168,37 +186,26 @@ data: {"text":"Hello"}
 data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":42}
 ```

-The `[DONE]` sentinel is consumed internally and not forwarded. The stream
-is terminated by `res.end()` after the done event.
+The `[DONE]` sentinel is consumed internally and not forwarded.

 ## Models Route

-`GET /models` scans `.gguf` files live on each request from `modelsFolderPath`
-(read from settings). Merges results with a `models.json` file in the same
-folder for richer metadata (label, description). Returns file size in GB.
+`GET /models` scans `.gguf` files live from `modelsFolderPath` and merges
+with `models.json` for metadata. Returns file size in GB.

-`GET /models/props` fetches directly from llama-server via `LLAMA_SERVER_URL`.
-Returns `{ contextWindow, modelAlias }`. `n_ctx` is at
-`data.default_generation_settings.n_ctx` in the llama-server response.
-Returns `503` if llama-server is unreachable.
+`GET /models/props` fetches directly from llama-server. Returns
+`{ contextWindow, modelAlias }`. Returns `503` if unreachable.

 ## Sessions Route Behaviour

-`PATCH /sessions/:sessionId` accepts either `name`, `projectId`, or both.
-The validation guard only rejects requests where neither is provided:
-
-```js
-if (!name?.trim() && projectId === undefined) {
-  return res.status(400).json({ error: 'name or projectId is required' });
-}
-```
-
-This allows `useChat` to write project assignment separately from rename
-operations.
+`PATCH /sessions/:sessionId` accepts `name`, `projectId`, or both.
+Rejects only when neither is provided — allows `useChat` to write project
+assignment separately from rename operations.

 ## Caddy Configuration

-Each route prefix needs a handle block in the Caddyfile on Mini PC 2:
+Each route prefix needs a handle block in the Caddyfile on Mini PC 2.
+**Any new top-level route must be added here AND in `vite.config.js`.**

 ```
 handle /chat*      { reverse_proxy localhost:4000 }
@@ -207,9 +214,13 @@ handle /models*   { reverse_proxy localhost:4000 }
 handle /projects*  { reverse_proxy localhost:4000 }
 handle /episodes*  { reverse_proxy localhost:4000 }
 handle /settings*  { reverse_proxy localhost:4000 }
+handle /summaries* { reverse_proxy localhost:4000 }
 handle /health*    { reverse_proxy localhost:4000 }
 ```

 After updating: `caddy reload --config /path/to/Caddyfile`

+> Note: `/graph` routes are on the memory-service (port 3002) and are called
+> internally by orchestration — they do not need a Caddy entry.
+
 For all HTTP endpoints, see `api-routes.md`.
--- a/docs/services/retrieval-fusion.md
+++ b/docs/services/retrieval-fusion.md
@@ -0,0 +1,153 @@
+# Retrieval Fusion
+
+**Implementation:** `packages/orchestration-service/src/chat/index.js`  
+**FTS scoping:** `packages/memory-service/src/episodic/index.js`, `src/index.js`  
+**Settings:** `semanticWeight`, `keywordWeight` via `PATCH /settings`
+
+## Purpose
+
+Rather than relying solely on Qdrant vector similarity (which finds semantically
+related content but misses exact keyword matches) or FTS5 keyword search alone
+(which finds exact matches but not paraphrases), Reciprocal Rank Fusion (RRF)
+merges the ranked results from both strategies into a single better-ranked list.
+
+Episodes that rank highly in **both** lists score highest. With equal weights
+and the short result lists used here, an episode that appears in both lists
+outscores one that is the top match in only one of them.
+
+## How RRF Works
+
+For each episode `d`, its fused score is:
+
+```
+RRF(d) = w_semantic / (k + rank_semantic(d))
+        + w_keyword  / (k + rank_keyword(d))
+```
+
+- `rank_i(d)` — 1-based position in that strategy's result list (episode absent from a list contributes 0 for that term)
+- `k = 60` — smoothing constant (standard; not exposed in settings)
+- `w_semantic`, `w_keyword` are the user-tunable weights `semanticWeight` and `keywordWeight`; their defaults come from the `RETRIEVAL` constants
+
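+Setting `keywordWeight` to `0` removes the keyword contribution entirely and
+also short-circuits the FTS network call. Setting `semanticWeight` to `0` zeroes
+the semantic scores, but `fuseEpisodeResults` still adds semantic-only episodes
+(with score 0), so they can fill any slots that keyword results leave empty.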
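+
+A worked example of the formula, written as a small helper. `rrfScore` is hypothetical (the real logic lives in `fuseEpisodeResults`). A rank of `null` means the episode is absent from that list and contributes 0.
+
+```javascript
+// Hypothetical: fused score for one episode given its 1-based ranks
+function rrfScore({ semanticRank = null, keywordRank = null }, { semanticWeight = 1, keywordWeight = 1, k = 60 } = {}) {
+    const term = (weight, rank) => (rank == null ? 0 : weight / (k + rank));
+    return term(semanticWeight, semanticRank) + term(keywordWeight, keywordRank);
+}
+
+// Rank 3 in both lists vs. rank 1 in the semantic list only (equal weights):
+const both = rrfScore({ semanticRank: 3, keywordRank: 3 }); // 2/63 ≈ 0.0317
+const semanticOnly = rrfScore({ semanticRank: 1 });        // 1/61 ≈ 0.0164
+```
+
+With the default `keywordWeight` of `0`, the same "both lists" episode scores only 1/63, so ordering falls back to pure semantic rank.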
+
+## Architecture
+
+Fusion lives in orchestration — the service already coordinates multiple data
+sources, and fusion is a retrieval strategy, not a storage concern.
+
+```
+getFusedEpisodes()
+├── getSemanticEpisodes()     — Qdrant embed+search → fetch full rows by ID
+│   (existing path, unchanged)
+└── getFTSResults()           — memory-service /episodes/search → full rows directly
+    (skipped entirely if keywordWeight == 0)
+         ↓
+fuseEpisodeResults()          — pure RRF, no I/O
+         ↓
+fusedEpisodes[]               — top semanticLimit episodes by RRF score
+```
+
+### Data Shape Consistency
+
+Both sides must enter fusion as `Episode[]` — full SQLite row objects with
+the same shape — and both must be filtered against `recentIds` first:
+
+- **Semantic path**: `recentIds` filter applied before `getEpisodeById` fetch (existing behaviour)
+- **FTS path**: full rows returned directly; `recentIds` filter applied in `getFusedEpisodes` after receiving them
+
+FTS requests `semanticLimit * 2` results to provide headroom for the
+`recentIds` filter without under-serving the fusion.
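+
+The parallel fetch with per-strategy fallback can be sketched with `Promise.allSettled`, so a failure on one side leaves the other intact. The injected `semanticSearch`, `keywordSearch`, and `fuse` functions, `recentIds` being a `Set`, and the option names are assumptions; `fuse` would be `fuseEpisodeResults`.
+
+```javascript
+// Hypothetical orchestration of both retrieval paths before fusion
+async function getFusedEpisodes({ semanticSearch, keywordSearch, fuse, recentIds, semanticLimit, semanticWeight, keywordWeight }) {
+    const [semantic, keyword] = await Promise.allSettled([
+        semanticSearch(), // assumed to return rows already filtered against recentIds
+        // keywordWeight 0 short-circuits the FTS network call
+        keywordWeight > 0 ? keywordSearch(semanticLimit * 2) : Promise.resolve([]),
+    ]);
+    const semanticEps = semantic.status === 'fulfilled' ? semantic.value : [];
+    const keywordEps = (keyword.status === 'fulfilled' ? keyword.value : [])
+        .filter(ep => !recentIds.has(ep.id)); // FTS rows are filtered here
+    return fuse(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit: semanticLimit });
+}
+```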
+
+## FTS Session Scoping
+
+Without scoping, FTS5 searches across all episodes in the database. For
+context assembly, results must be constrained to the current session or
+project session pool — the same scope used for Qdrant semantic search.
+
+`searchEpisodes(query, limit, sessionIds)` in memory-service accepts an
+optional `sessionIds` array. When provided, the SQL becomes:
+
+```sql
+SELECT e.* FROM episodes e
+JOIN episodes_fts fts ON e.id = fts.rowid
+WHERE episodes_fts MATCH ?
+AND e.session_id IN (?, ?, ...)
+ORDER BY rank
+LIMIT ?
+```
+
+The HTTP endpoint `GET /episodes/search` accepts `sessionIds` as a
+comma-separated query param: `?q=hello&sessionIds=1,2,3`.
+
+In orchestration, `ftsSessionIds` is set to:
+- `projectSessionIds` (all sessions in the project) — if the session belongs to a project
+- `[session.id]` — otherwise (single session only)
+
+This mirrors the Qdrant scoping logic exactly.
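+
+Both ends of the scoping can be sketched in a few lines: parsing the comma-separated `sessionIds` param, generating one `?` per session for the `IN` clause, and choosing the scope in orchestration. All three helpers are hypothetical. The SQL matches the query above, the unscoped variant omits the `IN` clause, and returning `null` for an explicitly empty scope (so it never widens into a global search) is an assumption.
+
+```javascript
+// memory-service side: "1,2,3" → [1, 2, 3]; non-numeric entries are dropped
+function parseSessionIds(raw) {
+    if (!raw) return undefined; // param absent → unscoped search
+    return String(raw).split(',').map(s => Number.parseInt(s.trim(), 10)).filter(Number.isInteger);
+}
+
+function buildEpisodeSearch(query, limit, sessionIds) {
+    // An explicit but empty scope must not widen into a global search
+    if (Array.isArray(sessionIds) && sessionIds.length === 0) return null;
+    const scoped = Array.isArray(sessionIds);
+    const sql = [
+        'SELECT e.* FROM episodes e',
+        'JOIN episodes_fts fts ON e.id = fts.rowid',
+        'WHERE episodes_fts MATCH ?',
+        scoped ? `AND e.session_id IN (${sessionIds.map(() => '?').join(', ')})` : null,
+        'ORDER BY rank',
+        'LIMIT ?',
+    ].filter(Boolean).join('\n');
+    return { sql, params: scoped ? [query, ...sessionIds, limit] : [query, limit] };
+}
+
+// orchestration side: same scope as the Qdrant search
+const ftsScope = (session, projectSessionIds) => (session.project_id ? projectSessionIds : [session.id]);
+```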
+
+## `fuseEpisodeResults` — Implementation Detail
+
+```js
+function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
+    const k = RETRIEVAL.RRF_K; // 60
+    const scores = new Map();  // episode.id → { episode, score }
+
+    // Score semantic results (already filtered against recentIds)
+    semanticEps.forEach((ep, i) => {
+        scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
+    });
+
+    // Score + merge keyword results (already filtered against recentIds)
+    keywordEps.forEach((ep, i) => {
+        const contrib = keywordWeight / (k + i + 1);
+        if (scores.has(ep.id)) {
+            scores.get(ep.id).score += contrib;   // appears in both — sum scores
+        } else if (contrib > 0) {
+            scores.set(ep.id, { episode: ep, score: contrib });  // FTS-only episode
+        }
+        // contrib == 0 (keywordWeight: 0) → episode not added (guard prevents score-0 bleed-through)
+    });
+
+    return [...scores.values()]
+        .sort((a, b) => b.score - a.score)
+        .slice(0, limit)
+        .map(({ episode }) => episode);
+}
+```
+
+The `else if (contrib > 0)` guard prevents FTS-only episodes from entering
+the result set with a score of 0 when `keywordWeight` is 0 — verified by
+the test suite.
+
+## Settings
+
+| Setting | Default | Range | Description |
+|---|---|---|---|
+| `semanticWeight` | 1.0 | 0–5 | Weight applied to Qdrant semantic results |
+| `keywordWeight` | 0 | 0–5 | Weight applied to FTS5 keyword results. `0` = disabled |
+
+Both are readable via `GET /settings` and writable via `PATCH /settings`
+without a service restart. Changes take effect on the next chat request.
+
+**To enable keyword search:**
+```bash
+curl -X PATCH http://localhost:4000/settings \
+  -H "Content-Type: application/json" \
+  -d '{"keywordWeight": 1.0}'
+```
+
+**To favour keyword matches over semantic:**
+```bash
+curl -X PATCH http://localhost:4000/settings \
+  -H "Content-Type: application/json" \
+  -d '{"semanticWeight": 0.5, "keywordWeight": 2.0}'
+```
+
+## Constants (`packages/shared/src/config/constants.js`)
+
+| Constant | Value | Description |
+|---|---|---|
+| `RETRIEVAL.RRF_K` | 60 | RRF smoothing constant — not exposed in settings |
+| `RETRIEVAL.SEMANTIC_WEIGHT` | 1.0 | Default semantic weight |
+| `RETRIEVAL.KEYWORD_WEIGHT` | 0 | Default keyword weight (off) |
--- a/docs/services/shared.md
+++ b/docs/services/shared.md
@@ -165,10 +165,16 @@ Orchestration pipeline defaults. Used as fallback values in
 | `RECENT_EPISODE_LIMIT` | `5` | Recent episodes to inject into prompt |
 | `SEMANTIC_LIMIT` | `5` | Semantic search results to inject into prompt |
 | `SCORE_THRESHOLD` | `0.75` | Minimum similarity score for semantic results |
+| `ENTITIES_LIMIT` | `5` | Max entity search results to inject into prompt |
+| `ENTITIES_THRESHOLD` | `0.55` | Minimum similarity score for entity results |
 | `TEMPERATURE` | `0.7` | Default inference temperature |
 | `CORS_ORIGIN` | `'http://localhost:5173'` | Fallback allowed CORS origin |
 | `SYSTEM_PROMPT` | *(see below)* | Default system prompt |

+> `ENTITIES_THRESHOLD` is set to `0.55` — lower than `SCORE_THRESHOLD` because
+> entity notes generated by a 3B model tend to embed with lower cosine similarity
+> than full episode text. Tune upward if irrelevant entities appear in context.
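To make the interplay of the two entity settings concrete, here is a sketch of post-filtering search hits: drop anything below `ENTITIES_THRESHOLD`, then keep the best `ENTITIES_LIMIT`. The hit shape (`{ score }`) and the name `selectEntities` are assumptions for illustration; the service may instead push the threshold down into the vector search itself, since Qdrant's search API accepts a `score_threshold`.

```javascript
// Hypothetical sketch: apply the similarity threshold first, then cap by limit.
const ENTITIES_LIMIT = 5;
const ENTITIES_THRESHOLD = 0.55;

function selectEntities(hits, { limit = ENTITIES_LIMIT, threshold = ENTITIES_THRESHOLD } = {}) {
  return hits
    .filter(hit => hit.score >= threshold) // "minimum similarity score"
    .sort((a, b) => b.score - a.score)     // best matches first (filter returned a copy)
    .slice(0, limit);
}

console.log(selectEntities([{ score: 0.9 }, { score: 0.5 }, { score: 0.6 }]).map(h => h.score)); // [ 0.9, 0.6 ]
```

Because the threshold shrinks the candidate pool before the limit applies, raising it can leave fewer than `ENTITIES_LIMIT` entities in context.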
+
 > `repeatPenalty`, `topP`, and `topK` defaults are sourced from
 > `INFERENCE_DEFAULTS` in `config/settings.js` rather than `ORCHESTRATION`,
 > since those constants already define the canonical values.
@@ -178,6 +184,25 @@ Default system prompt:
 > of past conversations with the user. Use them to provide consistent,
 > personalised responses."

+#### `SUMMARIES`
+
+Controls the automatic session summarisation system in `orchestration-service/src/services/summarization.js`.
+
+| Key | Value | Description |
+|---|---|---|
+| `THRESHOLD_TOKENS` | `200` | Minimum total session tokens before summarisation is considered |
+| `MAX_SUMMARY_TOKENS` | `800` | If existing summary exceeds this length (chars), create a new row instead of updating |
+| `MIN_EPISODES_SINCE` | `5` | Minimum new episodes since last summary before re-summarising |
+
+These can be overridden per-deployment via environment variables in the
+orchestration service `.env`:
+
+```
+SUMMARY_THRESHOLD_TOKENS=200
+SUMMARY_MAX_TOKENS=800
+SUMMARY_MIN_EPISODES=5
+```
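A minimal sketch of how such overrides might be resolved, assuming each variable holds a base-10 integer and that a missing or malformed value should fall back to the `SUMMARIES` default. `readIntEnv` and `loadSummaryConfig` are hypothetical names; the real service may route this through `getEnv` from `@nexusai/shared`.

```javascript
// Hypothetical sketch: resolve SUMMARIES overrides from environment variables.
const SUMMARY_DEFAULTS = { THRESHOLD_TOKENS: 200, MAX_SUMMARY_TOKENS: 800, MIN_EPISODES_SINCE: 5 };

// Parse a base-10 integer from env[name]; fall back on missing or malformed values.
function readIntEnv(env, name, fallback) {
  const raw = env[name];
  if (raw === undefined || raw.trim() === '') return fallback;
  const n = Number.parseInt(raw, 10);
  return Number.isNaN(n) ? fallback : n;
}

function loadSummaryConfig(env = process.env) {
  return {
    THRESHOLD_TOKENS:   readIntEnv(env, 'SUMMARY_THRESHOLD_TOKENS', SUMMARY_DEFAULTS.THRESHOLD_TOKENS),
    MAX_SUMMARY_TOKENS: readIntEnv(env, 'SUMMARY_MAX_TOKENS',       SUMMARY_DEFAULTS.MAX_SUMMARY_TOKENS),
    MIN_EPISODES_SINCE: readIntEnv(env, 'SUMMARY_MIN_EPISODES',     SUMMARY_DEFAULTS.MIN_EPISODES_SINCE),
  };
}

// 'five' is not an integer, so MIN_EPISODES_SINCE keeps its default.
const summaryCfg = loadSummaryConfig({ SUMMARY_MAX_TOKENS: '1200', SUMMARY_MIN_EPISODES: 'five' });
console.log(summaryCfg.THRESHOLD_TOKENS, summaryCfg.MAX_SUMMARY_TOKENS, summaryCfg.MIN_EPISODES_SINCE); // 200 1200 5
```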
+
 #### `SQLITE`

 | Key | Value | Description |
--- a/docs/services/summarization.md
+++ b/docs/services/summarization.md
@@ -0,0 +1,201 @@
+# Summarization
+
+Session summarization generates rolling plain-text summaries of conversation
+history, giving the model a condensed view of past context without consuming
+the full context window with raw episodes.
+
+**Location:** `packages/orchestration-service/src/services/summarization.js`  
+**Triggered by:** `chat/index.js` after every episode write (fire-and-forget)  
+**Model:** `qwen2.5:3b` via Ollama on Mini PC 1 (192.168.0.81)
+
+---
+
+## Trigger Conditions
+
+`triggerSummary(session, allEpisodes)` calls `maybeSummarize` without awaiting it (fire-and-forget).
+`maybeSummarize` proceeds only when both conditions are met:
+
+1. Total session token count exceeds `SUMMARIES.THRESHOLD_TOKENS` (default 200)
+2. At least `SUMMARIES.MIN_EPISODES_SINCE` (default 5) new episodes have
+   accumulated since the last summary
+
+The token threshold is intentionally low — it ensures summaries start
+generating early in a session's life rather than only after very long
+conversations.
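The two gating checks can be sketched as a pure predicate. `shouldSummarize` and its inputs are hypothetical: `totalTokens` stands for the session's running token count and `episodesSinceLast` for the number of episodes written after the latest summary. "Exceeds" is read as strictly greater and "at least" as greater-or-equal.

```javascript
// Hypothetical sketch of the two summarisation gates.
function shouldSummarize({ totalTokens, episodesSinceLast }, limits = { THRESHOLD_TOKENS: 200, MIN_EPISODES_SINCE: 5 }) {
  const enoughTokens = totalTokens > limits.THRESHOLD_TOKENS;            // must exceed the threshold
  const enoughEpisodes = episodesSinceLast >= limits.MIN_EPISODES_SINCE; // "at least" N new episodes
  return enoughTokens && enoughEpisodes;
}

console.log(shouldSummarize({ totalTokens: 450, episodesSinceLast: 5 })); // true
console.log(shouldSummarize({ totalTokens: 450, episodesSinceLast: 3 })); // false
console.log(shouldSummarize({ totalTokens: 150, episodesSinceLast: 9 })); // false
```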
+
+---
+
+## Summary Rows and Cumulative Updates
+
+Each session can have multiple summary rows in the `summaries` table.
+The update strategy depends on the size of the most recent summary:
+
+| Condition | Action |
+|---|---|
+| No existing summary | Generate fresh summary from all episodes |
+| Latest summary under `MAX_SUMMARY_TOKENS` (measured in characters) | Update: summarise new episodes with existing summary as context |
+| Latest summary over `MAX_SUMMARY_TOKENS` (measured in characters) | Create new row: treat as fresh summarisation |
+
+This produces a chain of summary rows over time. Each row's `episode_range`
+covers only the episodes summarised in that specific pass (e.g. `259-263`),
+not all episodes in the session.
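The table reduces to a small decision function. In this sketch (`chooseSummaryAction` is a hypothetical name), `latest` is the most recent summary row or `null`, its size is taken as `content.length` in characters, following the description of `MAX_SUMMARY_TOKENS` in `shared.md`, and a summary exactly at the limit is treated as under it, which is an assumption.

```javascript
// Hypothetical sketch of the create-vs-update decision for summary rows.
function chooseSummaryAction(latest, maxSummaryLength = 800) {
  if (!latest) {
    // No summary yet: summarise the episodes from scratch.
    return { action: 'create', existingSummary: null };
  }
  if (latest.content.length > maxSummaryLength) {
    // Latest row is full: start a new row with a fresh summary.
    return { action: 'create', existingSummary: null };
  }
  // Room left: fold the new episodes into the existing summary.
  return { action: 'update', existingSummary: latest.content };
}

console.log(chooseSummaryAction(null).action);                          // create
console.log(chooseSummaryAction({ content: 'x'.repeat(900) }).action);  // create
console.log(chooseSummaryAction({ content: 'Short summary.' }).action); // update
```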
+
+---
+
+## Ollama Request
+
+```js
+{
+    model: EXTRACTION_MODEL,   // qwen2.5:3b (set via EXTRACTION_MODEL env var)
+    prompt: buildSummaryPrompt(episodesToSummarize, existingSummary),
+    stream: false,
+    // No format: 'json' — free-text output required for summaries
+    options: {
+        temperature: 0.2,
+        num_predict: 500,
+    },
+}
+```
+
+`temperature: 0.2` is slightly higher than extraction (0.1), since summaries
+benefit from some fluency. `num_predict: 500` caps generation at 500 tokens:
+room for five thorough sentences while cutting off any runaway output.
+
+---
+
+## Prompt Format
+
+ChatML format — native to qwen2.5:
+
+```
+<|im_start|>user
+Summarize the conversation below in 3-5 sentences.
+Write in third person. Do not quote directly — paraphrase only.
+Do not include greetings, sign-offs, or filler. Output only the summary text.
+
+Conversation:
+{context}
+<|im_end|>
+<|im_start|>assistant
+```
+
+For cumulative updates, the instruction and context change:
+
+```
+<|im_start|>user
+Update the summary below to incorporate the new exchanges.
+Write 3-5 sentences in third person. Do not quote directly — paraphrase only.
+Do not include greetings, sign-offs, or filler. Output only the updated summary text.
+
+Previous summary:
+{existingSummary}
+
+New exchanges:
+{context}
+<|im_end|>
+<|im_start|>assistant
+```
+
+### Input truncation
+
+Episode context is truncated to `MAX_CHARS = 3000` characters, keeping the
+most recent exchanges (sliced from the end). This keeps Qwen focused and
+prevents the prompt from exceeding its effective context window.
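Tail truncation amounts to a negative `slice`. The sketch assumes the selected episodes have already been joined into one `context` string; `truncateContext` is a hypothetical name.

```javascript
// Hypothetical sketch: keep only the newest MAX_CHARS characters of the context.
const MAX_CHARS = 3000;

function truncateContext(context, maxChars = MAX_CHARS) {
  if (maxChars <= 0) return '';
  // A negative slice index counts from the end, so the latest exchanges survive.
  return context.length > maxChars ? context.slice(-maxChars) : context;
}

console.log(truncateContext('older exchange | newer exchange', 14)); // newer exchange
```

Because the cut is by character count, the first retained exchange may begin mid-sentence; every later exchange reaches the model intact.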
+
+---
+
+## ChatML Token Stripping
+
+Qwen occasionally echoes ChatML tokens back into its response. The raw output
+is cleaned before saving:
+
+```js
+const raw = data.response?.trim() ?? '';
+const content = raw
+    .replace(/<\|im_start\|>.*?<\|im_end\|>/gs, '')
+    .replace(/<\|im_start\|>|<\|im_end\|>|<\|im_sep\|>/g, '')
+    .trim();
+return content;
+```
+
+Without this, leaked tokens get stored in the summary and then injected
+back into the next summarisation prompt — causing the model to append a new
+summary after the old one rather than replacing it.
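Wrapping the cleanup in a function makes its behaviour easy to check. `stripChatML` is a hypothetical name for the snippet above, applied here to two representative outputs.

```javascript
// Hypothetical wrapper around the ChatML cleanup shown above.
function stripChatML(raw) {
  return (raw ?? '')
    .trim()
    .replace(/<\|im_start\|>.*?<\|im_end\|>/gs, '')          // remove whole echoed turns
    .replace(/<\|im_start\|>|<\|im_end\|>|<\|im_sep\|>/g, '') // remove stray tokens
    .trim();
}

// A trailing end token is removed:
console.log(stripChatML('The user planned a trip to Lisbon.<|im_end|>'));
// → The user planned a trip to Lisbon.

// An echoed prompt turn after the summary is removed along with its content:
console.log(stripChatML('The user planned a trip.\n<|im_start|>user\nUpdate the summary<|im_end|>'));
// → The user planned a trip.
```

One side effect worth knowing: because the first pattern deletes everything between a start token and an end token, an output that wraps the real summary in its own `<|im_start|>assistant` ... `<|im_end|>` turn comes back as an empty string.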
+
+---
+
+## Episode Range Tracking
+
+Each summary row stores `episode_range` as `"firstId-lastId"` covering only
+the episodes summarised in that pass:
+
+```js
+const summarizedIds = episodesToSummarize.map(ep => ep.id).sort((a,b) => a - b);
+const episodeRange = `${summarizedIds.at(0)}-${summarizedIds.at(-1)}`;
+```
+
+This makes SummaryView cards meaningful — "Episodes 259-263" tells you
+exactly which exchanges that summary covers, rather than always showing
+the full session range.
+
+---
+
+## Summary Storage
+
+Summaries are written directly to the memory service from orchestration:
+
+```js
+// Create new row
+await fetch(`${MEMORY_URL}/summaries`, {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json' },
+    body: JSON.stringify({ sessionId: session.id, content, tokenCount, episodeRange }),
+});
+
+// Update existing row
+await fetch(`${MEMORY_URL}/summaries/${latest.id}`, {
+    method: 'PATCH',
+    headers: { 'Content-Type': 'application/json' },
+    body: JSON.stringify({ content, tokenCount, episodeRange }),
+});
+```
+
+`session.id` here is the internal SQLite integer ID — not the external UUID.
+It is available directly on the `session` object passed from `chat/index.js`.
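The two branches can be folded into one helper. This is a sketch under assumptions: `saveSummary` and its parameter names are hypothetical, `fetchImpl` defaults to the global `fetch` available in Node 18+, and the explicit `Content-Type` header assumes the memory service parses bodies with Express's `express.json()`, which only handles requests sent with a JSON content type.

```javascript
// Hypothetical helper: POST a new summary row, or PATCH an existing one.
// fetchImpl defaults to the global fetch (Node 18+) and can be stubbed in tests.
async function saveSummary(
  { memoryUrl, sessionId, latestId = null, content, tokenCount, episodeRange },
  fetchImpl = fetch,
) {
  const isUpdate = latestId !== null;
  const url = isUpdate ? `${memoryUrl}/summaries/${latestId}` : `${memoryUrl}/summaries`;
  // sessionId is only needed when creating a row; updates target the row id.
  const payload = isUpdate
    ? { content, tokenCount, episodeRange }
    : { sessionId, content, tokenCount, episodeRange };

  const res = await fetchImpl(url, {
    method: isUpdate ? 'PATCH' : 'POST',
    headers: { 'Content-Type': 'application/json' }, // lets express.json() parse the body
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`Failed to save summary: ${res.status}`);
}

// Example with a stub fetch that logs the request instead of sending it:
saveSummary(
  { memoryUrl: 'http://localhost:3002', sessionId: 42, content: 'Summary text.', tokenCount: 310, episodeRange: '259-263' },
  async (url, opts) => { console.log(opts.method, url); return { ok: true, status: 201 }; },
); // logs: POST http://localhost:3002/summaries
```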
+
+---
+
+## Client-Side Indicator
+
+The chat client shows a "Summarising…" spinner in the `ChatWindow` header
+and on the InfoPanel's Session Memory button while summarisation may be
+in progress.
+
+Since summarisation is fire-and-forget with no completion signal back to
+the client, the indicator is timer-based: it activates when the stream
+finishes and clears after 8 seconds.
+
+```js
+// In hooks/useChat.js, which owns the streaming state:
+useEffect(() => {
+    if (prevStreaming.current && !streaming) {
+        setSummarising(true);
+        const t = setTimeout(() => setSummarising(false), 8000);
+        return () => clearTimeout(t);
+    }
+    prevStreaming.current = streaming;
+}, [streaming]);
+```
+
+---
+
+## Environment Variables
+
+Set in `packages/orchestration-service/src/.env`:
+
+| Variable | Default | Description |
+|---|---|---|
+| `EXTRACTION_URL` | `http://localhost:11434` | Ollama instance URL |
+| `EXTRACTION_MODEL` | `qwen2.5:3b` | Model for summarisation |
+| `MEMORY_SERVICE_URL` | `http://localhost:3002` | Memory service URL |
+| `SUMMARY_THRESHOLD_TOKENS` | `200` | Token threshold before summarisation triggers |
+| `SUMMARY_MAX_TOKENS` | `800` | Max summary length before a new row is created |
+| `SUMMARY_MIN_EPISODES` | `5` | Min new episodes since last summary before re-summarising |
--- a/package-lock.json
+++ b/package-lock.json
@@ -4224,8 +4224,7 @@
      "dependencies": {
        "@nexusai/shared": "^1.0.0",
        "dotenv": "^17.4.0",
-        "express": "^5.2.1",
-        "ollama": "^0.6.3"
+        "express": "^5.2.1"
      }
    },
    "packages/inference-service": {
--- a/packages/chat-client/src/App.jsx
+++ b/packages/chat-client/src/App.jsx
@@ -12,6 +12,7 @@ import AllProjectsView from './components/AllProjectsView';
 import SettingsView from './components/SettingsView';
 import ProjectView from './components/ProjectView';
 import MemoryView from './components/MemoryView';
+import SummaryView from './components/SummaryView';

 /**** useHooks **** */
 import { useSession } from './hooks/useSession';
@@ -27,6 +28,7 @@ const BACK_MAP = {
  'settings':     'home',
  'project':      'all-projects',
  'memory':       'settings',
+  'summaries':    'chat',
 };

 export default function App() {
@@ -63,6 +65,7 @@ export default function App() {
    streaming,
    lastTokenCount,
    lastModel,
+    summarising,
  } = useChat({ activeSession, appendMessage, updateLastMessage, refreshSessions });

  function navigate(nextView) {
@@ -159,6 +162,7 @@ export default function App() {
          onBack={goBack}
          canGoBack={canGoBack}
          loadedModel={modelProps?.modelAlias ?? null}
+          summarising={summarising}
        />
      )}

@@ -205,6 +209,13 @@ export default function App() {
        />
      )}

+      {view === 'summaries' && (
+        <SummaryView
+          activeSession={activeSession}
+          onBack={goBack}
+        />
+      )}
+
      <InfoPanel
        isOpen={rightOpen}
        onToggle={() => setRightOpen(o => !o)}
@@ -214,6 +225,8 @@ export default function App() {
        onModelChange={setSelectedModel}
        lastModel={lastModel}
        lastTokenCount={lastTokenCount}
+        summarising={summarising}
+        onViewSummary={() => navigate('summaries')}
      />
    </div>
  );
--- a/packages/chat-client/src/api/orchestration.js
+++ b/packages/chat-client/src/api/orchestration.js
@@ -1,5 +1,6 @@
 import { API_DEFAULTS } from "../config/constants";

+
 const BASE_URL = import.meta.env.VITE_ORCHESTRATION_URL ?? '';

 // ── Sessions ────────────────────────────────────────────────
@@ -205,3 +206,21 @@ export async function getModelProps() {
  if (!res.ok) throw new Error('Failed to fetch model props');
  return res.json();
 }
+
+export async function fetchSessionSummaries(sessionId) {
+  const res = await fetch(`${BASE_URL}/summaries/session/${sessionId}`);
+  if (!res.ok) throw new Error(`Failed to fetch summaries: ${res.status}`);
+  return res.json();
+}
+
+export async function generateProjectSummary(projectId) {
+  const res = await fetch(`${BASE_URL}/summaries/project/${projectId}/generate`, { method: 'POST' });
+  if (!res.ok) throw new Error(`Failed to generate project summary: ${res.status}`);
+  return res.json();
+}
+
+export async function fetchProjectOverviewSummary(projectId) {
+  const res = await fetch(`${BASE_URL}/summaries/project/${projectId}/overview`);
+  if (!res.ok) throw new Error(`Failed to fetch project overview: ${res.status}`);
+  return res.json(); // null if none exists yet
+}
--- a/packages/chat-client/src/components/AllChatsView.jsx
+++ b/packages/chat-client/src/components/AllChatsView.jsx
@@ -2,6 +2,7 @@ import React, { useState, useEffect } from 'react';
 import { fetchSessions, deleteSession } from '../api/orchestration';
 import { CLIENT_DEFAULTS } from '../config/constants';

+
 const PAGE_SIZE = CLIENT_DEFAULTS.PAGE_SIZE;

 export default function AllChatsView({ onSelectSession, onBack, projects }) {
--- a/packages/chat-client/src/components/AllProjectsView.jsx
+++ b/packages/chat-client/src/components/AllProjectsView.jsx
@@ -2,6 +2,7 @@ import React, { useState, useEffect } from 'react';
 import ProjectModal from './ProjectModal';
 import { fetchProjects, createProject, updateProject, deleteProject } from '../api/orchestration';

+
 export default function AllProjectsView({ onProjectsChange, onBack, onSelectProject, onNavigate }) {
  const [projects, setProjects] = useState([]);
  const [loading, setLoading] = useState(true);
--- a/packages/chat-client/src/components/ChatWindow.jsx
+++ b/packages/chat-client/src/components/ChatWindow.jsx
@@ -12,6 +12,7 @@ export default function ChatWindow({
  onBack,
  canGoBack,
  loadedModel,
+  summarising,
 }) {
  const bottomRef = useRef(null);
  const inputRef = useRef(null);
@@ -86,6 +87,20 @@ export default function ChatWindow({
              No model loaded
            </span>
          )}
+          {summarising && (
+            <div style={{ display: 'flex', alignItems: 'center', gap: '6px' }}>
+              <div style={{
+                width: '10px', height: '10px', borderRadius: '50%',
+                border: '2px solid var(--accent)',
+                borderTopColor: 'transparent',
+                animation: 'spin 0.7s linear infinite',
+                flexShrink: 0,
+              }} />
+              <span style={{ fontSize: '11px', color: 'var(--text-muted)', whiteSpace: 'nowrap' }}>
+                Summarising…
+              </span>
+            </div>
+          )}
          <button className="btn-icon" onClick={onTogglePanel} title="Session info">⊹</button>
        </div>
      </div>
--- a/packages/chat-client/src/components/InfoPanel.jsx
+++ b/packages/chat-client/src/components/InfoPanel.jsx
@@ -1,6 +1,17 @@
 import React from 'react';

-export default function InfoPanel({ isOpen, onToggle, activeSession, lastModel, lastTokenCount, selectedModel, onModelChange, models }) {
+export default function InfoPanel({
+  isOpen,
+  onToggle,
+  activeSession,
+  lastModel,
+  lastTokenCount,
+  selectedModel,
+  onModelChange,
+  models,
+  summarising,
+  onViewSummary,
+}) {
  return (
    <div className="flex-col" style={{
  position: 'fixed',
@@ -74,6 +85,37 @@ export default function InfoPanel({ isOpen, onToggle, activeSession, lastModel,
            )}
          </Section>

+          {/* Session Memory button */}
+          {activeSession && !activeSession.isNew && (
+          <button
+            onClick={onViewSummary}
+            className="btn-reset text-sm"
+            style={{
+              marginTop: '8px', width: '100%', padding: '7px 10px',
+              borderRadius: 'var(--radius-md)',
+              background: 'var(--bg-elevated)',
+              border: '1px solid var(--border)',
+              color: 'var(--text-secondary)',
+              display: 'flex', alignItems: 'center', gap: '8px',
+            }}
+            onMouseEnter={e => e.currentTarget.style.borderColor = 'var(--accent-hover)'}
+            onMouseLeave={e => e.currentTarget.style.borderColor = 'var(--border)'}
+          >
+            <span>◈</span>
+            <span>Session Memory</span>
+            {summarising && (
+              <div style={{
+                marginLeft: 'auto',
+                width: '8px', height: '8px', borderRadius: '50%',
+                border: '2px solid var(--accent-hover)',
+                borderTopColor: 'transparent',
+                animation: 'spin 0.7s linear infinite',
+                flexShrink: 0,
+              }} />
+            )}
+          </button>
+        )}
+
        </div>
      )}
    </div>
--- a/packages/chat-client/src/components/ProjectView.jsx
+++ b/packages/chat-client/src/components/ProjectView.jsx
@@ -1,5 +1,5 @@
 import React, { useState, useEffect } from 'react';
-import { fetchSessions, updateProject, deleteProject } from '../api/orchestration';
+import { fetchSessions, updateProject, deleteProject, generateProjectSummary, fetchProjectOverviewSummary } from '../api/orchestration';
 import ProjectModal from './ProjectModal';

 export default function ProjectView({ project, onNavigate, onBack, onSelectSession, onNewProjectChat, onProjectsChange }) {
@@ -8,9 +8,27 @@ export default function ProjectView({ project, onNavigate, onBack, onSelectSessi
  const [input, setInput] = useState('');
  const [menuOpen, setMenuOpen] = useState(false);
  const [modal, setModal] = useState(null);
+  const [overview, setOverview] = useState(null);
+  const [overviewLoading, setOverviewLoading] = useState(true);
+  const [generating, setGenerating] = useState(false);
+  const [generateError, setGenerateError] = useState(null);

  useEffect(() => { load(); }, [project.id]);

+  useEffect(() => {
+    async function loadOverview() {
+      setOverviewLoading(true);
+      try {
+        setOverview(await fetchProjectOverviewSummary(project.id));
+      } catch (err) {
+        console.error('[ProjectView] Failed to load overview:', err.message);
+      } finally {
+        setOverviewLoading(false);
+      }
+    }
+    loadOverview();
+  }, [project.id]);
+
  async function load() {
    setLoading(true);
    try {
@@ -71,6 +89,23 @@ export default function ProjectView({ project, onNavigate, onBack, onSelectSessi
    return date.toLocaleDateString([], { month: 'short', day: 'numeric', year: 'numeric' });
  }
  
+  async function handleGenerateSummary() {
+    setGenerating(true);
+    setGenerateError(null);
+    try {
+      setOverview(await generateProjectSummary(project.id));
+    } catch (err) {
+      // 422 means no session summaries exist yet; surface a friendly message
+      setGenerateError(
+        err.message.includes('422')
+          ? 'No conversations found in this project yet.'
+          : 'Failed to generate summary. Please try again.'
+      );
+    } finally {
+      setGenerating(false);
+    }
+  }
+
  return (
    <div className="flex-col flex-1 overflow-hidden" style={{ background: 'var(--bg-base)' }}>

@@ -198,34 +233,61 @@ export default function ProjectView({ project, onNavigate, onBack, onSelectSessi

        {/* ── Project Memory ── */}
        <div style={{ marginBottom: '40px' }}>
-          <p className="label-upper" style={{ marginBottom: '12px' }}>Project Memory</p>
+            <div style={{ display: 'flex', alignItems: 'center', justifyContent: 'space-between', marginBottom: '12px' }}>
+                <p className="label-upper">Project Memory</p>
+                <button
+                    className="btn-primary"
+                    style={{ padding: '5px 12px', fontSize: '12px', display: 'flex', alignItems: 'center', gap: '6px' }}
+                    onClick={handleGenerateSummary}
+                    disabled={generating}
+                >
+                    {generating
+                        ? <><span className="spinner" />Generating…</>
+                        : overview ? 'Regenerate' : 'Generate Summary'
+                    }
+                </button>
+            </div>
+
            <div style={{
                background: 'var(--bg-surface)',
                border: '1px solid var(--border)',
                borderRadius: 'var(--radius-lg)',
                padding: '20px',
-            display: 'flex', flexDirection: 'column', gap: '10px',
            }}>
+                {overviewLoading ? (
+                    <p className="text-sm text-muted">Loading…</p>
+
+                ) : generateError ? (
+                    <p className="text-sm" style={{ color: 'var(--text-muted)', fontStyle: 'italic' }}>
+                        {generateError}
+                    </p>
+
+                ) : overview ? (
+                    <>
+                        <p className="text-sm" style={{ color: 'var(--text-secondary)', lineHeight: 1.7, whiteSpace: 'pre-wrap' }}>
+                            {overview.content}
+                        </p>
+                        <p className="text-xs text-muted" style={{ marginTop: '12px' }}>
+                            Last generated {formatTimestamp(overview.created_at)}
+                        </p>
+                    </>
+
+                ) : (
+                    // No overview exists yet — explain what this section is for
+                    <div style={{ display: 'flex', flexDirection: 'column', gap: '10px' }}>
                        <div style={{ display: 'flex', alignItems: 'center', gap: '10px' }}>
                            <span style={{ fontSize: '20px', opacity: 0.4 }}>◈</span>
                            <span className="text-sm" style={{ fontWeight: 500, color: 'var(--text-primary)' }}>
-                Project Summary
+                                No project summary yet
                            </span>
-              <span style={{
-                fontSize: '11px', padding: '2px 8px',
-                borderRadius: '999px',
-                background: 'var(--bg-elevated)',
-                border: '1px solid var(--border)',
-                color: 'var(--text-muted)',
-              }}>Coming soon</span>
                        </div>
                        <p className="text-sm text-muted" style={{ lineHeight: 1.6, maxWidth: '520px' }}>
-              Once this project has enough conversations, NexusAI will automatically
-              generate a rolling summary of key themes, decisions, and context — giving
-              the model a condensed view of the project's memory without consuming the
-              full context window.
+                            Generate a summary to create a concise overview of this project's goals,
+                            progress, and key decisions — built from your session summaries.
                        </p>
                    </div>
+                )}
+            </div>
        </div>

        {/* ── Notes ── */}
--- a/packages/chat-client/src/components/SettingsView.jsx
+++ b/packages/chat-client/src/components/SettingsView.jsx
@@ -3,6 +3,7 @@ import { useSettings } from '../hooks/useSettings';
 import { useModels } from '../hooks/useModels';
 import { getServiceHealth } from '../api/orchestration';

+
 export default function SettingsView({ onNavigate, onBack, modelProps }) {
  const { settings, saveSetting, saving } = useSettings();

--- a/packages/chat-client/src/components/SummaryView.jsx
+++ b/packages/chat-client/src/components/SummaryView.jsx
@@ -0,0 +1,124 @@
+import React, { useState, useEffect } from 'react';
+import { fetchSessionSummaries } from '../api/orchestration';
+import ReactMarkdown from 'react-markdown';
+
+export default function SummaryView({ activeSession, onBack }) {
+  const [summaries, setSummaries] = useState([]);
+  const [loading, setLoading]     = useState(true);
+  const [error, setError]         = useState(null);
+  const [expanded, setExpanded]   = useState(null);
+
+  useEffect(() => {
+    if (!activeSession || activeSession.isNew) {
+      setLoading(false);
+      return;
+    }
+    setLoading(true);
+    setError(null);
+    fetchSessionSummaries(activeSession.external_id)
+      .then(data => setSummaries(Array.isArray(data) ? data : []))
+      .catch(err => setError(err.message))
+      .finally(() => setLoading(false));
+  }, [activeSession]);
+
+  function formatTimestamp(ts) {
+    if (!ts) return '—';
+    return new Date(ts * 1000).toLocaleString([], {
+      month: 'short', day: 'numeric',
+      hour: '2-digit', minute: '2-digit',
+    });
+  }
+
+  return (
+    <div style={{ display: 'flex', flexDirection: 'column', flex: 1, overflow: 'hidden', background: 'var(--bg-base)' }}>
+
+      {/* Header */}
+      <div className="panel-header" style={{ padding: '0 24px', gap: 12 }}>
+        <button className="btn-icon" onClick={onBack}>←</button>
+        <span className="text-base" style={{ fontWeight: 500 }}>Session Memory</span>
+        <span className="text-sm text-muted" style={{ marginLeft: 'auto' }}>
+          {summaries.length} summar{summaries.length !== 1 ? 'ies' : 'y'}
+        </span>
+      </div>
+
+      {/* Session name pill */}
+      {activeSession && (
+        <div style={{ padding: '8px 24px 0' }}>
+          <span className="text-xs text-muted" style={{
+            background: 'var(--bg-elevated)',
+            border: '1px solid var(--border)',
+            borderRadius: '999px',
+            padding: '3px 10px',
+          }}>
+            {activeSession.name || activeSession.external_id}
+          </span>
+        </div>
+      )}
+
+      {/* Content */}
+      <div className="scroll-y flex-1" style={{ padding: '16px 24px' }}>
+        {loading && <p className="text-sm text-muted">Loading…</p>}
+        {error   && <p className="text-sm" style={{ color: 'var(--error, #e05)' }}>{error}</p>}
+
+        {!loading && !activeSession && (
+          <p className="text-sm text-muted">No active session.</p>
+        )}
+
+        {!loading && activeSession && summaries.length === 0 && (
+          <div style={{
+            display: 'flex', flexDirection: 'column', alignItems: 'center',
+            gap: '12px', padding: '48px 0', color: 'var(--text-muted)',
+          }}>
+            <span style={{ fontSize: '28px', opacity: 0.3 }}>◈</span>
+            <p className="text-sm">No summaries yet for this session.</p>
+            <p className="text-xs text-muted" style={{ maxWidth: '280px', textAlign: 'center', lineHeight: 1.6 }}>
+              Summaries generate automatically once a session accumulates enough conversation.
+            </p>
+          </div>
+        )}
+
+        {summaries.map(summary => (
+          <div key={summary.id} style={{
+            background: 'var(--bg-surface)',
+            border: '1px solid var(--border)',
+            borderRadius: 'var(--radius-lg)',
+            marginBottom: '10px', overflow: 'hidden',
+          }}>
+            {/* Card header */}
+            <div
+              onClick={() => setExpanded(expanded === summary.id ? null : summary.id)}
+              style={{ display: 'flex', alignItems: 'center', gap: '10px', padding: '10px 14px', cursor: 'pointer' }}
+            >
+              <span style={{ flex: 1, fontSize: 13, color: 'var(--text-primary)' }}>
+                Episodes {summary.episode_range}
+              </span>
+              <span className="text-xs text-muted">{formatTimestamp(summary.created_at)}</span>
+              <span className="text-muted" style={{ fontSize: 11 }}>
+                {expanded === summary.id ? '▲' : '▼'}
+              </span>
+            </div>
+
+            {/* Expanded content */}
+            {expanded === summary.id && (
+              <div style={{ padding: '0 14px 14px', borderTop: '1px solid var(--border)' }}>
+                <ReactMarkdown components={{
+                  p: ({ children }) => (
+                    <p style={{ margin: '8px 0', lineHeight: 1.7, fontSize: 13, color: 'var(--text-secondary)' }}>
+                      {children}
+                    </p>
+                  ),
+                }}>
+                  {summary.content}
+                </ReactMarkdown>
+                {summary.token_count > 0 && (
+                  <p className="text-xs text-muted" style={{ marginTop: 8 }}>
+                    {summary.token_count.toLocaleString()} tokens covered
+                  </p>
+                )}
+              </div>
+            )}
+          </div>
+        ))}
+      </div>
+    </div>
+  );
+}
--- a/packages/chat-client/src/hooks/useChat.js
+++ b/packages/chat-client/src/hooks/useChat.js
@@ -1,4 +1,4 @@
-import { useState, useCallback, useRef } from 'react';
+import React, { useEffect, useState, useCallback, useRef } from 'react';
 import { streamMessage, updateSession } from '../api/orchestration';

 export function useChat({ activeSession, appendMessage, updateLastMessage, refreshSessions }) {
@@ -7,6 +7,18 @@ export function useChat({ activeSession, appendMessage, updateLastMessage, refre
  const [lastTokenCount, setLastTokenCount] = useState(0);
  const [lastModel, setLastModel] = useState(null);
  const cancelRef = useRef(null);
+  const prevStreaming = useRef(false);
+  const [summarising, setSummarising] = useState(false);
+
+  useEffect(() => {
+    if (prevStreaming.current && !streaming) {
+      // Stream just finished — trigger the summarising indicator
+      setSummarising(true);
+      const t = setTimeout(() => setSummarising(false), 8000);
+      return () => clearTimeout(t);
+    }
+    prevStreaming.current = streaming;
+  }, [streaming]);

  const sendMessage = useCallback(async (text, model, projectId = null, session=null) => {
    const targetSession = session ?? activeSession;
@@ -96,5 +108,6 @@ export function useChat({ activeSession, appendMessage, updateLastMessage, refre
    error,
    lastTokenCount,
    lastModel,
+    summarising,
  };
 }
--- a/packages/chat-client/src/hooks/useProjects.js
+++ b/packages/chat-client/src/hooks/useProjects.js
@@ -1,6 +1,7 @@
 import { useState, useEffect, useCallback } from 'react';
 import { fetchProjects } from '../api/orchestration';

+
 export function useProjects() {
  const [projects, setProjects] = useState([]);

--- a/packages/chat-client/src/index.css
+++ b/packages/chat-client/src/index.css
@@ -35,6 +35,10 @@ html, body, #root {
  50%       { opacity: 0; }
 }

+@keyframes spin {
+  to { transform: rotate(360deg); }
+}
+
 /* ── Layout ─────────────────────────────────────────── */

 .flex        { display: flex; }
@@ -111,3 +115,13 @@ html, body, #root {
 .text-accent  { color: var(--accent); }
 .label-upper  { font-size: 13px; font-weight: 750; color: var(--text-muted); text-transform: uppercase; letter-spacing: 0.08em; }
 .truncate     { overflow: hidden; text-overflow: ellipsis; white-space: nowrap; }
+
+.spinner {
+  width: 12px;
+  height: 12px;
+  border: 2px solid var(--border);
+  border-top-color: var(--text-muted);
+  border-radius: 50%;
+  animation: spin 0.7s linear infinite;
+  flex-shrink: 0;
+}
--- a/packages/chat-client/vite.config.js
+++ b/packages/chat-client/vite.config.js
@@ -16,6 +16,7 @@ export default defineConfig({
      '/episodes':  'http://192.168.0.205:4000',
      '/settings':  'http://192.168.0.205:4000',
      '/health':    'http://192.168.0.205:4000',
+      '/summaries': 'http://192.168.0.205:4000',
    },
  },
 });
--- a/packages/embedding-service/CLAUDE.md
+++ b/packages/embedding-service/CLAUDE.md
@@ -0,0 +1,64 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and deployment layout.
+
+## Running This Service
+
+```bash
+npm run embedding                          # From repo root
+npm -w packages/embedding-service run dev  # With --watch
+```
+
+Default port: **3003**. Requires Ollama to be reachable at `OLLAMA_URL`.
+
+## Single-File Service
+
+The entire service is `src/index.js` — no subdirectory structure. All routes, the Ollama helper, and startup are in one file.
+
+## Environment Variables
+
+| Variable | Default | Description |
+|---|---|---|
+| `PORT` | `3003` | Port to listen on |
+| `OLLAMA_URL` | `http://localhost:11434` | Ollama instance URL |
+| `EMBEDDING_MODEL` | `nomic-embed-text` | Model passed to Ollama `/api/embed` |
+
+Note: the env var is `EMBEDDING_MODEL`, not `EMBED_MODEL`. The internal constant is named `EMBED_MODEL`, but it reads from the `EMBEDDING_MODEL` key.
+
+## Ollama API Details
+
+Uses Ollama's `/api/embed` endpoint (not `/api/embeddings`). Request shape:
+
+```json
+{ "model": "nomic-embed-text", "input": "text to embed" }
+```
+
+Ollama returns `{ "embeddings": [[...]] }` — an array of arrays even for a single input. The helper takes `data.embeddings[0]` to return the single vector.
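A minimal sketch of that helper, assuming Node 18+ for the global `fetch`. `embedText` and its options object are hypothetical names; the request and response shapes are the ones described above.

```javascript
// Hypothetical single-text helper for Ollama's /api/embed endpoint.
// fetchImpl defaults to the global fetch (Node 18+) and can be stubbed in tests.
async function embedText(text, { ollamaUrl = 'http://localhost:11434', model = 'nomic-embed-text' } = {}, fetchImpl = fetch) {
  const res = await fetchImpl(`${ollamaUrl}/api/embed`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, input: text }),
  });
  if (!res.ok) throw new Error(`Ollama embed failed: ${res.status}`);
  const data = await res.json();
  return data.embeddings[0]; // one input still comes back as an array of vectors
}

// Against a running Ollama with nomic-embed-text pulled:
// embedText('hello world').then(vec => console.log(vec.length)); // 768
```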
+
+The `ollama` npm package is not a dependency; all calls are raw `fetch`. Do not refactor to use the package without checking that its API shape matches.
+
+## Batch Endpoint
+
+`POST /embed/batch` first rejects any empty or non-string item with a 400 that names its index, then embeds the items **sequentially** in a for-loop, not in parallel. Ollama doesn't parallelise embedding calls, so parallel requests would queue internally anyway. Do not change to `Promise.all` without verifying Ollama behaviour.
+
+## Error Responses
+
+| Condition | Status | Notes |
+|---|---|---|
+| Missing/empty `text` | 400 | |
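+| Ollama call fails or exceeds the 30 s timeout | 502 | Upstream failure (correct status) |
+| Empty `texts` array, or any empty/non-string item | 400 | Item errors name the index, e.g. `texts[2]` |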
+
+## Validation Messages
+
+The 400 error for `/embed` reads `"text is required and must not be empty"`; `/embed/batch` reports the first bad item as `"texts[N] is empty or not a string"`.
+
+## API Endpoints
+
+| Method | Path | Notes |
+|---|---|---|
+| GET | `/health` | Static response — does not verify Ollama is reachable |
+| POST | `/embed` | Body: `{ text: string }`. Returns `{ embedding, model, dimensions }` |
+| POST | `/embed/batch` | Body: `{ texts: string[] }`. Returns `{ embeddings, model, dimensions, count }` |
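The `/api/embed` request shape, the `data.embeddings[0]` unwrapping, and the validate-then-sequential batch loop described in `packages/embedding-service/CLAUDE.md` can be sketched as below. This is a minimal illustration assuming Node 18+ (global `fetch` and `AbortSignal.timeout`); the `fetchImpl` parameter is a hypothetical injection point added so the logic can run without a live Ollama, and `embedBatch` returns bare vectors rather than the route's `{ embeddings, model, dimensions, count }` envelope.

```javascript
// Sketch of the embedding-service helpers, assuming Node 18+ (global fetch, AbortSignal.timeout).
// `fetchImpl` is a hypothetical hook for testing; the real service calls fetch directly.
const OLLAMA_URL = 'http://localhost:11434';
const EMBED_MODEL = 'nomic-embed-text';

async function embedText(text, fetchImpl = fetch) {
    const res = await fetchImpl(`${OLLAMA_URL}/api/embed`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ model: EMBED_MODEL, input: text }),
        signal: AbortSignal.timeout(30_000), // give up if Ollama hangs for 30 s
    });
    if (!res.ok) throw new Error(`Ollama responded ${res.status}`);
    const data = await res.json();
    // /api/embed returns an array of vectors even for a single input
    return data.embeddings[0];
}

// Validate every item first, then embed one at a time (Ollama queues parallel calls anyway).
async function embedBatch(texts, fetchImpl = fetch) {
    if (!Array.isArray(texts) || texts.length === 0) {
        throw new Error('texts must be a non-empty array');
    }
    const invalid = texts.findIndex(t => !t || typeof t !== 'string' || t.trim() === '');
    if (invalid !== -1) throw new Error(`texts[${invalid}] is empty or not a string`);

    const embeddings = [];
    for (const text of texts) {
        embeddings.push(await embedText(text.trim(), fetchImpl));
    }
    return embeddings;
}
```

Validating up front means a bad item at the end of a large batch fails fast instead of after every earlier item has already been embedded.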
--- a/packages/embedding-service/package.json
+++ b/packages/embedding-service/package.json
@@ -9,7 +9,6 @@
  "dependencies": {
    "@nexusai/shared": "^1.0.0",
    "dotenv": "^17.4.0",
-    "express": "^5.2.1",
-    "ollama": "^0.6.3"
+    "express": "^5.2.1"
  }
 }
--- a/packages/embedding-service/src/index.js
+++ b/packages/embedding-service/src/index.js
@@ -1,20 +1,21 @@
 require ('dotenv').config();
 const express = require('express');
-const {getEnv, OLLAMA, PORTS} = require('@nexusai/shared');
+const {getEnv, OLLAMA, PORTS, logger} = require('@nexusai/shared');

 const app = express();
-app.use(express.json());
+app.use(express.json({ limit: '1mb' }));    // limit request body to 1mb to prevent abuse - embedding requests should be small

-const PORT          = getEnv('PORT',            PORTS.EMBEDDING);  // Default to 3003 if PORT is not set
-const OLLAMA_URL    = getEnv('OLLAMA_URL',      OLLAMA.DEFAULT_URL); // URL for Ollama API
-const EMBED_MODEL   = getEnv('EMBEDDING_MODEL', OLLAMA.EMBED_MODEL); // Ollama model for embeddings
+const PORT          = getEnv('PORT',            PORTS.EMBEDDING);  
+const OLLAMA_URL    = getEnv('OLLAMA_URL',      OLLAMA.DEFAULT_URL); 
+const EMBED_MODEL   = getEnv('EMBEDDING_MODEL', OLLAMA.EMBED_MODEL); 

 //OLLAMA embedding helper function
 async function embedText(text) {
    const res = await fetch(`${OLLAMA_URL}/api/embed`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
-        body: JSON.stringify({ model: EMBED_MODEL, input: text })
+        body: JSON.stringify({ model: EMBED_MODEL, input: text }),
+        signal: AbortSignal.timeout(30_000),
    });

    if (!res.ok) {
@@ -37,7 +38,7 @@ app.get('/health', (req,res) => {
 app.post('/embed', async (req, res) => {
    const { text } = req.body;
    if (!text || typeof text !== 'string' || text.trim() === '') {
-        return res.status(400).json({ error: 'text is required and must be empty' });
+        return res.status(400).json({ error: 'text is required and must not be empty' });
    }

    try {
@@ -60,7 +61,10 @@ app.post('/embed/batch', async (req, res) => {
    }

    try {
-        //sequential embedding for now, Ollama doesn't natively parallize embeddings
+        const invalid = texts.findIndex(t => !t || typeof t !== 'string' || t.trim() === '');
+        if (invalid !== -1)
+            return res.status(400).json({ error: `texts[${invalid}] is empty or not a string` });
+
        const embeddings = [];
        for (const text of texts) {
            embeddings.push(await embedText(text.trim()));
@@ -78,5 +82,5 @@ app.post('/embed/batch', async (req, res) => {

 /******* Start Server ********/
 app.listen(PORT, () => {
-    console.log(`Embedding Service listening on port ${PORT}`);
+    logger.info(`Embedding Service listening on port ${PORT}`);
 });
--- a/packages/inference-service/CLAUDE.md
+++ b/packages/inference-service/CLAUDE.md
@@ -0,0 +1,75 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and deployment layout.
+
+## Running This Service
+
+```bash
+npm run inference                          # From repo root
+npm -w packages/inference-service run dev  # With --watch
+```
+
+Default port: **3001**. Set `INFERENCE_PROVIDER` to select the backend.
+
+## Provider Pattern
+
+`src/infer.js` reads `INFERENCE_PROVIDER` at startup and loads one of two providers:
+
+| `INFERENCE_PROVIDER` | Module | Backend |
+|---|---|---|
+| `ollama` (default) | `src/providers/ollama.js` | Ollama npm client → `/api/generate` |
+| `llamacpp` | `src/providers/llamacpp.js` | Raw fetch → `/v1/chat/completions` (OpenAI-compatible) |
+
+An unknown provider throws immediately at startup — fail-fast, not at request time.
+
+Both providers export the same interface: `complete(prompt, options)` and `completeStream(prompt, options)`.
+
+## Environment Variables
+
+| Variable | Default | Description |
+|---|---|---|
+| `PORT` | `3001` | Port to listen on |
+| `INFERENCE_PROVIDER` | `ollama` | `ollama` or `llamacpp` |
+| `INFERENCE_URL` | `http://localhost:11434` (Ollama) / `http://localhost:8080` (llama.cpp) | Backend URL |
+| `DEFAULT_MODEL` | Provider-specific | Model name passed to backend |
+
+`INFERENCE_URL` defaults differ per provider — Ollama uses the Ollama default URL, llama.cpp uses the llama-server default.
+
+## Options Resolution
+
+Both providers use `resolveOptions(options)` to merge caller-supplied options with `INFERENCE_DEFAULTS` from shared constants. Any option not supplied by the caller falls back to the constant.
+
+## Streaming Chunk Format
+
+The two providers yield differently shaped chunks — the route in `src/routes/inference.js` normalises them:
+
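+**Ollama** passes raw generate chunks through (`{ response, done, model, ... }`) until the final one, which it rewrites to `{ response: '', done: true, model, tokenCount }` with `tokenCount = eval_count + prompt_eval_count`.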
+
+**llama.cpp** yields:
+- Per-token: `{ response: delta, done: false }`
+- Final: `{ response: '', done: true, model, tokenCount }` — token count is the sum of `completion_tokens + prompt_tokens` from the usage chunk
+
+The route checks `chunk.response` to stream text and `chunk.done` to capture metadata. Because both providers put the total in `chunk.tokenCount` on the done chunk, streaming calls report a real token count to the client regardless of backend.
+
+## `maxTokens` in the Streaming Route
+
+Both `POST /complete` and `POST /complete/stream` destructure `maxTokens` from the request body and pass it to the provider, so `/chat` and `/chat/stream` share the same effective token ceiling. When the caller omits it, `resolveOptions` falls back to `INFERENCE_DEFAULTS.MAX_TOKENS`.
+
+## SSE Format (route → caller)
+
+```
+data: {"response":"Hello"}        ← per token
+data: {"response":" world"}
+data: {"done":true,"model":"...","tokenCount":42}  ← final metadata
+data: [DONE]                       ← sentinel
+```
+
+## API Endpoints
+
+| Method | Path | Notes |
+|---|---|---|
+| GET | `/health` | Returns `{ service, status, provider, model }` |
+| POST | `/complete` | Body: `{ prompt, model?, temperature?, maxTokens?, topP?, topK?, repeatPenalty? }` |
+| POST | `/complete/stream` | Same body as `/complete`; responds with the SSE format above |
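The done-chunk normalisation and SSE framing described in `packages/inference-service/CLAUDE.md` can be illustrated with two small helpers. This is a sketch assuming the chunk shapes listed in that file; `normalizeOllamaStream` and `toSse` are hypothetical names, not exports of the service, and `toSse` buffers lines into an array where the real route writes them to `res` as they arrive.

```javascript
// Sketch: rewrite Ollama's final generate chunk so it carries `tokenCount`,
// matching the llama.cpp provider's done chunk. Names here are illustrative.
async function* normalizeOllamaStream(stream) {
    for await (const chunk of stream) {
        if (chunk.done) {
            yield {
                response: '',
                done: true,
                model: chunk.model,
                tokenCount: (chunk.eval_count ?? 0) + (chunk.prompt_eval_count ?? 0),
            };
        } else {
            yield chunk;
        }
    }
}

// Turn normalised chunks into the SSE lines the route sends to its caller.
async function toSse(chunks) {
    const lines = [];
    let model;
    let tokenCount = 0;
    for await (const chunk of chunks) {
        if (chunk.response) lines.push(`data: ${JSON.stringify({ response: chunk.response })}\n\n`);
        if (chunk.done) {
            // capture final metadata from the done signal
            model = chunk.model ?? model;
            tokenCount = chunk.tokenCount ?? tokenCount;
        }
    }
    lines.push(`data: ${JSON.stringify({ done: true, model, tokenCount })}\n\n`);
    lines.push('data: [DONE]\n\n');
    return lines;
}
```

Normalising inside the provider keeps the route backend-agnostic: it only ever reads `chunk.tokenCount`.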
--- a/packages/inference-service/src/index.js
+++ b/packages/inference-service/src/index.js
@@ -1,10 +1,10 @@
 require ('dotenv').config();
 const express = require('express');
-const {getEnv, PORTS, OLLAMA} = require('@nexusai/shared');
+const {getEnv, PORTS, OLLAMA, logger} = require('@nexusai/shared');
 const inferenceRouter = require('./routes/inference');

 const app = express();
-app.use(express.json());
+app.use(express.json({ limit: '8mb' }));  // prompts include full context window

 const PORT      = getEnv('PORT', PORTS.INFERENCE);
 const PROVIDER  = getEnv('INFERENCE_PROVIDER',   'ollama');
@@ -24,5 +24,5 @@ app.use('/', inferenceRouter);

 // Start the server
 app.listen(PORT, () => {
-    console.log(`Inference Service is running on port ${PORT}`);
+    logger.info(`Inference Service is running on port ${PORT}`);
 });
--- a/packages/inference-service/src/providers/llamacpp.js
+++ b/packages/inference-service/src/providers/llamacpp.js
@@ -1,4 +1,4 @@
-const { getEnv, LLAMACPP, INFERENCE_DEFAULTS } = require("@nexusai/shared");
+const { getEnv, LLAMACPP, INFERENCE_DEFAULTS, logger } = require("@nexusai/shared");

 const BASE_URL = getEnv("INFERENCE_URL", LLAMACPP.DEFAULT_URL);
 const DEFAULT_MODEL = getEnv("DEFAULT_MODEL", LLAMACPP.DEFAULT_MODEL);
@@ -89,7 +89,7 @@ async function* completeStream(prompt, options = {}) {
    }
  }

-  console.log('[llamacpp] finalTokenCount:', finalTokenCount);
+  logger.info('[llamacpp] finalTokenCount:', finalTokenCount);

  yield { response: '', done: true, model: finalModel, tokenCount: finalTokenCount };
 }
--- a/packages/inference-service/src/providers/ollama.js
+++ b/packages/inference-service/src/providers/ollama.js
@@ -57,8 +57,17 @@ async function* completeStream(prompt, options = {} ) {
    });

    for await (const chunk of stream) {
+        if (chunk.done) {
+            yield {
+                response:   '',
+                done:       true,
+                model:      chunk.model,
+                tokenCount: (chunk.eval_count ?? 0) + (chunk.prompt_eval_count ?? 0),
+            };
+        } else {
            yield chunk;
        }
+    }
 }

 module.exports = { complete, completeStream };
--- a/packages/inference-service/src/routes/inference.js
+++ b/packages/inference-service/src/routes/inference.js
@@ -1,5 +1,6 @@
 const { Router } = require('express');
 const { complete, completeStream } = require('../infer');
+const { logger } = require('@nexusai/shared');

 const router = Router();

@@ -15,14 +16,14 @@ router.post('/complete', async (req, res) => {
        const result = await complete (prompt, {model, temperature, maxTokens, topP, topK, repeatPenalty});
        res.json(result);
    } catch (error) {
-        console.error('[Inference] Completion error:', error.message);
-        res.status(500).json({ error: error.message });
+        logger.error('[Inference] Completion error:', error.message);
+        res.status(500).json({ error: 'Inference failed', detail: error.message });
    }
 });

 // Streaming completion endpoint - sends partial responses as they arrive
 router.post('/complete/stream', async (req, res) => {
-    const { prompt, model, temperature, topP, topK, repeatPenalty } = req.body;
+    const { prompt, model, temperature, maxTokens, topP, topK, repeatPenalty } = req.body;

    if (!prompt) return res.status(400).json({ error: 'prompt is required' });

@@ -34,7 +35,7 @@ router.post('/complete/stream', async (req, res) => {
        let lastModel = model;
        let tokenCount = 0;

-        for await (const chunk of completeStream(prompt, { model, temperature, topP, topK, repeatPenalty })) {
+        for await (const chunk of completeStream(prompt, { model, temperature, maxTokens, topP, topK, repeatPenalty })) {
            if (chunk.response) {
                res.write(`data: ${JSON.stringify({ response: chunk.response })}\n\n`);
            }
@@ -42,7 +43,7 @@ router.post('/complete/stream', async (req, res) => {
                // capture final metadata from the done signal
                lastModel  = chunk.model      ?? lastModel;
                tokenCount = chunk.tokenCount ?? tokenCount;
-                console.log('[inference router] tokenCount from chunk:', chunk.tokenCount, '→', tokenCount);
+                logger.info('[inference router] tokenCount from chunk:', chunk.tokenCount, '→', tokenCount);
            }
        }

@@ -51,7 +52,7 @@ router.post('/complete/stream', async (req, res) => {
        res.write('data: [DONE]\n\n');

    } catch (err) {
-        console.error('[Inference] Streaming error:', err.message);
+        logger.error('[Inference] Streaming error:', err.message);
        res.write(`data: ${JSON.stringify({ error: err.message })}\n\n`);
    } finally {
        res.end();
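On the caller side, the stream produced by this route (per-token `response` events, one metadata event, a `[DONE]` sentinel, or an `error` event from the catch block) can be consumed with a small parser. A sketch assuming the whole body is already buffered and each event carries a single `data:` line; `parseInferenceSse` is a hypothetical helper, and a real client would parse incrementally as network chunks arrive.

```javascript
// Sketch: parse a buffered SSE body from POST /complete/stream into text + metadata.
// Assumes events are separated by a blank line and each has exactly one `data:` line.
function parseInferenceSse(body) {
    let text = '';
    let meta = null;
    for (const event of body.split('\n\n')) {
        const line = event.trim();
        if (!line.startsWith('data: ')) continue;
        const payload = line.slice('data: '.length);
        if (payload === '[DONE]') break;               // sentinel: stream finished
        const msg = JSON.parse(payload);
        if (msg.error) throw new Error(`inference stream error: ${msg.error}`);
        if (msg.done) meta = { model: msg.model, tokenCount: msg.tokenCount };
        else if (msg.response) text += msg.response;
    }
    return { text, meta };
}
```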
--- a/packages/memory-service/CLAUDE.md
+++ b/packages/memory-service/CLAUDE.md
@@ -0,0 +1,114 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and the dual-store memory model.
+
+## Running This Service
+
+```bash
+npm run memory             # From repo root (node src/index.js)
+npm -w packages/memory-service run dev   # With --watch
+```
+
+Default port: **3002**. Requires Qdrant and the embedding-service to be reachable on startup.
+
+## SQLite Schema
+
+`src/db/schema.js` is the source of truth for the data model. Key schema facts:
+
+- `sessions` and `episodes` are linked by FK with cascade delete — deleting a session removes all its episodes automatically.
+- `episodes_fts` is an FTS5 virtual table that mirrors `user_message` and `ai_response`. SQL triggers on INSERT/UPDATE/DELETE keep it in sync, and on service startup every live episode is re-inserted with `INSERT OR REPLACE` to resync the index.
+- Several columns (`sessions.name`, `sessions.project_id`, `entities.mention_count`, etc.) were added as migrations using `ALTER TABLE`, each wrapped in its own try-catch. Failures are silently swallowed: if a column already exists, the alter fails and the service continues.
+- `summaries` rows with `session_id IS NULL` and a `project_id` represent project-level overviews, not session summaries. This distinction is how `GET /projects/:id/overview` works.
+- `entity_episodes` is a join table linking entities to the episodes where they were first extracted. Used for provenance tracking and future orphan cleanup. Defined in `schema.js` (not a migration), so it exists on all installs.
+
+**New columns on `entities` (added via migration):**
+- `mention_count INTEGER DEFAULT 1` — incremented every time this entity is re-extracted
+- `confidence REAL DEFAULT 1.0` — reserved for future confidence scoring
+- `source TEXT DEFAULT 'extraction'` — `'extraction'` or `'manual'`
+- `last_seen_at INTEGER` — Unix timestamp of most recent extraction hit
+
+**New columns on `relationships` (added via migration):**
+- `mention_count INTEGER DEFAULT 1` — incremented every time this edge is re-extracted
+- `notes TEXT` — relationship context sentence from extraction
+
+## Async Pipeline: Episode Creation
+
+`POST /episodes` returns a 201 as soon as the SQLite insert succeeds. Two background tasks run after without blocking the response:
+
+1. **Embedding** — Fetches a vector from embedding-service, stores to Qdrant with `{sessionId, createdAt}` as payload metadata.
+2. **Entity + relationship extraction** — Sends the episode text to Ollama (`qwen2.5:3b`, temp 0.1, 1500 tokens) and upserts any recognized entities and relationships to both SQLite and Qdrant. Also links each entity to the episode via `entity_episodes`.
+
+Both tasks catch and log their own errors, so a failure never affects the 201 already sent. An episode can therefore exist in SQLite with no Qdrant point (if embedding fails) or with no linked entities (if extraction fails).
+
+## Entity Extraction Details
+
+`src/entities/extraction.js`:
+
+- Fetches the last 20 known entities from SQLite before prompting the model, so the prompt can ask for name/type consistency with existing entries.
+- Valid entity types come from `ENTITIES.TYPES` in `@nexusai/shared`; anything else is discarded. The prompt tells the model to use `character` for fictional or media characters and `person` only for real people.
+- Ignores a hardcoded list of low-value names (`hello`, `thanks`, `good morning`, etc.).
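+- Extracts JSON with a greedy regex (`/\{[\s\S]*\}/`, first `{` to last `}`) applied to the raw model output, so surrounding prose doesn't break parsing. If nothing matches or `JSON.parse` fails, it logs a warning and returns without storing anything.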
+- The model is asked to return both entities and relationships in a single JSON response: `{ "entities": [...], "relationships": [...] }`.
+- Entity upsert uses `ON CONFLICT(name, type) DO UPDATE` — preserves existing `notes` if the new extraction returns null, increments `mention_count`, updates `last_seen_at`.
+- Relationship upsert uses `ON CONFLICT(from_id, to_id, label) DO UPDATE` — increments `mention_count`, preserves existing `notes` if new is null.
+- Relationships are resolved by looking up both endpoints in the `entityMap` built during entity processing — if either entity wasn't saved (filtered out or invalid type), the relationship is silently dropped.
+- After upsert, embeds each entity as `"${name} (${type}): ${notes}"` and stores to Qdrant with `projectId` in the payload for project-scoped filtering.
+
+> For full details see `docs/services/entity-extraction.md` and `docs/services/knowledge-graph.md`.
+
+## Knowledge Graph
+
+`src/graph/index.js` provides two SQLite traversal functions:
+
+- **`getNeighborhood(entityId, depth)`**: single-entity recursive CTE traversal. Bidirectional (follows edges in both directions). Returns `{ nodes: [...entities], edges: [...relationships] }`. Depth defaults to `ENTITIES.GRAPH_HOP_DEPTH` (1) and is capped at 3 by the HTTP layer.
+
+- **`getEntityNeighbors(entityIds[])`**: bulk 1-hop version for orchestration. Given a set of seed entity IDs, returns their immediate neighbors plus all edges within the combined node set.
+
+The recursive CTE uses `UNION` (not `UNION ALL`) to eliminate cycles and duplicate visits automatically.
+
+> For full design rationale and usage see `docs/services/knowledge-graph.md`.
+
+## Summarization Strategy
+
+`src/summarization/project.js`:
+
+- Preferred path: generate a project overview from existing **session-level summaries** (higher-level abstraction, shorter input).
+- Fallback path: if no session summaries exist, summarize raw episodes directly (up to `SUMMARIES.MAX_PROJECT_EPISODE_LIMIT`).
+- Both paths truncate input at `SUMMARIES.MAX_SUMMARY_CHARS` (8,000 chars) by slicing from the end (most recent content wins).
+- Strips ChatML tokens from the Ollama response (`<|im_start|>`, `<|im_end|>`).
+- Uses temp 0.2 and `num_predict 1200`.
+
+## Qdrant Client
+
+`src/semantic/index.js` creates the Qdrant client lazily on first use and reuses it. All three collections (`episodes`, `entities`, `summaries`) are created at startup if missing. There is no connection health check — if Qdrant is unreachable, semantic operations throw at call time.
+
+## API Endpoints Quick Reference
+
+| Method | Path | Notes |
+|---|---|---|
+| GET | `/health` | Static response, no dependency checks |
+| GET/POST | `/sessions` | POST requires `externalId`; duplicate → 409 |
+| GET/PATCH | `/sessions/by-external/:externalId` | PATCH accepts `name`, `projectId` |
+| DELETE | `/sessions/by-external/:externalId` | Cascades to episodes, summaries, relationships |
+| GET/POST | `/episodes` | POST triggers async embedding + entity/relationship extraction |
+| GET | `/episodes/search` | FTS5 search; route must precede `/:id` |
+| GET | `/sessions/:id/episodes` | Paginated, ordered `created_at DESC` |
+| DELETE | `/episodes/:id` | Removes from SQLite + async Qdrant delete |
+| POST | `/entities` | Upsert by `(name, type)`; increments `mention_count` on conflict |
+| GET | `/entities/by-type/:type` | All entities of given type |
+| GET/DELETE | `/entities/:id` | |
+| POST | `/relationships` | Upsert by `(fromId, toId, label)`; increments `mention_count` on conflict. Body: `fromId`, `toId`, `label`, `notes` (optional) |
+| GET | `/entities/:id/relationships` | Outbound only |
+| DELETE | `/relationships` | Body: `fromId`, `toId`, `label` |
+| GET | `/graph/neighborhood/:entityId` | Single-entity neighborhood; `?depth=` (default 1, max 3) |
+| POST | `/graph/neighbors` | Bulk 1-hop neighborhood; body: `{ entityIds: [...] }` |
+| GET/POST | `/projects` | POST requires non-empty `name` |
+| GET/PATCH/DELETE | `/projects/:id` | |
+| POST | `/projects/:id/summarize` | On-demand overview generation; 422 if no data |
+| GET | `/projects/:id/overview` | Returns null (not 404) if no overview exists |
+| GET | `/projects/:id/summaries` | All summaries for project |
+| POST | `/summaries` | Requires `content` + at least one of `sessionId`/`projectId` |
+| GET | `/sessions/:id/summaries` | |
+| PATCH/DELETE | `/summaries/:id` | |
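The bidirectional, depth-limited traversal that `getNeighborhood` performs with a recursive CTE (see the Knowledge Graph notes in `packages/memory-service/CLAUDE.md`) can be illustrated in memory. This is an analogue, not the SQL the service runs: it assumes edges shaped like `relationships` rows (`from_id`, `to_id`, `label`), returns node IDs instead of full entity rows, uses a visited set where the CTE relies on `UNION` deduplication, and (as an assumption modelled on `getEntityNeighbors`) returns every edge among the discovered nodes.

```javascript
// In-memory analogue of getNeighborhood(entityId, depth): breadth-first search that
// follows relationship edges in both directions, up to `depth` hops (capped at 3).
function getNeighborhood(entityId, edges, depth = 1) {
    const maxDepth = Math.min(Math.max(depth, 0), 3);
    const visited = new Set([entityId]);
    let frontier = [entityId];

    for (let hop = 0; hop < maxDepth && frontier.length > 0; hop++) {
        const next = [];
        for (const id of frontier) {
            for (const e of edges) {
                // Bidirectional: an edge touching `id` leads to its other endpoint.
                const other = e.from_id === id ? e.to_id : e.to_id === id ? e.from_id : null;
                if (other !== null && !visited.has(other)) {
                    visited.add(other);   // visiting each node once also breaks cycles
                    next.push(other);
                }
            }
        }
        frontier = next;
    }

    // Keep only edges whose endpoints both lie inside the discovered node set.
    const nodeIds = [...visited];
    const inSet = new Set(nodeIds);
    const subEdges = edges.filter(e => inSet.has(e.from_id) && inSet.has(e.to_id));
    return { nodeIds, edges: subEdges };
}
```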
--- a/packages/memory-service/src/db/index.js
+++ b/packages/memory-service/src/db/index.js
@@ -1,6 +1,6 @@
 const Database = require('better-sqlite3');
 const schema = require('./schema');
-const {getEnv, SQLITE } = require('@nexusai/shared');
+const {getEnv, SQLITE, logger } = require('@nexusai/shared');

 let db;  // Declare db variable in a scope accessible to all functions

@@ -54,15 +54,20 @@ function getDB() {
            db.exec(`CREATE INDEX IF NOT EXISTS idx_summaries_session ON summaries(session_id)`);
        } catch {}

-        try {
-            db.exec(`CREATE INDEX IF NOT EXISTS idx_summaries_project ON summaries(project_id)`);
-        } catch {}
+        try { db.exec(`ALTER TABLE entities ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
+        try { db.exec(`ALTER TABLE entities ADD COLUMN confidence REAL NOT NULL DEFAULT 1.0`) } catch {}
+        try { db.exec(`ALTER TABLE entities ADD COLUMN source TEXT NOT NULL DEFAULT 'extraction'`) } catch {}
+        try { db.exec(`ALTER TABLE entities ADD COLUMN last_seen_at INTEGER`) } catch {}
+
+        try { db.exec(`ALTER TABLE relationships ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
+        try { db.exec(`ALTER TABLE relationships ADD COLUMN notes TEXT`) } catch {}
+
        
        // Sync FTS index with any existing episodes data
        db.exec(`INSERT OR REPLACE INTO episodes_fts(rowid, user_message, ai_response) 
            SELECT id, user_message, ai_response FROM episodes`);

-        console.log(`Connected to SQLite database at ${path}`);
+        logger.info(`Connected to SQLite database at ${path}`);
    }
    return db;
 }
--- a/packages/memory-service/src/db/schema.js
+++ b/packages/memory-service/src/db/schema.js
@@ -38,6 +38,20 @@ const schema = `
    UNIQUE(from_id, to_id, label)
  );

+  CREATE INDEX IF NOT EXISTS idx_relationships_from ON relationships(from_id);
+  CREATE INDEX IF NOT EXISTS idx_relationships_to   ON relationships(to_id);
+
+  CREATE TABLE IF NOT EXISTS entity_episodes (
+    entity_id  INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
+    episode_id INTEGER NOT NULL REFERENCES episodes(id) ON DELETE CASCADE,
+    PRIMARY KEY (entity_id, episode_id)
+  );
+
+  CREATE INDEX IF NOT EXISTS idx_entity_episodes_entity  ON entity_episodes(entity_id);
+  CREATE INDEX IF NOT EXISTS idx_entity_episodes_episode ON entity_episodes(episode_id);
+  
+
+
  CREATE TABLE IF NOT EXISTS projects (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    name        TEXT NOT NULL,
--- a/packages/memory-service/src/db/summaries.js
+++ b/packages/memory-service/src/db/summaries.js
@@ -50,4 +50,27 @@ function deleteSummary(id) {
    getDB().prepare(`DELETE FROM summaries WHERE id = ?`).run(id);
 }

-module.exports = { createSummary, getSummary, getSummariesBySession, getSummariesByProject, updateSummary, deleteSummary };
+// Fetches session summaries that belong to sessions in a given project
+// Joins through sessions table since session summaries don't store project_id directly
+function getSessionSummariesForProject(projectId) {
+    const db = getDB();
+    return db.prepare(`
+        SELECT s.* FROM summaries s
+        JOIN sessions sess ON sess.id = s.session_id
+        WHERE sess.project_id = ? AND s.session_id IS NOT NULL
+        ORDER BY s.created_at ASC
+    `).all(projectId).map(parseRow);
+}
+
+// Fetches the most recent project-level overview summary (session_id IS NULL distinguishes it)
+function getProjectOverviewSummary(projectId) {
+    const db = getDB();
+    const row = db.prepare(`
+        SELECT * FROM summaries
+        WHERE project_id = ? AND session_id IS NULL
+        ORDER BY created_at DESC LIMIT 1
+    `).get(projectId);
+    return row ? parseRow(row) : null;
+}
+
+module.exports = { createSummary, getSummary, getSummariesBySession, getSummariesByProject, updateSummary, deleteSummary, getSessionSummariesForProject, getProjectOverviewSummary };
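The project-overview input rules in `packages/memory-service/CLAUDE.md` (prefer session summaries such as those returned by `getSessionSummariesForProject`, fall back to raw episodes, keep the most recent `MAX_SUMMARY_CHARS` characters, strip ChatML tokens from the reply) reduce to a few lines. A sketch under assumptions: `buildOverviewInput`, `cleanModelOutput`, and the join format are hypothetical; only the 8,000-character limit, the `content`/`user_message`/`ai_response` fields, and the two token names come from the notes and schema.

```javascript
// Sketch of project-overview input selection. MAX_SUMMARY_CHARS mirrors
// SUMMARIES.MAX_SUMMARY_CHARS (8,000); the join format is an assumption.
const MAX_SUMMARY_CHARS = 8000;

function buildOverviewInput(sessionSummaries, episodes) {
    const source = sessionSummaries.length > 0
        ? sessionSummaries.map(s => s.content)                                        // preferred path
        : episodes.map(e => `User: ${e.user_message}\nAssistant: ${e.ai_response}`); // fallback path
    const text = source.join('\n\n');
    // Slice from the end so the most recent content survives truncation.
    return text.length > MAX_SUMMARY_CHARS ? text.slice(-MAX_SUMMARY_CHARS) : text;
}

// Remove ChatML control tokens that the model may echo back in its output.
function cleanModelOutput(raw) {
    return raw.replace(/<\|im_start\|>|<\|im_end\|>/g, '').trim();
}
```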
--- a/packages/memory-service/src/entities/extraction.js
+++ b/packages/memory-service/src/entities/extraction.js
@@ -1,13 +1,18 @@
 const semantic = require('../semantic')
-const { getEnv, SERVICES, formatEpisodeText } = require('@nexusai/shared');
-const { upsertEntity } = require('./index');
+const { getEnv, SERVICES, formatEpisodeText, ENTITIES, logger } = require('@nexusai/shared');
+const { upsertEntity, upsertRelationship, linkEntityToEpisode } = require('./index');

 const EXTRACTION_URL = getEnv('EXTRACTION_URL', 'http://localhost:11434');
-const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b');
+const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b'); // ChatML format — see buildExtractionPrompt
 const EMBEDDING_SERVICE_URL = getEnv('EMBEDDING_SERVICE_URL', SERVICES.EMBEDDING_URL);

-const ENTITY_TYPES = ['person', 'place', 'project', 'technology', 'concept', 'organization'];
+const ENTITY_TYPES = ENTITIES.TYPES;
+const IGNORED_NAMES = ['good morning', 'good night', 'hello', 'goodbye', 'thanks', 'thank you'];

+// NOTE: This prompt uses ChatML format (<|im_start|> / <|im_end|> tags), which is
+// specific to qwen-family models. If EXTRACTION_MODEL is changed to a Llama-family
+// or other model, this format will need to change — most alternatives use either
+// plain text or [INST] / <<SYS>> tags. Silent degradation is likely if mismatched.
 function buildExtractionPrompt(userMessage, aiResponse, knownEntities = []) {
    const knownBlock = knownEntities.length > 0
        ? [
@@ -19,21 +24,24 @@ function buildExtractionPrompt(userMessage, aiResponse, knownEntities = []) {

    return [
        '<|im_start|>system',
-        'You are a named entity extractor. You output only valid JSON.',
+        'You are a named entity and relationship extractor. You output only valid JSON.',
        '<|im_end|>',
        '<|im_start|>user',
-        'Read the conversation below and extract every named entity mentioned.',
-        `Entity types to extract: ${ENTITY_TYPES.join(', ')}`,
-        'For each entity found, provide: name, type, and a one-sentence notes field.',
-        'Return your answer as: { "entities": [ ... ] }',
-        'For each entity found, you MUST provide a non-empty notes field describing it based on the conversation.',
-        'For each entity found, provide:',
-        '  "name": short proper noun only (max 4 words, e.g. "Sydney", "NexusAI", "Tim")',
+        'Read the conversation below and extract all named entities and the relationships between them.',
+        `Entity types: ${ENTITY_TYPES.join(', ')}`,
+        'Use "character" for any fictional, game, or media characters (e.g. characters from anime, games, books, TV shows, movies)',
+        'Use "person" only for real people',
+        'For each entity provide:',
+        '  "name": short proper noun only (max 4 words)',
        '  "type": one of the valid types',
-        '  "notes": one specific sentence about this entity based on the conversation (not generic)',
+        '  "notes": one specific sentence about this entity based on the conversation',
+        'For relationships, use snake_case verb labels (e.g. works_on, manages, uses, knows, located_in, part_of, created_by).',
+        'Only include relationships between entities you have listed above.',
+        'Return this exact JSON structure:',
+        '{ "entities": [{"name": "...", "type": "...", "notes": "..."}], "relationships": [{"from": "...", "fromType": "...", "to": "...", "toType": "...", "label": "...", "notes": "..."}] }',
        '',
        knownBlock,
-        '--- CONVERSATION ---',   // clear delimiter helps smaller models
+        '--- CONVERSATION ---',
        `User: ${userMessage}`,
        `Assistant: ${aiResponse}`,
        '--- END CONVERSATION ---',
@@ -57,17 +65,13 @@ async function embedEntity(entity) {
    return data.embedding;
 }

-async function extractAndStoreEntities(userMessage, aiResponse, projectId=null) {
-    console.log('[entities] Extraction triggered')
+async function extractAndStoreEntities(userMessage, aiResponse, episodeId=null, projectId=null) {
+    logger.info('[entities] Extraction triggered')
    try {
        // Fetch existing entities to guide the model toward consistent name/type pairs
        const db = require('../db').getDB();
-        console.log('[entities] fetching known entities...');  // add this
        const knownEntities = db.prepare(`SELECT name, type FROM entities ORDER BY rowid DESC LIMIT 20`).all();
-        console.log('[entities] known entities count:', knownEntities.length);
-
        const prompt = buildExtractionPrompt(userMessage, aiResponse, knownEntities);
-        console.log('[entities] prompt preview:', JSON.stringify(prompt.slice(-300)));


        const res = await fetch(`${EXTRACTION_URL}/api/generate`, {
@@ -79,32 +83,53 @@ async function extractAndStoreEntities(userMessage, aiResponse, projectId=null)
                stream: false,
                format: 'json',
                options: {
-                    temperature: 0.1,
-                    num_predict: 1024,
+                    temperature: ENTITIES.TEMPERATURE,
+                    num_predict: ENTITIES.NUM_PREDICT,
                },
            }),
+            signal: AbortSignal.timeout(60_000),
        });

        if (!res.ok) throw new Error(`Ollama responded ${res.status}`);

        const data = await res.json();
        const raw = data.response?.trim() ?? '';
-        console.log('[entities] raw response:', JSON.stringify(raw.slice(0, 300)));

-        const parsed = JSON.parse(raw);
+        const jsonMatch = raw.match(/\{[\s\S]*\}/);
+        if (!jsonMatch) {
+            logger.warn('[entities] No JSON object found in response');
+            logger.debug('[entities] Raw response was:', raw);
+            return;
+        }
+
+        let parsed;
+        try {
+            parsed = JSON.parse(jsonMatch[0]);
+        } catch (err) {
+            logger.warn('[entities] Failed to parse extraction response:', err.message);
+            logger.debug('[entities] Raw response was:', raw);
+            return;
+        }
        const entities = Array.isArray(parsed.entities) ? parsed.entities : [];
-        if (entities.length === 0) throw new Error('No entities in response');
-
-        if (!Array.isArray(entities)) throw new Error('Response was not a JSON array');
+        if (entities.length === 0) {
+            logger.debug('[entities] No entities found in this exchange — skipping');
+            return;
+        }

+        // Map of "name::type" → saved entity, used for relationship resolution below
+        const entityMap = new Map();
        let saved = 0;
+
        for (const { name, type, notes } of entities) {
            if (!name || !type || !ENTITY_TYPES.includes(type)) continue;
+            if (IGNORED_NAMES.includes(name.toLowerCase())) continue;

            const entity = upsertEntity(name, type, notes ?? null);
-            console.log('[entities] Upserted entity:', entity);
+            entityMap.set(`${name}::${type}`, entity);
+            logger.info('[entities] Upserted entity:', entity);
+
+            if (episodeId) linkEntityToEpisode(entity.id, episodeId);

-            // Embed and upsert to Qdrant fire-and-forget
            embedEntity(entity)
                .then(vector => semantic.upsertEntity(entity.id, vector, {
                    name: entity.name,
@@ -113,19 +138,34 @@ async function extractAndStoreEntities(userMessage, aiResponse, projectId=null)
                    projectId: projectId ?? null,
                }))
                .catch(err => {
-                    console.warn(`[entities] Failed to embed entity "${entity.name}":`, err.message);
-                    console.warn(`[entities] Embed error stack:`, err.stack);  // add this
+                    logger.warn(`[entities] Failed to embed entity "${entity.name}":`, err.message);
                });

            saved++;
        }

-        if (saved > 0) console.log(`[entities] Extracted and stored ${saved} entities`);
+        if (saved > 0) logger.info(`[entities] Extracted and stored ${saved} entities`);
+
+        // Process extracted relationships — both entities must have been saved above
+        const relationships = Array.isArray(parsed.relationships) ? parsed.relationships : [];
+        let relSaved = 0;
+
+        for (const { from, fromType, to, toType, label, notes } of relationships) {
+            if (!from || !fromType || !to || !toType || !label) continue;
+
+            const fromEntity = entityMap.get(`${from}::${fromType}`);
+            const toEntity   = entityMap.get(`${to}::${toType}`);
+            if (!fromEntity || !toEntity) continue;
+
+            upsertRelationship(fromEntity.id, toEntity.id, label, notes ?? null);
+            relSaved++;
+        }
+
+        if (relSaved > 0) logger.info(`[entities] Extracted and stored ${relSaved} relationships`);

    } catch (err) {
        // Non-critical — log and move on, episode is already saved
-        console.warn('[entities] Extraction failed:', err.message);
-        console.warn('[entities] Stack:', err.stack);
+        logger.warn('[entities] Extraction failed:', err.message);
    }
 }

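The parse-then-resolve flow above can be exercised without Ollama or SQLite. The sketch below is a hypothetical pure-function restatement. `parseExtraction` and `resolveExtraction` are illustrative names, the `ENTITY_TYPES` list is an assumption because the real constant is not shown in this diff, and the `upsert` callback stands in for `upsertEntity`.

```javascript
// Hypothetical entity types; the real ENTITY_TYPES constant is not shown in this diff.
const ENTITY_TYPES = ['person', 'project', 'tool', 'concept'];

// Pull the first {...} span out of a model response and parse it.
// Returns null instead of throwing, mirroring the early returns above.
function parseExtraction(raw) {
    const jsonMatch = raw.match(/\{[\s\S]*\}/);
    if (!jsonMatch) return null;
    try {
        return JSON.parse(jsonMatch[0]);
    } catch {
        return null;
    }
}

// Save valid entities, then keep only relationships whose endpoints were saved
// in this same pass. Keys are "name::type", exactly as entityMap uses them above.
function resolveExtraction(parsed, upsert) {
    const entityMap = new Map();
    const entities = Array.isArray(parsed.entities) ? parsed.entities : [];
    for (const { name, type, notes } of entities) {
        if (!name || !type || !ENTITY_TYPES.includes(type)) continue;
        entityMap.set(`${name}::${type}`, upsert(name, type, notes ?? null));
    }

    const resolved = [];
    const relationships = Array.isArray(parsed.relationships) ? parsed.relationships : [];
    for (const { from, fromType, to, toType, label } of relationships) {
        if (!from || !fromType || !to || !toType || !label) continue;
        const fromEntity = entityMap.get(`${from}::${fromType}`);
        const toEntity = entityMap.get(`${to}::${toType}`);
        if (!fromEntity || !toEntity) continue; // endpoint was not extracted in this pass
        resolved.push({ fromId: fromEntity.id, toId: toEntity.id, label });
    }
    return resolved;
}

// Usage: a fake upsert that hands out sequential IDs.
let nextId = 1;
const fakeUpsert = (name, type) => ({ id: nextId++, name, type });
const raw = 'Sure! {"entities":[{"name":"Alice","type":"person"},{"name":"NexusAI","type":"project"}],' +
    '"relationships":[{"from":"Alice","fromType":"person","to":"NexusAI","toType":"project","label":"works_on"},' +
    '{"from":"Alice","fromType":"person","to":"Bob","toType":"person","label":"knows"}]} Hope that helps.';
const edges = resolveExtraction(parseExtraction(raw), fakeUpsert);
console.log(edges); // [ { fromId: 1, toId: 2, label: 'works_on' } ]
```

The greedy `[\s\S]*` runs from the first `{` to the last `}`. That tolerates chatter before and after the JSON. However, a response that contains two separate JSON objects would produce an unparseable match and take the warn path.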
--- a/packages/memory-service/src/entities/index.js
+++ b/packages/memory-service/src/entities/index.js
@@ -4,18 +4,23 @@ const { parseRow } = require ('@nexusai/shared')
 /******* Entities ********/

 // Upsert an entity - insert or update if (name, type) already exists
-function upsertEntity(name, type, notes = null, metadata = null) {
+function upsertEntity(name, type, notes = null, metadata = null, source = 'extraction') {
    const db = getDB();
-    const stmt = db.prepare(`
-        INSERT INTO entities (name, type, notes, metadata)
-        VALUES (?, ?, ?, ?)
+    const stmt = db.prepare(`
+    INSERT INTO entities (name, type, notes, metadata, source, last_seen_at)
+    VALUES (?, ?, ?, ?, ?, unixepoch())
    ON CONFLICT(name, type) DO UPDATE SET
+        -- First extraction wins: notes are never overwritten once set.
+        -- Revisit during Memory Consolidation Lifecycle (Phase 2) — once entity
+        -- quality scoring exists, a higher-confidence extraction should be able
+        -- to replace stale notes rather than being silently dropped.
        notes = COALESCE(entities.notes, excluded.notes),
        metadata      = excluded.metadata,
+        mention_count = entities.mention_count + 1,
+        last_seen_at  = unixepoch(),
        updated_at    = unixepoch()
-    `);
-    const result = stmt.run(name, type, notes, metadata ? JSON.stringify(metadata) : null);
-
+    `);
+    stmt.run(name, type, notes, metadata ? JSON.stringify(metadata) : null, source);
    return getEntityByNameType(name, type);
 }

@@ -40,15 +45,17 @@ function deleteEntity(id) {
 /********* Relationships *********/

 // Upsert a relationship, insert or ignore if (from_id, to_id, label) already exists
-function upsertRelationship(fromId, toId, label, metadata = null){
+function upsertRelationship(fromId, toId, label, notes = null, metadata = null) {
    const db = getDB();
    const stmt = db.prepare(`
-        INSERT INTO relationships (from_id, to_id, label, metadata)
-        VALUES (?, ?, ?, ?)
-        ON CONFLICT(from_id, to_id, label) DO NOTHING
+        INSERT INTO relationships (from_id, to_id, label, notes, metadata)
+        VALUES (?, ?, ?, ?, ?)
+        ON CONFLICT(from_id, to_id, label) DO UPDATE SET
+            mention_count = relationships.mention_count + 1,
+            -- First extraction wins for notes — same policy as entities.
+            notes         = COALESCE(relationships.notes, excluded.notes)
    `);
-
-    const result = stmt.run(fromId, toId, label, metadata ?JSON.stringify(metadata) : null);
+    stmt.run(fromId, toId, label, notes, metadata ? JSON.stringify(metadata) : null);
    return getRelationship(fromId, toId, label);
 }

@@ -69,7 +76,7 @@ function getEntityByNameType(name, type) {
 }

-// Retrive all relationships originating from a given entity
+// Retrieve all relationships originating from a given entity
-function getRelationshipsByEntity(entityId) {
+function getOutboundRelationships(entityId) {
    const db = getDB();
    return db.prepare(`SELECT * FROM relationships WHERE from_id = ?`).all(entityId).map(parseRow);
 }
@@ -81,14 +88,23 @@ function deleteRelationship(fromId, toId, label) {
    db.prepare(`DELETE FROM relationships WHERE from_id = ? AND to_id = ? AND label = ?`).run(fromId, toId, label);
 }   

+function linkEntityToEpisode(entityId, episodeId) {
+    const db = getDB();
+    db.prepare(`
+        INSERT OR IGNORE INTO entity_episodes (entity_id, episode_id)
+        VALUES (?, ?)
+    `).run(entityId, episodeId);
+}
+
 module.exports = {
    upsertEntity,
    getEntity,
    getEntitiesByType,
    getEntityByNameType,
    deleteEntity,
+    linkEntityToEpisode,
    upsertRelationship,
    getRelationship,
-    getRelationshipsByEntity,
+    getOutboundRelationships,
    deleteRelationship
 }
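The two `ON CONFLICT ... DO UPDATE` clauses above encode a merge policy. The first non-null `notes` value wins, `metadata` is always replaced, and `mention_count` increments on every repeat sighting. The in-memory model below is a hypothetical illustration of those semantics for the entity case. It does not touch SQLite, and `EntityStore` is an invented name. It also assumes `mention_count` starts at 1 on insert, which depends on a column default not shown in this diff.

```javascript
// Hypothetical in-memory model of the upsertEntity conflict policy; not the real SQLite layer.
// Assumes mention_count starts at 1 on first insert (the column default is not in this diff).
class EntityStore {
    constructor() {
        this.rows = new Map(); // "name::type" -> row, standing in for the UNIQUE(name, type) constraint
        this.nextId = 1;
    }

    upsert(name, type, notes = null, metadata = null, source = 'extraction') {
        const key = `${name}::${type}`;
        const now = Math.floor(Date.now() / 1000); // unix seconds, like unixepoch()
        const existing = this.rows.get(key);
        if (!existing) {
            const row = { id: this.nextId++, name, type, notes, metadata, source,
                          mention_count: 1, last_seen_at: now };
            this.rows.set(key, row);
            return row;
        }
        // notes = COALESCE(entities.notes, excluded.notes): the first non-null value wins
        existing.notes = existing.notes ?? notes;
        // metadata = excluded.metadata: the incoming value always replaces the stored one
        existing.metadata = metadata;
        existing.mention_count += 1;
        existing.last_seen_at = now;
        // source is not in the SET list, so the original source is kept
        return existing;
    }
}

const store = new EntityStore();
store.upsert('Alice', 'person', 'software engineer');
store.upsert('Alice', 'person', 'likes tea');
const alice = store.upsert('Alice', 'person', null);
console.log(alice.notes, alice.mention_count); // software engineer 3
```

One consequence of `metadata = excluded.metadata` is worth noting. Extraction calls `upsertEntity(name, type, notes)` without metadata, so every re-extraction clears any metadata stored earlier.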
--- a/packages/memory-service/src/episodic/index.js
+++ b/packages/memory-service/src/episodic/index.js
@@ -1,5 +1,5 @@
 const {getDB} = require('../db');
-const { EPISODIC, getEnv, SERVICES, parseRow, formatEpisodeText } = require('@nexusai/shared');
+const { EPISODIC, getEnv, SERVICES, parseRow, formatEpisodeText, SUMMARIES, logger } = require('@nexusai/shared');
 const semantic = require('../semantic');
 const { extractAndStoreEntities } = require('../entities/extraction')

@@ -25,7 +25,7 @@ function getSession(id) {
 }


-function getSessions(limit = EPISODIC.DEFAULT_PAGE_SIZE, offset = 0, projectId = null) {
+function getSessions(limit = EPISODIC.DEFAULT_PAGE_SIZE, offset = EPISODIC.DEFAULT_OFFSET, projectId = null) {
  const db = getDB();
  const stmt = projectId
    ? db.prepare(`
@@ -98,27 +98,26 @@ function deleteSessionByExternalId(externalId) {

 // --Episodes --------------------------------------------------
 // Creates a new episode linked to a session, with user message, AI response, optional token count, and metadata
-async function createEpisode(sessionId, userMessage, aiResponse, tokenCount = null, metadata = null, projectId=null) {
+async function createEpisode(sessionId, userMessage, aiResponse, tokenCount = null, projectId=null) {
  const db = getDB();

  // Wrap insert + session touch in a transaction — both succeed or neither does
  const insert = db.transaction(() => {
    const stmt = db.prepare(`
-      INSERT INTO episodes (session_id, user_message, ai_response, token_count, metadata)
-      VALUES (?, ?, ?, ?, ?)
+      INSERT INTO episodes (session_id, user_message, ai_response, token_count)
+      VALUES (?, ?, ?, ?)
    `);
    const result = stmt.run(
      sessionId,
      userMessage,
      aiResponse,
      tokenCount,
-      metadata ? JSON.stringify(metadata) : null
    );
    touchSession(sessionId);
    return getEpisode(result.lastInsertRowid);
  });

-  const episode= insert();
+  const episode = insert();

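-  //embed ascynchronously after SQLite completes, non-blocking.  If embedding fail, the episode still saved.
+  // Embed asynchronously after the SQLite write completes (non-blocking). If embedding fails, the episode is still saved.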
  getEpisodeEmbedding(userMessage, aiResponse)
@@ -126,10 +125,10 @@ async function createEpisode(sessionId, userMessage, aiResponse, tokenCount = nu
      sessionId: episode.session_id,
      createdAt: episode.created_at
    }))
-    .catch(err => console.error(`Failed to embed episode ${episode.id}:`, err.message));
+    .catch(err => logger.error(`Failed to embed episode ${episode.id}:`, err.message));

-  extractAndStoreEntities(userMessage, aiResponse, projectId)
-    .catch(err => console.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
+  extractAndStoreEntities(userMessage, aiResponse, episode.id, projectId)
+    .catch(err => logger.error(`Failed to extract entities for episode ${episode.id}:`, err.message));


  return episode;
@@ -143,7 +142,7 @@ function getEpisode(id) {
 }

 // Retrieves episodes for a given session, ordered by creation time descending, with pagination
-function getEpisodesBySession(sessionId, limit = EPISODIC.DEFAULT_PAGE_SIZE, offset = 0) {
+function getEpisodesBySession(sessionId, limit = EPISODIC.DEFAULT_PAGE_SIZE, offset = EPISODIC.DEFAULT_OFFSET) {
  const db = getDB();
  const stmt = db.prepare(`
    SELECT * FROM episodes
@@ -155,30 +154,41 @@ function getEpisodesBySession(sessionId, limit = EPISODIC.DEFAULT_PAGE_SIZE, off
 }

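-// Retrieves recent episodes across all sessions, ordered by creation time descending, with a limit
-function getRecentEpisodes(limit = EPISODIC.DEFAULT_RECENT_LIMIT) {
+// Retrieves the most recent episodes for one session, ordered by creation time descending, with a limit
+function getRecentEpisodes(sessionId, limit = EPISODIC.DEFAULT_RECENT_LIMIT) {
-  // Cross-session recent episodes — useful for recency-based retrieval
+  // Session-scoped recent episodes, used for recency-based retrieval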
  const db = getDB();
  const stmt = db.prepare(`
    SELECT * FROM episodes
+    WHERE session_id = ?
    ORDER BY created_at DESC
    LIMIT ?
  `);
-  return stmt.all(limit).map(parseRow);
+  return stmt.all(sessionId, limit).map(parseRow);
 }


 // Searches episodes using FTS5 full-text search, ordered by relevance, with a limit
-function searchEpisodes(query, limit = EPISODIC.DEFAULT_SEARCH_LIMIT) {
-  // FTS5 full-text search across all episodes
+function searchEpisodes(query, limit = EPISODIC.DEFAULT_SEARCH_LIMIT, sessionIds = null) {
  const db = getDB();
-  const stmt = db.prepare(`
+  const safeQuery = `"${query.replace(/"/g, '""')}"`;
+  if (sessionIds && sessionIds.length > 0) {
+    const ph = sessionIds.map(() => '?').join(',');
+    return db.prepare(`
+      SELECT e.* FROM episodes e
+      JOIN episodes_fts fts ON e.id = fts.rowid
+      WHERE episodes_fts MATCH ?
+      AND e.session_id IN (${ph})
+      ORDER BY rank
+      LIMIT ?
+    `).all(safeQuery, ...sessionIds, limit).map(parseRow);
+  }
+  return db.prepare(`
    SELECT e.* FROM episodes e
    JOIN episodes_fts fts ON e.id = fts.rowid
    WHERE episodes_fts MATCH ?
    ORDER BY rank
    LIMIT ?
-  `);
-  return stmt.all(query, limit).map(parseRow);
+  `).all(safeQuery, limit).map(parseRow);
 }

 // Deletes an episode by its ID
@@ -197,7 +207,8 @@ async function getEpisodeEmbedding(userMessage, aiResponse){
  const res = await fetch(`${url}/embed`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
-    body: JSON.stringify({ text })  
+    body: JSON.stringify({ text }),  
+    signal: AbortSignal.timeout(30_000),
  })

  if (!res.ok) {
@@ -207,6 +218,17 @@ async function getEpisodeEmbedding(userMessage, aiResponse){
  return data.embedding;
 }

+function getEpisodesByProject(projectId, limit = SUMMARIES.MAX_PROJECT_EPISODE_LIMIT) {
+    const db = getDB();
+    return db.prepare(`
+        SELECT e.* FROM episodes e
+        JOIN sessions s ON s.id = e.session_id
+        WHERE s.project_id = ?
+        ORDER BY e.created_at ASC
+        LIMIT ?
+    `).all(projectId, limit).map(parseRow);
+}
+
 module.exports = {
  createSession,
  getSession,
@@ -221,5 +243,6 @@ module.exports = {
  getEpisodesBySession,
  getRecentEpisodes,
  searchEpisodes,
-  deleteEpisode
+  deleteEpisode,
+  getEpisodesByProject
 };
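`searchEpisodes` now wraps the whole query in double quotes and doubles any embedded quotes. FTS5 therefore treats the query as a single phrase string and does not parse `OR`, `NEAR`, `*` or stray punctuation as query syntax. The helpers below restate that escaping and the `IN (...)` placeholder list as standalone functions; the names are hypothetical. The trade-off is recall: a phrase only matches episodes containing the tokens adjacently and in order. `toFtsAnyTerm` is an alternative that is not in the diff. It quotes each whitespace-separated term and joins the terms with `OR`.

```javascript
// Mirrors the escaping in searchEpisodes: one quoted FTS5 phrase, embedded quotes doubled.
function toFtsPhrase(query) {
    return `"${query.replace(/"/g, '""')}"`;
}

// Hypothetical alternative (not in the diff): quote each term separately and OR them together.
function toFtsAnyTerm(query) {
    return query
        .split(/\s+/)
        .filter(Boolean)
        .map(toFtsPhrase)
        .join(' OR ');
}

// Builds "?,?,?" for an IN (...) clause, one placeholder per bound value.
function placeholders(values) {
    return values.map(() => '?').join(',');
}

console.log(toFtsPhrase('say "hi"'));       // "say ""hi"""
console.log(toFtsAnyTerm('graph  fusion')); // "graph" OR "fusion"
console.log(placeholders([4, 7, 9]));       // ?,?,?
```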
--- a/packages/memory-service/src/graph/index.js
+++ b/packages/memory-service/src/graph/index.js
@@ -0,0 +1,77 @@
+const { getDB } = require('../db');
+const { parseRow, ENTITIES } = require('@nexusai/shared');
+
+// Single-entity neighborhood via recursive CTE — bidirectional, configurable depth
+function getNeighborhood(entityId, depth = ENTITIES.GRAPH_HOP_DEPTH) {
+    const db = getDB();
+
+    const nodeRows = db.prepare(`
+        WITH RECURSIVE traverse(entity_id, depth) AS (
+            SELECT ?, 0
+            UNION
+            SELECT
+                CASE WHEN r.from_id = t.entity_id THEN r.to_id ELSE r.from_id END,
+                t.depth + 1
+            FROM relationships r
+            JOIN traverse t ON (r.from_id = t.entity_id OR r.to_id = t.entity_id)
+            WHERE t.depth < ?
+        )
+        SELECT DISTINCT entity_id FROM traverse
+    `).all(entityId, depth);
+
+    const nodeIds = nodeRows.map(r => r.entity_id);
+    if (nodeIds.length === 0) return { nodes: [], edges: [] };
+
+    const ph = nodeIds.map(() => '?').join(',');
+    const nodes = db.prepare(
+        `SELECT * FROM entities WHERE id IN (${ph})`
+    ).all(...nodeIds).map(parseRow);
+
+    const edges = db.prepare(
+        `SELECT * FROM relationships WHERE from_id IN (${ph}) AND to_id IN (${ph})`
+    ).all(...nodeIds, ...nodeIds).map(parseRow);
+
+    return { nodes, edges };
+}
+
+// Bulk 1-hop neighborhood for orchestration — seeds are entity IDs from Qdrant search
+function getEntityNeighbors(entityIds) {
+    if (!entityIds.length) return { nodes: [], edges: [] };
+    const db = getDB();
+
+    const ph = entityIds.map(() => '?').join(',');
+
+    // entityIds appears three times — once for the CASE (finding the neighbor),
+    // and once each for the FROM and TO sides of the WHERE clause
+    const neighborRows = db.prepare(`
+        SELECT DISTINCT
+            CASE WHEN from_id IN (${ph}) THEN to_id ELSE from_id END AS entity_id
+        FROM relationships
+        WHERE from_id IN (${ph}) OR to_id IN (${ph})
+    `).all(...entityIds, ...entityIds, ...entityIds);
+
+    const allIds = [...new Set([...entityIds, ...neighborRows.map(r => r.entity_id)])];
+    const allPh = allIds.map(() => '?').join(',');
+
+    const nodes = db.prepare(
+        `SELECT * FROM entities WHERE id IN (${allPh})`
+    ).all(...allIds).map(parseRow);
+
+    const edges = db.prepare(
+        `SELECT * FROM relationships WHERE from_id IN (${allPh}) AND to_id IN (${allPh})`
+    ).all(...allIds, ...allIds).map(parseRow);
+
+    return { nodes, edges };
+}
+
+// Returns episode IDs linked to any of the given entity IDs via entity_episodes
+function getEpisodeIdsByEntities(entityIds) {
+    if (!entityIds.length) return [];
+    const db = getDB();
+    const ph = entityIds.map(() => '?').join(',');
+    return db.prepare(
+        `SELECT DISTINCT episode_id FROM entity_episodes WHERE entity_id IN (${ph})`
+    ).all(...entityIds).map(r => r.episode_id);
+}
+
+module.exports = { getNeighborhood, getEntityNeighbors, getEpisodeIdsByEntities };
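`getNeighborhood` walks relationships in both directions for up to `depth` hops. The recursive `UNION` discards duplicate `(entity_id, depth)` rows, and the final `SELECT DISTINCT` reports each node once. A breadth-first search over an in-memory edge list yields the same node set, which makes the traversal easy to reason about and test. The sketch below is hypothetical: `neighborhood`, `formatGraph` and the plain-object rows are assumptions. `formatGraph` approximates the `formatGraphContext` layout described in the orchestration-service CLAUDE.md. Its source is not in this diff, so the rendering of nodes without notes is a guess.

```javascript
// Hypothetical in-memory equivalent of the recursive CTE in getNeighborhood:
// follows edges in both directions, stops after `depth` hops, reports each node once.
function neighborhood(entities, relationships, startId, depth = 2) {
    const seen = new Set([startId]);
    let frontier = [startId];
    for (let hop = 0; hop < depth && frontier.length > 0; hop++) {
        const next = [];
        for (const id of frontier) {
            for (const r of relationships) {
                // Take the opposite endpoint, like the CASE expression in the CTE.
                const other = r.from_id === id ? r.to_id : r.to_id === id ? r.from_id : null;
                if (other !== null && !seen.has(other)) {
                    seen.add(other);
                    next.push(other);
                }
            }
        }
        frontier = next;
    }
    const nodes = entities.filter(e => seen.has(e.id));
    // Keep only edges with both endpoints inside the neighborhood, as the edges query does.
    const edges = relationships.filter(r => seen.has(r.from_id) && seen.has(r.to_id));
    return { nodes, edges };
}

// Approximates the documented formatGraphContext layout: notes on the node line,
// outbound edges indented below as "→ label target (type)".
function formatGraph({ nodes, edges }) {
    const byId = new Map(nodes.map(n => [n.id, n]));
    const lines = [];
    for (const n of nodes) {
        lines.push(`- ${n.name} (${n.type})${n.notes ? `: ${n.notes}` : ''}`);
        for (const edge of edges.filter(r => r.from_id === n.id)) {
            const target = byId.get(edge.to_id);
            lines.push(`  → ${edge.label} ${target.name} (${target.type})`);
        }
    }
    return lines.join('\n');
}

const entities = [
    { id: 1, name: 'Alice', type: 'person', notes: 'software engineer' },
    { id: 2, name: 'NexusAI', type: 'project', notes: 'AI assistant framework' },
    { id: 3, name: 'Bob', type: 'person', notes: null },
    { id: 4, name: 'Qdrant', type: 'tool', notes: null },
];
const relationships = [
    { from_id: 1, to_id: 2, label: 'works_on' },
    { from_id: 3, to_id: 1, label: 'knows' },
    { from_id: 2, to_id: 4, label: 'uses' },
];

const oneHop = neighborhood(entities, relationships, 1, 1);
console.log(oneHop.nodes.map(n => n.name)); // [ 'Alice', 'NexusAI', 'Bob' ]
console.log(formatGraph(oneHop));
```

Bob is reached through an inbound edge (`Bob knows Alice`), which is why the traversal has to follow both directions. Qdrant only joins at depth 2, through NexusAI.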
--- a/packages/memory-service/src/index.js
+++ b/packages/memory-service/src/index.js
@@ -1,16 +1,18 @@
 require ('dotenv').config();
 const express = require('express');
-const {getEnv, PORTS, EPISODIC} = require('@nexusai/shared');
+const {getEnv, PORTS, EPISODIC, logger} = require('@nexusai/shared');
 const { getDB } = require('./db');
 const { createProject, getProjects, getProject, updateProject, deleteProject } = require('./db/projects');
 const { createSummary, getSummary, getSummariesBySession, getSummariesByProject, updateSummary, deleteSummary } = require('./db/summaries');
+const { generateAndStoreProjectSummary } = require('./summarization/project');
+const graph = require('./graph');

 const episodic = require('./episodic');
 const semantic = require('./semantic');
 const entities = require('./entities');

 const app = express();
-app.use(express.json());
+app.use(express.json({ limit: '2mb' }));

 const  PORT = getEnv('PORT', PORTS.MEMORY);

@@ -18,8 +20,8 @@ const  PORT = getEnv('PORT', PORTS.MEMORY);
 const db = getDB();

 semantic.initCollections()
-    .then(() => console.log(`QDrant collections ready`))
-    .catch(err => console.error(`QDrant initialization error:`, err.message));
+    .then(() => logger.info(`QDrant collections ready`))
+    .catch(err => logger.error(`QDrant initialization error:`, err.message));

 // Health check endpoint
 app.get('/health', (req, res) => {
@@ -79,13 +81,11 @@ app.patch('/sessions/by-external/:externalId', (req, res) => {
    const session = episodic.updateSessionByExternalId(req.params.externalId, {name, projectId });
    res.json(session);
  } catch (err) {
-    res.status(500).json({error: err.message });
+    res.status(500).json({ error: 'Failed to update session', detail: err.message });
  }
 });

-
-
-// Updates the session's updated_at timestamp to now
+// Deletes a session and all associated episodes
 app.delete('/sessions/by-external/:externalId', (req, res) => {
  episodic.deleteSessionByExternalId(req.params.externalId);
  res.status(204).send();
@@ -97,18 +97,11 @@ app.delete('/sessions/by-external/:externalId', (req, res) => {
 /************************************* */

 app.post('/episodes', async (req, res) => {
-  const { sessionId, userMessage, aiResponse, tokenCount, metadata, projectId } = req.body;
+  const { sessionId, userMessage, aiResponse, tokenCount, projectId } = req.body;
  if (!sessionId || !userMessage || !aiResponse) {
    return res.status(400).json({ error: 'sessionId, userMessage and aiResponse are required' });
  }
-  const episode = await episodic.createEpisode(sessionId, userMessage, aiResponse, tokenCount, metadata, projectId);
-
-  console.log('[memory] create episode body:', {
-    sessionId,
-    userMessageLength: userMessage?.length,
-    aiResponseLength: aiResponse?.length,
-    tokenCount
-  });
+  const episode = await episodic.createEpisode(sessionId, userMessage, aiResponse, tokenCount, projectId);

  res.status(201).json(episode);
 });
@@ -138,10 +131,12 @@ app.get('/episodes', (req, res) => {

 // Search MUST come before /:id — otherwise 'search' gets captured as an id
 app.get('/episodes/search', (req, res) => {
-  const { q, limit = EPISODIC.DEFAULT_PAGE_SIZE } = req.query;
+  const { q, limit = EPISODIC.DEFAULT_PAGE_SIZE, sessionIds } = req.query;
  if (!q) return res.status(400).json({ error: 'q (query) parameter is required' });
-  const results = episodic.searchEpisodes(q, Number(limit));
-  res.json(results);
+  const parsedSessionIds = sessionIds
+    ? sessionIds.split(',').map(Number).filter(Boolean)
+    : null;
+  res.json(episodic.searchEpisodes(q, Number(limit), parsedSessionIds));
 });

 app.get('/episodes/:id', (req, res) => {
@@ -166,7 +161,7 @@ app.delete('/episodes/:id', (req, res) => {
  episodic.deleteEpisode(id);

  semantic.deleteEpisode(id)  // fire-and-forget
-    .catch(err => console.error(`[Memory] Qdrant delete failed for episode ${id}:`, err.message));
+    .catch(err => logger.error(`[Memory] Qdrant delete failed for episode ${id}:`, err.message));

  res.status(204).send();
 });
@@ -210,17 +205,17 @@ app.delete('/entities/:id', (req, res) => {

 // Upsert a relationship between two entities
 app.post('/relationships', (req, res) => {
-  const {fromId, toId, label, metadata } = req.body;
+  const { fromId, toId, label, notes, metadata } = req.body;
  if (!fromId || !toId || !label) {
    return res.status(400).json({ error: 'fromId, toId and label are required' });
  }
-  const relationship = entities.upsertRelationship(fromId, toId, label, metadata);
+  const relationship = entities.upsertRelationship(fromId, toId, label, notes, metadata);
  res.status(201).json(relationship);
 });

 // Get all relationships for a given entity ID
 app.get('/entities/:id/relationships', (req, res) => {
-  res.json(entities.getRelationshipsByEntity(req.params.id));
+  res.json(entities.getOutboundRelationships(req.params.id));
 });

 // Delete a specific relationship
@@ -233,6 +228,37 @@ app.delete('/relationships', (req, res) => {
  res.status(204).send();
 })

+/********************************* */
+/********** Graph Routes ********** */
+/********************************* */
+
+// Single-entity neighborhood — depth defaults to ENTITIES.GRAPH_HOP_DEPTH
+app.get('/graph/neighborhood/:entityId', (req, res) => {
+    const entity = entities.getEntity(req.params.entityId);
+    if (!entity) return res.status(404).json({ error: 'Entity not found' });
+
+    const depth = req.query.depth ? Math.min(Number(req.query.depth), 3) : undefined;
+    const neighborhood = graph.getNeighborhood(Number(req.params.entityId), depth);
+    res.json({ entity, neighborhood });
+});
+
+// Bulk 1-hop neighborhood — body: { entityIds: [...] }
+app.post('/graph/neighbors', (req, res) => {
+    const { entityIds } = req.body;
+    if (!Array.isArray(entityIds) || entityIds.length === 0) {
+        return res.status(400).json({ error: 'entityIds array is required' });
+    }
+    res.json(graph.getEntityNeighbors(entityIds.map(Number)));
+});
+
+app.post('/episodes/by-entities', (req, res) => {
+    const { entityIds } = req.body;
+    if (!Array.isArray(entityIds) || entityIds.length === 0) {
+        return res.status(400).json({ error: 'entityIds array is required' });
+    }
+    res.json({ episodeIds: graph.getEpisodeIdsByEntities(entityIds.map(Number)) });
+});
+
 /*********************************** */
 /********** Project Routes ********** */
 /*********************************** */
@@ -243,7 +269,7 @@ app.post('/projects', (req, res) => {
  try {
    res.status(201).json(createProject({ name: name.trim(), description, colour, icon }));
  } catch (err) {
-    res.status(500).json({ error: err.message });
+    res.status(500).json({ error: 'Failed to create project', detail: err.message });
  }
 });

@@ -251,6 +277,35 @@ app.get('/projects', (req, res) => {
  res.json(getProjects());
 });

+// Generate (or regenerate) a project overview summary on demand
+app.post('/projects/:id/summarize', async (req, res) => {
+    const project = getProject(Number(req.params.id));
+    if (!project) return res.status(404).json({ error: 'Project not found' });
+
+    try {
+        const summary = await generateAndStoreProjectSummary(Number(req.params.id));
+        res.status(201).json(summary);
+    } catch (err) {
+        if (err.message.includes('No session summaries or episodes')) {
+            return res.status(422).json({ error: err.message });
+        }
+        res.status(500).json({ error: 'Failed to generate project summary', detail: err.message });
+    }
+});
+
+// Get the current project overview summary
+app.get('/projects/:id/overview', async (req, res) => {
+    const { getProjectOverviewSummary } = require('./db/summaries');
+    const summary = getProjectOverviewSummary(Number(req.params.id));
+    // 200 with null is fine — frontend can handle "no overview yet" gracefully
+    res.json(summary ?? null);
+});
+
+// Get summaries for a project
+app.get('/projects/:id/summaries', (req, res) => {
+    res.json(getSummariesByProject(req.params.id));
+});
+
 app.get('/projects/:id', (req, res) => {
  const project = getProject(req.params.id);
  if (!project) return res.status(404).json({ error: 'Not found' });
@@ -271,6 +326,10 @@ app.delete('/projects/:id', (req, res) => {
 });


+
+
+
+
 /*********************************** */
 /********** Summary Routes ********** */
 /*********************************** */
@@ -285,7 +344,7 @@ app.post('/summaries', (req, res) => {
        const summary = createSummary({ sessionId, projectId, content, tokenCount, episodeRange, metadata });
        res.status(201).json(summary);
    } catch (err) {
-        res.status(500).json({ error: err.message });
+        res.status(500).json({ error: 'Failed to create summary', detail: err.message });
    }
 });

@@ -294,11 +353,6 @@ app.get('/sessions/:id/summaries', (req, res) => {
    res.json(getSummariesBySession(req.params.id));
 });

-// Get summaries for a project
-app.get('/projects/:id/summaries', (req, res) => {
-    res.json(getSummariesByProject(req.params.id));
-});
-
 // Update a summary (for cumulative updates)
 app.patch('/summaries/:id', (req, res) => {
    const summary = getSummary(req.params.id);
@@ -318,5 +372,5 @@ app.delete('/summaries/:id', (req, res) => {
 /********** Start Server ********** */
 /********************************** */
 app.listen(PORT, () => {
-    console.log(`Memory Service is running on port ${PORT}`);
+    logger.info(`Memory Service is running on port ${PORT}`);
 });
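The `sessionIds` filter added to `GET /episodes/search` supplies the keyword half of the fused episode retrieval described in step 5 of the orchestration-service CLAUDE.md. In that step, Qdrant and FTS5 results are merged with Reciprocal Rank Fusion. `getFusedEpisodes` is not part of this diff, so what follows is a generic weighted RRF sketch rather than its implementation. `reciprocalRankFusion` and `excludeRecent` are hypothetical names, and `k = 60` is the constant commonly used for RRF. Treating `keywordWeight` as a multiplier on the keyword list's contribution is an assumption.

```javascript
// Generic weighted Reciprocal Rank Fusion over ranked lists of episode IDs.
// Hypothetical sketch: k = 60 and the per-list weights are assumptions, not values from this diff.
function reciprocalRankFusion(rankedLists, { k = 60, limit = 5 } = {}) {
    const scores = new Map();
    for (const { ids, weight = 1 } of rankedLists) {
        if (weight === 0) continue; // analogous to skipping the FTS call when keywordWeight is 0
        ids.forEach((id, index) => {
            const rank = index + 1; // RRF ranks are 1-based
            scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
        });
    }
    return [...scores.entries()]
        .sort((a, b) => b[1] - a[1])
        .slice(0, limit)
        .map(([id]) => id);
}

// Drop episodes already included as recent context before fusing (the recentIds filter).
function excludeRecent(ids, recentIds) {
    const recent = new Set(recentIds);
    return ids.filter(id => !recent.has(id));
}

const recentIds = [42];
const semanticIds = excludeRecent([7, 42, 3, 9], recentIds); // Qdrant order
const keywordIds = excludeRecent([3, 11, 7], recentIds);     // FTS5 order

const fused = reciprocalRankFusion(
    [{ ids: semanticIds, weight: 1 }, { ids: keywordIds, weight: 1 }],
    { limit: 3 },
);
console.log(fused); // [ 3, 7, 11 ]
```

RRF only looks at ranks, so Qdrant cosine scores and FTS5 `rank` values never have to be normalized onto a common scale. An episode found by both searches accumulates two contributions. Here episode 3 (second in Qdrant, first in FTS5) edges out episode 7 (first in Qdrant, third in FTS5).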
--- a/packages/memory-service/src/semantic/index.js
+++ b/packages/memory-service/src/semantic/index.js
@@ -1,5 +1,5 @@
 const {QdrantClient} = require('@qdrant/js-client-rest');
-const {QDRANT, COLLECTIONS, getEnv} = require('@nexusai/shared');
+const {QDRANT, COLLECTIONS, getEnv, logger} = require('@nexusai/shared');

 let client;

@@ -24,9 +24,9 @@ async function initCollections() {
            distance: QDRANT.DISTANCE_METRIC
        }
      });
-      console.log(`Created Qdrant collection: ${name}`);
+      logger.info(`Created Qdrant collection: ${name}`);
    } else {
-      console.log(`Qdrant collection already exists: ${name}`);
+      logger.info(`Qdrant collection already exists: ${name}`);
    }
  }
 }
--- a/packages/memory-service/src/summarization/project.js
+++ b/packages/memory-service/src/summarization/project.js
@@ -0,0 +1,144 @@
+const { getEnv, SUMMARIES } = require('@nexusai/shared');
+const {
+    getSessionSummariesForProject,
+    getProjectOverviewSummary,
+    createSummary,
+    updateSummary,
+} = require('../db/summaries');
+const { getEpisodesByProject } = require('../episodic');
+const { getProject } = require('../db/projects');
+
+const EXTRACTION_URL   = getEnv('EXTRACTION_URL', 'http://localhost:11434');
+const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b');
+
+const MAX_SUMMARY_CHARS = SUMMARIES.MAX_SUMMARY_CHARS; // generous ceiling before we truncate input
+
+function buildProjectSummaryPrompt(projectName, sessionSummaries) {
+    let summaryBlock = sessionSummaries
+        .map((s, i) => `Session ${i + 1}:\n${s.content}`)
+        .join('\n\n');
+
+    // Guard against very large inputs — truncate oldest sessions if needed
+    if (summaryBlock.length > MAX_SUMMARY_CHARS) {
+        summaryBlock = summaryBlock.slice(-MAX_SUMMARY_CHARS);
+    }
+
+    return [
+        '<|im_start|>user',
+        `The following are session summaries from a project called "${projectName}".`,
+        'Write a project overview covering: goals, progress, key decisions, and current state.',
+        'Scale the length to the material — use multiple paragraphs for complex projects, a few sentences for simple ones.',
+        'Be comprehensive but avoid padding. Do not repeat the same point twice.',
+        'Write in third person. Output only the overview text, no headings or labels.',
+        '',
+        summaryBlock,
+        '<|im_end|>',
+        '<|im_start|>assistant',
+    ].join('\n');
+}
+
+function buildProjectSummaryFromEpisodesPrompt(projectName, episodes) {
+    // Condense episodes into a readable block, truncating if needed
+    let episodeBlock = episodes
+        .map(ep => `User: ${ep.user_message}\nAssistant: ${ep.ai_response}`)
+        .join('\n\n');
+
+    if (episodeBlock.length > MAX_SUMMARY_CHARS) {
+        // Keep the most recent episodes — slice from the end
+        episodeBlock = episodeBlock.slice(-MAX_SUMMARY_CHARS);
+    }
+
+    return [
+        '<|im_start|>user',
+        `The following are conversations from a project called "${projectName}".`,
+        'Write a project overview covering: goals, progress, key decisions, and current state.',
+        'Scale the length to the material — use multiple paragraphs for complex projects, a few sentences for simple ones.',
+        'Be comprehensive but avoid padding. Do not repeat the same point twice.',
+        'Write in third person. Output only the overview text, no headings or labels.',
+        '',
+        episodeBlock,
+        '<|im_end|>',
+        '<|im_start|>assistant',
+    ].join('\n');
+}
+
+async function generateProjectSummaryFromEpisodes(projectName, episodes) {
+    const prompt = buildProjectSummaryFromEpisodesPrompt(projectName, episodes);
+
+    const res = await fetch(`${EXTRACTION_URL}/api/generate`, {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({
+            model: EXTRACTION_MODEL,
+            prompt,
+            stream: false,
+            options: { temperature: 0.2, num_predict: 1200 },
+        }),
+    });
+
+    if (!res.ok) throw new Error(`Ollama responded ${res.status}`);
+    const data = await res.json();
+
+    const raw = data.response?.trim() ?? '';
+    return raw
+        .replace(/<\|im_start\|>.*?<\|im_end\|>/gs, '')
+        .replace(/<\|im_start\|>|<\|im_end\|>|<\|im_sep\|>/g, '')
+        .trim();
+}
+
+async function generateProjectSummary(projectName, sessionSummaries) {
+    const prompt = buildProjectSummaryPrompt(projectName, sessionSummaries);
+
+    const res = await fetch(`${EXTRACTION_URL}/api/generate`, {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({
+            model: EXTRACTION_MODEL,
+            prompt,
+            stream: false,
+            // No format: 'json' — we want free-text narrative, same as session summarization
+            options: { temperature: 0.2, num_predict: 1200 },
+        }),
+    });
+
+    if (!res.ok) throw new Error(`Ollama responded ${res.status}`);
+    const data = await res.json();
+
+    const raw = data.response?.trim() ?? '';
+    return raw
+        .replace(/<\|im_start\|>.*?<\|im_end\|>/gs, '')
+        .replace(/<\|im_start\|>|<\|im_end\|>|<\|im_sep\|>/g, '')
+        .trim();
+}
+
+// Main entry point — called by the route handler
+async function generateAndStoreProjectSummary(projectId) {
+    const project = getProject(projectId);
+    if (!project) throw new Error('Project not found');
+
+    let content;
+    const sessionSummaries = getSessionSummariesForProject(projectId);
+
+    if (sessionSummaries.length > 0) {
+        // Preferred path — summarize the summaries
+        content = await generateProjectSummary(project.name, sessionSummaries);
+    } else {
+        // Fallback — summarize raw episodes directly
+        const episodes = getEpisodesByProject(projectId);
+        if (!episodes.length) {
+            throw new Error('No session summaries or episodes found for this project');
+        }
+        content = await generateProjectSummaryFromEpisodes(project.name, episodes);
+    }
+
+    if (!content) throw new Error('Model returned empty summary');
+
+    const existing = getProjectOverviewSummary(projectId);
+    if (existing) {
+        return updateSummary(existing.id, { content, tokenCount: null, episodeRange: null });
+    } else {
+        return createSummary({ projectId, content, sessionId: null });
+    }
+}
+
+module.exports = { generateAndStoreProjectSummary };
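`buildProjectSummaryFromEpisodesPrompt` hand-assembles a ChatML-style prompt: a single `<|im_start|>user ... <|im_end|>` turn followed by an open `<|im_start|>assistant` turn. It keeps only the last `MAX_SUMMARY_CHARS` characters of input, and the generate functions strip leaked control tokens from the reply. The helpers below factor those three steps into standalone functions. `toChatMlPrompt`, `keepTail` and `stripChatMlTokens` are hypothetical names; the two regexes are copied from the diff.

```javascript
// Hypothetical helpers restating the prompt handling in summarization/project.js.

// Wrap instructions plus material in one ChatML user turn, then open the assistant turn.
function toChatMlPrompt(instructionLines, material) {
    return [
        '<|im_start|>user',
        ...instructionLines,
        '',
        material,
        '<|im_end|>',
        '<|im_start|>assistant',
    ].join('\n');
}

// Keep only the newest maxChars characters; older material is cut from the front.
function keepTail(text, maxChars) {
    return text.length > maxChars ? text.slice(-maxChars) : text;
}

// Same two regexes as the diff: drop complete echoed turns, then any stray control tokens.
function stripChatMlTokens(raw) {
    return raw
        .replace(/<\|im_start\|>.*?<\|im_end\|>/gs, '')
        .replace(/<\|im_start\|>|<\|im_end\|>|<\|im_sep\|>/g, '')
        .trim();
}

const prompt = toChatMlPrompt(
    ['Write a project overview covering: goals, progress, key decisions, and current state.'],
    keepTail('Session 1:\nSet up Qdrant.\n\nSession 2:\nAdded RRF fusion.', 30),
);
console.log(prompt.endsWith('<|im_start|>assistant')); // true
console.log(stripChatMlTokens('The project adds hybrid retrieval.<|im_end|>'));
// The project adds hybrid retrieval.
```

One detail may be worth checking. Ollama's `/api/generate` applies the model's own prompt template unless the request sets `raw: true`. A hand-templated prompt like this one may therefore end up nested inside a second template, which could explain why the output needs token stripping at all. Also note that `keepTail` cuts by characters, so the oldest surviving session can start mid-sentence.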
--- a/packages/orchestration-service/CLAUDE.md
+++ b/packages/orchestration-service/CLAUDE.md
@@ -0,0 +1,156 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and the end-to-end chat flow.
+
+## Running This Service
+
+```bash
+npm run orchestration             # From repo root (node src/index.js)
+npm -w packages/orchestration-service run dev   # With --watch
+```
+
+Default port: **4000**. Depends on memory-service, embedding-service, inference-service, and Qdrant.
+
+## Context Assembly (`src/chat/index.js`)
+
+`assembleContext(externalId, userMessage)` is the core function that builds the inference prompt. Order of operations:
+
+1. Resolve session by `externalId` (creates it if missing — every chat call is self-healing).
+2. If session has a `project_id`, load the project and fetch all sibling sessions (via `getProjectSessions`, hardcoded `limit=200`).
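+3. Fetch `recentEpisodeLimit` recent episodes from memory-service; their IDs form `recentIds`.
+4. Run two independent lookups in parallel (`Promise.all`):
+   - **Fused episode retrieval** via `getFusedEpisodes`. Qdrant semantic search over EPISODES (with `scoreThreshold`) and FTS5 keyword search run in parallel, both filtered against `recentIds`, then merged via Reciprocal Rank Fusion (RRF). Semantic search is scoped with `must: [sessionId == this session]` when there is no project, or `should: [sessionId == s1, sessionId == s2, ...]` across all project sessions. If `keywordWeight` is `0`, the FTS call is skipped. Returns the top `semanticLimit` episodes with their fused scores.
+   - **Entity search**: embed the user message and search Qdrant ENTITIES (filtered by `projectId` if in a project). Returns entity IDs alongside payload; the Qdrant point ID equals the SQLite entity ID.
+5. Fetch the IDs of episodes linked to the matched entities (`memory.getEpisodesByEntities`). These episodes receive an `entityWeight` bonus in the next step. If the lookup fails, the bonus is skipped.
+6. Build a unified scored pool with `buildScoredPool`. Each episode's score is its fused score, plus a recency score of `1 / (RRF_K + i + 1)` for each recent episode (where `i` is its 0-based position), plus `entityWeight` for entity-linked episodes already in the pool. `selectWithinBudget` then always keeps the `minRecentEpisodes` most recent episodes. It fills the rest of `contextBudget` (estimated tokens) from the pool in score order and stops at the first episode that does not fit. Both groups are sorted chronologically.
+7. Expand matched entities into a 1-hop graph neighborhood via `POST /graph/neighbors` on the memory-service. This returns `{ nodes, edges }`: the full entity objects plus connecting relationships. It falls back to the flat entity list (no edges) if the graph call fails, and it is skipped when no entities matched.
+8. Build prompt in this fixed order: **system prompt → graph context → budget-selected episodes → guaranteed recent episodes → user message → "Assistant:"**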
+
+The ordering prioritizes established facts (graph context) and relevant past context (semantic) over pure recency.
+
+## Graph Context Format
+
+`formatGraphContext(nodes, edges)` in `src/chat/index.js` formats the neighborhood as:
+
+```
+- Alice (person): software engineer working on NexusAI
+  → works_on NexusAI (project)
+  → knows Bob (person)
+- NexusAI (project): AI assistant framework
+- Bob (person): Alice's colleague
+```
+
+Each node's first line shows its name, type, and notes (`(no notes)` when notes are missing). Outbound edges are indented below it as `→ label target (type)`, but only when the target is also in the node set. Nodes with only inbound edges (neighbors pulled in by traversal) appear without connection lines.
+
+## System Prompt Resolution
+
+Priority from highest to lowest:
+1. `project.system_prompt` (stored on the project row in memory-service)
+2. `settings.systemPrompt` (saved in `data/settings.json`)
+3. `ORCHESTRATION.SYSTEM_PROMPT` (shared constants fallback)
+
+## Settings (`src/config/settings.js`)
+
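+Settings are read from `data/settings.json` and merged with defaults on every `GET /settings` call. They are read again at the start of every chat turn (`assembleContext` calls `appSettings.load()`), so saved changes apply from the next message. `PATCH /settings` validates each field individually with specific constraints: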
+
+| Field | Constraint |
+|---|---|
+| `recentEpisodeLimit` | integer, 1–20 |
+| `semanticLimit` | integer, 1–20 |
+| `scoreThreshold` | number, 0–1 |
+| `temperature` | number, 0–2 |
+| `repeatPenalty` | number, 1–2 |
+| `topP` | number, 0–1 |
+| `topK` | integer, 1–100 |
+| `modelsFolderPath` | path must exist and be readable |
+| `systemPrompt` | string (trimmed); `null` reverts to shared default |
+
+`data/settings.json` is created on first save. Parent directories are created if missing.
+
+## Streaming SSE (`src/chat/index.js` — `chatStream`)
+
+The route sets SSE headers and delegates to `chatStream`, which:
+1. Calls `inference.completeStream()` → receives a raw HTTP Response with a readable body.
+2. Reads the body in chunks, buffers across chunk boundaries, splits on `\n\n`.
+3. For each event line starting with `data: `, parses the JSON and calls `onChunk(data.response)`.
+4. The `[DONE]` sentinel (used by some llama-server versions) is explicitly ignored.
+5. After stream ends, saves the assembled full response as an episode (same as non-streaming).
+
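+If an event fails to parse, the error is logged and the stream continues. Events carrying an `error` field are handled the same way: the `throw` sits inside the per-event `try`, so the error is caught and logged rather than aborting the stream. If the response body closes with no text accumulated, the episode is not saved (logged as a warning).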
+
+## Fire-and-Forget Tasks
+
+After every successful chat turn:
+- **Summarization** (`services/summarization.js` → `triggerSummary`): checks token threshold → recency guard → calls Ollama → POSTs to memory-service. Only runs if `SUMMARIES.THRESHOLD_TOKENS` is exceeded AND at least `SUMMARIES.MIN_EPISODES_SINCE` new episodes have occurred since the last summary.
+- **Auto-naming** (`chat/index.js` → `autoNameSession`): only fires on the first message of a session. Uses temp 0.3, `maxTokens=20`, prompts for a ≤5-word title.
+
+Both tasks catch all errors and log warnings without surfacing to the client.
+
+## Summarization Recency Guard
+
+`src/services/summarization.js` reads the `episode_range` field of the latest existing summary (format: `"<startId>-<endId>"`). It counts SQLite episodes with `id > endId`; if fewer than `SUMMARIES.MIN_EPISODES_SINCE`, it skips. This prevents rapid re-summarization on high-traffic sessions.
+
+When the existing summary's token count exceeds `SUMMARIES.MAX_SUMMARY_TOKENS`, it is treated as "expired" — a fresh summary is generated instead of an incremental update.
+
+## Qdrant Calls (Direct, Not Via Memory-Service)
+
+`src/services/qdrant.js` makes REST calls to Qdrant directly at `QDRANT_URL`. This bypasses memory-service for semantic search performance. Orchestration fetches episode/entity content from memory-service by ID *after* getting vector search results from Qdrant.
+
+`searchEntities` checks `projectId !== null && projectId !== undefined` before applying the filter — a session with no project skips the filter entirely and searches globally.
+
+## Retrieval Fusion (`src/chat/index.js`)
+
+Three functions handle fusion. `fuseEpisodeResults` is pure, and the other two are thin async wrappers. Retrieval failures are logged and treated as empty results rather than failing the turn:
+
+- **`getFTSResults(userMessage, { limit, sessionIds })`**: calls `memory.searchEpisodes`; returns `[]` and logs a warning on failure
+- **`fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit })`**: pure RRF implementation that returns `{ episode, score }[]` sorted by fused score and truncated to `limit` (the scores feed `buildScoredPool`). Key guard: FTS-only episodes are added to the scores Map only if `contrib > 0`, which prevents score-0 bleed-through when `keywordWeight` is `0`
+- **`getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, settings)`**: runs both paths with `Promise.all` and requests `semanticLimit * 2` FTS hits to leave headroom for overlap with the semantic results. It then drops FTS hits already in `recentIds` and calls fusion. The FTS call is skipped entirely unless `keywordWeight > 0`
+
+FTS is scoped to `projectSessionIds` if in a project, otherwise `[session.id]` — mirrors Qdrant scoping exactly.
+
+> For RRF formula, weight semantics, and enabling keyword search, see `docs/services/retrieval-fusion.md`.
+
+## Graph Service Client (`src/services/graph.js`)
+
+Thin HTTP client for memory-service graph endpoints. One function:
+
+- **`getNeighbors(entityIds[])`** — POSTs to `memory-service/graph/neighbors` with the entity IDs from Qdrant entity search. Returns `{ nodes, edges }`. Throws on non-2xx — caller wraps in try/catch with graceful fallback.
+
+## Models Endpoint
+
+`GET /models` scans `modelsFolderPath` for `.gguf` files and optionally reads a `models.json` manifest (keyed by filename) for labels and descriptions. File size is reported in GB. Returns 500 if the folder is inaccessible.
+
+`GET /models/props` proxies `/props` from llama-server and returns `{contextWindow, modelAlias}`. Returns 503 if llama-server is unreachable.
+
+## Health Check
+
+`GET /health/services` runs parallel fetch calls to all four dependent services with a 3-second `AbortSignal.timeout` each. Results are returned as an array — the endpoint never returns a non-2xx itself regardless of downstream status.
+
+## Background Model (qwen2.5:3b)
+
+Used for entity/relationship extraction and summarization via Ollama on Mini PC 1. Uses **ChatML format** (`<|im_start|>` / `<|im_end|>`), not Phi3 format. Use `format: 'json'` only for structured extraction, never for free-text summarization.
+
+## API Endpoints Quick Reference
+
+| Method | Path | Notes |
+|---|---|---|
+| GET | `/health` | Returns service URLs |
+| GET | `/health/services` | Parallel status of all dependencies |
+| POST | `/chat` | Blocking completion |
+| POST | `/chat/stream` | SSE streaming |
+| GET/PATCH | `/settings` | Persistent settings |
+| GET | `/models` | `.gguf` file scan |
+| GET | `/models/props` | llama-server model info |
+| GET | `/sessions` | Delegates to memory-service |
+| GET | `/sessions/:sessionId/history` | Paginated episodes by external ID |
+| PATCH | `/sessions/:sessionId` | `name` and/or `projectId` |
+| DELETE | `/sessions/:sessionId` | |
+| GET | `/episodes` | Delegates; supports `q` for FTS |
+| DELETE | `/episodes/:id` | Delegates |
+| GET/POST/PATCH/DELETE | `/projects` and `/projects/:id` | Delegates |
+| POST | `/summaries/project/:projectId/generate` | On-demand; 422 if no data |
+| GET | `/summaries/project/:projectId/overview` | |
+| GET | `/summaries/session/:sessionId` | Resolves external ID first |
+| GET | `/summaries/project/:projectId` | |
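You can check the weighted RRF scoring described under *Retrieval Fusion* above by hand with a standalone sketch. It copies the logic of `fuseEpisodeResults` into a hypothetical `fuse` function. `RRF_K = 60` is an assumption: 60 is a common choice, but this diff does not show the actual `RETRIEVAL.RRF_K` value. The toy episodes carry only an `id`.

```javascript
// Assumed value standing in for RETRIEVAL.RRF_K (not shown in this diff).
const RRF_K = 60;

// Weighted Reciprocal Rank Fusion: each list contributes weight / (k + rank),
// with rank counted from 1. Scores for an episode found by both lists are summed.
function fuse(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
  const scores = new Map(); // episode.id -> { episode, score }

  semanticEps.forEach((ep, i) => {
    scores.set(ep.id, { episode: ep, score: semanticWeight / (RRF_K + i + 1) });
  });

  keywordEps.forEach((ep, i) => {
    const contrib = keywordWeight / (RRF_K + i + 1);
    if (scores.has(ep.id)) {
      scores.get(ep.id).score += contrib;
    } else if (contrib > 0) {
      // Same guard as fuseEpisodeResults: zero-weight keyword-only hits never enter.
      scores.set(ep.id, { episode: ep, score: contrib });
    }
  });

  return [...scores.values()].sort((a, b) => b.score - a.score).slice(0, limit);
}

const semantic = [{ id: 1 }, { id: 2 }, { id: 3 }]; // ranked best-first
const keyword = [{ id: 3 }, { id: 4 }];

const both = fuse(semantic, keyword, { semanticWeight: 1, keywordWeight: 1, limit: 5 });
console.log(both.map((r) => r.episode.id)); // [ 3, 1, 2, 4 ]

const semanticOnly = fuse(semantic, keyword, { semanticWeight: 1, keywordWeight: 0, limit: 5 });
console.log(semanticOnly.map((r) => r.episode.id)); // [ 1, 2, 3 ]
```

Episode 3 ranks only third semantically, but it wins overall because both retrievers return it. That additive overlap is the main reason to fuse rather than concatenate. Episodes 2 and 4 tie at 1/62, and insertion order breaks the tie because `Array.prototype.sort` is stable.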
--- a/packages/orchestration-service/src/chat/index.js
+++ b/packages/orchestration-service/src/chat/index.js
@@ -2,34 +2,32 @@ const memory = require("../services/memory");
 const inference = require("../services/inference");
 const embedding = require("../services/embedding");
 const qdrant = require("../services/qdrant");
-const { ORCHESTRATION } = require("@nexusai/shared");
+const { ORCHESTRATION, RETRIEVAL, logger } = require("@nexusai/shared");
 const appSettings = require("../config/settings");
 const {triggerSummary} = require('../services/summarization')
+const graph = require('../services/graph');

-function buildPrompt(recentEpisodes, semanticEpisodes, entities, userMessage, systemPrompt) {
+function buildPrompt(guaranteed, selected, neighborhood, userMessage, systemPrompt) {
    const parts = [systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT];

-  if (entities.length > 0) {
-    parts.push(
-      "Here is what you know about entities relevant to this conversation:",
-    );
-    for (const e of entities) {
-      parts.push(`- ${e.name} (${e.type}): ${e.notes}`);
-    }
+    const graphText = formatGraphContext(neighborhood.nodes ?? [], neighborhood.edges ?? []);
+    if (graphText) {
+        parts.push("Here is what you know about entities relevant to this conversation and their connections:");
+        parts.push(graphText);
        parts.push("---");
    }

-  if (semanticEpisodes.length > 0) {
-    parts.push("Here are some relevant memories from earlier conversations:");
-    for (const ep of semanticEpisodes) {
+  if (selected.length > 0) {
+    parts.push("Relevant memories from earlier conversations:");
+    for (const ep of selected) {
      parts.push(`User: ${ep.user_message}\nAssistant: ${ep.ai_response}`);
    }
    parts.push("---");
  }

-  if (recentEpisodes.length > 0) {
-    parts.push(`Here are some relevant memories from your past conversations:`);
-    for (const ep of recentEpisodes) {
+  if (guaranteed.length > 0) {
+    parts.push("Recent conversation history (most recent exchanges):");
+    for (const ep of guaranteed) {
      parts.push(`User: ${ep.user_message}\nAssistant: ${ep.ai_response}`);
    }
    parts.push("--- End of recent memories ---\n");
@@ -54,6 +52,28 @@ function buildNamingPrompt(userMessage, aiResponse) {
  ].join("\n");
 }

+function formatGraphContext(nodes, edges) {
+    if (!nodes.length) return null;
+
+    const nodeMap = new Map(nodes.map(n => [n.id, n]));
+
+    // Build outbound adjacency
+    const outbound = new Map(nodes.map(n => [n.id, []]));
+    for (const edge of edges) {
+        if (outbound.has(edge.from_id) && nodeMap.has(edge.to_id)) {
+            const target = nodeMap.get(edge.to_id);
+            outbound.get(edge.from_id).push(`${edge.label} ${target.name} (${target.type})`);
+        }
+    }
+
+    return nodes.map(n => {
+        const lines = [`- ${n.name} (${n.type}): ${n.notes ?? '(no notes)'}`];
+        for (const conn of outbound.get(n.id) ?? []) lines.push(`  → ${conn}`);
+        return lines.join('\n');
+    }).join('\n');
+}
+
+
 async function autoNameSession(externalId, userMessage, aiResponse) {
  try {
    const prompt = buildNamingPrompt(userMessage, aiResponse);
@@ -64,12 +84,12 @@ async function autoNameSession(externalId, userMessage, aiResponse) {
    const name = result.text?.trim().replace(/^["']|["']$/g, ""); // strip any quotes the model adds
    if (name) {
      await memory.updateSession(externalId, { name });
-      console.log(
+      logger.info(
        `[orchestration] Auto-named session "${externalId}": "${name}"`,
      );
    }
  } catch (err) {
-    console.warn(
+    logger.warn(
      "[orchestration] Auto-naming failed (non-critical):",
      err.message,
    );
@@ -99,7 +119,7 @@ async function getSemanticEpisodes(
    );
    return fetched.filter(Boolean);
  } catch (err) {
-    console.warn(
+    logger.warn(
      `[orchestration] Semantic search failed, continuing without: `,
      err.message,
    );
@@ -107,31 +127,142 @@ async function getSemanticEpisodes(
  }
 }

-async function getRelevantEntities(userMessage, projectId=null) {
+async function getRelevantEntities(userMessage, projectId = null) {
    try {
        const vector = await embedding.embed(userMessage);
        const results = await qdrant.searchEntities(vector, { projectId });
-    console.log(
-      "[orchestration] Entity search results:",
+        logger.info(
+            '[orchestration] Entity search results:',
            results.map((r) => ({ name: r.payload?.name, score: r.score })),
        );
-    return results.map((r) => r.payload).filter(Boolean);
+        // Include the Qdrant point ID (== SQLite entity ID) for graph traversal
+        return results.map((r) => r.payload ? { id: r.id, ...r.payload } : null).filter(Boolean);
    } catch (err) {
-    console.warn(
-      "[orchestration] Entity search failed, continuing without:",
-      err.message,
-    );
+        logger.debug('[orchestration] Entity search failed, continuing without:', err.message);
        return [];
    }
 }

-async function chat(externalId, userMessage, options = {}) {
-  const { recentEpisodeLimit, semanticLimit, scoreThreshold, temperature, repeatPenalty, topP, topK, systemPrompt} =
-    appSettings.load();
+async function getFTSResults(userMessage, { limit, sessionIds }) {
+    try {
+        return await memory.searchEpisodes(userMessage, { limit, sessionIds });
+    } catch (err) {
+        logger.warn('[orchestration] FTS search failed, continuing without:', err.message);
+        return [];
+    }
+}
+
+// Returns {episode, score}[] — scores needed for buildScoredPool downstream
+function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
+    const k = RETRIEVAL.RRF_K;
+    const scores = new Map();
+
+    semanticEps.forEach((ep, i) => {
+        scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
+    });
+
+    keywordEps.forEach((ep, i) => {
+        const contrib = keywordWeight / (k + i + 1);
+        if (scores.has(ep.id)) {
+            scores.get(ep.id).score += contrib;
+        } else if (contrib > 0) {
+            scores.set(ep.id, { episode: ep, score: contrib });
+        }
+    });
+
+    return [...scores.values()]
+        .sort((a, b) => b.score - a.score)
+        .slice(0, limit);
+
+}
+
+function estimateTokens(episode) {
+    return episode.token_count
+        ?? Math.ceil((episode.user_message.length + episode.ai_response.length) / 4);
+}
+
+function buildScoredPool(fusedWithScores, recentEpisodes, entityBoostedIds, { entityWeight }) {
+    const k = RETRIEVAL.RRF_K;
+    const pool = new Map(); // episode.id → {episode, score}
+
+    for (const { episode, score } of fusedWithScores) {
+        pool.set(episode.id, { episode, score });
+    }
+
+    recentEpisodes.forEach((ep, i) => {
+        const recencyScore = 1.0 / (k + i + 1);
+        if (pool.has(ep.id)) {
+            pool.get(ep.id).score += recencyScore;
+        } else {
+            pool.set(ep.id, { episode: ep, score: recencyScore });
+        }
+    });
+
+    for (const id of entityBoostedIds) {
+        if (pool.has(id)) pool.get(id).score += entityWeight;
+    }
+
+    return [...pool.values()].sort((a, b) => b.score - a.score);
+}
+
+function selectWithinBudget(scoredPool, contextBudget, minRecentEpisodes, recentEpisodes) {
+    let budget = contextBudget;
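+    // Works for numeric timestamps and ISO-8601 strings alike (plain subtraction yields NaN on strings)
+    const sortByTime = (a, b) => (a.created_at < b.created_at ? -1 : a.created_at > b.created_at ? 1 : 0);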
+
+    // Guarantee floor: always include the N most recent episodes
+    const guaranteed = recentEpisodes.slice(0, minRecentEpisodes);
+    const guaranteedIds = new Set(guaranteed.map(ep => ep.id));
+    for (const ep of guaranteed) budget -= estimateTokens(ep);
+
+    // Fill remaining budget from scored pool, highest-priority first
+    const selected = [];
+    for (const { episode } of scoredPool) {
+        if (guaranteedIds.has(episode.id)) continue;
+        const cost = estimateTokens(episode);
+
+        // Break rather than skip: once a higher-priority episode doesn't fit, don't back-fill with lower-priority ones
+        if (budget - cost < 0) break;
+        selected.push(episode);
+        budget -= cost;
+    }
+
+    return {
+        guaranteed: [...guaranteed].sort(sortByTime),
+        selected:   selected.sort(sortByTime),
+    };
+}
+
+
+async function getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, settings) {
+    const { semanticLimit, scoreThreshold, semanticWeight, keywordWeight } = settings;
+    const ftsSessionIds = projectSessionIds ?? [session.id];
+
+    const ftsPromise = keywordWeight > 0
+        // FTS and semantic results may overlap heavily, so fetching extra FTS hits leaves the fusion step more candidates after deduplication.
+        ? getFTSResults(userMessage, { limit: semanticLimit * 2, sessionIds: ftsSessionIds })
+        : Promise.resolve([]);
+
+    const [semanticEps, rawKeywordEps] = await Promise.all([
+        getSemanticEpisodes(userMessage, session.id, recentIds, projectSessionIds, { semanticLimit, scoreThreshold }),
+        ftsPromise,
+    ]);
+
+    const keywordEps = rawKeywordEps.filter(ep => !recentIds.has(ep.id));
+    return fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit: semanticLimit });
+}
+
+async function assembleContext(externalId, userMessage) {
+    const settings = appSettings.load();
+    const { recentEpisodeLimit, semanticLimit, scoreThreshold,
+            temperature, repeatPenalty, topP, topK, systemPrompt,
+            semanticWeight, keywordWeight,
+            contextBudget, entityWeight, minRecentEpisodes } = settings;
+
    // 1. Resolve or create session
    let session = await memory.getSessionByExternalId(externalId);
    if (!session) session = await memory.createSession(externalId);

+    // 2. Resolve project context
    let projectSessionIds = null;
    let activeSystemPrompt = systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT;
    if (session.project_id) {
@@ -139,73 +270,85 @@ async function chat(externalId, userMessage, options = {}) {
            const project = await memory.getProject(session.project_id);
            if (project) {
                const projectSessions = await memory.getProjectSessions(session.project_id);
-        if (project?.system_prompt) activeSystemPrompt = project.system_prompt;
-        projectSessionIds = projectSessions.map((s) => s.id);
+                if (project.system_prompt) activeSystemPrompt = project.system_prompt;
+                projectSessionIds = projectSessions.map(s => s.id);
            }
        } catch (err) {
-      console.warn(
-        "[orchestration] Failed to resolve project context:",
-        err.message,
-      );
+            logger.warn('[orchestration] Failed to resolve project context:', err.message);
        }
    }
-  // 2. Fetch recent episodes for context
-  const recentEpisodes = await memory.getRecentEpisodes(
-    session.id,
-    recentEpisodeLimit,
-  );
+
+    // 3. Fetch recent episodes
+    const recentEpisodes = await memory.getRecentEpisodes(session.id, recentEpisodeLimit);
    const isFirstMessage = recentEpisodes.length === 0;
-  const recentIds = new Set(recentEpisodes.map((e) => e.id));
+    const recentIds = new Set(recentEpisodes.map(e => e.id));

-  // 3. Semantic Search
-  const semanticEpisodes = await getSemanticEpisodes(
-    userMessage,
-    session.id,
-    recentIds,
-    projectSessionIds,
-    { semanticLimit, scoreThreshold },
-  );
+    // 4. Fused retrieval + entity search in parallel (both are independent)
+    const [fusedWithScores, entityResults] = await Promise.all([
+        getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, { semanticLimit, scoreThreshold, semanticWeight, keywordWeight }),
+        getRelevantEntities(userMessage, session.project_id ?? null),
+    ]);

-  // 3b. Entity Search
-  const entities = await getRelevantEntities(userMessage, session.project_id ?? null);
+    // 5. Entity-linked episode IDs for scoring bonus
+    const entityIds = entityResults.map(e => e.id);
+    let entityBoostedIds = new Set();
+    if (entityIds.length > 0) {
+        try {
+            const result = await memory.getEpisodesByEntities(entityIds);
+            entityBoostedIds = new Set(result.episodeIds);
+        } catch (err) {
+            logger.debug('[orchestration] Entity-episode lookup failed, skipping bonus:', err.message);
+        }
+    }

-  // 4. Assemble prompt
-  const prompt = buildPrompt(
-    recentEpisodes,
-    semanticEpisodes,
-    entities,
-    userMessage,
-    activeSystemPrompt,
-  );
+    // 6. Build unified scored pool and select within token budget
+    const scoredPool = buildScoredPool(fusedWithScores, recentEpisodes, entityBoostedIds, { entityWeight });
+    const { guaranteed, selected } = selectWithinBudget(scoredPool, contextBudget, minRecentEpisodes, recentEpisodes);

-  // 5. Run inference
-  const result = await inference.complete(prompt, {...options, temperature, repeatPenalty, topP, topK});
+    // 7. Graph neighborhood expansion
+    let neighborhood = { nodes: [], edges: [] };
+    if (entityIds.length > 0) {
+        try {
+            neighborhood = await graph.getNeighbors(entityIds);
+        } catch (err) {
+            logger.warn('[orchestration] Graph neighborhood fetch failed, falling back to flat entities:', err.message);
+            neighborhood = { nodes: entityResults, edges: [] };
+        }
+    }

-  // 6. Write episode back to memory
-  memory
-    .createEpisode(
-      session.id,
-      userMessage,
-      result.text,
+    // 8. Assemble prompt
+    const prompt = buildPrompt(guaranteed, selected, neighborhood, userMessage, activeSystemPrompt);
+
+    return {
+        session,
+        prompt,
+        isFirstMessage,
+        inferenceOptions: { temperature, repeatPenalty, topP, topK },
+    };
+}
+
+async function chat(externalId, userMessage, options = {}) {
+    const { session, prompt, isFirstMessage, inferenceOptions } = await assembleContext(externalId, userMessage);
+
+    const result = await inference.complete(prompt, { ...options, ...inferenceOptions });
+
+    try {
+        await memory.createEpisode(
+            session.id, userMessage, result.text,
            (result.evalCount || 0) + (result.promptEvalCount || 0),
            session.project_id ?? null,
-    )
-    .catch((err) =>
-      console.error(`[orchestration] Failed to save episode`, err.message),
        );
+    } catch (err) {
+        logger.error('[orchestration] Failed to save episode:', err.message);
+    }

-  // 7. Trigger summarization check (fire-and-forget)
-  // Pass full episodes list so summarization can sum tokens accurately
    const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
    triggerSummary(session, allEpisodes);

-  
-    // 8. Auto-name on first message
    if (isFirstMessage && !session.name) {
-    autoNameSession(externalId, userMessage, result.text).catch(() => {}); // already logged inside autoNameSession
+        autoNameSession(externalId, userMessage, result.text).catch(() => {});
    }

-  // 9. Return response
    return {
        sessionId: externalId,
        response: result.text,
@@ -216,115 +359,44 @@ async function chat(externalId, userMessage, options = {}) {

 async function chatStream(externalId, userMessage, onChunk, options = {}) {
    try {
-    const { recentEpisodeLimit, semanticLimit, scoreThreshold, temperature, repeatPenalty, topP, topK, systemPrompt } = appSettings.load();
-    let session = await memory.getSessionByExternalId(externalId);
-    if (!session) session = await memory.createSession(externalId);
+        const { session, prompt, isFirstMessage, inferenceOptions } = await assembleContext(externalId, userMessage);

-    let projectSessionIds = null;
-    let activeSystemPrompt = systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT;
-    if (session.project_id) {
-      try {
-        const project = await memory.getProject(session.project_id);
-        if (project) {
-          const projectSessions = await memory.getProjectSessions(
-            session.project_id,
-          );
-          projectSessionIds = projectSessions.map((s) => s.id);
-          if (project?.system_prompt) activeSystemPrompt = project.system_prompt;
-        }
+        const res = await inference.completeStream(prompt, { ...options, ...inferenceOptions });

-      } catch (err) {
-        console.warn(
-          "[orchestration] Failed to resolve project context:",
-          err.message,
-        );
-      }
-    }
-
-    const recentEpisodes = await memory.getRecentEpisodes(
-      session.id,
-      recentEpisodeLimit,
-    );
-    const isFirstMessage = recentEpisodes.length === 0;
-    const recentIds = new Set(recentEpisodes.map((e) => e.id));
-    const semanticEpisodes = await getSemanticEpisodes(
-      userMessage,
-      session.id,
-      recentIds,
-      projectSessionIds,
-      {semanticLimit, scoreThreshold }
-    );
-
-    const entities = await getRelevantEntities(userMessage, session.project_id ?? null);
-
-    const prompt = buildPrompt(
-      recentEpisodes,
-      semanticEpisodes,
-      entities,
-      userMessage,
-      activeSystemPrompt,
-    );
-    const res = await inference.completeStream(prompt, {...options, temperature, repeatPenalty, topP, topK});
-
-    let fullText = "";
-    let model = "";
-    let tokenCount = 0;
-    let buffer = "";
+        let fullText = '', model = '', tokenCount = 0, buffer = '';

        for await (const chunk of res.body) {
-      buffer += Buffer.from(chunk).toString("utf8");
-
-      const events = buffer.split("\n\n");
-      buffer = events.pop() || "";
+            buffer += Buffer.from(chunk).toString('utf8');
+            const events = buffer.split('\n\n');
+            buffer = events.pop() || '';

            for (const event of events) {
-        const lines = event.split("\n");
-        const dataLines = lines
-          .filter((line) => line.startsWith("data: "))
-          .map((line) => line.slice(6));
+                const dataLines = event.split('\n')
+                    .filter(line => line.startsWith('data: '))
+                    .map(line => line.slice(6));

-        if (dataLines.length === 0) continue;
-
-        const raw = dataLines.join("\n").trim();
-        if (raw === "[DONE]") continue;
+                if (!dataLines.length) continue;
+                const raw = dataLines.join('\n').trim();
+                if (raw === '[DONE]') continue;

                try {
                    const data = JSON.parse(raw);
-
-          if (data.response) {
-            fullText += data.response;
-            onChunk(data.response);
-          }
-
+                    if (data.response) { fullText += data.response; onChunk(data.response); }
                    if (data.model) model = data.model;
-          if (data.done && data.tokenCount !== undefined) {
-            tokenCount = data.tokenCount;
-          }
-
-          if (data.error) {
-            throw new Error(data.error);
-          }
+                    if (data.done && data.tokenCount !== undefined) tokenCount = data.tokenCount;
+                    if (data.error) throw new Error(data.error);
                } catch (err) {
-          console.error(
-            "[orchestration] Failed to parse inference SSE event:",
-            raw,
-            err.message,
-          );
+                    logger.error('[orchestration] Failed to parse SSE event:', raw, err.message);
                }
            }
        }

-    console.log("[orchestration] final streamed text length:", fullText.length);
-
        if (fullText.trim()) {
-      console.log('[chat] tokenCount before save:', tokenCount);
            await memory.createEpisode(session.id, userMessage, fullText, tokenCount, session.project_id ?? null);
            const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
            triggerSummary(session, allEpisodes);
        } else {
-      console.warn(
-        "[orchestration] Stream finished with no assistant text; episode not saved",
-      );
+            logger.warn('[orchestration] Stream finished with no assistant text; episode not saved');
        }

        if (isFirstMessage && !session.name) {
@@ -333,11 +405,7 @@ async function chatStream(externalId, userMessage, onChunk, options = {}) {

        return { model, tokenCount };
    } catch (err) {
-    console.error(
-      "[orchestration] chatStream fatal error:",
-      err.message,
-      err.stack,
-    );
+        logger.error('[orchestration] chatStream fatal error:', err.message, err.stack);
        throw err;
    }
 }
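You can run the scoring and selection logic from `buildScoredPool` and `selectWithinBudget` above standalone on toy data. This shows how the entity bonus, the recency floor, and the break-on-first-miss rule interact. The sketch rests on assumptions: `RRF_K = 60` stands in for `RETRIEVAL.RRF_K`, and the `ep` helper, timestamps, token counts, and weights are hypothetical values chosen to keep the arithmetic readable.

```javascript
// Assumed value standing in for RETRIEVAL.RRF_K (not shown in this diff).
const RRF_K = 60;

// Prefer the stored token_count; otherwise estimate roughly 4 characters per token.
function estimateTokens(episode) {
  return episode.token_count
    ?? Math.ceil((episode.user_message.length + episode.ai_response.length) / 4);
}

// Merge fused scores, a reciprocal-rank recency score, and an entity bonus into one ranking.
function buildScoredPool(fused, recent, entityBoostedIds, { entityWeight }) {
  const pool = new Map(); // episode.id -> { episode, score }
  for (const { episode, score } of fused) pool.set(episode.id, { episode, score });

  recent.forEach((episode, i) => {
    const recencyScore = 1 / (RRF_K + i + 1);
    if (pool.has(episode.id)) pool.get(episode.id).score += recencyScore;
    else pool.set(episode.id, { episode, score: recencyScore });
  });

  // The bonus only applies to episodes already in the pool; it never adds new ones.
  for (const id of entityBoostedIds) {
    if (pool.has(id)) pool.get(id).score += entityWeight;
  }
  return [...pool.values()].sort((a, b) => b.score - a.score);
}

// Keep the newest `minRecent` episodes unconditionally, then fill the remaining budget by score.
function selectWithinBudget(pool, budget, minRecent, recent) {
  const byTime = (a, b) => a.created_at - b.created_at; // numeric timestamps in this sketch
  const guaranteed = recent.slice(0, minRecent);
  const guaranteedIds = new Set(guaranteed.map((e) => e.id));
  for (const e of guaranteed) budget -= estimateTokens(e);

  const selected = [];
  for (const { episode } of pool) {
    if (guaranteedIds.has(episode.id)) continue;
    const cost = estimateTokens(episode);
    if (budget - cost < 0) break; // stop at the first miss instead of back-filling
    selected.push(episode);
    budget -= cost;
  }
  return { guaranteed: [...guaranteed].sort(byTime), selected: selected.sort(byTime) };
}

// Hypothetical toy episodes: numeric created_at and explicit token_count.
const ep = (id, created_at, token_count) =>
  ({ id, created_at, token_count, user_message: '', ai_response: '' });

const recent = [ep(10, 100, 50), ep(9, 90, 50), ep(8, 80, 50)]; // newest first
const fused = [
  { episode: ep(3, 30, 200), score: 0.03 },
  { episode: ep(5, 50, 400), score: 0.02 },
];

const pool = buildScoredPool(fused, recent, new Set([5]), { entityWeight: 0.05 });
const { guaranteed, selected } = selectWithinBudget(pool, 600, 2, recent);
console.log(guaranteed.map((e) => e.id), selected.map((e) => e.id)); // [ 9, 10 ] [ 5 ]
```

Episode 5 jumps ahead of episode 3 only because of the entity bonus (0.02 + 0.05 versus 0.03). After episode 5 is selected, 100 tokens remain. Episode 3 (200 tokens) does not fit, so selection stops there. Episode 8 is dropped even though its 50 tokens would have fit, which is the trade-off of breaking rather than skipping.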
--- a/packages/orchestration-service/src/config/settings.js
+++ b/packages/orchestration-service/src/config/settings.js
@@ -1,6 +1,6 @@
 const fs = require('fs');
 const path = require('path');
-const { getEnv, ORCHESTRATION, INFERENCE_DEFAULTS } = require('@nexusai/shared');
+const { getEnv, ORCHESTRATION, INFERENCE_DEFAULTS, RETRIEVAL } = require('@nexusai/shared');

 const SETTINGS_PATH = path.join(__dirname, '../../data/settings.json');

@@ -14,6 +14,11 @@ const DEFAULTS = {
  topP:                 INFERENCE_DEFAULTS.TOP_P,
  topK:                 INFERENCE_DEFAULTS.TOP_K,
  systemPrompt:         ORCHESTRATION.SYSTEM_PROMPT,
+  semanticWeight:       RETRIEVAL.SEMANTIC_WEIGHT,
+  keywordWeight:        RETRIEVAL.KEYWORD_WEIGHT,
+  contextBudget:        ORCHESTRATION.CONTEXT_BUDGET,
+  entityWeight:         ORCHESTRATION.ENTITY_WEIGHT,
+  minRecentEpisodes:    ORCHESTRATION.MIN_RECENT_EPISODES,
 };

 function load() {
--- a/packages/orchestration-service/src/index.js
+++ b/packages/orchestration-service/src/index.js
@@ -1,6 +1,6 @@
 require ('dotenv').config();
 const express = require('express');
-const {getEnv, PORTS, SERVICES, ORCHESTRATION} = require('@nexusai/shared');
+const {getEnv, PORTS, SERVICES, ORCHESTRATION, logger} = require('@nexusai/shared');

 /**** ROUTERS *** */
 const chatRouter = require('./routes/chat');
@@ -10,11 +10,12 @@ const projectsRouter = require('./routes/projects');
 const episodesRouter = require('./routes/episodes');
 const settingsRouter = require('./routes/settings');
 const healthRouter = require('./routes/health');
+const summariesRouter = require('./routes/summaries')

 const cors = require('cors');

 const app = express();
-app.use(express.json());
+app.use(express.json({ limit: '2mb' }));

 app.use(cors({
    origin: [
@@ -48,8 +49,9 @@ app.use('/projects', projectsRouter);
 app.use('/episodes', episodesRouter);
 app.use('/settings', settingsRouter);
 app.use('/health/services', healthRouter);
+app.use('/summaries', summariesRouter)

 /******* Start the server ************/
 app.listen(PORT, () => {
-    console.log(`Orchestration Service is running on port ${PORT}`);
+    logger.info(`Orchestration Service is running on port ${PORT}`);
 });
--- a/packages/orchestration-service/src/routes/chat.js
+++ b/packages/orchestration-service/src/routes/chat.js
@@ -1,6 +1,8 @@
 const { Router } = require('express')
 const { chat, chatStream } = require('../chat/index');
 const memory = require('../services/memory')
+const { logger } = require('@nexusai/shared');
+

 const router = Router();

@@ -17,8 +19,8 @@ router.post('/', async (req, res) => {
        });
        res.json(result)
    } catch (err) {
-        console.error(`[orchestration] chat error: `, err.message)
-        res.status(500).json ({ error: err.message})
+        logger.error(`[orchestration] chat error: `, err.message)
+        res.status(500).json ({ error: 'Chat failed', detail: err.message })
    }
 });

--- a/packages/orchestration-service/src/routes/episodes.js
+++ b/packages/orchestration-service/src/routes/episodes.js
@@ -9,7 +9,7 @@ router.get('/', async (req, res) => {
    const result = await memory.getEpisodes({ limit, offset, sessionId, q });
    res.json(result);
  } catch (err) {
-    res.status(500).json({ error: err.message });
+    res.status(500).json({ error: 'Failed to fetch episodes', detail: err.message });
  }
 });

@@ -18,7 +18,7 @@ router.delete('/:id', async (req, res) => {
    await memory.deleteEpisode(req.params.id);
    res.status(204).send();
  } catch (err) {
-    res.status(500).json({ error: err.message });
+    res.status(500).json({ error: 'Failed to delete episode', detail: err.message });
  }
 });

--- a/packages/orchestration-service/src/routes/models.js
+++ b/packages/orchestration-service/src/routes/models.js
@@ -4,7 +4,7 @@ const fs = require('fs');
 const path = require('path');
 const appSettings = require('../config/settings');

-const { getEnv, LLAMACPP } = require('@nexusai/shared');
+const { getEnv, LLAMACPP, logger } = require('@nexusai/shared');
 const LLAMA_URL = getEnv('LLAMA_SERVER_URL', LLAMACPP.DEFAULT_URL);

 router.get('/', (req, res) => {
@@ -38,7 +38,7 @@ router.get('/', (req, res) => {

    res.json(models);
  } catch (err) {
-    console.error('[models] Failed to scan folder:', err.message);
+    logger.error('[models] Failed to scan folder:', err.message);
    res.status(500).json({ error: `Could not read models folder: ${modelsFolderPath}` });
  }
 });
@@ -53,7 +53,7 @@ router.get('/props', async (req, res) => {
      modelAlias: data.model_alias,
    });
  } catch (err) {
-    console.error('[models/props]', err.message);
+    logger.error('[models/props]', err.message);
    res.status(503).json({ error: 'Could not reach llama-server' });
  }
 });
--- a/packages/orchestration-service/src/routes/projects.js
+++ b/packages/orchestration-service/src/routes/projects.js
@@ -7,7 +7,7 @@ router.get('/', async (req, res) => {
    try {
        res.json(await memory.getProjects());
    } catch (err) {
-        res.status(500).json({ error: err.message });
+        res.status(500).json({ error: 'Failed to fetch projects', detail: err.message });
    }
 });

@@ -17,7 +17,7 @@ router.post('/', async (req, res) => {
    try {
        res.status(201).json(await memory.createProject({ name: name.trim(), description, colour, icon, isolated }));
    } catch (err) {
-        res.status(500).json({ error: err.message });
+        res.status(500).json({ error: 'Failed to create project', detail: err.message });
    }
 });

@@ -25,7 +25,7 @@ router.patch('/:id', async (req, res) => {
  try {
    res.json(await memory.updateProject(req.params.id, req.body));
  } catch (err) {
-    res.status(500).json({ error: err.message });
+    res.status(500).json({ error: 'Failed to update project', detail: err.message });
  }
 });

@@ -34,7 +34,7 @@ router.delete('/:id', async (req, res) => {
        await memory.deleteProject(req.params.id);
        res.status(204).send();
    } catch (err) {
-        res.status(500).json({ error: err.message });
+        res.status(500).json({ error: 'Failed to delete project', detail: err.message });
    }
 });

--- a/packages/orchestration-service/src/routes/sessions.js
+++ b/packages/orchestration-service/src/routes/sessions.js
@@ -15,7 +15,7 @@ router.get('/:sessionId/history', async (req, res) => {
    const history = await memory.getSessionHistory(session.id, Number(limit), Number(offset));
    res.json({ sessionId, episodes: history });
  } catch (err) {
-    res.status(500).json({ error: err.message });
+    res.status(500).json({ error: 'Failed to fetch session history', detail: err.message });
  }
 });

@@ -26,7 +26,7 @@ router.get('/', async (req, res) => {
    const sessions = await memory.getSessions(Number(limit), Number(offset), parsedProjectId);
    res.json(sessions);
  } catch (err) {
-    res.status(500).json({ error: err.message });
+    res.status(500).json({ error: 'Failed to fetch sessions', detail: err.message });
  }
 });

@@ -45,7 +45,7 @@ router.patch('/:sessionId', async (req, res) => {
    });
    res.json(session);
  } catch (err) {
-    res.status(500).json({ error: err.message });
+    res.status(500).json({ error: 'Failed to update session', detail: err.message });
  }
 });

@@ -54,7 +54,7 @@ router.delete('/:sessionId', async (req, res) => {
        await memory.deleteSession(req.params.sessionId);
        res.status(204).send();
    } catch (err) {
-        res.status(500).json({ error: err.message });
+        res.status(500).json({ error: 'Failed to delete session', detail: err.message });
    }
 });

--- a/packages/orchestration-service/src/routes/settings.js
+++ b/packages/orchestration-service/src/routes/settings.js
@@ -80,6 +80,41 @@ if (req.body.systemPrompt !== undefined) {
  updates.systemPrompt = val.trim() || null; // null reverts to default
 }

+  if (req.body.semanticWeight !== undefined) {
+    const val = Number(req.body.semanticWeight);
+    if (isNaN(val) || val < 0 || val > 5)
+      return res.status(400).json({ error: 'semanticWeight must be 0–5' });
+    updates.semanticWeight = val;
+  }
+
+  if (req.body.keywordWeight !== undefined) {
+    const val = Number(req.body.keywordWeight);
+    if (isNaN(val) || val < 0 || val > 5)
+      return res.status(400).json({ error: 'keywordWeight must be 0–5' });
+    updates.keywordWeight = val;
+  }
+
+  if (req.body.contextBudget !== undefined) {
+    const val = Number(req.body.contextBudget);
+    if (!Number.isInteger(val) || val < 512 || val > 32768)
+      return res.status(400).json({ error: 'contextBudget must be 512–32768' });
+    updates.contextBudget = val;
+  }
+
+  if (req.body.entityWeight !== undefined) {
+    const val = Number(req.body.entityWeight);
+    if (isNaN(val) || val < 0 || val > 2)
+      return res.status(400).json({ error: 'entityWeight must be 0–2' });
+    updates.entityWeight = val;
+  }
+
+  if (req.body.minRecentEpisodes !== undefined) {
+    const val = Number(req.body.minRecentEpisodes);
+    if (!Number.isInteger(val) || val < 0 || val > 10)
+      return res.status(400).json({ error: 'minRecentEpisodes must be 0–10' });
+    updates.minRecentEpisodes = val;
+  }
+
  res.json(settings.save(updates));
 });
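The five new checks in `routes/settings.js` repeat one pattern: coerce with `Number()`, optionally require an integer, and enforce an inclusive range. A minimal sketch of a table-driven version follows; `RULES` and `collectUpdates` are hypothetical names, not part of the codebase, and the error strings are copied from the handlers above.

```javascript
// Hypothetical refactor of the repeated range checks in routes/settings.js.
// Each rule mirrors one hand-written `if` block: Number() coercion, an
// optional integer check, and an inclusive [min, max] bound.
const RULES = {
    semanticWeight:    { min: 0,   max: 5 },
    keywordWeight:     { min: 0,   max: 5 },
    contextBudget:     { min: 512, max: 32768, integer: true },
    entityWeight:      { min: 0,   max: 2 },
    minRecentEpisodes: { min: 0,   max: 10,    integer: true },
};

// Returns { error } for the first invalid field, otherwise { updates }
// containing only the fields present in the request body.
function collectUpdates(body, rules = RULES) {
    const updates = {};
    for (const [key, { min, max, integer = false }] of Object.entries(rules)) {
        if (body[key] === undefined) continue; // field not sent: leave unchanged
        const val = Number(body[key]);
        const wrongType = integer ? !Number.isInteger(val) : Number.isNaN(val);
        if (wrongType || val < min || val > max) {
            return { error: `${key} must be ${min}–${max}` };
        }
        updates[key] = val;
    }
    return { updates };
}

// Usage inside the settings handler (sketch):
//   const { error, updates: numeric } = collectUpdates(req.body);
//   if (error) return res.status(400).json({ error });
//   Object.assign(updates, numeric);
```

Because `Object.entries` preserves insertion order for string keys, the first failing field is reported in the same order as the hand-written checks.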

--- a/packages/orchestration-service/src/routes/summaries.js
+++ b/packages/orchestration-service/src/routes/summaries.js
@@ -0,0 +1,48 @@
+const { Router } = require('express');
+const memory = require('../services/memory');
+
+const router = Router();
+
+// Trigger on-demand project summary generation
+router.post('/project/:projectId/generate', async (req, res) => {
+    try {
+        const summary = await memory.generateProjectSummary(req.params.projectId);
+        res.status(201).json(summary);
+    } catch (err) {
+        // Pass through 422 from memory-service ("no session summaries yet")
+        const status = err.message.includes('422') ? 422 : 500;
+        res.status(status).json({ error: 'Failed to generate project summary', detail: err.message });
+    }
+});
+
+// Get current project overview summary
+router.get('/project/:projectId/overview', async (req, res) => {
+    try {
+        const summary = await memory.getProjectOverviewSummary(req.params.projectId);
+        res.json(summary);
+    } catch (err) {
+        res.status(500).json({ error: 'Failed to fetch project overview summary', detail: err.message });
+    }
+});
+
+router.get('/session/:sessionId', async (req, res) => {
+    try {
+        const session = await memory.getSessionByExternalId(req.params.sessionId);
+        if (!session) return res.status(404).json({ error: 'Session not found' });
+        const summaries = await memory.getSummariesBySession(session.id);
+        res.json(summaries);
+    } catch (err) {
+        res.status(500).json({ error: 'Failed to fetch session summaries', detail: err.message });
+    }
+});
+
+router.get('/project/:projectId', async (req, res) => {
+    try {
+        const summaries = await memory.getSummariesByProject(req.params.projectId);
+        res.json(summaries);
+    } catch (err) {
+        res.status(500).json({ error: 'Failed to fetch project summaries', detail: err.message });
+    }
+});
+
+module.exports = router;
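The generate route detects the memory-service 422 with `err.message.includes('422')`, which ties the route to the exact wording of the error thrown in `services/memory.js`. One alternative, sketched under the assumption that the client is changed to throw a typed error, is to carry the upstream status on the error object. `HttpError` and `statusFor` are hypothetical helpers, not existing code.

```javascript
// Hypothetical error type: the memory client would throw this instead of a
// plain Error, e.g. throw new HttpError(`Failed ...: ${res.status}`, res.status).
class HttpError extends Error {
    constructor(message, status) {
        super(message);
        this.name = 'HttpError';
        this.status = status;
    }
}

// Decide which status a route should return for an upstream failure.
// Only statuses listed in `passthrough` are forwarded; everything else is a 500.
function statusFor(err, passthrough = [422]) {
    return err instanceof HttpError && passthrough.includes(err.status) ? err.status : 500;
}

// Example catch block (sketch):
//   } catch (err) {
//     res.status(statusFor(err)).json({ error: 'Failed to generate project summary', detail: err.message });
//   }
```

Keeping the passthrough list explicit preserves the current behaviour (only 422 is forwarded) while making it easy to forward, say, a 404 later.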
--- a/packages/orchestration-service/src/services/graph.js
+++ b/packages/orchestration-service/src/services/graph.js
@@ -0,0 +1,15 @@
+const { getEnv, SERVICES } = require('@nexusai/shared');
+
+const MEMORY_URL = getEnv('MEMORY_SERVICE_URL', SERVICES.MEMORY_URL);
+
+async function getNeighbors(entityIds) {
+    const res = await fetch(`${MEMORY_URL}/graph/neighbors`, {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ entityIds }),
+    });
+    if (!res.ok) throw new Error(`Graph neighbors error: ${res.status}`);
+    return res.json();
+}
+
+module.exports = { getNeighbors };
--- a/packages/orchestration-service/src/services/memory.js
+++ b/packages/orchestration-service/src/services/memory.js
@@ -176,6 +176,46 @@ async function updateSummary(id, { content, tokenCount, episodeRange }) {
    return res.json();
 }

+async function getSummariesByProject(projectId) {
+    const res = await fetch(`${BASE_URL}/projects/${projectId}/summaries`);
+    if (!res.ok) throw new Error(`Failed to fetch summaries: ${res.status}`);
+    return res.json();
+}
+
+async function generateProjectSummary(projectId) {
+    const res = await fetch(`${BASE_URL}/projects/${projectId}/summarize`, {
+        method: 'POST',
+    });
+    if (!res.ok) throw new Error(`Failed to generate project summary: ${res.status}`);
+    return res.json();
+}
+
+async function getProjectOverviewSummary(projectId) {
+    const res = await fetch(`${BASE_URL}/projects/${projectId}/overview`);
+    if (!res.ok) throw new Error(`Failed to fetch project overview: ${res.status}`);
+    return res.json(); // null if none exists yet
+}
+
+async function searchEpisodes(query, { limit = 10, sessionIds = null } = {}) {
+    const url = new URL(`${BASE_URL}/episodes/search`);
+    url.searchParams.set('q', query);
+    url.searchParams.set('limit', limit);
+    if (sessionIds?.length) url.searchParams.set('sessionIds', sessionIds.join(','));
+    const res = await fetch(url.toString());
+    if (!res.ok) throw new Error(`FTS search error: ${res.status}`);
+    return res.json();
+}
+
+async function getEpisodesByEntities(entityIds) {
+    const res = await fetch(`${BASE_URL}/episodes/by-entities`, {
+        method: 'POST',
+        headers: { 'Content-Type': 'application/json' },
+        body: JSON.stringify({ entityIds }),
+    });
+    if (!res.ok) throw new Error(`Episodes-by-entities error: ${res.status}`);
+    return res.json(); // { episodeIds: [...] }
+}
+
 module.exports = {
    getSessionByExternalId,
    createSession,
@@ -197,4 +237,9 @@ module.exports = {
    getSummariesBySession,
    createSummary,
    updateSummary,
+    getSummariesByProject,
+    generateProjectSummary,
+    getProjectOverviewSummary,
+    searchEpisodes,
+    getEpisodesByEntities,
 }
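`searchEpisodes` sends `sessionIds` as a single comma-joined query parameter. The sketch below copies its URL construction without the `fetch`, so the exact query string can be inspected; `buildSearchUrl` is a hypothetical name and `http://localhost:3002` is only the memory-service port listed in CLAUDE.md. `URLSearchParams` serializes spaces as `+` and commas as `%2C`, so the memory service has to decode the parameter before splitting on `,`, which standard query-string parsers such as Express's do.

```javascript
// Mirrors the query-string construction in searchEpisodes(), minus the fetch.
function buildSearchUrl(baseUrl, query, { limit = 10, sessionIds = null } = {}) {
    const url = new URL(`${baseUrl}/episodes/search`);
    url.searchParams.set('q', query);
    url.searchParams.set('limit', limit); // number is coerced to a string
    if (sessionIds?.length) url.searchParams.set('sessionIds', sessionIds.join(','));
    return url.toString();
}

// Example:
//   buildSearchUrl('http://localhost:3002', 'qdrant setup', { limit: 5, sessionIds: [3, 7] })
//   -> 'http://localhost:3002/episodes/search?q=qdrant+setup&limit=5&sessionIds=3%2C7'
```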
--- a/packages/orchestration-service/src/services/summarization.js
+++ b/packages/orchestration-service/src/services/summarization.js
@@ -1,4 +1,4 @@
-const { getEnv, SERVICES, SUMMARIES } = require('@nexusai/shared');
+const { getEnv, SERVICES, SUMMARIES, logger } = require('@nexusai/shared');

 const EXTRACTION_URL  = getEnv('EXTRACTION_URL', 'http://localhost:11434');
 const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b');
@@ -9,34 +9,37 @@ const MAX_SUMMARY_TOKENS = parseInt(getEnv('SUMMARY_MAX_TOKENS', SUMMARIES.MAX_S
 const MIN_EPISODES_SINCE = parseInt(getEnv('SUMMARY_MIN_EPISODES', SUMMARIES.MIN_EPISODES_SINCE));

 function buildSummaryPrompt(episodes, existingSummary = null) {
-    const MAX_CHARS = 3000; // truncate input to keep Phi3 focused
+    const MAX_CHARS = 3000;
    let context = episodes
        .map(ep => `User: ${ep.user_message}\nAssistant: ${ep.ai_response}`)
        .join('\n\n');

-    // Truncate from the start if too long — keep the most recent exchanges
    if (context.length > MAX_CHARS) {
        context = context.slice(-MAX_CHARS);
    }

    const instruction = existingSummary
-        ? `Update the summary below to include the new exchanges. Write 3-5 sentences in third person. Output only the updated summary text, nothing else.
+        ? `Update the summary below to incorporate the new exchanges.
+Write 3-5 sentences in third person. Do not quote directly — paraphrase only.
+Do not include greetings, sign-offs, or filler. Output only the updated summary text.

 Previous summary:
 ${existingSummary}

 New exchanges:
 ${context}`
-        : `Summarize the conversation below in 3-5 sentences. Write in third person. Output only the summary text, nothing else.
+        : `Summarize the conversation below in 3-5 sentences.
+Write in third person. Do not quote directly — paraphrase only.
+Do not include greetings, sign-offs, or filler. Output only the summary text.

 Conversation:
 ${context}`;

    return [
-        '<|user|>',
+        '<|im_start|>user',   // ChatML for qwen2.5
        instruction,
-        '<|end|>',
-        '<|assistant|>',
+        '<|im_end|>',
+        '<|im_start|>assistant',
    ].join('\n');
 }

@@ -52,24 +55,31 @@ async function generateSummary(episodes, existingSummary = null) {
            stream: false,
            options: {
                temperature: 0.2,   // slightly higher than entities — summaries benefit from some fluency
-                num_predict: 200,   // generous but bounded — keeps summaries from running long
+                num_predict: 500,   // generous but bounded — keeps summaries from running long
            },
        }),
    });

    if (!res.ok) throw new Error(`Ollama responded ${res.status}`);
    const data = await res.json();
-    return data.response?.trim() ?? '';
+
+
+    const raw = data.response?.trim() ?? '';
+    // Strip any leaked ChatML tokens Qwen echoes back
+    const content = raw
+        .replace(/<\|im_start\|>.*?<\|im_end\|>/gs, '')
+        .replace(/<\|im_start\|>|<\|im_end\|>|<\|im_sep\|>/g, '')
+        .trim();
+    return content;
 }

 async function maybeSummarize(session, allEpisodes) {
    // 1. Sum total tokens for this session
    const totalTokens = allEpisodes.reduce((sum, ep) => sum + (ep.token_count || 0), 0);
    if (totalTokens < THRESHOLD_TOKENS) return; // under threshold — nothing to do
-    console.log('[summarization] fetching existing summaries...');
+
    // 2. Fetch existing summaries for session
    const summariesRes = await fetch(`${MEMORY_URL}/sessions/${session.id}/summaries`);
-    console.log('[summarization] summaries fetch status:', summariesRes.status);
    if (!summariesRes.ok) return;
    const summaries = await summariesRes.json();

@@ -83,19 +93,18 @@ async function maybeSummarize(session, allEpisodes) {
        if (newEpisodes.length < MIN_EPISODES_SINCE) return;
    }

-    // 4. Determine episode range string  e.g. "1-42"
-    const ids = allEpisodes.map(ep => ep.id).sort((a,b) => a - b);
-    const episodeRange = `${ids.at(0)}-${ids.at(-1)}`;
-    const totalEpisodeTokens = allEpisodes.reduce((sum, ep) => sum + (ep.token_count || 0), 0);
-
-    // 5. Generate summary — pass existing content if updating
+    // 4. Determine episodes to summarize
    const episodesToSummarize = latest
        ? allEpisodes.filter(ep => ep.id > lastCoveredId)
        : allEpisodes;

+    // 5. Determine episode range from the episodes actually being summarized
+    const summarizedIds = episodesToSummarize.map(ep => ep.id).sort((a,b) => a - b);
+    const episodeRange = `${summarizedIds.at(0)}-${summarizedIds.at(-1)}`;
+    const totalEpisodeTokens = allEpisodes.reduce((sum, ep) => sum + (ep.token_count || 0), 0);
+
    // add temporarily before the generateSummary call
-    console.log('[summarization] episodes to summarize:', episodesToSummarize.length);
-    console.log('[summarization] total chars:', episodesToSummarize.reduce((s, ep) => s + ep.user_message.length + ep.ai_response.length, 0));
+    logger.debug('[summarization] episodes to summarize:', episodesToSummarize.length);

    const content = await generateSummary(
        episodesToSummarize,
@@ -117,7 +126,7 @@ async function maybeSummarize(session, allEpisodes) {
                episodeRange,
            }),
        });
-        console.log(`[summarization] Created new summary for session ${session.id}`);
+        logger.debug(`[summarization] Created new summary for session ${session.id}`);
    } else {
        await fetch(`${MEMORY_URL}/summaries/${latest.id}`, {
            method: 'PATCH',
@@ -128,14 +137,14 @@ async function maybeSummarize(session, allEpisodes) {
                episodeRange,
            }),
        });
-        console.log(`[summarization] Updated summary ${latest.id} for session ${session.id}`);
+        logger.debug(`[summarization] Updated summary ${latest.id} for session ${session.id}`);
    }
 }

 async function triggerSummary(session, allEpisodes) {
    // Intentionally fire-and-forget — caller doesn't await this
    maybeSummarize(session, allEpisodes).catch(err =>
-        console.warn('[summarization] Summary failed (non-critical):', err.message)
+        logger.warn('[summarization] Summary failed (non-critical):', err.message)
    );
 }
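The post-processing in `generateSummary()` removes complete echoed ChatML turns and then any stray special tokens. A standalone sketch of that cleanup follows, so it can be tested without Ollama. The middle step is an addition for illustration, not in the diff: an unterminated `<|im_start|>assistant` header has no `<|im_end|>` for the first pattern to match, so the original two steps would leave a bare `assistant` prefix.

```javascript
// Standalone version of the ChatML cleanup in generateSummary().
function stripChatML(raw) {
    return (raw ?? '')
        .trim()
        // 1. Remove complete echoed turns, e.g. "<|im_start|>user\n...<|im_end|>".
        .replace(/<\|im_start\|>.*?<\|im_end\|>/gs, '')
        // 2. (Added for illustration) Remove an unterminated role header
        //    such as "<|im_start|>assistant\n".
        .replace(/<\|im_start\|>(?:system|user|assistant)\s*/g, '')
        // 3. Remove any remaining stray special tokens.
        .replace(/<\|im_start\|>|<\|im_end\|>|<\|im_sep\|>/g, '')
        .trim();
}
```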

--- a/packages/shared/src/config/constants.js
+++ b/packages/shared/src/config/constants.js
@@ -24,10 +24,13 @@ const EPISODIC = {
 const ORCHESTRATION = {
    RECENT_EPISODE_LIMIT:   5,
    SEMANTIC_LIMIT:         5,
-    SCORE_THRESHOLD:        0.75,
+    SCORE_THRESHOLD:        0.5,
    ENTITIES_LIMIT:         5,
-    ENTITIES_THRESHOLD:     0.75,
+    ENTITIES_THRESHOLD:     0.55,
    TEMPERATURE:            0.7,
+    CONTEXT_BUDGET:         4096,
+    ENTITY_WEIGHT:          0.5,
+    MIN_RECENT_EPISODES:    2,
    CORS_ORIGIN:            'http://localhost:5173',
    SYSTEM_PROMPT:          `You are a helpful, context-aware AI assistant. You have access to memories of past conversations with the user. Use them to provide consistent, personalised responses.`
 }
@@ -73,7 +76,35 @@ const SUMMARIES = {
    THRESHOLD_TOKENS:   200,    //trigger summary when session hits this many tokens
    MAX_SUMMARY_TOKENS: 800,    //if existing summary exceeds this, create new instead of update
    MIN_EPISODES_SINCE: 5,      // don't resummarize until N new episodes since last summary
+    MAX_SUMMARY_CHARS:  8000,   // max chars to include from recent episodes when generating summary (to control prompt size)
+    MAX_PROJECT_EPISODE_LIMIT: 200, // max number of episodes to consider from the entire project when generating summary (to control prompt size)
 }
+
+const ENTITIES = {
+    TEMPERATURE:    0.1,    // Low temperature, more precise extraction, less creative
+    NUM_PREDICT:    1500,   // Max tokens the model may generate per extraction response (Ollama num_predict)
+    THRESHOLD:      0.55,   // Minimum confidence score for an extracted entity to be included in the results
+    PROMOTION_THRESHOLD: 3, // mention_count threshold before entity is considered well-established
+    GRAPH_HOP_DEPTH: 1,     // Default traversal depth for neighborhood queries
+    TYPES: [
+        'person', 
+        'place', 
+        'project', 
+        'technology', 
+        'concept', 
+        'organization', 
+        'character', 
+        'event', 
+        'topic'
+    ],
+}
+
+const RETRIEVAL = {
+    RRF_K:              60,     // Reciprocal Rank Fusion smoothing constant; softens the rank-1 advantage. Not exposed in settings
+    SEMANTIC_WEIGHT:    1.0,    // Weight applied to semantic (Qdrant) results
+    KEYWORD_WEIGHT:     0,      // Weight applied to keyword (SQLite FTS) results; 0 disables keyword retrieval, >0 enables it and sets the semantic/keyword balance
+}
+
 module.exports = {
    QDRANT,
    COLLECTIONS,
@@ -85,5 +116,7 @@ module.exports = {
    INFERENCE_DEFAULTS,
    SQLITE,
    ORCHESTRATION,
-    SUMMARIES
+    SUMMARIES,
+    ENTITIES,
+    RETRIEVAL,
 };
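The `RETRIEVAL` constants parameterize reciprocal rank fusion as written in `fuseEpisodeResults` (test-fusion.js): each list adds its weight divided by `RRF_K` plus the episode's 1-based rank. Worked through with the lists from Test 1 in test-fusion.js (semantic `[1, 2, 3]`, keyword `[3, 2, 4]`, both weights 1), appearing in both lists roughly doubles a score, while moving between the top few ranks changes a single term by only a few percent:

```latex
\mathrm{score}(d) = \sum_{r \in \{\mathrm{sem},\,\mathrm{kw}\}} \frac{w_r}{k + \mathrm{rank}_r(d)}, \qquad k = 60

\begin{aligned}
\mathrm{score}(3) &= \tfrac{1}{63} + \tfrac{1}{61} \approx 0.015873 + 0.016393 = 0.032266 \\
\mathrm{score}(2) &= \tfrac{1}{62} + \tfrac{1}{62} \approx 0.032258 \\
\mathrm{score}(1) &= \tfrac{1}{61} \approx 0.016393 \\
\mathrm{score}(4) &= \tfrac{1}{63} \approx 0.015873
\end{aligned}
```

The fused order is therefore 3, 2, 1, 4. With the default `KEYWORD_WEIGHT` of 0 the keyword term vanishes and the order reduces to the semantic order.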
--- a/packages/shared/src/index.js
+++ b/packages/shared/src/index.js
@@ -1,6 +1,7 @@
 const {getEnv} = require('./config/env');
-const {QDRANT, COLLECTIONS, EPISODIC, SERVICES, OLLAMA, PORTS, LLAMACPP, INFERENCE_DEFAULTS, SQLITE, ORCHESTRATION, SUMMARIES } = require('./config/constants');
+const {QDRANT, COLLECTIONS, EPISODIC, SERVICES, OLLAMA, PORTS, LLAMACPP, INFERENCE_DEFAULTS, SQLITE, ORCHESTRATION, SUMMARIES, ENTITIES, RETRIEVAL } = require('./config/constants');
 const {parseRow, formatEpisodeText} = require('./utils')
+const logger = require('./utils/logger');

 module.exports = {
    getEnv, 
@@ -17,4 +18,7 @@ module.exports = {
    parseRow,
    formatEpisodeText,
    SUMMARIES,
+    ENTITIES,
+    logger,
+    RETRIEVAL,
 };
--- a/packages/shared/src/utils/logger.js
+++ b/packages/shared/src/utils/logger.js
@@ -0,0 +1,12 @@
+const LEVELS = { error: 0, warn: 1, info: 2, debug: 3 };
+
+const current = LEVELS[process.env.LOG_LEVEL?.toLowerCase()] ?? LEVELS.info;
+
+const logger = {
+    error: (...args) => current >= LEVELS.error && console.error('[ERROR]', ...args),
+    warn:  (...args) => current >= LEVELS.warn  && console.warn( '[WARN]',  ...args),
+    info:  (...args) => current >= LEVELS.info  && console.log(  '[INFO]',  ...args),
+    debug: (...args) => current >= LEVELS.debug && console.log(  '[DEBUG]', ...args),
+};
+
+module.exports = logger;
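The shared logger reads `LOG_LEVEL` once, when the module is first required, so the threshold is fixed for the life of the process. A hypothetical factory variant (not in the codebase) keeps the same levels and prefixes but takes the level and output sink as arguments, which makes the gating testable. It assumes Node 16.9+ for `Object.hasOwn`.

```javascript
const LEVELS = { error: 0, warn: 1, info: 2, debug: 3 };

// Hypothetical factory variant of packages/shared/src/utils/logger.js.
// Same levels and prefixes, but the threshold and the output sink are
// arguments, so a test can build loggers at different levels in one process.
function createLogger(level = 'info', sink = console) {
    const key = String(level).toLowerCase();
    // Object.hasOwn ignores inherited keys such as "constructor",
    // which a plain LEVELS[key] lookup would resolve to a function.
    const current = Object.hasOwn(LEVELS, key) ? LEVELS[key] : LEVELS.info;
    const emit = (name, method) => (...args) => {
        if (current >= LEVELS[name]) sink[method](`[${name.toUpperCase()}]`, ...args);
    };
    return {
        error: emit('error', 'error'),
        warn:  emit('warn', 'warn'),
        info:  emit('info', 'log'),
        debug: emit('debug', 'log'),
    };
}

// The shared module's behaviour corresponds roughly to:
//   module.exports = createLogger(process.env.LOG_LEVEL);
```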
--- a/test-fusion.js
+++ b/test-fusion.js
@@ -0,0 +1,67 @@
+// test-fusion.js
+const { RETRIEVAL } = require('./packages/shared/src/config/constants');
+
+function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
+    const k = RETRIEVAL.RRF_K;
+    const scores = new Map();
+    semanticEps.forEach((ep, i) => {
+        scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
+    });
+    keywordEps.forEach((ep, i) => {
+        const contrib = keywordWeight / (k + i + 1);
+        if (scores.has(ep.id)) {
+            scores.get(ep.id).score += contrib;
+        } else if (contrib > 0) {
+            scores.set(ep.id, { episode: ep, score: contrib });
+        }
+    });
+    return [...scores.values()]
+        .sort((a, b) => b.score - a.score)
+        .slice(0, limit)
+        .map(({ episode }) => episode);
+}
+
+// --- Test 1: episodes in both lists rank highest ---
+const semantic = [
+    { id: 1, user_message: 'ep1 — semantic only, rank 1' },
+    { id: 2, user_message: 'ep2 — in both lists, rank 2 semantic' },
+    { id: 3, user_message: 'ep3 — in both lists, rank 3 semantic' },
+];
+const keyword = [
+    { id: 3, user_message: 'ep3 — rank 1 FTS' },
+    { id: 2, user_message: 'ep2 — rank 2 FTS' },
+    { id: 4, user_message: 'ep4 — FTS only, rank 3' },
+];
+
+const result = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 1, limit: 5 });
+console.log('Test 1 — equal weights, episodes in both lists should rank highest:');
+result.forEach((ep, i) => console.log(`  ${i + 1}. id=${ep.id} "${ep.user_message}"`));
+console.assert(result[0].id === 2 || result[0].id === 3, 'FAIL: ep2 or ep3 should be rank 1');
+console.assert(!result.find(e => e.id === 1) || result.indexOf(result.find(e => e.id === 1)) > result.indexOf(result.find(e => e.id === 2)), 'FAIL: ep1 (semantic only) should rank below ep2');
+console.log('  PASS\n');
+
+// --- Test 2: keywordWeight:0 → pure semantic passthrough ---
+const result2 = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 0, limit: 5 });
+console.log('Test 2 — keywordWeight:0 should return only semantic results in original order:');
+result2.forEach((ep, i) => console.log(`  ${i + 1}. id=${ep.id}`));
+console.assert(result2.length === 3, `FAIL: expected 3, got ${result2.length}`);
+console.assert(result2[0].id === 1, 'FAIL: ep1 should be rank 1');
+console.assert(result2[1].id === 2, 'FAIL: ep2 should be rank 2');
+console.log('  PASS\n');
+
+// --- Test 3: limit is respected ---
+const result3 = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 1, limit: 2 });
+console.log('Test 3 — limit:2 should return exactly 2 results:');
+console.assert(result3.length === 2, `FAIL: expected 2, got ${result3.length}`);
+console.log('  PASS\n');
+
+// --- Test 4: no overlap → all unique episodes, ordered by individual contribution ---
+const semOnly = [{ id: 10, user_message: 'sem' }];
+const ftsOnly = [{ id: 20, user_message: 'fts' }];
+const result4 = fuseEpisodeResults(semOnly, ftsOnly, { semanticWeight: 1, keywordWeight: 1, limit: 5 });
+console.log('Test 4 — no overlap, both should appear:');
+console.assert(result4.length === 2, `FAIL: expected 2, got ${result4.length}`);
+console.assert(result4[0].id === 10, 'FAIL: semantic rank-1 should beat fts rank-1 (same weight, both rank 1, but semantic inserted first — tie goes to semantic)');
+console.log('  PASS\n');
+
+console.log('All tests passed.');
Author	SHA1	Message	Date
Storme-bit	e4908193bd	smarter context assembly implementation	2026-04-27 21:41:32 -07:00
Storme-bit	b58a4e4692	minor clean up	2026-04-27 20:17:05 -07:00
Storme-bit	055683424d	retrieval fusion	2026-04-27 07:03:46 -07:00
Storme-bit	27ad614130	retrieval fusion	2026-04-27 05:56:23 -07:00
Storme-bit	8ade5c68ca	retrieval fusion	2026-04-27 05:46:01 -07:00
Storme-bit	49982a85de	retrieval fusion	2026-04-27 05:21:43 -07:00
Storme-bit	9c6c5c9a42	entity extraction prompt	2026-04-27 03:50:13 -07:00
Storme-bit	c9cbac87ac	knowledge graph entity fixes	2026-04-27 03:41:56 -07:00
Storme-bit	1a97b19280	roadmap phase 1 complete	2026-04-27 03:10:39 -07:00
Storme-bit	9fe8e568cf	roadmap phase 1 complete	2026-04-27 00:28:42 -07:00
Storme-bit	5ad01c6ad8	clean up	2026-04-27 00:14:51 -07:00
Storme-bit	aac0923351	Merge branch 'main' of http://192.168.0.205:3100/storme/nexusai	2026-04-27 00:10:16 -07:00
Storme-bit	54218894c0	logger clean up	2026-04-27 00:09:16 -07:00
Storme-bit	66a95f4479	logger clean up	2026-04-27 00:07:51 -07:00
storme	78476e166f	Delete .claude/settings.local.json	2026-04-27 06:57:49 +00:00
Storme-bit	696ead29f8	chat/index.js cleanup	2026-04-26 23:04:31 -07:00
Storme-bit	45db47a584	error response consistency, human readible1	2026-04-26 23:00:55 -07:00
Storme-bit	095c9a623e	error response consistency, human readible1	2026-04-26 23:00:18 -07:00
Storme-bit	f5011fddca	logger updates	2026-04-26 22:29:57 -07:00
Storme-bit	86e78cc4c6	logger updates	2026-04-26 22:28:54 -07:00
Storme-bit	c86b565eed	code cleanup/hardening	2026-04-26 21:59:16 -07:00
Storme-bit	be1c38b654	code cleanup/hardening	2026-04-26 21:57:39 -07:00
Storme-bit	4f3b18de08	code cleanup/hardening	2026-04-26 21:53:33 -07:00
Storme-bit	43fa12899c	NexusAI roadmap addition	2026-04-26 21:14:27 -07:00
Storme-bit	84f01ef209	NexusAI roadmap addition	2026-04-26 21:14:04 -07:00
Storme-bit	a50a748bcf	NexusAI roadmap addition	2026-04-26 21:13:15 -07:00
Storme-bit	32e8a83233	NexusAI roadmap addition	2026-04-26 21:08:19 -07:00
Storme-bit	855de6d0af	project summaries addition	2026-04-26 21:02:42 -07:00
Storme-bit	fcaf0e651f	project summaries addition	2026-04-26 19:11:40 -07:00
Storme-bit	6cdee72af2	project summaries addition	2026-04-26 18:59:28 -07:00
Storme-bit	4c6bd1df2d	project summaries addition	2026-04-26 18:57:25 -07:00
Storme-bit	2429fedb2c	code clean up pass	2026-04-26 18:18:40 -07:00
Storme-bit	bdc5947fcb	code clean up pass	2026-04-26 05:38:47 -07:00
Storme-bit	785047a824	code clean up pass	2026-04-26 05:19:31 -07:00
Storme-bit	acda21317b	documentation updates for entity extraction and summarization	2026-04-21 03:50:38 -07:00
Storme-bit	32365e67f4	summarization fix	2026-04-21 03:05:24 -07:00
Storme-bit	59918d5733	summaries chat client	2026-04-21 02:52:31 -07:00
Storme-bit	01f35b7b82	summaries chat client	2026-04-21 02:42:18 -07:00
Storme-bit	21a7e5f3b5	extraction error logging	2026-04-21 01:07:31 -07:00
Storme-bit	c81a1cb20e	extraction error logging	2026-04-21 00:35:48 -07:00
Storme-bit	781bf8a615	extraction error logging	2026-04-21 00:28:13 -07:00
Storme-bit	b44d35e7cb	extraction error logging	2026-04-21 00:27:28 -07:00
Storme-bit	22686fca3c	extraction error logging	2026-04-21 00:26:41 -07:00