update documentation

2026-04-17 03:46:17 -07:00
parent 27e3c98304
commit 5145b9a7db
13 changed files with 822 additions and 794 deletions
--- a/.vs/slnx.sqlite
+++ b/.vs/slnx.sqlite
--- a/.vs/slnx.sqlite-journal
+++ b/.vs/slnx.sqlite-journal
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,13 +1,23 @@
 # NexusAI Documentation
-## Contents
+## Architecture
 - [Architecture Overview](architecture/overview.md)
- [Services](services/)
+
-  - [Shared Package](services/shared.md)
+## Services
-  - [Memory Service](services/memory-service.md)
+
-  - [Embedding Service](services/embedding-service.md)
+- [Shared Package](services/shared.md)
-  - [Inference Service](services/inference-service.md)
+- [Memory Service](services/memory-service.md)
-  - [Orchestration Service](services/orchestration-service.md)
+- [Embedding Service](services/embedding-service.md)
-  - [Chat Client](services/chat-client.md)
+- [Inference Service](services/inference-service.md)
- [Deployment](deployment/homelab.md)
+- [Orchestration Service](services/orchestration-service.md)
 - [Chat Client](services/chat-client.md)
 ## Reference
 - [API Routes](reference/api-routes.md) — all HTTP endpoints across all services
 - [Memory Isolation](reference/memory-isolation.md) — project-scoped memory model
 ## Deployment
 - [Homelab](deployment/homelab.md)
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -1,56 +1,80 @@
 # Architecture Overview
-NexusAI is a modular, memory-centric AI system designed for persistent, context-aware conversations. It separates concerns across different services that can be independently deployed and evolved.
+NexusAI is a modular, memory-centric AI assistant designed for persistent,
 context-aware conversations. It separates concerns across independent services
 that can be evolved and deployed separately.
 ## Core Design Principles
- **Decoupled layers:** memory, inference, and orchestration are independent of each other
+- **Decoupled layers** — memory, inference, and orchestration are independent of each other
- **Hybrid retrieval:** semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
+- **Hybrid retrieval** — semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
- **Home lab:** services are distributed across nodes according to available hardware and resources
+- **Project-scoped memory** — sessions can be grouped into projects with shared or isolated memory pools
 - **Home lab first** — services are distributed across nodes according to available hardware
 ## Memory Model
-Memory is split between SQLite and Qdrant, which work together as a pair:
+Memory is split between SQLite and Qdrant, which always work as a pair:
- **SQLite:** episodic interactions, entities, relationships, summaries
+- **SQLite** — episodic interactions, entities, relationships, summaries, sessions, projects
- **Qdrant:** vector embeddings for semantic similarity search
+- **Qdrant** — vector embeddings for semantic similarity search
-When recalling memory, Qdrant returns IDs and similarity scores, which are used to fetch
+When recalling memory, Qdrant returns IDs and similarity scores, which are used
-full content from SQLite. Neither SQLite nor Qdrant work in isolation.
+to fetch full content from SQLite. Neither store works in isolation.
 Episode embeddings carry a `{ sessionId, createdAt }` payload in Qdrant,
 enabling per-session and per-project filtering at search time. See
 `memory-isolation.md` for how project-scoped retrieval works.
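The two-step recall described above can be sketched roughly as follows. This is an illustration only, not the project's code: `hits` stands in for a Qdrant search response and `episodesById` for the SQLite episode table, and both names (along with `hydrateHits`) are hypothetical.

```javascript
// Hypothetical stand-ins: a Qdrant-style hit list and a SQLite-backed lookup.
function hydrateHits(hits, episodesById) {
  return hits
    .map(hit => ({ score: hit.score, episode: episodesById.get(hit.id) }))
    // Drop hits whose row no longer exists in the relational store.
    .filter(entry => entry.episode !== undefined)
    // Highest similarity first.
    .sort((a, b) => b.score - a.score);
}

const episodesById = new Map([
  [1, { id: 1, userMessage: 'Hello', aiResponse: 'Hi there!' }],
  [2, { id: 2, userMessage: 'What is Qdrant?', aiResponse: 'A vector store.' }],
]);
const hits = [
  { id: 2, score: 0.71 },
  { id: 9, score: 0.93 }, // no matching row, so it is discarded
  { id: 1, score: 0.88 },
];
const recalled = hydrateHits(hits, episodesById);
```

The vector store only contributes IDs and scores; everything a prompt actually needs comes from the relational lookup, which is why neither store is useful alone.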
 ## Hardware Layout
 | Node | Address | Role |
 |---|---|---|
-| Main PC | local | Primary inference (RTX A4000 16GB) |
+| Main PC | 192.168.0.79 | Primary inference — RTX A4000 16GB |
-| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant |
+| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant, Ollama |
-| Mini PC 2 | 192.168.0.205 | Orchestration service, Chat Client, Gitea |
+| Mini PC 2 | 192.168.0.205 | Orchestration service, Chat Client, Caddy, Authelia, Gitea |
 ## Service Communication
-All services expose a REST HTTP API. The orchestration service is the single entry point —
+All services expose a REST HTTP API. The orchestration service is the single
-clients do not talk directly to the memory or inference services.
+entry point — clients never talk directly to memory or inference services.
 ```
-Client
+Client (browser)
-└─► Orchestration (:4000)
+└─► Caddy (HTTPS + Authelia SSO)
-    ├─► Chat Client (static files, /srv/nexusai)
+    └─► Orchestration (:4000) — Mini PC 2
-    ├─► Memory Service (:3002)
+        ├─► Memory Service (:3002) — Mini PC 1
-    │     ├─► Qdrant (:6333)
+        │     ├─► SQLite (local file)
-    │     └─► SQLite
+        │     └─► Qdrant (:6333) — Mini PC 1
-    ├─► Embedding Service (:3003)
+        ├─► Embedding Service (:3003) — Mini PC 1
-    │     └─► Ollama
+        │     └─► Ollama (:11434) — Mini PC 1
-    └─► Inference Service (:3001)
+        ├─► Inference Service (:3001) — Main PC
-          └─► Ollama
+        │     └─► llama-server (:8080) — Main PC
        └─► Qdrant (:6333) — Mini PC 1 (direct — semantic search)
 ```
 Note: Orchestration queries Qdrant directly for semantic search (bypassing
 the memory service) but always fetches full episode content from the memory
 service by ID after the vector search.
 ## Technology Choices
 | Concern | Choice | Reason |
 |---|---|---|
-| Language | Node.js (JavaScript) | Familiar stack, async I/O suits service architecture |
+| Language | Node.js (CommonJS) | Familiar stack, async I/O suits service architecture |
 | Package management | npm workspaces | Monorepo with shared code, no publishing needed |
 | Vector store | Qdrant | Mature, Docker-native, excellent Node.js client |
-| Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user |
+| Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user scale |
-| LLM runtime | Ollama | Easiest local LLM management, serves embeddings too |
+| LLM inference | llama.cpp (`llama-server`) | Maximum GPU utilisation on RTX A4000, OpenAI-compatible API |
-| Version control | Gitea (self-hosted) | Code stays on local network |
+| Embeddings | Ollama (`nomic-embed-text`) | Co-located with memory service on Mini PC 1, 768-dim Cosine |
 | Reverse proxy | Caddy + Authelia | Automatic HTTPS, SSO/MFA for all exposed services |
 | Version control | Gitea (self-hosted) | Code stays on local network |
 ## Current State
 The core four-service architecture is complete and operational. Key capabilities:
 - **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
 - **Projects** — sessions grouped with shared or isolated memory pools
 - **Auto-naming** — sessions named automatically from first exchange via inference
 - **Project-scoped semantic search** — Qdrant filtered by project session IDs
 - **Chat client** — view-based UI with sidebar navigation, project views, session management
--- a/docs/deployment/homelab.md
+++ b/docs/deployment/homelab.md
@@ -7,50 +7,73 @@ services appropriate for its hardware.
 ## Mini PC 1 — 192.168.0.81
-Runs: Qdrant, Memory Service, Embedding Service
+Runs: Qdrant, Memory Service, Embedding Service, Ollama
 ```bash
-ssh username@192.168.0.81
+ssh storme@192.168.0.81
 cd ~/nexusai
 docker compose -f docker-compose.mini1.yml up -d  # Qdrant
-npm run memory
+npm run memory      # port 3002
-npm run embedding
+npm run embedding   # port 3003
 ollama serve        # port 11434 — must bind 0.0.0.0 (OLLAMA_HOST=0.0.0.0)
 ```
 > Ollama must be started with `OLLAMA_HOST=0.0.0.0` to accept connections
 > from other services on the LAN. Without this, embedding requests from the
 > memory service will be refused.
 ## Mini PC 2 — 192.168.0.205
-Runs: Gitea, Orchestration Service, Chat Client (via Caddy)
+Runs: Orchestration Service, Chat Client (via Caddy), Gitea, Caddy, Authelia
 ```bash
 ssh username@192.168.0.205
-cd ~/gitea
+```bash
-docker compose up -d        # Gitea
+ssh storme@192.168.0.205
 cd /opt/stacks/network
 docker compose up -d        # Caddy, Authelia, and other network services
-cd ~/nexusai
+cd ~/nexusAI
-npm run orchestration
+npm run orchestration       # port 4000
 ```
-## Main PC
+## Main PC — 192.168.0.79
-Runs: Ollama, Inference Service
+Runs: Inference Service, llama-server
-```bash
+
-ollama serve
+```powershell
-npm run inference
+# Start llama-server first — inference service depends on it
 .\llama-gpu\llama-server.exe `
  -m .\models\gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf `
  -ngl 99 --reasoning off --host 0.0.0.0 --port 8080 -c 64000
 # Then start inference service
 npm run inference            # port 3001
 ```
 ## Chat Client Deployment
-The chat client is a React + Vite app build to static files and served by Caddy on Mini PC 2 (Infrastructure node).  It does not run as a Node process
+The chat client is a React + Vite app built to static files and served by
 Caddy on Mini PC 2. It does not run as a Node process.
 ```bash
-# On dev machine or Mini PC 2 after git pull
+# On Mini PC 2 after git pull
 cd ~/nexusAI/packages/chat-client
-npm run build
+
 # Set production URL before building
 VITE_ORCHESTRATION_URL=https://nexus.jellystorm.com npm run build
 # Output lands in packages/chat-client/dist/
-# Caddy serves this directory directly via volume mount
+# Caddy serves this directory directly via Docker volume mount
 ```
-Caddy config (`/opt/docker/caddy/Caddyfile`):
+
 > Do NOT set `VITE_ORCHESTRATION_URL` during local dev — Vite's proxy handles
 > routing and setting the HTTPS domain will cause Authelia to intercept API
 > requests, producing confusing JSON parse errors.
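One way to picture the dev versus production behaviour is a small base-URL resolver. This is a sketch under assumptions: `resolveBaseUrl` is a hypothetical helper, and the real client may read the variable differently.

```javascript
// Hypothetical helper: pick the API base URL from a Vite-style env object.
function resolveBaseUrl(env) {
  const configured = (env.VITE_ORCHESTRATION_URL || '').trim();
  // An empty string means "same origin", so requests go through the dev-server proxy.
  return configured.replace(/\/+$/, '');
}

const devBase = resolveBaseUrl({});
const prodBase = resolveBaseUrl({
  VITE_ORCHESTRATION_URL: 'https://nexus.jellystorm.com/',
});
```

In a Vite app the `env` argument would typically be `import.meta.env`. Leaving the variable unset in development keeps requests relative, so they never reach Caddy or Authelia.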
 ## Caddy Configuration
 The Caddyfile on Mini PC 2 must include a handle block for each route prefix
 the client needs to reach. Current required blocks for NexusAI:
 ```caddy
 nexus.jellystorm.com {
    import authelia
@@ -63,6 +86,14 @@ nexus.jellystorm.com {
        reverse_proxy 192.168.0.205:4000
    }
    handle /models* {
        reverse_proxy 192.168.0.205:4000
    }
    handle /projects* {
        reverse_proxy 192.168.0.205:4000
    }
    handle {
        root * /srv/nexusai
        try_files {path} /index.html
@@ -71,18 +102,45 @@ nexus.jellystorm.com {
 }
 ```
-The Caddy container mounts the dist directory via Docker volume:
+When adding new top-level routes to the orchestration service, add a matching
 handle block here and reload Caddy:
 ```bash
 caddy reload --config /path/to/Caddyfile
 ```
 The Caddy container mounts the `dist` directory via Docker volume:
 ```yaml
 - /home/storme/nexusAI/packages/chat-client/dist:/srv/nexusai
 ```
 > After adding or changing volume mounts, a full `docker compose down caddy && docker compose up -d caddy`
-> is required. Caddyfile-only changes only need `docker compose restart caddy`.
+> is required. Caddyfile-only changes only need `caddy reload`.
 ## Environment Files
-Each node needs a `.env` file in the relevant service package directory.
+Each service needs a `.env` file in its package directory. These are not
-These are not committed to git. See each service's documentation for
+committed to git. See each service's documentation for required variables.
-required variables.
+
 | Service | Location | Key Variables |
 |---|---|---|
 | Memory | `packages/memory-service/.env` | `SQLITE_PATH`, `QDRANT_URL`, `EMBEDDING_SERVICE_URL` |
 | Embedding | `packages/embedding-service/.env` | `OLLAMA_URL`, `EMBEDDING_MODEL` |
 | Inference | `packages/inference-service/.env` | `INFERENCE_PROVIDER`, `INFERENCE_URL`, `DEFAULT_MODEL` |
 | Orchestration | `packages/orchestration-service/src/.env` | `MEMORY_SERVICE_URL`, `EMBEDDING_SERVICE_URL`, `INFERENCE_SERVICE_URL`, `QDRANT_URL`, `MODELS_MANIFEST_PATH` |
 | Chat client | `packages/chat-client/.env` | `VITE_ORCHESTRATION_URL` (production builds only) |
 ## Models Manifest
 The models manifest (`models.json`) lives on the Main PC alongside the model
 files, accessible to orchestration via an SMB mount at `/mnt/nexus-models`.
 ```json
 [
  { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
 ]
 ```
 `value` must exactly match the model name as reported by `llama-server`
 (including `.gguf` extension). No service restart needed to pick up changes.
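Because a mismatched `value` may fail silently, a quick shape check before saving the manifest can help. The validator below is a hypothetical sketch; the `.gguf` suffix check is an assumption drawn from the note above, not a rule enforced by any service.

```javascript
// Hypothetical validator for the manifest shape shown above.
function validateManifest(entries) {
  if (!Array.isArray(entries)) return ['manifest must be an array'];
  const errors = [];
  entries.forEach((entry, i) => {
    const item = entry || {};
    if (typeof item.value !== 'string' || !item.value.endsWith('.gguf')) {
      errors.push(`entry ${i}: "value" must be a string ending in .gguf`);
    }
    if (typeof item.label !== 'string' || item.label.length === 0) {
      errors.push(`entry ${i}: "label" must be a non-empty string`);
    }
  });
  return errors;
}

const manifestErrors = validateManifest([
  { value: 'gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf', label: 'Gemma 4 26B Claude Distill' },
  { value: 'missing-extension', label: '' },
]);
```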
--- a/docs/homelab/homelab-overview.md
+++ b/docs/homelab/homelab-overview.md
@@ -39,21 +39,21 @@ All external access is routed through **Caddy** (reverse proxy) with **Authelia*
 |------|--------|
 | GPU | NVIDIA RTX A4000 |
 | Role | Primary AI inference node |
-| Key Services | Ollama (inference) |
+| Key Services | llama-server (llama.cpp), Inference Service |
 ### Mini PC 1 — Media Node (`192.168.0.81`)
 | Spec | Detail |
 |------|--------|
 | GPU | NVIDIA RTX 5050 |
 | Role | Media services, embeddings, vector storage |
-| Key Services | Jellyfin, Nextcloud, Qdrant, arr stack, NexusAI memory/embedding |
+| Key Services | Jellyfin, Nextcloud, Qdrant, arr stack, NexusAI memory/embedding, Ollama |
 | Storage | NVMe (OS) + 3x external HDDs (see [Storage Layout](#storage-layout)) |
 ### Mini PC 2 — Infrastructure Node (`192.168.0.205`)
 | Spec | Detail |
 |------|--------|
-| Role | Network management, monitoring, auth, DNS, git |
+| Role | Network management, monitoring, auth, DNS, git, NexusAI orchestration |
-| Key Services | Caddy, Authelia, Tailscale, Pihole, Grafana, Gitea |
+| Key Services | Caddy, Authelia, Tailscale, Pihole, Grafana, Gitea, NexusAI orchestration |
 | Storage | NVMe (OS only) |
 ---
@@ -155,7 +155,8 @@ All external access is routed through **Caddy** (reverse proxy) with **Authelia*
 | Service | Notes |
 |---------|-------|
-| Ollama | Runs LLM inference using the RTX A4000. Also serves `nomic-embed-text` embeddings (768-dim vectors) consumed by NexusAI's embedding service on Mini PC 1. |
+| llama-server (llama.cpp) | Primary LLM inference using the RTX A4000. Started manually before the inference service. Serves the OpenAI-compatible API on port 8080. |
 | Ollama | Serves `nomic-embed-text` embeddings (768-dim vectors) consumed by NexusAI's embedding service on Mini PC 1. |
 ---
@@ -234,7 +235,7 @@ Phase 1 focused on establishing a stable, secure, and observable foundation:
 - ✅ Self-hosted git (Gitea)
 - ✅ Media stack fully operational (Jellyfin, arr stack, Nextcloud)
 - ✅ Download pipeline with VPN isolation (Gluetun + qBittorrent)
- ✅ NexusAI foundation services running (Qdrant, Ollama)
+- ✅ NexusAI foundation services running (Qdrant, Ollama, llama.cpp)
 - ✅ Container management across nodes (Portainer + agent)
 ---
@@ -249,6 +250,6 @@ Phase 2 shifts focus to resilience, security hardening, and smart home integrati
 - **Additional security hardening** — Audit exposed services, tighten firewall rules, review Authelia policies
 - **IP webcam integration** — Add camera feeds into the homelab ecosystem
 - **Home Assistant** — Integrate smart home automation and sensor data
- **Continued NexusAI development** — Entities layer, embedding service, inference and orchestration buildout
+- **Continued NexusAI development** — Entity extraction pipeline, summaries layer, SettingsView implementation
 > This section will be expanded as Phase 2 planning matures.
--- a/docs/services/API-routes.md
+++ b/docs/services/API-routes.md
@@ -0,0 +1,283 @@
 # API Routes
 All HTTP endpoints across NexusAI services. Clients communicate only with
 the orchestration service (port 4000) — memory service routes are listed
 here for reference and direct debugging use.
 ---
 ## Orchestration Service — port 4000
 ### Health
 | Method | Path | Description |
 |---|---|---|
 | GET | /health | Service health check |
 ### Chat
 | Method | Path | Description |
 |---|---|---|
 | POST | /chat | Send a message, receive full response |
 | POST | /chat/stream | Send a message, receive SSE token stream |
 **POST /chat and POST /chat/stream — request body:**
 ```json
 {
  "sessionId": "your-session-uuid",
  "message": "Hello, my name is Tim.",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "temperature": 0.7
 }
 ```
 `model` and `temperature` are optional.
 **POST /chat — response:**
 ```json
 {
  "sessionId": "your-session-uuid",
  "response": "Hello Tim! How can I help you today?",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "tokenCount": 87
 }
 ```
 **POST /chat/stream — response (SSE):**
 ```
 data: {"text":"Hello"}
 data: {"text":" Tim"}
 data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":87}
 ```
 ### Sessions
 | Method | Path | Description |
 |---|---|---|
 | GET | /sessions | Paginated session list |
 | GET | /sessions/:sessionId/history | Paginated episode history for a session |
 | PATCH | /sessions/:sessionId | Update session name and/or project assignment |
 | DELETE | /sessions/:sessionId | Delete session and all its episodes |
 **GET /sessions — query params:**
 | Param | Default | Description |
 |---|---|---|
 | limit | 20 | Sessions per page |
 | offset | 0 | Pagination offset |
 | projectId | — | Filter by project (integer ID) |
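As a rough illustration of how these parameters combine, the helper below builds the request path with the documented defaults. `sessionsPath` is a hypothetical name, not a function from the codebase.

```javascript
// Hypothetical helper: build the GET /sessions path from the documented params.
function sessionsPath({ limit = 20, offset = 0, projectId } = {}) {
  const params = new URLSearchParams({
    limit: String(limit),
    offset: String(offset),
  });
  // projectId is only sent when filtering by project.
  if (projectId !== undefined) params.set('projectId', String(projectId));
  return `/sessions?${params.toString()}`;
}

const firstPage = sessionsPath();
const projectPage = sessionsPath({ limit: 10, offset: 10, projectId: 3 });
```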
 **PATCH /sessions/:sessionId — body:**
 ```json
 { "name": "My Session", "projectId": 3 }
 ```
 Either `name` or `projectId` is required. Both can be sent together.
 Returns the updated session object.
 **GET /sessions/:sessionId/history — query params:**
 | Param | Default | Description |
 |---|---|---|
 | limit | 20 | Episodes per page |
 | offset | 0 | Pagination offset |
 Returns `{ sessionId, episodes: [...] }`. Episodes ordered newest first.
 ### Projects
 | Method | Path | Description |
 |---|---|---|
 | GET | /projects | Get all projects |
 | POST | /projects | Create a new project |
 | PATCH | /projects/:id | Update a project |
 | DELETE | /projects/:id | Delete a project (nulls session assignments) |
 **POST /projects — body:**
 ```json
 {
  "name": "My Project",
  "description": "Optional description",
  "colour": "#3d3a79",
  "icon": null,
  "isolated": 0
 }
 ```
 `name` is required. All other fields optional. `isolated` is `0` or `1`.
 Returns `201` with the created project object.
 **PATCH /projects/:id — body:** same fields as POST, all optional.
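The field rules above could be checked with something like the following. This is a sketch with hypothetical names (`validateProjectBody`, the `partial` option); the actual services may validate differently.

```javascript
// Hypothetical validator: POST requires name, PATCH (partial) does not.
function validateProjectBody(body, { partial = false } = {}) {
  const errors = [];
  if (!partial && (typeof body.name !== 'string' || body.name.trim() === '')) {
    errors.push('name is required');
  }
  // isolated is stored as an integer flag, not a boolean.
  if (body.isolated !== undefined && body.isolated !== 0 && body.isolated !== 1) {
    errors.push('isolated must be 0 or 1');
  }
  return errors;
}

const createErrors = validateProjectBody({ description: 'No name', isolated: true });
const patchErrors = validateProjectBody({ colour: '#3d3a79' }, { partial: true });
```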
 ### Models
 | Method | Path | Description |
 |---|---|---|
 | GET | /models | Available models from `models.json` manifest |
 Returns array: `[{ "value": "model-name.gguf", "label": "Display Name" }]`
 ---
 ## Memory Service — port 3002
 Direct access is for debugging only. All client traffic goes through
 orchestration.
 ### Health
 | Method | Path | Description |
 |---|---|---|
 | GET | /health | Service health check |
 ### Sessions
 | Method | Path | Description |
 |---|---|---|
 | POST | /sessions | Create a new session |
 | GET | /sessions | Paginated session list with optional projectId filter |
 | GET | /sessions/:id | Get session by internal ID |
 | GET | /sessions/by-external/:externalId | Get session by external ID |
 | PATCH | /sessions/by-external/:externalId | Update session fields |
 | DELETE | /sessions/by-external/:externalId | Delete session (cascades to episodes) |
 > Route ordering: `by-external/:externalId` must be defined before `/:id`
 > to prevent `by-external` being captured as an ID param.
 **POST /sessions — body:**
 ```json
 { "externalId": "unique-uuid", "metadata": {} }
 ```
 **PATCH /sessions/by-external/:externalId — body:**
 ```json
 { "name": "Session Name", "projectId": 3 }
 ```
 Both fields are optional. Only provided fields are updated — other fields
 are not touched.
 ### Episodes
 | Method | Path | Description |
 |---|---|---|
 | POST | /episodes | Create episode + auto-embed into Qdrant |
 | GET | /episodes/search?q=&limit= | FTS keyword search across all episodes |
 | GET | /episodes/:id | Get episode by ID |
 | GET | /sessions/:id/episodes?limit=&offset= | Paginated episodes for a session |
 | DELETE | /episodes/:id | Delete an episode |
 > Route ordering: `/episodes/search` must be defined before `/episodes/:id`.
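The ordering requirement follows from first-match routing. The tiny matcher below is not the router the services use; it is a self-contained sketch that shows how a parameterised route registered first swallows the literal `search` segment.

```javascript
// Minimal first-match router, for illustration only.
function matchRoute(routes, path) {
  for (const route of routes) {
    const match = route.pattern.exec(path);
    if (match) return { name: route.name, params: match.groups || {} };
  }
  return null;
}

const searchRoute = { name: 'search', pattern: /^\/episodes\/search$/ };
const byIdRoute = { name: 'byId', pattern: /^\/episodes\/(?<id>[^/]+)$/ };

// Literal route first: the request reaches the search handler.
const correct = matchRoute([searchRoute, byIdRoute], '/episodes/search');
// Parameterised route first: "search" is captured as an episode ID.
const shadowed = matchRoute([byIdRoute, searchRoute], '/episodes/search');
```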
 **POST /episodes — body:**
 ```json
 {
  "sessionId": 1,
  "userMessage": "Hello",
  "aiResponse": "Hi there!",
  "tokenCount": 10
 }
 ```
 ### Projects
 | Method | Path | Description |
 |---|---|---|
 | POST | /projects | Create a new project |
 | GET | /projects | Get all projects |
 | GET | /projects/:id | Get project by ID |
 | PATCH | /projects/:id | Update a project |
 | DELETE | /projects/:id | Delete project + null session assignments |
 Same request/response shape as orchestration `/projects` above.
 ### Entities
 | Method | Path | Description |
 |---|---|---|
 | POST | /entities | Upsert entity (creates or updates by name + type) |
 | GET | /entities/by-type/:type | All entities of a given type |
 | GET | /entities/:id | Get entity by ID |
 | DELETE | /entities/:id | Delete entity (cascades to relationships) |
 > Route ordering: `/entities/by-type/:type` must be before `/entities/:id`.
 **POST /entities — body:**
 ```json
 {
  "name": "NexusAI",
  "type": "project",
  "notes": "My AI memory project",
  "metadata": {}
 }
 ```
 ### Relationships
 | Method | Path | Description |
 |---|---|---|
 | POST | /relationships | Upsert a relationship between two entities |
 | GET | /entities/:id/relationships | All relationships for an entity |
 | DELETE | /relationships | Delete a specific relationship |
 **POST /relationships — body:**
 ```json
 { "fromId": 1, "toId": 2, "label": "uses", "metadata": {} }
 ```
 **DELETE /relationships — body:**
 ```json
 { "fromId": 1, "toId": 2, "label": "uses" }
 ```
 Relationships are identified by the composite key `(fromId, toId, label)`.
 Delete uses request body rather than URL params since this three-part key
 is awkward to encode in a path.
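The composite-key behaviour can be modelled with an in-memory map. This is a hypothetical sketch (`RelationshipStore` and `relationshipKey` are illustrative names), not how the memory service stores rows in SQLite.

```javascript
// Serialise the three-part key so it can index a Map.
const relationshipKey = ({ fromId, toId, label }) =>
  JSON.stringify([fromId, toId, label]);

class RelationshipStore {
  constructor() {
    this.rows = new Map();
  }
  // Upsert: the same (fromId, toId, label) overwrites rather than duplicates.
  upsert(rel) {
    this.rows.set(relationshipKey(rel), { metadata: {}, ...rel });
  }
  // Returns true if a row was removed, false if nothing matched.
  remove(rel) {
    return this.rows.delete(relationshipKey(rel));
  }
  get size() {
    return this.rows.size;
  }
}

const store = new RelationshipStore();
store.upsert({ fromId: 1, toId: 2, label: 'uses' });
store.upsert({ fromId: 1, toId: 2, label: 'uses', metadata: { since: 2026 } });
store.upsert({ fromId: 1, toId: 2, label: 'owns' });
```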
 ---
 ## Embedding Service — port 3003
 | Method | Path | Description |
 |---|---|---|
 | GET | /health | Service health check |
 | POST | /embed | Embed a single text string |
 | POST | /embed/batch | Embed an array of text strings |
 **POST /embed — body:**
 ```json
 { "text": "Hello from NexusAI" }
 ```
 **POST /embed — response:**
 ```json
 { "embedding": [0.123, -0.456, ...], "model": "nomic-embed-text", "dimensions": 768 }
 ```
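The architecture overview describes the embeddings as 768-dimensional with Cosine distance. For readers unfamiliar with the metric, here is a minimal sketch of cosine similarity between two embedding arrays; in the running system Qdrant performs this comparison, so the function is purely illustrative.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const same = cosineSimilarity([1, 0], [1, 0]);
const orthogonal = cosineSimilarity([1, 0], [0, 1]);
const opposite = cosineSimilarity([1, 0], [-1, 0]);
```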
 ---
 ## Inference Service — port 3001
 | Method | Path | Description |
 |---|---|---|
 | GET | /health | Health check — reports active provider and model |
 | POST | /complete | Full completion — awaits entire response |
 | POST | /complete/stream | Streaming completion via SSE |
 **POST /complete — body:**
 ```json
 {
  "prompt": "What is the capital of France?",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "temperature": 0.7,
  "maxTokens": 1024
 }
 ```
 All fields except `prompt` are optional.
 **POST /complete — response:**
 ```json
 {
  "text": "The capital of France is Paris.",
  "model": "gemma-4-26B...gguf",
  "done": true,
  "evalCount": 8,
  "promptEvalCount": 41
 }
 ```
--- a/docs/services/Memory-isolation.md
+++ b/docs/services/Memory-isolation.md
@@ -0,0 +1,128 @@
 # Memory Isolation
 NexusAI implements project-scoped memory — sessions belonging to the same
 project can share semantic context, and isolated projects can be restricted
 from drawing on memory outside the project. This document describes how the
 system works end-to-end.
 ## Concepts
 **Session** — a single conversation thread. Identified by `external_id`.
 **Project** — a named grouping of sessions. Has an `isolated` flag (0 or 1).
 **Semantic search** — at inference time, the user's message is embedded and
 compared against past episodes in Qdrant to surface relevant context. The
 scope of this search is controlled by the project context.
 ## Semantic Search Scope
 | Session state | Semantic search scope |
 |---|---|
 | No project | Own session's episodes only |
 | Assigned to a non-isolated project | All episodes across all sessions in the project |
 | Assigned to an isolated project | All episodes within the project only |
 | Removed from a project | Own session's episodes only (from that point) |
 Sessions with no project assigned behave the same as they always have —
 only their own past episodes are searched.
 ## How It Works
 ### Step 1 — Project context resolution (orchestration)
 In `chat/index.js`, immediately after session resolution:
 ```js
 let projectSessionIds = null;
 if (session.project_id) {
  const project = await memory.getProject(session.project_id);
  if (project) {
    const projectSessions = await memory.getProjectSessions(session.project_id);
    projectSessionIds = projectSessions.map(s => s.id);
  }
 }
 ```
 If the session belongs to any project (isolated or not), `projectSessionIds`
 is populated with the internal integer IDs of all sessions in that project.
 For **non-isolated projects**, this expands the search to all project sessions.  
 For **isolated projects**, the same set is used but the intent is restriction
 — since `projectSessionIds` only contains project sessions, no external
 episodes can appear.
 Both cases use the same code path — the `isolated` flag does not change the
 query logic, only the conceptual meaning.
 ### Step 2 — Qdrant filter construction
 In `services/qdrant.js`, `searchEpisodes` builds the filter:
 ```js
 if (projectSessionIds) {
  body.filter = {
    should: projectSessionIds.map(id => ({
      key: 'sessionId', match: { value: id }
    }))
  };
 } else if (sessionId) {
  body.filter = { must: [{ key: 'sessionId', match: { value: sessionId } }] };
 }
 ```
 A `should` clause is Qdrant's logical OR: a point matches if it satisfies at
 least one of the listed conditions, which is equivalent to SQL
 `WHERE sessionId IN (...)`. When `projectSessionIds` is set, the
 single-session filter is not used.
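To make the two branches easier to inspect in isolation, the same logic can be pulled into a pure function. `buildEpisodeFilter` is a hypothetical name; the filter shapes mirror the snippet above.

```javascript
// Hypothetical pure version of the filter logic shown above.
function buildEpisodeFilter({ projectSessionIds = null, sessionId = null } = {}) {
  if (projectSessionIds) {
    // Project scope: match any session that belongs to the project.
    return {
      should: projectSessionIds.map(id => ({
        key: 'sessionId',
        match: { value: id },
      })),
    };
  }
  if (sessionId) {
    // No project: restrict to the current session only.
    return { must: [{ key: 'sessionId', match: { value: sessionId } }] };
  }
  return undefined; // no filter: search everything
}

const projectFilter = buildEpisodeFilter({ projectSessionIds: [4, 7], sessionId: 4 });
const sessionFilter = buildEpisodeFilter({ sessionId: 4 });
```

Qdrant also accepts a single `match: { any: [...] }` condition for the same "one of these values" test, which may be tidier for large projects; that is an alternative, not what the documented code does.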
 ### Step 3 — Episode payloads
 Every episode upserted into Qdrant carries `{ sessionId, createdAt }` in its
 payload. `sessionId` here is the **internal integer ID** from SQLite. This
 is what the Qdrant filter matches against.
 This means the filter works correctly regardless of when episodes were created
 or when a session was added to a project: the payload never changes, and
 project membership is resolved at query time.
 ## Important Behaviours
 **Pre-existing episodes are included immediately.** When a session is added
 to a project and a new message is sent, Qdrant can match all of that session's
 existing episodes since the filter only requires the `sessionId` to be in the
 project's session list.
 **Removing a session from a project takes effect immediately.** On the next
 message, `getProjectSessions` will not include that session's ID, so its
 episodes disappear from the semantic search scope.
 **New sessions created from ProjectView are assigned after the first message.**
 The `useChat` hook writes the `project_id` assignment via `updateSession` after
 `onDone` fires. There is a brief window during the first message where the
 session has no project assigned. The project is correctly applied from the
 second message onward.
 ## Isolated vs Non-Isolated
 The `isolated` flag is stored on the project but does not currently change the
 query logic — both isolated and non-isolated projects result in a
 `projectSessionIds` filter. The distinction is semantic and enforced by
 the project's membership:
 - **Non-isolated** — intentionally draws from all sessions in the project,
  creating a shared memory pool for related conversations
 - **Isolated** — by design contains only sessions explicitly added to it,
  so the same filter naturally restricts context to project-only episodes
 If cross-project contamination becomes a concern (e.g. a session accidentally
 added to the wrong project), removing that session from the project
 immediately restores isolation.
 ## Qdrant Payload Structure
 Episodes are stored with this payload:
 ```json
 { "sessionId": 42, "createdAt": 1776080188 }
 ```
 `sessionId` is the SQLite `sessions.id` integer, not the `external_id` UUID.
 This is important when building filters — always use internal IDs.
--- a/docs/services/chat-client.md
+++ b/docs/services/chat-client.md
@@ -55,10 +55,6 @@ VITE_ORCHESTRATION_URL=https://nexus.jellystorm.com
 during local development, bypassing Caddy and Authelia entirely:
 ```js
 // vite.config.js
 import { defineConfig } from 'vite';
 import react from '@vitejs/plugin-react';
 export default defineConfig({
  plugins: [react()],
  server: {
@@ -72,7 +68,8 @@ export default defineConfig({
 });
 ```
-If new routes are added to the orchestration service, add them here too.
+When adding new top-level routes to the orchestration service, add a matching
 entry here too.
 ## Internal Structure
@@ -93,12 +90,13 @@ src/
 │   ├── Sidebar.jsx          # Left sidebar — projects, recent chats, navigation
 │   ├── ChatWindow.jsx       # Centre panel — message thread and input bar
 │   ├── MessageBubble.jsx    # Individual message bubble (user or assistant)
-│   ├── InfoPanel.jsx        # Right panel — model selector and session metadata
+│   ├── InfoPanel.jsx        # Right panel — model selector and session metadata (slide-in)
-│   ├── SessionModal.jsx     # Modal for session rename and delete confirmation
+│   ├── SessionModal.jsx     # Modal for session rename, project assignment, delete
-│   ├── ProjectModal.jsx     # Modal for project create, edit, and delete confirmation
+│   ├── ProjectModal.jsx     # Modal for project create, edit, delete
 │   ├── AllChatsView.jsx     # Full paginated session list with multi-select bulk delete
 │   ├── AllProjectsView.jsx  # Project tile grid with create/edit/delete
-│   └── SettingsView.jsx     # Settings placeholder (sections: Appearance, Memory, Models, About)
+│   ├── ProjectView.jsx      # Individual project — session list, new chat button
 │   └── SettingsView.jsx     # Settings placeholder (Appearance, Memory, Models, About)
 ├── index.css                # Global reset, CSS variables, utility classes
 └── main.jsx                 # React entry point
 ```
@@ -107,9 +105,9 @@ src/
 ## Layout
-The app uses a view-based layout. `App.jsx` manages a `view` state
+The app uses a view-based layout. `App.jsx` manages a `view` state string
-(`'chat' | 'all-chats' | 'all-projects' | 'settings'`) that controls which
+that controls which main panel is rendered. The left sidebar and right info
-main panel is rendered. The left sidebar and right info panel are always present.
+panel are persistent across all views.
 ```
 ┌──────────────────┬──────────────────────────────┐
@@ -117,9 +115,9 @@ main panel is rendered. The left sidebar and right info panel are always present
 │  (collapsible)   │                               │
 │                  │  chat         → ChatWindow    │
 │ + New Chat       │  all-chats    → AllChatsView  │
-│ ⊞ New Project    │  all-projects → AllProjectsView│
+│ ⊞ View Projects  │  all-projects → AllProjectsView│
-│                  │  settings     → SettingsView  │
+│                  │  project      → ProjectView   │
-│ PROJECTS ▾       │                               │
+│ PROJECTS ▾       │  settings     → SettingsView  │
 │  [tile] [tile]   │                               │
 │  All Projects →  │                               │
 │                  │                               │
@@ -132,10 +130,22 @@ main panel is rendered. The left sidebar and right info panel are always present
 └──────────────────┴──────────────────────────────┘
 ```
-The sidebar collapses to a 48px icon rail. The right info panel (`InfoPanel`)
+The sidebar collapses to a 48px icon rail. The right `InfoPanel` slides in
-slides in from the right over the main area using `transform: translateX()` —
+from the right using `transform: translateX()` — hidden by default, toggled
-it is hidden by default (`rightOpen` starts `false`) and toggled via a button
+via the `⊹` button in the `ChatWindow` header.
-in the `ChatWindow` header.
+
 ## View Routing
 | View | Component | Trigger |
 |---|---|---|
 | `'chat'` | `ChatWindow` | Default; selecting a session; new chat |
 | `'all-chats'` | `AllChatsView` | "All Chats →" or ☰ icon in collapsed rail |
 | `'all-projects'` | `AllProjectsView` | "View Projects" button or ⊞ icon |
 | `'project'` | `ProjectView` | Clicking a project tile in the sidebar |
 | `'settings'` | `SettingsView` | Settings button or ⚙ icon |
 `activeProject` state in `App.jsx` tracks which project `ProjectView` is
 displaying. Set via `onSelectProject` before navigating to `'project'`.
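The routing table above can be sketched as a plain lookup. This is a minimal illustration, not the code in `App.jsx`: `VIEW_COMPONENTS`, `resolveView`, and `openProject` are hypothetical names, and the fallback to `'chat'` for unknown views is an assumption. Only the view strings, the component names, and the rule that `activeProject` is set before navigating come from the text.

```javascript
// Hypothetical lookup mirroring the View Routing table; values are component names.
const VIEW_COMPONENTS = {
  'chat': 'ChatWindow',
  'all-chats': 'AllChatsView',
  'all-projects': 'AllProjectsView',
  'project': 'ProjectView',
  'settings': 'SettingsView',
};

// Resolve a view string to a component name.
// Assumption: unknown views fall back to the default 'chat' view.
function resolveView(view) {
  return VIEW_COMPONENTS[view] ?? VIEW_COMPONENTS['chat'];
}

// Assumed state shape: { view, activeProject }. The project is stored in the
// same update that switches the view, so 'project' never renders without one.
function openProject(state, project) {
  return { ...state, activeProject: project, view: 'project' };
}
```

Keeping the mapping in one object means a new view needs one table row and one map entry.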
 ## CSS Architecture
@@ -181,91 +191,47 @@ rules, inline styles for dynamic prop-driven values.
 | `.label-upper` | Uppercase section label style |
 | `.truncate` | Text overflow ellipsis |
 ## API Layer
 All orchestration calls are centralised in `src/api/orchestration.js`:
 | Function | Method | Path | Description |
 |---|---|---|---|
 | `fetchSessions` | GET | /sessions | Load session list for sidebar |
 | `fetchSessionHistory` | GET | /sessions/:id/history | Load episode history on session select |
 | `sendMessage` | POST | /chat | Send message, await full response |
 | `streamMessage` | POST | /chat/stream | Send message, receive SSE token stream |
 | `fetchModels` | GET | /models | Load available models from manifest |
 | `renameSession` | PATCH | /sessions/:id | Rename a session |
 | `deleteSession` | DELETE | /sessions/:id | Delete a session |
 | `fetchProjects` | GET | /projects | Load project list |
 | `createProject` | POST | /projects | Create a new project |
 | `updateProject` | PATCH | /projects/:id | Update a project |
 | `deleteProject` | DELETE | /projects/:id | Delete a project |
 `streamMessage` returns an abort function — call it to cancel a stream mid-flight.
 Uses a buffer pattern to handle SSE chunks that may span multiple network packets.
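The buffer pattern mentioned above can be sketched as follows. This is an illustration under assumptions, not the code in `src/api/orchestration.js`: `createSseBuffer` is a hypothetical helper, and skipping the `[DONE]` sentinel is an assumed behaviour.

```javascript
// Hypothetical helper: returns a push(chunk) function that buffers partial
// SSE lines and calls onEvent once for each complete `data:` line.
function createSseBuffer(onEvent) {
  let buffer = '';
  return function push(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    // The last element is either '' or an incomplete line; keep it for the next chunk.
    buffer = lines.pop();
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith('data:')) continue; // blank separators, comments
      const payload = trimmed.slice(5).trim();
      if (payload === '' || payload === '[DONE]') continue; // assumed sentinel handling
      onEvent(JSON.parse(payload));
    }
  };
}
```

In a browser, the chunks would typically come from `response.body.getReader()` decoded with a `TextDecoder`, and the abort function could wrap `AbortController.abort()`. Both are standard Web APIs, but how `streamMessage` wires them up is not shown in this document.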
 ## Streaming
-The chat input sends messages via `POST /chat/stream`. Tokens arrive as SSE events:
+Messages are sent via `POST /chat/stream`. Tokens arrive as SSE events and
 are written into the active assistant bubble token by token via
 `updateLastMessage`. The blinking cursor in `MessageBubble` is shown while
 `message.streaming === true`.
-```
+The `sendMessage` function exposed by `useChat` accepts an optional `projectId`. After
-data: {"text":"Hello"}
+the first message completes in a new session, if `projectId` is set,
-data: {"text":" Tim"}
+`updateSession` is called to write the project assignment to the backend.
 data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87}
 ```
 An empty assistant bubble is appended immediately when the stream opens, then
 updated token by token using `updateLastMessage`. The blinking cursor in
 `MessageBubble` is shown while `message.streaming === true` and disappears
 when the done event is received. Model name and token count from the done
 event are stored in `useChat` state and displayed in the InfoPanel.
 ## Dynamic Model Selector
 Available models are fetched from `GET /models` on mount via the `useModels` hook.
 The hook initialises with `FALLBACK_MODELS` from `constants.js` and replaces them
 with the server response on success. If the fetch fails, the fallback list is used
 silently — a warning is logged to the console.
 To add a model, update `models.json` on the main PC — no client rebuild needed.
 `FALLBACK_MODELS` in `constants.js` should be kept in sync with `models.json`
 as a reasonable last-resort list in case the endpoint is unreachable.
 ## Session Management
-Sessions are identified by `external_id` — a UUID generated client-side via the
+Sessions are identified by `external_id` — a UUID generated client-side via
-`uuid` package. New sessions are created locally and auto-registered in the memory
+the `uuid` package. New sessions are created locally and auto-registered in
-service on the first message. The session list refreshes after each completed
+the memory service on the first message. The session list refreshes after
-response to surface newly created sessions.
+each completed response to surface newly created sessions.
-### Session Name Display
+### Auto-naming
-The chat header and session rows both display `session.name` if set, falling back
+After the first exchange completes, orchestration fires a secondary inference
-to `session.external_id` if no name has been assigned:
+call with a short naming prompt (max 20 tokens, temperature 0.3). The result
 is written back as `session.name`. The client fires a second `refreshSessions`
 after a 3-second delay to pick up the name once written.
-```js
+Manually renamed sessions are never overwritten — the `!session.name` guard
-activeSession.name || activeSession.external_id
+in `chat/index.js` prevents this.
 ```
 ### Session Actions
-Session rows in the sidebar support rename and delete via two entry points:
+Session rows support rename, project assignment, and delete via:
 - **Hover** — reveals ✎ and ✕ icon buttons alongside the row
 - **Right-click** — context menu with the same actions
- **Hover** — reveals ✎ (rename) and ✕ (delete) icon buttons alongside the row
+`SessionModal` handles rename and project assignment together in `settings`
- **Right-click** — opens a context menu with the same actions
+mode, and delete confirmation in `confirm-delete` mode.
 Both trigger `SessionModal` — a shared modal component with two modes:
 | Mode | Trigger | Behaviour |
 |---|---|---|
 | `settings` | Rename button / context menu rename | Shows name input, saves on Enter or Save button |
 | `confirm-delete` | Delete button / context menu delete | Shows confirmation dialog, requires explicit Delete click |
 Actions are disabled on unsaved (new) sessions that haven't had a first message sent yet.
 ### Active Session Clearing on Delete
-When the deleted session is the currently active one, `App.jsx` detects the match
+When the deleted session is the currently active one, `App.jsx` clears the
-and calls `selectSession(null)` to clear the chat window before refreshing the list:
+chat window before refreshing the list:
 ```js
 function handleSessionsChange(deletedSession) {
@@ -276,53 +242,23 @@ function handleSessionsChange(deletedSession) {
 }
 ```
-### Context Menu
+### Key Patterns
-Implemented via `useContextMenu` hook — tracks `{ x, y, session }` state and
+- Button nesting: action icons are siblings of row buttons, not children — HTML forbids `<button>` inside `<button>`
-attaches a `window` click listener to dismiss on any outside click. Rendered
+- Context menu rendered outside sidebar via React fragment to avoid `overflow: hidden` clipping
-outside the sidebar div via a React fragment to avoid being clipped by
+- `useContextMenu` dismisses on a `window` click listener
-`overflow: hidden`.
+- Dynamic `updateSession` SQL builds `SET` clause from only the fields passed — prevents accidental overwrites
 ### Button Nesting
 Session row action icons (✎ ✕) are rendered as siblings of the session
 `<button>`, not children — HTML does not allow `<button>` inside `<button>`.
 The outer `<div>` owns hover state and context menu; the inner `<button>` handles
 session selection; action icon buttons sit alongside it in the same flex row.
 ## Project Management
-Projects are a first-class concept in the UI. The `useProjects` hook fetches
+`useProjects` fetches the project list from `GET /projects` on mount and
-the project list from `GET /projects` on mount and exposes a `refreshProjects`
+exposes `refreshProjects` for keeping the sidebar in sync after mutations.
 callback for keeping the sidebar in sync after mutations.
-### Project Actions
+`ProjectModal` handles create, edit, and delete confirmation. Fields: name
 (required), description (optional), colour picker, isolated toggle.
-Projects are managed from `AllProjectsView` via `ProjectModal`:
+`ProjectView` shows the project's name, description, isolated badge (if set),
 and a filtered session list. The "+ New Chat" button creates a new session,
 navigates to `'chat'`, and writes the project assignment after the first message.
-| Mode | Behaviour |
+For memory isolation behaviour, see `memory-isolation.md`.
 |---|---|
 | `create` | Name (required), description (optional), colour picker |
 | `edit` | Same fields as create, pre-populated |
 | `confirm-delete` | Confirmation dialog — sessions in the project are not deleted |
 The sidebar Projects section shows up to 6 project tiles as coloured badge buttons.
 Clicking any tile navigates to `AllProjectsView`. The "All Projects →" link is
 always shown below the tiles.
 After any create, edit, or delete in `AllProjectsView`, `onProjectsChange` is called
 to trigger `refreshProjects` in `App.jsx`, keeping the sidebar tiles in sync.
 ## View Routing
 `App.jsx` manages a `view` state string that controls which main panel renders:
 | View | Component | Trigger |
 |---|---|---|
 | `'chat'` | `ChatWindow` | Default; selecting a session from sidebar or AllChatsView |
 | `'all-chats'` | `AllChatsView` | "All Chats →" link or ☰ icon in collapsed rail |
 | `'all-projects'` | `AllProjectsView` | "All Projects →" link, ⊞ icon, or New Project button |
 | `'settings'` | `SettingsView` | Settings button or ⚙ icon in collapsed rail |
 `AllChatsView` navigates back to `'chat'` on session row click, passing the selected
 session to `selectSession` so history loads immediately.
--- a/docs/services/embedding-service.md
+++ b/docs/services/embedding-service.md
@@ -27,80 +27,43 @@ minimizing network hops on the memory write path.
 | OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL |
 | EMBEDDING_MODEL | No | nomic-embed-text | Ollama embedding model to use |
 > Ollama must be running with `OLLAMA_HOST=0.0.0.0` to accept LAN connections
 > from other services.
 ## Model
-**nomic-embed-text** via Ollama produces **768-dimension** vectors using **Cosine similarity**.
+**nomic-embed-text** via Ollama produces **768-dimension** vectors with
-This must match the `QDRANT.VECTOR_SIZE` constant in `@nexusai/shared`.
+**Cosine similarity**. This must match `QDRANT.VECTOR_SIZE` in `@nexusai/shared`.
 If the embedding model is changed, the Qdrant collections must be reinitialized
-with the new vector dimension — updating `QDRANT.VECTOR_SIZE` in `constants.js` is
+with the new vector dimension. Updating `QDRANT.VECTOR_SIZE` in `constants.js`
-the single change required to keep everything consistent.
+is the single change required to keep everything consistent.
 ## Ollama API
-Uses the `/api/embed` endpoint (Ollama v0.4+). Request shape:
+Uses the `/api/embed` endpoint (Ollama v0.4+):
 ```json
 // Request
 { "model": "nomic-embed-text", "input": "text to embed" }
 ```
 Response key is `embeddings[0]` — an array of 768 floats.
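A minimal sketch of the request and response handling described above. `buildEmbedRequest` and `extractEmbedding` are hypothetical helper names; the defaults mirror the `OLLAMA_URL` and `EMBEDDING_MODEL` defaults from the configuration table, and the dimension check is an added safeguard rather than documented behaviour.

```javascript
// Build the fetch arguments for Ollama's /api/embed endpoint (v0.4+ request shape).
function buildEmbedRequest(text, { baseUrl = 'http://localhost:11434', model = 'nomic-embed-text' } = {}) {
  return {
    url: `${baseUrl}/api/embed`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, input: text }),
    },
  };
}

// Pull the first vector out of the response and verify its dimension.
// Assumption: callers want a hard failure on a size mismatch.
function extractEmbedding(json, expectedSize = 768) {
  const vector = json?.embeddings?.[0];
  if (!Array.isArray(vector) || vector.length !== expectedSize) {
    throw new Error(`Expected a ${expectedSize}-dimension embedding`);
  }
  return vector;
}
```

A caller would pass `url` and `init` to `fetch`, then hand the parsed JSON to `extractEmbedding`. Checking the length early surfaces a model change before a mismatched vector reaches Qdrant.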
-## Endpoints
+// Response key
-
+embeddings[0]  // array of 768 floats
 ### Health
 | Method | Path | Description |
 |---|---|---|
 | GET | /health | Service health check |
 ### Embed
 | Method | Path | Description |
 |---|---|---|
 | POST | /embed | Embed a single text string |
 | POST | /embed/batch | Embed an array of text strings |
 ---
 **POST /embed**
 Embeds a single text string and returns the vector.
 Request body:
 ```json
 {
  "text": "Hello from NexusAI"
 }
 ```
-Response:
+> Earlier Ollama versions used `/api/embeddings` with a `prompt` key and
-```json
+> returned `embedding` (singular). Use `/api/embed`, `input`, and
-{
+> `embeddings[0]` for Ollama v0.4+.
  "embedding": [0.123, -0.456, ...],
  "model": "nomic-embed-text",
  "dimensions": 768
 }
 ```
---
+## Usage in NexusAI
-**POST /embed/batch**
+The embedding service is called in two places:
-Embeds an array of strings sequentially and returns all vectors in the same order.
+1. **Memory service** — after each episode is saved to SQLite, the combined
-Ollama does not natively parallelize embeddings, so requests are processed one at a time.
+   `User: ...\nAssistant: ...` text is embedded and upserted into Qdrant.
   This is fire-and-forget — failures are logged but don't affect the response.
-Request body:
+2. **Orchestration service** — the user's message is embedded at the start of
-```json
+   the chat pipeline to perform semantic search against past episodes.
 {
  "texts": ["first sentence", "second sentence"]
 }
 ```
-Response:
+For all HTTP endpoints, see `api-routes.md`.
 ```json
 {
  "embeddings": [[0.123, ...], [0.456, ...]],
  "model": "nomic-embed-text",
  "dimensions": 768,
  "count": 2
 }
 ```
--- a/docs/services/inference-service.md
+++ b/docs/services/inference-service.md
@@ -24,20 +24,19 @@ to switch inference backends without changes to the rest of the system.
 | Variable | Required | Default | Description |
 |---|---|---|---|
 | PORT | No | 3001 | Port to listen on |
-| INFERENCE_PROVIDER | No | llamacpp | Active inference provider (`ollama` or `llamacpp`) |
+| INFERENCE_PROVIDER | No | llamacpp | Active provider (`ollama` or `llamacpp`) |
 | INFERENCE_URL | No | http://localhost:8080 | URL of the inference runtime |
 | DEFAULT_MODEL | No | local-model | Default model name passed to the provider |
 > `INFERENCE_URL` points to `llama-server` directly (port 8080), not to this
-> service itself. The orchestration service uses `INFERENCE_SERVICE_URL` to
+> service. The orchestration service uses `INFERENCE_SERVICE_URL` to reach
-> reach this service on port 3001.
+> this service on port 3001.
 ## Provider Architecture
-The inference service uses a provider pattern to abstract the underlying
+The active provider is selected at startup via `INFERENCE_PROVIDER` and
-LLM runtime. The active provider is selected at startup via `INFERENCE_PROVIDER`
+loaded from `src/providers/`. Both providers expose identical function
-and loaded from `src/providers/`. Both providers expose identical function
+signatures.
 signatures, so the rest of the service is unaware of which backend is active.
 ### Supported Providers
@@ -46,28 +45,36 @@ signatures, so the rest of the service is unaware of which backend is active.
 | llama.cpp | `llamacpp` | llama.cpp server (OpenAI-compatible API) — **current default** |
 | Ollama | `ollama` | Ollama via the `ollama` npm package — available as fallback |
-Switching providers requires only a `.env` change — no code modifications needed:
+Switching providers requires only a `.env` change — no code modifications:
 ```
 INFERENCE_PROVIDER=llamacpp
 INFERENCE_URL=http://localhost:8080
 ```
-### Provider Validation
+The provider loader throws immediately on an unknown value, preventing silent
 misconfiguration.
 ## Internal Structure
 The provider loader validates `INFERENCE_PROVIDER` at startup and throws immediately
 if an unknown value is set — prevents silent misconfiguration:
 ```
-Error: Unknown inference provider: "foo". Valid options: ollama, llamacpp
+src/
 ├── providers/
 │   ├── ollama.js      # Ollama provider
 │   └── llamacpp.js    # llama.cpp provider (OpenAI-compatible REST)
 ├── routes/
 │   └── inference.js   # /complete and /complete/stream route handlers
 ├── infer.js           # Provider loader — selects and re-exports active provider
 └── index.js           # Express app + route definitions
 ```
 ## llama.cpp Provider
-The llama.cpp provider uses the OpenAI-compatible REST API exposed by `llama-server`.
+Uses the OpenAI-compatible REST API exposed by `llama-server`.
 ### Starting llama-server
-`llama-server` must be started manually on the main PC before the inference service
+Must be started manually on the main PC before the inference service can
-can handle requests. It loads a single model at startup:
+handle requests:
 ```powershell
 .\llama-gpu\llama-server.exe `
@@ -79,40 +86,29 @@ can handle requests. It loads a single model at startup:
  -c 64000
 ```
 Key flags:
 | Flag | Description |
 |---|---|
 | `-m` | Path to the `.gguf` model file |
 | `-ngl 99` | Offload as many layers as possible to GPU |
-| `--reasoning off` | Disables thinking/reasoning delay on Gemma 4 models |
+| `--reasoning off` | Disables thinking delay on Gemma 4 models |
-| `--host 0.0.0.0` | Allows connections from other machines on the LAN |
+| `--host 0.0.0.0` | Allows LAN connections |
 | `--port 8080` | Port for the llama-server HTTP API |
 | `-c 64000` | Context window size in tokens |
-> `-c 64000` is intentionally large. Monitor VRAM usage — if pressure builds,
+> `-c 64000` is intentionally large. NexusAI's memory architecture handles
-> reduce this value. The NexusAI memory architecture handles context injection
+> context injection, so a 6–8K window is often sufficient if VRAM pressure builds.
 > so a smaller window (6–8K) is often sufficient.
 ### Model Naming
-The model name sent in API requests must match the name as reported by
+The model name in requests must match the name reported by `llama-server`
-`llama-server` — including the `.gguf` extension. The reported name can be
+including the `.gguf` extension:
 verified with:
 ```powershell
 Invoke-RestMethod -Uri "http://192.168.0.79:8080/v1/models"
 ```
-Set `DEFAULT_MODEL` in `.env` to the exact reported name:
+Set `DEFAULT_MODEL` in `.env` to the exact reported name.
 ```
 DEFAULT_MODEL=gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf
 ```
 ### Inference Parameters
 The llamacpp provider maps NexusAI options to OpenAI-compatible fields:
 | NexusAI option | API field | Default |
 |---|---|---|
 | `temperature` | `temperature` | 0.7 |
@@ -122,18 +118,6 @@ The llamacpp provider maps NexusAI options to OpenAI-compatible fields:
 | `repeatPenalty` | `repeat_penalty` | 1.1 |
 | `seed` | `seed` | null (random) |
 ## Internal Structure
 ```
 src/
 ├── providers/
 │   ├── ollama.js      # Ollama provider — uses ollama npm package
 │   └── llamacpp.js    # llama.cpp provider — uses OpenAI-compatible REST API
 ├── routes/
 │   └── inference.js   # /complete and /complete/stream route handlers
 ├── infer.js           # Provider loader — selects and re-exports active provider
 └── index.js           # Express app + route definitions
 ```
 ## Streaming Response Format
 The llama.cpp provider yields chunks in this shape:
@@ -143,7 +127,7 @@ The llama.cpp provider yields chunks in this shape:
 { response: '', done: true, model: "model-name.gguf", tokenCount: 42 }
 ```
-The inference route re-emits these as SSE events:
+The inference route re-emits as SSE:
 ```
 data: {"response":"token text"}
 data: {"done":true,"model":"model-name.gguf","tokenCount":42}
@@ -151,66 +135,6 @@ data: [DONE]
 ```
 `model` and `tokenCount` are captured from the llama.cpp `finish_reason: stop`
-chunk (`usage.completion_tokens`) and emitted on the done event so the
+chunk and emitted on the done event.
 orchestration layer can forward them to the client.
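The re-emission step can be sketched as a pure formatter. `toSseFrames` is a hypothetical name and this is not the code in `routes/inference.js`; the chunk and frame shapes are taken from the examples above.

```javascript
// Hypothetical formatter: turn provider chunks into SSE frames.
// Each frame ends with a blank line, as the SSE wire format requires.
function toSseFrames(chunks) {
  const frames = [];
  for (const chunk of chunks) {
    if (chunk.done) {
      // The done event carries model and tokenCount, followed by the [DONE] sentinel.
      const payload = { done: true, model: chunk.model, tokenCount: chunk.tokenCount };
      frames.push(`data: ${JSON.stringify(payload)}\n\n`);
      frames.push('data: [DONE]\n\n');
    } else {
      frames.push(`data: ${JSON.stringify({ response: chunk.response })}\n\n`);
    }
  }
  return frames;
}
```

In an Express handler, each frame would be passed to `res.write()` on a response whose `Content-Type` is `text/event-stream`.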
-## Endpoints
+For all HTTP endpoints, see `api-routes.md`.
 ### Health
 | Method | Path | Description |
 |---|---|---|
 | GET | /health | Service health check — reports active provider and model |
 ### Inference
 | Method | Path | Description |
 |---|---|---|
 | POST | /complete | Standard completion — returns full response when done |
 | POST | /complete/stream | Streaming completion via Server-Sent Events |
 ---
 **POST /complete**
 Request body:
 ```json
 {
  "prompt": "What is the capital of France?",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "temperature": 0.7,
  "maxTokens": 1024
 }
 ```
 `model` is optional — falls back to `DEFAULT_MODEL` if omitted.  
 `maxTokens` is optional — defaults to 1024.  
 `temperature` is optional — defaults to 0.7.
 Response:
 ```json
 {
  "text": "The capital of France is Paris.",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "done": true,
  "evalCount": 8,
  "promptEvalCount": 41
 }
 ```
 ---
 **POST /complete/stream**
 Same request body as `/complete`.
 Response is a stream of Server-Sent Events:
 ```
 data: {"response":"The"}
 data: {"response":" capital of France is Paris."}
 data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":8}
 data: [DONE]
 ```
 Clients should accumulate `response` fields to build the full response string.
 The `done` event carries `model` and `tokenCount` for display in the UI.
--- a/docs/services/memory-service.md
+++ b/docs/services/memory-service.md
@@ -43,48 +43,34 @@ src/
 │   └── index.js       # Qdrant collection management, upsert, search, delete
 ├── entities/
 │   └── index.js       # Entity + relationship CRUD
-└── index.js           # Express app + route definitions
+└── index.js           # Express app + all route definitions
 ```
 ## SQLite Schema
 Six core tables:
- **sessions** — top-level conversation containers, identified by an `external_id`, optional `name`, and optional `project_id`
+- **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
 - **episodes** — individual exchanges (user message + AI response) tied to a session
 - **entities** — named things the system learns about (people, places, concepts)
 - **relationships** — directional labeled links between entities
 - **summaries** — condensed episode groups for efficient context retrieval
- **projects** — named groupings of sessions with optional description, colour, and icon
+- **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`
 ### Migrations
-Schema changes that cannot be expressed in `CREATE TABLE IF NOT EXISTS` are applied
+Schema changes that cannot use `CREATE TABLE IF NOT EXISTS` are applied as
-as migrations in `db/index.js` at startup, wrapped in try/catch to safely ignore
+idempotent migrations in `db/index.js` at startup:
 already-applied changes:
 ```js
-try {
+try { db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`); } catch {}
-    db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`);
+try { db.exec(`ALTER TABLE sessions ADD COLUMN project_id INTEGER REFERENCES projects(id)`); } catch {}
-} catch {}
+try { db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(project_id)`); } catch {}
-
+try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
 try {
    db.exec(`ALTER TABLE sessions ADD COLUMN project_id INTEGER REFERENCES projects(id)`);
 } catch {}
 try {
    db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(project_id)`);
 } catch {}
 ```
-This pattern is idempotent — safe to run on every startup. New migrations should
+New migrations are always appended here — never modify the schema file for
-always be appended here rather than modifying the schema file, since `ALTER TABLE`
+existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
 and index creation on existing tables cannot use `IF NOT EXISTS` guards in SQLite.
 Current migrations:
 - `ALTER TABLE sessions ADD COLUMN name TEXT` — adds display name to sessions
 - `ALTER TABLE sessions ADD COLUMN project_id INTEGER` — links sessions to projects
 - `CREATE INDEX idx_sessions_project` — index on the new project_id column
 ### FTS5 Full-Text Search
@@ -96,11 +82,27 @@ keep the FTS index automatically in sync with the episodes table.
 - `journal_mode = WAL` — non-blocking reads during writes
 - `foreign_keys = ON` — enforces referential integrity and cascade deletes
- PRAGMAs are set via `db.pragma()` separately from `db.exec()`
+- PRAGMAs set via `db.pragma()`, not `db.exec()`
 ### Dynamic Session Updates
 `updateSession` builds its `SET` clause dynamically from only the fields
 passed — prevents partial updates from overwriting fields that weren't
 touched:
 ```js
 function updateSession(id, { name, projectId } = {}) {
  const updates = [];
  const values = [];
  if (name !== undefined)      { updates.push('name = ?');       values.push(name ?? null); }
  if (projectId !== undefined) { updates.push('project_id = ?'); values.push(projectId ?? null); }
  // ...
 }
 ```
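One way the elided part of that pattern could look, written as a pure helper so it can be checked without a database. `buildSessionUpdate`, the `WHERE id = ?` clause, and the `null` return for an empty update are assumptions for illustration; the actual body of `updateSession` is not shown here.

```javascript
// Hypothetical pure helper: returns the SQL and bound values, or null when
// no fields were passed (so the caller can skip the write entirely).
function buildSessionUpdate(id, { name, projectId } = {}) {
  const updates = [];
  const values = [];
  // `undefined` means "not passed, leave untouched"; `null` means "clear the field".
  if (name !== undefined)      { updates.push('name = ?');       values.push(name ?? null); }
  if (projectId !== undefined) { updates.push('project_id = ?'); values.push(projectId ?? null); }
  if (updates.length === 0) return null;
  return {
    sql: `UPDATE sessions SET ${updates.join(', ')} WHERE id = ?`, // assumed key column
    values: [...values, id],
  };
}
```

With a `better-sqlite3`-style handle, the result would be executed as `db.prepare(sql).run(...values)`. Passing `{ projectId: null }` clears the assignment without touching `name`.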
 ## Qdrant / Semantic Layer
-Three collections are initialized on service startup (created if they don't already exist):
+Three Qdrant collections are initialized on service startup:
 | Collection | Purpose |
 |---|---|
@@ -108,208 +110,50 @@ Three collections are initialized on service startup (created if they don't alre
 | `entities` | Embeddings for named entities |
 | `summaries` | Embeddings for condensed episode summaries |
-All collections use **768-dimension vectors** with **Cosine similarity**, matching the
+All collections use **768-dimension vectors** with **Cosine similarity**,
-output of the `nomic-embed-text` embedding model via Ollama.
+matching `nomic-embed-text` via Ollama. Vector size and distance metric are
 defined in `@nexusai/shared` — not hardcoded here.
-Vector dimension and distance metric are defined in `@nexusai/shared` constants
+Each collection exposes three operations in `src/semantic/index.js`:
-(`QDRANT.VECTOR_SIZE`, `QDRANT.DISTANCE_METRIC`) — not hardcoded in this service.
+upsert, search (with optional Qdrant filter), and delete. The `wait: true`
-
+flag is used on all writes.
 ### Semantic Layer Operations
 Each collection exposes three operations via helper functions in `src/semantic/index.js`:
 - **Upsert** — stores a vector with a payload containing the SQLite row ID, enabling
  lookups back to the full content after a vector search
 - **Search** — returns the top-k most similar vectors, with optional Qdrant filter
 - **Delete** — removes a vector point by ID
 The `wait: true` flag is used on all write operations so the caller receives confirmation
 only after Qdrant has committed the change.
 ## Embedding Write Path
-When a new episode is created, the memory service automatically generates and stores
+When a new episode is created:
 a vector embedding in Qdrant via the embedding service:
-1. Episode is saved to SQLite synchronously — the response is returned immediately
+1. Episode saved to SQLite synchronously — response returned immediately
-2. Both sides of the exchange are combined into a single text:
+2. User message + AI response combined: `User: ...\nAssistant: ...`
-   ```
+3. Text sent to embedding service (`POST /embed`)
-   User: {userMessage}
+4. Vector upserted into `episodes` Qdrant collection with payload `{ sessionId, createdAt }`
   Assistant: {aiResponse}
   ```
 3. This text is sent to the embedding service (`POST /embed`)
 4. The returned vector is upserted into the `episodes` Qdrant collection with a
   payload of `{ sessionId, createdAt }` for filtering and lookups
-The embedding step is **fire-and-forget** — it runs asynchronously after the SQLite
+This step is **fire-and-forget** — if embedding fails, the episode is still
-insert succeeds. If embedding fails, the episode is still saved and searchable via
+saved and searchable via FTS. The error is logged but not surfaced.
 FTS. The error is logged but does not affect the API response.
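The write path above can be sketched with injected dependencies. Everything in `deps` (`saveEpisode`, `embed`, `upsertVector`, `logError`) is a hypothetical stand-in for the real SQLite, embedding-service, and Qdrant helpers, which are not shown in this document.

```javascript
// Hypothetical sketch: save synchronously, then embed without awaiting.
function createEpisode(deps, { sessionId, userMessage, aiResponse }) {
  // 1. Synchronous SQLite insert; the caller gets this result immediately.
  const episode = deps.saveEpisode({ sessionId, userMessage, aiResponse });

  // 2. Combine both sides of the exchange into one text.
  const text = `User: ${userMessage}\nAssistant: ${aiResponse}`;

  // 3 and 4. Fire-and-forget: the promise is not awaited, and any failure is
  // logged rather than thrown, so the episode stays saved and FTS-searchable.
  deps.embed(text)
    .then((vector) => deps.upsertVector(episode.id, vector, { sessionId, createdAt: episode.createdAt }))
    .catch((err) => deps.logError(err));

  return episode;
}
```

The sketch assumes `embed` returns a promise. A synchronous throw from `embed` would still propagate to the caller, so a real implementation may want to guard that case as well.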
-### Hybrid Retrieval Pattern
+> The Qdrant payload stores `sessionId` (the internal integer ID). This is
-
+> used for per-session and per-project filtering during semantic search. See
-Qdrant and SQLite work as a pair — neither operates in isolation:
+> `memory-isolation.md` for how project-level filtering works.
 1. Query is embedded and searched in Qdrant → returns IDs + similarity scores
 2. IDs are used to fetch full content from SQLite
 3. Results are ranked and assembled into a context package
 ## Entity Layer
-Entities and relationships are stored in SQLite with two key constraints:
+Entities and relationships use upsert semantics with composite unique
 constraints to prevent duplicates:
- `UNIQUE(name, type)` on entities — ensures no duplicates; upsert updates existing records
+- `UNIQUE(name, type)` on entities
- `UNIQUE(from_id, to_id, label)` on relationships — prevents duplicate edges
+- `UNIQUE(from_id, to_id, label)` on relationships
- `ON DELETE CASCADE` on both `from_id` and `to_id` — deleting an entity automatically
+- `ON DELETE CASCADE` on relationship foreign keys
  removes all relationships where it appears on either end
-## Endpoints
+## Project Delete Behaviour
-### Health
+Deleting a project runs as a transaction — it first nulls out `project_id`
 on all assigned sessions, then deletes the project. This avoids a foreign
 key constraint failure since `sessions.project_id` has no `ON DELETE` rule:
-| Method | Path | Description |
+```js
-|---|---|---|
+const doDelete = db.transaction(() => {
-| GET | /health | Service health check |
+  db.prepare(`UPDATE sessions SET project_id = NULL WHERE project_id = ?`).run(id);
-
+  db.prepare(`DELETE FROM projects WHERE id = ?`).run(id);
-### Sessions
+});
 | Method | Path | Description |
 |---|---|---|
 | POST | /sessions | Create a new session |
 | GET | /sessions | Get paginated list of all sessions |
 | GET | /sessions/:id | Get session by internal ID |
 | GET | /sessions/by-external/:externalId | Get session by external ID |
 | PATCH | /sessions/by-external/:externalId | Update session name |
 | DELETE | /sessions/by-external/:externalId | Delete session (cascades to episodes + summaries) |
 > Route ordering matters in Express: `by-external/:externalId` must be defined before
 > `/:id` to prevent the literal string `by-external` being captured as an ID parameter.
 **POST /sessions body:**
 ```json
 {
  "externalId": "unique-session-id",
  "metadata": {}
 }
 ```
-**PATCH /sessions/by-external/:externalId body:**
+For all HTTP endpoints, see `api-routes.md`.
 ```json
 {
  "name": "My Renamed Session"
 }
 ```
 Returns the updated session object. `name` is required and must be non-empty.
 **DELETE /sessions/by-external/:externalId**
 Returns `204 No Content` on success. Cascades to delete all associated episodes
 and summaries via SQLite `ON DELETE CASCADE`.
 ### Episodes
 | Method | Path | Description |
 |---|---|---|
 | POST | /episodes | Create episode + auto-embed into Qdrant |
 | GET | /episodes/search?q=&limit= | Full-text search across episodes |
 | GET | /episodes/:id | Get episode by ID |
 | GET | /sessions/:id/episodes?limit=&offset= | Get paginated episodes for a session |
 | DELETE | /episodes/:id | Delete an episode |
 **POST /episodes body:**
 ```json
 {
  "sessionId": 1,
  "userMessage": "Hello",
  "aiResponse": "Hi there!",
  "tokenCount": 10,
  "metadata": {}
 }
 ```
 > Note: `/episodes/search` must be defined before `/episodes/:id` in Express to prevent
 > the word `search` being captured as an ID parameter.
 ### Projects
 | Method | Path | Description |
 |---|---|---|
 | POST | /projects | Create a new project |
 | GET | /projects | Get all projects |
 | GET | /projects/:id | Get project by ID |
 | PATCH | /projects/:id | Update a project |
 | DELETE | /projects/:id | Delete a project |
 **POST /projects body:**
 ```json
 {
  "name": "My Project",
  "description": "Optional description",
  "colour": "#3d3a79",
  "icon": null
 }
 ```
 `name` is required. `description`, `colour`, and `icon` are optional.
 Returns `201` with the created project object on success.
 **PATCH /projects/:id body:** same fields as POST, all optional.
 **DELETE /projects/:id**
 Returns `204 No Content`. Sessions assigned to the project are not deleted —
 their `project_id` foreign key is left as-is (nullable, no cascade).
 ### Entities
 | Method | Path | Description |
 |---|---|---|
 | POST | /entities | Upsert an entity (creates or updates by name + type) |
 | GET | /entities/by-type/:type | Get all entities of a given type |
 | GET | /entities/:id | Get entity by internal ID |
 | DELETE | /entities/:id | Delete entity (cascades to relationships) |
 **POST /entities body:**
 ```json
 {
  "name": "NexusAI",
  "type": "project",
  "notes": "My AI memory project",
  "metadata": {}
 }
 ```
 > Note: `/entities/by-type/:type` must be defined before `/entities/:id` in Express to
 > prevent `by-type` being captured as an ID parameter.
 ### Relationships
 | Method | Path | Description |
 |---|---|---|
 | POST | /relationships | Upsert a relationship between two entities |
 | GET | /entities/:id/relationships | Get all relationships originating from an entity |
 | DELETE | /relationships | Delete a specific relationship |
 **POST /relationships body:**
 ```json
 {
  "fromId": 1,
  "toId": 2,
  "label": "uses",
  "metadata": {}
 }
 ```
 **DELETE /relationships body:**
 ```json
 {
  "fromId": 1,
  "toId": 2,
  "label": "uses"
 }
 ```
 > Relationships are identified by the composite key `(fromId, toId, label)`. Delete takes
 > the key in the request body rather than in URL params because a three-part key is
 > awkward to express cleanly in a path.
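The composite key can be pictured with a small in-memory sketch. This is an illustration only: the `relationships` map and the `keyOf`, `upsertRelationship`, and `deleteRelationship` helpers are hypothetical names, and the real service presumably enforces the key in its database rather than in a JavaScript `Map`.

```javascript
// In-memory stand-in for the relationships table (hypothetical, for illustration).
const relationships = new Map();

// Serialise the three-part key. Encoding it as a JSON array avoids
// collisions that a plain delimiter could cause if a label contained it.
const keyOf = ({ fromId, toId, label }) => JSON.stringify([fromId, toId, label]);

// Create the relationship if the key is new, otherwise overwrite its metadata.
function upsertRelationship({ fromId, toId, label, metadata = {} }) {
  const key = keyOf({ fromId, toId, label });
  const created = !relationships.has(key);
  relationships.set(key, { fromId, toId, label, metadata });
  return { created };
}

// Delete by the same three fields that the DELETE body carries.
// Returns true if a relationship was removed, false if none matched.
function deleteRelationship(body) {
  return relationships.delete(keyOf(body));
}
```

Posting the same `(fromId, toId, label)` twice leaves a single entry in this sketch, which is the behaviour the word "upsert" implies.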
--- a/docs/services/orchestration-service.md
+++ b/docs/services/orchestration-service.md
@@ -39,56 +39,58 @@ src/
 │   ├── memory.js      # HTTP client for memory service
 │   ├── inference.js   # HTTP client for inference service
 │   ├── embedding.js   # HTTP client for embedding service
-│   └── qdrant.js      # HTTP client for Qdrant vector search
+│   └── qdrant.js      # HTTP client for Qdrant (direct vector search)
 ├── chat/
-│   └── index.js       # Core pipeline logic — context assembly and coordination
+│   └── index.js       # Core pipeline — context assembly, isolation, auto-naming
 ├── routes/
-│   ├── chat.js        # POST /chat and POST /chat/stream route handlers
+│   ├── chat.js        # POST /chat and POST /chat/stream
-│   ├── sessions.js    # Session list, history, rename, and delete routes
+│   ├── sessions.js    # Session CRUD proxy
-│   ├── projects.js    # Project CRUD routes — proxies to memory service
+│   ├── projects.js    # Project CRUD proxy
-│   └── models.js      # GET /models — reads models.json manifest from disk
+│   └── models.js      # GET /models — reads models.json from disk
 └── index.js           # Express app entry point
 ```
-The `services/` layer wraps all downstream HTTP calls in named functions,
+The `services/` layer wraps all downstream HTTP calls in named functions.
 keeping the pipeline logic in `chat/index.js` readable and ensuring that
 URL or endpoint changes have a single place to be updated.
 ## Chat Pipeline
-Both `POST /chat` and `POST /chat/stream` share the same context assembly
+Both `POST /chat` and `POST /chat/stream` share the same steps. The only
-steps. The only difference is how the inference response is delivered to
+difference is how the inference response is delivered to the client.
 the client.
-1. **Session resolution** — looks up the session by `externalId` in the memory
+### Steps
   service. If not found, auto-creates a new session. Clients can generate a
   UUID for new conversations and pass it directly — no pre-creation step needed.
-2. **Recent episode retrieval** — fetches the most recent episodes for the session
+1. **Session resolution** — look up session by `externalId`. Auto-create if
-   (default: 5) from the memory service.
+   not found. Clients generate a UUID for new conversations — no pre-creation
   step needed.
-3. **Semantic search** — embeds the user message via the embedding service, then
+2. **Project context resolution** — if the session has a `project_id`, fetch
-   queries Qdrant for the top-5 most similar past episodes (score threshold: 0.75).
+   the project and all its session IDs. Used to scope semantic search. See
-   Results are deduplicated against the recent episode set using a `Set` of IDs.
+   `memory-isolation.md` for full behaviour.
-   Full episode content is fetched from the memory service by ID. This step is
+
-   non-critical — if it fails, a warning is logged and the pipeline continues with
+3. **Recent episode retrieval** — fetch the most recent episodes for the
   session (`RECENT_EPISODE_LIMIT`, default 5).
 4. **Semantic search** — embed the user message, query Qdrant for the top-5
   most similar past episodes (`SCORE_THRESHOLD` 0.75). Deduplicated against
   recent episodes. Non-critical — if it fails, pipeline continues with
   recency-only context.
-4. **Prompt assembly** — combines the system prompt, semantic episodes (if any),
+5. **Prompt assembly** — combine system prompt, semantic episodes, recent
-   recent episodes, and the current user message into a single prompt string.
+   episodes, and user message.
-5. **Inference** — sends the assembled prompt to the inference service. `/chat`
+6. **Inference** — send to inference service. `/chat` awaits full response;
-   awaits the full response; `/chat/stream` opens an SSE connection and pipes
+   `/chat/stream` pipes SSE chunks to the client.
   chunks to the client as they arrive.
-6. **Episode write** — writes the new exchange (user message + AI response)
+7. **Episode write** — write the exchange back to memory. Fire-and-forget
-   back to the memory service as a fire-and-forget operation. For streaming,
+   for `/chat`; awaited for `/chat/stream` to ensure the full text is
-   the full response text is accumulated across chunks before writing.
+   accumulated before saving.
-7. **Response** — returns the AI response, model name, session ID, and token
+8. **Auto-naming** — on `isFirstMessage && !session.name`, fire a secondary
-   count to the client.
+   inference call with a naming prompt (max 20 tokens, temperature 0.3) and
   write the result back as `session.name`. Fully fire-and-forget.
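The semantic search step (deduplication plus the non-critical fallback) can be sketched roughly as follows. This is a minimal sketch under assumptions: the function names and the `{ id, score }` hit shape are hypothetical, and the threshold and top-5 limit are applied locally here only to keep the example self-contained, whereas the real pipeline presumably passes them to Qdrant.

```javascript
const SCORE_THRESHOLD = 0.75; // value taken from the step description
const TOP_K = 5;              // "top-5" from the step description

// recent: [{ id, ... }] episodes already in the prompt.
// hits:   [{ id, score }] vector search results (shape assumed).
function selectSemanticIds(recent, hits) {
  const recentIds = new Set(recent.map((episode) => episode.id));
  return hits
    .filter((hit) => hit.score >= SCORE_THRESHOLD && !recentIds.has(hit.id))
    .sort((a, b) => b.score - a.score)
    .slice(0, TOP_K)
    .map((hit) => hit.id);
}

// Non-critical wrapper: any failure degrades to recency-only context.
// `search` is a hypothetical async function that returns the hits.
async function semanticIdsOrEmpty(recent, search) {
  try {
    return selectSemanticIds(recent, await search());
  } catch (err) {
    console.warn('semantic search failed, continuing with recent episodes only:', err.message);
    return [];
  }
}
```

Returning an empty list on failure lets prompt assembly run unchanged, with the semantic section simply left empty.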
-## Prompt Structure
+### Prompt Structure
 ```
 [System prompt]
@@ -108,212 +110,67 @@ User: {current message}
 Assistant:
 ```
-Semantic episodes appear before recent episodes so the model encounters
+Semantic episodes appear before recent episodes so the model sees
-long-range relevant context before the immediate conversation flow.
+long-range context before the immediate conversation flow.
 ## SSE Stream Format
-The inference service emits chunks from the llama.cpp provider in this format:
+Inference service → orchestration:
 ```
 data: {"response":"Hello","done":false}
-data: {"response":"!","done":false}
+data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":42}
 data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
 data: [DONE]
 ```
-The orchestration service re-emits to the client as:
+Orchestration → client:
 ```
 data: {"text":"Hello"}
-data: {"text":"!"}
+data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":42}
 data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
 ```
-The `[DONE]` sentinel from the inference service is consumed internally
+The `[DONE]` sentinel is consumed internally and not forwarded. The stream
-and not forwarded. The client stream is terminated by `res.end()` after
+is terminated by `res.end()` after the done event.
 the done event. Model name and token count are included on the done event
 so the client can display them in the UI.
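The mapping between the two formats can be sketched as a pure function over one SSE line. The name `translateSseLine` is hypothetical, and the sketch assumes each `data:` line carries a complete JSON payload, which may not hold if the upstream splits events across chunks.

```javascript
// Translate one upstream SSE line into the client-facing line,
// or return null when nothing should be forwarded.
function translateSseLine(line) {
  if (!line.startsWith('data: ')) return null;
  const payload = line.slice('data: '.length).trim();

  // The sentinel is consumed here and never reaches the client.
  if (payload === '[DONE]') return null;

  const event = JSON.parse(payload); // throws on malformed JSON
  if (event.done) {
    const { model, tokenCount } = event;
    return `data: ${JSON.stringify({ done: true, model, tokenCount })}`;
  }
  // Token chunk: rename `response` to `text` and drop the `done` flag.
  return `data: ${JSON.stringify({ text: event.response })}`;
}
```

A caller would write each non-null result to the response followed by a blank line, then call `res.end()` once the done event has been forwarded.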
 ## Models Manifest
-The `/models` endpoint reads a `models.json` file from disk at the path
+`GET /models` reads `models.json` fresh on each request from
-specified by `MODELS_MANIFEST_PATH`. The file lives on the main PC alongside
+`MODELS_MANIFEST_PATH`. The file lives on the main PC alongside model files,
-the model files, and is accessible to orchestration via a network share
+accessible via an SMB mount at `/mnt/nexus-models`.
 mounted at `/mnt/nexus-models`.
 The manifest is read fresh on each request — no restart needed when models
 are added or removed.
 **models.json format:**
 ```json
 [
  { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
 ]
 ```
- `value` — must match the model name as reported by `llama-server` (including `.gguf` extension)
+`value` must match the model name as reported by `llama-server` (including
- `label` — display name shown in the UI
+`.gguf` extension). No service restart needed when models are added or removed.
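Reading the manifest on every request could look like the sketch below. The helper name `readModelsManifest` is hypothetical, and the synchronous read is an assumption made to keep the example short; the actual route may well use an asynchronous read instead.

```javascript
const fs = require('node:fs');

// Read and parse the manifest on every call, so edits to the file
// are picked up without restarting the service.
function readModelsManifest(manifestPath) {
  const raw = fs.readFileSync(manifestPath, 'utf8'); // throws if the file is missing
  const models = JSON.parse(raw);                    // throws if the JSON is invalid
  if (!Array.isArray(models)) {
    throw new Error('models manifest must be a JSON array');
  }
  return models;
}
```

A route handler would call this with `process.env.MODELS_MANIFEST_PATH` and turn any thrown error into an error response.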
-## Endpoints
+## Sessions Route Behaviour
-### Health
+`PATCH /sessions/:sessionId` accepts `name`, `projectId`, or both.
 The validation guard only rejects requests where neither is provided:
-| Method | Path | Description |
+```js
-|---|---|---|
+if (!name?.trim() && projectId === undefined) {
-| GET | /health | Service health check — reports downstream service URLs |
+  return res.status(400).json({ error: 'name or projectId is required' });
+}
+```
 ### Chat
 | Method | Path | Description |
 |---|---|---|
 | POST | /chat | Send a message and receive a complete response |
 | POST | /chat/stream | Send a message and receive a streaming SSE response |
 ### Sessions
 | Method | Path | Description |
 |---|---|---|
 | GET | /sessions | Get paginated list of all sessions |
 | GET | /sessions/:sessionId/history | Get paginated episode history for a session |
 | PATCH | /sessions/:sessionId | Rename a session |
 | DELETE | /sessions/:sessionId | Delete a session and all its episodes |
 ### Projects
 Projects are proxied directly from the memory service with no transformation.
 | Method | Path | Description |
 |---|---|---|
 | GET | /projects | Get all projects |
 | POST | /projects | Create a new project |
 | PATCH | /projects/:id | Update a project |
 | DELETE | /projects/:id | Delete a project |
 ### Models
 | Method | Path | Description |
 |---|---|---|
 | GET | /models | Get list of available models from manifest file |
 ---
 **POST /chat**
 Request body:
 ```json
 {
  "sessionId": "your-session-uuid",
  "message": "Hello, my name is Tim.",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "temperature": 0.7
 }
 ```
-`model` and `temperature` are optional — fall back to inference service defaults
+This allows `useChat` to write project assignment separately from rename
-if omitted.
+operations.
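To make the guard's behaviour concrete, here is the same condition pulled out into a pure function. The name `validateSessionPatch` is hypothetical, and the sketch assumes `name`, when present, is a string.

```javascript
// Returns an error message for an invalid PATCH body,
// or null when the body is acceptable.
function validateSessionPatch({ name, projectId } = {}) {
  if (!name?.trim() && projectId === undefined) {
    return 'name or projectId is required';
  }
  return null;
}

// Worked cases:
validateSessionPatch({ name: 'Renamed' });  // accepted: rename only
validateSessionPatch({ projectId: 3 });     // accepted: project assignment only
validateSessionPatch({ projectId: null });  // accepted: null is not undefined
validateSessionPatch({ name: '   ' });      // rejected: blank name, no projectId
validateSessionPatch({});                   // rejected: neither field provided
```

Because the check is `projectId === undefined` rather than a truthiness test, an explicit `null` passes the guard, which would allow a client to clear a session's project if the downstream service supports that.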
 Response:
 ```json
 {
  "sessionId": "your-session-uuid",
  "response": "Hello Tim! How can I help you today?",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "tokenCount": 87
 }
 ```
 ---
 **POST /chat/stream**
 Same request body as `POST /chat`.
 Response is a stream of Server-Sent Events:
 ```
 data: {"text":"Hello"}
 data: {"text":" Tim"}
 data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87}
 ```
 ---
 **PATCH /sessions/:sessionId**
 Request body:
 ```json
 { "name": "My Renamed Session" }
 ```
 Returns the updated session object. `name` is required and trimmed of whitespace.
 ---
 **DELETE /sessions/:sessionId**
 Returns `204 No Content`. Cascades to delete all episodes for the session.
 ---
 **GET /sessions/:sessionId/history**
 Query parameters:
 | Parameter | Default | Description |
 |---|---|---|
 | limit | 20 | Maximum number of episodes to return |
 | offset | 0 | Number of episodes to skip (for pagination) |
 Response:
 ```json
 {
  "sessionId": "your-session-uuid",
  "episodes": [
    {
      "id": 42,
      "session_id": 1,
      "user_message": "Hello, my name is Tim.",
      "ai_response": "Hello Tim! How can I help you today?",
      "token_count": 87,
      "created_at": 1712345678,
      "metadata": null
    }
  ]
 }
 ```
 Episodes are ordered newest first.
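The `limit`/`offset` semantics can be sketched as below. Both helpers are hypothetical: the real service presumably pages in its database query, and the fallback to defaults on invalid input is an assumption rather than documented behaviour.

```javascript
// Parse query-string values, falling back to the documented defaults
// (limit 20, offset 0) when a value is missing or invalid.
function parsePagination(query) {
  const limit = Number.parseInt(query.limit ?? '20', 10);
  const offset = Number.parseInt(query.offset ?? '0', 10);
  return {
    limit: Number.isInteger(limit) && limit > 0 ? limit : 20,
    offset: Number.isInteger(offset) && offset >= 0 ? offset : 0,
  };
}

// Order newest first by created_at, then take one page.
function pageNewestFirst(episodes, { limit, offset }) {
  return [...episodes]
    .sort((a, b) => b.created_at - a.created_at)
    .slice(offset, offset + limit);
}
```

With these semantics, `?limit=20&offset=20` returns the second page of 20 episodes.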
 ---
 **GET /models**
 Returns the parsed contents of `models.json`:
 ```json
 [
  { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
 ]
 ```
 Returns `500` if the manifest file cannot be read or parsed.
 ## Caddy Configuration
-The Caddy reverse proxy on Mini PC 2 must have a handle block for each route
+Each route prefix needs a handle block in the Caddyfile on Mini PC 2:
 prefix the client needs to reach. Current required blocks:
 ```
-handle /chat* {
+handle /chat*     { reverse_proxy localhost:4000 }
-    reverse_proxy localhost:4000
+handle /sessions* { reverse_proxy localhost:4000 }
-}
+handle /models*   { reverse_proxy localhost:4000 }
-handle /sessions* {
+handle /projects* { reverse_proxy localhost:4000 }
    reverse_proxy localhost:4000
 }
 handle /models* {
    reverse_proxy localhost:4000
 }
 handle /projects* {
    reverse_proxy localhost:4000
 }
 ```
-When adding new top-level routes to the orchestration service, add a matching
+After updating: `caddy reload --config /path/to/Caddyfile`
-block here and reload Caddy: `caddy reload --config /path/to/Caddyfile`
+
 For all HTTP endpoints, see `api-routes.md`.