update documentation

2026-04-17 03:46:17 -07:00
parent 27e3c98304
commit 5145b9a7db
13 changed files with 822 additions and 794 deletions
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -1,56 +1,80 @@
 # Architecture Overview

-NexusAI is a modular, memory-centric AI system designed for persistent, context-aware conversations. It separates concerns across different services that can be independently deployed and evolved.
+NexusAI is a modular, memory-centric AI assistant designed for persistent,
+context-aware conversations. It separates concerns across independent services
+that can be evolved and deployed separately.

 ## Core Design Principles

- **Decoupled layers:** memory, inference, and orchestration are independent of each other
- **Hybrid retrieval:** semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
- **Home lab:** services are distributed across nodes according to available hardware and resources
+- **Decoupled layers** — memory, inference, and orchestration are independent of each other
+- **Hybrid retrieval** — semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
+- **Project-scoped memory** — sessions can be grouped into projects with shared or isolated memory pools
+- **Home lab first** — services are distributed across nodes according to available hardware

 ## Memory Model

-Memory is split between SQLite and Qdrant, which work together as a pair:
+Memory is split between SQLite and Qdrant, which always work as a pair:

- **SQLite:** episodic interactions, entities, relationships, summaries
- **Qdrant:** vector embeddings for semantic similarity search
+- **SQLite** — episodic interactions, entities, relationships, summaries, sessions, projects
+- **Qdrant** — vector embeddings for semantic similarity search

-When recalling memory, Qdrant returns IDs and similarity scores, which are used to fetch
-full content from SQLite. Neither SQLite nor Qdrant work in isolation.
+When recalling memory, Qdrant returns IDs and similarity scores, which are used
+to fetch full content from SQLite. Neither store works in isolation.
+
+Episode embeddings carry a `{ sessionId, createdAt }` payload in Qdrant,
+enabling per-session and per-project filtering at search time. See
+`memory-isolation.md` for how project-scoped retrieval works.

 ## Hardware Layout

 | Node | Address | Role |
 |---|---|---|
-| Main PC | local | Primary inference (RTX A4000 16GB) |
-| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant |
-| Mini PC 2 | 192.168.0.205 | Orchestration service, Chat Client, Gitea |
+| Main PC | 192.168.0.79 | Primary inference — RTX A4000 16GB |
+| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant, Ollama |
+| Mini PC 2 | 192.168.0.205 | Orchestration service, Chat Client, Caddy, Authelia, Gitea |

 ## Service Communication

-All services expose a REST HTTP API. The orchestration service is the single entry point —
-clients do not talk directly to the memory or inference services.
+All services expose a REST HTTP API. The orchestration service is the single
+entry point — clients never talk directly to memory or inference services.

 ```
-Client
-└─► Orchestration (:4000)
-    ├─► Chat Client (static files, /srv/nexusai)
-    ├─► Memory Service (:3002)
-    │     ├─► Qdrant (:6333)
-    │     └─► SQLite
-    ├─► Embedding Service (:3003)
-    │     └─► Ollama
-    └─► Inference Service (:3001)
-          └─► Ollama
+Client (browser)
+└─► Caddy (HTTPS + Authelia SSO)
+    └─► Orchestration (:4000) — Mini PC 2
+        ├─► Memory Service (:3002) — Mini PC 1
+        │     ├─► SQLite (local file)
+        │     └─► Qdrant (:6333) — Mini PC 1
+        ├─► Embedding Service (:3003) — Mini PC 1
+        │     └─► Ollama (:11434) — Mini PC 1
+        ├─► Inference Service (:3001) — Main PC
+        │     └─► llama-server (:8080) — Main PC
+        └─► Qdrant (:6333) — Mini PC 1 (direct — semantic search)
 ```

+Note: Orchestration queries Qdrant directly for semantic search (bypassing
+the memory service) but always fetches full episode content from the memory
+service by ID after the vector search.
+
 ## Technology Choices

 | Concern | Choice | Reason |
 |---|---|---|
-| Language | Node.js (JavaScript) | Familiar stack, async I/O suits service architecture |
+| Language | Node.js (CommonJS) | Familiar stack, async I/O suits service architecture |
 | Package management | npm workspaces | Monorepo with shared code, no publishing needed |
 | Vector store | Qdrant | Mature, Docker-native, excellent Node.js client |
-| Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user |
-| LLM runtime | Ollama | Easiest local LLM management, serves embeddings too |
-| Version control | Gitea (self-hosted) | Code stays on local network |
+| Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user scale |
+| LLM inference | llama.cpp (`llama-server`) | Maximum GPU utilisation on RTX A4000, OpenAI-compatible API |
+| Embeddings | Ollama (`nomic-embed-text`) | Co-located with memory service on Mini PC 1; 768-dimensional vectors, cosine distance |
+| Reverse proxy | Caddy + Authelia | Automatic HTTPS, SSO/MFA for all exposed services |
+| Version control | Gitea (self-hosted) | Code stays on local network |
+
+## Current State
+
+The core four-service architecture is complete and operational. Key capabilities:
+
+- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
+- **Projects** — sessions grouped with shared or isolated memory pools
+- **Auto-naming** — sessions named automatically from first exchange via inference
+- **Project-scoped semantic search** — Qdrant filtered by project session IDs
+- **Chat client** — view-based UI with sidebar navigation, project views, session management
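
Review note: the retrieval flow this overview describes (Qdrant filtered by a project's session IDs, then full episode content fetched by ID) could be sketched roughly as below. This is a minimal illustration, not the code in the repository. Only the `sessionId` payload key comes from the document; the function names `buildProjectFilter` and `assembleContext`, the shape of the hit objects (`{ id, score }`), and the `episodesById` map are assumptions made for the example. The filter uses Qdrant's standard `must` / `match.any` filter shape.

```javascript
'use strict';

// Build a Qdrant filter that matches episodes from any session in a project.
// `sessionId` is the payload key named in the overview; everything else here
// is a hypothetical helper for illustration.
function buildProjectFilter(sessionIds) {
  if (!Array.isArray(sessionIds) || sessionIds.length === 0) {
    throw new Error('sessionIds must be a non-empty array');
  }
  return {
    must: [{ key: 'sessionId', match: { any: sessionIds } }],
  };
}

// Join vector-search hits with full episode rows fetched by ID.
// `hits` is assumed to be [{ id, score }], `episodesById` a Map of id -> row.
// Hits with no matching row are dropped; results are ordered by score, highest first.
function assembleContext(hits, episodesById) {
  return hits
    .slice() // avoid mutating the caller's array
    .sort((a, b) => b.score - a.score)
    .filter((hit) => episodesById.has(hit.id))
    .map((hit) => ({ ...episodesById.get(hit.id), score: hit.score }));
}

module.exports = { buildProjectFilter, assembleContext };
```

In this sketch the filter object would be passed as the `filter` field of a Qdrant search request, and `episodesById` would be populated from the memory service's response for the returned IDs. Dropping hits that have no matching row reflects the point in the overview that neither store works in isolation.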