update documentation

Storme-bit
2026-04-17 03:46:17 -07:00
parent 27e3c98304
commit 5145b9a7db
13 changed files with 822 additions and 794 deletions


@@ -1,56 +1,80 @@
# Architecture Overview
NexusAI is a modular, memory-centric AI system designed for persistent, context-aware conversations. It separates concerns across different services that can be independently deployed and evolved.
NexusAI is a modular, memory-centric AI assistant designed for persistent,
context-aware conversations. It separates concerns across independent services
that can be evolved and deployed separately.
## Core Design Principles
- **Decoupled layers:** memory, inference, and orchestration are independent of each other
- **Hybrid retrieval:** semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
- **Home lab:** services are distributed across nodes according to available hardware and resources
- **Decoupled layers**: memory, inference, and orchestration are independent of each other
- **Hybrid retrieval**: semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
- **Project-scoped memory**: sessions can be grouped into projects with shared or isolated memory pools
- **Home lab first**: services are distributed across nodes according to available hardware
## Memory Model
Memory is split between SQLite and Qdrant, which work together as a pair:
Memory is split between SQLite and Qdrant, which always work as a pair:
- **SQLite:** episodic interactions, entities, relationships, summaries
- **Qdrant:** vector embeddings for semantic similarity search
- **SQLite**: episodic interactions, entities, relationships, summaries, sessions, projects
- **Qdrant**: vector embeddings for semantic similarity search
When recalling memory, Qdrant returns IDs and similarity scores, which are used to fetch
full content from SQLite. Neither SQLite nor Qdrant work in isolation.
When recalling memory, Qdrant returns IDs and similarity scores, which are used
to fetch full content from SQLite. Neither store works in isolation.
Episode embeddings carry a `{ sessionId, createdAt }` payload in Qdrant,
enabling per-session and per-project filtering at search time. See
`memory-isolation.md` for how project-scoped retrieval works.
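
As a rough illustration of the payload-based filtering described above, the sketch below builds a Qdrant filter that restricts a search to a project's sessions. Only the `sessionId` payload key comes from this document; the helper name `buildProjectFilter` and the choice to return `null` for an empty list are assumptions, and `memory-isolation.md` remains the reference for the actual behaviour.

```javascript
// Hypothetical helper (not taken from the NexusAI codebase): build a Qdrant
// filter that limits a vector search to the given session IDs.
// `sessionId` is the payload key named in the text; `match.any` is Qdrant's
// condition for "payload value is one of these".
function buildProjectFilter(sessionIds) {
  if (!Array.isArray(sessionIds) || sessionIds.length === 0) {
    // Assumption: the caller decides what an empty project means
    // (skip the search entirely, or search without a filter).
    return null;
  }
  return {
    must: [{ key: 'sessionId', match: { any: sessionIds } }],
  };
}

// Example: scope a search to two sessions of one project.
const filter = buildProjectFilter(['session-a', 'session-b']);
console.log(JSON.stringify(filter, null, 2));
```

The resulting object would be passed as the `filter` field of a Qdrant search request, alongside the query vector and limit.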
## Hardware Layout
| Node | Address | Role |
|---|---|---|
| Main PC | local | Primary inference (RTX A4000 16GB) |
| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant |
| Mini PC 2 | 192.168.0.205 | Orchestration service, Chat Client, Gitea |
| Main PC | 192.168.0.79 | Primary inference (RTX A4000 16GB) |
| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant, Ollama |
| Mini PC 2 | 192.168.0.205 | Orchestration service, Chat Client, Caddy, Authelia, Gitea |
## Service Communication
All services expose a REST HTTP API. The orchestration service is the single entry point —
clients do not talk directly to the memory or inference services.
All services expose a REST HTTP API. The orchestration service is the single
entry point — clients never talk directly to memory or inference services.
```
Client
 └─► Orchestration (:4000)
      ├─► Chat Client (static files, /srv/nexusai)
      ├─► Memory Service (:3002)
      │    ├─► Qdrant (:6333)
      │    └─► SQLite
      ├─► Embedding Service (:3003)
      │    └─► Ollama
      └─► Inference Service (:3001)
           └─► Ollama
Client (browser)
 └─► Caddy (HTTPS + Authelia SSO)
      └─► Orchestration (:4000) - Mini PC 2
           ├─► Memory Service (:3002) - Mini PC 1
           │    ├─► SQLite (local file)
           │    └─► Qdrant (:6333) - Mini PC 1
           ├─► Embedding Service (:3003) - Mini PC 1
           │    └─► Ollama (:11434) - Mini PC 1
           ├─► Inference Service (:3001) - Main PC
           │    └─► llama-server (:8080) - Main PC
           └─► Qdrant (:6333) - Mini PC 1 (direct, for semantic search)
```
Note: Orchestration queries Qdrant directly for semantic search (bypassing
the memory service) but always fetches full episode content from the memory
service by ID after the vector search.
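
The two-step flow in the note above can be sketched roughly as follows. This is a minimal illustration, not the orchestration service's actual code: `recallEpisodes`, `searchVectors`, and `fetchEpisode` are hypothetical names, and the two injected functions stand in for the Qdrant client and the memory-service HTTP client.

```javascript
// Hypothetical two-step recall: vector search first, then fetch content by ID.
// `searchVectors` and `fetchEpisode` are injected stand-ins for the Qdrant
// and memory-service clients; neither name comes from the NexusAI codebase.
async function recallEpisodes(queryVector, { searchVectors, fetchEpisode, limit = 5 }) {
  // Step 1: the vector store returns only IDs and similarity scores.
  const hits = await searchVectors(queryVector, limit);

  // Step 2: fetch the full episode for each hit from the memory service.
  const episodes = await Promise.all(
    hits.map(async (hit) => {
      const episode = await fetchEpisode(hit.id);
      return episode ? { ...episode, score: hit.score } : null;
    })
  );

  // Drop IDs with no stored episode; the vector store's ranking order is kept.
  return episodes.filter(Boolean);
}

// Usage with in-memory stubs instead of real services.
const store = new Map([[1, { id: 1, content: 'hello' }]]);
recallEpisodes([0.1, 0.2], {
  searchVectors: async () => [{ id: 1, score: 0.9 }, { id: 2, score: 0.4 }],
  fetchEpisode: async (id) => store.get(id),
}).then((result) => console.log(result));
```

Injecting the two clients keeps the sketch runnable without a network and mirrors the separation between the vector search and the by-ID lookup.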
## Technology Choices
| Concern | Choice | Reason |
|---|---|---|
| Language | Node.js (JavaScript) | Familiar stack, async I/O suits service architecture |
| Language | Node.js (CommonJS) | Familiar stack, async I/O suits service architecture |
| Package management | npm workspaces | Monorepo with shared code, no publishing needed |
| Vector store | Qdrant | Mature, Docker-native, excellent Node.js client |
| Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user |
| LLM runtime | Ollama | Easiest local LLM management, serves embeddings too |
| Version control | Gitea (self-hosted) | Code stays on local network |
| Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user scale |
| LLM inference | llama.cpp (`llama-server`) | Maximum GPU utilisation on RTX A4000, OpenAI-compatible API |
| Embeddings | Ollama (`nomic-embed-text`) | Co-located with memory service on Mini PC 1; 768-dimensional vectors, cosine distance |
| Reverse proxy | Caddy + Authelia | Automatic HTTPS, SSO/MFA for all exposed services |
| Version control | Gitea (self-hosted) | Code stays on local network |
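
For context on the "OpenAI-compatible API" entry, the sketch below builds a chat completion request for `llama-server`. The host and port are taken from the hardware table and service diagram in this document, and `/v1/chat/completions` is the OpenAI-style endpoint that `llama-server` exposes; the function name `buildChatRequest` and the default parameters are assumptions, and the inference service may well construct its requests differently.

```javascript
// Hypothetical request builder for llama-server's OpenAI-compatible endpoint.
// The default host and port come from this document; the function name and
// the default temperature are illustrative assumptions.
function buildChatRequest(messages, { baseUrl = 'http://192.168.0.79:8080', temperature = 0.7 } = {}) {
  return {
    url: `${baseUrl}/v1/chat/completions`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ messages, temperature, stream: false }),
    },
  };
}

const { url, options } = buildChatRequest([{ role: 'user', content: 'Hello' }]);
console.log(url);
console.log(options.body);
```

The returned pair can be passed to `fetch(url, options)` on Node.js 18 or later, which ships a global `fetch`.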
## Current State
The core four-service architecture is complete and operational. Key capabilities:
- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
- **Projects** — sessions grouped with shared or isolated memory pools
- **Auto-naming** — sessions named automatically from first exchange via inference
- **Project-scoped semantic search** — Qdrant filtered by project session IDs
- **Chat client** — view-based UI with sidebar navigation, project views, session management
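
The hybrid memory retrieval capability listed above combines recent episodes with semantic search results. One way this could look is sketched below, assuming episodes shaped like `{ id, content, createdAt }` with an optional `score` on semantic hits. The function name `mergeContext`, the episode shape, and the oldest-first ordering are all assumptions for illustration rather than the actual prompt-assembly logic.

```javascript
// Hypothetical merge of recent and semantic results into one context list.
// Duplicates (an episode that is both recent and semantically relevant) are
// kept once, preferring the semantic copy so its score is retained.
function mergeContext(recentEpisodes, semanticEpisodes) {
  const byId = new Map();
  for (const episode of semanticEpisodes) byId.set(episode.id, episode);
  for (const episode of recentEpisodes) {
    if (!byId.has(episode.id)) byId.set(episode.id, episode);
  }
  // Assumption: the prompt reads best in chronological order, oldest first.
  return [...byId.values()].sort((a, b) => a.createdAt - b.createdAt);
}

const recent = [
  { id: 3, content: 'latest reply', createdAt: 30 },
  { id: 2, content: 'earlier reply', createdAt: 20 },
];
const semantic = [
  { id: 1, content: 'old but relevant', createdAt: 10, score: 0.8 },
  { id: 2, content: 'earlier reply', createdAt: 20, score: 0.7 },
];
console.log(mergeContext(recent, semantic).map((e) => e.id)); // prints [ 1, 2, 3 ]
```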