update documentation
@@ -1,56 +1,80 @@
 # Architecture Overview
 
-NexusAI is a modular, memory-centric AI system designed for persistent, context-aware conversations. It separates concerns across different services that can be independently deployed and evolved.
+NexusAI is a modular, memory-centric AI assistant designed for persistent,
+context-aware conversations. It separates concerns across independent services
+that can be evolved and deployed separately.
 
 ## Core Design Principles
 
-- **Decoupled layers:** memory, inference, and orchestration are independent of each other
-- **Hybrid retrieval:** semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
-- **Home lab:** services are distributed across nodes according to available hardware and resources
+- **Decoupled layers** — memory, inference, and orchestration are independent of each other
+- **Hybrid retrieval** — semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
+- **Project-scoped memory** — sessions can be grouped into projects with shared or isolated memory pools
+- **Home lab first** — services are distributed across nodes according to available hardware
 
 ## Memory Model
 
-Memory is split between SQLite and Qdrant, which work together as a pair:
+Memory is split between SQLite and Qdrant, which always work as a pair:
 
-- **SQLite:** episodic interactions, entities, relationships, summaries
-- **Qdrant:** vector embeddings for semantic similarity search
+- **SQLite** — episodic interactions, entities, relationships, summaries, sessions, projects
+- **Qdrant** — vector embeddings for semantic similarity search
 
-When recalling memory, Qdrant returns IDs and similarity scores, which are used to fetch
-full content from SQLite. Neither SQLite nor Qdrant work in isolation.
+When recalling memory, Qdrant returns IDs and similarity scores, which are used
+to fetch full content from SQLite. Neither store works in isolation.
+
+Episode embeddings carry a `{ sessionId, createdAt }` payload in Qdrant,
+enabling per-session and per-project filtering at search time. See
+`memory-isolation.md` for how project-scoped retrieval works.
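As a rough illustration of how the `sessionId` payload could drive per-project filtering, the sketch below builds a request body for Qdrant's REST search endpoint using a `match.any` condition. The helper name `buildProjectSearchBody`, its parameters, and the default limit are hypothetical and not taken from the NexusAI code. Only the `sessionId` payload key comes from the text above.

```javascript
// Hypothetical helper (an assumption, not the project's actual code):
// builds a Qdrant search body restricted to the sessions of one project.
function buildProjectSearchBody(queryVector, sessionIds, limit = 5) {
  if (!Array.isArray(sessionIds) || sessionIds.length === 0) {
    throw new Error('sessionIds must be a non-empty array');
  }
  return {
    vector: queryVector,
    limit,
    with_payload: true,
    filter: {
      // Keep only points whose payload.sessionId is one of the given IDs.
      must: [{ key: 'sessionId', match: { any: sessionIds } }],
    },
  };
}

// Example: search within two sessions that belong to the same project.
const searchBody = buildProjectSearchBody([0.1, 0.2, 0.3], ['s-1', 's-2'], 3);
console.log(JSON.stringify(searchBody.filter));
```

Per-session filtering would be the same call with a single ID in the array. How project membership is resolved to session IDs is left to `memory-isolation.md`.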
 
 ## Hardware Layout
 
 | Node | Address | Role |
 |---|---|---|
-| Main PC | local | Primary inference (RTX A4000 16GB) |
-| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant |
-| Mini PC 2 | 192.168.0.205 | Orchestration service, Chat Client, Gitea |
+| Main PC | 192.168.0.79 | Primary inference — RTX A4000 16GB |
+| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant, Ollama |
+| Mini PC 2 | 192.168.0.205 | Orchestration service, Chat Client, Caddy, Authelia, Gitea |
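One way to keep this layout in a single place is a small endpoint map. The sketch below is illustrative only: the hosts are the updated addresses from the table, the ports are the ones shown in the Service Communication diagram, and the object names, the `serviceUrl` helper, the `/health` path, and the use of plain HTTP between nodes are all assumptions.

```javascript
// Hypothetical endpoint map (names are illustrative, not from the codebase).
const NODES = {
  mainPc: '192.168.0.79',
  miniPc1: '192.168.0.81',
  miniPc2: '192.168.0.205',
};

const SERVICES = {
  orchestration: { host: NODES.miniPc2, port: 4000 },
  memory: { host: NODES.miniPc1, port: 3002 },
  embedding: { host: NODES.miniPc1, port: 3003 },
  qdrant: { host: NODES.miniPc1, port: 6333 },
  inference: { host: NODES.mainPc, port: 3001 },
};

// Resolve a service name and path to an absolute URL (assumes plain HTTP).
function serviceUrl(name, path = '/') {
  const svc = SERVICES[name];
  if (!svc) throw new Error(`Unknown service: ${name}`);
  return new URL(path, `http://${svc.host}:${svc.port}`).toString();
}

console.log(serviceUrl('memory', '/health')); // http://192.168.0.81:3002/health
```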
 
 ## Service Communication
 
-All services expose a REST HTTP API. The orchestration service is the single entry point —
-clients do not talk directly to the memory or inference services.
+All services expose a REST HTTP API. The orchestration service is the single
+entry point — clients never talk directly to memory or inference services.
 
 ```
-Client
-  └─► Orchestration (:4000)
-        ├─► Chat Client (static files, /srv/nexusai)
-        ├─► Memory Service (:3002)
-        │     ├─► Qdrant (:6333)
-        │     └─► SQLite
-        ├─► Embedding Service (:3003)
-        │     └─► Ollama
-        └─► Inference Service (:3001)
-              └─► Ollama
+Client (browser)
+  └─► Caddy (HTTPS + Authelia SSO)
+        └─► Orchestration (:4000) — Mini PC 2
+              ├─► Memory Service (:3002) — Mini PC 1
+              │     ├─► SQLite (local file)
+              │     └─► Qdrant (:6333) — Mini PC 1
+              ├─► Embedding Service (:3003) — Mini PC 1
+              │     └─► Ollama (:11434) — Mini PC 1
+              ├─► Inference Service (:3001) — Main PC
+              │     └─► llama-server (:8080) — Main PC
+              └─► Qdrant (:6333) — Mini PC 1 (direct — semantic search)
 ```
 
+Note: Orchestration queries Qdrant directly for semantic search (bypassing
+the memory service) but always fetches full episode content from the memory
+service by ID after the vector search.
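The "IDs first, content second" step described in the note can be sketched as a small join. This is a minimal illustration under stated assumptions: `attachContent` is a hypothetical name, `hits` stands in for vector search results of the form `{ id, score }`, and `episodes` stands in for rows returned by the memory service. The network calls that would produce both arrays are left out.

```javascript
// Hypothetical sketch: combine vector hits with full episode rows.
function attachContent(hits, episodes) {
  const byId = new Map(episodes.map((e) => [e.id, e]));
  return hits
    .filter((h) => byId.has(h.id)) // drop hits that have no stored episode
    .map((h) => ({ ...byId.get(h.id), score: h.score })); // keep similarity order
}

// Stand-in data: hit 99 has no matching episode and is dropped.
const hits = [
  { id: 7, score: 0.91 },
  { id: 3, score: 0.84 },
  { id: 99, score: 0.8 },
];
const episodes = [
  { id: 3, content: 'Discussed backup schedule' },
  { id: 7, content: 'Set up Qdrant on Mini PC 1' },
];

console.log(attachContent(hits, episodes).map((e) => e.id)); // [ 7, 3 ]
```

Dropping unmatched hits instead of failing is a design choice made for this sketch. A real service might prefer to log the mismatch, since it would mean the two stores have drifted apart.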
+
 ## Technology Choices
 
 | Concern | Choice | Reason |
 |---|---|---|
-| Language | Node.js (JavaScript) | Familiar stack, async I/O suits service architecture |
+| Language | Node.js (CommonJS) | Familiar stack, async I/O suits service architecture |
 | Package management | npm workspaces | Monorepo with shared code, no publishing needed |
 | Vector store | Qdrant | Mature, Docker-native, excellent Node.js client |
-| Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user |
-| LLM runtime | Ollama | Easiest local LLM management, serves embeddings too |
-| Version control | Gitea (self-hosted) | Code stays on local network |
+| Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user scale |
+| LLM inference | llama.cpp (`llama-server`) | Maximum GPU utilisation on RTX A4000, OpenAI-compatible API |
+| Embeddings | Ollama (`nomic-embed-text`) | Co-located with memory service on Mini PC 1, 768-dim Cosine |
+| Reverse proxy | Caddy + Authelia | Automatic HTTPS, SSO/MFA for all exposed services |
+| Version control | Gitea (self-hosted) | Code stays on local network |
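The Embeddings row refers to 768-dimensional vectors compared with the cosine metric. Qdrant computes this internally, so the function below is only an illustration of what the metric measures. The name `cosineSimilarity` and the choice to return 0 for a zero vector are conventions picked for this sketch.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Convention for this sketch: a zero vector is similar to nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```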
+
+## Current State
+
+The core four-service architecture is complete and operational. Key capabilities:
+
+- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
+- **Projects** — sessions grouped with shared or isolated memory pools
+- **Auto-naming** — sessions named automatically from first exchange via inference
+- **Project-scoped semantic search** — Qdrant filtered by project session IDs
+- **Chat client** — view-based UI with sidebar navigation, project views, session management
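The hybrid retrieval capability combines recent episodes with semantic matches. The document does not say how the two lists are merged, so the sketch below shows one possible policy as an assumption: recent episodes first, then semantic matches, with duplicates removed by `id` and a cap on the total. The function name `assembleContext` and the `maxItems` default are hypothetical.

```javascript
// Hypothetical merge policy: recent first, then semantic, de-duplicated by id.
function assembleContext(recentEpisodes, semanticEpisodes, maxItems = 8) {
  const seen = new Set();
  const merged = [];
  for (const episode of [...recentEpisodes, ...semanticEpisodes]) {
    if (merged.length >= maxItems) break; // respect the cap
    if (seen.has(episode.id)) continue; // skip episodes already included
    seen.add(episode.id);
    merged.push(episode);
  }
  return merged;
}

// Episode 12 appears in both lists and is included only once.
const recent = [
  { id: 11, content: 'Asked about Caddy config' },
  { id: 12, content: 'Chose Authelia for SSO' },
];
const semantic = [
  { id: 12, content: 'Chose Authelia for SSO' },
  { id: 4, content: 'Initial reverse proxy notes' },
];

console.log(assembleContext(recent, semantic).map((e) => e.id)); // [ 11, 12, 4 ]
```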