# Architecture Overview

NexusAI is a modular, memory-centric AI assistant designed for persistent,
context-aware conversations. It separates concerns across independent services
that can be evolved and deployed separately.

## Core Design Principles

- **Decoupled layers** — memory, inference, and orchestration are independent of each other
- **Hybrid retrieval** — semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
- **Project-scoped memory** — sessions can be grouped into projects with shared or isolated memory pools
- **Home lab first** — services are distributed across nodes according to available hardware

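As a rough illustration of the hybrid retrieval principle, the sketch below merges recent episodes with semantic hits into one deduplicated context list. The function name `assembleContext`, the episode shape, the `limit` parameter, and the "recent first, then by score" ordering are all assumptions made for this example, not NexusAI's actual assembly logic.

```javascript
// Hypothetical sketch: combine recent episodes and semantic search hits.
// Ordering policy (recent first, then highest score) is an assumption.
function assembleContext(recent, semantic, limit = 5) {
  const byScore = [...semantic].sort((a, b) => b.score - a.score);
  const seen = new Set();
  const merged = [];
  for (const episode of [...recent, ...byScore]) {
    if (merged.length >= limit) break;
    // An episode can be both recent and semantically similar; keep it once.
    if (seen.has(episode.id)) continue;
    seen.add(episode.id);
    merged.push(episode);
  }
  return merged;
}

// In-memory stand-ins for the two retrieval paths.
const recentEpisodes = [{ id: 10 }, { id: 11 }];
const semanticHits = [
  { id: 11, score: 0.9 },
  { id: 4, score: 0.8 },
  { id: 7, score: 0.95 },
];
const promptContext = assembleContext(recentEpisodes, semanticHits, 4);
```

Deduplicating by ID keeps an episode that is both recent and similar from taking two slots in the prompt.
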
## Memory Model

Memory is split between SQLite and Qdrant, which always work as a pair:

- **SQLite** — episodic interactions, entities, relationships, summaries, sessions, projects
- **Qdrant** — vector embeddings for semantic similarity search

When recalling memory, Qdrant returns IDs and similarity scores, which are used
to fetch full content from SQLite. Neither store works in isolation.

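The two-step recall described above can be sketched as follows. This is a minimal illustration using in-memory data in place of both stores: `hydrateHits`, the hit shape `{ id, score }`, and the episode shape are hypothetical names chosen for the example, and the real services would call Qdrant and SQLite instead.

```javascript
// Hypothetical sketch: turn vector-search hits (IDs + scores) into full episodes.
// `fetchEpisodeById` stands in for a SQLite lookup.
function hydrateHits(hits, fetchEpisodeById) {
  const results = [];
  for (const hit of hits) {
    const episode = fetchEpisodeById(hit.id);
    // Skip hits whose row is missing rather than failing the whole recall.
    if (episode === undefined) continue;
    results.push({ ...episode, score: hit.score });
  }
  // Highest similarity first.
  return results.sort((a, b) => b.score - a.score);
}

// In-memory stand-ins for SQLite rows and Qdrant search results.
const episodesById = new Map([
  [1, { id: 1, content: 'User asked about GPU memory limits.' }],
  [2, { id: 2, content: 'User prefers concise answers.' }],
]);
const hits = [
  { id: 2, score: 0.71 },
  { id: 1, score: 0.88 },
  { id: 3, score: 0.65 }, // no matching row, so it is dropped
];
const recalled = hydrateHits(hits, (id) => episodesById.get(id));
```

Dropping orphaned hits is one possible policy; a stricter implementation might log or repair the mismatch between the two stores.
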
Episode embeddings carry a `{ sessionId, createdAt }` payload in Qdrant,
enabling per-session and per-project filtering at search time. See
`memory-isolation.md` for how project-scoped retrieval works.

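To make the payload filtering concrete, here is a small sketch that builds a filter object restricting a search to a set of session IDs. The object shape follows Qdrant's documented payload-filter JSON (`must` with `match.value` or `match.any`); the helper name `buildSessionFilter` and the sample session IDs are hypothetical, and the way NexusAI actually resolves a project to its sessions is covered in `memory-isolation.md` and may differ.

```javascript
// Hypothetical helper: build a Qdrant payload filter for one or more sessions.
function buildSessionFilter(sessionIds) {
  if (!Array.isArray(sessionIds) || sessionIds.length === 0) {
    throw new Error('at least one sessionId is required');
  }
  // A single session uses an exact match; several sessions use "any of".
  const match =
    sessionIds.length === 1 ? { value: sessionIds[0] } : { any: sessionIds };
  return { must: [{ key: 'sessionId', match }] };
}

// Per-session filter.
const sessionFilter = buildSessionFilter(['session-a']);

// Per-project filter: the project's session IDs are assumed to be known already.
const projectFilter = buildSessionFilter(['session-a', 'session-b']);
```

The resulting object would be passed as the `filter` field of a Qdrant search request alongside the query vector.
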
## Hardware Layout

| Node | Address | Role |
|---|---|---|
| Main PC | 192.168.0.79 | Primary inference — RTX A4000 16GB |
| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant, Ollama |
| Mini PC 2 | 192.168.0.205 | Orchestration service, Chat Client, Caddy, Authelia, Gitea |

## Service Communication

All services expose a REST HTTP API. The orchestration service is the single
entry point — clients never talk directly to memory or inference services.

```
Client (browser)
  └─► Caddy (HTTPS + Authelia SSO)
        └─► Orchestration (:4000) — Mini PC 2
              ├─► Memory Service (:3002) — Mini PC 1
              │     ├─► SQLite (local file)
              │     └─► Qdrant (:6333) — Mini PC 1
              ├─► Embedding Service (:3003) — Mini PC 1
              │     └─► Ollama (:11434) — Mini PC 1
              ├─► Inference Service (:3001) — Main PC
              │     └─► llama-server (:8080) — Main PC
              └─► Qdrant (:6333) — Mini PC 1 (direct — semantic search)
```

Note: Orchestration queries Qdrant directly for semantic search (bypassing
the memory service) but always fetches full episode content from the memory
service by ID after the vector search.

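One way to keep the addresses and ports from the hardware table and the diagram in a single place is a small service registry, sketched below. The names `NODES`, `SERVICES`, and `serviceUrl`, the use of plain `http` between internal services, and the `/health` path are assumptions for illustration only; the IP addresses and ports are taken from this document.

```javascript
// Hypothetical registry: node addresses and service ports as listed in this document.
const NODES = {
  mainPc: '192.168.0.79',
  miniPc1: '192.168.0.81',
  miniPc2: '192.168.0.205',
};

const SERVICES = {
  orchestration: { node: 'miniPc2', port: 4000 },
  memory: { node: 'miniPc1', port: 3002 },
  embedding: { node: 'miniPc1', port: 3003 },
  inference: { node: 'mainPc', port: 3001 },
  qdrant: { node: 'miniPc1', port: 6333 },
};

// Build a base URL (or a full URL for a given path) for a named service.
// Assumes unencrypted HTTP on the local network.
function serviceUrl(name, path = '/') {
  const service = SERVICES[name];
  if (!service) throw new Error(`unknown service: ${name}`);
  return new URL(path, `http://${NODES[service.node]}:${service.port}`).toString();
}

// '/health' is a made-up path used only to show the call.
const memoryHealthUrl = serviceUrl('memory', '/health');
```

Keeping node and port separate means moving a service to another machine changes one field rather than every URL string.
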
## Technology Choices

| Concern | Choice | Reason |
|---|---|---|
| Language | Node.js (CommonJS) | Familiar stack, async I/O suits service architecture |
| Package management | npm workspaces | Monorepo with shared code, no publishing needed |
| Vector store | Qdrant | Mature, Docker-native, excellent Node.js client |
| Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user scale |
| LLM inference | llama.cpp (`llama-server`) | Maximum GPU utilisation on RTX A4000, OpenAI-compatible API |
| Embeddings | Ollama (`nomic-embed-text`) | Co-located with memory service on Mini PC 1; 768-dimensional vectors, cosine distance |
| Reverse proxy | Caddy + Authelia | Automatic HTTPS, SSO/MFA for all exposed services |
| Version control | Gitea (self-hosted) | Code stays on local network |

## Current State

The core four-service architecture is complete and operational. Key capabilities:

- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
- **Entity layer + Knowledge graph** — automatic extraction of named entities and relationships from conversations via qwen2.5:3b. Entities and relationships are stored in SQLite with `mention_count` tracking. A graph traversal layer expands Qdrant entity search hits into a 1-hop neighborhood subgraph, injecting structured connected knowledge into every prompt
- **Projects** — sessions grouped with shared or isolated memory pools
- **Auto-naming** — sessions named automatically from first exchange via inference
- **Project-scoped semantic search** — Qdrant filtered by project session IDs
- **Chat client** — view-based UI with sidebar navigation, project views, session management

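
The 1-hop expansion mentioned in the knowledge graph item can be sketched as below. This is an illustrative version only: the relationship shape `{ source, target, type }`, the function name `expandOneHop`, and the sample entities are assumptions, and the real traversal layer reads relationships from SQLite.

```javascript
// Hypothetical sketch: expand entity search hits into a 1-hop subgraph.
function expandOneHop(seedIds, relationships) {
  const seeds = new Set(seedIds);
  const nodes = new Set(seedIds);
  const edges = [];
  for (const rel of relationships) {
    // Test against the seeds only, so neighbours of neighbours are not pulled in.
    if (seeds.has(rel.source) || seeds.has(rel.target)) {
      edges.push(rel);
      nodes.add(rel.source);
      nodes.add(rel.target);
    }
  }
  return { nodes: [...nodes], edges };
}

// In-memory stand-in for relationship rows.
const relationships = [
  { source: 'nexusai', target: 'qdrant', type: 'uses' },
  { source: 'nexusai', target: 'sqlite', type: 'uses' },
  { source: 'qdrant', target: 'mini-pc-1', type: 'runs_on' },
  { source: 'caddy', target: 'authelia', type: 'delegates_to' },
];
const subgraph = expandOneHop(['qdrant'], relationships);
```

Here the seed `qdrant` pulls in `nexusai` and `mini-pc-1`, while `sqlite` stays out because it is two hops away.
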