# Architecture Overview

NexusAI is a modular, memory-centric AI system designed for persistent, context-aware conversations. It separates concerns across different services that can be independently deployed and evolved.

## Core Design Principles

- **Decoupled layers:** memory, inference, and orchestration are independent of each other
- **Hybrid retrieval:** semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
- **Home-lab deployment:** services are distributed across nodes according to available hardware and resources

## Memory Model

Memory is split between SQLite and Qdrant, which work together as a pair:

- **SQLite:** episodic interactions, entities, relationships, summaries
- **Qdrant:** vector embeddings for semantic similarity search

When recalling memory, Qdrant returns IDs and similarity scores, which are used to fetch the full content from SQLite. Neither store works in isolation.

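The recall flow above can be sketched as follows. This is a minimal illustration, not the real service code: the client objects, the `memories` collection/table name, and the column names are assumptions, but the shape mirrors the Qdrant-then-SQLite lookup described above.

```javascript
// Sketch of the recall path: vector search first, then a relational fetch.
// `qdrant` and `db` stand in for a Qdrant client and a better-sqlite3
// database; all names here are illustrative.
async function recallMemories(qdrant, db, queryEmbedding, limit = 5) {
  // Qdrant returns only point IDs and similarity scores...
  const hits = await qdrant.search('memories', { vector: queryEmbedding, limit });

  // ...which key into SQLite rows holding the full episodic content.
  const stmt = db.prepare('SELECT id, content FROM memories WHERE id = ?');
  return hits.map((hit) => ({
    score: hit.score,          // ranking signal from the vector store
    ...stmt.get(hit.id),       // full record from the relational store
  }));
}
```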
## Hardware Layout

| Node | Address | Role |
|---|---|---|
| Main PC | local | Primary inference (RTX A4000 16 GB) |
| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant |
| Mini PC 2 | 192.168.0.205 | Orchestration service, Gitea |

## Service Communication

All services expose a REST HTTP API. The orchestration service is the single entry point; clients do not talk directly to the memory or inference services.

```
Client
 └─► Orchestration (:4000)
      ├─► Memory Service (:3002)
      │    ├─► Qdrant (:6333)
      │    └─► SQLite
      ├─► Embedding Service (:3003)
      │    └─► Ollama
      └─► Inference Service (:3001)
           └─► Ollama
```

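One request through the single entry point might look like the sketch below. The service wrappers and their method names (`recall`, `generate`, `store`) are assumptions for illustration; in the real system each call would be a REST request to the ports shown in the diagram.

```javascript
// Sketch of the orchestration flow: the only component clients talk to.
// `services` stands in for thin REST clients; all names are illustrative.
async function handleChat(services, userMessage) {
  // 1. Pull relevant context via the memory service (:3002).
  const context = await services.memory.recall(userMessage);

  // 2. Generate a reply via the inference service (:3001).
  const reply = await services.inference.generate({ context, userMessage });

  // 3. Write the new exchange back so future recalls can find it.
  await services.memory.store({ userMessage, reply });
  return reply;
}
```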
## Technology Choices

| Concern | Choice | Reason |
|---|---|---|
| Language | Node.js (JavaScript) | Familiar stack, async I/O suits service architecture |
| Package management | npm workspaces | Monorepo with shared code, no publishing needed |
| Vector store | Qdrant | Mature, Docker-native, excellent Node.js client |
| Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user |
| LLM runtime | Ollama | Easiest local LLM management, serves embeddings too |
| Version control | Gitea (self-hosted) | Code stays on local network |
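
The npm-workspaces monorepo choice implies a root manifest along these lines. This is a hedged sketch: the package and directory names are illustrative assumptions, not the actual repository layout.

```json
{
  "name": "nexusai",
  "private": true,
  "workspaces": [
    "services/*",
    "packages/shared"
  ]
}
```

With `private: true` and no `publishConfig`, shared code is resolved locally across services without ever being published to a registry.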