# Architecture Overview

NexusAI is a modular, memory-centric AI system designed for persistent, context-aware conversations. It separates concerns across different services that can be independently deployed and evolved.

## Core Design Principles

- **Decoupled layers:** memory, inference, and orchestration are independent of each other
- **Hybrid retrieval:** semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
- **Home-lab deployment:** services are distributed across nodes according to available hardware and resources

## Memory Model

Memory is split between SQLite and Qdrant, which work together as a pair:

- **SQLite:** episodic interactions, entities, relationships, summaries
- **Qdrant:** vector embeddings for semantic similarity search

When recalling memory, Qdrant returns IDs and similarity scores, which are used to fetch the full content from SQLite. Neither store works in isolation.

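The recall flow above can be sketched as follows. This is a minimal illustration, not the real service code: the client objects, the `memories` collection/table name, and the column names are assumptions, but the shape mirrors the Qdrant-then-SQLite lookup described above.

```javascript
// Sketch of the recall path: vector search first, then a relational fetch.
// `qdrant` and `db` stand in for a Qdrant client and a better-sqlite3
// database; all names here are illustrative.
async function recallMemories(qdrant, db, queryEmbedding, limit = 5) {
  // Qdrant returns only point IDs and similarity scores...
  const hits = await qdrant.search('memories', { vector: queryEmbedding, limit });

  // ...which key into SQLite rows holding the full episodic content.
  const stmt = db.prepare('SELECT id, content FROM memories WHERE id = ?');
  return hits.map((hit) => ({
    score: hit.score,          // ranking signal from the vector store
    ...stmt.get(hit.id),       // full record from the relational store
  }));
}
```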
## Hardware Layout

| Node | Address | Role |
|---|---|---|
| Main PC | local | Primary inference (RTX A4000 16 GB) |
| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant |
| Mini PC 2 | 192.168.0.205 | Orchestration service, Gitea |

## Service Communication

All services expose a REST HTTP API. The orchestration service is the single entry point; clients do not talk directly to the memory or inference services.

```
Client
 └─► Orchestration (:4000)
      ├─► Memory Service (:3002)
      │    ├─► Qdrant (:6333)
      │    └─► SQLite
      ├─► Embedding Service (:3003)
      │    └─► Ollama
      └─► Inference Service (:3001)
           └─► Ollama
```

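One request through the single entry point might look like the sketch below. The service wrappers and their method names (`recall`, `generate`, `store`) are assumptions for illustration; in the real system each call would be a REST request to the ports shown in the diagram.

```javascript
// Sketch of the orchestration flow: the only component clients talk to.
// `services` stands in for thin REST clients; all names are illustrative.
async function handleChat(services, userMessage) {
  // 1. Pull relevant context via the memory service (:3002).
  const context = await services.memory.recall(userMessage);

  // 2. Generate a reply via the inference service (:3001).
  const reply = await services.inference.generate({ context, userMessage });

  // 3. Write the new exchange back so future recalls can find it.
  await services.memory.store({ userMessage, reply });
  return reply;
}
```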
## Technology Choices

| Concern | Choice | Reason |
|---|---|---|
| Language | Node.js (JavaScript) | Familiar stack, async I/O suits service architecture |
| Package management | npm workspaces | Monorepo with shared code, no publishing needed |
| Vector store | Qdrant | Mature, Docker-native, excellent Node.js client |
| Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user |
| LLM runtime | Ollama | Easiest local LLM management, serves embeddings too |
| Version control | Gitea (self-hosted) | Code stays on local network |
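
The npm-workspaces monorepo choice implies a root manifest along these lines. This is a hedged sketch: the package and directory names are illustrative assumptions, not the actual repository layout.

```json
{
  "name": "nexusai",
  "private": true,
  "workspaces": [
    "services/*",
    "packages/shared"
  ]
}
```

With `private: true` and no `publishConfig`, shared code is resolved locally across services without ever being published to a registry.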