updated documentation for semantic and constant refactor
@@ -1,38 +1,50 @@
# Architecture Overview

NexusAI is a modular, memory-centric AI system designed for persistent, context-aware conversations. It separates concerns across different services that can be independently deployed and evolved.

## Core Design Principles

- **Decoupled layers:** memory, inference, and orchestration are independent of each other
- **Hybrid retrieval:** semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
- **Home lab:** services are distributed across nodes according to available hardware and resources

## Memory Model

Memory is split between SQLite and Qdrant, which work together as a pair:

- **SQLite:** episodic interactions, entities, relationships, summaries
- **Qdrant:** vector embeddings for semantic similarity search

When recalling memory, Qdrant returns IDs and similarity scores, which are used to fetch full content from SQLite. Neither SQLite nor Qdrant works in isolation.

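The recall flow described above can be sketched as a small join step. This is a minimal illustration, not the project's actual code: `assembleRecall`, `fetchRowsByIds`, the `limit` parameter, and the sample rows are all hypothetical, and a plain `Map` stands in for SQLite while a hard-coded array stands in for a Qdrant search result.

```javascript
// Hypothetical sketch: join vector hits (ID + score) with full rows from the structured store.
function assembleRecall(vectorHits, fetchRowsByIds, limit = 5) {
  // Rank by similarity, highest first, and keep the top `limit` hits.
  const ranked = [...vectorHits].sort((a, b) => b.score - a.score).slice(0, limit);

  // Fetch full content for those IDs from the structured store.
  const rows = fetchRowsByIds(ranked.map((hit) => hit.id));
  const byId = new Map(rows.map((row) => [row.id, row]));

  // Attach each score to its row; drop hits whose row no longer exists.
  return ranked
    .filter((hit) => byId.has(hit.id))
    .map((hit) => ({ ...byId.get(hit.id), score: hit.score }));
}

// In-memory stand-in for SQLite (assumption: rows are keyed by a numeric ID).
const sampleRows = new Map([
  [1, { id: 1, content: "sample episodic memory A" }],
  [2, { id: 2, content: "sample episodic memory B" }],
]);
const fetchRowsByIds = (ids) =>
  ids.filter((id) => sampleRows.has(id)).map((id) => sampleRows.get(id));

// Stand-in for a vector search result: IDs and similarity scores only.
const sampleHits = [
  { id: 2, score: 0.71 },
  { id: 1, score: 0.93 },
  { id: 9, score: 0.65 }, // vector exists, but the row is missing
];

const recalled = assembleRecall(sampleHits, fetchRowsByIds);
console.log(recalled);
```

The hit with ID 9 is dropped because it has no matching row, which is one way to handle the two stores drifting out of sync; a real service might log or repair that case instead.
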
## Hardware Layout

| Node | Address | Role |
|---|---|---|
| Main PC | local | Primary inference (RTX A4000 16 GB) |
| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant |
| Mini PC 2 | 192.168.0.205 | Orchestration service, Gitea |

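One way to capture this layout in code is a small helper that builds base URLs per service. This is only a sketch: `buildServiceUrls` is a hypothetical name, plain `http` is assumed, the ports are taken from the diagram in the Service Communication section, and the Main PC host is passed in as an argument because the table lists it only as "local". It also assumes the inference service runs on the Main PC.

```javascript
// Hypothetical helper: derive service base URLs from the node addresses in the table.
function buildServiceUrls(mainPcHost) {
  if (!mainPcHost) {
    throw new Error("mainPcHost is required (the table lists the Main PC only as 'local')");
  }

  const miniPc1 = "192.168.0.81";  // Memory service, Embedding service, Qdrant
  const miniPc2 = "192.168.0.205"; // Orchestration service
  const url = (host, port) => `http://${host}:${port}`; // assumption: plain HTTP

  return {
    orchestration: url(miniPc2, 4000),
    memory: url(miniPc1, 3002),
    embedding: url(miniPc1, 3003),
    qdrant: url(miniPc1, 6333),
    inference: url(mainPcHost, 3001), // assumption: inference service lives on the Main PC
  };
}

// "localhost" is an illustrative value, appropriate only when running on the Main PC itself.
const serviceUrls = buildServiceUrls("localhost");
console.log(serviceUrls);
```
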
## Service Communication

All services expose a REST HTTP API. The orchestration service is the single entry point; clients do not talk directly to the memory or inference services.

```
Client
  └─► Orchestration (:4000)
        ├─► Memory Service (:3002)
        │     ├─► Qdrant (:6333)
        │     └─► SQLite
        ├─► Embedding Service (:3003)
        │     └─► Ollama
        └─► Inference Service (:3001)
              └─► Ollama
```

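The single-entry-point idea can be illustrated with a small orchestration function that is the only thing a client calls. This is a sketch under assumptions: `handleMessage`, the shape of `services`, and the embed, recall, infer call order are hypothetical, and the injected async functions stand in for the real HTTP calls, whose endpoints are not described here.

```javascript
// Hypothetical orchestration step. Clients call only this function;
// the downstream services are reached through the injected `services` object.
async function handleMessage(message, services) {
  const vector = await services.embed(message);      // Embedding Service
  const memories = await services.recall(vector);    // Memory Service
  const reply = await services.infer({ message, memories }); // Inference Service
  return { reply, memoriesUsed: memories.length };
}

// Stub services that record the order in which they are called.
const callOrder = [];
const stubServices = {
  embed: async (text) => {
    callOrder.push("embedding");
    return [text.length]; // placeholder vector
  },
  recall: async () => {
    callOrder.push("memory");
    return [{ id: 1, content: "sample memory" }];
  },
  infer: async ({ message }) => {
    callOrder.push("inference");
    return `echo: ${message}`;
  },
};

const resultPromise = handleMessage("hello", stubServices);
resultPromise.then((result) => console.log(result, callOrder));
```

Passing the service clients in as arguments keeps the orchestration logic testable without any of the other services running.
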
## Technology Choices

| Concern | Choice | Reason |
|---|---|---|
| Language | Node.js (JavaScript) | Familiar stack, async I/O suits service architecture |