updated documentation for semantic and constant refactor
@@ -1,38 +1,50 @@
# Architecture Overview

NexusAI is a modular, memory-centric AI system designed for persistent, context-aware conversations. It separates concerns across different services that can be independently deployed and evolved.

## Core Design Principles

- **Decoupled layers:** memory, inference, and orchestration are independent of each other
- **Hybrid retrieval:** semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
- **Home lab:** services are distributed across nodes according to available hardware and resources

## Memory Model

Memory is split between SQLite and Qdrant, which work together as a pair:

- **SQLite:** episodic interactions, entities, relationships, summaries
- **Qdrant:** vector embeddings for semantic similarity search

When recalling memory, Qdrant returns IDs and similarity scores, which are used to fetch
full content from SQLite. Neither SQLite nor Qdrant works in isolation.

## Hardware Layout

| Node | Address | Role |
|---|---|---|
| Main PC | local | Primary inference (RTX A4000 16GB) |
| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant |
| Mini PC 2 | 192.168.0.205 | Orchestration service, Gitea |

## Service Communication

All services expose a REST HTTP API. The orchestration service is the single entry point:
clients do not talk directly to the memory or inference services.

```
Client
 └─► Orchestration (:4000)
       ├─► Memory Service (:3002)
       │     ├─► Qdrant (:6333)
       │     └─► SQLite
       ├─► Embedding Service (:3003)
       │     └─► Ollama
       └─► Inference Service (:3001)
             └─► Ollama
```

## Technology Choices

| Concern | Choice | Reason |
|---|---|---|
| Language | Node.js (JavaScript) | Familiar stack, async I/O suits service architecture |
@@ -17,7 +17,7 @@ stores directly.
- `better-sqlite3` — SQLite driver
- `@qdrant/js-client-rest` — Qdrant vector store client
- `dotenv` — environment variable loading
- `@nexusai/shared` — shared utilities and constants

## Environment Variables
@@ -28,18 +28,23 @@ stores directly.
| QDRANT_URL | No | http://localhost:6333 | Qdrant instance URL |

## Internal Structure

```
src/
├── db/
│   ├── index.js        # SQLite connection + initialization
│   └── schema.js       # Table definitions, indexes, FTS5, triggers
├── episodic/
│   └── index.js        # Session + episode CRUD and FTS search
├── semantic/
│   └── index.js        # Qdrant collection management, upsert, search, delete
├── entities/           # Entity + relationship CRUD (upcoming)
└── index.js            # Express app + route definitions
```

## SQLite Schema

Five core tables:

- **sessions** — top-level conversation containers, identified by an `external_id`
- **episodes** — individual exchanges (user message + AI response) tied to a session
@@ -59,6 +64,42 @@ keep the FTS index automatically in sync with the episodes table.
- `foreign_keys = ON` — enforces referential integrity and cascade deletes
- PRAGMAs are set via `db.pragma()` separately from `db.exec()`
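The split between `db.pragma()` and `db.exec()` could look roughly like the sketch below. It assumes `db` is a `better-sqlite3` connection; `initDatabase` and the `schemaSql` argument are hypothetical names used only for illustration, not the service's actual code.

```js
// Hypothetical helper: `db` is assumed to be a better-sqlite3 Database instance.
function initDatabase(db, schemaSql) {
  // PRAGMAs are applied through db.pragma(), one setting per call.
  db.pragma('foreign_keys = ON');
  // Table, index and trigger definitions go through db.exec() as a separate step.
  db.exec(schemaSql);
  return db;
}
```

Keeping the two steps apart makes it easy to see which connection settings are active before any schema statement runs.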

## Qdrant / Semantic Layer

Three collections are initialized on service startup (created if they don't already exist):

| Collection | Purpose |
|---|---|
| `episodes` | Embeddings for individual conversation exchanges |
| `entities` | Embeddings for named entities |
| `summaries` | Embeddings for condensed episode summaries |

All collections use **768-dimension vectors** with **Cosine similarity**, matching the
output of the `nomic-embed-text` embedding model via Ollama.

Vector dimension and distance metric are defined in `@nexusai/shared` constants
(`QDRANT.VECTOR_SIZE`, `QDRANT.DISTANCE_METRIC`) rather than hardcoded in this service.
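Startup initialization along these lines could be sketched as follows. It assumes `client` is a `QdrantClient` from `@qdrant/js-client-rest` (using its `getCollections` and `createCollection` methods) and that `config` carries the `QDRANT` constants. `ensureCollections` is a hypothetical name, not the service's actual function.

```js
// Sketch only: `client` is assumed to be a QdrantClient from @qdrant/js-client-rest.
// `names` would be the COLLECTIONS values and `config` the QDRANT constants.
async function ensureCollections(client, names, config) {
  const { collections } = await client.getCollections();
  const existing = new Set(collections.map((c) => c.name));
  const created = [];
  for (const name of names) {
    if (existing.has(name)) continue; // already present, leave it untouched
    await client.createCollection(name, {
      vectors: { size: config.VECTOR_SIZE, distance: config.DISTANCE_METRIC },
    });
    created.push(name);
  }
  return created; // names of the collections created during this call
}
```

Because existing collections are skipped, calling this on every startup is safe to repeat.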

### Semantic Layer Operations

Each collection exposes three operations via helper functions in `src/semantic/index.js`:

- **Upsert** — stores a vector with a payload containing the SQLite row ID, enabling
  lookups back to the full content after a vector search
- **Search** — returns the top-k most similar vectors, with an optional Qdrant filter
- **Delete** — removes a vector point by ID

The `wait: true` flag is used on all write operations so the caller receives confirmation
only after Qdrant has committed the change.
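The three operations could be sketched as thin wrappers like the ones below. This assumes `client` is a `QdrantClient` from `@qdrant/js-client-rest`; the function names and the `sqlite_id` payload key are illustrative assumptions and may differ from what `src/semantic/index.js` actually exports.

```js
// Sketch only: `client` is assumed to be a QdrantClient from @qdrant/js-client-rest.
// Function names and the `sqlite_id` payload key are hypothetical.

async function upsertVector(client, collection, id, vector, sqliteId) {
  return client.upsert(collection, {
    wait: true, // resolve only after Qdrant has applied the write
    points: [{ id, vector, payload: { sqlite_id: sqliteId } }],
  });
}

async function searchVectors(client, collection, vector, limit = 10, filter) {
  const params = { vector, limit, with_payload: true };
  if (filter) params.filter = filter; // optional Qdrant filter object
  return client.search(collection, params);
}

async function deleteVector(client, collection, id) {
  return client.delete(collection, { wait: true, points: [id] });
}
```

Storing the SQLite row ID in the payload is what lets a search hit be mapped back to its full record.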

### Hybrid Retrieval Pattern

Qdrant and SQLite work as a pair; neither operates in isolation:

1. Query is embedded and searched in Qdrant → returns IDs + similarity scores
2. IDs are used to fetch full content from SQLite
3. Results are ranked and assembled into a context package
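The three steps above can be sketched as a single function. The callbacks `embed`, `search` and `fetchByIds` are hypothetical stand-ins for the embedding service, the Qdrant search helper and the SQLite lookup. Ranking purely by similarity score is also an assumption, since the ranking rule is not specified here.

```js
// Sketch only: `embed`, `search` and `fetchByIds` are hypothetical injected callbacks.
async function recall(query, { embed, search, fetchByIds, limit = 10 }) {
  // 1. Embed the query and search Qdrant for IDs + similarity scores.
  const vector = await embed(query);
  const hits = await search(vector, limit); // expected shape: [{ id, score }]

  // 2. Fetch the full rows from SQLite by ID.
  const rows = await fetchByIds(hits.map((h) => h.id)); // expected shape: [{ id, ... }]
  const byId = new Map(rows.map((r) => [r.id, r]));

  // 3. Rank by score (highest first) and assemble the context package,
  //    dropping hits whose row no longer exists in SQLite.
  return hits
    .filter((h) => byId.has(h.id))
    .sort((a, b) => b.score - a.score)
    .map((h) => ({ ...byId.get(h.id), score: h.score }));
}
```

Dropping hits without a matching row keeps the result consistent if a vector outlives its SQLite record.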

## Endpoints

### Health
@@ -105,12 +146,4 @@ keep the FTS index automatically in sync with the episodes table.
}
```

> Semantic (Qdrant) and entity REST endpoints will be documented as they are built out.
@@ -1,18 +1,67 @@
# Shared Package

**Package:** `@nexusai/shared`
**Location:** `packages/shared`

## Purpose

Common utilities and configuration used across all NexusAI services.
Keeping these here avoids duplication and ensures consistent behaviour.

## Exports

### `getEnv(key, defaultValue?)`

Loads an environment variable by key. If no default is provided and the
variable is missing, throws at startup rather than failing silently later.

```js
const { getEnv } = require('@nexusai/shared');

const PORT = getEnv('PORT', '3002');   // optional: falls back to 3002
const DB   = getEnv('SQLITE_PATH');    // required: throws if missing
```
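For readers who want to see the fail-fast behaviour spelled out, here is a minimal sketch of how such a helper might be written. It is not the package's actual source: the error message and the choice to treat only `undefined` as "missing" are assumptions.

```js
// Sketch of the described behaviour, not the real @nexusai/shared implementation.
function getEnv(key, defaultValue) {
  const value = process.env[key];
  if (value !== undefined) return value;                // variable is set: use it
  if (defaultValue !== undefined) return defaultValue;  // optional: fall back
  // Required and missing: fail immediately instead of surfacing later.
  throw new Error(`Missing required environment variable: ${key}`);
}
```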

---

### Constants

Tuneable values and shared identifiers are centralised in `constants.js`
rather than hardcoded across services. Import the relevant group by name.

```js
const { QDRANT, COLLECTIONS, EPISODIC } = require('@nexusai/shared');
```

#### `QDRANT`

Vector store configuration. Values here must stay in sync with the
embedding model and Qdrant collection setup.

| Key | Value | Description |
|---|---|---|
| `DEFAULT_URL` | `http://localhost:6333` | Fallback Qdrant URL if `QDRANT_URL` env var is not set |
| `VECTOR_SIZE` | `768` | Output dimensions of `nomic-embed-text` |
| `DISTANCE_METRIC` | `'Cosine'` | Similarity metric used for all collections |
| `DEFAULT_LIMIT` | `10` | Default top-k for vector searches |

#### `COLLECTIONS`

Canonical Qdrant collection names. Used by both the semantic layer and
any service that constructs Qdrant queries directly.

| Key | Value |
|---|---|
| `EPISODES` | `'episodes'` |
| `ENTITIES` | `'entities'` |
| `SUMMARIES` | `'summaries'` |

#### `EPISODIC`

Default pagination and result limits for SQLite episode queries.

| Key | Value | Description |
|---|---|---|
| `DEFAULT_RECENT_LIMIT` | `10` | Default number of recent episodes to retrieve |
| `DEFAULT_PAGE_SIZE` | `20` | Default episodes per page for paginated queries |
| `DEFAULT_SEARCH_LIMIT` | `10` | Default number of FTS search results to return |
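Taken together, the tables above suggest a `constants.js` shaped roughly like the sketch below. The values are copied from the tables. The use of `Object.freeze` and the exact file layout are assumptions made for illustration.

```js
// Sketch of a possible constants.js layout; values are taken from the tables above.
// Object.freeze is an assumption, used here to guard against accidental mutation.
const QDRANT = Object.freeze({
  DEFAULT_URL: 'http://localhost:6333',
  VECTOR_SIZE: 768,          // output dimensions of nomic-embed-text
  DISTANCE_METRIC: 'Cosine',
  DEFAULT_LIMIT: 10,         // default top-k for vector searches
});

const COLLECTIONS = Object.freeze({
  EPISODES: 'episodes',
  ENTITIES: 'entities',
  SUMMARIES: 'summaries',
});

const EPISODIC = Object.freeze({
  DEFAULT_RECENT_LIMIT: 10,
  DEFAULT_PAGE_SIZE: 20,
  DEFAULT_SEARCH_LIMIT: 10,
});

// In the package these groups would then be exported for use by the services.
```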