updated documentation for semantic and constant refactor
@@ -1,38 +1,50 @@
|
|||||||
# Architecture Overview
|
# Architecture Overview
|
||||||
|
|
||||||
NexusAI is a modular, memory-centric AI system designed for persistent, context-aware conversations. It separates concerns across different services that can be independently deployed and evolved
|
NexusAI is a modular, memory-centric AI system designed for persistent, context-aware conversations. It separates concerns across different services that can be independently deployed and evolved.
|
||||||
|
|
||||||
## Core Design Principles
|
## Core Design Principles
|
||||||
- **Decoupled layers:** memory, inference, orchestration independent of eachother
|
|
||||||
- **Hybrid retrieval:** semantic similarity (QDrant) combined with structured storage (SQLite) for flexible, ranked context assembly
|
- **Decoupled layers:** memory, inference, and orchestration are independent of each other
|
||||||
- **Home lab:** Services are properly distributed across the various nodes according to available hardware and resources
|
- **Hybrid retrieval:** semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
|
||||||
|
- **Home lab:** services are distributed across nodes according to available hardware and resources
|
||||||
|
|
||||||
## Memory Model
|
## Memory Model
|
||||||
Memory is split between SQLite and QDrant, which both work together as a pair
|
|
||||||
- **SQlite:** episodic interactions, entities, relationships, summaries
|
|
||||||
- **QDrant:** vector embeddings for semantic similarity search
|
|
||||||
|
|
||||||
When recallng memory, QDrant returns IDs and similarity scores, which are used to fetch full content from SQLite. Neither SQlite or QDrant work in isolation
|
Memory is split between SQLite and Qdrant, which work together as a pair:
|
||||||
|
|
||||||
|
- **SQLite:** episodic interactions, entities, relationships, summaries
|
||||||
|
- **Qdrant:** vector embeddings for semantic similarity search
|
||||||
|
|
||||||
|
When recalling memory, Qdrant returns IDs and similarity scores, which are used to fetch
|
||||||
|
full content from SQLite. Neither SQLite nor Qdrant works in isolation.
|
||||||
|
|
||||||
## Hardware Layout
|
## Hardware Layout
|
||||||
|
|
||||||
|
| Node | Address | Role |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
| Main PC | local | Primary inference (RTX A4000 16GB) |
|
| Main PC | local | Primary inference (RTX A4000 16GB) |
|
||||||
| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant |
|
| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant |
|
||||||
| Mini PC 2 | 192.168.0.205 | Orchestration service, Gitea |
|
| Mini PC 2 | 192.168.0.205 | Orchestration service, Gitea |
|
||||||
|
|
||||||
## Service Communication
|
## Service Communication
|
||||||
All services expose a REST HTTP api. The orchestration service is the single entgry-point. Clients dont talk directly to the memory or inference services
|
|
||||||
|
|
||||||
|
All services expose a REST HTTP API. The orchestration service is the single entry point —
|
||||||
|
clients do not talk directly to the memory or inference services.
|
||||||
|
|
||||||
|
```
|
||||||
Client
|
Client
|
||||||
└─► Orchestration (:4000)
|
└─► Orchestration (:4000)
|
||||||
├─► Memory Service (:3002)
|
├─► Memory Service (:3002)
|
||||||
│ └─► Qdrant (:6333)
|
│ ├─► Qdrant (:6333)
|
||||||
│ └─► SQLite
|
│ └─► SQLite
|
||||||
├─► Embedding Service (:3003)
|
├─► Embedding Service (:3003)
|
||||||
└─► Inference Service (:3001)
|
│ └─► Ollama
|
||||||
└─► Ollama
|
└─► Inference Service (:3001)
|
||||||
|
└─► Ollama
|
||||||
|
```
|
||||||
|
|
||||||
## Technology Choices
|
## Technology Choices
|
||||||
|
|
||||||
| Concern | Choice | Reason |
|
| Concern | Choice | Reason |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
| Language | Node.js (JavaScript) | Familiar stack, async I/O suits service architecture |
|
| Language | Node.js (JavaScript) | Familiar stack, async I/O suits service architecture |
|
||||||
|
|||||||
@@ -17,7 +17,7 @@ stores directly.
|
|||||||
- `better-sqlite3` — SQLite driver
|
- `better-sqlite3` — SQLite driver
|
||||||
- `@qdrant/js-client-rest` — Qdrant vector store client
|
- `@qdrant/js-client-rest` — Qdrant vector store client
|
||||||
- `dotenv` — environment variable loading
|
- `dotenv` — environment variable loading
|
||||||
- `@nexusai/shared` — shared utilities
|
- `@nexusai/shared` — shared utilities and constants
|
||||||
|
|
||||||
## Environment Variables
|
## Environment Variables
|
||||||
|
|
||||||
@@ -28,18 +28,23 @@ stores directly.
|
|||||||
| QDRANT_URL | No | http://localhost:6333 | Qdrant instance URL |
|
| QDRANT_URL | No | http://localhost:6333 | Qdrant instance URL |
|
||||||
|
|
||||||
## Internal Structure
|
## Internal Structure
|
||||||
|
|
||||||
|
```
|
||||||
src/
|
src/
|
||||||
├── db/
|
├── db/
|
||||||
│ ├── index.js # SQLite connection + initialization
|
│ ├── index.js # SQLite connection + initialization
|
||||||
│ └── schema.js # Table definitions, indexes, FTS5, triggers
|
│ └── schema.js # Table definitions, indexes, FTS5, triggers
|
||||||
├── episodic/
|
├── episodic/
|
||||||
│ └── index.js # Session + episode CRUD
|
│ └── index.js # Session + episode CRUD and FTS search
|
||||||
├── semantic/ # Qdrant vector operations (in progress)
|
├── semantic/
|
||||||
|
│ └── index.js # Qdrant collection management, upsert, search, delete
|
||||||
├── entities/ # Entity + relationship CRUD (upcoming)
|
├── entities/ # Entity + relationship CRUD (upcoming)
|
||||||
└── index.js # Express app + route definitions
|
└── index.js # Express app + route definitions
|
||||||
|
```
|
||||||
|
|
||||||
## SQLite Schema
|
## SQLite Schema
|
||||||
Four core tables:
|
|
||||||
|
Five core tables:
|
||||||
|
|
||||||
- **sessions** — top-level conversation containers, identified by an `external_id`
|
- **sessions** — top-level conversation containers, identified by an `external_id`
|
||||||
- **episodes** — individual exchanges (user message + AI response) tied to a session
|
- **episodes** — individual exchanges (user message + AI response) tied to a session
|
||||||
@@ -59,6 +64,42 @@ keep the FTS index automatically in sync with the episodes table.
|
|||||||
- `foreign_keys = ON` — enforces referential integrity and cascade deletes
|
- `foreign_keys = ON` — enforces referential integrity and cascade deletes
|
||||||
- PRAGMAs are set via `db.pragma()` separately from `db.exec()`
|
- PRAGMAs are set via `db.pragma()` separately from `db.exec()`
|
||||||
|
|
||||||
|
## Qdrant / Semantic Layer
|
||||||
|
|
||||||
|
Three collections are initialized on service startup (created if they don't already exist):
|
||||||
|
|
||||||
|
| Collection | Purpose |
|
||||||
|
|---|---|
|
||||||
|
| `episodes` | Embeddings for individual conversation exchanges |
|
||||||
|
| `entities` | Embeddings for named entities |
|
||||||
|
| `summaries` | Embeddings for condensed episode summaries |
|
||||||
|
|
||||||
|
All collections use **768-dimension vectors** with **Cosine similarity**, matching the
|
||||||
|
output of the `nomic-embed-text` embedding model via Ollama.
|
||||||
|
|
||||||
|
Vector dimension and distance metric are defined in `@nexusai/shared` constants
|
||||||
|
(`QDRANT.VECTOR_SIZE`, `QDRANT.DISTANCE_METRIC`) — not hardcoded in this service.
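
The startup behaviour described above could look roughly like the sketch below. `ensureCollections` is a hypothetical helper name, and the constants are inlined only so the snippet runs on its own; the real service would import them from `@nexusai/shared`. The `client` argument is assumed to be a `QdrantClient` from `@qdrant/js-client-rest`, of which only `getCollections()` and `createCollection()` are used.

```js
// Sketch only: not the service's actual code.
// Inlined copies of the shared constants, so the snippet is self-contained.
const QDRANT = { VECTOR_SIZE: 768, DISTANCE_METRIC: 'Cosine' };
const COLLECTIONS = { EPISODES: 'episodes', ENTITIES: 'entities', SUMMARIES: 'summaries' };

// Creates any collection that is missing and returns the names it created.
// `client` is assumed to be a QdrantClient, e.g.
//   new QdrantClient({ url: 'http://localhost:6333' })
async function ensureCollections(client) {
  const { collections } = await client.getCollections();
  const existing = new Set(collections.map((c) => c.name));
  const created = [];

  for (const name of Object.values(COLLECTIONS)) {
    if (existing.has(name)) continue; // already present, leave it untouched
    await client.createCollection(name, {
      vectors: { size: QDRANT.VECTOR_SIZE, distance: QDRANT.DISTANCE_METRIC },
    });
    created.push(name);
  }
  return created;
}
```

Because existing collections are skipped, calling this on every startup should be safe.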
|
||||||
|
|
||||||
|
### Semantic Layer Operations
|
||||||
|
|
||||||
|
Each collection exposes three operations via helper functions in `src/semantic/index.js`:
|
||||||
|
|
||||||
|
- **Upsert** — stores a vector with a payload containing the SQLite row ID, enabling
|
||||||
|
lookups back to the full content after a vector search
|
||||||
|
- **Search** — returns the top-k most similar vectors, with optional Qdrant filter
|
||||||
|
- **Delete** — removes a vector point by ID
|
||||||
|
|
||||||
|
The `wait: true` flag is used on all write operations so the caller receives confirmation
|
||||||
|
only after Qdrant has committed the change.
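
A minimal sketch of the three operations, assuming `client` is a `QdrantClient` from `@qdrant/js-client-rest`. The helper names and the `sqlite_id` payload key are illustrative assumptions, not the actual contents of `src/semantic/index.js`; only the `upsert`, `search`, and `delete` call shapes follow the client library.

```js
// Sketch only: hypothetical helpers around the Qdrant JS REST client.

// Stores one vector; the payload carries the SQLite row ID (key name assumed).
// Qdrant point IDs must be unsigned integers or UUID strings.
async function upsertVector(client, collection, { id, vector, sqliteId }) {
  return client.upsert(collection, {
    wait: true, // resolve only after Qdrant has applied the write
    points: [{ id, vector, payload: { sqlite_id: sqliteId } }],
  });
}

// Returns the top-k most similar points, optionally narrowed by a Qdrant filter.
async function searchVectors(client, collection, vector, { limit = 10, filter } = {}) {
  return client.search(collection, { vector, limit, filter, with_payload: true });
}

// Removes a single point by ID, again waiting for the write to be applied.
async function deleteVector(client, collection, id) {
  return client.delete(collection, { wait: true, points: [id] });
}
```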
|
||||||
|
|
||||||
|
### Hybrid Retrieval Pattern
|
||||||
|
|
||||||
|
Qdrant and SQLite work as a pair — neither operates in isolation:
|
||||||
|
|
||||||
|
1. Query is embedded and searched in Qdrant → returns IDs + similarity scores
|
||||||
|
2. IDs are used to fetch full content from SQLite
|
||||||
|
3. Results are ranked and assembled into a context package
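
The three steps above can be sketched as a single function. Everything here is an assumption made for illustration: the name `recall`, the injected `embed`, `searchVectors`, and `fetchRowsByIds` callbacks, the `sqlite_id` payload key, and ranking by similarity score alone. The actual ranking logic may well combine other signals.

```js
// Sketch only: hybrid retrieval with injected dependencies, so it runs without
// Qdrant or SQLite. All names are hypothetical.
async function recall(queryText, { embed, searchVectors, fetchRowsByIds, limit = 10 }) {
  // 1. Embed the query and search the vector store.
  const vector = await embed(queryText);
  const hits = await searchVectors(vector, limit); // [{ id, score, payload: { sqlite_id } }]

  // 2. Use the IDs from the payloads to fetch full rows from SQLite.
  const ids = hits.map((hit) => hit.payload.sqlite_id);
  const rows = await fetchRowsByIds(ids); // [{ id, ...columns }]
  const rowsById = new Map(rows.map((row) => [row.id, row]));

  // 3. Join scores onto rows, drop hits with no matching row, rank by score.
  return hits
    .filter((hit) => rowsById.has(hit.payload.sqlite_id))
    .map((hit) => ({ ...rowsById.get(hit.payload.sqlite_id), score: hit.score }))
    .sort((a, b) => b.score - a.score);
}
```

Dropping hits that have no SQLite row is one possible way to handle a vector that outlives its source record; logging or cleaning up such orphans would be an equally valid choice.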
|
||||||
|
|
||||||
## Endpoints
|
## Endpoints
|
||||||
|
|
||||||
### Health
|
### Health
|
||||||
@@ -105,12 +146,4 @@ keep the FTS index automatically in sync with the episodes table.
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
> Semantic (Qdrant) and entity endpoints will be documented as they are built out.
|
> Semantic (Qdrant) and entity REST endpoints will be documented as they are built out.
|
||||||
|
|
||||||
## Endpoints
|
|
||||||
|
|
||||||
| Method | Path | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| GET | /health | Service health check |
|
|
||||||
|
|
||||||
> Further endpoints will be documented as the service is built out.
|
|
||||||
@@ -1,18 +1,67 @@
|
|||||||
# Shared Package
|
# Shared Package
|
||||||
|
|
||||||
**Package:** '@nexusai/shared'
|
**Package:** `@nexusai/shared`
|
||||||
**Location:** 'packages/shared'
|
**Location:** `packages/shared`
|
||||||
|
|
||||||
## Purpose
|
## Purpose
|
||||||
Common utilities and configuration used across all NexusAI services
|
|
||||||
Keeping these here avoids duplicating and ensure consistent behavior
|
|
||||||
|
|
||||||
# Exports
|
Common utilities and configuration used across all NexusAI services.
|
||||||
|
Keeping these here avoids duplication and ensures consistent behaviour.
|
||||||
|
|
||||||
### 'getEnv(key, defaultValue?)'
|
## Exports
|
||||||
Loads an environment variable by key. If no default is provided and the variable is missing, throws at startup rather than failing later on.
|
|
||||||
```javascript
|
### `getEnv(key, defaultValue?)`
|
||||||
|
|
||||||
|
Loads an environment variable by key. If no default is provided and the
|
||||||
|
variable is missing, throws at startup rather than failing silently later.
|
||||||
|
|
||||||
|
```js
|
||||||
const { getEnv } = require('@nexusai/shared');
|
const { getEnv } = require('@nexusai/shared');
|
||||||
const PORT = getEnv('PORT', '3002'); // optional — falls back to 3002
|
|
||||||
const DB = getEnv('SQLITE_PATH'); // required — throws if missing
|
const PORT = getEnv('PORT', '3002'); // optional — falls back to 3002
|
||||||
|
const DB = getEnv('SQLITE_PATH'); // required — throws if missing
|
||||||
```
|
```
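
For orientation, a helper with the behaviour described above might be implemented along these lines. This is a guess at the shape, not the package's actual source; in particular the error message and the decision to treat only `undefined` as missing are assumptions.

```js
// Sketch only: one possible implementation of the documented behaviour.
function getEnv(key, defaultValue) {
  const value = process.env[key];
  if (value !== undefined) return value;               // variable is set
  if (defaultValue !== undefined) return defaultValue; // optional: use the fallback
  // Required and missing: fail immediately instead of surfacing later.
  throw new Error(`Missing required environment variable: ${key}`);
}
```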
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### Constants
|
||||||
|
|
||||||
|
Tuneable values and shared identifiers are centralised in `constants.js`
|
||||||
|
rather than hardcoded across services. Import the relevant group by name.
|
||||||
|
|
||||||
|
```js
|
||||||
|
const { QDRANT, COLLECTIONS, EPISODIC } = require('@nexusai/shared');
|
||||||
|
```
|
||||||
|
|
||||||
|
#### `QDRANT`
|
||||||
|
|
||||||
|
Vector store configuration. Values here must stay in sync with the
|
||||||
|
embedding model and Qdrant collection setup.
|
||||||
|
|
||||||
|
| Key | Value | Description |
|
||||||
|
|---|---|---|
|
||||||
|
| `DEFAULT_URL` | `http://localhost:6333` | Fallback Qdrant URL if `QDRANT_URL` env var is not set |
|
||||||
|
| `VECTOR_SIZE` | `768` | Output dimensions of `nomic-embed-text` |
|
||||||
|
| `DISTANCE_METRIC` | `'Cosine'` | Similarity metric used for all collections |
|
||||||
|
| `DEFAULT_LIMIT` | `10` | Default top-k for vector searches |
|
||||||
|
|
||||||
|
#### `COLLECTIONS`
|
||||||
|
|
||||||
|
Canonical Qdrant collection names. Used by both the semantic layer and
|
||||||
|
any service that constructs Qdrant queries directly.
|
||||||
|
|
||||||
|
| Key | Value |
|
||||||
|
|---|---|
|
||||||
|
| `EPISODES` | `'episodes'` |
|
||||||
|
| `ENTITIES` | `'entities'` |
|
||||||
|
| `SUMMARIES` | `'summaries'` |
|
||||||
|
|
||||||
|
#### `EPISODIC`
|
||||||
|
|
||||||
|
Default pagination and result limits for SQLite episode queries.
|
||||||
|
|
||||||
|
| Key | Value | Description |
|
||||||
|
|---|---|---|
|
||||||
|
| `DEFAULT_RECENT_LIMIT` | `10` | Default number of recent episodes to retrieve |
|
||||||
|
| `DEFAULT_PAGE_SIZE` | `20` | Default episodes per page for paginated queries |
|
||||||
|
| `DEFAULT_SEARCH_LIMIT` | `10` | Default number of FTS search results to return |
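
Taken together, the tables above suggest a `constants.js` shaped roughly like the sketch below. The values are copied from the tables; freezing the objects and the `resolveQdrantUrl` helper are assumptions added for illustration and may not exist in the package.

```js
// Sketch only: a possible layout for constants.js, with values from the tables.
const QDRANT = Object.freeze({
  DEFAULT_URL: 'http://localhost:6333',
  VECTOR_SIZE: 768,          // output dimensions of nomic-embed-text
  DISTANCE_METRIC: 'Cosine',
  DEFAULT_LIMIT: 10,         // default top-k for vector searches
});

const COLLECTIONS = Object.freeze({
  EPISODES: 'episodes',
  ENTITIES: 'entities',
  SUMMARIES: 'summaries',
});

const EPISODIC = Object.freeze({
  DEFAULT_RECENT_LIMIT: 10,
  DEFAULT_PAGE_SIZE: 20,
  DEFAULT_SEARCH_LIMIT: 10,
});

// Hypothetical helper: prefer QDRANT_URL from the environment, else the default.
function resolveQdrantUrl(env = process.env) {
  return env.QDRANT_URL ?? QDRANT.DEFAULT_URL;
}

module.exports = { QDRANT, COLLECTIONS, EPISODIC, resolveQdrantUrl };
```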
|
||||||