updated documentation for semantic and constant refactor

Storme-bit
2026-04-04 08:15:29 -07:00
parent bd600d9865
commit 7d3f083485
3 changed files with 132 additions and 38 deletions


@@ -1,38 +1,50 @@
# Architecture Overview
NexusAI is a modular, memory-centric AI system designed for persistent, context-aware conversations. It separates concerns across different services that can be independently deployed and evolved.
## Core Design Principles
- **Decoupled layers:** memory, inference, and orchestration are independent of each other
- **Hybrid retrieval:** semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
- **Home lab:** services are distributed across nodes according to available hardware and resources
## Memory Model
Memory is split between SQLite and Qdrant, which work together as a pair:
- **SQLite:** episodic interactions, entities, relationships, summaries
- **Qdrant:** vector embeddings for semantic similarity search
When recalling memory, Qdrant returns IDs and similarity scores, which are used to fetch
full content from SQLite. Neither SQLite nor Qdrant works in isolation.
## Hardware Layout
| Node | Address | Role |
|---|---|---|
| Main PC | local | Primary inference (RTX A4000 16GB) |
| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant |
| Mini PC 2 | 192.168.0.205 | Orchestration service, Gitea |
## Service Communication
All services expose a REST HTTP API. The orchestration service is the single entry point:
clients do not talk directly to the memory or inference services.
```
Client
 └─► Orchestration (:4000)
      ├─► Memory Service (:3002)
      │    ├─► Qdrant (:6333)
      │    └─► SQLite
      ├─► Embedding Service (:3003)
      │    └─► Ollama
      └─► Inference Service (:3001)
           └─► Ollama
```
## Technology Choices
| Concern | Choice | Reason |
|---|---|---|
| Language | Node.js (JavaScript) | Familiar stack, async I/O suits service architecture |


@@ -17,7 +17,7 @@ stores directly.
- `better-sqlite3`: SQLite driver
- `@qdrant/js-client-rest`: Qdrant vector store client
- `dotenv`: environment variable loading
- `@nexusai/shared`: shared utilities and constants
## Environment Variables
@@ -28,18 +28,23 @@ stores directly.
| QDRANT_URL | No | http://localhost:6333 | Qdrant instance URL |
## Internal Structure
```
src/
├── db/
│   ├── index.js        # SQLite connection + initialization
│   └── schema.js       # Table definitions, indexes, FTS5, triggers
├── episodic/
│   └── index.js        # Session + episode CRUD and FTS search
├── semantic/
│   └── index.js        # Qdrant collection management, upsert, search, delete
├── entities/           # Entity + relationship CRUD (upcoming)
└── index.js            # Express app + route definitions
```
## SQLite Schema
Five core tables:
- **sessions**: top-level conversation containers, identified by an `external_id`
- **episodes**: individual exchanges (user message + AI response) tied to a session
@@ -59,6 +64,42 @@ keep the FTS index automatically in sync with the episodes table.
- `foreign_keys = ON`: enforces referential integrity and cascade deletes
- PRAGMAs are set via `db.pragma()` separately from `db.exec()`
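The split between `db.pragma()` and `db.exec()` could look roughly like the sketch below. It assumes `db` is a `better-sqlite3` database handle; the function name `initDatabase` and the `schemaSql` parameter are hypothetical and not taken from the service code.

```js
// Sketch only: `db` is assumed to be a better-sqlite3 Database instance.
// `initDatabase` and `schemaSql` are illustrative names, not the service's own.
function initDatabase(db, schemaSql) {
  // Connection-level settings go through db.pragma() ...
  db.pragma('foreign_keys = ON');
  // ... while table, index, and trigger definitions go through db.exec().
  db.exec(schemaSql);
  return db;
}
```

Keeping the two calls separate makes it obvious which statements configure the connection and which define the schema.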
## Qdrant / Semantic Layer
Three collections are initialized on service startup (created if they don't already exist):
| Collection | Purpose |
|---|---|
| `episodes` | Embeddings for individual conversation exchanges |
| `entities` | Embeddings for named entities |
| `summaries` | Embeddings for condensed episode summaries |
All collections use **768-dimension vectors** with **Cosine similarity**, matching the
output of the `nomic-embed-text` embedding model via Ollama.
Vector dimension and distance metric are defined in `@nexusai/shared` constants
(`QDRANT.VECTOR_SIZE`, `QDRANT.DISTANCE_METRIC`) — not hardcoded in this service.
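As a rough illustration of the startup step, the sketch below creates any missing collection using the shared vector size and distance metric. It assumes `client` behaves like a `QdrantClient` from `@qdrant/js-client-rest` (using its `getCollections` and `createCollection` methods); the function name `ensureCollections` is hypothetical, and the constants are inlined here only to keep the example self-contained.

```js
// Sketch only. In the real service these values would be imported from
// @nexusai/shared; they are inlined so the example runs on its own.
const QDRANT = { VECTOR_SIZE: 768, DISTANCE_METRIC: 'Cosine' };
const COLLECTIONS = { EPISODES: 'episodes', ENTITIES: 'entities', SUMMARIES: 'summaries' };

// `client` is assumed to be a QdrantClient (or anything with the same two methods).
async function ensureCollections(client) {
  const { collections } = await client.getCollections();
  const existing = new Set(collections.map((c) => c.name));
  const created = [];

  for (const name of Object.values(COLLECTIONS)) {
    if (existing.has(name)) continue; // already present, leave it untouched
    await client.createCollection(name, {
      vectors: { size: QDRANT.VECTOR_SIZE, distance: QDRANT.DISTANCE_METRIC },
    });
    created.push(name);
  }
  return created; // names of the collections created by this call
}
```

Because the function only creates what is missing, calling it on every startup should be safe.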
### Semantic Layer Operations
Each collection exposes three operations via helper functions in `src/semantic/index.js`:
- **Upsert** — stores a vector with a payload containing the SQLite row ID, enabling
lookups back to the full content after a vector search
- **Search** — returns the top-k most similar vectors, with optional Qdrant filter
- **Delete** — removes a vector point by ID
The `wait: true` flag is used on all write operations so the caller receives confirmation
only after Qdrant has committed the change.
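A minimal sketch of the two write helpers, assuming `client` is a `QdrantClient` from `@qdrant/js-client-rest`. The helper names and the payload key `sqlite_id` are hypothetical; the source only says that the payload carries the SQLite row ID.

```js
// Sketch only: helper names and the `sqlite_id` payload key are assumptions.
// Qdrant point IDs must be unsigned integers or UUID strings.
async function upsertVector(client, collection, pointId, vector, sqliteRowId) {
  return client.upsert(collection, {
    wait: true, // resolve only after Qdrant has applied the write
    points: [{ id: pointId, vector, payload: { sqlite_id: sqliteRowId } }],
  });
}

async function deleteVector(client, collection, pointId) {
  return client.delete(collection, {
    wait: true, // same guarantee for deletes
    points: [pointId],
  });
}
```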
### Hybrid Retrieval Pattern
Qdrant and SQLite work as a pair — neither operates in isolation:
1. Query is embedded and searched in Qdrant → returns IDs + similarity scores
2. IDs are used to fetch full content from SQLite
3. Results are ranked and assembled into a context package
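The three steps above can be sketched as a single function. Everything here is illustrative: `recall`, `embed`, `searchVectors`, `fetchRows`, and the `sqlite_id` payload key are hypothetical names, and the three dependencies are injected so the flow can be followed without a running Qdrant or SQLite instance.

```js
// Sketch only: all names are assumptions made for this example.
//   embed(text)                  -> Promise<number[]>
//   searchVectors(vector, limit) -> Promise<[{ score, payload: { sqlite_id } }]>
//   fetchRows(ids)               -> Promise<[{ id, ... }]>
async function recall(queryText, { embed, searchVectors, fetchRows, limit = 10 }) {
  // 1. Embed the query and search Qdrant for similar vectors.
  const vector = await embed(queryText);
  const hits = await searchVectors(vector, limit);

  // 2. Use the SQLite row IDs stored in each payload to fetch full content.
  const ids = hits.map((hit) => hit.payload.sqlite_id);
  const rows = await fetchRows(ids);
  const rowsById = new Map(rows.map((row) => [row.id, row]));

  // 3. Attach scores, drop hits whose row no longer exists, rank by similarity.
  return hits
    .filter((hit) => rowsById.has(hit.payload.sqlite_id))
    .map((hit) => ({ ...rowsById.get(hit.payload.sqlite_id), score: hit.score }))
    .sort((a, b) => b.score - a.score);
}
```

Dropping hits with no matching row is one possible policy for vectors that outlive their SQLite record; the source does not say how the service handles that case.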
## Endpoints
### Health
@@ -105,12 +146,4 @@ keep the FTS index automatically in sync with the episodes table.
} }
``` ```
> Semantic (Qdrant) and entity REST endpoints will be documented as they are built out.


@@ -1,18 +1,67 @@
# Shared Package
**Package:** `@nexusai/shared`
**Location:** `packages/shared`
## Purpose
Common utilities and configuration used across all NexusAI services.
Keeping these here avoids duplication and ensures consistent behaviour.
## Exports
### `getEnv(key, defaultValue?)`
Loads an environment variable by key. If no default is provided and the
variable is missing, throws at startup rather than failing silently later.
```js
const { getEnv } = require('@nexusai/shared');
const PORT = getEnv('PORT', '3002'); // optional: falls back to 3002
const DB = getEnv('SQLITE_PATH'); // required: throws if missing
```
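The package source is not shown here, so the following is only a guess at how a helper with this behaviour might be written. The error message and the treatment of `undefined` as "no default" are assumptions.

```js
// Sketch only: one possible implementation matching the described behaviour.
function getEnv(key, defaultValue) {
  const value = process.env[key];
  if (value !== undefined) return value;
  if (defaultValue !== undefined) return defaultValue;
  // No value and no default: fail loudly at startup.
  throw new Error(`Missing required environment variable: ${key}`);
}
```

Note that environment variables are always strings, so a caller that needs a number (such as a port) would still have to convert the result.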
---
### Constants
Tuneable values and shared identifiers are centralised in `constants.js`
rather than hardcoded across services. Import the relevant group by name.
```js
const { QDRANT, COLLECTIONS, EPISODIC } = require('@nexusai/shared');
```
#### `QDRANT`
Vector store configuration. Values here must stay in sync with the
embedding model and Qdrant collection setup.
| Key | Value | Description |
|---|---|---|
| `DEFAULT_URL` | `http://localhost:6333` | Fallback Qdrant URL if `QDRANT_URL` env var is not set |
| `VECTOR_SIZE` | `768` | Output dimensions of `nomic-embed-text` |
| `DISTANCE_METRIC` | `'Cosine'` | Similarity metric used for all collections |
| `DEFAULT_LIMIT` | `10` | Default top-k for vector searches |
#### `COLLECTIONS`
Canonical Qdrant collection names. Used by both the semantic layer and
any service that constructs Qdrant queries directly.
| Key | Value |
|---|---|
| `EPISODES` | `'episodes'` |
| `ENTITIES` | `'entities'` |
| `SUMMARIES` | `'summaries'` |
#### `EPISODIC`
Default pagination and result limits for SQLite episode queries.
| Key | Value | Description |
|---|---|---|
| `DEFAULT_RECENT_LIMIT` | `10` | Default number of recent episodes to retrieve |
| `DEFAULT_PAGE_SIZE` | `20` | Default episodes per page for paginated queries |
| `DEFAULT_SEARCH_LIMIT` | `10` | Default number of FTS search results to return |
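Taken together, the tables above suggest a `constants.js` along the lines of the sketch below. The values are copied from the tables; the use of `Object.freeze` and the `resolveLimit` helper are assumptions added for illustration and may not match the actual file.

```js
// Sketch only: values come from the tables above, layout is assumed.
const QDRANT = Object.freeze({
  DEFAULT_URL: 'http://localhost:6333',
  VECTOR_SIZE: 768,
  DISTANCE_METRIC: 'Cosine',
  DEFAULT_LIMIT: 10,
});

const COLLECTIONS = Object.freeze({
  EPISODES: 'episodes',
  ENTITIES: 'entities',
  SUMMARIES: 'summaries',
});

const EPISODIC = Object.freeze({
  DEFAULT_RECENT_LIMIT: 10,
  DEFAULT_PAGE_SIZE: 20,
  DEFAULT_SEARCH_LIMIT: 10,
});

// Hypothetical helper: fall back to a shared default when a caller-supplied
// limit (for example from a query string) is missing or not a positive integer.
function resolveLimit(requested, fallback) {
  const parsed = Number.parseInt(requested, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : fallback;
}
```

A route handler could then call `resolveLimit(req.query.limit, EPISODIC.DEFAULT_SEARCH_LIMIT)` instead of hardcoding `10`.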