updated documentation for semantic and constant refactor

2026-04-04 08:15:29 -07:00
parent bd600d9865
commit 7d3f083485
3 changed files with 132 additions and 38 deletions
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -1,38 +1,50 @@
 # Architecture Overview
-NexusAI is a modular, memory-centric AI system designed for persistent, context-aware conversations.  It separates concerns across different services that can be independently deployed and evolved
+NexusAI is a modular, memory-centric AI system designed for persistent, context-aware conversations. It separates concerns across different services that can be independently deployed and evolved.
 ## Core Design Principles
-- **Decoupled layers:** memory, inference, orchestration independent of eachother
+
-- **Hybrid retrieval:** semantic similarity (QDrant) combined with structured storage (SQLite) for flexible, ranked context assembly
+- **Decoupled layers:** memory, inference, and orchestration are independent of each other
-- **Home lab:**  Services are properly distributed across the various nodes according to available hardware and resources
+- **Hybrid retrieval:** semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
+- **Home lab:** services are distributed across nodes according to available hardware and resources
 ## Memory Model
-Memory is split between SQLite and QDrant, which both work together as a pair
-- **SQlite:** episodic interactions, entities, relationships, summaries
-- **QDrant:** vector embeddings for semantic similarity search
-When recallng memory, QDrant returns IDs and similarity scores, which are used to fetch full content from SQLite.  Neither SQlite or QDrant work in isolation
+Memory is split between SQLite and Qdrant, which work together as a pair:
+- **SQLite:** episodic interactions, entities, relationships, summaries
+- **Qdrant:** vector embeddings for semantic similarity search
+When recalling memory, Qdrant returns IDs and similarity scores, which are used to fetch
+full content from SQLite. Neither SQLite nor Qdrant work in isolation.
 ## Hardware Layout
 | Node | Address | Role |
 |---|---|---|
 | Main PC | local | Primary inference (RTX A4000 16GB) |
 | Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant |
 | Mini PC 2 | 192.168.0.205 | Orchestration service, Gitea |
 ## Service Communication
-All services expose a REST HTTP api.  The orchestration service is the single entgry-point.  Clients dont talk directly to the memory or inference services
+All services expose a REST HTTP API. The orchestration service is the single entry point —
+clients do not talk directly to the memory or inference services.
 ```
 Client
 └─► Orchestration (:4000)
-├─► Memory Service (:3002)
+    ├─► Memory Service (:3002)
-│     └─► Qdrant (:6333)
+    │     ├─► Qdrant (:6333)
-│     └─► SQLite
+    │     └─► SQLite
-├─► Embedding Service (:3003)
+    ├─► Embedding Service (:3003)
-└─► Inference Service (:3001)
+    │     └─► Ollama
-└─► Ollama
+    └─► Inference Service (:3001)
          └─► Ollama
 ```
 ## Technology Choices
 | Concern | Choice | Reason |
 |---|---|---|
 | Language | Node.js (JavaScript) | Familiar stack, async I/O suits service architecture |
--- a/docs/services/memory-service.md
+++ b/docs/services/memory-service.md
@@ -17,7 +17,7 @@ stores directly.
 - `better-sqlite3` — SQLite driver
 - `@qdrant/js-client-rest` — Qdrant vector store client
 - `dotenv` — environment variable loading
-- `@nexusai/shared` — shared utilities
+- `@nexusai/shared` — shared utilities and constants
 ## Environment Variables
@@ -28,18 +28,23 @@ stores directly.
 | QDRANT_URL | No | http://localhost:6333 | Qdrant instance URL |
 ## Internal Structure
 ```
 src/
 ├── db/
 │   ├── index.js       # SQLite connection + initialization
 │   └── schema.js      # Table definitions, indexes, FTS5, triggers
 ├── episodic/
-│   └── index.js       # Session + episode CRUD
+│   └── index.js       # Session + episode CRUD and FTS search
-├── semantic/          # Qdrant vector operations (in progress)
+├── semantic/
+│   └── index.js       # Qdrant collection management, upsert, search, delete
 ├── entities/          # Entity + relationship CRUD (upcoming)
 └── index.js           # Express app + route definitions
 ```
 ## SQLite Schema
-Four core tables:
+
+Five core tables:
 - **sessions** — top-level conversation containers, identified by an `external_id`
 - **episodes** — individual exchanges (user message + AI response) tied to a session
@@ -59,6 +64,42 @@ keep the FTS index automatically in sync with the episodes table.
 - `foreign_keys = ON` — enforces referential integrity and cascade deletes
 - PRAGMAs are set via `db.pragma()` separately from `db.exec()`
 ## Qdrant / Semantic Layer
 Three collections are initialized on service startup (created if they don't already exist):
 | Collection | Purpose |
 |---|---|
 | `episodes` | Embeddings for individual conversation exchanges |
 | `entities` | Embeddings for named entities |
 | `summaries` | Embeddings for condensed episode summaries |
 All collections use **768-dimension vectors** with **Cosine similarity**, matching the
 output of the `nomic-embed-text` embedding model via Ollama.
 Vector dimension and distance metric are defined in `@nexusai/shared` constants
 (`QDRANT.VECTOR_SIZE`, `QDRANT.DISTANCE_METRIC`) — not hardcoded in this service.
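
The startup behaviour described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the service's actual code: `ensureCollections` is a hypothetical helper, the constants are inlined here instead of being imported from `@nexusai/shared`, and `client` is assumed to be a `QdrantClient` from `@qdrant/js-client-rest` (any object exposing `getCollections` and `createCollection` will do).

```js
// Inlined copies of the documented shared constants, so the sketch is self-contained.
const QDRANT = { VECTOR_SIZE: 768, DISTANCE_METRIC: 'Cosine' };
const COLLECTIONS = { EPISODES: 'episodes', ENTITIES: 'entities', SUMMARIES: 'summaries' };

// Hypothetical helper: creates every missing collection and returns the names it created.
async function ensureCollections(client) {
  const { collections } = await client.getCollections();
  const existing = new Set(collections.map((c) => c.name));
  const created = [];

  for (const name of Object.values(COLLECTIONS)) {
    if (existing.has(name)) continue; // already present, leave untouched
    await client.createCollection(name, {
      vectors: { size: QDRANT.VECTOR_SIZE, distance: QDRANT.DISTANCE_METRIC },
    });
    created.push(name);
  }
  return created;
}
```

Passing the client in as an argument keeps the helper easy to exercise with a stub; whether the real service is structured this way is an assumption.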
 ### Semantic Layer Operations
 Each collection exposes three operations via helper functions in `src/semantic/index.js`:
 - **Upsert** — stores a vector with a payload containing the SQLite row ID, enabling
  lookups back to the full content after a vector search
 - **Search** — returns the top-k most similar vectors, with optional Qdrant filter
 - **Delete** — removes a vector point by ID
 The `wait: true` flag is used on all write operations so the caller receives confirmation
 only after Qdrant has committed the change.
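
The three operations might look something like the sketch below. The function names and the `sqlite_id` payload key are hypothetical and chosen only for illustration; `client` is assumed to be a `QdrantClient` from `@qdrant/js-client-rest`, and the default limit is inlined from the documented `QDRANT.DEFAULT_LIMIT`.

```js
// Mirrors the documented QDRANT.DEFAULT_LIMIT; inlined to keep the sketch self-contained.
const DEFAULT_LIMIT = 10;

// Upsert: store a vector whose payload points back at the SQLite row.
// `sqlite_id` is an assumed payload key, not confirmed by the docs.
async function upsertVector(client, collection, id, vector, sqliteId) {
  return client.upsert(collection, {
    wait: true, // resolve only after Qdrant has committed the write
    points: [{ id, vector, payload: { sqlite_id: sqliteId } }],
  });
}

// Search: return the top-k most similar points, optionally narrowed by a Qdrant filter.
async function searchVectors(client, collection, vector, { limit = DEFAULT_LIMIT, filter } = {}) {
  const params = { vector, limit, with_payload: true };
  if (filter) params.filter = filter;
  return client.search(collection, params);
}

// Delete: remove a single point by ID, again waiting for the commit.
async function deleteVector(client, collection, id) {
  return client.delete(collection, { wait: true, points: [id] });
}
```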
 ### Hybrid Retrieval Pattern
 Qdrant and SQLite work as a pair — neither operates in isolation:
 1. Query is embedded and searched in Qdrant → returns IDs + similarity scores
 2. IDs are used to fetch full content from SQLite
 3. Results are ranked and assembled into a context package
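
The three steps above can be sketched as a single function. Everything named here is hypothetical: `recall`, the injected `embed`, `searchVectors` and `fetchEpisode` callbacks, and the `sqlite_id` payload key are illustrative stand-ins, since the document does not specify how the real service wires these together.

```js
// Hypothetical sketch of the hybrid retrieval flow. Dependencies are injected so the
// example makes no assumptions about the embedding service, Qdrant client, or SQLite schema.
async function recall(query, { embed, searchVectors, fetchEpisode, limit = 10 }) {
  // 1. Embed the query and search Qdrant; hits are assumed to look like { score, payload }.
  const vector = await embed(query);
  const hits = await searchVectors(vector, limit);

  // 2. Use the IDs carried in each payload to fetch full content from SQLite.
  const results = [];
  for (const hit of hits) {
    const row = fetchEpisode(hit.payload.sqlite_id);
    if (row) results.push({ score: hit.score, episode: row }); // skip vectors with no matching row
  }

  // 3. Rank by similarity score, highest first, to form the context package.
  results.sort((a, b) => b.score - a.score);
  return results;
}
```

Skipping hits with no matching SQLite row is one possible way to handle the two stores drifting out of sync; the actual service may treat that case differently.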
 ## Endpoints
 ### Health
@@ -105,12 +146,4 @@ keep the FTS index automatically in sync with the episodes table.
 }
 ```
-> Semantic (Qdrant) and entity endpoints will be documented as they are built out.
+> Semantic (Qdrant) and entity REST endpoints will be documented as they are built out.
-## Endpoints
-| Method | Path | Description |
-|---|---|---|
-| GET | /health | Service health check |
-> Further endpoints will be documented as the service is built out.
--- a/docs/services/shared.md
+++ b/docs/services/shared.md
@@ -1,18 +1,67 @@
 # Shared Package
-**Package:** '@nexusai/shared'
+**Package:** `@nexusai/shared`  
-**Location:** 'packages/shared'
+**Location:** `packages/shared`
 ## Purpose
-Common utilities and configuration used across all NexusAI services
-Keeping these here avoids duplicating and ensure consistent behavior
-# Exports
+Common utilities and configuration used across all NexusAI services.
+Keeping these here avoids duplication and ensures consistent behaviour.
-### 'getEnv(key, defaultValue?)'
+## Exports
-Loads an environment variable by key.  If no default is provided and the variable is missing, throws at startup rather than failing later on.
+
-```javascript
+### `getEnv(key, defaultValue?)`
+Loads an environment variable by key. If no default is provided and the
+variable is missing, throws at startup rather than failing silently later.
 ```js
 const { getEnv } = require('@nexusai/shared');
 const PORT = getEnv('PORT', '3002');   // optional — falls back to 3002
 const DB   = getEnv('SQLITE_PATH');    // required — throws if missing
 ```
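
For readers curious how such a helper could behave internally, here is a minimal sketch of the documented contract. It is not the package's actual source; in particular, treating an empty string as "missing" is an assumption made for this example.

```js
// Hypothetical re-implementation of the documented behaviour; the real getEnv may differ.
function getEnv(key, defaultValue) {
  const value = process.env[key];

  // Assumption: an empty string counts as "not set".
  if (value !== undefined && value !== '') return value;

  // Optional variable: fall back to the supplied default.
  if (defaultValue !== undefined) return defaultValue;

  // Required variable: fail loudly so the problem surfaces at startup.
  throw new Error(`Missing required environment variable: ${key}`);
}
```

Because services typically read their configuration at module load, a throw here stops the process before it starts serving requests with a half-valid configuration.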
 ---
 ### Constants
 Tuneable values and shared identifiers are centralised in `constants.js`
 rather than hardcoded across services. Import the relevant group by name.
 ```js
 const { QDRANT, COLLECTIONS, EPISODIC } = require('@nexusai/shared');
 ```
 #### `QDRANT`
 Vector store configuration. Values here must stay in sync with the
 embedding model and Qdrant collection setup.
 | Key | Value | Description |
 |---|---|---|
 | `DEFAULT_URL` | `http://localhost:6333` | Fallback Qdrant URL if `QDRANT_URL` env var is not set |
 | `VECTOR_SIZE` | `768` | Output dimensions of `nomic-embed-text` |
 | `DISTANCE_METRIC` | `'Cosine'` | Similarity metric used for all collections |
 | `DEFAULT_LIMIT` | `10` | Default top-k for vector searches |
 #### `COLLECTIONS`
 Canonical Qdrant collection names. Used by both the semantic layer and
 any service that constructs Qdrant queries directly.
 | Key | Value |
 |---|---|
 | `EPISODES` | `'episodes'` |
 | `ENTITIES` | `'entities'` |
 | `SUMMARIES` | `'summaries'` |
 #### `EPISODIC`
 Default pagination and result limits for SQLite episode queries.
 | Key | Value | Description |
 |---|---|---|
 | `DEFAULT_RECENT_LIMIT` | `10` | Default number of recent episodes to retrieve |
 | `DEFAULT_PAGE_SIZE` | `20` | Default episodes per page for paginated queries |
 | `DEFAULT_SEARCH_LIMIT` | `10` | Default number of FTS search results to return |
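
As an illustration of how a service might consume these defaults, the sketch below applies a shared fallback when a caller omits or mangles a limit. `resolveLimit` is a hypothetical helper, and the `EPISODIC` object is an inlined copy of the table above, where a real service would import it from `@nexusai/shared`.

```js
// Inlined copy of the documented values, frozen to prevent accidental mutation.
const EPISODIC = Object.freeze({
  DEFAULT_RECENT_LIMIT: 10,
  DEFAULT_PAGE_SIZE: 20,
  DEFAULT_SEARCH_LIMIT: 10,
});

// Hypothetical helper: parse a caller-supplied limit, falling back to a shared default
// when the value is absent, non-numeric, or not a positive integer.
function resolveLimit(raw, fallback) {
  const n = Number.parseInt(raw, 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

const searchLimit = resolveLimit(undefined, EPISODIC.DEFAULT_SEARCH_LIMIT); // falls back to the default
const pageSize = resolveLimit('50', EPISODIC.DEFAULT_PAGE_SIZE);            // uses the caller's value
```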