update documentation

This commit is contained in:
Storme-bit
2026-04-17 03:46:17 -07:00
parent 27e3c98304
commit 5145b9a7db
13 changed files with 822 additions and 794 deletions

BIN
.vs/slnx.sqlite Normal file

Binary file not shown.

BIN
.vs/slnx.sqlite-journal Normal file

Binary file not shown.


@@ -1,13 +1,23 @@
# NexusAI Documentation
## Contents
## Architecture
- [Architecture Overview](architecture/overview.md)
- [Services](services/)
## Services
- [Shared Package](services/shared.md)
- [Memory Service](services/memory-service.md)
- [Embedding Service](services/embedding-service.md)
- [Inference Service](services/inference-service.md)
- [Orchestration Service](services/orchestration-service.md)
- [Chat Client](services/chat-client.md)
- [Deployment](deployment/homelab.md)
## Reference
- [API Routes](reference/api-routes.md) — all HTTP endpoints across all services
- [Memory Isolation](reference/memory-isolation.md) — project-scoped memory model
## Deployment
- [Homelab](deployment/homelab.md)


@@ -1,56 +1,80 @@
# Architecture Overview
NexusAI is a modular, memory-centric AI system designed for persistent, context-aware conversations. It separates concerns across different services that can be independently deployed and evolved.
NexusAI is a modular, memory-centric AI assistant designed for persistent,
context-aware conversations. It separates concerns across independent services
that can be evolved and deployed separately.
## Core Design Principles
- **Decoupled layers:** memory, inference, and orchestration are independent of each other
- **Hybrid retrieval:** semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
- **Home lab:** services are distributed across nodes according to available hardware and resources
- **Decoupled layers**: memory, inference, and orchestration are independent of each other
- **Hybrid retrieval**: semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
- **Project-scoped memory** — sessions can be grouped into projects with shared or isolated memory pools
- **Home lab first** — services are distributed across nodes according to available hardware
## Memory Model
Memory is split between SQLite and Qdrant, which work together as a pair:
Memory is split between SQLite and Qdrant, which always work as a pair:
- **SQLite:** episodic interactions, entities, relationships, summaries
- **Qdrant:** vector embeddings for semantic similarity search
- **SQLite**: episodic interactions, entities, relationships, summaries, sessions, projects
- **Qdrant**: vector embeddings for semantic similarity search
When recalling memory, Qdrant returns IDs and similarity scores, which are used to fetch
full content from SQLite. Neither SQLite nor Qdrant work in isolation.
When recalling memory, Qdrant returns IDs and similarity scores, which are used
to fetch full content from SQLite. Neither store works in isolation.
Episode embeddings carry a `{ sessionId, createdAt }` payload in Qdrant,
enabling per-session and per-project filtering at search time. See
`memory-isolation.md` for how project-scoped retrieval works.
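The two-step recall described above can be sketched as follows. This is a minimal illustration, not the project's implementation: `hydrateHits`, the hit shape `{ id, score }`, and the in-memory `Map` standing in for SQLite rows are all assumptions made for the example.

```js
// Sketch only: `vectorHits` stands in for a Qdrant search result and
// `episodesById` for rows fetched from SQLite. Both shapes are assumptions.
function hydrateHits(vectorHits, episodesById) {
  return vectorHits
    .filter(hit => episodesById.has(hit.id))                        // skip IDs with no stored row
    .map(hit => ({ ...episodesById.get(hit.id), score: hit.score })) // attach the similarity score
    .sort((a, b) => b.score - a.score);                              // highest similarity first
}

const hits = [{ id: 7, score: 0.91 }, { id: 3, score: 0.84 }, { id: 99, score: 0.8 }];
const rows = new Map([
  [3, { id: 3, userMessage: 'What is Qdrant?' }],
  [7, { id: 7, userMessage: 'My name is Tim.' }],
]);

const recalled = hydrateHits(hits, rows);
console.log(recalled.map(e => e.id)); // [ 7, 3 ]
```

The vector store only ever supplies IDs and scores; the text itself always comes from the relational side, which is why neither store is useful alone.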
## Hardware Layout
| Node | Address | Role |
|---|---|---|
| Main PC | local | Primary inference (RTX A4000 16GB) |
| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant |
| Mini PC 2 | 192.168.0.205 | Orchestration service, Chat Client, Gitea |
| Main PC | 192.168.0.79 | Primary inference (RTX A4000 16GB) |
| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant, Ollama |
| Mini PC 2 | 192.168.0.205 | Orchestration service, Chat Client, Caddy, Authelia, Gitea |
## Service Communication
All services expose a REST HTTP API. The orchestration service is the single entry point —
clients do not talk directly to the memory or inference services.
All services expose a REST HTTP API. The orchestration service is the single
entry point — clients never talk directly to memory or inference services.
```
Client
└─► Orchestration (:4000)
├─► Chat Client (static files, /srv/nexusai)
├─► Memory Service (:3002)
│ ├─► Qdrant (:6333)
│ └─► SQLite
├─► Embedding Service (:3003)
│ └─► Ollama
└─► Inference Service (:3001)
└─► Ollama
Client (browser)
└─► Caddy (HTTPS + Authelia SSO)
└─► Orchestration (:4000) — Mini PC 2
├─► Memory Service (:3002) — Mini PC 1
│ ├─► SQLite (local file)
│ └─► Qdrant (:6333) — Mini PC 1
├─► Embedding Service (:3003) — Mini PC 1
│ └─► Ollama (:11434) — Mini PC 1
├─► Inference Service (:3001) — Main PC
└─► llama-server (:8080) — Main PC
└─► Qdrant (:6333) — Mini PC 1 (direct — semantic search)
```
Note: Orchestration queries Qdrant directly for semantic search (bypassing
the memory service) but always fetches full episode content from the memory
service by ID after the vector search.
## Technology Choices
| Concern | Choice | Reason |
|---|---|---|
| Language | Node.js (JavaScript) | Familiar stack, async I/O suits service architecture |
| Language | Node.js (CommonJS) | Familiar stack, async I/O suits service architecture |
| Package management | npm workspaces | Monorepo with shared code, no publishing needed |
| Vector store | Qdrant | Mature, Docker-native, excellent Node.js client |
| Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user |
| LLM runtime | Ollama | Easiest local LLM management, serves embeddings too |
| Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user scale |
| LLM inference | llama.cpp (`llama-server`) | Maximum GPU utilisation on RTX A4000, OpenAI-compatible API |
| Embeddings | Ollama (`nomic-embed-text`) | Co-located with memory service on Mini PC 1; 768-dim vectors, cosine distance |
| Reverse proxy | Caddy + Authelia | Automatic HTTPS, SSO/MFA for all exposed services |
| Version control | Gitea (self-hosted) | Code stays on local network |
## Current State
The core four-service architecture is complete and operational. Key capabilities:
- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
- **Projects** — sessions grouped with shared or isolated memory pools
- **Auto-naming** — sessions named automatically from first exchange via inference
- **Project-scoped semantic search** — Qdrant filtered by project session IDs
- **Chat client** — view-based UI with sidebar navigation, project views, session management
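As a rough illustration of the hybrid retrieval capability listed above, the sketch below merges a "recent" list and a "semantic" list into one context. The function name `assembleContext`, the recent-first ordering, the de-duplication by `id`, and the `maxEpisodes` cap are assumptions for the example and are not taken from the orchestration code.

```js
// Sketch only: merge order and de-duplication rule are assumptions.
function assembleContext(recentEpisodes, semanticEpisodes, maxEpisodes) {
  const seen = new Set();
  const merged = [];
  for (const episode of [...recentEpisodes, ...semanticEpisodes]) {
    if (seen.has(episode.id)) continue; // an episode can be both recent and relevant
    seen.add(episode.id);
    merged.push(episode);
  }
  return merged.slice(0, maxEpisodes);
}

const recent = [{ id: 12 }, { id: 11 }];
const semantic = [{ id: 11 }, { id: 4 }];
const context = assembleContext(recent, semantic, 10);
console.log(context.map(e => e.id)); // [ 12, 11, 4 ]
```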


@@ -7,50 +7,73 @@ services appropriate for its hardware.
## Mini PC 1 — 192.168.0.81
Runs: Qdrant, Memory Service, Embedding Service
Runs: Qdrant, Memory Service, Embedding Service, Ollama
```bash
ssh username@192.168.0.81
cd ~/nexusai
ssh storme@192.168.0.81
docker compose -f docker-compose.mini1.yml up -d # Qdrant
npm run memory
npm run embedding
npm run memory # port 3002
npm run embedding # port 3003
ollama serve # port 11434 — must bind 0.0.0.0 (OLLAMA_HOST=0.0.0.0)
```
> Ollama must be started with `OLLAMA_HOST=0.0.0.0` to accept connections
> from other services on the LAN. Without this, embedding requests from the
> memory service will be refused.
## Mini PC 2 — 192.168.0.205
Runs: Gitea, Orchestration Service, Chat Client (via Caddy)
```bash
ssh username@192.168.0.205
Runs: Orchestration Service, Chat Client (via Caddy), Gitea, Caddy, Authelia
cd ~/gitea
docker compose up -d # Gitea
```bash
ssh storme@192.168.0.205
cd /opt/stacks/network
docker compose up -d # Caddy, Authelia, and other network services
cd ~/nexusai
npm run orchestration
cd ~/nexusAI
npm run orchestration # port 4000
```
## Main PC
## Main PC — 192.168.0.79
Runs: Ollama, Inference Service
```bash
ollama serve
npm run inference
Runs: Inference Service, llama-server
```powershell
# Start llama-server first — inference service depends on it
.\llama-gpu\llama-server.exe `
-m .\models\gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf `
-ngl 99 --reasoning off --host 0.0.0.0 --port 8080 -c 64000
# Then start inference service
npm run inference # port 3001
```
## Chat Client Deployment
The chat client is a React + Vite app build to static files and served by Caddy on Mini PC 2 (Infrastructure node). It does not run as a Node process
The chat client is a React + Vite app built to static files and served by
Caddy on Mini PC 2. It does not run as a Node process.
```bash
# On dev machine or Mini PC 2 after git pull
# On Mini PC 2 after git pull
cd ~/nexusAI/packages/chat-client
npm run build
# Set production URL before building
VITE_ORCHESTRATION_URL=https://nexus.jellystorm.com npm run build
# Output lands in packages/chat-client/dist/
# Caddy serves this directory directly via volume mount
# Caddy serves this directory directly via Docker volume mount
```
Caddy config (`/opt/docker/caddy/Caddyfile`):
> Do NOT set `VITE_ORCHESTRATION_URL` during local dev — Vite's proxy handles
> routing and setting the HTTPS domain will cause Authelia to intercept API
> requests, producing confusing JSON parse errors.
## Caddy Configuration
The Caddyfile on Mini PC 2 must include a handle block for each route prefix
the client needs to reach. Current required blocks for NexusAI:
```caddy
nexus.jellystorm.com {
import authelia
@@ -63,6 +86,14 @@ nexus.jellystorm.com {
reverse_proxy 192.168.0.205:4000
}
handle /models* {
reverse_proxy 192.168.0.205:4000
}
handle /projects* {
reverse_proxy 192.168.0.205:4000
}
handle {
root * /srv/nexusai
try_files {path} /index.html
@@ -71,18 +102,45 @@ nexus.jellystorm.com {
}
```
The Caddy container mounts the dist directory via Docker volume:
When adding new top-level routes to the orchestration service, add a matching
handle block here and reload Caddy:
```bash
caddy reload --config /path/to/Caddyfile
```
The Caddy container mounts the `dist` directory via Docker volume:
```yaml
- /home/storme/nexusAI/packages/chat-client/dist:/srv/nexusai
```
> After adding or changing volume mounts, a full `docker compose down caddy && docker compose up -d caddy`
> is required. Caddyfile-only changes only need `docker compose restart caddy`.
> is required. Caddyfile-only changes only need `caddy reload`.
## Environment Files
Each node needs a `.env` file in the relevant service package directory.
These are not committed to git. See each service's documentation for
required variables.
Each service needs a `.env` file in its package directory. These are not
committed to git. See each service's documentation for required variables.
| Service | Location | Key Variables |
|---|---|---|
| Memory | `packages/memory-service/.env` | `SQLITE_PATH`, `QDRANT_URL`, `EMBEDDING_SERVICE_URL` |
| Embedding | `packages/embedding-service/.env` | `OLLAMA_URL`, `EMBEDDING_MODEL` |
| Inference | `packages/inference-service/.env` | `INFERENCE_PROVIDER`, `INFERENCE_URL`, `DEFAULT_MODEL` |
| Orchestration | `packages/orchestration-service/src/.env` | `MEMORY_SERVICE_URL`, `EMBEDDING_SERVICE_URL`, `INFERENCE_SERVICE_URL`, `QDRANT_URL`, `MODELS_MANIFEST_PATH` |
| Chat client | `packages/chat-client/.env` | `VITE_ORCHESTRATION_URL` (production builds only) |
## Models Manifest
The models manifest (`models.json`) lives on the Main PC alongside the model
files, accessible to orchestration via an SMB mount at `/mnt/nexus-models`.
```json
[
{ "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
]
```
`value` must exactly match the model name as reported by `llama-server`
(including the `.gguf` extension). No service restart is needed to pick up changes.
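One way to get the "no restart" behaviour is to parse the manifest each time it is requested and validate its shape before returning it. The sketch below shows only the validation step on a JSON string; `parseModelsManifest` is a hypothetical helper, and how orchestration actually reads the file at `MODELS_MANIFEST_PATH` is not shown in this document.

```js
// Sketch only: hypothetical validator for the manifest shape shown above.
function parseModelsManifest(jsonText) {
  const entries = JSON.parse(jsonText);
  if (!Array.isArray(entries)) {
    throw new TypeError('manifest must be a JSON array');
  }
  return entries.map((entry, index) => {
    if (!entry || typeof entry.value !== 'string' || typeof entry.label !== 'string') {
      throw new TypeError(`entry ${index} needs string "value" and "label" fields`);
    }
    // Keep only the two documented fields.
    return { value: entry.value, label: entry.label };
  });
}

const manifestText = JSON.stringify([
  { value: 'gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf', label: 'Gemma 4 26B Claude Distill' },
]);
const models = parseModelsManifest(manifestText);
console.log(models[0].label); // Gemma 4 26B Claude Distill
```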


@@ -39,21 +39,21 @@ All external access is routed through **Caddy** (reverse proxy) with **Authelia*
|------|--------|
| GPU | NVIDIA RTX A4000 |
| Role | Primary AI inference node |
| Key Services | Ollama (inference) |
| Key Services | llama-server (llama.cpp), Inference Service |
### Mini PC 1 — Media Node (`192.168.0.81`)
| Spec | Detail |
|------|--------|
| GPU | NVIDIA RTX 5050 |
| Role | Media services, embeddings, vector storage |
| Key Services | Jellyfin, Nextcloud, Qdrant, arr stack, NexusAI memory/embedding |
| Key Services | Jellyfin, Nextcloud, Qdrant, arr stack, NexusAI memory/embedding, Ollama |
| Storage | NVMe (OS) + 3x external HDDs (see [Storage Layout](#storage-layout)) |
### Mini PC 2 — Infrastructure Node (`192.168.0.205`)
| Spec | Detail |
|------|--------|
| Role | Network management, monitoring, auth, DNS, git |
| Key Services | Caddy, Authelia, Tailscale, Pihole, Grafana, Gitea |
| Role | Network management, monitoring, auth, DNS, git, NexusAI orchestration |
| Key Services | Caddy, Authelia, Tailscale, Pihole, Grafana, Gitea, NexusAI orchestration |
| Storage | NVMe (OS only) |
---
@@ -155,7 +155,8 @@ All external access is routed through **Caddy** (reverse proxy) with **Authelia*
| Service | Notes |
|---------|-------|
| Ollama | Runs LLM inference using the RTX A4000. Also serves `nomic-embed-text` embeddings (768-dim vectors) consumed by NexusAI's embedding service on Mini PC 1. |
| llama-server (llama.cpp) | Primary LLM inference using the RTX A4000. Started manually before the inference service. Serves the OpenAI-compatible API on port 8080. |
| Ollama | Serves `nomic-embed-text` embeddings (768-dim vectors) consumed by NexusAI's embedding service on Mini PC 1. |
---
@@ -234,7 +235,7 @@ Phase 1 focused on establishing a stable, secure, and observable foundation:
- ✅ Self-hosted git (Gitea)
- ✅ Media stack fully operational (Jellyfin, arr stack, Nextcloud)
- ✅ Download pipeline with VPN isolation (Gluetun + qBittorrent)
- ✅ NexusAI foundation services running (Qdrant, Ollama)
- ✅ NexusAI foundation services running (Qdrant, Ollama, llama.cpp)
- ✅ Container management across nodes (Portainer + agent)
---
@@ -249,6 +250,6 @@ Phase 2 shifts focus to resilience, security hardening, and smart home integrati
- **Additional security hardening** — Audit exposed services, tighten firewall rules, review Authelia policies
- **IP webcam integration** — Add camera feeds into the homelab ecosystem
- **Home Assistant** — Integrate smart home automation and sensor data
- **Continued NexusAI development** — Entities layer, embedding service, inference and orchestration buildout
- **Continued NexusAI development** — Entity extraction pipeline, summaries layer, SettingsView implementation
> This section will be expanded as Phase 2 planning matures.

283
docs/services/API-routes.md Normal file

@@ -0,0 +1,283 @@
# API Routes
All HTTP endpoints across NexusAI services. Clients communicate only with
the orchestration service (port 4000) — memory service routes are listed
here for reference and direct debugging use.
---
## Orchestration Service — port 4000
### Health
| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |
### Chat
| Method | Path | Description |
|---|---|---|
| POST | /chat | Send a message, receive full response |
| POST | /chat/stream | Send a message, receive SSE token stream |
**POST /chat and POST /chat/stream — request body:**
```json
{
"sessionId": "your-session-uuid",
"message": "Hello, my name is Tim.",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"temperature": 0.7
}
```
`model` and `temperature` are optional.
**POST /chat — response:**
```json
{
"sessionId": "your-session-uuid",
"response": "Hello Tim! How can I help you today?",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"tokenCount": 87
}
```
**POST /chat/stream — response (SSE):**
```
data: {"text":"Hello"}
data: {"text":" Tim"}
data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":87}
```
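A consumer of this stream has to cope with events that are split across network chunks. The sketch below is one way to do that with a line buffer; `createSseParser` is a hypothetical helper written for this example, and it assumes each event is a single `data:` line carrying JSON, as in the sample above.

```js
// Sketch only: hypothetical buffered parser for the `data: {...}` lines above.
function createSseParser(onEvent) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith('data:')) continue; // skip blank separator lines
      onEvent(JSON.parse(trimmed.slice(5).trim()));
    }
  };
}

const events = [];
const feed = createSseParser(event => events.push(event));

// The first event is deliberately split across two chunks.
feed('data: {"text":"Hel');
feed('lo"}\n\ndata: {"text":" Tim"}\n');
feed('\ndata: {"done":true,"tokenCount":87}\n\n');

const reply = events.filter(e => e.text).map(e => e.text).join('');
console.log(reply); // Hello Tim
```

Only complete lines are parsed; whatever follows the last newline stays in the buffer until more data arrives.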
### Sessions
| Method | Path | Description |
|---|---|---|
| GET | /sessions | Paginated session list |
| GET | /sessions/:sessionId/history | Paginated episode history for a session |
| PATCH | /sessions/:sessionId | Update session name and/or project assignment |
| DELETE | /sessions/:sessionId | Delete session and all its episodes |
**GET /sessions — query params:**
| Param | Default | Description |
|---|---|---|
| limit | 20 | Sessions per page |
| offset | 0 | Pagination offset |
| projectId | — | Filter by project (integer ID) |
**PATCH /sessions/:sessionId — body:**
```json
{ "name": "My Session", "projectId": 3 }
```
At least one of `name` or `projectId` is required; both can be sent together.
Returns the updated session object.
**GET /sessions/:sessionId/history — query params:**
| Param | Default | Description |
|---|---|---|
| limit | 20 | Episodes per page |
| offset | 0 | Pagination offset |
Returns `{ sessionId, episodes: [...] }`. Episodes ordered newest first.
### Projects
| Method | Path | Description |
|---|---|---|
| GET | /projects | Get all projects |
| POST | /projects | Create a new project |
| PATCH | /projects/:id | Update a project |
| DELETE | /projects/:id | Delete a project (nulls session assignments) |
**POST /projects — body:**
```json
{
"name": "My Project",
"description": "Optional description",
"colour": "#3d3a79",
"icon": null,
"isolated": 0
}
```
`name` is required. All other fields optional. `isolated` is `0` or `1`.
Returns `201` with the created project object.
**PATCH /projects/:id — body:** same fields as POST, all optional.
### Models
| Method | Path | Description |
|---|---|---|
| GET | /models | Available models from `models.json` manifest |
Returns an array: `[{ "value": "model-name.gguf", "label": "Display Name" }]`
---
## Memory Service — port 3002
Direct access is for debugging only. All client traffic goes through
orchestration.
### Health
| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |
### Sessions
| Method | Path | Description |
|---|---|---|
| POST | /sessions | Create a new session |
| GET | /sessions | Paginated session list with optional projectId filter |
| GET | /sessions/:id | Get session by internal ID |
| GET | /sessions/by-external/:externalId | Get session by external ID |
| PATCH | /sessions/by-external/:externalId | Update session fields |
| DELETE | /sessions/by-external/:externalId | Delete session (cascades to episodes) |
> Route ordering: `by-external/:externalId` must be defined before `/:id`
> to prevent `by-external` being captured as an ID param.
**POST /sessions — body:**
```json
{ "externalId": "unique-uuid", "metadata": {} }
```
**PATCH /sessions/by-external/:externalId — body:**
```json
{ "name": "Session Name", "projectId": 3 }
```
Both fields are optional. Only provided fields are updated — other fields
are not touched.
### Episodes
| Method | Path | Description |
|---|---|---|
| POST | /episodes | Create episode + auto-embed into Qdrant |
| GET | /episodes/search?q=&limit= | FTS keyword search across all episodes |
| GET | /episodes/:id | Get episode by ID |
| GET | /sessions/:id/episodes?limit=&offset= | Paginated episodes for a session |
| DELETE | /episodes/:id | Delete an episode |
> Route ordering: `/episodes/search` must be defined before `/episodes/:id`.
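The ordering rule follows from first-match routing. The toy router below is written only to demonstrate that behaviour; it is not the framework the memory service uses, and `createRouter` and its methods are hypothetical.

```js
// Sketch only: a toy first-match router, not the service's real router.
function createRouter() {
  const routes = [];
  return {
    get(pattern, handler) {
      const names = [];
      // Turn `:param` segments into capture groups.
      const source = pattern.replace(/:([A-Za-z]+)/g, (_, name) => {
        names.push(name);
        return '([^/]+)';
      });
      routes.push({ regex: new RegExp(`^${source}$`), names, handler });
    },
    dispatch(path) {
      for (const { regex, names, handler } of routes) {
        const match = regex.exec(path);
        if (!match) continue;
        const params = Object.fromEntries(names.map((n, i) => [n, match[i + 1]]));
        return handler(params); // first registered match wins
      }
      return null;
    },
  };
}

const wrongOrder = createRouter();
wrongOrder.get('/episodes/:id', p => `episode ${p.id}`);
wrongOrder.get('/episodes/search', () => 'search');
console.log(wrongOrder.dispatch('/episodes/search')); // episode search

const rightOrder = createRouter();
rightOrder.get('/episodes/search', () => 'search');
rightOrder.get('/episodes/:id', p => `episode ${p.id}`);
console.log(rightOrder.dispatch('/episodes/search')); // search
console.log(rightOrder.dispatch('/episodes/42')); // episode 42
```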
**POST /episodes — body:**
```json
{
"sessionId": 1,
"userMessage": "Hello",
"aiResponse": "Hi there!",
"tokenCount": 10
}
```
### Projects
| Method | Path | Description |
|---|---|---|
| POST | /projects | Create a new project |
| GET | /projects | Get all projects |
| GET | /projects/:id | Get project by ID |
| PATCH | /projects/:id | Update a project |
| DELETE | /projects/:id | Delete project + null session assignments |
Same request/response shape as orchestration `/projects` above.
### Entities
| Method | Path | Description |
|---|---|---|
| POST | /entities | Upsert entity (creates or updates by name + type) |
| GET | /entities/by-type/:type | All entities of a given type |
| GET | /entities/:id | Get entity by ID |
| DELETE | /entities/:id | Delete entity (cascades to relationships) |
> Route ordering: `/entities/by-type/:type` must be before `/entities/:id`.
**POST /entities — body:**
```json
{
"name": "NexusAI",
"type": "project",
"notes": "My AI memory project",
"metadata": {}
}
```
### Relationships
| Method | Path | Description |
|---|---|---|
| POST | /relationships | Upsert a relationship between two entities |
| GET | /entities/:id/relationships | All relationships for an entity |
| DELETE | /relationships | Delete a specific relationship |
**POST /relationships — body:**
```json
{ "fromId": 1, "toId": 2, "label": "uses", "metadata": {} }
```
**DELETE /relationships — body:**
```json
{ "fromId": 1, "toId": 2, "label": "uses" }
```
Relationships are identified by the composite key `(fromId, toId, label)`.
Delete uses request body rather than URL params since this three-part key
is awkward to encode in a path.
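The composite key can be pictured with a small in-memory store. This is an illustration only: the real service persists relationships in SQLite, and `relationshipKey` and `createRelationshipStore` are hypothetical names.

```js
// Sketch only: in-memory stand-in for the (fromId, toId, label) composite key.
function relationshipKey({ fromId, toId, label }) {
  return JSON.stringify([fromId, toId, label]);
}

function createRelationshipStore() {
  const rows = new Map();
  return {
    // Upsert: a second write with the same key replaces the first.
    upsert(rel) { rows.set(relationshipKey(rel), { metadata: {}, ...rel }); },
    // Delete needs all three key parts, mirroring the DELETE request body.
    remove(rel) { return rows.delete(relationshipKey(rel)); },
    size() { return rows.size; },
  };
}

const store = createRelationshipStore();
store.upsert({ fromId: 1, toId: 2, label: 'uses' });
store.upsert({ fromId: 1, toId: 2, label: 'uses', metadata: { since: 2026 } });
store.upsert({ fromId: 1, toId: 2, label: 'depends-on' });
const sizeAfterUpserts = store.size();
const removed = store.remove({ fromId: 1, toId: 2, label: 'uses' });
console.log(sizeAfterUpserts, removed, store.size()); // 2 true 1
```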
---
## Embedding Service — port 3003
| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |
| POST | /embed | Embed a single text string |
| POST | /embed/batch | Embed an array of text strings |
**POST /embed — body:**
```json
{ "text": "Hello from NexusAI" }
```
**POST /embed — response:**
```json
{ "embedding": [0.123, -0.456, ...], "model": "nomic-embed-text", "dimensions": 768 }
```
---
## Inference Service — port 3001
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check — reports active provider and model |
| POST | /complete | Full completion — awaits entire response |
| POST | /complete/stream | Streaming completion via SSE |
**POST /complete — body:**
```json
{
"prompt": "What is the capital of France?",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"temperature": 0.7,
"maxTokens": 1024
}
```
All fields except `prompt` are optional.
**POST /complete — response:**
```json
{
"text": "The capital of France is Paris.",
"model": "gemma-4-26B...gguf",
"done": true,
"evalCount": 8,
"promptEvalCount": 41
}
```


@@ -0,0 +1,128 @@
# Memory Isolation
NexusAI implements project-scoped memory — sessions belonging to the same
project can share semantic context, and isolated projects can be restricted
from drawing on memory outside the project. This document describes how the
system works end-to-end.
## Concepts
**Session** — a single conversation thread. Identified by `external_id`.
**Project** — a named grouping of sessions. Has an `isolated` flag (0 or 1).
**Semantic search** — at inference time, the user's message is embedded and
compared against past episodes in Qdrant to surface relevant context. The
scope of this search is controlled by the project context.
## Semantic Search Scope
| Session state | Semantic search scope |
|---|---|
| No project | Own session's episodes only |
| Assigned to a non-isolated project | All episodes across all sessions in the project |
| Assigned to an isolated project | All episodes within the project only |
| Removed from a project | Own session's episodes only (from that point) |
Sessions with no project assigned behave the same as they always have —
only their own past episodes are searched.
## How It Works
### Step 1 — Project context resolution (orchestration)
In `chat/index.js`, immediately after session resolution:
```js
let projectSessionIds = null;
if (session.project_id) {
const project = await memory.getProject(session.project_id);
if (project) {
const projectSessions = await memory.getProjectSessions(session.project_id);
projectSessionIds = projectSessions.map(s => s.id);
}
}
```
If the session belongs to any project (isolated or not), `projectSessionIds`
is populated with the internal integer IDs of all sessions in that project.
For **non-isolated projects**, this expands the search to all project sessions.
For **isolated projects**, the same set is used but the intent is restriction
— since `projectSessionIds` only contains project sessions, no external
episodes can appear.
Both cases use the same code path — the `isolated` flag does not change the
query logic, only the conceptual meaning.
### Step 2 — Qdrant filter construction
In `services/qdrant.js`, `searchEpisodes` builds the filter:
```js
if (projectSessionIds) {
body.filter = {
should: projectSessionIds.map(id => ({
key: 'sessionId', match: { value: id }
}))
};
} else if (sessionId) {
body.filter = { must: [{ key: 'sessionId', match: { value: sessionId } }] };
}
```
`should` is Qdrant's "match any of" operator — equivalent to SQL
`WHERE sessionId IN (...)`. When `projectSessionIds` is set, the single-session
filter is not used.
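A self-contained version of the same branching makes the two filter shapes easier to compare. `buildEpisodeFilter` is a hypothetical wrapper around the logic shown above; unlike the original snippet it also falls back to the single-session filter when the project list is empty, which is an assumption of this sketch rather than documented behaviour.

```js
// Sketch only: standalone form of the filter branching in Step 2.
function buildEpisodeFilter({ projectSessionIds = null, sessionId = null } = {}) {
  if (Array.isArray(projectSessionIds) && projectSessionIds.length > 0) {
    // Any one of these conditions may match (logical OR).
    return {
      should: projectSessionIds.map(id => ({ key: 'sessionId', match: { value: id } })),
    };
  }
  if (sessionId !== null) {
    return { must: [{ key: 'sessionId', match: { value: sessionId } }] };
  }
  return undefined; // no filter: search is unscoped
}

const projectFilter = buildEpisodeFilter({ projectSessionIds: [42, 43], sessionId: 42 });
const sessionFilter = buildEpisodeFilter({ sessionId: 42 });
console.log(JSON.stringify(projectFilter));
console.log(JSON.stringify(sessionFilter));
```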
### Step 3 — Episode payloads
Every episode upserted into Qdrant carries `{ sessionId, createdAt }` in its
payload. `sessionId` here is the **internal integer ID** from SQLite. This
is what the Qdrant filter matches against.
Because the payload is written once at upsert time and never rewritten, and
project membership is resolved at query time, the filter works correctly
regardless of when episodes were created or when a session was added to a project.
## Important Behaviours
**Pre-existing episodes are included immediately.** When a session is added
to a project and a new message is sent, Qdrant can match all of that session's
existing episodes since the filter only requires the `sessionId` to be in the
project's session list.
**Removing a session from a project takes effect immediately.** On the next
message, `getProjectSessions` will not include that session's ID, so its
episodes disappear from the semantic search scope.
**New sessions created from ProjectView are assigned after the first message.**
The `useChat` hook writes the `project_id` assignment via `updateSession` after
`onDone` fires. There is a brief window during the first message where the
session has no project assigned. The project is correctly applied from the
second message onward.
## Isolated vs Non-Isolated
The `isolated` flag is stored on the project but does not currently change the
query logic — both isolated and non-isolated projects result in a
`projectSessionIds` filter. The distinction is semantic and enforced by
the project's membership:
- **Non-isolated** — intentionally draws from all sessions in the project,
creating a shared memory pool for related conversations
- **Isolated** — by design contains only sessions explicitly added to it,
so the same filter naturally restricts context to project-only episodes
If cross-project contamination becomes a concern (e.g. a session accidentally
added to the wrong project), removing that session from the project restores
isolation from the next message onward.
## Qdrant Payload Structure
Episodes are stored with this payload:
```json
{ "sessionId": 42, "createdAt": 1776080188 }
```
`sessionId` is the SQLite `sessions.id` integer, not the `external_id` UUID.
This is important when building filters — always use internal IDs.
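A guard at the point where the payload is built is one way to catch an external UUID slipping in. The sketch below assumes the Qdrant point ID is the episode's SQLite ID, which this document implies but does not state; `buildEpisodePoint` is a hypothetical helper.

```js
// Sketch only: hypothetical builder for the point sent to Qdrant.
// Assumes the point id equals the episode's SQLite id.
function buildEpisodePoint(episode, vector) {
  if (!Number.isInteger(episode.sessionId)) {
    throw new TypeError('sessionId must be the internal integer ID, not the external UUID');
  }
  return {
    id: episode.id,
    vector,
    payload: { sessionId: episode.sessionId, createdAt: episode.createdAt },
  };
}

const point = buildEpisodePoint(
  { id: 310, sessionId: 42, createdAt: 1776080188 },
  [0.12, -0.45, 0.3] // truncated vector for illustration; real vectors are 768-dim
);
console.log(JSON.stringify(point.payload)); // {"sessionId":42,"createdAt":1776080188}
```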


@@ -55,10 +55,6 @@ VITE_ORCHESTRATION_URL=https://nexus.jellystorm.com
during local development, bypassing Caddy and Authelia entirely:
```js
// vite.config.js
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';
export default defineConfig({
plugins: [react()],
server: {
@@ -72,7 +68,8 @@ export default defineConfig({
});
```
If new routes are added to the orchestration service, add them here too.
When adding new top-level routes to the orchestration service, add a matching
entry here too.
## Internal Structure
@@ -93,12 +90,13 @@ src/
│ ├── Sidebar.jsx # Left sidebar — projects, recent chats, navigation
│ ├── ChatWindow.jsx # Centre panel — message thread and input bar
│ ├── MessageBubble.jsx # Individual message bubble (user or assistant)
│ ├── InfoPanel.jsx # Right panel — model selector and session metadata
│ ├── SessionModal.jsx # Modal for session rename and delete confirmation
│ ├── ProjectModal.jsx # Modal for project create, edit, and delete confirmation
│ ├── InfoPanel.jsx # Right panel — model selector and session metadata (slide-in)
│ ├── SessionModal.jsx # Modal for session rename, project assignment, delete
│ ├── ProjectModal.jsx # Modal for project create, edit, delete
│ ├── AllChatsView.jsx # Full paginated session list with multi-select bulk delete
│ ├── AllProjectsView.jsx # Project tile grid with create/edit/delete
│ └── SettingsView.jsx # Settings placeholder (sections: Appearance, Memory, Models, About)
│ ├── ProjectView.jsx # Individual project — session list, new chat button
│ └── SettingsView.jsx # Settings placeholder (Appearance, Memory, Models, About)
├── index.css # Global reset, CSS variables, utility classes
└── main.jsx # React entry point
```
@@ -107,9 +105,9 @@ src/
## Layout
The app uses a view-based layout. `App.jsx` manages a `view` state
(`'chat' | 'all-chats' | 'all-projects' | 'settings'`) that controls which
main panel is rendered. The left sidebar and right info panel are always present.
The app uses a view-based layout. `App.jsx` manages a `view` state string
that controls which main panel is rendered. The left sidebar and right info
panel are persistent across all views.
```
┌──────────────────┬──────────────────────────────┐
@@ -117,9 +115,9 @@ main panel is rendered. The left sidebar and right info panel are always present
│ (collapsible) │ │
│ │ chat → ChatWindow │
│ + New Chat │ all-chats → AllChatsView │
│ ⊞ New Project │ all-projects → AllProjectsView│
│ │ settings → SettingsView │
│ PROJECTS ▾ │ │
│ ⊞ View Projects │ all-projects → AllProjectsView│
│ │ project → ProjectView │
│ PROJECTS ▾ │ settings → SettingsView │
│ [tile] [tile] │ │
│ All Projects → │ │
│ │ │
@@ -132,10 +130,22 @@ main panel is rendered. The left sidebar and right info panel are always present
└──────────────────┴──────────────────────────────┘
```
The sidebar collapses to a 48px icon rail. The right info panel (`InfoPanel`)
slides in from the right over the main area using `transform: translateX()`;
it is hidden by default (`rightOpen` starts `false`) and toggled via a button
in the `ChatWindow` header.
The sidebar collapses to a 48px icon rail. The right `InfoPanel` slides in
from the right using `transform: translateX()`. It is hidden by default and
toggled via the `⊹` button in the `ChatWindow` header.
## View Routing
| View | Component | Trigger |
|---|---|---|
| `'chat'` | `ChatWindow` | Default; selecting a session; new chat |
| `'all-chats'` | `AllChatsView` | "All Chats →" or ☰ icon in collapsed rail |
| `'all-projects'` | `AllProjectsView` | "View Projects" button or ⊞ icon |
| `'project'` | `ProjectView` | Clicking a project tile in the sidebar |
| `'settings'` | `SettingsView` | Settings button or ⚙ icon |
`activeProject` state in `App.jsx` tracks which project `ProjectView` is
displaying. It is set via `onSelectProject` before navigating to `'project'`.
## CSS Architecture
@@ -181,91 +191,47 @@ rules, inline styles for dynamic prop-driven values.
| `.label-upper` | Uppercase section label style |
| `.truncate` | Text overflow ellipsis |
## API Layer
All orchestration calls are centralised in `src/api/orchestration.js`:
| Function | Method | Path | Description |
|---|---|---|---|
| `fetchSessions` | GET | /sessions | Load session list for sidebar |
| `fetchSessionHistory` | GET | /sessions/:id/history | Load episode history on session select |
| `sendMessage` | POST | /chat | Send message, await full response |
| `streamMessage` | POST | /chat/stream | Send message, receive SSE token stream |
| `fetchModels` | GET | /models | Load available models from manifest |
| `renameSession` | PATCH | /sessions/:id | Rename a session |
| `deleteSession` | DELETE | /sessions/:id | Delete a session |
| `fetchProjects` | GET | /projects | Load project list |
| `createProject` | POST | /projects | Create a new project |
| `updateProject` | PATCH | /projects/:id | Update a project |
| `deleteProject` | DELETE | /projects/:id | Delete a project |
`streamMessage` returns an abort function — call it to cancel a stream mid-flight.
Uses a buffer pattern to handle SSE chunks that may span multiple network packets.
## Streaming
The chat input sends messages via `POST /chat/stream`. Tokens arrive as SSE events:
Messages are sent via `POST /chat/stream`. Tokens arrive as SSE events and
are written into the active assistant bubble token by token via
`updateLastMessage`. The blinking cursor in `MessageBubble` is shown while
`message.streaming === true`.
```
data: {"text":"Hello"}
data: {"text":" Tim"}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87}
```
An empty assistant bubble is appended immediately when the stream opens, then
updated token by token using `updateLastMessage`. The blinking cursor in
`MessageBubble` is shown while `message.streaming === true` and disappears
when the done event is received. Model name and token count from the done
event are stored in `useChat` state and displayed in the InfoPanel.
## Dynamic Model Selector
Available models are fetched from `GET /models` on mount via the `useModels` hook.
The hook initialises with `FALLBACK_MODELS` from `constants.js` and replaces them
with the server response on success. If the fetch fails, the fallback list is used
silently — a warning is logged to the console.
To add a model, update `models.json` on the main PC — no client rebuild needed.
`FALLBACK_MODELS` in `constants.js` should be kept in sync with `models.json`
as a reasonable last-resort list in case the endpoint is unreachable.
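The fallback behaviour described above could be sketched like this, outside React. `loadModels` is a hypothetical helper, the contents of `FALLBACK_MODELS` here are placeholders, and treating an empty response as a failure is an assumption rather than documented behaviour.

```js
// Placeholder stand-in for the list exported from constants.js.
const FALLBACK_MODELS = [{ id: 'local-model', label: 'Local model' }];

// `fetchModels` is injected so the sketch does not depend on the real API module.
async function loadModels(fetchModels, fallback = FALLBACK_MODELS) {
  try {
    const models = await fetchModels();
    // Assumption: an empty or malformed response is treated like a failure.
    return Array.isArray(models) && models.length > 0 ? models : fallback;
  } catch (err) {
    // Fail quietly for the user, but leave a trace in the console.
    console.warn('Model fetch failed, using fallback list:', err.message);
    return fallback;
  }
}
```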
`useChat` accepts an optional `projectId` parameter in `sendMessage`. After
the first message completes in a new session, if `projectId` is set,
`updateSession` is called to write the project assignment to the backend.
## Session Management
Sessions are identified by `external_id` — a UUID generated client-side via the
`uuid` package. New sessions are created locally and auto-registered in the memory
service on the first message. The session list refreshes after each completed
response to surface newly created sessions.
Sessions are identified by `external_id` — a UUID generated client-side via
the `uuid` package. New sessions are created locally and auto-registered in
the memory service on the first message. The session list refreshes after
each completed response to surface newly created sessions.
### Session Name Display
### Auto-naming
The chat header and session rows both display `session.name` if set, falling back
to `session.external_id` if no name has been assigned:
After the first exchange completes, orchestration fires a secondary inference
call with a short naming prompt (max 20 tokens, temperature 0.3). The result
is written back as `session.name`. The client fires a second `refreshSessions`
after a 3-second delay to pick up the name once written.
```js
activeSession.name || activeSession.external_id
```
Manually renamed sessions are never overwritten — the `!session.name` guard
in `chat/index.js` prevents this.
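A minimal sketch of the guard described above. `shouldAutoName` and `cleanSessionName` are hypothetical helpers; the actual logic in `chat/index.js` may be structured differently, and the clean-up step is an assumption about how a raw model reply might be tidied.

```js
// Hypothetical predicate mirroring the `isFirstMessage && !session.name` guard.
function shouldAutoName(session, isFirstMessage) {
  return isFirstMessage && !session.name;
}

// Hypothetical clean-up: keep the first line, strip wrapping quotes, cap the length.
function cleanSessionName(raw, maxLength = 60) {
  return raw.trim().split('\n')[0].replace(/^["']+|["']+$/g, '').slice(0, maxLength);
}
```

Because the guard checks `session.name`, a session renamed by hand before the naming call resolves would not be overwritten.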
### Session Actions
Session rows in the sidebar support rename and delete via two entry points:
Session rows support rename, project assignment, and delete via:
- **Hover** — reveals ✎ and ✕ icon buttons alongside the row
- **Right-click** — context menu with the same actions
- **Hover** — reveals ✎ (rename) and ✕ (delete) icon buttons alongside the row
- **Right-click** — opens a context menu with the same actions
Both trigger `SessionModal` — a shared modal component with two modes:
| Mode | Trigger | Behaviour |
|---|---|---|
| `settings` | Rename button / context menu rename | Shows name input, saves on Enter or Save button |
| `confirm-delete` | Delete button / context menu delete | Shows confirmation dialog, requires explicit Delete click |
Actions are disabled on unsaved (new) sessions that haven't had a first message sent yet.
`SessionModal` handles rename and project assignment together in `settings`
mode, and delete confirmation in `confirm-delete` mode.
### Active Session Clearing on Delete
When the deleted session is the currently active one, `App.jsx` detects the match
and calls `selectSession(null)` to clear the chat window before refreshing the list:
When the deleted session is the currently active one, `App.jsx` clears the
chat window before refreshing the list:
```js
function handleSessionsChange(deletedSession) {
@@ -276,53 +242,23 @@ function handleSessionsChange(deletedSession) {
}
```
### Context Menu
### Key Patterns
Implemented via `useContextMenu` hook — tracks `{ x, y, session }` state and
attaches a `window` click listener to dismiss on any outside click. Rendered
outside the sidebar div via a React fragment to avoid being clipped by
`overflow: hidden`.
### Button Nesting
Session row action icons (✎ ✕) are rendered as siblings of the session
`<button>`, not children — HTML does not allow `<button>` inside `<button>`.
The outer `<div>` owns hover state and context menu; the inner `<button>` handles
session selection; action icon buttons sit alongside it in the same flex row.
- Button nesting: action icons are siblings of row buttons, not children — HTML forbids `<button>` inside `<button>`
- Context menu rendered outside sidebar via React fragment to avoid `overflow: hidden` clipping
- `useContextMenu` dismisses on a `window` click listener
- Dynamic `updateSession` SQL builds `SET` clause from only the fields passed — prevents accidental overwrites
## Project Management
Projects are a first-class concept in the UI. The `useProjects` hook fetches
the project list from `GET /projects` on mount and exposes a `refreshProjects`
callback for keeping the sidebar in sync after mutations.
`useProjects` fetches the project list from `GET /projects` on mount and
exposes `refreshProjects` for keeping the sidebar in sync after mutations.
### Project Actions
`ProjectModal` handles create, edit, and delete confirmation. Fields: name
(required), description (optional), colour picker, isolated toggle.
Projects are managed from `AllProjectsView` via `ProjectModal`:
`ProjectView` shows the project's name, description, isolated badge (if set),
and a filtered session list. The "+ New Chat" button creates a new session,
navigates to `'chat'`, and writes the project assignment after the first message.
| Mode | Behaviour |
|---|---|
| `create` | Name (required), description (optional), colour picker |
| `edit` | Same fields as create, pre-populated |
| `confirm-delete` | Confirmation dialog — sessions in the project are not deleted |
The sidebar Projects section shows up to 6 project tiles as coloured badge buttons.
Clicking any tile navigates to `AllProjectsView`. The "All Projects →" link is
always shown below the tiles.
After any create, edit, or delete in `AllProjectsView`, `onProjectsChange` is called
to trigger `refreshProjects` in `App.jsx`, keeping the sidebar tiles in sync.
## View Routing
`App.jsx` manages a `view` state string that controls which main panel renders:
| View | Component | Trigger |
|---|---|---|
| `'chat'` | `ChatWindow` | Default; selecting a session from sidebar or AllChatsView |
| `'all-chats'` | `AllChatsView` | "All Chats →" link or ☰ icon in collapsed rail |
| `'all-projects'` | `AllProjectsView` | "All Projects →" link, ⊞ icon, or New Project button |
| `'settings'` | `SettingsView` | Settings button or ⚙ icon in collapsed rail |
`AllChatsView` navigates back to `'chat'` on session row click, passing the selected
session to `selectSession` so history loads immediately.
For memory isolation behaviour, see `memory-isolation.md`.

View File

@@ -27,80 +27,43 @@ minimizing network hops on the memory write path.
| OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL |
| EMBEDDING_MODEL | No | nomic-embed-text | Ollama embedding model to use |
> Ollama must be running with `OLLAMA_HOST=0.0.0.0` to accept LAN connections
> from other services.
## Model
**nomic-embed-text** via Ollama produces **768-dimension** vectors using **Cosine similarity**.
This must match the `QDRANT.VECTOR_SIZE` constant in `@nexusai/shared`.
**nomic-embed-text** via Ollama produces **768-dimension** vectors with
**Cosine similarity**. This must match `QDRANT.VECTOR_SIZE` in `@nexusai/shared`.
If the embedding model is changed, the Qdrant collections must be reinitialized
with the new vector dimension — updating `QDRANT.VECTOR_SIZE` in `constants.js` is
the single change required to keep everything consistent.
with the new vector dimension. Updating `QDRANT.VECTOR_SIZE` in `constants.js`
is the single change required to keep everything consistent.
## Ollama API
Uses the `/api/embed` endpoint (Ollama v0.4+). Request shape:
Uses the `/api/embed` endpoint (Ollama v0.4+):
```json
// Request
{ "model": "nomic-embed-text", "input": "text to embed" }
```
Response key is `embeddings[0]` — an array of 768 floats.
## Endpoints
### Health
| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |
### Embed
| Method | Path | Description |
|---|---|---|
| POST | /embed | Embed a single text string |
| POST | /embed/batch | Embed an array of text strings |
---
**POST /embed**
Embeds a single text string and returns the vector.
Request body:
```json
{
"text": "Hello from NexusAI"
}
// Response key
embeddings[0] // array of 768 floats
```
Response:
```json
{
"embedding": [0.123, -0.456, ...],
"model": "nomic-embed-text",
"dimensions": 768
}
```
> Earlier Ollama versions used `/api/embeddings` with a `prompt` key and
> returned `embedding` (singular). Use `/api/embed`, `input`, and
> `embeddings[0]` for Ollama v0.4+.
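As a rough sketch, a call to this endpoint from Node might look like the following. `embedText` and its options are hypothetical names; the request and response keys (`model`, `input`, `embeddings[0]`) are the ones described above, and the default URL and model mirror the `OLLAMA_URL` and `EMBEDDING_MODEL` defaults.

```js
// Sketch of a call to Ollama's /api/embed endpoint. `fetchImpl` is injectable
// for testing; it defaults to the global fetch available in Node 18+.
async function embedText(text, {
  baseUrl = 'http://localhost:11434',
  model = 'nomic-embed-text',
  fetchImpl = fetch,
} = {}) {
  const res = await fetchImpl(`${baseUrl}/api/embed`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, input: text }),
  });
  if (!res.ok) throw new Error(`Ollama embed failed with status ${res.status}`);
  const data = await res.json();
  // For a single input string, the vector is the first entry of `embeddings`.
  return data.embeddings[0];
}
```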
---
## Usage in NexusAI
**POST /embed/batch**
The embedding service is called in two places:
Embeds an array of strings sequentially and returns all vectors in the same order.
Ollama does not natively parallelize embeddings, so requests are processed one at a time.
1. **Memory service** — after each episode is saved to SQLite, the combined
`User: ...\nAssistant: ...` text is embedded and upserted into Qdrant.
This is fire-and-forget — failures are logged but don't affect the response.
Request body:
```json
{
"texts": ["first sentence", "second sentence"]
}
```
2. **Orchestration service** — the user's message is embedded at the start of
the chat pipeline to perform semantic search against past episodes.
Response:
```json
{
"embeddings": [[0.123, ...], [0.456, ...]],
"model": "nomic-embed-text",
"dimensions": 768,
"count": 2
}
```
For all HTTP endpoints, see `api-routes.md`.

View File

@@ -24,20 +24,19 @@ to switch inference backends without changes to the rest of the system.
| Variable | Required | Default | Description |
|---|---|---|---|
| PORT | No | 3001 | Port to listen on |
| INFERENCE_PROVIDER | No | llamacpp | Active inference provider (`ollama` or `llamacpp`) |
| INFERENCE_PROVIDER | No | llamacpp | Active provider (`ollama` or `llamacpp`) |
| INFERENCE_URL | No | http://localhost:8080 | URL of the inference runtime |
| DEFAULT_MODEL | No | local-model | Default model name passed to the provider |
> `INFERENCE_URL` points to `llama-server` directly (port 8080), not to this
> service itself. The orchestration service uses `INFERENCE_SERVICE_URL` to
> reach this service on port 3001.
> service. The orchestration service uses `INFERENCE_SERVICE_URL` to reach
> this service on port 3001.
## Provider Architecture
The inference service uses a provider pattern to abstract the underlying
LLM runtime. The active provider is selected at startup via `INFERENCE_PROVIDER`
and loaded from `src/providers/`. Both providers expose identical function
signatures, so the rest of the service is unaware of which backend is active.
The active provider is selected at startup via `INFERENCE_PROVIDER` and
loaded from `src/providers/`. Both providers expose identical function
signatures.
### Supported Providers
@@ -46,28 +45,36 @@ signatures, so the rest of the service is unaware of which backend is active.
| llama.cpp | `llamacpp` | llama.cpp server (OpenAI-compatible API) — **current default** |
| Ollama | `ollama` | Ollama via the `ollama` npm package — available as fallback |
Switching providers requires only a `.env` change — no code modifications needed:
Switching providers requires only a `.env` change — no code modifications:
```
INFERENCE_PROVIDER=llamacpp
INFERENCE_URL=http://localhost:8080
```
### Provider Validation
The provider loader throws immediately on an unknown value, preventing silent
misconfiguration.
## Internal Structure
The provider loader validates `INFERENCE_PROVIDER` at startup and throws immediately
if an unknown value is set — prevents silent misconfiguration:
```
Error: Unknown inference provider: "foo". Valid options: ollama, llamacpp
src/
├── providers/
│ ├── ollama.js # Ollama provider
│ └── llamacpp.js # llama.cpp provider (OpenAI-compatible REST)
├── routes/
│ └── inference.js # /complete and /complete/stream route handlers
├── infer.js # Provider loader — selects and re-exports active provider
└── index.js # Express app + route definitions
```
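The startup validation described under Provider Validation could look roughly like this. It is a sketch only: `resolveProvider` and `VALID_PROVIDERS` are hypothetical names, and the real loader in `infer.js` also loads the matching module from `src/providers/`.

```js
// Hypothetical list of accepted INFERENCE_PROVIDER values.
const VALID_PROVIDERS = ['ollama', 'llamacpp'];

// Returns the provider name, or throws immediately on an unknown value.
function resolveProvider(name = 'llamacpp') {
  if (!VALID_PROVIDERS.includes(name)) {
    throw new Error(
      `Unknown inference provider: "${name}". Valid options: ${VALID_PROVIDERS.join(', ')}`
    );
  }
  return name;
}
```

Throwing at load time means a typo in `.env` stops the service at startup instead of surfacing later as a failed request.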
## llama.cpp Provider
The llama.cpp provider uses the OpenAI-compatible REST API exposed by `llama-server`.
Uses the OpenAI-compatible REST API exposed by `llama-server`.
### Starting llama-server
`llama-server` must be started manually on the main PC before the inference service
can handle requests. It loads a single model at startup:
Must be started manually on the main PC before the inference service can
handle requests:
```powershell
.\llama-gpu\llama-server.exe `
@@ -79,40 +86,29 @@ can handle requests. It loads a single model at startup:
-c 64000
```
Key flags:
| Flag | Description |
|---|---|
| `-m` | Path to the `.gguf` model file |
| `-ngl 99` | Offload as many layers as possible to GPU |
| `--reasoning off` | Disables thinking/reasoning delay on Gemma 4 models |
| `--host 0.0.0.0` | Allows connections from other machines on the LAN |
| `--port 8080` | Port for the llama-server HTTP API |
| `--reasoning off` | Disables thinking delay on Gemma 4 models |
| `--host 0.0.0.0` | Allows LAN connections |
| `-c 64000` | Context window size in tokens |
> `-c 64000` is intentionally large. Monitor VRAM usage — if pressure builds,
> reduce this value. The NexusAI memory architecture handles context injection
> so a smaller window (6-8K) is often sufficient.
> `-c 64000` is intentionally large. NexusAI's memory architecture handles
> context injection so 6-8K is often sufficient if VRAM pressure builds.
### Model Naming
The model name sent in API requests must match the name as reported by
`llama-server`, including the `.gguf` extension. The reported name can be
verified with:
The model name in requests must match the name reported by `llama-server`
including the `.gguf` extension:
```powershell
Invoke-RestMethod -Uri "http://192.168.0.79:8080/v1/models"
```
Set `DEFAULT_MODEL` in `.env` to the exact reported name:
```
DEFAULT_MODEL=gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf
```
Set `DEFAULT_MODEL` in `.env` to the exact reported name.
### Inference Parameters
The llamacpp provider maps NexusAI options to OpenAI-compatible fields:
| NexusAI option | API field | Default |
|---|---|---|
| `temperature` | `temperature` | 0.7 |
@@ -122,18 +118,6 @@ The llamacpp provider maps NexusAI options to OpenAI-compatible fields:
| `repeatPenalty` | `repeat_penalty` | 1.1 |
| `seed` | `seed` | null (random) |
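The mapping in the table could be sketched as below. Only the rows visible above are covered, `toLlamaCppParams` is a hypothetical name, and omitting a `null` seed from the request is an assumption about how "random" is expressed.

```js
// Maps the NexusAI options shown in the table to OpenAI-compatible field names.
function toLlamaCppParams({ temperature = 0.7, repeatPenalty = 1.1, seed = null } = {}) {
  const params = { temperature, repeat_penalty: repeatPenalty };
  // Assumption: a null seed is left out so the server picks a random one.
  if (seed !== null) params.seed = seed;
  return params;
}
```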
## Internal Structure
```
src/
├── providers/
│ ├── ollama.js # Ollama provider — uses ollama npm package
│ └── llamacpp.js # llama.cpp provider — uses OpenAI-compatible REST API
├── routes/
│ └── inference.js # /complete and /complete/stream route handlers
├── infer.js # Provider loader — selects and re-exports active provider
└── index.js # Express app + route definitions
```
## Streaming Response Format
The llama.cpp provider yields chunks in this shape:
@@ -143,7 +127,7 @@ The llama.cpp provider yields chunks in this shape:
{ response: '', done: true, model: "model-name.gguf", tokenCount: 42 }
```
The inference route re-emits these as SSE events:
The inference route re-emits as SSE:
```
data: {"response":"token text"}
data: {"done":true,"model":"model-name.gguf","tokenCount":42}
@@ -151,66 +135,6 @@ data: [DONE]
```
`model` and `tokenCount` are captured from the llama.cpp `finish_reason: stop`
chunk (`usage.completion_tokens`) and emitted on the done event so the
orchestration layer can forward them to the client.
chunk and emitted on the done event.
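The re-emit step can be illustrated with a small formatter. `toSseEvents` is a hypothetical name, and the non-final chunk shape `{ response, done: false }` is assumed from the examples above.

```js
// Hypothetical formatter: turns one provider chunk into SSE text.
function toSseEvents(chunk) {
  if (chunk.done) {
    // The final chunk carries model and tokenCount, followed by the [DONE] sentinel.
    const done = { done: true, model: chunk.model, tokenCount: chunk.tokenCount };
    return `data: ${JSON.stringify(done)}\n\ndata: [DONE]\n\n`;
  }
  return `data: ${JSON.stringify({ response: chunk.response })}\n\n`;
}
```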
## Endpoints
### Health
| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check — reports active provider and model |
### Inference
| Method | Path | Description |
|---|---|---|
| POST | /complete | Standard completion — returns full response when done |
| POST | /complete/stream | Streaming completion via Server-Sent Events |
---
**POST /complete**
Request body:
```json
{
"prompt": "What is the capital of France?",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"temperature": 0.7,
"maxTokens": 1024
}
```
`model` is optional — falls back to `DEFAULT_MODEL` if omitted.
`maxTokens` is optional — defaults to 1024.
`temperature` is optional — defaults to 0.7.
Response:
```json
{
"text": "The capital of France is Paris.",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"done": true,
"evalCount": 8,
"promptEvalCount": 41
}
```
---
**POST /complete/stream**
Same request body as `/complete`.
Response is a stream of Server-Sent Events:
```
data: {"response":"The"}
data: {"response":" capital of France is Paris."}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":8}
data: [DONE]
```
Clients should accumulate `response` fields to build the full response string.
The `done` event carries `model` and `tokenCount` for display in the UI.
For all HTTP endpoints, see `api-routes.md`.

View File

@@ -43,48 +43,34 @@ src/
│ └── index.js # Qdrant collection management, upsert, search, delete
├── entities/
│ └── index.js # Entity + relationship CRUD
└── index.js # Express app + route definitions
└── index.js # Express app + all route definitions
```
## SQLite Schema
Six core tables:
- **sessions** — top-level conversation containers, identified by an `external_id`, optional `name`, and optional `project_id`
- **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
- **episodes** — individual exchanges (user message + AI response) tied to a session
- **entities** — named things the system learns about (people, places, concepts)
- **relationships** — directional labeled links between entities
- **summaries** — condensed episode groups for efficient context retrieval
- **projects** — named groupings of sessions with optional description, colour, and icon
- **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`
### Migrations
Schema changes that cannot be expressed in `CREATE TABLE IF NOT EXISTS` are applied
as migrations in `db/index.js` at startup, wrapped in try/catch to safely ignore
already-applied changes:
Schema changes that cannot use `CREATE TABLE IF NOT EXISTS` are applied as
idempotent migrations in `db/index.js` at startup:
```js
try {
db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`);
} catch {}
try {
db.exec(`ALTER TABLE sessions ADD COLUMN project_id INTEGER REFERENCES projects(id)`);
} catch {}
try {
db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(project_id)`);
} catch {}
try { db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`); } catch {}
try { db.exec(`ALTER TABLE sessions ADD COLUMN project_id INTEGER REFERENCES projects(id)`); } catch {}
try { db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(project_id)`); } catch {}
try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
```
This pattern is idempotent — safe to run on every startup. New migrations should
always be appended here rather than modifying the schema file, since
`ALTER TABLE ... ADD COLUMN` has no `IF NOT EXISTS` guard in SQLite.
Current migrations:
- `ALTER TABLE sessions ADD COLUMN name TEXT` — adds display name to sessions
- `ALTER TABLE sessions ADD COLUMN project_id INTEGER` — links sessions to projects
- `CREATE INDEX idx_sessions_project` — index on the new project_id column
New migrations are always appended here — never modify the schema file for
existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
### FTS5 Full-Text Search
@@ -96,11 +82,27 @@ keep the FTS index automatically in sync with the episodes table.
- `journal_mode = WAL` — non-blocking reads during writes
- `foreign_keys = ON` — enforces referential integrity and cascade deletes
- PRAGMAs are set via `db.pragma()` separately from `db.exec()`
- PRAGMAs set via `db.pragma()`, not `db.exec()`
### Dynamic Session Updates
`updateSession` builds its `SET` clause dynamically from only the fields
passed — prevents partial updates from overwriting fields that weren't
touched:
```js
function updateSession(id, { name, projectId } = {}) {
const updates = [];
const values = [];
if (name !== undefined) { updates.push('name = ?'); values.push(name ?? null); }
if (projectId !== undefined) { updates.push('project_id = ?'); values.push(projectId ?? null); }
// ...
}
```
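To see the clause-building logic in isolation, here is a database-free sketch. `buildSessionUpdate` is a hypothetical helper that returns the SQL text and bound values instead of executing them; the `WHERE id = ?` clause and the `null` return for an empty update are assumptions, since the original function body is elided.

```js
// Hypothetical pure helper: builds the statement without touching a database.
function buildSessionUpdate(id, { name, projectId } = {}) {
  const updates = [];
  const values = [];
  // `undefined` means "not passed"; an explicit null clears the column.
  if (name !== undefined) { updates.push('name = ?'); values.push(name ?? null); }
  if (projectId !== undefined) { updates.push('project_id = ?'); values.push(projectId ?? null); }
  if (updates.length === 0) return null; // nothing to change
  return {
    sql: `UPDATE sessions SET ${updates.join(', ')} WHERE id = ?`,
    values: [...values, id],
  };
}
```

Passing only `{ name }` leaves `project_id` out of the statement entirely, which is what keeps a rename from clearing the project assignment.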
## Qdrant / Semantic Layer
Three collections are initialized on service startup (created if they don't already exist):
Three Qdrant collections are initialized on service startup:
| Collection | Purpose |
|---|---|
@@ -108,208 +110,50 @@ Three collections are initialized on service startup (created if they don't alre
| `entities` | Embeddings for named entities |
| `summaries` | Embeddings for condensed episode summaries |
All collections use **768-dimension vectors** with **Cosine similarity**, matching the
output of the `nomic-embed-text` embedding model via Ollama.
All collections use **768-dimension vectors** with **Cosine similarity**,
matching `nomic-embed-text` via Ollama. Vector size and distance metric are
defined in `@nexusai/shared` — not hardcoded here.
Vector dimension and distance metric are defined in `@nexusai/shared` constants
(`QDRANT.VECTOR_SIZE`, `QDRANT.DISTANCE_METRIC`) — not hardcoded in this service.
### Semantic Layer Operations
Each collection exposes three operations via helper functions in `src/semantic/index.js`:
- **Upsert** — stores a vector with a payload containing the SQLite row ID, enabling
lookups back to the full content after a vector search
- **Search** — returns the top-k most similar vectors, with optional Qdrant filter
- **Delete** — removes a vector point by ID
The `wait: true` flag is used on all write operations so the caller receives confirmation
only after Qdrant has committed the change.
Each collection exposes three operations in `src/semantic/index.js`:
upsert, search (with optional Qdrant filter), and delete. The `wait: true`
flag is used on all writes.
## Embedding Write Path
When a new episode is created, the memory service automatically generates and stores
a vector embedding in Qdrant via the embedding service:
When a new episode is created:
1. Episode is saved to SQLite synchronously — the response is returned immediately
2. Both sides of the exchange are combined into a single text:
```
User: {userMessage}
Assistant: {aiResponse}
```
3. This text is sent to the embedding service (`POST /embed`)
4. The returned vector is upserted into the `episodes` Qdrant collection with a
payload of `{ sessionId, createdAt }` for filtering and lookups
1. Episode saved to SQLite synchronously — response returned immediately
2. User message + AI response combined: `User: ...\nAssistant: ...`
3. Text sent to embedding service (`POST /embed`)
4. Vector upserted into `episodes` Qdrant collection with payload `{ sessionId, createdAt }`
The embedding step is **fire-and-forget** — it runs asynchronously after the SQLite
insert succeeds. If embedding fails, the episode is still saved and searchable via
FTS. The error is logged but does not affect the API response.
This step is **fire-and-forget** — if embedding fails, the episode is still
saved and searchable via FTS. The error is logged but not surfaced.
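The fire-and-forget write path might be sketched as follows. All collaborator names (`saveEpisode`, `embed`, `upsertVector`, `log`) are hypothetical injected stand-ins for the SQLite insert, the embedding call, and the Qdrant upsert; `embed` is assumed to return a promise.

```js
// Hypothetical sketch of the write path with injected dependencies.
function createEpisode(input, { saveEpisode, embed, upsertVector, log = console.error }) {
  const episode = saveEpisode(input); // synchronous SQLite write
  const text = `User: ${input.userMessage}\nAssistant: ${input.aiResponse}`;
  // Not awaited: the caller gets the episode back without waiting for Qdrant.
  embed(text)
    .then((vector) => upsertVector(episode.id, vector, {
      sessionId: input.sessionId,
      createdAt: episode.createdAt,
    }))
    .catch((err) => log('Embedding failed:', err.message));
  return episode;
}
```

The `.catch` handler is what keeps an embedding outage from turning into an unhandled rejection or a failed API response.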
### Hybrid Retrieval Pattern
Qdrant and SQLite work as a pair — neither operates in isolation:
1. Query is embedded and searched in Qdrant → returns IDs + similarity scores
2. IDs are used to fetch full content from SQLite
3. Results are ranked and assembled into a context package
> The Qdrant payload stores `sessionId` (the internal integer ID). This is
> used for per-session and per-project filtering during semantic search. See
> `memory-isolation.md` for how project-level filtering works.
## Entity Layer
Entities and relationships are stored in SQLite with two key constraints:
Entities and relationships use upsert semantics with composite unique
constraints to prevent duplicates:
- `UNIQUE(name, type)` on entities — ensures no duplicates; upsert updates existing records
- `UNIQUE(from_id, to_id, label)` on relationships — prevents duplicate edges
- `ON DELETE CASCADE` on both `from_id` and `to_id` — deleting an entity automatically
removes all relationships where it appears on either end
- `UNIQUE(name, type)` on entities
- `UNIQUE(from_id, to_id, label)` on relationships
- `ON DELETE CASCADE` on relationship foreign keys
## Endpoints
## Project Delete Behaviour
### Health
Deleting a project runs as a transaction — it first nulls out `project_id`
on all assigned sessions, then deletes the project. This avoids a foreign
key constraint failure since `sessions.project_id` has no `ON DELETE` rule:
| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |
### Sessions
| Method | Path | Description |
|---|---|---|
| POST | /sessions | Create a new session |
| GET | /sessions | Get paginated list of all sessions |
| GET | /sessions/:id | Get session by internal ID |
| GET | /sessions/by-external/:externalId | Get session by external ID |
| PATCH | /sessions/by-external/:externalId | Update session name |
| DELETE | /sessions/by-external/:externalId | Delete session (cascades to episodes + summaries) |
> Route ordering matters in Express: `by-external/:externalId` must be defined before
> `/:id` to prevent the literal string `by-external` being captured as an ID parameter.
**POST /sessions body:**
```json
{
"externalId": "unique-session-id",
"metadata": {}
}
```js
const doDelete = db.transaction(() => {
db.prepare(`UPDATE sessions SET project_id = NULL WHERE project_id = ?`).run(id);
db.prepare(`DELETE FROM projects WHERE id = ?`).run(id);
});
```
**PATCH /sessions/by-external/:externalId body:**
```json
{
"name": "My Renamed Session"
}
```
Returns the updated session object. `name` is required and must be non-empty.
**DELETE /sessions/by-external/:externalId**
Returns `204 No Content` on success. Cascades to delete all associated episodes
and summaries via SQLite `ON DELETE CASCADE`.
### Episodes
| Method | Path | Description |
|---|---|---|
| POST | /episodes | Create episode + auto-embed into Qdrant |
| GET | /episodes/search?q=&limit= | Full-text search across episodes |
| GET | /episodes/:id | Get episode by ID |
| GET | /sessions/:id/episodes?limit=&offset= | Get paginated episodes for a session |
| DELETE | /episodes/:id | Delete an episode |
**POST /episodes body:**
```json
{
"sessionId": 1,
"userMessage": "Hello",
"aiResponse": "Hi there!",
"tokenCount": 10,
"metadata": {}
}
```
> Note: `/episodes/search` must be defined before `/episodes/:id` in Express to prevent
> the word `search` being captured as an ID parameter.
### Projects
| Method | Path | Description |
|---|---|---|
| POST | /projects | Create a new project |
| GET | /projects | Get all projects |
| GET | /projects/:id | Get project by ID |
| PATCH | /projects/:id | Update a project |
| DELETE | /projects/:id | Delete a project |
**POST /projects body:**
```json
{
"name": "My Project",
"description": "Optional description",
"colour": "#3d3a79",
"icon": null
}
```
`name` is required. `description`, `colour`, and `icon` are optional.
Returns `201` with the created project object on success.
**PATCH /projects/:id body:** same fields as POST, all optional.
**DELETE /projects/:id**
Returns `204 No Content`. Sessions assigned to the project are not deleted —
their `project_id` foreign key is left as-is (nullable, no cascade).
### Entities
| Method | Path | Description |
|---|---|---|
| POST | /entities | Upsert an entity (creates or updates by name + type) |
| GET | /entities/by-type/:type | Get all entities of a given type |
| GET | /entities/:id | Get entity by internal ID |
| DELETE | /entities/:id | Delete entity (cascades to relationships) |
**POST /entities body:**
```json
{
"name": "NexusAI",
"type": "project",
"notes": "My AI memory project",
"metadata": {}
}
```
> Note: `/entities/by-type/:type` must be defined before `/entities/:id` in Express to
> prevent `by-type` being captured as an ID parameter.
### Relationships
| Method | Path | Description |
|---|---|---|
| POST | /relationships | Upsert a relationship between two entities |
| GET | /entities/:id/relationships | Get all relationships originating from an entity |
| DELETE | /relationships | Delete a specific relationship |
**POST /relationships body:**
```json
{
"fromId": 1,
"toId": 2,
"label": "uses",
"metadata": {}
}
```
**DELETE /relationships body:**
```json
{
"fromId": 1,
"toId": 2,
"label": "uses"
}
```
> Relationships are identified by the composite key `(fromId, toId, label)`. Delete uses
> the request body rather than URL params as this three-part key is awkward to express
> cleanly in a path.
For all HTTP endpoints, see `api-routes.md`.

View File

@@ -39,56 +39,58 @@ src/
│ ├── memory.js # HTTP client for memory service
│ ├── inference.js # HTTP client for inference service
│ ├── embedding.js # HTTP client for embedding service
│ └── qdrant.js # HTTP client for Qdrant vector search
│ └── qdrant.js # HTTP client for Qdrant (direct vector search)
├── chat/
│ └── index.js # Core pipeline logic — context assembly and coordination
│ └── index.js # Core pipeline — context assembly, isolation, auto-naming
├── routes/
│ ├── chat.js # POST /chat and POST /chat/stream route handlers
│ ├── sessions.js # Session list, history, rename, and delete routes
│ ├── projects.js # Project CRUD routes — proxies to memory service
│ └── models.js # GET /models — reads models.json manifest from disk
│ ├── chat.js # POST /chat and POST /chat/stream
│ ├── sessions.js # Session CRUD proxy
│ ├── projects.js # Project CRUD proxy
│ └── models.js # GET /models — reads models.json from disk
└── index.js # Express app entry point
```
The `services/` layer wraps all downstream HTTP calls in named functions,
keeping the pipeline logic in `chat/index.js` readable and ensuring that
The `services/` layer wraps all downstream HTTP calls in named functions.
URL or endpoint changes have a single place to be updated.
## Chat Pipeline
Both `POST /chat` and `POST /chat/stream` share the same context assembly
steps. The only difference is how the inference response is delivered to
the client.
Both `POST /chat` and `POST /chat/stream` share the same steps. The only
difference is how the inference response is delivered to the client.
1. **Session resolution** — looks up the session by `externalId` in the memory
service. If not found, auto-creates a new session. Clients can generate a
UUID for new conversations and pass it directly — no pre-creation step needed.
### Steps
2. **Recent episode retrieval** — fetches the most recent episodes for the session
(default: 5) from the memory service.
1. **Session resolution** — look up session by `externalId`. Auto-create if
not found. Clients generate a UUID for new conversations — no pre-creation
step needed.
3. **Semantic search** — embeds the user message via the embedding service, then
queries Qdrant for the top-5 most similar past episodes (score threshold: 0.75).
Results are deduplicated against the recent episode set using a `Set` of IDs.
Full episode content is fetched from the memory service by ID. This step is
non-critical — if it fails, a warning is logged and the pipeline continues with
2. **Project context resolution** — if the session has a `project_id`, fetch
the project and all its session IDs. Used to scope semantic search. See
`memory-isolation.md` for full behaviour.
3. **Recent episode retrieval** — fetch the most recent episodes for the
session (`RECENT_EPISODE_LIMIT`, default 5).
4. **Semantic search** — embed the user message, query Qdrant for the top-5
most similar past episodes (`SCORE_THRESHOLD` 0.75). Deduplicated against
recent episodes. Non-critical — if it fails, pipeline continues with
recency-only context.
4. **Prompt assembly** — combines the system prompt, semantic episodes (if any),
recent episodes, and the current user message into a single prompt string.
5. **Prompt assembly** — combine system prompt, semantic episodes, recent
episodes, and user message.
5. **Inference** — sends the assembled prompt to the inference service. `/chat`
awaits the full response; `/chat/stream` opens an SSE connection and pipes
chunks to the client as they arrive.
6. **Inference** — send to inference service. `/chat` awaits full response;
`/chat/stream` pipes SSE chunks to the client.
6. **Episode write** — writes the new exchange (user message + AI response)
back to the memory service as a fire-and-forget operation. For streaming,
the full response text is accumulated across chunks before writing.
7. **Episode write** — write the exchange back to memory. Fire-and-forget
for `/chat`; awaited for `/chat/stream` to ensure the full text is
accumulated before saving.
7. **Response** — returns the AI response, model name, session ID, and token
count to the client.
8. **Auto-naming** — on `isFirstMessage && !session.name`, fire a secondary
inference call with a naming prompt (max 20 tokens, temperature 0.3) and
write the result back as `session.name`. Fully fire-and-forget.
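
The threshold and deduplication rules from the semantic search step can be sketched as a small pure function. This is an illustration under assumptions, not the service's actual code: the name `selectSemanticIds`, the `{ id, score }` hit shape, and the `{ id }` episode shape are hypothetical. Only the 0.75 threshold, the top-5 limit, and the `Set`-of-IDs deduplication come from the steps above. In practice the threshold may be applied by Qdrant itself rather than in the caller.

```js
// Hypothetical helper: decide which semantic hits are worth fetching in full.
// Assumed shapes: hits are { id, score }, recent episodes are { id, ... }.
const SCORE_THRESHOLD = 0.75; // value taken from the step description
const TOP_K = 5;              // "top-5" from the step description

function selectSemanticIds(hits, recentEpisodes) {
  // Episodes already in the recency window must not appear twice in the prompt.
  const recentIds = new Set(recentEpisodes.map((ep) => ep.id));

  return hits
    .filter((hit) => hit.score >= SCORE_THRESHOLD)
    .filter((hit) => !recentIds.has(hit.id))
    .sort((a, b) => b.score - a.score) // best match first
    .slice(0, TOP_K)
    .map((hit) => hit.id);
}

const hits = [
  { id: 7, score: 0.91 },
  { id: 42, score: 0.88 }, // already in the recent set, dropped
  { id: 3, score: 0.6 },   // below threshold, dropped
  { id: 11, score: 0.8 },
];
const recent = [{ id: 42 }, { id: 43 }];
console.log(selectSemanticIds(hits, recent)); // [ 7, 11 ]
```

Because the step is described as non-critical, a caller would presumably wrap the whole search in a `try`/`catch` and fall back to an empty list on failure.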
## Prompt Structure
### Prompt Structure
```
[System prompt]
@@ -108,212 +110,67 @@ User: {current message}
Assistant:
```
Semantic episodes appear before recent episodes so the model encounters
long-range relevant context before the immediate conversation flow.
Semantic episodes appear before recent episodes so the model sees
long-range context before the immediate conversation flow.
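
The ordering rule can be sketched as follows. This is a minimal illustration, assuming the episode fields `user_message` and `ai_response`; the function names and the two section headings (`Relevant past context:` and `Recent conversation:`) are hypothetical placeholders, since the full template is not reproduced here.

```js
// Hypothetical sketch of prompt assembly. Section headings are assumptions.
function formatEpisode(ep) {
  return `User: ${ep.user_message}\nAssistant: ${ep.ai_response}`;
}

function assemblePrompt({ systemPrompt, semanticEpisodes = [], recentEpisodes = [], message }) {
  const parts = [systemPrompt];

  // Semantic matches go first so long-range context precedes the live conversation.
  if (semanticEpisodes.length > 0) {
    parts.push('Relevant past context:', ...semanticEpisodes.map(formatEpisode));
  }
  if (recentEpisodes.length > 0) {
    parts.push('Recent conversation:', ...recentEpisodes.map(formatEpisode));
  }

  // The prompt ends with an open "Assistant:" turn for the model to complete.
  parts.push(`User: ${message}`, 'Assistant:');
  return parts.join('\n\n');
}

const prompt = assemblePrompt({
  systemPrompt: 'You are a helpful assistant.',
  semanticEpisodes: [{ user_message: 'My name is Tim.', ai_response: 'Nice to meet you, Tim.' }],
  recentEpisodes: [{ user_message: 'What is SSE?', ai_response: 'Server-Sent Events.' }],
  message: 'What is my name?',
});
console.log(prompt);
```

When the semantic search step fails or returns nothing, the semantic section is simply omitted and the prompt contains recency-only context.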
## SSE Stream Format
The inference service emits chunks from the llama.cpp provider in this format:
Inference service → orchestration:
```
data: {"response":"Hello","done":false}
data: {"response":"!","done":false}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":42}
data: [DONE]
```
The orchestration service re-emits to the client as:
Orchestration → client:
```
data: {"text":"Hello"}
data: {"text":"!"}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":42}
```
The `[DONE]` sentinel from the inference service is consumed internally
and not forwarded. The client stream is terminated by `res.end()` after
the done event. Model name and token count are included on the done event
so the client can display them in the UI.
The `[DONE]` sentinel is consumed internally and not forwarded. The stream
is terminated by `res.end()` after the done event.
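
The translation between the two formats can be sketched as a per-line function. This is an illustration only: `translateSseLine` is a hypothetical name, and it assumes each incoming line has already been split out of the stream buffer.

```js
// Hypothetical translator from one inference SSE line to one client SSE line.
// Returns null when nothing should be forwarded to the client.
function translateSseLine(line) {
  if (!line.startsWith('data: ')) return null; // blank separators, comments
  const payload = line.slice('data: '.length).trim();

  if (payload === '[DONE]') return null; // sentinel is consumed, not forwarded

  const chunk = JSON.parse(payload);
  if (chunk.done) {
    const { model, tokenCount } = chunk;
    return `data: ${JSON.stringify({ done: true, model, tokenCount })}`;
  }
  // Regular chunk: rename `response` to `text` and drop the `done` flag.
  return `data: ${JSON.stringify({ text: chunk.response })}`;
}

console.log(translateSseLine('data: {"response":"Hello","done":false}'));
// data: {"text":"Hello"}
console.log(translateSseLine('data: [DONE]'));
// null
```

A caller would write each non-null result followed by a blank line (`\n\n`), which is how SSE delimits events, and call `res.end()` once the done event has been sent.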
## Models Manifest
The `/models` endpoint reads a `models.json` file from disk at the path
specified by `MODELS_MANIFEST_PATH`. The file lives on the main PC alongside
the model files, and is accessible to orchestration via a network share
mounted at `/mnt/nexus-models`.
`GET /models` reads `models.json` fresh on each request from
`MODELS_MANIFEST_PATH`. The file lives on the main PC alongside model files,
accessible via an SMB mount at `/mnt/nexus-models`.
The manifest is read fresh on each request — no restart needed when models
are added or removed.
**models.json format:**
```json
[
{ "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
]
```
- `value` must match the model name as reported by `llama-server` (including `.gguf` extension)
- `label` — display name shown in the UI
`value` must match the model name as reported by `llama-server` (including
`.gguf` extension). No service restart needed when models are added or removed.
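
Reading the manifest on every request might look like the sketch below. The function names are hypothetical, and the `.gguf` check is an added assumption for illustration; the text above only states that `value` must match what `llama-server` reports, not that the service validates it.

```js
const fs = require('node:fs');

// Hypothetical parser: takes the manifest text, returns [{ value, label }].
function parseManifest(text) {
  const entries = JSON.parse(text);
  if (!Array.isArray(entries)) {
    throw new Error('models.json must contain a JSON array');
  }
  for (const entry of entries) {
    // Assumed sanity check: model names are expected to carry the extension.
    if (typeof entry.value !== 'string' || !entry.value.endsWith('.gguf')) {
      throw new Error(`invalid model value: ${JSON.stringify(entry.value)}`);
    }
  }
  return entries;
}

// Reads from disk on every call, so edits to the file need no restart.
function loadManifest(path = process.env.MODELS_MANIFEST_PATH) {
  return parseManifest(fs.readFileSync(path, 'utf8'));
}
```

A route handler could call `loadManifest()` inside a `try`/`catch` and map any read or parse failure to an error response.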
## Endpoints
## Sessions Route Behaviour
### Health
`PATCH /sessions/:sessionId` accepts `name`, `projectId`, or both.
The validation guard rejects only requests where neither is provided:
| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check — reports downstream service URLs |
### Chat
| Method | Path | Description |
|---|---|---|
| POST | /chat | Send a message and receive a complete response |
| POST | /chat/stream | Send a message and receive a streaming SSE response |
### Sessions
| Method | Path | Description |
|---|---|---|
| GET | /sessions | Get paginated list of all sessions |
| GET | /sessions/:sessionId/history | Get paginated episode history for a session |
| PATCH | /sessions/:sessionId | Rename a session |
| DELETE | /sessions/:sessionId | Delete a session and all its episodes |
### Projects
Projects are proxied directly from the memory service with no transformation.
| Method | Path | Description |
|---|---|---|
| GET | /projects | Get all projects |
| POST | /projects | Create a new project |
| PATCH | /projects/:id | Update a project |
| DELETE | /projects/:id | Delete a project |
### Models
| Method | Path | Description |
|---|---|---|
| GET | /models | Get list of available models from manifest file |
---
**POST /chat**
Request body:
```json
{
"sessionId": "your-session-uuid",
"message": "Hello, my name is Tim.",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"temperature": 0.7
}
```
```js
if (!name?.trim() && projectId === undefined) {
return res.status(400).json({ error: 'name or projectId is required' });
}
```
`model` and `temperature` are optional — fall back to inference service defaults
if omitted.
Response:
```json
{
"sessionId": "your-session-uuid",
"response": "Hello Tim! How can I help you today?",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"tokenCount": 87
}
```
---
**POST /chat/stream**
Same request body as `POST /chat`.
Response is a stream of Server-Sent Events:
```
data: {"text":"Hello"}
data: {"text":" Tim"}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87}
```
---
**PATCH /sessions/:sessionId**
Request body:
```json
{ "name": "My Renamed Session" }
```
Returns the updated session object. `name` is required and trimmed of whitespace.
---
**DELETE /sessions/:sessionId**
Returns `204 No Content`. Cascades to delete all episodes for the session.
---
**GET /sessions/:sessionId/history**
Query parameters:
| Parameter | Default | Description |
|---|---|---|
| limit | 20 | Maximum number of episodes to return |
| offset | 0 | Number of episodes to skip (for pagination) |
Response:
```json
{
"sessionId": "your-session-uuid",
"episodes": [
{
"id": 42,
"session_id": 1,
"user_message": "Hello, my name is Tim.",
"ai_response": "Hello Tim! How can I help you today?",
"token_count": 87,
"created_at": 1712345678,
"metadata": null
}
]
}
```
Episodes are ordered newest first.
---
**GET /models**
Returns the parsed contents of `models.json`:
```json
[
{ "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
]
```
Returns `500` if the manifest file cannot be read or parsed.
This allows `useChat` to write project assignment separately from rename
operations.
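
To see which request bodies pass, the same condition can be pulled into a pure function. `validateSessionPatch` is a hypothetical name used only for this sketch; the condition itself is copied from the guard.

```js
// Hypothetical pure version of the guard: returns an error string or null.
function validateSessionPatch({ name, projectId } = {}) {
  if (!name?.trim() && projectId === undefined) {
    return 'name or projectId is required';
  }
  return null;
}

console.log(validateSessionPatch({ name: 'Renamed' }));  // null
console.log(validateSessionPatch({ projectId: 3 }));     // null
console.log(validateSessionPatch({ projectId: null }));  // null
console.log(validateSessionPatch({ name: '   ' }));      // name or projectId is required
console.log(validateSessionPatch({}));                   // name or projectId is required
```

Note that `projectId: null` passes, because only `undefined` counts as absent. Whether `null` is used to clear a project assignment is not stated here.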
## Caddy Configuration
The Caddy reverse proxy on Mini PC 2 must have a handle block for each route
prefix the client needs to reach. Current required blocks:
Each route prefix needs a handle block in the Caddyfile on Mini PC 2:
```
handle /chat* {
reverse_proxy localhost:4000
}
handle /sessions* {
reverse_proxy localhost:4000
}
handle /models* {
reverse_proxy localhost:4000
}
handle /projects* {
reverse_proxy localhost:4000
}
handle /chat* { reverse_proxy localhost:4000 }
handle /sessions* { reverse_proxy localhost:4000 }
handle /models* { reverse_proxy localhost:4000 }
handle /projects* { reverse_proxy localhost:4000 }
```
When adding new top-level routes to the orchestration service, add a matching
block here and reload Caddy: `caddy reload --config /path/to/Caddyfile`
After updating the Caddyfile, reload Caddy: `caddy reload --config /path/to/Caddyfile`
For all HTTP endpoints, see `api-routes.md`.