update documentation
This commit is contained in:
BIN
.vs/slnx.sqlite
Normal file
Binary file not shown.
BIN
.vs/slnx.sqlite-journal
Normal file
Binary file not shown.
@@ -1,13 +1,23 @@
|
|||||||
# NexusAI Documentation
|
# NexusAI Documentation
|
||||||
|
|
||||||
## Contents
|
## Architecture
|
||||||
|
|
||||||
- [Architecture Overview](architecture/overview.md)
|
- [Architecture Overview](architecture/overview.md)
|
||||||
- [Services](services/)
|
|
||||||
- [Shared Package](services/shared.md)
|
## Services
|
||||||
- [Memory Service](services/memory-service.md)
|
|
||||||
- [Embedding Service](services/embedding-service.md)
|
- [Shared Package](services/shared.md)
|
||||||
- [Inference Service](services/inference-service.md)
|
- [Memory Service](services/memory-service.md)
|
||||||
- [Orchestration Service](services/orchestration-service.md)
|
- [Embedding Service](services/embedding-service.md)
|
||||||
- [Chat Client](services/chat-client.md)
|
- [Inference Service](services/inference-service.md)
|
||||||
- [Deployment](deployment/homelab.md)
|
- [Orchestration Service](services/orchestration-service.md)
|
||||||
|
- [Chat Client](services/chat-client.md)
|
||||||
|
|
||||||
|
## Reference
|
||||||
|
|
||||||
|
- [API Routes](reference/api-routes.md) — all HTTP endpoints across all services
|
||||||
|
- [Memory Isolation](reference/memory-isolation.md) — project-scoped memory model
|
||||||
|
|
||||||
|
## Deployment
|
||||||
|
|
||||||
|
- [Homelab](deployment/homelab.md)
|
||||||
@@ -1,56 +1,80 @@
|
|||||||
# Architecture Overview
|
# Architecture Overview
|
||||||
|
|
||||||
NexusAI is a modular, memory-centric AI system designed for persistent, context-aware conversations. It separates concerns across different services that can be independently deployed and evolved.
|
NexusAI is a modular, memory-centric AI assistant designed for persistent,
|
||||||
|
context-aware conversations. It separates concerns across independent services
|
||||||
|
that can be evolved and deployed separately.
|
||||||
|
|
||||||
## Core Design Principles
|
## Core Design Principles
|
||||||
|
|
||||||
- **Decoupled layers:** memory, inference, and orchestration are independent of each other
|
- **Decoupled layers** — memory, inference, and orchestration are independent of each other
|
||||||
- **Hybrid retrieval:** semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
|
- **Hybrid retrieval** — semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
|
||||||
- **Home lab:** services are distributed across nodes according to available hardware and resources
|
- **Project-scoped memory** — sessions can be grouped into projects with shared or isolated memory pools
|
||||||
|
- **Home lab first** — services are distributed across nodes according to available hardware
|
||||||
|
|
||||||
## Memory Model
|
## Memory Model
|
||||||
|
|
||||||
Memory is split between SQLite and Qdrant, which work together as a pair:
|
Memory is split between SQLite and Qdrant, which always work as a pair:
|
||||||
|
|
||||||
- **SQLite:** episodic interactions, entities, relationships, summaries
|
- **SQLite** — episodic interactions, entities, relationships, summaries, sessions, projects
|
||||||
- **Qdrant:** vector embeddings for semantic similarity search
|
- **Qdrant** — vector embeddings for semantic similarity search
|
||||||
|
|
||||||
When recalling memory, Qdrant returns IDs and similarity scores, which are used to fetch
|
When recalling memory, Qdrant returns IDs and similarity scores, which are used
|
||||||
full content from SQLite. Neither SQLite nor Qdrant work in isolation.
|
to fetch full content from SQLite. Neither store works in isolation.
|
||||||
|
|
||||||
Episode embeddings carry a `{ sessionId, createdAt }` payload in Qdrant,
enabling per-session and per-project filtering at search time. See
`memory-isolation.md` for how project-scoped retrieval works.
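The recall flow described above (vector hits first, full content second) can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: `hydrateHits` is a hypothetical helper, the `Map` stands in for SQLite, and the `hits` array stands in for a Qdrant search result.

```javascript
// Hypothetical stand-ins: `episodesById` plays the role of SQLite,
// `hits` plays the role of a Qdrant result (IDs plus similarity scores).
function hydrateHits(hits, episodesById) {
  return hits
    .filter(hit => episodesById.has(hit.id))   // skip vectors with no stored row
    .sort((a, b) => b.score - a.score)         // highest similarity first
    .map(hit => ({ ...episodesById.get(hit.id), score: hit.score }));
}

const episodesById = new Map([
  [1, { id: 1, userMessage: 'Hello', aiResponse: 'Hi there!' }],
  [2, { id: 2, userMessage: 'What is Qdrant?', aiResponse: 'A vector store.' }],
]);

// ID 9 has a vector but no row, so it is dropped during hydration.
const hits = [
  { id: 1, score: 0.42 },
  { id: 2, score: 0.91 },
  { id: 9, score: 0.40 },
];

const recalled = hydrateHits(hits, episodesById);
console.log(recalled.map(e => `${e.id}:${e.score}`).join(', '));
```

The sketch shows why neither store is useful alone: the vector side only knows IDs and scores, and the relational side only knows content.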
|
|
||||||
## Hardware Layout
|
## Hardware Layout
|
||||||
|
|
||||||
| Node | Address | Role |
|
| Node | Address | Role |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
| Main PC | local | Primary inference (RTX A4000 16GB) |
|
| Main PC | 192.168.0.79 | Primary inference — RTX A4000 16GB |
|
||||||
| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant |
|
| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant, Ollama |
|
||||||
| Mini PC 2 | 192.168.0.205 | Orchestration service, Chat Client, Gitea |
|
| Mini PC 2 | 192.168.0.205 | Orchestration service, Chat Client, Caddy, Authelia, Gitea |
|
||||||
|
|
||||||
## Service Communication
|
## Service Communication
|
||||||
|
|
||||||
All services expose a REST HTTP API. The orchestration service is the single entry point —
|
All services expose a REST HTTP API. The orchestration service is the single
|
||||||
clients do not talk directly to the memory or inference services.
|
entry point — clients never talk directly to memory or inference services.
|
||||||
|
|
||||||
```
|
```
|
||||||
Client
|
Client (browser)
|
||||||
└─► Orchestration (:4000)
|
└─► Caddy (HTTPS + Authelia SSO)
|
||||||
├─► Chat Client (static files, /srv/nexusai)
|
└─► Orchestration (:4000) — Mini PC 2
|
||||||
├─► Memory Service (:3002)
|
├─► Memory Service (:3002) — Mini PC 1
|
||||||
│ ├─► Qdrant (:6333)
|
│ ├─► SQLite (local file)
|
||||||
│ └─► SQLite
|
│ └─► Qdrant (:6333) — Mini PC 1
|
||||||
├─► Embedding Service (:3003)
|
├─► Embedding Service (:3003) — Mini PC 1
|
||||||
│ └─► Ollama
|
│ └─► Ollama (:11434) — Mini PC 1
|
||||||
└─► Inference Service (:3001)
|
├─► Inference Service (:3001) — Main PC
|
||||||
└─► Ollama
|
│ └─► llama-server (:8080) — Main PC
|
||||||
|
└─► Qdrant (:6333) — Mini PC 1 (direct — semantic search)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Note: Orchestration queries Qdrant directly for semantic search (bypassing
|
||||||
|
the memory service) but always fetches full episode content from the memory
|
||||||
|
service by ID after the vector search.
|
||||||
|
|
||||||
## Technology Choices
|
## Technology Choices
|
||||||
|
|
||||||
| Concern | Choice | Reason |
|
| Concern | Choice | Reason |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
| Language | Node.js (JavaScript) | Familiar stack, async I/O suits service architecture |
|
| Language | Node.js (CommonJS) | Familiar stack, async I/O suits service architecture |
|
||||||
| Package management | npm workspaces | Monorepo with shared code, no publishing needed |
|
| Package management | npm workspaces | Monorepo with shared code, no publishing needed |
|
||||||
| Vector store | Qdrant | Mature, Docker-native, excellent Node.js client |
|
| Vector store | Qdrant | Mature, Docker-native, excellent Node.js client |
|
||||||
| Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user |
|
| Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user scale |
|
||||||
| LLM runtime | Ollama | Easiest local LLM management, serves embeddings too |
|
| LLM inference | llama.cpp (`llama-server`) | Maximum GPU utilisation on RTX A4000, OpenAI-compatible API |
|
||||||
| Version control | Gitea (self-hosted) | Code stays on local network |
|
| Embeddings | Ollama (`nomic-embed-text`) | Co-located with memory service on Mini PC 1, 768-dim Cosine |
|
||||||
|
| Reverse proxy | Caddy + Authelia | Automatic HTTPS, SSO/MFA for all exposed services |
|
||||||
|
| Version control | Gitea (self-hosted) | Code stays on local network |
|
||||||
|
|
||||||
## Current State

The core four-service architecture is complete and operational. Key capabilities:

- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
- **Projects** — sessions grouped with shared or isolated memory pools
- **Auto-naming** — sessions named automatically from first exchange via inference
- **Project-scoped semantic search** — Qdrant filtered by project session IDs
- **Chat client** — view-based UI with sidebar navigation, project views, session management
||||||
@@ -7,50 +7,73 @@ services appropriate for its hardware.
|
|||||||
|
|
||||||
## Mini PC 1 — 192.168.0.81
|
## Mini PC 1 — 192.168.0.81
|
||||||
|
|
||||||
Runs: Qdrant, Memory Service, Embedding Service
|
Runs: Qdrant, Memory Service, Embedding Service, Ollama
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
ssh username@192.168.0.81
|
ssh storme@192.168.0.81
|
||||||
cd ~/nexusai
|
|
||||||
docker compose -f docker-compose.mini1.yml up -d # Qdrant
|
docker compose -f docker-compose.mini1.yml up -d # Qdrant
|
||||||
npm run memory
|
npm run memory # port 3002
|
||||||
npm run embedding
|
npm run embedding # port 3003
|
||||||
|
ollama serve # port 11434 — must bind 0.0.0.0 (OLLAMA_HOST=0.0.0.0)
|
||||||
```
|
```
|
||||||
|
|
||||||
|
> Ollama must be started with `OLLAMA_HOST=0.0.0.0` to accept connections
|
||||||
|
> from other services on the LAN. Without this, embedding requests from the
|
||||||
|
> memory service will be refused.
|
||||||
|
|
||||||
## Mini PC 2 — 192.168.0.205
|
## Mini PC 2 — 192.168.0.205
|
||||||
|
|
||||||
Runs: Gitea, Orchestration Service, Chat Client (via Caddy)
|
Runs: Orchestration Service, Chat Client (via Caddy), Gitea, Caddy, Authelia
|
||||||
```bash
|
|
||||||
ssh username@192.168.0.205
|
|
||||||
|
|
||||||
cd ~/gitea
|
```bash
|
||||||
docker compose up -d # Gitea
|
ssh storme@192.168.0.205
|
||||||
|
|
||||||
cd /opt/stacks/network
|
cd /opt/stacks/network
|
||||||
docker compose up -d # Caddy, Authelia, and other network services
|
docker compose up -d # Caddy, Authelia, and other network services
|
||||||
|
|
||||||
cd ~/nexusai
|
cd ~/nexusAI
|
||||||
npm run orchestration
|
npm run orchestration # port 4000
|
||||||
```
|
```
|
||||||
|
|
||||||
## Main PC
|
## Main PC — 192.168.0.79
|
||||||
|
|
||||||
Runs: Ollama, Inference Service
|
Runs: Inference Service, llama-server
|
||||||
```bash
|
|
||||||
ollama serve
|
```powershell
|
||||||
npm run inference
|
# Start llama-server first — inference service depends on it
|
||||||
|
.\llama-gpu\llama-server.exe `
|
||||||
|
-m .\models\gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf `
|
||||||
|
-ngl 99 --reasoning off --host 0.0.0.0 --port 8080 -c 64000
|
||||||
|
|
||||||
|
# Then start inference service
|
||||||
|
npm run inference # port 3001
|
||||||
```
|
```
|
||||||
|
|
||||||
## Chat Client Deployment
|
## Chat Client Deployment
|
||||||
|
|
||||||
The chat client is a React + Vite app build to static files and served by Caddy on Mini PC 2 (Infrastructure node). It does not run as a Node process
|
The chat client is a React + Vite app built to static files and served by
|
||||||
|
Caddy on Mini PC 2. It does not run as a Node process.
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# On dev machine or Mini PC 2 after git pull
|
# On Mini PC 2 after git pull
|
||||||
cd ~/nexusAI/packages/chat-client
|
cd ~/nexusAI/packages/chat-client
|
||||||
npm run build
|
|
||||||
|
# Set production URL before building
|
||||||
|
VITE_ORCHESTRATION_URL=https://nexus.jellystorm.com npm run build
|
||||||
|
|
||||||
# Output lands in packages/chat-client/dist/
|
# Output lands in packages/chat-client/dist/
|
||||||
# Caddy serves this directory directly via volume mount
|
# Caddy serves this directory directly via Docker volume mount
|
||||||
```
|
```
|
||||||
Caddy config (`/opt/docker/caddy/Caddyfile`):
|
|
||||||
|
> Do NOT set `VITE_ORCHESTRATION_URL` during local dev — Vite's proxy handles
|
||||||
|
> routing and setting the HTTPS domain will cause Authelia to intercept API
|
||||||
|
> requests, producing confusing JSON parse errors.
|
||||||
|
|
||||||
|
## Caddy Configuration
|
||||||
|
|
||||||
|
The Caddyfile on Mini PC 2 must include a handle block for each route prefix
|
||||||
|
the client needs to reach. Current required blocks for NexusAI:
|
||||||
|
|
||||||
```caddy
|
```caddy
|
||||||
nexus.jellystorm.com {
|
nexus.jellystorm.com {
|
||||||
import authelia
|
import authelia
|
||||||
@@ -63,6 +86,14 @@ nexus.jellystorm.com {
|
|||||||
reverse_proxy 192.168.0.205:4000
|
reverse_proxy 192.168.0.205:4000
|
||||||
}
|
}
|
||||||
|
|
||||||
|
handle /models* {
|
||||||
|
reverse_proxy 192.168.0.205:4000
|
||||||
|
}
|
||||||
|
|
||||||
|
handle /projects* {
|
||||||
|
reverse_proxy 192.168.0.205:4000
|
||||||
|
}
|
||||||
|
|
||||||
handle {
|
handle {
|
||||||
root * /srv/nexusai
|
root * /srv/nexusai
|
||||||
try_files {path} /index.html
|
try_files {path} /index.html
|
||||||
@@ -71,18 +102,45 @@ nexus.jellystorm.com {
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
The Caddy container mounts the dist directory via Docker volume:
|
When adding new top-level routes to the orchestration service, add a matching
|
||||||
|
handle block here and reload Caddy:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
caddy reload --config /path/to/Caddyfile
|
||||||
|
```
|
||||||
|
|
||||||
|
The Caddy container mounts the `dist` directory via Docker volume:
|
||||||
|
|
||||||
```yaml
|
```yaml
|
||||||
- /home/storme/nexusAI/packages/chat-client/dist:/srv/nexusai
|
- /home/storme/nexusAI/packages/chat-client/dist:/srv/nexusai
|
||||||
```
|
```
|
||||||
|
|
||||||
> After adding or changing volume mounts, a full `docker compose down caddy && docker compose up -d caddy`
|
> After adding or changing volume mounts, a full `docker compose down caddy && docker compose up -d caddy`
|
||||||
> is required. Caddyfile-only changes only need `docker compose restart caddy`.
|
> is required. Caddyfile-only changes only need `caddy reload`.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
## Environment Files
|
## Environment Files
|
||||||
|
|
||||||
Each node needs a `.env` file in the relevant service package directory.
|
Each service needs a `.env` file in its package directory. These are not
|
||||||
These are not committed to git. See each service's documentation for
|
committed to git. See each service's documentation for required variables.
|
||||||
required variables.
|
|
||||||
| Service | Location | Key Variables |
|---|---|---|
| Memory | `packages/memory-service/.env` | `SQLITE_PATH`, `QDRANT_URL`, `EMBEDDING_SERVICE_URL` |
| Embedding | `packages/embedding-service/.env` | `OLLAMA_URL`, `EMBEDDING_MODEL` |
| Inference | `packages/inference-service/.env` | `INFERENCE_PROVIDER`, `INFERENCE_URL`, `DEFAULT_MODEL` |
| Orchestration | `packages/orchestration-service/src/.env` | `MEMORY_SERVICE_URL`, `EMBEDDING_SERVICE_URL`, `INFERENCE_SERVICE_URL`, `QDRANT_URL`, `MODELS_MANIFEST_PATH` |
| Chat client | `packages/chat-client/.env` | `VITE_ORCHESTRATION_URL` (production builds only) |
|
|
||||||
## Models Manifest

The models manifest (`models.json`) lives on the Main PC alongside the model
files, accessible to orchestration via an SMB mount at `/mnt/nexus-models`.

```json
[
  { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
]
```

`value` must exactly match the model name as reported by `llama-server`
(including `.gguf` extension). No service restart needed to pick up changes.
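Because a mismatched `value` is easy to introduce by hand, a small check along these lines may help before editing the manifest. It is a sketch only: `findManifestProblems` is a hypothetical helper, and `servedModelNames` is assumed to be a list of model names obtained separately from `llama-server` (how that list is fetched is not covered here).

```javascript
// Hypothetical validator: returns a list of human-readable problems,
// or an empty array when every entry looks usable.
function findManifestProblems(manifest, servedModelNames) {
  if (!Array.isArray(manifest)) return ['manifest must be a JSON array'];
  const problems = [];
  manifest.forEach((entry, index) => {
    if (!entry || typeof entry.value !== 'string' || typeof entry.label !== 'string') {
      problems.push(`entry ${index}: "value" and "label" must be strings`);
    } else if (!servedModelNames.includes(entry.value)) {
      // Exact match required, including the .gguf extension.
      problems.push(`entry ${index}: "${entry.value}" is not a served model name`);
    }
  });
  return problems;
}

const served = ['gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf'];
const goodManifest = [
  { value: 'gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf', label: 'Gemma 4 26B Claude Distill' },
];
const badManifest = [
  { value: 'gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini', label: 'Missing extension' },
];

console.log(findManifestProblems(goodManifest, served));
console.log(findManifestProblems(badManifest, served));
```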
@@ -39,21 +39,21 @@ All external access is routed through **Caddy** (reverse proxy) with **Authelia*
|
|||||||
|------|--------|
|
|------|--------|
|
||||||
| GPU | NVIDIA RTX A4000 |
|
| GPU | NVIDIA RTX A4000 |
|
||||||
| Role | Primary AI inference node |
|
| Role | Primary AI inference node |
|
||||||
| Key Services | Ollama (inference) |
|
| Key Services | llama-server (llama.cpp), Inference Service |
|
||||||
|
|
||||||
### Mini PC 1 — Media Node (`192.168.0.81`)
|
### Mini PC 1 — Media Node (`192.168.0.81`)
|
||||||
| Spec | Detail |
|
| Spec | Detail |
|
||||||
|------|--------|
|
|------|--------|
|
||||||
| GPU | NVIDIA RTX 5050 |
|
| GPU | NVIDIA RTX 5050 |
|
||||||
| Role | Media services, embeddings, vector storage |
|
| Role | Media services, embeddings, vector storage |
|
||||||
| Key Services | Jellyfin, Nextcloud, Qdrant, arr stack, NexusAI memory/embedding |
|
| Key Services | Jellyfin, Nextcloud, Qdrant, arr stack, NexusAI memory/embedding, Ollama |
|
||||||
| Storage | NVMe (OS) + 3x external HDDs (see [Storage Layout](#storage-layout)) |
|
| Storage | NVMe (OS) + 3x external HDDs (see [Storage Layout](#storage-layout)) |
|
||||||
|
|
||||||
### Mini PC 2 — Infrastructure Node (`192.168.0.205`)
|
### Mini PC 2 — Infrastructure Node (`192.168.0.205`)
|
||||||
| Spec | Detail |
|
| Spec | Detail |
|
||||||
|------|--------|
|
|------|--------|
|
||||||
| Role | Network management, monitoring, auth, DNS, git |
|
| Role | Network management, monitoring, auth, DNS, git, NexusAI orchestration |
|
||||||
| Key Services | Caddy, Authelia, Tailscale, Pihole, Grafana, Gitea |
|
| Key Services | Caddy, Authelia, Tailscale, Pihole, Grafana, Gitea, NexusAI orchestration |
|
||||||
| Storage | NVMe (OS only) |
|
| Storage | NVMe (OS only) |
|
||||||
|
|
||||||
---
|
---
|
||||||
@@ -155,7 +155,8 @@ All external access is routed through **Caddy** (reverse proxy) with **Authelia*
|
|||||||
|
|
||||||
| Service | Notes |
|
| Service | Notes |
|
||||||
|---------|-------|
|
|---------|-------|
|
||||||
| Ollama | Runs LLM inference using the RTX A4000. Also serves `nomic-embed-text` embeddings (768-dim vectors) consumed by NexusAI's embedding service on Mini PC 1. |
|
| llama-server (llama.cpp) | Primary LLM inference using the RTX A4000. Started manually before the inference service. Serves the OpenAI-compatible API on port 8080. |
|
||||||
|
| Ollama | Serves `nomic-embed-text` embeddings (768-dim vectors) consumed by NexusAI's embedding service on Mini PC 1. |
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
@@ -234,7 +235,7 @@ Phase 1 focused on establishing a stable, secure, and observable foundation:
|
|||||||
- ✅ Self-hosted git (Gitea)
|
- ✅ Self-hosted git (Gitea)
|
||||||
- ✅ Media stack fully operational (Jellyfin, arr stack, Nextcloud)
|
- ✅ Media stack fully operational (Jellyfin, arr stack, Nextcloud)
|
||||||
- ✅ Download pipeline with VPN isolation (Gluetun + qBittorrent)
|
- ✅ Download pipeline with VPN isolation (Gluetun + qBittorrent)
|
||||||
- ✅ NexusAI foundation services running (Qdrant, Ollama)
|
- ✅ NexusAI foundation services running (Qdrant, Ollama, llama.cpp)
|
||||||
- ✅ Container management across nodes (Portainer + agent)
|
- ✅ Container management across nodes (Portainer + agent)
|
||||||
|
|
||||||
---
|
---
|
||||||
@@ -249,6 +250,6 @@ Phase 2 shifts focus to resilience, security hardening, and smart home integrati
|
|||||||
- **Additional security hardening** — Audit exposed services, tighten firewall rules, review Authelia policies
|
- **Additional security hardening** — Audit exposed services, tighten firewall rules, review Authelia policies
|
||||||
- **IP webcam integration** — Add camera feeds into the homelab ecosystem
|
- **IP webcam integration** — Add camera feeds into the homelab ecosystem
|
||||||
- **Home Assistant** — Integrate smart home automation and sensor data
|
- **Home Assistant** — Integrate smart home automation and sensor data
|
||||||
- **Continued NexusAI development** — Entities layer, embedding service, inference and orchestration buildout
|
- **Continued NexusAI development** — Entity extraction pipeline, summaries layer, SettingsView implementation
|
||||||
|
|
||||||
> This section will be expanded as Phase 2 planning matures.
|
> This section will be expanded as Phase 2 planning matures.
|
||||||
283
docs/services/API-routes.md
Normal file
@@ -0,0 +1,283 @@
# API Routes

All HTTP endpoints across NexusAI services. Clients communicate only with
the orchestration service (port 4000) — memory service routes are listed
here for reference and direct debugging use.

---

## Orchestration Service — port 4000

### Health

| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |

### Chat

| Method | Path | Description |
|---|---|---|
| POST | /chat | Send a message, receive full response |
| POST | /chat/stream | Send a message, receive SSE token stream |

**POST /chat and POST /chat/stream — request body:**
```json
{
  "sessionId": "your-session-uuid",
  "message": "Hello, my name is Tim.",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "temperature": 0.7
}
```
`model` and `temperature` are optional.

**POST /chat — response:**
```json
{
  "sessionId": "your-session-uuid",
  "response": "Hello Tim! How can I help you today?",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "tokenCount": 87
}
```

**POST /chat/stream — response (SSE):**
```
data: {"text":"Hello"}
data: {"text":" Tim"}
data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":87}
```
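A client consuming the stream needs to join the `text` fragments and pick out the final `done` event. The following is a minimal sketch of that step, assuming the whole stream body is already available as a string; `collectSseText` is a hypothetical helper and not part of the chat client.

```javascript
// Hypothetical helper: folds "data: {...}" lines into the full reply text
// plus the final summary event (the one carrying `done: true`).
function collectSseText(rawStream) {
  let text = '';
  let summary = null;
  for (const line of rawStream.split(/\r?\n/)) {
    if (!line.startsWith('data:')) continue;          // ignore blank lines and other fields
    const event = JSON.parse(line.slice('data:'.length).trim());
    if (event.done) summary = event;
    else if (typeof event.text === 'string') text += event.text;
  }
  return { text, summary };
}

const rawStream = [
  'data: {"text":"Hello"}',
  'data: {"text":" Tim"}',
  'data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":87}',
].join('\n');

const result = collectSseText(rawStream);
console.log(result.text, result.summary.tokenCount);
```

A real client would process chunks incrementally as they arrive rather than buffering the full body; the folding logic stays the same.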

### Sessions

| Method | Path | Description |
|---|---|---|
| GET | /sessions | Paginated session list |
| GET | /sessions/:sessionId/history | Paginated episode history for a session |
| PATCH | /sessions/:sessionId | Update session name and/or project assignment |
| DELETE | /sessions/:sessionId | Delete session and all its episodes |

**GET /sessions — query params:**

| Param | Default | Description |
|---|---|---|
| limit | 20 | Sessions per page |
| offset | 0 | Pagination offset |
| projectId | — | Filter by project (integer ID) |

**PATCH /sessions/:sessionId — body:**
```json
{ "name": "My Session", "projectId": 3 }
```
Either `name` or `projectId` is required. Both can be sent together.
Returns the updated session object.

**GET /sessions/:sessionId/history — query params:**

| Param | Default | Description |
|---|---|---|
| limit | 20 | Episodes per page |
| offset | 0 | Pagination offset |

Returns `{ sessionId, episodes: [...] }`. Episodes ordered newest first.
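As an illustration of the pagination parameters, a request URL for the history endpoint could be assembled like this. `buildHistoryUrl` is a hypothetical helper, and the base URL is only an example taken from the orchestration address used elsewhere in these docs.

```javascript
// Hypothetical helper: builds a history URL using the documented defaults
// (limit 20, offset 0) when no paging options are supplied.
function buildHistoryUrl(baseUrl, sessionId, { limit = 20, offset = 0 } = {}) {
  const url = new URL(`/sessions/${encodeURIComponent(sessionId)}/history`, baseUrl);
  url.searchParams.set('limit', String(limit));
  url.searchParams.set('offset', String(offset));
  return url.toString();
}

// Third page of 20 episodes for one session.
const thirdPage = buildHistoryUrl('http://192.168.0.205:4000', 'your-session-uuid', {
  limit: 20,
  offset: 40,
});
console.log(thirdPage);
```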

### Projects

| Method | Path | Description |
|---|---|---|
| GET | /projects | Get all projects |
| POST | /projects | Create a new project |
| PATCH | /projects/:id | Update a project |
| DELETE | /projects/:id | Delete a project (nulls session assignments) |

**POST /projects — body:**
```json
{
  "name": "My Project",
  "description": "Optional description",
  "colour": "#3d3a79",
  "icon": null,
  "isolated": 0
}
```
`name` is required. All other fields optional. `isolated` is `0` or `1`.
Returns `201` with the created project object.

**PATCH /projects/:id — body:** same fields as POST, all optional.

### Models

| Method | Path | Description |
|---|---|---|
| GET | /models | Available models from `models.json` manifest |

Returns array: `[{ "value": "model-name.gguf", "label": "Display Name" }]`

---

## Memory Service — port 3002

Direct access is for debugging only. All client traffic goes through
orchestration.

### Health

| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |

### Sessions

| Method | Path | Description |
|---|---|---|
| POST | /sessions | Create a new session |
| GET | /sessions | Paginated session list with optional projectId filter |
| GET | /sessions/:id | Get session by internal ID |
| GET | /sessions/by-external/:externalId | Get session by external ID |
| PATCH | /sessions/by-external/:externalId | Update session fields |
| DELETE | /sessions/by-external/:externalId | Delete session (cascades to episodes) |

> Route ordering: `by-external/:externalId` must be defined before `/:id`
> to prevent `by-external` being captured as an ID param.

**POST /sessions — body:**
```json
{ "externalId": "unique-uuid", "metadata": {} }
```

**PATCH /sessions/by-external/:externalId — body:**
```json
{ "name": "Session Name", "projectId": 3 }
```
Both fields are optional. Only provided fields are updated — other fields
are not touched.

### Episodes

| Method | Path | Description |
|---|---|---|
| POST | /episodes | Create episode + auto-embed into Qdrant |
| GET | /episodes/search?q=&limit= | FTS keyword search across all episodes |
| GET | /episodes/:id | Get episode by ID |
| GET | /sessions/:id/episodes?limit=&offset= | Paginated episodes for a session |
| DELETE | /episodes/:id | Delete an episode |

> Route ordering: `/episodes/search` must be defined before `/episodes/:id`.
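The ordering rule can be illustrated with a toy first-match-wins matcher. This is not the service's router (the source does not show which routing library is used); `matchRoute` is a hypothetical helper that only mimics the common behaviour where routes are tried in definition order and `:name` segments match any value.

```javascript
// Hypothetical first-match-wins matcher, for illustration only.
function matchRoute(routes, path) {
  const segments = path.split('/').filter(Boolean);
  for (const pattern of routes) {
    const parts = pattern.split('/').filter(Boolean);
    if (parts.length !== segments.length) continue;
    const params = {};
    const matches = parts.every((part, i) => {
      if (part.startsWith(':')) {
        params[part.slice(1)] = segments[i];   // parameter segment matches anything
        return true;
      }
      return part === segments[i];             // literal segment must match exactly
    });
    if (matches) return { pattern, params };
  }
  return null;
}

// Wrong order: "search" is swallowed as an episode ID.
const wrongOrder = matchRoute(['/episodes/:id', '/episodes/search'], '/episodes/search');

// Right order: the literal route is tried first.
const rightOrder = matchRoute(['/episodes/search', '/episodes/:id'], '/episodes/search');

console.log(wrongOrder.pattern, rightOrder.pattern);
```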

**POST /episodes — body:**
```json
{
  "sessionId": 1,
  "userMessage": "Hello",
  "aiResponse": "Hi there!",
  "tokenCount": 10
}
```

### Projects

| Method | Path | Description |
|---|---|---|
| POST | /projects | Create a new project |
| GET | /projects | Get all projects |
| GET | /projects/:id | Get project by ID |
| PATCH | /projects/:id | Update a project |
| DELETE | /projects/:id | Delete project + null session assignments |

Same request/response shape as orchestration `/projects` above.

### Entities

| Method | Path | Description |
|---|---|---|
| POST | /entities | Upsert entity (creates or updates by name + type) |
| GET | /entities/by-type/:type | All entities of a given type |
| GET | /entities/:id | Get entity by ID |
| DELETE | /entities/:id | Delete entity (cascades to relationships) |

> Route ordering: `/entities/by-type/:type` must be before `/entities/:id`.

**POST /entities — body:**
```json
{
  "name": "NexusAI",
  "type": "project",
  "notes": "My AI memory project",
  "metadata": {}
}
```

### Relationships

| Method | Path | Description |
|---|---|---|
| POST | /relationships | Upsert a relationship between two entities |
| GET | /entities/:id/relationships | All relationships for an entity |
| DELETE | /relationships | Delete a specific relationship |

**POST /relationships — body:**
```json
{ "fromId": 1, "toId": 2, "label": "uses", "metadata": {} }
```

**DELETE /relationships — body:**
```json
{ "fromId": 1, "toId": 2, "label": "uses" }
```

Relationships are identified by the composite key `(fromId, toId, label)`.
Delete uses request body rather than URL params since this three-part key
is awkward to encode in a path.
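To make the composite-key behaviour concrete, here is a small in-memory sketch of upsert and delete keyed on `(fromId, toId, label)`. `RelationshipStore` and `relationshipKey` are hypothetical names; the real service stores relationships in SQLite, which this does not attempt to reproduce.

```javascript
// Hypothetical in-memory model of the composite key (fromId, toId, label).
const relationshipKey = ({ fromId, toId, label }) => JSON.stringify([fromId, toId, label]);

class RelationshipStore {
  constructor() {
    this.rows = new Map();
  }
  // Upsert: the same three-part key overwrites instead of duplicating.
  upsert(relationship) {
    this.rows.set(relationshipKey(relationship), { metadata: {}, ...relationship });
  }
  // Delete needs all three parts, mirroring the DELETE request body.
  remove(relationship) {
    return this.rows.delete(relationshipKey(relationship));
  }
  get size() {
    return this.rows.size;
  }
}

const store = new RelationshipStore();
store.upsert({ fromId: 1, toId: 2, label: 'uses' });
store.upsert({ fromId: 1, toId: 2, label: 'uses', metadata: { since: 'v1' } }); // same key
store.upsert({ fromId: 1, toId: 2, label: 'extends' });                          // new key
const sizeAfterUpserts = store.size;
const removed = store.remove({ fromId: 1, toId: 2, label: 'uses' });
console.log(sizeAfterUpserts, removed, store.size);
```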

---

## Embedding Service — port 3003

| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |
| POST | /embed | Embed a single text string |
| POST | /embed/batch | Embed an array of text strings |

**POST /embed — body:**
```json
{ "text": "Hello from NexusAI" }
```

**POST /embed — response:**
```json
{ "embedding": [0.123, -0.456, ...], "model": "nomic-embed-text", "dimensions": 768 }
```

---

## Inference Service — port 3001

| Method | Path | Description |
|---|---|---|
| GET | /health | Health check — reports active provider and model |
| POST | /complete | Full completion — awaits entire response |
| POST | /complete/stream | Streaming completion via SSE |

**POST /complete — body:**
```json
{
  "prompt": "What is the capital of France?",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "temperature": 0.7,
  "maxTokens": 1024
}
```
All fields except `prompt` are optional.

**POST /complete — response:**
```json
{
  "text": "The capital of France is Paris.",
  "model": "gemma-4-26B...gguf",
  "done": true,
  "evalCount": 8,
  "promptEvalCount": 41
}
```
128
docs/services/Memory-isolation.md
Normal file
@@ -0,0 +1,128 @@
# Memory Isolation

NexusAI implements project-scoped memory — sessions belonging to the same
project can share semantic context, and isolated projects can be restricted
from drawing on memory outside the project. This document describes how the
system works end-to-end.

## Concepts

**Session** — a single conversation thread. Identified by `external_id`.

**Project** — a named grouping of sessions. Has an `isolated` flag (0 or 1).

**Semantic search** — at inference time, the user's message is embedded and
compared against past episodes in Qdrant to surface relevant context. The
scope of this search is controlled by the project context.

## Semantic Search Scope

| Session state | Semantic search scope |
|---|---|
| No project | Own session's episodes only |
| Assigned to a non-isolated project | All episodes across all sessions in the project |
| Assigned to an isolated project | All episodes within the project only |
| Removed from a project | Own session's episodes only (from that point) |

Sessions with no project assigned behave the same as they always have —
only their own past episodes are searched.

## How It Works

### Step 1 — Project context resolution (orchestration)

In `chat/index.js`, immediately after session resolution:

```js
let projectSessionIds = null;
if (session.project_id) {
  const project = await memory.getProject(session.project_id);
  if (project) {
    const projectSessions = await memory.getProjectSessions(session.project_id);
    projectSessionIds = projectSessions.map(s => s.id);
  }
}
```

If the session belongs to any project (isolated or not), `projectSessionIds`
is populated with the internal integer IDs of all sessions in that project.

For **non-isolated projects**, this expands the search to all project sessions.
For **isolated projects**, the same set is used but the intent is restriction
— since `projectSessionIds` only contains project sessions, no external
episodes can appear.

Both cases use the same code path — the `isolated` flag does not change the
query logic, only the conceptual meaning.
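
The resolution step above can be exercised in isolation. The following is a minimal sketch, not the code in `chat/index.js`: the function name `resolveProjectSessionIds` and the in-memory `createFakeMemory` stand-in are hypothetical, and only the `getProject` / `getProjectSessions` calls come from the snippet above.

```js
// Hypothetical wrapper around the Step 1 logic. `memory` is any object that
// exposes getProject(id) and getProjectSessions(id), as in the snippet above.
async function resolveProjectSessionIds(session, memory) {
  if (!session.project_id) return null;            // no project: single-session scope
  const project = await memory.getProject(session.project_id);
  if (!project) return null;                       // project not found: same fallback
  const projectSessions = await memory.getProjectSessions(session.project_id);
  return projectSessions.map(s => s.id);           // internal integer IDs
}

// In-memory stand-in for the memory service client (illustration only).
function createFakeMemory(projects, sessions) {
  return {
    getProject: async (id) => projects.find(p => p.id === id) ?? null,
    getProjectSessions: async (id) => sessions.filter(s => s.project_id === id),
  };
}
```

With this stand-in, a session in an isolated project and a session in a non-isolated project both go through the same branch, which matches the note that the `isolated` flag does not alter the query logic.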

### Step 2 — Qdrant filter construction

In `services/qdrant.js`, `searchEpisodes` builds the filter:

```js
if (projectSessionIds) {
  body.filter = {
    should: projectSessionIds.map(id => ({
      key: 'sessionId', match: { value: id }
    }))
  };
} else if (sessionId) {
  body.filter = { must: [{ key: 'sessionId', match: { value: sessionId } }] };
}
```

`should` is Qdrant's "match any of" operator — equivalent to SQL
`WHERE sessionId IN (...)`. When `projectSessionIds` is set, the single-session
filter is not used.
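
As a rough illustration, the branch above can be pulled out into a pure function that is easy to unit test. `buildEpisodeFilter` is a hypothetical name, not a function in `services/qdrant.js`. One deliberate difference is an assumption of this sketch rather than documented behaviour: an empty ID list falls through to the single-session filter instead of producing an empty `should` clause.

```js
// Hypothetical helper mirroring the filter branch in searchEpisodes.
function buildEpisodeFilter({ projectSessionIds, sessionId } = {}) {
  // Project scope: match any episode whose sessionId is in the project.
  if (Array.isArray(projectSessionIds) && projectSessionIds.length > 0) {
    return {
      should: projectSessionIds.map(id => ({
        key: 'sessionId', match: { value: id },
      })),
    };
  }
  // Single-session scope: match only this session's episodes.
  if (sessionId != null) {
    return { must: [{ key: 'sessionId', match: { value: sessionId } }] };
  }
  // No scope information: the caller decides whether to search unfiltered.
  return undefined;
}
```

Keeping the filter construction free of I/O means the scoping rules can be checked without a running Qdrant instance.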

### Step 3 — Episode payloads

Every episode upserted into Qdrant carries `{ sessionId, createdAt }` in its
payload. `sessionId` here is the **internal integer ID** from SQLite. This
is what the Qdrant filter matches against.

This means the filter works correctly regardless of when episodes were created
or when a session was added to a project — the payload is immutable.
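
A minimal sketch of how such a payload might be assembled before an upsert is shown below. The helper name `buildEpisodePoint`, the use of the episode ID as the point ID, and the interpretation of `createdAt` as Unix seconds are all assumptions made for illustration; only the `{ sessionId, createdAt }` payload shape comes from this document.

```js
// Hypothetical helper: builds one point object for a Qdrant upsert request.
// Assumes createdAt is stored as Unix seconds and sessionId is sessions.id.
function buildEpisodePoint(episodeId, vector, sessionId, nowMs = Date.now()) {
  if (!Number.isInteger(sessionId)) {
    // Guards against passing the external_id UUID by mistake.
    throw new TypeError('sessionId must be the internal integer ID');
  }
  return {
    id: episodeId,
    vector,
    payload: { sessionId, createdAt: Math.floor(nowMs / 1000) },
  };
}
```

Because project membership is never written into the payload, nothing here needs to change when a session is later moved between projects.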

## Important Behaviours

**Pre-existing episodes are included immediately.** When a session is added
to a project and a new message is sent, Qdrant can match all of that session's
existing episodes since the filter only requires the `sessionId` to be in the
project's session list.

**Removing a session from a project takes effect immediately.** On the next
message, `getProjectSessions` will not include that session's ID, so its
episodes disappear from the semantic search scope.

**New sessions created from ProjectView are assigned after the first message.**
The `useChat` hook writes the `project_id` assignment via `updateSession` after
`onDone` fires. There is a brief window during the first message where the
session has no project assigned. The project is correctly applied from the
second message onward.
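
The first two behaviours can be illustrated with a small local simulation. This is a sketch only: `matchesFilter` is a simplified stand-in that evaluates `should` and `must` clauses against plain objects, it is not Qdrant's implementation, and all names here are hypothetical.

```js
// Simplified local evaluation of a filter against one payload (not Qdrant).
function matchesFilter(payload, filter) {
  const hit = cond => payload[cond.key] === cond.match.value;
  const mustOk = (filter.must ?? []).every(hit);
  const shouldOk = !filter.should || filter.should.some(hit);
  return mustOk && shouldOk;
}

// Builds the project-scoped filter from the project's current session IDs.
function projectFilter(sessionIds) {
  return {
    should: sessionIds.map(id => ({ key: 'sessionId', match: { value: id } })),
  };
}

// Returns the IDs of the episodes that fall inside the search scope.
function searchScope(episodes, sessionIds) {
  const filter = projectFilter(sessionIds);
  return episodes.filter(e => matchesFilter(e.payload, filter)).map(e => e.id);
}
```

For example, with episodes stored for sessions 1, 2 and 3, a project containing sessions `[1, 2]` scopes the search to the episodes of both sessions, including ones written before either session joined. Recomputing the scope with `[1]` after session 2 is removed drops its episodes without touching any stored payload.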

## Isolated vs Non-Isolated

The `isolated` flag is stored on the project but does not currently change the
query logic — both isolated and non-isolated projects result in a
`projectSessionIds` filter. The distinction is semantic and enforced by
the project's membership:

- **Non-isolated** — intentionally draws from all sessions in the project,
  creating a shared memory pool for related conversations
- **Isolated** — by design contains only sessions explicitly added to it,
  so the same filter naturally restricts context to project-only episodes

If cross-project contamination becomes a concern (e.g. a session accidentally
added to the wrong project), removing that session from the project immediately
restores isolation.

## Qdrant Payload Structure

Episodes are stored with this payload:
```json
{ "sessionId": 42, "createdAt": 1776080188 }
```

`sessionId` is the SQLite `sessions.id` integer, not the `external_id` UUID.
This is important when building filters — always use internal IDs.
@@ -55,10 +55,6 @@ VITE_ORCHESTRATION_URL=https://nexus.jellystorm.com
|
|||||||
during local development, bypassing Caddy and Authelia entirely:
|
during local development, bypassing Caddy and Authelia entirely:
|
||||||
|
|
||||||
```js
|
```js
|
||||||
// vite.config.js
|
|
||||||
import { defineConfig } from 'vite';
|
|
||||||
import react from '@vitejs/plugin-react';
|
|
||||||
|
|
||||||
export default defineConfig({
|
export default defineConfig({
|
||||||
plugins: [react()],
|
plugins: [react()],
|
||||||
server: {
|
server: {
|
||||||
@@ -72,7 +68,8 @@ export default defineConfig({
|
|||||||
});
|
});
|
||||||
```
|
```
|
||||||
|
|
||||||
If new routes are added to the orchestration service, add them here too.
|
When adding new top-level routes to the orchestration service, add a matching
|
||||||
|
entry here too.
|
||||||
|
|
||||||
## Internal Structure
|
## Internal Structure
|
||||||
|
|
||||||
@@ -93,12 +90,13 @@ src/
|
|||||||
│ ├── Sidebar.jsx # Left sidebar — projects, recent chats, navigation
|
│ ├── Sidebar.jsx # Left sidebar — projects, recent chats, navigation
|
||||||
│ ├── ChatWindow.jsx # Centre panel — message thread and input bar
|
│ ├── ChatWindow.jsx # Centre panel — message thread and input bar
|
||||||
│ ├── MessageBubble.jsx # Individual message bubble (user or assistant)
|
│ ├── MessageBubble.jsx # Individual message bubble (user or assistant)
|
||||||
│ ├── InfoPanel.jsx # Right panel — model selector and session metadata
|
│ ├── InfoPanel.jsx # Right panel — model selector and session metadata (slide-in)
|
||||||
│ ├── SessionModal.jsx # Modal for session rename and delete confirmation
|
│ ├── SessionModal.jsx # Modal for session rename, project assignment, delete
|
||||||
│ ├── ProjectModal.jsx # Modal for project create, edit, and delete confirmation
|
│ ├── ProjectModal.jsx # Modal for project create, edit, delete
|
||||||
│ ├── AllChatsView.jsx # Full paginated session list with multi-select bulk delete
|
│ ├── AllChatsView.jsx # Full paginated session list with multi-select bulk delete
|
||||||
│ ├── AllProjectsView.jsx # Project tile grid with create/edit/delete
|
│ ├── AllProjectsView.jsx # Project tile grid with create/edit/delete
|
||||||
│ └── SettingsView.jsx # Settings placeholder (sections: Appearance, Memory, Models, About)
|
│ ├── ProjectView.jsx # Individual project — session list, new chat button
|
||||||
|
│ └── SettingsView.jsx # Settings placeholder (Appearance, Memory, Models, About)
|
||||||
├── index.css # Global reset, CSS variables, utility classes
|
├── index.css # Global reset, CSS variables, utility classes
|
||||||
└── main.jsx # React entry point
|
└── main.jsx # React entry point
|
||||||
```
|
```
|
||||||
@@ -107,9 +105,9 @@ src/
|
|||||||
|
|
||||||
## Layout
|
## Layout
|
||||||
|
|
||||||
The app uses a view-based layout. `App.jsx` manages a `view` state
|
The app uses a view-based layout. `App.jsx` manages a `view` state string
|
||||||
(`'chat' | 'all-chats' | 'all-projects' | 'settings'`) that controls which
|
that controls which main panel is rendered. The left sidebar and right info
|
||||||
main panel is rendered. The left sidebar and right info panel are always present.
|
panel are persistent across all views.
|
||||||
|
|
||||||
```
|
```
|
||||||
┌──────────────────┬──────────────────────────────┐
|
┌──────────────────┬──────────────────────────────┐
|
||||||
@@ -117,9 +115,9 @@ main panel is rendered. The left sidebar and right info panel are always present
|
|||||||
│ (collapsible) │ │
|
│ (collapsible) │ │
|
||||||
│ │ chat → ChatWindow │
|
│ │ chat → ChatWindow │
|
||||||
│ + New Chat │ all-chats → AllChatsView │
|
│ + New Chat │ all-chats → AllChatsView │
|
||||||
│ ⊞ New Project │ all-projects → AllProjectsView│
|
│ ⊞ View Projects │ all-projects → AllProjectsView│
|
||||||
│ │ settings → SettingsView │
|
│ │ project → ProjectView │
|
||||||
│ PROJECTS ▾ │ │
|
│ PROJECTS ▾ │ settings → SettingsView │
|
||||||
│ [tile] [tile] │ │
|
│ [tile] [tile] │ │
|
||||||
│ All Projects → │ │
|
│ All Projects → │ │
|
||||||
│ │ │
|
│ │ │
|
||||||
@@ -132,10 +130,22 @@ main panel is rendered. The left sidebar and right info panel are always present
|
|||||||
└──────────────────┴──────────────────────────────┘
|
└──────────────────┴──────────────────────────────┘
|
||||||
```
|
```
|
||||||
|
|
||||||
The sidebar collapses to a 48px icon rail. The right info panel (`InfoPanel`)
|
The sidebar collapses to a 48px icon rail. The right `InfoPanel` slides in
|
||||||
slides in from the right over the main area using `transform: translateX()` —
|
from the right using `transform: translateX()` — hidden by default, toggled
|
||||||
it is hidden by default (`rightOpen` starts `false`) and toggled via a button
|
via the `⊹` button in the `ChatWindow` header.
|
||||||
in the `ChatWindow` header.
|
|
||||||
|
## View Routing
|
||||||
|
|
||||||
|
| View | Component | Trigger |
|
||||||
|
|---|---|---|
|
||||||
|
| `'chat'` | `ChatWindow` | Default; selecting a session; new chat |
|
||||||
|
| `'all-chats'` | `AllChatsView` | "All Chats →" or ☰ icon in collapsed rail |
|
||||||
|
| `'all-projects'` | `AllProjectsView` | "View Projects" button or ⊞ icon |
|
||||||
|
| `'project'` | `ProjectView` | Clicking a project tile in the sidebar |
|
||||||
|
| `'settings'` | `SettingsView` | Settings button or ⚙ icon |
|
||||||
|
|
||||||
|
`activeProject` state in `App.jsx` tracks which project `ProjectView` is
|
||||||
|
displaying. Set via `onSelectProject` before navigating to `'project'`.
|
||||||
|
|
||||||
## CSS Architecture
|
## CSS Architecture
|
||||||
|
|
||||||
@@ -181,91 +191,47 @@ rules, inline styles for dynamic prop-driven values.
|
|||||||
| `.label-upper` | Uppercase section label style |
|
| `.label-upper` | Uppercase section label style |
|
||||||
| `.truncate` | Text overflow ellipsis |
|
| `.truncate` | Text overflow ellipsis |
|
||||||
|
|
||||||
## API Layer
|
|
||||||
|
|
||||||
All orchestration calls are centralised in `src/api/orchestration.js`:
|
|
||||||
|
|
||||||
| Function | Method | Path | Description |
|
|
||||||
|---|---|---|---|
|
|
||||||
| `fetchSessions` | GET | /sessions | Load session list for sidebar |
|
|
||||||
| `fetchSessionHistory` | GET | /sessions/:id/history | Load episode history on session select |
|
|
||||||
| `sendMessage` | POST | /chat | Send message, await full response |
|
|
||||||
| `streamMessage` | POST | /chat/stream | Send message, receive SSE token stream |
|
|
||||||
| `fetchModels` | GET | /models | Load available models from manifest |
|
|
||||||
| `renameSession` | PATCH | /sessions/:id | Rename a session |
|
|
||||||
| `deleteSession` | DELETE | /sessions/:id | Delete a session |
|
|
||||||
| `fetchProjects` | GET | /projects | Load project list |
|
|
||||||
| `createProject` | POST | /projects | Create a new project |
|
|
||||||
| `updateProject` | PATCH | /projects/:id | Update a project |
|
|
||||||
| `deleteProject` | DELETE | /projects/:id | Delete a project |
|
|
||||||
|
|
||||||
`streamMessage` returns an abort function — call it to cancel a stream mid-flight.
|
|
||||||
Uses a buffer pattern to handle SSE chunks that may span multiple network packets.
|
|
||||||
|
|
||||||
## Streaming
|
## Streaming
|
||||||
|
|
||||||
The chat input sends messages via `POST /chat/stream`. Tokens arrive as SSE events:
|
Messages are sent via `POST /chat/stream`. Tokens arrive as SSE events and
|
||||||
|
are written into the active assistant bubble token by token via
|
||||||
|
`updateLastMessage`. The blinking cursor in `MessageBubble` is shown while
|
||||||
|
`message.streaming === true`.
|
||||||
|
|
||||||
```
|
`useChat` accepts an optional `projectId` parameter in `sendMessage`. After
|
||||||
data: {"text":"Hello"}
|
the first message completes in a new session, if `projectId` is set,
|
||||||
data: {"text":" Tim"}
|
`updateSession` is called to write the project assignment to the backend.
|
||||||
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87}
|
|
||||||
```
|
|
||||||
|
|
||||||
An empty assistant bubble is appended immediately when the stream opens, then
|
|
||||||
updated token by token using `updateLastMessage`. The blinking cursor in
|
|
||||||
`MessageBubble` is shown while `message.streaming === true` and disappears
|
|
||||||
when the done event is received. Model name and token count from the done
|
|
||||||
event are stored in `useChat` state and displayed in the InfoPanel.
|
|
||||||
|
|
||||||
## Dynamic Model Selector
|
|
||||||
|
|
||||||
Available models are fetched from `GET /models` on mount via the `useModels` hook.
|
|
||||||
The hook initialises with `FALLBACK_MODELS` from `constants.js` and replaces them
|
|
||||||
with the server response on success. If the fetch fails, the fallback list is used
|
|
||||||
silently — a warning is logged to the console.
|
|
||||||
|
|
||||||
To add a model, update `models.json` on the main PC — no client rebuild needed.
|
|
||||||
|
|
||||||
`FALLBACK_MODELS` in `constants.js` should be kept in sync with `models.json`
|
|
||||||
as a reasonable last-resort list in case the endpoint is unreachable.
|
|
||||||
|
|
||||||
## Session Management
|
## Session Management
|
||||||
|
|
||||||
Sessions are identified by `external_id` — a UUID generated client-side via the
|
Sessions are identified by `external_id` — a UUID generated client-side via
|
||||||
`uuid` package. New sessions are created locally and auto-registered in the memory
|
the `uuid` package. New sessions are created locally and auto-registered in
|
||||||
service on the first message. The session list refreshes after each completed
|
the memory service on the first message. The session list refreshes after
|
||||||
response to surface newly created sessions.
|
each completed response to surface newly created sessions.
|
||||||
|
|
||||||
### Session Name Display
|
### Auto-naming
|
||||||
|
|
||||||
The chat header and session rows both display `session.name` if set, falling back
|
After the first exchange completes, orchestration fires a secondary inference
|
||||||
to `session.external_id` if no name has been assigned:
|
call with a short naming prompt (max 20 tokens, temperature 0.3). The result
|
||||||
|
is written back as `session.name`. The client fires a second `refreshSessions`
|
||||||
|
after a 3-second delay to pick up the name once written.
|
||||||
|
|
||||||
```js
|
Manually renamed sessions are never overwritten — the `!session.name` guard
|
||||||
activeSession.name || activeSession.external_id
|
in `chat/index.js` prevents this.
|
||||||
```
|
|
||||||
|
|
||||||
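
The auto-naming flow described above could look roughly like the sketch below. It is not the code in `chat/index.js`: the function name, the prompt wording, the `maxTokens` option name, and the injected `infer` / `updateSession` callbacks are all hypothetical. Only the 20-token limit, the 0.3 temperature, and the `!session.name` guard come from the description.

```js
// Limits taken from the description above; the option names are assumptions.
const NAMING_OPTIONS = { maxTokens: 20, temperature: 0.3 };

// Hypothetical sketch. `infer(prompt, options)` is assumed to resolve to an
// object with a `text` field; `updateSession(id, fields)` persists the name.
async function maybeAutoName(session, firstUserMessage, { infer, updateSession }) {
  // Guard: a session that already has a name is never overwritten.
  if (session.name) return session.name;

  // Placeholder prompt wording, for illustration only.
  const prompt = `Give a short title for a conversation that starts with:\n${firstUserMessage}`;
  const result = await infer(prompt, NAMING_OPTIONS);
  const name = (result.text || '').trim();

  if (name) await updateSession(session.id, { name });
  return name || null;
}
```

Passing the inference and persistence functions in as arguments keeps the guard testable without a running inference service.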
### Session Actions
|
### Session Actions
|
||||||
|
|
||||||
Session rows in the sidebar support rename and delete via two entry points:
|
Session rows support rename, project assignment, and delete via:
|
||||||
|
- **Hover** — reveals ✎ and ✕ icon buttons alongside the row
|
||||||
|
- **Right-click** — context menu with the same actions
|
||||||
|
|
||||||
- **Hover** — reveals ✎ (rename) and ✕ (delete) icon buttons alongside the row
|
`SessionModal` handles rename and project assignment together in `settings`
|
||||||
- **Right-click** — opens a context menu with the same actions
|
mode, and delete confirmation in `confirm-delete` mode.
|
||||||
|
|
||||||
Both trigger `SessionModal` — a shared modal component with two modes:
|
|
||||||
|
|
||||||
| Mode | Trigger | Behaviour |
|
|
||||||
|---|---|---|
|
|
||||||
| `settings` | Rename button / context menu rename | Shows name input, saves on Enter or Save button |
|
|
||||||
| `confirm-delete` | Delete button / context menu delete | Shows confirmation dialog, requires explicit Delete click |
|
|
||||||
|
|
||||||
Actions are disabled on unsaved (new) sessions that haven't had a first message sent yet.
|
|
||||||
|
|
||||||
### Active Session Clearing on Delete
|
### Active Session Clearing on Delete
|
||||||
|
|
||||||
When the deleted session is the currently active one, `App.jsx` detects the match
|
When the deleted session is the currently active one, `App.jsx` clears the
|
||||||
and calls `selectSession(null)` to clear the chat window before refreshing the list:
|
chat window before refreshing the list:
|
||||||
|
|
||||||
```js
|
```js
|
||||||
function handleSessionsChange(deletedSession) {
|
function handleSessionsChange(deletedSession) {
|
||||||
@@ -276,53 +242,23 @@ function handleSessionsChange(deletedSession) {
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
### Context Menu
|
### Key Patterns
|
||||||
|
|
||||||
Implemented via `useContextMenu` hook — tracks `{ x, y, session }` state and
|
- Button nesting: action icons are siblings of row buttons, not children — HTML forbids `<button>` inside `<button>`
|
||||||
attaches a `window` click listener to dismiss on any outside click. Rendered
|
- Context menu rendered outside sidebar via React fragment to avoid `overflow: hidden` clipping
|
||||||
outside the sidebar div via a React fragment to avoid being clipped by
|
- `useContextMenu` dismisses on a `window` click listener
|
||||||
`overflow: hidden`.
|
- Dynamic `updateSession` SQL builds `SET` clause from only the fields passed — prevents accidental overwrites
|
||||||
|
|
||||||
### Button Nesting
|
|
||||||
|
|
||||||
Session row action icons (✎ ✕) are rendered as siblings of the session
|
|
||||||
`<button>`, not children — HTML does not allow `<button>` inside `<button>`.
|
|
||||||
The outer `<div>` owns hover state and context menu; the inner `<button>` handles
|
|
||||||
session selection; action icon buttons sit alongside it in the same flex row.
|
|
||||||
|
|
||||||
## Project Management
|
## Project Management
|
||||||
|
|
||||||
Projects are a first-class concept in the UI. The `useProjects` hook fetches
|
`useProjects` fetches the project list from `GET /projects` on mount and
|
||||||
the project list from `GET /projects` on mount and exposes a `refreshProjects`
|
exposes `refreshProjects` for keeping the sidebar in sync after mutations.
|
||||||
callback for keeping the sidebar in sync after mutations.
|
|
||||||
|
|
||||||
### Project Actions
|
`ProjectModal` handles create, edit, and delete confirmation. Fields: name
|
||||||
|
(required), description (optional), colour picker, isolated toggle.
|
||||||
|
|
||||||
Projects are managed from `AllProjectsView` via `ProjectModal`:
|
`ProjectView` shows the project's name, description, isolated badge (if set),
|
||||||
|
and a filtered session list. The "+ New Chat" button creates a new session,
|
||||||
|
navigates to `'chat'`, and writes the project assignment after the first message.
|
||||||
|
|
||||||
| Mode | Behaviour |
|
For memory isolation behaviour, see `memory-isolation.md`.
|
||||||
|---|---|
|
|
||||||
| `create` | Name (required), description (optional), colour picker |
|
|
||||||
| `edit` | Same fields as create, pre-populated |
|
|
||||||
| `confirm-delete` | Confirmation dialog — sessions in the project are not deleted |
|
|
||||||
|
|
||||||
The sidebar Projects section shows up to 6 project tiles as coloured badge buttons.
|
|
||||||
Clicking any tile navigates to `AllProjectsView`. The "All Projects →" link is
|
|
||||||
always shown below the tiles.
|
|
||||||
|
|
||||||
After any create, edit, or delete in `AllProjectsView`, `onProjectsChange` is called
|
|
||||||
to trigger `refreshProjects` in `App.jsx`, keeping the sidebar tiles in sync.
|
|
||||||
|
|
||||||
## View Routing
|
|
||||||
|
|
||||||
`App.jsx` manages a `view` state string that controls which main panel renders:
|
|
||||||
|
|
||||||
| View | Component | Trigger |
|
|
||||||
|---|---|---|
|
|
||||||
| `'chat'` | `ChatWindow` | Default; selecting a session from sidebar or AllChatsView |
|
|
||||||
| `'all-chats'` | `AllChatsView` | "All Chats →" link or ☰ icon in collapsed rail |
|
|
||||||
| `'all-projects'` | `AllProjectsView` | "All Projects →" link, ⊞ icon, or New Project button |
|
|
||||||
| `'settings'` | `SettingsView` | Settings button or ⚙ icon in collapsed rail |
|
|
||||||
|
|
||||||
`AllChatsView` navigates back to `'chat'` on session row click, passing the selected
|
|
||||||
session to `selectSession` so history loads immediately.
|
|
||||||
@@ -27,80 +27,43 @@ minimizing network hops on the memory write path.
|
|||||||
| OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL |
|
| OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL |
|
||||||
| EMBEDDING_MODEL | No | nomic-embed-text | Ollama embedding model to use |
|
| EMBEDDING_MODEL | No | nomic-embed-text | Ollama embedding model to use |
|
||||||
|
|
||||||
|
> Ollama must be running with `OLLAMA_HOST=0.0.0.0` to accept LAN connections
|
||||||
|
> from other services.
|
||||||
|
|
||||||
## Model
|
## Model
|
||||||
|
|
||||||
**nomic-embed-text** via Ollama produces **768-dimension** vectors using **Cosine similarity**.
|
**nomic-embed-text** via Ollama produces **768-dimension** vectors with
|
||||||
This must match the `QDRANT.VECTOR_SIZE` constant in `@nexusai/shared`.
|
**Cosine similarity**. This must match `QDRANT.VECTOR_SIZE` in `@nexusai/shared`.
|
||||||
|
|
||||||
If the embedding model is changed, the Qdrant collections must be reinitialized
|
If the embedding model is changed, the Qdrant collections must be reinitialized
|
||||||
with the new vector dimension — updating `QDRANT.VECTOR_SIZE` in `constants.js` is
|
with the new vector dimension. Updating `QDRANT.VECTOR_SIZE` in `constants.js`
|
||||||
the single change required to keep everything consistent.
|
is the single change required to keep everything consistent.
|
||||||
|
|
||||||
## Ollama API
|
## Ollama API
|
||||||
|
|
||||||
Uses the `/api/embed` endpoint (Ollama v0.4+). Request shape:
|
Uses the `/api/embed` endpoint (Ollama v0.4+):
|
||||||
|
|
||||||
```json
|
```json
|
||||||
|
// Request
|
||||||
{ "model": "nomic-embed-text", "input": "text to embed" }
|
{ "model": "nomic-embed-text", "input": "text to embed" }
|
||||||
```
|
|
||||||
Response key is `embeddings[0]` — an array of 768 floats.
|
|
||||||
|
|
||||||
## Endpoints
|
// Response key
|
||||||
|
embeddings[0] // array of 768 floats
|
||||||
### Health
|
|
||||||
|
|
||||||
| Method | Path | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| GET | /health | Service health check |
|
|
||||||
|
|
||||||
### Embed
|
|
||||||
|
|
||||||
| Method | Path | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| POST | /embed | Embed a single text string |
|
|
||||||
| POST | /embed/batch | Embed an array of text strings |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
**POST /embed**
|
|
||||||
|
|
||||||
Embeds a single text string and returns the vector.
|
|
||||||
|
|
||||||
Request body:
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"text": "Hello from NexusAI"
|
|
||||||
}
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Response:
|
> Earlier Ollama versions used `/api/embeddings` with a `prompt` key and
|
||||||
```json
|
> returned `embedding` (singular). Use `/api/embed`, `input`, and
|
||||||
{
|
> `embeddings[0]` for Ollama v0.4+.
|
||||||
"embedding": [0.123, -0.456, ...],
|
|
||||||
"model": "nomic-embed-text",
|
|
||||||
"dimensions": 768
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
## Usage in NexusAI
|
||||||
|
|
||||||
**POST /embed/batch**
|
The embedding service is called in two places:
|
||||||
|
|
||||||
Embeds an array of strings sequentially and returns all vectors in the same order.
|
1. **Memory service** — after each episode is saved to SQLite, the combined
|
||||||
Ollama does not natively parallelize embeddings, so requests are processed one at a time.
|
`User: ..\nAssistant: ..` text is embedded and upserted into Qdrant.
|
||||||
|
This is fire-and-forget — failures are logged but don't affect the response.
|
||||||
|
|
||||||
Request body:
|
2. **Orchestration service** — the user's message is embedded at the start of
|
||||||
```json
|
the chat pipeline to perform semantic search against past episodes.
|
||||||
{
|
|
||||||
"texts": ["first sentence", "second sentence"]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
Response:
|
For all HTTP endpoints, see `api-routes.md`.
|
||||||
```json
|
|
||||||
{
|
|
||||||
"embeddings": [[0.123, ...], [0.456, ...]],
|
|
||||||
"model": "nomic-embed-text",
|
|
||||||
"dimensions": 768,
|
|
||||||
"count": 2
|
|
||||||
}
|
|
||||||
```
|
|
||||||
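
The request and response shapes described under "Ollama API" can be sketched as a small client function. This is an illustrative sketch, not the embedding service's implementation: `embedText` and its options are hypothetical names, and the injectable `fetchImpl` exists only so the function can be tested without a running Ollama instance.

```js
// Sketch of a call to Ollama's /api/embed endpoint (Ollama v0.4+ shape).
// Defaults mirror OLLAMA_URL and EMBEDDING_MODEL from the table above.
async function embedText(text, {
  baseUrl = 'http://localhost:11434',
  model = 'nomic-embed-text',
  expectedSize = 768,          // should agree with QDRANT.VECTOR_SIZE
  fetchImpl = fetch,           // global fetch; replaceable in tests
} = {}) {
  const res = await fetchImpl(`${baseUrl}/api/embed`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, input: text }),
  });
  if (!res.ok) throw new Error(`Ollama embed failed: HTTP ${res.status}`);

  const data = await res.json();
  const vector = data.embeddings?.[0];   // response key is embeddings[0]
  if (!Array.isArray(vector) || vector.length !== expectedSize) {
    throw new Error(`Expected a ${expectedSize}-dimension vector`);
  }
  return vector;
}
```

Checking the vector length at the call site surfaces a model or dimension mismatch early, before a wrongly sized vector reaches Qdrant.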
@@ -24,20 +24,19 @@ to switch inference backends without changes to the rest of the system.
|
|||||||
| Variable | Required | Default | Description |
|
| Variable | Required | Default | Description |
|
||||||
|---|---|---|---|
|
|---|---|---|---|
|
||||||
| PORT | No | 3001 | Port to listen on |
|
| PORT | No | 3001 | Port to listen on |
|
||||||
| INFERENCE_PROVIDER | No | llamacpp | Active inference provider (`ollama` or `llamacpp`) |
|
| INFERENCE_PROVIDER | No | llamacpp | Active provider (`ollama` or `llamacpp`) |
|
||||||
| INFERENCE_URL | No | http://localhost:8080 | URL of the inference runtime |
|
| INFERENCE_URL | No | http://localhost:8080 | URL of the inference runtime |
|
||||||
| DEFAULT_MODEL | No | local-model | Default model name passed to the provider |
|
| DEFAULT_MODEL | No | local-model | Default model name passed to the provider |
|
||||||
|
|
||||||
> `INFERENCE_URL` points to `llama-server` directly (port 8080), not to this
|
> `INFERENCE_URL` points to `llama-server` directly (port 8080), not to this
|
||||||
> service itself. The orchestration service uses `INFERENCE_SERVICE_URL` to
|
> service. The orchestration service uses `INFERENCE_SERVICE_URL` to reach
|
||||||
> reach this service on port 3001.
|
> this service on port 3001.
|
||||||
|
|
||||||
## Provider Architecture
|
## Provider Architecture
|
||||||
|
|
||||||
The inference service uses a provider pattern to abstract the underlying
|
The active provider is selected at startup via `INFERENCE_PROVIDER` and
|
||||||
LLM runtime. The active provider is selected at startup via `INFERENCE_PROVIDER`
|
loaded from `src/providers/`. Both providers expose identical function
|
||||||
and loaded from `src/providers/`. Both providers expose identical function
|
signatures.
|
||||||
signatures, so the rest of the service is unaware of which backend is active.
|
|
||||||
|
|
||||||
### Supported Providers
|
### Supported Providers
|
||||||
|
|
||||||
@@ -46,28 +45,36 @@ signatures, so the rest of the service is unaware of which backend is active.
|
|||||||
| llama.cpp | `llamacpp` | llama.cpp server (OpenAI-compatible API) — **current default** |
|
| llama.cpp | `llamacpp` | llama.cpp server (OpenAI-compatible API) — **current default** |
|
||||||
| Ollama | `ollama` | Ollama via the `ollama` npm package — available as fallback |
|
| Ollama | `ollama` | Ollama via the `ollama` npm package — available as fallback |
|
||||||
|
|
||||||
Switching providers requires only a `.env` change — no code modifications needed:
|
Switching providers requires only a `.env` change — no code modifications:
|
||||||
```
|
```
|
||||||
INFERENCE_PROVIDER=llamacpp
|
INFERENCE_PROVIDER=llamacpp
|
||||||
INFERENCE_URL=http://localhost:8080
|
INFERENCE_URL=http://localhost:8080
|
||||||
```
|
```
|
||||||
|
|
||||||
### Provider Validation
|
The provider loader throws immediately on an unknown value, preventing silent
|
||||||
|
misconfiguration.
|
||||||
|
|
||||||
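
A fail-fast check of this kind might look like the sketch below. It is an assumption-laden illustration rather than the contents of `infer.js`: the function name `resolveProviderName` is hypothetical, the default of `llamacpp` is taken from the environment variable table, and the error wording follows the message this page previously documented.

```js
// Provider identifiers listed under "Supported Providers".
const VALID_PROVIDERS = ['ollama', 'llamacpp'];

// Hypothetical sketch: validates INFERENCE_PROVIDER before any provider
// module is loaded, so a typo fails at startup rather than at first request.
function resolveProviderName(env = process.env) {
  const name = env.INFERENCE_PROVIDER || 'llamacpp';
  if (!VALID_PROVIDERS.includes(name)) {
    throw new Error(
      `Unknown inference provider: "${name}". Valid options: ${VALID_PROVIDERS.join(', ')}`
    );
  }
  // A real loader would presumably go on to load src/providers/<name>.js.
  return name;
}
```

Taking the environment as a parameter makes the validation testable without mutating `process.env`.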
|
## Internal Structure
|
||||||
|
|
||||||
The provider loader validates `INFERENCE_PROVIDER` at startup and throws immediately
|
|
||||||
if an unknown value is set — prevents silent misconfiguration:
|
|
||||||
```
|
```
|
||||||
Error: Unknown inference provider: "foo". Valid options: ollama, llamacpp
|
src/
|
||||||
|
├── providers/
|
||||||
|
│ ├── ollama.js # Ollama provider
|
||||||
|
│ └── llamacpp.js # llama.cpp provider (OpenAI-compatible REST)
|
||||||
|
├── routes/
|
||||||
|
│ └── inference.js # /complete and /complete/stream route handlers
|
||||||
|
├── infer.js # Provider loader — selects and re-exports active provider
|
||||||
|
└── index.js # Express app + route definitions
|
||||||
```
|
```
|
||||||
|
|
||||||
## llama.cpp Provider
|
## llama.cpp Provider
|
||||||
|
|
||||||
The llama.cpp provider uses the OpenAI-compatible REST API exposed by `llama-server`.
|
Uses the OpenAI-compatible REST API exposed by `llama-server`.
|
||||||
|
|
||||||
### Starting llama-server
|
### Starting llama-server
|
||||||
|
|
||||||
`llama-server` must be started manually on the main PC before the inference service
|
Must be started manually on the main PC before the inference service can
|
||||||
can handle requests. It loads a single model at startup:
|
handle requests:
|
||||||
|
|
||||||
```powershell
|
```powershell
|
||||||
.\llama-gpu\llama-server.exe `
|
.\llama-gpu\llama-server.exe `
|
||||||
@@ -79,40 +86,29 @@ can handle requests. It loads a single model at startup:
|
|||||||
-c 64000
|
-c 64000
|
||||||
```
|
```
|
||||||
|
|
||||||
Key flags:
|
|
||||||
|
|
||||||
| Flag | Description |
|
| Flag | Description |
|
||||||
|---|---|
|
|---|---|
|
||||||
| `-m` | Path to the `.gguf` model file |
|
|
||||||
| `-ngl 99` | Offload as many layers as possible to GPU |
|
| `-ngl 99` | Offload as many layers as possible to GPU |
|
||||||
| `--reasoning off` | Disables thinking/reasoning delay on Gemma 4 models |
|
| `--reasoning off` | Disables thinking delay on Gemma 4 models |
|
||||||
| `--host 0.0.0.0` | Allows connections from other machines on the LAN |
|
| `--host 0.0.0.0` | Allows LAN connections |
|
||||||
| `--port 8080` | Port for the llama-server HTTP API |
|
|
||||||
| `-c 64000` | Context window size in tokens |
|
| `-c 64000` | Context window size in tokens |
|
||||||
|
|
||||||
> `-c 64000` is intentionally large. Monitor VRAM usage — if pressure builds,
|
> `-c 64000` is intentionally large. NexusAI's memory architecture handles
|
||||||
> reduce this value. The NexusAI memory architecture handles context injection
|
> context injection so 6–8K is often sufficient if VRAM pressure builds.
|
||||||
> so a smaller window (6–8K) is often sufficient.
|
|
||||||
|
|
||||||
### Model Naming
|
### Model Naming
|
||||||
|
|
||||||
The model name sent in API requests must match the name as reported by
|
The model name in requests must match the name reported by `llama-server`
|
||||||
`llama-server` — including the `.gguf` extension. The reported name can be
|
including the `.gguf` extension:
|
||||||
verified with:
|
|
||||||
|
|
||||||
```powershell
|
```powershell
|
||||||
Invoke-RestMethod -Uri "http://192.168.0.79:8080/v1/models"
|
Invoke-RestMethod -Uri "http://192.168.0.79:8080/v1/models"
|
||||||
```
|
```
|
||||||
|
|
||||||
Set `DEFAULT_MODEL` in `.env` to the exact reported name:
|
Set `DEFAULT_MODEL` in `.env` to the exact reported name.
|
||||||
```
|
|
||||||
DEFAULT_MODEL=gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf
|
|
||||||
```
|
|
||||||
|
|
||||||
### Inference Parameters
|
### Inference Parameters
|
||||||
|
|
||||||
The llamacpp provider maps NexusAI options to OpenAI-compatible fields:
|
|
||||||
|
|
||||||
| NexusAI option | API field | Default |
|
| NexusAI option | API field | Default |
|
||||||
|---|---|---|
|
|---|---|---|
|
||||||
| `temperature` | `temperature` | 0.7 |
|
| `temperature` | `temperature` | 0.7 |
|
||||||
@@ -122,18 +118,6 @@ The llamacpp provider maps NexusAI options to OpenAI-compatible fields:
|
|||||||
| `repeatPenalty` | `repeat_penalty` | 1.1 |
|
| `repeatPenalty` | `repeat_penalty` | 1.1 |
|
||||||
| `seed` | `seed` | null (random) |
|
| `seed` | `seed` | null (random) |
|
||||||
|
|
||||||
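The mapping in the table can be sketched as a small lookup. This covers only the rows visible in this diff (`temperature`, `repeatPenalty`, `seed`); `OPTION_MAP` and `toApiFields` are illustrative names, and omitting a `null` seed from the request body is an assumption about how "random" is expressed.

```javascript
// Only the rows visible in the table; other options are left out of this sketch.
const OPTION_MAP = {
  temperature: { field: 'temperature', fallback: 0.7 },
  repeatPenalty: { field: 'repeat_penalty', fallback: 1.1 },
  seed: { field: 'seed', fallback: null }, // null means a random seed
};

function toApiFields(options = {}) {
  const body = {};
  for (const [name, { field, fallback }] of Object.entries(OPTION_MAP)) {
    // ?? keeps explicit zeros (e.g. temperature: 0) instead of replacing them.
    const value = options[name] ?? fallback;
    // Assumption: a null value is omitted so the server picks its own.
    if (value !== null) body[field] = value;
  }
  return body;
}
```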
## Internal Structure
|
|
||||||
```
|
|
||||||
src/
|
|
||||||
├── providers/
|
|
||||||
│ ├── ollama.js # Ollama provider — uses ollama npm package
|
|
||||||
│ └── llamacpp.js # llama.cpp provider — uses OpenAI-compatible REST API
|
|
||||||
├── routes/
|
|
||||||
│ └── inference.js # /complete and /complete/stream route handlers
|
|
||||||
├── infer.js # Provider loader — selects and re-exports active provider
|
|
||||||
└── index.js # Express app + route definitions
|
|
||||||
```
|
|
||||||
|
|
||||||
## Streaming Response Format
|
## Streaming Response Format
|
||||||
|
|
||||||
The llama.cpp provider yields chunks in this shape:
|
The llama.cpp provider yields chunks in this shape:
|
||||||
@@ -143,7 +127,7 @@ The llama.cpp provider yields chunks in this shape:
|
|||||||
{ response: '', done: true, model: "model-name.gguf", tokenCount: 42 }
|
{ response: '', done: true, model: "model-name.gguf", tokenCount: 42 }
|
||||||
```
|
```
|
||||||
|
|
||||||
The inference route re-emits these as SSE events:
|
The inference route re-emits as SSE:
|
||||||
```
|
```
|
||||||
data: {"response":"token text"}
|
data: {"response":"token text"}
|
||||||
data: {"done":true,"model":"model-name.gguf","tokenCount":42}
|
data: {"done":true,"model":"model-name.gguf","tokenCount":42}
|
||||||
@@ -151,66 +135,6 @@ data: [DONE]
|
|||||||
```
|
```
|
||||||
|
|
||||||
`model` and `tokenCount` are captured from the llama.cpp `finish_reason: stop`
|
`model` and `tokenCount` are captured from the llama.cpp `finish_reason: stop`
|
||||||
chunk (`usage.completion_tokens`) and emitted on the done event so the
|
chunk and emitted on the done event.
|
||||||
orchestration layer can forward them to the client.
|
|
||||||
|
|
||||||
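A rough sketch of the chunk-to-SSE mapping described above. The done-chunk shape follows the example in this section; the token-chunk shape (`{ response, done: false }`) and the function name `toSseEvents` are assumptions.

```javascript
// Sketch: turn provider chunks into SSE "data:" lines.
// On the wire each line would be followed by a blank line to end the event.
function toSseEvents(chunks) {
  const lines = [];
  for (const chunk of chunks) {
    if (chunk.done) {
      // The done event carries model and tokenCount, then the sentinel follows.
      lines.push(`data: ${JSON.stringify({ done: true, model: chunk.model, tokenCount: chunk.tokenCount })}`);
      lines.push('data: [DONE]');
      break;
    }
    lines.push(`data: ${JSON.stringify({ response: chunk.response })}`);
  }
  return lines;
}
```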
## Endpoints
|
For all HTTP endpoints, see `api-routes.md`.
|
||||||
|
|
||||||
### Health
|
|
||||||
|
|
||||||
| Method | Path | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| GET | /health | Service health check — reports active provider and model |
|
|
||||||
|
|
||||||
### Inference
|
|
||||||
|
|
||||||
| Method | Path | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| POST | /complete | Standard completion — returns full response when done |
|
|
||||||
| POST | /complete/stream | Streaming completion via Server-Sent Events |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
**POST /complete**
|
|
||||||
|
|
||||||
Request body:
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"prompt": "What is the capital of France?",
|
|
||||||
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
|
|
||||||
"temperature": 0.7,
|
|
||||||
"maxTokens": 1024
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
`model` is optional — falls back to `DEFAULT_MODEL` if omitted.
|
|
||||||
`maxTokens` is optional — defaults to 1024.
|
|
||||||
`temperature` is optional — defaults to 0.7.
|
|
||||||
|
|
||||||
Response:
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"text": "The capital of France is Paris.",
|
|
||||||
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
|
|
||||||
"done": true,
|
|
||||||
"evalCount": 8,
|
|
||||||
"promptEvalCount": 41
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
**POST /complete/stream**
|
|
||||||
|
|
||||||
Same request body as `/complete`.
|
|
||||||
|
|
||||||
Response is a stream of Server-Sent Events:
|
|
||||||
```
|
|
||||||
data: {"response":"The"}
|
|
||||||
data: {"response":" capital of France is Paris."}
|
|
||||||
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":8}
|
|
||||||
data: [DONE]
|
|
||||||
```
|
|
||||||
|
|
||||||
Clients should accumulate `response` fields to build the full response string.
|
|
||||||
The `done` event carries `model` and `tokenCount` for display in the UI.
|
|
||||||
@@ -43,48 +43,34 @@ src/
|
|||||||
│ └── index.js # Qdrant collection management, upsert, search, delete
|
│ └── index.js # Qdrant collection management, upsert, search, delete
|
||||||
├── entities/
|
├── entities/
|
||||||
│ └── index.js # Entity + relationship CRUD
|
│ └── index.js # Entity + relationship CRUD
|
||||||
└── index.js # Express app + route definitions
|
└── index.js # Express app + all route definitions
|
||||||
```
|
```
|
||||||
|
|
||||||
## SQLite Schema
|
## SQLite Schema
|
||||||
|
|
||||||
Six core tables:
|
Six core tables:
|
||||||
|
|
||||||
- **sessions** — top-level conversation containers, identified by an `external_id`, optional `name`, and optional `project_id`
|
- **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
|
||||||
- **episodes** — individual exchanges (user message + AI response) tied to a session
|
- **episodes** — individual exchanges (user message + AI response) tied to a session
|
||||||
- **entities** — named things the system learns about (people, places, concepts)
|
- **entities** — named things the system learns about (people, places, concepts)
|
||||||
- **relationships** — directional labeled links between entities
|
- **relationships** — directional labeled links between entities
|
||||||
- **summaries** — condensed episode groups for efficient context retrieval
|
- **summaries** — condensed episode groups for efficient context retrieval
|
||||||
- **projects** — named groupings of sessions with optional description, colour, and icon
|
- **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`
|
||||||
|
|
||||||
### Migrations
|
### Migrations
|
||||||
|
|
||||||
Schema changes that cannot be expressed in `CREATE TABLE IF NOT EXISTS` are applied
|
Schema changes that cannot use `CREATE TABLE IF NOT EXISTS` are applied as
|
||||||
as migrations in `db/index.js` at startup, wrapped in try/catch to safely ignore
|
idempotent migrations in `db/index.js` at startup:
|
||||||
already-applied changes:
|
|
||||||
|
|
||||||
```js
|
```js
|
||||||
try {
|
try { db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`); } catch {}
|
||||||
db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`);
|
try { db.exec(`ALTER TABLE sessions ADD COLUMN project_id INTEGER REFERENCES projects(id)`); } catch {}
|
||||||
} catch {}
|
try { db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(project_id)`); } catch {}
|
||||||
|
try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
|
||||||
try {
|
|
||||||
db.exec(`ALTER TABLE sessions ADD COLUMN project_id INTEGER REFERENCES projects(id)`);
|
|
||||||
} catch {}
|
|
||||||
|
|
||||||
try {
|
|
||||||
db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(project_id)`);
|
|
||||||
} catch {}
|
|
||||||
```
|
```
|
||||||
|
|
||||||
This pattern is idempotent — safe to run on every startup. New migrations should
|
New migrations are always appended here — never modify the schema file for
|
||||||
always be appended here rather than modifying the schema file, since `ALTER TABLE`
|
existing tables, since SQLite's `ALTER TABLE ... ADD COLUMN` has no `IF NOT EXISTS` guard.
|
||||||
and index creation on existing tables cannot use `IF NOT EXISTS` guards in SQLite.
|
|
||||||
|
|
||||||
Current migrations:
|
|
||||||
- `ALTER TABLE sessions ADD COLUMN name TEXT` — adds display name to sessions
|
|
||||||
- `ALTER TABLE sessions ADD COLUMN project_id INTEGER` — links sessions to projects
|
|
||||||
- `CREATE INDEX idx_sessions_project` — index on the new project_id column
|
|
||||||
|
|
||||||
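One possible refinement of the try/catch pattern, offered as a sketch and not as what `db/index.js` does: ignore only errors that look like "already applied" and rethrow everything else. `applyMigrations` is a hypothetical helper, `db` is any object with an `exec(sql)` method, and the error-message patterns are assumptions about SQLite's wording.

```javascript
// Hypothetical runner: swallows "already applied" errors, surfaces the rest.
const ALREADY_APPLIED = /duplicate column name|already exists/i;

function applyMigrations(db, statements) {
  const applied = [];
  for (const sql of statements) {
    try {
      db.exec(sql);
      applied.push(sql); // ran for the first time on this database
    } catch (err) {
      // A typo or a locked database should not be silently ignored.
      if (!ALREADY_APPLIED.test(String(err.message))) throw err;
    }
  }
  return applied;
}
```

The trade-off is a dependency on error text; the bare `catch {}` in the source is simpler but also hides genuine failures.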
### FTS5 Full-Text Search
|
### FTS5 Full-Text Search
|
||||||
|
|
||||||
@@ -96,11 +82,27 @@ keep the FTS index automatically in sync with the episodes table.
|
|||||||
|
|
||||||
- `journal_mode = WAL` — non-blocking reads during writes
|
- `journal_mode = WAL` — non-blocking reads during writes
|
||||||
- `foreign_keys = ON` — enforces referential integrity and cascade deletes
|
- `foreign_keys = ON` — enforces referential integrity and cascade deletes
|
||||||
- PRAGMAs are set via `db.pragma()` separately from `db.exec()`
|
- PRAGMAs set via `db.pragma()`, not `db.exec()`
|
||||||
|
|
||||||
|
### Dynamic Session Updates
|
||||||
|
|
||||||
|
`updateSession` builds its `SET` clause dynamically from only the fields
|
||||||
|
passed, which prevents partial updates from overwriting fields that weren't
|
||||||
|
touched:
|
||||||
|
|
||||||
|
```js
|
||||||
|
function updateSession(id, { name, projectId } = {}) {
|
||||||
|
const updates = [];
|
||||||
|
const values = [];
|
||||||
|
if (name !== undefined) { updates.push('name = ?'); values.push(name ?? null); }
|
||||||
|
if (projectId !== undefined) { updates.push('project_id = ?'); values.push(projectId ?? null); }
|
||||||
|
// ...
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
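The part elided by `// ...` is not shown in the source. A minimal sketch of how the statement could be assembled, assuming a `WHERE id = ?` clause and a `null` return when nothing was passed (both are assumptions, as is the name `buildSessionUpdate`):

```javascript
// Sketch: build the UPDATE statement from only the fields that were passed.
function buildSessionUpdate(id, { name, projectId } = {}) {
  const updates = [];
  const values = [];
  if (name !== undefined) { updates.push('name = ?'); values.push(name ?? null); }
  if (projectId !== undefined) { updates.push('project_id = ?'); values.push(projectId ?? null); }
  if (updates.length === 0) return null; // nothing to change
  return {
    sql: `UPDATE sessions SET ${updates.join(', ')} WHERE id = ?`,
    values: [...values, id], // id binds to the WHERE placeholder, so it goes last
  };
}
```

Passing `projectId: null` explicitly clears the column, while leaving it `undefined` leaves the column untouched.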
## Qdrant / Semantic Layer
|
## Qdrant / Semantic Layer
|
||||||
|
|
||||||
Three collections are initialized on service startup (created if they don't already exist):
|
Three Qdrant collections are initialized on service startup:
|
||||||
|
|
||||||
| Collection | Purpose |
|
| Collection | Purpose |
|
||||||
|---|---|
|
|---|---|
|
||||||
@@ -108,208 +110,50 @@ Three collections are initialized on service startup (created if they don't alre
|
|||||||
| `entities` | Embeddings for named entities |
|
| `entities` | Embeddings for named entities |
|
||||||
| `summaries` | Embeddings for condensed episode summaries |
|
| `summaries` | Embeddings for condensed episode summaries |
|
||||||
|
|
||||||
All collections use **768-dimension vectors** with **Cosine similarity**, matching the
|
All collections use **768-dimension vectors** with **Cosine similarity**,
|
||||||
output of the `nomic-embed-text` embedding model via Ollama.
|
matching the output of `nomic-embed-text` via Ollama. Vector size and distance metric are
|
||||||
|
defined in `@nexusai/shared` — not hardcoded here.
|
||||||
|
|
||||||
Vector dimension and distance metric are defined in `@nexusai/shared` constants
|
Each collection exposes three operations in `src/semantic/index.js`:
|
||||||
(`QDRANT.VECTOR_SIZE`, `QDRANT.DISTANCE_METRIC`) — not hardcoded in this service.
|
upsert, search (with optional Qdrant filter), and delete. The `wait: true`
|
||||||
|
flag is used on all writes.
|
||||||
### Semantic Layer Operations
|
|
||||||
|
|
||||||
Each collection exposes three operations via helper functions in `src/semantic/index.js`:
|
|
||||||
|
|
||||||
- **Upsert** — stores a vector with a payload containing the SQLite row ID, enabling
|
|
||||||
lookups back to the full content after a vector search
|
|
||||||
- **Search** — returns the top-k most similar vectors, with optional Qdrant filter
|
|
||||||
- **Delete** — removes a vector point by ID
|
|
||||||
|
|
||||||
The `wait: true` flag is used on all write operations so the caller receives confirmation
|
|
||||||
only after Qdrant has committed the change.
|
|
||||||
|
|
||||||
## Embedding Write Path
|
## Embedding Write Path
|
||||||
|
|
||||||
When a new episode is created, the memory service automatically generates and stores
|
When a new episode is created:
|
||||||
a vector embedding in Qdrant via the embedding service:
|
|
||||||
|
|
||||||
1. Episode is saved to SQLite synchronously — the response is returned immediately
|
1. Episode saved to SQLite synchronously — response returned immediately
|
||||||
2. Both sides of the exchange are combined into a single text:
|
2. User message + AI response combined: `User: ...\nAssistant: ...`
|
||||||
```
|
3. Text sent to embedding service (`POST /embed`)
|
||||||
User: {userMessage}
|
4. Vector upserted into `episodes` Qdrant collection with payload `{ sessionId, createdAt }`
|
||||||
Assistant: {aiResponse}
|
|
||||||
```
|
|
||||||
3. This text is sent to the embedding service (`POST /embed`)
|
|
||||||
4. The returned vector is upserted into the `episodes` Qdrant collection with a
|
|
||||||
payload of `{ sessionId, createdAt }` for filtering and lookups
|
|
||||||
|
|
||||||
The embedding step is **fire-and-forget** — it runs asynchronously after the SQLite
|
This step is **fire-and-forget** — if embedding fails, the episode is still
|
||||||
insert succeeds. If embedding fails, the episode is still saved and searchable via
|
saved and searchable via FTS. The error is logged but not surfaced.
|
||||||
FTS. The error is logged but does not affect the API response.
|
|
||||||
|
|
||||||
### Hybrid Retrieval Pattern
|
> The Qdrant payload stores `sessionId` (the internal integer ID). This is
|
||||||
|
> used for per-session and per-project filtering during semantic search. See
|
||||||
Qdrant and SQLite work as a pair — neither operates in isolation:
|
> `memory-isolation.md` for how project-level filtering works.
|
||||||
|
|
||||||
1. Query is embedded and searched in Qdrant → returns IDs + similarity scores
|
|
||||||
2. IDs are used to fetch full content from SQLite
|
|
||||||
3. Results are ranked and assembled into a context package
|
|
||||||
|
|
||||||
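The write path above can be sketched as two small functions. `buildEmbeddingText` follows the `User: ...\nAssistant: ...` format from step 2; `embedInBackground` and its `embedAndUpsert` callback are hypothetical stand-ins for the embedding call and the Qdrant upsert.

```javascript
// Combine both sides of the exchange into the text that gets embedded.
function buildEmbeddingText(userMessage, aiResponse) {
  return `User: ${userMessage}\nAssistant: ${aiResponse}`;
}

// Fire-and-forget: start the async work, log failures, never reject.
// `embedAndUpsert(id, text)` is a hypothetical async function supplied by the caller.
function embedInBackground(episode, embedAndUpsert, log = console.error) {
  const text = buildEmbeddingText(episode.userMessage, episode.aiResponse);
  return Promise.resolve()
    .then(() => embedAndUpsert(episode.id, text))
    .catch((err) => {
      // The episode is already saved, so the failure is only logged.
      log(`embedding failed for episode ${episode.id}: ${err.message}`);
    });
}
```

The route handler would call `embedInBackground(...)` without `await`, so the HTTP response is not delayed by the embedding service.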
## Entity Layer
|
## Entity Layer
|
||||||
|
|
||||||
Entities and relationships are stored in SQLite with two key constraints:
|
Entities and relationships use upsert semantics with composite unique
|
||||||
|
constraints to prevent duplicates:
|
||||||
|
|
||||||
- `UNIQUE(name, type)` on entities — ensures no duplicates; upsert updates existing records
|
- `UNIQUE(name, type)` on entities
|
||||||
- `UNIQUE(from_id, to_id, label)` on relationships — prevents duplicate edges
|
- `UNIQUE(from_id, to_id, label)` on relationships
|
||||||
- `ON DELETE CASCADE` on both `from_id` and `to_id` — deleting an entity automatically
|
- `ON DELETE CASCADE` on relationship foreign keys
|
||||||
removes all relationships where it appears on either end
|
|
||||||
|
|
||||||
## Endpoints
|
## Project Delete Behaviour
|
||||||
|
|
||||||
### Health
|
Deleting a project runs as a transaction — it first nulls out `project_id`
|
||||||
|
on all assigned sessions, then deletes the project. This avoids a foreign
|
||||||
|
key constraint failure since `sessions.project_id` has no `ON DELETE` rule:
|
||||||
|
|
||||||
| Method | Path | Description |
|
```js
|
||||||
|---|---|---|
|
const doDelete = db.transaction(() => {
|
||||||
| GET | /health | Service health check |
|
db.prepare(`UPDATE sessions SET project_id = NULL WHERE project_id = ?`).run(id);
|
||||||
|
db.prepare(`DELETE FROM projects WHERE id = ?`).run(id);
|
||||||
### Sessions
|
});
|
||||||
|
|
||||||
| Method | Path | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| POST | /sessions | Create a new session |
|
|
||||||
| GET | /sessions | Get paginated list of all sessions |
|
|
||||||
| GET | /sessions/:id | Get session by internal ID |
|
|
||||||
| GET | /sessions/by-external/:externalId | Get session by external ID |
|
|
||||||
| PATCH | /sessions/by-external/:externalId | Update session name |
|
|
||||||
| DELETE | /sessions/by-external/:externalId | Delete session (cascades to episodes + summaries) |
|
|
||||||
|
|
||||||
> Route ordering matters in Express: `by-external/:externalId` must be defined before
|
|
||||||
> `/:id` to prevent the literal string `by-external` being captured as an ID parameter.
|
|
||||||
|
|
||||||
**POST /sessions body:**
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"externalId": "unique-session-id",
|
|
||||||
"metadata": {}
|
|
||||||
}
|
|
||||||
```
|
```
|
||||||
|
|
||||||
**PATCH /sessions/by-external/:externalId body:**
|
For all HTTP endpoints, see `api-routes.md`.
|
||||||
```json
|
|
||||||
{
|
|
||||||
"name": "My Renamed Session"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
Returns the updated session object. `name` is required and must be non-empty.
|
|
||||||
|
|
||||||
**DELETE /sessions/by-external/:externalId**
|
|
||||||
|
|
||||||
Returns `204 No Content` on success. Cascades to delete all associated episodes
|
|
||||||
and summaries via SQLite `ON DELETE CASCADE`.
|
|
||||||
|
|
||||||
### Episodes
|
|
||||||
|
|
||||||
| Method | Path | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| POST | /episodes | Create episode + auto-embed into Qdrant |
|
|
||||||
| GET | /episodes/search?q=&limit= | Full-text search across episodes |
|
|
||||||
| GET | /episodes/:id | Get episode by ID |
|
|
||||||
| GET | /sessions/:id/episodes?limit=&offset= | Get paginated episodes for a session |
|
|
||||||
| DELETE | /episodes/:id | Delete an episode |
|
|
||||||
|
|
||||||
**POST /episodes body:**
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"sessionId": 1,
|
|
||||||
"userMessage": "Hello",
|
|
||||||
"aiResponse": "Hi there!",
|
|
||||||
"tokenCount": 10,
|
|
||||||
"metadata": {}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
> Note: `/episodes/search` must be defined before `/episodes/:id` in Express to prevent
|
|
||||||
> the word `search` being captured as an ID parameter.
|
|
||||||
|
|
||||||
### Projects
|
|
||||||
|
|
||||||
| Method | Path | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| POST | /projects | Create a new project |
|
|
||||||
| GET | /projects | Get all projects |
|
|
||||||
| GET | /projects/:id | Get project by ID |
|
|
||||||
| PATCH | /projects/:id | Update a project |
|
|
||||||
| DELETE | /projects/:id | Delete a project |
|
|
||||||
|
|
||||||
**POST /projects body:**
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"name": "My Project",
|
|
||||||
"description": "Optional description",
|
|
||||||
"colour": "#3d3a79",
|
|
||||||
"icon": null
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
`name` is required. `description`, `colour`, and `icon` are optional.
|
|
||||||
|
|
||||||
Returns `201` with the created project object on success.
|
|
||||||
|
|
||||||
**PATCH /projects/:id body:** same fields as POST, all optional.
|
|
||||||
|
|
||||||
**DELETE /projects/:id**
|
|
||||||
|
|
||||||
Returns `204 No Content`. Sessions assigned to the project are not deleted —
|
|
||||||
their `project_id` foreign key is left as-is (nullable, no cascade).
|
|
||||||
|
|
||||||
### Entities
|
|
||||||
|
|
||||||
| Method | Path | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| POST | /entities | Upsert an entity (creates or updates by name + type) |
|
|
||||||
| GET | /entities/by-type/:type | Get all entities of a given type |
|
|
||||||
| GET | /entities/:id | Get entity by internal ID |
|
|
||||||
| DELETE | /entities/:id | Delete entity (cascades to relationships) |
|
|
||||||
|
|
||||||
**POST /entities body:**
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"name": "NexusAI",
|
|
||||||
"type": "project",
|
|
||||||
"notes": "My AI memory project",
|
|
||||||
"metadata": {}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
> Note: `/entities/by-type/:type` must be defined before `/entities/:id` in Express to
|
|
||||||
> prevent `by-type` being captured as an ID parameter.
|
|
||||||
|
|
||||||
### Relationships
|
|
||||||
|
|
||||||
| Method | Path | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| POST | /relationships | Upsert a relationship between two entities |
|
|
||||||
| GET | /entities/:id/relationships | Get all relationships originating from an entity |
|
|
||||||
| DELETE | /relationships | Delete a specific relationship |
|
|
||||||
|
|
||||||
**POST /relationships body:**
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"fromId": 1,
|
|
||||||
"toId": 2,
|
|
||||||
"label": "uses",
|
|
||||||
"metadata": {}
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
**DELETE /relationships body:**
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"fromId": 1,
|
|
||||||
"toId": 2,
|
|
||||||
"label": "uses"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
> Relationships are identified by the composite key `(fromId, toId, label)`. Delete uses
|
|
||||||
> the request body rather than URL params as this three-part key is awkward to express
|
|
||||||
> cleanly in a path.
|
|
||||||
@@ -39,56 +39,58 @@ src/
|
|||||||
│ ├── memory.js # HTTP client for memory service
|
│ ├── memory.js # HTTP client for memory service
|
||||||
│ ├── inference.js # HTTP client for inference service
|
│ ├── inference.js # HTTP client for inference service
|
||||||
│ ├── embedding.js # HTTP client for embedding service
|
│ ├── embedding.js # HTTP client for embedding service
|
||||||
│ └── qdrant.js # HTTP client for Qdrant vector search
|
│ └── qdrant.js # HTTP client for Qdrant (direct vector search)
|
||||||
├── chat/
|
├── chat/
|
||||||
│ └── index.js # Core pipeline logic — context assembly and coordination
|
│ └── index.js # Core pipeline — context assembly, isolation, auto-naming
|
||||||
├── routes/
|
├── routes/
|
||||||
│ ├── chat.js # POST /chat and POST /chat/stream route handlers
|
│ ├── chat.js # POST /chat and POST /chat/stream
|
||||||
│ ├── sessions.js # Session list, history, rename, and delete routes
|
│ ├── sessions.js # Session CRUD proxy
|
||||||
│ ├── projects.js # Project CRUD routes — proxies to memory service
|
│ ├── projects.js # Project CRUD proxy
|
||||||
│ └── models.js # GET /models — reads models.json manifest from disk
|
│ └── models.js # GET /models — reads models.json from disk
|
||||||
└── index.js # Express app entry point
|
└── index.js # Express app entry point
|
||||||
```
|
```
|
||||||
|
|
||||||
The `services/` layer wraps all downstream HTTP calls in named functions,
|
The `services/` layer wraps all downstream HTTP calls in named functions.
|
||||||
keeping the pipeline logic in `chat/index.js` readable and ensuring that
|
|
||||||
URL or endpoint changes have a single place to be updated.
|
URL or endpoint changes have a single place to be updated.
|
||||||
|
|
||||||
## Chat Pipeline
|
## Chat Pipeline
|
||||||
|
|
||||||
Both `POST /chat` and `POST /chat/stream` share the same context assembly
|
Both `POST /chat` and `POST /chat/stream` share the same steps. The only
|
||||||
steps. The only difference is how the inference response is delivered to
|
difference is how the inference response is delivered to the client.
|
||||||
the client.
|
|
||||||
|
|
||||||
1. **Session resolution** — looks up the session by `externalId` in the memory
|
### Steps
|
||||||
service. If not found, auto-creates a new session. Clients can generate a
|
|
||||||
UUID for new conversations and pass it directly — no pre-creation step needed.
|
|
||||||
|
|
||||||
2. **Recent episode retrieval** — fetches the most recent episodes for the session
|
1. **Session resolution** — look up session by `externalId`. Auto-create if
|
||||||
(default: 5) from the memory service.
|
not found. Clients generate a UUID for new conversations — no pre-creation
|
||||||
|
step needed.
|
||||||
|
|
||||||
3. **Semantic search** — embeds the user message via the embedding service, then
|
2. **Project context resolution** — if the session has a `project_id`, fetch
|
||||||
queries Qdrant for the top-5 most similar past episodes (score threshold: 0.75).
|
the project and all its session IDs. Used to scope semantic search. See
|
||||||
Results are deduplicated against the recent episode set using a `Set` of IDs.
|
`memory-isolation.md` for full behaviour.
|
||||||
Full episode content is fetched from the memory service by ID. This step is
|
|
||||||
non-critical — if it fails, a warning is logged and the pipeline continues with
|
3. **Recent episode retrieval** — fetch the most recent episodes for the
|
||||||
|
session (`RECENT_EPISODE_LIMIT`, default 5).
|
||||||
|
|
||||||
|
4. **Semantic search** — embed the user message, query Qdrant for the top-5
|
||||||
|
most similar past episodes (`SCORE_THRESHOLD` 0.75). Deduplicated against
|
||||||
|
recent episodes. Non-critical — if it fails, pipeline continues with
|
||||||
recency-only context.
|
recency-only context.
|
||||||
|
|
||||||
4. **Prompt assembly** — combines the system prompt, semantic episodes (if any),
|
5. **Prompt assembly** — combine system prompt, semantic episodes, recent
|
||||||
recent episodes, and the current user message into a single prompt string.
|
episodes, and user message.
|
||||||
|
|
||||||
5. **Inference** — sends the assembled prompt to the inference service. `/chat`
|
6. **Inference** — send to inference service. `/chat` awaits full response;
|
||||||
awaits the full response; `/chat/stream` opens an SSE connection and pipes
|
`/chat/stream` pipes SSE chunks to the client.
|
||||||
chunks to the client as they arrive.
|
|
||||||
|
|
||||||
6. **Episode write** — writes the new exchange (user message + AI response)
|
7. **Episode write** — write the exchange back to memory. Fire-and-forget
|
||||||
back to the memory service as a fire-and-forget operation. For streaming,
|
for `/chat`; awaited for `/chat/stream` to ensure the full text is
|
||||||
the full response text is accumulated across chunks before writing.
|
accumulated before saving.
|
||||||
|
|
||||||
7. **Response** — returns the AI response, model name, session ID, and token
|
8. **Auto-naming** — on `isFirstMessage && !session.name`, fire a secondary
|
||||||
count to the client.
|
inference call with a naming prompt (max 20 tokens, temperature 0.3) and
|
||||||
|
write the result back as `session.name`. Fully fire-and-forget.
|
||||||
|
|
||||||
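The filtering in the semantic search step can be sketched in-process as follows. In the real pipeline the score threshold and limit may be applied by the Qdrant query itself; the function name, the `{ id, score }` hit shape, and the inclusive threshold are assumptions.

```javascript
// Sketch: keep high-scoring hits that are not already in the recent set.
function selectSemanticIds(hits, recentEpisodes, { threshold = 0.75, limit = 5 } = {}) {
  // A Set gives O(1) membership checks when deduplicating.
  const recentIds = new Set(recentEpisodes.map((e) => e.id));
  return hits
    .filter((h) => h.score >= threshold && !recentIds.has(h.id))
    .sort((a, b) => b.score - a.score) // best match first
    .slice(0, limit)
    .map((h) => h.id);
}
```

The returned IDs would then be used to fetch full episode content from the memory service.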
## Prompt Structure
|
### Prompt Structure
|
||||||
|
|
||||||
```
|
```
|
||||||
[System prompt]
|
[System prompt]
|
||||||
@@ -108,212 +110,67 @@ User: {current message}
|
|||||||
Assistant:
|
Assistant:
|
||||||
```
|
```
|
||||||
|
|
||||||
Semantic episodes appear before recent episodes so the model encounters
|
Semantic episodes appear before recent episodes so the model sees
|
||||||
long-range relevant context before the immediate conversation flow.
|
long-range context before the immediate conversation flow.
|
||||||
|
|
||||||
## SSE Stream Format
|
## SSE Stream Format
|
||||||
|
|
||||||
The inference service emits chunks from the llama.cpp provider in this format:
|
Inference service → orchestration:
|
||||||
```
|
```
|
||||||
data: {"response":"Hello","done":false}
|
data: {"response":"Hello","done":false}
|
||||||
data: {"response":"!","done":false}
|
data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":42}
|
||||||
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
|
|
||||||
data: [DONE]
|
data: [DONE]
|
||||||
```
|
```
|
||||||
|
|
||||||
The orchestration service re-emits to the client as:
|
Orchestration → client:
|
||||||
```
|
```
|
||||||
data: {"text":"Hello"}
|
data: {"text":"Hello"}
|
||||||
data: {"text":"!"}
|
data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":42}
|
||||||
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
|
|
||||||
```
|
```
|
||||||
|
|
||||||
The `[DONE]` sentinel from the inference service is consumed internally
|
The `[DONE]` sentinel is consumed internally and not forwarded. The stream
|
||||||
and not forwarded. The client stream is terminated by `res.end()` after
|
is terminated by `res.end()` after the done event.
|
||||||
the done event. Model name and token count are included on the done event
|
|
||||||
so the client can display them in the UI.
|
|
||||||
|
|
||||||
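A minimal sketch of the re-emit step, working on one SSE line at a time. `reEmit` is a hypothetical helper; it assumes each upstream event arrives as a single `data: ...` line, as in the examples above.

```javascript
// Sketch: map one inference-service SSE line to the line sent to the client.
// Returns null for lines that are not forwarded.
function reEmit(line) {
  if (!line.startsWith('data: ')) return null;
  const payload = line.slice('data: '.length).trim();
  if (payload === '[DONE]') return null; // sentinel is consumed internally
  const event = JSON.parse(payload);
  if (event.done) {
    return `data: ${JSON.stringify({ done: true, model: event.model, tokenCount: event.tokenCount })}`;
  }
  // Token chunks are renamed from `response` to `text` for the client.
  return `data: ${JSON.stringify({ text: event.response })}`;
}
```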
## Models Manifest
|
## Models Manifest
|
||||||
|
|
||||||
The `/models` endpoint reads a `models.json` file from disk at the path
|
`GET /models` reads `models.json` fresh on each request from
|
||||||
specified by `MODELS_MANIFEST_PATH`. The file lives on the main PC alongside
|
`MODELS_MANIFEST_PATH`. The file lives on the main PC alongside model files,
|
||||||
the model files, and is accessible to orchestration via a network share
|
accessible via an SMB mount at `/mnt/nexus-models`.
|
||||||
mounted at `/mnt/nexus-models`.
|
|
||||||
|
|
||||||
The manifest is read fresh on each request — no restart needed when models
|
|
||||||
are added or removed.
|
|
||||||
|
|
||||||
**models.json format:**
|
|
||||||
```json
|
```json
|
||||||
[
|
[
|
||||||
{ "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
|
{ "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
|
||||||
]
|
]
|
||||||
```
|
```
|
||||||
|
|
||||||
- `value` — must match the model name as reported by `llama-server` (including `.gguf` extension)
|
`value` must match the model name as reported by `llama-server` (including
|
||||||
- `label` — display name shown in the UI
|
`.gguf` extension). No service restart needed when models are added or removed.
|
||||||
|
|
||||||
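Because the manifest is edited by hand and read on every request, a small shape check can catch mistakes early. This is a sketch only: `parseManifest` is a hypothetical helper, and requiring string `value` and `label` fields is an assumption based on the example entry.

```javascript
// Hypothetical validator for the manifest text read from MODELS_MANIFEST_PATH.
function parseManifest(jsonText) {
  const entries = JSON.parse(jsonText); // throws SyntaxError on malformed JSON
  if (!Array.isArray(entries)) {
    throw new TypeError('models.json must contain an array');
  }
  return entries.map((entry, i) => {
    if (typeof entry?.value !== 'string' || typeof entry?.label !== 'string') {
      throw new TypeError(`entry ${i} needs string "value" and "label" fields`);
    }
    // Copy only the known fields so stray keys are not passed to the UI.
    return { value: entry.value, label: entry.label };
  });
}
```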
## Endpoints
|
## Sessions Route Behaviour
|
||||||
|
|
||||||
### Health
|
`PATCH /sessions/:sessionId` accepts either `name`, `projectId`, or both.
|
||||||
|
The validation guard only rejects requests where neither is provided:
|
||||||
|
|
||||||
| Method | Path | Description |
|
```js
|
||||||
|---|---|---|
|
if (!name?.trim() && projectId === undefined) {
|
||||||
| GET | /health | Service health check — reports downstream service URLs |
|
return res.status(400).json({ error: 'name or projectId is required' });
|
||||||
|
|
||||||
### Chat
|
|
||||||
|
|
||||||
| Method | Path | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| POST | /chat | Send a message and receive a complete response |
|
|
||||||
| POST | /chat/stream | Send a message and receive a streaming SSE response |
|
|
||||||
|
|
||||||
### Sessions
|
|
||||||
|
|
||||||
| Method | Path | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| GET | /sessions | Get paginated list of all sessions |
|
|
||||||
| GET | /sessions/:sessionId/history | Get paginated episode history for a session |
|
|
||||||
| PATCH | /sessions/:sessionId | Rename a session |
|
|
||||||
| DELETE | /sessions/:sessionId | Delete a session and all its episodes |
|
|
||||||
|
|
||||||
### Projects
|
|
||||||
|
|
||||||
Projects are proxied directly from the memory service with no transformation.
|
|
||||||
|
|
||||||
| Method | Path | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| GET | /projects | Get all projects |
|
|
||||||
| POST | /projects | Create a new project |
|
|
||||||
| PATCH | /projects/:id | Update a project |
|
|
||||||
| DELETE | /projects/:id | Delete a project |
|
|
||||||
|
|
||||||
### Models
|
|
||||||
|
|
||||||
| Method | Path | Description |
|
|
||||||
|---|---|---|
|
|
||||||
| GET | /models | Get list of available models from manifest file |
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
**POST /chat**
|
|
||||||
|
|
||||||
Request body:
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"sessionId": "your-session-uuid",
|
|
||||||
"message": "Hello, my name is Tim.",
|
|
||||||
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
|
|
||||||
"temperature": 0.7
|
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
`model` and `temperature` are optional — fall back to inference service defaults
|
This allows `useChat` to write project assignment separately from rename
|
||||||
if omitted.
|
operations.
|
||||||
|
|
||||||
Response:
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"sessionId": "your-session-uuid",
|
|
||||||
"response": "Hello Tim! How can I help you today?",
|
|
||||||
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
|
|
||||||
"tokenCount": 87
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
**POST /chat/stream**
|
|
||||||
|
|
||||||
Same request body as `POST /chat`.
|
|
||||||
|
|
||||||
Response is a stream of Server-Sent Events:
|
|
||||||
```
|
|
||||||
data: {"text":"Hello"}
|
|
||||||
data: {"text":" Tim"}
|
|
||||||
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87}
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
**PATCH /sessions/:sessionId**
|
|
||||||
|
|
||||||
Request body:
|
|
||||||
```json
|
|
||||||
{ "name": "My Renamed Session" }
|
|
||||||
```

Returns the updated session object. `name` is required; leading and trailing
whitespace is trimmed.

---

**DELETE /sessions/:sessionId**

Returns `204 No Content`. Deleting a session also deletes all of its episodes.

---

**GET /sessions/:sessionId/history**

Query parameters:

| Parameter | Default | Description |
|---|---|---|
| limit | 20 | Maximum number of episodes to return |
| offset | 0 | Number of episodes to skip (for pagination) |

Response:
```json
{
  "sessionId": "your-session-uuid",
  "episodes": [
    {
      "id": 42,
      "session_id": 1,
      "user_message": "Hello, my name is Tim.",
      "ai_response": "Hello Tim! How can I help you today?",
      "token_count": 87,
      "created_at": 1712345678,
      "metadata": null
    }
  ]
}
```
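
The `limit` and `offset` parameters above map onto page-based navigation. The sketch below shows one possible way to build the URL for a given page; `buildHistoryUrl` is a hypothetical helper, and the base URL is only a placeholder for wherever the orchestration service is reachable.

```typescript
// Build the history URL for a zero-based page number.
// pageSize defaults to 20, matching the documented default for `limit`.
function buildHistoryUrl(
  baseUrl: string,
  sessionId: string,
  page: number,
  pageSize: number = 20
): string {
  if (!Number.isInteger(page) || page < 0) {
    throw new RangeError("page must be a non-negative integer");
  }
  const offset = page * pageSize; // number of episodes to skip
  const id = encodeURIComponent(sessionId);
  return `${baseUrl}/sessions/${id}/history?limit=${pageSize}&offset=${offset}`;
}

console.log(buildHistoryUrl("http://localhost:4000", "your-session-uuid", 2));
// prints "http://localhost:4000/sessions/your-session-uuid/history?limit=20&offset=40"
```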

Episodes are ordered newest first.

---

**GET /models**

Returns the parsed contents of `models.json`:

```json
[
  { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
]
```

Returns `500` if the manifest file cannot be read or parsed.

## Caddy Configuration

The Caddy reverse proxy on Mini PC 2 must have a handle block for each route
prefix the client needs to reach. Current required blocks:

```
handle /chat* {
    reverse_proxy localhost:4000
}
handle /sessions* {
    reverse_proxy localhost:4000
}
handle /models* {
    reverse_proxy localhost:4000
}
handle /projects* {
    reverse_proxy localhost:4000
}
```

When adding new top-level routes to the orchestration service, add a matching
block here and reload Caddy: `caddy reload --config /path/to/Caddyfile`

For all HTTP endpoints, see `api-routes.md`.