update documentation

Storme-bit
2026-04-17 03:46:17 -07:00
parent 27e3c98304
commit 5145b9a7db
13 changed files with 822 additions and 794 deletions

BIN
.vs/slnx.sqlite Normal file

Binary file not shown.

BIN
.vs/slnx.sqlite-journal Normal file

Binary file not shown.


@@ -1,13 +1,23 @@
# NexusAI Documentation # NexusAI Documentation
## Contents ## Architecture
- [Architecture Overview](architecture/overview.md) - [Architecture Overview](architecture/overview.md)
- [Services](services/)
- [Shared Package](services/shared.md) ## Services
- [Memory Service](services/memory-service.md)
- [Embedding Service](services/embedding-service.md) - [Shared Package](services/shared.md)
- [Inference Service](services/inference-service.md) - [Memory Service](services/memory-service.md)
- [Orchestration Service](services/orchestration-service.md) - [Embedding Service](services/embedding-service.md)
- [Chat Client](services/chat-client.md) - [Inference Service](services/inference-service.md)
- [Deployment](deployment/homelab.md) - [Orchestration Service](services/orchestration-service.md)
- [Chat Client](services/chat-client.md)
## Reference
- [API Routes](reference/api-routes.md) — all HTTP endpoints across all services
- [Memory Isolation](reference/memory-isolation.md) — project-scoped memory model
## Deployment
- [Homelab](deployment/homelab.md)


@@ -1,56 +1,80 @@
# Architecture Overview # Architecture Overview
NexusAI is a modular, memory-centric AI system designed for persistent, context-aware conversations. It separates concerns across different services that can be independently deployed and evolved. NexusAI is a modular, memory-centric AI assistant designed for persistent,
context-aware conversations. It separates concerns across independent services
that can be evolved and deployed separately.
## Core Design Principles ## Core Design Principles
- **Decoupled layers:** memory, inference, and orchestration are independent of each other - **Decoupled layers** memory, inference, and orchestration are independent of each other
- **Hybrid retrieval:** semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly - **Hybrid retrieval** semantic similarity (Qdrant) combined with structured storage (SQLite) for flexible, ranked context assembly
- **Home lab:** services are distributed across nodes according to available hardware and resources - **Project-scoped memory** — sessions can be grouped into projects with shared or isolated memory pools
- **Home lab first** — services are distributed across nodes according to available hardware
## Memory Model ## Memory Model
Memory is split between SQLite and Qdrant, which work together as a pair: Memory is split between SQLite and Qdrant, which always work as a pair:
- **SQLite:** episodic interactions, entities, relationships, summaries - **SQLite** episodic interactions, entities, relationships, summaries, sessions, projects
- **Qdrant:** vector embeddings for semantic similarity search - **Qdrant** vector embeddings for semantic similarity search
When recalling memory, Qdrant returns IDs and similarity scores, which are used to fetch When recalling memory, Qdrant returns IDs and similarity scores, which are used
full content from SQLite. Neither SQLite nor Qdrant work in isolation. to fetch full content from SQLite. Neither store works in isolation.
Episode embeddings carry a `{ sessionId, createdAt }` payload in Qdrant,
enabling per-session and per-project filtering at search time. See
`memory-isolation.md` for how project-scoped retrieval works.
## Hardware Layout ## Hardware Layout
| Node | Address | Role | | Node | Address | Role |
|---|---|---| |---|---|---|
| Main PC | local | Primary inference (RTX A4000 16GB) | | Main PC | 192.168.0.79 | Primary inference RTX A4000 16GB |
| Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant | | Mini PC 1 | 192.168.0.81 | Memory service, Embedding service, Qdrant, Ollama |
| Mini PC 2 | 192.168.0.205 | Orchestration service, Chat Client, Gitea | | Mini PC 2 | 192.168.0.205 | Orchestration service, Chat Client, Caddy, Authelia, Gitea |
## Service Communication ## Service Communication
All services expose a REST HTTP API. The orchestration service is the single entry point — All services expose a REST HTTP API. The orchestration service is the single
clients do not talk directly to the memory or inference services. entry point — clients never talk directly to memory or inference services.
``` ```
Client Client (browser)
└─► Orchestration (:4000) └─► Caddy (HTTPS + Authelia SSO)
─► Chat Client (static files, /srv/nexusai) ─► Orchestration (:4000) — Mini PC 2
├─► Memory Service (:3002) ├─► Memory Service (:3002) — Mini PC 1
│ ├─► Qdrant (:6333) │ ├─► SQLite (local file)
│ └─► SQLite │ └─► Qdrant (:6333) — Mini PC 1
├─► Embedding Service (:3003) ├─► Embedding Service (:3003) — Mini PC 1
│ └─► Ollama │ └─► Ollama (:11434) — Mini PC 1
─► Inference Service (:3001) ─► Inference Service (:3001) — Main PC
└─► Ollama └─► llama-server (:8080) — Main PC
└─► Qdrant (:6333) — Mini PC 1 (direct — semantic search)
``` ```
Note: Orchestration queries Qdrant directly for semantic search (bypassing
the memory service) but always fetches full episode content from the memory
service by ID after the vector search.
## Technology Choices ## Technology Choices
| Concern | Choice | Reason | | Concern | Choice | Reason |
|---|---|---| |---|---|---|
| Language | Node.js (JavaScript) | Familiar stack, async I/O suits service architecture | | Language | Node.js (CommonJS) | Familiar stack, async I/O suits service architecture |
| Package management | npm workspaces | Monorepo with shared code, no publishing needed | | Package management | npm workspaces | Monorepo with shared code, no publishing needed |
| Vector store | Qdrant | Mature, Docker-native, excellent Node.js client | | Vector store | Qdrant | Mature, Docker-native, excellent Node.js client |
| Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user | | Relational store | SQLite (better-sqlite3) | Zero-ops, fast, sufficient for single-user scale |
| LLM runtime | Ollama | Easiest local LLM management, serves embeddings too | | LLM inference | llama.cpp (`llama-server`) | Maximum GPU utilisation on RTX A4000, OpenAI-compatible API |
| Version control | Gitea (self-hosted) | Code stays on local network | | Embeddings | Ollama (`nomic-embed-text`) | Co-located with memory service on Mini PC 1, 768-dim Cosine |
| Reverse proxy | Caddy + Authelia | Automatic HTTPS, SSO/MFA for all exposed services |
| Version control | Gitea (self-hosted) | Code stays on local network |
## Current State
The core four-service architecture is complete and operational. Key capabilities:
- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
- **Projects** — sessions grouped with shared or isolated memory pools
- **Auto-naming** — sessions named automatically from first exchange via inference
- **Project-scoped semantic search** — Qdrant filtered by project session IDs
- **Chat client** — view-based UI with sidebar navigation, project views, session management


@@ -7,50 +7,73 @@ services appropriate for its hardware.
## Mini PC 1 — 192.168.0.81 ## Mini PC 1 — 192.168.0.81
Runs: Qdrant, Memory Service, Embedding Service Runs: Qdrant, Memory Service, Embedding Service, Ollama
```bash ```bash
ssh username@192.168.0.81 ssh storme@192.168.0.81
cd ~/nexusai
docker compose -f docker-compose.mini1.yml up -d # Qdrant docker compose -f docker-compose.mini1.yml up -d # Qdrant
npm run memory npm run memory # port 3002
npm run embedding npm run embedding # port 3003
ollama serve # port 11434 — must bind 0.0.0.0 (OLLAMA_HOST=0.0.0.0)
``` ```
> Ollama must be started with `OLLAMA_HOST=0.0.0.0` to accept connections
> from other services on the LAN. Without this, embedding requests from the
> memory service will be refused.
## Mini PC 2 — 192.168.0.205 ## Mini PC 2 — 192.168.0.205
Runs: Gitea, Orchestration Service, Chat Client (via Caddy) Runs: Orchestration Service, Chat Client (via Caddy), Gitea, Caddy, Authelia
```bash
ssh username@192.168.0.205
cd ~/gitea ```bash
docker compose up -d # Gitea ssh storme@192.168.0.205
cd /opt/stacks/network cd /opt/stacks/network
docker compose up -d # Caddy, Authelia, and other network services docker compose up -d # Caddy, Authelia, and other network services
cd ~/nexusai cd ~/nexusAI
npm run orchestration npm run orchestration # port 4000
``` ```
## Main PC ## Main PC — 192.168.0.79
Runs: Ollama, Inference Service Runs: Inference Service, llama-server
```bash
ollama serve ```powershell
npm run inference # Start llama-server first — inference service depends on it
.\llama-gpu\llama-server.exe `
-m .\models\gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf `
-ngl 99 --reasoning off --host 0.0.0.0 --port 8080 -c 64000
# Then start inference service
npm run inference # port 3001
``` ```
## Chat Client Deployment ## Chat Client Deployment
The chat client is a React + Vite app build to static files and served by Caddy on Mini PC 2 (Infrastructure node). It does not run as a Node process The chat client is a React + Vite app built to static files and served by
Caddy on Mini PC 2. It does not run as a Node process.
```bash ```bash
# On dev machine or Mini PC 2 after git pull # On Mini PC 2 after git pull
cd ~/nexusAI/packages/chat-client cd ~/nexusAI/packages/chat-client
npm run build
# Set production URL before building
VITE_ORCHESTRATION_URL=https://nexus.jellystorm.com npm run build
# Output lands in packages/chat-client/dist/ # Output lands in packages/chat-client/dist/
# Caddy serves this directory directly via volume mount # Caddy serves this directory directly via Docker volume mount
``` ```
Caddy config (`/opt/docker/caddy/Caddyfile`):
> Do NOT set `VITE_ORCHESTRATION_URL` during local dev — Vite's proxy handles
> routing and setting the HTTPS domain will cause Authelia to intercept API
> requests, producing confusing JSON parse errors.
## Caddy Configuration
The Caddyfile on Mini PC 2 must include a handle block for each route prefix
the client needs to reach. Current required blocks for NexusAI:
```caddy ```caddy
nexus.jellystorm.com { nexus.jellystorm.com {
import authelia import authelia
@@ -63,6 +86,14 @@ nexus.jellystorm.com {
reverse_proxy 192.168.0.205:4000 reverse_proxy 192.168.0.205:4000
} }
handle /models* {
reverse_proxy 192.168.0.205:4000
}
handle /projects* {
reverse_proxy 192.168.0.205:4000
}
handle { handle {
root * /srv/nexusai root * /srv/nexusai
try_files {path} /index.html try_files {path} /index.html
@@ -71,18 +102,45 @@ nexus.jellystorm.com {
} }
``` ```
The Caddy container mounts the dist directory via Docker volume: When adding new top-level routes to the orchestration service, add a matching
handle block here and reload Caddy:
```bash
caddy reload --config /path/to/Caddyfile
```
The Caddy container mounts the `dist` directory via Docker volume:
```yaml ```yaml
- /home/storme/nexusAI/packages/chat-client/dist:/srv/nexusai - /home/storme/nexusAI/packages/chat-client/dist:/srv/nexusai
``` ```
> After adding or changing volume mounts, a full `docker compose down caddy && docker compose up -d caddy` > After adding or changing volume mounts, a full `docker compose down caddy && docker compose up -d caddy`
> is required. Caddyfile-only changes only need `docker compose restart caddy`. > is required. Caddyfile-only changes only need `caddy reload`.
## Environment Files ## Environment Files
Each node needs a `.env` file in the relevant service package directory. Each service needs a `.env` file in its package directory. These are not
These are not committed to git. See each service's documentation for committed to git. See each service's documentation for required variables.
required variables.
| Service | Location | Key Variables |
|---|---|---|
| Memory | `packages/memory-service/.env` | `SQLITE_PATH`, `QDRANT_URL`, `EMBEDDING_SERVICE_URL` |
| Embedding | `packages/embedding-service/.env` | `OLLAMA_URL`, `EMBEDDING_MODEL` |
| Inference | `packages/inference-service/.env` | `INFERENCE_PROVIDER`, `INFERENCE_URL`, `DEFAULT_MODEL` |
| Orchestration | `packages/orchestration-service/src/.env` | `MEMORY_SERVICE_URL`, `EMBEDDING_SERVICE_URL`, `INFERENCE_SERVICE_URL`, `QDRANT_URL`, `MODELS_MANIFEST_PATH` |
| Chat client | `packages/chat-client/.env` | `VITE_ORCHESTRATION_URL` (production builds only) |
## Models Manifest
The models manifest (`models.json`) lives on the Main PC alongside the model
files, accessible to orchestration via an SMB mount at `/mnt/nexus-models`.
```json
[
{ "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
]
```
`value` must exactly match the model name as reported by `llama-server`
(including the `.gguf` extension). No service restart is needed to pick up changes.
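Because a mismatch between `value` and the served model name is easy to miss, a small check can flag manifest entries that the server does not know about. This is a minimal sketch, not part of the NexusAI codebase: `findUnknownModels` is a hypothetical helper, and the list of served names is assumed to have been collected separately from `llama-server`.

```js
// Hypothetical helper: returns manifest values that are not in the list of
// model names reported by the inference backend. Comparison is exact, so a
// missing ".gguf" extension counts as a mismatch.
function findUnknownModels(manifest, servedModelNames) {
  const served = new Set(servedModelNames);
  return manifest
    .map(entry => entry.value)
    .filter(value => !served.has(value));
}

// Example data (assumed for illustration only).
const manifest = [
  { value: 'model-a.gguf', label: 'Model A' },
  { value: 'model-b', label: 'Model B (extension missing)' },
];
const servedNames = ['model-a.gguf', 'model-b.gguf'];

const unknown = findUnknownModels(manifest, servedNames);
console.log(unknown); // [ 'model-b' ]
```

An empty result means every manifest entry can be selected safely.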


@@ -39,21 +39,21 @@ All external access is routed through **Caddy** (reverse proxy) with **Authelia*
|------|--------| |------|--------|
| GPU | NVIDIA RTX A4000 | | GPU | NVIDIA RTX A4000 |
| Role | Primary AI inference node | | Role | Primary AI inference node |
| Key Services | Ollama (inference) | | Key Services | llama-server (llama.cpp), Inference Service |
### Mini PC 1 — Media Node (`192.168.0.81`) ### Mini PC 1 — Media Node (`192.168.0.81`)
| Spec | Detail | | Spec | Detail |
|------|--------| |------|--------|
| GPU | NVIDIA RTX 5050 | | GPU | NVIDIA RTX 5050 |
| Role | Media services, embeddings, vector storage | | Role | Media services, embeddings, vector storage |
| Key Services | Jellyfin, Nextcloud, Qdrant, arr stack, NexusAI memory/embedding | | Key Services | Jellyfin, Nextcloud, Qdrant, arr stack, NexusAI memory/embedding, Ollama |
| Storage | NVMe (OS) + 3x external HDDs (see [Storage Layout](#storage-layout)) | | Storage | NVMe (OS) + 3x external HDDs (see [Storage Layout](#storage-layout)) |
### Mini PC 2 — Infrastructure Node (`192.168.0.205`) ### Mini PC 2 — Infrastructure Node (`192.168.0.205`)
| Spec | Detail | | Spec | Detail |
|------|--------| |------|--------|
| Role | Network management, monitoring, auth, DNS, git | | Role | Network management, monitoring, auth, DNS, git, NexusAI orchestration |
| Key Services | Caddy, Authelia, Tailscale, Pihole, Grafana, Gitea | | Key Services | Caddy, Authelia, Tailscale, Pihole, Grafana, Gitea, NexusAI orchestration |
| Storage | NVMe (OS only) | | Storage | NVMe (OS only) |
--- ---
@@ -155,7 +155,8 @@ All external access is routed through **Caddy** (reverse proxy) with **Authelia*
| Service | Notes | | Service | Notes |
|---------|-------| |---------|-------|
| Ollama | Runs LLM inference using the RTX A4000. Also serves `nomic-embed-text` embeddings (768-dim vectors) consumed by NexusAI's embedding service on Mini PC 1. | | llama-server (llama.cpp) | Primary LLM inference using the RTX A4000. Started manually before the inference service. Serves the OpenAI-compatible API on port 8080. |
| Ollama | Serves `nomic-embed-text` embeddings (768-dim vectors) consumed by NexusAI's embedding service on Mini PC 1. |
--- ---
@@ -234,7 +235,7 @@ Phase 1 focused on establishing a stable, secure, and observable foundation:
- ✅ Self-hosted git (Gitea) - ✅ Self-hosted git (Gitea)
- ✅ Media stack fully operational (Jellyfin, arr stack, Nextcloud) - ✅ Media stack fully operational (Jellyfin, arr stack, Nextcloud)
- ✅ Download pipeline with VPN isolation (Gluetun + qBittorrent) - ✅ Download pipeline with VPN isolation (Gluetun + qBittorrent)
- ✅ NexusAI foundation services running (Qdrant, Ollama) - ✅ NexusAI foundation services running (Qdrant, Ollama, llama.cpp)
- ✅ Container management across nodes (Portainer + agent) - ✅ Container management across nodes (Portainer + agent)
--- ---
@@ -249,6 +250,6 @@ Phase 2 shifts focus to resilience, security hardening, and smart home integrati
- **Additional security hardening** — Audit exposed services, tighten firewall rules, review Authelia policies - **Additional security hardening** — Audit exposed services, tighten firewall rules, review Authelia policies
- **IP webcam integration** — Add camera feeds into the homelab ecosystem - **IP webcam integration** — Add camera feeds into the homelab ecosystem
- **Home Assistant** — Integrate smart home automation and sensor data - **Home Assistant** — Integrate smart home automation and sensor data
- **Continued NexusAI development** — Entities layer, embedding service, inference and orchestration buildout - **Continued NexusAI development** — Entity extraction pipeline, summaries layer, SettingsView implementation
> This section will be expanded as Phase 2 planning matures. > This section will be expanded as Phase 2 planning matures.

283
docs/services/API-routes.md Normal file

@@ -0,0 +1,283 @@
# API Routes
All HTTP endpoints across NexusAI services. Clients communicate only with
the orchestration service (port 4000); routes for the memory, embedding, and
inference services are listed here for reference and direct debugging.
---
## Orchestration Service — port 4000
### Health
| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |
### Chat
| Method | Path | Description |
|---|---|---|
| POST | /chat | Send a message, receive full response |
| POST | /chat/stream | Send a message, receive SSE token stream |
**POST /chat and POST /chat/stream — request body:**
```json
{
"sessionId": "your-session-uuid",
"message": "Hello, my name is Tim.",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"temperature": 0.7
}
```
`model` and `temperature` are optional.
**POST /chat — response:**
```json
{
"sessionId": "your-session-uuid",
"response": "Hello Tim! How can I help you today?",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"tokenCount": 87
}
```
**POST /chat/stream — response (SSE):**
```
data: {"text":"Hello"}
data: {"text":" Tim"}
data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":87}
```
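A client consuming this stream has to pick out the `data:` lines and JSON-decode each payload. The sketch below shows one way to do that, assuming every event arrives as a single `data: <json>` line as in the sample above. `parseSseLines` and `collectStream` are hypothetical names, not functions from the chat client.

```js
// Hypothetical helper: turns raw SSE text into parsed event objects.
// Assumes each event is exactly one `data: <json>` line.
function parseSseLines(raw) {
  return raw
    .split('\n')
    .map(line => line.trim())
    .filter(line => line.startsWith('data:'))
    .map(line => JSON.parse(line.slice('data:'.length).trim()));
}

// Concatenates token events and keeps the final metadata event, if present.
function collectStream(events) {
  let text = '';
  let final = null;
  for (const event of events) {
    if (event.done) final = event;
    else if (typeof event.text === 'string') text += event.text;
  }
  return { text, final };
}

// Sample input mirroring the response format above (model name is made up).
const sample = [
  'data: {"text":"Hello"}',
  'data: {"text":" Tim"}',
  'data: {"done":true,"model":"example.gguf","tokenCount":87}',
].join('\n');

const result = collectStream(parseSseLines(sample));
console.log(result.text); // "Hello Tim"
```

Network chunks can end in the middle of a line, so a real reader would buffer incomplete lines before passing them to a parser like this.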
### Sessions
| Method | Path | Description |
|---|---|---|
| GET | /sessions | Paginated session list |
| GET | /sessions/:sessionId/history | Paginated episode history for a session |
| PATCH | /sessions/:sessionId | Update session name and/or project assignment |
| DELETE | /sessions/:sessionId | Delete session and all its episodes |
**GET /sessions — query params:**
| Param | Default | Description |
|---|---|---|
| limit | 20 | Sessions per page |
| offset | 0 | Pagination offset |
| projectId | — | Filter by project (integer ID) |
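The query parameters above can be assembled with the standard `URLSearchParams` class. This is a minimal sketch that applies the documented defaults; `buildSessionsQuery` is a hypothetical helper name.

```js
// Hypothetical helper: builds the path and query string for GET /sessions.
// Defaults follow the table above; projectId is only added when provided.
function buildSessionsQuery({ limit = 20, offset = 0, projectId } = {}) {
  const params = new URLSearchParams({
    limit: String(limit),
    offset: String(offset),
  });
  if (projectId !== undefined && projectId !== null) {
    params.set('projectId', String(projectId));
  }
  return `/sessions?${params.toString()}`;
}

console.log(buildSessionsQuery());
// "/sessions?limit=20&offset=0"
console.log(buildSessionsQuery({ limit: 10, offset: 20, projectId: 3 }));
// "/sessions?limit=10&offset=20&projectId=3"
```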
**PATCH /sessions/:sessionId — body:**
```json
{ "name": "My Session", "projectId": 3 }
```
At least one of `name` or `projectId` is required; both can be sent together.
Returns the updated session object.
**GET /sessions/:sessionId/history — query params:**
| Param | Default | Description |
|---|---|---|
| limit | 20 | Episodes per page |
| offset | 0 | Pagination offset |
Returns `{ sessionId, episodes: [...] }`. Episodes are ordered newest first.
### Projects
| Method | Path | Description |
|---|---|---|
| GET | /projects | Get all projects |
| POST | /projects | Create a new project |
| PATCH | /projects/:id | Update a project |
| DELETE | /projects/:id | Delete a project (nulls session assignments) |
**POST /projects — body:**
```json
{
"name": "My Project",
"description": "Optional description",
"colour": "#3d3a79",
"icon": null,
"isolated": 0
}
```
`name` is required. All other fields are optional. `isolated` is `0` or `1`.
Returns `201` with the created project object.
**PATCH /projects/:id — body:** same fields as POST, all optional.
### Models
| Method | Path | Description |
|---|---|---|
| GET | /models | Available models from `models.json` manifest |
Returns array: `[{ "value": "model-name.gguf", "label": "Display Name" }]`
---
## Memory Service — port 3002
Direct access is for debugging only. All client traffic goes through
orchestration.
### Health
| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |
### Sessions
| Method | Path | Description |
|---|---|---|
| POST | /sessions | Create a new session |
| GET | /sessions | Paginated session list with optional projectId filter |
| GET | /sessions/:id | Get session by internal ID |
| GET | /sessions/by-external/:externalId | Get session by external ID |
| PATCH | /sessions/by-external/:externalId | Update session fields |
| DELETE | /sessions/by-external/:externalId | Delete session (cascades to episodes) |
> Route ordering: `by-external/:externalId` must be defined before `/:id`
> to prevent `by-external` being captured as an ID param.
**POST /sessions — body:**
```json
{ "externalId": "unique-uuid", "metadata": {} }
```
**PATCH /sessions/by-external/:externalId — body:**
```json
{ "name": "Session Name", "projectId": 3 }
```
Both fields are optional. Only provided fields are updated — other fields
are not touched.
### Episodes
| Method | Path | Description |
|---|---|---|
| POST | /episodes | Create episode + auto-embed into Qdrant |
| GET | /episodes/search?q=&limit= | FTS keyword search across all episodes |
| GET | /episodes/:id | Get episode by ID |
| GET | /sessions/:id/episodes?limit=&offset= | Paginated episodes for a session |
| DELETE | /episodes/:id | Delete an episode |
> Route ordering: `/episodes/search` must be defined before `/episodes/:id`.
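
The ordering rule follows from first-match routing: a parameterised pattern such as `/episodes/:id` also matches the literal path `/episodes/search`. The sketch below models that behaviour with a tiny matcher. It assumes Express-style semantics (routes are tested in registration order and the first match wins), which the source does not state explicitly; `compile` and `firstMatch` are hypothetical helpers.

```js
// Converts a pattern like '/episodes/:id' into a regex where each
// ':param' matches exactly one path segment.
function compile(pattern) {
  return new RegExp('^' + pattern.replace(/:[^/]+/g, '([^/]+)') + '$');
}

// Returns the first pattern (in registration order) that matches the path,
// or null when nothing matches.
function firstMatch(patterns, path) {
  return patterns.find(pattern => compile(pattern).test(path)) ?? null;
}

const wrongOrder = ['/episodes/:id', '/episodes/search'];
const rightOrder = ['/episodes/search', '/episodes/:id'];

console.log(firstMatch(wrongOrder, '/episodes/search')); // "/episodes/:id"
console.log(firstMatch(rightOrder, '/episodes/search')); // "/episodes/search"
console.log(firstMatch(rightOrder, '/episodes/42'));     // "/episodes/:id"
```

With the wrong order, a search request would be handled as a lookup for an episode whose ID is the string `search`.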
**POST /episodes — body:**
```json
{
"sessionId": 1,
"userMessage": "Hello",
"aiResponse": "Hi there!",
"tokenCount": 10
}
```
### Projects
| Method | Path | Description |
|---|---|---|
| POST | /projects | Create a new project |
| GET | /projects | Get all projects |
| GET | /projects/:id | Get project by ID |
| PATCH | /projects/:id | Update a project |
| DELETE | /projects/:id | Delete project + null session assignments |
Same request/response shape as orchestration `/projects` above.
### Entities
| Method | Path | Description |
|---|---|---|
| POST | /entities | Upsert entity (creates or updates by name + type) |
| GET | /entities/by-type/:type | All entities of a given type |
| GET | /entities/:id | Get entity by ID |
| DELETE | /entities/:id | Delete entity (cascades to relationships) |
> Route ordering: `/entities/by-type/:type` must be before `/entities/:id`.
**POST /entities — body:**
```json
{
"name": "NexusAI",
"type": "project",
"notes": "My AI memory project",
"metadata": {}
}
```
### Relationships
| Method | Path | Description |
|---|---|---|
| POST | /relationships | Upsert a relationship between two entities |
| GET | /entities/:id/relationships | All relationships for an entity |
| DELETE | /relationships | Delete a specific relationship |
**POST /relationships — body:**
```json
{ "fromId": 1, "toId": 2, "label": "uses", "metadata": {} }
```
**DELETE /relationships — body:**
```json
{ "fromId": 1, "toId": 2, "label": "uses" }
```
Relationships are identified by the composite key `(fromId, toId, label)`.
Delete uses request body rather than URL params since this three-part key
is awkward to encode in a path.
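The composite key can be illustrated with an in-memory store. This is only a sketch of the upsert and delete semantics described above: the real service persists relationships in SQLite, and `relationshipKey`, `upsertRelationship`, and `deleteRelationship` are hypothetical names.

```js
// In-memory stand-in for the relationships table (assumption for illustration).
const relationships = new Map();

// Builds a stable key from the three identifying fields.
function relationshipKey({ fromId, toId, label }) {
  return JSON.stringify([fromId, toId, label]);
}

// Creates the relationship, or overwrites the existing one with the same key.
function upsertRelationship(rel) {
  relationships.set(relationshipKey(rel), { metadata: {}, ...rel });
}

// Removes the relationship identified by (fromId, toId, label).
// Returns true when something was deleted.
function deleteRelationship(rel) {
  return relationships.delete(relationshipKey(rel));
}

upsertRelationship({ fromId: 1, toId: 2, label: 'uses' });
upsertRelationship({ fromId: 1, toId: 2, label: 'uses', metadata: { since: 2026 } });
console.log(relationships.size); // 1
```

Posting the same `(fromId, toId, label)` twice updates the existing entry instead of creating a duplicate, which is what "upsert" means here.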
---
## Embedding Service — port 3003
| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |
| POST | /embed | Embed a single text string |
| POST | /embed/batch | Embed an array of text strings |
**POST /embed — body:**
```json
{ "text": "Hello from NexusAI" }
```
**POST /embed — response:**
```json
{ "embedding": [0.123, -0.456, ...], "model": "nomic-embed-text", "dimensions": 768 }
```
---
## Inference Service — port 3001
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check — reports active provider and model |
| POST | /complete | Full completion — awaits entire response |
| POST | /complete/stream | Streaming completion via SSE |
**POST /complete — body:**
```json
{
"prompt": "What is the capital of France?",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"temperature": 0.7,
"maxTokens": 1024
}
```
All fields except `prompt` are optional.
**POST /complete — response:**
```json
{
"text": "The capital of France is Paris.",
"model": "gemma-4-26B...gguf",
"done": true,
"evalCount": 8,
"promptEvalCount": 41
}
```


@@ -0,0 +1,128 @@
# Memory Isolation
NexusAI implements project-scoped memory — sessions belonging to the same
project can share semantic context, and isolated projects can be restricted
from drawing on memory outside the project. This document describes how the
system works end-to-end.
## Concepts
**Session** — a single conversation thread. Identified by `external_id`.
**Project** — a named grouping of sessions. Has an `isolated` flag (0 or 1).
**Semantic search** — at inference time, the user's message is embedded and
compared against past episodes in Qdrant to surface relevant context. The
scope of this search is controlled by the project context.
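To make "compared against past episodes" concrete, the sketch below ranks candidate vectors by cosine similarity. It assumes cosine is the distance metric in use, as the architecture overview lists for the embedding collection. In NexusAI this comparison happens inside Qdrant, so the code illustrates the idea only; `cosineSimilarity` and `rankBySimilarity` are hypothetical helpers.

```js
// Cosine similarity between two equal-length numeric vectors.
// Returns 0 when either vector has zero magnitude.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Sorts candidates by similarity to the query, highest score first.
function rankBySimilarity(query, candidates) {
  return candidates
    .map(candidate => ({ id: candidate.id, score: cosineSimilarity(query, candidate.vector) }))
    .sort((x, y) => y.score - x.score);
}

// Toy 2-dimensional vectors; real embeddings here have 768 dimensions.
const ranked = rankBySimilarity([1, 0], [
  { id: 'orthogonal', vector: [0, 1] },
  { id: 'aligned', vector: [2, 0] },
]);
console.log(ranked[0].id); // "aligned"
```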
## Semantic Search Scope
| Session state | Semantic search scope |
|---|---|
| No project | Own session's episodes only |
| Assigned to a non-isolated project | All episodes across all sessions in the project |
| Assigned to an isolated project | All episodes within the project only |
| Removed from a project | Own session's episodes only (from that point) |
Sessions with no project assigned behave the same as they always have —
only their own past episodes are searched.
## How It Works
### Step 1 — Project context resolution (orchestration)
In `chat/index.js`, immediately after session resolution:
```js
let projectSessionIds = null;
if (session.project_id) {
const project = await memory.getProject(session.project_id);
if (project) {
const projectSessions = await memory.getProjectSessions(session.project_id);
projectSessionIds = projectSessions.map(s => s.id);
}
}
```
If the session belongs to any project (isolated or not), `projectSessionIds`
is populated with the internal integer IDs of all sessions in that project.
For **non-isolated projects**, this expands the search to all project sessions.
For **isolated projects**, the same set is used but the intent is restriction
— since `projectSessionIds` only contains project sessions, no external
episodes can appear.
Both cases use the same code path — the `isolated` flag does not change the
query logic, only the conceptual meaning.
### Step 2 — Qdrant filter construction
In `services/qdrant.js`, `searchEpisodes` builds the filter:
```js
if (projectSessionIds) {
body.filter = {
should: projectSessionIds.map(id => ({
key: 'sessionId', match: { value: id }
}))
};
} else if (sessionId) {
body.filter = { must: [{ key: 'sessionId', match: { value: sessionId } }] };
}
```
`should` is Qdrant's "match any of" operator — equivalent to SQL
`WHERE sessionId IN (...)`. When `projectSessionIds` is set, the single-session
filter is not used.
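The filter logic can be isolated into a pure function and checked without a running Qdrant instance. The sketch below restates the branch above as `buildEpisodeFilter` and adds `matchesFilter`, a simplified local model of how `should` and `must` treat a payload. Both names are hypothetical, and the matcher only handles exact `match.value` conditions; it is not Qdrant's implementation.

```js
// Builds the Qdrant filter object for an episode search.
// Project scope takes precedence over the single-session scope.
function buildEpisodeFilter({ sessionId = null, projectSessionIds = null } = {}) {
  if (projectSessionIds) {
    return {
      should: projectSessionIds.map(id => ({ key: 'sessionId', match: { value: id } })),
    };
  }
  if (sessionId) {
    return { must: [{ key: 'sessionId', match: { value: sessionId } }] };
  }
  return undefined; // no filter: search is unscoped
}

// Simplified local model: `should` passes if any condition matches,
// `must` passes only if every condition matches.
function matchesFilter(payload, filter) {
  if (!filter) return true;
  const test = condition => payload[condition.key] === condition.match.value;
  if (filter.should && !filter.should.some(test)) return false;
  if (filter.must && !filter.must.every(test)) return false;
  return true;
}

const projectFilter = buildEpisodeFilter({ sessionId: 7, projectSessionIds: [1, 2] });
console.log(matchesFilter({ sessionId: 2 }, projectFilter)); // true
console.log(matchesFilter({ sessionId: 7 }, projectFilter)); // false
```

Qdrant also offers a `match: { any: [...] }` condition that expresses the same "in list" test as a single clause; whether to switch is a readability choice, not a behavioural one.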
### Step 3 — Episode payloads
Every episode upserted into Qdrant carries `{ sessionId, createdAt }` in its
payload. `sessionId` here is the **internal integer ID** from SQLite. This
is what the Qdrant filter matches against.
This means the filter works correctly regardless of when episodes were created
or when a session was added to a project: the payload is immutable and never
records project membership, which is resolved through the memory service at
query time.
## Important Behaviours
**Pre-existing episodes are included immediately.** When a session is added
to a project and a new message is sent, Qdrant can match all of that session's
existing episodes since the filter only requires the `sessionId` to be in the
project's session list.
**Removing a session from a project takes effect immediately.** On the next
message, `getProjectSessions` will not include that session's ID, so its
episodes disappear from the semantic search scope.
**New sessions created from ProjectView are assigned after the first message.**
The `useChat` hook writes the `project_id` assignment via `updateSession` after
`onDone` fires. There is a brief window during the first message where the
session has no project assigned. The project is correctly applied from the
second message onward.
## Isolated vs Non-Isolated
The `isolated` flag is stored on the project but does not currently change the
query logic — both isolated and non-isolated projects result in a
`projectSessionIds` filter. The distinction is semantic and enforced by
the project's membership:
- **Non-isolated** — intentionally draws from all sessions in the project,
creating a shared memory pool for related conversations
- **Isolated** — by design contains only sessions explicitly added to it,
so the same filter naturally restricts context to project-only episodes
If cross-project contamination becomes a concern (e.g. a session accidentally
added to the wrong project), removing that session from the project immediately
restores isolation.
## Qdrant Payload Structure
Episodes are stored with this payload:
```json
{ "sessionId": 42, "createdAt": 1776080188 }
```
`sessionId` is the SQLite `sessions.id` integer, not the `external_id` UUID.
This is important when building filters — always use internal IDs.


@@ -55,10 +55,6 @@ VITE_ORCHESTRATION_URL=https://nexus.jellystorm.com
during local development, bypassing Caddy and Authelia entirely: during local development, bypassing Caddy and Authelia entirely:
```js ```js
// vite.config.js
import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';
export default defineConfig({ export default defineConfig({
plugins: [react()], plugins: [react()],
server: { server: {
@@ -72,7 +68,8 @@ export default defineConfig({
}); });
``` ```
If new routes are added to the orchestration service, add them here too. When adding new top-level routes to the orchestration service, add a matching
entry here too.
## Internal Structure ## Internal Structure
@@ -93,12 +90,13 @@ src/
│ ├── Sidebar.jsx # Left sidebar — projects, recent chats, navigation │ ├── Sidebar.jsx # Left sidebar — projects, recent chats, navigation
│ ├── ChatWindow.jsx # Centre panel — message thread and input bar │ ├── ChatWindow.jsx # Centre panel — message thread and input bar
│ ├── MessageBubble.jsx # Individual message bubble (user or assistant) │ ├── MessageBubble.jsx # Individual message bubble (user or assistant)
│ ├── InfoPanel.jsx # Right panel — model selector and session metadata │ ├── InfoPanel.jsx # Right panel — model selector and session metadata (slide-in)
│ ├── SessionModal.jsx # Modal for session rename and delete confirmation │ ├── SessionModal.jsx # Modal for session rename, project assignment, delete
│ ├── ProjectModal.jsx # Modal for project create, edit, and delete confirmation │ ├── ProjectModal.jsx # Modal for project create, edit, delete
│ ├── AllChatsView.jsx # Full paginated session list with multi-select bulk delete │ ├── AllChatsView.jsx # Full paginated session list with multi-select bulk delete
│ ├── AllProjectsView.jsx # Project tile grid with create/edit/delete │ ├── AllProjectsView.jsx # Project tile grid with create/edit/delete
── SettingsView.jsx # Settings placeholder (sections: Appearance, Memory, Models, About) ── ProjectView.jsx # Individual project — session list, new chat button
│ └── SettingsView.jsx # Settings placeholder (Appearance, Memory, Models, About)
├── index.css # Global reset, CSS variables, utility classes ├── index.css # Global reset, CSS variables, utility classes
└── main.jsx # React entry point └── main.jsx # React entry point
``` ```
@@ -107,9 +105,9 @@ src/
## Layout ## Layout
The app uses a view-based layout. `App.jsx` manages a `view` state The app uses a view-based layout. `App.jsx` manages a `view` state string
(`'chat' | 'all-chats' | 'all-projects' | 'settings'`) that controls which that controls which main panel is rendered. The left sidebar and right info
main panel is rendered. The left sidebar and right info panel are always present. panel are persistent across all views.
``` ```
┌──────────────────┬──────────────────────────────┐ ┌──────────────────┬──────────────────────────────┐
@@ -117,9 +115,9 @@ main panel is rendered. The left sidebar and right info panel are always present
│ (collapsible) │ │ │ (collapsible) │ │
│ │ chat → ChatWindow │ │ │ chat → ChatWindow │
│ + New Chat │ all-chats → AllChatsView │ │ + New Chat │ all-chats → AllChatsView │
│ ⊞ New Project │ all-projects → AllProjectsView│ │ ⊞ View Projects │ all-projects → AllProjectsView│
│ │ settings → SettingsView │ │ │ project → ProjectView
│ PROJECTS ▾ │ │ PROJECTS ▾ │ settings → SettingsView
│ [tile] [tile] │ │ │ [tile] [tile] │ │
│ All Projects → │ │ │ All Projects → │ │
│ │ │ │ │ │
@@ -132,10 +130,22 @@ main panel is rendered. The left sidebar and right info panel are always present
└──────────────────┴──────────────────────────────┘ └──────────────────┴──────────────────────────────┘
``` ```
The sidebar collapses to a 48px icon rail. The right info panel (`InfoPanel`) The sidebar collapses to a 48px icon rail. The right `InfoPanel` slides in
slides in from the right over the main area using `transform: translateX()`; from the right using `transform: translateX()`. It is hidden by default, toggled
it is hidden by default (`rightOpen` starts `false`) and toggled via a button via the `⊹` button in the `ChatWindow` header.
in the `ChatWindow` header.
## View Routing
| View | Component | Trigger |
|---|---|---|
| `'chat'` | `ChatWindow` | Default; selecting a session; new chat |
| `'all-chats'` | `AllChatsView` | "All Chats →" or ☰ icon in collapsed rail |
| `'all-projects'` | `AllProjectsView` | "View Projects" button or ⊞ icon |
| `'project'` | `ProjectView` | Clicking a project tile in the sidebar |
| `'settings'` | `SettingsView` | Settings button or ⚙ icon |
`activeProject` state in `App.jsx` tracks which project `ProjectView` is
displaying. It is set via `onSelectProject` before navigating to `'project'`.
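
The routing table and the `activeProject` hand-off can be pictured with a small state helper. This is only a sketch: `VIEW_COMPONENTS` and `selectProject` are hypothetical names, and the real `App.jsx` presumably uses React state hooks rather than a plain function.

```js
// Hypothetical lookup from view string to component name (names taken from the table above).
const VIEW_COMPONENTS = {
  'chat': 'ChatWindow',
  'all-chats': 'AllChatsView',
  'all-projects': 'AllProjectsView',
  'project': 'ProjectView',
  'settings': 'SettingsView',
};

// Hypothetical helper: record the project first, then switch to the project view,
// so ProjectView always has an activeProject to render.
function selectProject(state, project) {
  return { ...state, activeProject: project, view: 'project' };
}

const next = selectProject({ view: 'chat', activeProject: null }, { id: 1, name: 'Demo' });
console.log(VIEW_COMPONENTS[next.view], next.activeProject.name);
```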
## CSS Architecture ## CSS Architecture
@@ -181,91 +191,47 @@ rules, inline styles for dynamic prop-driven values.
| `.label-upper` | Uppercase section label style | | `.label-upper` | Uppercase section label style |
| `.truncate` | Text overflow ellipsis | | `.truncate` | Text overflow ellipsis |
## API Layer
All orchestration calls are centralised in `src/api/orchestration.js`:
| Function | Method | Path | Description |
|---|---|---|---|
| `fetchSessions` | GET | /sessions | Load session list for sidebar |
| `fetchSessionHistory` | GET | /sessions/:id/history | Load episode history on session select |
| `sendMessage` | POST | /chat | Send message, await full response |
| `streamMessage` | POST | /chat/stream | Send message, receive SSE token stream |
| `fetchModels` | GET | /models | Load available models from manifest |
| `renameSession` | PATCH | /sessions/:id | Rename a session |
| `deleteSession` | DELETE | /sessions/:id | Delete a session |
| `fetchProjects` | GET | /projects | Load project list |
| `createProject` | POST | /projects | Create a new project |
| `updateProject` | PATCH | /projects/:id | Update a project |
| `deleteProject` | DELETE | /projects/:id | Delete a project |
`streamMessage` returns an abort function — call it to cancel a stream mid-flight.
Uses a buffer pattern to handle SSE chunks that may span multiple network packets.
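
The buffer pattern mentioned above might look roughly like the following. It is a minimal sketch, not the code in `src/api/orchestration.js`: `createSseBuffer` is a hypothetical name, and it assumes each event is a single `data: ` line carrying JSON, with `[DONE]` as a terminator.

```js
// Hypothetical helper: accumulates raw text chunks and emits one parsed
// object per complete `data: ` line, even when a line spans several chunks.
function createSseBuffer(onEvent) {
  let buffer = '';
  return function push(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    // The last element is '' (chunk ended on a newline) or an incomplete line;
    // keep it in the buffer until the rest arrives.
    buffer = lines.pop();
    for (const line of lines) {
      if (!line.startsWith('data: ')) continue; // skip blank separator lines
      const payload = line.slice(6).trim();
      if (payload === '' || payload === '[DONE]') continue;
      onEvent(JSON.parse(payload));
    }
  };
}

// Usage: a JSON payload split across two network chunks is parsed once, when complete.
const received = [];
const push = createSseBuffer((event) => received.push(event));
push('data: {"text":"Hel');
push('lo"}\n\n');
console.log(received.length);
```

Parsing only complete lines avoids calling `JSON.parse` on a half-delivered payload.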
## Streaming ## Streaming
The chat input sends messages via `POST /chat/stream`. Tokens arrive as SSE events: Messages are sent via `POST /chat/stream`. Tokens arrive as SSE events and
are written into the active assistant bubble token by token via
`updateLastMessage`. The blinking cursor in `MessageBubble` is shown while
`message.streaming === true`.
``` `useChat` accepts an optional `projectId` parameter in `sendMessage`. After
data: {"text":"Hello"} the first message completes in a new session, if `projectId` is set,
data: {"text":" Tim"} `updateSession` is called to write the project assignment to the backend.
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87}
```
An empty assistant bubble is appended immediately when the stream opens, then
updated token by token using `updateLastMessage`. The blinking cursor in
`MessageBubble` is shown while `message.streaming === true` and disappears
when the done event is received. Model name and token count from the done
event are stored in `useChat` state and displayed in the InfoPanel.
## Dynamic Model Selector
Available models are fetched from `GET /models` on mount via the `useModels` hook.
The hook initialises with `FALLBACK_MODELS` from `constants.js` and replaces them
with the server response on success. If the fetch fails, the fallback list is used
silently — a warning is logged to the console.
To add a model, update `models.json` on the main PC — no client rebuild needed.
`FALLBACK_MODELS` in `constants.js` should be kept in sync with `models.json`
as a reasonable last-resort list in case the endpoint is unreachable.
## Session Management ## Session Management
Sessions are identified by `external_id` — a UUID generated client-side via the Sessions are identified by `external_id` — a UUID generated client-side via
`uuid` package. New sessions are created locally and auto-registered in the memory the `uuid` package. New sessions are created locally and auto-registered in
service on the first message. The session list refreshes after each completed the memory service on the first message. The session list refreshes after
response to surface newly created sessions. each completed response to surface newly created sessions.
### Session Name Display ### Auto-naming
The chat header and session rows both display `session.name` if set, falling back After the first exchange completes, orchestration fires a secondary inference
to `session.external_id` if no name has been assigned: call with a short naming prompt (max 20 tokens, temperature 0.3). The result
is written back as `session.name`. The client fires a second `refreshSessions`
after a 3-second delay to pick up the name once written.
```js Manually renamed sessions are never overwritten — the `!session.name` guard
activeSession.name || activeSession.external_id in `chat/index.js` prevents this.
```
### Session Actions ### Session Actions
Session rows in the sidebar support rename and delete via two entry points: Session rows support rename, project assignment, and delete via:
- **Hover** — reveals ✎ and ✕ icon buttons alongside the row
- **Right-click** — context menu with the same actions
- **Hover** — reveals ✎ (rename) and ✕ (delete) icon buttons alongside the row `SessionModal` handles rename and project assignment together in `settings`
- **Right-click** — opens a context menu with the same actions mode, and delete confirmation in `confirm-delete` mode.
Both trigger `SessionModal` — a shared modal component with two modes:
| Mode | Trigger | Behaviour |
|---|---|---|
| `settings` | Rename button / context menu rename | Shows name input, saves on Enter or Save button |
| `confirm-delete` | Delete button / context menu delete | Shows confirmation dialog, requires explicit Delete click |
Actions are disabled on unsaved (new) sessions that haven't had a first message sent yet.
### Active Session Clearing on Delete ### Active Session Clearing on Delete
When the deleted session is the currently active one, `App.jsx` detects the match When the deleted session is the currently active one, `App.jsx` clears the
and calls `selectSession(null)` to clear the chat window before refreshing the list: chat window before refreshing the list:
```js ```js
function handleSessionsChange(deletedSession) { function handleSessionsChange(deletedSession) {
@@ -276,53 +242,23 @@ function handleSessionsChange(deletedSession) {
} }
``` ```
### Context Menu ### Key Patterns
Implemented via `useContextMenu` hook — tracks `{ x, y, session }` state and - Button nesting: action icons are siblings of row buttons, not children — HTML forbids `<button>` inside `<button>`
attaches a `window` click listener to dismiss on any outside click. Rendered - Context menu rendered outside sidebar via React fragment to avoid `overflow: hidden` clipping
outside the sidebar div via a React fragment to avoid being clipped by - `useContextMenu` dismisses on a `window` click listener
`overflow: hidden`. - Dynamic `updateSession` SQL builds `SET` clause from only the fields passed — prevents accidental overwrites
### Button Nesting
Session row action icons (✎ ✕) are rendered as siblings of the session
`<button>`, not children — HTML does not allow `<button>` inside `<button>`.
The outer `<div>` owns hover state and context menu; the inner `<button>` handles
session selection; action icon buttons sit alongside it in the same flex row.
## Project Management ## Project Management
Projects are a first-class concept in the UI. The `useProjects` hook fetches `useProjects` fetches the project list from `GET /projects` on mount and
the project list from `GET /projects` on mount and exposes a `refreshProjects` exposes `refreshProjects` for keeping the sidebar in sync after mutations.
callback for keeping the sidebar in sync after mutations.
### Project Actions `ProjectModal` handles create, edit, and delete confirmation. Fields: name
(required), description (optional), colour picker, isolated toggle.
Projects are managed from `AllProjectsView` via `ProjectModal`: `ProjectView` shows the project's name, description, isolated badge (if set),
and a filtered session list. The "+ New Chat" button creates a new session,
navigates to `'chat'`, and writes the project assignment after the first message.
| Mode | Behaviour | For memory isolation behaviour, see `memory-isolation.md`.
|---|---|
| `create` | Name (required), description (optional), colour picker |
| `edit` | Same fields as create, pre-populated |
| `confirm-delete` | Confirmation dialog — sessions in the project are not deleted |
The sidebar Projects section shows up to 6 project tiles as coloured badge buttons.
Clicking any tile navigates to `AllProjectsView`. The "All Projects →" link is
always shown below the tiles.
After any create, edit, or delete in `AllProjectsView`, `onProjectsChange` is called
to trigger `refreshProjects` in `App.jsx`, keeping the sidebar tiles in sync.
## View Routing
`App.jsx` manages a `view` state string that controls which main panel renders:
| View | Component | Trigger |
|---|---|---|
| `'chat'` | `ChatWindow` | Default; selecting a session from sidebar or AllChatsView |
| `'all-chats'` | `AllChatsView` | "All Chats →" link or ☰ icon in collapsed rail |
| `'all-projects'` | `AllProjectsView` | "All Projects →" link, ⊞ icon, or New Project button |
| `'settings'` | `SettingsView` | Settings button or ⚙ icon in collapsed rail |
`AllChatsView` navigates back to `'chat'` on session row click, passing the selected
session to `selectSession` so history loads immediately.
@@ -27,80 +27,43 @@ minimizing network hops on the memory write path.
| OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL | | OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL |
| EMBEDDING_MODEL | No | nomic-embed-text | Ollama embedding model to use | | EMBEDDING_MODEL | No | nomic-embed-text | Ollama embedding model to use |
> Ollama must be running with `OLLAMA_HOST=0.0.0.0` to accept LAN connections
> from other services.
## Model ## Model
**nomic-embed-text** via Ollama produces **768-dimension** vectors using **Cosine similarity**. **nomic-embed-text** via Ollama produces **768-dimension** vectors with
This must match the `QDRANT.VECTOR_SIZE` constant in `@nexusai/shared`. **Cosine similarity**. This must match `QDRANT.VECTOR_SIZE` in `@nexusai/shared`.
If the embedding model is changed, the Qdrant collections must be reinitialized If the embedding model is changed, the Qdrant collections must be reinitialized
with the new vector dimension — updating `QDRANT.VECTOR_SIZE` in `constants.js` is with the new vector dimension. Updating `QDRANT.VECTOR_SIZE` in `constants.js`
the single change required to keep everything consistent. is the single change required to keep everything consistent.
## Ollama API ## Ollama API
Uses the `/api/embed` endpoint (Ollama v0.4+). Request shape: Uses the `/api/embed` endpoint (Ollama v0.4+):
```json ```json
// Request
{ "model": "nomic-embed-text", "input": "text to embed" } { "model": "nomic-embed-text", "input": "text to embed" }
```
Response key is `embeddings[0]` — an array of 768 floats.
## Endpoints // Response key
embeddings[0] // array of 768 floats
### Health
| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |
### Embed
| Method | Path | Description |
|---|---|---|
| POST | /embed | Embed a single text string |
| POST | /embed/batch | Embed an array of text strings |
---
**POST /embed**
Embeds a single text string and returns the vector.
Request body:
```json
{
"text": "Hello from NexusAI"
}
``` ```
Response: > Earlier Ollama versions used `/api/embeddings` with a `prompt` key and
```json > returned `embedding` (singular). Use `/api/embed`, `input`, and
{ > `embeddings[0]` for Ollama v0.4+.
"embedding": [0.123, -0.456, ...],
"model": "nomic-embed-text",
"dimensions": 768
}
```
--- ## Usage in NexusAI
**POST /embed/batch** The embedding service is called in two places:
Embeds an array of strings sequentially and returns all vectors in the same order. 1. **Memory service** — after each episode is saved to SQLite, the combined
Ollama does not natively parallelize embeddings, so requests are processed one at a time. `User: ..\nAssistant: ..` text is embedded and upserted into Qdrant.
This is fire-and-forget — failures are logged but don't affect the response.
Request body: 2. **Orchestration service** — the user's message is embedded at the start of
```json the chat pipeline to perform semantic search against past episodes.
{
"texts": ["first sentence", "second sentence"]
}
```
Response: For all HTTP endpoints, see `api-routes.md`.
```json
{
"embeddings": [[0.123, ...], [0.456, ...]],
"model": "nomic-embed-text",
"dimensions": 768,
"count": 2
}
```
@@ -24,20 +24,19 @@ to switch inference backends without changes to the rest of the system.
| Variable | Required | Default | Description | | Variable | Required | Default | Description |
|---|---|---|---| |---|---|---|---|
| PORT | No | 3001 | Port to listen on | | PORT | No | 3001 | Port to listen on |
| INFERENCE_PROVIDER | No | llamacpp | Active inference provider (`ollama` or `llamacpp`) | | INFERENCE_PROVIDER | No | llamacpp | Active provider (`ollama` or `llamacpp`) |
| INFERENCE_URL | No | http://localhost:8080 | URL of the inference runtime | | INFERENCE_URL | No | http://localhost:8080 | URL of the inference runtime |
| DEFAULT_MODEL | No | local-model | Default model name passed to the provider | | DEFAULT_MODEL | No | local-model | Default model name passed to the provider |
> `INFERENCE_URL` points to `llama-server` directly (port 8080), not to this > `INFERENCE_URL` points to `llama-server` directly (port 8080), not to this
> service itself. The orchestration service uses `INFERENCE_SERVICE_URL` to > service. The orchestration service uses `INFERENCE_SERVICE_URL` to reach
> reach this service on port 3001. > this service on port 3001.
## Provider Architecture ## Provider Architecture
The inference service uses a provider pattern to abstract the underlying The active provider is selected at startup via `INFERENCE_PROVIDER` and
LLM runtime. The active provider is selected at startup via `INFERENCE_PROVIDER` loaded from `src/providers/`. Both providers expose identical function
and loaded from `src/providers/`. Both providers expose identical function signatures.
signatures, so the rest of the service is unaware of which backend is active.
### Supported Providers ### Supported Providers
@@ -46,28 +45,36 @@ signatures, so the rest of the service is unaware of which backend is active.
| llama.cpp | `llamacpp` | llama.cpp server (OpenAI-compatible API) — **current default** | | llama.cpp | `llamacpp` | llama.cpp server (OpenAI-compatible API) — **current default** |
| Ollama | `ollama` | Ollama via the `ollama` npm package — available as fallback | | Ollama | `ollama` | Ollama via the `ollama` npm package — available as fallback |
Switching providers requires only a `.env` change — no code modifications needed: Switching providers requires only a `.env` change — no code modifications:
``` ```
INFERENCE_PROVIDER=llamacpp INFERENCE_PROVIDER=llamacpp
INFERENCE_URL=http://localhost:8080 INFERENCE_URL=http://localhost:8080
``` ```
### Provider Validation The provider loader throws immediately on an unknown value, preventing silent
misconfiguration.
## Internal Structure
The provider loader validates `INFERENCE_PROVIDER` at startup and throws immediately
if an unknown value is set — prevents silent misconfiguration:
``` ```
Error: Unknown inference provider: "foo". Valid options: ollama, llamacpp src/
├── providers/
│ ├── ollama.js # Ollama provider
│ └── llamacpp.js # llama.cpp provider (OpenAI-compatible REST)
├── routes/
│ └── inference.js # /complete and /complete/stream route handlers
├── infer.js # Provider loader — selects and re-exports active provider
└── index.js # Express app + route definitions
``` ```
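
The fail-fast validation of `INFERENCE_PROVIDER` could be sketched as below. This is an assumption about how `infer.js` might be shaped, not its actual contents: `resolveProvider` and the `registry` argument are hypothetical, and the stub providers stand in for the modules under `src/providers/`.

```js
// Valid values taken from the environment variable table.
const VALID_PROVIDERS = ['ollama', 'llamacpp'];

// Hypothetical loader: throws on an unknown name instead of silently falling back.
function resolveProvider(name, registry) {
  if (!VALID_PROVIDERS.includes(name)) {
    throw new Error(
      `Unknown inference provider: "${name}". Valid options: ${VALID_PROVIDERS.join(', ')}`
    );
  }
  return registry[name];
}

// Stub providers for illustration only; real providers expose identical signatures.
const registry = {
  ollama: { complete: (prompt) => `ollama:${prompt}` },
  llamacpp: { complete: (prompt) => `llamacpp:${prompt}` },
};

const provider = resolveProvider('llamacpp', registry);
console.log(provider.complete('hi'));
```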
## llama.cpp Provider ## llama.cpp Provider
The llama.cpp provider uses the OpenAI-compatible REST API exposed by `llama-server`. Uses the OpenAI-compatible REST API exposed by `llama-server`.
### Starting llama-server ### Starting llama-server
`llama-server` must be started manually on the main PC before the inference service Must be started manually on the main PC before the inference service can
can handle requests. It loads a single model at startup: handle requests:
```powershell ```powershell
.\llama-gpu\llama-server.exe ` .\llama-gpu\llama-server.exe `
@@ -79,40 +86,29 @@ can handle requests. It loads a single model at startup:
-c 64000 -c 64000
``` ```
Key flags:
| Flag | Description | | Flag | Description |
|---|---| |---|---|
| `-m` | Path to the `.gguf` model file |
| `-ngl 99` | Offload as many layers as possible to GPU | | `-ngl 99` | Offload as many layers as possible to GPU |
| `--reasoning off` | Disables thinking/reasoning delay on Gemma 4 models | | `--reasoning off` | Disables thinking delay on Gemma 4 models |
| `--host 0.0.0.0` | Allows connections from other machines on the LAN | | `--host 0.0.0.0` | Allows LAN connections |
| `--port 8080` | Port for the llama-server HTTP API |
| `-c 64000` | Context window size in tokens | | `-c 64000` | Context window size in tokens |
> `-c 64000` is intentionally large. Monitor VRAM usage — if pressure builds, > `-c 64000` is intentionally large. NexusAI's memory architecture handles
> reduce this value. The NexusAI memory architecture handles context injection > context injection so 68K is often sufficient if VRAM pressure builds.
> so a smaller window (68K) is often sufficient.
### Model Naming ### Model Naming
The model name sent in API requests must match the name as reported by The model name in requests must match the name reported by `llama-server`,
`llama-server`, including the `.gguf` extension. The reported name can be including the `.gguf` extension:
verified with:
```powershell ```powershell
Invoke-RestMethod -Uri "http://192.168.0.79:8080/v1/models" Invoke-RestMethod -Uri "http://192.168.0.79:8080/v1/models"
``` ```
Set `DEFAULT_MODEL` in `.env` to the exact reported name: Set `DEFAULT_MODEL` in `.env` to the exact reported name.
```
DEFAULT_MODEL=gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf
```
### Inference Parameters ### Inference Parameters
The llamacpp provider maps NexusAI options to OpenAI-compatible fields:
| NexusAI option | API field | Default | | NexusAI option | API field | Default |
|---|---|---| |---|---|---|
| `temperature` | `temperature` | 0.7 | | `temperature` | `temperature` | 0.7 |
@@ -122,18 +118,6 @@ The llamacpp provider maps NexusAI options to OpenAI-compatible fields:
| `repeatPenalty` | `repeat_penalty` | 1.1 | | `repeatPenalty` | `repeat_penalty` | 1.1 |
| `seed` | `seed` | null (random) | | `seed` | `seed` | null (random) |
## Internal Structure
```
src/
├── providers/
│ ├── ollama.js # Ollama provider — uses ollama npm package
│ └── llamacpp.js # llama.cpp provider — uses OpenAI-compatible REST API
├── routes/
│ └── inference.js # /complete and /complete/stream route handlers
├── infer.js # Provider loader — selects and re-exports active provider
└── index.js # Express app + route definitions
```
## Streaming Response Format ## Streaming Response Format
The llama.cpp provider yields chunks in this shape: The llama.cpp provider yields chunks in this shape:
@@ -143,7 +127,7 @@ The llama.cpp provider yields chunks in this shape:
{ response: '', done: true, model: "model-name.gguf", tokenCount: 42 } { response: '', done: true, model: "model-name.gguf", tokenCount: 42 }
``` ```
The inference route re-emits these as SSE events: The inference route re-emits as SSE:
``` ```
data: {"response":"token text"} data: {"response":"token text"}
data: {"done":true,"model":"model-name.gguf","tokenCount":42} data: {"done":true,"model":"model-name.gguf","tokenCount":42}
@@ -151,66 +135,6 @@ data: [DONE]
``` ```
`model` and `tokenCount` are captured from the llama.cpp `finish_reason: stop` `model` and `tokenCount` are captured from the llama.cpp `finish_reason: stop`
chunk (`usage.completion_tokens`) and emitted on the done event so the chunk and emitted on the done event.
orchestration layer can forward them to the client.
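
The re-emission step can be illustrated with a small formatter. This is a sketch under assumptions: `toSseFrames` is a hypothetical name, and the non-final chunk shape `{ response, done: false }` is inferred from the SSE output shown above rather than confirmed by the provider code.

```js
// Hypothetical formatter: turns one provider chunk into SSE frame strings.
function toSseFrames(chunk) {
  if (chunk.done) {
    // The final chunk carries model and tokenCount, followed by the [DONE] sentinel.
    const payload = { done: true, model: chunk.model, tokenCount: chunk.tokenCount };
    return [`data: ${JSON.stringify(payload)}\n\n`, 'data: [DONE]\n\n'];
  }
  return [`data: ${JSON.stringify({ response: chunk.response })}\n\n`];
}

console.log(toSseFrames({ response: 'token text', done: false }).join(''));
```

In an Express handler, each returned string would typically be passed to `res.write()`.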
## Endpoints For all HTTP endpoints, see `api-routes.md`.
### Health
| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check — reports active provider and model |
### Inference
| Method | Path | Description |
|---|---|---|
| POST | /complete | Standard completion — returns full response when done |
| POST | /complete/stream | Streaming completion via Server-Sent Events |
---
**POST /complete**
Request body:
```json
{
"prompt": "What is the capital of France?",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"temperature": 0.7,
"maxTokens": 1024
}
```
`model` is optional — falls back to `DEFAULT_MODEL` if omitted.
`maxTokens` is optional — defaults to 1024.
`temperature` is optional — defaults to 0.7.
Response:
```json
{
"text": "The capital of France is Paris.",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"done": true,
"evalCount": 8,
"promptEvalCount": 41
}
```
---
**POST /complete/stream**
Same request body as `/complete`.
Response is a stream of Server-Sent Events:
```
data: {"response":"The"}
data: {"response":" capital of France is Paris."}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":8}
data: [DONE]
```
Clients should accumulate `response` fields to build the full response string.
The `done` event carries `model` and `tokenCount` for display in the UI.
@@ -43,48 +43,34 @@ src/
│ └── index.js # Qdrant collection management, upsert, search, delete │ └── index.js # Qdrant collection management, upsert, search, delete
├── entities/ ├── entities/
│ └── index.js # Entity + relationship CRUD │ └── index.js # Entity + relationship CRUD
└── index.js # Express app + route definitions └── index.js # Express app + all route definitions
``` ```
## SQLite Schema ## SQLite Schema
Six core tables: Six core tables:
- **sessions** — top-level conversation containers, identified by an `external_id`, optional `name`, and optional `project_id` - **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
- **episodes** — individual exchanges (user message + AI response) tied to a session - **episodes** — individual exchanges (user message + AI response) tied to a session
- **entities** — named things the system learns about (people, places, concepts) - **entities** — named things the system learns about (people, places, concepts)
- **relationships** — directional labeled links between entities - **relationships** — directional labeled links between entities
- **summaries** — condensed episode groups for efficient context retrieval - **summaries** — condensed episode groups for efficient context retrieval
- **projects** — named groupings of sessions with optional description, colour, and icon - **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`
### Migrations ### Migrations
Schema changes that cannot be expressed in `CREATE TABLE IF NOT EXISTS` are applied Schema changes that cannot use `CREATE TABLE IF NOT EXISTS` are applied as
as migrations in `db/index.js` at startup, wrapped in try/catch to safely ignore idempotent migrations in `db/index.js` at startup:
already-applied changes:
```js ```js
try { try { db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`); } catch {}
db.exec(`ALTER TABLE sessions ADD COLUMN name TEXT`); try { db.exec(`ALTER TABLE sessions ADD COLUMN project_id INTEGER REFERENCES projects(id)`); } catch {}
} catch {} try { db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(project_id)`); } catch {}
try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
try {
db.exec(`ALTER TABLE sessions ADD COLUMN project_id INTEGER REFERENCES projects(id)`);
} catch {}
try {
db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(project_id)`);
} catch {}
``` ```
This pattern is idempotent — safe to run on every startup. New migrations should New migrations are always appended here — never modify the schema file for
always be appended here rather than modifying the schema file, since `ALTER TABLE` existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
and index creation on existing tables cannot use `IF NOT EXISTS` guards in SQLite.
Current migrations:
- `ALTER TABLE sessions ADD COLUMN name TEXT` — adds display name to sessions
- `ALTER TABLE sessions ADD COLUMN project_id INTEGER` — links sessions to projects
- `CREATE INDEX idx_sessions_project` — index on the new project_id column
### FTS5 Full-Text Search ### FTS5 Full-Text Search
@@ -96,11 +82,27 @@ keep the FTS index automatically in sync with the episodes table.
- `journal_mode = WAL` — non-blocking reads during writes - `journal_mode = WAL` — non-blocking reads during writes
- `foreign_keys = ON` — enforces referential integrity and cascade deletes - `foreign_keys = ON` — enforces referential integrity and cascade deletes
- PRAGMAs are set via `db.pragma()` separately from `db.exec()` - PRAGMAs set via `db.pragma()`, not `db.exec()`
### Dynamic Session Updates
`updateSession` builds its `SET` clause dynamically from only the fields
passed — prevents partial updates from overwriting fields that weren't
touched:
```js
function updateSession(id, { name, projectId } = {}) {
const updates = [];
const values = [];
if (name !== undefined) { updates.push('name = ?'); values.push(name ?? null); }
if (projectId !== undefined) { updates.push('project_id = ?'); values.push(projectId ?? null); }
// ...
}
```
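
One way the elided remainder of such a function could be completed is shown below. This is a sketch, not the service's code: `buildSessionUpdate` is a hypothetical name, the `WHERE id = ?` clause is an assumption (the real query may key on `external_id`), and the function returns the statement instead of executing it so that it runs without a database.

```js
// Hypothetical builder: only fields that were actually passed end up in SET.
function buildSessionUpdate(id, { name, projectId } = {}) {
  const updates = [];
  const values = [];
  if (name !== undefined) { updates.push('name = ?'); values.push(name ?? null); }
  if (projectId !== undefined) { updates.push('project_id = ?'); values.push(projectId ?? null); }
  if (updates.length === 0) return null; // nothing to update
  values.push(id); // bound to the WHERE placeholder (assumed key column)
  return { sql: `UPDATE sessions SET ${updates.join(', ')} WHERE id = ?`, values };
}

// Renaming alone leaves project_id untouched.
console.log(buildSessionUpdate(5, { name: 'Renamed' }));
```

Passing `null` explicitly clears a field, while omitting it (`undefined`) leaves the stored value alone.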
## Qdrant / Semantic Layer ## Qdrant / Semantic Layer
Three collections are initialized on service startup (created if they don't already exist): Three Qdrant collections are initialized on service startup:
| Collection | Purpose | | Collection | Purpose |
|---|---| |---|---|
@@ -108,208 +110,50 @@ Three collections are initialized on service startup (created if they don't alre
| `entities` | Embeddings for named entities | | `entities` | Embeddings for named entities |
| `summaries` | Embeddings for condensed episode summaries | | `summaries` | Embeddings for condensed episode summaries |
All collections use **768-dimension vectors** with **Cosine similarity**, matching the All collections use **768-dimension vectors** with **Cosine similarity**,
output of the `nomic-embed-text` embedding model via Ollama. matching `nomic-embed-text` via Ollama. Vector size and distance metric are
defined in `@nexusai/shared` — not hardcoded here.
Vector dimension and distance metric are defined in `@nexusai/shared` constants Each collection exposes three operations in `src/semantic/index.js`:
(`QDRANT.VECTOR_SIZE`, `QDRANT.DISTANCE_METRIC`) — not hardcoded in this service. upsert, search (with optional Qdrant filter), and delete. The `wait: true`
flag is used on all writes.
### Semantic Layer Operations
Each collection exposes three operations via helper functions in `src/semantic/index.js`:
- **Upsert** — stores a vector with a payload containing the SQLite row ID, enabling
lookups back to the full content after a vector search
- **Search** — returns the top-k most similar vectors, with optional Qdrant filter
- **Delete** — removes a vector point by ID
The `wait: true` flag is used on all write operations so the caller receives confirmation
only after Qdrant has committed the change.
## Embedding Write Path ## Embedding Write Path
When a new episode is created, the memory service automatically generates and stores When a new episode is created:
a vector embedding in Qdrant via the embedding service:
1. Episode is saved to SQLite synchronously — the response is returned immediately 1. Episode saved to SQLite synchronously — response returned immediately
2. Both sides of the exchange are combined into a single text: 2. User message + AI response combined: `User: ...\nAssistant: ...`
``` 3. Text sent to embedding service (`POST /embed`)
User: {userMessage} 4. Vector upserted into `episodes` Qdrant collection with payload `{ sessionId, createdAt }`
Assistant: {aiResponse}
```
3. This text is sent to the embedding service (`POST /embed`)
4. The returned vector is upserted into the `episodes` Qdrant collection with a
payload of `{ sessionId, createdAt }` for filtering and lookups
The embedding step is **fire-and-forget** — it runs asynchronously after the SQLite This step is **fire-and-forget** — if embedding fails, the episode is still
insert succeeds. If embedding fails, the episode is still saved and searchable via saved and searchable via FTS. The error is logged but not surfaced.
FTS. The error is logged but does not affect the API response.
### Hybrid Retrieval Pattern > The Qdrant payload stores `sessionId` (the internal integer ID). This is
> used for per-session and per-project filtering during semantic search. See
Qdrant and SQLite work as a pair — neither operates in isolation: > `memory-isolation.md` for how project-level filtering works.
1. Query is embedded and searched in Qdrant → returns IDs + similarity scores
2. IDs are used to fetch full content from SQLite
3. Results are ranked and assembled into a context package
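
The three steps can be sketched with in-memory stand-ins. Everything here is illustrative: `hybridRetrieve` is a hypothetical name, the array of vectors stands in for a Qdrant collection, the `Map` stands in for SQLite, and the query vector is assumed to be already embedded.

```js
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical retrieval: score vectors, keep the top `limit` IDs,
// then look up full rows by ID and attach the similarity score.
function hybridRetrieve(queryVector, vectorIndex, rowsById, limit = 3) {
  const hits = vectorIndex
    .map(({ id, vector }) => ({ id, score: cosine(queryVector, vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
  return hits
    .filter((hit) => rowsById.has(hit.id)) // skip vectors whose row no longer exists
    .map((hit) => ({ ...rowsById.get(hit.id), score: hit.score }));
}

const vectorIndex = [
  { id: 1, vector: [1, 0] },
  { id: 2, vector: [0, 1] },
  { id: 3, vector: [1, 1] },
];
const rowsById = new Map([
  [1, { id: 1, text: 'first' }],
  [2, { id: 2, text: 'second' }],
  [3, { id: 3, text: 'third' }],
]);
console.log(hybridRetrieve([1, 0], vectorIndex, rowsById, 2).map((row) => row.id));
```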
## Entity Layer ## Entity Layer
Entities and relationships are stored in SQLite with two key constraints: Entities and relationships use upsert semantics with composite unique
constraints to prevent duplicates:
- `UNIQUE(name, type)` on entities — ensures no duplicates; upsert updates existing records - `UNIQUE(name, type)` on entities
- `UNIQUE(from_id, to_id, label)` on relationships — prevents duplicate edges - `UNIQUE(from_id, to_id, label)` on relationships
- `ON DELETE CASCADE` on both `from_id` and `to_id` — deleting an entity automatically - `ON DELETE CASCADE` on relationship foreign keys
removes all relationships where it appears on either end
## Endpoints ## Project Delete Behaviour
### Health Deleting a project runs as a transaction — it first nulls out `project_id`
on all assigned sessions, then deletes the project. This avoids a foreign
key constraint failure since `sessions.project_id` has no `ON DELETE` rule:
| Method | Path | Description | ```js
|---|---|---| const doDelete = db.transaction(() => {
| GET | /health | Service health check | db.prepare(`UPDATE sessions SET project_id = NULL WHERE project_id = ?`).run(id);
db.prepare(`DELETE FROM projects WHERE id = ?`).run(id);
### Sessions });
| Method | Path | Description |
|---|---|---|
| POST | /sessions | Create a new session |
| GET | /sessions | Get paginated list of all sessions |
| GET | /sessions/:id | Get session by internal ID |
| GET | /sessions/by-external/:externalId | Get session by external ID |
| PATCH | /sessions/by-external/:externalId | Update session name |
| DELETE | /sessions/by-external/:externalId | Delete session (cascades to episodes + summaries) |
> Route ordering matters in Express: `by-external/:externalId` must be defined before
> `/:id` to prevent the literal string `by-external` being captured as an ID parameter.
**POST /sessions body:**
```json
{
"externalId": "unique-session-id",
"metadata": {}
}
``` ```
**PATCH /sessions/by-external/:externalId body:** For all HTTP endpoints, see `api-routes.md`.
```json
{
"name": "My Renamed Session"
}
```
Returns the updated session object. `name` is required and must be non-empty.
**DELETE /sessions/by-external/:externalId**
Returns `204 No Content` on success. Cascades to delete all associated episodes
and summaries via SQLite `ON DELETE CASCADE`.
### Episodes
| Method | Path | Description |
|---|---|---|
| POST | /episodes | Create episode + auto-embed into Qdrant |
| GET | /episodes/search?q=&limit= | Full-text search across episodes |
| GET | /episodes/:id | Get episode by ID |
| GET | /sessions/:id/episodes?limit=&offset= | Get paginated episodes for a session |
| DELETE | /episodes/:id | Delete an episode |
**POST /episodes body:**
```json
{
"sessionId": 1,
"userMessage": "Hello",
"aiResponse": "Hi there!",
"tokenCount": 10,
"metadata": {}
}
```
> Note: `/episodes/search` must be defined before `/episodes/:id` in Express to prevent
> the word `search` being captured as an ID parameter.
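The ordering issue in the note above can be reproduced with a simplified model of first-match routing. This is not Express itself: `firstMatch` is a hypothetical helper that only imitates the rule that the first registered pattern to match wins.

```javascript
// Simplified model of first-match routing (an illustration, not Express).
function firstMatch(patterns, path) {
  const segments = path.split('/').filter(Boolean);
  for (const pattern of patterns) {
    const parts = pattern.split('/').filter(Boolean);
    if (parts.length !== segments.length) continue;
    const params = {};
    const matched = parts.every((part, i) => {
      if (part.startsWith(':')) {
        params[part.slice(1)] = segments[i]; // a named parameter accepts any segment
        return true;
      }
      return part === segments[i];           // a literal segment must match exactly
    });
    if (matched) return { pattern, params };
  }
  return null;
}

// Parameter route registered first: the literal path is swallowed as an ID.
const wrongOrder = firstMatch(['/episodes/:id', '/episodes/search'], '/episodes/search');
// Literal route registered first: the search handler is reached.
const rightOrder = firstMatch(['/episodes/search', '/episodes/:id'], '/episodes/search');

console.log(wrongOrder.pattern, wrongOrder.params);
console.log(rightOrder.pattern, rightOrder.params);
```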
### Projects
| Method | Path | Description |
|---|---|---|
| POST | /projects | Create a new project |
| GET | /projects | Get all projects |
| GET | /projects/:id | Get project by ID |
| PATCH | /projects/:id | Update a project |
| DELETE | /projects/:id | Delete a project |
**POST /projects body:**
```json
{
"name": "My Project",
"description": "Optional description",
"colour": "#3d3a79",
"icon": null
}
```
`name` is required. `description`, `colour`, and `icon` are optional.
Returns `201` with the created project object on success.
**PATCH /projects/:id body:** same fields as POST, all optional.
**DELETE /projects/:id**
Returns `204 No Content`. Sessions assigned to the project are not deleted —
their `project_id` foreign key is left as-is (nullable, no cascade).
### Entities
| Method | Path | Description |
|---|---|---|
| POST | /entities | Upsert an entity (creates or updates by name + type) |
| GET | /entities/by-type/:type | Get all entities of a given type |
| GET | /entities/:id | Get entity by internal ID |
| DELETE | /entities/:id | Delete entity (cascades to relationships) |
**POST /entities body:**
```json
{
"name": "NexusAI",
"type": "project",
"notes": "My AI memory project",
"metadata": {}
}
```
> Note: `/entities/by-type/:type` must be defined before `/entities/:id` in Express to
> prevent `by-type` being captured as an ID parameter.
### Relationships
| Method | Path | Description |
|---|---|---|
| POST | /relationships | Upsert a relationship between two entities |
| GET | /entities/:id/relationships | Get all relationships originating from an entity |
| DELETE | /relationships | Delete a specific relationship |
**POST /relationships body:**
```json
{
"fromId": 1,
"toId": 2,
"label": "uses",
"metadata": {}
}
```
**DELETE /relationships body:**
```json
{
"fromId": 1,
"toId": 2,
"label": "uses"
}
```
> Relationships are identified by the composite key `(fromId, toId, label)`. Delete uses
> the request body rather than URL params as this three-part key is awkward to express
> cleanly in a path.

View File

@@ -39,56 +39,58 @@ src/
│ ├── memory.js # HTTP client for memory service │ ├── memory.js # HTTP client for memory service
│ ├── inference.js # HTTP client for inference service │ ├── inference.js # HTTP client for inference service
│ ├── embedding.js # HTTP client for embedding service │ ├── embedding.js # HTTP client for embedding service
│ └── qdrant.js # HTTP client for Qdrant vector search │ └── qdrant.js # HTTP client for Qdrant (direct vector search)
├── chat/ ├── chat/
│ └── index.js # Core pipeline logic — context assembly and coordination │ └── index.js # Core pipeline — context assembly, isolation, auto-naming
├── routes/ ├── routes/
│ ├── chat.js # POST /chat and POST /chat/stream route handlers │ ├── chat.js # POST /chat and POST /chat/stream
│ ├── sessions.js # Session list, history, rename, and delete routes │ ├── sessions.js # Session CRUD proxy
│ ├── projects.js # Project CRUD routes — proxies to memory service │ ├── projects.js # Project CRUD proxy
│ └── models.js # GET /models — reads models.json manifest from disk │ └── models.js # GET /models — reads models.json from disk
└── index.js # Express app entry point └── index.js # Express app entry point
``` ```
The `services/` layer wraps all downstream HTTP calls in named functions, The `services/` layer wraps all downstream HTTP calls in named functions.
keeping the pipeline logic in `chat/index.js` readable and ensuring that
URL or endpoint changes have a single place to be updated. URL or endpoint changes have a single place to be updated.
## Chat Pipeline ## Chat Pipeline
Both `POST /chat` and `POST /chat/stream` share the same context assembly Both `POST /chat` and `POST /chat/stream` share the same steps. The only
steps. The only difference is how the inference response is delivered to difference is how the inference response is delivered to the client.
the client.
1. **Session resolution** — looks up the session by `externalId` in the memory ### Steps
service. If not found, auto-creates a new session. Clients can generate a
UUID for new conversations and pass it directly — no pre-creation step needed.
2. **Recent episode retrieval** — fetches the most recent episodes for the session 1. **Session resolution** — look up session by `externalId`. Auto-create if
(default: 5) from the memory service. not found. Clients generate a UUID for new conversations — no pre-creation
step needed.
3. **Semantic search** — embeds the user message via the embedding service, then 2. **Project context resolution** — if the session has a `project_id`, fetch
queries Qdrant for the top-5 most similar past episodes (score threshold: 0.75). the project and all its session IDs. Used to scope semantic search. See
Results are deduplicated against the recent episode set using a `Set` of IDs. `memory-isolation.md` for full behaviour.
Full episode content is fetched from the memory service by ID. This step is
non-critical — if it fails, a warning is logged and the pipeline continues with 3. **Recent episode retrieval** — fetch the most recent episodes for the
session (`RECENT_EPISODE_LIMIT`, default 5).
4. **Semantic search** — embed the user message, query Qdrant for the top-5
most similar past episodes (`SCORE_THRESHOLD` 0.75). Deduplicated against
recent episodes. Non-critical — if it fails, pipeline continues with
recency-only context. recency-only context.
4. **Prompt assembly** — combines the system prompt, semantic episodes (if any), 5. **Prompt assembly** — combine system prompt, semantic episodes, recent
recent episodes, and the current user message into a single prompt string. episodes, and user message.
5. **Inference** — sends the assembled prompt to the inference service. `/chat` 6. **Inference** — send to inference service. `/chat` awaits full response;
awaits the full response; `/chat/stream` opens an SSE connection and pipes `/chat/stream` pipes SSE chunks to the client.
chunks to the client as they arrive.
6. **Episode write** — writes the new exchange (user message + AI response) 7. **Episode write** — write the exchange back to memory. Fire-and-forget
back to the memory service as a fire-and-forget operation. For streaming, for `/chat`; awaited for `/chat/stream` to ensure the full text is
the full response text is accumulated across chunks before writing. accumulated before saving.
7. **Response** — returns the AI response, model name, session ID, and token 8. **Auto-naming** — on `isFirstMessage && !session.name`, fire a secondary
count to the client. inference call with a naming prompt (max 20 tokens, temperature 0.3) and
write the result back as `session.name`. Fully fire-and-forget.
## Prompt Structure ### Prompt Structure
``` ```
[System prompt] [System prompt]
@@ -108,212 +110,67 @@ User: {current message}
Assistant: Assistant:
``` ```
Semantic episodes appear before recent episodes so the model encounters Semantic episodes appear before recent episodes so the model sees
long-range relevant context before the immediate conversation flow. long-range context before the immediate conversation flow.
## SSE Stream Format ## SSE Stream Format
The inference service emits chunks from the llama.cpp provider in this format: Inference service → orchestration:
``` ```
data: {"response":"Hello","done":false} data: {"response":"Hello","done":false}
data: {"response":"!","done":false} data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":42}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
data: [DONE] data: [DONE]
``` ```
The orchestration service re-emits to the client as: Orchestration → client:
``` ```
data: {"text":"Hello"} data: {"text":"Hello"}
data: {"text":"!"} data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":42}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":42}
``` ```
The `[DONE]` sentinel from the inference service is consumed internally The `[DONE]` sentinel is consumed internally and not forwarded. The stream
and not forwarded. The client stream is terminated by `res.end()` after is terminated by `res.end()` after the done event.
the done event. Model name and token count are included on the done event
so the client can display them in the UI.
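The translation between the two stream formats can be sketched per line. This is an illustration based only on the payloads shown above; `translateLine` and `toSse` are hypothetical helpers, and the real handler also has to deal with chunk boundaries and write to the response object.

```javascript
// Hypothetical sketch: translate one SSE line from the inference service into
// the event forwarded to the client, or null when nothing should be forwarded.
function translateLine(line) {
  if (!line.startsWith('data: ')) return null;  // ignore blank or non-data lines
  const payload = line.slice('data: '.length).trim();
  if (payload === '[DONE]') return null;        // sentinel is consumed internally
  const chunk = JSON.parse(payload);
  if (chunk.done) {
    // The final event carries the model name and token count for the UI.
    return { done: true, model: chunk.model, tokenCount: chunk.tokenCount };
  }
  return { text: chunk.response };
}

// Serialise an event in SSE wire format.
function toSse(event) {
  return `data: ${JSON.stringify(event)}\n\n`;
}

const incoming = [
  'data: {"response":"Hello","done":false}',
  'data: {"done":true,"model":"example.gguf","tokenCount":42}',
  'data: [DONE]',
];
const outgoing = incoming.map(translateLine).filter(Boolean).map(toSse);
console.log(outgoing.join(''));
```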
## Models Manifest ## Models Manifest
The `/models` endpoint reads a `models.json` file from disk at the path `GET /models` reads `models.json` fresh on each request from
specified by `MODELS_MANIFEST_PATH`. The file lives on the main PC alongside `MODELS_MANIFEST_PATH`. The file lives on the main PC alongside model files,
the model files, and is accessible to orchestration via a network share accessible via an SMB mount at `/mnt/nexus-models`.
mounted at `/mnt/nexus-models`.
The manifest is read fresh on each request — no restart needed when models
are added or removed.
**models.json format:**
```json ```json
[ [
{ "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" } { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
] ]
``` ```
- `value` must match the model name as reported by `llama-server` (including `.gguf` extension) `value` must match the model name as reported by `llama-server` (including
- `label` — display name shown in the UI `.gguf` extension). No service restart needed when models are added or removed.
## Endpoints ## Sessions Route Behaviour
### Health `PATCH /sessions/:sessionId` accepts either `name`, `projectId`, or both.
The validation guard only rejects requests where neither is provided:
| Method | Path | Description | ```js
|---|---|---| if (!name?.trim() && projectId === undefined) {
| GET | /health | Service health check — reports downstream service URLs | return res.status(400).json({ error: 'name or projectId is required' });
### Chat
| Method | Path | Description |
|---|---|---|
| POST | /chat | Send a message and receive a complete response |
| POST | /chat/stream | Send a message and receive a streaming SSE response |
### Sessions
| Method | Path | Description |
|---|---|---|
| GET | /sessions | Get paginated list of all sessions |
| GET | /sessions/:sessionId/history | Get paginated episode history for a session |
| PATCH | /sessions/:sessionId | Rename a session |
| DELETE | /sessions/:sessionId | Delete a session and all its episodes |
### Projects
Projects are proxied directly from the memory service with no transformation.
| Method | Path | Description |
|---|---|---|
| GET | /projects | Get all projects |
| POST | /projects | Create a new project |
| PATCH | /projects/:id | Update a project |
| DELETE | /projects/:id | Delete a project |
### Models
| Method | Path | Description |
|---|---|---|
| GET | /models | Get list of available models from manifest file |
---
**POST /chat**
Request body:
```json
{
"sessionId": "your-session-uuid",
"message": "Hello, my name is Tim.",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"temperature": 0.7
} }
``` ```
`model` and `temperature` are optional — fall back to inference service defaults This allows `useChat` to write project assignment separately from rename
if omitted. operations.
Response:
```json
{
"sessionId": "your-session-uuid",
"response": "Hello Tim! How can I help you today?",
"model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
"tokenCount": 87
}
```
---
**POST /chat/stream**
Same request body as `POST /chat`.
Response is a stream of Server-Sent Events:
```
data: {"text":"Hello"}
data: {"text":" Tim"}
data: {"done":true,"model":"gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf","tokenCount":87}
```
---
**PATCH /sessions/:sessionId**
Request body:
```json
{ "name": "My Renamed Session" }
```
Returns the updated session object. `name` is required and trimmed of whitespace.
---
**DELETE /sessions/:sessionId**
Returns `204 No Content`. Cascades to delete all episodes for the session.
---
**GET /sessions/:sessionId/history**
Query parameters:
| Parameter | Default | Description |
|---|---|---|
| limit | 20 | Maximum number of episodes to return |
| offset | 0 | Number of episodes to skip (for pagination) |
Response:
```json
{
"sessionId": "your-session-uuid",
"episodes": [
{
"id": 42,
"session_id": 1,
"user_message": "Hello, my name is Tim.",
"ai_response": "Hello Tim! How can I help you today?",
"token_count": 87,
"created_at": 1712345678,
"metadata": null
}
]
}
```
Episodes are ordered newest first.
---
**GET /models**
Returns the parsed contents of `models.json`:
```json
[
{ "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
]
```
Returns `500` if the manifest file cannot be read or parsed.
## Caddy Configuration ## Caddy Configuration
The Caddy reverse proxy on Mini PC 2 must have a handle block for each route Each route prefix needs a handle block in the Caddyfile on Mini PC 2:
prefix the client needs to reach. Current required blocks:
``` ```
handle /chat* { handle /chat* { reverse_proxy localhost:4000 }
reverse_proxy localhost:4000 handle /sessions* { reverse_proxy localhost:4000 }
} handle /models* { reverse_proxy localhost:4000 }
handle /sessions* { handle /projects* { reverse_proxy localhost:4000 }
reverse_proxy localhost:4000
}
handle /models* {
reverse_proxy localhost:4000
}
handle /projects* {
reverse_proxy localhost:4000
}
``` ```
When adding new top-level routes to the orchestration service, add a matching After updating: `caddy reload --config /path/to/Caddyfile`
block here and reload Caddy: `caddy reload --config /path/to/Caddyfile`
For all HTTP endpoints, see `api-routes.md`.