Compare commits
43 Commits
588e8395f8
...
main
| SHA1 |
|---|
| e4908193bd |
| b58a4e4692 |
| 055683424d |
| 27ad614130 |
| 8ade5c68ca |
| 49982a85de |
| 9c6c5c9a42 |
| c9cbac87ac |
| 1a97b19280 |
| 9fe8e568cf |
| 5ad01c6ad8 |
| aac0923351 |
| 54218894c0 |
| 66a95f4479 |
| 78476e166f |
| 696ead29f8 |
| 45db47a584 |
| 095c9a623e |
| f5011fddca |
| 86e78cc4c6 |
| c86b565eed |
| be1c38b654 |
| 4f3b18de08 |
| 43fa12899c |
| 84f01ef209 |
| a50a748bcf |
| 32e8a83233 |
| 855de6d0af |
| fcaf0e651f |
| 6cdee72af2 |
| 4c6bd1df2d |
| 2429fedb2c |
| bdc5947fcb |
| 785047a824 |
| acda21317b |
| 32365e67f4 |
| 59918d5733 |
| 01f35b7b82 |
| 21a7e5f3b5 |
| c81a1cb20e |
| 781bf8a615 |
| b44d35e7cb |
| 22686fca3c |
1
.gitignore
vendored
@@ -5,4 +5,5 @@ data/
 .env
 .env.*
 *.db
+.claude/settings.local.json
 EOF
2
.vscode/settings.json
vendored
@@ -1,2 +0,0 @@
-{
-}
108
CLAUDE.md
Normal file
@@ -0,0 +1,108 @@
|
|||||||
|
# CLAUDE.md
|
||||||
|
|
||||||
|
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||||
|
|
||||||
|
## Development Commands
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Start individual services
|
||||||
|
npm run memory # Memory Service (port 3002)
|
||||||
|
npm run embedding # Embedding Service (port 3003)
|
||||||
|
npm run inference # Inference Service (port 3001)
|
||||||
|
npm run orchestration # Orchestration Service (port 4000)
|
||||||
|
npm run mini1 # Start memory + embedding concurrently
|
||||||
|
|
||||||
|
# Per-service dev mode (with --watch)
|
||||||
|
npm -w packages/<service-name> run dev
|
||||||
|
|
||||||
|
# Chat client
|
||||||
|
npm -w packages/chat-client run dev # Vite dev server (port 5173)
|
||||||
|
npm -w packages/chat-client run build # Production build
|
||||||
|
```
|
||||||
|
|
||||||
|
No test framework or linter is configured.
|
||||||
|
|
||||||
|
## Architecture Overview
|
||||||
|
|
||||||
|
NexusAI is a **modular AI assistant** with persistent, project-scoped memory. It's a Node.js monorepo (`npm workspaces`) with 4 independent backend services, 1 React frontend, and 1 shared package.
|
||||||
|
|
||||||
|
### Services
|
||||||
|
|
||||||
|
| Package | Port | Role |
|
||||||
|
|---|---|---|
|
||||||
|
| `orchestration-service` | 4000 | Central gateway; coordinates all others |
|
||||||
|
| `memory-service` | 3002 | SQLite + Qdrant hybrid memory |
|
||||||
|
| `embedding-service` | 3003 | Text embeddings via Ollama (`nomic-embed-text`, 768-dim) |
|
||||||
|
| `inference-service` | 3001 | LLM inference (Ollama or llama.cpp) |
|
||||||
|
| `chat-client` | 5173 | React/Vite frontend |
|
||||||
|
| `shared` | — | Constants, env helpers, logger, formatters |
|
||||||
|
|
||||||
|
All inter-service communication is **REST HTTP only** — no message queues or WebSockets.
|
||||||
|
|
||||||
|
### Chat Request Flow
|
||||||
|
|
||||||
|
1. Client POSTs to orchestration `/chat/stream`
|
||||||
|
2. Orchestration resolves session, fetches **recent episodes** (SQLite) + **semantic episodes** (Qdrant vector search) + **entities** (Qdrant, scoped by project)
|
||||||
|
3. Embedding computed for user message (embedding-service)
|
||||||
|
4. Prompt assembled: system message → entities → semantic memories → recent episodes → user message
|
||||||
|
5. Inference streams response (inference-service)
|
||||||
|
6. Episode stored in SQLite + Qdrant (fire-and-forget embedding)
|
||||||
|
7. Entity extraction triggered async (qwen2.5:3b via inference-service)
|
||||||
|
8. Auto-summarization checked (threshold: 200+ tokens, re-triggers every 5 episodes)
|
||||||
|
9. Auto-naming on first message (temp 0.3, 20 tokens max)
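The ordering in step 4 can be sketched as a small pure function. This is only an illustration of the documented section order, not the code in `chat/index.js`: the name `assemblePrompt`, the input field names, and the section labels are all assumptions.

```js
// Hypothetical sketch of step 4: join prompt sections in the documented order
// (system message, entities, semantic memories, recent episodes, user message).
// Field names and section labels are assumptions made for this example.
function assemblePrompt({
  systemPrompt,
  entities = [],
  semanticEpisodes = [],
  recentEpisodes = [],
  userMessage,
}) {
  const sections = [systemPrompt];

  if (entities.length > 0) {
    sections.push(
      'Known entities:\n' + entities.map(e => `- ${e.name} (${e.type})`).join('\n')
    );
  }
  if (semanticEpisodes.length > 0) {
    sections.push(
      'Relevant memories:\n' + semanticEpisodes.map(ep => `- ${ep.text}`).join('\n')
    );
  }
  if (recentEpisodes.length > 0) {
    sections.push(
      'Recent conversation:\n' + recentEpisodes.map(ep => `- ${ep.text}`).join('\n')
    );
  }

  // The user message always comes last.
  sections.push(`User: ${userMessage}`);
  return sections.join('\n\n');
}
```

Empty memory layers are skipped entirely, so a brand-new session produces only the system message followed by the user message.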
|
||||||
|
|
||||||
|
### Memory Model
|
||||||
|
|
||||||
|
**Dual store — neither works alone:**
|
||||||
|
- **SQLite** (`better-sqlite3`, synchronous) — Full content: sessions, episodes, entities, relationships, projects, summaries, FTS5 index
|
||||||
|
- **Qdrant** — Vector embeddings for semantic search; IDs used to fetch full content from SQLite afterward
|
||||||
|
|
||||||
|
Orchestration queries Qdrant directly (bypasses memory-service) for performance, then fetches full episode content from memory-service by ID.
|
||||||
|
|
||||||
|
**Project-scoped isolation:** Sessions grouped into projects; Qdrant queries use `should` filter on session IDs to enforce memory boundaries. Non-project sessions share a common pool.
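A `should` filter of this kind can be sketched as follows. The filter shape (`should` with `key` / `match.value` conditions) follows Qdrant's REST filter syntax, but the payload key `session_id` and both function names are assumptions, not taken from the repository.

```js
// Hypothetical sketch: build a Qdrant filter that matches points belonging to
// any of the given sessions. The payload key `session_id` is an assumption.
function buildSessionFilter(sessionIds) {
  if (!Array.isArray(sessionIds) || sessionIds.length === 0) {
    // Refuse to build an unscoped query rather than silently searching everything.
    throw new Error('At least one session ID is required to scope the search');
  }
  return {
    should: sessionIds.map(id => ({ key: 'session_id', match: { value: id } })),
  };
}

// Hypothetical search body for POST /collections/<name>/points/search.
function buildSearchBody(vector, sessionIds, { limit = 5, scoreThreshold = 0 } = {}) {
  return {
    vector,
    limit,
    score_threshold: scoreThreshold,
    with_payload: true,
    filter: buildSessionFilter(sessionIds),
  };
}
```

Throwing on an empty ID list is a design choice for this sketch: it keeps a bug in session resolution from turning into a search across every project's memory.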
|
||||||
|
|
||||||
|
### Key File Locations
|
||||||
|
|
||||||
|
**Orchestration** (`packages/orchestration-service/src/`):
|
||||||
|
- `chat/index.js` — Core prompt building and memory assembly
|
||||||
|
- `routes/` — HTTP endpoints: chat, sessions, projects, episodes, models, settings, summaries
|
||||||
|
- `services/` — Thin HTTP clients for memory, embedding, inference, and direct Qdrant access
|
||||||
|
- `config/settings.js` — Loads/saves `data/settings.json` (user-tunable: model params, thresholds, system prompt)
|
||||||
|
|
||||||
|
**Memory** (`packages/memory-service/src/`):
|
||||||
|
- `db/schema.js` — SQLite table definitions (source of truth for data model)
|
||||||
|
- `episodic/` — Episode CRUD
|
||||||
|
- `semantic/` — Qdrant operations
|
||||||
|
- `entities/` — Entity extraction + CRUD
|
||||||
|
- `summarization/` — Project summary generation
|
||||||
|
|
||||||
|
**Shared** (`packages/shared/src/`):
|
||||||
|
- `config/constants.js` — All tunables (ports, thresholds, model names, vector size)
|
||||||
|
- `config/env.js` — `getEnv()` helper with fallback to constants
|
||||||
|
- `utils.js` — `parseRow()`, `formatEpisodeText()`, `logger`
|
||||||
|
|
||||||
|
**Frontend** (`packages/chat-client/src/`):
|
||||||
|
- `App.jsx` — View router and top-level state (views: home, chat, all-chats, all-projects, project, memory, summaries, settings)
|
||||||
|
- `hooks/` — `useChat`, `useSession`, `useModels`, `useProjects`, `useSettings`, `useContextMenu`
|
||||||
|
- `api/orchestration.js` — Fetch wrapper for all API calls
|
||||||
|
- Vite proxy points to `192.168.0.205:4000` (Mini PC 2 / orchestration)
|
||||||
|
|
||||||
|
### Configuration
|
||||||
|
|
||||||
|
Each service uses `.env` via `dotenv`, falling back to `packages/shared/src/config/constants.js`. The orchestration service also serves `data/settings.json` to the frontend via `/settings` — this is the single source of truth for user-facing inference parameters and system prompt.
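The fallback order (environment first, shared constants second) might look roughly like this. It is a sketch under assumptions: the real `getEnv()` signature is not shown in this document, and the key names `MEMORY_PORT` and `EMBEDDING_PORT` are hypothetical; only the port values come from the services table.

```js
// Hypothetical defaults; the real values live in
// packages/shared/src/config/constants.js and the key names here are assumptions.
const DEFAULTS = {
  MEMORY_PORT: '3002',
  EMBEDDING_PORT: '3003',
};

// Sketch of an env lookup with fallback to shared constants.
// `env` and `defaults` are injectable so the function is easy to test.
function getEnvWithFallback(key, env = process.env, defaults = DEFAULTS) {
  const value = env[key];
  if (value !== undefined && value !== '') return value;
  if (Object.prototype.hasOwnProperty.call(defaults, key)) return defaults[key];
  throw new Error(`Missing configuration value: ${key}`);
}
```

Failing loudly on an unknown key, rather than returning `undefined`, surfaces typos in variable names at startup instead of at request time.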
|
||||||
|
|
||||||
|
### Deployment
|
||||||
|
|
||||||
|
Home lab across 3 nodes, managed with Docker Compose:
|
||||||
|
- **Main PC** — RTX A4000 (inference via llama.cpp)
|
||||||
|
- **Mini PC 1** — memory + embedding services, Qdrant, Ollama
|
||||||
|
- **Mini PC 2** — orchestration + chat client, Caddy reverse proxy + Authelia SSO
|
||||||
|
|
||||||
|
Docker Compose files: `docker-compose.mini1.yml`, `docker-compose.mini2.yml`. All services expose `/health`. Deployment docs: `docs/deployment/homelab.md`.
|
||||||
|
|
||||||
|
## Key Development Principles
|
||||||
|
|
||||||
|
- **Layer-by-layer validation** — always build and test backend → orchestration → frontend in sequence, curl-testing each layer before proceeding
|
||||||
|
- **New orchestration routes require changes in four places**: route file, `orchestration-service/src/index.js`, Caddyfile on Mini PC 2 (`192.168.0.205`), and `vite.config.js` in the chat client
|
||||||
|
- **All services read settings on every request** — no restart required for config changes
|
||||||
|
- **Backend-first development** — data layer → service endpoints → orchestration proxy → frontend
|
||||||
@@ -73,8 +73,8 @@ service by ID after the vector search.
|
|||||||
|
|
||||||
The core four-service architecture is complete and operational. Key capabilities:
|
The core four-service architecture is complete and operational. Key capabilities:
|
||||||
|
|
||||||
- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
|
- **Retrieval fusion** — Reciprocal Rank Fusion (RRF) merges semantic (Qdrant vector search) and keyword (SQLite FTS5) episode retrieval into a single ranked result set. Weights are configurable per strategy via settings; keyword search is off by default (`keywordWeight: 0`) and can be enabled without a service restart
|
||||||
- **Entity layer** — automatic extraction of named entities from conversations via qwen2.5:3b, stored in SQLite and Qdrant, injected into every prompt as structured knowledge
|
- **Entity layer + Knowledge graph** — automatic extraction of named entities and relationships from conversations via qwen2.5:3b. Entities and relationships are stored in SQLite with `mention_count` tracking. A graph traversal layer expands Qdrant entity search hits into a 1-hop neighborhood subgraph, injecting structured connected knowledge into every prompt
|
||||||
- **Projects** — sessions grouped with shared or isolated memory pools
|
- **Projects** — sessions grouped with shared or isolated memory pools
|
||||||
- **Auto-naming** — sessions named automatically from first exchange via inference
|
- **Auto-naming** — sessions named automatically from first exchange via inference
|
||||||
- **Project-scoped semantic search** — Qdrant filtered by project session IDs
|
- **Project-scoped semantic search** — Qdrant filtered by project session IDs
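Weighted Reciprocal Rank Fusion, as named in the retrieval fusion bullet, can be sketched in a few lines. Each strategy contributes `weight / (k + rank)` for every episode it returns, and the contributions are summed per episode. The function name, the input shape, and the constant `k = 60` (the value conventionally used for RRF) are assumptions; the document does not state which constant NexusAI uses.

```js
// Sketch of weighted Reciprocal Rank Fusion.
// rankings: [{ ids: [...best first...], weight: number }, ...]
// k = 60 is the conventional RRF constant; its use here is an assumption.
function fuseRankings(rankings, k = 60) {
  const scores = new Map();

  for (const { ids, weight } of rankings) {
    if (weight <= 0) continue; // a weight of 0 disables the strategy entirely
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) || 0) + weight / (k + rank));
    });
  }

  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}
```

With `keywordWeight: 0` the keyword list is skipped, so the fused order is exactly the semantic order, which matches the "off by default" behaviour described above.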
|
||||||
|
|||||||
@@ -120,6 +120,38 @@ all projects use isolated memory. Returns `201` with the created project object.
|
|||||||
|
|
||||||
Only provided fields are updated — omitted fields are not touched.
|
Only provided fields are updated — omitted fields are not touched.
|
||||||
|
|
||||||
|
### Summaries
|
||||||
|
|
||||||
|
| Method | Path | Description |
|
||||||
|
|---|---|---|
|
||||||
|
| GET | /summaries/session/:sessionId | Get all summaries for a session (by external UUID) |
|
||||||
|
| GET | /summaries/project/:projectId | Get all summaries for a project |
|
||||||
|
|
||||||
|
**GET /summaries/session/:sessionId** — resolves the external UUID to an
|
||||||
|
internal session ID, then fetches summaries from the memory service.
|
||||||
|
Returns an array of summary objects ordered by `created_at` ascending.
|
||||||
|
|
||||||
|
**GET /summaries/project/:projectId** — proxies directly to the memory
|
||||||
|
service project summaries endpoint.
|
||||||
|
|
||||||
|
**Summary object shape:**
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"id": 8,
|
||||||
|
"session_id": 72,
|
||||||
|
"project_id": null,
|
||||||
|
"content": "The user asked about...",
|
||||||
|
"token_count": 579,
|
||||||
|
"episode_range": "246-251",
|
||||||
|
"created_at": 1776766518,
|
||||||
|
"updated_at": 1776766518
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
> **Proxy requirement:** `/summaries` must be added to both the Caddyfile
|
||||||
|
> reverse proxy and the Vite dev proxy config alongside the other route
|
||||||
|
> prefixes. See `orchestration-service.md` for the Caddy block pattern.
|
||||||
|
|
||||||
### Models
|
### Models
|
||||||
|
|
||||||
| Method | Path | Description |
|
| Method | Path | Description |
|
||||||
@@ -170,7 +202,9 @@ Returns `503` if llama-server is unreachable.
|
|||||||
|---|---|---|---|
|
|---|---|---|---|
|
||||||
| `recentEpisodeLimit` | integer | 1–20 | Recent episodes injected into prompt |
|
| `recentEpisodeLimit` | integer | 1–20 | Recent episodes injected into prompt |
|
||||||
| `semanticLimit` | integer | 1–20 | Max semantic search results |
|
| `semanticLimit` | integer | 1–20 | Max semantic search results |
|
||||||
| `scoreThreshold` | float | 0–1 | Minimum similarity score |
|
| `scoreThreshold` | float | 0–1 | Minimum similarity score for Qdrant results |
|
||||||
|
| `semanticWeight` | float | 0–5 | RRF weight for Qdrant semantic results |
|
||||||
|
| `keywordWeight` | float | 0–5 | RRF weight for FTS5 keyword results (`0` = disabled) |
|
||||||
| `modelsFolderPath` | string | — | Path to folder containing .gguf files |
|
| `modelsFolderPath` | string | — | Path to folder containing .gguf files |
|
||||||
| `temperature` | float | 0–2 | Inference randomness |
|
| `temperature` | float | 0–2 | Inference randomness |
|
||||||
| `repeatPenalty` | float | 1–2 | Repeat token penalty |
|
| `repeatPenalty` | float | 1–2 | Repeat token penalty |
|
||||||
@@ -269,6 +303,29 @@ Both fields are optional. Only provided fields are updated.
|
|||||||
|
|
||||||
Same request/response shape as orchestration `/projects` above.
|
Same request/response shape as orchestration `/projects` above.
|
||||||
|
|
||||||
|
### Summaries
|
||||||
|
|
||||||
|
| Method | Path | Description |
|
||||||
|
|---|---|---|
|
||||||
|
| POST | /summaries | Create a new summary |
|
||||||
|
| GET | /sessions/:id/summaries | Get all summaries for a session (internal ID) |
|
||||||
|
| GET | /projects/:id/summaries | Get all summaries for a project |
|
||||||
|
| PATCH | /summaries/:id | Update a summary (content, tokenCount, episodeRange) |
|
||||||
|
| DELETE | /summaries/:id | Delete a summary |
|
||||||
|
|
||||||
|
**POST /summaries — body:**
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"sessionId": 72,
|
||||||
|
"content": "The user discussed...",
|
||||||
|
"tokenCount": 579,
|
||||||
|
"episodeRange": "246-251"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
`content` is required. Either `sessionId` or `projectId` is required.
|
||||||
|
|
||||||
|
**PATCH /summaries/:id — body:** any subset of `content`, `tokenCount`, `episodeRange`.
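The body rules for `POST /summaries` and `PATCH /summaries/:id` can be expressed as two small helpers. This is a sketch of the documented rules only; the function names and the error message wording are assumptions, and the actual route handlers may validate differently.

```js
// Sketch of the POST /summaries body rules: `content` is required, and at
// least one of `sessionId` / `projectId` must be present.
function validateSummaryBody(body) {
  const errors = [];
  if (typeof body.content !== 'string' || body.content.trim() === '') {
    errors.push('content is required');
  }
  if (body.sessionId == null && body.projectId == null) {
    errors.push('either sessionId or projectId is required');
  }
  return { valid: errors.length === 0, errors };
}

// Sketch of the PATCH /summaries/:id rule: only these fields may be updated,
// and omitted fields are left untouched.
const PATCHABLE_FIELDS = ['content', 'tokenCount', 'episodeRange'];

function pickSummaryUpdates(body) {
  const updates = {};
  for (const field of PATCHABLE_FIELDS) {
    if (body[field] !== undefined) updates[field] = body[field];
  }
  return updates;
}
```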
|
||||||
|
|
||||||
### Entities
|
### Entities
|
||||||
|
|
||||||
| Method | Path | Description |
|
| Method | Path | Description |
|
||||||
@@ -305,13 +362,34 @@ Same request/response shape as orchestration `/projects` above.
|
|||||||
|
|
||||||
**DELETE /relationships — body:**
|
**DELETE /relationships — body:**
|
||||||
```json
|
```json
|
||||||
{ "fromId": 1, "toId": 2, "label": "uses" }
|
{ "fromId": 1, "toId": 2, "label": "works_on", "notes": "Alice is the primary developer.", "metadata": {} }
|
||||||
```
|
```
|
||||||
|
`notes` is optional. `label` should be a snake_case verb. A relationship is identified by the composite key `(fromId, toId, label)`; re-submitting with the same key increments `mention_count` and preserves the existing notes if the new value is `null`.
|
||||||
|
|
||||||
Relationships are identified by the composite key `(fromId, toId, label)`.
|
Relationships are identified by the composite key `(fromId, toId, label)`.
|
||||||
Delete uses request body rather than URL params since this three-part key
|
Delete uses request body rather than URL params since this three-part key
|
||||||
is awkward to encode in a path.
|
is awkward to encode in a path.
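The composite-key behaviour can be illustrated with an in-memory store. This is a sketch only: the real service persists relationships in SQLite, and the function name, the `Map`-based store, and the `mentionCount` field name are assumptions made for the example.

```js
// Sketch of upsert semantics for a relationship keyed by (fromId, toId, label).
// Uses an in-memory Map in place of the SQLite table.
function upsertRelationship(store, { fromId, toId, label, notes = null }) {
  const key = `${fromId}:${toId}:${label}`;
  const existing = store.get(key);

  if (existing) {
    existing.mentionCount += 1;
    // Keep the previous notes when the new submission carries none.
    if (notes !== null) existing.notes = notes;
    return existing;
  }

  const created = { fromId, toId, label, notes, mentionCount: 1 };
  store.set(key, created);
  return created;
}
```

Because `label` is part of the key, two different relationships between the same pair of entities (for example `works_on` and `created_by`) are stored as separate rows.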
|
||||||
|
|
||||||
|
### Graph
|
||||||
|
|
||||||
|
| Method | Path | Description |
|
||||||
|
|---|---|---|
|
||||||
|
| GET | /graph/neighborhood/:entityId | Entity neighborhood — nodes + edges within N hops |
|
||||||
|
| POST | /graph/neighbors | Bulk 1-hop neighborhood for a set of entity IDs |
|
||||||
|
|
||||||
|
**GET /graph/neighborhood/:entityId — query params:**
|
||||||
|
|
||||||
|
| Param | Default | Max | Description |
|
||||||
|
|---|---|---|---|
|
||||||
|
| depth | 1 | 3 | Traversal depth |
|
||||||
|
|
||||||
|
Returns `{ entity, neighborhood: { nodes, edges } }`. Returns `404` if entity not found.
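A depth-limited traversal of this kind can be sketched as a breadth-first search over an edge list. The function names and the in-memory edge array are assumptions: the real service presumably queries the SQLite relationships table, and whether it treats edges as undirected (as this sketch does) is not stated here.

```js
// Clamp the `depth` query param to the documented default (1) and max (3).
function clampDepth(raw, { defaultDepth = 1, maxDepth = 3 } = {}) {
  const n = Number.parseInt(raw, 10);
  if (Number.isNaN(n) || n < 1) return defaultDepth;
  return Math.min(n, maxDepth);
}

// Breadth-first expansion from startId, following edges in either direction.
// edges: [{ fromId, toId, label }, ...]
function neighborhood(edges, startId, depth) {
  const visited = new Set([startId]);
  const collected = [];
  let frontier = [startId];

  for (let hop = 0; hop < depth && frontier.length > 0; hop++) {
    const next = [];
    for (const edge of edges) {
      const touches = frontier.includes(edge.fromId) || frontier.includes(edge.toId);
      if (!touches) continue;
      if (!collected.includes(edge)) collected.push(edge);
      for (const id of [edge.fromId, edge.toId]) {
        if (!visited.has(id)) {
          visited.add(id);
          next.push(id); // newly discovered nodes form the next hop's frontier
        }
      }
    }
    frontier = next;
  }

  return { nodes: [...visited], edges: collected };
}
```

Capping the depth keeps the subgraph small: each extra hop can multiply the number of nodes, and all of them may end up in the prompt.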
|
||||||
|
|
||||||
|
**POST /graph/neighbors — body:**
|
||||||
|
```json
{ "entityIds": [5, 8, 12] }
```

Returns `{ nodes: [...], edges: [...] }`. Used internally by orchestration; not a client-facing endpoint.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Embedding Service — port 3003
|
## Embedding Service — port 3003
|
||||||
|
|||||||
228
docs/roadmap.md
Normal file
@@ -0,0 +1,228 @@
|
|||||||
|
# NexusAI — Master Roadmap
|
||||||
|
|
||||||
|
> A modular, memory-centric AI assistant and personal second brain.
|
||||||
|
> Built on Node.js, React/Vite, SQLite, Qdrant, and llama.cpp.
|
||||||
|
> Repo: `https://gitea.jellystorm.com/storme/nexusAI`
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Current State (Completed)
|
||||||
|
|
||||||
|
### Backend — Core Four Services
|
||||||
|
- ✅ **Shared package** — `getEnv`, constants (`QDRANT`, `COLLECTIONS`, `EPISODIC`, `SERVICES`)
|
||||||
|
- ✅ **Memory service** (port 3002, Mini PC 1) — SQLite schema (sessions, episodes, entities, relationships, summaries), FTS5 search, full CRUD endpoints, Qdrant semantic layer (3 collections), embedding write path
|
||||||
|
- ✅ **Embedding service** (port 3003, Mini PC 1) — `nomic-embed-text` via Ollama, 768-dim vectors, `/embed` and `/embed/batch`
|
||||||
|
- ✅ **Inference service** (port 3001, Main PC) — provider pattern (`INFERENCE_PROVIDER`), llama.cpp provider, `/complete` and `/complete/stream` (SSE)
|
||||||
|
- ✅ **Orchestration service** (port 4000, Mini PC 2) — `/chat` and `/chat/stream`, session auto-create, dual-layer context assembly (recency + semantic), episode write-back
|
||||||
|
|
||||||
|
### Memory System
|
||||||
|
- ✅ Episodic memory — full conversation history in SQLite
|
||||||
|
- ✅ Semantic memory — Qdrant vector search across episodes and entities
|
||||||
|
- ✅ Entity extraction — background inference pass after each episode (qwen2.5:3b via Ollama)
|
||||||
|
- ✅ Automatic summarization — triggered at context threshold, cumulative summary updates
|
||||||
|
- ✅ Project memory isolation — project sessions fully isolated from each other and from non-project sessions
|
||||||
|
|
||||||
|
### Chat Client
|
||||||
|
- ✅ React/Vite frontend served via Caddy
|
||||||
|
- ✅ Sidebar navigation — recent chats, projects, settings
|
||||||
|
- ✅ Project management — CRUD, colour coding, isolated flag, ProjectView
|
||||||
|
- ✅ Session management — auto-naming, project assignment, SessionModal
|
||||||
|
- ✅ Streaming chat interface — SSE token-by-token rendering
|
||||||
|
- ✅ Memory viewer — episode browsing, deletion, health panel
|
||||||
|
- ✅ Settings panel — models section, configuration
|
||||||
|
|
||||||
|
### Infrastructure
|
||||||
|
- ✅ Caddy reverse proxy with Authelia SSO
|
||||||
|
- ✅ Prometheus + Grafana monitoring (VRAM, CPU, RAM)
|
||||||
|
- ✅ npm workspaces monorepo
|
||||||
|
- ✅ Gitea self-hosted repo
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 1 — Loose Ends & Stability - COMPLETE ✅
|
||||||
|
*Target: Next development session (Saturday)*
|
||||||
|
|
||||||
|
### Bug Fixes
|
||||||
|
✅ **Entity extraction JSON parsing** — robustify response parser in `extraction.js` to handle model returning markdown fences or preamble around JSON
|
||||||
|
✅ **Qdrant entity search empty results** — verify entities embedded post-isolation-fix are surfacing correctly in project session searches
|
||||||
|
|
||||||
|
### Tech Debt
|
||||||
|
✅ **Logging** — introduce `LOG_LEVEL` env var across all services; reduce noise in production
|
||||||
|
✅ **Error response consistency** — audit all endpoints for uniform `{ error, detail }` shape
|
||||||
|
✅ **Constants audit** — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
|
||||||
|
✅ **Orchestration `chat/index.js` review** — extract any logic that has grown beyond its intended scope into dedicated modules
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 2 — Memory System Upgrades
|
||||||
|
*The core intelligence layer*
|
||||||
|
|
||||||
|
### 1. Knowledge Graph (SQLite) ✅
|
||||||
|
The highest-leverage memory upgrade. Transforms NexusAI from "remembers conversations" to "understands relationships between things."
|
||||||
|
- [x] Graph schema — `nodes` and `edges` tables with typed relationships
|
||||||
|
- [x] Entity → node promotion pipeline (`mention_count` tracked; threshold gating deferred to Phase 2)
|
||||||
|
- [x] Relationship traversal queries
|
||||||
|
- [x] Graph-aware context assembly in orchestration
|
||||||
|
|
||||||
|
### 2. Retrieval Fusion + Full-Text Search ✅
|
||||||
|
Multi-strategy retrieval merged into a single ranked result set.
|
||||||
|
- [x] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
|
||||||
|
- [x] Configurable weights per retrieval strategy (`semanticWeight`, `keywordWeight` via `PATCH /settings`)
|
||||||
|
- [x] Score threshold retained per-strategy; FTS scoped to session/project sessions; `keywordWeight: 0` default (disabled until tuned)
|
||||||
|
|
||||||
|
### 3. Memory Consolidation Lifecycle
|
||||||
|
Prevents long-term memory degradation and enables compression.
|
||||||
|
- [ ] Episode aging — score/weight episodes by recency and access frequency
|
||||||
|
- [ ] Consolidation pass — merge related low-weight episodes into summary nodes
|
||||||
|
- [ ] Orphan cleanup — remove entities no longer referenced by active episodes
|
||||||
|
|
||||||
|
### 4. User Preference Model
|
||||||
|
Automatically maintained profile injected into every system prompt.
|
||||||
|
- [ ] Preference schema — communication style, interests, known facts, tone preferences
|
||||||
|
- [ ] Auto-update from conversation history
|
||||||
|
- [ ] Manual override / review UI
|
||||||
|
|
||||||
|
### 5. Confidence-Based Routing *(inspired by acid2lake)*
|
||||||
|
Short-circuit simple requests before they reach the LLM.
|
||||||
|
- [ ] Intent classifier in orchestration — categorise incoming messages
|
||||||
|
- [ ] Confidence bands — FAST PATH (memory lookup only) vs FULL (LLM + context)
|
||||||
|
- [ ] Fast-path handlers — direct memory queries, session lookups, factual recalls
|
||||||
|
|
||||||
|
### 6. Smarter Context Assembly *(inspired by acid2lake)*
|
||||||
|
Budget-aware context selection instead of dumping all relevant memory into the prompt.
|
||||||
|
- [ ] Token budget manager in orchestration
|
||||||
|
- [ ] Priority scoring — recency × relevance × entity weight
|
||||||
|
- [ ] Configurable context budget via env var
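Since this item is still unimplemented, the following is only one possible shape for it: a greedy selection that ranks candidates by the `recency × relevance × entity weight` product and packs them into a token budget. Every name and field here (`selectWithinBudget`, `tokens`, `recency`, `relevance`, `entityWeight`) is hypothetical.

```js
// Hypothetical priority score: the product named in the roadmap item.
function priority(candidate) {
  return candidate.recency * candidate.relevance * candidate.entityWeight;
}

// Greedy sketch: take candidates in priority order, skipping any that
// would push the running total past the budget.
function selectWithinBudget(candidates, budgetTokens) {
  const ranked = [...candidates].sort((a, b) => priority(b) - priority(a));
  const chosen = [];
  let used = 0;

  for (const candidate of ranked) {
    if (used + candidate.tokens > budgetTokens) continue;
    chosen.push(candidate);
    used += candidate.tokens;
  }
  return { chosen, used };
}
```

Greedy packing is not optimal in general, but it is predictable and cheap, which may matter more than squeezing the last few tokens out of the budget.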
|
||||||
|
|
||||||
|
### 7. Procedural Memory Store *(inspired by acid2lake)*
|
||||||
|
Learns "how NexusAI has successfully handled this type of request before."
|
||||||
|
- [ ] Procedural memory schema — trigger pattern, steps, success count, confidence
|
||||||
|
- [ ] Auto-population from successful interaction traces
|
||||||
|
- [ ] Procedural context injection for matched request types
|
||||||
|
|
||||||
|
### 8. Reflection / Self-Summarization
|
||||||
|
NexusAI periodically reviews and synthesises its own memory.
|
||||||
|
- [ ] Scheduled reflection pass — background job, configurable interval
|
||||||
|
- [ ] Cross-session insight extraction
|
||||||
|
- [ ] Summary nodes written back to knowledge graph
|
||||||
|
- *Requires: Knowledge graph + consolidation lifecycle*
|
||||||
|
|
||||||
|
### 9. Proactive Agent Loop
|
||||||
|
The JARVIS moment — NexusAI reasons, plans, and acts across multiple steps.
|
||||||
|
- [ ] Tool calling framework in orchestration
|
||||||
|
- [ ] Built-in tools — memory search, entity lookup, summarize, web fetch
|
||||||
|
- [ ] Reasoning loop — think → act → observe → respond
|
||||||
|
- [ ] Agent mode toggle per session
|
||||||
|
- *Requires: All Phase 2 items above*
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 3 — Client Features
|
||||||
|
*Making the daily driver experience excellent*
|
||||||
|
|
||||||
|
### Core Chat Enhancements
|
||||||
|
- [ ] Message regeneration — re-roll last AI response
|
||||||
|
- [ ] Edit & resend — edit a previous message, clear subsequent history
|
||||||
|
- [ ] Copy message button — hover icon per message
|
||||||
|
- [ ] Message timestamps — subtle, toggleable
|
||||||
|
- [ ] Token count display — per-response usage indicator
|
||||||
|
|
||||||
|
### Memory Visibility
|
||||||
|
- [ ] **"What I remember" panel** — show which episodes/entities were injected into context
|
||||||
|
- [ ] Memory pinning — mark episodes as always-include
|
||||||
|
- [x] Session summary view — on-demand or auto-generated session summary
|
||||||
|
- [ ] Memory attribution — subtle indicator on responses that were memory-informed
|
||||||
|
|
||||||
|
### Session & Project Management
|
||||||
|
- [ ] Session search — full-text search across all sessions
|
||||||
|
- [ ] Session tagging — freeform tags beyond project assignment
|
||||||
|
- [ ] Session export — download as markdown or JSON
|
||||||
|
- [ ] Pinned sessions — pin frequently used sessions to sidebar top
|
||||||
|
- [ ] Bulk session actions — delete, move to project
|
||||||
|
|
||||||
|
### Model & Persona Controls *(high priority — circles back to companion origins)*
|
||||||
|
- [ ] Per-session model switching — override default model per session
|
||||||
|
- [x] System prompt editor — per-project custom prompts
|
||||||
|
- [ ] System prompt editor — per-session custom prompts
|
||||||
|
- [ ] Persona profiles — saved configurations (model + system prompt + temperature)
|
||||||
|
- Examples: "Daily Driver", "Creative Mode", "Concise Mode", "Coding Mode"
|
||||||
|
- [ ] Temperature / parameter sliders — collapsible panel for power users
|
||||||
|
|
||||||
|
### Second Brain Features
|
||||||
|
- [ ] **Quick capture** — minimal input to save a thought directly to memory without starting a chat
|
||||||
|
- [ ] **Knowledge graph visualiser** — interactive node/edge view of entities and relationships
|
||||||
|
- [ ] Memory search page — dedicated search UI across all episodes and entities
|
||||||
|
- [ ] Daily digest — generated summary of recent activity and learned facts
|
||||||
|
|
||||||
|
### Quality of Life
|
||||||
|
- [ ] Keyboard shortcuts — `Ctrl+K` command palette, `Ctrl+Enter` to send
|
||||||
|
- [ ] Dark/light theme toggle
|
||||||
|
- [ ] Mobile layout polish — collapsible sidebar, touch-friendly inputs
|
||||||
|
- [ ] Notification support — browser notifications for long completions
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 4 — Coding Copilot
|
||||||
|
*After core is feature-complete*
|
||||||
|
|
||||||
|
### Project Directory Awareness
|
||||||
|
- [ ] Directory watcher service — monitors a VS Code workspace for changes
|
||||||
|
- [ ] Symbol indexer — AST parsing via Tree-sitter, file → symbol map in SQLite
|
||||||
|
- [ ] Diagnostic indexer — compiler errors/warnings per file, triggered on save
|
||||||
|
- [ ] Maps to existing project isolation — coding project = NexusAI project with `indexedDirectory` flag
|
||||||
|
|
||||||
|
### Coding-Specific Memory
|
||||||
|
- [ ] Procedural patterns per language/framework — stored in procedural memory layer
|
||||||
|
- [ ] Skill compilation — successful coding solutions abstracted into reusable patterns
|
||||||
|
- [ ] Codebase semantic search — embed code chunks into Qdrant, search by intent
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Phase 5 — Stretch Goals
|
||||||
|
|
||||||
|
### Voice Layer
|
||||||
|
- [ ] TTS output — text-to-speech for AI responses
|
||||||
|
- [ ] STT input — speech-to-text for voice messages
|
||||||
|
- [ ] Hardware-dependent — deferred until appropriate hardware available
|
||||||
|
- *Architecturally clean addition — new input/output modality only*
|
||||||
|
|
||||||
|
### Homelab Enhancements
|
||||||
|
- [ ] Backup improvements — automated, verified backups of SQLite + Qdrant data
|
||||||
|
- [ ] Security hardening — network segmentation, service-level auth
|
||||||
|
- [ ] IP webcam integration
|
||||||
|
- [ ] Home Assistant integration
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Architecture Reference
|
||||||
|
|
||||||
|
### Services & Nodes
|
||||||
|
|
||||||
|
| Service | Host | Port | Role |
|
||||||
|
|---|---|---|---|
|
||||||
|
| Inference | Main PC `192.168.0.79` | 3001 | llama.cpp provider, `/complete`, `/complete/stream` |
|
||||||
|
| Memory | Mini PC 1 `192.168.0.81` | 3002 | SQLite, episode/entity/summary CRUD |
|
||||||
|
| Embedding | Mini PC 1 `192.168.0.81` | 3003 | nomic-embed-text via Ollama, vector generation |
|
||||||
|
| Qdrant | Mini PC 1 `192.168.0.81` | 6333 | Vector store — episodes, entities, summaries collections |
|
||||||
|
| Orchestration | Hub `192.168.0.205` | 4000 | Chat pipeline, context assembly, session management |
|
||||||
|
| Chat Client | Hub `192.168.0.205` | — | React/Vite, served via Caddy |
|
||||||
|
| Caddy + Authelia | Hub `192.168.0.205` | 443 | Reverse proxy, SSO |
|
||||||
|
|
||||||
|
### Primary Models
|
||||||
|
|
||||||
|
| Role | Model | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| Daily driver | Gemma 4 26B Claude Distill APEX I-Mini | `--reasoning off` flag critical |
|
||||||
|
| Creative/worldbuilding | Gemma 4 21B REAP Q5_K_M | |
|
||||||
|
| Coding | DeepSeek Coder V2 Lite Instruct Q6_K | |
|
||||||
|
| Background tasks | qwen2.5:3b via Ollama | Entity extraction, summarization |
|
||||||
|
|
||||||
|
### Key Design Principles
|
||||||
|
- **Layer-by-layer validation** — backend → orchestration → frontend, curl-test each layer
|
||||||
|
- **Fire-and-forget async** — embedding and entity extraction never block the chat response
|
||||||
|
- **All services read settings on every request** — no restart required for config changes
|
||||||
|
- **Backend-first development** — data layer → endpoints → orchestration proxy → frontend
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
*Last updated: April 2026*
|
||||||
140
docs/services/entity-extraction.md
Normal file
@@ -0,0 +1,140 @@
|
|||||||
|
# Entity Extraction
|
||||||
|
|
||||||
|
**Location:** `packages/memory-service/src/entities/extraction.js`
|
||||||
|
**Triggered by:** Episode creation (`POST /episodes`)
|
||||||
|
**Model:** `qwen2.5:3b` via Ollama (configurable via `EXTRACTION_MODEL` env var)
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
After each episode is saved to SQLite, the extraction pipeline runs
|
||||||
|
asynchronously in the background to identify named entities and the
|
||||||
|
relationships between them. Results are written back to SQLite and
|
||||||
|
embedded into Qdrant — the episode response is never delayed.
|
||||||
|
|
||||||
|
## Trigger
|
||||||
|
|
||||||
|
`createEpisode()` in `episodic/index.js` calls `extractAndStoreEntities()`
|
||||||
|
immediately after the SQLite insert, without awaiting it:
|
||||||
|
|
||||||
|
```js
|
||||||
|
extractAndStoreEntities(userMessage, aiResponse, episode.id, projectId)
|
||||||
|
.catch(err => logger.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
|
||||||
|
```
|
||||||
|
|
||||||
|
If extraction throws, the episode is unaffected — the error is logged and
|
||||||
|
swallowed.
|
||||||
|
|
||||||
|
## Model Settings
|
||||||
|
|
||||||
|
| Setting | Value | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| Model | `qwen2.5:3b` | Ollama, configurable via `EXTRACTION_MODEL` |
|
||||||
|
| Temperature | 0.1 | Low for consistent, deterministic output |
|
||||||
|
| `num_predict` | 1500 | Higher ceiling to accommodate entity + relationship JSON |
|
||||||
|
| `format` | `'json'` | Ollama constrained decoding — enforces valid JSON output |
|
||||||
|
| Prompt format | ChatML | `<\|im_start\|>` / `<\|im_end\|>` tokens |
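Taken together, these settings correspond to a request body along the following lines. The field names (`model`, `prompt`, `raw`, `stream`, `format`, `options.temperature`, `options.num_predict`) are those of Ollama's `/api/generate` endpoint, but the function name is hypothetical, and whether the service sets `raw` or disables streaming is an assumption.

```js
// Sketch of an Ollama /api/generate request body using the documented settings.
// `raw: true` and `stream: false` are assumptions, not confirmed by the source.
function buildExtractionRequest(
  prompt,
  model = process.env.EXTRACTION_MODEL || 'qwen2.5:3b'
) {
  return {
    model,
    prompt,
    raw: true,      // assumption: the prompt is already ChatML, so skip Ollama's template
    stream: false,  // assumption: wait for the full JSON rather than streaming tokens
    format: 'json', // constrained decoding, enforces valid JSON output
    options: {
      temperature: 0.1,  // low for consistent output
      num_predict: 1500, // ceiling for entity + relationship JSON
    },
  };
}
```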
|
||||||
|
|
||||||
|
## Prompt Structure
|
||||||
|
|
||||||
|
The prompt is built by `buildExtractionPrompt()`. It includes:
|
||||||
|
|
||||||
|
1. **System message** — declares the model's role as an entity and relationship extractor
|
||||||
|
2. **Instructions** — entity types, field rules, relationship label format, required JSON schema
|
||||||
|
3. **Known entities block** — last 20 entities from SQLite, by `rowid DESC`, used to encourage consistent name/type pairs across conversations
|
||||||
|
4. **Conversation** — the raw user message and AI response, delimited clearly
|
||||||
|
|
||||||
|
```
<|im_start|>system
You are a named entity and relationship extractor. You output only valid JSON.
<|im_end|>
<|im_start|>user
Read the conversation below and extract all named entities and the relationships between them.
Entity types: person, place, project, technology, concept, organization
...
Return this exact JSON structure:
{ "entities": [...], "relationships": [...] }

Already known entities (use these exact name and type values if the same entity appears):
- "NexusAI" (project)
- "Alice" (person)

--- CONVERSATION ---
User: ...
Assistant: ...
--- END CONVERSATION ---
<|im_end|>
<|im_start|>assistant
```
|
||||||
|
|
||||||
|
## Expected JSON Output
|
||||||
|
|
||||||
|
```json
{
  "entities": [
    { "name": "Alice", "type": "person", "notes": "Software engineer working on NexusAI." },
    { "name": "NexusAI", "type": "project", "notes": "A modular AI assistant with persistent memory." }
  ],
  "relationships": [
    {
      "from": "Alice", "fromType": "person",
      "to": "NexusAI", "toType": "project",
      "label": "works_on",
      "notes": "Alice is the primary developer."
    }
  ]
}
```
|
||||||
|
|
||||||
|
Relationship labels use **snake_case verbs** (e.g. `works_on`, `manages`, `uses`,
|
||||||
|
`knows`, `located_in`, `part_of`, `created_by`).
|
||||||
|
|
||||||
|
## JSON Parsing
|
||||||
|
|
||||||
|
The raw model response is matched with `/\{[\s\S]*\}/` before parsing — this
|
||||||
|
tolerates any preamble or trailing prose the model emits alongside the JSON.
|
||||||
|
If the match fails or `JSON.parse` throws, the function logs a warning and
|
||||||
|
returns without writing anything.
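
The match-then-parse step can be sketched as follows. This is a minimal illustration, not the project's code: `parseExtraction` is a hypothetical name, and returning `null` stands in for "log a warning and return".

```js
// Sketch only: parseExtraction is a hypothetical helper name.
function parseExtraction(raw) {
  // Greedy match: from the first "{" to the last "}" in the response.
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return null; // no JSON object found at all
  try {
    return JSON.parse(match[0]);
  } catch {
    return null; // malformed JSON: the caller would log a warning and write nothing
  }
}

const ok = parseExtraction('Sure! {"entities": [], "relationships": []} Hope that helps.');
const bad = parseExtraction('no json here');
```

Because the pattern is greedy, trailing prose that itself contains a `}` is swept into the match and makes `JSON.parse` fail, so the tolerance for surrounding text is not unconditional.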
|
||||||
|
|
||||||
|
## Entity Processing
|
||||||
|
|
||||||
|
For each entity in `parsed.entities`:
|
||||||
|
|
||||||
|
1. Validate that `name` is present and not in `IGNORED_NAMES`, and that `type` is in `ENTITY_TYPES`
2. Call `upsertEntity(name, type, notes)`:
   - **Insert**: creates new row with `mention_count = 1`, `source = 'extraction'`
   - **Conflict** on `(name, type)`: increments `mention_count`, updates `last_seen_at`, preserves existing `notes` if new extraction returns null
3. Add to `entityMap` keyed by `"${name}::${type}"` (used for relationship resolution below)
4. Call `linkEntityToEpisode(entity.id, episodeId)`, which writes to the `entity_episodes` join table
5. Fire-and-forget: embed as `"${name} (${type}): ${notes}"` → store to Qdrant `entities` collection with `{ name, type, notes, projectId }` in payload
|
||||||
|
|
||||||
|
**Valid entity types:** `person`, `place`, `project`, `technology`, `concept`, `organization`
|
||||||
|
|
||||||
|
**Stoplist (ignored names):** `good morning`, `good night`, `hello`, `goodbye`, `thanks`, `thank you`
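
The validation and keying steps above can be sketched like this. The two lists are copied from this section; `isValidEntity` is a hypothetical name, the case-insensitive stoplist comparison is an assumption, and the upsert, episode link, and embedding steps are omitted.

```js
// Values copied from the lists in this section; isValidEntity is a hypothetical name.
const ENTITY_TYPES = ['person', 'place', 'project', 'technology', 'concept', 'organization'];
const IGNORED_NAMES = ['good morning', 'good night', 'hello', 'goodbye', 'thanks', 'thank you'];

function isValidEntity(entity) {
  if (!entity || typeof entity.name !== 'string' || !entity.name.trim()) return false;
  if (!ENTITY_TYPES.includes(entity.type)) return false;
  // Assumption: the stoplist comparison ignores case and surrounding whitespace.
  return !IGNORED_NAMES.includes(entity.name.trim().toLowerCase());
}

const parsedEntities = [
  { name: 'Alice', type: 'person', notes: 'Software engineer.' },
  { name: 'hello', type: 'concept', notes: null }, // stoplisted name
  { name: 'Friday', type: 'date', notes: null },   // type not in ENTITY_TYPES
];

// Key format matches the "${name}::${type}" convention used for relationship lookup.
const entityMap = new Map();
for (const e of parsedEntities) {
  if (isValidEntity(e)) entityMap.set(`${e.name}::${e.type}`, e);
}
```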
|
||||||
|
|
||||||
|
## Relationship Processing
|
||||||
|
|
||||||
|
After all entities are saved, relationships are processed:
|
||||||
|
|
||||||
|
1. For each entry in `parsed.relationships`, look up both endpoints in `entityMap` using `"${from}::${fromType}"` and `"${to}::${toType}"` as keys
2. If either endpoint is missing (filtered out, invalid type, or not in this extraction), the relationship is silently skipped
3. Call `upsertRelationship(fromId, toId, label, notes)`:
   - **Insert**: creates new row with `mention_count = 1`
   - **Conflict** on `(from_id, to_id, label)`: increments `mention_count`, preserves existing `notes` if new is null
|
||||||
|
|
||||||
|
Relationships are unidirectional in storage. Bidirectionality is handled at
|
||||||
|
query time by the graph traversal layer.
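
Endpoint resolution can be sketched as below, assuming the map holds saved rows that carry an `id`. `resolveRelationships` and `savedEntities` are illustrative names, and the returned objects stand in for the arguments passed to `upsertRelationship`.

```js
// Hypothetical sketch: savedEntities maps "name::type" to a saved row with an id.
const savedEntities = new Map([
  ['Alice::person', { id: 5 }],
  ['NexusAI::project', { id: 8 }],
]);

function resolveRelationships(relationships, map) {
  const resolved = [];
  for (const rel of relationships) {
    const from = map.get(`${rel.from}::${rel.fromType}`);
    const to = map.get(`${rel.to}::${rel.toType}`);
    if (!from || !to) continue; // missing endpoint: skip silently
    resolved.push({ fromId: from.id, toId: to.id, label: rel.label, notes: rel.notes ?? null });
  }
  return resolved;
}

const resolved = resolveRelationships(
  [
    { from: 'Alice', fromType: 'person', to: 'NexusAI', toType: 'project', label: 'works_on', notes: null },
    { from: 'Alice', fromType: 'person', to: 'Bob', toType: 'person', label: 'knows', notes: null },
  ],
  savedEntities
);
```

The second relationship is dropped because `Bob::person` was never added to the map, which mirrors the silent-skip rule in step 2.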
|
||||||
|
|
||||||
|
## Project Scoping
|
||||||
|
|
||||||
|
`projectId` is threaded through from the episode creation call. It is stored
|
||||||
|
in the Qdrant entity payload, which enables project-scoped entity search in
|
||||||
|
orchestration. SQLite entities and relationships are global — scoping only
|
||||||
|
applies at the Qdrant retrieval layer.
|
||||||
|
|
||||||
|
## Error Behaviour
|
||||||
|
|
||||||
|
All steps after the initial model call are wrapped in a single outer try/catch.
|
||||||
|
If Ollama is unreachable, returns a non-200 status, or the JSON cannot be
|
||||||
|
parsed, the function logs at `warn` level and returns. There is no retry logic.
|
||||||
|
Individual entity embedding failures are caught per-entity and logged at `warn`
|
||||||
|
level without affecting other entities in the same batch.
|
||||||
213
docs/services/knowledge-graph.md
Normal file
@@ -0,0 +1,213 @@
|
|||||||
|
# Knowledge Graph
|
||||||
|
|
||||||
|
**Location:** `packages/memory-service/src/graph/index.js`
|
||||||
|
**Schema additions:** `entity_episodes` table; new columns on `entities` and `relationships`
|
||||||
|
**Exposed via:** `GET /graph/neighborhood/:entityId`, `POST /graph/neighbors`
|
||||||
|
**Consumed by:** Orchestration service context assembly
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
The knowledge graph transforms NexusAI from "remembers conversations" to
|
||||||
|
"understands relationships between things." Rather than injecting a flat
|
||||||
|
list of entity facts into every prompt, orchestration now retrieves a
|
||||||
|
1-hop subgraph of connected entities and their relationships, giving the
|
||||||
|
model structured, linked knowledge about people, projects, technologies,
|
||||||
|
and concepts that have appeared across conversations.
|
||||||
|
|
||||||
|
## Schema
|
||||||
|
|
||||||
|
### `entity_episodes` (join table)
|
||||||
|
|
||||||
|
Tracks which episodes contributed to each entity's knowledge. Defined in
|
||||||
|
`schema.js` — exists on all installs.
|
||||||
|
|
||||||
|
```sql
CREATE TABLE IF NOT EXISTS entity_episodes (
  entity_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
  episode_id INTEGER NOT NULL REFERENCES episodes(id) ON DELETE CASCADE,
  PRIMARY KEY (entity_id, episode_id)
);
```
|
||||||
|
|
||||||
|
Both FKs cascade on delete — removing an entity or episode automatically
|
||||||
|
cleans up its join rows.
|
||||||
|
|
||||||
|
### New columns on `entities`
|
||||||
|
|
||||||
|
Added via migration in `db/index.js`:
|
||||||
|
|
||||||
|
| Column | Type | Default | Description |
|---|---|---|---|
| `mention_count` | INTEGER | 1 | How many times this entity has been extracted across conversations |
| `confidence` | REAL | 1.0 | Reserved for future confidence scoring |
| `source` | TEXT | `'extraction'` | `'extraction'` (auto) or `'manual'` |
| `last_seen_at` | INTEGER | NULL | Unix timestamp of most recent extraction hit |
|
||||||
|
|
||||||
|
### New columns on `relationships`
|
||||||
|
|
||||||
|
| Column | Type | Default | Description |
|---|---|---|---|
| `mention_count` | INTEGER | 1 | How many times this edge has been extracted |
| `notes` | TEXT | NULL | Relationship context sentence from extraction |
|
||||||
|
|
||||||
|
## Entity Promotion Model
|
||||||
|
|
||||||
|
Entities are not created equal — some are mentioned once in passing, others
|
||||||
|
recur across many conversations. `mention_count` is the signal:
|
||||||
|
|
||||||
|
- Every time `upsertEntity` is called for an existing `(name, type)` pair, `mention_count` is incremented and `last_seen_at` is updated.
|
||||||
|
- `ENTITIES.PROMOTION_THRESHOLD` (default: **3**) is the `mention_count` at which an entity is considered "well-established" — referenced in the codebase for future filtering and scoring logic.
|
||||||
|
- Currently `mention_count` is stored and incremented but not yet used to gate retrieval. It provides the foundation for future features such as orphan cleanup (entities never re-extracted) and confidence-weighted graph traversal.
|
||||||
|
|
||||||
|
The same pattern applies to relationships — `mention_count` rises each time
|
||||||
|
the same `(from_id, to_id, label)` triple is extracted.
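
The counting behaviour can be illustrated with an in-memory analogue of the upsert. This is a sketch under assumptions, not the SQLite implementation: `upsertEntitySketch` and `isWellEstablished` are hypothetical names, setting `last_seen_at` on first insert is an assumption, and the `>=` comparison is one reading of "the `mention_count` at which an entity is considered well-established".

```js
const PROMOTION_THRESHOLD = 3; // mirrors ENTITIES.PROMOTION_THRESHOLD from this document

// store: Map keyed by "name::type"; now: Unix timestamp supplied by the caller.
function upsertEntitySketch(store, name, type, notes, now) {
  const key = `${name}::${type}`;
  const existing = store.get(key);
  if (!existing) {
    // Assumption: last_seen_at is set on first insert as well.
    const row = { name, type, notes: notes ?? null, mention_count: 1, source: 'extraction', last_seen_at: now };
    store.set(key, row);
    return row;
  }
  existing.mention_count += 1;
  existing.last_seen_at = now;
  existing.notes = notes ?? existing.notes; // keep old notes when the new value is null
  return existing;
}

// Hypothetical helper: nothing in the document says retrieval is gated on this yet.
const isWellEstablished = row => row.mention_count >= PROMOTION_THRESHOLD;

const store = new Map();
upsertEntitySketch(store, 'Alice', 'person', 'Engineer', 100);
upsertEntitySketch(store, 'Alice', 'person', null, 200);
const alice = upsertEntitySketch(store, 'Alice', 'person', null, 300);
```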
|
||||||
|
|
||||||
|
## Graph Traversal
|
||||||
|
|
||||||
|
`src/graph/index.js` exports two functions built on SQLite's `WITH RECURSIVE`
|
||||||
|
CTE support. No external graph database is needed.
|
||||||
|
|
||||||
|
### `getNeighborhood(entityId, depth)`
|
||||||
|
|
||||||
|
Traverses the graph from a single entity, following edges in **both directions**,
|
||||||
|
up to `depth` hops. Returns `{ nodes: [...entities], edges: [...relationships] }`.
|
||||||
|
|
||||||
|
Default depth: `ENTITIES.GRAPH_HOP_DEPTH` (1). Maximum enforced at HTTP layer: 3.
|
||||||
|
|
||||||
|
**SQLite query:**
|
||||||
|
|
||||||
|
```sql
WITH RECURSIVE traverse(entity_id, depth) AS (
  SELECT ?, 0
  UNION
  SELECT
    CASE WHEN r.from_id = t.entity_id THEN r.to_id ELSE r.from_id END,
    t.depth + 1
  FROM relationships r
  JOIN traverse t ON (r.from_id = t.entity_id OR r.to_id = t.entity_id)
  WHERE t.depth < ?
)
SELECT DISTINCT entity_id FROM traverse
```
|
||||||
|
|
||||||
|
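`UNION` (not `UNION ALL`) discards duplicate `(entity_id, depth)` rows. Because
`depth` is part of each row, the same node can still be reached again at a
different depth, so it is the `WHERE t.depth < ?` bound that guarantees
termination on cyclic graphs, and the final `SELECT DISTINCT entity_id` that
collapses repeat visits into one ID per node.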
|
||||||
|
|
||||||
|
After collecting node IDs, two follow-up queries fetch:
|
||||||
|
- All entity rows for those IDs
|
||||||
|
- All relationship rows where both `from_id` and `to_id` are in the node set
|
||||||
|
|
||||||
|
This ensures edges between neighbors are included even if they aren't on the
|
||||||
|
traversal path from the seed.
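
As an illustration of the same traversal logic, here is an in-memory analogue in JavaScript. It is a sketch only: the real implementation runs in SQLite, `neighborhoodSketch` is a hypothetical name, and rows are assumed to be plain objects with `id`, `from_id`, and `to_id` fields.

```js
// In-memory analogue of the traversal, for illustration only.
function neighborhoodSketch(entities, relationships, seedId, depth) {
  const visited = new Set([seedId]);
  let frontier = [seedId];
  for (let hop = 0; hop < depth && frontier.length > 0; hop++) {
    const next = [];
    for (const r of relationships) {
      for (const id of frontier) {
        // Follow the edge in whichever direction leads away from the frontier node.
        const other = r.from_id === id ? r.to_id : r.to_id === id ? r.from_id : null;
        if (other !== null && !visited.has(other)) {
          visited.add(other);
          next.push(other);
        }
      }
    }
    frontier = next;
  }
  return {
    nodes: entities.filter(e => visited.has(e.id)),
    // Keep every edge whose endpoints are both in the node set.
    edges: relationships.filter(r => visited.has(r.from_id) && visited.has(r.to_id)),
  };
}

const entities = [{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }];
const relationships = [
  { from_id: 1, to_id: 2 },
  { from_id: 3, to_id: 1 },
  { from_id: 2, to_id: 3 }, // edge between two neighbors of the seed
  { from_id: 3, to_id: 4 },
];
const oneHop = neighborhoodSketch(entities, relationships, 1, 1);
```

With seed `1` and depth 1, the result contains nodes 1, 2, and 3, and includes the `2 → 3` edge even though it does not touch the seed.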
|
||||||
|
|
||||||
|
### `getEntityNeighbors(entityIds[])`
|
||||||
|
|
||||||
|
Bulk 1-hop version designed for orchestration. Given multiple seed entity IDs
|
||||||
|
(the results of Qdrant semantic search), returns the combined 1-hop subgraph.
|
||||||
|
|
||||||
|
1. Finds all neighbor IDs via one query using `IN (...)` on both `from_id` and `to_id`
|
||||||
|
2. Deduplicates seeds + neighbors using a JavaScript `Set`
|
||||||
|
3. Fetches all entity rows and all relationship rows within the combined node set
|
||||||
|
|
||||||
|
This is intentionally simpler than the recursive version — orchestration always
|
||||||
|
uses depth=1, and the bulk query avoids N separate CTE calls.
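
The first two steps can be sketched as follows. The helper names are hypothetical and the exact SQL text is an assumption. Only the `IN (...)` on both columns and the `Set`-based deduplication come from the description above.

```js
// Hypothetical helper: builds the parameterised neighbor query for N seed IDs.
function buildNeighborQuery(entityIds) {
  const placeholders = entityIds.map(() => '?').join(', ');
  const sql =
    `SELECT from_id, to_id FROM relationships ` +
    `WHERE from_id IN (${placeholders}) OR to_id IN (${placeholders})`;
  // The ID list is bound twice, once per IN clause.
  return { sql, params: [...entityIds, ...entityIds] };
}

// Merge seeds and both endpoints of every matching row into one deduplicated list.
function collectNodeIds(entityIds, rows) {
  const ids = new Set(entityIds);
  for (const row of rows) {
    ids.add(row.from_id);
    ids.add(row.to_id);
  }
  return [...ids];
}

const query = buildNeighborQuery([5, 8]);
const nodeIds = collectNodeIds([5, 8], [
  { from_id: 5, to_id: 12 },
  { from_id: 3, to_id: 8 },
]);
```

Generating one `?` per ID keeps the query parameterised no matter how many seeds Qdrant returns.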
|
||||||
|
|
||||||
|
## Graph-Aware Context Assembly
|
||||||
|
|
||||||
|
Orchestration's `assembleContext` (in `src/chat/index.js`) integrates the
|
||||||
|
graph at step 7 of the chat pipeline:
|
||||||
|
|
||||||
|
1. Qdrant entity search returns up to `ORCHESTRATION.ENTITIES_LIMIT` results, each including `r.id` (the SQLite entity ID) alongside the Qdrant payload
|
||||||
|
2. `graph.getNeighbors(entityIds)` is called with those IDs → `POST /graph/neighbors` on memory-service
|
||||||
|
3. The returned `{ nodes, edges }` is passed to `formatGraphContext()`
|
||||||
|
4. On failure, falls back to using the Qdrant payload data directly as flat nodes with no edges
|
||||||
|
|
||||||
|
### Prompt Format
|
||||||
|
|
||||||
|
`formatGraphContext(nodes, edges)` in `chat/index.js` formats the subgraph as:
|
||||||
|
|
||||||
|
```
Here is what you know about entities relevant to this conversation and their connections:
- Alice (person): software engineer working on NexusAI
  → works_on NexusAI (project)
  → knows Bob (person)
- NexusAI (project): AI assistant framework
- Bob (person): Alice's colleague
```
|
||||||
|
|
||||||
|
- One line per node: `- {name} ({type}): {notes}`
|
||||||
|
- Outbound edges indented below: ` → {label} {target_name} ({target_type})`
|
||||||
|
- Nodes with only inbound edges (pulled in as neighbors) appear without connection lines
|
||||||
|
- Only outbound edges are shown — each relationship appears once, from the `from_id` side
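
A formatter that follows these rules could look like the sketch below. It is not the project's `formatGraphContext`: the function name, the two-space indent before `→`, and the lack of special handling for null `notes` are all assumptions.

```js
// Sketch of the format rules listed above; not the project's actual implementation.
function formatGraphContextSketch(nodes, edges) {
  const byId = new Map(nodes.map(n => [n.id, n]));
  const lines = ['Here is what you know about entities relevant to this conversation and their connections:'];
  for (const node of nodes) {
    lines.push(`- ${node.name} (${node.type}): ${node.notes}`);
    for (const edge of edges) {
      const target = byId.get(edge.to_id);
      if (edge.from_id !== node.id || !target) continue; // outbound edges only
      lines.push(`  → ${edge.label} ${target.name} (${target.type})`); // indent width assumed
    }
  }
  return lines.join('\n');
}

const context = formatGraphContextSketch(
  [
    { id: 5, name: 'Alice', type: 'person', notes: 'software engineer working on NexusAI' },
    { id: 8, name: 'NexusAI', type: 'project', notes: 'AI assistant framework' },
    { id: 9, name: 'Bob', type: 'person', notes: "Alice's colleague" },
  ],
  [
    { from_id: 5, to_id: 8, label: 'works_on' },
    { from_id: 5, to_id: 9, label: 'knows' },
  ]
);
```

Iterating edges per node keeps the sketch short. For larger subgraphs, grouping edges by `from_id` first would avoid the nested loop.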
|
||||||
|
|
||||||
|
## Project Scoping
|
||||||
|
|
||||||
|
The knowledge graph respects project boundaries at the **entry point**, not
|
||||||
|
during traversal:
|
||||||
|
|
||||||
|
- Qdrant entity search is filtered by `projectId` — only entities tagged with this project are returned as seeds
|
||||||
|
- Graph traversal in SQLite is unfiltered — neighbors can be from any project or no project
|
||||||
|
- This is intentional: the graph entry is project-scoped, but traversal follows the global relationship graph to discover connected knowledge
|
||||||
|
|
||||||
|
Entities are tagged with `projectId` in the Qdrant payload at extraction time.
|
||||||
|
Entities extracted from non-project sessions have `projectId: null` and only
|
||||||
|
appear in unfiltered global searches.
|
||||||
|
|
||||||
|
## API Reference
|
||||||
|
|
||||||
|
### `GET /graph/neighborhood/:entityId`
|
||||||
|
|
||||||
|
Returns the neighborhood of a single entity.
|
||||||
|
|
||||||
|
**Query params:**
|
||||||
|
|
||||||
|
| Param | Default | Max | Description |
|---|---|---|---|
| `depth` | `ENTITIES.GRAPH_HOP_DEPTH` (1) | 3 | Traversal depth |
|
||||||
|
|
||||||
|
**Response:**
|
||||||
|
```json
{
  "entity": { "id": 5, "name": "Alice", "type": "person", "notes": "...", "mention_count": 4 },
  "neighborhood": {
    "nodes": [
      { "id": 5, "name": "Alice", "type": "person", "notes": "..." },
      { "id": 8, "name": "NexusAI", "type": "project", "notes": "..." }
    ],
    "edges": [
      { "id": 2, "from_id": 5, "to_id": 8, "label": "works_on", "notes": "...", "mention_count": 3 }
    ]
  }
}
```
|
||||||
|
|
||||||
|
Returns 404 if the entity does not exist.
|
||||||
|
|
||||||
|
### `POST /graph/neighbors`
|
||||||
|
|
||||||
|
Bulk 1-hop neighborhood for a set of entity IDs. Used internally by
|
||||||
|
orchestration — not intended for direct client use.
|
||||||
|
|
||||||
|
**Request body:**
|
||||||
|
```json
{ "entityIds": [5, 8, 12] }
```
|
||||||
|
|
||||||
|
**Response:**
|
||||||
|
```json
{
  "nodes": [ ...entity objects... ],
  "edges": [ ...relationship objects... ]
}
```
|
||||||
|
|
||||||
|
Returns 400 if `entityIds` is missing or empty.
|
||||||
|
|
||||||
|
## Constants (`packages/shared/src/config/constants.js`)
|
||||||
|
|
||||||
|
| Constant | Value | Description |
|---|---|---|
| `ENTITIES.PROMOTION_THRESHOLD` | 3 | `mention_count` at which an entity is considered well-established |
| `ENTITIES.GRAPH_HOP_DEPTH` | 1 | Default traversal depth for neighborhood queries |
| `ORCHESTRATION.ENTITIES_LIMIT` | 5 | Max entity seeds returned from Qdrant search |
| `ORCHESTRATION.ENTITIES_THRESHOLD` | 0.55 | Minimum similarity score for entity Qdrant search |
|
||||||
@@ -9,8 +9,8 @@
|
|||||||
|
|
||||||
Responsible for all reading and writing of long-term memory. Acts as the
|
Responsible for all reading and writing of long-term memory. Acts as the
|
||||||
sole interface to both SQLite and Qdrant — no other service accesses these
|
sole interface to both SQLite and Qdrant — no other service accesses these
|
||||||
stores directly. On episode creation, automatically calls the embedding
|
stores directly. On episode creation, automatically triggers entity and
|
||||||
service to generate and store a vector in Qdrant.
|
relationship extraction and embeds results into Qdrant.
|
||||||
|
|
||||||
## Dependencies
|
## Dependencies
|
||||||
|
|
||||||
@@ -38,25 +38,29 @@ src/
|
|||||||
├── db/
|
├── db/
|
||||||
│ ├── index.js # SQLite connection + initialization + migrations
|
│ ├── index.js # SQLite connection + initialization + migrations
|
||||||
│ ├── schema.js # Table definitions, indexes, FTS5, triggers
|
│ ├── schema.js # Table definitions, indexes, FTS5, triggers
|
||||||
│ └── projects.js # Project CRUD functions
|
│ ├── projects.js # Project CRUD functions
|
||||||
|
│ └── summaries.js # Summary CRUD functions
|
||||||
├── episodic/
|
├── episodic/
|
||||||
│ └── index.js # Session + episode CRUD, FTS search, embedding write path
|
│ └── index.js # Session + episode CRUD, FTS search, embedding write path
|
||||||
├── semantic/
|
├── semantic/
|
||||||
│ └── index.js # Qdrant collection management, upsert, search, delete
|
│ └── index.js # Qdrant collection management, upsert, search, delete
|
||||||
├── entities/
|
├── entities/
|
||||||
│ ├── index.js # Entity + relationship CRUD
|
│ ├── index.js # Entity + relationship CRUD (upsert, mention tracking)
|
||||||
│ └── extraction.js # Automatic entity extraction via qwen2.5:3b on Ollama
|
│ └── extraction.js # Automatic entity + relationship extraction via qwen2.5:3b
|
||||||
|
├── graph/
|
||||||
|
│ └── index.js # Knowledge graph traversal (neighborhood queries, recursive CTE)
|
||||||
└── index.js # Express app + all route definitions
|
└── index.js # Express app + all route definitions
|
||||||
```
|
```
|
||||||
|
|
||||||
## SQLite Schema
|
## SQLite Schema
|
||||||
|
|
||||||
Six core tables:
|
Eight core tables:
|
||||||
|
|
||||||
- **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
|
- **sessions** — top-level conversation containers. Fields: `external_id`, `name`, `project_id`, `metadata`
|
||||||
- **episodes** — individual exchanges (user message + AI response) tied to a session
|
- **episodes** — individual exchanges (user message + AI response) tied to a session
|
||||||
- **entities** — named things the system learns about (people, places, concepts)
|
- **entities** — named things the system learns about (people, places, concepts, etc.). Fields include `mention_count`, `confidence`, `source`, `last_seen_at`
|
||||||
- **relationships** — directional labeled links between entities
|
- **relationships** — directional labeled links between entities (`from_id`, `to_id`, `label`). Fields include `mention_count`, `notes`
|
||||||
|
- **entity_episodes** — join table linking entities to the episodes where they were extracted. Used for provenance and orphan cleanup
|
||||||
- **summaries** — condensed episode groups for efficient context retrieval
|
- **summaries** — condensed episode groups for efficient context retrieval
|
||||||
- **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`, `notes`, `system_prompt`
|
- **projects** — named groupings of sessions with `name`, `description`, `colour`, `icon`, `isolated`, `notes`, `system_prompt`
|
||||||
|
|
||||||
@@ -72,10 +76,18 @@ try { db.exec(`CREATE INDEX IF NOT EXISTS idx_sessions_project ON sessions(proje
|
|||||||
try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
|
try { db.exec(`ALTER TABLE projects ADD COLUMN isolated INTEGER NOT NULL DEFAULT 0`); } catch {}
|
||||||
try { db.exec(`ALTER TABLE projects ADD COLUMN notes TEXT`); } catch {}
|
try { db.exec(`ALTER TABLE projects ADD COLUMN notes TEXT`); } catch {}
|
||||||
try { db.exec(`ALTER TABLE projects ADD COLUMN system_prompt TEXT`); } catch {}
|
try { db.exec(`ALTER TABLE projects ADD COLUMN system_prompt TEXT`); } catch {}
|
||||||
|
// Knowledge graph columns:
|
||||||
|
try { db.exec(`ALTER TABLE entities ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
|
||||||
|
try { db.exec(`ALTER TABLE entities ADD COLUMN confidence REAL NOT NULL DEFAULT 1.0`) } catch {}
|
||||||
|
try { db.exec(`ALTER TABLE entities ADD COLUMN source TEXT NOT NULL DEFAULT 'extraction'`) } catch {}
|
||||||
|
try { db.exec(`ALTER TABLE entities ADD COLUMN last_seen_at INTEGER`) } catch {}
|
||||||
|
try { db.exec(`ALTER TABLE relationships ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
|
||||||
|
try { db.exec(`ALTER TABLE relationships ADD COLUMN notes TEXT`) } catch {}
|
||||||
```
|
```
|
||||||
|
|
||||||
New migrations are always appended here — never modify the schema file for
|
`entity_episodes` is defined in `schema.js` itself (not a migration) since it is a new table.
|
||||||
existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
|
|
||||||
|
New migrations are always appended — never modify the schema file for existing tables since `ALTER TABLE` cannot use `IF NOT EXISTS`.
|
||||||
|
|
||||||
### FTS5 Full-Text Search
|
### FTS5 Full-Text Search
|
||||||
|
|
||||||
@@ -100,12 +112,9 @@ that weren't touched.
|
|||||||
const allowed = ['name', 'description', 'colour', 'icon', 'isolated', 'notes', 'system_prompt'];
|
const allowed = ['name', 'description', 'colour', 'icon', 'isolated', 'notes', 'system_prompt'];
|
||||||
```
|
```
|
||||||
|
|
||||||
This means saving just `{ notes: "..." }` or `{ system_prompt: "..." }` won't
|
|
||||||
touch any other field.
|
|
||||||
|
|
||||||
## Qdrant / Semantic Layer
|
## Qdrant / Semantic Layer
|
||||||
|
|
||||||
Three Qdrant collections are initialized on service startup:
|
Three Qdrant collections are initialized on service startup via `semantic.initCollections()`:
|
||||||
|
|
||||||
| Collection | Purpose |
|
| Collection | Purpose |
|
||||||
|---|---|
|
|---|---|
|
||||||
@@ -117,9 +126,12 @@ All collections use **768-dimension vectors** with **Cosine similarity**,
|
|||||||
matching `nomic-embed-text` via Ollama. Vector size and distance metric are
|
matching `nomic-embed-text` via Ollama. Vector size and distance metric are
|
||||||
defined in `@nexusai/shared` — not hardcoded here.
|
defined in `@nexusai/shared` — not hardcoded here.
|
||||||
|
|
||||||
Each collection exposes three operations in `src/semantic/index.js`:
|
`initCollections()` iterates `Object.values(COLLECTIONS)` and creates any
|
||||||
upsert, search (with optional Qdrant filter), and delete. The `wait: true`
|
collection that doesn't already exist at startup — all three collections are
|
||||||
flag is used on all writes.
|
guaranteed to exist before any requests are handled.
|
||||||
|
|
||||||
|
Each collection exposes upsert, search (with optional Qdrant filter), and
|
||||||
|
delete operations. The `wait: true` flag is used on all writes.
|
||||||
|
|
||||||
## Embedding Write Path
|
## Embedding Write Path
|
||||||
|
|
||||||
@@ -133,8 +145,7 @@ When a new episode is created:
|
|||||||
This step is **fire-and-forget** — if embedding fails, the episode is still
|
This step is **fire-and-forget** — if embedding fails, the episode is still
|
||||||
saved and searchable via FTS. The error is logged but not surfaced.
|
saved and searchable via FTS. The error is logged but not surfaced.
|
||||||
|
|
||||||
> The Qdrant payload stores `sessionId` (the internal integer ID). This is
|
> The Qdrant payload stores `sessionId` (the internal integer ID). See
|
||||||
> used for per-session and per-project filtering during semantic search. See
|
|
||||||
> `memory-isolation.md` for how project-level filtering works.
|
> `memory-isolation.md` for how project-level filtering works.
|
||||||
|
|
||||||
## Entity Layer
|
## Entity Layer
|
||||||
@@ -142,38 +153,36 @@ saved and searchable via FTS. The error is logged but not surfaced.
|
|||||||
Entities and relationships use upsert semantics with composite unique
|
Entities and relationships use upsert semantics with composite unique
|
||||||
constraints to prevent duplicates:
|
constraints to prevent duplicates:
|
||||||
|
|
||||||
- `UNIQUE(name, type)` on entities
|
- `UNIQUE(name, type)` on entities — conflict increments `mention_count` and updates `last_seen_at`
|
||||||
- `UNIQUE(from_id, to_id, label)` on relationships
|
- `UNIQUE(from_id, to_id, label)` on relationships — conflict increments `mention_count` and preserves existing `notes`
|
||||||
- `ON DELETE CASCADE` on relationship foreign keys
|
- `ON DELETE CASCADE` on relationship foreign keys
|
||||||
|
|
||||||
### Automatic Entity Extraction
|
|
||||||
|
|
||||||
After each episode is saved, `extraction.js` automatically extracts named
|
After each episode is saved, `extraction.js` automatically extracts named
|
||||||
entities from the conversation using `qwen2.5:3b` running on Ollama (Mini PC 1).
|
entities **and relationships** from the conversation using `qwen2.5:3b` on
|
||||||
This runs **fire-and-forget** — the episode is already saved and returned
|
Ollama — fire-and-forget. Each saved entity is also linked to the episode
|
||||||
before extraction begins.
|
via the `entity_episodes` join table.
|
||||||
|
|
||||||
**Entity types extracted:** `person`, `place`, `project`, `technology`,
|
> For full details on the extraction pipeline and JSON format, see `entity-extraction.md`.
|
||||||
`concept`, `organization`
|
> For the knowledge graph traversal layer, see `knowledge-graph.md`.
|
||||||
|
|
||||||
The extraction prompt uses ChatML format (native to qwen2.5) and primes the
|
## Knowledge Graph Layer
|
||||||
response by ending with `[` to steer the model directly into JSON array output.
|
|
||||||
A list of already-known entities is injected into the prompt so the model
|
|
||||||
reuses existing `(name, type)` pairs rather than creating duplicates with
|
|
||||||
different types.
|
|
||||||
|
|
||||||
After extraction, each entity is:
|
`src/graph/index.js` provides SQLite-based graph traversal over the entities
|
||||||
1. Upserted into SQLite via `upsertEntity` — notes are only written if
|
and relationships tables. Two functions are exposed via HTTP:
|
||||||
the entity is new (`COALESCE(entities.notes, excluded.notes)` prevents
|
|
||||||
overwriting existing notes with speculative updates)
|
|
||||||
2. Embedded via the embedding service and upserted into the `entities`
|
|
||||||
Qdrant collection with `{ name, type, notes, projectId }` as payload —
|
|
||||||
`projectId` scopes entities to their project for isolated retrieval
|
|
||||||
|
|
||||||
`extractAndStoreEntities` receives `projectId` from `createEpisode`, which
|
- **`getNeighborhood(entityId, depth)`** — recursive CTE traversal, bidirectional, returns `{ nodes, edges }`
|
||||||
receives it from the episode route, which receives it from orchestration's
|
- **`getEntityNeighbors(entityIds[])`** — bulk 1-hop traversal for orchestration context assembly
|
||||||
`createEpisode` call. This ensures entities are tagged with the correct
|
|
||||||
project scope at extraction time.
|
> For design rationale, traversal queries, and integration with orchestration, see `knowledge-graph.md`.
|
||||||
|
|
||||||
|
## Summaries Layer
|
||||||
|
|
||||||
|
Session summaries are generated by `orchestration-service/src/services/summarization.js`
|
||||||
|
after each episode write and stored here via `POST /summaries`. The memory
|
||||||
|
service is responsible only for CRUD — generation logic lives in orchestration.
|
||||||
|
|
||||||
|
> For full details on trigger conditions, prompt format, cumulative updates,
|
||||||
|
> and ChatML token stripping, see `summarization.md`.
|
||||||
|
|
||||||
## Project Delete Behaviour
|
## Project Delete Behaviour
|
||||||
|
|
||||||
|
|||||||
@@ -30,7 +30,8 @@ or inference services — all traffic flows through orchestration.
|
|||||||
| LLAMA_SERVER_URL | No | http://localhost:8080 | Direct llama-server URL for /models/props |
|
| LLAMA_SERVER_URL | No | http://localhost:8080 | Direct llama-server URL for /models/props |
|
||||||
| QDRANT_URL | No | http://localhost:6333 | Qdrant URL for semantic search |
|
| QDRANT_URL | No | http://localhost:6333 | Qdrant URL for semantic search |
|
||||||
| CORS_ORIGIN | No | http://localhost:5173 | Allowed origin for CORS requests |
|
| CORS_ORIGIN | No | http://localhost:5173 | Allowed origin for CORS requests |
|
||||||
| MODELS_MANIFEST_PATH | No | — | Legacy — superseded by `modelsFolderPath` in settings.json |
|
| EXTRACTION_URL | No | http://localhost:11434 | Ollama URL for summarisation |
|
||||||
|
| EXTRACTION_MODEL | No | qwen2.5:3b | Ollama model used for summarisation |
|
||||||
|
|
||||||
## Internal Structure
|
## Internal Structure
|
||||||
|
|
||||||
@@ -40,20 +41,22 @@ src/
|
|||||||
│ ├── memory.js # HTTP client for memory service
|
│ ├── memory.js # HTTP client for memory service
|
||||||
│ ├── inference.js # HTTP client for inference service
|
│ ├── inference.js # HTTP client for inference service
|
||||||
│ ├── embedding.js # HTTP client for embedding service
|
│ ├── embedding.js # HTTP client for embedding service
|
||||||
│ └── qdrant.js # HTTP client for Qdrant (direct vector search)
|
│ ├── qdrant.js # HTTP client for Qdrant (direct vector search)
|
||||||
|
│ ├── graph.js # HTTP client for memory-service graph endpoints
|
||||||
|
│ └── summarization.js # Session summarisation — triggers after each episode
|
||||||
├── chat/
|
├── chat/
|
||||||
│ └── index.js # Core pipeline — context assembly, isolation, auto-naming
|
│ └── index.js # Core pipeline — context assembly, graph expansion, auto-naming
|
||||||
├── config/
|
├── config/
|
||||||
│ └── settings.js # Settings load/save — reads/writes data/settings.json
|
│ └── settings.js # Settings load/save — reads/writes data/settings.json
|
||||||
├── routes/
|
├── routes/
|
||||||
│ ├── chat.js # POST /chat and POST /chat/stream
|
│ ├── chat.js # POST /chat and POST /chat/stream
|
||||||
│ ├── sessions.js # Session CRUD proxy
|
│ ├── sessions.js # Session CRUD proxy
|
||||||
│ ├── projects.js # Project CRUD proxy — passes req.body straight through
|
│ ├── projects.js # Project CRUD proxy
|
||||||
│ ├── episodes.js # Episode list and delete proxy
|
│ ├── episodes.js # Episode list and delete proxy
|
||||||
|
│ ├── summaries.js # GET /summaries/session/:id and /summaries/project/:id
|
||||||
│ ├── settings.js # GET /settings and PATCH /settings
|
│ ├── settings.js # GET /settings and PATCH /settings
|
||||||
│ ├── health.js # GET /health — pings all four services
|
│ ├── health.js # GET /health/services — pings all four services
|
||||||
│ └── models.js # GET /models — scans .gguf files live, merges with models.json
|
│ └── models.js # GET /models and GET /models/props
|
||||||
# GET /models/props — context window + loaded model from llama-server
|
|
||||||
└── index.js # Express app entry point
|
└── index.js # Express app entry point
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -69,7 +72,9 @@ via `appSettings.load()` — changes apply immediately without a service restart
|
|||||||
|---|---|---|
|
|---|---|---|
|
||||||
| `recentEpisodeLimit` | 5 | Recent episodes injected into prompt |
|
| `recentEpisodeLimit` | 5 | Recent episodes injected into prompt |
|
||||||
| `semanticLimit` | 5 | Semantic search results injected into prompt |
|
| `semanticLimit` | 5 | Semantic search results injected into prompt |
|
||||||
| `scoreThreshold` | 0.75 | Minimum similarity score for semantic results |
|
| `scoreThreshold` | 0.5 | Minimum similarity score for Qdrant semantic results |
|
||||||
|
| `semanticWeight` | 1.0 | RRF weight for Qdrant semantic results |
|
||||||
|
| `keywordWeight` | 0 | RRF weight for FTS5 keyword results (`0` = disabled) |
|
||||||
| `modelsFolderPath` | `/mnt/nexus-models` | Path to folder containing .gguf files |
|
| `modelsFolderPath` | `/mnt/nexus-models` | Path to folder containing .gguf files |
|
||||||
| `temperature` | 0.7 | Inference temperature |
|
| `temperature` | 0.7 | Inference temperature |
|
||||||
| `repeatPenalty` | 1.1 | Repeat token penalty |
|
| `repeatPenalty` | 1.1 | Repeat token penalty |
|
||||||
@@ -77,9 +82,6 @@ via `appSettings.load()` — changes apply immediately without a service restart
|
|||||||
| `topK` | 40 | Top-K token candidates per step |
|
| `topK` | 40 | Top-K token candidates per step |
|
||||||
| `systemPrompt` | *(ORCHESTRATION.SYSTEM_PROMPT)* | Global system prompt. `null` reverts to hardcoded constant. |
|
| `systemPrompt` | *(ORCHESTRATION.SYSTEM_PROMPT)* | Global system prompt. `null` reverts to hardcoded constant. |
|
||||||
|
|
||||||
Defaults are defined in `config/settings.js` and fall back to constants in
|
|
||||||
`@nexusai/shared`. Values saved in `settings.json` take precedence.
|
|
||||||
|
|
||||||
## Chat Pipeline
|
## Chat Pipeline
|
||||||
|
|
||||||
Both `POST /chat` and `POST /chat/stream` share the same steps. The only
|
Both `POST /chat` and `POST /chat/stream` share the same steps. The only
|
||||||
@@ -88,70 +90,86 @@ difference is how the inference response is delivered to the client.
|
|||||||
### Steps
|
### Steps
|
||||||
|
|
||||||
1. **Session resolution** — look up session by `externalId`. Auto-create if
|
1. **Session resolution** — look up session by `externalId`. Auto-create if
|
||||||
not found. Clients generate a UUID for new conversations — no pre-creation
|
not found.
|
||||||
step needed.
|
|
||||||
|
|
||||||
2. **Project context resolution** — if the session has a `project_id`, fetch
|
2. **Project context resolution** — if the session has a `project_id`, fetch
|
||||||
the project and all its session IDs. Used to scope semantic search. The
|
the project and all its session IDs. Used to scope semantic search. The
|
||||||
project's `system_prompt` is also read at this step if set.
|
project's `system_prompt` is also read at this step if set.
|
||||||
|
|
||||||
3. **System prompt resolution** — three-tier hierarchy:
|
3. **System prompt resolution** — three-tier hierarchy:
|
||||||
- `project.system_prompt` — if the session is in a project and it's set (highest priority)
|
- `project.system_prompt` — highest priority
|
||||||
- `settings.systemPrompt` — global setting from `settings.json`
|
- `settings.systemPrompt` — global setting from `settings.json`
|
||||||
- `ORCHESTRATION.SYSTEM_PROMPT` — hardcoded constant in `@nexusai/shared` (last resort)
|
- `ORCHESTRATION.SYSTEM_PROMPT` — hardcoded constant (last resort)
|
||||||
|
|
||||||
4. **Recent episode retrieval** — fetch the most recent episodes for the
|
4. **Recent episode retrieval** — fetch most recent episodes (`recentEpisodeLimit`).
|
||||||
session (`recentEpisodeLimit`, default 5).
|
|
||||||
|
|
||||||
5. **Semantic search** — embed the user message, query Qdrant for the top
|
5. **Fused episode retrieval** — runs semantic (Qdrant) and keyword (FTS5)
|
||||||
most similar past episodes (`semanticLimit`, `scoreThreshold`). Deduplicated
|
search in parallel, then merges results via Reciprocal Rank Fusion (RRF).
|
||||||
against recent episodes. Non-critical — if it fails, pipeline continues with
|
Both paths are filtered against `recentIds` before fusion. FTS is scoped
|
||||||
recency-only context.
|
to the current session or all project sessions. If `keywordWeight` is `0`,
|
||||||
|
the FTS call is skipped entirely. Non-critical — failures fall back to
|
||||||
|
whichever strategy succeeded.
|
||||||
|
|
||||||
6. **Entity search** — query the `entities` Qdrant collection filtered by
|
6. **Entity search** — query `entities` Qdrant collection filtered by
|
||||||
`projectId`. Non-project sessions receive no entity context. Non-critical.
|
`projectId`. Returns entity IDs alongside Qdrant payload data (the Qdrant
|
||||||
|
point ID equals the SQLite entity ID). Non-critical.
|
||||||
|
|
||||||
7. **Prompt assembly** — combine resolved system prompt, entity context,
|
7. **Graph neighborhood expansion** — call `POST /graph/neighbors` on
|
||||||
semantic episodes, recent episodes, and user message.
|
memory-service with the entity IDs from step 6. Returns a 1-hop subgraph
|
||||||
|
`{ nodes, edges }` — entity objects plus the relationships connecting them.
|
||||||
|
If no entities were found or the graph call fails, falls back to flat entity
|
||||||
|
list (no edges). Non-critical.
|
||||||
|
|
||||||
8. **Inference** — send to inference service with settings-derived parameters
|
8. **Prompt assembly** — combine system prompt, graph context, fused episodes,
|
||||||
(temperature, topP, topK, repeatPenalty). `/chat` awaits full response;
|
recent episodes, and user message.
|
||||||
|
|
||||||
|
9. **Inference** — send to inference service. `/chat` awaits full response;
|
||||||
`/chat/stream` pipes SSE chunks to the client.
|
`/chat/stream` pipes SSE chunks to the client.
|
||||||
|
|
||||||
9. **Episode write** — write the exchange back to memory with `projectId`.
|
10. **Episode write** — write exchange back to memory with `projectId`.
|
||||||
Fire-and-forget for `/chat`; awaited for `/chat/stream`.
|
|
||||||
|
|
||||||
10. **Auto-naming** — on `isFirstMessage && !session.name`, fire a secondary
|
11. **Summarisation trigger** — `triggerSummary(session, allEpisodes)` called
|
||||||
inference call with a naming prompt (max 20 tokens, temperature 0.3) and
|
fire-and-forget. See `summarization.md` for full details.
|
||||||
write the result back as `session.name`. Fully fire-and-forget.
|
|
||||||
|
12. **Auto-naming** — on first message with no session name, fires a secondary
|
||||||
|
inference call (max 20 tokens, temperature 0.3) to generate a session name.
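The three-tier fallback in step 3 can be sketched roughly as follows. This is a minimal illustration rather than the service's code: `resolveSystemPrompt`, its argument shapes, and the placeholder constant are assumptions; only the precedence order comes from the steps above.

```javascript
// Hypothetical helper: names and argument shapes are illustrative only.
const ORCHESTRATION = { SYSTEM_PROMPT: 'default prompt (placeholder)' };

function resolveSystemPrompt(project, settings) {
  // `??` falls through on null/undefined, so a null setting reverts to the constant.
  return (
    project?.system_prompt ??
    settings?.systemPrompt ??
    ORCHESTRATION.SYSTEM_PROMPT
  );
}

console.log(resolveSystemPrompt(null, { systemPrompt: null }));
```

Note that `??` treats an empty string as a set value; whether the real pipeline does the same is not stated here.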
|
||||||
|
|
||||||
### Prompt Structure
|
### Prompt Structure
|
||||||
|
|
||||||
```
|
```
|
||||||
[Resolved system prompt]
|
[Resolved system prompt]
|
||||||
|
|
||||||
Here is what you know about entities relevant to this conversation:
|
Here is what you know about entities relevant to this conversation and their connections:
|
||||||
- {name} ({type}): {notes}
|
- {name} ({type}): {notes}
|
||||||
... (up to 5 entity results)
|
→ {label} {neighbor_name} ({neighbor_type})
|
||||||
---
|
---
|
||||||
Here are some relevant memories from earlier conversations:
|
Here are some relevant memories from earlier conversations:
|
||||||
User: {past user message}
|
User: {past user message}
|
||||||
Assistant: {past ai response}
|
Assistant: {past ai response}
|
||||||
... (up to semanticLimit semantic episodes)
|
|
||||||
---
|
---
|
||||||
Here are some relevant memories from your past conversations:
|
Here are some relevant memories from your past conversations:
|
||||||
User: {past user message}
|
User: {past user message}
|
||||||
Assistant: {past ai response}
|
Assistant: {past ai response}
|
||||||
... (up to recentEpisodeLimit recent episodes)
|
|
||||||
--- End of recent memories ---
|
--- End of recent memories ---
|
||||||
|
|
||||||
User: {current message}
|
User: {current message}
|
||||||
Assistant:
|
Assistant:
|
||||||
```
|
```
|
||||||
|
|
||||||
Entity context appears first — before episodic memory — because structured
|
The entity block renders the full graph neighborhood — seed entities matched
|
||||||
facts about known entities are the most stable and reliable context. Semantic
|
by Qdrant search plus any neighbors pulled in by 1-hop traversal. Each entity
|
||||||
episodes follow, then recent episodes as the immediate conversation flow.
|
shows its `notes` and any outbound relationships with their targets. Neighbor
|
||||||
|
nodes that have no outbound edges within the subgraph appear without connection
|
||||||
|
lines.
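One way to picture how a `{ nodes, edges }` subgraph turns into the entity block is the sketch below. The field names (`id`, `name`, `type`, `notes`, `source`, `target`, `label`) and the function name are assumptions made for illustration; the actual schema may differ.

```javascript
// Sketch only: field names are assumed, not taken from the service's schema.
function renderEntityBlock({ nodes, edges }) {
  const byId = new Map(nodes.map((n) => [n.id, n]));
  const lines = [];
  for (const node of nodes) {
    lines.push(`- ${node.name} (${node.type}): ${node.notes}`);
    // Only outbound edges whose target is inside the subgraph get a line.
    for (const edge of edges) {
      const target = byId.get(edge.target);
      if (edge.source === node.id && target) {
        lines.push(`  → ${edge.label} ${target.name} (${target.type})`);
      }
    }
  }
  return lines.join('\n');
}

const block = renderEntityBlock({
  nodes: [
    { id: 1, name: 'Ada', type: 'person', notes: 'Maintains the parser' },
    { id: 2, name: 'Parser', type: 'project', notes: 'Tokenises input' },
  ],
  edges: [{ source: 1, target: 2, label: 'maintains' }],
});
console.log(block);
```

Here the second node has no outbound edge inside the subgraph, so it is rendered without a connection line.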
|
||||||
|
|
||||||
|
## Summarisation
|
||||||
|
|
||||||
|
After each episode write, `triggerSummary` is called fire-and-forget. It
|
||||||
|
checks token thresholds and episode counts before generating, then stores
|
||||||
|
the result in the memory service.
|
||||||
|
|
||||||
|
> For full details on trigger conditions, prompt format, cumulative updates,
|
||||||
|
> ChatML token stripping, and episode range tracking, see `summarization.md`.
|
||||||
|
|
||||||
## SSE Stream Format
|
## SSE Stream Format
|
||||||
|
|
||||||
@@ -168,37 +186,26 @@ data: {"text":"Hello"}
|
|||||||
data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":42}
|
data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":42}
|
||||||
```
|
```
|
||||||
|
|
||||||
The `[DONE]` sentinel is consumed internally and not forwarded. The stream
|
The `[DONE]` sentinel is consumed internally and not forwarded.
|
||||||
is terminated by `res.end()` after the done event.
|
|
||||||
|
|
||||||
## Models Route
|
## Models Route
|
||||||
|
|
||||||
`GET /models` scans `.gguf` files live on each request from `modelsFolderPath`
|
`GET /models` scans `.gguf` files live from `modelsFolderPath` and merges
|
||||||
(read from settings). Merges results with a `models.json` file in the same
|
with `models.json` for metadata. Returns file size in GB.
|
||||||
folder for richer metadata (label, description). Returns file size in GB.
|
|
||||||
|
|
||||||
`GET /models/props` fetches directly from llama-server via `LLAMA_SERVER_URL`.
|
`GET /models/props` fetches directly from llama-server. Returns
|
||||||
Returns `{ contextWindow, modelAlias }`. `n_ctx` is at
|
`{ contextWindow, modelAlias }`. Returns `503` if unreachable.
|
||||||
`data.default_generation_settings.n_ctx` in the llama-server response.
|
|
||||||
Returns `503` if llama-server is unreachable.
|
|
||||||
|
|
||||||
## Sessions Route Behaviour
|
## Sessions Route Behaviour
|
||||||
|
|
||||||
`PATCH /sessions/:sessionId` accepts either `name`, `projectId`, or both.
|
`PATCH /sessions/:sessionId` accepts `name`, `projectId`, or both.
|
||||||
The validation guard only rejects requests where neither is provided:
|
Rejects only when neither is provided — allows `useChat` to write project
|
||||||
|
assignment separately from rename operations.
|
||||||
```js
|
|
||||||
if (!name?.trim() && projectId === undefined) {
|
|
||||||
return res.status(400).json({ error: 'name or projectId is required' });
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
This allows `useChat` to write project assignment separately from rename
|
|
||||||
operations.
|
|
||||||
|
|
||||||
## Caddy Configuration
|
## Caddy Configuration
|
||||||
|
|
||||||
Each route prefix needs a handle block in the Caddyfile on Mini PC 2:
|
Each route prefix needs a handle block in the Caddyfile on Mini PC 2.
|
||||||
|
**Any new top-level route must be added here AND in `vite.config.js`.**
|
||||||
|
|
||||||
```
|
```
|
||||||
handle /chat* { reverse_proxy localhost:4000 }
|
handle /chat* { reverse_proxy localhost:4000 }
|
||||||
@@ -207,9 +214,13 @@ handle /models* { reverse_proxy localhost:4000 }
|
|||||||
handle /projects* { reverse_proxy localhost:4000 }
|
handle /projects* { reverse_proxy localhost:4000 }
|
||||||
handle /episodes* { reverse_proxy localhost:4000 }
|
handle /episodes* { reverse_proxy localhost:4000 }
|
||||||
handle /settings* { reverse_proxy localhost:4000 }
|
handle /settings* { reverse_proxy localhost:4000 }
|
||||||
|
handle /summaries* { reverse_proxy localhost:4000 }
|
||||||
handle /health* { reverse_proxy localhost:4000 }
|
handle /health* { reverse_proxy localhost:4000 }
|
||||||
```
|
```
|
||||||
|
|
||||||
After updating: `caddy reload --config /path/to/Caddyfile`
|
After updating: `caddy reload --config /path/to/Caddyfile`
|
||||||
|
|
||||||
|
> Note: `/graph` routes are on the memory-service (port 3002) and are called
|
||||||
|
> internally by orchestration — they do not need a Caddy entry.
|
||||||
|
|
||||||
For all HTTP endpoints, see `api-routes.md`.
|
For all HTTP endpoints, see `api-routes.md`.
|
||||||
153
docs/services/retrieval-fusion.md
Normal file
@@ -0,0 +1,153 @@
|
|||||||
|
# Retrieval Fusion
|
||||||
|
|
||||||
|
**Implementation:** `packages/orchestration-service/src/chat/index.js`
|
||||||
|
**FTS scoping:** `packages/memory-service/src/episodic/index.js`, `src/index.js`
|
||||||
|
**Settings:** `semanticWeight`, `keywordWeight` via `PATCH /settings`
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
Rather than relying solely on Qdrant vector similarity (which finds semantically
|
||||||
|
related content but misses exact keyword matches) or FTS5 keyword search alone
|
||||||
|
(which finds exact matches but not paraphrases), Reciprocal Rank Fusion (RRF)
|
||||||
|
merges the ranked results from both strategies into a single better-ranked list.
|
||||||
|
|
||||||
|
Episodes that rank highly in **both** lists score highest. An episode that is
|
||||||
|
the top semantic match but irrelevant by keyword, or vice versa, scores lower
|
||||||
|
than one that satisfies both.
|
||||||
|
|
||||||
|
## How RRF Works
|
||||||
|
|
||||||
|
For each episode `d`, its fused score is:
|
||||||
|
|
||||||
|
```
|
||||||
|
RRF(d) = w_semantic / (k + rank_semantic(d))
|
||||||
|
+ w_keyword / (k + rank_keyword(d))
|
||||||
|
```
|
||||||
|
|
||||||
|
- `rank_i(d)` — 1-based position in that strategy's result list (episode absent from a list contributes 0 for that term)
|
||||||
|
- `k = 60` — smoothing constant (standard; not exposed in settings)
|
||||||
|
- `w_semantic`, `w_keyword` — user-tunable weights (both default-sourced from `RETRIEVAL` constants)
|
||||||
|
|
||||||
|
Setting a weight to `0` removes that strategy's contribution entirely. Setting
|
||||||
|
`keywordWeight` to `0` also short-circuits the FTS network call.
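A small worked example may help make the formula concrete. The helper name `rrfScore` and the sample ranks are invented for illustration; `k = 60` and the treatment of absent episodes follow the definitions above.

```javascript
// Worked example of the RRF formula with k = 60.
const K = 60;

// Ranks are 1-based; pass null when the episode is absent from that list.
function rrfScore(semanticRank, keywordRank, semanticWeight, keywordWeight) {
  const term = (weight, rank) => (rank == null ? 0 : weight / (K + rank));
  return term(semanticWeight, semanticRank) + term(keywordWeight, keywordRank);
}

// Episode A: 3rd semantically, 2nd by keyword. Episode B: 1st semantically only.
const a = rrfScore(3, 2, 1.0, 1.0); // 1/63 + 1/62
const b = rrfScore(1, null, 1.0, 1.0); // 1/61
console.log(a > b); // true

// With keywordWeight = 0, A keeps only its semantic term and drops below B.
console.log(rrfScore(3, 2, 1.0, 0) < b); // true
```

With equal weights, an episode that appears in both lists outranks one that tops a single list, which is the behaviour described in the Purpose section.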
|
||||||
|
|
||||||
|
## Architecture
|
||||||
|
|
||||||
|
Fusion lives in orchestration — the service already coordinates multiple data
|
||||||
|
sources, and fusion is a retrieval strategy, not a storage concern.
|
||||||
|
|
||||||
|
```
|
||||||
|
getFusedEpisodes()
|
||||||
|
├── getSemanticEpisodes() — Qdrant embed+search → fetch full rows by ID
|
||||||
|
│ (existing path, unchanged)
|
||||||
|
└── getFTSResults() — memory-service /episodes/search → full rows directly
|
||||||
|
(skipped entirely if keywordWeight == 0)
|
||||||
|
↓
|
||||||
|
fuseEpisodeResults() — pure RRF, no I/O
|
||||||
|
↓
|
||||||
|
fusedEpisodes[] — top semanticLimit episodes by RRF score
|
||||||
|
```
|
||||||
|
|
||||||
|
### Data Shape Consistency
|
||||||
|
|
||||||
|
Both sides must enter fusion as `Episode[]` — full SQLite row objects with
|
||||||
|
the same shape — and both must be filtered against `recentIds` first:
|
||||||
|
|
||||||
|
- **Semantic path**: `recentIds` filter applied before `getEpisodeById` fetch (existing behaviour)
|
||||||
|
- **FTS path**: full rows returned directly; `recentIds` filter applied in `getFusedEpisodes` after receiving them
|
||||||
|
|
||||||
|
FTS requests `semanticLimit * 2` results to provide headroom for the
|
||||||
|
`recentIds` filter without under-serving the fusion.
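The filtering and headroom described above might look like the following sketch. `excludeRecent` is a hypothetical helper, and treating `recentIds` as a `Set` is an assumption; the real code may use a different container.

```javascript
// Sketch: drop episodes already present in the recent window before fusion.
// `recentIds` is assumed to be a Set of episode IDs.
function excludeRecent(episodes, recentIds) {
  return episodes.filter((ep) => !recentIds.has(ep.id));
}

const semanticLimit = 5;
const ftsLimit = semanticLimit * 2; // headroom requested from FTS

const recentIds = new Set([7, 8]);
const ftsRows = [{ id: 8 }, { id: 3 }, { id: 7 }, { id: 1 }];
console.log(excludeRecent(ftsRows, recentIds).map((ep) => ep.id)); // [ 3, 1 ]
```

Requesting twice the limit means that even if several FTS hits are discarded as recent, enough candidates usually remain for fusion.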
|
||||||
|
|
||||||
|
## FTS Session Scoping
|
||||||
|
|
||||||
|
Without scoping, FTS5 searches across all episodes in the database. For
|
||||||
|
context assembly, results must be constrained to the current session or
|
||||||
|
project session pool — the same scope used for Qdrant semantic search.
|
||||||
|
|
||||||
|
`searchEpisodes(query, limit, sessionIds)` in memory-service accepts an
|
||||||
|
optional `sessionIds` array. When provided, the SQL becomes:
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT e.* FROM episodes e
|
||||||
|
JOIN episodes_fts fts ON e.id = fts.rowid
|
||||||
|
WHERE episodes_fts MATCH ?
|
||||||
|
AND e.session_id IN (?, ?, ...)
|
||||||
|
ORDER BY rank
|
||||||
|
LIMIT ?
|
||||||
|
```
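The variable-length `IN (?, ?, ...)` clause has to be generated per call. A minimal sketch of that step is shown below; `buildSearchQuery` and its return shape are assumptions for illustration and do not reflect the actual `searchEpisodes` internals.

```javascript
// Sketch: build the scoped FTS query text and its bound parameters.
// Function name and return shape are illustrative assumptions.
function buildSearchQuery(query, limit, sessionIds = []) {
  const scoped = sessionIds.length > 0;
  const placeholders = sessionIds.map(() => '?').join(', ');
  const sql = [
    'SELECT e.* FROM episodes e',
    'JOIN episodes_fts fts ON e.id = fts.rowid',
    'WHERE episodes_fts MATCH ?',
    scoped ? `AND e.session_id IN (${placeholders})` : null,
    'ORDER BY rank',
    'LIMIT ?',
  ]
    .filter(Boolean)
    .join('\n');
  // Parameter order must match placeholder order: query, ids..., limit.
  return { sql, params: [query, ...sessionIds, limit] };
}

console.log(buildSearchQuery('hello', 10, [1, 2, 3]).params);
```

Binding the IDs as parameters, rather than interpolating them into the SQL string, keeps the query safe even though the number of placeholders varies.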
|
||||||
|
|
||||||
|
The HTTP endpoint `GET /episodes/search` accepts `sessionIds` as a
|
||||||
|
comma-separated query param: `?q=hello&sessionIds=1,2,3`.
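On the receiving side, the comma-separated value needs to become an array of integers. The sketch below shows one plausible way to do that; `parseSessionIds` is a hypothetical helper, and silently dropping non-numeric entries is an assumption, not confirmed behaviour.

```javascript
// Sketch: turn "1,2,3" into [1, 2, 3]; a missing or empty value yields [].
function parseSessionIds(raw) {
  if (!raw) return [];
  return raw
    .split(',')
    .map((part) => Number.parseInt(part.trim(), 10))
    .filter(Number.isInteger); // drops NaN from malformed entries
}

console.log(parseSessionIds('1,2,3')); // [ 1, 2, 3 ]
```

Returning an empty array for a missing parameter lets the caller fall back to the unscoped query.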
|
||||||
|
|
||||||
|
In orchestration, `ftsSessionIds` is set to:
|
||||||
|
- `projectSessionIds` (all sessions in the project) — if the session belongs to a project
|
||||||
|
- `[session.id]` — otherwise (single session only)
|
||||||
|
|
||||||
|
This mirrors the Qdrant scoping logic exactly.
|
||||||
|
|
||||||
|
## `fuseEpisodeResults` — Implementation Detail
|
||||||
|
|
||||||
|
```js
|
||||||
|
function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
|
||||||
|
const k = RETRIEVAL.RRF_K; // 60
|
||||||
|
const scores = new Map(); // episode.id → { episode, score }
|
||||||
|
|
||||||
|
// Score semantic results (already filtered against recentIds)
|
||||||
|
semanticEps.forEach((ep, i) => {
|
||||||
|
scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
|
||||||
|
});
|
||||||
|
|
||||||
|
// Score + merge keyword results (already filtered against recentIds)
|
||||||
|
keywordEps.forEach((ep, i) => {
|
||||||
|
const contrib = keywordWeight / (k + i + 1);
|
||||||
|
if (scores.has(ep.id)) {
|
||||||
|
scores.get(ep.id).score += contrib; // appears in both — sum scores
|
||||||
|
} else if (contrib > 0) {
|
||||||
|
scores.set(ep.id, { episode: ep, score: contrib }); // FTS-only episode
|
||||||
|
}
|
||||||
|
// contrib == 0 (keywordWeight: 0) → episode not added (guard prevents score-0 bleed-through)
|
||||||
|
});
|
||||||
|
|
||||||
|
return [...scores.values()]
|
||||||
|
.sort((a, b) => b.score - a.score)
|
||||||
|
.slice(0, limit)
|
||||||
|
.map(({ episode }) => episode);
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
The `else if (contrib > 0)` guard prevents FTS-only episodes from entering
|
||||||
|
the result set with a score of 0 when `keywordWeight` is 0 — verified by
|
||||||
|
the test suite.
|
||||||
|
|
||||||
|
## Settings
|
||||||
|
|
||||||
|
| Setting | Default | Range | Description |
|
||||||
|
|---|---|---|---|
|
||||||
|
| `semanticWeight` | 1.0 | 0–5 | Weight applied to Qdrant semantic results |
|
||||||
|
| `keywordWeight` | 0 | 0–5 | Weight applied to FTS5 keyword results. `0` = disabled |
|
||||||
|
|
||||||
|
Both are readable via `GET /settings` and writable via `PATCH /settings`
|
||||||
|
without a service restart. Changes take effect on the next chat request.
|
||||||
|
|
||||||
|
**To enable keyword search:**
|
||||||
|
```bash
|
||||||
|
curl -X PATCH http://localhost:4000/settings \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"keywordWeight": 1.0}'
|
||||||
|
```
|
||||||
|
|
||||||
|
**To favour keyword matches over semantic:**
|
||||||
|
```bash
|
||||||
|
curl -X PATCH http://localhost:4000/settings \
|
||||||
|
-H "Content-Type: application/json" \
|
||||||
|
-d '{"semanticWeight": 0.5, "keywordWeight": 2.0}'
|
||||||
|
```
|
||||||
|
|
||||||
|
## Constants (`packages/shared/src/config/constants.js`)
|
||||||
|
|
||||||
|
| Constant | Value | Description |
|
||||||
|
|---|---|---|
|
||||||
|
| `RETRIEVAL.RRF_K` | 60 | RRF smoothing constant — not exposed in settings |
|
||||||
|
| `RETRIEVAL.SEMANTIC_WEIGHT` | 1.0 | Default semantic weight |
|
||||||
|
| `RETRIEVAL.KEYWORD_WEIGHT` | 0 | Default keyword weight (off) |
|
||||||
@@ -165,10 +165,16 @@ Orchestration pipeline defaults. Used as fallback values in
|
|||||||
| `RECENT_EPISODE_LIMIT` | `5` | Recent episodes to inject into prompt |
|
| `RECENT_EPISODE_LIMIT` | `5` | Recent episodes to inject into prompt |
|
||||||
| `SEMANTIC_LIMIT` | `5` | Semantic search results to inject into prompt |
|
| `SEMANTIC_LIMIT` | `5` | Semantic search results to inject into prompt |
|
||||||
| `SCORE_THRESHOLD` | `0.75` | Minimum similarity score for semantic results |
|
| `SCORE_THRESHOLD` | `0.75` | Minimum similarity score for semantic results |
|
||||||
|
| `ENTITIES_LIMIT` | `5` | Max entity search results to inject into prompt |
|
||||||
|
| `ENTITIES_THRESHOLD` | `0.55` | Minimum similarity score for entity results |
|
||||||
| `TEMPERATURE` | `0.7` | Default inference temperature |
|
| `TEMPERATURE` | `0.7` | Default inference temperature |
|
||||||
| `CORS_ORIGIN` | `'http://localhost:5173'` | Fallback allowed CORS origin |
|
| `CORS_ORIGIN` | `'http://localhost:5173'` | Fallback allowed CORS origin |
|
||||||
| `SYSTEM_PROMPT` | *(see below)* | Default system prompt |
|
| `SYSTEM_PROMPT` | *(see below)* | Default system prompt |
|
||||||
|
|
||||||
|
> `ENTITIES_THRESHOLD` is set to `0.55` — lower than `SCORE_THRESHOLD` because
|
||||||
|
> entity notes generated by a 3B model tend to embed with lower cosine similarity
|
||||||
|
> than full episode text. Tune upward if irrelevant entities appear in context.
|
||||||
|
|
||||||
> `repeatPenalty`, `topP`, and `topK` defaults are sourced from
|
> `repeatPenalty`, `topP`, and `topK` defaults are sourced from
|
||||||
> `INFERENCE_DEFAULTS` in `config/settings.js` rather than `ORCHESTRATION`,
|
> `INFERENCE_DEFAULTS` in `config/settings.js` rather than `ORCHESTRATION`,
|
||||||
> since those constants already define the canonical values.
|
> since those constants already define the canonical values.
|
||||||
@@ -178,6 +184,25 @@ Default system prompt:
|
|||||||
> of past conversations with the user. Use them to provide consistent,
|
> of past conversations with the user. Use them to provide consistent,
|
||||||
> personalised responses."
|
> personalised responses."
|
||||||
|
|
||||||
|
#### `SUMMARIES`
|
||||||
|
|
||||||
|
Controls the automatic session summarisation system in `orchestration-service/src/services/summarization.js`.
|
||||||
|
|
||||||
|
| Key | Value | Description |
|
||||||
|
|---|---|---|
|
||||||
|
| `THRESHOLD_TOKENS` | `200` | Minimum total session tokens before summarisation is considered |
|
||||||
|
| `MAX_SUMMARY_TOKENS` | `800` | If existing summary exceeds this length (chars), create a new row instead of updating |
|
||||||
|
| `MIN_EPISODES_SINCE` | `5` | Minimum new episodes since last summary before re-summarising |
|
||||||
|
|
||||||
|
These can be overridden per-deployment via environment variables in the
|
||||||
|
orchestration service `.env`:
|
||||||
|
|
||||||
|
```
|
||||||
|
SUMMARY_THRESHOLD_TOKENS=200
|
||||||
|
SUMMARY_MAX_TOKENS=800
|
||||||
|
SUMMARY_MIN_EPISODES=5
|
||||||
|
```
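How the overrides are read is not shown here; a minimal sketch, assuming each variable is parsed as an integer with the documented value as fallback, could look like this. `envInt` is a hypothetical helper.

```javascript
// Sketch: read an integer override from the environment, else use the default.
// `envInt` is hypothetical; the variable names come from the block above.
function envInt(name, fallback, env = process.env) {
  const parsed = Number.parseInt(env[name] ?? '', 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

const SUMMARIES = {
  THRESHOLD_TOKENS: envInt('SUMMARY_THRESHOLD_TOKENS', 200),
  MAX_SUMMARY_TOKENS: envInt('SUMMARY_MAX_TOKENS', 800),
  MIN_EPISODES_SINCE: envInt('SUMMARY_MIN_EPISODES', 5),
};

console.log(SUMMARIES);
```

Falling back on unparseable values means a typo in `.env` degrades to the default instead of producing `NaN` thresholds.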
|
||||||
|
|
||||||
#### `SQLITE`
|
#### `SQLITE`
|
||||||
|
|
||||||
| Key | Value | Description |
|
| Key | Value | Description |
|
||||||
|
|||||||
201
docs/services/summarization.md
Normal file
@@ -0,0 +1,201 @@
|
|||||||
|
# Summarization
|
||||||
|
|
||||||
|
Session summarization generates rolling plain-text summaries of conversation
|
||||||
|
history, giving the model a condensed view of past context without consuming
|
||||||
|
the full context window with raw episodes.
|
||||||
|
|
||||||
|
**Location:** `packages/orchestration-service/src/services/summarization.js`
|
||||||
|
**Triggered by:** `chat/index.js` after every episode write (fire-and-forget)
|
||||||
|
**Model:** `qwen2.5:3b` via Ollama on Mini PC 1 (192.168.0.81)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Trigger Conditions
|
||||||
|
|
||||||
|
`triggerSummary(session, allEpisodes)` calls `maybeSummarize` fire-and-forget.
|
||||||
|
`maybeSummarize` proceeds only when both conditions are met:
|
||||||
|
|
||||||
|
1. Total session token count exceeds `SUMMARIES.THRESHOLD_TOKENS` (default 200)
|
||||||
|
2. At least `SUMMARIES.MIN_EPISODES_SINCE` (default 5) new episodes have
|
||||||
|
accumulated since the last summary
|
||||||
|
|
||||||
|
The token threshold is intentionally low — it ensures summaries start
|
||||||
|
generating early in a session's life rather than only after very long
|
||||||
|
conversations.
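The two-part gate can be sketched as a single predicate. The field names (`token_count`, `id`) and the `lastSummarizedId` argument are assumptions for illustration; `maybeSummarize` may track these differently.

```javascript
// Sketch of the gate: both conditions must hold before summarising.
// Field names and the lastSummarizedId argument are assumed.
const SUMMARIES = { THRESHOLD_TOKENS: 200, MIN_EPISODES_SINCE: 5 };

function shouldSummarize(allEpisodes, lastSummarizedId = 0) {
  const totalTokens = allEpisodes.reduce(
    (sum, ep) => sum + (ep.token_count ?? 0),
    0,
  );
  const newEpisodes = allEpisodes.filter((ep) => ep.id > lastSummarizedId);
  return (
    totalTokens > SUMMARIES.THRESHOLD_TOKENS &&
    newEpisodes.length >= SUMMARIES.MIN_EPISODES_SINCE
  );
}

const episodes = [1, 2, 3, 4, 5].map((id) => ({ id, token_count: 50 }));
console.log(shouldSummarize(episodes)); // true
```

In this example the session totals 250 tokens across five new episodes, so both conditions pass.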
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Summary Rows and Cumulative Updates
|
||||||
|
|
||||||
|
Each session can have multiple summary rows in the `summaries` table.
|
||||||
|
The update strategy depends on the size of the most recent summary:
|
||||||
|
|
||||||
|
| Condition | Action |
|
||||||
|
|---|---|
|
||||||
|
| No existing summary | Generate fresh summary from all episodes |
|
||||||
|
| Latest summary under `MAX_SUMMARY_TOKENS` | Update: summarise new episodes with existing summary as context |
|
||||||
|
| Latest summary over `MAX_SUMMARY_TOKENS` | Create new row: treat as fresh summarisation |
|
||||||
|
|
||||||
|
This produces a chain of summary rows over time. Each row's `episode_range`
|
||||||
|
covers only the episodes summarised in that specific pass (e.g. `259-263`),
|
||||||
|
not all episodes in the session.
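The table above reduces to a small decision function. The sketch below assumes the size check compares the stored summary's character length against `MAX_SUMMARY_TOKENS` (the constants documentation describes the limit in chars); the function name and return values are invented for illustration.

```javascript
// Sketch: decide whether to update the latest summary row or start a new one.
// Assumes size is measured as character length of the stored content.
const MAX_SUMMARY_TOKENS = 800;

function chooseSummaryAction(latestSummary) {
  if (!latestSummary) return 'create'; // no existing summary
  return latestSummary.content.length > MAX_SUMMARY_TOKENS ? 'create' : 'update';
}

console.log(chooseSummaryAction({ content: 'short summary' })); // update
```

Capping row size this way keeps any single summary from growing without bound, at the cost of producing a chain of rows.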
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Ollama Request
|
||||||
|
|
||||||
|
```js
|
||||||
|
{
|
||||||
|
model: EXTRACTION_MODEL, // qwen2.5:3b (set via EXTRACTION_MODEL env var)
|
||||||
|
prompt: buildSummaryPrompt(episodesToSummarize, existingSummary),
|
||||||
|
stream: false,
|
||||||
|
// No format: 'json' — free-text output required for summaries
|
||||||
|
options: {
|
||||||
|
temperature: 0.2,
|
||||||
|
num_predict: 500,
|
||||||
|
},
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
`temperature: 0.2` is slightly higher than extraction (0.1) — summaries
|
||||||
|
benefit from some fluency. `num_predict: 500` gives room for 5 thorough
|
||||||
|
sentences without risk of runoff.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Prompt Format
|
||||||
|
|
||||||
|
ChatML format — native to qwen2.5:
|
||||||
|
|
||||||
|
```
|
||||||
|
<|im_start|>user
|
||||||
|
Summarize the conversation below in 3-5 sentences.
|
||||||
|
Write in third person. Do not quote directly — paraphrase only.
|
||||||
|
Do not include greetings, sign-offs, or filler. Output only the summary text.
|
||||||
|
|
||||||
|
Conversation:
|
||||||
|
{context}
|
||||||
|
<|im_end|>
|
||||||
|
<|im_start|>assistant
|
||||||
|
```
|
||||||
|
|
||||||
|
For cumulative updates, the instruction and context change:
|
||||||
|
|
||||||
|
```
|
||||||
|
<|im_start|>user
|
||||||
|
Update the summary below to incorporate the new exchanges.
|
||||||
|
Write 3-5 sentences in third person. Do not quote directly — paraphrase only.
|
||||||
|
Do not include greetings, sign-offs, or filler. Output only the updated summary text.
|
||||||
|
|
||||||
|
Previous summary:
|
||||||
|
{existingSummary}
|
||||||
|
|
||||||
|
New exchanges:
|
||||||
|
{context}
|
||||||
|
<|im_end|>
|
||||||
|
<|im_start|>assistant
|
||||||
|
```
|
||||||
|
|
||||||
|
### Input truncation
|
||||||
|
|
||||||
|
Episode context is truncated to `MAX_CHARS = 3000` characters, keeping the
|
||||||
|
most recent exchanges (sliced from the end). This keeps Qwen focused and
|
||||||
|
prevents the prompt from exceeding its effective context window.
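Keeping the tail of the context is a one-line slice. In the sketch below, the episode field names (`user_message`, `ai_response`) and the `buildContext` helper are assumptions; only `MAX_CHARS = 3000` and the keep-the-end behaviour come from the text above.

```javascript
const MAX_CHARS = 3000;

// Episode field names are assumed for illustration.
function buildContext(episodes) {
  const full = episodes
    .map((ep) => `User: ${ep.user_message}\nAssistant: ${ep.ai_response}`)
    .join('\n\n');
  // A negative slice start keeps the tail, i.e. the most recent exchanges.
  return full.slice(-MAX_CHARS);
}

const long = Array.from({ length: 50 }, (_, i) => ({
  user_message: 'q'.repeat(100),
  ai_response: `answer ${i}`,
}));
console.log(buildContext(long).length); // 3000
```

A plain character slice can cut the oldest retained exchange mid-sentence; whether the real implementation trims to an exchange boundary is not stated.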
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## ChatML Token Stripping
|
||||||
|
|
||||||
|
Qwen occasionally echoes ChatML tokens back into its response. The raw output
|
||||||
|
is cleaned before saving:
|
||||||
|
|
||||||
|
```js
|
||||||
|
const raw = data.response?.trim() ?? '';
|
||||||
|
const content = raw
|
||||||
|
.replace(/<\|im_start\|>.*?<\|im_end\|>/gs, '')
|
||||||
|
.replace(/<\|im_start\|>|<\|im_end\|>|<\|im_sep\|>/g, '')
|
||||||
|
.trim();
|
||||||
|
return content;
|
||||||
|
```
|
||||||
|
|
||||||
|
Without this, leaked tokens get stored in the summary and then injected
|
||||||
|
back into the next summarisation prompt — causing the model to append a new
|
||||||
|
summary after the old one rather than replacing it.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Episode Range Tracking
|
||||||
|
|
||||||
|
Each summary row stores `episode_range` as `"firstId-lastId"` covering only
|
||||||
|
the episodes summarised in that pass:
|
||||||
|
|
||||||
|
```js
|
||||||
|
const summarizedIds = episodesToSummarize.map(ep => ep.id).sort((a,b) => a - b);
|
||||||
|
const episodeRange = `${summarizedIds.at(0)}-${summarizedIds.at(-1)}`;
|
||||||
|
```
|
||||||
|
|
||||||
|
This makes SummaryView cards meaningful — "Episodes 259-263" tells you
|
||||||
|
exactly which exchanges that summary covers, rather than always showing
|
||||||
|
the full session range.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Summary Storage
|
||||||
|
|
||||||
|
Summaries are written directly to the memory service from orchestration:
|
||||||
|
|
||||||
|
```js
|
||||||
|
// Create new row
|
||||||
|
await fetch(`${MEMORY_URL}/summaries`, {
|
||||||
|
method: 'POST',
|
||||||
|
body: JSON.stringify({ sessionId: session.id, content, tokenCount, episodeRange }),
|
||||||
|
});
|
||||||
|
|
||||||
|
// Update existing row
|
||||||
|
await fetch(`${MEMORY_URL}/summaries/${latest.id}`, {
|
||||||
|
method: 'PATCH',
|
||||||
|
body: JSON.stringify({ content, tokenCount, episodeRange }),
|
||||||
|
});
|
||||||
|
```
|
||||||
|
|
||||||
|
`session.id` here is the internal SQLite integer ID — not the external UUID.
|
||||||
|
It is available directly on the `session` object passed from `chat/index.js`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Client-Side Indicator
|
||||||
|
|
||||||
|
The chat client shows a "Summarising…" spinner in the `ChatWindow` header
|
||||||
|
and on the InfoPanel's Session Memory button while summarisation may be
|
||||||
|
in progress.
|
||||||
|
|
||||||
|
Since summarisation is fire-and-forget with no completion signal back to
|
||||||
|
the client, the indicator is timer-based: it activates when the stream
|
||||||
|
finishes and clears after 8 seconds.
|
||||||
|
|
||||||
|
```js
|
||||||
|
// In App.jsx, watching the streaming state from useChat:
|
||||||
|
useEffect(() => {
|
||||||
|
if (prevStreaming.current && !streaming) {
|
||||||
|
setSummarising(true);
|
||||||
|
const t = setTimeout(() => setSummarising(false), 8000);
|
||||||
|
return () => clearTimeout(t);
|
||||||
|
}
|
||||||
|
prevStreaming.current = streaming;
|
||||||
|
}, [streaming]);
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Environment Variables
|
||||||
|
|
||||||
|
Set in `packages/orchestration-service/src/.env`:
|
||||||
|
|
||||||
|
| Variable | Default | Description |
|
||||||
|
|---|---|---|
|
||||||
|
| `EXTRACTION_URL` | `http://localhost:11434` | Ollama instance URL |
|
||||||
|
| `EXTRACTION_MODEL` | `qwen2.5:3b` | Model for summarisation |
|
||||||
|
| `MEMORY_SERVICE_URL` | `http://localhost:3002` | Memory service URL |
|
||||||
|
| `SUMMARY_THRESHOLD_TOKENS` | `200` | Token threshold before summarisation triggers |
|
||||||
|
| `SUMMARY_MAX_TOKENS` | `800` | Max summary length before a new row is created |
|
||||||
|
| `SUMMARY_MIN_EPISODES` | `5` | Min new episodes since last summary before re-summarising |
|
||||||
3
package-lock.json
generated
@@ -4224,8 +4224,7 @@
|
|||||||
"dependencies": {
|
"dependencies": {
|
||||||
"@nexusai/shared": "^1.0.0",
|
"@nexusai/shared": "^1.0.0",
|
||||||
"dotenv": "^17.4.0",
|
"dotenv": "^17.4.0",
|
||||||
"express": "^5.2.1",
|
"express": "^5.2.1"
|
||||||
"ollama": "^0.6.3"
|
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
"packages/inference-service": {
|
"packages/inference-service": {
|
||||||
|
|||||||
@@ -12,6 +12,7 @@ import AllProjectsView from './components/AllProjectsView';
|
|||||||
import SettingsView from './components/SettingsView';
|
import SettingsView from './components/SettingsView';
|
||||||
import ProjectView from './components/ProjectView';
|
import ProjectView from './components/ProjectView';
|
||||||
import MemoryView from './components/MemoryView';
|
import MemoryView from './components/MemoryView';
|
||||||
|
import SummaryView from './components/SummaryView';
|
||||||
|
|
||||||
/**** useHooks **** */
|
/**** useHooks **** */
|
||||||
import { useSession } from './hooks/useSession';
|
import { useSession } from './hooks/useSession';
|
||||||
@@ -27,6 +28,7 @@ const BACK_MAP = {
|
|||||||
'settings': 'home',
|
'settings': 'home',
|
||||||
'project': 'all-projects',
|
'project': 'all-projects',
|
||||||
'memory': 'settings',
|
'memory': 'settings',
|
||||||
|
'summaries': 'chat',
|
||||||
};
|
};
|
||||||
|
|
||||||
export default function App() {
|
export default function App() {
|
||||||
@@ -63,6 +65,7 @@ export default function App() {
|
|||||||
streaming,
|
streaming,
|
||||||
lastTokenCount,
|
lastTokenCount,
|
||||||
lastModel,
|
lastModel,
|
||||||
|
summarising,
|
||||||
} = useChat({ activeSession, appendMessage, updateLastMessage, refreshSessions });
|
} = useChat({ activeSession, appendMessage, updateLastMessage, refreshSessions });
|
||||||
|
|
||||||
function navigate(nextView) {
|
function navigate(nextView) {
|
||||||
@@ -159,6 +162,7 @@ export default function App() {
|
|||||||
onBack={goBack}
|
onBack={goBack}
|
||||||
canGoBack={canGoBack}
|
canGoBack={canGoBack}
|
||||||
loadedModel={modelProps?.modelAlias ?? null}
|
loadedModel={modelProps?.modelAlias ?? null}
|
||||||
|
summarising={summarising}
|
||||||
/>
|
/>
|
||||||
)}
|
)}
|
||||||
|
|
||||||
@@ -205,6 +209,13 @@ export default function App() {
|
|||||||
/>
|
/>
|
||||||
)}
|
)}
|
||||||
|
|
||||||
|
{view === 'summaries' && (
|
||||||
|
<SummaryView
|
||||||
|
activeSession={activeSession}
|
||||||
|
onBack={goBack}
|
||||||
|
/>
|
||||||
|
)}
|
||||||
|
|
||||||
<InfoPanel
|
<InfoPanel
|
||||||
isOpen={rightOpen}
|
isOpen={rightOpen}
|
||||||
onToggle={() => setRightOpen(o => !o)}
|
onToggle={() => setRightOpen(o => !o)}
|
||||||
@@ -214,6 +225,8 @@ export default function App() {
|
|||||||
onModelChange={setSelectedModel}
|
onModelChange={setSelectedModel}
|
||||||
lastModel={lastModel}
|
lastModel={lastModel}
|
||||||
lastTokenCount={lastTokenCount}
|
lastTokenCount={lastTokenCount}
|
||||||
|
summarising={summarising}
|
||||||
|
onViewSummary={() => navigate('summaries')}
|
||||||
/>
|
/>
|
||||||
</div>
|
</div>
|
||||||
);
|
);
|
||||||
|
|||||||
@@ -1,5 +1,6 @@
|
|||||||
import { API_DEFAULTS } from "../config/constants";
|
import { API_DEFAULTS } from "../config/constants";
|
||||||
|
|
||||||
|
|
||||||
const BASE_URL = import.meta.env.VITE_ORCHESTRATION_URL ?? '';
|
const BASE_URL = import.meta.env.VITE_ORCHESTRATION_URL ?? '';
|
||||||
|
|
||||||
// ── Sessions ────────────────────────────────────────────────
|
// ── Sessions ────────────────────────────────────────────────
|
||||||
@@ -205,3 +206,21 @@ export async function getModelProps() {
|
|||||||
if (!res.ok) throw new Error('Failed to fetch model props');
|
if (!res.ok) throw new Error('Failed to fetch model props');
|
||||||
return res.json();
|
return res.json();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
export async function fetchSessionSummaries(sessionId) {
|
||||||
|
const res = await fetch(`${BASE_URL}/summaries/session/${sessionId}`);
|
||||||
|
if (!res.ok) throw new Error(`Failed to fetch summaries: ${res.status}`);
|
||||||
|
return res.json();
|
||||||
|
}
|
||||||
|
|
||||||
|
export async function generateProjectSummary(projectId) {
|
||||||
|
const res = await fetch(`${BASE_URL}/summaries/project/${projectId}/generate`, { method: 'POST' });
|
||||||
|
if (!res.ok) throw new Error(`Failed to generate project summary: ${res.status}`);
|
||||||
|
return res.json();
|
||||||
|
}
|
||||||
|
|
||||||
|
export async function fetchProjectOverviewSummary(projectId) {
|
||||||
|
const res = await fetch(`${BASE_URL}/summaries/project/${projectId}/overview`);
|
||||||
|
if (!res.ok) throw new Error(`Failed to fetch project overview: ${res.status}`);
|
||||||
|
return res.json(); // null if none exists yet
|
||||||
|
}
|
||||||
@@ -2,6 +2,7 @@ import React, { useState, useEffect } from 'react';
|
|||||||
import { fetchSessions, deleteSession } from '../api/orchestration';
|
import { fetchSessions, deleteSession } from '../api/orchestration';
|
||||||
import { CLIENT_DEFAULTS } from '../config/constants';
|
import { CLIENT_DEFAULTS } from '../config/constants';
|
||||||
|
|
||||||
|
|
||||||
const PAGE_SIZE = CLIENT_DEFAULTS.PAGE_SIZE;
|
const PAGE_SIZE = CLIENT_DEFAULTS.PAGE_SIZE;
|
||||||
|
|
||||||
export default function AllChatsView({ onSelectSession, onBack, projects }) {
|
export default function AllChatsView({ onSelectSession, onBack, projects }) {
|
||||||
|
|||||||
@@ -2,6 +2,7 @@ import React, { useState, useEffect } from 'react';
|
|||||||
import ProjectModal from './ProjectModal';
|
import ProjectModal from './ProjectModal';
|
||||||
import { fetchProjects, createProject, updateProject, deleteProject } from '../api/orchestration';
|
import { fetchProjects, createProject, updateProject, deleteProject } from '../api/orchestration';
|
||||||
|
|
||||||
|
|
||||||
export default function AllProjectsView({ onProjectsChange, onBack, onSelectProject, onNavigate }) {
|
export default function AllProjectsView({ onProjectsChange, onBack, onSelectProject, onNavigate }) {
|
||||||
const [projects, setProjects] = useState([]);
|
const [projects, setProjects] = useState([]);
|
||||||
const [loading, setLoading] = useState(true);
|
const [loading, setLoading] = useState(true);
|
||||||
|
|||||||
@@ -12,6 +12,7 @@ export default function ChatWindow({
|
|||||||
onBack,
|
onBack,
|
||||||
canGoBack,
|
canGoBack,
|
||||||
loadedModel,
|
loadedModel,
|
||||||
|
summarising,
|
||||||
}) {
|
}) {
|
||||||
const bottomRef = useRef(null);
|
const bottomRef = useRef(null);
|
||||||
const inputRef = useRef(null);
|
const inputRef = useRef(null);
|
||||||
@@ -86,6 +87,20 @@ export default function ChatWindow({
|
|||||||
No model loaded
|
No model loaded
|
||||||
</span>
|
</span>
|
||||||
)}
|
)}
|
||||||
|
{summarising && (
|
||||||
|
<div style={{ display: 'flex', alignItems: 'center', gap: '6px' }}>
|
||||||
|
<div style={{
|
||||||
|
width: '10px', height: '10px', borderRadius: '50%',
|
||||||
|
border: '2px solid var(--accent)',
|
||||||
|
borderTopColor: 'transparent',
|
||||||
|
animation: 'spin 0.7s linear infinite',
|
||||||
|
flexShrink: 0,
|
||||||
|
}} />
|
||||||
|
<span style={{ fontSize: '11px', color: 'var(--text-muted)', whiteSpace: 'nowrap' }}>
|
||||||
|
Summarising…
|
||||||
|
</span>
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
<button className="btn-icon" onClick={onTogglePanel} title="Session info">⊹</button>
|
<button className="btn-icon" onClick={onTogglePanel} title="Session info">⊹</button>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|||||||
@@ -1,6 +1,17 @@
|
|||||||
import React from 'react';
|
import React from 'react';
|
||||||
|
|
||||||
export default function InfoPanel({ isOpen, onToggle, activeSession, lastModel, lastTokenCount, selectedModel, onModelChange, models }) {
|
export default function InfoPanel({
|
||||||
|
isOpen,
|
||||||
|
onToggle,
|
||||||
|
activeSession,
|
||||||
|
lastModel,
|
||||||
|
lastTokenCount,
|
||||||
|
selectedModel,
|
||||||
|
onModelChange,
|
||||||
|
models,
|
||||||
|
summarising,
|
||||||
|
onViewSummary,
|
||||||
|
}) {
|
||||||
return (
|
return (
|
||||||
<div className="flex-col" style={{
|
<div className="flex-col" style={{
|
||||||
position: 'fixed',
|
position: 'fixed',
|
||||||
@@ -74,6 +85,37 @@ export default function InfoPanel({ isOpen, onToggle, activeSession, lastModel,
|
|||||||
)}
|
)}
|
||||||
</Section>
|
</Section>
|
||||||
|
|
||||||
|
{/* Session Memory button */}
|
||||||
|
{activeSession && !activeSession.isNew && (
|
||||||
|
<button
|
||||||
|
onClick={onViewSummary}
|
||||||
|
className="btn-reset text-sm"
|
||||||
|
style={{
|
||||||
|
marginTop: '8px', width: '100%', padding: '7px 10px',
|
||||||
|
borderRadius: 'var(--radius-md)',
|
||||||
|
background: 'var(--bg-elevated)',
|
||||||
|
border: '1px solid var(--border)',
|
||||||
|
color: 'var(--text-secondary)',
|
||||||
|
display: 'flex', alignItems: 'center', gap: '8px',
|
||||||
|
}}
|
||||||
|
onMouseEnter={e => e.currentTarget.style.borderColor = 'var(--accent-hover)'}
|
||||||
|
onMouseLeave={e => e.currentTarget.style.borderColor = 'var(--border)'}
|
||||||
|
>
|
||||||
|
<span>◈</span>
|
||||||
|
<span>Session Memory</span>
|
||||||
|
{summarising && (
|
||||||
|
<div style={{
|
||||||
|
marginLeft: 'auto',
|
||||||
|
width: '8px', height: '8px', borderRadius: '50%',
|
||||||
|
border: '2px solid var(--accent-hover)',
|
||||||
|
borderTopColor: 'transparent',
|
||||||
|
animation: 'spin 0.7s linear infinite',
|
||||||
|
flexShrink: 0,
|
||||||
|
}} />
|
||||||
|
)}
|
||||||
|
</button>
|
||||||
|
)}
|
||||||
|
|
||||||
</div>
|
</div>
|
||||||
)}
|
)}
|
||||||
</div>
|
</div>
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
import React, { useState, useEffect } from 'react';
|
import React, { useState, useEffect } from 'react';
|
||||||
import { fetchSessions, updateProject, deleteProject } from '../api/orchestration';
|
import { fetchSessions, updateProject, deleteProject, generateProjectSummary, fetchProjectOverviewSummary } from '../api/orchestration';
|
||||||
import ProjectModal from './ProjectModal';
|
import ProjectModal from './ProjectModal';
|
||||||
|
|
||||||
export default function ProjectView({ project, onNavigate, onBack, onSelectSession, onNewProjectChat, onProjectsChange }) {
|
export default function ProjectView({ project, onNavigate, onBack, onSelectSession, onNewProjectChat, onProjectsChange }) {
|
||||||
@@ -8,9 +8,27 @@ export default function ProjectView({ project, onNavigate, onBack, onSelectSessi
|
|||||||
const [input, setInput] = useState('');
|
const [input, setInput] = useState('');
|
||||||
const [menuOpen, setMenuOpen] = useState(false);
|
const [menuOpen, setMenuOpen] = useState(false);
|
||||||
const [modal, setModal] = useState(null);
|
const [modal, setModal] = useState(null);
|
||||||
|
const [overview, setOverview] = useState(null);
|
||||||
|
const [overviewLoading, setOverviewLoading] = useState(true);
|
||||||
|
const [generating, setGenerating] = useState(false);
|
||||||
|
const [generateError, setGenerateError] = useState(null);
|
||||||
|
|
||||||
useEffect(() => { load(); }, [project.id]);
|
useEffect(() => { load(); }, [project.id]);
|
||||||
|
|
||||||
|
useEffect(() => {
|
||||||
|
async function loadOverview() {
|
||||||
|
setOverviewLoading(true);
|
||||||
|
try {
|
||||||
|
setOverview(await fetchProjectOverviewSummary(project.id));
|
||||||
|
} catch (err) {
|
||||||
|
console.error('[ProjectView] Failed to load overview:', err.message);
|
||||||
|
} finally {
|
||||||
|
setOverviewLoading(false);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
loadOverview();
|
||||||
|
}, [project.id]);
|
||||||
|
|
||||||
async function load() {
|
async function load() {
|
||||||
setLoading(true);
|
setLoading(true);
|
||||||
try {
|
try {
|
||||||
@@ -71,6 +89,23 @@ export default function ProjectView({ project, onNavigate, onBack, onSelectSessi
|
|||||||
return date.toLocaleDateString([], { month: 'short', day: 'numeric', year: 'numeric' });
|
return date.toLocaleDateString([], { month: 'short', day: 'numeric', year: 'numeric' });
|
||||||
}
|
}
|
||||||
|
|
||||||
|
async function handleGenerateSummary() {
|
||||||
|
setGenerating(true);
|
||||||
|
setGenerateError(null);
|
||||||
|
try {
|
||||||
|
setOverview(await generateProjectSummary(project.id));
|
||||||
|
} catch (err) {
|
||||||
|
// 422 means no session summaries exist yet — surface a friendly message
|
||||||
|
setGenerateError(
|
||||||
|
err.message.includes('422')
|
||||||
|
? 'No conversations found in this project yet.'
|
||||||
|
: 'Failed to generate summary. Please try again.'
|
||||||
|
);
|
||||||
|
} finally {
|
||||||
|
setGenerating(false);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
return (
|
return (
|
||||||
<div className="flex-col flex-1 overflow-hidden" style={{ background: 'var(--bg-base)' }}>
|
<div className="flex-col flex-1 overflow-hidden" style={{ background: 'var(--bg-base)' }}>
|
||||||
|
|
||||||
@@ -198,34 +233,61 @@ export default function ProjectView({ project, onNavigate, onBack, onSelectSessi
|
|||||||
|
|
||||||
{/* ── Project Memory ── */}
|
{/* ── Project Memory ── */}
|
||||||
<div style={{ marginBottom: '40px' }}>
|
<div style={{ marginBottom: '40px' }}>
|
||||||
<p className="label-upper" style={{ marginBottom: '12px' }}>Project Memory</p>
|
<div style={{ display: 'flex', alignItems: 'center', justifyContent: 'space-between', marginBottom: '12px' }}>
|
||||||
|
<p className="label-upper">Project Memory</p>
|
||||||
|
<button
|
||||||
|
className="btn-primary"
|
||||||
|
style={{ padding: '5px 12px', fontSize: '12px', display: 'flex', alignItems: 'center', gap: '6px' }}
|
||||||
|
onClick={handleGenerateSummary}
|
||||||
|
disabled={generating}
|
||||||
|
>
|
||||||
|
{generating
|
||||||
|
? <><span className="spinner" />Generating…</>
|
||||||
|
: overview ? 'Regenerate' : 'Generate Summary'
|
||||||
|
}
|
||||||
|
</button>
|
||||||
|
</div>
|
||||||
|
|
||||||
<div style={{
|
<div style={{
|
||||||
background: 'var(--bg-surface)',
|
background: 'var(--bg-surface)',
|
||||||
border: '1px solid var(--border)',
|
border: '1px solid var(--border)',
|
||||||
borderRadius: 'var(--radius-lg)',
|
borderRadius: 'var(--radius-lg)',
|
||||||
padding: '20px',
|
padding: '20px',
|
||||||
display: 'flex', flexDirection: 'column', gap: '10px',
|
|
||||||
}}>
|
}}>
|
||||||
|
{overviewLoading ? (
|
||||||
|
<p className="text-sm text-muted">Loading…</p>
|
||||||
|
|
||||||
|
) : generateError ? (
|
||||||
|
<p className="text-sm" style={{ color: 'var(--text-muted)', fontStyle: 'italic' }}>
|
||||||
|
{generateError}
|
||||||
|
</p>
|
||||||
|
|
||||||
|
) : overview ? (
|
||||||
|
<>
|
||||||
|
<p className="text-sm" style={{ color: 'var(--text-secondary)', lineHeight: 1.7, whiteSpace: 'pre-wrap' }}>
|
||||||
|
{overview.content}
|
||||||
|
</p>
|
||||||
|
<p className="text-xs text-muted" style={{ marginTop: '12px' }}>
|
||||||
|
Last generated {formatTimestamp(overview.created_at)}
|
||||||
|
</p>
|
||||||
|
</>
|
||||||
|
|
||||||
|
) : (
|
||||||
|
// No overview exists yet — explain what this section is for
|
||||||
|
<div style={{ display: 'flex', flexDirection: 'column', gap: '10px' }}>
|
||||||
<div style={{ display: 'flex', alignItems: 'center', gap: '10px' }}>
|
<div style={{ display: 'flex', alignItems: 'center', gap: '10px' }}>
|
||||||
<span style={{ fontSize: '20px', opacity: 0.4 }}>◈</span>
|
<span style={{ fontSize: '20px', opacity: 0.4 }}>◈</span>
|
||||||
<span className="text-sm" style={{ fontWeight: 500, color: 'var(--text-primary)' }}>
|
<span className="text-sm" style={{ fontWeight: 500, color: 'var(--text-primary)' }}>
|
||||||
Project Summary
|
No project summary yet
|
||||||
</span>
|
</span>
|
||||||
<span style={{
|
|
||||||
fontSize: '11px', padding: '2px 8px',
|
|
||||||
borderRadius: '999px',
|
|
||||||
background: 'var(--bg-elevated)',
|
|
||||||
border: '1px solid var(--border)',
|
|
||||||
color: 'var(--text-muted)',
|
|
||||||
}}>Coming soon</span>
|
|
||||||
</div>
|
</div>
|
||||||
<p className="text-sm text-muted" style={{ lineHeight: 1.6, maxWidth: '520px' }}>
|
<p className="text-sm text-muted" style={{ lineHeight: 1.6, maxWidth: '520px' }}>
|
||||||
Once this project has enough conversations, NexusAI will automatically
|
Generate a summary to create a concise overview of this project's goals,
|
||||||
generate a rolling summary of key themes, decisions, and context — giving
|
progress, and key decisions — built from your session summaries.
|
||||||
the model a condensed view of the project's memory without consuming the
|
|
||||||
full context window.
|
|
||||||
</p>
|
</p>
|
||||||
</div>
|
</div>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
{/* ── Notes ── */}
|
{/* ── Notes ── */}
|
||||||
|
|||||||
@@ -3,6 +3,7 @@ import { useSettings } from '../hooks/useSettings';
|
|||||||
import { useModels } from '../hooks/useModels';
|
import { useModels } from '../hooks/useModels';
|
||||||
import { getServiceHealth } from '../api/orchestration';
|
import { getServiceHealth } from '../api/orchestration';
|
||||||
|
|
||||||
|
|
||||||
export default function SettingsView({ onNavigate, onBack, modelProps }) {
|
export default function SettingsView({ onNavigate, onBack, modelProps }) {
|
||||||
const { settings, saveSetting, saving } = useSettings();
|
const { settings, saveSetting, saving } = useSettings();
|
||||||
|
|
||||||
|
|||||||
124
packages/chat-client/src/components/SummaryView.jsx
Normal file
@@ -0,0 +1,124 @@
|
|||||||
|
import React, { useState, useEffect } from 'react';
|
||||||
|
import { fetchSessionSummaries } from '../api/orchestration';
|
||||||
|
import ReactMarkdown from 'react-markdown';
|
||||||
|
|
||||||
|
export default function SummaryView({ activeSession, onBack }) {
|
||||||
|
const [summaries, setSummaries] = useState([]);
|
||||||
|
const [loading, setLoading] = useState(true);
|
||||||
|
const [error, setError] = useState(null);
|
||||||
|
const [expanded, setExpanded] = useState(null);
|
||||||
|
|
||||||
|
useEffect(() => {
|
||||||
|
if (!activeSession || activeSession.isNew) {
|
||||||
|
setLoading(false);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
setLoading(true);
|
||||||
|
fetchSessionSummaries(activeSession.external_id)
|
||||||
|
.then(data => setSummaries(Array.isArray(data) ? data : []))
|
||||||
|
.catch(err => setError(err.message))
|
||||||
|
.finally(() => setLoading(false));
|
||||||
|
}, [activeSession]);
|
||||||
|
|
||||||
|
function formatTimestamp(ts) {
|
||||||
|
if (!ts) return '—';
|
||||||
|
return new Date(ts * 1000).toLocaleString([], {
|
||||||
|
month: 'short', day: 'numeric',
|
||||||
|
hour: '2-digit', minute: '2-digit',
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
return (
|
||||||
|
<div style={{ display: 'flex', flexDirection: 'column', flex: 1, overflow: 'hidden', background: 'var(--bg-base)' }}>
|
||||||
|
|
||||||
|
{/* Header */}
|
||||||
|
<div className="panel-header" style={{ padding: '0 24px', gap: 12 }}>
|
||||||
|
<button className="btn-icon" onClick={onBack}>←</button>
|
||||||
|
<span className="text-base" style={{ fontWeight: 500 }}>Session Memory</span>
|
||||||
|
<span className="text-sm text-muted" style={{ marginLeft: 'auto' }}>
|
||||||
|
{summaries.length} summar{summaries.length !== 1 ? 'ies' : 'y'}
|
||||||
|
</span>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{/* Session name pill */}
|
||||||
|
{activeSession && (
|
||||||
|
<div style={{ padding: '8px 24px 0' }}>
|
||||||
|
<span className="text-xs text-muted" style={{
|
||||||
|
background: 'var(--bg-elevated)',
|
||||||
|
border: '1px solid var(--border)',
|
||||||
|
borderRadius: '999px',
|
||||||
|
padding: '3px 10px',
|
||||||
|
}}>
|
||||||
|
{activeSession.name || activeSession.external_id}
|
||||||
|
</span>
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{/* Content */}
|
||||||
|
<div className="scroll-y flex-1" style={{ padding: '16px 24px' }}>
|
||||||
|
{loading && <p className="text-sm text-muted">Loading…</p>}
|
||||||
|
{error && <p className="text-sm" style={{ color: 'var(--error, #e05)' }}>{error}</p>}
|
||||||
|
|
||||||
|
{!loading && !activeSession && (
|
||||||
|
<p className="text-sm text-muted">No active session.</p>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{!loading && activeSession && summaries.length === 0 && (
|
||||||
|
<div style={{
|
||||||
|
display: 'flex', flexDirection: 'column', alignItems: 'center',
|
||||||
|
gap: '12px', padding: '48px 0', color: 'var(--text-muted)',
|
||||||
|
}}>
|
||||||
|
<span style={{ fontSize: '28px', opacity: 0.3 }}>◈</span>
|
||||||
|
<p className="text-sm">No summaries yet for this session.</p>
|
||||||
|
<p className="text-xs text-muted" style={{ maxWidth: '280px', textAlign: 'center', lineHeight: 1.6 }}>
|
||||||
|
Summaries generate automatically once a session accumulates enough conversation.
|
||||||
|
</p>
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
|
||||||
|
{summaries.map(summary => (
|
||||||
|
<div key={summary.id} style={{
|
||||||
|
background: 'var(--bg-surface)',
|
||||||
|
border: '1px solid var(--border)',
|
||||||
|
borderRadius: 'var(--radius-lg)',
|
||||||
|
marginBottom: '10px', overflow: 'hidden',
|
||||||
|
}}>
|
||||||
|
{/* Card header */}
|
||||||
|
<div
|
||||||
|
onClick={() => setExpanded(expanded === summary.id ? null : summary.id)}
|
||||||
|
style={{ display: 'flex', alignItems: 'center', gap: '10px', padding: '10px 14px', cursor: 'pointer' }}
|
||||||
|
>
|
||||||
|
<span style={{ flex: 1, fontSize: 13, color: 'var(--text-primary)' }}>
|
||||||
|
Episodes {summary.episode_range}
|
||||||
|
</span>
|
||||||
|
<span className="text-xs text-muted">{formatTimestamp(summary.created_at)}</span>
|
||||||
|
<span className="text-muted" style={{ fontSize: 11 }}>
|
||||||
|
{expanded === summary.id ? '▲' : '▼'}
|
||||||
|
</span>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
{/* Expanded content */}
|
||||||
|
{expanded === summary.id && (
|
||||||
|
<div style={{ padding: '0 14px 14px', borderTop: '1px solid var(--border)' }}>
|
||||||
|
<ReactMarkdown components={{
|
||||||
|
p: ({ children }) => (
|
||||||
|
<p style={{ margin: '8px 0', lineHeight: 1.7, fontSize: 13, color: 'var(--text-secondary)' }}>
|
||||||
|
{children}
|
||||||
|
</p>
|
||||||
|
),
|
||||||
|
}}>
|
||||||
|
{summary.content}
|
||||||
|
</ReactMarkdown>
|
||||||
|
{summary.token_count > 0 && (
|
||||||
|
<p className="text-xs text-muted" style={{ marginTop: 8 }}>
|
||||||
|
{summary.token_count.toLocaleString()} tokens covered
|
||||||
|
</p>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
)}
|
||||||
|
</div>
|
||||||
|
))}
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
);
|
||||||
|
}
|
||||||
@@ -1,4 +1,4 @@
|
|||||||
import { useState, useCallback, useRef } from 'react';
|
import React, { useEffect, useState, useCallback, useRef } from 'react';
|
||||||
import { streamMessage, updateSession } from '../api/orchestration';
|
import { streamMessage, updateSession } from '../api/orchestration';
|
||||||
|
|
||||||
export function useChat({ activeSession, appendMessage, updateLastMessage, refreshSessions }) {
|
export function useChat({ activeSession, appendMessage, updateLastMessage, refreshSessions }) {
|
||||||
@@ -7,6 +7,18 @@ export function useChat({ activeSession, appendMessage, updateLastMessage, refre
|
|||||||
const [lastTokenCount, setLastTokenCount] = useState(0);
|
const [lastTokenCount, setLastTokenCount] = useState(0);
|
||||||
const [lastModel, setLastModel] = useState(null);
|
const [lastModel, setLastModel] = useState(null);
|
||||||
const cancelRef = useRef(null);
|
const cancelRef = useRef(null);
|
||||||
|
const prevStreaming = React.useRef(false);
|
||||||
|
const [summarising, setSummarising] = useState(false);
|
||||||
|
|
||||||
|
useEffect(() => {
|
||||||
|
if (prevStreaming.current && !streaming) {
|
||||||
|
// Stream just finished — trigger the summarising indicator
|
||||||
|
setSummarising(true);
|
||||||
|
const t = setTimeout(() => setSummarising(false), 8000);
|
||||||
|
return () => clearTimeout(t);
|
||||||
|
}
|
||||||
|
prevStreaming.current = streaming;
|
||||||
|
}, [streaming]);
|
||||||
|
|
||||||
const sendMessage = useCallback(async (text, model, projectId = null, session=null) => {
|
const sendMessage = useCallback(async (text, model, projectId = null, session=null) => {
|
||||||
const targetSession = session ?? activeSession;
|
const targetSession = session ?? activeSession;
|
||||||
@@ -96,5 +108,6 @@ export function useChat({ activeSession, appendMessage, updateLastMessage, refre
|
|||||||
error,
|
error,
|
||||||
lastTokenCount,
|
lastTokenCount,
|
||||||
lastModel,
|
lastModel,
|
||||||
|
summarising,
|
||||||
};
|
};
|
||||||
}
|
}
|
||||||
@@ -1,6 +1,7 @@
|
|||||||
import { useState, useEffect, useCallback } from 'react';
|
import { useState, useEffect, useCallback } from 'react';
|
||||||
import { fetchProjects } from '../api/orchestration';
|
import { fetchProjects } from '../api/orchestration';
|
||||||
|
|
||||||
|
|
||||||
export function useProjects() {
|
export function useProjects() {
|
||||||
const [projects, setProjects] = useState([]);
|
const [projects, setProjects] = useState([]);
|
||||||
|
|
||||||
|
|||||||
@@ -35,6 +35,10 @@ html, body, #root {
|
|||||||
50% { opacity: 0; }
|
50% { opacity: 0; }
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@keyframes spin {
|
||||||
|
to { transform: rotate(360deg); }
|
||||||
|
}
|
||||||
|
|
||||||
/* ── Layout ─────────────────────────────────────────── */
|
/* ── Layout ─────────────────────────────────────────── */
|
||||||
|
|
||||||
.flex { display: flex; }
|
.flex { display: flex; }
|
||||||
@@ -111,3 +115,13 @@ html, body, #root {
|
|||||||
.text-accent { color: var(--accent); }
|
.text-accent { color: var(--accent); }
|
||||||
.label-upper { font-size: 13px; font-weight: 750; color: var(--text-muted); text-transform: uppercase; letter-spacing: 0.08em; }
|
.label-upper { font-size: 13px; font-weight: 750; color: var(--text-muted); text-transform: uppercase; letter-spacing: 0.08em; }
|
||||||
.truncate { overflow: hidden; text-overflow: ellipsis; white-space: nowrap; }
|
.truncate { overflow: hidden; text-overflow: ellipsis; white-space: nowrap; }
|
||||||
|
|
||||||
|
.spinner {
|
||||||
|
width: 12px;
|
||||||
|
height: 12px;
|
||||||
|
border: 2px solid var(--border);
|
||||||
|
border-top-color: var(--text-muted);
|
||||||
|
border-radius: 50%;
|
||||||
|
animation: spin 0.7s linear infinite;
|
||||||
|
flex-shrink: 0;
|
||||||
|
}
|
||||||
@@ -16,6 +16,7 @@ export default defineConfig({
|
|||||||
'/episodes': 'http://192.168.0.205:4000',
|
'/episodes': 'http://192.168.0.205:4000',
|
||||||
'/settings': 'http://192.168.0.205:4000',
|
'/settings': 'http://192.168.0.205:4000',
|
||||||
'/health': 'http://192.168.0.205:4000',
|
'/health': 'http://192.168.0.205:4000',
|
||||||
|
'/summaries': 'http://192.168.0.205:4000',
|
||||||
},
|
},
|
||||||
},
|
},
|
||||||
});
|
});
|
||||||
64
packages/embedding-service/CLAUDE.md
Normal file
@@ -0,0 +1,64 @@
|
|||||||
|
# CLAUDE.md
|
||||||
|
|
||||||
|
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||||
|
|
||||||
|
See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and deployment layout.
|
||||||
|
|
||||||
|
## Running This Service
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm run embedding # From repo root
|
||||||
|
npm -w packages/embedding-service run dev # With --watch
|
||||||
|
```
|
||||||
|
|
||||||
|
Default port: **3003**. Requires Ollama to be reachable at `OLLAMA_URL`.
|
||||||
|
|
||||||
|
## Single-File Service
|
||||||
|
|
||||||
|
The entire service is `src/index.js` — no subdirectory structure. All routes, the Ollama helper, and startup are in one file.
|
||||||
|
|
||||||
|
## Environment Variables
|
||||||
|
|
||||||
|
| Variable | Default | Description |
|
||||||
|
|---|---|---|
|
||||||
|
| `PORT` | `3003` | Port to listen on |
|
||||||
|
| `OLLAMA_URL` | `http://localhost:11434` | Ollama instance URL |
|
||||||
|
| `EMBEDDING_MODEL` | `nomic-embed-text` | Model passed to Ollama `/api/embed` |
|
||||||
|
|
||||||
|
Note: the env var name is `EMBEDDING_MODEL`, not `EMBED_MODEL` — the internal constant is `EMBED_MODEL` but the lookup key is different.
|
||||||
|
|
||||||
|
## Ollama API Details
|
||||||
|
|
||||||
|
Uses Ollama's `/api/embed` endpoint (not `/api/embeddings`). Request shape:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{ "model": "nomic-embed-text", "input": "text to embed" }
|
||||||
|
```
|
||||||
|
|
||||||
|
Ollama returns `{ "embeddings": [[...]] }` — an array of arrays even for a single input. The helper takes `data.embeddings[0]` to return the single vector.
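A minimal sketch of the unwrapping step described above, assuming the response body has already been parsed into an object. The helper name `firstEmbedding` and the sample values are hypothetical and are not taken from `src/index.js`; the service's own helper may be structured differently.

```javascript
// Hypothetical helper (not from the service source): unwrap the single
// vector from a parsed /api/embed response body.
function firstEmbedding(data) {
  const rows = data && data.embeddings;
  if (!Array.isArray(rows) || !Array.isArray(rows[0])) {
    throw new Error('Unexpected /api/embed response shape');
  }
  return rows[0];
}

// Made-up payload with the array-of-arrays shape described above.
const sample = { embeddings: [[0.1, 0.2, 0.3]] };
const vector = firstEmbedding(sample);
console.log(vector.length);
```

Checking the shape before indexing means a payload from the wrong endpoint fails with a clear error rather than returning `undefined`.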
|
||||||
|
|
||||||
|
The `ollama` npm package is **not used** and is no longer listed as a dependency in `package.json`; all calls are raw `fetch`. Do not refactor to use the package without checking the API shape matches.
|
||||||
|
|
||||||
|
## Batch Endpoint
|
||||||
|
|
||||||
|
`POST /embed/batch` embeds items **sequentially** in a for-loop, not in parallel. Ollama doesn't parallelise embedding calls, so parallel requests would queue internally anyway. Do not change to `Promise.all` without verifying Ollama behaviour.
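The sequential pattern can be sketched as follows. This is an illustration only: `embedSequentially` and `fakeEmbed` are hypothetical names, and the embed function is injected so the sketch runs without an Ollama instance.

```javascript
// Hypothetical sketch: embed texts one at a time, preserving input order.
// `embedFn` stands in for the service's Ollama helper so this runs offline.
async function embedSequentially(texts, embedFn) {
  const embeddings = [];
  for (const text of texts) {
    // Each call finishes before the next one starts; no Promise.all fan-out.
    embeddings.push(await embedFn(text.trim()));
  }
  return embeddings;
}

// Stand-in embedder that records the order in which calls start.
const callOrder = [];
async function fakeEmbed(text) {
  callOrder.push(text);
  return [text.length];
}

const batchResult = embedSequentially([' a ', 'bb', 'ccc'], fakeEmbed);
batchResult.then(vectors => console.log(vectors.length, callOrder.join(',')));
```

Because each `await` completes before the next iteration, at most one request is in flight at a time.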
|
||||||
|
|
||||||
|
## Error Responses
|
||||||
|
|
||||||
|
| Condition | Status | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| Missing/empty `text` | 400 | |
|
||||||
|
| Ollama call fails | 502 | Upstream failure — correct status |
|
||||||
|
| Empty `texts` array | 400 | |
|
||||||
|
|
||||||
|
## Known Issue
|
||||||
|
|
||||||
|
The 400 error message for `/embed` previously read `"text is required and must be empty"`, with the word "not" missing. It now reads `"text is required and must not be empty"`.
|
||||||
|
|
||||||
|
## API Endpoints
|
||||||
|
|
||||||
|
| Method | Path | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| GET | `/health` | Static response — does not verify Ollama is reachable |
|
||||||
|
| POST | `/embed` | Body: `{ text: string }`. Returns `{ embedding, model, dimensions }` |
|
||||||
|
| POST | `/embed/batch` | Body: `{ texts: string[] }`. Returns `{ embeddings, model, dimensions, count }` |
|
||||||
@@ -9,7 +9,6 @@
|
|||||||
"dependencies": {
|
"dependencies": {
|
||||||
"@nexusai/shared": "^1.0.0",
|
"@nexusai/shared": "^1.0.0",
|
||||||
"dotenv": "^17.4.0",
|
"dotenv": "^17.4.0",
|
||||||
"express": "^5.2.1",
|
"express": "^5.2.1"
|
||||||
"ollama": "^0.6.3"
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -1,20 +1,21 @@
|
|||||||
require ('dotenv').config();
|
require ('dotenv').config();
|
||||||
const express = require('express');
|
const express = require('express');
|
||||||
const {getEnv, OLLAMA, PORTS} = require('@nexusai/shared');
|
const {getEnv, OLLAMA, PORTS, logger} = require('@nexusai/shared');
|
||||||
|
|
||||||
const app = express();
|
const app = express();
|
||||||
app.use(express.json());
|
app.use(express.json({ limit: '1mb' })); // limit request body to 1mb to prevent abuse - embedding requests should be small
|
||||||
|
|
||||||
const PORT = getEnv('PORT', PORTS.EMBEDDING); // Default to 3003 if PORT is not set
|
const PORT = getEnv('PORT', PORTS.EMBEDDING);
|
||||||
const OLLAMA_URL = getEnv('OLLAMA_URL', OLLAMA.DEFAULT_URL); // URL for Ollama API
|
const OLLAMA_URL = getEnv('OLLAMA_URL', OLLAMA.DEFAULT_URL);
|
||||||
const EMBED_MODEL = getEnv('EMBEDDING_MODEL', OLLAMA.EMBED_MODEL); // Ollama model for embeddings
|
const EMBED_MODEL = getEnv('EMBEDDING_MODEL', OLLAMA.EMBED_MODEL);
|
||||||
|
|
||||||
//OLLAMA embedding helper function
|
//OLLAMA embedding helper function
|
||||||
async function embedText(text) {
|
async function embedText(text) {
|
||||||
const res = await fetch(`${OLLAMA_URL}/api/embed`, {
|
const res = await fetch(`${OLLAMA_URL}/api/embed`, {
|
||||||
method: 'POST',
|
method: 'POST',
|
||||||
headers: { 'Content-Type': 'application/json' },
|
headers: { 'Content-Type': 'application/json' },
|
||||||
body: JSON.stringify({ model: EMBED_MODEL, input: text })
|
body: JSON.stringify({ model: EMBED_MODEL, input: text }),
|
||||||
|
signal: AbortSignal.timeout(30_000),
|
||||||
});
|
});
|
||||||
|
|
||||||
if (!res.ok) {
|
if (!res.ok) {
|
||||||
@@ -37,7 +38,7 @@ app.get('/health', (req,res) => {
|
|||||||
app.post('/embed', async (req, res) => {
|
app.post('/embed', async (req, res) => {
|
||||||
const { text } = req.body;
|
const { text } = req.body;
|
||||||
if (!text || typeof text !== 'string' || text.trim() === '') {
|
if (!text || typeof text !== 'string' || text.trim() === '') {
|
||||||
return res.status(400).json({ error: 'text is required and must be empty' });
|
return res.status(400).json({ error: 'text is required and must not be empty' });
|
||||||
}
|
}
|
||||||
|
|
||||||
try {
|
try {
|
||||||
@@ -60,7 +61,10 @@ app.post('/embed/batch', async (req, res) => {
|
|||||||
}
|
}
|
||||||
|
|
||||||
try {
|
try {
|
||||||
//sequential embedding for now, Ollama doesn't natively parallize embeddings
|
const invalid = texts.findIndex(t => !t || typeof t !== 'string' || t.trim() === '');
|
||||||
|
if (invalid !== -1)
|
||||||
|
return res.status(400).json({ error: `texts[${invalid}] is empty or not a string` });
|
||||||
|
|
||||||
const embeddings = [];
|
const embeddings = [];
|
||||||
for (const text of texts) {
|
for (const text of texts) {
|
||||||
embeddings.push(await embedText(text.trim()));
|
embeddings.push(await embedText(text.trim()));
|
||||||
@@ -78,5 +82,5 @@ app.post('/embed/batch', async (req, res) => {
|
|||||||
|
|
||||||
/******* Start Server ********/
|
/******* Start Server ********/
|
||||||
app.listen(PORT, () => {
|
app.listen(PORT, () => {
|
||||||
console.log(`Embedding Service listening on port ${PORT}`);
|
logger.info(`Embedding Service listening on port ${PORT}`);
|
||||||
});
|
});
|
||||||
75
packages/inference-service/CLAUDE.md
Normal file
@@ -0,0 +1,75 @@
|
|||||||
|
# CLAUDE.md
|
||||||
|
|
||||||
|
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||||
|
|
||||||
|
See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and deployment layout.
|
||||||
|
|
||||||
|
## Running This Service
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm run inference # From repo root
|
||||||
|
npm -w packages/inference-service run dev # With --watch
|
||||||
|
```
|
||||||
|
|
||||||
|
Default port: **3001**. Set `INFERENCE_PROVIDER` to select the backend.
|
||||||
|
|
||||||
|
## Provider Pattern
|
||||||
|
|
||||||
|
`src/infer.js` reads `INFERENCE_PROVIDER` at startup and loads one of two providers:
|
||||||
|
|
||||||
|
| `INFERENCE_PROVIDER` | Module | Backend |
|
||||||
|
|---|---|---|
|
||||||
|
| `ollama` (default) | `src/providers/ollama.js` | Ollama npm client → `/api/generate` |
|
||||||
|
| `llamacpp` | `src/providers/llamacpp.js` | Raw fetch → `/v1/chat/completions` (OpenAI-compatible) |
|
||||||
|
|
||||||
|
An unknown provider throws immediately at startup — fail-fast, not at request time.
|
||||||
|
|
||||||
|
Both providers export the same interface: `complete(prompt, options)` and `completeStream(prompt, options)`.
|
||||||
|
|
||||||
|
## Environment Variables
|
||||||
|
|
||||||
|
| Variable | Default | Description |
|
||||||
|
|---|---|---|
|
||||||
|
| `PORT` | `3001` | Port to listen on |
|
||||||
|
| `INFERENCE_PROVIDER` | `ollama` | `ollama` or `llamacpp` |
|
||||||
|
| `INFERENCE_URL` | `http://localhost:11434` (Ollama) / `http://localhost:8080` (llama.cpp) | Backend URL |
|
||||||
|
| `DEFAULT_MODEL` | Provider-specific | Model name passed to backend |
|
||||||
|
|
||||||
|
`INFERENCE_URL` defaults differ per provider — Ollama uses the Ollama default URL, llama.cpp uses the llama-server default.
|
||||||
|
|
||||||
|
## Options Resolution
|
||||||
|
|
||||||
|
Both providers use `resolveOptions(options)` to merge caller-supplied options with `INFERENCE_DEFAULTS` from shared constants. Any option not supplied by the caller falls back to the constant.
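
A minimal sketch of the merge described above. The real `resolveOptions` and `INFERENCE_DEFAULTS` are not shown in this section, so the function name `resolveOptionsSketch`, the option keys other than `MAX_TOKENS`, and every default value below are illustrative assumptions.

```javascript
// Placeholder defaults: the real INFERENCE_DEFAULTS lives in the shared
// package and its values are not shown here (these numbers are assumptions).
const INFERENCE_DEFAULTS = { TEMPERATURE: 0.7, MAX_TOKENS: 1024, TOP_P: 0.9 };

// Hypothetical stand-in for resolveOptions(options).
// `??` only falls back when the caller passed null/undefined, so an
// explicit temperature of 0 is kept instead of being replaced.
function resolveOptionsSketch(options = {}) {
  return {
    temperature: options.temperature ?? INFERENCE_DEFAULTS.TEMPERATURE,
    maxTokens: options.maxTokens ?? INFERENCE_DEFAULTS.MAX_TOKENS,
    topP: options.topP ?? INFERENCE_DEFAULTS.TOP_P,
  };
}
```

Using `??` rather than `||` is a deliberate choice in this sketch: with `||`, a caller-supplied `0` would be treated as missing and overwritten by the default.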
|
||||||
|
|
||||||
|
## Streaming Chunk Format
|
||||||
|
|
||||||
|
The two providers yield differently shaped chunks — the route in `src/routes/inference.js` normalises them:
|
||||||
|
|
||||||
|
**Ollama** yields raw Ollama generate chunks: `{ response, done, model, eval_count, prompt_eval_count, ... }`
|
||||||
|
|
||||||
|
**llama.cpp** yields:
|
||||||
|
- Per-token: `{ response: delta, done: false }`
|
||||||
|
- Final: `{ response: '', done: true, model, tokenCount }` — token count is the sum of `completion_tokens + prompt_tokens` from the usage chunk
|
||||||
|
|
||||||
|
The route checks `chunk.response` to stream text and `chunk.done` to capture metadata. For Ollama streaming, **token count is not captured** — the done chunk from Ollama contains `eval_count`/`prompt_eval_count` but the route only reads `chunk.tokenCount` (a llama.cpp field). Ollama streaming calls always report `tokenCount: 0` to the client.
|
||||||
|
|
||||||
|
## Known Issue: `maxTokens` Missing from Streaming Route
|
||||||
|
|
||||||
|
`POST /complete` correctly destructures `maxTokens` from the request body and passes it through. `POST /complete/stream` does **not** — it omits `maxTokens` from its destructuring, so streaming completions always use `INFERENCE_DEFAULTS.MAX_TOKENS` regardless of what the caller sends. This means `/chat/stream` has a different effective token ceiling than `/chat`.
|
||||||
|
|
||||||
|
## SSE Format (route → caller)
|
||||||
|
|
||||||
|
```
|
||||||
|
data: {"response":"Hello"} ← per token
|
||||||
|
data: {"response":" world"}
|
||||||
|
data: {"done":true,"model":"...","tokenCount":42} ← final metadata
|
||||||
|
data: [DONE] ← sentinel
|
||||||
|
```
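
For callers, the event sequence above could be consumed roughly as follows. This is a sketch only: `parseSseEvents` is a hypothetical helper that is not part of the repository, and it assumes the whole response body has already been buffered into one string.

```javascript
// Hypothetical helper: parse a buffered SSE body in the format shown above.
function parseSseEvents(body) {
  const tokens = [];
  let meta = null;
  let finished = false;

  // Events are separated by a blank line ("\n\n").
  for (const block of body.split('\n\n')) {
    const line = block.trim();
    if (!line.startsWith('data: ')) continue;

    const payload = line.slice('data: '.length);
    if (payload === '[DONE]') {        // sentinel: end of stream
      finished = true;
      break;
    }

    const event = JSON.parse(payload);
    if (event.error) throw new Error(event.error); // route reports errors in-band
    if (event.done) meta = event;                  // final metadata event
    else if (event.response) tokens.push(event.response);
  }

  return { text: tokens.join(''), meta, finished };
}
```

A real client would more likely parse incrementally as chunks arrive. The buffered version is used here only to keep the event handling easy to follow.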
|
||||||
|
|
||||||
|
## API Endpoints
|
||||||
|
|
||||||
|
| Method | Path | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| GET | `/health` | Returns `{ service, status, provider, model }` |
|
||||||
|
| POST | `/complete` | Body: `{ prompt, model?, temperature?, maxTokens?, topP?, topK?, repeatPenalty? }` |
|
||||||
|
| POST | `/complete/stream` | Same body as `/complete` except `maxTokens` is silently ignored |
|
||||||
@@ -1,10 +1,10 @@
|
|||||||
require ('dotenv').config();
|
require ('dotenv').config();
|
||||||
const express = require('express');
|
const express = require('express');
|
||||||
const {getEnv, PORTS, OLLAMA} = require('@nexusai/shared');
|
const {getEnv, PORTS, OLLAMA, logger} = require('@nexusai/shared');
|
||||||
const inferenceRouter = require('./routes/inference');
|
const inferenceRouter = require('./routes/inference');
|
||||||
|
|
||||||
const app = express();
|
const app = express();
|
||||||
app.use(express.json());
|
app.use(express.json({ limit: '8mb' })); // prompts include full context window
|
||||||
|
|
||||||
const PORT = getEnv('PORT', PORTS.INFERENCE);
|
const PORT = getEnv('PORT', PORTS.INFERENCE);
|
||||||
const PROVIDER = getEnv('INFERENCE_PROVIDER', 'ollama');
|
const PROVIDER = getEnv('INFERENCE_PROVIDER', 'ollama');
|
||||||
@@ -24,5 +24,5 @@ app.use('/', inferenceRouter);
|
|||||||
|
|
||||||
// Start the server
|
// Start the server
|
||||||
app.listen(PORT, () => {
|
app.listen(PORT, () => {
|
||||||
console.log(`Inference Service is running on port ${PORT}`);
|
logger.info(`Inference Service is running on port ${PORT}`);
|
||||||
});
|
});
|
||||||
@@ -1,4 +1,4 @@
|
|||||||
const { getEnv, LLAMACPP, INFERENCE_DEFAULTS } = require("@nexusai/shared");
|
const { getEnv, LLAMACPP, INFERENCE_DEFAULTS, logger } = require("@nexusai/shared");
|
||||||
|
|
||||||
const BASE_URL = getEnv("INFERENCE_URL", LLAMACPP.DEFAULT_URL);
|
const BASE_URL = getEnv("INFERENCE_URL", LLAMACPP.DEFAULT_URL);
|
||||||
const DEFAULT_MODEL = getEnv("DEFAULT_MODEL", LLAMACPP.DEFAULT_MODEL);
|
const DEFAULT_MODEL = getEnv("DEFAULT_MODEL", LLAMACPP.DEFAULT_MODEL);
|
||||||
@@ -89,7 +89,7 @@ async function* completeStream(prompt, options = {}) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
console.log('[llamacpp] finalTokenCount:', finalTokenCount);
|
logger.info('[llamacpp] finalTokenCount:', finalTokenCount);
|
||||||
|
|
||||||
yield { response: '', done: true, model: finalModel, tokenCount: finalTokenCount };
|
yield { response: '', done: true, model: finalModel, tokenCount: finalTokenCount };
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -57,8 +57,17 @@ async function* completeStream(prompt, options = {} ) {
|
|||||||
});
|
});
|
||||||
|
|
||||||
for await (const chunk of stream) {
|
for await (const chunk of stream) {
|
||||||
|
if (chunk.done) {
|
||||||
|
yield {
|
||||||
|
response: '',
|
||||||
|
done: true,
|
||||||
|
model: chunk.model,
|
||||||
|
tokenCount: (chunk.eval_count ?? 0) + (chunk.prompt_eval_count ?? 0),
|
||||||
|
};
|
||||||
|
} else {
|
||||||
yield chunk;
|
yield chunk;
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
}
|
||||||
|
|
||||||
module.exports = { complete, completeStream };
|
module.exports = { complete, completeStream };
|
||||||
@@ -1,5 +1,6 @@
|
|||||||
const { Router } = require('express');
|
const { Router } = require('express');
|
||||||
const { complete, completeStream } = require('../infer');
|
const { complete, completeStream } = require('../infer');
|
||||||
|
const { logger } = require('@nexusai/shared');
|
||||||
|
|
||||||
const router = Router();
|
const router = Router();
|
||||||
|
|
||||||
@@ -15,14 +16,14 @@ router.post('/complete', async (req, res) => {
|
|||||||
const result = await complete (prompt, {model, temperature, maxTokens, topP, topK, repeatPenalty});
|
const result = await complete (prompt, {model, temperature, maxTokens, topP, topK, repeatPenalty});
|
||||||
res.json(result);
|
res.json(result);
|
||||||
} catch (error) {
|
} catch (error) {
|
||||||
console.error('[Inference] Completion error:', error.message);
|
logger.error('[Inference] Completion error:', error.message);
|
||||||
res.status(500).json({ error: error.message });
|
res.status(500).json({ error: 'Inference failed', detail: error.message });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
// Streaming completion endpoint - sends partial responses as they arrive
|
// Streaming completion endpoint - sends partial responses as they arrive
|
||||||
router.post('/complete/stream', async (req, res) => {
|
router.post('/complete/stream', async (req, res) => {
|
||||||
const { prompt, model, temperature, topP, topK, repeatPenalty } = req.body;
|
const { prompt, model, temperature, maxTokens, topP, topK, repeatPenalty } = req.body;
|
||||||
|
|
||||||
if (!prompt) return res.status(400).json({ error: 'prompt is required' });
|
if (!prompt) return res.status(400).json({ error: 'prompt is required' });
|
||||||
|
|
||||||
@@ -34,7 +35,7 @@ router.post('/complete/stream', async (req, res) => {
|
|||||||
let lastModel = model;
|
let lastModel = model;
|
||||||
let tokenCount = 0;
|
let tokenCount = 0;
|
||||||
|
|
||||||
for await (const chunk of completeStream(prompt, { model, temperature, topP, topK, repeatPenalty })) {
|
for await (const chunk of completeStream(prompt, { model, temperature, maxTokens, topP, topK, repeatPenalty })) {
|
||||||
if (chunk.response) {
|
if (chunk.response) {
|
||||||
res.write(`data: ${JSON.stringify({ response: chunk.response })}\n\n`);
|
res.write(`data: ${JSON.stringify({ response: chunk.response })}\n\n`);
|
||||||
}
|
}
|
||||||
@@ -42,7 +43,7 @@ router.post('/complete/stream', async (req, res) => {
|
|||||||
// capture final metadata from the done signal
|
// capture final metadata from the done signal
|
||||||
lastModel = chunk.model ?? lastModel;
|
lastModel = chunk.model ?? lastModel;
|
||||||
tokenCount = chunk.tokenCount ?? tokenCount;
|
tokenCount = chunk.tokenCount ?? tokenCount;
|
||||||
console.log('[inference router] tokenCount from chunk:', chunk.tokenCount, '→', tokenCount);
|
logger.info('[inference router] tokenCount from chunk:', chunk.tokenCount, '→', tokenCount);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -51,7 +52,7 @@ router.post('/complete/stream', async (req, res) => {
|
|||||||
res.write('data: [DONE]\n\n');
|
res.write('data: [DONE]\n\n');
|
||||||
|
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
console.error('[Inference] Streaming error:', err.message);
|
logger.error('[Inference] Streaming error:', err.message);
|
||||||
res.write(`data: ${JSON.stringify({ error: err.message })}\n\n`);
|
res.write(`data: ${JSON.stringify({ error: err.message })}\n\n`);
|
||||||
} finally {
|
} finally {
|
||||||
res.end();
|
res.end();
|
||||||
|
|||||||
114
packages/memory-service/CLAUDE.md
Normal file
@@ -0,0 +1,114 @@
|
|||||||
|
# CLAUDE.md
|
||||||
|
|
||||||
|
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||||
|
|
||||||
|
See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and the dual-store memory model.
|
||||||
|
|
||||||
|
## Running This Service
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm run memory # From repo root (node src/index.js)
|
||||||
|
npm -w packages/memory-service run dev # With --watch
|
||||||
|
```
|
||||||
|
|
||||||
|
Default port: **3002**. Requires Qdrant and the embedding-service to be reachable on startup.
|
||||||
|
|
||||||
|
## SQLite Schema
|
||||||
|
|
||||||
|
`src/db/schema.js` is the source of truth for the data model. Key schema facts:
|
||||||
|
|
||||||
|
- `sessions` and `episodes` are linked by FK with cascade delete — deleting a session removes all its episodes automatically.
|
||||||
|
- `episodes_fts` is an FTS5 virtual table that mirrors `user_message` and `ai_response`. It is kept in sync via SQL triggers on INSERT/UPDATE/DELETE. On service startup, the FTS index is fully rebuilt from live episode data.
|
||||||
|
- Several columns (`sessions.name`, `sessions.project_id`, `entities.mention_count`, etc.) were added as migrations using `ALTER TABLE` wrapped in individual try-catch blocks. Failures are silently swallowed — if a column already exists, the alter fails and the service continues. The `idx_summaries_project` index is defined twice (benign duplicate).
|
||||||
|
- `summaries` rows with `session_id IS NULL` and a `project_id` represent project-level overviews, not session summaries. This distinction is how `GET /projects/:id/overview` works.
|
||||||
|
- `entity_episodes` is a join table linking entities to the episodes where they were first extracted. Used for provenance tracking and future orphan cleanup. Defined in `schema.js` (not a migration), so it exists on all installs.
|
||||||
|
|
||||||
|
**New columns on `entities` (added via migration):**
|
||||||
|
- `mention_count INTEGER NOT NULL DEFAULT 1`: incremented every time this entity is re-extracted
|
||||||
|
- `confidence REAL NOT NULL DEFAULT 1.0`: reserved for future confidence scoring
|
||||||
|
- `source TEXT NOT NULL DEFAULT 'extraction'`: `'extraction'` or `'manual'`
|
||||||
|
- `last_seen_at INTEGER`: Unix timestamp of most recent extraction hit
|
||||||
|
|
||||||
|
**New columns on `relationships` (added via migration):**
|
||||||
|
- `mention_count INTEGER NOT NULL DEFAULT 1`: incremented every time this edge is re-extracted
|
||||||
|
- `notes TEXT`: relationship context sentence from extraction
|
||||||
|
|
||||||
|
## Async Pipeline: Episode Creation
|
||||||
|
|
||||||
|
`POST /episodes` returns a 201 as soon as the SQLite insert succeeds. Two background tasks run after without blocking the response:
|
||||||
|
|
||||||
|
1. **Embedding** — Fetches a vector from embedding-service, stores to Qdrant with `{sessionId, createdAt}` as payload metadata.
|
||||||
|
2. **Entity + relationship extraction** — Sends the episode text to Ollama (`qwen2.5:3b`, temp 0.1, 1500 tokens) and upserts any recognized entities and relationships to both SQLite and Qdrant. Also links each entity to the episode via `entity_episodes`.
|
||||||
|
|
||||||
|
Both tasks catch their own errors and log them without surfacing a failure to the caller. As a result, an episode can exist in SQLite with no corresponding Qdrant point when a background step fails.
|
||||||
|
|
||||||
|
## Entity Extraction Details
|
||||||
|
|
||||||
|
`src/entities/extraction.js`:
|
||||||
|
|
||||||
|
- Fetches the last 20 known entities from SQLite before prompting the model, so the prompt can ask for name/type consistency with existing entries.
|
||||||
|
- Recognized entity types: `person`, `place`, `project`, `technology`, `concept`, `organization` — anything else is discarded.
|
||||||
|
- Ignores a hardcoded list of low-value names (`hello`, `thanks`, `good morning`, etc.).
|
||||||
|
- Extracts JSON using a regex (`{...}`) applied to raw model output, so surrounding prose doesn't break parsing.
|
||||||
|
- The model is asked to return both entities and relationships in a single JSON response: `{ "entities": [...], "relationships": [...] }`.
|
||||||
|
- Entity upsert uses `ON CONFLICT(name, type) DO UPDATE` — preserves existing `notes` if the new extraction returns null, increments `mention_count`, updates `last_seen_at`.
|
||||||
|
- Relationship upsert uses `ON CONFLICT(from_id, to_id, label) DO UPDATE` — increments `mention_count`, preserves existing `notes` if new is null.
|
||||||
|
- Relationships are resolved by looking up both endpoints in the `entityMap` built during entity processing — if either entity wasn't saved (filtered out or invalid type), the relationship is silently dropped.
|
||||||
|
- After upsert, embeds each entity as `"${name} (${type}): ${notes}"` and stores to Qdrant with `projectId` in the payload for project-scoped filtering.
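
The regex-based JSON extraction mentioned in the list above can be sketched as below. The pattern matches the one used in `extraction.js`. The wrapper name `extractJsonObject` and the choice to return `null` on failure are assumptions made for illustration.

```javascript
// Hypothetical wrapper around the regex approach described above.
// The greedy pattern spans from the first "{" to the last "}" in the text,
// so prose before or after the JSON object is ignored.
function extractJsonObject(raw) {
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return null;          // no JSON object in the output at all

  try {
    return JSON.parse(match[0]);
  } catch {
    return null;                    // braces found, but not valid JSON
  }
}
```

Because the match is greedy, a stray `}` in trailing prose would be pulled into the candidate string and make `JSON.parse` fail. The sketch treats that case the same as "no JSON found".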
|
||||||
|
|
||||||
|
> For full details see `docs/services/entity-extraction.md` and `docs/services/knowledge-graph.md`.
|
||||||
|
|
||||||
|
## Knowledge Graph
|
||||||
|
|
||||||
|
`src/graph/index.js` provides two SQLite traversal functions:
|
||||||
|
|
||||||
|
- **`getNeighborhood(entityId, depth)`** — Single-entity recursive CTE traversal. Bidirectional (follows edges in both directions). Returns `{ nodes: [...entities], edges: [...relationships] }`. Depth defaults to `ENTITIES.GRAPH_HOP_DEPTH` (1), max enforced to 3 at the HTTP layer.
|
||||||
|
|
||||||
|
- **`getEntityNeighbors(entityIds[])`** — Bulk 1-hop version for orchestration. Given a set of seed entity IDs, returns their immediate neighbors plus all edges within the combined node set.
|
||||||
|
|
||||||
|
The recursive CTE uses `UNION` (not `UNION ALL`) to eliminate cycles and duplicate visits automatically.
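
As an in-memory analogue of that traversal (not the SQL itself), the sketch below walks edges in both directions up to a fixed depth and uses a visited set to play the role that `UNION` plays in the CTE. The function name `neighborhoodSketch` and the rule of keeping only edges whose endpoints are both in the result are assumptions. The `from_id` and `to_id` field names follow the `relationships` table.

```javascript
// Hypothetical in-memory version of a bounded, bidirectional traversal.
function neighborhoodSketch(edges, startId, depth = 1) {
  const visited = new Set([startId]);   // de-duplicates nodes, like UNION
  let frontier = [startId];

  for (let hop = 0; hop < depth; hop++) {
    const next = [];
    for (const { from_id, to_id } of edges) {
      // Follow each edge in both directions.
      for (const [a, b] of [[from_id, to_id], [to_id, from_id]]) {
        if (frontier.includes(a) && !visited.has(b)) {
          visited.add(b);
          next.push(b);
        }
      }
    }
    frontier = next;                    // only newly found nodes expand further
  }

  return {
    nodes: [...visited],
    edges: edges.filter(e => visited.has(e.from_id) && visited.has(e.to_id)),
  };
}
```

Since a node is only added to the frontier the first time it is seen, cycles in the graph cannot cause repeated expansion, and the depth bound guarantees termination in any case.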
|
||||||
|
|
||||||
|
> For full design rationale and usage see `docs/services/knowledge-graph.md`.
|
||||||
|
|
||||||
|
## Summarization Strategy
|
||||||
|
|
||||||
|
`src/summarization/project.js`:
|
||||||
|
|
||||||
|
- Preferred path: generate a project overview from existing **session-level summaries** (higher-level abstraction, shorter input).
|
||||||
|
- Fallback path: if no session summaries exist, summarize raw episodes directly (up to `SUMMARIES.MAX_PROJECT_EPISODE_LIMIT`).
|
||||||
|
- Both paths truncate input at `SUMMARIES.MAX_SUMMARY_CHARS` (8,000 chars) by slicing from the end (most recent content wins).
|
||||||
|
- Strips ChatML tokens from the Ollama response (`<|im_start|>`, `<|im_end|>`).
|
||||||
|
- Uses temp 0.2 and `num_predict 1200`.
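
Two of the steps listed above, truncation and ChatML stripping, can be sketched as follows. Both function names are hypothetical. The sketch assumes the input is in chronological order, so "slicing from the end" is read as keeping the last `maxChars` characters. The 8,000 figure is the value stated above for `SUMMARIES.MAX_SUMMARY_CHARS`.

```javascript
const MAX_SUMMARY_CHARS = 8000; // value stated in the list above

// Keep only the tail of the input so the most recent content survives
// (assumes the text is ordered oldest to newest).
function truncateKeepingRecent(text, maxChars = MAX_SUMMARY_CHARS) {
  return text.length > maxChars ? text.slice(-maxChars) : text;
}

// Remove ChatML control tokens that can leak into the model's response.
function stripChatMlTokens(text) {
  return text.replace(/<\|im_start\|>|<\|im_end\|>/g, '').trim();
}
```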
|
||||||
|
|
||||||
|
## Qdrant Client
|
||||||
|
|
||||||
|
`src/semantic/index.js` creates the Qdrant client lazily on first use and reuses it. All three collections (`episodes`, `entities`, `summaries`) are created at startup if missing. There is no connection health check — if Qdrant is unreachable, semantic operations throw at call time.
|
||||||
|
|
||||||
|
## API Endpoints Quick Reference
|
||||||
|
|
||||||
|
| Method | Path | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| GET | `/health` | Static response, no dependency checks |
|
||||||
|
| GET/POST | `/sessions` | POST requires `externalId`; duplicate → 409 |
|
||||||
|
| GET/PATCH | `/sessions/by-external/:externalId` | PATCH accepts `name`, `projectId` |
|
||||||
|
| DELETE | `/sessions/by-external/:externalId` | Cascades to episodes, summaries, relationships |
|
||||||
|
| GET/POST | `/episodes` | POST triggers async embedding + entity/relationship extraction |
|
||||||
|
| GET | `/episodes/search` | FTS5 search; route must precede `/:id` |
|
||||||
|
| GET | `/sessions/:id/episodes` | Paginated, ordered `created_at DESC` |
|
||||||
|
| DELETE | `/episodes/:id` | Removes from SQLite + async Qdrant delete |
|
||||||
|
| POST | `/entities` | Upsert by `(name, type)`; increments `mention_count` on conflict |
|
||||||
|
| GET | `/entities/by-type/:type` | All entities of given type |
|
||||||
|
| GET/DELETE | `/entities/:id` | |
|
||||||
|
| POST | `/relationships` | Upsert by `(fromId, toId, label)`; increments `mention_count` on conflict. Body: `fromId`, `toId`, `label`, `notes` (optional) |
|
||||||
|
| GET | `/entities/:id/relationships` | Outbound only |
|
||||||
|
| DELETE | `/relationships` | Body: `fromId`, `toId`, `label` |
|
||||||
|
| GET | `/graph/neighborhood/:entityId` | Single-entity neighborhood; `?depth=` (default 1, max 3) |
|
||||||
|
| POST | `/graph/neighbors` | Bulk 1-hop neighborhood; body: `{ entityIds: [...] }` |
|
||||||
|
| GET/POST | `/projects` | POST requires non-empty `name` |
|
||||||
|
| GET/PATCH/DELETE | `/projects/:id` | |
|
||||||
|
| POST | `/projects/:id/summarize` | On-demand overview generation; 422 if no data |
|
||||||
|
| GET | `/projects/:id/overview` | Returns null (not 404) if no overview exists |
|
||||||
|
| GET | `/projects/:id/summaries` | All summaries for project |
|
||||||
|
| POST | `/summaries` | Requires `content` + at least one of `sessionId`/`projectId` |
|
||||||
|
| GET | `/sessions/:id/summaries` | |
|
||||||
|
| PATCH/DELETE | `/summaries/:id` | |
|
||||||
@@ -1,6 +1,6 @@
|
|||||||
const Database = require('better-sqlite3');
|
const Database = require('better-sqlite3');
|
||||||
const schema = require('./schema');
|
const schema = require('./schema');
|
||||||
const {getEnv, SQLITE } = require('@nexusai/shared');
|
const {getEnv, SQLITE, logger } = require('@nexusai/shared');
|
||||||
|
|
||||||
let db; // Declare db variable in a scope accessible to all functions
|
let db; // Declare db variable in a scope accessible to all functions
|
||||||
|
|
||||||
@@ -54,15 +54,20 @@ function getDB() {
|
|||||||
db.exec(`CREATE INDEX IF NOT EXISTS idx_summaries_session ON summaries(session_id)`);
|
db.exec(`CREATE INDEX IF NOT EXISTS idx_summaries_session ON summaries(session_id)`);
|
||||||
} catch {}
|
} catch {}
|
||||||
|
|
||||||
try {
|
try { db.exec(`ALTER TABLE entities ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
|
||||||
db.exec(`CREATE INDEX IF NOT EXISTS idx_summaries_project ON summaries(project_id)`);
|
try { db.exec(`ALTER TABLE entities ADD COLUMN confidence REAL NOT NULL DEFAULT 1.0`) } catch {}
|
||||||
} catch {}
|
try { db.exec(`ALTER TABLE entities ADD COLUMN source TEXT NOT NULL DEFAULT 'extraction'`) } catch {}
|
||||||
|
try { db.exec(`ALTER TABLE entities ADD COLUMN last_seen_at INTEGER`) } catch {}
|
||||||
|
|
||||||
|
try { db.exec(`ALTER TABLE relationships ADD COLUMN mention_count INTEGER NOT NULL DEFAULT 1`) } catch {}
|
||||||
|
try { db.exec(`ALTER TABLE relationships ADD COLUMN notes TEXT`) } catch {}
|
||||||
|
|
||||||
|
|
||||||
// Sync FTS index with any existing episodes data
|
// Sync FTS index with any existing episodes data
|
||||||
db.exec(`INSERT OR REPLACE INTO episodes_fts(rowid, user_message, ai_response)
|
db.exec(`INSERT OR REPLACE INTO episodes_fts(rowid, user_message, ai_response)
|
||||||
SELECT id, user_message, ai_response FROM episodes`);
|
SELECT id, user_message, ai_response FROM episodes`);
|
||||||
|
|
||||||
console.log(`Connected to SQLite database at ${path}`);
|
logger.info(`Connected to SQLite database at ${path}`);
|
||||||
}
|
}
|
||||||
return db;
|
return db;
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -38,6 +38,20 @@ const schema = `
|
|||||||
UNIQUE(from_id, to_id, label)
|
UNIQUE(from_id, to_id, label)
|
||||||
);
|
);
|
||||||
|
|
||||||
|
CREATE INDEX IF NOT EXISTS idx_relationships_from ON relationships(from_id);
|
||||||
|
CREATE INDEX IF NOT EXISTS idx_relationships_to ON relationships(to_id);
|
||||||
|
|
||||||
|
CREATE TABLE IF NOT EXISTS entity_episodes (
|
||||||
|
entity_id INTEGER NOT NULL REFERENCES entities(id) ON DELETE CASCADE,
|
||||||
|
episode_id INTEGER NOT NULL REFERENCES episodes(id) ON DELETE CASCADE,
|
||||||
|
PRIMARY KEY (entity_id, episode_id)
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE INDEX IF NOT EXISTS idx_entity_episodes_entity ON entity_episodes(entity_id);
|
||||||
|
CREATE INDEX IF NOT EXISTS idx_entity_episodes_episode ON entity_episodes(episode_id);
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
CREATE TABLE IF NOT EXISTS projects (
|
CREATE TABLE IF NOT EXISTS projects (
|
||||||
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
id INTEGER PRIMARY KEY AUTOINCREMENT,
|
||||||
name TEXT NOT NULL,
|
name TEXT NOT NULL,
|
||||||
|
|||||||
@@ -50,4 +50,27 @@ function deleteSummary(id) {
|
|||||||
getDB().prepare(`DELETE FROM summaries WHERE id = ?`).run(id);
|
getDB().prepare(`DELETE FROM summaries WHERE id = ?`).run(id);
|
||||||
}
|
}
|
||||||
|
|
||||||
module.exports = { createSummary, getSummary, getSummariesBySession, getSummariesByProject, updateSummary, deleteSummary };
|
// Fetches session summaries that belong to sessions in a given project
|
||||||
|
// Joins through sessions table since session summaries don't store project_id directly
|
||||||
|
function getSessionSummariesForProject(projectId) {
|
||||||
|
const db = getDB();
|
||||||
|
return db.prepare(`
|
||||||
|
SELECT s.* FROM summaries s
|
||||||
|
JOIN sessions sess ON sess.id = s.session_id
|
||||||
|
WHERE sess.project_id = ? AND s.session_id IS NOT NULL
|
||||||
|
ORDER BY s.created_at ASC
|
||||||
|
`).all(projectId).map(parseRow);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Fetches the most recent project-level overview summary (session_id IS NULL distinguishes it)
|
||||||
|
function getProjectOverviewSummary(projectId) {
|
||||||
|
const db = getDB();
|
||||||
|
const row = db.prepare(`
|
||||||
|
SELECT * FROM summaries
|
||||||
|
WHERE project_id = ? AND session_id IS NULL
|
||||||
|
ORDER BY created_at DESC LIMIT 1
|
||||||
|
`).get(projectId);
|
||||||
|
return row ? parseRow(row) : null;
|
||||||
|
}
|
||||||
|
|
||||||
|
module.exports = { createSummary, getSummary, getSummariesBySession, getSummariesByProject, updateSummary, deleteSummary, getSessionSummariesForProject, getProjectOverviewSummary };
|
||||||
@@ -1,13 +1,18 @@
|
|||||||
const semantic = require('../semantic')
|
const semantic = require('../semantic')
|
||||||
const { getEnv, SERVICES, formatEpisodeText } = require('@nexusai/shared');
|
const { getEnv, SERVICES, formatEpisodeText, ENTITIES, logger } = require('@nexusai/shared');
|
||||||
const { upsertEntity } = require('./index');
|
const { upsertEntity, upsertRelationship, linkEntityToEpisode } = require('./index');
|
||||||
|
|
||||||
const EXTRACTION_URL = getEnv('EXTRACTION_URL', 'http://localhost:11434');
|
const EXTRACTION_URL = getEnv('EXTRACTION_URL', 'http://localhost:11434');
|
||||||
const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b');
|
const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b'); // ChatML format — see buildExtractionPrompt
|
||||||
const EMBEDDING_SERVICE_URL = getEnv('EMBEDDING_SERVICE_URL', SERVICES.EMBEDDING_URL);
|
const EMBEDDING_SERVICE_URL = getEnv('EMBEDDING_SERVICE_URL', SERVICES.EMBEDDING_URL);
|
||||||
|
|
||||||
const ENTITY_TYPES = ['person', 'place', 'project', 'technology', 'concept', 'organization'];
|
const ENTITY_TYPES = ENTITIES.TYPES;
|
||||||
|
const IGNORED_NAMES = ['good morning', 'good night', 'hello', 'goodbye', 'thanks', 'thank you'];
|
||||||
|
|
||||||
|
// NOTE: This prompt uses ChatML format (<|im_start|> / <|im_end|> tags), which is
|
||||||
|
// specific to qwen-family models. If EXTRACTION_MODEL is changed to a Llama-family
|
||||||
|
// or other model, this format will need to change — most alternatives use either
|
||||||
|
// plain text or [INST] / <<SYS>> tags. Silent degradation is likely if mismatched.
|
||||||
function buildExtractionPrompt(userMessage, aiResponse, knownEntities = []) {
|
function buildExtractionPrompt(userMessage, aiResponse, knownEntities = []) {
|
||||||
const knownBlock = knownEntities.length > 0
|
const knownBlock = knownEntities.length > 0
|
||||||
? [
|
? [
|
||||||
@@ -19,21 +24,24 @@ function buildExtractionPrompt(userMessage, aiResponse, knownEntities = []) {
|
|||||||
|
|
||||||
return [
|
return [
|
||||||
'<|im_start|>system',
|
'<|im_start|>system',
|
||||||
'You are a named entity extractor. You output only valid JSON.',
|
'You are a named entity and relationship extractor. You output only valid JSON.',
|
||||||
'<|im_end|>',
|
'<|im_end|>',
|
||||||
'<|im_start|>user',
|
'<|im_start|>user',
|
||||||
'Read the conversation below and extract every named entity mentioned.',
|
'Read the conversation below and extract all named entities and the relationships between them.',
|
||||||
`Entity types to extract: ${ENTITY_TYPES.join(', ')}`,
|
`Entity types: ${ENTITY_TYPES.join(', ')}`,
|
||||||
'For each entity found, provide: name, type, and a one-sentence notes field.',
|
'Use "character" for any fictional, game, or media characters (e.g. characters from anime, games, books, TV shows, movies)',
|
||||||
'Return your answer as: { "entities": [ ... ] }',
|
'Use "person" only for real people',
|
||||||
'For each entity found, you MUST provide a non-empty notes field describing it based on the conversation.',
|
'For each entity provide:',
|
||||||
'For each entity found, provide:',
|
' "name": short proper noun only (max 4 words)',
|
||||||
' "name": short proper noun only (max 4 words, e.g. "Sydney", "NexusAI", "Tim")',
|
|
||||||
' "type": one of the valid types',
|
' "type": one of the valid types',
|
||||||
' "notes": one specific sentence about this entity based on the conversation (not generic)',
|
' "notes": one specific sentence about this entity based on the conversation',
|
||||||
|
'For relationships, use snake_case verb labels (e.g. works_on, manages, uses, knows, located_in, part_of, created_by).',
|
||||||
|
'Only include relationships between entities you have listed above.',
|
||||||
|
'Return this exact JSON structure:',
|
||||||
|
'{ "entities": [{"name": "...", "type": "...", "notes": "..."}], "relationships": [{"from": "...", "fromType": "...", "to": "...", "toType": "...", "label": "...", "notes": "..."}] }',
|
||||||
'',
|
'',
|
||||||
knownBlock,
|
knownBlock,
|
||||||
'--- CONVERSATION ---', // clear delimiter helps smaller models
|
'--- CONVERSATION ---',
|
||||||
`User: ${userMessage}`,
|
`User: ${userMessage}`,
|
||||||
`Assistant: ${aiResponse}`,
|
`Assistant: ${aiResponse}`,
|
||||||
'--- END CONVERSATION ---',
|
'--- END CONVERSATION ---',
|
||||||
@@ -57,17 +65,13 @@ async function embedEntity(entity) {
|
|||||||
return data.embedding;
|
return data.embedding;
|
||||||
}
|
}
|
||||||
|
|
||||||
async function extractAndStoreEntities(userMessage, aiResponse, projectId=null) {
|
async function extractAndStoreEntities(userMessage, aiResponse, episodeId=null, projectId=null) {
|
||||||
console.log('[entities] Extraction triggered')
|
logger.info('[entities] Extraction triggered')
|
||||||
try {
|
try {
|
||||||
// Fetch existing entities to guide the model toward consistent name/type pairs
|
// Fetch existing entities to guide the model toward consistent name/type pairs
|
||||||
const db = require('../db').getDB();
|
const db = require('../db').getDB();
|
||||||
console.log('[entities] fetching known entities...'); // add this
|
|
||||||
const knownEntities = db.prepare(`SELECT name, type FROM entities ORDER BY rowid DESC LIMIT 20`).all();
|
const knownEntities = db.prepare(`SELECT name, type FROM entities ORDER BY rowid DESC LIMIT 20`).all();
|
||||||
console.log('[entities] known entities count:', knownEntities.length);
|
|
||||||
|
|
||||||
const prompt = buildExtractionPrompt(userMessage, aiResponse, knownEntities);
|
const prompt = buildExtractionPrompt(userMessage, aiResponse, knownEntities);
|
||||||
console.log('[entities] prompt preview:', JSON.stringify(prompt.slice(-300)));
|
|
||||||
|
|
||||||
|
|
||||||
const res = await fetch(`${EXTRACTION_URL}/api/generate`, {
|
const res = await fetch(`${EXTRACTION_URL}/api/generate`, {
|
||||||
@@ -79,32 +83,53 @@ async function extractAndStoreEntities(userMessage, aiResponse, projectId=null)
|
|||||||
stream: false,
|
stream: false,
|
||||||
format: 'json',
|
format: 'json',
|
||||||
options: {
|
options: {
|
||||||
temperature: 0.1,
|
temperature: ENTITIES.TEMPERATURE,
|
||||||
num_predict: 1024,
|
num_predict: ENTITIES.NUM_PREDICT,
|
||||||
},
|
},
|
||||||
}),
|
}),
|
||||||
|
signal: AbortSignal.timeout(60_000),
|
||||||
});
|
});
|
||||||
|
|
||||||
if (!res.ok) throw new Error(`Ollama responded ${res.status}`);
|
if (!res.ok) throw new Error(`Ollama responded ${res.status}`);
|
||||||
|
|
||||||
const data = await res.json();
|
const data = await res.json();
|
||||||
const raw = data.response?.trim() ?? '';
|
const raw = data.response?.trim() ?? '';
|
||||||
console.log('[entities] raw response:', JSON.stringify(raw.slice(0, 300)));
|
|
||||||
|
|
||||||
const parsed = JSON.parse(raw);
|
const jsonMatch = raw.match(/\{[\s\S]*\}/);
|
||||||
|
if (!jsonMatch) {
|
||||||
|
logger.warn('[entities] No JSON object found in response');
|
||||||
|
logger.debug('[entities] Raw response was:', raw);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
let parsed;
|
||||||
|
try {
|
||||||
|
parsed = JSON.parse(jsonMatch[0]);
|
||||||
|
} catch (err) {
|
||||||
|
logger.warn('[entities] Failed to parse extraction response:', err.message);
|
||||||
|
logger.debug('[entities] Raw response was:', raw);
|
||||||
|
return;
|
||||||
|
}
|
||||||
const entities = Array.isArray(parsed.entities) ? parsed.entities : [];
|
const entities = Array.isArray(parsed.entities) ? parsed.entities : [];
|
||||||
if (entities.length === 0) throw new Error('No entities in response');
|
if (entities.length === 0) {
|
||||||
|
logger.debug('[entities] No entities found in this exchange — skipping');
|
||||||
if (!Array.isArray(entities)) throw new Error('Response was not a JSON array');
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Map of "name::type" → saved entity, used for relationship resolution below
|
||||||
|
const entityMap = new Map();
|
||||||
let saved = 0;
|
let saved = 0;
|
||||||
|
|
||||||
for (const { name, type, notes } of entities) {
|
for (const { name, type, notes } of entities) {
|
||||||
if (!name || !type || !ENTITY_TYPES.includes(type)) continue;
|
if (!name || !type || !ENTITY_TYPES.includes(type)) continue;
|
||||||
|
if (IGNORED_NAMES.includes(name.toLowerCase())) continue;
|
||||||
|
|
||||||
const entity = upsertEntity(name, type, notes ?? null);
|
const entity = upsertEntity(name, type, notes ?? null);
|
||||||
console.log('[entities] Upserted entity:', entity);
|
entityMap.set(`${name}::${type}`, entity);
|
||||||
|
logger.info('[entities] Upserted entity:', entity);
|
||||||
|
|
||||||
|
if (episodeId) linkEntityToEpisode(entity.id, episodeId);
|
||||||
|
|
||||||
// Embed and upsert to Qdrant fire-and-forget
|
|
||||||
embedEntity(entity)
|
embedEntity(entity)
|
||||||
.then(vector => semantic.upsertEntity(entity.id, vector, {
|
.then(vector => semantic.upsertEntity(entity.id, vector, {
|
||||||
name: entity.name,
|
name: entity.name,
|
||||||
@@ -113,19 +138,34 @@ async function extractAndStoreEntities(userMessage, aiResponse, projectId=null)
|
|||||||
projectId: projectId ?? null,
|
projectId: projectId ?? null,
|
||||||
}))
|
}))
|
||||||
.catch(err => {
|
.catch(err => {
|
||||||
console.warn(`[entities] Failed to embed entity "${entity.name}":`, err.message);
|
logger.warn(`[entities] Failed to embed entity "${entity.name}":`, err.message);
|
||||||
console.warn(`[entities] Embed error stack:`, err.stack); // add this
|
|
||||||
});
|
});
|
||||||
|
|
||||||
saved++;
|
saved++;
|
||||||
}
|
}
|
||||||
|
|
||||||
if (saved > 0) console.log(`[entities] Extracted and stored ${saved} entities`);
|
if (saved > 0) logger.info(`[entities] Extracted and stored ${saved} entities`);
|
||||||
|
|
||||||
|
// Process extracted relationships — both entities must have been saved above
|
||||||
|
const relationships = Array.isArray(parsed.relationships) ? parsed.relationships : [];
|
||||||
|
let relSaved = 0;
|
||||||
|
|
||||||
|
for (const { from, fromType, to, toType, label, notes } of relationships) {
|
||||||
|
if (!from || !fromType || !to || !toType || !label) continue;
|
||||||
|
|
||||||
|
const fromEntity = entityMap.get(`${from}::${fromType}`);
|
||||||
|
const toEntity = entityMap.get(`${to}::${toType}`);
|
||||||
|
if (!fromEntity || !toEntity) continue;
|
||||||
|
|
||||||
|
upsertRelationship(fromEntity.id, toEntity.id, label, notes ?? null);
|
||||||
|
relSaved++;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (relSaved > 0) logger.info(`[entities] Extracted and stored ${relSaved} relationships`);
|
||||||
|
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
// Non-critical — log and move on, episode is already saved
|
// Non-critical — log and move on, episode is already saved
|
||||||
console.warn('[entities] Extraction failed:', err.message);
|
logger.warn('[entities] Extraction failed:', err.message);
|
||||||
console.warn('[entities] Stack:', err.stack);
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
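The extraction hunk above replaces a bare `JSON.parse(raw)` with a regex match followed by a guarded parse. A minimal standalone sketch of that pattern follows. `extractJsonObject` is a hypothetical helper name, not a function in this repository, and it returns `null` at the points where the diff logs a warning and returns early. The sample entity is made-up data.

```javascript
// Hypothetical helper illustrating the match-then-parse pattern from the diff.
// Returns the parsed object, or null when no JSON object can be recovered.
function extractJsonObject(raw) {
  // Greedy match from the first "{" to the last "}" in the response.
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return null;
  try {
    return JSON.parse(match[0]);
  } catch (err) {
    // Malformed JSON is treated as "nothing extracted", not as a fatal error.
    return null;
  }
}

const parsed = extractJsonObject('Sure! {"entities": [{"name": "Ada", "type": "person"}]} Done.');
console.log(parsed.entities.length);
```

Because the match is greedy, a response containing two separate objects is captured as one span and fails to parse, which this sketch treats the same as no result.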
|
|
||||||
|
|||||||
@@ -4,18 +4,23 @@ const { parseRow } = require ('@nexusai/shared')
|
|||||||
/******* Entities ********/
|
/******* Entities ********/
|
||||||
|
|
||||||
// Upsert an entity - insert or update if (name, type) already exists
|
// Upsert an entity - insert or update if (name, type) already exists
|
||||||
function upsertEntity(name, type, notes = null, metadata = null) {
|
function upsertEntity(name, type, notes = null, metadata = null, source = 'extraction') {
|
||||||
const db = getDB();
|
const db = getDB();
|
||||||
const stmt = db.prepare(`
|
const stmt = db.prepare(`
|
||||||
INSERT INTO entities (name, type, notes, metadata)
|
INSERT INTO entities (name, type, notes, metadata, source, last_seen_at)
|
||||||
VALUES (?, ?, ?, ?)
|
VALUES (?, ?, ?, ?, ?, unixepoch())
|
||||||
ON CONFLICT(name, type) DO UPDATE SET
|
ON CONFLICT(name, type) DO UPDATE SET
|
||||||
|
-- First extraction wins: notes are never overwritten once set.
|
||||||
|
-- Revisit during Memory Consolidation Lifecycle (Phase 2) — once entity
|
||||||
|
-- quality scoring exists, a higher-confidence extraction should be able
|
||||||
|
-- to replace stale notes rather than being silently dropped.
|
||||||
notes = COALESCE(entities.notes, excluded.notes),
|
notes = COALESCE(entities.notes, excluded.notes),
|
||||||
metadata = excluded.metadata,
|
metadata = excluded.metadata,
|
||||||
|
mention_count = entities.mention_count + 1,
|
||||||
|
last_seen_at = unixepoch(),
|
||||||
updated_at = unixepoch()
|
updated_at = unixepoch()
|
||||||
`);
|
`);
|
||||||
const result = stmt.run(name, type, notes, metadata ? JSON.stringify(metadata) : null);
|
stmt.run(name, type, notes, metadata ? JSON.stringify(metadata) : null, source);
|
||||||
|
|
||||||
return getEntityByNameType(name, type);
|
return getEntityByNameType(name, type);
|
||||||
}
|
}
|
||||||
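The `ON CONFLICT ... DO UPDATE` clause in `upsertEntity` keeps the first non-null notes and bumps a counter on every repeat mention. The sketch below mimics that policy with a plain `Map` so the behavior can be checked without a database. It is an in-memory illustration only: `upsertEntityInMemory` is a hypothetical name, the starting `mention_count` of 1 is an assumption about the column default, and `now` stands in for `unixepoch()`.

```javascript
// In-memory illustration only; the repository relies on SQLite's ON CONFLICT clause.
// Keys are "name::type", mirroring the (name, type) uniqueness constraint.
function upsertEntityInMemory(store, name, type, notes = null, now = Date.now()) {
  const key = `${name}::${type}`;
  const existing = store.get(key);
  if (!existing) {
    // Assumed default: a newly inserted row starts with mention_count = 1.
    const row = { name, type, notes, mention_count: 1, last_seen_at: now };
    store.set(key, row);
    return row;
  }
  // COALESCE(entities.notes, excluded.notes): existing notes win unless they are null.
  existing.notes = existing.notes ?? notes;
  existing.mention_count += 1;
  existing.last_seen_at = now;
  return existing;
}

const store = new Map();
upsertEntityInMemory(store, 'Ada', 'person', 'first note');
const again = upsertEntityInMemory(store, 'Ada', 'person', 'second note');
console.log(again.notes, again.mention_count);
```

The second call leaves the notes untouched, which is the "first extraction wins" policy described in the SQL comment.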
|
|
||||||
@@ -40,15 +45,17 @@ function deleteEntity(id) {
|
|||||||
/********* Relationships *********/
|
/********* Relationships *********/
|
||||||
|
|
||||||
// Upsert a relationship, insert or ignore if (from_id, to_id, label) already exists
|
// Upsert a relationship, insert or ignore if (from_id, to_id, label) already exists
|
||||||
function upsertRelationship(fromId, toId, label, metadata = null){
|
function upsertRelationship(fromId, toId, label, notes = null, metadata = null) {
|
||||||
const db = getDB();
|
const db = getDB();
|
||||||
const stmt = db.prepare(`
|
const stmt = db.prepare(`
|
||||||
INSERT INTO relationships (from_id, to_id, label, metadata)
|
INSERT INTO relationships (from_id, to_id, label, notes, metadata)
|
||||||
VALUES (?, ?, ?, ?)
|
VALUES (?, ?, ?, ?, ?)
|
||||||
ON CONFLICT(from_id, to_id, label) DO NOTHING
|
ON CONFLICT(from_id, to_id, label) DO UPDATE SET
|
||||||
|
mention_count = relationships.mention_count + 1,
|
||||||
|
-- First extraction wins for notes — same policy as entities.
|
||||||
|
notes = COALESCE(relationships.notes, excluded.notes)
|
||||||
`);
|
`);
|
||||||
|
stmt.run(fromId, toId, label, notes, metadata ? JSON.stringify(metadata) : null);
|
||||||
const result = stmt.run(fromId, toId, label, metadata ?JSON.stringify(metadata) : null);
|
|
||||||
return getRelationship(fromId, toId, label);
|
return getRelationship(fromId, toId, label);
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -69,7 +76,7 @@ function getEntityByNameType(name, type) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Retrieve all relationships originating from a given entity
|
// Retrieve all relationships originating from a given entity
|
||||||
function getRelationshipsByEntity(entityId) {
|
function getOutboundRelationships(entityId) {
|
||||||
const db = getDB();
|
const db = getDB();
|
||||||
return db.prepare(`SELECT * FROM relationships WHERE from_id = ?`).all(entityId).map(parseRow);
|
return db.prepare(`SELECT * FROM relationships WHERE from_id = ?`).all(entityId).map(parseRow);
|
||||||
}
|
}
|
||||||
@@ -81,14 +88,23 @@ function deleteRelationship(fromId, toId, label) {
|
|||||||
db.prepare(`DELETE FROM relationships WHERE from_id = ? AND to_id = ? AND label = ?`).run(fromId, toId, label);
|
db.prepare(`DELETE FROM relationships WHERE from_id = ? AND to_id = ? AND label = ?`).run(fromId, toId, label);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function linkEntityToEpisode(entityId, episodeId) {
|
||||||
|
const db = getDB();
|
||||||
|
db.prepare(`
|
||||||
|
INSERT OR IGNORE INTO entity_episodes (entity_id, episode_id)
|
||||||
|
VALUES (?, ?)
|
||||||
|
`).run(entityId, episodeId);
|
||||||
|
}
|
||||||
|
|
||||||
module.exports = {
|
module.exports = {
|
||||||
upsertEntity,
|
upsertEntity,
|
||||||
getEntity,
|
getEntity,
|
||||||
getEntitiesByType,
|
getEntitiesByType,
|
||||||
getEntityByNameType,
|
getEntityByNameType,
|
||||||
deleteEntity,
|
deleteEntity,
|
||||||
|
linkEntityToEpisode,
|
||||||
upsertRelationship,
|
upsertRelationship,
|
||||||
getRelationship,
|
getRelationship,
|
||||||
getRelationshipsByEntity,
|
getOutboundRelationships,
|
||||||
deleteRelationship
|
deleteRelationship
|
||||||
}
|
}
|
||||||
@@ -1,5 +1,5 @@
|
|||||||
const {getDB} = require('../db');
|
const {getDB} = require('../db');
|
||||||
const { EPISODIC, getEnv, SERVICES, parseRow, formatEpisodeText } = require('@nexusai/shared');
|
const { EPISODIC, getEnv, SERVICES, parseRow, formatEpisodeText, SUMMARIES, logger } = require('@nexusai/shared');
|
||||||
const semantic = require('../semantic');
|
const semantic = require('../semantic');
|
||||||
const { extractAndStoreEntities } = require('../entities/extraction')
|
const { extractAndStoreEntities } = require('../entities/extraction')
|
||||||
|
|
||||||
@@ -25,7 +25,7 @@ function getSession(id) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
function getSessions(limit = EPISODIC.DEFAULT_PAGE_SIZE, offset = 0, projectId = null) {
|
function getSessions(limit = EPISODIC.DEFAULT_PAGE_SIZE, offset = EPISODIC.DEFAULT_OFFSET, projectId = null) {
|
||||||
const db = getDB();
|
const db = getDB();
|
||||||
const stmt = projectId
|
const stmt = projectId
|
||||||
? db.prepare(`
|
? db.prepare(`
|
||||||
@@ -98,21 +98,20 @@ function deleteSessionByExternalId(externalId) {
|
|||||||
|
|
||||||
// --Episodes --------------------------------------------------
|
// --Episodes --------------------------------------------------
|
||||||
// Creates a new episode linked to a session, with user message, AI response, optional token count, and metadata
|
// Creates a new episode linked to a session, with user message, AI response, optional token count, and metadata
|
||||||
async function createEpisode(sessionId, userMessage, aiResponse, tokenCount = null, metadata = null, projectId=null) {
|
async function createEpisode(sessionId, userMessage, aiResponse, tokenCount = null, projectId=null) {
|
||||||
const db = getDB();
|
const db = getDB();
|
||||||
|
|
||||||
// Wrap insert + session touch in a transaction — both succeed or neither does
|
// Wrap insert + session touch in a transaction — both succeed or neither does
|
||||||
const insert = db.transaction(() => {
|
const insert = db.transaction(() => {
|
||||||
const stmt = db.prepare(`
|
const stmt = db.prepare(`
|
||||||
INSERT INTO episodes (session_id, user_message, ai_response, token_count, metadata)
|
INSERT INTO episodes (session_id, user_message, ai_response, token_count)
|
||||||
VALUES (?, ?, ?, ?, ?)
|
VALUES (?, ?, ?, ?)
|
||||||
`);
|
`);
|
||||||
const result = stmt.run(
|
const result = stmt.run(
|
||||||
sessionId,
|
sessionId,
|
||||||
userMessage,
|
userMessage,
|
||||||
aiResponse,
|
aiResponse,
|
||||||
tokenCount,
|
tokenCount,
|
||||||
metadata ? JSON.stringify(metadata) : null
|
|
||||||
);
|
);
|
||||||
touchSession(sessionId);
|
touchSession(sessionId);
|
||||||
return getEpisode(result.lastInsertRowid);
|
return getEpisode(result.lastInsertRowid);
|
||||||
@@ -126,10 +125,10 @@ async function createEpisode(sessionId, userMessage, aiResponse, tokenCount = nu
|
|||||||
sessionId: episode.session_id,
|
sessionId: episode.session_id,
|
||||||
createdAt: episode.created_at
|
createdAt: episode.created_at
|
||||||
}))
|
}))
|
||||||
.catch(err => console.error(`Failed to embed episode ${episode.id}:`, err.message));
|
.catch(err => logger.error(`Failed to embed episode ${episode.id}:`, err.message));
|
||||||
|
|
||||||
extractAndStoreEntities(userMessage, aiResponse, projectId)
|
extractAndStoreEntities(userMessage, aiResponse, episode.id, projectId)
|
||||||
.catch(err => console.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
|
.catch(err => logger.error(`Failed to extract entities for episode ${episode.id}:`, err.message));
|
||||||
|
|
||||||
|
|
||||||
return episode;
|
return episode;
|
||||||
@@ -143,7 +142,7 @@ function getEpisode(id) {
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Retrieves episodes for a given session, ordered by creation time descending, with pagination
|
// Retrieves episodes for a given session, ordered by creation time descending, with pagination
|
||||||
function getEpisodesBySession(sessionId, limit = EPISODIC.DEFAULT_PAGE_SIZE, offset = 0) {
|
function getEpisodesBySession(sessionId, limit = EPISODIC.DEFAULT_PAGE_SIZE, offset = EPISODIC.DEFAULT_OFFSET) {
|
||||||
const db = getDB();
|
const db = getDB();
|
||||||
const stmt = db.prepare(`
|
const stmt = db.prepare(`
|
||||||
SELECT * FROM episodes
|
SELECT * FROM episodes
|
||||||
@@ -155,30 +154,41 @@ function getEpisodesBySession(sessionId, limit = EPISODIC.DEFAULT_PAGE_SIZE, off
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Retrieves recent episodes across all sessions, ordered by creation time descending, with a limit
|
// Retrieves recent episodes across all sessions, ordered by creation time descending, with a limit
|
||||||
function getRecentEpisodes(limit = EPISODIC.DEFAULT_RECENT_LIMIT) {
|
function getRecentEpisodes(sessionId, limit = EPISODIC.DEFAULT_RECENT_LIMIT) {
|
||||||
// Cross-session recent episodes — useful for recency-based retrieval
|
// Cross-session recent episodes — useful for recency-based retrieval
|
||||||
const db = getDB();
|
const db = getDB();
|
||||||
const stmt = db.prepare(`
|
const stmt = db.prepare(`
|
||||||
SELECT * FROM episodes
|
SELECT * FROM episodes
|
||||||
|
WHERE session_id = ?
|
||||||
ORDER BY created_at DESC
|
ORDER BY created_at DESC
|
||||||
LIMIT ?
|
LIMIT ?
|
||||||
`);
|
`);
|
||||||
return stmt.all(limit).map(parseRow);
|
return stmt.all(sessionId, limit).map(parseRow);
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
// Searches episodes using FTS5 full-text search, ordered by relevance, with a limit
|
// Searches episodes using FTS5 full-text search, ordered by relevance, with a limit
|
||||||
function searchEpisodes(query, limit = EPISODIC.DEFAULT_SEARCH_LIMIT) {
|
function searchEpisodes(query, limit = EPISODIC.DEFAULT_SEARCH_LIMIT, sessionIds = null) {
|
||||||
// FTS5 full-text search across all episodes
|
|
||||||
const db = getDB();
|
const db = getDB();
|
||||||
const stmt = db.prepare(`
|
const safeQuery = `"${query.replace(/"/g, '""')}"`;
|
||||||
|
if (sessionIds && sessionIds.length > 0) {
|
||||||
|
const ph = sessionIds.map(() => '?').join(',');
|
||||||
|
return db.prepare(`
|
||||||
|
SELECT e.* FROM episodes e
|
||||||
|
JOIN episodes_fts fts ON e.id = fts.rowid
|
||||||
|
WHERE episodes_fts MATCH ?
|
||||||
|
AND e.session_id IN (${ph})
|
||||||
|
ORDER BY rank
|
||||||
|
LIMIT ?
|
||||||
|
`).all(safeQuery, ...sessionIds, limit).map(parseRow);
|
||||||
|
}
|
||||||
|
return db.prepare(`
|
||||||
SELECT e.* FROM episodes e
|
SELECT e.* FROM episodes e
|
||||||
JOIN episodes_fts fts ON e.id = fts.rowid
|
JOIN episodes_fts fts ON e.id = fts.rowid
|
||||||
WHERE episodes_fts MATCH ?
|
WHERE episodes_fts MATCH ?
|
||||||
ORDER BY rank
|
ORDER BY rank
|
||||||
LIMIT ?
|
LIMIT ?
|
||||||
`);
|
`).all(safeQuery, limit).map(parseRow);
|
||||||
return stmt.all(query, limit).map(parseRow);
|
|
||||||
}
|
}
|
||||||
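The new `searchEpisodes` quotes the user's query before passing it to `MATCH`, and builds one `?` per session ID for the `IN (...)` clause. A small sketch of those two string-building steps is below. `toFtsPhrase` and `placeholders` are hypothetical helper names; the diff inlines both expressions.

```javascript
// Wrap user input as a single FTS5 string; embedded double quotes are doubled.
function toFtsPhrase(query) {
  return `"${query.replace(/"/g, '""')}"`;
}

// One "?" per value, for use inside an IN (...) clause with bound parameters.
function placeholders(values) {
  return values.map(() => '?').join(',');
}

console.log(toFtsPhrase('node AND "sqlite"'));
console.log(placeholders([3, 7, 9]));
```

Quoting the whole input means words such as `AND` or `NOT` are matched as ordinary tokens rather than interpreted as FTS5 operators, at the cost of turning every search into a phrase query.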
|
|
||||||
// Deletes an episode by its ID
|
// Deletes an episode by its ID
|
||||||
@@ -197,7 +207,8 @@ async function getEpisodeEmbedding(userMessage, aiResponse){
|
|||||||
const res = await fetch(`${url}/embed`, {
|
const res = await fetch(`${url}/embed`, {
|
||||||
method: 'POST',
|
method: 'POST',
|
||||||
headers: { 'Content-Type': 'application/json' },
|
headers: { 'Content-Type': 'application/json' },
|
||||||
body: JSON.stringify({ text })
|
body: JSON.stringify({ text }),
|
||||||
|
signal: AbortSignal.timeout(30_000),
|
||||||
})
|
})
|
||||||
|
|
||||||
if (!res.ok) {
|
if (!res.ok) {
|
||||||
@@ -207,6 +218,17 @@ async function getEpisodeEmbedding(userMessage, aiResponse){
|
|||||||
return data.embedding;
|
return data.embedding;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function getEpisodesByProject(projectId, limit = SUMMARIES.MAX_PROJECT_EPISODE_LIMIT) {
|
||||||
|
const db = getDB();
|
||||||
|
return db.prepare(`
|
||||||
|
SELECT e.* FROM episodes e
|
||||||
|
JOIN sessions s ON s.id = e.session_id
|
||||||
|
WHERE s.project_id = ?
|
||||||
|
ORDER BY e.created_at ASC
|
||||||
|
LIMIT ?
|
||||||
|
`).all(projectId, limit).map(parseRow);
|
||||||
|
}
|
||||||
|
|
||||||
module.exports = {
|
module.exports = {
|
||||||
createSession,
|
createSession,
|
||||||
getSession,
|
getSession,
|
||||||
@@ -221,5 +243,6 @@ module.exports = {
|
|||||||
getEpisodesBySession,
|
getEpisodesBySession,
|
||||||
getRecentEpisodes,
|
getRecentEpisodes,
|
||||||
searchEpisodes,
|
searchEpisodes,
|
||||||
deleteEpisode
|
deleteEpisode,
|
||||||
|
getEpisodesByProject
|
||||||
};
|
};
|
||||||
77
packages/memory-service/src/graph/index.js
Normal file
@@ -0,0 +1,77 @@
|
|||||||
|
const { getDB } = require('../db');
|
||||||
|
const { parseRow, ENTITIES } = require('@nexusai/shared');
|
||||||
|
|
||||||
|
// Single-entity neighborhood via recursive CTE — bidirectional, configurable depth
|
||||||
|
function getNeighborhood(entityId, depth = ENTITIES.GRAPH_HOP_DEPTH) {
|
||||||
|
const db = getDB();
|
||||||
|
|
||||||
|
const nodeRows = db.prepare(`
|
||||||
|
WITH RECURSIVE traverse(entity_id, depth) AS (
|
||||||
|
SELECT ?, 0
|
||||||
|
UNION
|
||||||
|
SELECT
|
||||||
|
CASE WHEN r.from_id = t.entity_id THEN r.to_id ELSE r.from_id END,
|
||||||
|
t.depth + 1
|
||||||
|
FROM relationships r
|
||||||
|
JOIN traverse t ON (r.from_id = t.entity_id OR r.to_id = t.entity_id)
|
||||||
|
WHERE t.depth < ?
|
||||||
|
)
|
||||||
|
SELECT DISTINCT entity_id FROM traverse
|
||||||
|
`).all(entityId, depth);
|
||||||
|
|
||||||
|
const nodeIds = nodeRows.map(r => r.entity_id);
|
||||||
|
if (nodeIds.length === 0) return { nodes: [], edges: [] };
|
||||||
|
|
||||||
|
const ph = nodeIds.map(() => '?').join(',');
|
||||||
|
const nodes = db.prepare(
|
||||||
|
`SELECT * FROM entities WHERE id IN (${ph})`
|
||||||
|
).all(...nodeIds).map(parseRow);
|
||||||
|
|
||||||
|
const edges = db.prepare(
|
||||||
|
`SELECT * FROM relationships WHERE from_id IN (${ph}) AND to_id IN (${ph})`
|
||||||
|
).all(...nodeIds, ...nodeIds).map(parseRow);
|
||||||
|
|
||||||
|
return { nodes, edges };
|
||||||
|
}
|
||||||
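The recursive CTE in `getNeighborhood` walks relationships in both directions up to a depth limit. The same traversal can be sketched as a breadth-first search over an in-memory edge list, which may help when reasoning about which nodes a given depth returns. `neighborhoodIds` and the `[fromId, toId]` edge shape are assumptions made for illustration, not part of the repository.

```javascript
// Hypothetical in-memory counterpart to the recursive CTE.
// edges: array of [fromId, toId] pairs; direction is ignored during traversal.
function neighborhoodIds(edges, startId, maxDepth) {
  const seen = new Set([startId]);
  let frontier = [startId];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const [from, to] of edges) {
      for (const id of frontier) {
        // Follow the edge from whichever end touches the current node.
        const other = from === id ? to : to === id ? from : null;
        if (other !== null && !seen.has(other)) {
          seen.add(other);
          next.push(other);
        }
      }
    }
    frontier = next;
  }
  return [...seen];
}

const chain = [[1, 2], [2, 3], [3, 4]];
console.log(neighborhoodIds(chain, 1, 2));
```

As in the SQL version, the start node is always part of the result, and a depth of 0 returns only that node.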
|
|
||||||
|
// Bulk 1-hop neighborhood for orchestration — seeds are entity IDs from Qdrant search
|
||||||
|
function getEntityNeighbors(entityIds) {
|
||||||
|
if (!entityIds.length) return { nodes: [], edges: [] };
|
||||||
|
const db = getDB();
|
||||||
|
|
||||||
|
const ph = entityIds.map(() => '?').join(',');
|
||||||
|
|
||||||
|
// entityIds appears three times — once for the CASE (finding the neighbor),
|
||||||
|
// and once each for the FROM and TO sides of the WHERE clause
|
||||||
|
const neighborRows = db.prepare(`
|
||||||
|
SELECT DISTINCT
|
||||||
|
CASE WHEN from_id IN (${ph}) THEN to_id ELSE from_id END AS entity_id
|
||||||
|
FROM relationships
|
||||||
|
WHERE from_id IN (${ph}) OR to_id IN (${ph})
|
||||||
|
`).all(...entityIds, ...entityIds, ...entityIds);
|
||||||
|
|
||||||
|
const allIds = [...new Set([...entityIds, ...neighborRows.map(r => r.entity_id)])];
|
||||||
|
const allPh = allIds.map(() => '?').join(',');
|
||||||
|
|
||||||
|
const nodes = db.prepare(
|
||||||
|
`SELECT * FROM entities WHERE id IN (${allPh})`
|
||||||
|
).all(...allIds).map(parseRow);
|
||||||
|
|
||||||
|
const edges = db.prepare(
|
||||||
|
`SELECT * FROM relationships WHERE from_id IN (${allPh}) AND to_id IN (${allPh})`
|
||||||
|
).all(...allIds, ...allIds).map(parseRow);
|
||||||
|
|
||||||
|
return { nodes, edges };
|
||||||
|
}
|
||||||
|
|
||||||
|
// Returns episode IDs linked to any of the given entity IDs via entity_episodes
|
||||||
|
function getEpisodeIdsByEntities(entityIds) {
|
||||||
|
if (!entityIds.length) return [];
|
||||||
|
const db = getDB();
|
||||||
|
const ph = entityIds.map(() => '?').join(',');
|
||||||
|
return db.prepare(
|
||||||
|
`SELECT DISTINCT episode_id FROM entity_episodes WHERE entity_id IN (${ph})`
|
||||||
|
).all(...entityIds).map(r => r.episode_id);
|
||||||
|
}
|
||||||
|
|
||||||
|
module.exports = { getNeighborhood, getEntityNeighbors, getEpisodeIdsByEntities };
|
||||||
@@ -1,16 +1,18 @@
|
|||||||
require ('dotenv').config();
|
require ('dotenv').config();
|
||||||
const express = require('express');
|
const express = require('express');
|
||||||
const {getEnv, PORTS, EPISODIC} = require('@nexusai/shared');
|
const {getEnv, PORTS, EPISODIC, logger} = require('@nexusai/shared');
|
||||||
const { getDB } = require('./db');
|
const { getDB } = require('./db');
|
||||||
const { createProject, getProjects, getProject, updateProject, deleteProject } = require('./db/projects');
|
const { createProject, getProjects, getProject, updateProject, deleteProject } = require('./db/projects');
|
||||||
const { createSummary, getSummary, getSummariesBySession, getSummariesByProject, updateSummary, deleteSummary } = require('./db/summaries');
|
const { createSummary, getSummary, getSummariesBySession, getSummariesByProject, updateSummary, deleteSummary } = require('./db/summaries');
|
||||||
|
const { generateAndStoreProjectSummary } = require('./summarization/project');
|
||||||
|
const graph = require('./graph');
|
||||||
|
|
||||||
const episodic = require('./episodic');
|
const episodic = require('./episodic');
|
||||||
const semantic = require('./semantic');
|
const semantic = require('./semantic');
|
||||||
const entities = require('./entities');
|
const entities = require('./entities');
|
||||||
|
|
||||||
const app = express();
|
const app = express();
|
||||||
app.use(express.json());
|
app.use(express.json({ limit: '2mb' }));
|
||||||
|
|
||||||
const PORT = getEnv('PORT', PORTS.MEMORY);
|
const PORT = getEnv('PORT', PORTS.MEMORY);
|
||||||
|
|
||||||
@@ -18,8 +20,8 @@ const PORT = getEnv('PORT', PORTS.MEMORY);
|
|||||||
const db = getDB();
|
const db = getDB();
|
||||||
|
|
||||||
semantic.initCollections()
|
semantic.initCollections()
|
||||||
.then(() => console.log(`QDrant collections ready`))
|
.then(() => logger.info(`QDrant collections ready`))
|
||||||
.catch(err => console.error(`QDrant initialization error:`, err.message));
|
.catch(err => logger.error(`QDrant initialization error:`, err.message));
|
||||||
|
|
||||||
// Health check endpoint
|
// Health check endpoint
|
||||||
app.get('/health', (req, res) => {
|
app.get('/health', (req, res) => {
|
||||||
@@ -79,13 +81,11 @@ app.patch('/sessions/by-external/:externalId', (req, res) => {
|
|||||||
const session = episodic.updateSessionByExternalId(req.params.externalId, {name, projectId });
|
const session = episodic.updateSessionByExternalId(req.params.externalId, {name, projectId });
|
||||||
res.json(session);
|
res.json(session);
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
res.status(500).json({error: err.message });
|
res.status(500).json({ error: 'Failed to update session', detail: err.message });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
|
// Deletes a session and all associated episodes
|
||||||
|
|
||||||
// Updates the session's updated_at timestamp to now
|
|
||||||
app.delete('/sessions/by-external/:externalId', (req, res) => {
|
app.delete('/sessions/by-external/:externalId', (req, res) => {
|
||||||
episodic.deleteSessionByExternalId(req.params.externalId);
|
episodic.deleteSessionByExternalId(req.params.externalId);
|
||||||
res.status(204).send();
|
res.status(204).send();
|
||||||
@@ -97,18 +97,11 @@ app.delete('/sessions/by-external/:externalId', (req, res) => {
|
|||||||
/************************************* */
|
/************************************* */
|
||||||
|
|
||||||
app.post('/episodes', async (req, res) => {
|
app.post('/episodes', async (req, res) => {
|
||||||
const { sessionId, userMessage, aiResponse, tokenCount, metadata, projectId } = req.body;
|
const { sessionId, userMessage, aiResponse, tokenCount, projectId } = req.body;
|
||||||
if (!sessionId || !userMessage || !aiResponse) {
|
if (!sessionId || !userMessage || !aiResponse) {
|
||||||
return res.status(400).json({ error: 'sessionId, userMessage and aiResponse are required' });
|
return res.status(400).json({ error: 'sessionId, userMessage and aiResponse are required' });
|
||||||
}
|
}
|
||||||
const episode = await episodic.createEpisode(sessionId, userMessage, aiResponse, tokenCount, metadata, projectId);
|
const episode = await episodic.createEpisode(sessionId, userMessage, aiResponse, tokenCount, projectId);
|
||||||
|
|
||||||
console.log('[memory] create episode body:', {
|
|
||||||
sessionId,
|
|
||||||
userMessageLength: userMessage?.length,
|
|
||||||
aiResponseLength: aiResponse?.length,
|
|
||||||
tokenCount
|
|
||||||
});
|
|
||||||
|
|
||||||
res.status(201).json(episode);
|
res.status(201).json(episode);
|
||||||
});
|
});
|
||||||
@@ -138,10 +131,12 @@ app.get('/episodes', (req, res) => {
|
|||||||
|
|
||||||
// Search MUST come before /:id — otherwise 'search' gets captured as an id
|
// Search MUST come before /:id — otherwise 'search' gets captured as an id
|
||||||
app.get('/episodes/search', (req, res) => {
|
app.get('/episodes/search', (req, res) => {
|
||||||
const { q, limit = EPISODIC.DEFAULT_PAGE_SIZE } = req.query;
|
const { q, limit = EPISODIC.DEFAULT_PAGE_SIZE, sessionIds } = req.query;
|
||||||
if (!q) return res.status(400).json({ error: 'q (query) parameter is required' });
|
if (!q) return res.status(400).json({ error: 'q (query) parameter is required' });
|
||||||
const results = episodic.searchEpisodes(q, Number(limit));
|
const parsedSessionIds = sessionIds
|
||||||
res.json(results);
|
? sessionIds.split(',').map(Number).filter(Boolean)
|
||||||
|
: null;
|
||||||
|
res.json(episodic.searchEpisodes(q, Number(limit), parsedSessionIds));
|
||||||
});
|
});
|
||||||
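The `/episodes/search` route now accepts a comma-separated `sessionIds` query parameter. The sketch below isolates that parsing step so its edge cases are easy to see. `parseSessionIds` is a hypothetical name; the route inlines the same expression.

```javascript
// Mirrors the route's parsing: split on commas, coerce to numbers,
// then drop anything falsy (NaN from non-numeric input, and 0).
function parseSessionIds(value) {
  if (!value) return null;
  return value.split(',').map(Number).filter(Boolean);
}

console.log(parseSessionIds('1,2,abc,,3'));
console.log(parseSessionIds('abc'));
```

An input made up entirely of invalid IDs yields an empty array, and `searchEpisodes` only applies the session filter when the array is non-empty, so such a request searches across all sessions.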
|
|
||||||
app.get('/episodes/:id', (req, res) => {
|
app.get('/episodes/:id', (req, res) => {
|
||||||
@@ -166,7 +161,7 @@ app.delete('/episodes/:id', (req, res) => {
|
|||||||
episodic.deleteEpisode(id);
|
episodic.deleteEpisode(id);
|
||||||
|
|
||||||
semantic.deleteEpisode(id) // fire-and-forget
|
semantic.deleteEpisode(id) // fire-and-forget
|
||||||
.catch(err => console.error(`[Memory] Qdrant delete failed for episode ${id}:`, err.message));
|
.catch(err => logger.error(`[Memory] Qdrant delete failed for episode ${id}:`, err.message));
|
||||||
|
|
||||||
res.status(204).send();
|
res.status(204).send();
|
||||||
});
|
});
|
||||||
@@ -210,17 +205,17 @@ app.delete('/entities/:id', (req, res) => {
|
|||||||
|
|
||||||
// Upsert a relationship between two entities
|
// Upsert a relationship between two entities
|
||||||
app.post('/relationships', (req, res) => {
|
app.post('/relationships', (req, res) => {
|
||||||
const {fromId, toId, label, metadata } = req.body;
|
const { fromId, toId, label, notes, metadata } = req.body;
|
||||||
if (!fromId || !toId || !label) {
|
if (!fromId || !toId || !label) {
|
||||||
return res.status(400).json({ error: 'fromId, toId and label are required' });
|
return res.status(400).json({ error: 'fromId, toId and label are required' });
|
||||||
}
|
}
|
||||||
const relationship = entities.upsertRelationship(fromId, toId, label, metadata);
|
const relationship = entities.upsertRelationship(fromId, toId, label, notes, metadata);
|
||||||
res.status(201).json(relationship);
|
res.status(201).json(relationship);
|
||||||
});
|
});
|
||||||
|
|
||||||
// Get all relationships for a given entity ID
|
// Get all relationships for a given entity ID
|
||||||
app.get('/entities/:id/relationships', (req, res) => {
|
app.get('/entities/:id/relationships', (req, res) => {
|
||||||
res.json(entities.getRelationshipsByEntity(req.params.id));
|
res.json(entities.getOutboundRelationships(req.params.id));
|
||||||
});
|
});
|
||||||
|
|
||||||
// Delete a specific relationship
|
// Delete a specific relationship
|
||||||
@@ -233,6 +228,37 @@ app.delete('/relationships', (req, res) => {
|
|||||||
res.status(204).send();
|
res.status(204).send();
|
||||||
})
|
})
|
||||||
|
|
||||||
|
/********************************* */
|
||||||
|
/********** Graph Routes ********** */
|
||||||
|
/********************************* */
|
||||||
|
|
||||||
|
// Single-entity neighborhood — depth defaults to ENTITIES.GRAPH_HOP_DEPTH
|
||||||
|
app.get('/graph/neighborhood/:entityId', (req, res) => {
|
||||||
|
const entity = entities.getEntity(req.params.entityId);
|
||||||
|
if (!entity) return res.status(404).json({ error: 'Entity not found' });
|
||||||
|
|
||||||
|
const depth = req.query.depth ? Math.min(Number(req.query.depth), 3) : undefined;
|
||||||
|
const neighborhood = graph.getNeighborhood(Number(req.params.entityId), depth);
|
||||||
|
res.json({ entity, neighborhood });
|
||||||
|
});
|
||||||
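The neighborhood route caps `depth` at 3 using `Math.min(Number(req.query.depth), 3)`. With that expression a non-numeric value yields `NaN`, since `Math.min(NaN, 3)` is `NaN`. The sketch below is a hypothetical stricter variant, not what the diff implements: `clampDepth` returns `undefined` for unusable input so that the default parameter of `getNeighborhood` would apply.

```javascript
// Hypothetical stricter parsing for a depth query parameter.
// Returns undefined when the value is missing or not a non-negative integer.
function clampDepth(value, max = 3) {
  if (value === undefined || value === '') return undefined;
  const n = Number(value);
  if (!Number.isInteger(n) || n < 0) return undefined;
  return Math.min(n, max);
}

console.log(clampDepth('2'), clampDepth('10'), clampDepth('abc'));
```

Whether invalid input should fall back to the default or be rejected with a 400 is a design choice this sketch does not settle.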
|
|
||||||
|
// Bulk 1-hop neighborhood — body: { entityIds: [...] }
|
||||||
|
app.post('/graph/neighbors', (req, res) => {
|
||||||
|
const { entityIds } = req.body;
|
||||||
|
if (!Array.isArray(entityIds) || entityIds.length === 0) {
|
||||||
|
return res.status(400).json({ error: 'entityIds array is required' });
|
||||||
|
}
|
||||||
|
res.json(graph.getEntityNeighbors(entityIds.map(Number)));
|
||||||
|
});
|
||||||
|
|
||||||
|
app.post('/episodes/by-entities', (req, res) => {
|
||||||
|
const { entityIds } = req.body;
|
||||||
|
if (!Array.isArray(entityIds) || entityIds.length === 0) {
|
||||||
|
return res.status(400).json({ error: 'entityIds array is required' });
|
||||||
|
}
|
||||||
|
res.json({ episodeIds: graph.getEpisodeIdsByEntities(entityIds.map(Number)) });
|
||||||
|
});
|
||||||
|
|
||||||
/*********************************** */
|
/*********************************** */
|
||||||
/********** Project Routes ********** */
|
/********** Project Routes ********** */
|
||||||
/*********************************** */
|
/*********************************** */
|
||||||
@@ -243,7 +269,7 @@ app.post('/projects', (req, res) => {
|
|||||||
try {
|
try {
|
||||||
res.status(201).json(createProject({ name: name.trim(), description, colour, icon }));
|
res.status(201).json(createProject({ name: name.trim(), description, colour, icon }));
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
res.status(500).json({ error: err.message });
|
res.status(500).json({ error: 'Failed to create project', detail: err.message });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -251,6 +277,35 @@ app.get('/projects', (req, res) => {
|
|||||||
res.json(getProjects());
|
res.json(getProjects());
|
||||||
});
|
});
|
||||||
|
|
||||||
|
// Generate (or regenerate) a project overview summary on demand
|
||||||
|
app.post('/projects/:id/summarize', async (req, res) => {
|
||||||
|
const project = getProject(Number(req.params.id));
|
||||||
|
if (!project) return res.status(404).json({ error: 'Project not found' });
|
||||||
|
|
||||||
|
try {
|
||||||
|
const summary = await generateAndStoreProjectSummary(Number(req.params.id));
|
||||||
|
res.status(201).json(summary);
|
||||||
|
} catch (err) {
|
||||||
|
if (err.message.includes('No session summaries or episodes')) {
|
||||||
|
return res.status(422).json({ error: err.message });
|
||||||
|
}
|
||||||
|
res.status(500).json({ error: 'Failed to generate project summary', detail: err.message });
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// Get the current project overview summary
|
||||||
|
app.get('/projects/:id/overview', async (req, res) => {
|
||||||
|
const { getProjectOverviewSummary } = require('./db/summaries');
|
||||||
|
const summary = getProjectOverviewSummary(Number(req.params.id));
|
||||||
|
// 200 with null is fine — frontend can handle "no overview yet" gracefully
|
||||||
|
res.json(summary ?? null);
|
||||||
|
});
|
||||||
|
|
||||||
|
// Get summaries for a project
|
||||||
|
app.get('/projects/:id/summaries', (req, res) => {
|
||||||
|
res.json(getSummariesByProject(req.params.id));
|
||||||
|
});
|
||||||
|
|
||||||
app.get('/projects/:id', (req, res) => {
|
app.get('/projects/:id', (req, res) => {
|
||||||
const project = getProject(req.params.id);
|
const project = getProject(req.params.id);
|
||||||
if (!project) return res.status(404).json({ error: 'Not found' });
|
if (!project) return res.status(404).json({ error: 'Not found' });
|
||||||
@@ -271,6 +326,10 @@ app.delete('/projects/:id', (req, res) => {
|
|||||||
});
|
});
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
/*********************************** */
|
/*********************************** */
|
||||||
/********** Summary Routes ********** */
|
/********** Summary Routes ********** */
|
||||||
/*********************************** */
|
/*********************************** */
|
||||||
@@ -285,7 +344,7 @@ app.post('/summaries', (req, res) => {
|
|||||||
const summary = createSummary({ sessionId, projectId, content, tokenCount, episodeRange, metadata });
|
const summary = createSummary({ sessionId, projectId, content, tokenCount, episodeRange, metadata });
|
||||||
res.status(201).json(summary);
|
res.status(201).json(summary);
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
res.status(500).json({ error: err.message });
|
res.status(500).json({ error: 'Failed to create summary', detail: err.message });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -294,11 +353,6 @@ app.get('/sessions/:id/summaries', (req, res) => {
|
|||||||
res.json(getSummariesBySession(req.params.id));
|
res.json(getSummariesBySession(req.params.id));
|
||||||
});
|
});
|
||||||
|
|
||||||
// Get summaries for a project
|
|
||||||
app.get('/projects/:id/summaries', (req, res) => {
|
|
||||||
res.json(getSummariesByProject(req.params.id));
|
|
||||||
});
|
|
||||||
|
|
||||||
// Update a summary (for cumulative updates)
|
// Update a summary (for cumulative updates)
|
||||||
app.patch('/summaries/:id', (req, res) => {
|
app.patch('/summaries/:id', (req, res) => {
|
||||||
const summary = getSummary(req.params.id);
|
const summary = getSummary(req.params.id);
|
||||||
@@ -318,5 +372,5 @@ app.delete('/summaries/:id', (req, res) => {
|
|||||||
/********** Start Server ********** */
|
/********** Start Server ********** */
|
||||||
/********************************** */
|
/********************************** */
|
||||||
app.listen(PORT, () => {
|
app.listen(PORT, () => {
|
||||||
console.log(`Memory Service is running on port ${PORT}`);
|
logger.info(`Memory Service is running on port ${PORT}`);
|
||||||
});
|
});
|
||||||
@@ -1,5 +1,5 @@
|
|||||||
const {QdrantClient} = require('@qdrant/js-client-rest');
|
const {QdrantClient} = require('@qdrant/js-client-rest');
|
||||||
const {QDRANT, COLLECTIONS, getEnv} = require('@nexusai/shared');
|
const {QDRANT, COLLECTIONS, getEnv, logger} = require('@nexusai/shared');
|
||||||
|
|
||||||
let client;
|
let client;
|
||||||
|
|
||||||
@@ -24,9 +24,9 @@ async function initCollections() {
|
|||||||
distance: QDRANT.DISTANCE_METRIC
|
distance: QDRANT.DISTANCE_METRIC
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
console.log(`Created Qdrant collection: ${name}`);
|
logger.info(`Created Qdrant collection: ${name}`);
|
||||||
} else {
|
} else {
|
||||||
console.log(`Qdrant collection already exists: ${name}`);
|
logger.info(`Qdrant collection already exists: ${name}`);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|||||||
142
packages/memory-service/src/summarization/project.js
Normal file
@@ -0,0 +1,142 @@
|
|||||||
|
const { SERVICES, getEnv, SUMMARIES } = require('@nexusai/shared');
|
||||||
|
const {
|
||||||
|
getSessionSummariesForProject,
|
||||||
|
getProjectOverviewSummary,
|
||||||
|
createSummary,
|
||||||
|
updateSummary,
|
||||||
|
|
||||||
|
} = require('../db/summaries');
|
||||||
|
const { getEpisodesByProject } = require('../episodic');
|
||||||
|
const { getProject } = require('../db/projects');
|
||||||
|
|
||||||
|
const EXTRACTION_URL = getEnv('EXTRACTION_URL', 'http://localhost:11434');
|
||||||
|
const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b');
|
||||||
|
|
||||||
|
const MAX_SUMMARY_CHARS = SUMMARIES.MAX_SUMMARY_CHARS; // generous ceiling before we truncate input
|
||||||
|
|
||||||
|
function buildProjectSummaryPrompt(projectName, sessionSummaries) {
|
||||||
|
let summaryBlock = sessionSummaries
|
||||||
|
.map((s, i) => `Session ${i + 1}:\n${s.content}`)
|
||||||
|
.join('\n\n');
|
||||||
|
|
||||||
|
// Guard against very large inputs — truncate oldest sessions if needed
|
||||||
|
if (summaryBlock.length > MAX_SUMMARY_CHARS) {
|
||||||
|
summaryBlock = summaryBlock.slice(-MAX_SUMMARY_CHARS);
|
||||||
|
}
|
||||||
|
|
||||||
|
return [
|
||||||
|
'<|im_start|>user',
|
||||||
|
`The following are session summaries from a project called "${projectName}".`,
|
||||||
|
'Write a project overview covering: goals, progress, key decisions, and current state.',
|
||||||
|
'Scale the length to the material — use multiple paragraphs for complex projects, a few sentences for simple ones.',
|
||||||
|
'Be comprehensive but avoid padding. Do not repeat the same point twice.',
|
||||||
|
'Write in third person. Output only the overview text, no headings or labels.',
|
||||||
|
'',
summaryBlock,
'<|im_end|>',
'<|im_start|>assistant',
|
||||||
|
].join('\n');
|
||||||
|
}
|
||||||
|
|
||||||
|
function buildProjectSummaryFromEpisodesPrompt(projectName, episodes) {
|
||||||
|
// Condense episodes into a readable block, truncating if needed
|
||||||
|
let episodeBlock = episodes
|
||||||
|
.map(ep => `User: ${ep.user_message}\nAssistant: ${ep.ai_response}`)
|
||||||
|
.join('\n\n');
|
||||||
|
|
||||||
|
if (episodeBlock.length > MAX_SUMMARY_CHARS) {
|
||||||
|
// Keep the most recent episodes — slice from the end
|
||||||
|
episodeBlock = episodeBlock.slice(-MAX_SUMMARY_CHARS);
|
||||||
|
}
|
||||||
|
|
||||||
|
return [
|
||||||
|
'<|im_start|>user',
|
||||||
|
`The following are conversations from a project called "${projectName}".`,
|
||||||
|
'Write a project overview covering: goals, progress, key decisions, and current state.',
|
||||||
|
'Scale the length to the material — use multiple paragraphs for complex projects, a few sentences for simple ones.',
|
||||||
|
'Be comprehensive but avoid padding. Do not repeat the same point twice.',
|
||||||
|
'Write in third person. Output only the overview text, no headings or labels.',
|
||||||
|
'',
|
||||||
|
episodeBlock,
|
||||||
|
'<|im_end|>',
|
||||||
|
'<|im_start|>assistant',
|
||||||
|
].join('\n');
|
||||||
|
}
|
||||||
|
|
||||||
|
async function generateProjectSummaryFromEpisodes(projectName, episodes) {
|
||||||
|
const prompt = buildProjectSummaryFromEpisodesPrompt(projectName, episodes);
|
||||||
|
|
||||||
|
const res = await fetch(`${EXTRACTION_URL}/api/generate`, {
|
||||||
|
method: 'POST',
|
||||||
|
headers: { 'Content-Type': 'application/json' },
|
||||||
|
body: JSON.stringify({
|
||||||
|
model: EXTRACTION_MODEL,
|
||||||
|
prompt,
|
||||||
|
stream: false,
|
||||||
|
options: { temperature: 0.2, num_predict: 1200 },
|
||||||
|
}),
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!res.ok) throw new Error(`Ollama responded ${res.status}`);
|
||||||
|
const data = await res.json();
|
||||||
|
|
||||||
|
const raw = data.response?.trim() ?? '';
|
||||||
|
return raw
|
||||||
|
.replace(/<\|im_start\|>.*?<\|im_end\|>/gs, '')
|
||||||
|
.replace(/<\|im_start\|>|<\|im_end\|>|<\|im_sep\|>/g, '')
|
||||||
|
.trim();
|
||||||
|
}
|
||||||
|
|
||||||
|
async function generateProjectSummary(projectName, sessionSummaries) {
|
||||||
|
const prompt = buildProjectSummaryPrompt(projectName, sessionSummaries);
|
||||||
|
|
||||||
|
const res = await fetch(`${EXTRACTION_URL}/api/generate`, {
|
||||||
|
method: 'POST',
|
||||||
|
headers: { 'Content-Type': 'application/json' },
|
||||||
|
body: JSON.stringify({
|
||||||
|
model: EXTRACTION_MODEL,
|
||||||
|
prompt,
|
||||||
|
stream: false,
|
||||||
|
// No format: 'json' — we want free-text narrative, same as session summarization
|
||||||
|
options: { temperature: 0.2, num_predict: 1200 },
|
||||||
|
}),
|
||||||
|
});
|
||||||
|
|
||||||
|
if (!res.ok) throw new Error(`Ollama responded ${res.status}`);
|
||||||
|
const data = await res.json();
|
||||||
|
|
||||||
|
const raw = data.response?.trim() ?? '';
|
||||||
|
return raw
|
||||||
|
.replace(/<\|im_start\|>.*?<\|im_end\|>/gs, '')
|
||||||
|
.replace(/<\|im_start\|>|<\|im_end\|>|<\|im_sep\|>/g, '')
|
||||||
|
.trim();
|
||||||
|
}
|
||||||
|
|
||||||
|
// Main entry point — called by the route handler
|
||||||
|
async function generateAndStoreProjectSummary(projectId) {
|
||||||
|
const project = getProject(projectId);
|
||||||
|
if (!project) throw new Error('Project not found');
|
||||||
|
|
||||||
|
let content;
|
||||||
|
const sessionSummaries = getSessionSummariesForProject(projectId);
|
||||||
|
|
||||||
|
if (sessionSummaries.length > 0) {
|
||||||
|
// Preferred path — summarize the summaries
|
||||||
|
content = await generateProjectSummary(project.name, sessionSummaries);
|
||||||
|
} else {
|
||||||
|
// Fallback — summarize raw episodes directly
|
||||||
|
const episodes = getEpisodesByProject(projectId);
|
||||||
|
if (!episodes.length) {
|
||||||
|
throw new Error('No session summaries or episodes found for this project');
|
||||||
|
}
|
||||||
|
content = await generateProjectSummaryFromEpisodes(project.name, episodes);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (!content) throw new Error('Model returned empty summary');
|
||||||
|
|
||||||
|
const existing = getProjectOverviewSummary(projectId);
|
||||||
|
if (existing) {
|
||||||
|
return updateSummary(existing.id, { content, tokenCount: null, episodeRange: null });
|
||||||
|
} else {
|
||||||
|
return createSummary({ projectId, content, sessionId: null });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
module.exports = { generateAndStoreProjectSummary };
|
||||||
156
packages/orchestration-service/CLAUDE.md
Normal file
@@ -0,0 +1,156 @@
|
|||||||
|
# CLAUDE.md
|
||||||
|
|
||||||
|
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
||||||
|
|
||||||
|
See the root [CLAUDE.md](../../CLAUDE.md) for overall architecture, service roles, and the end-to-end chat flow.
|
||||||
|
|
||||||
|
## Running This Service
|
||||||
|
|
||||||
|
```bash
|
||||||
|
npm run orchestration # From repo root (node src/index.js)
|
||||||
|
npm -w packages/orchestration-service run dev # With --watch
|
||||||
|
```
|
||||||
|
|
||||||
|
Default port: **4000**. Depends on memory-service, embedding-service, inference-service, and Qdrant.
|
||||||
|
|
||||||
|
## Context Assembly (`src/chat/index.js`)
|
||||||
|
|
||||||
|
`assembleContext(externalId, userMessage)` is the core function that builds the inference prompt. Order of operations:
|
||||||
|
|
||||||
|
1. Resolve session by `externalId` (creates it if missing — every chat call is self-healing).
|
||||||
|
2. If session has a `project_id`, load the project and fetch all sibling sessions (via `getProjectSessions`, hardcoded `limit=200`).
|
||||||
|
3. Fetch `recentEpisodeLimit` recent episodes from memory-service.
|
||||||
|
4. Embed the user message; search Qdrant EPISODES with `scoreThreshold`:
|
||||||
|
- No project: `must: [sessionId == this session]`
|
||||||
|
- Project: `should: [sessionId == s1, sessionId == s2, ...]` across all project sessions
|
||||||
|
- Dedup against recent episode IDs before including.
|
||||||
|
5. Run **fused episode retrieval** via `getFusedEpisodes` — Qdrant semantic search and FTS5 keyword search run in parallel, both filtered against `recentIds`, then merged via Reciprocal Rank Fusion (RRF). If `keywordWeight` is `0`, the FTS call is skipped. Returns top `semanticLimit` episodes by fused score.
|
||||||
|
6. Embed and search Qdrant ENTITIES (filtered by `projectId` if in a project). Returns entity IDs alongside payload — the Qdrant point ID equals the SQLite entity ID.
|
||||||
|
7. Expand matched entities into a 1-hop graph neighborhood via `POST /graph/neighbors` on the memory-service. Returns `{ nodes, edges }` — the full entity objects plus connecting relationships. Falls back to flat entity list (no edges) if the graph call fails.
|
||||||
|
8. Build prompt in this fixed order: **system prompt → graph context → fused episodes → recent episodes → user message → "Assistant:"**
|
||||||
|
|
||||||
|
The ordering prioritizes established facts (graph context) and relevant past context (semantic) over pure recency.
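
The session scoping and dedup in step 4 can be sketched roughly as follows. This is a minimal illustration, not the code in `src/services/qdrant.js`: the helper names `buildEpisodeFilter` and `dedupeAgainstRecent` are hypothetical, and the payload key `sessionId` is taken from the step description above.

```javascript
// Hypothetical helper: builds the Qdrant payload filter for episode search.
// The payload key `sessionId` follows the wording of step 4; the real
// implementation may name or structure this differently.
function buildEpisodeFilter(sessionId, projectSessionIds) {
  if (Array.isArray(projectSessionIds) && projectSessionIds.length > 0) {
    // In a project: match any sibling session.
    return {
      should: projectSessionIds.map((id) => ({ key: 'sessionId', match: { value: id } })),
    };
  }
  // No project: restrict the search to the current session only.
  return { must: [{ key: 'sessionId', match: { value: sessionId } }] };
}

// Hypothetical helper: drops hits that are already in the recent-episode window.
function dedupeAgainstRecent(hits, recentIds) {
  return hits.filter((hit) => !recentIds.has(hit.id));
}
```

Using `should` for the project case means a point matches if it belongs to any of the listed sessions, which is what the "across all project sessions" wording suggests.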
|
||||||
|
|
||||||
|
## Graph Context Format
|
||||||
|
|
||||||
|
`formatGraphContext(nodes, edges)` in `src/chat/index.js` formats the neighborhood as:
|
||||||
|
|
||||||
|
```
|
||||||
|
- Alice (person): software engineer working on NexusAI
  → works_on NexusAI (project)
  → knows Bob (person)
- NexusAI (project): AI assistant framework
- Bob (person): Alice's colleague
|
||||||
|
```
|
||||||
|
|
||||||
|
Each node shows its notes on the first line. Outbound edges are indented below with `→ label target (type)`. Nodes with only inbound edges (neighbors pulled in by traversal) appear without connection lines.
|
||||||
|
|
||||||
|
## System Prompt Resolution
|
||||||
|
|
||||||
|
Priority from highest to lowest:
|
||||||
|
1. `project.system_prompt` (stored on the project row in memory-service)
|
||||||
|
2. `settings.systemPrompt` (saved in `data/settings.json`)
|
||||||
|
3. `ORCHESTRATION.SYSTEM_PROMPT` (shared constants fallback)
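
A minimal sketch of this priority chain, assuming nullish fallthrough. The function name `resolveSystemPrompt` and the placeholder default string are illustrative only; the real default lives in `ORCHESTRATION.SYSTEM_PROMPT`.

```javascript
// Placeholder standing in for ORCHESTRATION.SYSTEM_PROMPT (the real text is not shown here).
const DEFAULT_SYSTEM_PROMPT = 'You are a helpful assistant.';

// Hypothetical resolver mirroring the three-level priority above.
// `??` falls through only on null/undefined, so an empty string would win.
// Whether the real code treats '' as unset is not stated in this document.
function resolveSystemPrompt(project, settings, fallback = DEFAULT_SYSTEM_PROMPT) {
  return project?.system_prompt ?? settings?.systemPrompt ?? fallback;
}
```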
|
||||||
|
|
||||||
|
## Settings (`src/config/settings.js`)
|
||||||
|
|
||||||
|
Settings are loaded from `data/settings.json` merged with defaults at every `GET /settings` call. `PATCH /settings` validates each field individually with specific constraints:
|
||||||
|
|
||||||
|
| Field | Constraint |
|
||||||
|
|---|---|
|
||||||
|
| `recentEpisodeLimit` | integer, 1–20 |
|
||||||
|
| `semanticLimit` | integer, 1–20 |
|
||||||
|
| `scoreThreshold` | number, 0–1 |
|
||||||
|
| `temperature` | number, 0–2 |
|
||||||
|
| `repeatPenalty` | number, 1–2 |
|
||||||
|
| `topP` | number, 0–1 |
|
||||||
|
| `topK` | integer, 1–100 |
|
||||||
|
| `modelsFolderPath` | path must exist and be readable |
|
||||||
|
| `systemPrompt` | string (trimmed); `null` reverts to shared default |
|
||||||
|
|
||||||
|
`data/settings.json` is created on first save. Parent directories are created if missing.
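
The numeric constraints in the table could be enforced along these lines. This is a sketch under assumptions: `validateSettingsPatch` and the validator helpers are hypothetical names, and `modelsFolderPath` and `systemPrompt` are left out because they need filesystem and string handling not shown here.

```javascript
// Hypothetical range checkers; the bounds are copied from the table above.
const isIntIn = (min, max) => (v) => Number.isInteger(v) && v >= min && v <= max;
const isNumIn = (min, max) => (v) =>
  typeof v === 'number' && Number.isFinite(v) && v >= min && v <= max;

const VALIDATORS = {
  recentEpisodeLimit: isIntIn(1, 20),
  semanticLimit: isIntIn(1, 20),
  scoreThreshold: isNumIn(0, 1),
  temperature: isNumIn(0, 2),
  repeatPenalty: isNumIn(1, 2),
  topP: isNumIn(0, 1),
  topK: isIntIn(1, 100),
};

// Validates each field on its own. Returns the accepted fields in `valid`
// and one message per rejected field in `errors`.
function validateSettingsPatch(patch) {
  const valid = {};
  const errors = [];
  for (const [key, value] of Object.entries(patch)) {
    const check = VALIDATORS[key];
    if (!check) continue; // fields outside this sketch are ignored
    if (check(value)) valid[key] = value;
    else errors.push(`${key} is out of range or has the wrong type`);
  }
  return { valid, errors };
}
```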
|
||||||
|
|
||||||
|
## Streaming SSE (`src/chat/index.js` — `chatStream`)
|
||||||
|
|
||||||
|
The route sets SSE headers and delegates to `chatStream`, which:
|
||||||
|
1. Calls `inference.completeStream()` → receives a raw HTTP Response with a readable body.
|
||||||
|
2. Reads the body in chunks, buffers across chunk boundaries, splits on `\n\n`.
|
||||||
|
3. For each event line starting with `data: `, parses the JSON and calls `onChunk(data.response)`.
|
||||||
|
4. The `[DONE]` sentinel (used by some llama-server versions) is explicitly ignored.
|
||||||
|
5. After stream ends, saves the assembled full response as an episode (same as non-streaming).
|
||||||
|
|
||||||
|
If a chunk parse fails the error is logged and the stream continues. If the response body closes with no text accumulated, the episode is not saved (logged as warning).
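
The buffering and parsing in steps 2 to 4 might look roughly like the sketch below. `drainSSEBuffer` is a hypothetical helper, not the actual `chatStream` code; it assumes each event carries a JSON object with a `response` string, as described above.

```javascript
// Hypothetical helper: splits buffered SSE text into complete events plus a remainder.
// Returns the text pieces found in `data: ` lines and the unparsed tail, which the
// caller would prepend to the next network chunk.
function drainSSEBuffer(buffer) {
  const events = buffer.split('\n\n');
  const rest = events.pop(); // the last piece may be an incomplete event
  const chunks = [];
  for (const event of events) {
    for (const line of event.split('\n')) {
      if (!line.startsWith('data: ')) continue;
      const payload = line.slice('data: '.length);
      if (payload === '[DONE]') continue; // sentinel, not JSON
      try {
        const data = JSON.parse(payload);
        if (typeof data.response === 'string') chunks.push(data.response);
      } catch (err) {
        // A malformed chunk is logged and skipped; the stream keeps going.
        console.warn('SSE chunk parse failed:', err.message);
      }
    }
  }
  return { chunks, rest };
}
```

Keeping the tail in `rest` is what allows an event split across two network reads to be parsed once the second half arrives.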
|
||||||
|
|
||||||
|
## Fire-and-Forget Tasks
|
||||||
|
|
||||||
|
After every successful chat turn:
|
||||||
|
- **Summarization** (`services/summarization.js` → `triggerSummary`): checks token threshold → recency guard → calls Ollama → POSTs to memory-service. Only runs if `SUMMARIES.THRESHOLD_TOKENS` is exceeded AND at least `SUMMARIES.MIN_EPISODES_SINCE` new episodes have occurred since the last summary.
|
||||||
|
- **Auto-naming** (`chat/index.js` → `autoNameSession`): only fires on the first message of a session. Uses temperature 0.3 and `maxTokens=20`, and prompts for a title of at most five words.
|
||||||
|
|
||||||
|
Both tasks catch all errors and log warnings without surfacing to the client.
|
||||||
|
|
||||||
|
## Summarization Recency Guard
|
||||||
|
|
||||||
|
`src/services/summarization.js` reads the `episode_range` field of the latest existing summary (format: `"<startId>-<endId>"`). It counts SQLite episodes with `id > endId`; if fewer than `SUMMARIES.MIN_EPISODES_SINCE`, it skips. This prevents rapid re-summarization on high-traffic sessions.
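
The guard can be sketched as below. The names are hypothetical, the `episodeIds` array stands in for the SQLite count query, and the choice to not skip when no previous range exists is an assumption rather than documented behavior.

```javascript
// Parses "<startId>-<endId>" into numbers; returns null for missing or malformed input.
function parseEpisodeRange(range) {
  const match = /^(\d+)-(\d+)$/.exec(range ?? '');
  return match ? { startId: Number(match[1]), endId: Number(match[2]) } : null;
}

// Hypothetical guard: true when too few episodes have arrived since the last summary.
// `minEpisodesSince` plays the role of SUMMARIES.MIN_EPISODES_SINCE.
function shouldSkipSummary(latestRange, episodeIds, minEpisodesSince) {
  const parsed = parseEpisodeRange(latestRange);
  if (!parsed) return false; // assumption: no readable prior range means nothing to guard against
  const newer = episodeIds.filter((id) => id > parsed.endId).length;
  return newer < minEpisodesSince;
}
```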
|
||||||
|
|
||||||
|
When the existing summary's token count exceeds `SUMMARIES.MAX_SUMMARY_TOKENS`, it is treated as "expired" — a fresh summary is generated instead of an incremental update.
|
||||||
|
|
||||||
|
## Qdrant Calls (Direct, Not Via Memory-Service)
|
||||||
|
|
||||||
|
`src/services/qdrant.js` makes REST calls to Qdrant directly at `QDRANT_URL`. This bypasses memory-service for semantic search performance. Orchestration fetches episode/entity content from memory-service by ID *after* getting vector search results from Qdrant.
|
||||||
|
|
||||||
|
`searchEntities` checks `projectId !== null && projectId !== undefined` before applying the filter — a session with no project skips the filter entirely and searches globally.
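
A small sketch of why the explicit null/undefined check matters. `buildEntityFilter` is a hypothetical name and the payload key `projectId` is an assumption; only the comparison itself comes from the description above.

```javascript
// Hypothetical filter builder using the explicit check described above.
// A plain truthiness test (`if (projectId)`) would wrongly skip the filter
// for a project whose ID is 0.
function buildEntityFilter(projectId) {
  if (projectId !== null && projectId !== undefined) {
    return { must: [{ key: 'projectId', match: { value: projectId } }] };
  }
  return undefined; // no filter: search globally
}
```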
|
||||||
|
|
||||||
|
## Retrieval Fusion (`src/chat/index.js`)
|
||||||
|
|
||||||
|
Three functions handle fusion — all pure or lightly async, all non-critical:
|
||||||
|
|
||||||
|
- **`getFTSResults(userMessage, { limit, sessionIds })`** — calls `memory.searchEpisodes`; returns `[]` and logs a warning on failure
|
||||||
|
- **`fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit })`** — pure RRF implementation. Key guard: FTS-only episodes are only added to the scores Map if `contrib > 0` (prevents score-0 bleed-through when `keywordWeight: 0`)
|
||||||
|
- **`getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, settings)`** — orchestrates both paths in `Promise.all`, applies `recentIds` filter to FTS results, calls fusion. Short-circuits FTS call entirely if `keywordWeight === 0`
|
||||||
|
|
||||||
|
FTS is scoped to `projectSessionIds` if in a project, otherwise `[session.id]` — mirrors Qdrant scoping exactly.
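
As a worked illustration of the weighted RRF behavior described above, the sketch below fuses two tiny ranked lists. `rrfFuse` is a standalone stand-in for `fuseEpisodeResults`, and `RRF_K = 60` is an assumed value for `RETRIEVAL.RRF_K`, which this document does not state.

```javascript
// Assumed stand-in for RETRIEVAL.RRF_K; 60 is a commonly used constant.
const RRF_K = 60;

// Weighted Reciprocal Rank Fusion over two ranked lists of { id } objects.
function rrfFuse(semantic, keyword, { semanticWeight, keywordWeight, limit }) {
  const scores = new Map();
  semantic.forEach((ep, rank) => {
    scores.set(ep.id, { episode: ep, score: semanticWeight / (RRF_K + rank + 1) });
  });
  keyword.forEach((ep, rank) => {
    const contrib = keywordWeight / (RRF_K + rank + 1);
    if (scores.has(ep.id)) {
      scores.get(ep.id).score += contrib;
    } else if (contrib > 0) {
      // Keyword-only hits enter the pool only when they carry a non-zero score.
      scores.set(ep.id, { episode: ep, score: contrib });
    }
  });
  return [...scores.values()].sort((a, b) => b.score - a.score).slice(0, limit);
}

const semanticHits = [{ id: 'a' }, { id: 'b' }];
const keywordHits = [{ id: 'b' }, { id: 'c' }];
const fused = rrfFuse(semanticHits, keywordHits, { semanticWeight: 1, keywordWeight: 1, limit: 3 });
// 'b' ranks first because it scores in both lists: 1/62 + 1/61.
console.log(fused.map((r) => r.episode.id)); // → [ 'b', 'a', 'c' ]
```

With `keywordWeight: 0` the same inputs yield only `a` and `b`, since `c` appears solely in the keyword list and contributes nothing.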
|
||||||
|
|
||||||
|
> For RRF formula, weight semantics, and enabling keyword search, see `docs/services/retrieval-fusion.md`.
|
||||||
|
|
||||||
|
## Graph Service Client (`src/services/graph.js`)
|
||||||
|
|
||||||
|
Thin HTTP client for memory-service graph endpoints. One function:
|
||||||
|
|
||||||
|
- **`getNeighbors(entityIds[])`** — POSTs to `memory-service/graph/neighbors` with the entity IDs from Qdrant entity search. Returns `{ nodes, edges }`. Throws on non-2xx — caller wraps in try/catch with graceful fallback.
|
||||||
|
|
||||||
|
## Models Endpoint
|
||||||
|
|
||||||
|
`GET /models` scans `modelsFolderPath` for `.gguf` files and optionally reads a `models.json` manifest (keyed by filename) for labels and descriptions. File size is reported in GB. Returns 500 if the folder is inaccessible.
|
||||||
|
|
||||||
|
`GET /models/props` proxies `/props` from llama-server and returns `{contextWindow, modelAlias}`. Returns 503 if llama-server is unreachable.
|
||||||
|
|
||||||
|
## Health Check
|
||||||
|
|
||||||
|
`GET /health/services` runs parallel fetch calls to all four dependent services with a 3-second `AbortSignal.timeout` each. Results are returned as an array — the endpoint never returns a non-2xx itself regardless of downstream status.
|
||||||
|
|
||||||
|
## Background Model (qwen2.5:3b)
|
||||||
|
Used for entity/relationship extraction and summarization via Ollama on Mini PC 1. Uses **ChatML format** (`<|im_start|>` / `<|im_end|>`) — not Phi3 format. Use `format: 'json'` only for structured extraction, never for free-text summarization.
|
||||||
|
|
||||||
|
## API Endpoints Quick Reference
|
||||||
|
|
||||||
|
| Method | Path | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| GET | `/health` | Returns service URLs |
|
||||||
|
| GET | `/health/services` | Parallel status of all dependencies |
|
||||||
|
| POST | `/chat` | Blocking completion |
|
||||||
|
| POST | `/chat/stream` | SSE streaming |
|
||||||
|
| GET/PATCH | `/settings` | Persistent settings |
|
||||||
|
| GET | `/models` | `.gguf` file scan |
|
||||||
|
| GET | `/models/props` | llama-server model info |
|
||||||
|
| GET | `/sessions` | Delegates to memory-service |
|
||||||
|
| GET | `/sessions/:sessionId/history` | Paginated episodes by external ID |
|
||||||
|
| PATCH | `/sessions/:sessionId` | `name` and/or `projectId` |
|
||||||
|
| DELETE | `/sessions/:sessionId` | |
|
||||||
|
| GET | `/episodes` | Delegates; supports `q` for FTS |
|
||||||
|
| DELETE | `/episodes/:id` | Delegates |
|
||||||
|
| GET/POST/PATCH/DELETE | `/projects` and `/projects/:id` | Delegates |
|
||||||
|
| POST | `/summaries/project/:projectId/generate` | On-demand; 422 if no data |
|
||||||
|
| GET | `/summaries/project/:projectId/overview` | |
|
||||||
|
| GET | `/summaries/session/:sessionId` | Resolves external ID first |
|
||||||
|
| GET | `/summaries/project/:projectId` | |
|
||||||
@@ -2,34 +2,32 @@ const memory = require("../services/memory");
|
|||||||
const inference = require("../services/inference");
|
const inference = require("../services/inference");
|
||||||
const embedding = require("../services/embedding");
|
const embedding = require("../services/embedding");
|
||||||
const qdrant = require("../services/qdrant");
|
const qdrant = require("../services/qdrant");
|
||||||
const { ORCHESTRATION } = require("@nexusai/shared");
|
const { ORCHESTRATION, RETRIEVAL, logger } = require("@nexusai/shared");
|
||||||
const appSettings = require("../config/settings");
|
const appSettings = require("../config/settings");
|
||||||
const {triggerSummary} = require('../services/summarization')
|
const {triggerSummary} = require('../services/summarization')
|
||||||
|
const graph = require('../services/graph');
|
||||||
|
|
||||||
function buildPrompt(recentEpisodes, semanticEpisodes, entities, userMessage, systemPrompt) {
|
function buildPrompt(guaranteed, selected, neighborhood, userMessage, systemPrompt) {
|
||||||
const parts = [systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT];
|
const parts = [systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT];
|
||||||
|
|
||||||
if (entities.length > 0) {
|
const graphText = formatGraphContext(neighborhood.nodes ?? [], neighborhood.edges ?? []);
|
||||||
parts.push(
|
if (graphText) {
|
||||||
"Here is what you know about entities relevant to this conversation:",
|
parts.push("Here is what you know about entities relevant to this conversation and their connections:");
|
||||||
);
|
parts.push(graphText);
|
||||||
for (const e of entities) {
|
|
||||||
parts.push(`- ${e.name} (${e.type}): ${e.notes}`);
|
|
||||||
}
|
|
||||||
parts.push("---");
|
parts.push("---");
|
||||||
}
|
}
|
||||||
|
|
||||||
if (semanticEpisodes.length > 0) {
|
if (selected.length > 0) {
|
||||||
parts.push("Here are some relevant memories from earlier conversations:");
|
parts.push("Relevant memories from earlier conversations:");
|
||||||
for (const ep of semanticEpisodes) {
|
for (const ep of selected) {
|
||||||
parts.push(`User: ${ep.user_message}\nAssistant: ${ep.ai_response}`);
|
parts.push(`User: ${ep.user_message}\nAssistant: ${ep.ai_response}`);
|
||||||
}
|
}
|
||||||
parts.push("---");
|
parts.push("---");
|
||||||
}
|
}
|
||||||
|
|
||||||
if (recentEpisodes.length > 0) {
|
if (guaranteed.length > 0) {
|
||||||
parts.push(`Here are some relevant memories from your past conversations:`);
|
parts.push("Recent conversation history (most recent exchanges):");
|
||||||
for (const ep of recentEpisodes) {
|
for (const ep of guaranteed) {
|
||||||
parts.push(`User: ${ep.user_message}\nAssistant: ${ep.ai_response}`);
|
parts.push(`User: ${ep.user_message}\nAssistant: ${ep.ai_response}`);
|
||||||
}
|
}
|
||||||
parts.push("--- End of recent memories ---\n");
|
parts.push("--- End of recent memories ---\n");
|
||||||
@@ -54,6 +52,28 @@ function buildNamingPrompt(userMessage, aiResponse) {
|
|||||||
].join("\n");
|
].join("\n");
|
||||||
}
|
}
|
||||||
|
|
||||||
|
function formatGraphContext(nodes, edges) {
|
||||||
|
if (!nodes.length) return null;
|
||||||
|
|
||||||
|
const nodeMap = new Map(nodes.map(n => [n.id, n]));
|
||||||
|
|
||||||
|
// Build outbound adjacency
|
||||||
|
const outbound = new Map(nodes.map(n => [n.id, []]));
|
||||||
|
for (const edge of edges) {
|
||||||
|
if (outbound.has(edge.from_id) && nodeMap.has(edge.to_id)) {
|
||||||
|
const target = nodeMap.get(edge.to_id);
|
||||||
|
outbound.get(edge.from_id).push(`${edge.label} ${target.name} (${target.type})`);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
return nodes.map(n => {
|
||||||
|
const lines = [`- ${n.name} (${n.type}): ${n.notes ?? '(no notes)'}`];
|
||||||
|
for (const conn of outbound.get(n.id) ?? []) lines.push(` → ${conn}`);
|
||||||
|
return lines.join('\n');
|
||||||
|
}).join('\n');
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
async function autoNameSession(externalId, userMessage, aiResponse) {
|
async function autoNameSession(externalId, userMessage, aiResponse) {
|
||||||
try {
|
try {
|
||||||
const prompt = buildNamingPrompt(userMessage, aiResponse);
|
const prompt = buildNamingPrompt(userMessage, aiResponse);
|
||||||
@@ -64,12 +84,12 @@ async function autoNameSession(externalId, userMessage, aiResponse) {
|
|||||||
const name = result.text?.trim().replace(/^["']|["']$/g, ""); // strip any quotes the model adds
|
const name = result.text?.trim().replace(/^["']|["']$/g, ""); // strip any quotes the model adds
|
||||||
if (name) {
|
if (name) {
|
||||||
await memory.updateSession(externalId, { name });
|
await memory.updateSession(externalId, { name });
|
||||||
console.log(
|
logger.info(
|
||||||
`[orchestration] Auto-named session "${externalId}": "${name}"`,
|
`[orchestration] Auto-named session "${externalId}": "${name}"`,
|
||||||
);
|
);
|
||||||
}
|
}
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
console.warn(
|
logger.warn(
|
||||||
"[orchestration] Auto-naming failed (non-critical):",
|
"[orchestration] Auto-naming failed (non-critical):",
|
||||||
err.message,
|
err.message,
|
||||||
);
|
);
|
||||||
@@ -99,7 +119,7 @@ async function getSemanticEpisodes(
|
|||||||
);
|
);
|
||||||
return fetched.filter(Boolean);
|
return fetched.filter(Boolean);
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
console.warn(
|
logger.warn(
|
||||||
`[orchestration] Semantic search failed, continuing without: `,
|
`[orchestration] Semantic search failed, continuing without: `,
|
||||||
err.message,
|
err.message,
|
||||||
);
|
);
|
||||||
@@ -111,27 +131,138 @@ async function getRelevantEntities(userMessage, projectId=null) {
|
|||||||
try {
|
try {
|
||||||
const vector = await embedding.embed(userMessage);
|
const vector = await embedding.embed(userMessage);
|
||||||
const results = await qdrant.searchEntities(vector, { projectId });
|
const results = await qdrant.searchEntities(vector, { projectId });
|
||||||
console.log(
|
logger.info(
|
||||||
"[orchestration] Entity search results:",
|
'[orchestration] Entity search results:',
|
||||||
results.map((r) => ({ name: r.payload?.name, score: r.score })),
|
results.map((r) => ({ name: r.payload?.name, score: r.score })),
|
||||||
);
|
);
|
||||||
return results.map((r) => r.payload).filter(Boolean);
|
// Include the Qdrant point ID (== SQLite entity ID) for graph traversal
|
||||||
|
return results.map((r) => r.payload ? { id: r.id, ...r.payload } : null).filter(Boolean);
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
console.warn(
|
logger.debug('[orchestration] Entity search failed, continuing without:', err.message);
|
||||||
"[orchestration] Entity search failed, continuing without:",
|
|
||||||
err.message,
|
|
||||||
);
|
|
||||||
return [];
|
return [];
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
async function chat(externalId, userMessage, options = {}) {
|
async function getFTSResults(userMessage, { limit, sessionIds }) {
|
||||||
const { recentEpisodeLimit, semanticLimit, scoreThreshold, temperature, repeatPenalty, topP, topK, systemPrompt} =
|
try {
|
||||||
appSettings.load();
|
return await memory.searchEpisodes(userMessage, { limit, sessionIds });
|
||||||
|
} catch (err) {
|
||||||
|
logger.warn('[orchestration] FTS search failed, continuing without:', err.message);
|
||||||
|
return [];
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Returns {episode, score}[] — scores needed for buildScoredPool downstream
|
||||||
|
function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
|
||||||
|
const k = RETRIEVAL.RRF_K;
|
||||||
|
const scores = new Map();
|
||||||
|
|
||||||
|
semanticEps.forEach((ep, i) => {
|
||||||
|
scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
|
||||||
|
});
|
||||||
|
|
||||||
|
keywordEps.forEach((ep, i) => {
|
||||||
|
const contrib = keywordWeight / (k + i + 1);
|
||||||
|
if (scores.has(ep.id)) {
|
||||||
|
scores.get(ep.id).score += contrib;
|
||||||
|
} else if (contrib > 0) {
|
||||||
|
scores.set(ep.id, { episode: ep, score: contrib });
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
return [...scores.values()]
|
||||||
|
.sort((a, b) => b.score - a.score)
|
||||||
|
.slice(0, limit);
|
||||||
|
|
||||||
|
}
|
||||||
|
|
||||||
|
function estimateTokens(episode) {
|
||||||
|
return episode.token_count
|
||||||
|
?? Math.ceil((episode.user_message.length + episode.ai_response.length) / 4);
|
||||||
|
}
|
||||||
|
|
||||||
|
function buildScoredPool(fusedWithScores, recentEpisodes, entityBoostedIds, { entityWeight }) {
|
||||||
|
const k = RETRIEVAL.RRF_K;
|
||||||
|
const pool = new Map(); // episode.id → {episode, score}
|
||||||
|
|
||||||
|
for (const { episode, score } of fusedWithScores) {
|
||||||
|
pool.set(episode.id, { episode, score });
|
||||||
|
}
|
||||||
|
|
||||||
|
recentEpisodes.forEach((ep, i) => {
|
||||||
|
const recencyScore = 1.0 / (k + i + 1);
|
||||||
|
if (pool.has(ep.id)) {
|
||||||
|
pool.get(ep.id).score += recencyScore;
|
||||||
|
} else {
|
||||||
|
pool.set(ep.id, { episode: ep, score: recencyScore });
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
for (const id of entityBoostedIds) {
|
||||||
|
if (pool.has(id)) pool.get(id).score += entityWeight;
|
||||||
|
}
|
||||||
|
|
||||||
|
return [...pool.values()].sort((a, b) => b.score - a.score);
|
||||||
|
}
|
||||||
|
|
||||||
|
function selectWithinBudget(scoredPool, contextBudget, minRecentEpisodes, recentEpisodes) {
|
||||||
|
let budget = contextBudget;
|
||||||
|
const sortByTime = (a, b) => a.created_at - b.created_at;
|
||||||
|
|
||||||
|
// Guarantee floor: always include the N most recent episodes
|
||||||
|
const guaranteed = recentEpisodes.slice(0, minRecentEpisodes);
|
||||||
|
const guaranteedIds = new Set(guaranteed.map(ep => ep.id));
|
||||||
|
for (const ep of guaranteed) budget -= estimateTokens(ep);
|
||||||
|
|
||||||
|
// Fill remaining budget from scored pool, highest-priority first
|
||||||
|
const selected = [];
|
||||||
|
for (const { episode } of scoredPool) {
|
||||||
|
if (guaranteedIds.has(episode.id)) continue;
|
||||||
|
const cost = estimateTokens(episode);
|
||||||
|
|
||||||
|
// Break rather than skip: lower-priority episodes aren't worth fitting over higher-priority ones
|
||||||
|
if (budget - cost < 0) break;
|
||||||
|
selected.push(episode);
|
||||||
|
budget -= cost;
|
||||||
|
}
|
||||||
|
|
||||||
|
return {
|
||||||
|
guaranteed: [...guaranteed].sort(sortByTime),
|
||||||
|
selected: selected.sort(sortByTime),
|
||||||
|
};
|
||||||
|
}
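A small sketch of the budget selection above, showing what "break rather than skip" means in practice. The function and variable names are hypothetical, and the token counts are made-up illustration values.

```javascript
// Same estimate as in the diff: stored count, else ~4 characters per token.
const estimateCost = ep =>
  ep.token_count ?? Math.ceil((ep.user_message.length + ep.ai_response.length) / 4);

function selectWithinBudgetSketch(scoredPool, contextBudget, guaranteed) {
  let budget = contextBudget;
  const guaranteedIds = new Set(guaranteed.map(ep => ep.id));

  // Guaranteed episodes are charged first, whatever they cost.
  for (const ep of guaranteed) budget -= estimateCost(ep);

  const selected = [];
  for (const { episode } of scoredPool) {
    if (guaranteedIds.has(episode.id)) continue;
    const cost = estimateCost(episode);
    if (budget - cost < 0) break; // stop at the first episode that does not fit
    selected.push(episode);
    budget -= cost;
  }
  return { selected, remaining: budget };
}

const picked = selectWithinBudgetSketch(
  [
    { episode: { id: 'A', token_count: 50 } },
    { episode: { id: 'B', token_count: 40 } }, // does not fit, so selection stops here
    { episode: { id: 'C', token_count: 10 } }, // would fit, but is never considered
  ],
  100,
  [{ id: 'R', token_count: 30 }],
);
console.log(picked.selected.map(ep => ep.id), picked.remaining);
```

The trade-off is deliberate: episode `C` would fit in the remaining 20 tokens, but the loop stops at `B` so a lower-ranked episode never displaces a higher-ranked one. One consequence worth keeping in mind is that the guaranteed floor can drive the budget negative, in which case nothing from the scored pool is selected.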
|
||||||
|
|
||||||
|
|
||||||
|
async function getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, settings) {
|
||||||
|
const { semanticLimit, scoreThreshold, semanticWeight, keywordWeight } = settings;
|
||||||
|
const ftsSessionIds = projectSessionIds ?? [session.id];
|
||||||
|
|
||||||
|
const ftsPromise = keywordWeight > 0
|
||||||
|
// FTS and semantic may have significant overlap, so fetching more from FTS gives the fusion step more to work with before deduplication.
|
||||||
|
? getFTSResults(userMessage, { limit: semanticLimit * 2, sessionIds: ftsSessionIds })
|
||||||
|
: Promise.resolve([]);
|
||||||
|
|
||||||
|
const [semanticEps, rawKeywordEps] = await Promise.all([
|
||||||
|
getSemanticEpisodes(userMessage, session.id, recentIds, projectSessionIds, { semanticLimit, scoreThreshold }),
|
||||||
|
ftsPromise,
|
||||||
|
]);
|
||||||
|
|
||||||
|
const keywordEps = rawKeywordEps.filter(ep => !recentIds.has(ep.id));
|
||||||
|
return fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit: semanticLimit });
|
||||||
|
}
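`fuseEpisodeResults` is not shown in this part of the diff. One plausible shape for it, given the `RRF_K` constant and the weight settings, is weighted reciprocal rank fusion. The sketch below is an assumption about that technique, not the repository's implementation; `fuseSketch` and `ASSUMED_K` are hypothetical names.

```javascript
// Assumed smoothing constant for reciprocal rank fusion.
const ASSUMED_K = 60;

function fuseSketch(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
  const scores = new Map(); // episode.id -> { episode, score }

  // Each list contributes weight / (k + rank), with rank starting at 1.
  const addList = (list, weight) => {
    list.forEach((episode, index) => {
      const entry = scores.get(episode.id) ?? { episode, score: 0 };
      entry.score += weight / (ASSUMED_K + index + 1);
      scores.set(episode.id, entry);
    });
  };

  addList(semanticEps, semanticWeight);
  addList(keywordEps, keywordWeight);

  // Duplicates were merged by id above; keep the top `limit` entries.
  return [...scores.values()].sort((a, b) => b.score - a.score).slice(0, limit);
}

const fused = fuseSketch(
  [{ id: 'a' }, { id: 'b' }],
  [{ id: 'b' }, { id: 'c' }],
  { semanticWeight: 1, keywordWeight: 1, limit: 2 },
);
console.log(fused.map(f => f.episode.id));
```

Episode `b` appears in both lists, so its two contributions add up and it outranks `a`. This also illustrates why the keyword query above fetches `semanticLimit * 2` rows: overlapping results collapse into one entry, so a larger keyword list leaves more distinct candidates after merging.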
|
||||||
|
|
||||||
|
async function assembleContext(externalId, userMessage) {
|
||||||
|
const settings = appSettings.load();
|
||||||
|
const { recentEpisodeLimit, semanticLimit, scoreThreshold,
|
||||||
|
temperature, repeatPenalty, topP, topK, systemPrompt,
|
||||||
|
semanticWeight, keywordWeight,
|
||||||
|
contextBudget, entityWeight, minRecentEpisodes } = settings;
|
||||||
|
|
||||||
// 1. Resolve or create session
|
// 1. Resolve or create session
|
||||||
let session = await memory.getSessionByExternalId(externalId);
|
let session = await memory.getSessionByExternalId(externalId);
|
||||||
if (!session) session = await memory.createSession(externalId);
|
if (!session) session = await memory.createSession(externalId);
|
||||||
|
|
||||||
|
// 2. Resolve project context
|
||||||
let projectSessionIds = null;
|
let projectSessionIds = null;
|
||||||
let activeSystemPrompt = systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT;
|
let activeSystemPrompt = systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT;
|
||||||
if (session.project_id) {
|
if (session.project_id) {
|
||||||
@@ -139,73 +270,85 @@ async function chat(externalId, userMessage, options = {}) {
|
|||||||
const project = await memory.getProject(session.project_id);
|
const project = await memory.getProject(session.project_id);
|
||||||
if (project) {
|
if (project) {
|
||||||
const projectSessions = await memory.getProjectSessions(session.project_id);
|
const projectSessions = await memory.getProjectSessions(session.project_id);
|
||||||
if (project?.system_prompt) activeSystemPrompt = project.system_prompt;
|
if (project.system_prompt) activeSystemPrompt = project.system_prompt;
|
||||||
projectSessionIds = projectSessions.map((s) => s.id);
|
projectSessionIds = projectSessions.map(s => s.id);
|
||||||
}
|
}
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
console.warn(
|
logger.warn('[orchestration] Failed to resolve project context:', err.message);
|
||||||
"[orchestration] Failed to resolve project context:",
|
|
||||||
err.message,
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
// 2. Fetch recent episodes for context
|
|
||||||
const recentEpisodes = await memory.getRecentEpisodes(
|
// 3. Fetch recent episodes
|
||||||
session.id,
|
const recentEpisodes = await memory.getRecentEpisodes(session.id, recentEpisodeLimit);
|
||||||
recentEpisodeLimit,
|
|
||||||
);
|
|
||||||
const isFirstMessage = recentEpisodes.length === 0;
|
const isFirstMessage = recentEpisodes.length === 0;
|
||||||
const recentIds = new Set(recentEpisodes.map((e) => e.id));
|
const recentIds = new Set(recentEpisodes.map(e => e.id));
|
||||||
|
|
||||||
// 3. Semantic Search
|
// 4. Fused retrieval + entity search in parallel (both are independent)
|
||||||
const semanticEpisodes = await getSemanticEpisodes(
|
const [fusedWithScores, entityResults] = await Promise.all([
|
||||||
userMessage,
|
getFusedEpisodes(userMessage, session, recentIds, projectSessionIds, { semanticLimit, scoreThreshold, semanticWeight, keywordWeight }),
|
||||||
session.id,
|
getRelevantEntities(userMessage, session.project_id ?? null),
|
||||||
recentIds,
|
]);
|
||||||
projectSessionIds,
|
|
||||||
{ semanticLimit, scoreThreshold },
|
|
||||||
);
|
|
||||||
|
|
||||||
// 3b. Entity Search
|
// 5. Entity-linked episode IDs for scoring bonus
|
||||||
const entities = await getRelevantEntities(userMessage, session.project_id ?? null);
|
const entityIds = entityResults.map(e => e.id);
|
||||||
|
let entityBoostedIds = new Set();
|
||||||
|
if (entityIds.length > 0) {
|
||||||
|
try {
|
||||||
|
const result = await memory.getEpisodesByEntities(entityIds);
|
||||||
|
entityBoostedIds = new Set(result.episodeIds);
|
||||||
|
} catch (err) {
|
||||||
|
logger.debug('[orchestration] Entity-episode lookup failed, skipping bonus:', err.message);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
// 4. Assemble prompt
|
// 6. Build unified scored pool and select within token budget
|
||||||
const prompt = buildPrompt(
|
const scoredPool = buildScoredPool(fusedWithScores, recentEpisodes, entityBoostedIds, { entityWeight });
|
||||||
recentEpisodes,
|
const { guaranteed, selected } = selectWithinBudget(scoredPool, contextBudget, minRecentEpisodes, recentEpisodes);
|
||||||
semanticEpisodes,
|
|
||||||
entities,
|
|
||||||
userMessage,
|
|
||||||
activeSystemPrompt,
|
|
||||||
);
|
|
||||||
|
|
||||||
// 5. Run inference
|
// 7. Graph neighborhood expansion
|
||||||
const result = await inference.complete(prompt, {...options, temperature, repeatPenalty, topP, topK});
|
let neighborhood = { nodes: [], edges: [] };
|
||||||
|
if (entityIds.length > 0) {
|
||||||
|
try {
|
||||||
|
neighborhood = await graph.getNeighbors(entityIds);
|
||||||
|
} catch (err) {
|
||||||
|
logger.warn('[orchestration] Graph neighborhood fetch failed, falling back to flat entities:', err.message);
|
||||||
|
neighborhood = { nodes: entityResults, edges: [] };
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
// 6. Write episode back to memory
|
// 8. Assemble prompt
|
||||||
memory
|
const prompt = buildPrompt(guaranteed, selected, neighborhood, userMessage, activeSystemPrompt);
|
||||||
.createEpisode(
|
|
||||||
session.id,
|
return {
|
||||||
userMessage,
|
session,
|
||||||
result.text,
|
prompt,
|
||||||
|
isFirstMessage,
|
||||||
|
inferenceOptions: { temperature, repeatPenalty, topP, topK },
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
async function chat(externalId, userMessage, options = {}) {
|
||||||
|
const { session, prompt, isFirstMessage, inferenceOptions } = await assembleContext(externalId, userMessage);
|
||||||
|
|
||||||
|
const result = await inference.complete(prompt, { ...options, ...inferenceOptions });
|
||||||
|
|
||||||
|
try {
|
||||||
|
await memory.createEpisode(
|
||||||
|
session.id, userMessage, result.text,
|
||||||
(result.evalCount || 0) + (result.promptEvalCount || 0),
|
(result.evalCount || 0) + (result.promptEvalCount || 0),
|
||||||
session.project_id ?? null,
|
session.project_id ?? null,
|
||||||
)
|
|
||||||
.catch((err) =>
|
|
||||||
console.error(`[orchestration] Failed to save episode`, err.message),
|
|
||||||
);
|
);
|
||||||
|
} catch (err) {
|
||||||
|
logger.error('[orchestration] Failed to save episode:', err.message);
|
||||||
|
}
|
||||||
|
|
||||||
// 7. Trigger summarization check (fire-and-forget)
|
|
||||||
// Pass full episodes list so summarization can sum tokens accurately
|
|
||||||
const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
|
const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
|
||||||
triggerSummary(session, allEpisodes);
|
triggerSummary(session, allEpisodes);
|
||||||
|
|
||||||
|
|
||||||
// 8. Auto-name on first message
|
|
||||||
if (isFirstMessage && !session.name) {
|
if (isFirstMessage && !session.name) {
|
||||||
autoNameSession(externalId, userMessage, result.text).catch(() => {}); // already logged inside autoNameSession
|
autoNameSession(externalId, userMessage, result.text).catch(() => {});
|
||||||
}
|
}
|
||||||
|
|
||||||
// 9. Return response
|
|
||||||
return {
|
return {
|
||||||
sessionId: externalId,
|
sessionId: externalId,
|
||||||
response: result.text,
|
response: result.text,
|
||||||
@@ -216,115 +359,44 @@ async function chat(externalId, userMessage, options = {}) {
|
|||||||
|
|
||||||
async function chatStream(externalId, userMessage, onChunk, options = {}) {
|
async function chatStream(externalId, userMessage, onChunk, options = {}) {
|
||||||
try {
|
try {
|
||||||
const { recentEpisodeLimit, semanticLimit, scoreThreshold, temperature, repeatPenalty, topP, topK, systemPrompt } = appSettings.load();
|
const { session, prompt, isFirstMessage, inferenceOptions } = await assembleContext(externalId, userMessage);
|
||||||
let session = await memory.getSessionByExternalId(externalId);
|
|
||||||
if (!session) session = await memory.createSession(externalId);
|
|
||||||
|
|
||||||
let projectSessionIds = null;
|
const res = await inference.completeStream(prompt, { ...options, ...inferenceOptions });
|
||||||
let activeSystemPrompt = systemPrompt ?? ORCHESTRATION.SYSTEM_PROMPT;
|
|
||||||
if (session.project_id) {
|
|
||||||
try {
|
|
||||||
const project = await memory.getProject(session.project_id);
|
|
||||||
if (project) {
|
|
||||||
const projectSessions = await memory.getProjectSessions(
|
|
||||||
session.project_id,
|
|
||||||
);
|
|
||||||
projectSessionIds = projectSessions.map((s) => s.id);
|
|
||||||
if (project?.system_prompt) activeSystemPrompt = project.system_prompt;
|
|
||||||
}
|
|
||||||
|
|
||||||
} catch (err) {
|
let fullText = '', model = '', tokenCount = 0, buffer = '';
|
||||||
console.warn(
|
|
||||||
"[orchestration] Failed to resolve project context:",
|
|
||||||
err.message,
|
|
||||||
);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
const recentEpisodes = await memory.getRecentEpisodes(
|
|
||||||
session.id,
|
|
||||||
recentEpisodeLimit,
|
|
||||||
);
|
|
||||||
const isFirstMessage = recentEpisodes.length === 0;
|
|
||||||
const recentIds = new Set(recentEpisodes.map((e) => e.id));
|
|
||||||
const semanticEpisodes = await getSemanticEpisodes(
|
|
||||||
userMessage,
|
|
||||||
session.id,
|
|
||||||
recentIds,
|
|
||||||
projectSessionIds,
|
|
||||||
{semanticLimit, scoreThreshold }
|
|
||||||
);
|
|
||||||
|
|
||||||
const entities = await getRelevantEntities(userMessage, session.project_id ?? null);
|
|
||||||
|
|
||||||
const prompt = buildPrompt(
|
|
||||||
recentEpisodes,
|
|
||||||
semanticEpisodes,
|
|
||||||
entities,
|
|
||||||
userMessage,
|
|
||||||
activeSystemPrompt,
|
|
||||||
);
|
|
||||||
const res = await inference.completeStream(prompt, {...options, temperature, repeatPenalty, topP, topK});
|
|
||||||
|
|
||||||
let fullText = "";
|
|
||||||
let model = "";
|
|
||||||
let tokenCount = 0;
|
|
||||||
let buffer = "";
|
|
||||||
|
|
||||||
for await (const chunk of res.body) {
|
for await (const chunk of res.body) {
|
||||||
buffer += Buffer.from(chunk).toString("utf8");
|
buffer += Buffer.from(chunk).toString('utf8');
|
||||||
|
const events = buffer.split('\n\n');
|
||||||
const events = buffer.split("\n\n");
|
buffer = events.pop() || '';
|
||||||
buffer = events.pop() || "";
|
|
||||||
|
|
||||||
for (const event of events) {
|
for (const event of events) {
|
||||||
const lines = event.split("\n");
|
const dataLines = event.split('\n')
|
||||||
const dataLines = lines
|
.filter(line => line.startsWith('data: '))
|
||||||
.filter((line) => line.startsWith("data: "))
|
.map(line => line.slice(6));
|
||||||
.map((line) => line.slice(6));
|
|
||||||
|
|
||||||
if (dataLines.length === 0) continue;
|
if (!dataLines.length) continue;
|
||||||
|
const raw = dataLines.join('\n').trim();
|
||||||
const raw = dataLines.join("\n").trim();
|
if (raw === '[DONE]') continue;
|
||||||
if (raw === "[DONE]") continue;
|
|
||||||
|
|
||||||
try {
|
try {
|
||||||
const data = JSON.parse(raw);
|
const data = JSON.parse(raw);
|
||||||
|
if (data.response) { fullText += data.response; onChunk(data.response); }
|
||||||
if (data.response) {
|
|
||||||
fullText += data.response;
|
|
||||||
onChunk(data.response);
|
|
||||||
}
|
|
||||||
|
|
||||||
if (data.model) model = data.model;
|
if (data.model) model = data.model;
|
||||||
if (data.done && data.tokenCount !== undefined) {
|
if (data.done && data.tokenCount !== undefined) tokenCount = data.tokenCount;
|
||||||
tokenCount = data.tokenCount;
|
if (data.error) throw new Error(data.error);
|
||||||
}
|
|
||||||
|
|
||||||
if (data.error) {
|
|
||||||
throw new Error(data.error);
|
|
||||||
}
|
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
console.error(
|
logger.error('[orchestration] Failed to parse SSE event:', raw, err.message);
|
||||||
"[orchestration] Failed to parse inference SSE event:",
|
|
||||||
raw,
|
|
||||||
err.message,
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
console.log("[orchestration] final streamed text length:", fullText.length);
|
|
||||||
|
|
||||||
if (fullText.trim()) {
|
if (fullText.trim()) {
|
||||||
console.log('[chat] tokenCount before save:', tokenCount);
|
|
||||||
await memory.createEpisode(session.id, userMessage, fullText, tokenCount, session.project_id ?? null);
|
await memory.createEpisode(session.id, userMessage, fullText, tokenCount, session.project_id ?? null);
|
||||||
const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
|
const allEpisodes = await memory.getRecentEpisodes(session.id, 9999);
|
||||||
triggerSummary(session, allEpisodes);
|
triggerSummary(session, allEpisodes);
|
||||||
} else {
|
} else {
|
||||||
console.warn(
|
logger.warn('[orchestration] Stream finished with no assistant text; episode not saved');
|
||||||
"[orchestration] Stream finished with no assistant text; episode not saved",
|
|
||||||
);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
if (isFirstMessage && !session.name) {
|
if (isFirstMessage && !session.name) {
|
||||||
@@ -333,11 +405,7 @@ async function chatStream(externalId, userMessage, onChunk, options = {}) {
|
|||||||
|
|
||||||
return { model, tokenCount };
|
return { model, tokenCount };
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
console.error(
|
logger.error('[orchestration] chatStream fatal error:', err.message, err.stack);
|
||||||
"[orchestration] chatStream fatal error:",
|
|
||||||
err.message,
|
|
||||||
err.stack,
|
|
||||||
);
|
|
||||||
throw err;
|
throw err;
|
||||||
}
|
}
|
||||||
}
|
}
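The buffering logic in `chatStream` can be sketched as a standalone parser. This version is an illustration, not the code in the diff: it swaps `Buffer.from(chunk).toString('utf8')` for a streaming `TextDecoder`, which holds back an incomplete multi-byte character when a chunk boundary falls inside it. `createSseParser` is a hypothetical helper name.

```javascript
function createSseParser(onData) {
  const decoder = new TextDecoder('utf-8');
  let buffer = '';

  return function push(chunk) {
    // stream: true keeps a partial multi-byte sequence until the next chunk.
    buffer += decoder.decode(chunk, { stream: true });

    // Events are separated by a blank line; the last piece may be incomplete.
    const events = buffer.split('\n\n');
    buffer = events.pop() || '';

    for (const event of events) {
      const raw = event
        .split('\n')
        .filter(line => line.startsWith('data: '))
        .map(line => line.slice(6))
        .join('\n')
        .trim();
      if (!raw || raw === '[DONE]') continue;
      onData(JSON.parse(raw));
    }
  };
}

const received = [];
const push = createSseParser(data => received.push(data));

// Split the byte stream in the middle of the two-byte character U+00E9.
const bytes = new TextEncoder().encode('data: {"response":"h\u00e9llo"}\n\ndata: [DONE]\n\n');
const cut = bytes.indexOf(0xc3) + 1;
push(bytes.slice(0, cut));
push(bytes.slice(cut));
console.log(received);
```

Decoding each chunk independently would turn the split character into replacement characters, so the streaming decoder is the safer choice when the response can contain non-ASCII text.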
|
||||||
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
const fs = require('fs');
|
const fs = require('fs');
|
||||||
const path = require('path');
|
const path = require('path');
|
||||||
const { getEnv, ORCHESTRATION, INFERENCE_DEFAULTS } = require('@nexusai/shared');
|
const { getEnv, ORCHESTRATION, INFERENCE_DEFAULTS, RETRIEVAL } = require('@nexusai/shared');
|
||||||
|
|
||||||
const SETTINGS_PATH = path.join(__dirname, '../../data/settings.json');
|
const SETTINGS_PATH = path.join(__dirname, '../../data/settings.json');
|
||||||
|
|
||||||
@@ -14,6 +14,11 @@ const DEFAULTS = {
|
|||||||
topP: INFERENCE_DEFAULTS.TOP_P,
|
topP: INFERENCE_DEFAULTS.TOP_P,
|
||||||
topK: INFERENCE_DEFAULTS.TOP_K,
|
topK: INFERENCE_DEFAULTS.TOP_K,
|
||||||
systemPrompt: ORCHESTRATION.SYSTEM_PROMPT,
|
systemPrompt: ORCHESTRATION.SYSTEM_PROMPT,
|
||||||
|
semanticWeight: RETRIEVAL.SEMANTIC_WEIGHT,
|
||||||
|
keywordWeight: RETRIEVAL.KEYWORD_WEIGHT,
|
||||||
|
contextBudget: ORCHESTRATION.CONTEXT_BUDGET,
|
||||||
|
entityWeight: ORCHESTRATION.ENTITY_WEIGHT,
|
||||||
|
minRecentEpisodes: ORCHESTRATION.MIN_RECENT_EPISODES,
|
||||||
};
|
};
|
||||||
|
|
||||||
function load() {
|
function load() {
|
||||||
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
require ('dotenv').config();
|
require ('dotenv').config();
|
||||||
const express = require('express');
|
const express = require('express');
|
||||||
const {getEnv, PORTS, SERVICES, ORCHESTRATION} = require('@nexusai/shared');
|
const {getEnv, PORTS, SERVICES, ORCHESTRATION, logger} = require('@nexusai/shared');
|
||||||
|
|
||||||
/**** ROUTERS *** */
|
/**** ROUTERS *** */
|
||||||
const chatRouter = require('./routes/chat');
|
const chatRouter = require('./routes/chat');
|
||||||
@@ -10,11 +10,12 @@ const projectsRouter = require('./routes/projects');
|
|||||||
const episodesRouter = require('./routes/episodes');
|
const episodesRouter = require('./routes/episodes');
|
||||||
const settingsRouter = require('./routes/settings');
|
const settingsRouter = require('./routes/settings');
|
||||||
const healthRouter = require('./routes/health');
|
const healthRouter = require('./routes/health');
|
||||||
|
const summariesRouter = require('./routes/summaries')
|
||||||
|
|
||||||
const cors = require('cors');
|
const cors = require('cors');
|
||||||
|
|
||||||
const app = express();
|
const app = express();
|
||||||
app.use(express.json());
|
app.use(express.json({ limit: '2mb' }));
|
||||||
|
|
||||||
app.use(cors({
|
app.use(cors({
|
||||||
origin: [
|
origin: [
|
||||||
@@ -48,8 +49,9 @@ app.use('/projects', projectsRouter);
|
|||||||
app.use('/episodes', episodesRouter);
|
app.use('/episodes', episodesRouter);
|
||||||
app.use('/settings', settingsRouter);
|
app.use('/settings', settingsRouter);
|
||||||
app.use('/health/services', healthRouter);
|
app.use('/health/services', healthRouter);
|
||||||
|
app.use('/summaries', summariesRouter)
|
||||||
|
|
||||||
/******* Start the server ************/
|
/******* Start the server ************/
|
||||||
app.listen(PORT, () => {
|
app.listen(PORT, () => {
|
||||||
console.log(`Orchestration Service is running on port ${PORT}`);
|
logger.info(`Orchestration Service is running on port ${PORT}`);
|
||||||
});
|
});
|
||||||
@@ -1,6 +1,8 @@
|
|||||||
const { Router } = require('express')
|
const { Router } = require('express')
|
||||||
const { chat, chatStream } = require('../chat/index');
|
const { chat, chatStream } = require('../chat/index');
|
||||||
const memory = require('../services/memory')
|
const memory = require('../services/memory')
|
||||||
|
const { logger } = require('@nexusai/shared');
|
||||||
|
|
||||||
|
|
||||||
const router = Router();
|
const router = Router();
|
||||||
|
|
||||||
@@ -17,8 +19,8 @@ router.post('/', async (req, res) => {
|
|||||||
});
|
});
|
||||||
res.json(result)
|
res.json(result)
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
console.error(`[orchestration] chat error: `, err.message)
|
logger.error(`[orchestration] chat error: `, err.message)
|
||||||
res.status(500).json ({ error: err.message})
|
res.status(500).json ({ error: 'Chat failed', detail: err.message })
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
|
|||||||
@@ -9,7 +9,7 @@ router.get('/', async (req, res) => {
|
|||||||
const result = await memory.getEpisodes({ limit, offset, sessionId, q });
|
const result = await memory.getEpisodes({ limit, offset, sessionId, q });
|
||||||
res.json(result);
|
res.json(result);
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
res.status(500).json({ error: err.message });
|
res.status(500).json({ error: 'Failed to fetch episodes', detail: err.message });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -18,7 +18,7 @@ router.delete('/:id', async (req, res) => {
|
|||||||
await memory.deleteEpisode(req.params.id);
|
await memory.deleteEpisode(req.params.id);
|
||||||
res.status(204).send();
|
res.status(204).send();
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
res.status(500).json({ error: err.message });
|
res.status(500).json({ error: 'Failed to delete episode', detail: err.message });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
|
|||||||
@@ -4,7 +4,7 @@ const fs = require('fs');
|
|||||||
const path = require('path');
|
const path = require('path');
|
||||||
const appSettings = require('../config/settings');
|
const appSettings = require('../config/settings');
|
||||||
|
|
||||||
const { getEnv, LLAMACPP } = require('@nexusai/shared');
|
const { getEnv, LLAMACPP, logger } = require('@nexusai/shared');
|
||||||
const LLAMA_URL = getEnv('LLAMA_SERVER_URL', LLAMACPP.DEFAULT_URL);
|
const LLAMA_URL = getEnv('LLAMA_SERVER_URL', LLAMACPP.DEFAULT_URL);
|
||||||
|
|
||||||
router.get('/', (req, res) => {
|
router.get('/', (req, res) => {
|
||||||
@@ -38,7 +38,7 @@ router.get('/', (req, res) => {
|
|||||||
|
|
||||||
res.json(models);
|
res.json(models);
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
console.error('[models] Failed to scan folder:', err.message);
|
logger.error('[models] Failed to scan folder:', err.message);
|
||||||
res.status(500).json({ error: `Could not read models folder: ${modelsFolderPath}` });
|
res.status(500).json({ error: `Could not read models folder: ${modelsFolderPath}` });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
@@ -53,7 +53,7 @@ router.get('/props', async (req, res) => {
|
|||||||
modelAlias: data.model_alias,
|
modelAlias: data.model_alias,
|
||||||
});
|
});
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
console.error('[models/props]', err.message);
|
logger.error('[models/props]', err.message);
|
||||||
res.status(503).json({ error: 'Could not reach llama-server' });
|
res.status(503).json({ error: 'Could not reach llama-server' });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|||||||
@@ -7,7 +7,7 @@ router.get('/', async (req, res) => {
|
|||||||
try {
|
try {
|
||||||
res.json(await memory.getProjects());
|
res.json(await memory.getProjects());
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
res.status(500).json({ error: err.message });
|
res.status(500).json({ error: 'Failed to fetch projects', detail: err.message });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -17,7 +17,7 @@ router.post('/', async (req, res) => {
|
|||||||
try {
|
try {
|
||||||
res.status(201).json(await memory.createProject({ name: name.trim(), description, colour, icon, isolated }));
|
res.status(201).json(await memory.createProject({ name: name.trim(), description, colour, icon, isolated }));
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
res.status(500).json({ error: err.message });
|
res.status(500).json({ error: 'Failed to create project', detail: err.message });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -25,7 +25,7 @@ router.patch('/:id', async (req, res) => {
|
|||||||
try {
|
try {
|
||||||
res.json(await memory.updateProject(req.params.id, req.body));
|
res.json(await memory.updateProject(req.params.id, req.body));
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
res.status(500).json({ error: err.message });
|
res.status(500).json({ error: 'Failed to update project', detail: err.message });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -34,7 +34,7 @@ router.delete('/:id', async (req, res) => {
|
|||||||
await memory.deleteProject(req.params.id);
|
await memory.deleteProject(req.params.id);
|
||||||
res.status(204).send();
|
res.status(204).send();
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
res.status(500).json({ error: err.message });
|
res.status(500).json({ error: 'Failed to delete project', detail: err.message });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
|
|||||||
@@ -15,7 +15,7 @@ router.get('/:sessionId/history', async (req, res) => {
|
|||||||
const history = await memory.getSessionHistory(session.id, Number(limit), Number(offset));
|
const history = await memory.getSessionHistory(session.id, Number(limit), Number(offset));
|
||||||
res.json({ sessionId, episodes: history });
|
res.json({ sessionId, episodes: history });
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
res.status(500).json({ error: err.message });
|
res.status(500).json({ error: 'Failed to fetch session history', detail: err.message });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -26,7 +26,7 @@ router.get('/', async (req, res) => {
|
|||||||
const sessions = await memory.getSessions(Number(limit), Number(offset), parsedProjectId);
|
const sessions = await memory.getSessions(Number(limit), Number(offset), parsedProjectId);
|
||||||
res.json(sessions);
|
res.json(sessions);
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
res.status(500).json({ error: err.message });
|
res.status(500).json({ error: 'Failed to fetch sessions', detail: err.message });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -45,7 +45,7 @@ router.patch('/:sessionId', async (req, res) => {
|
|||||||
});
|
});
|
||||||
res.json(session);
|
res.json(session);
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
res.status(500).json({ error: err.message });
|
res.status(500).json({ error: 'Failed to update session', detail: err.message });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -54,7 +54,7 @@ router.delete('/:sessionId', async (req, res) => {
|
|||||||
await memory.deleteSession(req.params.sessionId);
|
await memory.deleteSession(req.params.sessionId);
|
||||||
res.status(204).send();
|
res.status(204).send();
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
res.status(500).json({ error: err.message });
|
res.status(500).json({ error: 'Failed to delete session', detail: err.message });
|
||||||
}
|
}
|
||||||
});
|
});
|
||||||
|
|
||||||
|
|||||||
@@ -80,6 +80,41 @@ if (req.body.systemPrompt !== undefined) {
|
|||||||
updates.systemPrompt = val.trim() || null; // null reverts to default
|
updates.systemPrompt = val.trim() || null; // null reverts to default
|
||||||
}
|
}
|
||||||
|
|
||||||
|
if (req.body.semanticWeight !== undefined) {
|
||||||
|
const val = Number(req.body.semanticWeight);
|
||||||
|
if (isNaN(val) || val < 0 || val > 5)
|
||||||
|
return res.status(400).json({ error: 'semanticWeight must be 0–5' });
|
||||||
|
updates.semanticWeight = val;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (req.body.keywordWeight !== undefined) {
|
||||||
|
const val = Number(req.body.keywordWeight);
|
||||||
|
if (isNaN(val) || val < 0 || val > 5)
|
||||||
|
return res.status(400).json({ error: 'keywordWeight must be 0–5' });
|
||||||
|
updates.keywordWeight = val;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (req.body.contextBudget !== undefined) {
|
||||||
|
const val = Number(req.body.contextBudget);
|
||||||
|
if (!Number.isInteger(val) || val < 512 || val > 32768)
|
||||||
|
return res.status(400).json({ error: 'contextBudget must be 512–32768' });
|
||||||
|
updates.contextBudget = val;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (req.body.entityWeight !== undefined) {
|
||||||
|
const val = Number(req.body.entityWeight);
|
||||||
|
if (isNaN(val) || val < 0 || val > 2)
|
||||||
|
return res.status(400).json({ error: 'entityWeight must be 0–2' });
|
||||||
|
updates.entityWeight = val;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (req.body.minRecentEpisodes !== undefined) {
|
||||||
|
const val = Number(req.body.minRecentEpisodes);
|
||||||
|
if (!Number.isInteger(val) || val < 0 || val > 10)
|
||||||
|
return res.status(400).json({ error: 'minRecentEpisodes must be 0–10' });
|
||||||
|
updates.minRecentEpisodes = val;
|
||||||
|
}
|
||||||
|
|
||||||
res.json(settings.save(updates));
|
res.json(settings.save(updates));
|
||||||
});
|
});
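The five new settings checks above follow one pattern, so they could be expressed as a rule table. This is a sketch of that alternative, not the code in the diff; `SETTING_RULES` and `validateRetrievalSettings` are hypothetical names, and the ranges are copied from the handlers above.

```javascript
// Ranges taken from the validation blocks in this diff.
const SETTING_RULES = {
  semanticWeight: { min: 0, max: 5, integer: false },
  keywordWeight: { min: 0, max: 5, integer: false },
  contextBudget: { min: 512, max: 32768, integer: true },
  entityWeight: { min: 0, max: 2, integer: false },
  minRecentEpisodes: { min: 0, max: 10, integer: true },
};

function validateRetrievalSettings(body) {
  const updates = {};
  for (const [key, { min, max, integer }] of Object.entries(SETTING_RULES)) {
    if (body[key] === undefined) continue; // only validate fields that were sent
    const val = Number(body[key]);
    const wellFormed = integer ? Number.isInteger(val) : !Number.isNaN(val);
    if (!wellFormed || val < min || val > max) {
      return { error: `${key} must be ${min}-${max}` };
    }
    updates[key] = val;
  }
  return { updates };
}

const rejected = validateRetrievalSettings({ contextBudget: 100 });
const accepted = validateRetrievalSettings({ semanticWeight: '1.5', minRecentEpisodes: 2 });
console.log(rejected, accepted);
```

A route handler could respond with status 400 when `error` is present and otherwise merge `updates` into the object passed to `settings.save`. The table keeps the ranges in one place, which helps if the UI needs to display the same limits.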
|
||||||
|
|
||||||
|
|||||||
48
packages/orchestration-service/src/routes/summaries.js
Normal file
@@ -0,0 +1,48 @@
|
|||||||
|
const { Router } = require('express');
|
||||||
|
const memory = require('../services/memory');
|
||||||
|
|
||||||
|
const router = Router();
|
||||||
|
|
||||||
|
// Trigger on-demand project summary generation
|
||||||
|
router.post('/project/:projectId/generate', async (req, res) => {
|
||||||
|
try {
|
||||||
|
const summary = await memory.generateProjectSummary(req.params.projectId);
|
||||||
|
res.status(201).json(summary);
|
||||||
|
} catch (err) {
|
||||||
|
// Pass through 422 from memory-service ("no session summaries yet")
|
||||||
|
const status = err.message.includes('422') ? 422 : 500;
|
||||||
|
res.status(status).json({ error: err.message });
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
// Get current project overview summary
|
||||||
|
router.get('/project/:projectId/overview', async (req, res) => {
|
||||||
|
try {
|
||||||
|
const summary = await memory.getProjectOverviewSummary(req.params.projectId);
|
||||||
|
res.json(summary);
|
||||||
|
} catch (err) {
|
||||||
|
res.status(500).json({ error: 'Failed to fetch project overview summary', detail: err.message });
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
router.get('/session/:sessionId', async (req, res) => {
|
||||||
|
try {
|
||||||
|
const session = await memory.getSessionByExternalId(req.params.sessionId);
|
||||||
|
if (!session) return res.status(404).json({ error: 'Session not found' });
|
||||||
|
const summaries = await memory.getSummariesBySession(session.id);
|
||||||
|
res.json(summaries);
|
||||||
|
} catch (err) {
|
||||||
|
res.status(500).json({ error: 'Failed to fetch session summaries', detail: err.message });
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
router.get('/project/:projectId', async (req, res) => {
|
||||||
|
try {
|
||||||
|
const summaries = await memory.getSummariesByProject(req.params.projectId);
|
||||||
|
res.json(summaries);
|
||||||
|
} catch (err) {
|
||||||
|
res.status(500).json({ error: 'Failed to fetch project summaries', detail: err.message });
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
|
module.exports = router;
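The `/generate` route above detects an upstream 422 by searching the error message for the substring `'422'`. A sketch of a sturdier alternative is to carry the status on the error object. `UpstreamError` and `statusForError` are hypothetical names, and this assumes the memory client would be changed to throw such an error.

```javascript
// Hypothetical error type that keeps the upstream HTTP status as data.
class UpstreamError extends Error {
  constructor(message, status) {
    super(message);
    this.name = 'UpstreamError';
    this.status = status;
  }
}

// Pass through 422 from the memory service; everything else becomes 500.
function statusForError(err) {
  return err instanceof UpstreamError && err.status === 422 ? 422 : 500;
}

const passThrough = statusForError(new UpstreamError('no session summaries yet', 422));
const unrelated = statusForError(new Error('request timed out after 422 ms'));
console.log(passThrough, unrelated);
```

With substring matching, the second error would be reported as a 422 because its message happens to contain those digits; comparing a numeric `status` field avoids that false positive.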
|
||||||
15
packages/orchestration-service/src/services/graph.js
Normal file
@@ -0,0 +1,15 @@
|
|||||||
|
const { getEnv, SERVICES } = require('@nexusai/shared');
|
||||||
|
|
||||||
|
const MEMORY_URL = getEnv('MEMORY_SERVICE_URL', SERVICES.MEMORY_URL);
|
||||||
|
|
||||||
|
async function getNeighbors(entityIds) {
|
||||||
|
const res = await fetch(`${MEMORY_URL}/graph/neighbors`, {
|
||||||
|
method: 'POST',
|
||||||
|
headers: { 'Content-Type': 'application/json' },
|
||||||
|
body: JSON.stringify({ entityIds }),
|
||||||
|
});
|
||||||
|
if (!res.ok) throw new Error(`Graph neighbors error: ${res.status}`);
|
||||||
|
return res.json();
|
||||||
|
}
|
||||||
|
|
||||||
|
module.exports = { getNeighbors };
|
||||||
@@ -176,6 +176,46 @@ async function updateSummary(id, { content, tokenCount, episodeRange }) {
   return res.json();
 }
 
+async function getSummariesByProject(projectId) {
+  const res = await fetch(`${BASE_URL}/projects/${projectId}/summaries`);
+  if (!res.ok) throw new Error(`Failed to fetch summaries: ${res.status}`);
+  return res.json();
+}
+
+async function generateProjectSummary(projectId) {
+  const res = await fetch(`${BASE_URL}/projects/${projectId}/summarize`, {
+    method: 'POST',
+  });
+  if (!res.ok) throw new Error(`Failed to generate project summary: ${res.status}`);
+  return res.json();
+}
+
+async function getProjectOverviewSummary(projectId) {
+  const res = await fetch(`${BASE_URL}/projects/${projectId}/overview`);
+  if (!res.ok) throw new Error(`Failed to fetch project overview: ${res.status}`);
+  return res.json(); // null if none exists yet
+}
+
+async function searchEpisodes(query, { limit = 10, sessionIds = null } = {}) {
+  const url = new URL(`${BASE_URL}/episodes/search`);
+  url.searchParams.set('q', query);
+  url.searchParams.set('limit', limit);
+  if (sessionIds?.length) url.searchParams.set('sessionIds', sessionIds.join(','));
+  const res = await fetch(url.toString());
+  if (!res.ok) throw new Error(`FTS search error: ${res.status}`);
+  return res.json();
+}
+
+async function getEpisodesByEntities(entityIds) {
+  const res = await fetch(`${BASE_URL}/episodes/by-entities`, {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json' },
+    body: JSON.stringify({ entityIds }),
+  });
+  if (!res.ok) throw new Error(`Episodes-by-entities error: ${res.status}`);
+  return res.json(); // { episodeIds: [...] }
+}
+
 module.exports = {
   getSessionByExternalId,
   createSession,
@@ -197,4 +237,9 @@ module.exports = {
   getSummariesBySession,
   createSummary,
   updateSummary,
+  getSummariesByProject,
+  generateProjectSummary,
+  getProjectOverviewSummary,
+  searchEpisodes,
+  getEpisodesByEntities,
 }
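Review note: `searchEpisodes` builds its query string through `URLSearchParams`, so the comma-joined `sessionIds` value is percent-encoded on the wire. The sketch below isolates that URL construction to show what the Memory Service would receive. `buildSearchUrl` and the hard-coded `BASE_URL` are hypothetical names used only for illustration; the real base URL comes from the client's configuration.

```javascript
// Hypothetical base URL for illustration; the real client resolves BASE_URL from config.
const BASE_URL = 'http://localhost:3002';

// Mirrors the URL-building lines of searchEpisodes, without the fetch call.
function buildSearchUrl(query, { limit = 10, sessionIds = null } = {}) {
  const url = new URL(`${BASE_URL}/episodes/search`);
  url.searchParams.set('q', query);
  url.searchParams.set('limit', limit); // numbers are converted to strings
  if (sessionIds?.length) url.searchParams.set('sessionIds', sessionIds.join(','));
  return url.toString();
}

console.log(buildSearchUrl('vector db', { limit: 3, sessionIds: [7, 9] }));
// http://localhost:3002/episodes/search?q=vector+db&limit=3&sessionIds=7%2C9
```

The receiving route therefore needs to split `sessionIds` on commas after URL decoding. Most query parsers decode `%2C` automatically, but that is an assumption about the server side, which this diff does not show.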
@@ -1,4 +1,4 @@
-const { getEnv, SERVICES, SUMMARIES } = require('@nexusai/shared');
+const { getEnv, SERVICES, SUMMARIES, logger } = require('@nexusai/shared');
 
 const EXTRACTION_URL = getEnv('EXTRACTION_URL', 'http://localhost:11434');
 const EXTRACTION_MODEL = getEnv('EXTRACTION_MODEL', 'qwen2.5:3b');
@@ -9,34 +9,37 @@ const MAX_SUMMARY_TOKENS = parseInt(getEnv('SUMMARY_MAX_TOKENS', SUMMARIES.MAX_S
 const MIN_EPISODES_SINCE = parseInt(getEnv('SUMMARY_MIN_EPISODES', SUMMARIES.MIN_EPISODES_SINCE));
 
 function buildSummaryPrompt(episodes, existingSummary = null) {
-  const MAX_CHARS = 3000; // truncate input to keep Phi3 focused
+  const MAX_CHARS = 3000;
   let context = episodes
     .map(ep => `User: ${ep.user_message}\nAssistant: ${ep.ai_response}`)
     .join('\n\n');
 
-  // Truncate from the start if too long — keep the most recent exchanges
   if (context.length > MAX_CHARS) {
     context = context.slice(-MAX_CHARS);
   }
 
   const instruction = existingSummary
-    ? `Update the summary below to include the new exchanges. Write 3-5 sentences in third person. Output only the updated summary text, nothing else.
+    ? `Update the summary below to incorporate the new exchanges.
+Write 3-5 sentences in third person. Do not quote directly — paraphrase only.
+Do not include greetings, sign-offs, or filler. Output only the updated summary text.
 
 Previous summary:
 ${existingSummary}
 
 New exchanges:
 ${context}`
-    : `Summarize the conversation below in 3-5 sentences. Write in third person. Output only the summary text, nothing else.
+    : `Summarize the conversation below in 3-5 sentences.
+Write in third person. Do not quote directly — paraphrase only.
+Do not include greetings, sign-offs, or filler. Output only the summary text.
 
 Conversation:
 ${context}`;
 
   return [
-    '<|user|>',
+    '<|im_start|>user', // ChatML for qwen2.5
     instruction,
-    '<|end|>',
-    '<|assistant|>',
+    '<|im_end|>',
+    '<|im_start|>assistant',
   ].join('\n');
 }
 
@@ -52,24 +55,31 @@ async function generateSummary(episodes, existingSummary = null) {
       stream: false,
       options: {
         temperature: 0.2, // slightly higher than entities — summaries benefit from some fluency
-        num_predict: 200, // generous but bounded — keeps summaries from running long
+        num_predict: 500, // generous but bounded — keeps summaries from running long
       },
     }),
   });
 
   if (!res.ok) throw new Error(`Ollama responded ${res.status}`);
   const data = await res.json();
-  return data.response?.trim() ?? '';
+
+  const raw = data.response?.trim() ?? '';
+  // Strip any leaked ChatML tokens Qwen echoes back
+  const content = raw
+    .replace(/<\|im_start\|>.*?<\|im_end\|>/gs, '')
+    .replace(/<\|im_start\|>|<\|im_end\|>|<\|im_sep\|>/g, '')
+    .trim();
+  return content;
 }
 
 async function maybeSummarize(session, allEpisodes) {
   // 1. Sum total tokens for this session
   const totalTokens = allEpisodes.reduce((sum, ep) => sum + (ep.token_count || 0), 0);
   if (totalTokens < THRESHOLD_TOKENS) return; // under threshold — nothing to do
-  console.log('[summarization] fetching existing summaries...');
   // 2. Fetch existing summaries for session
   const summariesRes = await fetch(`${MEMORY_URL}/sessions/${session.id}/summaries`);
-  console.log('[summarization] summaries fetch status:', summariesRes.status);
   if (!summariesRes.ok) return;
   const summaries = await summariesRes.json();
 
@@ -83,19 +93,18 @@ async function maybeSummarize(session, allEpisodes) {
     if (newEpisodes.length < MIN_EPISODES_SINCE) return;
   }
 
-  // 4. Determine episode range string e.g. "1-42"
-  const ids = allEpisodes.map(ep => ep.id).sort((a,b) => a - b);
-  const episodeRange = `${ids.at(0)}-${ids.at(-1)}`;
-  const totalEpisodeTokens = allEpisodes.reduce((sum, ep) => sum + (ep.token_count || 0), 0);
-
-  // 5. Generate summary — pass existing content if updating
+  // 4. Determine episodes to summarize
   const episodesToSummarize = latest
     ? allEpisodes.filter(ep => ep.id > lastCoveredId)
     : allEpisodes;
 
+  // 5. Determine episode range from the episodes actually being summarized
+  const summarizedIds = episodesToSummarize.map(ep => ep.id).sort((a,b) => a - b);
+  const episodeRange = `${summarizedIds.at(0)}-${summarizedIds.at(-1)}`;
+  const totalEpisodeTokens = allEpisodes.reduce((sum, ep) => sum + (ep.token_count || 0), 0);
+
   // add temporarily before the generateSummary call
-  console.log('[summarization] episodes to summarize:', episodesToSummarize.length);
-  console.log('[summarization] total chars:', episodesToSummarize.reduce((s, ep) => s + ep.user_message.length + ep.ai_response.length, 0));
+  logger.debug('[summarization] episodes to summarize:', episodesToSummarize.length);
 
   const content = await generateSummary(
     episodesToSummarize,
@@ -117,7 +126,7 @@ async function maybeSummarize(session, allEpisodes) {
         episodeRange,
       }),
     });
-    console.log(`[summarization] Created new summary for session ${session.id}`);
+    logger.debug(`[summarization] Created new summary for session ${session.id}`);
   } else {
     await fetch(`${MEMORY_URL}/summaries/${latest.id}`, {
       method: 'PATCH',
@@ -128,14 +137,14 @@ async function maybeSummarize(session, allEpisodes) {
         episodeRange,
       }),
     });
-    console.log(`[summarization] Updated summary ${latest.id} for session ${session.id}`);
+    logger.debug(`[summarization] Updated summary ${latest.id} for session ${session.id}`);
   }
 }
 
 async function triggerSummary(session, allEpisodes) {
   // Intentionally fire-and-forget — caller doesn't await this
   maybeSummarize(session, allEpisodes).catch(err =>
-    console.warn('[summarization] Summary failed (non-critical):', err.message)
+    logger.warn('[summarization] Summary failed (non-critical):', err.message)
   );
 }
 
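Review note: the two-step ChatML cleanup added to `generateSummary` is easy to check in isolation. The sketch below copies the two regular expressions from the diff into a standalone helper and runs them on a few sample strings. `stripChatML` is a hypothetical name, and the sample inputs are invented for illustration rather than captured from real model output.

```javascript
// Hypothetical helper wrapping the two .replace() calls from generateSummary.
function stripChatML(raw) {
  return raw
    .replace(/<\|im_start\|>.*?<\|im_end\|>/gs, '')           // drop whole echoed turns
    .replace(/<\|im_start\|>|<\|im_end\|>|<\|im_sep\|>/g, '') // drop stray leftover tokens
    .trim();
}

// A trailing end token is removed and the summary text survives.
console.log(JSON.stringify(stripChatML('The user asked about RRF.<|im_end|>')));

// An echoed follow-up turn after the summary is removed as a block.
console.log(JSON.stringify(
  stripChatML('Summary text.<|im_end|>\n<|im_start|>user\nMore?<|im_end|>')
));

// Edge case: a reply fully wrapped in start/end tokens is deleted entirely.
console.log(JSON.stringify(stripChatML('<|im_start|>assistant\nSummary.<|im_end|>')));
```

The last case returns an empty string because the first pattern matches from the opening token to the closing one. Whether the model ever wraps its whole answer this way is not something the diff shows, but if it does, the stored summary would be empty, so a guard on empty `content` may be worth considering.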
@@ -24,10 +24,13 @@ const EPISODIC = {
 const ORCHESTRATION = {
   RECENT_EPISODE_LIMIT: 5,
   SEMANTIC_LIMIT: 5,
-  SCORE_THRESHOLD: 0.75,
+  SCORE_THRESHOLD: 0.5,
   ENTITIES_LIMIT: 5,
-  ENTITIES_THRESHOLD: 0.75,
+  ENTITIES_THRESHOLD: 0.55,
   TEMPERATURE: 0.7,
+  CONTEXT_BUDGET: 4096,
+  ENTITY_WEIGHT: 0.5,
+  MIN_RECENT_EPISODES: 2,
   CORS_ORIGIN: 'http://localhost:5173',
   SYSTEM_PROMPT: `You are a helpful, context-aware AI assistant. You have access to memories of past conversations with the user. Use them to provide consistent, personalised responses.`
 }
@@ -73,7 +76,35 @@ const SUMMARIES = {
   THRESHOLD_TOKENS: 200, //trigger summary when session hits this many tokens
   MAX_SUMMARY_TOKENS: 800, //if existing summary exceeds this, create new instead of update
   MIN_EPISODES_SINCE: 5, // don't resummarize until N new episodes since last summary
+  MAX_SUMMARY_CHARS: 8000, // max chars to include from recent episodes when generating summary (to control prompt size)
+  MAX_PROJECT_EPISODE_LIMIT: 200, // max number of episodes to consider from the entire project when generating summary (to control prompt size)
 }
 
+const ENTITIES = {
+  TEMPERATURE: 0.1, // Low temperature, more precise extraction, less creative
+  NUM_PREDICT: 1500, // Max tokens the model may generate for the entity extraction output
+  THRESHOLD: 0.55, // Minimum confidence score for an extracted entity to be included in the results
+  PROMOTION_THRESHOLD: 3, // mention_count threshold before entity is considered well-established
+  GRAPH_HOP_DEPTH: 1, // Default traversal depth for neighborhood queries
+  TYPES: [
+    'person',
+    'place',
+    'project',
+    'technology',
+    'concept',
+    'organization',
+    'character',
+    'event',
+    'topic'
+  ],
+}
+
+const RETRIEVAL = {
+  RRF_K: 60, // Reciprocal Rank Fusion smoothing constant, softens rank-1 advantage, not exposed in settings
+  SEMANTIC_WEIGHT: 1.0, // Weight applied to semantic (Qdrant) results
+  KEYWORD_WEIGHT: 0, // Weight applied to keyword (SQLite) results, 0 = disabled, set >0 to enable and tune balance between semantic vs keyword matches
+}
+
 module.exports = {
   QDRANT,
   COLLECTIONS,
@@ -85,5 +116,7 @@ module.exports = {
   INFERENCE_DEFAULTS,
   SQLITE,
   ORCHESTRATION,
-  SUMMARIES
+  SUMMARIES,
+  ENTITIES,
+  RETRIEVAL,
 };
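Review note: the `RRF_K` comment says the constant "softens rank-1 advantage". A few lines of arithmetic make that concrete. The sketch assumes the usual Reciprocal Rank Fusion contribution `weight / (k + rank)` with a 1-based rank, which matches the formula used in `test-fusion.js` in this same change. The `rrf` helper is a hypothetical name for illustration.

```javascript
// Hypothetical helper: one list's RRF contribution for an item at a 1-based rank.
const rrf = (weight, rank, k = 60) => weight / (k + rank);

// With k = 60 (RETRIEVAL.RRF_K), ranks 1 and 2 score almost the same.
console.log(rrf(1, 1).toFixed(5)); // 0.01639
console.log(rrf(1, 2).toFixed(5)); // 0.01613

// Advantage of rank 1 over rank 2, with and without smoothing.
console.log((rrf(1, 1) / rrf(1, 2)).toFixed(3));       // 1.016
console.log((rrf(1, 1, 0) / rrf(1, 2, 0)).toFixed(3)); // 2.000
```

With `k = 60` a single top rank is worth only about 1.6% more than second place, so an episode that appears in both result lists will generally outrank one that tops a single list. With `KEYWORD_WEIGHT: 0` the keyword list contributes nothing and the fused order is simply the semantic order.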
@@ -1,6 +1,7 @@
 const {getEnv} = require('./config/env');
-const {QDRANT, COLLECTIONS, EPISODIC, SERVICES, OLLAMA, PORTS, LLAMACPP, INFERENCE_DEFAULTS, SQLITE, ORCHESTRATION, SUMMARIES } = require('./config/constants');
+const {QDRANT, COLLECTIONS, EPISODIC, SERVICES, OLLAMA, PORTS, LLAMACPP, INFERENCE_DEFAULTS, SQLITE, ORCHESTRATION, SUMMARIES, ENTITIES, RETRIEVAL } = require('./config/constants');
 const {parseRow, formatEpisodeText} = require('./utils')
+const logger = require('./utils/logger');
 
 module.exports = {
   getEnv,
@@ -17,4 +18,7 @@ module.exports = {
   parseRow,
   formatEpisodeText,
   SUMMARIES,
+  ENTITIES,
+  logger,
+  RETRIEVAL,
 };
12
packages/shared/src/utils/logger.js
Normal file
@@ -0,0 +1,12 @@
+const LEVELS = { error: 0, warn: 1, info: 2, debug: 3 };
+
+const current = LEVELS[process.env.LOG_LEVEL?.toLowerCase()] ?? LEVELS.info;
+
+const logger = {
+  error: (...args) => current >= LEVELS.error && console.error('[ERROR]', ...args),
+  warn: (...args) => current >= LEVELS.warn && console.warn( '[WARN]', ...args),
+  info: (...args) => current >= LEVELS.info && console.log( '[INFO]', ...args),
+  debug: (...args) => current >= LEVELS.debug && console.log( '[DEBUG]', ...args),
+};
+
+module.exports = logger;
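Review note: the logger resolves `LOG_LEVEL` once at module load and falls back to `info` for unset or unrecognised values. The sketch below reproduces only that lookup so the gating can be checked without touching `process.env`. `enabledLevels` is a hypothetical helper, not part of the module.

```javascript
const LEVELS = { error: 0, warn: 1, info: 2, debug: 3 };

// Hypothetical helper: which logger methods would print for a given LOG_LEVEL value.
function enabledLevels(logLevel) {
  const current = LEVELS[logLevel?.toLowerCase()] ?? LEVELS.info;
  return Object.keys(LEVELS).filter(name => current >= LEVELS[name]);
}

console.log(enabledLevels('WARN'));    // [ 'error', 'warn' ]
console.log(enabledLevels(undefined)); // [ 'error', 'warn', 'info' ]
console.log(enabledLevels('verbose')); // [ 'error', 'warn', 'info' ]
console.log(enabledLevels('debug'));   // [ 'error', 'warn', 'info', 'debug' ]
```

One consequence for this change set: the summarization messages that moved from `console.log` to `logger.debug` are silent at the default level and only appear when the service is started with `LOG_LEVEL=debug`.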
67
test-fusion.js
Normal file
@@ -0,0 +1,67 @@
+// test-fusion.js
+const { RETRIEVAL } = require('./packages/shared/src/config/constants');
+
+function fuseEpisodeResults(semanticEps, keywordEps, { semanticWeight, keywordWeight, limit }) {
+  const k = RETRIEVAL.RRF_K;
+  const scores = new Map();
+  semanticEps.forEach((ep, i) => {
+    scores.set(ep.id, { episode: ep, score: semanticWeight / (k + i + 1) });
+  });
+  keywordEps.forEach((ep, i) => {
+    const contrib = keywordWeight / (k + i + 1);
+    if (scores.has(ep.id)) {
+      scores.get(ep.id).score += contrib;
+    } else if (contrib > 0) {
+      scores.set(ep.id, { episode: ep, score: contrib });
+    }
+  });
+  return [...scores.values()]
+    .sort((a, b) => b.score - a.score)
+    .slice(0, limit)
+    .map(({ episode }) => episode);
+}
+
+// --- Test 1: episodes in both lists rank highest ---
+const semantic = [
+  { id: 1, user_message: 'ep1 — semantic only, rank 1' },
+  { id: 2, user_message: 'ep2 — in both lists, rank 2 semantic' },
+  { id: 3, user_message: 'ep3 — in both lists, rank 3 semantic' },
+];
+const keyword = [
+  { id: 3, user_message: 'ep3 — rank 1 FTS' },
+  { id: 2, user_message: 'ep2 — rank 2 FTS' },
+  { id: 4, user_message: 'ep4 — FTS only, rank 3' },
+];
+
+const result = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 1, limit: 5 });
+console.log('Test 1 — equal weights, episodes in both lists should rank highest:');
+result.forEach((ep, i) => console.log(` ${i + 1}. id=${ep.id} "${ep.user_message}"`));
+console.assert(result[0].id === 2 || result[0].id === 3, 'FAIL: ep2 or ep3 should be rank 1');
+console.assert(!result.find(e => e.id === 1) || result.indexOf(result.find(e => e.id === 1)) > result.indexOf(result.find(e => e.id === 2)), 'FAIL: ep1 (semantic only) should rank below ep2');
+console.log(' PASS\n');
+
+// --- Test 2: keywordWeight:0 → pure semantic passthrough ---
+const result2 = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 0, limit: 5 });
+console.log('Test 2 — keywordWeight:0 should return only semantic results in original order:');
+result2.forEach((ep, i) => console.log(` ${i + 1}. id=${ep.id}`));
+console.assert(result2.length === 3, `FAIL: expected 3, got ${result2.length}`);
+console.assert(result2[0].id === 1, 'FAIL: ep1 should be rank 1');
+console.assert(result2[1].id === 2, 'FAIL: ep2 should be rank 2');
+console.log(' PASS\n');
+
+// --- Test 3: limit is respected ---
+const result3 = fuseEpisodeResults(semantic, keyword, { semanticWeight: 1, keywordWeight: 1, limit: 2 });
+console.log('Test 3 — limit:2 should return exactly 2 results:');
+console.assert(result3.length === 2, `FAIL: expected 2, got ${result3.length}`);
+console.log(' PASS\n');
+
+// --- Test 4: no overlap → all unique episodes, ordered by individual contribution ---
+const semOnly = [{ id: 10, user_message: 'sem' }];
+const ftsOnly = [{ id: 20, user_message: 'fts' }];
+const result4 = fuseEpisodeResults(semOnly, ftsOnly, { semanticWeight: 1, keywordWeight: 1, limit: 5 });
+console.log('Test 4 — no overlap, both should appear:');
+console.assert(result4.length === 2, `FAIL: expected 2, got ${result4.length}`);
+console.assert(result4[0].id === 10, 'FAIL: semantic rank-1 should beat fts rank-1 (same weight, both rank 1, but semantic inserted first — tie goes to semantic)');
+console.log(' PASS\n');
+
+console.log('All tests passed.');