CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
See the root CLAUDE.md for overall architecture, service roles, and the end-to-end chat flow.
Running This Service
npm run orchestration # From repo root (node src/index.js)
npm -w packages/orchestration-service run dev # With --watch
Default port: 4000. Depends on memory-service, embedding-service, inference-service, and Qdrant.
Context Assembly (src/chat/index.js)
assembleContext(externalId, userMessage) is the core function that builds the inference prompt. Order of operations:
- Resolve session by externalId (creates it if missing, so every chat call is self-healing).
- If session has a project_id, load the project and fetch all sibling sessions (via getProjectSessions, hardcoded limit=200).
- Fetch recentEpisodeLimit recent episodes from memory-service.
- Embed the user message; search Qdrant EPISODES with scoreThreshold:
  - No project: must: [sessionId == this session]
  - Project: should: [sessionId == s1, sessionId == s2, ...] across all project sessions
  - Dedup against recent episode IDs before including.
- Embed and search Qdrant ENTITIES; filter by projectId if applicable.
- Build prompt in this fixed order: system prompt → entities → semantic episodes → recent episodes → user message → "Assistant:"
The ordering prioritizes established facts (entities) and relevant past context (semantic) over pure recency.
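The filter selection, dedup step, and prompt ordering above can be sketched roughly as follows. This is a minimal illustration, not the code in src/chat/index.js: the helper names, the shape of the episode objects, and the "User:" label are assumptions. The `must`/`should` conditions use Qdrant's standard payload filter syntax.

```javascript
// Hypothetical helpers: names and object shapes are assumptions for illustration.

// No project: restrict the search to this session (must).
// Project: match any sibling session in the project (should).
function buildEpisodeFilter(sessionId, projectSessionIds = []) {
  const match = (id) => ({ key: 'sessionId', match: { value: id } });
  if (projectSessionIds.length === 0) {
    return { must: [match(sessionId)] };
  }
  return { should: projectSessionIds.map(match) };
}

// Drop semantic hits that already appear among the recent episodes.
function dedupeSemantic(semanticHits, recentEpisodes) {
  const recentIds = new Set(recentEpisodes.map((e) => e.id));
  return semanticHits.filter((hit) => !recentIds.has(hit.id));
}

// Join the sections in the fixed order, skipping empty ones.
// The "User:" prefix is an assumed label; only "Assistant:" is stated above.
function buildPrompt({ systemPrompt, entities, semantic, recent, userMessage }) {
  return [systemPrompt, entities, semantic, recent, `User: ${userMessage}`, 'Assistant:']
    .filter((part) => part && part.length > 0)
    .join('\n\n');
}
```

With a project, passing every sibling session ID to `buildEpisodeFilter` yields one `should` clause per session, which is why the sibling fetch happens before the search.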
System Prompt Resolution
Priority from highest to lowest:
- project.system_prompt (stored on the project row in memory-service)
- settings.systemPrompt (saved in data/settings.json)
- ORCHESTRATION.SYSTEM_PROMPT (shared constants fallback)
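A minimal sketch of that fallback chain, assuming blank strings count as unset (the source does not say). The `resolveSystemPrompt` name and the constant's value are placeholders.

```javascript
// Placeholder value: the real default lives in the shared constants.
const ORCHESTRATION = { SYSTEM_PROMPT: 'You are a helpful assistant.' };

// Hypothetical helper: the first non-blank candidate wins, else the shared default.
function resolveSystemPrompt(project, settings) {
  const candidates = [project?.system_prompt, settings?.systemPrompt];
  const chosen = candidates.find((p) => typeof p === 'string' && p.trim() !== '');
  return chosen ?? ORCHESTRATION.SYSTEM_PROMPT;
}
```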
Settings (src/config/settings.js)
Settings are loaded from data/settings.json and merged with defaults on every GET /settings call. PATCH /settings validates each field individually against these constraints:
| Field | Constraint |
|---|---|
| recentEpisodeLimit | integer, 1–20 |
| semanticLimit | integer, 1–20 |
| scoreThreshold | number, 0–1 |
| temperature | number, 0–2 |
| repeatPenalty | number, 1–2 |
| topP | number, 0–1 |
| topK | integer, 1–100 |
| modelsFolderPath | path must exist and be readable |
| systemPrompt | string (trimmed); null reverts to shared default |
data/settings.json is created on first save. Parent directories are created if missing.
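Per-field validation along these lines could look like the sketch below. It is not the code in src/config/settings.js: `validatePatch` and its return shape are hypothetical, rejecting unknown fields is an assumption, and modelsFolderPath is left out because its check needs filesystem access.

```javascript
// Range checkers for the numeric fields in the table above.
const intRange = (min, max) => (v) => Number.isInteger(v) && v >= min && v <= max;
const numRange = (min, max) => (v) =>
  typeof v === 'number' && Number.isFinite(v) && v >= min && v <= max;

const validators = {
  recentEpisodeLimit: intRange(1, 20),
  semanticLimit: intRange(1, 20),
  scoreThreshold: numRange(0, 1),
  temperature: numRange(0, 2),
  repeatPenalty: numRange(1, 2),
  topP: numRange(0, 1),
  topK: intRange(1, 100),
  systemPrompt: (v) => v === null || typeof v === 'string',
};

// Hypothetical validator: checks each field on its own and collects errors.
function validatePatch(patch) {
  const errors = [];
  const clean = {};
  for (const [field, value] of Object.entries(patch)) {
    const check = Object.hasOwn(validators, field) ? validators[field] : undefined;
    if (!check) {
      errors.push(`${field}: unknown field`); // assumption: unknown fields are rejected
      continue;
    }
    if (!check(value)) {
      errors.push(`${field}: invalid value`);
      continue;
    }
    clean[field] = typeof value === 'string' ? value.trim() : value;
  }
  return { ok: errors.length === 0, errors, clean };
}
```

For example, `validatePatch({ topK: 0 })` fails on the range check, while `validatePatch({ systemPrompt: null })` passes and leaves the null in place so the caller can revert to the shared default.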
Streaming SSE (src/chat/index.js — chatStream)
The route sets SSE headers and delegates to chatStream, which:
- Calls inference.completeStream() → receives a raw HTTP Response with a readable body.
- Reads the body in chunks, buffers across chunk boundaries, splits on \n\n.
- For each event line starting with data:, parses the JSON and calls onChunk(data.response).
- The [DONE] sentinel (used by some llama-server versions) is explicitly ignored.
- After stream ends, saves the assembled full response as an episode (same as non-streaming).
If a chunk fails to parse, the error is logged and the stream continues. If the response body closes with no text accumulated, the episode is not saved and a warning is logged.
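The buffering and parsing behavior can be illustrated with a small incremental parser. This is a sketch under assumptions: `createSseParser` is a hypothetical name, it takes already-decoded text rather than reading a Response body, and the real chatStream may be structured differently.

```javascript
// Hypothetical incremental parser: feed decoded text with push(), then call finish().
function createSseParser(onChunk) {
  let buffer = '';
  let fullText = '';

  function handleEvent(rawEvent) {
    for (const line of rawEvent.split('\n')) {
      if (!line.startsWith('data:')) continue;
      const payload = line.slice(5).trim();
      if (payload === '' || payload === '[DONE]') continue; // sentinel is ignored
      let data;
      try {
        data = JSON.parse(payload);
      } catch (err) {
        console.warn('Skipping unparseable SSE chunk:', err.message);
        continue; // keep streaming after a bad chunk
      }
      if (typeof data.response === 'string') {
        fullText += data.response;
        onChunk(data.response);
      }
    }
  }

  return {
    push(text) {
      buffer += text;
      const events = buffer.split('\n\n');
      buffer = events.pop(); // the last piece may be an incomplete event
      events.forEach(handleEvent);
    },
    finish() {
      if (buffer.trim() !== '') handleEvent(buffer);
      buffer = '';
      return fullText; // an empty string means there is nothing to save
    },
  };
}
```

Holding the trailing piece back in `buffer` is what lets a JSON payload that is split across two network chunks parse correctly once the rest arrives.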
Fire-and-Forget Tasks
After every successful chat turn:
- Summarization (services/summarization.js → triggerSummary): checks token threshold → recency guard → calls Ollama → POSTs to memory-service. Only runs if SUMMARIES.THRESHOLD_TOKENS is exceeded AND at least SUMMARIES.MIN_EPISODES_SINCE new episodes have occurred since the last summary.
- Auto-naming (chat/index.js → autoNameSession): only fires on the first message of a session. Uses temp 0.3, maxTokens=20, prompts for a ≤5-word title.
Both tasks catch all errors and log warnings without surfacing to the client.
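One way to get this behavior is a small wrapper that starts the task and swallows any failure. The `fireAndForget` helper below is hypothetical, not the service's code. It returns the promise only so that tests can await it; callers in a request handler would not.

```javascript
// Hypothetical wrapper: run a background task, log failures, never reject.
function fireAndForget(name, task) {
  return Promise.resolve()
    .then(task) // also catches synchronous throws inside task
    .catch((err) => {
      console.warn(`[background] ${name} failed: ${err?.message ?? err}`);
    });
}
```

A chat handler might call `fireAndForget('summary', () => triggerSummary(sessionId))` after sending the response, without awaiting the result.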
Summarization Recency Guard
src/services/summarization.js reads the episode_range field of the latest existing summary (format: "<startId>-<endId>"). It counts SQLite episodes with id > endId; if fewer than SUMMARIES.MIN_EPISODES_SINCE, it skips. This prevents rapid re-summarization on high-traffic sessions.
When the existing summary's token count exceeds SUMMARIES.MAX_SUMMARY_TOKENS, it is treated as "expired" — a fresh summary is generated instead of an incremental update.
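The guard and the expiry rule might combine as in this sketch. The constant values, the `tokenCount` field, and the function names are assumptions. The sketch also takes an in-memory list of episode IDs instead of querying SQLite, and the order in which the two checks run is a guess.

```javascript
// Placeholder values: the real thresholds live in the shared SUMMARIES constants.
const SUMMARIES = { MIN_EPISODES_SINCE: 5, MAX_SUMMARY_TOKENS: 800 };

// Parse "<startId>-<endId>" into numbers; returns null if the format is unexpected.
function parseEpisodeRange(range) {
  const m = /^(\d+)-(\d+)$/.exec(range ?? '');
  return m ? { startId: Number(m[1]), endId: Number(m[2]) } : null;
}

// Hypothetical decision helper: returns 'skip', 'incremental', or 'fresh'.
function decideSummaryAction(latestSummary, episodeIds) {
  if (!latestSummary) return 'fresh';
  const range = parseEpisodeRange(latestSummary.episode_range);
  if (!range) return 'fresh';
  const newer = episodeIds.filter((id) => id > range.endId).length;
  if (newer < SUMMARIES.MIN_EPISODES_SINCE) return 'skip'; // recency guard
  // An oversized existing summary is treated as expired and regenerated.
  return latestSummary.tokenCount > SUMMARIES.MAX_SUMMARY_TOKENS ? 'fresh' : 'incremental';
}
```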
Qdrant Calls (Direct, Not Via Memory-Service)
src/services/qdrant.js makes REST calls to Qdrant directly at QDRANT_URL. This bypasses memory-service for semantic search performance. Orchestration fetches episode/entity content from memory-service by ID after getting vector search results from Qdrant.
searchEntities checks projectId !== null && projectId !== undefined before applying the filter — a session with no project skips the filter entirely and searches globally.
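The explicit null/undefined check matters because a plain truthiness test would also skip the filter for a projectId of 0. A minimal sketch, with `buildEntityFilter` as a hypothetical name and the payload key assumed to be projectId:

```javascript
// Hypothetical helper: returns a Qdrant filter, or undefined for a global search.
function buildEntityFilter(projectId) {
  if (projectId !== null && projectId !== undefined) {
    return { must: [{ key: 'projectId', match: { value: projectId } }] };
  }
  return undefined; // no project: search all entities
}
```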
Models Endpoint
GET /models scans modelsFolderPath for .gguf files and optionally reads a models.json manifest (keyed by filename) for labels and descriptions. File size is reported in GB. Returns 500 if the folder is inaccessible.
GET /models/props proxies /props from llama-server and returns {contextWindow, modelAlias}. Returns 503 if llama-server is unreachable.
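As an illustration of the GET /models scan, the mapping from directory entries to response items could look like the sketch below. The function name, the output field names, the case-insensitive extension match, and the 1024³ definition of GB are all assumptions. The real route also reads the directory itself, which is omitted here.

```javascript
// Hypothetical mapper: files is [{ name, sizeBytes }], manifest is keyed by filename.
function toModelEntries(files, manifest = {}) {
  return files
    .filter((f) => f.name.toLowerCase().endsWith('.gguf'))
    .map((f) => ({
      filename: f.name,
      label: manifest[f.name]?.label ?? f.name, // fall back to the filename
      description: manifest[f.name]?.description ?? null,
      sizeGB: Number((f.sizeBytes / 1024 ** 3).toFixed(2)), // assumes 1 GB = 1024^3 bytes
    }));
}
```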
Health Check
GET /health/services runs parallel fetch calls to all four dependent services with a 3-second AbortSignal.timeout each. Results are returned as an array — the endpoint never returns a non-2xx itself regardless of downstream status.
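The parallel check with per-request timeouts can be sketched as follows. `checkServices` and the result shape are hypothetical, and the injectable `fetchImpl` parameter exists only to make the sketch testable. `AbortSignal.timeout` and the global `fetch` are available in Node 18 and later.

```javascript
// Hypothetical checker: services is [{ name, url }]. Always resolves, never rejects.
async function checkServices(services, fetchImpl = fetch, timeoutMs = 3000) {
  return Promise.all(
    services.map(async ({ name, url }) => {
      try {
        const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
        return { name, ok: res.ok, status: res.status };
      } catch (err) {
        // Network failures and timeouts become a per-service result, not an error.
        return { name, ok: false, error: err.name };
      }
    })
  );
}
```

Because each failure is caught inside its own mapper, `Promise.all` never rejects, which matches the endpoint always answering with a 2xx.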
Background Model (qwen2.5:3b)
Used for entity extraction and summarization via Ollama on Mini PC 1. Uses ChatML
format (<|im_start|> / <|im_end|>) — not Phi3 format. Use format: 'json'
only for structured extraction, never for free-text summarization.
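For reference, a ChatML prompt and a matching Ollama request body could be built as below. The helper names are hypothetical, and using Ollama's `raw` mode with a hand-built prompt is an assumption about how the service calls it. The `format`, `raw`, and `stream` fields are part of Ollama's /api/generate request.

```javascript
// Wrap each message in ChatML markers and leave the assistant turn open.
function toChatML(messages) {
  const turns = messages.map(
    ({ role, content }) => `<|im_start|>${role}\n${content}<|im_end|>`
  );
  return `${turns.join('\n')}\n<|im_start|>assistant\n`;
}

// Hypothetical request builder: format 'json' is set only for structured extraction.
function buildOllamaRequest(messages, { structured = false } = {}) {
  const body = {
    model: 'qwen2.5:3b',
    prompt: toChatML(messages),
    raw: true, // assumption: skip Ollama's own template since the prompt is pre-formatted
    stream: false,
  };
  if (structured) body.format = 'json';
  return body;
}
```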
API Endpoints Quick Reference
| Method | Path | Notes |
|---|---|---|
| GET | /health | Returns service URLs |
| GET | /health/services | Parallel status of all dependencies |
| POST | /chat | Blocking completion |
| POST | /chat/stream | SSE streaming |
| GET/PATCH | /settings | Persistent settings |
| GET | /models | .gguf file scan |
| GET | /models/props | llama-server model info |
| GET | /sessions | Delegates to memory-service |
| GET | /sessions/:sessionId/history | Paginated episodes by external ID |
| PATCH | /sessions/:sessionId | name and/or projectId |
| DELETE | /sessions/:sessionId | |
| GET | /episodes | Delegates; supports q for FTS |
| DELETE | /episodes/:id | Delegates |
| GET/POST/PATCH/DELETE | /projects and /projects/:id | Delegates |
| POST | /summaries/project/:projectId/generate | On-demand; 422 if no data |
| GET | /summaries/project/:projectId/overview | |
| GET | /summaries/session/:sessionId | Resolves external ID first |
| GET | /summaries/project/:projectId | |