CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Development Commands
# Start individual services
npm run memory # Memory Service (port 3002)
npm run embedding # Embedding Service (port 3003)
npm run inference # Inference Service (port 3001)
npm run orchestration # Orchestration Service (port 4000)
npm run mini1 # Start memory + embedding concurrently
# Per-service dev mode (with --watch)
npm -w packages/<service-name> run dev
# Chat client
npm -w packages/chat-client run dev # Vite dev server (port 5173)
npm -w packages/chat-client run build # Production build
No test framework or linter is configured.
Architecture Overview
NexusAI is a modular AI assistant with persistent, project-scoped memory. It's a Node.js monorepo (npm workspaces) with 4 independent backend services, 1 React frontend, and 1 shared package.
Services
| Package | Port | Role |
|---|---|---|
| orchestration-service | 4000 | Central gateway; coordinates all others |
| memory-service | 3002 | SQLite + Qdrant hybrid memory |
| embedding-service | 3003 | Text embeddings via Ollama (nomic-embed-text, 768-dim) |
| inference-service | 3001 | LLM inference (Ollama or llama.cpp) |
| chat-client | 5173 | React/Vite frontend |
| shared | n/a | Constants, env helpers, logger, formatters |
All inter-service communication is REST HTTP only — no message queues or WebSockets.
Chat Request Flow
- Client POSTs to orchestration /chat/stream
- Orchestration resolves session, fetches recent episodes (SQLite) + semantic episodes (Qdrant vector search) + entities (Qdrant, scoped by project)
- Embedding computed for user message (embedding-service)
- Prompt assembled: system message → entities → semantic memories → recent episodes → user message
- Inference streams response (inference-service)
- Episode stored in SQLite + Qdrant (fire-and-forget embedding)
- Entity extraction triggered async (qwen2.5:3b via inference-service)
- Auto-summarization checked (threshold: 200+ tokens, re-triggers every 5 episodes)
- Auto-naming on first message (temp 0.3, 20 tokens max)
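The prompt assembly order in the flow above (system message, then entities, semantic memories, recent episodes, and finally the user message) could be sketched roughly as follows. This is an illustration only, not the contents of chat/index.js: the function name buildMessages, the section labels, and every field name (name, description, text, userMessage, assistantResponse) are assumptions.

```javascript
// Sketch of the assembly order: system -> entities -> semantic memories
// -> recent episodes -> user message. All names here are hypothetical.
function buildMessages({ systemPrompt, entities, semanticMemories, recentEpisodes, userMessage }) {
  const sections = [systemPrompt];

  // Entities and semantic memories are folded into the system message.
  if (entities.length > 0) {
    sections.push(
      "Known entities:\n" + entities.map((e) => `- ${e.name}: ${e.description}`).join("\n")
    );
  }
  if (semanticMemories.length > 0) {
    sections.push(
      "Relevant memories:\n" + semanticMemories.map((m) => `- ${m.text}`).join("\n")
    );
  }

  // Recent episodes are replayed as ordinary chat turns, in the order given.
  const history = recentEpisodes.flatMap((ep) => [
    { role: "user", content: ep.userMessage },
    { role: "assistant", content: ep.assistantResponse },
  ]);

  return [
    { role: "system", content: sections.join("\n\n") },
    ...history,
    { role: "user", content: userMessage },
  ];
}
```

Keeping the assembly in one pure function makes the ordering easy to check in isolation before wiring it to the memory and inference services.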
Memory Model
Dual store — neither works alone:
- SQLite (better-sqlite3, synchronous): full content, covering sessions, episodes, entities, relationships, projects, summaries, and the FTS5 index
- Qdrant: vector embeddings for semantic search; IDs used to fetch full content from SQLite afterward
Orchestration queries Qdrant directly (bypasses memory-service) for performance, then fetches full episode content from memory-service by ID.
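The second half of that two-step lookup, joining vector hits back to full rows, might look like the sketch below. It assumes each Qdrant hit carries an id and a score, and that the rows fetched from memory-service have been collected into a Map keyed by ID; hydrateHits and rowsById are hypothetical names.

```javascript
// Merge Qdrant hits (id + score) with full episode rows fetched by ID.
// Hits are assumed to arrive sorted by score; that order is preserved.
// Hits whose row is missing from SQLite are dropped.
function hydrateHits(hits, rowsById) {
  return hits
    .filter((hit) => rowsById.has(hit.id))
    .map((hit) => ({ ...rowsById.get(hit.id), score: hit.score }));
}
```

Dropping unmatched hits is one possible policy for the case where the two stores disagree; the source does not say how the real code handles it.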
Project-scoped isolation: Sessions are grouped into projects, and Qdrant queries apply a "should" filter clause on session IDs to enforce memory boundaries. Non-project sessions share a common pool.
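As a rough illustration of the session-ID filter, the helper below builds a Qdrant filter object with one "should" condition per session, so a point matches if it belongs to any session in the project. The payload key session_id and the function name buildSessionFilter are assumptions; the actual payload schema is not given here.

```javascript
// Build a Qdrant filter that matches points from any of the given sessions.
// "session_id" is a hypothetical payload key.
function buildSessionFilter(sessionIds) {
  if (!Array.isArray(sessionIds) || sessionIds.length === 0) {
    // Assumption: never send an empty clause, to avoid widening the search.
    throw new Error("buildSessionFilter requires at least one session ID");
  }
  return {
    should: sessionIds.map((id) => ({ key: "session_id", match: { value: id } })),
  };
}
```

The returned object would be passed as the filter field of a Qdrant search request. How the real code treats a project with no sessions is not stated, so the guard above is only one option.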
Key File Locations
Orchestration (packages/orchestration-service/src/):
- chat/index.js: Core prompt building and memory assembly
- routes/: HTTP endpoints (chat, sessions, projects, episodes, models, settings, summaries)
- services/: Thin HTTP clients for memory, embedding, inference, and direct Qdrant access
- config/settings.js: Loads/saves data/settings.json (user-tunable: model params, thresholds, system prompt)
Memory (packages/memory-service/src/):
- db/schema.js: SQLite table definitions (source of truth for data model)
- episodic/: Episode CRUD
- semantic/: Qdrant operations
- entities/: Entity extraction + CRUD
- summarization/: Project summary generation
Shared (packages/shared/src/):
- config/constants.js: All tunables (ports, thresholds, model names, vector size)
- config/env.js: getEnv() helper with fallback to constants
- utils.js: parseRow(), formatEpisodeText(), logger
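A helper with the env-then-constants fallback described for config/env.js could be sketched as below. The real getEnv() signature is not shown in this file, so the parameters, the DEFAULTS object, and the variable names MEMORY_PORT and EMBEDDING_PORT are all assumptions; only the port numbers come from the Services table.

```javascript
// Hypothetical stand-in for packages/shared/src/config/constants.js.
const DEFAULTS = { MEMORY_PORT: 3002, EMBEDDING_PORT: 3003 };

// Return the environment value if set, otherwise the constant default.
// The env parameter is injectable so the lookup can be exercised without
// touching process.env.
function getEnv(name, env = process.env) {
  const raw = env[name];
  if (raw !== undefined && raw !== "") return raw;
  if (Object.hasOwn(DEFAULTS, name)) return String(DEFAULTS[name]);
  throw new Error(`Missing configuration value: ${name}`);
}
```

Values are returned as strings in both branches so callers see the same type whether or not a .env override is present.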
Frontend (packages/chat-client/src/):
- App.jsx: View router and top-level state (views: home, chat, all-chats, all-projects, project, memory, summaries, settings)
- hooks/: useChat, useSession, useModels, useProjects, useSettings, useContextMenu
- api/orchestration.js: Fetch wrapper for all API calls
- Vite proxy points to 192.168.0.205:4000 (Mini PC 2 / orchestration)
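For orientation, a dev-server proxy of this kind might be shaped like the sketch below, using Vite's server.proxy option. The target address comes from the line above. The list of route prefixes is an assumption derived from the orchestration route names, and buildProxy is a hypothetical helper, so the real vite.config.js may differ.

```javascript
// Target taken from this document; the prefixes are assumed, one per
// orchestration route group.
const ORCHESTRATION_TARGET = "http://192.168.0.205:4000";
const ROUTE_PREFIXES = [
  "/chat", "/sessions", "/projects", "/episodes", "/models", "/settings", "/summaries",
];

// Map every prefix to the same proxy target.
function buildProxy(prefixes, target) {
  return Object.fromEntries(prefixes.map((p) => [p, { target, changeOrigin: true }]));
}

// In vite.config.js this object would be the default export.
const viteConfig = {
  server: {
    port: 5173,
    proxy: buildProxy(ROUTE_PREFIXES, ORCHESTRATION_TARGET),
  },
};
```

With a per-prefix layout like this, a new orchestration route means adding its prefix to the list.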
Configuration
Each service uses .env via dotenv, falling back to packages/shared/src/config/constants.js. The orchestration service also serves data/settings.json to the frontend via /settings — this is the single source of truth for user-facing inference parameters and system prompt.
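One way to picture the settings file layered over built-in defaults is the sketch below. The keys and default values (temperature, maxTokens, systemPrompt) are placeholders rather than the contents of data/settings.json, and resolveSettings is a hypothetical name.

```javascript
// Placeholder defaults; the real tunables live in the shared constants.
const DEFAULT_SETTINGS = {
  temperature: 0.7,
  maxTokens: 1024,
  systemPrompt: "You are a helpful assistant.",
};

// Parse the raw file contents and overlay them on the defaults.
// Malformed or non-object JSON falls back to the defaults unchanged.
function resolveSettings(rawJson, defaults = DEFAULT_SETTINGS) {
  let saved = {};
  try {
    const parsed = JSON.parse(rawJson);
    if (parsed !== null && typeof parsed === "object" && !Array.isArray(parsed)) {
      saved = parsed;
    }
  } catch {
    // Ignore parse errors and keep the defaults.
  }
  return { ...defaults, ...saved };
}
```

Calling a function like this with freshly read file contents on each request would let edits to the file take effect without a restart.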
Deployment
Home lab across 3 nodes, managed with Docker Compose:
- Main PC — RTX A4000 (inference via llama.cpp)
- Mini PC 1 — memory + embedding services, Qdrant, Ollama
- Mini PC 2 — orchestration + chat client, Caddy reverse proxy + Authelia SSO
Docker Compose files: docker-compose.mini1.yml, docker-compose.mini2.yml. All services expose /health. Deployment docs: docs/deployment/homelab.md.
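Since every service exposes /health, a small probe script can confirm a deployment is up. The sketch below is an assumption-laden example: ports come from the Services table, but the hosts are placeholders (localhost), and the names SERVICES, healthUrl, and checkAll are hypothetical. It relies on the global fetch available in Node 18 and later.

```javascript
// Ports from the Services table; hosts are placeholders to adjust per node.
const SERVICES = {
  inference: { host: "localhost", port: 3001 },
  memory: { host: "localhost", port: 3002 },
  embedding: { host: "localhost", port: 3003 },
  orchestration: { host: "localhost", port: 4000 },
};

// Build the /health URL for a named service.
function healthUrl(name, services = SERVICES) {
  if (!Object.hasOwn(services, name)) throw new Error(`Unknown service: ${name}`);
  const { host, port } = services[name];
  return `http://${host}:${port}/health`;
}

// Probe every service in parallel; resolves to { serviceName: boolean }.
// fetchFn is injectable so the function can be exercised offline.
async function checkAll(fetchFn = fetch, services = SERVICES) {
  const entries = await Promise.all(
    Object.keys(services).map(async (name) => {
      try {
        const res = await fetchFn(healthUrl(name, services));
        return [name, res.ok];
      } catch {
        return [name, false]; // unreachable counts as unhealthy
      }
    })
  );
  return Object.fromEntries(entries);
}
```

Running checkAll().then(console.log) after bringing up the Compose stacks would print a per-service status map.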
Key Development Principles
- Layer-by-layer validation — always build and test backend → orchestration → frontend in sequence, curl-testing each layer before proceeding
- New orchestration routes require changes in four places: route file, orchestration-service/src/index.js, Caddyfile on Mini PC 2 (192.168.0.205), and vite.config.js in the chat client
- All services read settings on every request, so no restart is required for config changes
- Backend-first development — data layer → service endpoints → orchestration proxy → frontend