nexusAI/CLAUDE.md
Storme-bit 5ad01c6ad8 clean up
2026-04-27 00:14:51 -07:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Development Commands

# Start individual services
npm run memory           # Memory Service (port 3002)
npm run embedding        # Embedding Service (port 3003)
npm run inference        # Inference Service (port 3001)
npm run orchestration    # Orchestration Service (port 4000)
npm run mini1            # Start memory + embedding concurrently

# Per-service dev mode (with --watch)
npm -w packages/<service-name> run dev

# Chat client
npm -w packages/chat-client run dev      # Vite dev server (port 5173)
npm -w packages/chat-client run build    # Production build

No test framework or linter is configured.

Architecture Overview

NexusAI is a modular AI assistant with persistent, project-scoped memory. It's a Node.js monorepo (npm workspaces) with 4 independent backend services, 1 React frontend, and 1 shared package.

Services

Package                 Port   Role
orchestration-service   4000   Central gateway; coordinates all others
memory-service          3002   SQLite + Qdrant hybrid memory
embedding-service       3003   Text embeddings via Ollama (nomic-embed-text, 768-dim)
inference-service       3001   LLM inference (Ollama or llama.cpp)
chat-client             5173   React/Vite frontend
shared                  -      Constants, env helpers, logger, formatters

All inter-service communication is REST HTTP only — no message queues or WebSockets.

Chat Request Flow

  1. Client POSTs to orchestration /chat/stream
  2. Orchestration resolves session, fetches recent episodes (SQLite) + semantic episodes (Qdrant vector search) + entities (Qdrant, scoped by project)
  3. Embedding computed for user message (embedding-service)
  4. Prompt assembled: system message → entities → semantic memories → recent episodes → user message
  5. Inference streams response (inference-service)
  6. Episode stored in SQLite + Qdrant (fire-and-forget embedding)
  7. Entity extraction triggered async (qwen2.5:3b via inference-service)
  8. Auto-summarization checked (threshold: 200+ tokens, re-triggers every 5 episodes)
  9. Auto-naming on first message (temp 0.3, 20 tokens max)
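
The assembly order in step 4 can be sketched as a pure function. This is a minimal illustration, not the code in chat/index.js: the name buildPrompt, the { role, content } message shape, the section titles, and the choice to fold all memory into one system message are assumptions.

```javascript
// Hypothetical sketch of step 4: assemble the prompt in the documented order.
// buildPrompt, the section titles, and the message shape are illustrative only.
function buildPrompt({
  systemMessage,
  entities = [],
  semanticMemories = [],
  recentEpisodes = [],
  userMessage,
}) {
  // Render a titled bullet section, or null when there is nothing to show.
  const section = (title, items) =>
    items.length ? `${title}:\n${items.map((item) => `- ${item}`).join("\n")}` : null;

  // Order matters: system message -> entities -> semantic memories -> recent episodes.
  const systemContent = [
    systemMessage,
    section("Known entities", entities),
    section("Relevant memories", semanticMemories),
    section("Recent conversation", recentEpisodes),
  ]
    .filter(Boolean)
    .join("\n\n");

  // The user message always comes last.
  return [
    { role: "system", content: systemContent },
    { role: "user", content: userMessage },
  ];
}

const messages = buildPrompt({
  systemMessage: "You are a helpful assistant.",
  entities: ["Qdrant: vector database"],
  semanticMemories: ["User prefers short answers"],
  recentEpisodes: ["user: hi / assistant: hello"],
  userMessage: "What did we decide yesterday?",
});
console.log(messages[0].content);
```

Empty sections are skipped, so a brand-new session with no memory yields only the system message and the user message.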

Memory Model

Dual store — neither works alone:

  • SQLite (better-sqlite3, synchronous) — Full content: sessions, episodes, entities, relationships, projects, summaries, FTS5 index
  • Qdrant — Vector embeddings for semantic search; IDs used to fetch full content from SQLite afterward

Orchestration queries Qdrant directly (bypasses memory-service) for performance, then fetches full episode content from memory-service by ID.
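
The two-step lookup can be sketched with in-memory stand-ins for both stores. Everything here is illustrative: hydrateHits, the { id, score } hit shape, the sample rows, and the Map standing in for SQLite are assumptions, not the project's actual API.

```javascript
// Hypothetical sketch: vector search returns IDs and scores only;
// full content is then fetched by ID from the content store.
function hydrateHits(hits, fetchEpisodeById) {
  return hits
    .map((hit) => ({ hit, episode: fetchEpisodeById(hit.id) }))
    // A hit whose ID has no row is dropped: a vector alone carries no content.
    .filter(({ episode }) => episode !== undefined)
    .map(({ hit, episode }) => ({ ...episode, score: hit.score }));
}

// Stand-in for SQLite (full content), keyed by episode ID.
const episodeRows = new Map([
  [1, { id: 1, content: "Discussed the deployment layout" }],
  [2, { id: 2, content: "Compared two embedding models" }],
]);

// Stand-in for a vector search result; ID 3 has no matching row.
const vectorHits = [
  { id: 2, score: 0.91 },
  { id: 3, score: 0.8 },
  { id: 1, score: 0.74 },
];

const episodes = hydrateHits(vectorHits, (id) => episodeRows.get(id));
console.log(episodes.map((e) => `${e.id} (${e.score}): ${e.content}`));
```

The orphaned hit shows why neither store works alone: a vector without its row cannot be placed in a prompt, and a row without a vector is never found by semantic search.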

Project-scoped isolation: sessions are grouped into projects, and Qdrant queries apply a "should" filter on session IDs to enforce memory boundaries. Non-project sessions share a common pool.
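
A session-scoped filter could be built as follows. The should clause and the { key, match: { value } } condition follow Qdrant's documented filter format; the payload field name session_id and the helper name are assumptions, since the payload schema is not shown here.

```javascript
// Hypothetical helper: build a Qdrant filter matching any of the given sessions.
// "should" is Qdrant's OR clause: a point matches if at least one condition holds.
// The payload key "session_id" is an assumption about how points are tagged.
function buildSessionFilter(sessionIds) {
  if (!Array.isArray(sessionIds) || sessionIds.length === 0) {
    // An empty list is almost certainly a caller bug, so fail loudly instead of guessing.
    throw new Error("buildSessionFilter requires at least one session ID");
  }
  return {
    should: sessionIds.map((id) => ({ key: "session_id", match: { value: id } })),
  };
}

// Example: restrict a search to the two sessions of one project.
const filter = buildSessionFilter(["s-101", "s-102"]);
console.log(JSON.stringify(filter, null, 2));
```

The resulting object would be passed as the filter field of a search request, so only points tagged with one of the project's sessions are scored.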

Key File Locations

Orchestration (packages/orchestration-service/src/):

  • chat/index.js — Core prompt building and memory assembly
  • routes/ — HTTP endpoints: chat, sessions, projects, episodes, models, settings, summaries
  • services/ — Thin HTTP clients for memory, embedding, inference, and direct Qdrant access
  • config/settings.js — Loads/saves data/settings.json (user-tunable: model params, thresholds, system prompt)

Memory (packages/memory-service/src/):

  • db/schema.js — SQLite table definitions (source of truth for data model)
  • episodic/ — Episode CRUD
  • semantic/ — Qdrant operations
  • entities/ — Entity extraction + CRUD
  • summarization/ — Project summary generation

Shared (packages/shared/src/):

  • config/constants.js — All tunables (ports, thresholds, model names, vector size)
  • config/env.js — getEnv() helper with fallback to constants
  • utils.js — parseRow(), formatEpisodeText(), logger

Frontend (packages/chat-client/src/):

  • App.jsx — View router and top-level state (views: home, chat, all-chats, all-projects, project, memory, summaries, settings)
  • hooks/ — useChat, useSession, useModels, useProjects, useSettings, useContextMenu
  • api/orchestration.js — Fetch wrapper for all API calls
  • Vite proxy points to 192.168.0.205:4000 (Mini PC 2 / orchestration)

Configuration

Each service uses .env via dotenv, falling back to packages/shared/src/config/constants.js. The orchestration service also serves data/settings.json to the frontend via /settings — this is the single source of truth for user-facing inference parameters and system prompt.
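
The fallback chain can be sketched like this. Only the helper's name, getEnv(), is known, not its signature, so the version below illustrates the idea rather than reproducing the actual helper. The constant names and the explicit env argument are assumptions; the port values come from the services table.

```javascript
// Hypothetical defaults, standing in for packages/shared/src/config/constants.js.
const CONSTANTS = {
  MEMORY_PORT: "3002",
  EMBEDDING_PORT: "3003",
};

// Sketch of an env lookup with fallback: environment first, then constants.
// Passing env explicitly (defaulting to process.env) keeps the function testable.
function getEnv(name, env = process.env) {
  const value = env[name];
  if (value !== undefined && value !== "") return value;
  if (Object.prototype.hasOwnProperty.call(CONSTANTS, name)) return CONSTANTS[name];
  throw new Error(`No value or default for ${name}`);
}

console.log(getEnv("MEMORY_PORT", { MEMORY_PORT: "4002" })); // value from the environment
console.log(getEnv("EMBEDDING_PORT", {})); // falls back to the constant
```

Throwing on an unknown name is a design choice in this sketch: a misspelled variable fails at startup instead of silently yielding undefined.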

Deployment

Home lab across 3 nodes, managed with Docker Compose:

  • Main PC — RTX A4000 (inference via llama.cpp)
  • Mini PC 1 — memory + embedding services, Qdrant, Ollama
  • Mini PC 2 — orchestration + chat client, Caddy reverse proxy + Authelia SSO

Docker Compose files: docker-compose.mini1.yml, docker-compose.mini2.yml. All services expose /health. Deployment docs: docs/deployment/homelab.md.
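
Because every service exposes /health, a check across the nodes can be scripted. This is a minimal sketch: the ports come from the services table and the Mini PC 2 address from the Vite proxy note, while the other host names, the function names, the placement of inference-service on the Main PC, and the treatment of any 2xx response as healthy are assumptions.

```javascript
// Backend services and the node each is assumed to run on.
const SERVICES = [
  { name: "inference", node: "main", port: 3001 },
  { name: "memory", node: "mini1", port: 3002 },
  { name: "embedding", node: "mini1", port: 3003 },
  { name: "orchestration", node: "mini2", port: 4000 },
];

// Build one /health URL per service from a node -> host map.
function healthUrls(hosts) {
  return SERVICES.map(({ name, node, port }) => {
    if (!hosts[node]) throw new Error(`No host configured for node "${node}"`);
    return { name, url: `http://${hosts[node]}:${port}/health` };
  });
}

// Probe every URL; a network error counts as unhealthy rather than crashing.
// Requires a global fetch (Node 18+) unless fetchImpl is supplied.
async function checkAll(hosts, fetchImpl = fetch) {
  return Promise.all(
    healthUrls(hosts).map(async ({ name, url }) => {
      try {
        const res = await fetchImpl(url);
        return { name, url, healthy: res.ok };
      } catch {
        return { name, url, healthy: false };
      }
    })
  );
}

// Only the mini2 address appears in this file; the other two are placeholders.
const hosts = { main: "main-pc.lan", mini1: "mini1.lan", mini2: "192.168.0.205" };
console.log(healthUrls(hosts).map((target) => target.url));
```

Calling checkAll(hosts) resolves to one { name, url, healthy } record per service, which fits the layer-by-layer habit of confirming each backend before moving up the stack.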

Key Development Principles

  • Layer-by-layer validation — always build and test backend → orchestration → frontend in sequence, curl-testing each layer before proceeding
  • New orchestration routes require changes in four places: route file, orchestration-service/src/index.js, Caddyfile on Mini PC 2 (192.168.0.205), and vite.config.js in the chat client
  • All services read settings on every request — no restart required for config changes
  • Backend-first development — data layer → service endpoints → orchestration proxy → frontend