# NexusAI — Master Roadmap

> A modular, memory-centric AI assistant and personal second brain.
> Built on Node.js, React/Vite, SQLite, Qdrant, and llama.cpp.
> Repo: `https://gitea.jellystorm.com/storme/nexusAI`

---

## Current State (Completed)

### Backend — Core Four Services
- ✅ **Shared package** — `getEnv`, constants (`QDRANT`, `COLLECTIONS`, `EPISODIC`, `SERVICES`)
- ✅ **Memory service** (port 3002, Mini PC 1) — SQLite schema (sessions, episodes, entities, relationships, summaries), FTS5 search, full CRUD endpoints, Qdrant semantic layer (3 collections), embedding write path
- ✅ **Embedding service** (port 3003, Mini PC 1) — `nomic-embed-text` via Ollama, 768-dim vectors, `/embed` and `/embed/batch`
- ✅ **Inference service** (port 3001, Main PC) — provider pattern (`INFERENCE_PROVIDER`), llama.cpp provider, `/complete` and `/complete/stream` (SSE)
- ✅ **Orchestration service** (port 4000, Mini PC 2) — `/chat` and `/chat/stream`, session auto-create, dual-layer context assembly (recency + semantic), episode write-back

### Memory System
- ✅ Episodic memory — full conversation history in SQLite
- ✅ Semantic memory — Qdrant vector search across episodes and entities
- ✅ Entity extraction — background inference pass after each episode (qwen2.5:3b via Ollama)
- ✅ Automatic summarization — triggered at context threshold, cumulative summary updates
- ✅ Project memory isolation — project sessions fully isolated from each other and from non-project sessions

### Chat Client
- ✅ React/Vite frontend served via Caddy
- ✅ Sidebar navigation — recent chats, projects, settings
- ✅ Project management — CRUD, colour coding, isolated flag, ProjectView
- ✅ Session management — auto-naming, project assignment, SessionModal
- ✅ Streaming chat interface — SSE token-by-token rendering
- ✅ Memory viewer — episode browsing, deletion, health panel
- ✅ Settings panel — models section, configuration

### Infrastructure
- ✅ Caddy reverse proxy with Authelia SSO
- ✅ Prometheus + Grafana monitoring (VRAM, CPU, RAM)
- ✅ npm workspaces monorepo
- ✅ Gitea self-hosted repo

---

## Phase 1 — Loose Ends & Stability - COMPLETE ✅
*Target: Next development session (Saturday)*

### Bug Fixes
- ✅ **Entity extraction JSON parsing** — robustify response parser in `extraction.js` to handle model returning markdown fences or preamble around JSON
- ✅ **Qdrant entity search empty results** — verify entities embedded post-isolation-fix are surfacing correctly in project session searches

### Tech Debt
- ✅ **Logging** — introduce `LOG_LEVEL` env var across all services; reduce noise in production
- ✅ **Error response consistency** — audit all endpoints for uniform `{ error, detail }` shape
- ✅ **Constants audit** — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
- ✅ **Orchestration `chat/index.js` review** — extract any logic that has grown beyond its intended scope into dedicated modules

---

## Phase 2 — Memory System Upgrades
*The core intelligence layer*

### 1. Knowledge Graph (SQLite)
The highest-leverage memory upgrade. Transforms NexusAI from "remembers conversations" to "understands relationships between things."
- [x] Graph schema — `nodes` and `edges` tables with typed relationships
- [x] Entity → node promotion pipeline (`mention_count` tracked; threshold gating deferred to Phase 2)
- [x] Relationship traversal queries
- [x] Graph-aware context assembly in orchestration

### 2. Retrieval Fusion + Full-Text Search ✅
Multi-strategy retrieval merged into a single ranked result set.
- [x] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
- [x] Configurable weights per retrieval strategy (`semanticWeight`, `keywordWeight` via `PATCH /settings`)
- [x] Score threshold retained per-strategy; FTS scoped to session/project sessions; `keywordWeight: 0` default (disabled until tuned)

### 3. Memory Consolidation Lifecycle
Prevents long-term memory degradation and enables compression.
- [ ] Episode aging — score/weight episodes by recency and access frequency
- [ ] Consolidation pass — merge related low-weight episodes into summary nodes
- [ ] Orphan cleanup — remove entities no longer referenced by active episodes

### 4. User Preference Model
Automatically maintained profile injected into every system prompt.
- [ ] Preference schema — communication style, interests, known facts, tone preferences
- [ ] Auto-update from conversation history
- [ ] Manual override / review UI

### 5. Confidence-Based Routing *(inspired by acid2lake)*
Short-circuit simple requests before they reach the LLM.
- [ ] Intent classifier in orchestration — categorise incoming messages
- [ ] Confidence bands — FAST PATH (memory lookup only) vs FULL (LLM + context)
- [ ] Fast-path handlers — direct memory queries, session lookups, factual recalls

### 6. Smarter Context Assembly *(inspired by acid2lake)*
Budget-aware context selection instead of dumping all relevant memory into the prompt.
- [ ] Token budget manager in orchestration
- [ ] Priority scoring — recency × relevance × entity weight
- [ ] Configurable context budget via env var

### 7. Procedural Memory Store *(inspired by acid2lake)*
Learns "how NexusAI has successfully handled this type of request before."
- [ ] Procedural memory schema — trigger pattern, steps, success count, confidence
- [ ] Auto-population from successful interaction traces
- [ ] Procedural context injection for matched request types

### 8. Reflection / Self-Summarization
NexusAI periodically reviews and synthesises its own memory.
- [ ] Scheduled reflection pass — background job, configurable interval
- [ ] Cross-session insight extraction
- [ ] Summary nodes written back to knowledge graph
- *Requires: Knowledge graph + consolidation lifecycle*

### 9. Proactive Agent Loop
The JARVIS moment — NexusAI reasons, plans, and acts across multiple steps.
- [ ] Tool calling framework in orchestration
- [ ] Built-in tools — memory search, entity lookup, summarize, web fetch
- [ ] Reasoning loop — think → act → observe → respond
- [ ] Agent mode toggle per session
- *Requires: All Phase 2 items above*

---

## Phase 3 — Client Features
*Making the daily driver experience excellent*

### Core Chat Enhancements
- [ ] Message regeneration — re-roll last AI response
- [ ] Edit & resend — edit a previous message, clear subsequent history
- [ ] Copy message button — hover icon per message
- [ ] Message timestamps — subtle, toggleable
- [ ] Token count display — per-response usage indicator

### Memory Visibility
- [ ] **"What I remember" panel** — show which episodes/entities were injected into context
- [ ] Memory pinning — mark episodes as always-include
- [x] Session summary view — on-demand or auto-generated session summary
- [ ] Memory attribution — subtle indicator on responses that were memory-informed

### Session & Project Management
- [ ] Session search — full-text search across all sessions
- [ ] Session tagging — freeform tags beyond project assignment
- [ ] Session export — download as markdown or JSON
- [ ] Pinned sessions — pin frequently used sessions to sidebar top
- [ ] Bulk session actions — delete, move to project

### Model & Persona Controls *(high priority — circles back to companion origins)*
- [ ] Per-session model switching — override default model per session
- [x] System prompt editor — per-project custom prompts
- [ ] System prompt editor — per-session custom prompts
- [ ] Persona profiles — saved configurations (model + system prompt + temperature)
  - Examples: "Daily Driver", "Creative Mode", "Concise Mode", "Coding Mode"
- [ ] Temperature / parameter sliders — collapsible panel for power users

### Second Brain Features
- [ ] **Quick capture** — minimal input to save a thought directly to memory without starting a chat
- [ ] **Knowledge graph visualiser** — interactive node/edge view of entities and relationships
- [ ] Memory search page — dedicated search UI across all episodes and entities
- [ ] Daily digest — generated summary of recent activity and learned facts

### Quality of Life
- [ ] Keyboard shortcuts — `Ctrl+K` command palette, `Ctrl+Enter` to send
- [ ] Dark/light theme toggle
- [ ] Mobile layout polish — collapsible sidebar, touch-friendly inputs
- [ ] Notification support — browser notifications for long completions

---

## Phase 4 — Coding Copilot
*After core is feature-complete*

### Project Directory Awareness
- [ ] Directory watcher service — monitors a VS Code workspace for changes
- [ ] Symbol indexer — AST parsing via Tree-sitter, file → symbol map in SQLite
- [ ] Diagnostic indexer — compiler errors/warnings per file, triggered on save
- [ ] Maps to existing project isolation — coding project = NexusAI project with `indexedDirectory` flag

### Coding-Specific Memory
- [ ] Procedural patterns per language/framework — stored in procedural memory layer
- [ ] Skill compilation — successful coding solutions abstracted into reusable patterns
- [ ] Codebase semantic search — embed code chunks into Qdrant, search by intent

---

## Phase 5 — Stretch Goals

### Voice Layer
- [ ] TTS output — text-to-speech for AI responses
- [ ] STT input — speech-to-text for voice messages
- [ ] Hardware-dependent — deferred until appropriate hardware available
- *Architecturally clean addition — new input/output modality only*

### Homelab Enhancements
- [ ] Backup improvements — automated, verified backups of SQLite + Qdrant data
- [ ] Security hardening — network segmentation, service-level auth
- [ ] IP webcam integration
- [ ] Home Assistant integration

---

## Architecture Reference

### Services & Nodes

| Service | Host | Port | Role |
|---|---|---|---|
| Inference | Main PC `192.168.0.79` | 3001 | llama.cpp provider, `/complete`, `/complete/stream` |
| Memory | Mini PC 1 `192.168.0.81` | 3002 | SQLite, episode/entity/summary CRUD |
| Embedding | Mini PC 1 `192.168.0.81` | 3003 | nomic-embed-text via Ollama, vector generation |
| Qdrant | Mini PC 1 `192.168.0.81` | 6333 | Vector store — episodes, entities, summaries collections |
| Orchestration | Hub `192.168.0.205` | 4000 | Chat pipeline, context assembly, session management |
| Chat Client | Hub `192.168.0.205` | — | React/Vite, served via Caddy |
| Caddy + Authelia | Hub `192.168.0.205` | 443 | Reverse proxy, SSO |

### Primary Models

| Role | Model | Notes |
|---|---|---|
| Daily driver | Gemma 4 26B Claude Distill APEX I-Mini | `--reasoning off` flag critical |
| Creative/worldbuilding | Gemma 4 21B REAP Q5_K_M | |
| Coding | DeepSeek Coder V2 Lite Instruct Q6_K | |
| Background tasks | qwen2.5:3b via Ollama | Entity extraction, summarization |

### Key Design Principles
- **Layer-by-layer validation** — backend → orchestration → frontend, curl-test each layer
- **Fire-and-forget async** — embedding and entity extraction never block the chat response
- **All services read settings on every request** — no restart required for config changes
- **Backend-first development** — data layer → endpoints → orchestration proxy → frontend

---

*Last updated: April 2026*
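
The fire-and-forget principle listed under Key Design Principles can be sketched roughly as below. This is a minimal illustration, not the project's actual code: `fireAndForget`, `handleChat`, and the `deps` object with `complete`, `embedEpisode`, and `extractEntities` are hypothetical names chosen for the example.

```javascript
// Hypothetical sketch: none of these names are taken from the NexusAI codebase.

// Default error sink: log and move on, so a background failure never
// propagates into the request that scheduled it.
const logBackgroundError = (label, err) => {
  console.error(`[background:${label}]`, err);
};

// Schedule a task without awaiting it. Promise.resolve().then(task) defers
// the task to a microtask, so both synchronous throws and async rejections
// are routed to onError instead of reaching the caller.
function fireAndForget(label, task, onError = logBackgroundError) {
  Promise.resolve()
    .then(task)
    .catch((err) => onError(label, err));
}

// Assumed shape of a chat handler: only the completion is awaited; the
// embedding and entity-extraction write-backs are scheduled and left to
// finish on their own.
async function handleChat(prompt, deps) {
  const reply = await deps.complete(prompt);
  fireAndForget("embed", () => deps.embedEpisode(reply));
  fireAndForget("extract", () => deps.extractEntities(reply));
  return reply;
}
```

With this shape, a slow or failing `extractEntities` call only produces a log line; the reply has already been returned by the time the background work settles.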