diff --git a/docs/roadmap.md b/docs/roadmap.md
new file mode 100644
index 0000000..1688416
--- /dev/null
+++ b/docs/roadmap.md
@@ -0,0 +1,227 @@
+# NexusAI — Master Roadmap
+
+> A modular, memory-centric AI assistant and personal second brain.
+> Built on Node.js, React/Vite, SQLite, Qdrant, and llama.cpp.
+> Repo: `https://gitea.jellystorm.com/storme/nexusAI`
+
+---
+
+## Current State (Completed)
+
+### Backend — Core Four Services
+- ✅ **Shared package** — `getEnv`, constants (`QDRANT`, `COLLECTIONS`, `EPISODIC`, `SERVICES`)
+- ✅ **Memory service** (port 3002, Mini PC 1) — SQLite schema (sessions, episodes, entities, relationships, summaries), FTS5 search, full CRUD endpoints, Qdrant semantic layer (3 collections), embedding write path
+- ✅ **Embedding service** (port 3003, Mini PC 1) — `nomic-embed-text` via Ollama, 768-dim vectors, `/embed` and `/embed/batch`
+- ✅ **Inference service** (port 3001, Main PC) — provider pattern (`INFERENCE_PROVIDER`), llama.cpp provider, `/complete` and `/complete/stream` (SSE)
+- ✅ **Orchestration service** (port 4000, Mini PC 2) — `/chat` and `/chat/stream`, session auto-create, dual-layer context assembly (recency + semantic), episode write-back
+
+### Memory System
+- ✅ Episodic memory — full conversation history in SQLite
+- ✅ Semantic memory — Qdrant vector search across episodes and entities
+- ✅ Entity extraction — background inference pass after each episode (qwen2.5:3b via Ollama)
+- ✅ Automatic summarization — triggered at context threshold, cumulative summary updates
+- ✅ Project memory isolation — project sessions fully isolated from each other and from non-project sessions
+
+### Chat Client
+- ✅ React/Vite frontend served via Caddy
+- ✅ Sidebar navigation — recent chats, projects, settings
+- ✅ Project management — CRUD, colour coding, isolated flag, ProjectView
+- ✅ Session management — auto-naming, project assignment, SessionModal
+- ✅ Streaming chat interface — SSE token-by-token rendering
+- ✅ Memory viewer — episode browsing, deletion, health panel
+- ✅ Settings panel — models section, configuration
+
+### Infrastructure
+- ✅ Caddy reverse proxy with Authelia SSO
+- ✅ Prometheus + Grafana monitoring (VRAM, CPU, RAM)
+- ✅ npm workspaces monorepo
+- ✅ Gitea self-hosted repo
+
+---
+
+## Phase 1 — Loose Ends & Stability
+*Target: Next development session (Saturday)*
+
+### Bug Fixes
+- [ ] **Entity extraction JSON parsing** — robustify response parser in `extraction.js` to handle model returning markdown fences or preamble around JSON
+- [ ] **Qdrant entity search empty results** — verify entities embedded post-isolation-fix are surfacing correctly in project session searches
+
+### Tech Debt
+- [ ] **Logging** — introduce `LOG_LEVEL` env var across all services; reduce noise in production
+- [ ] **Error response consistency** — audit all endpoints for uniform `{ error, detail }` shape
+- [ ] **Constants audit** — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
+- [ ] **Orchestration `chat/index.js` review** — extract any logic that has grown beyond its intended scope into dedicated modules
+
+---
+
+## Phase 2 — Memory System Upgrades
+*The core intelligence layer*
+
+### 1. Knowledge Graph (SQLite)
+The highest-leverage memory upgrade. Transforms NexusAI from "remembers conversations" to "understands relationships between things."
+- [ ] Graph schema — `nodes` and `edges` tables with typed relationships
+- [ ] Entity → node promotion pipeline
+- [ ] Relationship traversal queries
+- [ ] Graph-aware context assembly in orchestration
+
+### 2. Retrieval Fusion + Full-Text Search
+Multi-strategy retrieval merged into a single ranked result set.
+- [ ] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
+- [ ] Configurable weights per retrieval strategy
+- [ ] Score threshold tuning per collection
+
+### 3. Memory Consolidation Lifecycle
+Prevents long-term memory degradation and enables compression.
+- [ ] Episode aging — score/weight episodes by recency and access frequency
+- [ ] Consolidation pass — merge related low-weight episodes into summary nodes
+- [ ] Orphan cleanup — remove entities no longer referenced by active episodes
+
+### 4. User Preference Model
+Automatically maintained profile injected into every system prompt.
+- [ ] Preference schema — communication style, interests, known facts, tone preferences
+- [ ] Auto-update from conversation history
+- [ ] Manual override / review UI
+
+### 5. Confidence-Based Routing *(inspired by acid2lake)*
+Short-circuit simple requests before they reach the LLM.
+- [ ] Intent classifier in orchestration — categorise incoming messages
+- [ ] Confidence bands — FAST PATH (memory lookup only) vs FULL (LLM + context)
+- [ ] Fast-path handlers — direct memory queries, session lookups, factual recalls
+
+### 6. Smarter Context Assembly *(inspired by acid2lake)*
+Budget-aware context selection instead of dumping all relevant memory into the prompt.
+- [ ] Token budget manager in orchestration
+- [ ] Priority scoring — recency × relevance × entity weight
+- [ ] Configurable context budget via env var
+
+### 7. Procedural Memory Store *(inspired by acid2lake)*
+Learns "how NexusAI has successfully handled this type of request before."
+- [ ] Procedural memory schema — trigger pattern, steps, success count, confidence
+- [ ] Auto-population from successful interaction traces
+- [ ] Procedural context injection for matched request types
+
+### 8. Reflection / Self-Summarization
+NexusAI periodically reviews and synthesises its own memory.
+- [ ] Scheduled reflection pass — background job, configurable interval
+- [ ] Cross-session insight extraction
+- [ ] Summary nodes written back to knowledge graph
+- *Requires: Knowledge graph + consolidation lifecycle*
+
+### 9. Proactive Agent Loop
+The JARVIS moment — NexusAI reasons, plans, and acts across multiple steps.
+- [ ] Tool calling framework in orchestration
+- [ ] Built-in tools — memory search, entity lookup, summarize, web fetch
+- [ ] Reasoning loop — think → act → observe → respond
+- [ ] Agent mode toggle per session
+- *Requires: All Phase 2 items above*
+
+---
+
+## Phase 3 — Client Features
+*Making the daily driver experience excellent*
+
+### Core Chat Enhancements
+- [ ] Message regeneration — re-roll last AI response
+- [ ] Edit & resend — edit a previous message, clear subsequent history
+- [ ] Copy message button — hover icon per message
+- [ ] Message timestamps — subtle, toggleable
+- [ ] Token count display — per-response usage indicator
+
+### Memory Visibility
+- [ ] **"What I remember" panel** — show which episodes/entities were injected into context
+- [ ] Memory pinning — mark episodes as always-include
+- [ ] Session summary view — on-demand or auto-generated session summary
+- [ ] Memory attribution — subtle indicator on responses that were memory-informed
+
+### Session & Project Management
+- [ ] Session search — full-text search across all sessions
+- [ ] Session tagging — freeform tags beyond project assignment
+- [ ] Session export — download as markdown or JSON
+- [ ] Pinned sessions — pin frequently used sessions to sidebar top
+- [ ] Bulk session actions — delete, move to project
+
+### Model & Persona Controls *(high priority — circles back to companion origins)*
+- [ ] Per-session model switching — override default model per session
+- [ ] System prompt editor — per-session and per-project custom prompts
+- [ ] Persona profiles — saved configurations (model + system prompt + temperature)
+  - Examples: "Daily Driver", "Creative Mode", "Concise Mode", "Coding Mode"
+- [ ] Temperature / parameter sliders — collapsible panel for power users
+
+### Second Brain Features
+- [ ] **Quick capture** — minimal input to save a thought directly to memory without starting a chat
+- [ ] **Knowledge graph visualiser** — interactive node/edge view of entities and relationships
+- [ ] Memory search page — dedicated search UI across all episodes and entities
+- [ ] Daily digest — generated summary of recent activity and learned facts
+
+### Quality of Life
+- [ ] Keyboard shortcuts — `Ctrl+K` command palette, `Ctrl+Enter` to send
+- [ ] Dark/light theme toggle
+- [ ] Mobile layout polish — collapsible sidebar, touch-friendly inputs
+- [ ] Notification support — browser notifications for long completions
+
+---
+
+## Phase 4 — Coding Copilot
+*After core is feature-complete*
+
+### Project Directory Awareness
+- [ ] Directory watcher service — monitors a VS Code workspace for changes
+- [ ] Symbol indexer — AST parsing via Tree-sitter, file → symbol map in SQLite
+- [ ] Diagnostic indexer — compiler errors/warnings per file, triggered on save
+- [ ] Maps to existing project isolation — coding project = NexusAI project with `indexedDirectory` flag
+
+### Coding-Specific Memory
+- [ ] Procedural patterns per language/framework — stored in procedural memory layer
+- [ ] Skill compilation — successful coding solutions abstracted into reusable patterns
+- [ ] Codebase semantic search — embed code chunks into Qdrant, search by intent
+
+---
+
+## Phase 5 — Stretch Goals
+
+### Voice Layer
+- [ ] TTS output — text-to-speech for AI responses
+- [ ] STT input — speech-to-text for voice messages
+- [ ] Hardware-dependent — deferred until appropriate hardware available
+- *Architecturally clean addition — new input/output modality only*
+
+### Homelab Enhancements
+- [ ] Backup improvements — automated, verified backups of SQLite + Qdrant data
+- [ ] Security hardening — network segmentation, service-level auth
+- [ ] IP webcam integration
+- [ ] Home Assistant integration
+
+---
+
+## Architecture Reference
+
+### Services & Nodes
+
+| Service | Host | Port | Role |
+|---|---|---|---|
+| Inference | Main PC `192.168.0.79` | 3001 | llama.cpp provider, `/complete`, `/complete/stream` |
+| Memory | Mini PC 1 `192.168.0.81` | 3002 | SQLite, episode/entity/summary CRUD |
+| Embedding | Mini PC 1 `192.168.0.81` | 3003 | nomic-embed-text via Ollama, vector generation |
+| Qdrant | Mini PC 1 `192.168.0.81` | 6333 | Vector store — episodes, entities, summaries collections |
+| Orchestration | Hub `192.168.0.205` | 4000 | Chat pipeline, context assembly, session management |
+| Chat Client | Hub `192.168.0.205` | — | React/Vite, served via Caddy |
+| Caddy + Authelia | Hub `192.168.0.205` | 443 | Reverse proxy, SSO |
+
+### Primary Models
+
+| Role | Model | Notes |
+|---|---|---|
+| Daily driver | Gemma 4 26B Claude Distill APEX I-Mini | `--reasoning off` flag critical |
+| Creative/worldbuilding | Gemma 4 21B REAP Q5_K_M | |
+| Coding | DeepSeek Coder V2 Lite Instruct Q6_K | |
+| Background tasks | qwen2.5:3b via Ollama | Entity extraction, summarization |
+
+### Key Design Principles
+- **Layer-by-layer validation** — backend → orchestration → frontend, curl-test each layer
+- **Fire-and-forget async** — embedding and entity extraction never block the chat response
+- **All services read settings on every request** — no restart required for config changes
+- **Backend-first development** — data layer → endpoints → orchestration proxy → frontend
+
+---
+
+*Last updated: April 2026*
\ No newline at end of file
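
Review note on Phase 2, item 2 (Retrieval Fusion): the Reciprocal Rank Fusion step named in the roadmap can be illustrated with a short Node.js sketch. It assumes each retriever (Qdrant semantic search, FTS5 keyword search) has already returned a list of episode IDs ordered best-first. The function name `reciprocalRankFusion`, the `weights` option, the sample IDs, and the smoothing constant `k = 60` (a commonly used default) are illustrative assumptions, not code from the repository.

```javascript
// Hypothetical helper, not part of the NexusAI codebase.
// rankedLists: array of arrays of result IDs, each ordered best-first.
// options.k: smoothing constant; options.weights: optional per-list weights.
function reciprocalRankFusion(rankedLists, { k = 60, weights = [] } = {}) {
  const scores = new Map();

  rankedLists.forEach((list, listIndex) => {
    // Lists without an explicit weight count fully (weight 1).
    const weight = weights[listIndex] ?? 1;
    list.forEach((id, position) => {
      // Ranks are 1-based, so the top hit contributes weight / (k + 1).
      const contribution = weight / (k + position + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  });

  // Highest fused score first.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example with made-up episode IDs.
const semanticHits = ['ep-12', 'ep-7', 'ep-3'];
const keywordHits = ['ep-7', 'ep-44', 'ep-12'];
const fused = reciprocalRankFusion([semanticHits, keywordHits]);
console.log(fused.map((r) => r.id)); // [ 'ep-7', 'ep-12', 'ep-44', 'ep-3' ]
```

Because RRF works on ranks rather than raw scores, Qdrant similarity values and FTS5 rank values do not need to be normalised against each other. The per-list weight is one possible way to cover the "configurable weights per retrieval strategy" item.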
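
Review note on the Phase 1 bug fix "Entity extraction JSON parsing": one possible shape for a more tolerant parser is sketched below. The real parser in `extraction.js` is not part of this diff, so the function name `parseModelJson` and its return convention (`null` on failure) are assumptions made for illustration only. The sketch first looks for a fenced block, then falls back to the span between the first opening bracket and the last matching closing bracket.

```javascript
// Hypothetical helper; the actual parser in extraction.js is not shown in this diff.
// Returns the parsed value, or null when no valid JSON can be recovered.
function parseModelJson(raw) {
  const text = String(raw).trim();

  // \u0060 is the backtick character; this matches a triple-backtick fence,
  // optionally tagged "json", and captures what is inside it.
  const fenced = text.match(/\u0060{3}(?:json)?\s*([\s\S]*?)\u0060{3}/i);
  const candidate = fenced ? fenced[1] : text;

  // Skip any preamble: start at the first "{" or "[".
  const start = candidate.search(/[\[{]/);
  if (start === -1) return null;

  // Drop trailing chatter: stop at the last matching closing bracket.
  const closer = candidate[start] === '{' ? '}' : ']';
  const end = candidate.lastIndexOf(closer);
  if (end <= start) return null;

  try {
    return JSON.parse(candidate.slice(start, end + 1));
  } catch {
    return null;
  }
}

// Example: a reply with preamble and a fenced JSON body.
const fence = '\u0060'.repeat(3);
const reply = 'Here are the entities:\n' + fence + 'json\n{"entities":[{"name":"Qdrant"}]}\n' + fence;
console.log(parseModelJson(reply));
```

Returning `null` instead of throwing fits the fire-and-forget principle stated in the roadmap: a failed extraction can be logged and skipped without affecting the chat response. Preamble that itself contains braces would still defeat this sketch.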