nexusAI/docs/roadmap.md
2026-04-27 20:17:05 -07:00
# NexusAI — Master Roadmap
> A modular, memory-centric AI assistant and personal second brain.
> Built on Node.js, React/Vite, SQLite, Qdrant, and llama.cpp.
> Repo: `https://gitea.jellystorm.com/storme/nexusAI`
---
## Current State (Completed)
### Backend — Core Four Services
- ✅ **Shared package** — `getEnv`, constants (`QDRANT`, `COLLECTIONS`, `EPISODIC`, `SERVICES`)
- ✅ **Memory service** (port 3002, Mini PC 1) — SQLite schema (sessions, episodes, entities, relationships, summaries), FTS5 search, full CRUD endpoints, Qdrant semantic layer (3 collections), embedding write path
- ✅ **Embedding service** (port 3003, Mini PC 1) — `nomic-embed-text` via Ollama, 768-dim vectors, `/embed` and `/embed/batch`
- ✅ **Inference service** (port 3001, Main PC) — provider pattern (`INFERENCE_PROVIDER`), llama.cpp provider, `/complete` and `/complete/stream` (SSE)
- ✅ **Orchestration service** (port 4000, Mini PC 2) — `/chat` and `/chat/stream`, session auto-create, dual-layer context assembly (recency + semantic), episode write-back
### Memory System
- ✅ Episodic memory — full conversation history in SQLite
- ✅ Semantic memory — Qdrant vector search across episodes and entities
- ✅ Entity extraction — background inference pass after each episode (qwen2.5:3b via Ollama)
- ✅ Automatic summarization — triggered at context threshold, cumulative summary updates
- ✅ Project memory isolation — project sessions fully isolated from each other and from non-project sessions
### Chat Client
- ✅ React/Vite frontend served via Caddy
- ✅ Sidebar navigation — recent chats, projects, settings
- ✅ Project management — CRUD, colour coding, isolated flag, ProjectView
- ✅ Session management — auto-naming, project assignment, SessionModal
- ✅ Streaming chat interface — SSE token-by-token rendering
- ✅ Memory viewer — episode browsing, deletion, health panel
- ✅ Settings panel — models section, configuration
### Infrastructure
- ✅ Caddy reverse proxy with Authelia SSO
- ✅ Prometheus + Grafana monitoring (VRAM, CPU, RAM)
- ✅ npm workspaces monorepo
- ✅ Gitea self-hosted repo
---
## Phase 1 — Loose Ends & Stability - COMPLETE ✅
*Target: Next development session (Saturday)*
### Bug Fixes
- [x] **Entity extraction JSON parsing** — harden the response parser in `extraction.js` to handle the model returning markdown fences or preamble around the JSON
- [x] **Qdrant entity search empty results** — verify that entities embedded after the isolation fix surface correctly in project session searches
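
The fence-and-preamble problem described above can be illustrated with a small tolerant parser. This is only a sketch: `extractJson` is a hypothetical helper name, and the actual logic in `extraction.js` may differ.

```javascript
// Hypothetical helper; not the actual code in extraction.js.
const FENCE = '`'.repeat(3); // three backticks, built at runtime
const FENCED_BLOCK = new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE, 'i');

function extractJson(raw) {
  // Prefer the contents of a fenced block when the model emitted one.
  const fenced = raw.match(FENCED_BLOCK);
  const text = (fenced ? fenced[1] : raw).trim();

  // Try each opening brace or bracket as a start, paired with the last matching closer.
  for (let start = 0; start < text.length; start++) {
    const open = text[start];
    if (open !== '{' && open !== '[') continue;
    const end = text.lastIndexOf(open === '{' ? '}' : ']');
    if (end <= start) continue;
    try {
      return JSON.parse(text.slice(start, end + 1));
    } catch {
      // Not valid JSON from this start position; try the next candidate.
    }
  }
  return null; // caller decides whether to retry or skip the episode
}
```

Returning `null` instead of throwing keeps the background extraction pass from failing loudly on a single malformed response.
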
### Tech Debt
- [x] **Logging** — introduce a `LOG_LEVEL` env var across all services; reduce noise in production
- [x] **Error response consistency** — audit all endpoints for a uniform `{ error, detail }` shape
- [x] **Constants audit** — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
- [x] **Orchestration `chat/index.js` review** — extract any logic that has grown beyond its intended scope into dedicated modules
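
A minimal sketch of the logging and error-shape items, assuming a four-level scheme. `createLogger`, `errorBody`, and the level names are illustrative, not the shared package's actual exports; in a service the first argument would typically come from `process.env.LOG_LEVEL`.

```javascript
// Assumed level ordering; lower rank means more severe.
const LEVELS = { error: 0, warn: 1, info: 2, debug: 3 };

// Hypothetical factory. Unknown level names fall back to "info".
function createLogger(levelName = 'info', sink = console.log) {
  const threshold = LEVELS[String(levelName).toLowerCase()] ?? LEVELS.info;
  const logger = {};
  for (const [name, rank] of Object.entries(LEVELS)) {
    logger[name] = (...args) => {
      // Emit only messages at or above the configured severity.
      if (rank <= threshold) sink(`[${name}]`, ...args);
    };
  }
  return logger;
}

// Uniform error body matching the `{ error, detail }` shape.
function errorBody(error, detail = null) {
  return { error, detail };
}
```
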
---
## Phase 2 — Memory System Upgrades
*The core intelligence layer*
### 1. Knowledge Graph (SQLite) ✅
The highest-leverage memory upgrade. Transforms NexusAI from "remembers conversations" to "understands relationships between things."
- [x] Graph schema — `nodes` and `edges` tables with typed relationships
- [x] Entity → node promotion pipeline (`mention_count` tracked; threshold gating deferred to Phase 2)
- [x] Relationship traversal queries
- [x] Graph-aware context assembly in orchestration
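
To make the traversal idea concrete, here is a sketch of a bounded breadth-first walk over rows shaped like the `nodes` and `edges` tables. The column names (`id`, `name`, `source`, `target`, `type`) and the sample relationships are assumptions for illustration, and the real queries presumably run in SQLite rather than in memory.

```javascript
// Sample rows; column names and relationship types are assumed.
const sampleNodes = [
  { id: 1, name: 'NexusAI' },
  { id: 2, name: 'Qdrant' },
  { id: 3, name: 'nomic-embed-text' },
  { id: 4, name: 'Caddy' },
];
const sampleEdges = [
  { source: 1, target: 2, type: 'uses' },
  { source: 2, target: 3, type: 'stores_vectors_from' },
  { source: 1, target: 4, type: 'served_by' },
];

// Breadth-first traversal over outgoing edges, up to maxDepth hops from startId.
function traverse(edges, startId, maxDepth) {
  const seen = new Set([startId]);
  let frontier = [startId];
  const found = [];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const edge of edges) {
      if (frontier.includes(edge.source) && !seen.has(edge.target)) {
        seen.add(edge.target);
        next.push(edge.target);
        found.push({ id: edge.target, via: edge.type, depth });
      }
    }
    frontier = next;
  }
  return found;
}
```

Capping the depth keeps graph-aware context assembly from pulling in loosely related entities several hops away.
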
### 2. Retrieval Fusion + Full-Text Search ✅
Multi-strategy retrieval merged into a single ranked result set.
- [x] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
- [x] Configurable weights per retrieval strategy (`semanticWeight`, `keywordWeight` via `PATCH /settings`)
- [x] Score threshold retained per-strategy; FTS scoped to session/project sessions; `keywordWeight: 0` default (disabled until tuned)
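
As a rough illustration of the fusion step, the sketch below applies weighted Reciprocal Rank Fusion to two ranked ID lists. The function name `fuseRankings`, the input shape, and the constant `k = 60` (a value commonly used with RRF) are assumptions; only the `semanticWeight` and `keywordWeight` settings come from this roadmap.

```javascript
// Weighted RRF: score(id) = sum over strategies of weight / (k + rank), rank starting at 1.
function fuseRankings(rankings, k = 60) {
  const scores = new Map();
  for (const { ids, weight } of rankings) {
    if (weight <= 0) continue; // a zero weight disables the strategy entirely
    ids.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

// Example call with both strategies enabled (IDs are placeholders).
const fused = fuseRankings([
  { ids: ['a', 'b', 'c'], weight: 1 }, // semantic (Qdrant) order, semanticWeight
  { ids: ['c', 'a', 'd'], weight: 1 }, // keyword (FTS5) order, keywordWeight
]);
```

With `keywordWeight: 0`, the fused order reduces to the semantic order, which matches the "disabled until tuned" default.
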
### 3. Memory Consolidation Lifecycle
Prevents long-term memory degradation and enables compression.
- [ ] Episode aging — score/weight episodes by recency and access frequency
- [ ] Consolidation pass — merge related low-weight episodes into summary nodes
- [ ] Orphan cleanup — remove entities no longer referenced by active episodes
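
One possible shape for the aging score, sketched under assumptions: exponential decay by time since last access, boosted by a log-damped access count. The field names (`lastAccessedMs`, `accessCount`), the 30-day half-life, and the 0.25 threshold are all illustrative, since this item is not yet designed.

```javascript
const MS_PER_DAY = 86400000;

// Hypothetical weight: halves every `halfLifeDays`, scaled up by how often the episode was used.
function episodeWeight(episode, nowMs, halfLifeDays = 30) {
  const ageDays = Math.max(0, (nowMs - episode.lastAccessedMs) / MS_PER_DAY);
  const recency = Math.pow(0.5, ageDays / halfLifeDays);
  const frequency = 1 + Math.log1p(episode.accessCount);
  return recency * frequency;
}

// Episodes whose weight falls below the threshold become candidates for the consolidation pass.
function consolidationCandidates(episodes, nowMs, threshold = 0.25) {
  return episodes
    .filter((episode) => episodeWeight(episode, nowMs) < threshold)
    .map((episode) => episode.id);
}
```

The logarithm keeps a heavily accessed episode from becoming effectively immortal while still letting frequent use delay consolidation.
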
### 4. User Preference Model
Automatically maintained profile injected into every system prompt.
- [ ] Preference schema — communication style, interests, known facts, tone preferences
- [ ] Auto-update from conversation history
- [ ] Manual override / review UI
### 5. Confidence-Based Routing *(inspired by acid2lake)*
Short-circuit simple requests before they reach the LLM.
- [ ] Intent classifier in orchestration — categorise incoming messages
- [ ] Confidence bands — FAST PATH (memory lookup only) vs FULL (LLM + context)
- [ ] Fast-path handlers — direct memory queries, session lookups, factual recalls
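
The banding decision could look roughly like the sketch below. The intent labels, the `{ intent, confidence }` classifier output, and the 0.85 threshold are assumptions; the classifier itself is out of scope here.

```javascript
// Hypothetical intents that can be answered from memory alone.
const FAST_PATH_INTENTS = new Set(['memory_lookup', 'session_lookup', 'factual_recall']);

// Route to the fast path only when the intent is eligible AND the classifier is confident.
function chooseRoute(classification, fastPathThreshold = 0.85) {
  const { intent, confidence } = classification;
  if (FAST_PATH_INTENTS.has(intent) && confidence >= fastPathThreshold) {
    return 'FAST_PATH';
  }
  return 'FULL'; // default: LLM plus assembled context
}
```

Defaulting to `FULL` means a misclassification costs latency rather than answer quality.
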
### 6. Smarter Context Assembly *(inspired by acid2lake)*
Budget-aware context selection instead of dumping all relevant memory into the prompt.
- [ ] Token budget manager in orchestration
- [ ] Priority scoring — recency × relevance × entity weight
- [ ] Configurable context budget via env var
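
A sketch of how a budget manager might combine the priority score with a token limit. The greedy strategy, the candidate fields (`recency`, `relevance`, `entityWeight`, `text`), and the four-characters-per-token estimate are assumptions; in a service the budget would be read from an env var whose name is not yet defined.

```javascript
// Rough estimate only (about 4 characters per token); a real tokenizer would be more accurate.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Greedy selection: highest priority first, skipping any item that no longer fits the budget.
function assembleContext(candidates, budgetTokens) {
  const ranked = candidates
    .map((c) => ({ ...c, priority: c.recency * c.relevance * c.entityWeight }))
    .sort((a, b) => b.priority - a.priority);

  const selected = [];
  let used = 0;
  for (const item of ranked) {
    const cost = estimateTokens(item.text);
    if (used + cost > budgetTokens) continue; // too large; a smaller item may still fit
    selected.push(item);
    used += cost;
  }
  return { selected, used };
}
```

Skipping rather than stopping at the first oversized item lets short, lower-priority memories fill the remaining budget.
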
### 7. Procedural Memory Store *(inspired by acid2lake)*
Learns "how NexusAI has successfully handled this type of request before."
- [ ] Procedural memory schema — trigger pattern, steps, success count, confidence
- [ ] Auto-population from successful interaction traces
- [ ] Procedural context injection for matched request types
### 8. Reflection / Self-Summarization
NexusAI periodically reviews and synthesises its own memory.
- [ ] Scheduled reflection pass — background job, configurable interval
- [ ] Cross-session insight extraction
- [ ] Summary nodes written back to knowledge graph
- *Requires: Knowledge graph + consolidation lifecycle*
### 9. Proactive Agent Loop
The JARVIS moment — NexusAI reasons, plans, and acts across multiple steps.
- [ ] Tool calling framework in orchestration
- [ ] Built-in tools — memory search, entity lookup, summarize, web fetch
- [ ] Reasoning loop — think → act → observe → respond
- [ ] Agent mode toggle per session
- *Requires: All Phase 2 items above*
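
The think → act → observe → respond loop can be sketched as follows. Everything here is hypothetical: the `tools` registry, the decision shape returned by `model`, and the step limit are placeholders for a design that does not exist yet.

```javascript
// Hypothetical tool registry; real tools would call the memory service.
const tools = {
  memory_search: async ({ query }) => `stub result for "${query}"`,
};

// `model` is any async function that takes the transcript and returns either
// { type: 'tool', name, args } or { type: 'final', text }.
async function runAgent(model, userMessage, maxSteps = 5) {
  const transcript = [{ role: 'user', content: userMessage }];
  for (let step = 0; step < maxSteps; step++) {
    const decision = await model(transcript); // think
    if (decision.type === 'final') return decision.text; // respond

    const tool = tools[decision.name];
    const observation = tool
      ? await tool(decision.args) // act
      : `unknown tool: ${decision.name}`;
    // observe: feed the result back so the next "think" step can use it
    transcript.push({ role: 'tool', name: decision.name, content: observation });
  }
  return 'Stopped: step limit reached.';
}
```

The step limit is the safety valve: a model that never produces a final answer cannot loop indefinitely.
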
---
## Phase 3 — Client Features
*Making the daily driver experience excellent*
### Core Chat Enhancements
- [ ] Message regeneration — re-roll last AI response
- [ ] Edit & resend — edit a previous message, clear subsequent history
- [ ] Copy message button — hover icon per message
- [ ] Message timestamps — subtle, toggleable
- [ ] Token count display — per-response usage indicator
### Memory Visibility
- [ ] **"What I remember" panel** — show which episodes/entities were injected into context
- [ ] Memory pinning — mark episodes as always-include
- [x] Session summary view — on-demand or auto-generated session summary
- [ ] Memory attribution — subtle indicator on responses that were memory-informed
### Session & Project Management
- [ ] Session search — full-text search across all sessions
- [ ] Session tagging — freeform tags beyond project assignment
- [ ] Session export — download as markdown or JSON
- [ ] Pinned sessions — pin frequently used sessions to sidebar top
- [ ] Bulk session actions — delete, move to project
### Model & Persona Controls *(high priority — circles back to companion origins)*
- [ ] Per-session model switching — override default model per session
- [x] System prompt editor — per-project custom prompts
- [ ] System prompt editor — per-session custom prompts
- [ ] Persona profiles — saved configurations (model + system prompt + temperature)
  - Examples: "Daily Driver", "Creative Mode", "Concise Mode", "Coding Mode"
- [ ] Temperature / parameter sliders — collapsible panel for power users
### Second Brain Features
- [ ] **Quick capture** — minimal input to save a thought directly to memory without starting a chat
- [ ] **Knowledge graph visualiser** — interactive node/edge view of entities and relationships
- [ ] Memory search page — dedicated search UI across all episodes and entities
- [ ] Daily digest — generated summary of recent activity and learned facts
### Quality of Life
- [ ] Keyboard shortcuts — `Ctrl+K` command palette, `Ctrl+Enter` to send
- [ ] Dark/light theme toggle
- [ ] Mobile layout polish — collapsible sidebar, touch-friendly inputs
- [ ] Notification support — browser notifications for long completions
---
## Phase 4 — Coding Copilot
*After core is feature-complete*
### Project Directory Awareness
- [ ] Directory watcher service — monitors a VS Code workspace for changes
- [ ] Symbol indexer — AST parsing via Tree-sitter, file → symbol map in SQLite
- [ ] Diagnostic indexer — compiler errors/warnings per file, triggered on save
- [ ] Maps to existing project isolation — coding project = NexusAI project with `indexedDirectory` flag
### Coding-Specific Memory
- [ ] Procedural patterns per language/framework — stored in procedural memory layer
- [ ] Skill compilation — successful coding solutions abstracted into reusable patterns
- [ ] Codebase semantic search — embed code chunks into Qdrant, search by intent
---
## Phase 5 — Stretch Goals
### Voice Layer
- [ ] TTS output — text-to-speech for AI responses
- [ ] STT input — speech-to-text for voice messages
- [ ] Hardware-dependent — deferred until appropriate hardware available
- *Architecturally clean addition — new input/output modality only*
### Homelab Enhancements
- [ ] Backup improvements — automated, verified backups of SQLite + Qdrant data
- [ ] Security hardening — network segmentation, service-level auth
- [ ] IP webcam integration
- [ ] Home Assistant integration
---
## Architecture Reference
### Services & Nodes
| Service | Host | Port | Role |
|---|---|---|---|
| Inference | Main PC `192.168.0.79` | 3001 | llama.cpp provider, `/complete`, `/complete/stream` |
| Memory | Mini PC 1 `192.168.0.81` | 3002 | SQLite, episode/entity/summary CRUD |
| Embedding | Mini PC 1 `192.168.0.81` | 3003 | nomic-embed-text via Ollama, vector generation |
| Qdrant | Mini PC 1 `192.168.0.81` | 6333 | Vector store — episodes, entities, summaries collections |
| Orchestration | Hub `192.168.0.205` | 4000 | Chat pipeline, context assembly, session management |
| Chat Client | Hub `192.168.0.205` | — | React/Vite, served via Caddy |
| Caddy + Authelia | Hub `192.168.0.205` | 443 | Reverse proxy, SSO |
### Primary Models
| Role | Model | Notes |
|---|---|---|
| Daily driver | Gemma 4 26B Claude Distill APEX I-Mini | `--reasoning off` flag critical |
| Creative/worldbuilding | Gemma 4 21B REAP Q5_K_M | |
| Coding | DeepSeek Coder V2 Lite Instruct Q6_K | |
| Background tasks | qwen2.5:3b via Ollama | Entity extraction, summarization |
### Key Design Principles
- **Layer-by-layer validation** — backend → orchestration → frontend, curl-test each layer
- **Fire-and-forget async** — embedding and entity extraction never block the chat response
- **All services read settings on every request** — no restart required for config changes
- **Backend-first development** — data layer → endpoints → orchestration proxy → frontend
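
The fire-and-forget principle can be illustrated with a short sketch. `fireAndForget`, `handleChat`, and the injected `complete`, `embed`, and `extractEntities` functions are hypothetical names, not the orchestration service's actual code.

```javascript
// Start a background task without awaiting it; failures are reported, never thrown to the caller.
function fireAndForget(task, onError = console.error) {
  Promise.resolve()
    .then(task)
    .catch((err) => onError('background task failed:', err.message));
}

// Hypothetical chat handler: respond first, then embed and extract in the background.
async function handleChat(message, { complete, embed, extractEntities }) {
  const reply = await complete(message);
  fireAndForget(() => embed(message, reply));
  fireAndForget(() => extractEntities(message, reply));
  return reply; // returned before either background task has finished
}
```

Attaching `.catch` inside the helper matters: an unhandled rejection from a background task would otherwise surface as a process-level error in Node.js.
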
---
*Last updated: April 2026*