NexusAI roadmap addition
docs/roadmap.md
# NexusAI — Master Roadmap

> A modular, memory-centric AI assistant and personal second brain.
> Built on Node.js, React/Vite, SQLite, Qdrant, and llama.cpp.
> Repo: `https://gitea.jellystorm.com/storme/nexusAI`

---

## Current State (Completed)

### Backend — Core Four Services
- ✅ **Shared package** — `getEnv`, constants (`QDRANT`, `COLLECTIONS`, `EPISODIC`, `SERVICES`)
- ✅ **Memory service** (port 3002, Mini PC 1) — SQLite schema (sessions, episodes, entities, relationships, summaries), FTS5 search, full CRUD endpoints, Qdrant semantic layer (3 collections), embedding write path
- ✅ **Embedding service** (port 3003, Mini PC 1) — `nomic-embed-text` via Ollama, 768-dim vectors, `/embed` and `/embed/batch`
- ✅ **Inference service** (port 3001, Main PC) — provider pattern (`INFERENCE_PROVIDER`), llama.cpp provider, `/complete` and `/complete/stream` (SSE)
- ✅ **Orchestration service** (port 4000, Mini PC 2) — `/chat` and `/chat/stream`, session auto-create, dual-layer context assembly (recency + semantic), episode write-back

### Memory System
- ✅ Episodic memory — full conversation history in SQLite
- ✅ Semantic memory — Qdrant vector search across episodes and entities
- ✅ Entity extraction — background inference pass after each episode (qwen2.5:3b via Ollama)
- ✅ Automatic summarization — triggered at context threshold, cumulative summary updates
- ✅ Project memory isolation — project sessions fully isolated from each other and from non-project sessions

### Chat Client
- ✅ React/Vite frontend served via Caddy
- ✅ Sidebar navigation — recent chats, projects, settings
- ✅ Project management — CRUD, colour coding, isolated flag, ProjectView
- ✅ Session management — auto-naming, project assignment, SessionModal
- ✅ Streaming chat interface — SSE token-by-token rendering
- ✅ Memory viewer — episode browsing, deletion, health panel
- ✅ Settings panel — models section, configuration

### Infrastructure
- ✅ Caddy reverse proxy with Authelia SSO
- ✅ Prometheus + Grafana monitoring (VRAM, CPU, RAM)
- ✅ npm workspaces monorepo
- ✅ Gitea self-hosted repo

---

## Phase 1 — Loose Ends & Stability
*Target: Next development session (Saturday)*

### Bug Fixes
- [ ] **Entity extraction JSON parsing** — harden the response parser in `extraction.js` so it handles the model wrapping its JSON in markdown fences or preamble text
- [ ] **Qdrant entity search empty results** — verify that entities embedded after the isolation fix surface correctly in project session searches
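
The parser fix above could take roughly the following shape. This is a minimal sketch, not the contents of `extraction.js`: the name `extractJson` is hypothetical, and the fallback order (strict parse, then fenced block, then widest brace or bracket span) is one reasonable choice rather than a confirmed design.

```javascript
// Hypothetical helper: pull a JSON value out of a model reply that may be
// wrapped in a markdown fence or surrounded by preamble text.
const FENCE = '`'.repeat(3); // built at runtime so this sample stays fence-safe

function extractJson(raw) {
  const text = String(raw).trim();
  const candidates = [text]; // Candidate 1: the reply is already pure JSON.

  // Candidate 2: the body of the first fenced block (with or without "json").
  const fenced = text.match(new RegExp(FENCE + '(?:json)?\\s*([\\s\\S]*?)' + FENCE, 'i'));
  if (fenced) candidates.push(fenced[1].trim());

  // Candidate 3: the widest span from the first opening brace or bracket
  // to the last closing brace or bracket.
  const start = text.search(/[{[]/);
  const end = Math.max(text.lastIndexOf('}'), text.lastIndexOf(']'));
  if (start !== -1 && end > start) candidates.push(text.slice(start, end + 1));

  for (const candidate of candidates) {
    try {
      return JSON.parse(candidate);
    } catch {
      // Not valid JSON; try the next candidate.
    }
  }
  return null; // The caller decides whether to retry or skip the episode.
}
```

Returning `null` instead of throwing keeps the background extraction pass from failing loudly on a single malformed reply.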

### Tech Debt
- [ ] **Logging** — introduce `LOG_LEVEL` env var across all services; reduce noise in production
- [ ] **Error response consistency** — audit all endpoints for uniform `{ error, detail }` shape
- [ ] **Constants audit** — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
- [ ] **Orchestration `chat/index.js` review** — extract any logic that has grown beyond its intended scope into dedicated modules
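
For the logging item, a leveled logger driven by `LOG_LEVEL` might look like the sketch below. The level names, the `createLogger` signature, and the JSON line format are all assumptions for illustration; only the `LOG_LEVEL` variable comes from the roadmap.

```javascript
// Hypothetical shared logger. Lower rank means more severe.
const LEVELS = { error: 0, warn: 1, info: 2, debug: 3 };

function createLogger(service, env = process.env, sink = console.log) {
  // Resolve the threshold on every call so a changed LOG_LEVEL is picked up
  // without rebuilding the logger. Unknown or missing values mean "info".
  const threshold = () => {
    const name = String(env.LOG_LEVEL || 'info').toLowerCase();
    return Object.prototype.hasOwnProperty.call(LEVELS, name) ? LEVELS[name] : LEVELS.info;
  };

  const logger = {};
  for (const [level, rank] of Object.entries(LEVELS)) {
    logger[level] = (message, detail) => {
      if (rank > threshold()) return false; // less severe than the configured level: dropped
      sink(JSON.stringify({ ts: new Date().toISOString(), service, level, message, detail }));
      return true;
    };
  }
  return logger;
}
```

With `LOG_LEVEL=warn`, calls to `logger.info` and `logger.debug` are dropped while `logger.warn` and `logger.error` still emit one JSON line each.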

---

## Phase 2 — Memory System Upgrades
*The core intelligence layer*

### 1. Knowledge Graph (SQLite)
The highest-leverage memory upgrade. Transforms NexusAI from "remembers conversations" to "understands relationships between things."
- [ ] Graph schema — `nodes` and `edges` tables with typed relationships
- [ ] Entity → node promotion pipeline
- [ ] Relationship traversal queries
- [ ] Graph-aware context assembly in orchestration
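
To make the traversal item concrete, here is a small in-memory sketch of typed-edge traversal. The arrays stand in for the planned `nodes` and `edges` tables; every column name, id, and relationship type below is invented for illustration, and a real implementation would query SQLite instead.

```javascript
// In-memory stand-in for the planned `nodes` and `edges` tables.
// All ids, labels, and relationship types are made up for illustration.
const nodes = [
  { id: 1, label: 'NexusAI', type: 'project' },
  { id: 2, label: 'Qdrant', type: 'tool' },
  { id: 3, label: 'SQLite', type: 'tool' },
  { id: 4, label: 'FTS5', type: 'feature' },
];
const edges = [
  { source: 1, target: 2, type: 'uses' },
  { source: 1, target: 3, type: 'uses' },
  { source: 3, target: 4, type: 'provides' },
];

// Breadth-first traversal along outgoing edges, optionally filtered by
// relationship type, stopping after maxDepth hops.
function neighbours(startId, { maxDepth = 2, types = null } = {}) {
  const byId = new Map(nodes.map((n) => [n.id, n]));
  const seen = new Set([startId]);
  const found = [];
  let frontier = [startId];

  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const edge of edges) {
      if (!frontier.includes(edge.source)) continue;
      if (types && !types.includes(edge.type)) continue;
      if (seen.has(edge.target)) continue;
      seen.add(edge.target);
      next.push(edge.target);
      found.push({ ...byId.get(edge.target), depth, via: edge.type });
    }
    frontier = next;
  }
  return found;
}
```

Calling `neighbours(1)` walks from the project node to both tools and then on to the feature one hop further; passing `types: ['uses']` stops the walk at the tools.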

### 2. Retrieval Fusion + Full-Text Search
Multi-strategy retrieval merged into a single ranked result set.
- [ ] Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
- [ ] Configurable weights per retrieval strategy
- [ ] Score threshold tuning per collection
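
Reciprocal Rank Fusion scores each result as the sum, over strategies, of `weight / (k + rank)`, where `rank` is the 1-based position in that strategy's list. The sketch below assumes each strategy returns an ordered array of episode ids; the function name, the strategy keys, and the default `k = 60` (a commonly used constant) are illustrative choices, not settled configuration.

```javascript
// Weighted Reciprocal Rank Fusion over any number of ranked id lists.
// rankings: { strategyName: [idBestFirst, ...] }
function fuseRankings(rankings, { k = 60, weights = {} } = {}) {
  const scores = new Map();
  for (const [strategy, ids] of Object.entries(rankings)) {
    const weight = weights[strategy] ?? 1; // unweighted strategies count once
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: ids appearing in both lists rise above single-list hits.
const fused = fuseRankings({
  semantic: ['e3', 'e1', 'e2'], // e.g. Qdrant order
  keyword: ['e1', 'e4', 'e3'], // e.g. FTS5 order
});
```

Because only ranks are used, the raw Qdrant similarity scores and FTS5 relevance scores never need to be normalised against each other.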

### 3. Memory Consolidation Lifecycle
Prevents long-term memory degradation and enables compression.
- [ ] Episode aging — score/weight episodes by recency and access frequency
- [ ] Consolidation pass — merge related low-weight episodes into summary nodes
- [ ] Orphan cleanup — remove entities no longer referenced by active episodes
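
One possible aging formula, sketched under assumptions: recency decays exponentially with a configurable half-life, and access frequency adds a slowly growing logarithmic boost. The field names `lastAccessedAt` and `accessCount`, the 30-day half-life, and the 0.5 threshold are all hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Weight an episode by how recently and how often it was accessed.
function episodeWeight(episode, now, { halfLifeDays = 30 } = {}) {
  const ageDays = Math.max(0, (now - episode.lastAccessedAt) / DAY_MS);
  const recency = Math.pow(0.5, ageDays / halfLifeDays); // 1 when fresh, 0.5 after one half-life
  const frequency = 1 + Math.log1p(episode.accessCount); // grows slowly with repeated access
  return recency * frequency;
}

// Episodes whose weight has dropped below the threshold are candidates
// for merging into a summary node.
function consolidationCandidates(episodes, now, threshold = 0.5) {
  return episodes.filter((e) => episodeWeight(e, now) < threshold).map((e) => e.id);
}
```

Under these defaults an episode untouched for 90 days with no accesses weighs 0.125 and becomes a candidate, while an equally old episode accessed 100 times stays above the threshold.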

### 4. User Preference Model
Automatically maintained profile injected into every system prompt.
- [ ] Preference schema — communication style, interests, known facts, tone preferences
- [ ] Auto-update from conversation history
- [ ] Manual override / review UI

### 5. Confidence-Based Routing *(inspired by acid2lake)*
Short-circuit simple requests before they reach the LLM.
- [ ] Intent classifier in orchestration — categorise incoming messages
- [ ] Confidence bands — FAST PATH (memory lookup only) vs FULL (LLM + context)
- [ ] Fast-path handlers — direct memory queries, session lookups, factual recalls
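
The routing decision above can be sketched as a classifier that returns an intent with a confidence, followed by a threshold check. The keyword rules below are only a stand-in for whatever classifier is eventually chosen; the intent names, confidence values, and the 0.8 threshold are hypothetical.

```javascript
// Stand-in classifier: keyword rules with hand-picked confidences.
const RULES = [
  { intent: 'session_lookup', pattern: /\b(last|previous) (chat|session)\b/i, confidence: 0.9 },
  { intent: 'factual_recall', pattern: /\bwhat did i (say|tell you) about\b/i, confidence: 0.85 },
  { intent: 'memory_query', pattern: /\bdo you remember\b/i, confidence: 0.6 },
];

function classify(message) {
  const hit = RULES.find((rule) => rule.pattern.test(message));
  return hit
    ? { intent: hit.intent, confidence: hit.confidence }
    : { intent: 'open_ended', confidence: 0 };
}

// High-confidence intents take the fast path; everything else goes to the LLM.
function route(message, { fastPathThreshold = 0.8 } = {}) {
  const { intent, confidence } = classify(message);
  const path = confidence >= fastPathThreshold ? 'FAST_PATH' : 'FULL';
  return { path, intent, confidence };
}
```

A message that matches a rule only weakly still falls through to the full pipeline, so a wrong guess costs latency rather than a wrong answer.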

### 6. Smarter Context Assembly *(inspired by acid2lake)*
Budget-aware context selection instead of dumping all relevant memory into the prompt.
- [ ] Token budget manager in orchestration
- [ ] Priority scoring — recency × relevance × entity weight
- [ ] Configurable context budget via env var
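
A budget-aware selector along these lines could rank candidates by the product named above and pack them greedily. This is a sketch under assumptions: the candidate fields, the four-characters-per-token estimate, and the `CONTEXT_TOKEN_BUDGET` variable name are placeholders, not decided names.

```javascript
// Rough token estimate (about 4 characters per token).
// A real tokenizer would replace this.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Rank by recency x relevance x entity weight, then pack greedily.
function assembleContext(candidates, budget) {
  const ranked = candidates
    .map((c) => ({
      ...c,
      priority: c.recency * c.relevance * c.entityWeight,
      tokens: estimateTokens(c.text),
    }))
    .sort((a, b) => b.priority - a.priority);

  const selected = [];
  let used = 0;
  for (const item of ranked) {
    if (used + item.tokens > budget) continue; // does not fit; a smaller item still might
    selected.push(item);
    used += item.tokens;
  }
  return { selected, used };
}

// Hypothetical env var name; falls back to 2048 tokens when unset or invalid.
const budget = Number(process.env.CONTEXT_TOKEN_BUDGET) || 2048;
```

Skipping oversized items instead of stopping at the first miss lets a short, lower-priority memory fill the space a long one could not.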

### 7. Procedural Memory Store *(inspired by acid2lake)*
Learns "how NexusAI has successfully handled this type of request before."
- [ ] Procedural memory schema — trigger pattern, steps, success count, confidence
- [ ] Auto-population from successful interaction traces
- [ ] Procedural context injection for matched request types

### 8. Reflection / Self-Summarization
NexusAI periodically reviews and synthesises its own memory.
- [ ] Scheduled reflection pass — background job, configurable interval
- [ ] Cross-session insight extraction
- [ ] Summary nodes written back to knowledge graph
- *Requires: Knowledge graph + consolidation lifecycle*

### 9. Proactive Agent Loop
The JARVIS moment — NexusAI reasons, plans, and acts across multiple steps.
- [ ] Tool calling framework in orchestration
- [ ] Built-in tools — memory search, entity lookup, summarize, web fetch
- [ ] Reasoning loop — think → act → observe → respond
- [ ] Agent mode toggle per session
- *Requires: All Phase 2 items above*
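
The think → act → observe → respond cycle can be sketched as a bounded loop around an injected model function. Everything here is hypothetical scaffolding: the decision shape (`type`, `tool`, `args`, `content`), the `memory_search` stub, and the step cap are assumptions, and `callModel` stands in for a call to the inference service.

```javascript
// Hypothetical tool registry; a real tool would call the memory service.
const tools = new Map([
  ['memory_search', async ({ query }) => [`episode mentioning ${query}`]],
]);

// think -> act -> observe -> respond, capped at maxSteps to avoid runaway loops.
async function runAgent(userMessage, callModel, { maxSteps = 5 } = {}) {
  const transcript = [{ role: 'user', content: userMessage }];

  for (let step = 0; step < maxSteps; step++) {
    const decision = await callModel(transcript); // think
    if (decision.type === 'final') {
      return { answer: decision.content, steps: step }; // respond
    }
    const tool = tools.get(decision.tool);
    const observation = tool
      ? await tool(decision.args) // act
      : `unknown tool: ${decision.tool}`;
    transcript.push({ role: 'tool', name: decision.tool, content: observation }); // observe
  }
  return { answer: null, steps: maxSteps }; // step cap reached without a final answer
}
```

Reporting an unknown tool back to the model as an observation, instead of throwing, gives the model a chance to correct itself on the next step.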

---

## Phase 3 — Client Features
*Making the daily driver experience excellent*

### Core Chat Enhancements
- [ ] Message regeneration — re-roll last AI response
- [ ] Edit & resend — edit a previous message, clear subsequent history
- [ ] Copy message button — hover icon per message
- [ ] Message timestamps — subtle, toggleable
- [ ] Token count display — per-response usage indicator

### Memory Visibility
- [ ] **"What I remember" panel** — show which episodes/entities were injected into context
- [ ] Memory pinning — mark episodes as always-include
- [ ] Session summary view — on-demand or auto-generated session summary
- [ ] Memory attribution — subtle indicator on responses that were memory-informed

### Session & Project Management
- [ ] Session search — full-text search across all sessions
- [ ] Session tagging — freeform tags beyond project assignment
- [ ] Session export — download as markdown or JSON
- [ ] Pinned sessions — pin frequently used sessions to sidebar top
- [ ] Bulk session actions — delete, move to project

### Model & Persona Controls *(high priority — circles back to companion origins)*
- [ ] Per-session model switching — override default model per session
- [ ] System prompt editor — per-session and per-project custom prompts
- [ ] Persona profiles — saved configurations (model + system prompt + temperature)
  - Examples: "Daily Driver", "Creative Mode", "Concise Mode", "Coding Mode"
- [ ] Temperature / parameter sliders — collapsible panel for power users

### Second Brain Features
- [ ] **Quick capture** — minimal input to save a thought directly to memory without starting a chat
- [ ] **Knowledge graph visualiser** — interactive node/edge view of entities and relationships
- [ ] Memory search page — dedicated search UI across all episodes and entities
- [ ] Daily digest — generated summary of recent activity and learned facts

### Quality of Life
- [ ] Keyboard shortcuts — `Ctrl+K` command palette, `Ctrl+Enter` to send
- [ ] Dark/light theme toggle
- [ ] Mobile layout polish — collapsible sidebar, touch-friendly inputs
- [ ] Notification support — browser notifications for long completions

---

## Phase 4 — Coding Copilot
*After core is feature-complete*

### Project Directory Awareness
- [ ] Directory watcher service — monitors a VS Code workspace for changes
- [ ] Symbol indexer — AST parsing via Tree-sitter, file → symbol map in SQLite
- [ ] Diagnostic indexer — compiler errors/warnings per file, triggered on save
- [ ] Maps to existing project isolation — coding project = NexusAI project with `indexedDirectory` flag

### Coding-Specific Memory
- [ ] Procedural patterns per language/framework — stored in procedural memory layer
- [ ] Skill compilation — successful coding solutions abstracted into reusable patterns
- [ ] Codebase semantic search — embed code chunks into Qdrant, search by intent

---

## Phase 5 — Stretch Goals

### Voice Layer
- [ ] TTS output — text-to-speech for AI responses
- [ ] STT input — speech-to-text for voice messages
- [ ] Hardware-dependent — deferred until appropriate hardware available
- *Architecturally clean addition — new input/output modality only*

### Homelab Enhancements
- [ ] Backup improvements — automated, verified backups of SQLite + Qdrant data
- [ ] Security hardening — network segmentation, service-level auth
- [ ] IP webcam integration
- [ ] Home Assistant integration

---

## Architecture Reference

### Services & Nodes

| Service | Host | Port | Role |
|---|---|---|---|
| Inference | Main PC `192.168.0.79` | 3001 | llama.cpp provider, `/complete`, `/complete/stream` |
| Memory | Mini PC 1 `192.168.0.81` | 3002 | SQLite, episode/entity/summary CRUD |
| Embedding | Mini PC 1 `192.168.0.81` | 3003 | nomic-embed-text via Ollama, vector generation |
| Qdrant | Mini PC 1 `192.168.0.81` | 6333 | Vector store — episodes, entities, summaries collections |
| Orchestration | Hub `192.168.0.205` | 4000 | Chat pipeline, context assembly, session management |
| Chat Client | Hub `192.168.0.205` | — | React/Vite, served via Caddy |
| Caddy + Authelia | Hub `192.168.0.205` | 443 | Reverse proxy, SSO |

### Primary Models

| Role | Model | Notes |
|---|---|---|
| Daily driver | Gemma 4 26B Claude Distill APEX I-Mini | `--reasoning off` flag critical |
| Creative/worldbuilding | Gemma 4 21B REAP Q5_K_M | |
| Coding | DeepSeek Coder V2 Lite Instruct Q6_K | |
| Background tasks | qwen2.5:3b via Ollama | Entity extraction, summarization |

### Key Design Principles
- **Layer-by-layer validation** — backend → orchestration → frontend, curl-test each layer
- **Fire-and-forget async** — embedding and entity extraction never block the chat response
- **All services read settings on every request** — no restart required for config changes
- **Backend-first development** — data layer → endpoints → orchestration proxy → frontend
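
The fire-and-forget principle can be illustrated with a small helper that starts background work without awaiting it and contains any failure. The handler below is a hypothetical sketch, not the orchestration code: `complete`, `embedEpisode`, and `extractEntities` are injected placeholders for calls to the inference, embedding, and extraction paths.

```javascript
// Start a background task without awaiting it; failures are logged, never thrown.
function fireAndForget(label, task, onError = console.error) {
  Promise.resolve()
    .then(task) // also captures synchronous throws inside task
    .catch((err) => onError(`[background:${label}]`, err?.message ?? err));
}

// Hypothetical chat handler: the reply is returned before write-back completes.
async function handleChat(message, { complete, embedEpisode, extractEntities }) {
  const reply = await complete(message);
  fireAndForget('embed', () => embedEpisode(message, reply));
  fireAndForget('entities', () => extractEntities(message, reply));
  return reply;
}
```

The trade-off is that a failed embedding or extraction is only visible in logs, so a later repair pass has to find episodes that were never embedded.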

---

*Last updated: April 2026*