NexusAI — Master Roadmap
A modular, memory-centric AI assistant and personal second brain.
Built on Node.js, React/Vite, SQLite, Qdrant, and llama.cpp.
Repo: https://gitea.jellystorm.com/storme/nexusAI
Current State (Completed)
Backend — Core Four Services
- ✅ Shared package — `getEnv`, constants (`QDRANT`, `COLLECTIONS`, `EPISODIC`, `SERVICES`)
- ✅ Memory service (port 3002, Mini PC 1) — SQLite schema (sessions, episodes, entities, relationships, summaries), FTS5 search, full CRUD endpoints, Qdrant semantic layer (3 collections), embedding write path
- ✅ Embedding service (port 3003, Mini PC 1) — `nomic-embed-text` via Ollama, 768-dim vectors, `/embed` and `/embed/batch`
- ✅ Inference service (port 3001, Main PC) — provider pattern (`INFERENCE_PROVIDER`), llama.cpp provider, `/complete` and `/complete/stream` (SSE)
- ✅ Orchestration service (port 4000, Mini PC 2) — `/chat` and `/chat/stream`, session auto-create, dual-layer context assembly (recency + semantic), episode write-back
Memory System
- ✅ Episodic memory — full conversation history in SQLite
- ✅ Semantic memory — Qdrant vector search across episodes and entities
- ✅ Entity extraction — background inference pass after each episode (qwen2.5:3b via Ollama)
- ✅ Automatic summarization — triggered at context threshold, cumulative summary updates
- ✅ Project memory isolation — project sessions fully isolated from each other and from non-project sessions
Chat Client
- ✅ React/Vite frontend served via Caddy
- ✅ Sidebar navigation — recent chats, projects, settings
- ✅ Project management — CRUD, colour coding, isolated flag, ProjectView
- ✅ Session management — auto-naming, project assignment, SessionModal
- ✅ Streaming chat interface — SSE token-by-token rendering
- ✅ Memory viewer — episode browsing, deletion, health panel
- ✅ Settings panel — models section, configuration
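For token-by-token rendering, one option is to read the stream with `fetch` and parse the `text/event-stream` format by hand. The browser's `EventSource` only issues GET requests, so this matters if `/chat/stream` takes a POST body (an assumption here). A minimal sketch of an incremental parser; the class name `SseParser` is hypothetical, and what each event's payload contains is left to the caller.

```typescript
// Incremental parser for a text/event-stream body: feed it decoded chunks as
// they arrive and it returns the data payload of every completed event.
// Handles events split across chunks, multi-line data fields and comment
// lines; the event/id/retry fields are ignored for brevity.
class SseParser {
  private buffer = "";
  private data: string[] = [];

  push(chunk: string): string[] {
    this.buffer += chunk;
    const events: string[] = [];
    let newline: number;
    while ((newline = this.buffer.indexOf("\n")) !== -1) {
      // Accept CRLF as well as LF line endings.
      const line = this.buffer.slice(0, newline).replace(/\r$/, "");
      this.buffer = this.buffer.slice(newline + 1);
      if (line === "") {
        // A blank line dispatches the event, if it carried any data lines.
        if (this.data.length > 0) events.push(this.data.join("\n"));
        this.data = [];
      } else if (line.startsWith("data:")) {
        // Strip the single optional space after the colon.
        const value = line.slice(5);
        this.data.push(value.startsWith(" ") ? value.slice(1) : value);
      }
      // Lines starting with ":" are comments (keep-alives); other fields are skipped.
    }
    return events;
  }
}
```

In the client, each chunk read from `response.body.getReader()` would be decoded with `new TextDecoder().decode(value, { stream: true })`, passed to `push`, and every returned payload appended to the message being rendered.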
Infrastructure
- ✅ Caddy reverse proxy with Authelia SSO
- ✅ Prometheus + Grafana monitoring (VRAM, CPU, RAM)
- ✅ npm workspaces monorepo
- ✅ Gitea self-hosted repo
Phase 1 — Loose Ends & Stability
Target: Next development session (Saturday)
Bug Fixes
- Entity extraction JSON parsing — robustify the response parser in `extraction.js` to handle the model returning markdown fences or preamble around the JSON
- Qdrant entity search empty results — verify that entities embedded after the isolation fix surface correctly in project session searches
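A minimal sketch of a more tolerant parser for the entity-extraction bug. It is TypeScript for consistency with the other examples here (the annotations strip away for `extraction.js`), and the name `extractJson` is hypothetical. It prefers the contents of a fenced block if one exists, otherwise scans from the first `{` or `[` to its balanced closing bracket, so preamble and trailing chatter are ignored.

```typescript
// Hypothetical helper: pull the first JSON object or array out of a model
// response that may include a markdown fence, preamble or trailing prose.
function extractJson(raw: string): unknown {
  // 1. Prefer the contents of a fenced block (three backticks, optional "json").
  const fence = raw.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/i);
  const candidate = fence?.[1] ?? raw;

  // 2. Locate the first opening bracket and scan to its balanced close,
  //    ignoring brackets that appear inside string literals.
  const start = candidate.search(/[{[]/);
  if (start === -1) throw new Error("no JSON object or array in model output");

  const expected: string[] = []; // stack of closing brackets still owed
  let inString = false;
  let escaped = false;
  for (let i = start; i < candidate.length; i++) {
    const ch = candidate[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") expected.push("}");
    else if (ch === "[") expected.push("]");
    else if (ch === "}" || ch === "]") {
      if (expected.pop() !== ch) throw new Error("unbalanced brackets in model output");
      // 3. Parse only the balanced slice, so trailing prose is ignored.
      if (expected.length === 0) return JSON.parse(candidate.slice(start, i + 1));
    }
  }
  throw new Error("truncated JSON in model output");
}
```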
Tech Debt
- Logging — introduce a `LOG_LEVEL` env var across all services; reduce noise in production
- Error response consistency — audit all endpoints for a uniform `{ error, detail }` shape
- Constants audit — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
- Orchestration `chat/index.js` review — extract any logic that has grown beyond its intended scope into dedicated modules
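A sketch of two shared-package helpers for the logging and error-shape items, assuming the level names `debug`, `info`, `warn` and `error`; the function names `createLogger` and `errorBody` are hypothetical. The level is passed in (for example `createLogger(process.env.LOG_LEVEL)`) rather than read inside, which keeps the helper testable; since the services already read settings on every request, the logger could just as well be created per request.

```typescript
// Hypothetical shared-package helpers for LOG_LEVEL filtering and a uniform
// { error, detail } response body.
const LEVELS = ["debug", "info", "warn", "error"] as const;
type Level = (typeof LEVELS)[number];
type Sink = (line: string) => void;

// Messages below the configured level are dropped; a missing or unknown
// LOG_LEVEL falls back to "info".
function createLogger(logLevel: string | undefined, sink: Sink = (line) => console.log(line)) {
  const configured = LEVELS.indexOf((logLevel ?? "info").toLowerCase() as Level);
  const min = configured === -1 ? LEVELS.indexOf("info") : configured;
  const emit = (level: Level, msg: string) => {
    if (LEVELS.indexOf(level) >= min) sink(`[${level.toUpperCase()}] ${msg}`);
  };
  return {
    debug: (msg: string) => emit("debug", msg),
    info: (msg: string) => emit("info", msg),
    warn: (msg: string) => emit("warn", msg),
    error: (msg: string) => emit("error", msg),
  };
}

interface ErrorBody { error: string; detail?: string }

// Every endpoint returns the same shape: a short error label plus an
// optional human-readable detail (Error instances contribute their message).
function errorBody(error: string, detail?: unknown): ErrorBody {
  if (detail === undefined) return { error };
  return { error, detail: detail instanceof Error ? detail.message : String(detail) };
}
```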
Phase 2 — Memory System Upgrades
The core intelligence layer
1. Knowledge Graph (SQLite)
The highest-leverage memory upgrade. Transforms NexusAI from "remembers conversations" to "understands relationships between things."
- Graph schema — `nodes` and `edges` tables with typed relationships
- Entity → node promotion pipeline
- Relationship traversal queries
- Graph-aware context assembly in orchestration
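In SQLite, relationship traversal over the planned `nodes`/`edges` tables would most likely be a `WITH RECURSIVE` query. The in-memory sketch below shows the intended semantics instead: a breadth-first walk from one node up to a depth limit, recording the relationship types followed along each path. The `Edge` shape and the `neighbourhood` function are hypothetical.

```typescript
// Hypothetical shape of a row in the planned `edges` table.
interface Edge { source: string; target: string; type: string }

// A node reached by the walk, its hop distance, and the relationship
// types followed to get there.
interface Hit { node: string; depth: number; via: string[] }

// Breadth-first walk from `start`, following edges in both directions, up to
// `maxDepth` hops. Each node is reported once, at its shortest distance.
function neighbourhood(edges: Edge[], start: string, maxDepth: number): Hit[] {
  const adjacency = new Map<string, { node: string; type: string }[]>();
  const link = (from: string, to: string, type: string) => {
    const list = adjacency.get(from) ?? [];
    list.push({ node: to, type });
    adjacency.set(from, list);
  };
  for (const e of edges) {
    link(e.source, e.target, e.type);
    link(e.target, e.source, e.type);
  }

  const seen = new Set<string>([start]);
  const hits: Hit[] = [];
  let frontier: Hit[] = [{ node: start, depth: 0, via: [] }];
  while (frontier.length > 0) {
    const next: Hit[] = [];
    for (const current of frontier) {
      if (current.depth >= maxDepth) continue;
      for (const { node, type } of adjacency.get(current.node) ?? []) {
        if (seen.has(node)) continue;
        seen.add(node);
        const hit: Hit = { node, depth: current.depth + 1, via: [...current.via, type] };
        hits.push(hit);
        next.push(hit);
      }
    }
    frontier = next;
  }
  return hits;
}
```

For graph-aware context assembly, the returned hits (nearest first) could be rendered as short "A uses B" facts and ranked alongside episodes.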
2. Retrieval Fusion + Full-Text Search
Multi-strategy retrieval merged into a single ranked result set.
- Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
- Configurable weights per retrieval strategy
- Score threshold tuning per collection
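Reciprocal Rank Fusion scores each result by summing `weight / (k + rank)` over every list it appears in, so an item ranked well by both Qdrant and FTS5 beats one ranked first by only one of them. A minimal weighted sketch; `k = 60` is the constant commonly used since the original RRF paper, the per-list weights are the configurable part, and all names are hypothetical.

```typescript
// One strategy's results, best first, plus that strategy's weight.
interface RankedList { ids: string[]; weight: number }

// Weighted Reciprocal Rank Fusion. Ranks are 1-based; ids are assumed to be
// unique within each list. Larger k flattens the advantage of top ranks.
function reciprocalRankFusion(lists: RankedList[], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const { ids, weight } of lists) {
    ids.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + index + 1));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

The keyword list could come from an FTS5 query ordered by its `bm25()` ranking function and the semantic list from a Qdrant search. Only ranks are used, so the two incompatible score scales never need normalising; per-collection score thresholds would be applied to each list before fusion.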
3. Memory Consolidation Lifecycle
Prevents long-term memory degradation and enables compression.
- Episode aging — score/weight episodes by recency and access frequency
- Consolidation pass — merge related low-weight episodes into summary nodes
- Orphan cleanup — remove entities no longer referenced by active episodes
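One possible aging score, assuming episodes track a last-access timestamp and an access counter (the current schema is not confirmed to have either). Recency decays exponentially with a configurable half-life and is boosted logarithmically by access frequency; episodes whose weight falls below a threshold become candidates for the consolidation pass. All names and default values are hypothetical.

```typescript
// Assumed per-episode usage stats; the current schema may not track these yet.
interface EpisodeStats { lastAccessedMs: number; accessCount: number }

const MS_PER_DAY = 86_400_000;

// Recency halves every `halfLifeDays`; frequency adds a logarithmic boost, so
// a heavily reused episode ages more slowly but still decays eventually.
function episodeWeight(ep: EpisodeStats, nowMs: number, halfLifeDays = 30): number {
  const ageDays = Math.max(0, nowMs - ep.lastAccessedMs) / MS_PER_DAY;
  const recency = Math.pow(0.5, ageDays / halfLifeDays); // 1 when just accessed
  const frequency = 1 + Math.log1p(ep.accessCount);       // 1 when never re-accessed
  return recency * frequency;
}

// Low-weight episodes are handed to the consolidation pass for merging.
function consolidationCandidates<T extends EpisodeStats>(
  episodes: T[], nowMs: number, threshold = 0.25, halfLifeDays = 30,
): T[] {
  return episodes.filter((ep) => episodeWeight(ep, nowMs, halfLifeDays) < threshold);
}
```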
4. User Preference Model
Automatically maintained profile injected into every system prompt.
- Preference schema — communication style, interests, known facts, tone preferences
- Auto-update from conversation history
- Manual override / review UI
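A sketch of the override rule and the prompt-injection step, assuming a `Preferences` shape built from the schema bullet above. Manual values from the review UI win field by field over auto-learned ones, and the merged profile is rendered as a short text block to prepend to the system prompt. The names and the wording of the rendered block are hypothetical.

```typescript
// Assumed profile shape, following the preference schema bullet.
interface Preferences {
  communicationStyle?: string;
  tone?: string;
  interests?: string[];
  knownFacts?: string[];
}

// Manual overrides win field by field over values learned from history.
function mergePreferences(learned: Preferences, overrides: Preferences): Preferences {
  return {
    communicationStyle: overrides.communicationStyle ?? learned.communicationStyle,
    tone: overrides.tone ?? learned.tone,
    interests: overrides.interests ?? learned.interests,
    knownFacts: overrides.knownFacts ?? learned.knownFacts,
  };
}

// Render the merged profile for the system prompt. Returns "" when nothing is
// known, so no empty header gets injected.
function renderPreferenceBlock(learned: Preferences, overrides: Preferences = {}): string {
  const p = mergePreferences(learned, overrides);
  const lines: string[] = [];
  if (p.communicationStyle) lines.push(`Communication style: ${p.communicationStyle}`);
  if (p.tone) lines.push(`Preferred tone: ${p.tone}`);
  if (p.interests?.length) lines.push(`Interests: ${p.interests.join(", ")}`);
  if (p.knownFacts?.length) lines.push("Known facts:", ...p.knownFacts.map((fact) => `- ${fact}`));
  return lines.length > 0 ? ["User profile:", ...lines].join("\n") : "";
}
```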
5. Confidence-Based Routing (inspired by acid2lake)
Short-circuit simple requests before they reach the LLM.
- Intent classifier in orchestration — categorise incoming messages
- Confidence bands — FAST PATH (memory lookup only) vs FULL (LLM + context)
- Fast-path handlers — direct memory queries, session lookups, factual recalls
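A sketch of the confidence bands, assuming the classifier returns an intent label plus a confidence in `[0, 1]`. The regex-based `classify` is only a stand-in; in practice the classifier might be a small model such as the qwen2.5:3b already used for background tasks. Intent names and the 0.85 threshold are hypothetical.

```typescript
type Route = "FAST_PATH" | "FULL";

// Assumed classifier output: an intent label and a confidence in [0, 1].
interface Classification { intent: string; confidence: number }

// Hypothetical intents that a memory lookup alone can answer.
const FAST_PATH_INTENTS = new Set(["recall_fact", "list_sessions", "find_episode"]);

// Stand-in classifier using regexes; a real one might call a small model.
function classify(message: string): Classification {
  const m = message.trim().toLowerCase();
  if (/^what did (i|we) (say|decide) about\b/.test(m)) return { intent: "recall_fact", confidence: 0.9 };
  if (/\b(list|show) (my )?(recent )?(sessions|chats)\b/.test(m)) return { intent: "list_sessions", confidence: 0.95 };
  return { intent: "open_ended", confidence: 0.5 };
}

// Short-circuit only when the intent has a fast-path handler AND the
// classifier is confident; anything uncertain takes the full LLM path.
function route(c: Classification, fastPathThreshold = 0.85): Route {
  return FAST_PATH_INTENTS.has(c.intent) && c.confidence >= fastPathThreshold ? "FAST_PATH" : "FULL";
}
```

Defaulting to FULL on low confidence keeps the failure mode cheap: a misrouted request costs one extra LLM call rather than a wrong memory-only answer.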
6. Smarter Context Assembly (inspired by acid2lake)
Budget-aware context selection instead of dumping all relevant memory into the prompt.
- Token budget manager in orchestration
- Priority scoring — recency × relevance × entity weight
- Configurable context budget via env var
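A greedy budget fill using the recency × relevance × entity weight priority, assuming each candidate already carries a token count and normalised scores in `[0, 1]`. The budget argument is where the configurable env var value would be passed in (its name is not decided here); all names are hypothetical.

```typescript
// Assumed candidate shape: scores are pre-normalised to [0, 1] and `tokens`
// is already counted with the model's tokenizer.
interface ContextItem {
  id: string;
  text: string;
  tokens: number;
  recency: number;
  relevance: number;
  entityWeight: number;
}

// Priority = recency × relevance × entity weight, as in the roadmap bullet.
const priority = (item: ContextItem) => item.recency * item.relevance * item.entityWeight;

// Greedy fill in priority order. Items that would overflow the budget are
// skipped rather than ending the loop, so smaller lower-priority items can
// still use the remaining space.
function selectContext(items: ContextItem[], budgetTokens: number): ContextItem[] {
  const chosen: ContextItem[] = [];
  let used = 0;
  for (const item of [...items].sort((a, b) => priority(b) - priority(a))) {
    if (used + item.tokens > budgetTokens) continue;
    chosen.push(item);
    used += item.tokens;
  }
  return chosen;
}
```

Greedy selection is not optimal in the knapsack sense, but it is cheap and predictable; the chosen items could be re-sorted chronologically before being written into the prompt.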
7. Procedural Memory Store (inspired by acid2lake)
Learns "how NexusAI has successfully handled this type of request before."
- Procedural memory schema — trigger pattern, steps, success count, confidence
- Auto-population from successful interaction traces
- Procedural context injection for matched request types
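A sketch of matching and confidence tracking for procedural memory. It adds a hypothetical `attempts` field next to the success count so confidence can be a Laplace-smoothed success rate, `(successes + 1) / (attempts + 2)`: a new procedure starts at 0.5 and has to earn trust through repeated success. Modelling trigger patterns as regexes is an assumption; embedding similarity would be an alternative.

```typescript
// Hypothetical procedural-memory record. `attempts` is an added assumption so
// confidence can be derived from the success count.
interface Procedure {
  trigger: RegExp; // avoid the g flag: RegExp.test() is stateful with it
  steps: string[];
  successes: number;
  attempts: number;
}

// Laplace-smoothed success rate: 0.5 with no history, rising towards 1 only
// after repeated successes.
function confidence(p: Procedure): number {
  return (p.successes + 1) / (p.attempts + 2);
}

// Returns an updated copy after an interaction that used this procedure.
function recordOutcome(p: Procedure, succeeded: boolean): Procedure {
  return { ...p, attempts: p.attempts + 1, successes: p.successes + (succeeded ? 1 : 0) };
}

// Most trusted matching procedure above the injection threshold, if any.
function matchProcedure(store: Procedure[], request: string, minConfidence = 0.6): Procedure | undefined {
  let best: Procedure | undefined;
  for (const p of store) {
    if (!p.trigger.test(request) || confidence(p) < minConfidence) continue;
    if (best === undefined || confidence(p) > confidence(best)) best = p;
  }
  return best;
}
```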
8. Reflection / Self-Summarization
NexusAI periodically reviews and synthesises its own memory.
- Scheduled reflection pass — background job, configurable interval
- Cross-session insight extraction
- Summary nodes written back to knowledge graph
- Requires: Knowledge graph + consolidation lifecycle
9. Proactive Agent Loop
The JARVIS moment — NexusAI reasons, plans, and acts across multiple steps.
- Tool calling framework in orchestration
- Built-in tools — memory search, entity lookup, summarize, web fetch
- Reasoning loop — think → act → observe → respond
- Agent mode toggle per session
- Requires: All Phase 2 items above
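A minimal sketch of the think → act → observe → respond loop with a hard step limit, so a confused model cannot loop forever. The `Model` and `Tool` types are hypothetical, and the loop is synchronous for brevity; real inference and tool calls would be asynchronous, and the tool request would be parsed out of the model's response.

```typescript
// One model turn: either a tool request or a final answer.
type ModelTurn =
  | { kind: "tool"; tool: string; input: string }
  | { kind: "answer"; text: string };

type Tool = (input: string) => string;
type Model = (transcript: string[]) => ModelTurn;

function runAgent(model: Model, tools: Record<string, Tool>, userMessage: string, maxSteps = 5): string {
  const transcript = [`user: ${userMessage}`];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(transcript);                 // think
    if (turn.kind === "answer") return turn.text;   // respond
    // act: only own properties count as tools (avoids hitting "toString" etc.)
    const tool = Object.prototype.hasOwnProperty.call(tools, turn.tool) ? tools[turn.tool] : undefined;
    const observation = tool ? tool(turn.input) : `error: unknown tool "${turn.tool}"`;
    transcript.push(`tool ${turn.tool}(${turn.input}) -> ${observation}`); // observe
  }
  return "Stopped: step limit reached without a final answer.";
}
```

Feeding unknown-tool errors back as observations, rather than throwing, gives the model a chance to correct itself on the next step.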
Phase 3 — Client Features
Making the daily driver experience excellent
Core Chat Enhancements
- Message regeneration — re-roll last AI response
- Edit & resend — edit a previous message, clear subsequent history
- Copy message button — hover icon per message
- Message timestamps — subtle, toggleable
- Token count display — per-response usage indicator
Memory Visibility
- "What I remember" panel — show which episodes/entities were injected into context
- Memory pinning — mark episodes as always-include
- Session summary view — on-demand or auto-generated session summary
- Memory attribution — subtle indicator on responses that were memory-informed
Session & Project Management
- Session search — full-text search across all sessions
- Session tagging — freeform tags beyond project assignment
- Session export — download as markdown or JSON
- Pinned sessions — pin frequently used sessions to sidebar top
- Bulk session actions — delete, move to project
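A sketch of the two export formats, assuming a session is exported as a title plus an ordered list of messages, each with a role, content and ISO 8601 timestamp. The real episode columns may differ, and every name here is hypothetical.

```typescript
// Assumed export shape; the real episode columns may differ.
interface ExportMessage { role: "user" | "assistant"; content: string; createdAt: string } // ISO 8601
interface SessionExport { title: string; messages: ExportMessage[] }

// Markdown: the title as a heading, then each message with speaker and timestamp.
function sessionToMarkdown(session: SessionExport): string {
  const parts = [`# ${session.title}`, ""];
  for (const msg of session.messages) {
    const speaker = msg.role === "user" ? "You" : "NexusAI";
    parts.push(`**${speaker}** (${msg.createdAt})`, "", msg.content, "");
  }
  return parts.join("\n");
}

// JSON: the same structure, pretty-printed so exported files diff cleanly.
function sessionToJson(session: SessionExport): string {
  return JSON.stringify(session, null, 2);
}
```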
Model & Persona Controls (high priority — circles back to companion origins)
- Per-session model switching — override default model per session
- System prompt editor — per-session and per-project custom prompts
- Persona profiles — saved configurations (model + system prompt + temperature)
  - Examples: "Daily Driver", "Creative Mode", "Concise Mode", "Coding Mode"
- Temperature / parameter sliders — collapsible panel for power users
Second Brain Features
- Quick capture — minimal input to save a thought directly to memory without starting a chat
- Knowledge graph visualiser — interactive node/edge view of entities and relationships
- Memory search page — dedicated search UI across all episodes and entities
- Daily digest — generated summary of recent activity and learned facts
Quality of Life
- Keyboard shortcuts — `Ctrl+K` command palette, `Ctrl+Enter` to send
- Dark/light theme toggle
- Mobile layout polish — collapsible sidebar, touch-friendly inputs
- Notification support — browser notifications for long completions
Phase 4 — Coding Copilot
After core is feature-complete
Project Directory Awareness
- Directory watcher service — monitors a VS Code workspace for changes
- Symbol indexer — AST parsing via Tree-sitter, file → symbol map in SQLite
- Diagnostic indexer — compiler errors/warnings per file, triggered on save
- Maps to existing project isolation — coding project = NexusAI project with an `indexedDirectory` flag
Coding-Specific Memory
- Procedural patterns per language/framework — stored in procedural memory layer
- Skill compilation — successful coding solutions abstracted into reusable patterns
- Codebase semantic search — embed code chunks into Qdrant, search by intent
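Before code can be embedded into Qdrant it has to be split into chunks. A simple starting point, sketched here under the assumption of line-based windows (the window and overlap defaults and all names are hypothetical), is fixed-size windows with overlap; once the Tree-sitter symbol indexer exists, one chunk per symbol would likely retrieve better.

```typescript
// Line ranges are 1-based and inclusive, so results can link back to the editor.
interface CodeChunk { file: string; startLine: number; endLine: number; text: string }

// Fixed-size line windows with overlap: any block shorter than the overlap
// appears whole in at least one chunk, even if it straddles a boundary.
function chunkFile(file: string, source: string, windowLines = 40, overlapLines = 10): CodeChunk[] {
  if (overlapLines >= windowLines) throw new Error("overlap must be smaller than the window");
  const lines = source.split("\n");
  const stride = windowLines - overlapLines;
  const chunks: CodeChunk[] = [];
  for (let start = 0; start < lines.length; start += stride) {
    const end = Math.min(start + windowLines, lines.length);
    chunks.push({ file, startLine: start + 1, endLine: end, text: lines.slice(start, end).join("\n") });
    if (end === lines.length) break; // last window reached the end of the file
  }
  return chunks;
}
```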
Phase 5 — Stretch Goals
Voice Layer
- TTS output — text-to-speech for AI responses
- STT input — speech-to-text for voice messages
- Hardware-dependent — deferred until appropriate hardware available
- Architecturally clean addition — new input/output modality only
Homelab Enhancements
- Backup improvements — automated, verified backups of SQLite + Qdrant data
- Security hardening — network segmentation, service-level auth
- IP webcam integration
- Home Assistant integration
Architecture Reference
Services & Nodes
| Service | Host | Port | Role |
|---|---|---|---|
| Inference | Main PC (192.168.0.79) | 3001 | llama.cpp provider, `/complete`, `/complete/stream` |
| Memory | Mini PC 1 (192.168.0.81) | 3002 | SQLite, episode/entity/summary CRUD |
| Embedding | Mini PC 1 (192.168.0.81) | 3003 | nomic-embed-text via Ollama, vector generation |
| Qdrant | Mini PC 1 (192.168.0.81) | 6333 | Vector store — episodes, entities, summaries collections |
| Orchestration | Hub (192.168.0.205) | 4000 | Chat pipeline, context assembly, session management |
| Chat Client | Hub (192.168.0.205) | — | React/Vite, served via Caddy |
| Caddy + Authelia | Hub (192.168.0.205) | 443 | Reverse proxy, SSO |
Primary Models
| Role | Model | Notes |
|---|---|---|
| Daily driver | Gemma 4 26B Claude Distill APEX I-Mini | --reasoning off flag critical |
| Creative/worldbuilding | Gemma 4 21B REAP Q5_K_M | |
| Coding | DeepSeek Coder V2 Lite Instruct Q6_K | |
| Background tasks | qwen2.5:3b via Ollama | Entity extraction, summarization |
Key Design Principles
- Layer-by-layer validation — backend → orchestration → frontend, curl-test each layer
- Fire-and-forget async — embedding and entity extraction never block the chat response
- All services read settings on every request — no restart required for config changes
- Backend-first development — data layer → endpoints → orchestration proxy → frontend
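A minimal sketch of the fire-and-forget principle: the background task (embedding, entity extraction) is started without being awaited, and any failure goes to an error callback instead of surfacing as an unhandled promise rejection, which in current Node.js versions terminates the process by default. The helper name `fireAndForget` is hypothetical.

```typescript
// Start a background task without awaiting it. The task is invoked inside the
// promise chain (on the microtask queue), so both a synchronous throw and a
// rejected promise end up in `onError` rather than as unhandled rejections.
function fireAndForget(
  label: string,
  task: () => Promise<unknown>,
  onError: (label: string, err: unknown) => void,
): void {
  Promise.resolve()
    .then(task)
    .catch((err: unknown) => onError(label, err));
}
```

In the chat handler this would be called after the response has been sent, with a task such as a hypothetical `() => embedEpisode(episode)` and an `onError` that logs at `error` level.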
Last updated: April 2026