Storme-bit 84f01ef209 NexusAI roadmap addition

2026-04-26 21:14:04 -07:00

NexusAI — Master Roadmap

A modular, memory-centric AI assistant and personal second brain.
Built on Node.js, React/Vite, SQLite, Qdrant, and llama.cpp.
Repo: https://gitea.jellystorm.com/storme/nexusAI

Current State (Completed)

Backend — Core Four Services

✅ Shared package — getEnv, constants (QDRANT, COLLECTIONS, EPISODIC, SERVICES)
✅ Memory service (port 3002, Mini PC 1) — SQLite schema (sessions, episodes, entities, relationships, summaries), FTS5 search, full CRUD endpoints, Qdrant semantic layer (3 collections), embedding write path
✅ Embedding service (port 3003, Mini PC 1) — nomic-embed-text via Ollama, 768-dim vectors, /embed and /embed/batch
✅ Inference service (port 3001, Main PC) — provider pattern (INFERENCE_PROVIDER), llama.cpp provider, /complete and /complete/stream (SSE)
✅ Orchestration service (port 4000, Mini PC 2) — /chat and /chat/stream, session auto-create, dual-layer context assembly (recency + semantic), episode write-back
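
As an illustration of the dual-layer context assembly mentioned above, here is a minimal sketch of how a recency layer and a semantic layer could be merged. It is not the code in the orchestration service: `mergeContextLayers`, the `id` and `createdAt` fields, and the `limit` default are all assumptions made for the example.

```javascript
// Hypothetical sketch: merge recent episodes with semantic hits, dedupe by id,
// then present the result in chronological order for the prompt.
function mergeContextLayers(recent, semantic, limit = 10) {
  const byId = new Map();

  // Recency layer goes in first so it wins when both layers return the same episode.
  for (const episode of recent) {
    byId.set(episode.id, { ...episode, source: "recency" });
  }
  for (const episode of semantic) {
    if (!byId.has(episode.id)) {
      byId.set(episode.id, { ...episode, source: "semantic" });
    }
  }

  // Cap the total first (recency entries are kept preferentially), then sort oldest to newest.
  return [...byId.values()]
    .slice(0, limit)
    .sort((a, b) => a.createdAt - b.createdAt);
}
```

Inserting the recency layer first means an episode found by both layers is only included once and is attributed to recency.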

Memory System

✅ Episodic memory — full conversation history in SQLite
✅ Semantic memory — Qdrant vector search across episodes and entities
✅ Entity extraction — background inference pass after each episode (qwen2.5:3b via Ollama)
✅ Automatic summarization — triggered at context threshold, cumulative summary updates
✅ Project memory isolation — project sessions fully isolated from each other and from non-project sessions

Chat Client

✅ React/Vite frontend served via Caddy
✅ Sidebar navigation — recent chats, projects, settings
✅ Project management — CRUD, colour coding, isolated flag, ProjectView
✅ Session management — auto-naming, project assignment, SessionModal
✅ Streaming chat interface — SSE token-by-token rendering
✅ Memory viewer — episode browsing, deletion, health panel
✅ Settings panel — models section, configuration

Infrastructure

✅ Caddy reverse proxy with Authelia SSO
✅ Prometheus + Grafana monitoring (VRAM, CPU, RAM)
✅ npm workspaces monorepo
✅ Gitea self-hosted repo

Phase 1 — Loose Ends & Stability

Target: Next development session (Saturday)

Bug Fixes

Entity extraction JSON parsing — robustify response parser in extraction.js to handle model returning markdown fences or preamble around JSON
Qdrant entity search empty results — verify entities embedded post-isolation-fix are surfacing correctly in project session searches
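
One way the parser fix could look is sketched below. It assumes the model reply arrives as a single string; `extractJson` is a hypothetical helper name, not the existing function in extraction.js. It tries a plain parse first, then looks inside a fenced block, then scans for the first balanced JSON object or array.

```javascript
// Three backticks, built at runtime so this example can sit inside a fenced block.
const FENCE = "`".repeat(3);
const FENCE_RE = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE, "i");

// Hypothetical helper: returns the parsed value, or null if no valid JSON is found.
function extractJson(raw) {
  const text = String(raw).trim();

  // 1. Happy path: the reply is already valid JSON.
  try {
    return JSON.parse(text);
  } catch (_) {
    // fall through to the more tolerant strategies
  }

  // 2. Prefer the contents of a markdown fence if one is present.
  const fenced = text.match(FENCE_RE);
  const candidate = fenced ? fenced[1] : text;

  // 3. Find the first balanced {...} or [...] span, ignoring brackets inside strings.
  const start = candidate.search(/[{[]/);
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < candidate.length; i++) {
    const ch = candidate[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") depth++;
    else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(candidate.slice(start, i + 1));
        } catch (_) {
          return null; // balanced but not valid JSON
        }
      }
    }
  }
  return null; // unbalanced or truncated output
}
```

Returning `null` rather than throwing lets the background extraction pass skip a bad reply without affecting the chat response.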

Tech Debt

Logging — introduce LOG_LEVEL env var across all services; reduce noise in production
Error response consistency — audit all endpoints for uniform { error, detail } shape
Constants audit — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
Orchestration chat/index.js review — extract any logic that has grown beyond its intended scope into dedicated modules
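
The logging and error-shape items could be covered by two small shared helpers. The sketch below is one possible shape, not existing code: `createLogger`, `errorBody`, the four level names, and the `info` default are assumptions; only the `LOG_LEVEL` variable and the `{ error, detail }` shape come from the list above.

```javascript
// Assumed level ordering: lower rank = more severe.
const LEVELS = { error: 0, warn: 1, info: 2, debug: 3 };

// Hypothetical factory: returns a logger that drops messages below LOG_LEVEL.
// `env` and `sink` are injectable so the helper can be tested without touching process state.
function createLogger(service, env = process.env, sink = console.log) {
  const configured = String(env.LOG_LEVEL || "info").toLowerCase();
  const threshold = Object.hasOwn(LEVELS, configured) ? LEVELS[configured] : LEVELS.info;

  const logger = {};
  for (const [level, rank] of Object.entries(LEVELS)) {
    logger[level] = (message, meta = {}) => {
      if (rank > threshold) return false; // filtered out
      sink(JSON.stringify({ ts: new Date().toISOString(), level, service, message, ...meta }));
      return true;
    };
  }
  return logger;
}

// Hypothetical helper: every endpoint returns the same { error, detail } body.
function errorBody(error, detail = null) {
  return { error, detail };
}
```

This version reads `LOG_LEVEL` once, when the logger is created. To follow the "read settings on every request" principle, the threshold lookup could be moved inside the per-call function instead.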

Phase 2 — Memory System Upgrades

The core intelligence layer

1. Knowledge Graph (SQLite)

The highest-leverage memory upgrade. Transforms NexusAI from "remembers conversations" to "understands relationships between things."

Graph schema — nodes and edges tables with typed relationships
Entity → node promotion pipeline
Relationship traversal queries
Graph-aware context assembly in orchestration
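
To make the traversal item concrete, here is an in-memory sketch of a bounded breadth-first walk over typed edges. The edge objects mirror a hypothetical `edges` table with `source`, `target`, and `relation` columns; those column names, the `neighbours` function, and the choice to treat edges as undirected are assumptions. In SQLite the same walk would more likely be a recursive query.

```javascript
// Hypothetical sketch: collect every node within `maxDepth` hops of `startId`,
// optionally restricted to a set of relation types.
function neighbours(edges, startId, maxDepth = 2, relations = null) {
  const visited = new Map([[startId, 0]]); // node id -> hop distance
  let frontier = [startId];

  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const edge of edges) {
      if (relations && !relations.includes(edge.relation)) continue;
      // Treat edges as undirected when gathering context.
      const directions = [
        [edge.source, edge.target],
        [edge.target, edge.source],
      ];
      for (const [from, to] of directions) {
        if (frontier.includes(from) && !visited.has(to)) {
          visited.set(to, depth);
          next.push(to);
        }
      }
    }
    frontier = next;
  }

  visited.delete(startId);
  return [...visited].map(([id, depth]) => ({ id, depth }));
}
```

The hop distance returned with each node could feed the priority score used during context assembly, so closer nodes are favoured.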

2. Retrieval Fusion + Full-Text Search

Multi-strategy retrieval merged into a single ranked result set.

Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
Configurable weights per retrieval strategy
Score threshold tuning per collection
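
A weighted form of Reciprocal Rank Fusion gives each document the score `sum over strategies of weight / (k + rank)`, where `rank` is its 1-based position in that strategy's list. The sketch below assumes each strategy returns an ordered array of ids. The function name, the option names, and the strategy keys `semantic` and `keyword` are illustrative, and `k = 60` is the commonly used default rather than a tuned value.

```javascript
// Hypothetical sketch: fuse several ranked id lists into one ranked list.
// `rankings` maps a strategy name to ids ordered best-first.
function reciprocalRankFusion(rankings, { k = 60, weights = {} } = {}) {
  const scores = new Map();

  for (const [strategy, ids] of Object.entries(rankings)) {
    const weight = weights[strategy] ?? 1; // unweighted strategies count once
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }

  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

// Example: an episode found by both Qdrant and FTS5 rises to the top.
const fused = reciprocalRankFusion({
  semantic: ["e1", "e2", "e3"],
  keyword: ["e2", "e4"],
});
console.log(fused.map((r) => r.id)); // [ 'e2', 'e1', 'e4', 'e3' ]
```

Because RRF uses only ranks, it does not need Qdrant similarity scores and FTS5 scores to be on the same scale.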

3. Memory Consolidation Lifecycle

Prevents long-term memory degradation and enables compression.

Episode aging — score/weight episodes by recency and access frequency
Consolidation pass — merge related low-weight episodes into summary nodes
Orphan cleanup — remove entities no longer referenced by active episodes
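
Episode aging could be expressed as a single weight that decays with time since last access and grows slowly with access count. The sketch below is one possible formula, not a decided design: the 30-day half-life, the logarithmic frequency term, and the field names `lastAccessedAt` and `accessCount` are all assumptions.

```javascript
const MS_PER_DAY = 86_400_000;

// Hypothetical scoring: exponential decay by age, boosted by how often the episode was used.
function episodeWeight(episode, now = Date.now(), halfLifeDays = 30) {
  const ageDays = Math.max(0, (now - episode.lastAccessedAt) / MS_PER_DAY);
  const recency = Math.pow(0.5, ageDays / halfLifeDays); // halves every `halfLifeDays`
  const frequency = 1 + Math.log1p(episode.accessCount); // diminishing returns on access count
  return recency * frequency;
}

// Episodes whose weight falls below the threshold become candidates for the consolidation pass.
function consolidationCandidates(episodes, threshold, now = Date.now(), halfLifeDays = 30) {
  return episodes
    .filter((episode) => episodeWeight(episode, now, halfLifeDays) < threshold)
    .map((episode) => episode.id);
}
```

With these defaults, an episode that was never re-accessed drops to half weight after 30 days, while a frequently used episode of the same age stays above it.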

4. User Preference Model

Automatically maintained profile injected into every system prompt.

Preference schema — communication style, interests, known facts, tone preferences
Auto-update from conversation history
Manual override / review UI

5. Confidence-Based Routing (inspired by acid2lake)

Short-circuit simple requests before they reach the LLM.

Intent classifier in orchestration — categorise incoming messages
Confidence bands — FAST PATH (memory lookup only) vs FULL (LLM + context)
Fast-path handlers — direct memory queries, session lookups, factual recalls
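
The routing decision itself can be very small once a classifier exists. The sketch below uses a toy rule-based classifier as a stand-in; the intent names, the regular expressions, and the 0.8 threshold are placeholders invented for the example, not part of the current orchestration service.

```javascript
const FAST_PATH_THRESHOLD = 0.8; // assumed cut-off between the two confidence bands

// Toy stand-in for a real intent classifier.
function classifyIntent(message) {
  const text = message.toLowerCase().trim();
  if (/^(what did (i|we) (say|discuss)|when did (i|we))\b/.test(text)) {
    return { intent: "factual_recall", confidence: 0.9 };
  }
  if (/^(list|show) (my )?(sessions|chats)\b/.test(text)) {
    return { intent: "session_lookup", confidence: 0.95 };
  }
  return { intent: "open_ended", confidence: 0.3 };
}

// Pick FAST only when a handler exists for the intent AND confidence is high enough.
function route(message, fastHandlers) {
  const { intent, confidence } = classifyIntent(message);
  const hasHandler = typeof fastHandlers[intent] === "function";
  const path = hasHandler && confidence >= FAST_PATH_THRESHOLD ? "FAST" : "FULL";
  return { path, intent, confidence };
}
```

Falling back to FULL whenever no handler is registered keeps the fast path strictly opt-in: a misclassified message still reaches the LLM.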

6. Smarter Context Assembly (inspired by acid2lake)

Budget-aware context selection instead of dumping all relevant memory into the prompt.

Token budget manager in orchestration
Priority scoring — recency × relevance × entity weight
Configurable context budget via env var
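
A budget manager along these lines could rank candidates by the priority formula above and fill the budget greedily. In this sketch the env var name `CONTEXT_TOKEN_BUDGET`, the 2048 default, the candidate field names, and the four-characters-per-token estimate are all assumptions; a real implementation would use the model's tokenizer.

```javascript
// Rough estimate (~4 characters per token); an assumption, not a real tokenizer.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Hypothetical sketch: select the highest-priority memory items that fit the token budget.
function assembleContext(candidates, env = process.env) {
  const budget = Number(env.CONTEXT_TOKEN_BUDGET) || 2048; // hypothetical env var name

  const ranked = candidates
    .map((c) => ({
      ...c,
      priority: c.recency * c.relevance * c.entityWeight,
      tokens: estimateTokens(c.text),
    }))
    .sort((a, b) => b.priority - a.priority);

  const selected = [];
  let used = 0;
  for (const item of ranked) {
    if (used + item.tokens > budget) continue; // does not fit; a smaller item still might
    selected.push(item);
    used += item.tokens;
  }
  return { selected, used, budget };
}
```

Skipping an oversized item instead of stopping lets lower-priority but shorter memories use the remaining budget.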

7. Procedural Memory Store (inspired by acid2lake)

Learns "how NexusAI has successfully handled this type of request before."

Procedural memory schema — trigger pattern, steps, success count, confidence
Auto-population from successful interaction traces
Procedural context injection for matched request types
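
As a rough sketch of how the schema fields might interact, the example below updates a procedure's confidence after each use and picks the best match for a new request. The `attempts` field, the smoothed success-rate formula, the 0.6 minimum confidence, and the use of a regular expression for the trigger pattern are assumptions added for illustration.

```javascript
// Hypothetical update rule: confidence is a Laplace-smoothed success rate,
// so a single success or failure does not swing it to 1 or 0.
function recordOutcome(procedure, succeeded) {
  const attempts = procedure.attempts + 1;
  const successCount = procedure.successCount + (succeeded ? 1 : 0);
  const confidence = (successCount + 1) / (attempts + 2);
  return { ...procedure, attempts, successCount, confidence };
}

// Return the most confident procedure whose trigger pattern matches the request, or null.
function matchProcedure(procedures, request, minConfidence = 0.6) {
  const matches = procedures
    .filter((p) => p.confidence >= minConfidence)
    .filter((p) => new RegExp(p.triggerPattern, "i").test(request))
    .sort((a, b) => b.confidence - a.confidence);
  return matches[0] ?? null;
}
```

A matched procedure's `steps` could then be injected into the prompt as guidance for that request type.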

8. Reflection / Self-Summarization

NexusAI periodically reviews and synthesises its own memory.

Scheduled reflection pass — background job, configurable interval
Cross-session insight extraction
Summary nodes written back to knowledge graph
Requires: Knowledge graph + consolidation lifecycle

9. Proactive Agent Loop

The JARVIS moment — NexusAI reasons, plans, and acts across multiple steps.

Tool calling framework in orchestration
Built-in tools — memory search, entity lookup, summarize, web fetch
Reasoning loop — think → act → observe → respond
Agent mode toggle per session
Requires: All Phase 2 items above
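
The think → act → observe → respond loop can be sketched independently of any particular model. In the example below, `plan` stands in for an inference call that returns either a tool request or a final answer. The decision shape (`type`, `tool`, `args`, `text`), the tool registry, and the step limit are assumptions for illustration.

```javascript
// Hypothetical agent loop. `plan(userMessage, observations)` resolves to either
//   { type: "tool", tool: "<name>", args: {...} }  or  { type: "respond", text: "..." }.
async function runAgent(userMessage, { plan, tools, maxSteps = 5 }) {
  const observations = [];

  for (let step = 0; step < maxSteps; step++) {
    // Think: ask the planner what to do next, given everything observed so far.
    const decision = await plan(userMessage, observations);

    // Respond: the planner has enough information to answer.
    if (decision.type === "respond") {
      return { answer: decision.text, steps: observations };
    }

    // Act: run the requested tool, or record an error if it does not exist.
    const tool = tools[decision.tool];
    const result = tool
      ? await tool(decision.args)
      : { error: `unknown tool: ${decision.tool}` };

    // Observe: feed the result back into the next planning round.
    observations.push({ tool: decision.tool, args: decision.args, result });
  }

  // Guard against a planner that never decides to respond.
  return { answer: null, steps: observations, error: "step limit reached" };
}
```

The step limit is the important safety property here: a planner that keeps requesting tools cannot hold a chat request open indefinitely.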

Phase 3 — Client Features

Making the daily driver experience excellent

Core Chat Enhancements

Message regeneration — re-roll last AI response
Edit & resend — edit a previous message, clear subsequent history
Copy message button — hover icon per message
Message timestamps — subtle, toggleable
Token count display — per-response usage indicator

Memory Visibility

"What I remember" panel — show which episodes/entities were injected into context
Memory pinning — mark episodes as always-include
✅ Session summary view — on-demand or auto-generated session summary
Memory attribution — subtle indicator on responses that were memory-informed

Session & Project Management

Session search — full-text search across all sessions
Session tagging — freeform tags beyond project assignment
Session export — download as markdown or JSON
Pinned sessions — pin frequently used sessions to sidebar top
Bulk session actions — delete, move to project

Model & Persona Controls (high priority — circles back to companion origins)

Per-session model switching — override default model per session
✅ System prompt editor — per-project custom prompts
System prompt editor — per-session custom prompts
Persona profiles — saved configurations (model + system prompt + temperature)
- Examples: "Daily Driver", "Creative Mode", "Concise Mode", "Coding Mode"
Temperature / parameter sliders — collapsible panel for power users

Second Brain Features

Quick capture — minimal input to save a thought directly to memory without starting a chat
Knowledge graph visualiser — interactive node/edge view of entities and relationships
Memory search page — dedicated search UI across all episodes and entities
Daily digest — generated summary of recent activity and learned facts

Quality of Life

Keyboard shortcuts — Ctrl+K command palette, Ctrl+Enter to send
Dark/light theme toggle
Mobile layout polish — collapsible sidebar, touch-friendly inputs
Notification support — browser notifications for long completions

Phase 4 — Coding Copilot

After core is feature-complete

Project Directory Awareness

Directory watcher service — monitors a VS Code workspace for changes
Symbol indexer — AST parsing via Tree-sitter, file → symbol map in SQLite
Diagnostic indexer — compiler errors/warnings per file, triggered on save
Maps to existing project isolation — coding project = NexusAI project with indexedDirectory flag

Coding-Specific Memory

Procedural patterns per language/framework — stored in procedural memory layer
Skill compilation — successful coding solutions abstracted into reusable patterns
Codebase semantic search — embed code chunks into Qdrant, search by intent

Phase 5 — Stretch Goals

Voice Layer

TTS output — text-to-speech for AI responses
STT input — speech-to-text for voice messages
Hardware-dependent — deferred until appropriate hardware available
Architecturally clean addition — new input/output modality only

Homelab Enhancements

Backup improvements — automated, verified backups of SQLite + Qdrant data
Security hardening — network segmentation, service-level auth
IP webcam integration
Home Assistant integration

Architecture Reference

Services & Nodes

| Service | Host | Port | Role |
| --- | --- | --- | --- |
| Inference | Main PC `192.168.0.79` | 3001 | llama.cpp provider, `/complete`, `/complete/stream` |
| Memory | Mini PC 1 `192.168.0.81` | 3002 | SQLite, episode/entity/summary CRUD |
| Embedding | Mini PC 1 `192.168.0.81` | 3003 | nomic-embed-text via Ollama, vector generation |
| Qdrant | Mini PC 1 `192.168.0.81` | 6333 | Vector store: episodes, entities, summaries collections |
| Orchestration | Hub `192.168.0.205` | 4000 | Chat pipeline, context assembly, session management |
| Chat Client | Hub `192.168.0.205` | n/a | React/Vite, served via Caddy |
| Caddy + Authelia | Hub `192.168.0.205` | 443 | Reverse proxy, SSO |

Primary Models

| Role | Model | Notes |
| --- | --- | --- |
| Daily driver | Gemma 4 26B Claude Distill APEX I-Mini | `--reasoning off` flag critical |
| Creative/worldbuilding | Gemma 4 21B REAP Q5_K_M | |
| Coding | DeepSeek Coder V2 Lite Instruct Q6_K | |
| Background tasks | qwen2.5:3b via Ollama | Entity extraction, summarization |

Key Design Principles

Layer-by-layer validation — backend → orchestration → frontend, curl-test each layer
Fire-and-forget async — embedding and entity extraction never block the chat response
All services read settings on every request — no restart required for config changes
Backend-first development — data layer → endpoints → orchestration proxy → frontend

Last updated: April 2026
