nexusAI/docs/roadmap.md
2026-04-26 21:14:04 -07:00


NexusAI — Master Roadmap

A modular, memory-centric AI assistant and personal second brain.
Built on Node.js, React/Vite, SQLite, Qdrant, and llama.cpp.
Repo: https://gitea.jellystorm.com/storme/nexusAI


Current State (Completed)

Backend — Core Four Services

  • Shared package: getEnv, constants (QDRANT, COLLECTIONS, EPISODIC, SERVICES)
  • Memory service (port 3002, Mini PC 1) — SQLite schema (sessions, episodes, entities, relationships, summaries), FTS5 search, full CRUD endpoints, Qdrant semantic layer (3 collections), embedding write path
  • Embedding service (port 3003, Mini PC 1) — nomic-embed-text via Ollama, 768-dim vectors, /embed and /embed/batch
  • Inference service (port 3001, Main PC) — provider pattern (INFERENCE_PROVIDER), llama.cpp provider, /complete and /complete/stream (SSE)
  • Orchestration service (port 4000, Mini PC 2) — /chat and /chat/stream, session auto-create, dual-layer context assembly (recency + semantic), episode write-back
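The dual-layer assembly step can be pictured as a small merge function. This is a minimal sketch, not the orchestration service's code. The `Episode` shape and the `mergeContext` name are hypothetical, and the sketch assumes the semantic layer returns hits best-first. Episodes already in the recency window are dropped from the semantic hits, so the prompt never carries the same turn twice.

```typescript
// Hypothetical episode shape; the memory service's real schema may differ.
interface Episode {
  id: string;
  createdAt: number; // Unix epoch milliseconds
  content: string;
}

/**
 * Combine the recency layer (latest turns of this session) with the semantic
 * layer (vector-search hits, best first). Episodes already in the recency
 * window are dropped from the semantic hits, which are then capped.
 */
function mergeContext(
  recent: Episode[],
  semantic: Episode[],
  maxRecalled: number,
): { recent: Episode[]; recalled: Episode[] } {
  const recentIds = new Set(recent.map((e) => e.id));

  const recalled = semantic
    .filter((e) => !recentIds.has(e.id)) // avoid duplicating live turns
    .slice(0, maxRecalled) // keep only the top-ranked hits
    .sort((a, b) => a.createdAt - b.createdAt); // then present oldest first

  // Copy before sorting so the caller's array is not mutated.
  const ordered = [...recent].sort((a, b) => a.createdAt - b.createdAt);
  return { recent: ordered, recalled };
}
```

Returning the two layers separately lets the prompt builder label recalled memories differently from the live transcript.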

Memory System

  • Episodic memory — full conversation history in SQLite
  • Semantic memory — Qdrant vector search across episodes and entities
  • Entity extraction — background inference pass after each episode (qwen2.5:3b via Ollama)
  • Automatic summarization — triggered at context threshold, cumulative summary updates
  • Project memory isolation — project sessions fully isolated from each other and from non-project sessions

Chat Client

  • React/Vite frontend served via Caddy
  • Sidebar navigation — recent chats, projects, settings
  • Project management — CRUD, colour coding, isolated flag, ProjectView
  • Session management — auto-naming, project assignment, SessionModal
  • Streaming chat interface — SSE token-by-token rendering
  • Memory viewer — episode browsing, deletion, health panel
  • Settings panel — models section, configuration

Infrastructure

  • Caddy reverse proxy with Authelia SSO
  • Prometheus + Grafana monitoring (VRAM, CPU, RAM)
  • npm workspaces monorepo
  • Gitea self-hosted repo

Phase 1 — Loose Ends & Stability

Target: Next development session (Saturday)

Bug Fixes

  • Entity extraction JSON parsing: make the response parser in extraction.js tolerate the model wrapping its JSON in markdown code fences or prefacing it with preamble text
  • Qdrant entity search empty results — verify entities embedded post-isolation-fix are surfacing correctly in project session searches
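A tolerant parser for the extraction output might look like the sketch below. It assumes the model returns a single JSON object or array somewhere in its text. `extractJson`, `tryParse`, and `balancedEnd` are hypothetical helper names, not existing exports of extraction.js. The parser prefers a fenced block, then tries a direct parse, then tries each opening bracket in turn and parses the balanced span, skipping brackets inside string literals.

```typescript
/** Parse JSON, returning undefined instead of throwing. */
function tryParse(s: string): unknown {
  try {
    return JSON.parse(s);
  } catch {
    return undefined;
  }
}

/** Index of the bracket closing the one at `start`, ignoring brackets inside strings; -1 if unbalanced. */
function balancedEnd(text: string, start: number): number {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      depth++;
    } else if (ch === "}" || ch === "]") {
      depth--;
      if (depth === 0) return i;
    }
  }
  return -1;
}

/** Pull the first parseable JSON value out of a chatty model response, or null. */
function extractJson(raw: string): unknown {
  // 1. Prefer the body of a markdown code fence (optionally tagged json).
  const fence = raw.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/i);
  const text = (fence ? fence[1] : raw).trim();

  // 2. Happy path: the whole text is valid JSON.
  const direct = tryParse(text);
  if (direct !== undefined) return direct;

  // 3. Try each '{' or '[' as a start and parse its balanced span.
  for (let start = 0; start < text.length; start++) {
    if (text[start] !== "{" && text[start] !== "[") continue;
    const end = balancedEnd(text, start);
    if (end === -1) continue;
    const parsed = tryParse(text.slice(start, end + 1));
    if (parsed !== undefined) return parsed;
  }
  return null; // nothing parseable, e.g. truncated output
}
```

Trying successive start positions matters when the preamble itself contains brackets, such as "Entities [see below]: {...}".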

Tech Debt

  • Logging — introduce LOG_LEVEL env var across all services; reduce noise in production
  • Error response consistency — audit all endpoints for uniform { error, detail } shape
  • Constants audit — move any remaining inline magic numbers (limits, thresholds, timeouts) to shared config
  • Orchestration chat/index.js review — extract any logic that has grown beyond its intended scope into dedicated modules
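For the logging and error-shape items, here is a minimal sketch that assumes each service passes `process.env.LOG_LEVEL` into a shared helper. `createLogger`, `errorBody`, and the level names are hypothetical; only the `{ error, detail }` shape comes from the audit item above.

```typescript
// Severity order; a message is emitted when its level is >= the threshold.
const LEVELS = { debug: 10, info: 20, warn: 30, error: 40, silent: 100 } as const;
type LevelName = keyof typeof LEVELS;

/** Build a logger whose threshold comes from LOG_LEVEL (default "info"). */
function createLogger(service: string, logLevel: string | undefined) {
  const name = (logLevel ?? "info").toLowerCase();
  const threshold = name in LEVELS ? LEVELS[name as LevelName] : LEVELS.info;

  // Returns true when the line was written, which makes filtering testable.
  const emit = (level: Exclude<LevelName, "silent">, msg: string): boolean => {
    if (LEVELS[level] < threshold) return false;
    const line = `${new Date().toISOString()} [${service}] ${level.toUpperCase()} ${msg}`;
    (level === "error" ? console.error : console.log)(line);
    return true;
  };

  return {
    debug: (m: string) => emit("debug", m),
    info: (m: string) => emit("info", m),
    warn: (m: string) => emit("warn", m),
    error: (m: string) => emit("error", m),
  };
}

/** Uniform error body shared by every endpoint. */
interface ErrorBody {
  error: string; // short, stable, machine-readable code
  detail?: string; // human-readable context, optional
}

function errorBody(error: string, detail?: unknown): ErrorBody {
  if (detail === undefined) return { error };
  const text = detail instanceof Error ? detail.message : String(detail);
  return { error, detail: text };
}
```

With Express-style handlers (an assumption about the HTTP layer), a route would respond with `res.status(502).json(errorBody("embedding_unavailable", err))`, so every service returns the same fields.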

Phase 2 — Memory System Upgrades

The core intelligence layer

1. Knowledge Graph (SQLite)

The highest-leverage memory upgrade. Transforms NexusAI from "remembers conversations" to "understands relationships between things."

  • Graph schema — nodes and edges tables with typed relationships
  • Entity → node promotion pipeline
  • Relationship traversal queries
  • Graph-aware context assembly in orchestration
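Relationship traversal reduces to a bounded breadth-first search over the edges table. The sketch below runs in memory to show the semantics under assumed column names (`from`, `to`, `relation`); inside SQLite the same walk could be written as a `WITH RECURSIVE` common table expression. `neighbourhood` is a hypothetical name.

```typescript
// Hypothetical row shape mirroring an `edges` table with typed relationships.
interface GraphEdge {
  from: number;
  to: number;
  relation: string;
}

/**
 * Breadth-first traversal from a start node, following edges in both
 * directions up to `maxDepth` hops. Returns each reachable node id mapped
 * to the hop distance at which it was first found.
 */
function neighbourhood(
  edges: GraphEdge[],
  startId: number,
  maxDepth: number,
  relations?: Set<string>, // optional filter on relationship type
): Map<number, number> {
  // Build an undirected adjacency list once.
  const adj = new Map<number, number[]>();
  const link = (a: number, b: number) => {
    if (!adj.has(a)) adj.set(a, []);
    adj.get(a)!.push(b);
  };
  for (const e of edges) {
    if (relations && !relations.has(e.relation)) continue;
    link(e.from, e.to);
    link(e.to, e.from);
  }

  const depth = new Map<number, number>([[startId, 0]]);
  let frontier = [startId];
  for (let d = 1; d <= maxDepth && frontier.length > 0; d++) {
    const next: number[] = [];
    for (const id of frontier) {
      for (const n of adj.get(id) ?? []) {
        if (!depth.has(n)) {
          depth.set(n, d); // first visit is the shortest hop count
          next.push(n);
        }
      }
    }
    frontier = next;
  }
  return depth;
}
```

The depth map doubles as a relevance hint for graph-aware context assembly: one-hop neighbours are usually worth more prompt space than three-hop ones.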

2. Hybrid Retrieval

Multi-strategy retrieval merged into a single ranked result set.

  • Reciprocal Rank Fusion (RRF) — merge semantic (Qdrant) + keyword (FTS5) results
  • Configurable weights per retrieval strategy
  • Score threshold tuning per collection
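Reciprocal Rank Fusion only needs each strategy's rank order, not comparable scores, which is why it suits merging Qdrant similarity scores with FTS5 keyword ranks. Below is a minimal weighted variant, assuming each strategy returns ids best-first with no duplicates. The `weight` field is one way the configurable per-strategy weights above could be expressed, and k = 60 is the constant used in the original RRF paper.

```typescript
/** One ranked list from a single retrieval strategy, best hit first. */
interface RankedList {
  strategy: string; // e.g. "qdrant" or "fts5"
  ids: string[]; // episode/entity ids in rank order, no duplicates
  weight?: number; // per-strategy weight, defaults to 1
}

interface FusedHit {
  id: string;
  score: number;
}

/**
 * Weighted Reciprocal Rank Fusion:
 *   score(d) = sum over lists of weight / (k + rank(d)), with rank starting at 1.
 * An id missing from a list simply gets no contribution from that list.
 */
function reciprocalRankFusion(lists: RankedList[], k = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    const weight = list.weight ?? 1;
    list.ids.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + weight / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

Because fusion discards raw scores, per-collection score thresholds would be applied to each list before it is passed in.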

3. Memory Consolidation Lifecycle

Prevents long-term memory degradation and enables compression.

  • Episode aging — score/weight episodes by recency and access frequency
  • Consolidation pass — merge related low-weight episodes into summary nodes
  • Orphan cleanup — remove entities no longer referenced by active episodes
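One possible aging score, shown as a hypothetical sketch: recency decays exponentially with a configurable half-life and access count adds a logarithmic boost, so heavy use slows forgetting without making an episode permanent. The formula, the 30-day half-life, and the 0.25 threshold are illustrative assumptions to be tuned, not values from the codebase.

```typescript
interface EpisodeStats {
  id: string;
  lastAccessedAt: number; // epoch milliseconds
  accessCount: number; // times retrieved into context
}

const DAY_MS = 24 * 60 * 60 * 1000;

/**
 * Hypothetical aging weight: 0.5 ^ (age / halfLife) recency decay,
 * multiplied by (1 + ln(1 + accessCount)). An episode touched right now
 * and never retrieved scores exactly 1.
 */
function episodeWeight(ep: EpisodeStats, now: number, halfLifeDays = 30): number {
  const ageDays = Math.max(0, now - ep.lastAccessedAt) / DAY_MS;
  const recency = Math.pow(0.5, ageDays / halfLifeDays);
  const frequency = 1 + Math.log1p(ep.accessCount);
  return recency * frequency;
}

/** Ids whose weight has decayed below `threshold`: candidates for the consolidation pass. */
function consolidationCandidates(
  episodes: EpisodeStats[],
  now: number,
  threshold = 0.25,
  halfLifeDays = 30,
): string[] {
  return episodes
    .filter((ep) => episodeWeight(ep, now, halfLifeDays) < threshold)
    .map((ep) => ep.id);
}
```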

4. User Preference Model

Automatically maintained profile injected into every system prompt.

  • Preference schema — communication style, interests, known facts, tone preferences
  • Auto-update from conversation history
  • Manual override / review UI

5. Confidence-Based Routing (inspired by acid2lake)

Short-circuit simple requests before they reach the LLM.

  • Intent classifier in orchestration — categorise incoming messages
  • Confidence bands — FAST PATH (memory lookup only) vs FULL (LLM + context)
  • Fast-path handlers — direct memory queries, session lookups, factual recalls
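A first-cut classifier can be rule-based, with each rule carrying a confidence and a single threshold separating the two bands. The rules, intent names, and the 0.75 threshold below are hypothetical placeholders; a small model could later supply the confidence instead of regexes.

```typescript
type Route = "FAST_PATH" | "FULL";

interface IntentRule {
  intent: string; // e.g. "list_sessions", "recall_fact"
  pattern: RegExp; // no /g flag, so .test() is stateless
  confidence: number; // 0..1, how sure a match makes us
}

// Hypothetical starter rules.
const RULES: IntentRule[] = [
  { intent: "list_sessions", pattern: /\b(list|show)\b.*\b(sessions|chats)\b/i, confidence: 0.9 },
  { intent: "recall_fact", pattern: /\bwhat (is|was) my\b/i, confidence: 0.8 },
  { intent: "recall_fact", pattern: /\bremember\b/i, confidence: 0.5 },
];

/** Pick the most confident matching rule and map it to a confidence band. */
function route(
  message: string,
  fastPathThreshold = 0.75,
): { route: Route; intent: string | null; confidence: number } {
  let best: IntentRule | null = null;
  for (const rule of RULES) {
    if (rule.pattern.test(message) && (!best || rule.confidence > best.confidence)) {
      best = rule;
    }
  }
  if (best && best.confidence >= fastPathThreshold) {
    return { route: "FAST_PATH", intent: best.intent, confidence: best.confidence };
  }
  // Low or no confidence: fall back to the full LLM + context pipeline.
  return { route: "FULL", intent: best?.intent ?? null, confidence: best?.confidence ?? 0 };
}
```

Keeping the intent on FULL results lets the full pipeline still benefit from the classification, for example by weighting memory retrieval.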

6. Smarter Context Assembly (inspired by acid2lake)

Budget-aware context selection instead of dumping all relevant memory into the prompt.

  • Token budget manager in orchestration
  • Priority scoring — recency × relevance × entity weight
  • Configurable context budget via env var
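A greedy selector illustrates the budget manager: score each candidate by recency × relevance × entity weight, then take items in priority order while they fit. It is a sketch under two assumptions: all three factors are pre-normalised to 0..1, and token counts use a rough 4-characters-per-token estimate in place of the model's tokenizer. `selectContext` and `estimateTokens` are hypothetical names.

```typescript
interface ContextCandidate {
  id: string;
  text: string;
  recency: number; // 0..1, 1 = newest
  relevance: number; // 0..1, e.g. similarity or fused retrieval score
  entityWeight: number; // 0..1, importance of entities mentioned
}

// Rough heuristic for English text; swap in the real tokenizer for accuracy.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

/**
 * Greedy budget fill: rank by recency × relevance × entityWeight and take
 * items while they fit. An item that does not fit is skipped rather than
 * ending the loop, so a smaller lower-ranked item can use the remainder.
 */
function selectContext(candidates: ContextCandidate[], budgetTokens: number): ContextCandidate[] {
  const ranked = candidates
    .map((c) => ({ c, priority: c.recency * c.relevance * c.entityWeight }))
    .sort((a, b) => b.priority - a.priority);

  const chosen: ContextCandidate[] = [];
  let used = 0;
  for (const { c } of ranked) {
    const cost = estimateTokens(c.text);
    if (used + cost <= budgetTokens) {
      chosen.push(c);
      used += cost;
    }
  }
  return chosen;
}
```

The budget itself would come from the env var (for example a hypothetical `CONTEXT_BUDGET_TOKENS`), read on each request.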

7. Procedural Memory Store (inspired by acid2lake)

Learns "how NexusAI has successfully handled this type of request before."

  • Procedural memory schema — trigger pattern, steps, success count, confidence
  • Auto-population from successful interaction traces
  • Procedural context injection for matched request types
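A hypothetical shape for procedural records, with confidence derived from outcomes rather than stored directly. Laplace smoothing, (successes + 1) / (attempts + 2), is an assumed choice that keeps a single lucky success from reading as certainty; the regex trigger stands in for whatever matching the real layer ends up using.

```typescript
interface Procedure {
  id: string;
  trigger: RegExp; // pattern describing the request type (no /g flag)
  steps: string[]; // what worked last time
  successCount: number;
  attemptCount: number;
}

/** Laplace-smoothed success rate in 0..1. */
const confidenceOf = (p: Procedure): number => (p.successCount + 1) / (p.attemptCount + 2);

/** Most confident procedure whose trigger matches, or null if none clears the floor. */
function matchProcedure(procs: Procedure[], request: string, minConfidence = 0.6): Procedure | null {
  let best: Procedure | null = null;
  for (const p of procs) {
    if (!p.trigger.test(request)) continue;
    if (confidenceOf(p) < minConfidence) continue;
    if (!best || confidenceOf(p) > confidenceOf(best)) best = p;
  }
  return best;
}

/** Record the outcome of an interaction that used a procedure. */
function recordOutcome(p: Procedure, succeeded: boolean): void {
  p.attemptCount += 1;
  if (succeeded) p.successCount += 1;
}

/** Render a matched procedure as a system-prompt snippet. */
function proceduralContext(p: Procedure): string {
  const numbered = p.steps.map((s, i) => `${i + 1}. ${s}`).join("\n");
  return `Previously successful approach for this kind of request:\n${numbered}`;
}
```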

8. Reflection / Self-Summarization

NexusAI periodically reviews and synthesises its own memory.

  • Scheduled reflection pass — background job, configurable interval
  • Cross-session insight extraction
  • Summary nodes written back to knowledge graph
  • Requires: Knowledge graph + consolidation lifecycle
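The scheduled pass mainly needs two guarantees: a slow reflection must not overlap the next one, and a failure must not kill the schedule. Here is a minimal sketch built on `setInterval`; `scheduleReflection` is a hypothetical name, and the reflection job itself is passed in.

```typescript
/**
 * Run `job` every `intervalMs` without overlap: if a pass is still running
 * when the timer fires, that tick is skipped. Errors go to `onError` and the
 * schedule keeps going. Returns a function that stops the schedule.
 */
function scheduleReflection(
  job: () => Promise<void>,
  intervalMs: number,
  onError: (err: unknown) => void = console.error,
): () => void {
  let running = false;

  const tick = async () => {
    if (running) return; // previous reflection pass still in progress
    running = true;
    try {
      await job();
    } catch (err) {
      onError(err);
    } finally {
      running = false;
    }
  };

  const timer = setInterval(() => void tick(), intervalMs);
  return () => clearInterval(timer);
}
```

The interval would typically come from configuration, and the stop function lets the service shut down cleanly.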

9. Proactive Agent Loop

The JARVIS moment — NexusAI reasons, plans, and acts across multiple steps.

  • Tool calling framework in orchestration
  • Built-in tools — memory search, entity lookup, summarize, web fetch
  • Reasoning loop — think → act → observe → respond
  • Agent mode toggle per session
  • Requires: All Phase 2 items above
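The think → act → observe → respond loop can be sketched independently of any particular model API. Here `decide` is an assumed wrapper that calls the LLM and parses its reply into either a tool call or a final answer. The tool registry shape and the step cap are hypothetical; the cap keeps a confused model from looping indefinitely.

```typescript
/** A tool the agent may call; `run` returns an observation string. */
interface Tool {
  name: string;
  description: string;
  run: (input: string) => Promise<string>;
}

/** What the model decided at one step, after parsing its output. */
type Decision =
  | { kind: "tool"; thought: string; tool: string; input: string }
  | { kind: "answer"; thought: string; answer: string };

/** Transcript entry fed back to the model on the next step. */
interface Step {
  decision: Decision;
  observation?: string;
}

async function runAgent(
  question: string,
  tools: Tool[],
  decide: (question: string, history: Step[]) => Promise<Decision>,
  maxSteps = 5,
): Promise<{ answer: string; history: Step[] }> {
  const byName = new Map(tools.map((t) => [t.name, t] as const));
  const history: Step[] = [];

  for (let i = 0; i < maxSteps; i++) {
    const decision = await decide(question, history); // think
    if (decision.kind === "answer") {
      history.push({ decision });
      return { answer: decision.answer, history }; // respond
    }
    const tool = byName.get(decision.tool);
    const observation = tool
      ? await tool.run(decision.input) // act
      : `Unknown tool "${decision.tool}". Available: ${tools.map((t) => t.name).join(", ")}`;
    history.push({ decision, observation }); // observe
  }
  return { answer: "I could not finish within the step limit.", history };
}
```

Reporting an unknown tool back as an observation, instead of throwing, gives the model a chance to correct itself on the next step.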

Phase 3 — Client Features

Making the daily driver experience excellent

Core Chat Enhancements

  • Message regeneration — re-roll last AI response
  • Edit & resend — edit a previous message, clear subsequent history
  • Copy message button — hover icon per message
  • Message timestamps — subtle, toggleable
  • Token count display — per-response usage indicator

Memory Visibility

  • "What I remember" panel — show which episodes/entities were injected into context
  • Memory pinning — mark episodes as always-include
  • Session summary view — on-demand or auto-generated session summary
  • Memory attribution — subtle indicator on responses that were memory-informed

Session & Project Management

  • Session search — full-text search across all sessions
  • Session tagging — freeform tags beyond project assignment
  • Session export — download as markdown or JSON
  • Pinned sessions — pin frequently used sessions to sidebar top
  • Bulk session actions — delete, move to project
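Session export is mostly formatting. Here is a minimal sketch, assuming a session object with ordered messages; the field names and the "You" / "NexusAI" speaker labels are hypothetical.

```typescript
interface ExportMessage {
  role: "user" | "assistant";
  content: string;
  createdAt: string; // ISO 8601 timestamp
}

interface ExportSession {
  id: string;
  title: string;
  project?: string;
  messages: ExportMessage[];
}

/** Render a session as a readable markdown transcript. */
function sessionToMarkdown(s: ExportSession): string {
  const lines = [`# ${s.title}`, ""];
  if (s.project) lines.push(`Project: ${s.project}`, "");
  for (const m of s.messages) {
    const speaker = m.role === "user" ? "You" : "NexusAI";
    lines.push(`## ${speaker} (${m.createdAt})`, "", m.content, "");
  }
  // Normalise to exactly one trailing newline.
  return lines.join("\n").trimEnd() + "\n";
}

/** JSON export is the session object as-is, pretty-printed for diffing. */
function sessionToJson(s: ExportSession): string {
  return JSON.stringify(s, null, 2);
}
```

Markdown suits reading and pasting into notes; JSON preserves every field and can be re-imported.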

Model & Persona Controls (high priority — circles back to companion origins)

  • Per-session model switching — override default model per session
  • System prompt editor — per-project custom prompts
  • System prompt editor — per-session custom prompts
  • Persona profiles — saved configurations (model + system prompt + temperature)
    • Examples: "Daily Driver", "Creative Mode", "Concise Mode", "Coding Mode"
  • Temperature / parameter sliders — collapsible panel for power users
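Persona profiles and the per-project and per-session prompt editors all produce partial overrides, so one resolver can combine them. The precedence (defaults, then project, then persona, then session) is an assumption, as are the preset values; the model names come from the Primary Models table. Undefined fields are skipped explicitly, so an unset slider does not erase a value from an earlier layer, which a plain object spread would do.

```typescript
interface GenerationParams {
  model: string;
  systemPrompt: string;
  temperature: number;
}

type Overrides = Partial<GenerationParams>;

// Hypothetical presets: persona names from this roadmap, values illustrative.
const PERSONAS: Record<string, Overrides> = {
  "Daily Driver": {},
  "Creative Mode": { model: "Gemma 4 21B REAP", temperature: 1.0 },
  "Concise Mode": { systemPrompt: "Answer briefly and directly.", temperature: 0.3 },
  "Coding Mode": { model: "DeepSeek Coder V2 Lite Instruct", temperature: 0.2 },
};

/**
 * Apply override layers in order (e.g. [project, persona, session]);
 * later layers win and undefined fields are ignored.
 */
function resolveParams(defaults: GenerationParams, layers: Array<Overrides | undefined>): GenerationParams {
  const result: GenerationParams = { ...defaults };
  for (const layer of layers) {
    if (!layer) continue; // e.g. no persona selected for this session
    if (layer.model !== undefined) result.model = layer.model;
    if (layer.systemPrompt !== undefined) result.systemPrompt = layer.systemPrompt;
    if (layer.temperature !== undefined) result.temperature = layer.temperature;
  }
  return result;
}
```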

Second Brain Features

  • Quick capture — minimal input to save a thought directly to memory without starting a chat
  • Knowledge graph visualiser — interactive node/edge view of entities and relationships
  • Memory search page — dedicated search UI across all episodes and entities
  • Daily digest — generated summary of recent activity and learned facts

Quality of Life

  • Keyboard shortcuts — Ctrl+K command palette, Ctrl+Enter to send
  • Dark/light theme toggle
  • Mobile layout polish — collapsible sidebar, touch-friendly inputs
  • Notification support — browser notifications for long completions

Phase 4 — Coding Copilot

After core is feature-complete

Project Directory Awareness

  • Directory watcher service — monitors a VS Code workspace for changes
  • Symbol indexer — AST parsing via Tree-sitter, file → symbol map in SQLite
  • Diagnostic indexer — compiler errors/warnings per file, triggered on save
  • Maps to existing project isolation — coding project = NexusAI project with indexedDirectory flag

Coding-Specific Memory

  • Procedural patterns per language/framework — stored in procedural memory layer
  • Skill compilation — successful coding solutions abstracted into reusable patterns
  • Codebase semantic search — embed code chunks into Qdrant, search by intent
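Before embedding, code has to be cut into chunks small enough for the embedding model. The sketch below uses fixed line windows with overlap; the 40-line window and 10-line overlap are illustrative assumptions, and a symbol-aware splitter built on the Tree-sitter index would likely give cleaner boundaries. `chunkSource` is a hypothetical name.

```typescript
interface CodeChunk {
  path: string;
  startLine: number; // 1-based, inclusive
  endLine: number; // 1-based, inclusive
  text: string;
}

/**
 * Split a source file into overlapping line windows. The overlap keeps a
 * function that straddles a boundary searchable from either chunk, and the
 * line range lets a search hit link back to the exact location.
 */
function chunkSource(path: string, source: string, maxLines = 40, overlap = 10): CodeChunk[] {
  if (overlap >= maxLines) throw new Error("overlap must be smaller than maxLines");
  const lines = source.split("\n");
  const step = maxLines - overlap;
  const chunks: CodeChunk[] = [];

  for (let start = 0; start < lines.length; start += step) {
    const end = Math.min(start + maxLines, lines.length);
    chunks.push({
      path,
      startLine: start + 1,
      endLine: end,
      text: lines.slice(start, end).join("\n"),
    });
    if (end === lines.length) break; // this window already reached end of file
  }
  return chunks;
}
```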

Phase 5 — Stretch Goals

Voice Layer

  • TTS output — text-to-speech for AI responses
  • STT input — speech-to-text for voice messages
  • Hardware-dependent — deferred until appropriate hardware available
  • Architecturally clean addition — new input/output modality only

Homelab Enhancements

  • Backup improvements — automated, verified backups of SQLite + Qdrant data
  • Security hardening — network segmentation, service-level auth
  • IP webcam integration
  • Home Assistant integration

Architecture Reference

Services & Nodes

| Service | Host | Port | Role |
|---|---|---|---|
| Inference | Main PC (192.168.0.79) | 3001 | llama.cpp provider, /complete, /complete/stream |
| Memory | Mini PC 1 (192.168.0.81) | 3002 | SQLite, episode/entity/summary CRUD |
| Embedding | Mini PC 1 (192.168.0.81) | 3003 | nomic-embed-text via Ollama, vector generation |
| Qdrant | Mini PC 1 (192.168.0.81) | 6333 | Vector store: episodes, entities, summaries collections |
| Orchestration | Hub (192.168.0.205) | 4000 | Chat pipeline, context assembly, session management |
| Chat Client | Hub (192.168.0.205) | | React/Vite, served via Caddy |
| Caddy + Authelia | Hub (192.168.0.205) | 443 | Reverse proxy, SSO |

Primary Models

| Role | Model | Notes |
|---|---|---|
| Daily driver | Gemma 4 26B Claude Distill APEX I-Mini | --reasoning off flag critical |
| Creative/worldbuilding | Gemma 4 21B REAP Q5_K_M | |
| Coding | DeepSeek Coder V2 Lite Instruct Q6_K | |
| Background tasks | qwen2.5:3b via Ollama | Entity extraction, summarization |

Key Design Principles

  • Layer-by-layer validation — backend → orchestration → frontend, curl-test each layer
  • Fire-and-forget async — embedding and entity extraction never block the chat response
  • All services read settings on every request — no restart required for config changes
  • Backend-first development — data layer → endpoints → orchestration proxy → frontend
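The fire-and-forget principle is easy to break by accident: an unawaited promise that rejects becomes an unhandled rejection, which recent Node.js versions treat as fatal by default. A minimal helper, with `fireAndForget` as a hypothetical name, routes failures to a logger instead.

```typescript
/**
 * Start a background task without awaiting it. Failures are logged rather
 * than surfacing as unhandled rejections, so a failed embedding or entity
 * extraction never affects the chat response already sent.
 */
function fireAndForget(
  label: string,
  task: () => Promise<unknown>,
  log: (msg: string) => void = console.error,
): void {
  // Starting inside .then() also catches a synchronous throw from `task`.
  Promise.resolve()
    .then(task)
    .catch((err: unknown) => {
      const reason = err instanceof Error ? err.message : String(err);
      log(`[background:${label}] ${reason}`);
    });
}
```

In the chat handler this would wrap calls such as a hypothetical `embedEpisode(episode)` once the response has been written.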

Last updated: April 2026