update documentation

2026-04-17 03:46:17 -07:00
parent 27e3c98304
commit 5145b9a7db
13 changed files with 822 additions and 794 deletions
--- a/docs/deployment/homelab.md
+++ b/docs/deployment/homelab.md
@@ -7,50 +7,73 @@ services appropriate for its hardware.

 ## Mini PC 1 — 192.168.0.81

-Runs: Qdrant, Memory Service, Embedding Service
+Runs: Qdrant, Memory Service, Embedding Service, Ollama
+
 ```bash
-ssh username@192.168.0.81
-cd ~/nexusai
+ssh storme@192.168.0.81
 docker compose -f docker-compose.mini1.yml up -d  # Qdrant
-npm run memory
-npm run embedding
+npm run memory      # port 3002
+npm run embedding   # port 3003
+ollama serve        # port 11434 — must bind 0.0.0.0 (OLLAMA_HOST=0.0.0.0)
 ```

+> Ollama must be started with `OLLAMA_HOST=0.0.0.0` to accept connections
+> from other services on the LAN. Without this, embedding requests from the
+> memory service will be refused.
+
 ## Mini PC 2 — 192.168.0.205

-Runs: Gitea, Orchestration Service, Chat Client (via Caddy)
-```bash
-ssh username@192.168.0.205
+Runs: Orchestration Service, Chat Client (via Caddy), Gitea, Caddy, Authelia

-cd ~/gitea
-docker compose up -d        # Gitea
+```bash
+ssh storme@192.168.0.205

 cd /opt/stacks/network
 docker compose up -d        # Caddy, Authelia, and other network services

-cd ~/nexusai
-npm run orchestration
+cd ~/nexusAI
+npm run orchestration       # port 4000
 ```

-## Main PC
+## Main PC — 192.168.0.79

-Runs: Ollama, Inference Service
-```bash
-ollama serve
-npm run inference
+Runs: Inference Service, llama-server
+
+```powershell
+# Start llama-server first — inference service depends on it
+.\llama-gpu\llama-server.exe `
+  -m .\models\gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf `
+  -ngl 99 --reasoning off --host 0.0.0.0 --port 8080 -c 64000
+
+# Then start inference service
+npm run inference            # port 3001
 ```

 ## Chat Client Deployment

-The chat client is a React + Vite app build to static files and served by Caddy on Mini PC 2 (Infrastructure node).  It does not run as a Node process
+The chat client is a React + Vite app built to static files and served by
+Caddy on Mini PC 2. It does not run as a Node process.
+
 ```bash
-# On dev machine or Mini PC 2 after git pull
+# On Mini PC 2 after git pull
 cd ~/nexusAI/packages/chat-client
-npm run build
+
+# Set production URL before building
+VITE_ORCHESTRATION_URL=https://nexus.jellystorm.com npm run build
+
 # Output lands in packages/chat-client/dist/
-# Caddy serves this directory directly via volume mount
+# Caddy serves this directory directly via Docker volume mount
 ```
-Caddy config (`/opt/docker/caddy/Caddyfile`):
+
+> Do NOT set `VITE_ORCHESTRATION_URL` during local dev. Vite's proxy handles
+> routing there, and setting the HTTPS domain will cause Authelia to intercept
+> API requests, producing confusing JSON parse errors.
+
+## Caddy Configuration
+
+The Caddyfile on Mini PC 2 must include a handle block for each route prefix
+the client needs to reach. Current required blocks for NexusAI:
+
 ```caddy
 nexus.jellystorm.com {
    import authelia
@@ -63,6 +86,14 @@ nexus.jellystorm.com {
        reverse_proxy 192.168.0.205:4000
    }

+    handle /models* {
+        reverse_proxy 192.168.0.205:4000
+    }
+
+    handle /projects* {
+        reverse_proxy 192.168.0.205:4000
+    }
+
    handle {
        root * /srv/nexusai
        try_files {path} /index.html
@@ -71,18 +102,45 @@ nexus.jellystorm.com {
 }
 ```

-The Caddy container mounts the dist directory via Docker volume:
+When adding new top-level routes to the orchestration service, add a matching
+handle block here and reload Caddy:
+
+```bash
+caddy reload --config /path/to/Caddyfile
+```
+
+The Caddy container mounts the `dist` directory via Docker volume:
+
 ```yaml
 - /home/storme/nexusAI/packages/chat-client/dist:/srv/nexusai
 ```

 > After adding or changing volume mounts, a full `docker compose down caddy && docker compose up -d caddy`
-> is required. Caddyfile-only changes only need `docker compose restart caddy`.
-
-
+> is required. Caddyfile-only changes need only `caddy reload`.

 ## Environment Files

-Each node needs a `.env` file in the relevant service package directory.
-These are not committed to git. See each service's documentation for
-required variables.
+Each service needs a `.env` file in its package directory. These are not
+committed to git. See each service's documentation for required variables.
+
+| Service | Location | Key Variables |
+|---|---|---|
+| Memory | `packages/memory-service/.env` | `SQLITE_PATH`, `QDRANT_URL`, `EMBEDDING_SERVICE_URL` |
+| Embedding | `packages/embedding-service/.env` | `OLLAMA_URL`, `EMBEDDING_MODEL` |
+| Inference | `packages/inference-service/.env` | `INFERENCE_PROVIDER`, `INFERENCE_URL`, `DEFAULT_MODEL` |
+| Orchestration | `packages/orchestration-service/src/.env` | `MEMORY_SERVICE_URL`, `EMBEDDING_SERVICE_URL`, `INFERENCE_SERVICE_URL`, `QDRANT_URL`, `MODELS_MANIFEST_PATH` |
+| Chat client | `packages/chat-client/.env` | `VITE_ORCHESTRATION_URL` (production builds only) |
+
+## Models Manifest
+
+The models manifest (`models.json`) lives on the Main PC alongside the model
+files, accessible to orchestration via an SMB mount at `/mnt/nexus-models`.
+
+```json
+[
+  { "value": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf", "label": "Gemma 4 26B Claude Distill" }
+]
+```
+
+`value` must exactly match the model name as reported by `llama-server`
+(including the `.gguf` extension). No service restart is needed to pick up changes.
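
The exact-match rule for `value` in the Models Manifest section can be checked mechanically before editing `models.json`. The following is a minimal sketch, assuming the manifest is a JSON array of `{ value, label }` objects as shown in the diff. `validateManifest` and the `serverModelNames` parameter are hypothetical names for illustration and are not part of NexusAI. How the list of names reported by `llama-server` is obtained is left to the caller.

```typescript
// Hypothetical helper (not part of NexusAI): checks manifest entries against
// a caller-supplied list of model names reported by llama-server.
interface ManifestEntry {
  value: string; // must exactly match the server-reported model name
  label: string; // display name
}

function validateManifest(raw: string, serverModelNames: string[]): string[] {
  const problems: string[] = [];
  const parsed: unknown = JSON.parse(raw);

  if (!Array.isArray(parsed)) {
    return ["manifest must be a JSON array"];
  }

  parsed.forEach((entry: unknown, i: number) => {
    if (typeof entry !== "object" || entry === null) {
      problems.push(`entry ${i}: must be an object`);
      return;
    }
    const e = entry as Partial<ManifestEntry>;
    if (typeof e.value !== "string" || typeof e.label !== "string") {
      problems.push(`entry ${i}: "value" and "label" must be strings`);
      return;
    }
    // The doc requires the .gguf extension to be part of the value.
    if (!e.value.endsWith(".gguf")) {
      problems.push(`entry ${i}: "${e.value}" is missing the .gguf extension`);
    }
    // Exact, case-sensitive match against the server-reported names.
    if (!serverModelNames.includes(e.value)) {
      problems.push(`entry ${i}: "${e.value}" is not reported by the server`);
    }
  });

  return problems;
}
```

An empty result means every entry passed both checks. Any returned string names the entry index and the rule it broke.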