update documentation

2026-04-17 03:46:17 -07:00
parent 27e3c98304
commit 5145b9a7db
13 changed files with 822 additions and 794 deletions
--- a/docs/services/embedding-service.md
+++ b/docs/services/embedding-service.md
@@ -27,80 +27,43 @@ minimizing network hops on the memory write path.
 | OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL |
 | EMBEDDING_MODEL | No | nomic-embed-text | Ollama embedding model to use |

+> Ollama must be running with `OLLAMA_HOST=0.0.0.0` to accept LAN connections
+> from other services.
+
 ## Model

-**nomic-embed-text** via Ollama produces **768-dimension** vectors using **Cosine similarity**.
-This must match the `QDRANT.VECTOR_SIZE` constant in `@nexusai/shared`.
+**nomic-embed-text** via Ollama produces **768-dimension** vectors, which are
+compared using **Cosine similarity**. The dimension must match
+`QDRANT.VECTOR_SIZE` in `@nexusai/shared`.

 If the embedding model is changed, the Qdrant collections must be reinitialized
-with the new vector dimension — updating `QDRANT.VECTOR_SIZE` in `constants.js` is
-the single change required to keep everything consistent.
+with the new vector dimension. Updating `QDRANT.VECTOR_SIZE` in `constants.js`
+is the only code change required to keep everything consistent.
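
One way to catch a dimension mismatch early is to compare the length of a returned vector against the configured size before writing to Qdrant. The following is a minimal sketch, not the project's actual code: the inline `QDRANT` object only stands in for the real `@nexusai/shared` export (its shape is an assumption), and `assertVectorSize` is a hypothetical helper.

```javascript
// Stand-in for the constant exported by @nexusai/shared (assumed shape).
const QDRANT = { VECTOR_SIZE: 768 };

// Hypothetical guard: throws if a vector's length differs from the configured size.
function assertVectorSize(vector, expected = QDRANT.VECTOR_SIZE) {
  if (!Array.isArray(vector)) {
    throw new TypeError("embedding must be an array of numbers");
  }
  if (vector.length !== expected) {
    throw new Error(
      `embedding has ${vector.length} dimensions, expected ${expected}`
    );
  }
  return vector;
}
```

Calling a guard like this right after embedding would surface a model change that was not accompanied by a collection reinitialization, instead of letting the upsert fail later.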

 ## Ollama API

-Uses the `/api/embed` endpoint (Ollama v0.4+). Request shape:
+Uses the `/api/embed` endpoint (Ollama v0.4+):
+
 ```json
+// Request
 { "model": "nomic-embed-text", "input": "text to embed" }
-```
-Response key is `embeddings[0]` — an array of 768 floats.

-## Endpoints
-
-### Health
-
-| Method | Path | Description |
-|---|---|---|
-| GET | /health | Service health check |
-
-### Embed
-
-| Method | Path | Description |
-|---|---|---|
-| POST | /embed | Embed a single text string |
-| POST | /embed/batch | Embed an array of text strings |
-
---
-
-**POST /embed**
-
-Embeds a single text string and returns the vector.
-
-Request body:
-```json
-{
-  "text": "Hello from NexusAI"
-}
+// Response key
+embeddings[0]  // array of 768 floats
 ```

-Response:
-```json
-{
-  "embedding": [0.123, -0.456, ...],
-  "model": "nomic-embed-text",
-  "dimensions": 768
-}
-```
+> Earlier Ollama versions used `/api/embeddings` with a `prompt` key and
+> returned `embedding` (singular). Use `/api/embed`, `input`, and
+> `embeddings[0]` for Ollama v0.4+.
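
The request and response shapes above can be sketched as a small client. This is an illustration under stated assumptions rather than the service's implementation: `buildEmbedRequest`, `extractEmbedding`, and `embedText` are hypothetical names, the defaults are copied from the configuration table, and `embedText` relies on the global `fetch` available in Node 18+.

```javascript
// Defaults taken from the configuration table in this document.
const DEFAULT_OLLAMA_URL = "http://localhost:11434";
const DEFAULT_MODEL = "nomic-embed-text";

// Build the JSON body for POST /api/embed (Ollama v0.4+).
function buildEmbedRequest(text, model = DEFAULT_MODEL) {
  return { model, input: text };
}

// Pull the first vector out of an /api/embed response body.
// Throws if the body lacks `embeddings[0]`, e.g. a legacy `embedding` response.
function extractEmbedding(body) {
  const vector =
    body && Array.isArray(body.embeddings) ? body.embeddings[0] : undefined;
  if (!Array.isArray(vector)) {
    throw new Error("response has no embeddings[0]; is this Ollama v0.4+?");
  }
  return vector;
}

// Hypothetical helper: embed one string. Requires global fetch (Node 18+).
async function embedText(
  text,
  baseUrl = DEFAULT_OLLAMA_URL,
  model = DEFAULT_MODEL
) {
  const res = await fetch(`${baseUrl}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildEmbedRequest(text, model)),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  return extractEmbedding(await res.json());
}
```

Keeping the request builder and response parser separate from the network call makes both shapes easy to unit-test without a running Ollama instance.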

---
+## Usage in NexusAI

-**POST /embed/batch**
+The embedding service is called in two places:

-Embeds an array of strings sequentially and returns all vectors in the same order.
-Ollama does not natively parallelize embeddings, so requests are processed one at a time.
+1. **Memory service**: after each episode is saved to SQLite, the combined
+   `User: ..\nAssistant: ..` text is embedded and upserted into Qdrant.
+   This step is fire-and-forget: failures are logged but do not affect the response.

-Request body:
-```json
-{
-  "texts": ["first sentence", "second sentence"]
-}
-```
+2. **Orchestration service** — the user's message is embedded at the start of
+   the chat pipeline to perform semantic search against past episodes.

-Response:
-```json
-{
-  "embeddings": [[0.123, ...], [0.456, ...]],
-  "model": "nomic-embed-text",
-  "dimensions": 768,
-  "count": 2
-}
-```
+For all HTTP endpoints, see `api-routes.md`.
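
The fire-and-forget behaviour described for the memory service can be sketched as follows. Everything here is hypothetical and for illustration only: `fireAndForget`, `episodeText`, and the `task`/`log` parameters are not names from the codebase, and the sketch assumes the `..` placeholders in the combined text stand for the raw user and assistant messages.

```javascript
// Run an async task without awaiting it; log failures instead of propagating them.
// `task` is a function returning a promise; `log` receives a message string.
function fireAndForget(task, log = console.error) {
  Promise.resolve()
    .then(task) // also captures synchronous throws inside `task`
    .catch((err) => log(`embedding failed: ${err && err.message}`));
}

// Combine one exchange into the text that gets embedded (assumed format).
function episodeText(userMessage, assistantMessage) {
  return `User: ${userMessage}\nAssistant: ${assistantMessage}`;
}
```

A caller would save the episode first, then invoke something like `fireAndForget(() => embedAndUpsert(episodeText(user, assistant)))`, where `embedAndUpsert` is a placeholder for the embed-then-upsert step. Because the promise is never awaited, the response path is not delayed or failed by an embedding error.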