# Embedding Service
## Purpose
Converts text into vector embeddings via Ollama for storage in Qdrant.
Keeps the embedding workload co-located with the memory service on Mini PC 1,
minimizing network hops on the memory write path.
## Dependencies
- `express` — HTTP API
- `ollama` — Ollama client for embedding model
- `dotenv` — environment variable loading
- `@nexusai/shared` — shared utilities

> Uses Node.js built-in `fetch` — no additional HTTP client library needed.
## Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL |
| EMBEDDING_MODEL | No | nomic-embed-text | Ollama embedding model to use |
## Model
**nomic-embed-text** via Ollama produces **768-dimension** vectors using **Cosine similarity**.
This must match the `QDRANT.VECTOR_SIZE` constant in `@nexusai/shared`.

If the embedding model is changed, the Qdrant collections must be reinitialized
with the new vector dimension. Updating `QDRANT.VECTOR_SIZE` in `constants.js` is
the single code change required to keep every service consistent.
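As a sketch of what that consistency looks like at runtime (the inline `QDRANT` constant and the helper name here are illustrative stand-ins, not the actual `@nexusai/shared` API):

```javascript
// Illustrative runtime guard. `QDRANT` is a stand-in for the constant
// exported by @nexusai/shared; `assertDimension` is a hypothetical helper.
const QDRANT = { VECTOR_SIZE: 768 };

// Fail early if an embedding's length disagrees with the collection schema,
// rather than letting Qdrant reject the upsert later.
function assertDimension(vector) {
  if (vector.length !== QDRANT.VECTOR_SIZE) {
    throw new Error(
      `Embedding has ${vector.length} dimensions, expected ${QDRANT.VECTOR_SIZE}`
    );
  }
  return vector;
}
```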
## Ollama API
Uses the `/api/embed` endpoint (Ollama v0.4+). Request shape:

```json
{ "model": "nomic-embed-text", "input": "text to embed" }
```

Response key is `embeddings[0]` — an array of 768 floats.
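A minimal sketch of that call using the built-in `fetch`, assuming the defaults from the environment variable table above (`embedText` and `buildEmbedRequest` are illustrative helpers, not necessarily the service's actual functions):

```javascript
// Sketch of calling Ollama's /api/embed with Node's built-in fetch.
// Error handling is deliberately minimal.
const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://localhost:11434";

// Build the v0.4+ request body: { model, input }.
function buildEmbedRequest(model, input) {
  return { model, input };
}

async function embedText(
  text,
  model = process.env.EMBEDDING_MODEL ?? "nomic-embed-text"
) {
  const res = await fetch(`${OLLAMA_URL}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildEmbedRequest(model, text)),
  });
  if (!res.ok) throw new Error(`Ollama embed failed: HTTP ${res.status}`);
  const { embeddings } = await res.json();
  return embeddings[0]; // array of 768 floats for nomic-embed-text
}
```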
## Endpoints
### Health

| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |
### Embed
| Method | Path | Description |
|---|---|---|
| POST | /embed | Embed a single text string |
| POST | /embed/batch | Embed an array of text strings |
---
**POST /embed**

Embeds a single text string and returns the vector.

Request body:
```json
{
  "text": "Hello from NexusAI"
}
```

Response:
```json
{
  "embedding": [0.123, -0.456, ...],
  "model": "nomic-embed-text",
  "dimensions": 768
}
```
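The handler behind this endpoint can be sketched as a plain function so its shape is testable without Express (the factory name, validation details, and error message are assumptions, not the service's actual code):

```javascript
// Hypothetical POST /embed handler factory. `embed` stands in for the
// Ollama call; the returned function matches Express's (req, res) shape.
function makeEmbedHandler(embed, model = "nomic-embed-text") {
  return async (req, res) => {
    const { text } = req.body ?? {};
    if (typeof text !== "string" || text.length === 0) {
      return res.status(400).json({ error: "text (string) is required" });
    }
    const embedding = await embed(text);
    // Mirror the documented response shape: embedding, model, dimensions.
    return res.json({ embedding, model, dimensions: embedding.length });
  };
}
```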
---
**POST /embed/batch**

Embeds an array of strings sequentially and returns all vectors in the same order.
Ollama does not natively parallelize embeddings, so requests are processed one at a time.

Request body:
```json
{
  "texts": ["first sentence", "second sentence"]
}
```

Response:
```json
{
  "embeddings": [[0.123, ...], [0.456, ...]],
  "model": "nomic-embed-text",
  "dimensions": 768,
  "count": 2
}
```
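The sequential processing described above can be sketched as a plain ordered loop (`embedBatch` is illustrative, and `embedOne` is a stand-in for the single-text embed call; the real service wiring may differ):

```javascript
// Embed texts one at a time, preserving input order. Since Ollama handles
// one embedding request at a time, a simple loop adds no extra bottleneck.
async function embedBatch(texts, embedOne) {
  const embeddings = [];
  for (const text of texts) {
    embeddings.push(await embedOne(text)); // sequential, in order
  }
  return { embeddings, count: embeddings.length };
}
```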