diff --git a/docs/services/embedding-service.md b/docs/services/embedding-service.md
index cc9819c..1a4870a 100644
--- a/docs/services/embedding-service.md
+++ b/docs/services/embedding-service.md
@@ -7,15 +7,17 @@
 
 ## Purpose
 
-Converts text into vector embeddings for storage in Qdrant. Keeps
-embedding workload off the main inference node.
+Converts text into vector embeddings via Ollama for storage in Qdrant.
+Keeps embedding workload co-located with the memory service on Mini PC 1,
+minimizing network hops on the memory write path.
 
 ## Dependencies
 
 - `express` — HTTP API
-- `ollama` — Ollama client for embedding model
-- `dotenv` — environment variable loading
 - `@nexusai/shared` — shared utilities
+- `dotenv` — environment variable loading
+
+> Uses Node.js built-in `fetch` — no additional HTTP client library needed.
 
 ## Environment Variables
 
@@ -25,10 +27,80 @@ embedding workload off the main inference node.
 | OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL |
 | EMBEDDING_MODEL | No | nomic-embed-text | Ollama embedding model to use |
 
+## Model
+
+**nomic-embed-text** via Ollama produces **768-dimension** vectors using **Cosine similarity**.
+This must match the `QDRANT.VECTOR_SIZE` constant in `@nexusai/shared`.
+
+If the embedding model is changed, the Qdrant collections must be reinitialized
+with the new vector dimension — updating `QDRANT.VECTOR_SIZE` in `constants.js` is
+the single code change required to keep everything consistent.
+
+## Ollama API
+
+Uses the `/api/embed` endpoint (Ollama v0.4+). Request shape:
+```json
+{ "model": "nomic-embed-text", "input": "text to embed" }
+```
+Response key is `embeddings[0]` — an array of 768 floats.
+
 ## Endpoints
 
+### Health
+
 | Method | Path | Description |
 |---|---|---|
 | GET | /health | Service health check |
 
-> Further endpoints will be documented as the service is built out.
\ No newline at end of file
+### Embed
+
+| Method | Path | Description |
+|---|---|---|
+| POST | /embed | Embed a single text string |
+| POST | /embed/batch | Embed an array of text strings |
+
+---
+
+**POST /embed**
+
+Embeds a single text string and returns the vector.
+
+Request body:
+```json
+{
+  "text": "Hello from NexusAI"
+}
+```
+
+Response:
+```json
+{
+  "embedding": [0.123, -0.456, ...],
+  "model": "nomic-embed-text",
+  "dimensions": 768
+}
+```
+
+---
+
+**POST /embed/batch**
+
+Embeds an array of strings sequentially and returns all vectors in the same order.
+Ollama does not natively parallelize embeddings, so requests are processed one at a time.
+
+Request body:
+```json
+{
+  "texts": ["first sentence", "second sentence"]
+}
+```
+
+Response:
+```json
+{
+  "embeddings": [[0.123, ...], [0.456, ...]],
+  "model": "nomic-embed-text",
+  "dimensions": 768,
+  "count": 2
+}
+```
\ No newline at end of file
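
The whole memory write path documented above reduces to one `fetch` round trip to Ollama's `/api/embed`. As a sanity check on the shapes this patch documents, here is a minimal Node.js sketch, using the defaults from the Environment Variables table; the function names (`buildEmbedRequest`, `extractEmbedding`, `embed`) are illustrative, not the service's actual code:

```javascript
// Sketch of the documented embed flow. Assumes Node 18+ (built-in fetch).
const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://localhost:11434";
const EMBEDDING_MODEL = process.env.EMBEDDING_MODEL ?? "nomic-embed-text";
const EXPECTED_DIMENSIONS = 768; // must agree with QDRANT.VECTOR_SIZE

// Build the /api/embed request body (Ollama v0.4+ shape).
function buildEmbedRequest(input) {
  return { model: EMBEDDING_MODEL, input };
}

// Pull the vector out of an /api/embed response (`embeddings[0]`)
// and sanity-check its dimension before it ever reaches Qdrant.
function extractEmbedding(responseJson) {
  const vector = responseJson.embeddings?.[0];
  if (!Array.isArray(vector) || vector.length !== EXPECTED_DIMENSIONS) {
    throw new Error(`expected a ${EXPECTED_DIMENSIONS}-dimension embedding`);
  }
  return vector;
}

// Embed one string with the built-in fetch — no extra HTTP client.
async function embed(text) {
  const res = await fetch(`${OLLAMA_URL}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildEmbedRequest(text)),
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  return extractEmbedding(await res.json());
}
```

Checking the dimension at the service boundary is cheap insurance: a model swap that forgets the `QDRANT.VECTOR_SIZE` update fails loudly here instead of corrupting a Qdrant collection.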