# Embedding Service

**Package:** `@nexusai/embedding-service`
**Location:** `packages/embedding-service`
**Deployed on:** Mini PC 1 (192.168.0.81)
**Port:** 3003

## Purpose

Converts text into vector embeddings via Ollama for storage in Qdrant. Keeps the embedding workload co-located with the memory service on Mini PC 1, minimizing network hops on the memory write path.

## Dependencies

- `express` — HTTP API
- `@nexusai/shared` — shared utilities
- `dotenv` — environment variable loading

> Uses Node.js built-in `fetch` — no additional HTTP client library needed.

## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| PORT | No | 3003 | Port to listen on |
| OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL |
| EMBEDDING_MODEL | No | nomic-embed-text | Ollama embedding model to use |

## Model

**nomic-embed-text** via Ollama produces **768-dimension** vectors using **cosine similarity**. This must match the `QDRANT.VECTOR_SIZE` constant in `@nexusai/shared`. If the embedding model is changed, the Qdrant collections must be reinitialized with the new vector dimension — updating `QDRANT.VECTOR_SIZE` in `constants.js` is the single change required to keep everything consistent.

## Ollama API

Uses the `/api/embed` endpoint (Ollama v0.4+). Request shape:

```json
{
  "model": "nomic-embed-text",
  "input": "text to embed"
}
```

Response key is `embeddings[0]` — an array of 768 floats.

## Endpoints

### Health

| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |

### Embed

| Method | Path | Description |
|---|---|---|
| POST | /embed | Embed a single text string |
| POST | /embed/batch | Embed an array of text strings |

---

**POST /embed**

Embeds a single text string and returns the vector.
Request body:

```json
{ "text": "Hello from NexusAI" }
```

Response:

```json
{
  "embedding": [0.123, -0.456, ...],
  "model": "nomic-embed-text",
  "dimensions": 768
}
```

---

**POST /embed/batch**

Embeds an array of strings sequentially and returns all vectors in the same order. Ollama does not natively parallelize embedding requests, so they are processed one at a time.

Request body:

```json
{ "texts": ["first sentence", "second sentence"] }
```

Response:

```json
{
  "embeddings": [[0.123, ...], [0.456, ...]],
  "model": "nexus-embed-text",
  "dimensions": 768,
  "count": 2
}
```
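The service's core logic can be sketched as a small wrapper around Ollama's `/api/embed` endpoint, using the request/response shapes and environment variables documented above. This is an illustrative sketch, not the actual implementation: the helper names (`buildEmbedRequest`, `parseEmbedResponse`, `embed`) are assumptions, and only the endpoint path, payload shape, and the 768-dimension check against `QDRANT.VECTOR_SIZE` come from this document.

```typescript
// Sketch of the embedding call path, assuming Ollama v0.4+ /api/embed.
// Helper names are illustrative; only the wire shapes are from the docs.

const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://localhost:11434";
const EMBEDDING_MODEL = process.env.EMBEDDING_MODEL ?? "nomic-embed-text";
const EXPECTED_DIMENSIONS = 768; // must match QDRANT.VECTOR_SIZE in @nexusai/shared

// Build the JSON body for Ollama's /api/embed endpoint.
function buildEmbedRequest(input: string): { model: string; input: string } {
  return { model: EMBEDDING_MODEL, input };
}

// Extract embeddings[0] and verify the vector dimension before it
// reaches Qdrant, failing fast on a model/collection mismatch.
function parseEmbedResponse(res: { embeddings: number[][] }): number[] {
  const vector = res.embeddings[0];
  if (!Array.isArray(vector) || vector.length !== EXPECTED_DIMENSIONS) {
    throw new Error(`expected a ${EXPECTED_DIMENSIONS}-dimension vector`);
  }
  return vector;
}

// Embed one string via Ollama using Node's built-in fetch.
async function embed(text: string): Promise<number[]> {
  const response = await fetch(`${OLLAMA_URL}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildEmbedRequest(text)),
  });
  if (!response.ok) throw new Error(`Ollama returned ${response.status}`);
  return parseEmbedResponse(await response.json());
}
```

A batch handler would simply `await embed(text)` for each element of `texts` in a plain loop, which matches the sequential, order-preserving behavior described for `/embed/batch`.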