# Embedding Service

**Package:** `@nexusai/embedding-service`

**Location:** `packages/embedding-service`

**Deployed on:** Mini PC 1 (192.168.0.81)

**Port:** 3003

## Purpose

Converts text into vector embeddings via Ollama for storage in Qdrant. The embedding workload runs on Mini PC 1 alongside the memory service, which minimizes network hops on the memory write path.

## Dependencies

- `express` — HTTP API
- `@nexusai/shared` — shared utilities
- `dotenv` — environment variable loading

> Uses the built-in `fetch` available in Node.js 18+, so no additional HTTP client library is needed.

## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| PORT | No | 3003 | Port to listen on |
| OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL |
| EMBEDDING_MODEL | No | nomic-embed-text | Ollama embedding model to use |

> Ollama must be started with `OLLAMA_HOST=0.0.0.0` to accept LAN connections
> from other services.

## Model

**nomic-embed-text** via Ollama produces **768-dimension** vectors, which Qdrant compares using **Cosine similarity**. This dimension must match `QDRANT.VECTOR_SIZE` in `@nexusai/shared`.

If the embedding model is changed, the Qdrant collections must be reinitialized with the new vector dimension. On the code side, updating `QDRANT.VECTOR_SIZE` in `constants.js` is the only change required to keep everything consistent.

## Ollama API

Uses the `/api/embed` endpoint (Ollama v0.4+). Request body:

```json
{ "model": "nomic-embed-text", "input": "text to embed" }
```

The response holds one vector per input under the `embeddings` key; for a single input, the vector is `embeddings[0]`, an array of 768 floats.

> Earlier Ollama versions used `/api/embeddings` with a `prompt` key and
> returned `embedding` (singular). For Ollama v0.4+, use `/api/embed` with
> `input` and read `embeddings[0]`.

## Usage in NexusAI

The embedding service is called in two places:

1. **Memory service** — after each episode is saved to SQLite, the combined `User: ..\nAssistant: ..` text is embedded and upserted into Qdrant. This is fire-and-forget: failures are logged but don't affect the response.
2. **Orchestration service** — the user's message is embedded at the start of the chat pipeline to perform semantic search against past episodes.

For all HTTP endpoints, see `api-routes.md`.
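The request/response shape described under **Ollama API** can be wrapped in a small client. This is a minimal sketch, assuming Node.js 18+ (global `fetch`) and the environment defaults from the table above; the function names `buildEmbedRequest`, `parseEmbedResponse`, and `embed`, and the injectable `fetchImpl` parameter, are hypothetical and not the service's actual code.

```javascript
// Minimal sketch of an Ollama /api/embed client (hypothetical helper names).
// Assumes Node.js 18+ so that a global fetch is available.

const OLLAMA_URL = process.env.OLLAMA_URL || "http://localhost:11434";
const EMBEDDING_MODEL = process.env.EMBEDDING_MODEL || "nomic-embed-text";
const VECTOR_SIZE = 768; // should mirror QDRANT.VECTOR_SIZE in @nexusai/shared

// Build the JSON body expected by /api/embed.
function buildEmbedRequest(text, model = EMBEDDING_MODEL) {
  return { model, input: text };
}

// Extract embeddings[0] and fail fast if its length does not match the
// vector size the Qdrant collections were created with.
function parseEmbedResponse(body, expectedDim = VECTOR_SIZE) {
  const vector = body && Array.isArray(body.embeddings) ? body.embeddings[0] : undefined;
  if (!Array.isArray(vector)) {
    throw new Error("Ollama response has no embeddings[0]");
  }
  if (vector.length !== expectedDim) {
    throw new Error(`Expected ${expectedDim} dimensions, got ${vector.length}`);
  }
  return vector;
}

// POST one text to Ollama and return its embedding vector.
// fetchImpl is injectable so the logic can be exercised without Ollama.
async function embed(text, { url = OLLAMA_URL, model = EMBEDDING_MODEL, fetchImpl = fetch } = {}) {
  const res = await fetchImpl(`${url}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildEmbedRequest(text, model)),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  return parseEmbedResponse(await res.json());
}
```

The dimension check in `parseEmbedResponse` is a design choice of this sketch: if `EMBEDDING_MODEL` is swapped without updating `QDRANT.VECTOR_SIZE`, the error surfaces at embed time rather than as a rejected Qdrant upsert.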
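Reinitializing the collections after a model change, as described under **Model**, might look roughly like this. It is a minimal sketch against Qdrant's REST collection endpoints (`DELETE` and `PUT /collections/{name}`); the Qdrant URL, collection names, and the `recreateCollection` and `collectionConfig` helpers are hypothetical, and NexusAI's actual initialization code may differ. Deleting a collection discards every stored vector, so existing episodes would presumably need to be re-embedded from SQLite afterwards.

```javascript
// Sketch: recreate a Qdrant collection with a new vector size.
// qdrantUrl, collection names, and these helper names are hypothetical.
// WARNING: this deletes all points in the collection.

// Collection body for Qdrant's PUT /collections/{name}.
function collectionConfig(vectorSize) {
  return { vectors: { size: vectorSize, distance: "Cosine" } };
}

async function recreateCollection(qdrantUrl, name, vectorSize, fetchImpl = fetch) {
  const base = `${qdrantUrl}/collections/${encodeURIComponent(name)}`;

  // Drop the old collection; a missing collection (404) is tolerated.
  const del = await fetchImpl(base, { method: "DELETE" });
  if (!del.ok && del.status !== 404) {
    throw new Error(`DELETE ${name} failed: HTTP ${del.status}`);
  }

  // Create it again with the new dimension and Cosine distance.
  const put = await fetchImpl(base, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(collectionConfig(vectorSize)),
  });
  if (!put.ok) {
    throw new Error(`PUT ${name} failed: HTTP ${put.status}`);
  }
}
```

Passing `QDRANT.VECTOR_SIZE` as `vectorSize` keeps the collection and the embedding model in agreement from a single constant.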
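The fire-and-forget indexing done by the memory service can be sketched as follows. This assumes injected helpers: `saveToSqlite`, `embedFn`, and `upsertFn` are hypothetical stand-ins for the real SQLite write, embedding call, and Qdrant upsert, and the episode fields `id`, `userMessage`, and `assistantMessage` are likewise illustrative.

```javascript
// Sketch of fire-and-forget episode indexing (all names hypothetical).

// Build the text embedded for one episode.
function formatEpisodeText(userMessage, assistantMessage) {
  return `User: ${userMessage}\nAssistant: ${assistantMessage}`;
}

// Embed and upsert one episode. Errors are logged and never rethrown.
// The promise is returned only so callers or tests can observe completion.
function indexEpisodeInBackground(episode, { embedFn, upsertFn, logger = console }) {
  const text = formatEpisodeText(episode.userMessage, episode.assistantMessage);
  return Promise.resolve()
    .then(() => embedFn(text)) // synchronous throws also become rejections
    .then((vector) => upsertFn({ id: episode.id, vector, payload: { text } }))
    .catch((err) => {
      const reason = err && err.message ? err.message : String(err);
      logger.error(`Failed to index episode ${episode.id}: ${reason}`);
    });
}

// SQLite is written first; Qdrant indexing runs without being awaited.
async function saveEpisode(episode, { saveToSqlite, embedFn, upsertFn, logger }) {
  await saveToSqlite(episode);
  // Intentionally not awaited: indexing failures must not affect the response.
  indexEpisodeInBackground(episode, { embedFn, upsertFn, logger });
  return episode;
}
```

The trailing `.catch` is what makes the call safe to leave un-awaited: without it, an Ollama or Qdrant outage would surface as an unhandled promise rejection instead of a log line.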