Embedding Service
Package: @nexusai/embedding-service
Location: packages/embedding-service
Deployed on: Mini PC 1 (192.168.0.81)
Port: 3003
Purpose
Converts text into vector embeddings via Ollama for storage in Qdrant. Keeps embedding workload co-located with the memory service on Mini PC 1, minimizing network hops on the memory write path.
Dependencies
- `express`: HTTP API
- `@nexusai/shared`: shared utilities
- `dotenv`: environment variable loading

Uses the Node.js built-in `fetch`, so no additional HTTP client library is needed.
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| PORT | No | 3003 | Port to listen on |
| OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL |
| EMBEDDING_MODEL | No | nomic-embed-text | Ollama embedding model to use |
Ollama must be running with `OLLAMA_HOST=0.0.0.0` to accept LAN connections from other services.
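
As an illustration of how the defaults in the table could be applied, here is a minimal sketch. The helper name `loadConfig` and the returned field names are hypothetical and not taken from the service's source; only the variable names and default values come from the table.

```javascript
// Hypothetical helper: resolve settings from an environment object,
// falling back to the defaults listed in the table above.
function loadConfig(env = process.env) {
  // Treat unset and empty values the same way by using ||.
  const port = Number(env.PORT || 3003);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    // Strip trailing slashes so a path such as /api/embed can be appended safely.
    ollamaUrl: (env.OLLAMA_URL || "http://localhost:11434").replace(/\/+$/, ""),
    embeddingModel: env.EMBEDDING_MODEL || "nomic-embed-text",
  };
}

// With an empty environment, every default applies.
console.log(loadConfig({}));
```

Validating `PORT` at startup means a typo in the environment fails immediately instead of surfacing later as a confusing bind error.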
Model
`nomic-embed-text` via Ollama produces 768-dimensional vectors, which are compared using cosine similarity. The vector dimension must match `QDRANT.VECTOR_SIZE` in `@nexusai/shared`.
If the embedding model is changed, the Qdrant collections must be reinitialized
with the new vector dimension. Updating `QDRANT.VECTOR_SIZE` in `constants.js`
is the only code change required to keep everything consistent.
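
A dimension mismatch is easier to diagnose if it is caught before the vector reaches Qdrant. The sketch below shows one possible guard. The function `assertVectorSize` is hypothetical, and the `QDRANT` object is inlined here as a stand-in for the shared constant so the example runs on its own.

```javascript
// Assumption: stand-in for the QDRANT.VECTOR_SIZE constant in @nexusai/shared.
const QDRANT = { VECTOR_SIZE: 768 };

// Hypothetical guard: throws if the vector length differs from the expected size.
function assertVectorSize(vector, expected = QDRANT.VECTOR_SIZE) {
  if (!Array.isArray(vector) || vector.length !== expected) {
    const actual = Array.isArray(vector) ? vector.length : typeof vector;
    throw new Error(`Expected a ${expected}-dimensional vector, got ${actual}`);
  }
  return vector;
}

// A correctly sized vector passes through unchanged.
assertVectorSize(new Array(768).fill(0));
```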
Ollama API
Uses the /api/embed endpoint (Ollama v0.4+):
```javascript
// Request
{ "model": "nomic-embed-text", "input": "text to embed" }

// Response key
embeddings[0] // array of 768 floats
```
Earlier Ollama versions used `/api/embeddings` with a `prompt` key and returned `embedding` (singular). Use `/api/embed`, `input`, and `embeddings[0]` for Ollama v0.4+.
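
The request and response shape described above can be exercised with the built-in `fetch` roughly as follows. This is a sketch, not the service's actual code: the function name `embedText` and its options object are hypothetical, and `fetchImpl` is injectable only so the function can be tested without a live Ollama instance.

```javascript
// Hypothetical helper: request one embedding from Ollama's /api/embed endpoint.
async function embedText(text, {
  ollamaUrl = "http://localhost:11434",
  model = "nomic-embed-text",
  fetchImpl = fetch, // Node.js built-in fetch by default
} = {}) {
  const res = await fetchImpl(`${ollamaUrl}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, input: text }),
  });
  if (!res.ok) {
    throw new Error(`Ollama responded with HTTP ${res.status}`);
  }
  const data = await res.json();
  // /api/embed returns a list of vectors; a single input yields one entry.
  const vector = data.embeddings?.[0];
  if (!Array.isArray(vector)) {
    throw new Error("Ollama response did not contain embeddings[0]");
  }
  return vector;
}
```

Calling `await embedText("text to embed")` against a running Ollama instance with the model pulled should resolve to an array of floats.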
Usage in NexusAI
The embedding service is called in two places:
- Memory service: after each episode is saved to SQLite, the combined `User: ..\nAssistant: ..` text is embedded and upserted into Qdrant. This is fire-and-forget: failures are logged but don't affect the response.
- Orchestration service: the user's message is embedded at the start of the chat pipeline to perform semantic search against past episodes.
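
The fire-and-forget behavior on the memory write path might look something like the sketch below. Everything here is illustrative: `indexEpisodeInBackground`, the `embed` and `upsert` callbacks, and the episode fields `id`, `user`, and `assistant` are hypothetical names, not the memory service's real interface.

```javascript
// Hypothetical sketch: embed an episode and upsert it without blocking the caller.
// embed(text) and upsert(id, vector) are stand-ins for the real service calls.
function indexEpisodeInBackground(episode, { embed, upsert, logger = console }) {
  const text = `User: ${episode.user}\nAssistant: ${episode.assistant}`;
  // Promise.resolve().then(...) also captures synchronous throws from embed.
  return Promise.resolve()
    .then(() => embed(text))
    .then((vector) => upsert(episode.id, vector))
    // Failures are logged and swallowed so they never reach the response path.
    .catch((err) => logger.error("Embedding failed for episode", episode.id, err));
}
```

A caller would invoke this without `await` after the SQLite write and return the response immediately. The returned promise never rejects, so ignoring it does not produce an unhandled rejection.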
For all HTTP endpoints, see api-routes.md.