Embedding Service

Package: @nexusai/embedding-service
Location: packages/embedding-service
Deployed on: Mini PC 1 (192.168.0.81)
Port: 3003

Purpose

Converts text into vector embeddings via Ollama for storage in Qdrant. Running on Mini PC 1 keeps the embedding workload co-located with the memory service, minimizing network hops on the memory write path.

Dependencies

express — HTTP API
@nexusai/shared — shared utilities
dotenv — environment variable loading

Uses the fetch API built into Node.js (available globally since Node 18), so no additional HTTP client library is needed.

Environment Variables

Variable	Required	Default	Description
PORT	No	3003	Port to listen on
OLLAMA_URL	No	http://localhost:11434	Ollama instance URL
EMBEDDING_MODEL	No	nomic-embed-text	Ollama embedding model to use

Ollama must be running with OLLAMA_HOST=0.0.0.0 to accept LAN connections from other services.
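
As a rough illustration of how the defaults in the table could be applied, the sketch below resolves the three variables from an environment object. The helper name loadConfig and the trailing-slash cleanup are assumptions for this example, not necessarily what the service does.

```javascript
// Sketch only: loadConfig is a hypothetical helper, not part of the documented service.
// Defaults mirror the Environment Variables table.
function loadConfig(env = process.env) {
  // Treat an unset or empty PORT as "use the default".
  const port = Number.parseInt(env.PORT || '3003', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    port,
    // Strip trailing slashes so API paths can be appended safely (an assumption).
    ollamaUrl: (env.OLLAMA_URL || 'http://localhost:11434').replace(/\/+$/, ''),
    embeddingModel: env.EMBEDDING_MODEL || 'nomic-embed-text',
  };
}
```

Calling loadConfig() after dotenv has loaded the .env file would then give one object to pass around instead of reading process.env in several places.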

Model

nomic-embed-text via Ollama produces 768-dimension vectors, which are compared using Cosine similarity. The vector dimension must match QDRANT.VECTOR_SIZE in @nexusai/shared.

If the embedding model is changed, the Qdrant collections must be reinitialized with the new vector dimension. On the code side, updating QDRANT.VECTOR_SIZE in constants.js is the single change required to keep everything consistent.
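
A dimension mismatch is easier to diagnose if it is caught before the vector reaches Qdrant. The guard below is a minimal sketch of such a check. EXPECTED_VECTOR_SIZE is a local stand-in for QDRANT.VECTOR_SIZE, and assertVectorSize is a hypothetical helper name.

```javascript
// Stand-in for QDRANT.VECTOR_SIZE from @nexusai/shared (768 for nomic-embed-text).
const EXPECTED_VECTOR_SIZE = 768;

// Hypothetical guard: returns the vector unchanged, or throws if its
// dimension does not match the size the Qdrant collections were created with.
function assertVectorSize(vector, expected = EXPECTED_VECTOR_SIZE) {
  if (!Array.isArray(vector) || vector.length !== expected) {
    const actual = Array.isArray(vector) ? vector.length : typeof vector;
    throw new Error(`Embedding dimension mismatch: expected ${expected}, got ${actual}`);
  }
  return vector;
}
```

Running a check like this once at startup against a test embedding would surface a model change that was not accompanied by a collection reinitialization.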

Ollama API

Uses the /api/embed endpoint (Ollama v0.4+):

// Request
{ "model": "nomic-embed-text", "input": "text to embed" }

// Response key
embeddings[0]  // array of 768 floats

Earlier Ollama versions used /api/embeddings with a prompt key and returned embedding (singular). Use /api/embed, input, and embeddings[0] for Ollama v0.4+.
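
The request and response shapes above can be exercised with the built-in fetch. The following is a sketch under stated assumptions, not the service's actual implementation: embedText, buildEmbedRequest, and parseEmbedResponse are hypothetical names, and the fetchImpl option exists only so the call can be stubbed without a running Ollama instance.

```javascript
// Build the JSON body for POST /api/embed.
function buildEmbedRequest(text, model = 'nomic-embed-text') {
  return { model, input: text };
}

// Pull the first vector out of an /api/embed response body.
function parseEmbedResponse(body) {
  const vector = body?.embeddings?.[0];
  if (!Array.isArray(vector)) {
    throw new Error('Ollama response did not contain embeddings[0]');
  }
  return vector;
}

// Hypothetical wrapper around the Ollama call; option names are assumptions.
async function embedText(text, options = {}) {
  const {
    ollamaUrl = 'http://localhost:11434',
    model = 'nomic-embed-text',
    fetchImpl = fetch, // Node.js built-in fetch unless a stub is supplied
  } = options;

  const res = await fetchImpl(`${ollamaUrl}/api/embed`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildEmbedRequest(text, model)),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  return parseEmbedResponse(await res.json());
}
```

A response from the older /api/embeddings endpoint carries embedding rather than embeddings, so parseEmbedResponse would reject it with an explicit error instead of returning undefined.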

Usage in NexusAI

The embedding service is called in two places:

Memory service — after each episode is saved to SQLite, the combined User: ..\nAssistant: .. text is embedded and upserted into Qdrant. This is fire-and-forget — failures are logged but don't affect the response.
Orchestration service — the user's message is embedded at the start of the chat pipeline to perform semantic search against past episodes.
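
The fire-and-forget behavior on the memory write path can be sketched as follows. Both helper names are hypothetical, and the actual memory service may structure this differently; the point is only that an embedding failure is logged and never propagates to the caller.

```javascript
// Combine one exchange into the text that gets embedded.
function buildEpisodeText(userMessage, assistantMessage) {
  return `User: ${userMessage}\nAssistant: ${assistantMessage}`;
}

// Start an async task without letting its failure reach the caller.
// The returned promise never rejects; code on the write path does not await it.
function fireAndForget(task, log = console.error) {
  return Promise.resolve()
    .then(task) // also captures synchronous throws from task
    .catch((err) => {
      log(`embedding failed: ${err?.message ?? err}`);
    });
}
```

In the memory service this might be used right after the SQLite write, for example fireAndForget(() => embedAndUpsert(buildEpisodeText(user, assistant))), where embedAndUpsert stands for whatever function calls the embedding service and upserts into Qdrant.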

For all HTTP endpoints, see api-routes.md.
