2026-04-17 03:46:17 -07:00

Embedding Service

Package: @nexusai/embedding-service
Location: packages/embedding-service
Deployed on: Mini PC 1 (192.168.0.81)
Port: 3003

Purpose

Converts text into vector embeddings via Ollama for storage in Qdrant. Keeps embedding workload co-located with the memory service on Mini PC 1, minimizing network hops on the memory write path.

Dependencies

  • express — HTTP API
  • @nexusai/shared — shared utilities
  • dotenv — environment variable loading

Uses the Node.js built-in fetch (available since Node.js 18), so no additional HTTP client library is needed.

Environment Variables

Variable         Required  Default                 Description
PORT             No        3003                    Port to listen on
OLLAMA_URL       No        http://localhost:11434  Ollama instance URL
EMBEDDING_MODEL  No        nomic-embed-text        Ollama embedding model to use

By default Ollama listens only on 127.0.0.1. It must be started with OLLAMA_HOST=0.0.0.0 to accept connections from services on other machines on the LAN.
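A minimal sketch of how the service might resolve these variables at startup, assuming dotenv has already loaded .env into process.env. The loadConfig name is hypothetical; only the variable names and defaults come from the table above.

```javascript
// Hypothetical helper: resolves the service config from an env object,
// applying the documented defaults when a variable is unset or empty.
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.PORT ?? "", 10);
  return {
    // Fall back to 3003 when PORT is missing or not a positive integer.
    port: Number.isInteger(port) && port > 0 ? port : 3003,
    // Strip trailing slashes so `${ollamaUrl}/api/embed` never doubles them.
    ollamaUrl: (env.OLLAMA_URL || "http://localhost:11434").replace(/\/+$/, ""),
    embeddingModel: env.EMBEDDING_MODEL || "nomic-embed-text",
  };
}
```

Falling back to the default on a non-numeric PORT is a choice made for this sketch; failing fast on invalid values would be an equally reasonable design.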

Model

nomic-embed-text (via Ollama) produces 768-dimension vectors, which are stored in Qdrant collections that use Cosine distance. This dimension must match QDRANT.VECTOR_SIZE in @nexusai/shared.
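For reference, the Cosine metric compares two vectors by the angle between them rather than their magnitude. The function below is an illustrative sketch of that computation only; it is not part of the service, since Qdrant performs the comparison itself.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|). Ranges from -1 to 1;
// higher means the two texts are more semantically similar.
function cosineSimilarity(a, b) {
  // Vectors of different lengths cannot be compared at all.
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Undefined for zero vectors; this sketch returns 0 by convention.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The mismatch check mirrors why the vector size must agree everywhere: a 768-dimension query cannot be scored against vectors of any other size.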

If the embedding model is changed, the Qdrant collections must be reinitialized with the new model's vector dimension, because vectors produced by different models cannot be meaningfully compared. On the code side, updating QDRANT.VECTOR_SIZE in constants.js is the single change required to keep the services consistent.
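Without a guard, a dimension mismatch surfaces only when Qdrant rejects the vector. A service could instead check the length itself before upserting or searching. This is a minimal sketch assuming QDRANT.VECTOR_SIZE is 768; the constant and function names are hypothetical stand-ins, not exports of @nexusai/shared.

```javascript
// Hypothetical stand-in for QDRANT.VECTOR_SIZE from @nexusai/shared.
const EXPECTED_VECTOR_SIZE = 768;

// Throws a descriptive error if an embedding does not match the
// collection's vector size; otherwise returns the vector unchanged.
function assertVectorSize(vector, expected = EXPECTED_VECTOR_SIZE) {
  if (!Array.isArray(vector)) {
    throw new TypeError(`Expected an embedding array, got ${typeof vector}`);
  }
  if (vector.length !== expected) {
    throw new Error(
      `Embedding has ${vector.length} dimensions but the collection expects ${expected}. ` +
        "Was EMBEDDING_MODEL changed without updating QDRANT.VECTOR_SIZE " +
        "and reinitializing the Qdrant collections?"
    );
  }
  return vector;
}
```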

Ollama API

Uses the /api/embed endpoint (Ollama v0.4+):

// Request
{ "model": "nomic-embed-text", "input": "text to embed" }

// Response key
embeddings[0]  // array of 768 floats

Earlier Ollama versions used /api/embeddings with a prompt key and returned embedding (singular). Use /api/embed, input, and embeddings[0] for Ollama v0.4+.
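A minimal sketch of the request and response handling described above, using the built-in fetch. The embedText name and its options object are hypothetical; fetchImpl is an injectable parameter added so the function can be exercised without a running Ollama instance.

```javascript
// Sends one text to Ollama's /api/embed endpoint and returns its vector.
async function embedText(text, {
  ollamaUrl = "http://localhost:11434",
  model = "nomic-embed-text",
  fetchImpl = fetch, // Node.js 18+ global fetch unless a stub is passed in
} = {}) {
  const res = await fetchImpl(`${ollamaUrl}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, input: text }),
  });
  if (!res.ok) {
    throw new Error(`Ollama /api/embed failed: ${res.status} ${await res.text()}`);
  }
  const data = await res.json();
  // `embeddings` holds one vector per input; a single string yields one.
  const vector = data.embeddings?.[0];
  if (!Array.isArray(vector)) {
    // A legacy `embedding` (singular) response lands here too.
    throw new Error("Ollama response has no embeddings[0]; is this an older Ollama version?");
  }
  return vector;
}
```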

Usage in NexusAI

The embedding service is called in two places:

  1. Memory service — after each episode is saved to SQLite, the combined User: ..\nAssistant: .. text is embedded and upserted into Qdrant. This is fire-and-forget — failures are logged but don't affect the response.

  2. Orchestration service — the user's message is embedded at the start of the chat pipeline to perform semantic search against past episodes.
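The fire-and-forget write path in the memory service can be sketched as follows. embed and upsertVector are injected, hypothetical stand-ins for the embedding-service call and the Qdrant upsert; they are not the packages' actual function names.

```javascript
// Builds the combined episode text in the documented shape.
function buildEpisodeText(userMessage, assistantMessage) {
  return `User: ${userMessage}\nAssistant: ${assistantMessage}`;
}

// Starts embedding + upsert without the caller awaiting it.
function indexEpisodeInBackground(episode, { embed, upsertVector, log = console }) {
  const text = buildEpisodeText(episode.userMessage, episode.assistantMessage);
  // Promise.resolve().then(...) also captures synchronous throws from embed.
  // Failures from embed or upsert are caught and logged instead of
  // surfacing as an unhandled rejection.
  const task = Promise.resolve()
    .then(() => embed(text))
    .then((vector) => upsertVector(episode.id, vector, { text }))
    .catch((err) => log.error(`Embedding episode ${episode.id} failed: ${err.message}`));
  return task; // returned only so tests can observe completion
}
```

One consequence of this design: if Ollama or Qdrant is down, the episode is still saved to SQLite but has no vector, so it will not appear in semantic search results.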

For all HTTP endpoints, see api-routes.md.