nexusAI/docs/services/embedding-service.md
Last modified: 2026-04-04 21:45:29 -07:00

# Embedding Service

- **Package:** `@nexusai/embedding-service`
- **Location:** `packages/embedding-service`
- **Deployed on:** Mini PC 1 (192.168.0.81)
- **Port:** 3003

## Purpose

Converts text into vector embeddings via Ollama for storage in Qdrant. Keeps embedding workload co-located with the memory service on Mini PC 1, minimizing network hops on the memory write path.

## Dependencies

- `express` — HTTP API
- `@nexusai/shared` — shared utilities
- `dotenv` — environment variable loading

Uses the Node.js built-in `fetch`; no additional HTTP client library is needed.

## Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `PORT` | No | `3003` | Port to listen on |
| `OLLAMA_URL` | No | `http://localhost:11434` | Ollama instance URL |
| `EMBEDDING_MODEL` | No | `nomic-embed-text` | Ollama embedding model to use |

## Model

`nomic-embed-text` via Ollama produces 768-dimensional vectors compared with cosine similarity. This dimension must match the `QDRANT.VECTOR_SIZE` constant in `@nexusai/shared`.

If the embedding model is changed, update `QDRANT.VECTOR_SIZE` in `constants.js` *and* reinitialize the Qdrant collections with the new vector dimension; both steps are required to keep the model, the constant, and the collections consistent.
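A lightweight runtime guard can catch a model/collection mismatch early. This is a sketch, not the service's actual code, and it inlines the constant rather than assuming the exact export shape of `@nexusai/shared`:

```javascript
// Sketch: guard against a mismatch between the embedding model's output
// dimension and the size the Qdrant collections were created with.
// QDRANT.VECTOR_SIZE stands in for the constant described above.
const QDRANT = { VECTOR_SIZE: 768 };

function assertVectorSize(embedding) {
  if (!Array.isArray(embedding) || embedding.length !== QDRANT.VECTOR_SIZE) {
    throw new Error(
      `expected ${QDRANT.VECTOR_SIZE}-dimension vector, got ${embedding?.length}`
    );
  }
  return embedding;
}
```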

## Ollama API

Uses the `/api/embed` endpoint (Ollama v0.4+). Request shape:

```json
{ "model": "nomic-embed-text", "input": "text to embed" }
```

The response key is `embeddings[0]`, an array of 768 floats.
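For reference, the Ollama call can be sketched with the built-in `fetch`. The defaults mirror the environment-variable table above; the helper names are illustrative, not the service's actual code:

```javascript
// Sketch of the Ollama /api/embed call using Node's built-in fetch.
// OLLAMA_URL and EMBEDDING_MODEL defaults mirror the table above.
const OLLAMA_URL = process.env.OLLAMA_URL || 'http://localhost:11434';
const EMBEDDING_MODEL = process.env.EMBEDDING_MODEL || 'nomic-embed-text';

// Build the request body documented above.
function buildEmbedRequest(input, model = EMBEDDING_MODEL) {
  return { model, input };
}

async function ollamaEmbed(text) {
  const res = await fetch(`${OLLAMA_URL}/api/embed`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildEmbedRequest(text)),
  });
  if (!res.ok) throw new Error(`Ollama embed failed: ${res.status}`);
  const { embeddings } = await res.json();
  return embeddings[0]; // array of 768 floats for nomic-embed-text
}
```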

## Endpoints

### Health

| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Service health check |

### Embed

| Method | Path | Description |
|--------|------|-------------|
| POST | `/embed` | Embed a single text string |
| POST | `/embed/batch` | Embed an array of text strings |

### POST /embed

Embeds a single text string and returns the vector.

Request body:

```json
{
  "text": "Hello from NexusAI"
}
```

Response:

```json
{
  "embedding": [0.123, -0.456, ...],
  "model": "nomic-embed-text",
  "dimensions": 768
}
```
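A client call might look like the following. The base URL is an assumption taken from the deployment notes above, and the shape check simply mirrors the documented response fields:

```javascript
// Minimal client sketch for POST /embed.
// The base URL is assumed from the deployment notes above.
const EMBED_SERVICE_URL = 'http://192.168.0.81:3003';

// Validate the documented response shape before using the vector.
function isEmbedResponse(body) {
  return Array.isArray(body.embedding)
    && typeof body.model === 'string'
    && body.dimensions === body.embedding.length;
}

async function embed(text) {
  const res = await fetch(`${EMBED_SERVICE_URL}/embed`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
  if (!res.ok) throw new Error(`embed failed: ${res.status}`);
  const body = await res.json();
  if (!isEmbedResponse(body)) throw new Error('unexpected response shape');
  return body.embedding;
}
```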

### POST /embed/batch

Embeds an array of strings sequentially and returns all vectors in the same order. Ollama does not natively parallelize embeddings, so requests are processed one at a time.

Request body:

```json
{
  "texts": ["first sentence", "second sentence"]
}
```

Response:

```json
{
  "embeddings": [[0.123, ...], [0.456, ...]],
  "model": "nomic-embed-text",
  "dimensions": 768,
  "count": 2
}
```
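The sequential behavior described above can be sketched as a simple loop; `embedOne` is a placeholder for whatever single-text embedding call the service uses, not an actual export:

```javascript
// Sketch: embed texts one at a time, preserving input order.
// `embedOne` is a placeholder for the single-text embedding call.
async function embedBatch(texts, embedOne) {
  const embeddings = [];
  for (const text of texts) {
    // Awaiting inside the loop keeps requests strictly sequential,
    // since Ollama does not natively parallelize embeddings.
    embeddings.push(await embedOne(text));
  }
  return embeddings;
}
```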