# Embedding Service
**Package:** `@nexusai/embedding-service`
**Location:** `packages/embedding-service`
**Deployed on:** Mini PC 1 (192.168.0.81)
**Port:** 3003
## Purpose
Converts text into vector embeddings via Ollama for storage in Qdrant.
Keeps embedding workload co-located with the memory service on Mini PC 1,
minimizing network hops on the memory write path.
## Dependencies
- `express` — HTTP API
- `@nexusai/shared` — shared utilities
- `dotenv` — environment variable loading
> Uses Node.js built-in `fetch` — no additional HTTP client library needed.
## Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| PORT | No | 3003 | Port to listen on |
| OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL |
| EMBEDDING_MODEL | No | nomic-embed-text | Ollama embedding model to use |
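The defaults in the table can be sketched as a small config loader (`loadConfig` is an illustrative name, not the service's actual export):

```javascript
// Sketch: resolve configuration from the environment, applying the
// documented defaults when a variable is unset.
function loadConfig(env = process.env) {
  return {
    port: Number(env.PORT ?? 3003),
    ollamaUrl: env.OLLAMA_URL ?? "http://localhost:11434",
    embeddingModel: env.EMBEDDING_MODEL ?? "nomic-embed-text",
  };
}
```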
## Model
**nomic-embed-text** via Ollama produces **768-dimensional** vectors, stored in Qdrant with **cosine** distance.
This must match the `QDRANT.VECTOR_SIZE` constant in `@nexusai/shared`.
If the embedding model is changed, two steps are required to stay consistent:
update `QDRANT.VECTOR_SIZE` in `constants.js`, and reinitialize the Qdrant
collections so they are recreated with the new vector dimension.
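One way to enforce this invariant is a startup-time guard on returned vectors (a sketch; the inline `QDRANT` object stands in for the real import from `@nexusai/shared`, and `assertDimensions` is a hypothetical helper):

```javascript
// Stand-in for the QDRANT constants exported by @nexusai/shared/constants.js.
const QDRANT = { VECTOR_SIZE: 768 };

// Hypothetical guard: fail fast if a vector's length does not match
// the shared constant, instead of writing a mismatched vector to Qdrant.
function assertDimensions(vector) {
  if (vector.length !== QDRANT.VECTOR_SIZE) {
    throw new Error(
      `Embedding has ${vector.length} dimensions, expected QDRANT.VECTOR_SIZE = ${QDRANT.VECTOR_SIZE}`
    );
  }
  return vector;
}
```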
## Ollama API
Uses the `/api/embed` endpoint (Ollama v0.4+). Request shape:
```json
{ "model": "nomic-embed-text", "input": "text to embed" }
```
The response contains an `embeddings` array; `embeddings[0]` is the vector, an array of 768 floats.
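The request and response handling above can be sketched as follows (helper names are illustrative, not the service's actual code; `embedText` assumes a reachable Ollama instance):

```javascript
// Build the /api/embed request body shown above.
function buildEmbedRequest(model, text) {
  return { model, input: text };
}

// Extract the first vector from an Ollama v0.4+ response:
// { embeddings: [[...floats]] }
function parseEmbedResponse(data) {
  return data.embeddings[0];
}

// Tie the helpers together with the built-in fetch.
async function embedText(
  text,
  ollamaUrl = "http://localhost:11434",
  model = "nomic-embed-text"
) {
  const res = await fetch(`${ollamaUrl}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildEmbedRequest(model, text)),
  });
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  return parseEmbedResponse(await res.json());
}
```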
## Endpoints
### Health
| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |
### Embed
| Method | Path | Description |
|---|---|---|
| POST | /embed | Embed a single text string |
| POST | /embed/batch | Embed an array of text strings |
---
**POST /embed**
Embeds a single text string and returns the vector.
Request body:
```json
{
"text": "Hello from NexusAI"
}
```
Response:
```json
{
"embedding": [0.123, -0.456, ...],
"model": "nomic-embed-text",
"dimensions": 768
}
```
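The response body above could be assembled by a helper like this (a sketch; `embedResponse` is a hypothetical name, and the real handler may differ):

```javascript
// Package a vector into the documented /embed response shape.
function embedResponse(embedding, model = "nomic-embed-text") {
  return { embedding, model, dimensions: embedding.length };
}
```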
---
**POST /embed/batch**
Embeds an array of strings sequentially and returns all vectors in the same order.
Ollama does not natively parallelize embeddings, so requests are processed one at a time.
Request body:
```json
{
"texts": ["first sentence", "second sentence"]
}
```
Response:
```json
{
"embeddings": [[0.123, ...], [0.456, ...]],
"model": "nomic-embed-text",
"dimensions": 768,
"count": 2
}
```
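The sequential, order-preserving behavior described above can be sketched as follows (names are illustrative; `embedOne` stands in for whatever single-text embedding function the service uses internally):

```javascript
// Sketch: embed texts one at a time, preserving input order, and
// package the result into the documented /embed/batch response shape.
async function embedBatch(texts, embedOne) {
  const embeddings = [];
  for (const text of texts) {
    // awaiting inside the loop keeps Ollama requests strictly sequential
    embeddings.push(await embedOne(text));
  }
  return {
    embeddings,
    model: "nomic-embed-text",
    dimensions: embeddings[0]?.length ?? 0,
    count: embeddings.length,
  };
}
```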