Updated embedding service documentation

This commit is contained in:
Storme-bit
2026-04-04 21:45:29 -07:00
parent f9ed2f6609
commit cb05beaed1


@@ -7,15 +7,17 @@
## Purpose

Converts text into vector embeddings via Ollama for storage in Qdrant.
Keeps embedding workload co-located with the memory service on Mini PC 1,
minimizing network hops on the memory write path.
## Dependencies

- `express` — HTTP API
- `@nexusai/shared` — shared utilities
- `dotenv` — environment variable loading

> Uses Node.js built-in `fetch` — no additional HTTP client library needed.
## Environment Variables
@@ -25,10 +27,80 @@ embedding workload off the main inference node.
| Variable | Required | Default | Description |
|---|---|---|---|
| OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL |
| EMBEDDING_MODEL | No | nomic-embed-text | Ollama embedding model to use |
## Model

**nomic-embed-text** via Ollama produces **768-dimension** vectors using **Cosine similarity**.
This must match the `QDRANT.VECTOR_SIZE` constant in `@nexusai/shared`.

If the embedding model is changed, update `QDRANT.VECTOR_SIZE` in `constants.js`
and reinitialize the Qdrant collections with the new vector dimension; both steps
are needed to keep the service and storage consistent.
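For reference, a hypothetical shape of that shared constant (the real `constants.js` in `@nexusai/shared` may differ):

```javascript
// Hypothetical excerpt of @nexusai/shared constants.js — names assumed, not verified.
const QDRANT = {
  VECTOR_SIZE: 768,    // must match the embedding model's output dimension
  DISTANCE: 'Cosine',  // similarity metric used when creating collections
};
```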
## Ollama API
Uses the `/api/embed` endpoint (Ollama v0.4+). Request shape:
```json
{ "model": "nomic-embed-text", "input": "text to embed" }
```
The response contains an `embeddings` array; `embeddings[0]` holds the vector — an array of 768 floats.
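The request/response handling above can be sketched with Node's built-in `fetch`. This is a minimal sketch, not the service's actual code; the helper names (`buildEmbedRequest`, `extractEmbedding`, `embedText`) are illustrative:

```javascript
const OLLAMA_URL = process.env.OLLAMA_URL || 'http://localhost:11434';
const EMBEDDING_MODEL = process.env.EMBEDDING_MODEL || 'nomic-embed-text';

// Build the /api/embed request body for a single input string.
function buildEmbedRequest(text) {
  return { model: EMBEDDING_MODEL, input: text };
}

// Pull the single vector out of an /api/embed response.
// The endpoint returns { embeddings: [[...floats...]] }.
function extractEmbedding(json) {
  if (!Array.isArray(json.embeddings) || json.embeddings.length === 0) {
    throw new Error('Unexpected /api/embed response shape');
  }
  return json.embeddings[0];
}

// Embed one string via Ollama's /api/embed endpoint.
async function embedText(text) {
  const res = await fetch(`${OLLAMA_URL}/api/embed`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildEmbedRequest(text)),
  });
  if (!res.ok) throw new Error(`Ollama embed failed: ${res.status}`);
  return extractEmbedding(await res.json());
}
```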
## Endpoints
### Health
| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |

### Embed
| Method | Path | Description |
|---|---|---|
| POST | /embed | Embed a single text string |
| POST | /embed/batch | Embed an array of text strings |
---
**POST /embed**
Embeds a single text string and returns the vector.
Request body:
```json
{
"text": "Hello from NexusAI"
}
```
Response:
```json
{
"embedding": [0.123, -0.456, ...],
"model": "nomic-embed-text",
"dimensions": 768
}
```
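A minimal sketch of how the handler behind this endpoint might validate input and shape the response. The factory and the injected `embedText` helper are assumptions for illustration, not the service's actual code:

```javascript
// Hypothetical Express handler for POST /embed — a sketch, not verified against the service.
// embedText is an injected async function (text) => number[] that calls Ollama.
function makeEmbedHandler(embedText, model = 'nomic-embed-text') {
  return async (req, res) => {
    const { text } = req.body || {};
    if (typeof text !== 'string' || text.length === 0) {
      return res.status(400).json({ error: '"text" must be a non-empty string' });
    }
    const embedding = await embedText(text);
    // Mirror the documented response shape: vector, model, and dimension count.
    res.json({ embedding, model, dimensions: embedding.length });
  };
}
```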
---
**POST /embed/batch**
Embeds an array of strings sequentially and returns all vectors in the same order.
Ollama does not natively parallelize embeddings, so requests are processed one at a time.
Request body:
```json
{
"texts": ["first sentence", "second sentence"]
}
```
Response:
```json
{
"embeddings": [[0.123, ...], [0.456, ...]],
"model": "nomic-embed-text",
"dimensions": 768,
"count": 2
}
```
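The sequential processing described above can be sketched as follows (again assuming an injected `embedText` helper; illustrative only):

```javascript
// Embed texts one at a time, preserving input order.
// Ollama processes embeddings serially, so a simple awaited loop suffices;
// firing all requests with Promise.all would not speed things up.
async function embedBatch(texts, embedText) {
  const embeddings = [];
  for (const text of texts) {
    embeddings.push(await embedText(text)); // one request at a time
  }
  return { embeddings, count: embeddings.length };
}
```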