Updated embedding service documentation
## Purpose

Converts text into vector embeddings via Ollama for storage in Qdrant.
Keeps embedding workload co-located with the memory service on Mini PC 1,
minimizing network hops on the memory write path.
## Dependencies

- `express` — HTTP API
- `@nexusai/shared` — shared utilities
- `dotenv` — environment variable loading

> Uses Node.js built-in `fetch` — no additional HTTP client library needed.
## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL |
| EMBEDDING_MODEL | No | nomic-embed-text | Ollama embedding model to use |
## Model

**nomic-embed-text** via Ollama produces **768-dimension** vectors using **Cosine similarity**.
This must match the `QDRANT.VECTOR_SIZE` constant in `@nexusai/shared`.

If the embedding model is changed, the Qdrant collections must be reinitialized
with the new vector dimension — updating `QDRANT.VECTOR_SIZE` in `constants.js` is
the single change required to keep everything consistent.
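The dimension agreement can be enforced at startup rather than discovered as a Qdrant write error later. A minimal sketch of such a guard, assuming `QDRANT.VECTOR_SIZE` is imported from `@nexusai/shared`; the function name is illustrative, not the service's actual code:

```javascript
// Hypothetical startup guard: compare the length of a probe embedding
// against the shared constant before the service starts taking traffic.
function assertDimensions(vectorLength, expectedSize) {
  if (vectorLength !== expectedSize) {
    throw new Error(
      `Embedding dimension ${vectorLength} does not match QDRANT.VECTOR_SIZE ${expectedSize}; ` +
        "reinitialize the Qdrant collections after changing the model."
    );
  }
  return true;
}
```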
## Ollama API

Uses the `/api/embed` endpoint (Ollama v0.4+). Request shape:

```json
{ "model": "nomic-embed-text", "input": "text to embed" }
```

Response key is `embeddings[0]` — an array of 768 floats.
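The request shape above maps directly onto Node's built-in `fetch`. A minimal sketch, with defaults mirroring the environment-variable table; the helper names are illustrative, not the service's actual exports:

```javascript
// Defaults mirror the environment-variable table above.
const OLLAMA_URL = process.env.OLLAMA_URL ?? "http://localhost:11434";
const EMBEDDING_MODEL = process.env.EMBEDDING_MODEL ?? "nomic-embed-text";

// Build the /api/embed payload for a single text.
function embedRequestBody(text) {
  return { model: EMBEDDING_MODEL, input: text };
}

// Call Ollama and return the vector (embeddings[0] per the response shape above).
async function embed(text) {
  const res = await fetch(`${OLLAMA_URL}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(embedRequestBody(text)),
  });
  if (!res.ok) throw new Error(`Ollama embed failed: ${res.status}`);
  const data = await res.json();
  return data.embeddings[0];
}
```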
## Endpoints

### Health

| Method | Path | Description |
|---|---|---|
| GET | /health | Service health check |

### Embed

| Method | Path | Description |
|---|---|---|
| POST | /embed | Embed a single text string |
| POST | /embed/batch | Embed an array of text strings |

---
**POST /embed**

Embeds a single text string and returns the vector.

Request body:

```json
{
  "text": "Hello from NexusAI"
}
```

Response:

```json
{
  "embedding": [0.123, -0.456, ...],
  "model": "nomic-embed-text",
  "dimensions": 768
}
```
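The handler logic behind `POST /embed` can be sketched as a plain function, leaving the Express wiring aside; `embedOne` is a hypothetical stand-in for the Ollama call, and the validation rule shown is an assumption:

```javascript
// Validate the request body and shape the documented response.
// embedOne(text) stands in for the Ollama embed call described above.
async function handleEmbed(body, embedOne) {
  if (!body || typeof body.text !== "string" || body.text.length === 0) {
    return { status: 400, json: { error: "text (string) is required" } };
  }
  const embedding = await embedOne(body.text);
  return {
    status: 200,
    json: { embedding, model: "nomic-embed-text", dimensions: embedding.length },
  };
}
```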
|
---
|
||||||
|
|
||||||
|
**POST /embed/batch**
|
||||||
|
|
||||||
|
Embeds an array of strings sequentially and returns all vectors in the same order.
|
||||||
|
Ollama does not natively parallelize embeddings, so requests are processed one at a time.
|
||||||
|
|
||||||
|
Request body:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"texts": ["first sentence", "second sentence"]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Response:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"embeddings": [[0.123, ...], [0.456, ...]],
|
||||||
|
"model": "nomic-embed-text",
|
||||||
|
"dimensions": 768,
|
||||||
|
"count": 2
|
||||||
|
}
|
||||||
|
```
|
||||||
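The sequential batch behavior described above can be sketched as follows; `embedOne` is a hypothetical stand-in for the single-text embed call, not the service's actual API:

```javascript
// Embed each text one at a time, preserving input order in the output.
async function embedBatch(texts, embedOne) {
  const embeddings = [];
  for (const text of texts) {
    // Sequential on purpose: Ollama processes embeddings one at a time.
    embeddings.push(await embedOne(text));
  }
  return {
    embeddings,
    model: "nomic-embed-text",
    dimensions: embeddings[0]?.length ?? 0,
    count: embeddings.length,
  };
}
```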