# Embedding Service

**Package:** `@nexusai/embedding-service`
**Location:** `packages/embedding-service`
**Deployed on:** Mini PC 1 (192.168.0.81)
**Port:** 3003

## Purpose

Converts text into vector embeddings via Ollama for storage in Qdrant.
Keeps embedding workload co-located with the memory service on Mini PC 1,
minimizing network hops on the memory write path.

## Dependencies

- `express`: HTTP API
- `@nexusai/shared`: shared utilities
- `dotenv`: environment variable loading

> Uses Node.js built-in `fetch`; no additional HTTP client library needed.

## Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `PORT` | No | `3003` | Port to listen on |
| `OLLAMA_URL` | No | `http://localhost:11434` | Ollama instance URL |
| `EMBEDDING_MODEL` | No | `nomic-embed-text` | Ollama embedding model to use |

> Ollama must be running with `OLLAMA_HOST=0.0.0.0` to accept LAN connections
> from other services.

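A minimal sketch of how these variables and their documented defaults might be resolved at startup. The `loadConfig` helper is hypothetical and is not taken from the service's source; only the variable names and defaults come from the table.

```javascript
// Hypothetical helper: reads the documented variables and falls back to
// the documented defaults when a variable is unset or empty.
function loadConfig(env = process.env) {
  return {
    port: Number(env.PORT || 3003),
    ollamaUrl: env.OLLAMA_URL || "http://localhost:11434",
    embeddingModel: env.EMBEDDING_MODEL || "nomic-embed-text",
  };
}

const config = loadConfig();
console.log(config);
```

Passing the environment as an argument keeps the helper easy to exercise without touching `process.env`.
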
## Model

**nomic-embed-text** via Ollama produces **768-dimension** vectors, which are
compared using **cosine similarity**. The dimension must match
`QDRANT.VECTOR_SIZE` in `@nexusai/shared`.

If the embedding model is changed, the Qdrant collections must be reinitialized
with the new vector dimension. On the code side, updating `QDRANT.VECTOR_SIZE`
in `constants.js` is the single change required to keep everything consistent.

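To illustrate why the dimension has to match, here is a small guard that rejects vectors of the wrong length before they reach Qdrant. This is a sketch only: `VECTOR_SIZE` stands in for `QDRANT.VECTOR_SIZE` from `@nexusai/shared`, and `assertVectorSize` is a hypothetical name, not part of the service.

```javascript
// Assumption: stands in for QDRANT.VECTOR_SIZE from @nexusai/shared.
const VECTOR_SIZE = 768;

// Hypothetical guard: throws if the vector's length differs from the
// dimension the Qdrant collections were created with.
function assertVectorSize(vector, expected = VECTOR_SIZE) {
  if (!Array.isArray(vector) || vector.length !== expected) {
    const actual = Array.isArray(vector) ? vector.length : typeof vector;
    throw new Error(`Expected a ${expected}-dimension vector, got ${actual}`);
  }
  return vector;
}

assertVectorSize(new Array(768).fill(0)); // passes without throwing
```

A check like this would surface a model change that was not accompanied by a collection reinitialization as a clear error rather than a failed upsert.
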
## Ollama API

Uses the `/api/embed` endpoint (Ollama v0.4+):

```jsonc
// Request
{ "model": "nomic-embed-text", "input": "text to embed" }

// Response: the vector is in embeddings[0] (an array of 768 floats)
{ "embeddings": [[ /* 768 floats */ ]] }
```

> Earlier Ollama versions used `/api/embeddings` with a `prompt` key and
> returned `embedding` (singular). Use `/api/embed`, `input`, and
> `embeddings[0]` for Ollama v0.4+.

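The request and response shapes above could be wired together roughly as follows, using the built-in `fetch`. The function names (`buildEmbedRequest`, `extractEmbedding`, `embedText`) are hypothetical and chosen for illustration; only the endpoint, the `model` and `input` keys, and `embeddings[0]` come from this document.

```javascript
// Builds the JSON body for POST /api/embed.
function buildEmbedRequest(text, model = "nomic-embed-text") {
  return { model, input: text };
}

// Pulls the first vector out of an /api/embed response body.
function extractEmbedding(body) {
  const vector = body?.embeddings?.[0];
  if (!Array.isArray(vector)) {
    throw new Error("Ollama response did not contain embeddings[0]");
  }
  return vector;
}

// Hypothetical wrapper; baseUrl defaults to the documented OLLAMA_URL default.
async function embedText(text, baseUrl = "http://localhost:11434") {
  const res = await fetch(`${baseUrl}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildEmbedRequest(text)),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  return extractEmbedding(await res.json());
}
```

Calling `await embedText("text to embed")` would resolve to the embedding vector, assuming Ollama is reachable at `baseUrl` and the model has been pulled.
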
## Usage in NexusAI

The embedding service is called in two places:

1. **Memory service**: after each episode is saved to SQLite, the combined
   `User: ..\nAssistant: ..` text is embedded and upserted into Qdrant.
   This is fire-and-forget: failures are logged but don't affect the response.

2. **Orchestration service**: the user's message is embedded at the start of
   the chat pipeline to perform semantic search against past episodes.

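The fire-and-forget behaviour described for the memory service can be sketched as below. Everything here is an assumption made for illustration: the episode field names (`id`, `user`, `assistant`), the function names, and the injected `embed` and `upsert` callbacks, which stand in for the real embedding-service and Qdrant calls.

```javascript
// Hypothetical field names; the combined format follows the description above.
function episodeText(episode) {
  return `User: ${episode.user}\nAssistant: ${episode.assistant}`;
}

// Starts embedding and upsert without awaiting the result. Failures are
// logged and never propagate to the caller.
function indexEpisodeInBackground(episode, { embed, upsert, log = console.error }) {
  Promise.resolve()
    .then(() => embed(episodeText(episode)))
    .then((vector) => upsert(episode.id, vector))
    .catch((err) => log(`Embedding failed for episode ${episode.id}: ${err.message}`));
}
```

Because the promise chain is not awaited and always ends in `.catch`, the caller can return its response immediately after the SQLite write, whether or not the embedding step succeeds.
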
For all HTTP endpoints, see `api-routes.md`.