update documentation
@@ -27,80 +27,43 @@ minimizing network hops on the memory write path.
 | OLLAMA_URL | No | http://localhost:11434 | Ollama instance URL |
 | EMBEDDING_MODEL | No | nomic-embed-text | Ollama embedding model to use |

 > Ollama must be running with `OLLAMA_HOST=0.0.0.0` to accept LAN connections
 > from other services.

 ## Model

-**nomic-embed-text** via Ollama produces **768-dimension** vectors using **Cosine similarity**.
-This must match the `QDRANT.VECTOR_SIZE` constant in `@nexusai/shared`.
+**nomic-embed-text** via Ollama produces **768-dimension** vectors with
+**Cosine similarity**. This must match `QDRANT.VECTOR_SIZE` in `@nexusai/shared`.

 If the embedding model is changed, the Qdrant collections must be reinitialized
-with the new vector dimension — updating `QDRANT.VECTOR_SIZE` in `constants.js` is
-the single change required to keep everything consistent.
+with the new vector dimension. Updating `QDRANT.VECTOR_SIZE` in `constants.js`
+is the single change required to keep everything consistent.

 ## Ollama API

-Uses the `/api/embed` endpoint (Ollama v0.4+). Request shape:
+Uses the `/api/embed` endpoint (Ollama v0.4+):

 ```json
+// Request
 { "model": "nomic-embed-text", "input": "text to embed" }
-```
-Response key is `embeddings[0]` — an array of 768 floats.

-## Endpoints
-
-### Health
-
-| Method | Path | Description |
-|---|---|---|
-| GET | /health | Service health check |
-
-### Embed
-
-| Method | Path | Description |
-|---|---|---|
-| POST | /embed | Embed a single text string |
-| POST | /embed/batch | Embed an array of text strings |
-
----
-
-**POST /embed**
-
-Embeds a single text string and returns the vector.
-
-Request body:
-```json
-{
-"text": "Hello from NexusAI"
-}
+// Response key
+embeddings[0] // array of 768 floats
 ```

-Response:
-```json
-{
-"embedding": [0.123, -0.456, ...],
-"model": "nomic-embed-text",
-"dimensions": 768
-}
-```
+> Earlier Ollama versions used `/api/embeddings` with a `prompt` key and
+> returned `embedding` (singular). Use `/api/embed`, `input`, and
+> `embeddings[0]` for Ollama v0.4+.

----
+## Usage in NexusAI

-**POST /embed/batch**
+The embedding service is called in two places:

-Embeds an array of strings sequentially and returns all vectors in the same order.
-Ollama does not natively parallelize embeddings, so requests are processed one at a time.
+1. **Memory service** — after each episode is saved to SQLite, the combined
+`User: ..\nAssistant: ..` text is embedded and upserted into Qdrant.
+This is fire-and-forget — failures are logged but don't affect the response.

-Request body:
-```json
-{
-"texts": ["first sentence", "second sentence"]
-}
-```
+2. **Orchestration service** — the user's message is embedded at the start of
+the chat pipeline to perform semantic search against past episodes.

-Response:
-```json
-{
-"embeddings": [[0.123, ...], [0.456, ...]],
-"model": "nomic-embed-text",
-"dimensions": 768,
-"count": 2
-}
-```
+For all HTTP endpoints, see `api-routes.md`.
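
Note on the Ollama API and Model sections in this diff: the request shape (`model`, `input`), the response key (`embeddings[0]`), and the 768-dimension rule can be combined into one small client-side check. The sketch below is illustrative only and is not code from this repository. `VECTOR_SIZE` stands in for `QDRANT.VECTOR_SIZE`, and `buildEmbedRequest`, `extractEmbedding`, and `embedText` are hypothetical names. It assumes a runtime with a global `fetch` (for example Node 18+).

```javascript
// Hypothetical stand-in for QDRANT.VECTOR_SIZE from @nexusai/shared.
const VECTOR_SIZE = 768;

// Build the JSON body for Ollama's /api/embed endpoint (v0.4+ request shape).
function buildEmbedRequest(text, model = "nomic-embed-text") {
  return { model, input: text };
}

// Read embeddings[0] from an /api/embed response and verify its length,
// so a model change that alters the dimension fails loudly.
function extractEmbedding(responseBody, expectedSize = VECTOR_SIZE) {
  const vector = responseBody?.embeddings?.[0];
  if (!Array.isArray(vector)) {
    throw new Error("Response has no embeddings[0] array");
  }
  if (vector.length !== expectedSize) {
    throw new Error(`Expected ${expectedSize} dimensions, got ${vector.length}`);
  }
  return vector;
}

// Hypothetical helper: POST the text to Ollama and return the validated vector.
// fetchImpl is injectable so the function can be exercised without a network.
async function embedText(
  text,
  { baseUrl = "http://localhost:11434", fetchImpl = fetch } = {}
) {
  const res = await fetchImpl(`${baseUrl}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildEmbedRequest(text)),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  return extractEmbedding(await res.json());
}
```

Checking the vector length at the call site is one way to surface a mismatch between the configured model and the Qdrant collection early; whether the service already performs such a check is not visible in this diff.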
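
Note on the new "Usage in NexusAI" section: the memory-service step is described as fire-and-forget, with failures logged but not affecting the response. A minimal sketch of that pattern follows, under stated assumptions. `fireAndForget` and `saveEpisodeAndIndex` are hypothetical names, and `saveEpisode`, `embed`, and `upsert` are placeholders for the real SQLite, embedding, and Qdrant calls, which this diff does not show.

```javascript
// Hypothetical helper: start an async side task without awaiting it.
// Both synchronous throws and rejections are caught and logged,
// so they never propagate to the caller.
function fireAndForget(task, log = console.error) {
  Promise.resolve()
    .then(task)
    .catch((err) => log("embedding side task failed:", err.message));
}

// Hypothetical save path: persist the episode first, then embed and
// upsert in the background. The dependencies are passed in as placeholders.
async function saveEpisodeAndIndex(episode, { saveEpisode, embed, upsert, log }) {
  const id = await saveEpisode(episode); // must succeed before responding
  const text = `User: ${episode.user}\nAssistant: ${episode.assistant}`;
  fireAndForget(async () => {
    const vector = await embed(text);
    await upsert(id, vector);
  }, log);
  return id; // returned without waiting for the embedding to finish
}
```

With this shape, an Ollama outage delays nothing on the write path; the trade-off is that an episode can exist in SQLite without a matching vector until it is re-embedded.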