# Shared Package
**Package:** `@nexusai/shared`
**Location:** `packages/shared`
## Purpose
Common utilities and configuration used across all NexusAI services.
Keeping these here avoids duplication and ensures consistent behaviour.
## Exports
### `getEnv(key, defaultValue?)`
Loads an environment variable by key. If no default is provided and the
variable is missing, throws at startup rather than failing silently later.
```js
const { getEnv } = require('@nexusai/shared');
const PORT = getEnv('PORT', '3002'); // optional — falls back to 3002
const DB = getEnv('SQLITE_PATH'); // required — throws if missing
```
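
The fail-fast behaviour described above can be approximated as follows. This is a minimal sketch, not the package's actual source: the name `getEnvSketch`, the error message, and the choice to treat an empty string as missing are all assumptions made for illustration.

```javascript
// Hypothetical sketch of a fail-fast env lookup; the real getEnv may differ.
function getEnvSketch(key, defaultValue) {
  const value = process.env[key];

  // Assumption: an empty string counts as "not set".
  if (value !== undefined && value !== '') return value;

  // A default was supplied, so the variable is optional.
  if (defaultValue !== undefined) return defaultValue;

  // No value and no default: fail immediately instead of later at first use.
  throw new Error(`Missing required environment variable: ${key}`);
}
```

Calling lookups like this at module load time means a misconfigured service stops during startup instead of failing on its first request.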
---
### `parseRow(row)`
Parses a SQLite row object, deserialising any JSON-encoded `metadata` fields
into plain objects. Returns `null` if the row is `null` or `undefined`.
```js
const { parseRow } = require('@nexusai/shared');
const session = parseRow(db.prepare('SELECT * FROM sessions WHERE id = ?').get(id));
```
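
To make the `null` handling and the `metadata` deserialisation concrete, here is a rough sketch of what such a helper could look like. The name `parseRowSketch` is hypothetical, and leaving malformed JSON untouched (rather than throwing) is an assumption, not documented behaviour.

```javascript
// Hypothetical sketch of row parsing; the real parseRow may handle more fields.
function parseRowSketch(row) {
  // Mirrors the documented contract: missing rows come back as null.
  if (row === null || row === undefined) return null;

  // Copy the row so the object returned by the database driver is not mutated.
  const parsed = { ...row };

  if (typeof parsed.metadata === 'string') {
    try {
      parsed.metadata = JSON.parse(parsed.metadata);
    } catch {
      // Assumption: keep the raw string if it is not valid JSON.
    }
  }
  return parsed;
}
```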
---
### `formatEpisodeText(userMessage, aiResponse)`
Combines a user message and AI response into the canonical text format used
for embedding:
```
User: {userMessage}
Assistant: {aiResponse}
```
Used by the memory service's embedding write path to ensure consistent
vector representations across all episodes.
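
A sketch of the format above, assuming the two lines are joined by a single newline with no trailing whitespace (the exact separator is not stated here, and `formatEpisodeTextSketch` is a hypothetical name):

```javascript
// Hypothetical sketch; assumes a single "\n" between the two lines.
function formatEpisodeTextSketch(userMessage, aiResponse) {
  return `User: ${userMessage}\nAssistant: ${aiResponse}`;
}

const text = formatEpisodeTextSketch('What is Qdrant?', 'A vector database.');
```

Routing every write through one formatter matters because a different prefix or separator would change the text being embedded, and with it the resulting vector.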
---
### Constants
Tuneable values and shared identifiers are centralised in `constants.js`
rather than hardcoded across services. Import the relevant group by name.
```js
const { QDRANT, COLLECTIONS, EPISODIC, LLAMACPP } = require('@nexusai/shared');
```
#### `QDRANT`
Vector store configuration. Values here must stay in sync with the
embedding model and Qdrant collection setup.
| Key | Value | Description |
|---|---|---|
| `DEFAULT_URL` | `http://localhost:6333` | Fallback Qdrant URL |
| `VECTOR_SIZE` | `768` | Output dimensions of `nomic-embed-text` |
| `DISTANCE_METRIC` | `'Cosine'` | Similarity metric used for all collections |
| `DEFAULT_LIMIT` | `10` | Default top-k for vector searches |
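
As an illustration of why these values must stay in sync, the sketch below builds the request for Qdrant's create-collection REST endpoint (`PUT /collections/{name}`) from the constants. The `QDRANT` object is a local stand-in that copies the table above so the snippet runs on its own, and `buildCreateCollectionRequest` is a hypothetical helper, not part of the package.

```javascript
// Local stand-in mirroring the table above; a service would import this instead.
const QDRANT = {
  DEFAULT_URL: 'http://localhost:6333',
  VECTOR_SIZE: 768,
  DISTANCE_METRIC: 'Cosine',
  DEFAULT_LIMIT: 10,
};

// Hypothetical helper: describes the HTTP request without sending it.
function buildCreateCollectionRequest(name, baseUrl = QDRANT.DEFAULT_URL) {
  return {
    method: 'PUT',
    url: `${baseUrl}/collections/${encodeURIComponent(name)}`,
    // The vector size must match the embedding model's output dimensions.
    body: {
      vectors: { size: QDRANT.VECTOR_SIZE, distance: QDRANT.DISTANCE_METRIC },
    },
  };
}

const request = buildCreateCollectionRequest('episodes');
```

If the embedding model changes, `VECTOR_SIZE` and any existing collections would need to change with it, since Qdrant rejects vectors whose length differs from the collection's configured size.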
#### `COLLECTIONS`
Canonical Qdrant collection names.
| Key | Value |
|---|---|
| `EPISODES` | `'episodes'` |
| `ENTITIES` | `'entities'` |
| `SUMMARIES` | `'summaries'` |
#### `EPISODIC`
Default pagination and result limits for SQLite episode queries.
| Key | Value | Description |
|---|---|---|
| `DEFAULT_RECENT_LIMIT` | `10` | Default number of recent episodes to retrieve |
| `DEFAULT_PAGE_SIZE` | `20` | Default episodes per page for paginated queries |
| `DEFAULT_SEARCH_LIMIT` | `10` | Default number of FTS search results to return |
| `DEFAULT_OFFSET` | `0` | Default pagination offset |
| `DEFAULT_SESSIONS_LIMIT` | `20` | Default number of sessions to return |
#### `SERVICES`
Default URLs for inter-service communication. Used as fallback values
when the corresponding environment variable is not set.
| Key | Value | Description |
|---|---|---|
| `EMBEDDING_URL` | `http://localhost:3003` | Fallback embedding service URL |
| `MEMORY_URL` | `http://localhost:3002` | Fallback memory service URL |
| `INFERENCE_URL` | `http://localhost:3001` | Fallback inference service URL |
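
One way the fallback might be applied is sketched below. The environment variable name `MEMORY_SERVICE_URL` and the helper `resolveServiceUrl` are assumptions for illustration; check each service's `.env` for the names it actually reads. The `SERVICES` object is a local copy of the table above.

```javascript
// Local stand-in mirroring the table above.
const SERVICES = {
  EMBEDDING_URL: 'http://localhost:3003',
  MEMORY_URL: 'http://localhost:3002',
  INFERENCE_URL: 'http://localhost:3001',
};

// Hypothetical helper: prefer the environment, fall back to the shared constant.
function resolveServiceUrl(envKey, fallback, env = process.env) {
  const fromEnv = (env[envKey] || '').trim();
  const chosen = fromEnv || fallback;
  // Drop trailing slashes so `${url}/path` never produces a double slash.
  return chosen.replace(/\/+$/, '');
}

// 'MEMORY_SERVICE_URL' is an assumed variable name, not taken from the source.
const memoryUrl = resolveServiceUrl('MEMORY_SERVICE_URL', SERVICES.MEMORY_URL, {});
```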
#### `PORTS`
Default port numbers for each service.
| Key | Value |
|---|---|
| `INFERENCE` | `'3001'` |
| `MEMORY` | `'3002'` |
| `EMBEDDING` | `'3003'` |
| `ORCHESTRATION` | `'4000'` |
#### `OLLAMA`
Ollama runtime defaults — used by the Ollama inference provider.
| Key | Value | Description |
|---|---|---|
| `DEFAULT_URL` | `http://localhost:11434` | Fallback Ollama URL |
| `EMBED_MODEL` | `'nomic-embed-text'` | Default embedding model |
| `OLLAMA_MODEL` | `'companion:latest'` | Default chat model |
#### `LLAMACPP`
llama.cpp runtime defaults — used by the llama.cpp inference provider.
| Key | Value | Description |
|---|---|---|
| `DEFAULT_URL` | `http://localhost:8080` | Fallback llama-server URL |
| `DEFAULT_MODEL` | `'local-model'` | Fallback model name (override via `DEFAULT_MODEL` env var) |
> Always set `DEFAULT_MODEL` in the inference service `.env` to the exact model
> name reported by `llama-server` (including `.gguf` extension). The shared
> constant is a last-resort fallback only.
#### `INFERENCE_DEFAULTS`
Default inference parameters applied when not specified in a request.
These are used as fallbacks in `resolveOptions()` in both providers.
Orchestration reads live values from `settings.json` and forwards them
on every request — these constants are the fallback layer only.
| Key | Value | Description |
|---|---|---|
| `TEMPERATURE` | `0.7` | Controls randomness (0 = deterministic, 1 = creative) |
| `MAX_TOKENS` | `1024` | Maximum tokens to generate |
| `TOP_P` | `0.9` | Nucleus sampling probability mass |
| `TOP_K` | `40` | Top-K candidates at each step |
| `REPEAT_PENALTY` | `1.1` | Penalty for recently used tokens |
| `SEED` | `null` | null = random; set integer for reproducible outputs |
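
The fallback layering could look roughly like this. It is a sketch only: the real `resolveOptions()` in each provider may use different field names, and `temperature`, `maxTokens`, and `seed` on the request object are assumptions. The `INFERENCE_DEFAULTS` object is a local copy of the table above.

```javascript
// Local stand-in mirroring the table above.
const INFERENCE_DEFAULTS = {
  TEMPERATURE: 0.7,
  MAX_TOKENS: 1024,
  TOP_P: 0.9,
  TOP_K: 40,
  REPEAT_PENALTY: 1.1,
  SEED: null,
};

// Hypothetical sketch: values from the request win, constants fill the gaps.
function resolveOptionsSketch(request = {}) {
  return {
    temperature: request.temperature ?? INFERENCE_DEFAULTS.TEMPERATURE,
    maxTokens: request.maxTokens ?? INFERENCE_DEFAULTS.MAX_TOKENS,
    topP: request.topP ?? INFERENCE_DEFAULTS.TOP_P,
    topK: request.topK ?? INFERENCE_DEFAULTS.TOP_K,
    repeatPenalty: request.repeatPenalty ?? INFERENCE_DEFAULTS.REPEAT_PENALTY,
    seed: request.seed ?? INFERENCE_DEFAULTS.SEED,
  };
}

const options = resolveOptionsSketch({ temperature: 0, topK: 20 });
```

The sketch uses `??` rather than `||` so that an explicit `temperature: 0` is kept instead of being replaced by the default.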
#### `ORCHESTRATION`
Orchestration pipeline defaults. Used as fallback values in
`config/settings.js` when `settings.json` doesn't contain a key.
| Key | Value | Description |
|---|---|---|
| `RECENT_EPISODE_LIMIT` | `5` | Recent episodes to inject into prompt |
| `SEMANTIC_LIMIT` | `5` | Semantic search results to inject into prompt |
| `SCORE_THRESHOLD` | `0.75` | Minimum similarity score for semantic results |
| `ENTITIES_LIMIT` | `5` | Max entity search results to inject into prompt |
| `ENTITIES_THRESHOLD` | `0.55` | Minimum similarity score for entity results |
| `TEMPERATURE` | `0.7` | Default inference temperature |
| `CORS_ORIGIN` | `'http://localhost:5173'` | Fallback allowed CORS origin |
| `SYSTEM_PROMPT` | *(see below)* | Default system prompt |
> `ENTITIES_THRESHOLD` is set to `0.55`, lower than `SCORE_THRESHOLD`, because
> entity notes generated by a 3B model tend to embed with lower cosine similarity
> than full episode text. Tune upward if irrelevant entities appear in context.
>
> `repeatPenalty`, `topP`, and `topK` defaults are sourced from
> `INFERENCE_DEFAULTS` in `config/settings.js` rather than `ORCHESTRATION`,
> since those constants already define the canonical values.
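
To show how the two thresholds affect what reaches the prompt, the sketch below filters the same made-up hits with each setting. `selectHits` and the sample scores are illustrative assumptions; the orchestration service may apply the threshold elsewhere, for example inside the vector search request itself.

```javascript
// Local stand-in with the relevant values from the table above.
const ORCHESTRATION = {
  SEMANTIC_LIMIT: 5,
  SCORE_THRESHOLD: 0.75,
  ENTITIES_LIMIT: 5,
  ENTITIES_THRESHOLD: 0.55,
};

// Hypothetical helper: keep hits at or above the threshold, best first, capped at `limit`.
function selectHits(hits, threshold, limit) {
  return hits
    .filter((hit) => hit.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Made-up scores for illustration only.
const sampleHits = [
  { id: 'a', score: 0.81 },
  { id: 'b', score: 0.62 },
  { id: 'c', score: 0.5 },
];

// With the episode threshold only 'a' qualifies.
const episodeHits = selectHits(sampleHits, ORCHESTRATION.SCORE_THRESHOLD, ORCHESTRATION.SEMANTIC_LIMIT);
// With the lower entity threshold 'a' and 'b' qualify.
const entityHits = selectHits(sampleHits, ORCHESTRATION.ENTITIES_THRESHOLD, ORCHESTRATION.ENTITIES_LIMIT);
```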
Default system prompt:
> "You are a helpful, context-aware AI assistant. You have access to memories
> of past conversations with the user. Use them to provide consistent,
> personalised responses."
#### `SUMMARIES`
Controls the automatic session summarisation system in `orchestration-service/src/services/summarization.js`.
| Key | Value | Description |
|---|---|---|
| `THRESHOLD_TOKENS` | `200` | Minimum total session tokens before summarisation is considered |
| `MAX_SUMMARY_TOKENS` | `800` | If the existing summary exceeds this length (measured in characters, despite the key name), create a new row instead of updating |
| `MIN_EPISODES_SINCE` | `5` | Minimum new episodes since last summary before re-summarising |
These can be overridden per-deployment via environment variables in the
orchestration service `.env`:
```
SUMMARY_THRESHOLD_TOKENS=200
SUMMARY_MAX_TOKENS=800
SUMMARY_MIN_EPISODES=5
```
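
The sketch below shows one way the overrides and the three thresholds could fit together. It is an interpretation of the table descriptions, not the logic in `summarization.js`: the helpers `intFromEnv`, `loadSummaryConfig`, and `planSummary`, and the shape of the session object, are all hypothetical.

```javascript
// Local stand-in mirroring the table above.
const SUMMARIES = {
  THRESHOLD_TOKENS: 200,
  MAX_SUMMARY_TOKENS: 800,
  MIN_EPISODES_SINCE: 5,
};

// Reads a non-negative integer from env, falling back to the shared constant.
function intFromEnv(env, key, fallback) {
  const parsed = Number.parseInt(env[key] ?? '', 10);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}

function loadSummaryConfig(env = process.env) {
  return {
    thresholdTokens: intFromEnv(env, 'SUMMARY_THRESHOLD_TOKENS', SUMMARIES.THRESHOLD_TOKENS),
    maxSummaryTokens: intFromEnv(env, 'SUMMARY_MAX_TOKENS', SUMMARIES.MAX_SUMMARY_TOKENS),
    minEpisodesSince: intFromEnv(env, 'SUMMARY_MIN_EPISODES', SUMMARIES.MIN_EPISODES_SINCE),
  };
}

// Hypothetical decision helper: returns 'skip', 'update', or 'create'.
function planSummary(config, { sessionTokens, episodesSinceLast, existingSummary }) {
  if (sessionTokens < config.thresholdTokens) return 'skip';
  if (episodesSinceLast < config.minEpisodesSince) return 'skip';
  // Length is compared in characters, following the table description.
  if (existingSummary && existingSummary.length <= config.maxSummaryTokens) return 'update';
  return 'create';
}

const summaryConfig = loadSummaryConfig({ SUMMARY_MAX_TOKENS: '1000' });
```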
#### `SQLITE`
| Key | Value | Description |
|---|---|---|
| `DEFAULT_PATH` | `'./data/nexusai.db'` | Fallback SQLite database path |