# Shared Package

**Package:** `@nexusai/shared`
**Location:** `packages/shared`

## Purpose

Common utilities and configuration used across all NexusAI services.
Keeping these here avoids duplication and ensures consistent behaviour.

## Exports

### `getEnv(key, defaultValue?)`

Reads an environment variable by key. If the variable is missing and no
default is provided, it throws immediately, so misconfiguration surfaces at
startup rather than failing silently later.

```js
const { getEnv } = require('@nexusai/shared');

const PORT = getEnv('PORT', '3002');   // optional — falls back to 3002
const DB   = getEnv('SQLITE_PATH');    // required — throws if missing
```
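
For orientation, the behaviour described above could be implemented roughly as follows. This is a minimal sketch, not the package's actual source: the error message and the choice to treat only `undefined` as "missing" are assumptions.

```js
// Hypothetical sketch of getEnv; the real implementation may differ.
function getEnv(key, defaultValue) {
  const value = process.env[key];

  // Variable is set: always prefer it over the default.
  if (value !== undefined) return value;

  // Variable is missing but a default was supplied.
  if (defaultValue !== undefined) return defaultValue;

  // Variable is missing and required: fail loudly.
  throw new Error(`Missing required environment variable: ${key}`);
}
```

Environment variables are always strings, so a numeric default such as the port above is passed as `'3002'` and converted by the caller where a number is needed.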

---

### `parseRow(row)`

Parses a SQLite row object, deserialising any JSON-encoded `metadata` fields
into plain objects. Returns `null` if the row is `null` or `undefined`.

```js
const { parseRow } = require('@nexusai/shared');
const session = parseRow(db.prepare('SELECT * FROM sessions WHERE id = ?').get(id));
```
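
A rough sketch of the parsing step is shown below. It assumes `metadata` is stored as a JSON string in a single column and that malformed JSON falls back to an empty object; both points are assumptions rather than confirmed behaviour of the package.

```js
// Hypothetical sketch of parseRow; the real implementation may differ.
function parseRow(row) {
  // A query that matches nothing yields no row object.
  if (row === null || row === undefined) return null;

  // Copy so the caller's row object is not mutated.
  const parsed = { ...row };

  if (typeof parsed.metadata === 'string') {
    try {
      parsed.metadata = JSON.parse(parsed.metadata);
    } catch {
      // Assumption: malformed JSON degrades to an empty object.
      parsed.metadata = {};
    }
  }
  return parsed;
}

const example = parseRow({ id: 1, metadata: '{"topic":"weather"}' });
console.log(example.metadata.topic); // weather
```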

---

### `formatEpisodeText(userMessage, aiResponse)`

Combines a user message and AI response into the canonical text format used
for embedding:

```
User: {userMessage}
Assistant: {aiResponse}
```

Used by the memory service's embedding write path to ensure consistent
vector representations across all episodes.
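
The format above amounts to a two-line template. A minimal sketch, assuming no trimming or escaping is applied to either argument:

```js
// Hypothetical sketch of formatEpisodeText; the real implementation may differ.
function formatEpisodeText(userMessage, aiResponse) {
  return `User: ${userMessage}\nAssistant: ${aiResponse}`;
}

const text = formatEpisodeText('What is Qdrant?', 'A vector database.');
console.log(text);
```

Because the embedding depends on the exact input string, any change to this template would make new vectors inconsistent with previously stored episodes.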

---

### Constants

Tuneable values and shared identifiers are centralised in `constants.js`
rather than hardcoded across services. Import the relevant group by name.

```js
const { QDRANT, COLLECTIONS, EPISODIC, LLAMACPP } = require('@nexusai/shared');
```

#### `QDRANT`

Vector store configuration. Values here must stay in sync with the
embedding model and Qdrant collection setup.

| Key | Value | Description |
|---|---|---|
| `DEFAULT_URL` | `http://localhost:6333` | Fallback Qdrant URL |
| `VECTOR_SIZE` | `768` | Output dimensions of `nomic-embed-text` |
| `DISTANCE_METRIC` | `'Cosine'` | Similarity metric used for all collections |
| `DEFAULT_LIMIT` | `10` | Default top-k for vector searches |

#### `COLLECTIONS`

Canonical Qdrant collection names.

| Key | Value |
|---|---|
| `EPISODES` | `'episodes'` |
| `ENTITIES` | `'entities'` |
| `SUMMARIES` | `'summaries'` |
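
As an illustration of how `QDRANT` and `COLLECTIONS` fit together, the sketch below builds the request that would create each collection through Qdrant's REST API (`PUT /collections/{name}`). The constants are copied inline so the snippet runs on its own, and `buildCreateCollectionRequest` is a hypothetical helper, not part of the package.

```js
// Inline copies of the documented constants, for a self-contained example.
const QDRANT = {
  DEFAULT_URL: 'http://localhost:6333',
  VECTOR_SIZE: 768,
  DISTANCE_METRIC: 'Cosine',
  DEFAULT_LIMIT: 10,
};
const COLLECTIONS = { EPISODES: 'episodes', ENTITIES: 'entities', SUMMARIES: 'summaries' };

// Hypothetical helper: describes the HTTP request without sending it.
function buildCreateCollectionRequest(name, baseUrl = QDRANT.DEFAULT_URL) {
  return {
    method: 'PUT',
    url: `${baseUrl}/collections/${name}`,
    body: {
      vectors: { size: QDRANT.VECTOR_SIZE, distance: QDRANT.DISTANCE_METRIC },
    },
  };
}

const requests = Object.values(COLLECTIONS).map((name) => buildCreateCollectionRequest(name));
console.log(requests.map((r) => r.url));
```

Every collection shares the same vector size and distance metric, which is why a change of embedding model requires updating `VECTOR_SIZE` and recreating the collections.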

#### `EPISODIC`

Default pagination and result limits for SQLite episode queries.

| Key | Value | Description |
|---|---|---|
| `DEFAULT_RECENT_LIMIT` | `10` | Default number of recent episodes to retrieve |
| `DEFAULT_PAGE_SIZE` | `20` | Default episodes per page for paginated queries |
| `DEFAULT_SEARCH_LIMIT` | `10` | Default number of FTS search results to return |
| `DEFAULT_OFFSET` | `0` | Default pagination offset |
| `DEFAULT_SESSIONS_LIMIT` | `20` | Default number of sessions to return |
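
A route handler might apply these defaults along the following lines. This is a sketch only: `resolvePagination` and the `limit`/`offset` query parameter names are assumptions, and the constants are copied inline so the snippet is self-contained.

```js
// Inline copy of the documented constants.
const EPISODIC = {
  DEFAULT_RECENT_LIMIT: 10,
  DEFAULT_PAGE_SIZE: 20,
  DEFAULT_SEARCH_LIMIT: 10,
  DEFAULT_OFFSET: 0,
  DEFAULT_SESSIONS_LIMIT: 20,
};

// Hypothetical helper: turns raw query-string values into safe integers,
// falling back to the shared defaults when a value is absent or invalid.
function resolvePagination(query = {}) {
  const limit = Number.parseInt(query.limit, 10);
  const offset = Number.parseInt(query.offset, 10);
  return {
    limit: Number.isInteger(limit) && limit > 0 ? limit : EPISODIC.DEFAULT_PAGE_SIZE,
    offset: Number.isInteger(offset) && offset >= 0 ? offset : EPISODIC.DEFAULT_OFFSET,
  };
}

console.log(resolvePagination({}));                            // { limit: 20, offset: 0 }
console.log(resolvePagination({ limit: '5', offset: '10' }));  // { limit: 5, offset: 10 }
```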

#### `SERVICES`

Default URLs for inter-service communication. Used as fallback values
when the corresponding environment variable is not set.

| Key | Value | Description |
|---|---|---|
| `EMBEDDING_URL` | `http://localhost:3003` | Fallback embedding service URL |
| `MEMORY_URL` | `http://localhost:3002` | Fallback memory service URL |
| `INFERENCE_URL` | `http://localhost:3001` | Fallback inference service URL |
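
The fallback pattern can be sketched as below. The environment variable name `MEMORY_SERVICE_URL` and the helper `resolveServiceUrl` are hypothetical; check each service's `.env` for the names actually in use.

```js
// Inline copy of the documented constants.
const SERVICES = {
  EMBEDDING_URL: 'http://localhost:3003',
  MEMORY_URL: 'http://localhost:3002',
  INFERENCE_URL: 'http://localhost:3001',
};

// Hypothetical helper: prefer the environment value, otherwise use the
// shared default. Trailing slashes are stripped so paths join cleanly.
function resolveServiceUrl(envKey, fallback, env = process.env) {
  const value = env[envKey];
  const chosen = value && value.trim() !== '' ? value.trim() : fallback;
  return chosen.replace(/\/+$/, '');
}

// An empty environment object stands in for "variable not set".
const memoryUrl = resolveServiceUrl('MEMORY_SERVICE_URL', SERVICES.MEMORY_URL, {});
console.log(memoryUrl); // http://localhost:3002
```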

#### `PORTS`

Default port numbers for each service.

| Key | Value |
|---|---|
| `INFERENCE` | `'3001'` |
| `MEMORY` | `'3002'` |
| `EMBEDDING` | `'3003'` |
| `ORCHESTRATION` | `'4000'` |

#### `OLLAMA`

Ollama runtime defaults — used by the Ollama inference provider.

| Key | Value | Description |
|---|---|---|
| `DEFAULT_URL` | `http://localhost:11434` | Fallback Ollama URL |
| `EMBED_MODEL` | `'nomic-embed-text'` | Default embedding model |
| `OLLAMA_MODEL` | `'companion:latest'` | Default chat model |

#### `LLAMACPP`

llama.cpp runtime defaults — used by the llama.cpp inference provider.

| Key | Value | Description |
|---|---|---|
| `DEFAULT_URL` | `http://localhost:8080` | Fallback llama-server URL |
| `DEFAULT_MODEL` | `'local-model'` | Fallback model name (override via `DEFAULT_MODEL` env var) |

> Always set `DEFAULT_MODEL` in the inference service `.env` to the exact model
> name reported by `llama-server` (including `.gguf` extension). The shared
> constant is a last-resort fallback only.
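
One way to make the fallback visible at startup is sketched here. `resolveModel` is a hypothetical helper, and reading `DEFAULT_MODEL` directly from the environment is an assumption about how the inference service is wired.

```js
// Inline copy of the documented constants.
const LLAMACPP = { DEFAULT_URL: 'http://localhost:8080', DEFAULT_MODEL: 'local-model' };

// Hypothetical helper: reports whether the shared fallback had to be used,
// so the caller can log a warning.
function resolveModel(env = process.env) {
  const fromEnv = env.DEFAULT_MODEL;
  if (fromEnv) return { model: fromEnv, usedFallback: false };
  return { model: LLAMACPP.DEFAULT_MODEL, usedFallback: true };
}

const { model, usedFallback } = resolveModel({});
if (usedFallback) {
  console.warn(`DEFAULT_MODEL not set; falling back to "${model}"`);
}
```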

#### `INFERENCE_DEFAULTS`

Default inference parameters applied when a request does not specify them.
Both providers use them as fallbacks in `resolveOptions()`.
Orchestration reads live values from `settings.json` and forwards them
on every request, so these constants act only as the fallback layer.

| Key | Value | Description |
|---|---|---|
| `TEMPERATURE` | `0.7` | Controls randomness (0 = deterministic, 1 = creative) |
| `MAX_TOKENS` | `1024` | Maximum tokens to generate |
| `TOP_P` | `0.9` | Nucleus sampling probability mass |
| `TOP_K` | `40` | Top-K candidates at each step |
| `REPEAT_PENALTY` | `1.1` | Penalty for recently used tokens |
| `SEED` | `null` | `null` = random; set an integer for reproducible outputs |
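
A fallback layer of this kind might look like the sketch below. The camelCase request field names other than `repeatPenalty`, `topP`, and `topK` are assumptions, and the body is illustrative rather than the providers' actual `resolveOptions()`.

```js
// Inline copy of the documented constants.
const INFERENCE_DEFAULTS = {
  TEMPERATURE: 0.7,
  MAX_TOKENS: 1024,
  TOP_P: 0.9,
  TOP_K: 40,
  REPEAT_PENALTY: 1.1,
  SEED: null,
};

// Hypothetical sketch: fill in any option the request left unset.
// `??` is used instead of `||` so that valid falsy values such as
// temperature 0 or seed 0 are not replaced by the default.
function resolveOptions(options = {}) {
  return {
    temperature: options.temperature ?? INFERENCE_DEFAULTS.TEMPERATURE,
    maxTokens: options.maxTokens ?? INFERENCE_DEFAULTS.MAX_TOKENS,
    topP: options.topP ?? INFERENCE_DEFAULTS.TOP_P,
    topK: options.topK ?? INFERENCE_DEFAULTS.TOP_K,
    repeatPenalty: options.repeatPenalty ?? INFERENCE_DEFAULTS.REPEAT_PENALTY,
    seed: options.seed ?? INFERENCE_DEFAULTS.SEED,
  };
}

console.log(resolveOptions({ temperature: 0, seed: 42 }));
```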

#### `ORCHESTRATION`

Orchestration pipeline defaults. Used as fallback values in
`config/settings.js` when `settings.json` doesn't contain a key.

| Key | Value | Description |
|---|---|---|
| `RECENT_EPISODE_LIMIT` | `5` | Recent episodes to inject into prompt |
| `SEMANTIC_LIMIT` | `5` | Semantic search results to inject into prompt |
| `SCORE_THRESHOLD` | `0.75` | Minimum similarity score for semantic results |
| `ENTITIES_LIMIT` | `5` | Max entity search results to inject into prompt |
| `ENTITIES_THRESHOLD` | `0.55` | Minimum similarity score for entity results |
| `TEMPERATURE` | `0.7` | Default inference temperature |
| `CORS_ORIGIN` | `'http://localhost:5173'` | Fallback allowed CORS origin |
| `SYSTEM_PROMPT` | *(see below)* | Default system prompt |

> `ENTITIES_THRESHOLD` is set to `0.55` — lower than `SCORE_THRESHOLD` because
> entity notes generated by a 3B model tend to embed with lower cosine similarity
> than full episode text. Tune upward if irrelevant entities appear in context.

> In `config/settings.js`, the `repeatPenalty`, `topP`, and `topK` defaults are
> sourced from `INFERENCE_DEFAULTS` rather than `ORCHESTRATION`, since
> `INFERENCE_DEFAULTS` already defines the canonical values.

Default system prompt:
> "You are a helpful, context-aware AI assistant. You have access to memories
> of past conversations with the user. Use them to provide consistent,
> personalised responses."
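
To show how the limits and thresholds interact, the sketch below filters scored search hits before they are injected into the prompt. `selectRelevant` and the `{ id, score }` hit shape are hypothetical; the actual pipeline may delegate thresholding to Qdrant itself.

```js
// Inline copy of the relevant documented constants.
const ORCHESTRATION = {
  SEMANTIC_LIMIT: 5,
  SCORE_THRESHOLD: 0.75,
  ENTITIES_LIMIT: 5,
  ENTITIES_THRESHOLD: 0.55,
};

// Hypothetical helper: keep hits at or above the threshold,
// best score first, capped at `limit`.
function selectRelevant(hits, threshold, limit) {
  return hits
    .filter((hit) => hit.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const hits = [
  { id: 'a', score: 0.9 },
  { id: 'b', score: 0.6 },
  { id: 'c', score: 0.8 },
];

// Episode search uses the stricter threshold; entity search the looser one.
const episodes = selectRelevant(hits, ORCHESTRATION.SCORE_THRESHOLD, ORCHESTRATION.SEMANTIC_LIMIT);
const entities = selectRelevant(hits, ORCHESTRATION.ENTITIES_THRESHOLD, ORCHESTRATION.ENTITIES_LIMIT);
console.log(episodes.map((h) => h.id)); // [ 'a', 'c' ]
console.log(entities.map((h) => h.id)); // [ 'a', 'c', 'b' ]
```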

#### `SUMMARIES`

Controls the automatic session summarisation system in `orchestration-service/src/services/summarization.js`.

| Key | Value | Description |
|---|---|---|
| `THRESHOLD_TOKENS` | `200` | Minimum total session tokens before summarisation is considered |
| `MAX_SUMMARY_TOKENS` | `800` | If the existing summary exceeds this length (chars), create a new row instead of updating it |
| `MIN_EPISODES_SINCE` | `5` | Minimum new episodes since last summary before re-summarising |

These can be overridden per-deployment via environment variables in the
orchestration service `.env`:

```
SUMMARY_THRESHOLD_TOKENS=200
SUMMARY_MAX_TOKENS=800
SUMMARY_MIN_EPISODES=5
```
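
The three settings could combine into a decision like the one sketched below. `loadSummaryConfig` and `planSummary` are hypothetical helpers, and the order of the checks (including whether the episode minimum applies to a session with no summary yet) is an assumption, not a description of `summarization.js`.

```js
// Inline copy of the documented constants.
const SUMMARIES = { THRESHOLD_TOKENS: 200, MAX_SUMMARY_TOKENS: 800, MIN_EPISODES_SINCE: 5 };

// Parse an integer override from the environment, or use the fallback.
function intFromEnv(env, key, fallback) {
  const parsed = Number.parseInt(env[key], 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

function loadSummaryConfig(env = process.env) {
  return {
    thresholdTokens: intFromEnv(env, 'SUMMARY_THRESHOLD_TOKENS', SUMMARIES.THRESHOLD_TOKENS),
    maxSummaryTokens: intFromEnv(env, 'SUMMARY_MAX_TOKENS', SUMMARIES.MAX_SUMMARY_TOKENS),
    minEpisodesSince: intFromEnv(env, 'SUMMARY_MIN_EPISODES', SUMMARIES.MIN_EPISODES_SINCE),
  };
}

// Returns 'skip', 'create' (new summary row) or 'update' (extend existing row).
function planSummary({ totalTokens, episodesSinceLast, existingSummary }, config) {
  if (totalTokens < config.thresholdTokens) return 'skip';
  if (!existingSummary) return 'create';
  if (episodesSinceLast < config.minEpisodesSince) return 'skip';
  // Length is measured in characters, as noted in the table above.
  return existingSummary.length > config.maxSummaryTokens ? 'create' : 'update';
}

const config = loadSummaryConfig({});
console.log(planSummary({ totalTokens: 500, episodesSinceLast: 6, existingSummary: 'short' }, config));
```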

#### `SQLITE`

SQLite database defaults.

| Key | Value | Description |
|---|---|---|
| `DEFAULT_PATH` | `'./data/nexusai.db'` | Fallback SQLite database path |