Storme-bit 44989a2b8b documentation updated for model inference settings

2026-04-18 06:41:50 -07:00

Shared Package

Package: @nexusai/shared
Location: packages/shared

Purpose

Common utilities and configuration used across all NexusAI services. Keeping these here avoids duplication and ensures consistent behaviour.

Exports

`getEnv(key, defaultValue?)`

Loads an environment variable by key. If no default is provided and the variable is missing, throws at startup rather than failing silently later.

```javascript
const { getEnv } = require('@nexusai/shared');

const PORT = getEnv('PORT', '3002');   // optional: falls back to '3002'
const DB   = getEnv('SQLITE_PATH');    // required: throws if missing
```
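To make the required-versus-optional behaviour concrete, here is a minimal sketch of how a helper like this could work. It is not the actual implementation in `packages/shared`; in particular, treating an empty string as missing and the wording of the error message are assumptions.

```javascript
// Hypothetical sketch of getEnv; the real implementation may differ.
function getEnv(key, defaultValue) {
  const value = process.env[key];

  // Assumption: an empty string counts as "not set".
  if (value !== undefined && value !== '') {
    return value;
  }

  // A default makes the variable optional.
  if (defaultValue !== undefined) {
    return defaultValue;
  }

  // No value and no default: fail loudly at startup.
  throw new Error(`Missing required environment variable: ${key}`);
}

// Optional variable: resolves to the env value or the '3002' fallback.
const port = getEnv('PORT', '3002');
```

Note that environment variables are always strings, so a numeric setting such as a port needs an explicit conversion by the caller.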

`parseRow(row)`

Parses a SQLite row object, deserialising any JSON-encoded metadata fields into plain objects. Returns `null` if the row is `null` or `undefined`.

```javascript
const { parseRow } = require('@nexusai/shared');
const session = parseRow(db.prepare('SELECT * FROM sessions WHERE id = ?').get(id));
```
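The deserialisation step can be sketched as follows. This is an illustration only: it assumes the JSON lives in a single column named `metadata` and that malformed JSON is left as the raw string, neither of which is confirmed by this document.

```javascript
// Hypothetical sketch of parseRow; the column name `metadata` is an assumption.
function parseRow(row) {
  // Mirror the documented behaviour: null/undefined in, null out.
  if (row === null || row === undefined) {
    return null;
  }

  // Copy so the caller's row object is not mutated.
  const parsed = { ...row };

  if (typeof parsed.metadata === 'string') {
    try {
      parsed.metadata = JSON.parse(parsed.metadata);
    } catch {
      // Assumption: keep the raw string if it is not valid JSON.
    }
  }

  return parsed;
}

// Example with a row shaped like a SQLite result.
const example = parseRow({ id: 's1', metadata: '{"topic":"greeting"}' });
```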

`formatEpisodeText(userMessage, aiResponse)`

Combines a user message and AI response into the canonical text format used for embedding:

```text
User: {userMessage}
Assistant: {aiResponse}
```

Used by the memory service's embedding write path to ensure consistent vector representations across all episodes.
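A function producing this format could be as small as the sketch below. It assumes the two lines are joined by a single newline and that inputs are used verbatim; the real function may trim or normalise them.

```javascript
// Hypothetical sketch of formatEpisodeText, assuming a single "\n" separator.
function formatEpisodeText(userMessage, aiResponse) {
  return `User: ${userMessage}\nAssistant: ${aiResponse}`;
}

const text = formatEpisodeText('What is Qdrant?', 'A vector database.');
```

Routing every episode through one formatter means a change to the format happens in one place, although previously stored vectors would still reflect the old format until they are re-embedded.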

Constants

Tuneable values and shared identifiers are centralised in `constants.js` rather than hardcoded across services. Import the relevant group by name.

```javascript
const { QDRANT, COLLECTIONS, EPISODIC, LLAMACPP } = require('@nexusai/shared');
```

`QDRANT`

Vector store configuration. Values here must stay in sync with the embedding model and Qdrant collection setup.

Key	Value	Description
`DEFAULT_URL`	`http://localhost:6333`	Fallback Qdrant URL
`VECTOR_SIZE`	`768`	Output dimensions of `nomic-embed-text`
`DISTANCE_METRIC`	`'Cosine'`	Similarity metric used for all collections
`DEFAULT_LIMIT`	`10`	Default top-k for vector searches

`COLLECTIONS`

Canonical Qdrant collection names.

Key	Value
`EPISODES`	`'episodes'`
`ENTITIES`	`'entities'`
`SUMMARIES`	`'summaries'`
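As an illustration of why `VECTOR_SIZE` and `DISTANCE_METRIC` must match the collection setup, the sketch below builds the request that Qdrant's REST API expects when creating a collection (`PUT /collections/{name}`). The constants are local stand-ins copied from the tables above so the snippet runs on its own, and `buildCreateCollectionRequest` is a hypothetical helper, not part of the shared package.

```javascript
// Stand-ins for the documented constants; a service would instead use
// require('@nexusai/shared').
const QDRANT = {
  DEFAULT_URL: 'http://localhost:6333',
  VECTOR_SIZE: 768,
  DISTANCE_METRIC: 'Cosine',
  DEFAULT_LIMIT: 10,
};
const COLLECTIONS = {
  EPISODES: 'episodes',
  ENTITIES: 'entities',
  SUMMARIES: 'summaries',
};

// Hypothetical helper: describes the create-collection call without sending it.
function buildCreateCollectionRequest(name, baseUrl = QDRANT.DEFAULT_URL) {
  return {
    method: 'PUT',
    url: `${baseUrl}/collections/${name}`,
    body: {
      vectors: {
        size: QDRANT.VECTOR_SIZE,         // must equal the embedding dimension
        distance: QDRANT.DISTANCE_METRIC, // same metric for every collection
      },
    },
  };
}

// One request descriptor per canonical collection name.
const requests = Object.values(COLLECTIONS).map((name) =>
  buildCreateCollectionRequest(name)
);
```

If the embedding model changes to one with a different output dimension, collections created with the old `size` would reject the new vectors, which is why the value is kept in a single shared constant.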

`EPISODIC`

Default pagination and result limits for SQLite episode queries.

Key	Value	Description
`DEFAULT_RECENT_LIMIT`	`10`	Default number of recent episodes to retrieve
`DEFAULT_PAGE_SIZE`	`20`	Default episodes per page for paginated queries
`DEFAULT_SEARCH_LIMIT`	`10`	Default number of FTS search results to return
`DEFAULT_OFFSET`	`0`	Default pagination offset
`DEFAULT_SESSIONS_LIMIT`	`20`	Default number of sessions to return
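A route handler might apply these defaults roughly as sketched below. `resolvePagination` and the `limit`/`offset` query parameter names are hypothetical, and the two constants are local stand-ins for the documented values.

```javascript
// Stand-in for the documented EPISODIC values used in this sketch.
const EPISODIC = { DEFAULT_PAGE_SIZE: 20, DEFAULT_OFFSET: 0 };

// Hypothetical helper: turns raw query-string values into safe integers,
// falling back to the shared defaults when a value is missing or invalid.
function resolvePagination(query = {}) {
  const limit = Number.parseInt(query.limit, 10);
  const offset = Number.parseInt(query.offset, 10);

  return {
    limit: Number.isInteger(limit) && limit > 0 ? limit : EPISODIC.DEFAULT_PAGE_SIZE,
    offset: Number.isInteger(offset) && offset >= 0 ? offset : EPISODIC.DEFAULT_OFFSET,
  };
}

const firstPage = resolvePagination({});
const custom = resolvePagination({ limit: '5', offset: '10' });
```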

`SERVICES`

Default URLs for inter-service communication. Used as fallback values when the corresponding environment variable is not set.

Key	Value	Description
`EMBEDDING_URL`	`http://localhost:3003`	Fallback embedding service URL
`MEMORY_URL`	`http://localhost:3002`	Fallback memory service URL
`INFERENCE_URL`	`http://localhost:3001`	Fallback inference service URL
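The fallback pattern can be sketched as below. It assumes each environment variable has the same name as its `SERVICES` key, which this document does not state; `resolveServiceUrl` is a hypothetical helper and `SERVICES` is a local stand-in for the shared constant.

```javascript
// Stand-in for the documented SERVICES group.
const SERVICES = {
  EMBEDDING_URL: 'http://localhost:3003',
  MEMORY_URL: 'http://localhost:3002',
  INFERENCE_URL: 'http://localhost:3001',
};

// Hypothetical helper. Assumption: the env var is named like the constant key.
function resolveServiceUrl(key, env = process.env) {
  if (!(key in SERVICES)) {
    throw new Error(`Unknown service key: ${key}`);
  }
  // Prefer the environment; otherwise use the shared fallback.
  return env[key] || SERVICES[key];
}

// With no override in the environment object, the shared default is returned.
const memoryUrl = resolveServiceUrl('MEMORY_URL', {});
```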

`PORTS`

Default port numbers for each service.

Key	Value
`INFERENCE`	`'3001'`
`MEMORY`	`'3002'`
`EMBEDDING`	`'3003'`
`ORCHESTRATION`	`'4000'`

`OLLAMA`

Ollama runtime defaults — used by the Ollama inference provider.

Key	Value	Description
`DEFAULT_URL`	`http://localhost:11434`	Fallback Ollama URL
`EMBED_MODEL`	`'nomic-embed-text'`	Default embedding model
`OLLAMA_MODEL`	`'companion:latest'`	Default chat model

`LLAMACPP`

llama.cpp runtime defaults — used by the llama.cpp inference provider.

Key	Value	Description
`DEFAULT_URL`	`http://localhost:8080`	Fallback llama-server URL
`DEFAULT_MODEL`	`'local-model'`	Fallback model name (override via `DEFAULT_MODEL` env var)

Always set `DEFAULT_MODEL` in the inference service `.env` to the exact model name reported by `llama-server` (including the `.gguf` extension). The shared constant is a last-resort fallback only.

`INFERENCE_DEFAULTS`

Default inference parameters, applied when a request does not specify them. Both providers use these as fallbacks in `resolveOptions()`. Orchestration reads live values from `settings.json` and forwards them on every request, so these constants act only as the fallback layer.

Key	Value	Description
`TEMPERATURE`	`0.7`	Controls randomness (0 = deterministic, 1 = creative)
`MAX_TOKENS`	`1024`	Maximum tokens to generate
`TOP_P`	`0.9`	Nucleus sampling probability mass
`TOP_K`	`40`	Top-K candidates at each step
`REPEAT_PENALTY`	`1.1`	Penalty for recently used tokens
`SEED`	`null`	null = random; set integer for reproducible outputs
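The fallback layering can be illustrated with a sketch of a `resolveOptions()`-style merge. This is not the providers' actual code: the request field names `maxTokens` and `seed` are assumptions, and `INFERENCE_DEFAULTS` is a local stand-in mirroring the table above.

```javascript
// Stand-in for the documented INFERENCE_DEFAULTS group.
const INFERENCE_DEFAULTS = {
  TEMPERATURE: 0.7,
  MAX_TOKENS: 1024,
  TOP_P: 0.9,
  TOP_K: 40,
  REPEAT_PENALTY: 1.1,
  SEED: null,
};

// Hypothetical sketch of a resolveOptions-style merge.
// `??` falls back only on null/undefined, so an explicit 0 is preserved.
function resolveOptions(options = {}) {
  return {
    temperature: options.temperature ?? INFERENCE_DEFAULTS.TEMPERATURE,
    maxTokens: options.maxTokens ?? INFERENCE_DEFAULTS.MAX_TOKENS,
    topP: options.topP ?? INFERENCE_DEFAULTS.TOP_P,
    topK: options.topK ?? INFERENCE_DEFAULTS.TOP_K,
    repeatPenalty: options.repeatPenalty ?? INFERENCE_DEFAULTS.REPEAT_PENALTY,
    seed: options.seed ?? INFERENCE_DEFAULTS.SEED,
  };
}

// A request asking for deterministic sampling keeps temperature 0.
const deterministic = resolveOptions({ temperature: 0, seed: 42 });
```

Using `??` instead of `||` matters here: with `||`, a requested temperature of `0` would be treated as falsy and silently replaced by `0.7`.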

`ORCHESTRATION`

Orchestration pipeline defaults. Used as fallback values in `config/settings.js` when `settings.json` doesn't contain a key.

Key	Value	Description
`RECENT_EPISODE_LIMIT`	`5`	Recent episodes to inject into prompt
`SEMANTIC_LIMIT`	`5`	Semantic search results to inject into prompt
`SCORE_THRESHOLD`	`0.75`	Minimum similarity score for semantic results
`TEMPERATURE`	`0.7`	Default inference temperature
`CORS_ORIGIN`	`'http://localhost:5173'`	Fallback allowed CORS origin
`SYSTEM_PROMPT`	(see below)	Default system prompt

Defaults for `repeatPenalty`, `topP`, and `topK` are sourced from `INFERENCE_DEFAULTS` in `config/settings.js` rather than from `ORCHESTRATION`, since `INFERENCE_DEFAULTS` already defines the canonical values.
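To show how `SEMANTIC_LIMIT` and `SCORE_THRESHOLD` could interact when building a prompt, here is a hedged sketch. `selectSemanticResults`, the settings keys `semanticLimit` and `scoreThreshold`, and the `{ score }` result shape are all assumptions made for illustration, and the threshold is assumed to be inclusive.

```javascript
// Stand-in for the two documented ORCHESTRATION values used in this sketch.
const ORCHESTRATION = { SEMANTIC_LIMIT: 5, SCORE_THRESHOLD: 0.75 };

// Hypothetical helper: keeps results at or above the threshold,
// best match first, capped at the limit. Values from `settings` take
// priority; the constants apply only when a key is absent.
function selectSemanticResults(results, settings = {}) {
  const limit = settings.semanticLimit ?? ORCHESTRATION.SEMANTIC_LIMIT;
  const threshold = settings.scoreThreshold ?? ORCHESTRATION.SCORE_THRESHOLD;

  return results
    .filter((result) => result.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const selected = selectSemanticResults([
  { id: 'a', score: 0.62 },
  { id: 'b', score: 0.91 },
  { id: 'c', score: 0.78 },
]);
```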

Default system prompt:

"You are a helpful, context-aware AI assistant. You have access to memories of past conversations with the user. Use them to provide consistent, personalised responses."

`SQLITE`

Key	Value	Description
`DEFAULT_PATH`	`'./data/nexusai.db'`	Fallback SQLite database path
