Storme-bit 44989a2b8b documentation updated for model inference settings

2026-04-18 06:41:50 -07:00

API Routes

All HTTP endpoints across NexusAI services. Clients communicate only with the orchestration service (port 4000). Routes for the memory, embedding, and inference services are listed here for reference and direct debugging use.

Orchestration Service — port 4000

Health

Method	Path	Description
GET	/health	Service health check

Chat

Method	Path	Description
POST	/chat	Send a message, receive full response
POST	/chat/stream	Send a message, receive SSE token stream

POST /chat and POST /chat/stream — request body:

{
  "sessionId": "your-session-uuid",
  "message": "Hello, my name is Tim.",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "temperature": 0.7
}

model and temperature are optional. A temperature sent in the request body is accepted but not used: inference parameters (temperature, topP, topK, repeatPenalty) are read from settings.json on every request and are controlled via PATCH /settings.
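As a rough illustration of the request shape, the sketch below builds (but does not send) a chat request with Python's standard library. The host `localhost`, the JSON Content-Type header, and the helper name `build_chat_request` are assumptions for the example; only the port, paths, and body fields come from this document.

```python
import json
import urllib.request

# Assumption: orchestration runs on localhost. Only port 4000 is documented.
BASE_URL = "http://localhost:4000"

def build_chat_request(session_id, message, model=None, stream=False):
    """Build (but do not send) a POST /chat or POST /chat/stream request."""
    body = {"sessionId": session_id, "message": message}
    if model is not None:
        body["model"] = model  # optional field
    # temperature is left out on purpose: it is read from settings.json.
    path = "/chat/stream" if stream else "/chat"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )

req = build_chat_request("your-session-uuid", "Hello, my name is Tim.")
# Sending would be: urllib.request.urlopen(req)
```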

POST /chat — response:

{
  "sessionId": "your-session-uuid",
  "response": "Hello Tim! How can I help you today?",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "tokenCount": 87
}

POST /chat/stream — response (SSE):

data: {"text":"Hello"}
data: {"text":" Tim"}
data: {"done":true,"model":"gemma-4-26B...gguf","tokenCount":87}
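One possible way for a client to consume this stream is sketched below. It assumes each event arrives as a single `data:` line holding one JSON object, as in the sample above; the function name `collect_stream` and the model name `example.gguf` are placeholders, not part of the API.

```python
import json

def collect_stream(lines):
    """Accumulate token text from SSE 'data:' lines until the done event."""
    text_parts = []
    final = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        event = json.loads(line[len("data:"):].strip())
        if event.get("done"):
            final = event  # carries model and tokenCount
            break
        text_parts.append(event.get("text", ""))
    return "".join(text_parts), final

sample = [
    'data: {"text":"Hello"}',
    '',
    'data: {"text":" Tim"}',
    '',
    'data: {"done":true,"model":"example.gguf","tokenCount":87}',
]
text, final = collect_stream(sample)
print(text)                 # Hello Tim
print(final["tokenCount"])  # 87
```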

Sessions

Method	Path	Description
GET	/sessions	Paginated session list
GET	/sessions/:sessionId/history	Paginated episode history for a session
PATCH	/sessions/:sessionId	Update session name and/or project assignment
DELETE	/sessions/:sessionId	Delete session and all its episodes

GET /sessions — query params:

Param	Default	Description
limit	20	Sessions per page
offset	0	Pagination offset
projectId	—	Filter by project (integer ID)

PATCH /sessions/:sessionId — body:

{ "name": "My Session", "projectId": 3 }

At least one of name or projectId is required; both can be sent together. Returns the updated session object.

GET /sessions/:sessionId/history — query params:

Param	Default	Description
limit	20	Episodes per page
offset	0	Pagination offset

Returns { sessionId, episodes: [...] }. Episodes are ordered newest first.
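The limit/offset parameters can be walked with a simple loop. The sketch below assumes the end of the history is signalled by a page shorter than limit (this document does not mention a total count). `fake_fetch` is a hypothetical stand-in for GET /sessions/:sessionId/history, and the `id` field on each episode is illustrative only.

```python
def iter_episodes(fetch_page, limit=20):
    """Yield every episode by advancing offset until a short page arrives."""
    offset = 0
    while True:
        page = fetch_page(limit=limit, offset=offset)
        episodes = page["episodes"]
        yield from episodes
        if len(episodes) < limit:
            break  # a short (or empty) page means there is nothing left
        offset += limit

# Hypothetical in-memory data, newest first, so no network is needed.
FAKE = [{"id": i} for i in range(45, 0, -1)]

def fake_fetch(limit, offset):
    return {"sessionId": "your-session-uuid",
            "episodes": FAKE[offset:offset + limit]}

all_episodes = list(iter_episodes(fake_fetch))
print(len(all_episodes))  # 45
```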

Projects

Method	Path	Description
GET	/projects	Get all projects
POST	/projects	Create a new project
PATCH	/projects/:id	Update a project
DELETE	/projects/:id	Delete a project (nulls session assignments)

POST /projects — body:

{
  "name": "My Project",
  "description": "Optional description",
  "colour": "#3d3a79",
  "icon": null,
  "isolated": 0
}

name is required. All other fields are optional. isolated is 0 or 1. Returns 201 with the created project object.

PATCH /projects/:id — body: same fields as POST, all optional.

Models

Method	Path	Description
GET	/models	Available models scanned live from models folder
GET	/models/props	Live model props from llama-server (context window, loaded model)

GET /models — returns array:

[{ "value": "model-name.gguf", "label": "Display Name", "description": null, "size": "19.7 GB" }]

Scans .gguf files live from modelsFolderPath (set in settings). Merges with models.json in the same folder for label and description metadata.
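The scan-and-merge step might look roughly like the sketch below. Several details are assumptions, since this document does not specify them: the shape of models.json (here a mapping from file name to label and description), the fallback label (the file stem), and the size formatting (decimal gigabytes).

```python
import json
import tempfile
from pathlib import Path

def list_models(folder):
    """Scan folder for .gguf files and merge optional metadata from models.json."""
    folder = Path(folder)
    meta_path = folder / "models.json"
    # Assumed shape: {"file.gguf": {"label": ..., "description": ...}}
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    models = []
    for f in sorted(folder.glob("*.gguf")):
        info = meta.get(f.name, {})
        models.append({
            "value": f.name,
            "label": info.get("label", f.stem),  # fallback label is assumed
            "description": info.get("description"),
            "size": f"{f.stat().st_size / 1e9:.1f} GB",  # decimal GB is assumed
        })
    return models

# Demo against a throwaway folder standing in for modelsFolderPath.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "model-name.gguf").write_bytes(b"\0" * 1024)
    Path(tmp, "models.json").write_text(
        json.dumps({"model-name.gguf": {"label": "Display Name"}}))
    print(list_models(tmp))
```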

GET /models/props — returns:

{ "contextWindow": 64000, "modelAlias": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf" }

Fetches directly from llama-server /props. Returns 503 if llama-server is unreachable.

Settings

Method	Path	Description
GET	/settings	Get all current settings
PATCH	/settings	Update one or more settings

GET /settings — response:

{
  "recentEpisodeLimit": 9,
  "semanticLimit": 5,
  "scoreThreshold": 0.6,
  "modelsFolderPath": "/mnt/nexus-models",
  "temperature": 0.65,
  "repeatPenalty": 1.3,
  "topP": 0.9,
  "topK": 41
}

PATCH /settings — body: any subset of the above fields.

Field	Type	Range	Description
recentEpisodeLimit	integer	1–20	Recent episodes injected into prompt
semanticLimit	integer	1–20	Max semantic search results
scoreThreshold	float	0–1	Minimum similarity score
modelsFolderPath	string	—	Path to folder containing .gguf files
temperature	float	0–2	Inference randomness
repeatPenalty	float	1–2	Repeat token penalty
topP	float	0–1	Nucleus sampling probability mass
topK	integer	1–100	Top-K token candidates per step

Settings are persisted to data/settings.json and read on every request — changes take effect immediately without a service restart.
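A client may want to check a PATCH /settings body against the documented ranges before sending it. The sketch below does that under two assumptions: the ranges are inclusive, and unknown keys should be flagged. How the service itself treats out-of-range values (rejects, clamps, or ignores) is not stated here, and `validate_settings_patch` is a hypothetical helper.

```python
# (type, min, max) per field, taken from the settings table; None means no range.
SETTINGS_SPEC = {
    "recentEpisodeLimit": (int, 1, 20),
    "semanticLimit": (int, 1, 20),
    "scoreThreshold": (float, 0, 1),
    "modelsFolderPath": (str, None, None),
    "temperature": (float, 0, 2),
    "repeatPenalty": (float, 1, 2),
    "topP": (float, 0, 1),
    "topK": (int, 1, 100),
}

def validate_settings_patch(patch):
    """Return a list of problems; an empty list means the patch looks valid."""
    errors = []
    for key, value in patch.items():
        if key not in SETTINGS_SPEC:
            errors.append(f"{key}: unknown setting")
            continue
        kind, low, high = SETTINGS_SPEC[key]
        if isinstance(value, bool):
            ok = False  # bool is a subclass of int in Python, so reject it
        elif kind is float:
            ok = isinstance(value, (int, float))  # whole numbers are fine
        else:
            ok = isinstance(value, kind)
        if not ok:
            errors.append(f"{key}: expected {kind.__name__}")
        elif low is not None and not (low <= value <= high):
            errors.append(f"{key}: must be between {low} and {high}")
    return errors

print(validate_settings_patch({"temperature": 0.65, "topK": 41}))  # []
print(validate_settings_patch({"topK": 500}))  # ['topK: must be between 1 and 100']
```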

Episodes

Method	Path	Description
GET	/episodes	Paginated episode list across all sessions
DELETE	/episodes/:id	Delete an episode (SQLite + Qdrant)

GET /episodes — query params:

Param	Default	Description
limit	20	Episodes per page
offset	0	Pagination offset
q	—	Keyword search (FTS)

Memory Service — port 3002

Direct access is for debugging only. All client traffic goes through orchestration.

Health

Method	Path	Description
GET	/health	Service health check

Sessions

Method	Path	Description
POST	/sessions	Create a new session
GET	/sessions	Paginated session list with optional projectId filter
GET	/sessions/:id	Get session by internal ID
GET	/sessions/by-external/:externalId	Get session by external ID
PATCH	/sessions/by-external/:externalId	Update session fields
DELETE	/sessions/by-external/:externalId	Delete session (cascades to episodes)

Route ordering: by-external/:externalId must be defined before /:id to prevent by-external being captured as an ID param.

POST /sessions — body:

{ "externalId": "unique-uuid", "metadata": {} }

PATCH /sessions/by-external/:externalId — body:

{ "name": "Session Name", "projectId": 3 }

Both fields are optional. Only provided fields are updated — other fields are not touched.

Episodes

Method	Path	Description
POST	/episodes	Create episode + auto-embed into Qdrant
GET	/episodes	Paginated episode list across all sessions
GET	/episodes/search?q=&limit=	FTS keyword search across all episodes
GET	/episodes/:id	Get episode by ID
GET	/sessions/:id/episodes?limit=&offset=	Paginated episodes for a session
DELETE	/episodes/:id	Delete episode (SQLite + Qdrant cleanup)

Route ordering: /episodes/search must be defined before /episodes/:id.
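To see why the order matters, the sketch below models a first-match router in a few lines. It is a toy illustration, not the service's actual routing code, and it assumes a `:param` matches exactly one path segment; `compile_route` and `match_first` are hypothetical names.

```python
import re

def compile_route(pattern):
    """Turn '/episodes/:id' into a regex with one named group per ':param'."""
    regex = re.sub(r":(\w+)", r"(?P<\1>[^/]+)", pattern)
    return re.compile("^" + regex + "$")

def match_first(routes, path):
    """Return (pattern, params) for the first route that matches, else None."""
    for pattern in routes:
        m = compile_route(pattern).match(path)
        if m:
            return pattern, m.groupdict()
    return None

wrong_order = ["/episodes/:id", "/episodes/search"]
right_order = ["/episodes/search", "/episodes/:id"]

print(match_first(wrong_order, "/episodes/search"))
# ('/episodes/:id', {'id': 'search'})  <- "search" swallowed as an ID
print(match_first(right_order, "/episodes/search"))
# ('/episodes/search', {})
```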

POST /episodes — body:

{
  "sessionId": 1,
  "userMessage": "Hello",
  "aiResponse": "Hi there!",
  "tokenCount": 10
}

Projects

Method	Path	Description
POST	/projects	Create a new project
GET	/projects	Get all projects
GET	/projects/:id	Get project by ID
PATCH	/projects/:id	Update a project
DELETE	/projects/:id	Delete project + null session assignments

Same request/response shape as orchestration /projects above.

Entities

Method	Path	Description
POST	/entities	Upsert entity (creates or updates by name + type)
GET	/entities/by-type/:type	All entities of a given type
GET	/entities/:id	Get entity by ID
DELETE	/entities/:id	Delete entity (cascades to relationships)

Route ordering: /entities/by-type/:type must be before /entities/:id.

POST /entities — body:

{
  "name": "NexusAI",
  "type": "project",
  "notes": "My AI memory project",
  "metadata": {}
}

Relationships

Method	Path	Description
POST	/relationships	Upsert a relationship between two entities
GET	/entities/:id/relationships	All relationships for an entity
DELETE	/relationships	Delete a specific relationship

POST /relationships — body:

{ "fromId": 1, "toId": 2, "label": "uses", "metadata": {} }

DELETE /relationships — body:

{ "fromId": 1, "toId": 2, "label": "uses" }

Relationships are identified by the composite key (fromId, toId, label). DELETE takes this key in the request body rather than in URL params because a three-part key is awkward to encode in a path.
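The upsert and delete semantics implied by a composite key can be sketched with an in-memory dictionary keyed on the tuple. This is only a model of the behaviour; the real service's storage layer is not shown in this document, and `RelationshipStore` is a hypothetical class.

```python
class RelationshipStore:
    """In-memory stand-in: rows are keyed on (fromId, toId, label)."""

    def __init__(self):
        self._rows = {}

    def __len__(self):
        return len(self._rows)

    def upsert(self, from_id, to_id, label, metadata=None):
        # Same key overwrites the existing row; a new key creates one.
        self._rows[(from_id, to_id, label)] = metadata or {}

    def delete(self, from_id, to_id, label):
        # True if a row with exactly this three-part key existed.
        return self._rows.pop((from_id, to_id, label), None) is not None

store = RelationshipStore()
store.upsert(1, 2, "uses")
store.upsert(1, 2, "uses", {"note": "updated"})  # same key: update, no new row
store.upsert(1, 2, "depends-on")                 # different label: new row
print(len(store))                   # 2
print(store.delete(1, 2, "uses"))   # True
print(store.delete(1, 2, "uses"))   # False
```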

Embedding Service — port 3003

Method	Path	Description
GET	/health	Service health check
POST	/embed	Embed a single text string
POST	/embed/batch	Embed an array of text strings

POST /embed — body:

{ "text": "Hello from NexusAI" }

POST /embed — response:

{ "embedding": [0.123, -0.456, ...], "model": "nomic-embed-text", "dimensions": 768 }

Inference Service — port 3001

Method	Path	Description
GET	/health	Health check — reports active provider and model
POST	/complete	Full completion — awaits entire response
POST	/complete/stream	Streaming completion via SSE

POST /complete — body:

{
  "prompt": "What is the capital of France?",
  "model": "gemma-4-26B-A4B-Claude-Distill-APEX-I-Mini.gguf",
  "temperature": 0.7,
  "maxTokens": 1024,
  "topP": 0.9,
  "topK": 40,
  "repeatPenalty": 1.1
}

All fields except prompt are optional. In normal usage these are forwarded from orchestration, which reads them from settings.json.
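The forwarding described above could be approximated as follows: take the inference parameters from a settings object and attach them to a /complete body. This is a guess at the shape of that step, not orchestration's actual code; `build_completion_body` is a hypothetical helper, and the settings values are the ones from the GET /settings example.

```python
INFERENCE_KEYS = ("temperature", "topP", "topK", "repeatPenalty")

def build_completion_body(prompt, settings, model=None, max_tokens=None):
    """Assemble a POST /complete body; only prompt is required."""
    body = {"prompt": prompt}
    for key in INFERENCE_KEYS:
        if key in settings:
            body[key] = settings[key]  # copied from settings, not from the client
    if model is not None:
        body["model"] = model
    if max_tokens is not None:
        body["maxTokens"] = max_tokens
    return body

settings = {
    "recentEpisodeLimit": 9,
    "semanticLimit": 5,
    "scoreThreshold": 0.6,
    "modelsFolderPath": "/mnt/nexus-models",
    "temperature": 0.65,
    "repeatPenalty": 1.3,
    "topP": 0.9,
    "topK": 41,
}
body = build_completion_body("What is the capital of France?", settings,
                             max_tokens=1024)
print(body)
```

Non-inference settings such as recentEpisodeLimit are left out of the body, since /complete does not list them among its fields.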

POST /complete — response:

{
  "text": "The capital of France is Paris.",
  "model": "gemma-4-26B...gguf",
  "done": true,
  "evalCount": 8,
  "promptEvalCount": 41
}
