documentation updates for entity extraction and summarization

Storme-bit
2026-04-21 03:50:38 -07:00
parent 32365e67f4
commit acda21317b
6 changed files with 540 additions and 107 deletions


@@ -165,10 +165,16 @@ Orchestration pipeline defaults. Used as fallback values in
| `RECENT_EPISODE_LIMIT` | `5` | Recent episodes to inject into prompt |
| `SEMANTIC_LIMIT` | `5` | Semantic search results to inject into prompt |
| `SCORE_THRESHOLD` | `0.75` | Minimum similarity score for semantic results |
| `ENTITIES_LIMIT` | `5` | Max entity search results to inject into prompt |
| `ENTITIES_THRESHOLD` | `0.55` | Minimum similarity score for entity results |
| `TEMPERATURE` | `0.7` | Default inference temperature |
| `CORS_ORIGIN` | `'http://localhost:5173'` | Fallback allowed CORS origin |
| `SYSTEM_PROMPT` | *(see below)* | Default system prompt |
> `ENTITIES_THRESHOLD` is set to `0.55`, lower than `SCORE_THRESHOLD` (`0.75`), because
> entity notes generated by a 3B model tend to embed with lower cosine similarity
> than full episode text. Raise it if irrelevant entities appear in context.
> `repeatPenalty`, `topP`, and `topK` defaults are sourced from
> `INFERENCE_DEFAULTS` in `config/settings.js` rather than `ORCHESTRATION`,
> since those constants already define the canonical values.
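
To illustrate how the two thresholds behave differently, here is a minimal sketch that filters scored search hits before prompt injection. The constants are copied from the table above; the `selectForPrompt` helper, the `{ name, score }` result shape, and the sample data are assumptions made for illustration and are not taken from the orchestration service.

```javascript
// Values copied from the ORCHESTRATION table above.
const ORCHESTRATION = {
  SEMANTIC_LIMIT: 5,
  SCORE_THRESHOLD: 0.75,
  ENTITIES_LIMIT: 5,
  ENTITIES_THRESHOLD: 0.55,
};

// Hypothetical helper: keep hits scoring at or above `threshold`,
// order them best-first, and cap the list at `limit` entries.
function selectForPrompt(results, threshold, limit) {
  return results
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

// Made-up entity hits; the result shape is an assumption.
const entityHits = [
  { name: 'Project X', score: 0.58 },
  { name: 'Alice', score: 0.81 },
  { name: 'Unrelated', score: 0.41 },
];

// With the entity threshold, the 0.58 hit survives.
const entities = selectForPrompt(
  entityHits,
  ORCHESTRATION.ENTITIES_THRESHOLD,
  ORCHESTRATION.ENTITIES_LIMIT
);

// With the stricter semantic threshold, the same hit would be dropped.
const strict = selectForPrompt(
  entityHits,
  ORCHESTRATION.SCORE_THRESHOLD,
  ORCHESTRATION.ENTITIES_LIMIT
);

console.log(entities.map((e) => e.name)); // [ 'Alice', 'Project X' ]
console.log(strict.map((e) => e.name)); // [ 'Alice' ]
```

In this sample the `0.58` hit is kept only under the lower entity threshold, which is the trade-off the note describes: more entity recall at the cost of occasional irrelevant matches.
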
@@ -178,6 +184,25 @@ Default system prompt:
> of past conversations with the user. Use them to provide consistent,
> personalised responses."
#### `SUMMARIES`
Controls the automatic session summarisation system in `orchestration-service/src/services/summarization.js`.
| Key | Value | Description |
|---|---|---|
| `THRESHOLD_TOKENS` | `200` | Minimum total session tokens before summarisation is considered |
| `MAX_SUMMARY_TOKENS` | `800` | If the existing summary is longer than this (measured in characters, despite the key name), a new summary row is created instead of updating the existing one |
| `MIN_EPISODES_SINCE` | `5` | Minimum new episodes since last summary before re-summarising |
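
One way these three settings could combine into a single decision is sketched below. The function name `planSummary`, its input shape, and the order of the checks are assumptions for illustration; the actual logic in `summarization.js` may differ, in particular in how it treats a session that has no summary yet.

```javascript
// Values copied from the SUMMARIES table above.
const SUMMARIES = {
  THRESHOLD_TOKENS: 200,
  MAX_SUMMARY_TOKENS: 800,
  MIN_EPISODES_SINCE: 5,
};

// Hypothetical decision helper. Returns 'skip', 'create', or 'update'.
// `existingSummary` is the current summary text, or null if none exists.
function planSummary(session, cfg = SUMMARIES) {
  const { sessionTokens, episodesSinceLast, existingSummary } = session;

  // Session is too short to be worth summarising at all.
  if (sessionTokens < cfg.THRESHOLD_TOKENS) return 'skip';

  // Assumption: the first summary is created as soon as the token
  // threshold is met, without waiting for MIN_EPISODES_SINCE.
  if (existingSummary === null) return 'create';

  // Not enough new material since the last summary.
  if (episodesSinceLast < cfg.MIN_EPISODES_SINCE) return 'skip';

  // Length is compared in characters, as the table describes.
  return existingSummary.length > cfg.MAX_SUMMARY_TOKENS ? 'create' : 'update';
}

console.log(planSummary({ sessionTokens: 150, episodesSinceLast: 9, existingSummary: null }));
console.log(planSummary({ sessionTokens: 500, episodesSinceLast: 6, existingSummary: 'x'.repeat(100) }));
console.log(planSummary({ sessionTokens: 500, episodesSinceLast: 6, existingSummary: 'x'.repeat(900) }));
```
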
The `SUMMARIES` values can be overridden per deployment via environment
variables in the orchestration service `.env`:
```
SUMMARY_THRESHOLD_TOKENS=200
SUMMARY_MAX_TOKENS=800
SUMMARY_MIN_EPISODES=5
```
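
A minimal sketch of how such overrides might be resolved against the defaults is shown below. The helper names `intFromEnv` and `loadSummaryConfig` are hypothetical, and the mapping from each environment variable to its `SUMMARIES` key is inferred from the matching default values rather than confirmed by the source.

```javascript
// Defaults copied from the SUMMARIES table above.
const SUMMARY_DEFAULTS = {
  THRESHOLD_TOKENS: 200,
  MAX_SUMMARY_TOKENS: 800,
  MIN_EPISODES_SINCE: 5,
};

// Parse an integer environment variable, falling back to the default
// when the variable is unset or does not parse as a number.
function intFromEnv(env, name, fallback) {
  const parsed = Number.parseInt(env[name] ?? '', 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Hypothetical loader. The env-var-to-key mapping is an assumption.
function loadSummaryConfig(env = process.env) {
  return {
    THRESHOLD_TOKENS: intFromEnv(
      env, 'SUMMARY_THRESHOLD_TOKENS', SUMMARY_DEFAULTS.THRESHOLD_TOKENS),
    MAX_SUMMARY_TOKENS: intFromEnv(
      env, 'SUMMARY_MAX_TOKENS', SUMMARY_DEFAULTS.MAX_SUMMARY_TOKENS),
    MIN_EPISODES_SINCE: intFromEnv(
      env, 'SUMMARY_MIN_EPISODES', SUMMARY_DEFAULTS.MIN_EPISODES_SINCE),
  };
}

// Example: only one value is overridden; the rest keep their defaults.
console.log(loadSummaryConfig({ SUMMARY_MAX_TOKENS: '1200' }));
```

Passing the environment in as an argument keeps the loader easy to test without mutating `process.env`.
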
#### `SQLITE`
| Key | Value | Description |