updated documentation for entity implementation

2026-04-17 07:00:28 -07:00
parent 7e50e82d8c
commit 91e4f68a8c
3 changed files with 48 additions and 8 deletions
--- a/docs/services/orchestration-service.md
+++ b/docs/services/orchestration-service.md
@@ -76,17 +76,22 @@ difference is how the inference response is delivered to the client.
   recent episodes. Non-critical — if it fails, pipeline continues with
   recency-only context.

-5. **Prompt assembly** — combine system prompt, semantic episodes, recent
-   episodes, and user message.
+5. **Entity search** — reuse the embedded user message vector to query the
+   `entities` Qdrant collection (score threshold 0.6, limit 5). Returns
+   entity payloads (`name`, `type`, `notes`) directly, so no SQLite round trip
+   is needed. Non-critical: if it fails, the pipeline continues without entity
+   context.

-6. **Inference** — send to inference service. `/chat` awaits full response;
+6. **Prompt assembly** — combine system prompt, entity context, semantic
+   episodes, recent episodes, and user message.
+
+7. **Inference** — send to inference service. `/chat` awaits full response;
   `/chat/stream` pipes SSE chunks to the client.

-7. **Episode write** — write the exchange back to memory. Fire-and-forget
+8. **Episode write** — write the exchange back to memory. Fire-and-forget
   for `/chat`; awaited for `/chat/stream` to ensure the full text is
   accumulated before saving.

-8. **Auto-naming** — on `isFirstMessage && !session.name`, fire a secondary
+9. **Auto-naming** — on `isFirstMessage && !session.name`, fire a secondary
   inference call with a naming prompt (max 20 tokens, temperature 0.3) and
   write the result back as `session.name`. Fully fire-and-forget.
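
The non-critical behaviour of the entity search step can be sketched as follows. This is a minimal illustration, not the service's actual code: `EntitySearchFn`, `searchEntities`, and the option names are hypothetical, and the vector-store call is injected as a function so the sketch does not assume any particular Qdrant client signature. Only the collection name, score threshold, limit, and payload fields come from the step description above.

```typescript
// Hypothetical shapes; only the payload fields (name, type, notes) are documented.
interface EntityPayload {
  name: string;
  type: string;
  notes: string;
}

interface EntityHit {
  score: number;
  payload: EntityPayload;
}

// Assumption: the real vector-store client is adapted to this function shape.
type EntitySearchFn = (
  collection: string,
  query: { vector: number[]; limit: number; scoreThreshold: number },
) => Promise<EntityHit[]>;

const ENTITY_COLLECTION = "entities";
const ENTITY_SCORE_THRESHOLD = 0.6;
const ENTITY_LIMIT = 5;

// Reuses the already-computed user message vector. Any failure is logged and
// swallowed so the pipeline can continue without entity context.
async function searchEntities(
  search: EntitySearchFn,
  messageVector: number[],
): Promise<EntityPayload[]> {
  try {
    const hits = await search(ENTITY_COLLECTION, {
      vector: messageVector,
      limit: ENTITY_LIMIT,
      scoreThreshold: ENTITY_SCORE_THRESHOLD,
    });
    // Payloads are returned as-is; no second lookup in another store.
    return hits.map((hit) => hit.payload);
  } catch (err) {
    console.warn("entity search failed; continuing without entity context", err);
    return [];
  }
}
```

Returning an empty array on failure lets prompt assembly treat "no entities found" and "entity search unavailable" the same way, which is one simple way to keep the step non-critical.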

@@ -95,6 +100,10 @@ difference is how the inference response is delivered to the client.
 ```
 [System prompt]

+Here is what you know about entities relevant to this conversation:
+- {name} ({type}): {notes}
+... (up to 5 entity results)
+---
 Here are some relevant memories from earlier conversations:
 User: {past user message}
 Assistant: {past ai response}
@@ -110,8 +119,9 @@ User: {current message}
 Assistant:
 ```

-Semantic episodes appear before recent episodes so the model sees
-long-range context before the immediate conversation flow.
+Entity context appears directly after the system prompt, ahead of episodic
+memory, because structured facts about known entities are the most stable and
+reliable context. Semantic episodes follow, then recent episodes, which carry
+the immediate conversation flow.
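
The entity block at the top of the prompt layout can be sketched like this. The function name `formatEntityContext` and the `EntityPayload` type are hypothetical, and omitting the block entirely when there are no entities is an assumption; the header text, bullet format, five-result cap, and `---` separator follow the layout shown above.

```typescript
// Hypothetical type mirroring the documented payload fields.
interface EntityPayload {
  name: string;
  type: string;
  notes: string;
}

const ENTITY_HEADER =
  "Here is what you know about entities relevant to this conversation:";
const MAX_ENTITIES_IN_PROMPT = 5;

// Renders the entity section that sits between the system prompt and the
// semantic episodes. Returns an empty string when there is nothing to show
// (assumption: the section is skipped rather than rendered empty).
function formatEntityContext(entities: EntityPayload[]): string {
  if (entities.length === 0) {
    return "";
  }
  const lines = entities
    .slice(0, MAX_ENTITIES_IN_PROMPT)
    .map((e) => `- ${e.name} (${e.type}): ${e.notes}`);
  return [ENTITY_HEADER, ...lines, "---"].join("\n") + "\n";
}
```

Under these assumptions, the returned string would be placed between the system prompt and the semantic episodes section during prompt assembly.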

 ## SSE Stream Format