updated documentation for entity implementation
This commit is contained in:
@@ -74,6 +74,7 @@ service by ID after the vector search.
|
|||||||
The core four-service architecture is complete and operational. Key capabilities:
|
The core four-service architecture is complete and operational. Key capabilities:
|
||||||
|
|
||||||
- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
|
- **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
|
||||||
|
- **Entity layer** — automatic extraction of named entities from conversations via qwen2.5:3b, stored in SQLite and Qdrant, injected into every prompt as structured knowledge
|
||||||
- **Projects** — sessions grouped with shared or isolated memory pools
|
- **Projects** — sessions grouped with shared or isolated memory pools
|
||||||
- **Auto-naming** — sessions named automatically from first exchange via inference
|
- **Auto-naming** — sessions named automatically from first exchange via inference
|
||||||
- **Project-scoped semantic search** — Qdrant filtered by project session IDs
|
- **Project-scoped semantic search** — Qdrant filtered by project session IDs
|
||||||
|
|||||||
@@ -28,6 +28,8 @@ service to generate and store a vector in Qdrant.
|
|||||||
| SQLITE_PATH | Yes | — | Path to SQLite database file |
|
| SQLITE_PATH | Yes | — | Path to SQLite database file |
|
||||||
| QDRANT_URL | No | http://localhost:6333 | Qdrant instance URL |
|
| QDRANT_URL | No | http://localhost:6333 | Qdrant instance URL |
|
||||||
| EMBEDDING_SERVICE_URL | No | http://localhost:3003 | Embedding service URL |
|
| EMBEDDING_SERVICE_URL | No | http://localhost:3003 | Embedding service URL |
|
||||||
|
| EXTRACTION_URL | No | http://localhost:11434 | Ollama URL for entity extraction |
|
||||||
|
| EXTRACTION_MODEL | No | qwen2.5:3b | Ollama model used for entity extraction |
|
||||||
|
|
||||||
## Internal Structure
|
## Internal Structure
|
||||||
|
|
||||||
@@ -42,7 +44,8 @@ src/
|
|||||||
├── semantic/
|
├── semantic/
|
||||||
│ └── index.js # Qdrant collection management, upsert, search, delete
|
│ └── index.js # Qdrant collection management, upsert, search, delete
|
||||||
├── entities/
|
├── entities/
|
||||||
│ └── index.js # Entity + relationship CRUD
|
│ ├── index.js # Entity + relationship CRUD
|
||||||
|
│ └── extraction.js # Automatic entity extraction via qwen2.5:3b on Ollama
|
||||||
└── index.js # Express app + all route definitions
|
└── index.js # Express app + all route definitions
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -143,6 +146,32 @@ constraints to prevent duplicates:
|
|||||||
- `UNIQUE(from_id, to_id, label)` on relationships
|
- `UNIQUE(from_id, to_id, label)` on relationships
|
||||||
- `ON DELETE CASCADE` on relationship foreign keys
|
- `ON DELETE CASCADE` on relationship foreign keys
|
||||||
|
|
||||||
|
### Automatic Entity Extraction
|
||||||
|
|
||||||
|
After each episode is saved, `extraction.js` automatically extracts named
|
||||||
|
entities from the conversation using `qwen2.5:3b` running on Ollama (Mini PC 1).
|
||||||
|
This runs **fire-and-forget** — the episode is already saved and returned
|
||||||
|
before extraction begins.
|
||||||
|
|
||||||
|
**Entity types extracted:** `person`, `place`, `project`, `technology`,
|
||||||
|
`concept`, `organization`
|
||||||
|
|
||||||
|
The extraction prompt uses ChatML format (native to qwen2.5) and primes the
|
||||||
|
response by ending with `[` to steer the model directly into JSON array output.
|
||||||
|
A list of already-known entities is injected into the prompt so the model
|
||||||
|
reuses existing `(name, type)` pairs rather than creating duplicates with
|
||||||
|
different types.
|
||||||
|
|
||||||
|
After extraction, each entity is:
|
||||||
|
1. Upserted into SQLite via `upsertEntity` — notes are only written if
|
||||||
|
the entity is new (`COALESCE(entities.notes, excluded.notes)` prevents
|
||||||
|
overwriting existing notes with speculative updates)
|
||||||
|
2. Embedded via the embedding service and upserted into the `entities`
|
||||||
|
Qdrant collection with `{ name, type, notes }` as payload
|
||||||
|
|
||||||
|
The Qdrant payload stores enough information to reconstruct entity context
|
||||||
|
at retrieval time without a SQLite roundtrip.
|
||||||
|
|
||||||
## Project Delete Behaviour
|
## Project Delete Behaviour
|
||||||
|
|
||||||
Deleting a project runs as a transaction — it first nulls out `project_id`
|
Deleting a project runs as a transaction — it first nulls out `project_id`
|
||||||
|
|||||||
@@ -76,17 +76,22 @@ difference is how the inference response is delivered to the client.
|
|||||||
recent episodes. Non-critical — if it fails, pipeline continues with
|
recent episodes. Non-critical — if it fails, pipeline continues with
|
||||||
recency-only context.
|
recency-only context.
|
||||||
|
|
||||||
5. **Prompt assembly** — combine system prompt, semantic episodes, recent
|
5. **Entity search** — reuse the embedded user message vector to query the
|
||||||
episodes, and user message.
|
`entities` Qdrant collection (score threshold 0.6, limit 5). Returns
|
||||||
|
entity payloads (`name`, `type`, `notes`) directly — no SQLite roundtrip
|
||||||
|
needed. Non-critical — if it fails, pipeline continues without entity context.
|
||||||
|
|
||||||
6. **Inference** — send to inference service. `/chat` awaits full response;
|
6. **Prompt assembly** — combine system prompt, entity context, semantic
|
||||||
|
episodes, recent episodes, and user message.
|
||||||
|
|
||||||
|
7. **Inference** — send to inference service. `/chat` awaits full response;
|
||||||
`/chat/stream` pipes SSE chunks to the client.
|
`/chat/stream` pipes SSE chunks to the client.
|
||||||
|
|
||||||
7. **Episode write** — write the exchange back to memory. Fire-and-forget
|
8. **Episode write** — write the exchange back to memory. Fire-and-forget
|
||||||
for `/chat`; awaited for `/chat/stream` to ensure the full text is
|
for `/chat`; awaited for `/chat/stream` to ensure the full text is
|
||||||
accumulated before saving.
|
accumulated before saving.
|
||||||
|
|
||||||
8. **Auto-naming** — on `isFirstMessage && !session.name`, fire a secondary
|
9. **Auto-naming** — on `isFirstMessage && !session.name`, fire a secondary
|
||||||
inference call with a naming prompt (max 20 tokens, temperature 0.3) and
|
inference call with a naming prompt (max 20 tokens, temperature 0.3) and
|
||||||
write the result back as `session.name`. Fully fire-and-forget.
|
write the result back as `session.name`. Fully fire-and-forget.
|
||||||
|
|
||||||
@@ -95,6 +100,10 @@ difference is how the inference response is delivered to the client.
|
|||||||
```
|
```
|
||||||
[System prompt]
|
[System prompt]
|
||||||
|
|
||||||
|
Here is what you know about entities relevant to this conversation:
|
||||||
|
- {name} ({type}): {notes}
|
||||||
|
... (up to 5 entity results)
|
||||||
|
---
|
||||||
Here are some relevant memories from earlier conversations:
|
Here are some relevant memories from earlier conversations:
|
||||||
User: {past user message}
|
User: {past user message}
|
||||||
Assistant: {past ai response}
|
Assistant: {past ai response}
|
||||||
@@ -110,8 +119,9 @@ User: {current message}
|
|||||||
Assistant:
|
Assistant:
|
||||||
```
|
```
|
||||||
|
|
||||||
Semantic episodes appear before recent episodes so the model sees
|
Entity context appears first — before episodic memory — because structured
|
||||||
long-range context before the immediate conversation flow.
|
facts about known entities are the most stable and reliable context. Semantic
|
||||||
|
episodes follow, then recent episodes as the immediate conversation flow.
|
||||||
|
|
||||||
## SSE Stream Format
|
## SSE Stream Format
|
||||||
|
|
||||||
|
|||||||
Reference in New Issue
Block a user