diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md
index 25e54aa..f7d3d89 100644
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -74,6 +74,7 @@ service by ID after the vector search.
 The core four-service architecture is complete and operational. Key capabilities:
 
 - **Hybrid memory retrieval** — recent episodes + semantic search combined into every prompt
+- **Entity layer** — automatic extraction of named entities from conversations via qwen2.5:3b, stored in SQLite and Qdrant, injected into every prompt as structured knowledge
 - **Projects** — sessions grouped with shared or isolated memory pools
 - **Auto-naming** — sessions named automatically from first exchange via inference
 - **Project-scoped semantic search** — Qdrant filtered by project session IDs
diff --git a/docs/services/memory-service.md b/docs/services/memory-service.md
index ccbc5ab..bf88ebd 100644
--- a/docs/services/memory-service.md
+++ b/docs/services/memory-service.md
@@ -28,6 +28,8 @@ service to generate and store a vector in Qdrant.
 | SQLITE_PATH | Yes | — | Path to SQLite database file |
 | QDRANT_URL | No | http://localhost:6333 | Qdrant instance URL |
 | EMBEDDING_SERVICE_URL | No | http://localhost:3003 | Embedding service URL |
+| EXTRACTION_URL | No | http://localhost:11434 | Ollama URL for entity extraction |
+| EXTRACTION_MODEL | No | qwen2.5:3b | Ollama model used for entity extraction |
 
 ## Internal Structure
 
@@ -42,7 +44,8 @@ src/
 ├── semantic/
 │   └── index.js       # Qdrant collection management, upsert, search, delete
 ├── entities/
-│   └── index.js       # Entity + relationship CRUD
+│   ├── index.js       # Entity + relationship CRUD
+│   └── extraction.js  # Automatic entity extraction via qwen2.5:3b on Ollama
 └── index.js           # Express app + all route definitions
 ```
 
@@ -143,6 +146,32 @@ constraints to prevent duplicates:
 - `UNIQUE(from_id, to_id, label)` on relationships
 - `ON DELETE CASCADE` on relationship foreign keys
 
+### Automatic Entity Extraction
+
+After each episode is saved, `extraction.js` automatically extracts named
+entities from the conversation using `qwen2.5:3b` running on Ollama (Mini PC 1).
+This runs **fire-and-forget** — the episode is already saved and returned
+before extraction begins.
+
+**Entity types extracted:** `person`, `place`, `project`, `technology`,
+`concept`, `organization`
+
+The extraction prompt uses ChatML format (native to qwen2.5) and primes the
+response by ending with `[` to steer the model directly into JSON array output.
+A list of already-known entities is injected into the prompt so the model
+reuses existing `(name, type)` pairs rather than creating duplicates with
+different types.
+
+After extraction, each entity is:
+1. Upserted into SQLite via `upsertEntity` — notes are only written if
+   the entity is new (`COALESCE(entities.notes, excluded.notes)` prevents
+   overwriting existing notes with speculative updates)
+2. Embedded via the embedding service and upserted into the `entities`
+   Qdrant collection with `{ name, type, notes }` as payload
+
+The Qdrant payload stores enough information to reconstruct entity context
+at retrieval time without a SQLite roundtrip.
+
 ## Project Delete Behaviour
 
 Deleting a project runs as a transaction — it first nulls out `project_id`
diff --git a/docs/services/orchestration-service.md b/docs/services/orchestration-service.md
index 3807e7e..56a4f98 100644
--- a/docs/services/orchestration-service.md
+++ b/docs/services/orchestration-service.md
@@ -76,17 +76,22 @@ difference is how the inference response is delivered to the client.
    recent episodes. Non-critical — if it fails, pipeline continues with
    recency-only context.
 
-5. **Prompt assembly** — combine system prompt, semantic episodes, recent
-   episodes, and user message.
+5. **Entity search** — reuse the embedded user message vector to query the
+   `entities` Qdrant collection (score threshold 0.6, limit 5). Returns
+   entity payloads (`name`, `type`, `notes`) directly — no SQLite roundtrip
+   needed. Non-critical — if it fails, pipeline continues without entity context.
 
-6. **Inference** — send to inference service. `/chat` awaits full response;
+6. **Prompt assembly** — combine system prompt, entity context, semantic
+   episodes, recent episodes, and user message.
+
+7. **Inference** — send to inference service. `/chat` awaits full response;
    `/chat/stream` pipes SSE chunks to the client.
 
-7. **Episode write** — write the exchange back to memory. Fire-and-forget
+8. **Episode write** — write the exchange back to memory. Fire-and-forget
    for `/chat`; awaited for `/chat/stream` to ensure the full text is
    accumulated before saving.
 
-8. **Auto-naming** — on `isFirstMessage && !session.name`, fire a secondary
+9. **Auto-naming** — on `isFirstMessage && !session.name`, fire a secondary
    inference call with a naming prompt (max 20 tokens, temperature 0.3) and
    write the result back as `session.name`. Fully fire-and-forget.
 
@@ -95,6 +100,10 @@ difference is how the inference response is delivered to the client.
 ```
 [System prompt]
 
+Here is what you know about entities relevant to this conversation:
+- {name} ({type}): {notes}
+... (up to 5 entity results)
+---
 Here are some relevant memories from earlier conversations:
 User: {past user message}
 Assistant: {past ai response}
@@ -110,8 +119,9 @@ User: {current message}
 Assistant:
 ```
 
-Semantic episodes appear before recent episodes so the model sees
-long-range context before the immediate conversation flow.
+Entity context appears first — before episodic memory — because structured
+facts about known entities are the most stable and reliable context. Semantic
+episodes follow, then recent episodes as the immediate conversation flow.
 
 ## SSE Stream Format