Enrich embedding pipeline for better semantic search #252

## Context

Semantic search quality is limited by what goes into the embeddings. Today, the entity serializer embeds only the URN, type, name, and description. Properties (column names, tags, owners), attached documents, and freshness signals are all ignored. This means:

- Searching for a column name like "bounce_rate" won't find the table that has it
- Searching for "incident" won't surface entities whose runbooks describe incidents
- A freshly updated entity ranks the same as one untouched for a year

## Scope

### 1. Embed entity properties and tags

The `properties` JSONB field often contains the most useful metadata — column names, schema details, owners, tags, labels. The entity serializer (`core/chunking/serializer.go`) should flatten and include relevant properties in the text sent to the embedding provider.

Example: a BigQuery table entity with `properties: {columns: ["user_id", "session_duration", "bounce_rate"], owner: "analytics-team", tags: ["pii", "tier-1"]}` should produce an embedding that understands "bounce_rate", "analytics-team", and "tier-1".
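
A minimal sketch of what the flattening step could look like, assuming properties arrive as a decoded JSON object. `FlattenProperties`, `renderValue`, the allowlist map, and the 1000-character cap are hypothetical names and values for illustration, not the existing serializer API:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// maxValueLen is an assumed cap; oversized values are skipped rather than truncated.
const maxValueLen = 1000

// FlattenProperties renders allowlisted properties as "key: value" lines
// suitable for appending to the text sent to the embedding provider.
func FlattenProperties(props map[string]any, allow map[string]bool) string {
	keys := make([]string, 0, len(props))
	for k := range props {
		if allow[k] {
			keys = append(keys, k)
		}
	}
	// Sort keys so the same properties always produce the same text.
	sort.Strings(keys)

	var b strings.Builder
	for _, k := range keys {
		v := renderValue(props[k])
		if v == "" || len(v) > maxValueLen {
			continue
		}
		fmt.Fprintf(&b, "%s: %s\n", k, v)
	}
	return b.String()
}

// renderValue handles the types encoding/json produces for scalars and arrays.
// Nested objects and nulls are skipped in this sketch.
func renderValue(v any) string {
	switch t := v.(type) {
	case string:
		return t
	case float64, bool:
		return fmt.Sprint(t)
	case []any:
		parts := make([]string, 0, len(t))
		for _, e := range t {
			if s := renderValue(e); s != "" {
				parts = append(parts, s)
			}
		}
		return strings.Join(parts, ", ")
	default:
		return ""
	}
}

func main() {
	raw := `{"columns":["user_id","session_duration","bounce_rate"],"owner":"analytics-team","tags":["pii","tier-1"],"internal_id":42}`
	var props map[string]any
	if err := json.Unmarshal([]byte(raw), &props); err != nil {
		panic(err)
	}
	allow := map[string]bool{"columns": true, "owner": true, "tags": true}
	fmt.Print(FlattenProperties(props, allow))
	// Prints:
	// columns: user_id, session_duration, bounce_rate
	// owner: analytics-team
	// tags: pii, tier-1
}
```

Sorting the keys keeps the serialized text deterministic, which avoids spurious re-embedding when the property map is iterated in a different order.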

### 2. Cross-embed entity + document content

When a document is attached to an entity, the document's content should enrich the entity's embedding context. If a runbook for `table:user_sessions` mentions "incident", "SLA", and "late-arriving events", searching for those terms should boost that entity in semantic results.

Approach options:
- **At embedding time**: When an entity is embedded, also pull its document content into the embedding context (heavier, richer)
- **At search time**: When semantic search returns document chunks, propagate their scores to the parent entity (lighter, but less precise)
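
The search-time option can be sketched as follows. This is one possible aggregation, not a settled design: it takes the best chunk score per parent entity and adds a weighted share of it to the entity's own score. `ChunkHit`, `PropagateChunkScores`, and the `weight` parameter are hypothetical, and scores are assumed to be non-negative:

```go
package main

import "fmt"

// ChunkHit is a hypothetical document-chunk search result.
type ChunkHit struct {
	EntityURN string  // parent entity the document is attached to
	Score     float64 // similarity score of the chunk, assumed >= 0
}

// PropagateChunkScores returns a new score map in which each entity's score
// is increased by weight times its best-matching document chunk score.
// Entities reached only through a document are added to the result.
func PropagateChunkScores(entityScores map[string]float64, hits []ChunkHit, weight float64) map[string]float64 {
	// Keep only the best chunk per entity so long documents with many
	// mediocre chunks do not outrank a single strong match.
	best := make(map[string]float64)
	for _, h := range hits {
		if h.Score > best[h.EntityURN] {
			best[h.EntityURN] = h.Score
		}
	}

	out := make(map[string]float64, len(entityScores)+len(best))
	for urn, s := range entityScores {
		out[urn] = s
	}
	for urn, s := range best {
		out[urn] += weight * s
	}
	return out
}

func main() {
	entityScores := map[string]float64{
		"table:user_sessions": 0.5,
		"table:orders":        0.25,
	}
	hits := []ChunkHit{
		{EntityURN: "table:user_sessions", Score: 0.5},
		{EntityURN: "table:user_sessions", Score: 0.25},
		{EntityURN: "table:events", Score: 0.25},
	}
	out := PropagateChunkScores(entityScores, hits, 0.5)
	fmt.Printf("%.3f\n", out["table:user_sessions"]) // 0.750
	fmt.Printf("%.3f\n", out["table:orders"])        // 0.250
	fmt.Printf("%.3f\n", out["table:events"])        // 0.125
}
```

Using the maximum rather than the sum is a deliberate choice in this sketch; summing would favor entities with long documents regardless of relevance.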

### 3. Freshness decay in ranking

Add a mild freshness boost to search and context assembly scoring. Entities with a recent `updated_at` get a small multiplier. This is not a popularity signal — it's an objective liveness indicator.

This applies to:
- `SearchEntities` hybrid ranking (RRF score adjustment)
- `AssembleContext` entity scoring (alongside intent weights)
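
As an illustration, the multiplier could be computed like this. The linear shape, the 7-day window, and the 1.2x ceiling are assumptions chosen to match the "gentle" range discussed under Design Considerations; `FreshnessMultiplier` and `ApplyFreshness` are hypothetical names:

```go
package main

import (
	"fmt"
	"time"
)

const (
	// freshnessWindow is the age beyond which no boost applies (assumed: 7 days).
	freshnessWindow = 7 * 24 * time.Hour
	// maxBoost is the extra weight for a just-updated entity (assumed: +20%).
	maxBoost = 0.2
)

// FreshnessMultiplier returns a value in [1.0, 1.0+maxBoost] that decays
// linearly from the maximum at age zero to exactly 1.0 at freshnessWindow.
// Older entities are never penalized below 1.0.
func FreshnessMultiplier(age time.Duration) float64 {
	if age < 0 {
		age = 0 // clock skew: treat future timestamps as "just updated"
	}
	if age >= freshnessWindow {
		return 1.0
	}
	remaining := 1.0 - float64(age)/float64(freshnessWindow)
	return 1.0 + maxBoost*remaining
}

// ApplyFreshness scales a ranking score (for example an RRF score) by the
// freshness multiplier derived from the entity's updated_at timestamp.
func ApplyFreshness(score float64, updatedAt, now time.Time) float64 {
	return score * FreshnessMultiplier(now.Sub(updatedAt))
}

func main() {
	now := time.Date(2025, 1, 8, 0, 0, 0, 0, time.UTC)
	fresh := now.Add(-12 * time.Hour)
	stale := now.AddDate(-1, 0, 0)

	fmt.Printf("fresh: %.4f\n", ApplyFreshness(1.0, fresh, now))
	fmt.Printf("stale: %.4f\n", ApplyFreshness(1.0, stale, now))
}
```

Because the multiplier never drops below 1.0, an entity untouched for a year keeps its original score and can still outrank a fresh but less relevant one.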

## Design Considerations

- Property embedding should be selective — not all JSONB fields are useful. A configurable allowlist or heuristic (e.g., skip fields > 1000 chars) may be needed.
- Cross-embedding creates a dependency: document upsert should trigger re-embedding of the parent entity. The pipeline already handles async re-embedding on entity upsert, so this is an extension of existing behavior.
- Freshness decay should be gentle — a 1.1-1.2x multiplier for entities updated in the last 7 days, not a hard penalty for old entities. Old but relevant entities should still surface.
- All changes should degrade gracefully when embeddings are disabled.

## Related

- #237 — Graph-aware ranking (complementary: centrality scoring alongside richer embeddings)
