Add model selector to freebuff with per-model queues #524
Lets users pick between glm-5.1 and minimax-m2.7 from the waiting room. Each model has its own FIFO queue so wait times scale independently. Selection persists locally; switching mid-queue moves you to the back of the new queue, and switching mid-session is blocked. Adds a /queue command to end the current session and rejoin (allowing a model switch).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
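The queue semantics described above can be illustrated with a small in-memory sketch. It is only a model of the behaviour, not the PR's implementation (which stores sessions in a database); the `WaitingRoom` class and its method names are hypothetical.

```typescript
// Hypothetical in-memory model of per-model FIFO queues.
// The real PR persists sessions in a database table; this only mirrors the rules.
type ModelId = 'glm-5.1' | 'minimax-m2.7'
type JoinResult = 'queued' | 'active' | 'model_locked'

const MODEL_IDS: ModelId[] = ['glm-5.1', 'minimax-m2.7']

class WaitingRoom {
  private queues: Record<ModelId, string[]> = { 'glm-5.1': [], 'minimax-m2.7': [] }
  private active: { [userId: string]: ModelId | undefined } = {}

  join(userId: string, model: ModelId): JoinResult {
    const activeModel = this.active[userId]
    if (activeModel !== undefined) {
      // Switching mid-session is blocked.
      return activeModel === model ? 'active' : 'model_locked'
    }
    for (const m of MODEL_IDS) {
      const queue = this.queues[m]
      const index = queue.indexOf(userId)
      if (index === -1) continue
      if (m === model) return 'queued' // same model: keep the current place
      queue.splice(index, 1) // different model: leave the old queue
    }
    this.queues[model].push(userId) // join at the back of the chosen queue
    return 'queued'
  }

  // 1-based position in the model's queue, or 0 if the user is not queued there.
  position(userId: string, model: ModelId): number {
    return this.queues[model].indexOf(userId) + 1
  }

  // Admit the user at the head of one model's queue, if any.
  admitNext(model: ModelId): string | undefined {
    const next = this.queues[model].shift()
    if (next !== undefined) this.active[next] = model
    return next
  }

  // Roughly what /queue does first: end the session so the user can rejoin.
  endSession(userId: string): void {
    delete this.active[userId]
  }
}
```

In this sketch a queued user who picks the other model is appended to that model's queue, while an admitted user gets `model_locked` until `endSession` is called.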
Greptile Summary
This PR adds a per-model queue system to the freebuff waiting room, letting users choose between GLM 5.1 ( Key changes:
Three issues found:
Confidence Score: 4/5
Safe to merge with one targeted fix: the The core queue and session mechanics are solid: the single-UPSERT pattern for
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A([CLI Starts]) --> B[POST /freebuff/session\nwith x-freebuff-model header]
B --> C{Server response}
C -->|queued| D[WaitingRoomScreen\nshows position + ModelSelector]
C -->|active| E[Chat session active]
C -->|model_locked 409| F[model_locked state\n⚠ No UI branch in WaitingRoomScreen]
C -->|country_blocked| G[Terminal: blocked screen]
C -->|disabled| H[Pass-through: no gate]
D --> I{User picks\ndifferent model}
I -->|switchFreebuffModel| J[Persist to disk\nRe-POST to server]
J --> C
D --> K{Admission tick\nadmits user}
K -->|active| E
E --> L{User types /queue}
L --> M[DELETE session\nReset chat history\nRe-POST]
M --> D
E --> N{checkSessionAdmissible\non each chat request}
N -->|session_model_mismatch| O[409 to chat endpoint]
N -->|ok| P[Chat request proceeds]
const modelLockId = FREEBUFF_ADMISSION_LOCK_ID + hashStringToInt32(model)
const lockResult = await tx.execute<{ acquired: unknown }>(
Advisory lock ID overflow risk
modelLockId is computed as:
const modelLockId = FREEBUFF_ADMISSION_LOCK_ID + hashStringToInt32(model)
hashStringToInt32 can return values up to 0x3FFFFFFF (~1.07 billion). If FREEBUFF_ADMISSION_LOCK_ID is itself close to 2^31 - 1 (2,147,483,647, the max signed int4), the sum could overflow that bound. While pg_try_advisory_xact_lock accepts a bigint, the value is produced by JavaScript number addition; if the result exceeds Number.MAX_SAFE_INTEGER (unlikely here, but possible with very large base IDs), it could lose precision.
Consider capping or masking the result to keep it within a known safe range:
const modelLockId = (FREEBUFF_ADMISSION_LOCK_ID + hashStringToInt32(model)) | 0
This forces a signed 32-bit result (via bitwise OR), matching PostgreSQL's int4 domain for the two-argument form, and avoids the risk entirely.
This is a low-probability issue with the current base ID (likely small), but worth hardening before additional models are added.
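A minimal sketch of the masking suggested in this comment follows. The PR's real `hashStringToInt32` is not shown on this page, so the version below is a hypothetical stand-in that only shares the stated property of returning at most 0x3FFFFFFF; `modelLockId` as a function and the `INT32_MIN` / `INT32_MAX` constants are likewise illustrative.

```typescript
// Hypothetical stand-in for the PR's hashStringToInt32 (real implementation not shown).
// Returns a non-negative integer no larger than 0x3fffffff.
function hashStringToInt32(s: string): number {
  let h = 0
  for (let i = 0; i < s.length; i++) {
    // `| 0` keeps the running hash inside the signed 32-bit range.
    h = (h * 31 + s.charCodeAt(i)) | 0
  }
  return h & 0x3fffffff
}

const INT32_MAX = 0x7fffffff
const INT32_MIN = -0x80000000

// Derive a per-model lock ID and force it into the signed 32-bit range.
// Bitwise OR with 0 applies ToInt32, so an out-of-range sum wraps around.
function modelLockId(baseLockId: number, model: string): number {
  return (baseLockId + hashStringToInt32(model)) | 0
}

function fitsInt32(n: number): boolean {
  return Number.isInteger(n) && n >= INT32_MIN && n <= INT32_MAX
}
```

With a small base ID the mask is a no-op. With a base near `INT32_MAX` the sum wraps to a negative value instead of leaving the int4 range; wrapping keeps the value in range but does not by itself rule out two models mapping to the same ID.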
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…_locked

- Add end-session to FREEBUFF_ONLY_COMMANDS so non-freebuff users can't invoke it (it would have shown a confusing "returning to waiting room" message with no underlying state to act on).
- When the waiting room receives model_locked from a switch attempt that raced with admission, silently revert the local model selection to the active session's model and re-tick. Previously polling halted and the screen had no render branch, leaving the UI blank.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The admission tick now iterates per registered model, so tests that asserted admitted: 1 received 2 (one per model). Default the test helper to a single model so the existing assertions stay crisp and won't drift as more production models are added.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Drop the dead x-freebuff-model header on GET: the server only reads it on POST, and tick() always POSTs first, so GET-before-POST never happens.
- Derive FREEBUFF_MODEL_OVERRIDABLE_AGENT_IDS from the server's FREE_MODE_AGENT_MODELS (agents whose allowlist includes every freebuff model) so adding a new model doesn't require updating two lists.
- Extract shouldReleaseSlot(): the DELETE-eligibility predicate was inlined in two places.
- Probe Fireworks once per admission tick instead of N times (N = number of models). Adds a TODO for when we add a non-Fireworks model.
- Tighten the model-selector key handler to /^[1-9]$/ so "1abc" isn't treated as 1.
- Make FREEBUFF_MODELS a literal tuple so isFreebuffModelId narrows to the actual id union instead of plain string.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
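The last two bullets can be sketched as follows. The shape of `FREEBUFF_MODELS` (objects with `id` and `displayName`) and the helper name `parseSelectorKey` are assumptions made for illustration; only the regex, the tuple idea, and the `isFreebuffModelId` name come from the commit message.

```typescript
// Assumed shape: the commit only says FREEBUFF_MODELS becomes a literal tuple.
const FREEBUFF_MODELS = [
  { id: 'glm-5.1', displayName: 'GLM 5.1' },
  { id: 'minimax-m2.7', displayName: 'MiniMax M2.7' },
] as const

// With `as const`, this is the union 'glm-5.1' | 'minimax-m2.7' rather than string.
type FreebuffModelId = (typeof FREEBUFF_MODELS)[number]['id']

// Type guard: narrows an arbitrary string to the id union.
function isFreebuffModelId(value: string): value is FreebuffModelId {
  return FREEBUFF_MODELS.some((m) => m.id === value)
}

// Hypothetical helper: accept only a single digit 1-9 as a selector key.
// A lenient parse such as parseInt('1abc', 10) would return 1 here.
function parseSelectorKey(key: string): number | null {
  return /^[1-9]$/.test(key) ? Number(key) : null
}
```

The anchored regex rejects multi-character input outright, whereas `parseInt` stops at the first non-digit and still reports a number.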
Replace the fleet-wide worst-of collapse with a per-model map. One probe per tick still covers every deployment (Fireworks returns them in a single response), but each model's admission now uses its own deployment's verdict, so a degraded minimax no longer blocks glm. Models absent from FIREWORKS_DEPLOYMENT_MAP (serverless) default to 'healthy'; TODO for when they move to dedicated deployments.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
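A rough sketch of the per-model health map described in this commit is below. The `Health` states other than 'healthy', the contents of `FIREWORKS_DEPLOYMENT_MAP`, the deployment IDs, the `serverless-example` model, and the `healthByModel` function are all hypothetical; treating a mapped deployment that is missing from the probe as 'unhealthy' is also an assumption of this sketch, not something the commit states.

```typescript
// Assumed set of verdicts; the commit only names 'healthy' and mentions "degraded".
type Health = 'healthy' | 'degraded' | 'unhealthy'

// Hypothetical contents: model id -> dedicated deployment id.
const FIREWORKS_DEPLOYMENT_MAP: { [model: string]: string | undefined } = {
  'glm-5.1': 'deployments/example-glm',
  'minimax-m2.7': 'deployments/example-minimax',
}

// Turn one probe result (deployment id -> verdict) into a per-model map.
function healthByModel(
  models: readonly string[],
  probe: { [deployment: string]: Health | undefined },
): { [model: string]: Health } {
  const result: { [model: string]: Health } = {}
  for (const model of models) {
    const deployment = FIREWORKS_DEPLOYMENT_MAP[model]
    if (deployment === undefined) {
      // Serverless models have no dedicated deployment: default to healthy.
      result[model] = 'healthy'
    } else {
      // Assumption of this sketch: a mapped deployment missing from the
      // probe response is treated as unhealthy.
      result[model] = probe[deployment] ?? 'unhealthy'
    }
  }
  return result
}
```

Because each model reads only its own deployment's verdict, a degraded verdict for one deployment leaves the other model's entry unchanged.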
- Update the switchFreebuffModel docstring to match the silent auto-revert the tick loop actually does (not the prompt-to-end-first flow it used to describe).
- Remove the unused `disabled` prop on FreebuffModelSelector; the only caller never set it, so the component's internal `pending` state is the only disable path.
- Refresh docs/freebuff-waiting-room.md for per-model queues: mermaid, schema (with the `model` column from migration 0044), admission loop (per-model advisory locks, getFleetHealth), tunables, POST semantics, model_locked response shape, and the CLI / multi-pod sections.
- Group endAndRejoinFreebuffSession with the other ../hooks/* imports in command-registry.ts.
Prefixes the idle session readout with the model's display name (e.g. "GLM 5.1 · Free session · 48m left") so admitted users always see which model is answering, without spending any vertical space.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Server returns a queueDepthByModel snapshot with every queued session response via a single GROUP BY read, so each row of the model selector can show a live "3 ahead" / "No wait" hint instead of the static tagline.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
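As a sketch of how such a snapshot might be produced and rendered: the `countQueuedByModel` and `queueHint` helpers and the `QueuedRow` type are hypothetical, and the in-memory count merely stands in for the single GROUP BY read the commit mentions.

```typescript
// Hypothetical row shape; the real table has more columns.
interface QueuedRow {
  userId: string
  model: string
}

type QueueDepthByModel = { [model: string]: number | undefined }

// In-memory stand-in for the GROUP BY read: count queued rows per model.
function countQueuedByModel(rows: readonly QueuedRow[]): QueueDepthByModel {
  const depths: QueueDepthByModel = {}
  for (const row of rows) {
    depths[row.model] = (depths[row.model] ?? 0) + 1
  }
  return depths
}

// Render the selector hint for one model from the snapshot.
function queueHint(depths: QueueDepthByModel, model: string): string {
  const ahead = depths[model] ?? 0
  return ahead > 0 ? `${ahead} ahead` : 'No wait'
}
```

A model with no queued rows is simply absent from the snapshot, so the hint falls back to "No wait".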
GLM 5.1 → "Smartest", MiniMax M2.7 → "Fastest". Single-word labels read better next to the live "N ahead" hint than a full sentence.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
model_locked); new /queue command ends your session and rejoins so you can switch

Implementation notes
- `model` column to free-session row + per-model advisory locks for queue admission
- `joinOrTakeOver` switches model on a queued row (resets `queued_at`), throws `FreeSessionModelLockedError` if mid-session model differs
- `useFreebuffModelStore` (zustand, persisted), `FreebuffModelSelector` component in waiting room, `/queue` slash command (aliases: `rejoin`, `switch`)
- `loadAgentDefinitions` overrides `model` on `base2-free`, `editor-lite`, `code-reviewer-lite` from the selected model when `IS_FREEBUFF`

Test plan
- `bun --filter='*' run typecheck` (passes locally)
- `bun --filter='*' test` for `web` and `cli`
- `/queue`: verify session ends, returns to waiting room
- `0044_violet_stingray.sql` in staging

🤖 Generated with Claude Code