MLE-5017: docs: add VAD/turn_detection params to realtime transcription endpoint #243
Document the Voice Activity Detection configuration for the /realtime WebSocket endpoint:

- Add transcription_session.updated client event with turn_detection schema
- Document all 5 client-settable VAD parameters with production defaults
- Document how to disable VAD (turn_detection: null or query param none)
- Document query parameter configuration at connection time
- Document VAD on/off behavior (auto completed events vs manual commit)
- Add transcription_session.updated server confirmation event

MLE-5017

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
✱ Stainless preview builds
This PR will update the go openapi python terraform typescript
Summary
Documents the Voice Activity Detection (VAD) configuration for the /realtime WebSocket transcription endpoint. This was previously completely undocumented despite being actively used by customers.

What's added
New client event:
- transcription_session.updated: configure VAD parameters or disable VAD entirely

VAD parameters table (5 client-settable params with production defaults):
- threshold (default 0.3)
- min_silence_duration_ms (default 500)
- min_speech_duration_ms (default 250)
- max_speech_duration_s (default 5.0)
- speech_pad_ms (default 250)

VAD disable/enable:
- turn_detection: null in session message to disable
- turn_detection=none query parameter to disable at connection time
- VAD on = automatic completed events; VAD off = manual commit required

New server event:
- transcription_session.updated: confirms VAD config was applied

Query parameter support:
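For reviewers, a minimal sketch of disabling VAD at connection time. Only the /realtime path and the turn_detection=none query parameter come from this PR; the host api.example.com and the helper name realtimeUrl are hypothetical placeholders, and no other query parameters are shown because none are named here.

```typescript
// Hypothetical host; only the /realtime path and turn_detection=none
// are taken from this PR description.
const BASE_URL = "wss://api.example.com/realtime";

// Builds the WebSocket URL, optionally disabling VAD at connection time.
function realtimeUrl(disableVad: boolean, base: string = BASE_URL): string {
  const url = new URL(base);
  if (disableVad) {
    // Per this PR, VAD off means the client must send a manual commit.
    url.searchParams.set("turn_detection", "none");
  }
  return url.toString();
}
```

Calling realtimeUrl(true) appends the turn_detection=none parameter, while realtimeUrl(false) leaves the URL untouched so the server-side defaults apply.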
Defaults verified against
Production defaults from inference-pop/src/v1/realtime.ts (lines 403-414), NOT the vad-service Python fallbacks, which differ.

🤖 Generated with Claude Code
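To make the documented defaults concrete, here is a hedged sketch of how a client might build the transcription_session.updated message. The five parameter names and default values are the ones listed in this PR. The message envelope (a session object wrapping turn_detection), the choice to send all five values explicitly, and the names TurnDetection, PRODUCTION_DEFAULTS, and buildSessionUpdate are assumptions for illustration, not the confirmed schema.

```typescript
// The five client-settable VAD parameters named in this PR.
interface TurnDetection {
  threshold: number;
  min_silence_duration_ms: number;
  min_speech_duration_ms: number;
  max_speech_duration_s: number;
  speech_pad_ms: number;
}

// Production defaults as listed in this PR description.
const PRODUCTION_DEFAULTS: TurnDetection = {
  threshold: 0.3,
  min_silence_duration_ms: 500,
  min_speech_duration_ms: 250,
  max_speech_duration_s: 5.0,
  speech_pad_ms: 250,
};

// Assumed envelope: the exact nesting of turn_detection is not shown in this PR.
interface SessionUpdate {
  type: "transcription_session.updated";
  session: { turn_detection: TurnDetection | null };
}

// Pass null to disable VAD; pass a partial object to override selected defaults.
function buildSessionUpdate(
  overrides: Partial<TurnDetection> | null = {},
): SessionUpdate {
  const turn_detection =
    overrides === null ? null : { ...PRODUCTION_DEFAULTS, ...overrides };
  return { type: "transcription_session.updated", session: { turn_detection } };
}
```

A client would serialize the result with JSON.stringify and send it over the open WebSocket, for example buildSessionUpdate({ min_silence_duration_ms: 800 }) to wait longer before ending a turn, or buildSessionUpdate(null) to switch to manual commit.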