Settings

Several aspects of the Whisper pipeline can be customized through the settings on the subsystem.

| Setting | Default | Description |
| --- | --- | --- |
| SmartTurnEnabled | false | Set to true to turn on endpoint detection. During realtime audio processing, this attempts to detect whether the user is done speaking and waits for the speaker to finish their phrase. |
| NoiseSuppressionEnabled | false | Set to true to isolate the user's voice from background noise. This can improve voice encoding accuracy in noisy environments, but may decrease transcription quality, especially with smaller models. |
| AttenuationLimit | 0 | The intensity of noise suppression. Higher values are more aggressive, with 0 being the maximum. |
| VADThreshold | 0.5 | The threshold for detecting voice activity. The default of 0.5 is sufficient for the majority of use cases. |
| EndpointThreshold | 0.5 | The threshold for detecting end of speech when using Smart Turn. |
| SegmentCheckSize | 60 | The number of samples to check in an interrupting voice comparison. Lower values make interrupts more responsive, but may be less accurate. |
| InterruptThreshold | 0.75 | The similarity threshold for the interrupting voice comparison. |
| InterruptDelay | 1 | The time in seconds to wait after an interrupt is detected before processing audio again. This prevents potential spillover audio. |
| StopChunks | 24 | The number of audio chunks to check for voice activity. Lower values shorten the delay, but may cut off speech early. |
| SimilarityThreshold | 0.7 | The threshold used for voice encoding similarity checks. |
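
To give a feel for the trade-off behind `VADThreshold` and `StopChunks`, here is a minimal Python sketch of one way such settings could interact. It is an illustration only, not the plugin's implementation: the `VadSettings` class, the `find_end_of_speech` function, and the rule "stop after `StopChunks` consecutive chunks below the threshold" are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VadSettings:
    """Hypothetical container mirroring two settings from the table."""
    vad_threshold: float = 0.5  # mirrors VADThreshold
    stop_chunks: int = 24       # mirrors StopChunks


def find_end_of_speech(
    chunk_probs: Sequence[float],
    settings: Optional[VadSettings] = None,
) -> Optional[int]:
    """Return the index of the chunk at which speech is treated as finished.

    Assumed rule: once speech has been heard, stop after `stop_chunks`
    consecutive chunks whose voice probability is below `vad_threshold`.
    Returns None if that condition is never met.
    """
    settings = settings or VadSettings()
    heard_speech = False
    silent_run = 0
    for index, prob in enumerate(chunk_probs):
        if prob >= settings.vad_threshold:
            # Voice detected: reset the count of silent chunks.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= settings.stop_chunks:
                return index
    return None


# Five chunks of speech followed by thirty chunks of silence.
probs = [0.9] * 5 + [0.1] * 30
default_stop = find_end_of_speech(probs)
fast_stop = find_end_of_speech(probs, VadSettings(stop_chunks=8))
```

Under these assumptions, a smaller `stop_chunks` value ends the utterance sooner, which matches the table's note that lowering `StopChunks` shortens the delay at the risk of cutting off a speaker who pauses mid-phrase.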