Settings

Several aspects of the Whisper pipeline can be customized from the settings on the subsystem.

Setting	Default	Description
SmartTurnEnabled	false	Set to true to turn on endpoint detection. During realtime audio processing, this attempts to detect whether the user is done speaking and waits for the speaker to finish their phrase.
NoiseSuppressionEnabled	false	Set to true to isolate the user's voice from background noise. This can improve voice encoding accuracy in noisy environments, but may decrease transcription quality, especially with smaller models.
AttenuationLimit	0	The intensity of noise suppression. Higher values are more aggressive; 0 is treated as the maximum.
VADThreshold	0.5	The threshold for detecting voice activity. The default of 0.5 is sufficient for most use cases.
EndpointThreshold	0.5	The threshold for detecting end of speech when using Smart Turn.
SegmentCheckSize	60	The number of samples checked in the interrupting voice comparison. Lower values make interrupts more responsive but may reduce accuracy.
InterruptThreshold	0.75	The similarity threshold for interrupting voice comparison.
InterruptDelay	1	The time in seconds to wait after an interrupt is detected before processing audio again. This prevents potential spillover audio.
StopChunks	24	The number of audio chunks to check for voice activity. Lower values shorten the delay but may cut off speech early.
SimilarityThreshold	0.7	Threshold used for voice encoding similarity checks.
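
The table does not spell out how VADThreshold and StopChunks interact. One plausible reading is that speech is considered finished once StopChunks consecutive chunks score below VADThreshold. The sketch below illustrates that reading only. The EndOfSpeechDetector class, its push method, and the per-chunk voice probabilities are hypothetical names introduced for illustration and are not part of the subsystem's API.

```python
from dataclasses import dataclass, field


@dataclass
class EndOfSpeechDetector:
    """Hypothetical helper illustrating one reading of VADThreshold and StopChunks."""

    vad_threshold: float = 0.5  # mirrors the VADThreshold default
    stop_chunks: int = 24       # mirrors the StopChunks default
    _silent_run: int = field(default=0, init=False)
    _heard_speech: bool = field(default=False, init=False)

    def push(self, voice_probability: float) -> bool:
        """Feed one chunk's voice probability; return True once speech has ended."""
        if voice_probability >= self.vad_threshold:
            # Voice detected: remember it and restart the silence count.
            self._heard_speech = True
            self._silent_run = 0
            return False
        if not self._heard_speech:
            # Silence before any speech does not count toward the end of speech.
            return False
        self._silent_run += 1
        return self._silent_run >= self.stop_chunks


# Assumed per-chunk probabilities; stop_chunks is lowered to keep the example short.
detector = EndOfSpeechDetector(stop_chunks=3)
probabilities = [0.9, 0.8, 0.2, 0.1, 0.7, 0.1, 0.2, 0.3]
results = [detector.push(p) for p in probabilities]
ended_at = results.index(True)  # index of the chunk that closes the utterance
```

Under this reading, the brief pause at chunks 2 and 3 does not end the utterance because voice returns at chunk 4. Lowering stop_chunks would end it sooner, at the risk of cutting off a speaker who pauses mid-phrase, which matches the tradeoff described for StopChunks.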