simulstream.server.speech_processors.sliding_window_retranslation.SlidingWindowRetranslator

class simulstream.server.speech_processors.sliding_window_retranslation.SlidingWindowRetranslator(config: SimpleNamespace)

Bases: BaseSpeechProcessor

A speech processor that applies a fixed-length sliding window retranslation with deduplication to mitigate overlapping outputs when processing unsegmented audio streams.

This class implements the algorithm introduced in:

S. Sen, et al. 2025. “Simultaneous Translation for Unsegmented Input: A Sliding Window Approach” (https://arxiv.org/pdf/2210.09754)

The approach relies on detecting the longest common subsequence between the current window and the previous one, in order to prevent repeating tokens caused by overlapping audio windows.
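The matching step described above can be illustrated with a small, self-contained sketch. It is not the library's implementation: `lcs_pairs` and `new_suffix` are hypothetical helper names, and treating "everything after the last matched token" as the new output is an assumption made for illustration only.

```python
from typing import List, Tuple


def lcs_pairs(prev: List[str], curr: List[str]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) with prev[i] == curr[j] forming one LCS."""
    n, m = len(prev), len(curr)
    # table[i][j] holds the LCS length of prev[i:] and curr[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if prev[i] == curr[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to recover the matched positions.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if prev[i] == curr[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def new_suffix(prev: List[str], curr: List[str]) -> List[str]:
    """Tokens of curr that follow the last token matched against prev (assumed rule)."""
    pairs = lcs_pairs(prev, curr)
    if not pairs:
        return list(curr)
    return curr[pairs[-1][1] + 1:]


previous = ["the", "cat", "sat", "on"]
current = ["cat", "sat", "on", "the", "mat"]
print(new_suffix(previous, current))  # ['the', 'mat']
```

Here the overlap "cat sat on" is recognized as already emitted, so only the trailing tokens of the current window would be treated as new.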

Parameters:

config (SimpleNamespace) –

Configuration object. The following attributes are expected:

window_len (int): Length of the sliding window (in seconds).
matching_threshold (float, optional): Minimum fraction of the current tokens that must match the previous history to be considered aligned. Default = 0.1.
override_on_failed_match (bool, optional): If True, the previous history is deleted from the output when no sufficient match is found. Otherwise, previous history is kept and the new output is appended to the end of the previous history. Default = False.
max_tokens_per_second (int, optional): Maximum output tokens allowed per second of audio. Default = 10.
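The attributes listed above could be assembled as in the following sketch. The values are illustrative, and the helpers `is_aligned` and `on_failed_match` are hypothetical: they only restate the documented semantics of `matching_threshold` and `override_on_failed_match`. Whether the real comparison is inclusive (`>=`) and how the token budget is computed are assumptions.

```python
from types import SimpleNamespace
from typing import List, Tuple

# Only window_len is required; the other values restate the documented defaults.
config = SimpleNamespace(
    window_len=10,
    matching_threshold=0.1,
    override_on_failed_match=False,
    max_tokens_per_second=10,
)

# Assumed interpretation: token budget per window = seconds * tokens per second.
token_budget = config.window_len * config.max_tokens_per_second


def is_aligned(num_matched: int, num_current: int, threshold: float) -> bool:
    """True when enough of the current tokens match the previous history."""
    if num_current == 0:
        return False
    return num_matched / num_current >= threshold  # inclusive bound is an assumption


def on_failed_match(
    history: List[str], current: List[str], override: bool
) -> Tuple[List[str], List[str]]:
    """Return (deleted_tokens, new_tokens) when no sufficient match is found."""
    if override:
        # The previous history is deleted and replaced by the new output.
        return list(history), list(current)
    # The previous history is kept and the new output is appended after it.
    return [], list(current)
```

With the default `override_on_failed_match = False`, a failed match never retracts tokens that were already emitted, at the cost of possible repetitions in the output.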

__init__(config: SimpleNamespace)

Initialize the speech processor with a given configuration.

Parameters: config (SimpleNamespace) – Configuration loaded from a YAML file.

Methods

`__init__`(config)	Initialize the speech processor with a given configuration.
`clear`()	Clear internal states, such as history of cached audio and/or tokens, in preparation for a new stream or conversation.
`end_of_stream`()	This method is called at the end of audio chunk processing.
`load_model`(config)	Load and initialize the underlying speech model.
`process_chunk`(waveform)	Process a chunk of waveform and produce incremental output.
`set_source_language`(language)	Set the source language for the speech processor.
`set_target_language`(language)	Set the target language for the speech processor (for translation).
`tokens_to_string`(tokens)	Converts token sequences into human-readable strings.

Attributes

speech_chunk_size

Return the size of the speech chunks to be processed (in seconds).

clear() → None: Clear internal states, such as history of cached audio and/or tokens, in preparation for a new stream or conversation.

end_of_stream() → IncrementalOutput

This method is called at the end of audio chunk processing. It can be used to emit hypotheses at the end of the speech to conclude the output.

Returns: The incremental output (new and deleted tokens/strings).
Return type: IncrementalOutput
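The call order implied by `process_chunk`, `end_of_stream`, and `clear` can be sketched with a stand-in object. `StubProcessor` and `run_stream` are hypothetical and exist only for this illustration: the stub returns plain lists of strings rather than an `IncrementalOutput`, and its behavior of holding tokens back until the end of the stream is an assumption, not the retranslator's actual policy.

```python
from typing import List


class StubProcessor:
    """Hypothetical stand-in exposing the method names documented on this page."""

    def __init__(self) -> None:
        self.pending: List[str] = []

    def process_chunk(self, waveform: List[float]) -> List[str]:
        # Pretend each chunk yields one token that is held back as unstable.
        self.pending.append(f"tok{len(self.pending)}")
        return []

    def end_of_stream(self) -> List[str]:
        # Flush whatever was held back while audio was still arriving.
        flushed, self.pending = self.pending, []
        return flushed

    def clear(self) -> None:
        # Reset state so the same object can serve a new stream.
        self.pending = []


def run_stream(processor: StubProcessor, chunks: List[List[float]]) -> List[str]:
    """Feed every chunk, conclude the output, then reset the processor."""
    outputs: List[str] = []
    for chunk in chunks:
        outputs.extend(processor.process_chunk(chunk))
    outputs.extend(processor.end_of_stream())
    processor.clear()
    return outputs
```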

abstractmethod classmethod load_model(config: SimpleNamespace)

Load and initialize the underlying speech model.

Parameters: config (SimpleNamespace) – Configuration of the speech processor.

process_chunk(waveform: float32) → IncrementalOutput

Process a chunk of waveform and produce incremental output.

Parameters: waveform (np.float32) – A 1D NumPy array holding the audio chunk. The array contains PCM audio normalized to the range [-1.0, 1.0] and sampled at simulstream.server.speech_processors.SAMPLE_RATE.
Returns: The incremental output (new and deleted tokens/strings).
Return type: IncrementalOutput
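A caller typically has to bring audio into the documented input format before invoking `process_chunk`. The sketch below shows one way to do that with NumPy. The helper names are hypothetical, and the value of `SAMPLE_RATE` is an assumption made for illustration; real code should import the constant from `simulstream.server.speech_processors` instead.

```python
from typing import List

import numpy as np

SAMPLE_RATE = 16_000  # assumed value, for illustration only


def pcm16_to_float32(pcm: np.ndarray) -> np.ndarray:
    """Scale 16-bit PCM samples to float32 values within [-1.0, 1.0]."""
    return pcm.astype(np.float32) / 32768.0


def split_into_chunks(waveform: np.ndarray, chunk_seconds: float) -> List[np.ndarray]:
    """Cut a 1D waveform into consecutive chunks of chunk_seconds each."""
    samples_per_chunk = int(round(chunk_seconds * SAMPLE_RATE))
    return [
        waveform[start:start + samples_per_chunk]
        for start in range(0, len(waveform), samples_per_chunk)
    ]


# One second of synthetic 16-bit audio, split into 0.4 s chunks.
raw = np.linspace(-32768, 32767, SAMPLE_RATE).astype(np.int16)
chunks = split_into_chunks(pcm16_to_float32(raw), chunk_seconds=0.4)
```

The last chunk may be shorter than the others when the stream length is not a multiple of the chunk size.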

abstractmethod set_source_language(language: str) → None

Set the source language for the speech processor.

Parameters: language (str) – Language code (e.g., "en", "it").

abstractmethod set_target_language(language: str) → None

Set the target language for the speech processor (for translation).

Parameters: language (str) – Language code (e.g., "en", "it").

property speech_chunk_size: float

Return the size of the speech chunks to be processed (in seconds).

abstractmethod tokens_to_string(tokens: List[str]) → str

Converts token sequences into human-readable strings.

Returns: The textual representation of the tokens.
Return type: str
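Because `tokens_to_string` is abstract, its behavior depends on the tokenizer of the underlying model. As a purely hypothetical illustration, a subclass whose model emits SentencePiece-style pieces (where U+2581 marks the start of a word) might detokenize as follows. The function name and the tokenization scheme are assumptions, not part of this class.

```python
from typing import List

WORD_START = "\u2581"  # SentencePiece-style word-boundary marker (assumed scheme)


def sentencepiece_tokens_to_string(tokens: List[str]) -> str:
    """Join subword pieces and turn word-boundary markers into spaces."""
    return "".join(tokens).replace(WORD_START, " ").strip()


print(sentencepiece_tokens_to_string(["\u2581Hello", "\u2581wor", "ld"]))  # Hello world
```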