simulstream.server.speech_processors.base.BaseSpeechProcessor

class simulstream.server.speech_processors.base.BaseSpeechProcessor(config: SimpleNamespace)

Bases: SpeechProcessor

A partial implementation of SpeechProcessor that provides common logic for handling incremental speech-to-text processing.

This class defines the high-level workflow of processing an incoming audio chunk (preprocessing, generation, updating history, building outputs), while leaving the model-specific details to subclasses.

Subclasses must implement the abstract helper methods to define how audio is preprocessed, tokens are generated, and histories are updated.
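The workflow described above (preprocess, generate, update history, build output) follows a template-method pattern. The following is a minimal, self-contained sketch of that pattern and does not use the real simulstream classes: `WorkflowOutput`, `WorkflowSketch`, `CountingProcessor`, and the helper names `_preprocess`, `_generate`, `_update_history`, and `_build_output` are all hypothetical, and building the output from a prefix comparison is an assumption made for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkflowOutput:
    """Stand-in for IncrementalOutput; the field names are assumptions."""
    new_tokens: List[str] = field(default_factory=list)
    deleted_tokens: List[str] = field(default_factory=list)


class WorkflowSketch(ABC):
    """Template-method skeleton; NOT the real BaseSpeechProcessor."""

    def __init__(self) -> None:
        self.emitted: List[str] = []  # tokens already returned to the caller

    def process_chunk(self, waveform: List[float]) -> WorkflowOutput:
        features = self._preprocess(waveform)        # 1. preprocessing
        hypothesis = self._generate(features)        # 2. generation
        self._update_history(features, hypothesis)   # 3. history update
        return self._build_output(hypothesis)        # 4. output building

    def _build_output(self, hypothesis: List[str]) -> WorkflowOutput:
        # Keep the longest common prefix with what was already emitted;
        # everything after it is reported as deleted and replaced.
        common = 0
        for old, new in zip(self.emitted, hypothesis):
            if old != new:
                break
            common += 1
        output = WorkflowOutput(
            new_tokens=hypothesis[common:],
            deleted_tokens=self.emitted[common:],
        )
        self.emitted = list(hypothesis)
        return output

    @abstractmethod
    def _preprocess(self, waveform: List[float]) -> List[float]: ...

    @abstractmethod
    def _generate(self, features: List[float]) -> List[str]: ...

    @abstractmethod
    def _update_history(self, features: List[float], hypothesis: List[str]) -> None: ...


class CountingProcessor(WorkflowSketch):
    """Toy subclass: emits one fake token per two samples of cached audio."""

    def __init__(self) -> None:
        super().__init__()
        self.audio: List[float] = []

    def _preprocess(self, waveform: List[float]) -> List[float]:
        return self.audio + list(waveform)

    def _generate(self, features: List[float]) -> List[str]:
        return [f"tok{i}" for i in range(len(features) // 2)]

    def _update_history(self, features: List[float], hypothesis: List[str]) -> None:
        self.audio = features


processor = CountingProcessor()
first = processor.process_chunk([0.0, 0.1])
second = processor.process_chunk([0.2, 0.3, 0.4])
```

In this sketch the base class owns the control flow, so a subclass only supplies the model-specific steps.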

__init__(config: SimpleNamespace)

Initialize the speech processor with a given configuration.

Parameters: config (SimpleNamespace) – Configuration loaded from a YAML file.
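As a rough illustration of what such a configuration object looks like, the sketch below wraps a plain dictionary (shaped like the result of parsing a YAML file) in a `SimpleNamespace`. The keys `model_name` and `chunk_seconds` are hypothetical and are not taken from the simulstream documentation.

```python
from types import SimpleNamespace

# Dictionary shaped like parsed YAML; both keys are hypothetical examples.
raw_config = {"model_name": "my-model", "chunk_seconds": 1.0}

# SimpleNamespace exposes the dictionary keys as attributes.
config = SimpleNamespace(**raw_config)
print(config.model_name, config.chunk_seconds)
```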

Methods

`__init__`(config)	Initialize the speech processor with a given configuration.
`clear`()	Clear internal states, such as history of cached audio and/or tokens, in preparation for a new stream or conversation.
`end_of_stream`()	Called after the last audio chunk of a stream has been processed.
`load_model`(config)	Load and initialize the underlying speech model.
`process_chunk`(waveform)	Process a chunk of waveform and produce incremental output.
`set_source_language`(language)	Set the source language for the speech processor.
`set_target_language`(language)	Set the target language for the speech processor (for translation).
`tokens_to_string`(tokens)	Converts token sequences into human-readable strings.

Attributes

speech_chunk_size

Return the size of the speech chunks to be processed (in seconds).

clear() → None: Clear internal states, such as history of cached audio and/or tokens, in preparation for a new stream or conversation.

abstractmethod end_of_stream() → IncrementalOutput

Called after the last audio chunk of a stream has been processed. It can be used to emit any remaining hypotheses and conclude the output.

Returns: The incremental output (new and deleted tokens/strings).
Return type: IncrementalOutput
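To make "new and deleted tokens" concrete, here is a hedged sketch of how a consumer might apply one incremental output to a running transcript. `OutputSketch` and `apply_output` are hypothetical stand-ins, and the assumption that deleted tokens always form the tail of the transcript is ours, not something the documentation states.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OutputSketch:
    """Stand-in for IncrementalOutput; the field names are assumptions."""
    new_tokens: List[str] = field(default_factory=list)
    deleted_tokens: List[str] = field(default_factory=list)


def apply_output(transcript: List[str], output: OutputSketch) -> List[str]:
    """Drop the retracted tail of the transcript, then append the new tokens."""
    n_deleted = len(output.deleted_tokens)
    if n_deleted:
        if transcript[-n_deleted:] != output.deleted_tokens:
            raise ValueError("deleted tokens do not match the transcript tail")
        transcript = transcript[:-n_deleted]
    return transcript + output.new_tokens


# The processor retracts "word" and replaces it with "world", "!".
updated = apply_output(
    ["hello", "word"],
    OutputSketch(new_tokens=["world", "!"], deleted_tokens=["word"]),
)
```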

abstractmethod classmethod load_model(config: SimpleNamespace)

Load and initialize the underlying speech model.

Parameters: config (SimpleNamespace) – Configuration of the speech processor.

process_chunk(waveform: float32) → IncrementalOutput

Process a chunk of waveform and produce incremental output.

Parameters: waveform (np.float32) – A 1D NumPy array of float32 samples holding the audio chunk: PCM audio normalized to the range [-1.0, 1.0] and sampled at simulstream.server.speech_processors.SAMPLE_RATE.
Returns: The incremental output (new and deleted tokens/strings).
Return type: IncrementalOutput
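Audio often arrives as 16-bit integer PCM, so it has to be converted before it matches the format described above. The helper below is a sketch of one common conversion; `pcm16_to_float32` is a hypothetical name, and dividing by 32768 is a conventional scaling choice rather than something prescribed by simulstream. Resampling to the expected sample rate is not covered here.

```python
import numpy as np


def pcm16_to_float32(pcm: np.ndarray) -> np.ndarray:
    """Scale 16-bit integer PCM samples to float32 values within [-1.0, 1.0]."""
    if pcm.dtype != np.int16:
        raise TypeError("expected int16 samples")
    if pcm.ndim != 1:
        raise ValueError("expected a 1D (mono) array")
    # int16 spans [-32768, 32767], so dividing by 32768 stays within [-1.0, 1.0).
    return pcm.astype(np.float32) / 32768.0


samples = np.array([0, 16384, -32768, 32767], dtype=np.int16)
waveform = pcm16_to_float32(samples)
```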

abstractmethod set_source_language(language: str) → None

Set the source language for the speech processor.

Parameters: language (str) – Language code (e.g., "en", "it").

abstractmethod set_target_language(language: str) → None

Set the target language for the speech processor (for translation).

Parameters: language (str) – Language code (e.g., "en", "it").

property speech_chunk_size: float

Return the size of the speech chunks to be processed (in seconds).
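Because the chunk size is expressed in seconds, a caller needs the sample rate to turn it into a number of samples. The sketch below shows that arithmetic; `ASSUMED_SAMPLE_RATE`, `samples_per_chunk`, and `chunk_bounds` are hypothetical, and 16000 Hz is only a placeholder for the real value of simulstream.server.speech_processors.SAMPLE_RATE.

```python
from typing import List, Tuple

# Placeholder value; read the real rate from SAMPLE_RATE in simulstream.
ASSUMED_SAMPLE_RATE = 16_000


def samples_per_chunk(speech_chunk_size: float,
                      sample_rate: int = ASSUMED_SAMPLE_RATE) -> int:
    """Convert a chunk duration in seconds into a number of samples."""
    return round(speech_chunk_size * sample_rate)


def chunk_bounds(total_samples: int, chunk_samples: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices for consecutive chunks.

    The last chunk may be shorter than the others.
    """
    return [
        (start, min(start + chunk_samples, total_samples))
        for start in range(0, total_samples, chunk_samples)
    ]


chunk_len = samples_per_chunk(0.5)
bounds = chunk_bounds(20_000, chunk_len)
```

Each `(start, end)` pair can then be used to slice a waveform array before passing the slice to `process_chunk`.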

abstractmethod tokens_to_string(tokens: List[str]) → str

Converts token sequences into human-readable strings.

Returns: The textual representation of the tokens.
Return type: str
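How tokens map back to text depends on the model's tokenizer, which is why this method is abstract. As one possible illustration, the sketch below assumes SentencePiece-style tokens, where "▁" (U+2581) marks the start of a word; `sentencepiece_tokens_to_string` is a hypothetical helper, not part of simulstream.

```python
from typing import List

WORD_BOUNDARY = "\u2581"  # "▁", the SentencePiece word-boundary marker


def sentencepiece_tokens_to_string(tokens: List[str]) -> str:
    """Join subword tokens and turn word-boundary markers into spaces."""
    return "".join(tokens).replace(WORD_BOUNDARY, " ").strip()


text = sentencepiece_tokens_to_string(["\u2581Hello", "\u2581wor", "ld", "!"])
```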