simulstream.server.speech_processors.SpeechProcessor

class simulstream.server.speech_processors.SpeechProcessor(config: SimpleNamespace)

Bases: ABC

Abstract base class for speech processors.

Subclasses must implement methods to load models, process audio chunks, set source/target languages, and clear internal states.
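As a rough illustration of what implementing this interface involves, the sketch below defines a toy subclass and drives it through one stream. To stay self-contained it does not import simulstream: `SpeechProcessor` and `IncrementalOutput` here are stand-ins that only mirror the method names documented on this page, and the `new_tokens`/`deleted_tokens` fields, the `WordCountProcessor` class, and the driver loop are all assumptions made for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import List


@dataclass
class IncrementalOutput:
    """Hypothetical stand-in; the real class's fields may differ."""
    new_tokens: List[str] = field(default_factory=list)
    deleted_tokens: List[str] = field(default_factory=list)


class SpeechProcessor(ABC):
    """Stand-in mirroring the documented interface (not the real base class)."""

    def __init__(self, config: SimpleNamespace):
        self.config = config

    @classmethod
    @abstractmethod
    def load_model(cls, config: SimpleNamespace): ...

    @abstractmethod
    def process_chunk(self, waveform) -> IncrementalOutput: ...

    @abstractmethod
    def end_of_stream(self) -> IncrementalOutput: ...

    @abstractmethod
    def set_source_language(self, language: str) -> None: ...

    @abstractmethod
    def set_target_language(self, language: str) -> None: ...

    @abstractmethod
    def clear(self) -> None: ...

    @abstractmethod
    def tokens_to_string(self, tokens: List[str]) -> str: ...


class WordCountProcessor(SpeechProcessor):
    """Toy subclass: emits one placeholder token per chunk, no real model."""

    def __init__(self, config: SimpleNamespace):
        super().__init__(config)
        self.chunks_seen = 0

    @classmethod
    def load_model(cls, config: SimpleNamespace):
        return None  # a real subclass would load model weights here

    def process_chunk(self, waveform) -> IncrementalOutput:
        self.chunks_seen += 1
        return IncrementalOutput(new_tokens=[f"<chunk{self.chunks_seen}>"])

    def end_of_stream(self) -> IncrementalOutput:
        return IncrementalOutput(new_tokens=["<eos>"])

    def set_source_language(self, language: str) -> None:
        self.source_language = language

    def set_target_language(self, language: str) -> None:
        self.target_language = language

    def clear(self) -> None:
        self.chunks_seen = 0  # forget everything about the previous stream

    def tokens_to_string(self, tokens: List[str]) -> str:
        return " ".join(tokens)


# Hypothetical driver: reset, feed two chunks, then close the stream.
processor = WordCountProcessor(SimpleNamespace())
processor.clear()
tokens: List[str] = []
for chunk in ([0.0] * 4, [0.1] * 4):
    tokens += processor.process_chunk(chunk).new_tokens
tokens += processor.end_of_stream().new_tokens
text = processor.tokens_to_string(tokens)
```

Because every documented method is abstract in the stand-in, leaving any of them out of the subclass makes instantiation fail with a `TypeError`, which is the usual behavior of classes derived from `ABC`.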

__init__(config: SimpleNamespace)

Initialize the speech processor with a given configuration.

Parameters: config (SimpleNamespace) – Configuration loaded from a YAML file.
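For readers unfamiliar with `SimpleNamespace`, the following sketch shows one way a dictionary parsed from YAML (for example, the result of `yaml.safe_load`) could be turned into attribute-style configuration. The `to_namespace` helper and every key in `raw` are hypothetical; how simulstream itself builds the configuration object is not described on this page.

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively convert dicts (and dicts inside lists) to SimpleNamespace."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    if isinstance(value, list):
        return [to_namespace(v) for v in value]
    return value


# Stand-in for the dictionary a YAML parser would return; keys are made up.
raw = {
    "speech_chunk_size": 1.0,
    "model": {"name": "toy-model", "beam_size": 4},
}

config = to_namespace(raw)
beam_size = config.model.beam_size  # nested keys become attributes
```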

Methods

`__init__`(config)	Initialize the speech processor with a given configuration.
`clear`()	Clear internal states, such as history of cached audio and/or tokens, in preparation for a new stream or conversation.
`end_of_stream`()	This method is called at the end of audio chunk processing.
`load_model`(config)	Load and initialize the underlying speech model.
`process_chunk`(waveform)	Process a chunk of waveform and produce incremental output.
`set_source_language`(language)	Set the source language for the speech processor.
`set_target_language`(language)	Set the target language for the speech processor (for translation).
`tokens_to_string`(tokens)	Converts token sequences into human-readable strings.

Attributes

`speech_chunk_size`	Return the size of the speech chunks to be processed (in seconds).

abstractmethod clear() → None

Clear internal states, such as history of cached audio and/or tokens, in preparation for a new stream or conversation.

abstractmethod end_of_stream() → IncrementalOutput

This method is called after the last audio chunk has been processed. It can be used to emit any remaining hypotheses and conclude the output.

abstractmethod classmethod load_model(config: SimpleNamespace)

Load and initialize the underlying speech model.

Parameters: config (SimpleNamespace) – Configuration of the speech processor.

abstractmethod process_chunk(waveform: float32) → IncrementalOutput

Process a chunk of waveform and produce incremental output.

Parameters: waveform (np.float32) – A 1D NumPy array of float32 samples containing the audio chunk. The audio is PCM, normalized to the range [-1.0, 1.0], and sampled at simulstream.server.speech_processors.SAMPLE_RATE.
Returns: The incremental output (new and deleted tokens/strings).
Return type: IncrementalOutput
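The waveform format described above can be produced from raw 16-bit PCM bytes with a few lines of NumPy. This is only a sketch: the `pcm16_to_float32` helper is hypothetical, it assumes little-endian signed 16-bit input, and it does not resample, so the audio is assumed to already be at `SAMPLE_RATE`.

```python
import numpy as np


def pcm16_to_float32(pcm_bytes: bytes) -> np.ndarray:
    """Decode little-endian int16 PCM into a 1D float32 array in [-1.0, 1.0)."""
    samples = np.frombuffer(pcm_bytes, dtype="<i2")
    # Dividing by 32768 maps the int16 range [-32768, 32767] into [-1.0, 1.0).
    return samples.astype(np.float32) / 32768.0


# Four example samples: silence, half scale, and the two int16 extremes.
pcm = np.array([0, 16384, -32768, 32767], dtype="<i2").tobytes()
waveform = pcm16_to_float32(pcm)
```

The resulting array is what a `process_chunk` implementation would receive as its `waveform` argument.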

abstractmethod set_source_language(language: str) → None

Set the source language for the speech processor.

abstractmethod set_target_language(language: str) → None

Set the target language for the speech processor (for translation).

property speech_chunk_size: float

Return the size of the speech chunks to be processed (in seconds).
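Since the chunk size is expressed in seconds, a caller typically needs to convert it to a number of samples. The sketch below shows that arithmetic under stated assumptions: the `SAMPLE_RATE` value of 16000 is chosen for illustration only (the real constant lives in `simulstream.server.speech_processors`), and `split_into_chunks` is a hypothetical helper, not part of the library.

```python
SAMPLE_RATE = 16_000  # assumed value, for illustration only


def split_into_chunks(samples, speech_chunk_size, sample_rate=SAMPLE_RATE):
    """Split a sample sequence into pieces of speech_chunk_size seconds each."""
    chunk_len = int(round(speech_chunk_size * sample_rate))
    if chunk_len <= 0:
        raise ValueError("speech_chunk_size must cover at least one sample")
    # The final piece may be shorter than chunk_len.
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


audio = [0.0] * 40_000  # 2.5 seconds at the assumed sample rate
chunks = split_into_chunks(audio, speech_chunk_size=1.0)
chunk_lengths = [len(c) for c in chunks]
```

With these assumed values, 2.5 seconds of audio yields two full one-second chunks and one half-second remainder.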

abstractmethod tokens_to_string(tokens: List[str]) → str

Converts token sequences into human-readable strings.
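What this conversion looks like depends entirely on the tokenizer of the underlying model. As one possible shape, the sketch below assumes SentencePiece-style subword tokens, where the character "▁" (U+2581) marks the start of a word; this tokenization scheme is an assumption for the example, not something this page specifies.

```python
from typing import List

WORD_BOUNDARY = "\u2581"  # SentencePiece-style word-start marker (assumed)


def tokens_to_string(tokens: List[str]) -> str:
    """Join subword tokens and turn word-start markers into spaces."""
    return "".join(tokens).replace(WORD_BOUNDARY, " ").strip()


example = tokens_to_string(["\u2581Hello", ",", "\u2581wor", "ld"])
```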