simulstream.server.speech_processors

Functions

build_speech_processor(speech_processor_config)

Instantiate a SpeechProcessor subclass based on configuration.

class_load(class_string)

speech_processor_class_load(...)

Import the speech processor class from its string definition.

Classes

SpeechProcessor(config)

Abstract base class for speech processors.

class simulstream.server.speech_processors.SpeechProcessor(config: SimpleNamespace)

Abstract base class for speech processors.

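Subclasses must implement the abstract methods below:
- load_model() to load the model.
- process_chunk() and end_of_stream() to produce incremental output.
- set_source_language() and set_target_language() to configure languages.
- tokens_to_string() to render tokens as text.
- clear() to reset internal state between streams.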
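A toy subclass can make this contract concrete. The real class cannot be imported in a self-contained example, so the sketch below defines a local stand-in base, SpeechProcessorStandIn, and a stand-in IncrementalOutput dataclass. Both of these, their field names, and the toy ChunkCounterProcessor are hypothetical. The sketch also assumes that the constructor calls load_model, which this page does not specify.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any, List, Optional


@dataclass
class IncrementalOutput:
    # Hypothetical stand-in: the real IncrementalOutput's fields are not documented here.
    new_tokens: List[str] = field(default_factory=list)
    deleted_tokens: List[str] = field(default_factory=list)


class SpeechProcessorStandIn(ABC):
    """Local stand-in mirroring the abstract interface documented on this page."""

    def __init__(self, config: SimpleNamespace):
        self.config = config

    @classmethod
    @abstractmethod
    def load_model(cls, config: SimpleNamespace) -> Any: ...

    @abstractmethod
    def process_chunk(self, waveform) -> IncrementalOutput: ...

    @abstractmethod
    def end_of_stream(self) -> IncrementalOutput: ...

    @abstractmethod
    def set_source_language(self, language: str) -> None: ...

    @abstractmethod
    def set_target_language(self, language: str) -> None: ...

    @abstractmethod
    def tokens_to_string(self, tokens: List[str]) -> str: ...

    @abstractmethod
    def clear(self) -> None: ...


class ChunkCounterProcessor(SpeechProcessorStandIn):
    """Toy processor: emits one token per chunk and holds no real model."""

    def __init__(self, config: SimpleNamespace):
        super().__init__(config)
        self.model = self.load_model(config)  # assumption: model loaded at construction
        self.source_language: Optional[str] = None
        self.target_language: Optional[str] = None
        self.clear()

    @classmethod
    def load_model(cls, config: SimpleNamespace) -> Any:
        return None  # a real processor would load model weights here

    def process_chunk(self, waveform) -> IncrementalOutput:
        self.samples_seen += len(waveform)  # works for lists and NumPy arrays
        return IncrementalOutput(new_tokens=[f"<{self.samples_seen}>"])

    def end_of_stream(self) -> IncrementalOutput:
        return IncrementalOutput(new_tokens=["</s>"])

    def set_source_language(self, language: str) -> None:
        self.source_language = language

    def set_target_language(self, language: str) -> None:
        self.target_language = language

    def tokens_to_string(self, tokens: List[str]) -> str:
        return " ".join(tokens)

    def clear(self) -> None:
        self.samples_seen = 0  # per-stream state only; languages are kept
```

Because the stand-in base uses abc.ABC, instantiating any class that leaves one of these methods unimplemented fails with TypeError. This is the same guarantee the abstract methods on this page are meant to provide.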

abstractmethod clear() → None

Clear internal states, such as history of cached audio and/or tokens, in preparation for a new stream or conversation.

abstractmethod end_of_stream() → IncrementalOutput

Called after the last audio chunk of a stream has been processed. Use it to emit any remaining hypotheses and finalize the output for that stream.

Returns:

The incremental output (new and deleted tokens/strings).

Return type:

IncrementalOutput
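The call order implied by process_chunk, end_of_stream, and clear can be sketched as a driver loop. The run_stream function and the StreamingProcessor protocol are hypothetical. The actual server loop may differ, for instance in when it calls clear().

```python
from typing import Any, Iterable, List, Protocol


class StreamingProcessor(Protocol):
    # Structural type covering only the methods this driver calls.
    def process_chunk(self, waveform: Any) -> Any: ...
    def end_of_stream(self) -> Any: ...
    def clear(self) -> None: ...


def run_stream(processor: StreamingProcessor, chunks: Iterable[Any]) -> List[Any]:
    """Feed every chunk, flush with end_of_stream(), and always reset state."""
    outputs: List[Any] = []
    try:
        for chunk in chunks:
            outputs.append(processor.process_chunk(chunk))
        outputs.append(processor.end_of_stream())
    finally:
        processor.clear()  # leave the processor ready for the next stream
    return outputs
```

Putting clear() in a finally block means that a stream aborted by an exception still leaves no stale history behind for the next one.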

abstractmethod classmethod load_model(config: SimpleNamespace)

Load and initialize the underlying speech model.

Parameters:

config (SimpleNamespace) – Configuration of the speech processor.
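Because load_model is a classmethod, it can keep state on the class. One possible pattern caches one model per checkpoint, so that several processor instances (for example, one per client stream) share the same weights. The following pieces are all hypothetical:
- the checkpoint config field;
- the _load_weights helper;
- returning the model from load_model (this page documents no return value).

```python
from types import SimpleNamespace
from typing import Any, Dict


class CachedModelProcessor:
    """Hypothetical pattern: share one loaded model per checkpoint across instances."""

    _models: Dict[str, Any] = {}  # class-level cache, keyed by checkpoint path

    @classmethod
    def load_model(cls, config: SimpleNamespace) -> Any:
        key = config.checkpoint  # hypothetical config field
        if key not in cls._models:
            cls._models[key] = cls._load_weights(key)
        return cls._models[key]

    @staticmethod
    def _load_weights(path: str) -> Any:
        # Placeholder for the expensive step, such as reading a checkpoint from disk.
        return {"checkpoint": path}
```

This cache is not thread-safe. If streams are set up concurrently, guard it with a lock.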

abstractmethod process_chunk(waveform: float32) → IncrementalOutput

Process a chunk of waveform and produce incremental output.

Parameters:

waveform (np.float32) – A 1D NumPy array of float32 samples holding the audio chunk. The samples are PCM audio normalized to the range [-1.0, 1.0] and sampled at simulstream.server.speech_processors.SAMPLE_RATE.

Returns:

The incremental output (new and deleted tokens/strings).

Return type:

IncrementalOutput
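Audio often arrives as raw 16-bit PCM. Below is a minimal sketch of converting it to the waveform format process_chunk expects. It assumes little-endian, signed 16-bit, mono input that is already at SAMPLE_RATE; resampling is not shown. The pcm16_to_float32 helper is hypothetical, not part of simulstream.

```python
import numpy as np


def pcm16_to_float32(pcm_bytes: bytes) -> np.ndarray:
    """Convert little-endian signed 16-bit mono PCM bytes to a 1D float32 array.

    Dividing by 32768 maps the int16 range [-32768, 32767] into [-1.0, 1.0).
    """
    samples = np.frombuffer(pcm_bytes, dtype="<i2")
    return samples.astype(np.float32) / np.float32(32768.0)
```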

abstractmethod set_source_language(language: str) → None

Set the source language for the speech processor.

Parameters:

language (str) – Language code (e.g., "en", "it").

abstractmethod set_target_language(language: str) → None

Set the target language for the speech processor (for translation).

Parameters:

language (str) – Language code (e.g., "en", "it").

property speech_chunk_size: float

Return the size of the speech chunks to be processed (in seconds).
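A client feeding a processor might cut the waveform into pieces that match speech_chunk_size. The sketch below assumes the value is used this way and takes the sample rate as a parameter; split_into_chunks is hypothetical. For example, at 16 kHz a 0.5 s chunk is 0.5 × 16000 = 8000 samples.

```python
from typing import List, Sequence


def split_into_chunks(samples: Sequence[float], chunk_size_s: float,
                      sample_rate: int) -> List[Sequence[float]]:
    """Split samples into consecutive chunks of chunk_size_s seconds.

    The final chunk may be shorter when the audio length is not a multiple
    of the chunk length.
    """
    chunk_len = round(chunk_size_s * sample_rate)  # seconds -> samples
    if chunk_len <= 0:
        raise ValueError("chunk size must cover at least one sample")
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]
```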

abstractmethod tokens_to_string(tokens: List[str]) → str

Convert a sequence of tokens into a human-readable string.

Parameters:

tokens (List[str]) – The tokens to convert.

Returns:

The textual representation of the tokens.

Return type:

str
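How tokens map back to text depends on the tokenizer. The sketch below assumes SentencePiece-style pieces, where "▁" (U+2581) marks the start of a word; sentencepiece_tokens_to_string is hypothetical. Real processors may delegate this step to their tokenizer instead. Hugging Face tokenizers, for instance, provide convert_tokens_to_string.

```python
from typing import List

SP_WORD_BOUNDARY = "\u2581"  # SentencePiece marker "▁" that precedes a new word


def sentencepiece_tokens_to_string(tokens: List[str]) -> str:
    """Concatenate pieces and turn word-boundary markers into spaces."""
    return "".join(tokens).replace(SP_WORD_BOUNDARY, " ").strip()
```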

simulstream.server.speech_processors.build_speech_processor(speech_processor_config: SimpleNamespace) → SpeechProcessor

Instantiate a SpeechProcessor subclass based on configuration.

The configuration should specify the fully-qualified class name in the type field (e.g. "simulstream.server.speech_processors.MyProcessor").

Parameters:

speech_processor_config (SimpleNamespace) – Configuration for the speech processor.

Returns:

An instance of the configured speech processor.

Return type:

SpeechProcessor

Raises:

AssertionError – If the specified class is not a subclass of SpeechProcessor.
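build_speech_processor expects a SimpleNamespace. If the settings are stored as JSON, json.loads with an object_hook turns every JSON object into a SimpleNamespace. The load_config helper, the my_package.processors.MyProcessor class, and the model field are hypothetical placeholders.

```python
import json
from types import SimpleNamespace


def load_config(text: str) -> SimpleNamespace:
    """Parse JSON so that every object (including nested ones) becomes a SimpleNamespace."""
    return json.loads(text, object_hook=lambda d: SimpleNamespace(**d))


config = load_config("""
{
  "type": "my_package.processors.MyProcessor",
  "model": {"checkpoint": "model.pt"}
}
""")
print(config.type)              # my_package.processors.MyProcessor
print(config.model.checkpoint)  # model.pt
# With simulstream installed and the class importable:
# processor = build_speech_processor(config)
```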

simulstream.server.speech_processors.speech_processor_class_load(speech_processor_class_string: str) → type[SpeechProcessor]

Import the speech processor class from its string definition.

Parameters:

speech_processor_class_string (str) – Fully-qualified name of the speech processor class to load (e.g. "simulstream.server.speech_processors.MyProcessor").

Returns:

The loaded speech processor class (a class object, not an instance).

Return type:

type[SpeechProcessor]

Raises:

AssertionError – If the specified class is not a subclass of SpeechProcessor.
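Loading a class from a dotted string is typically done with importlib. The function below is a generic sketch of the behavior documented above, not simulstream's own code. The load_subclass function and its base parameter are hypothetical. The example uses standard-library classes so that it runs without simulstream.

```python
import importlib
from collections import OrderedDict


def load_subclass(class_string: str, base: type) -> type:
    """Import "package.module.ClassName" and check that it subclasses base."""
    module_name, _, class_name = class_string.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a fully-qualified name, got {class_string!r}")
    cls = getattr(importlib.import_module(module_name), class_name)
    # Mirrors the documented AssertionError for classes of the wrong type.
    assert isinstance(cls, type) and issubclass(cls, base), (
        f"{class_string} is not a subclass of {base.__name__}"
    )
    return cls


print(load_subclass("collections.OrderedDict", dict) is OrderedDict)  # True
```

Python skips assert statements when it runs with -O. Code that must always validate its input may prefer to raise TypeError explicitly.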