simulstream.server.speech_processors.base.BaseSpeechProcessor
- class simulstream.server.speech_processors.base.BaseSpeechProcessor(config: SimpleNamespace)
Bases: SpeechProcessor
A partial implementation of SpeechProcessor that provides common logic for handling incremental speech-to-text processing. This class defines the high-level workflow of processing an incoming audio chunk (preprocessing, generation, updating history, building outputs), while leaving the model-specific details to subclasses.
Subclasses must implement the abstract helper methods to define how audio is preprocessed, tokens are generated, and histories are updated.
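As a rough illustration of that workflow, the sketch below mirrors the four steps (preprocessing, generation, updating history, building outputs) with a toy base class and one subclass. Every name in it (ToyBaseProcessor, ToyOutput, _preprocess, _generate, _update_history) is hypothetical and chosen for the example; this page does not list the abstract helper methods that BaseSpeechProcessor actually declares.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyOutput:
    """Hypothetical stand-in for IncrementalOutput."""
    new_tokens: List[str] = field(default_factory=list)


class ToyBaseProcessor(ABC):
    """Hypothetical base class: fixes the workflow, defers the details."""

    def __init__(self) -> None:
        self.audio_history: List[float] = []  # cached audio samples
        self.token_history: List[str] = []    # tokens produced so far

    def process_chunk(self, waveform: List[float]) -> ToyOutput:
        features = self._preprocess(waveform)        # 1. preprocessing
        tokens = self._generate(features)            # 2. generation
        already_emitted = len(self.token_history)
        self._update_history(waveform, tokens)       # 3. updating history
        # 4. building outputs: report only tokens not emitted before
        return ToyOutput(new_tokens=tokens[already_emitted:])

    @abstractmethod
    def _preprocess(self, waveform: List[float]) -> List[float]: ...

    @abstractmethod
    def _generate(self, features: List[float]) -> List[str]: ...

    @abstractmethod
    def _update_history(self, waveform: List[float], tokens: List[str]) -> None: ...


class WordPerSampleProcessor(ToyBaseProcessor):
    """Toy subclass that 'recognizes' one word per audio sample."""

    def _preprocess(self, waveform):
        # Prepend cached audio so generation sees the whole stream so far.
        return self.audio_history + list(waveform)

    def _generate(self, features):
        return [f"w{i}" for i in range(len(features))]

    def _update_history(self, waveform, tokens):
        self.audio_history.extend(waveform)
        self.token_history = tokens


processor = WordPerSampleProcessor()
first = processor.process_chunk([0.1, 0.2])
second = processor.process_chunk([0.3])
```

In this toy setup the second call only reports the token produced by the newly arrived sample, which is the incremental behaviour the class description refers to.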
- __init__(config: SimpleNamespace)
Initialize the speech processor with a given configuration.
- Parameters:
config (SimpleNamespace) – Configuration loaded from a YAML file.
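For readers unfamiliar with SimpleNamespace-based configuration, here is a minimal sketch of turning a parsed YAML mapping into nested namespaces. The to_namespace helper and all configuration keys are assumptions made for illustration; how simulstream itself builds the namespace is not described here.

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively convert dicts (e.g. parsed YAML) into SimpleNamespace objects."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    if isinstance(value, list):
        return [to_namespace(v) for v in value]
    return value


# Hypothetical configuration, shaped like the mapping a YAML parser returns.
# None of these keys are taken from the simulstream documentation.
raw_config = {
    "speech_chunk_size": 1.0,
    "model": {"name": "toy-model", "beam_size": 4},
}

config = to_namespace(raw_config)
```

Attribute access (config.model.beam_size) then replaces dictionary lookups (raw_config["model"]["beam_size"]).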
Methods
__init__(config) – Initialize the speech processor with a given configuration.
clear() – Clear internal states, such as history of cached audio and/or tokens, in preparation for a new stream or conversation.
end_of_stream() – This method is called at the end of audio chunk processing.
load_model(config) – Load and initialize the underlying speech model.
process_chunk(waveform) – Process a chunk of waveform and produce incremental output.
set_source_language(language) – Set the source language for the speech processor.
set_target_language(language) – Set the target language for the speech processor (for translation).
tokens_to_string(tokens) – Converts token sequences into human-readable strings.
Attributes
Return the size of the speech chunks to be processed (in seconds).
- clear() None
Clear internal states, such as history of cached audio and/or tokens, in preparation for a new stream or conversation.
- abstractmethod end_of_stream() IncrementalOutput
This method is called at the end of audio chunk processing. It can be used to emit hypotheses at the end of the speech to conclude the output.
- Returns:
The incremental output (new and deleted tokens/strings).
- Return type:
IncrementalOutput
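One plausible way to obtain "new and deleted tokens" is to compare the previous hypothesis with the current one and keep their common prefix. The sketch below does that with plain lists; diff_hypotheses is a hypothetical helper, and the actual fields of IncrementalOutput are not shown in this page.

```python
from typing import List, Tuple


def diff_hypotheses(previous: List[str], current: List[str]) -> Tuple[List[str], List[str]]:
    """Return (new_tokens, deleted_tokens) relative to the previous hypothesis."""
    # Length of the longest common prefix of the two token sequences.
    prefix = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        prefix += 1
    # Everything after the shared prefix was retracted (old) or added (new).
    return current[prefix:], previous[prefix:]


new_tokens, deleted_tokens = diff_hypotheses(
    ["the", "cat", "sad"],
    ["the", "cat", "sat", "down"],
)
```

Here the revised hypothesis retracts "sad" and adds "sat" and "down", while the shared prefix "the cat" is left untouched.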
- abstractmethod classmethod load_model(config: SimpleNamespace)
Load and initialize the underlying speech model.
- Parameters:
config (SimpleNamespace) – Configuration of the speech processor.
- process_chunk(waveform: float32) IncrementalOutput
Process a chunk of waveform and produce incremental output.
- Parameters:
waveform (np.float32) – A 1D NumPy array of the audio chunk. The array is PCM audio normalized to the range [-1.0, 1.0], sampled at simulstream.server.speech_processors.SAMPLE_RATE.
- Returns:
The incremental output (new and deleted tokens/strings).
- Return type:
IncrementalOutput
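To make the expected input format concrete, the following sketch converts 16-bit PCM samples to float32 in [-1.0, 1.0] and splits a waveform into fixed-duration chunks. The helpers pcm16_to_float32 and split_into_chunks are hypothetical, and the SAMPLE_RATE value of 16000 is an assumption for the example; the real constant lives in simulstream.server.speech_processors.

```python
import numpy as np

# Assumed value for illustration only; use the library's SAMPLE_RATE in practice.
SAMPLE_RATE = 16_000


def pcm16_to_float32(pcm: np.ndarray) -> np.ndarray:
    """Scale signed 16-bit PCM samples to float32 values in [-1.0, 1.0)."""
    return pcm.astype(np.float32) / 32768.0


def split_into_chunks(waveform: np.ndarray, chunk_seconds: float,
                      sample_rate: int = SAMPLE_RATE) -> list:
    """Split a 1D waveform into consecutive chunks of chunk_seconds each."""
    step = int(round(chunk_seconds * sample_rate))
    if step <= 0:
        raise ValueError("chunk_seconds must be positive")
    # The last chunk may be shorter than the others.
    return [waveform[i:i + step] for i in range(0, len(waveform), step)]


pcm = np.array([-32768, 0, 16384, 32767], dtype=np.int16)
normalized = pcm16_to_float32(pcm)

# 2.5 seconds of silence split into 1-second chunks.
chunks = split_into_chunks(np.zeros(40_000, dtype=np.float32), chunk_seconds=1.0)
```

Each element of chunks is a 1D float32 array of the kind process_chunk is documented to accept.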
- abstractmethod set_source_language(language: str) None
Set the source language for the speech processor.
- Parameters:
language (str) – Language code (e.g., "en", "it").
- abstractmethod set_target_language(language: str) None
Set the target language for the speech processor (for translation).
- Parameters:
language (str) – Language code (e.g., "en", "it").