simulstream.server.speech_processors.base
Classes
- class simulstream.server.speech_processors.base.BaseSpeechProcessor(config: SimpleNamespace)
A partial implementation of SpeechProcessor that provides common logic for handling incremental speech-to-text processing. This class defines the high-level workflow of processing an incoming audio chunk (preprocessing, generation, updating history, building outputs), while leaving the model-specific details to subclasses.
Subclasses must implement the abstract helper methods to define how audio is preprocessed, tokens are generated, and histories are updated.
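The division of labour described above can be illustrated with a minimal, self-contained sketch. It does not import simulstream, and every name in it (SketchBaseProcessor, SketchOutput, ToyProcessor, and the helper methods _preprocess, _generate, _update_history, _build_output) is hypothetical; the real abstract method names and the fields of IncrementalOutput are not given here and may differ. The sketch only assumes that the base class fixes the order of the steps and that subclasses supply the model-specific parts.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import List, Sequence


@dataclass
class SketchOutput:
    """Hypothetical stand-in for IncrementalOutput (new and deleted tokens)."""
    new_tokens: List[str] = field(default_factory=list)
    deleted_tokens: List[str] = field(default_factory=list)


class SketchBaseProcessor(ABC):
    """Hypothetical base class: fixes the workflow, defers model details."""

    def __init__(self, config: SimpleNamespace):
        self.config = config
        self.history: List[str] = []

    def clear(self) -> None:
        # Reset cached state before a new stream or conversation.
        self.history = []

    def process_chunk(self, waveform: Sequence[float]) -> SketchOutput:
        # Fixed workflow: preprocess, generate, update history, build output.
        features = self._preprocess(waveform)
        tokens = self._generate(features)
        previous = list(self.history)
        self._update_history(tokens)
        return self._build_output(previous, tokens)

    @abstractmethod
    def _preprocess(self, waveform: Sequence[float]) -> float: ...

    @abstractmethod
    def _generate(self, features: float) -> List[str]: ...

    @abstractmethod
    def _update_history(self, tokens: List[str]) -> None: ...

    def _build_output(self, previous: List[str], current: List[str]) -> SketchOutput:
        # Keep the longest common prefix; everything after it is deleted or new.
        common = 0
        for old, new in zip(previous, current):
            if old != new:
                break
            common += 1
        return SketchOutput(current[common:], previous[common:])


class ToyProcessor(SketchBaseProcessor):
    """Toy subclass: labels each chunk 'loud' or 'quiet' by mean amplitude."""

    def _preprocess(self, waveform: Sequence[float]) -> float:
        return sum(abs(x) for x in waveform) / max(len(waveform), 1)

    def _generate(self, features: float) -> List[str]:
        label = "loud" if features > self.config.threshold else "quiet"
        return self.history + [label]

    def _update_history(self, tokens: List[str]) -> None:
        self.history = list(tokens)


processor = ToyProcessor(SimpleNamespace(threshold=0.5))
first = processor.process_chunk([0.9, -0.8])
second = processor.process_chunk([0.1, -0.1])
print(first.new_tokens, second.new_tokens)
processor.clear()  # ready for a new stream
```

In this toy version the caller only ever touches process_chunk() and clear(); swapping in a different model would mean replacing the three abstract helpers and nothing else.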
- clear() → None
Clear internal state, such as the history of cached audio and/or tokens, in preparation for a new stream or conversation.
- process_chunk(waveform: float32) → IncrementalOutput
Process a chunk of waveform and produce incremental output.
- Parameters:
waveform (np.float32) – A 1D NumPy array of the audio chunk. The array is PCM audio normalized to the range [-1.0, 1.0], sampled at simulstream.server.speech_processors.SAMPLE_RATE.
- Returns:
The incremental output (new and deleted tokens/strings).
- Return type: