simulstream.server.speech_processors.base

Classes

BaseSpeechProcessor(config)

A partial implementation of SpeechProcessor that provides common logic for handling incremental speech-to-text processing.

class simulstream.server.speech_processors.base.BaseSpeechProcessor(config: SimpleNamespace)

A partial implementation of SpeechProcessor that provides common logic for handling incremental speech-to-text processing.

This class defines the high-level workflow of processing an incoming audio chunk (preprocessing, generation, updating history, building outputs), while leaving the model-specific details to subclasses.

Subclasses must implement the abstract helper methods to define how audio is preprocessed, tokens are generated, and histories are updated.
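The division of labour described above can be sketched as a template-method pattern. This is a minimal, self-contained illustration and not the simulstream implementation: the class names (SketchProcessor, ToyProcessor, SketchOutput) and the helper names (_preprocess, _generate, _update_history, _build_output) are hypothetical, since this page does not list the actual abstract methods, and SketchOutput only stands in for IncrementalOutput.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchOutput:
    """Hypothetical stand-in for IncrementalOutput."""
    new_tokens: List[str] = field(default_factory=list)
    deleted_tokens: List[str] = field(default_factory=list)


class SketchProcessor(ABC):
    """Hypothetical base class: fixes the workflow, defers model details."""

    def __init__(self) -> None:
        self.token_history: List[str] = []

    def process_chunk(self, waveform: List[float]) -> SketchOutput:
        # Same order as the prose: preprocess, generate, update history, build output.
        features = self._preprocess(waveform)
        tokens = self._generate(features)
        previous = list(self.token_history)
        self._update_history(tokens)
        return self._build_output(previous, tokens)

    def clear(self) -> None:
        # Reset shared state; subclasses extend this for their own caches.
        self.token_history = []

    def _build_output(self, previous: List[str], tokens: List[str]) -> SketchOutput:
        # Keep the longest common prefix; everything after it is deleted or new.
        common = 0
        for old, new in zip(previous, tokens):
            if old != new:
                break
            common += 1
        return SketchOutput(new_tokens=tokens[common:],
                            deleted_tokens=previous[common:])

    @abstractmethod
    def _preprocess(self, waveform: List[float]) -> List[float]: ...

    @abstractmethod
    def _generate(self, features: List[float]) -> List[str]: ...

    @abstractmethod
    def _update_history(self, tokens: List[str]) -> None: ...


class ToyProcessor(SketchProcessor):
    """Toy subclass: emits one token per sample ("hi" if >= 0, else "lo")."""

    def __init__(self) -> None:
        super().__init__()
        self.audio_history: List[float] = []

    def clear(self) -> None:
        super().clear()
        self.audio_history = []

    def _preprocess(self, waveform: List[float]) -> List[float]:
        # Cache the audio and hand the full buffer to the "model".
        self.audio_history.extend(waveform)
        return list(self.audio_history)

    def _generate(self, features: List[float]) -> List[str]:
        return ["hi" if sample >= 0 else "lo" for sample in features]

    def _update_history(self, tokens: List[str]) -> None:
        self.token_history = list(tokens)
```

In this sketch, each call to process_chunk re-generates the full hypothesis from the cached audio and reports only the difference from the previous hypothesis. A real subclass would replace the toy helpers with its model's feature extraction and decoding.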

clear() → None: Clear internal states, such as history of cached audio and/or tokens, in preparation for a new stream or conversation.

process_chunk(waveform: float32) → IncrementalOutput

Process a chunk of waveform and produce incremental output.

Parameters: waveform (np.float32) – A 1D NumPy array holding the audio chunk: PCM samples normalized to the range [-1.0, 1.0] and sampled at simulstream.server.speech_processors.SAMPLE_RATE.
Returns: The incremental output (new and deleted tokens/strings).
Return type: IncrementalOutput
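As an illustration of the expected input format, the following sketch converts raw 16-bit PCM bytes into a 1D float32 array in the range [-1.0, 1.0]. It assumes the source audio is little-endian signed 16-bit mono and already sampled at SAMPLE_RATE. The helper name pcm16_to_float32 is hypothetical and not part of simulstream, and no resampling is performed.

```python
import numpy as np


def pcm16_to_float32(pcm_bytes: bytes) -> np.ndarray:
    """Convert little-endian 16-bit PCM bytes to float32 samples in [-1.0, 1.0].

    Hypothetical helper: assumes mono audio already at the target sample rate.
    """
    samples = np.frombuffer(pcm_bytes, dtype="<i2")
    # Dividing by 32768 maps the int16 range [-32768, 32767] into [-1.0, 1.0).
    return samples.astype(np.float32) / 32768.0
```

The resulting array could then be passed as the waveform argument of process_chunk on a concrete subclass.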