simulstream.server.speech_processors.base

Classes

BaseSpeechProcessor(config)

A partial implementation of SpeechProcessor that provides common logic for handling incremental speech-to-text processing.

class simulstream.server.speech_processors.base.BaseSpeechProcessor(config: SimpleNamespace)

A partial implementation of SpeechProcessor that provides common logic for handling incremental speech-to-text processing.

This class defines the high-level workflow of processing an incoming audio chunk (preprocessing, generation, updating history, building outputs), while leaving the model-specific details to subclasses.

Subclasses must implement the abstract helper methods to define how audio is preprocessed, tokens are generated, and histories are updated.
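The workflow described above can be sketched as a template-method pattern. The following is a minimal, self-contained illustration and not simulstream's actual code: the helper names (`_preprocess`, `_generate`, `_update_history`, `_build_output`), the `ToyOutput` fields, and the `threshold` config key are all hypothetical, chosen only to mirror the steps listed in the description.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import List

import numpy as np


@dataclass
class ToyOutput:
    """Hypothetical stand-in for IncrementalOutput (field names are assumptions)."""
    new_tokens: List[str] = field(default_factory=list)
    deleted_tokens: List[str] = field(default_factory=list)


class ToyBaseProcessor(ABC):
    """Fixes the order of the steps; subclasses supply the model-specific parts."""

    def __init__(self, config: SimpleNamespace):
        self.config = config

    def process_chunk(self, waveform: np.ndarray) -> ToyOutput:
        features = self._preprocess(waveform)    # 1. preprocessing
        tokens = self._generate(features)        # 2. generation
        self._update_history(features, tokens)   # 3. updating history
        return self._build_output(tokens)        # 4. building outputs

    @abstractmethod
    def clear(self) -> None: ...

    @abstractmethod
    def _preprocess(self, waveform: np.ndarray): ...

    @abstractmethod
    def _generate(self, features) -> List[str]: ...

    @abstractmethod
    def _update_history(self, features, tokens: List[str]) -> None: ...

    @abstractmethod
    def _build_output(self, tokens: List[str]) -> ToyOutput: ...


class ThresholdProcessor(ToyBaseProcessor):
    """Toy subclass: labels each chunk as 'speech' or 'silence' by mean amplitude."""

    def __init__(self, config: SimpleNamespace):
        super().__init__(config)
        self.token_history: List[str] = []

    def clear(self) -> None:
        # Drop everything remembered from the previous stream.
        self.token_history = []

    def _preprocess(self, waveform: np.ndarray) -> float:
        return float(np.mean(np.abs(waveform))) if waveform.size else 0.0

    def _generate(self, features: float) -> List[str]:
        return ["speech" if features >= self.config.threshold else "silence"]

    def _update_history(self, features: float, tokens: List[str]) -> None:
        self.token_history.extend(tokens)

    def _build_output(self, tokens: List[str]) -> ToyOutput:
        return ToyOutput(new_tokens=list(tokens))


processor = ThresholdProcessor(SimpleNamespace(threshold=0.1))
loud = processor.process_chunk(np.full(160, 0.5, dtype=np.float32))
quiet = processor.process_chunk(np.zeros(160, dtype=np.float32))
```

In this sketch the base class owns the control flow, so every subclass processes chunks in the same order, and `clear()` is the single place where per-stream state is reset.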

clear() → None

Clear internal state, such as the history of cached audio and/or tokens, in preparation for a new stream or conversation.

process_chunk(waveform: float32) → IncrementalOutput

Process a chunk of waveform and produce incremental output.

Parameters:

waveform (np.float32) – A 1D NumPy array holding the audio chunk as PCM samples, normalized to the range [-1.0, 1.0] and sampled at simulstream.server.speech_processors.SAMPLE_RATE.

Returns:

The incremental output (new and deleted tokens/strings).

Return type:

IncrementalOutput
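As an illustration of the input format that process_chunk expects, the sketch below converts raw 16-bit PCM bytes into a 1D float32 array in [-1.0, 1.0] and splits it into fixed-length chunks. The helper names and `ASSUMED_SAMPLE_RATE` are hypothetical; in real code the rate should be read from simulstream.server.speech_processors.SAMPLE_RATE rather than hard-coded, and the input is assumed to be mono, little-endian, signed 16-bit audio.

```python
from typing import Iterator

import numpy as np

# Placeholder only: the real value lives in
# simulstream.server.speech_processors.SAMPLE_RATE.
ASSUMED_SAMPLE_RATE = 16_000


def pcm16_bytes_to_waveform(raw: bytes) -> np.ndarray:
    """Convert little-endian signed 16-bit PCM bytes to float32 in [-1.0, 1.0)."""
    samples = np.frombuffer(raw, dtype="<i2")
    # Dividing by 32768 maps the int16 range [-32768, 32767] into [-1.0, 1.0).
    return samples.astype(np.float32) / 32768.0


def iter_chunks(
    waveform: np.ndarray,
    chunk_seconds: float,
    sample_rate: int = ASSUMED_SAMPLE_RATE,
) -> Iterator[np.ndarray]:
    """Yield consecutive slices of the waveform; the last one may be shorter."""
    step = max(1, int(chunk_seconds * sample_rate))
    for start in range(0, len(waveform), step):
        yield waveform[start:start + step]


raw_pcm = np.array([-32768, 0, 16384, 32767], dtype="<i2").tobytes()
waveform = pcm16_bytes_to_waveform(raw_pcm)
chunk_lengths = [
    len(chunk)
    for chunk in iter_chunks(np.zeros(40_000, dtype=np.float32), chunk_seconds=1.0)
]
```

Each chunk yielded this way could then be passed to process_chunk in order, with clear() called before starting a new stream.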