Welcome to the simulstream documentation
simulstream is a Python library for simultaneous/streaming speech recognition and translation.
It supports both offline simulation on existing files to score systems, as in the SimulEval
project, and live demos in a browser.
simulstream provides a WebSocket server and utilities for running streaming speech processing
experiments and demos. It supports real-time transcription and translation through streaming audio
input. By streaming, we mean that the library assumes by default that the input is an unbounded
speech signal, rather than many short speech segments as in simultaneous speech processing.
The simultaneous setting can be easily addressed by pre-segmenting the audio into many small
segments and feeding each segment to simulstream.
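The pre-segmentation step can be sketched as follows. This is a minimal illustration rather than part of simulstream: the helper name `split_into_segments` is hypothetical, and fixed-length splitting is an assumption made for brevity (in practice the segments may come from a reference segmentation or a voice activity detector). How each segment is then passed to simulstream depends on the library's API and is not shown here.

```python
from typing import Iterator, Sequence


def split_into_segments(
    samples: Sequence[float],
    sample_rate: int,
    segment_seconds: float,
) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-length chunks of an audio signal.

    Hypothetical helper, not part of simulstream. The last chunk may be
    shorter than the others if the signal length is not a multiple of
    the segment length.
    """
    segment_len = int(sample_rate * segment_seconds)
    if segment_len <= 0:
        raise ValueError("segment length must be at least one sample")
    for start in range(0, len(samples), segment_len):
        yield samples[start:start + segment_len]


# Toy signal: 10 samples at 4 Hz, split into 1-second segments.
toy_signal = [float(i) for i in range(10)]
segments = list(split_into_segments(toy_signal, sample_rate=4, segment_seconds=1.0))
segment_lengths = [len(segment) for segment in segments]
```

Each element of `segments` would then be processed as an independent short input, which corresponds to the simultaneous setting described above.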
The repository is tested with Python 3.11. It may also work with other Python versions, but we do not guarantee compatibility with them. See the Usage section for instructions on how to use the repository and the Installation section for details on installing the project.
Python API Documentation
Here is the list of the modules currently part of the repository with the corresponding documentation:
Credits
If you use this library, please cite:
@article{gaido-et-al-2025-simulstream,
title={{simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems}},
author={Gaido, Marco and Papi, Sara and Cettolo, Mauro and Negri, Matteo and Bentivogli, Luisa},
journal={arXiv},
year={2025}
}