Welcome to the simulstream documentation

simulstream is a Python library for simultaneous/streaming speech recognition and translation. It supports both simulating streaming over existing audio files to score systems, as in the SimulEval project, and running demos in a browser.

simulstream provides a WebSocket server and utilities for running streaming speech processing experiments and demos. It supports real-time transcription and translation through streaming audio input. By streaming, we mean that the library assumes by default that the input is an unbounded speech signal, rather than many short speech segments as in simultaneous speech processing. The simultaneous setting can be handled by pre-segmenting the audio into short segments and feeding each segment to simulstream.
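The pre-segmentation idea can be illustrated with a minimal sketch. It does not use the simulstream API: `split_into_segments`, `process_segments`, and the `process_segment` callback are hypothetical names introduced here, and fixed-length splitting is only a stand-in for whatever segmentation (for example, sentence-level or pause-based) an experiment actually uses.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_segments(
    samples: Sequence[int], sample_rate: int, segment_seconds: float
) -> Iterator[Sequence[int]]:
    """Yield consecutive fixed-length segments; the last one may be shorter."""
    segment_size = int(sample_rate * segment_seconds)
    if segment_size <= 0:
        raise ValueError("segment_seconds is too small for the given sample_rate")
    for start in range(0, len(samples), segment_size):
        yield samples[start:start + segment_size]


def process_segments(
    samples: Sequence[int],
    sample_rate: int,
    segment_seconds: float,
    process_segment: Callable[[Sequence[int]], T],
) -> List[T]:
    """Run a (hypothetical) per-segment processor independently on each segment."""
    return [
        process_segment(segment)
        for segment in split_into_segments(samples, sample_rate, segment_seconds)
    ]


# Toy input: 10 samples at 4 Hz, split into 1-second segments.
toy_samples = list(range(10))
segments = list(split_into_segments(toy_samples, sample_rate=4, segment_seconds=1.0))
# Placeholder processor: report the length of each segment.
segment_lengths = process_segments(toy_samples, 4, 1.0, len)
```

In the streaming setting, by contrast, no such splitting is applied beforehand: the whole signal is treated as one unbounded input.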

The repository is tested with Python 3.11. It may also work with other Python versions, but we do not guarantee compatibility with them. See the Installation section for how to install the project and the Usage section for instructions on how to use it.

Python API Documentation

Here is the list of the modules currently part of the repository with the corresponding documentation:

Credits

If you use this library, please cite:

@article{gaido-et-al-2025-simulstream,
  title={{simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems}},
  author={Gaido, Marco and Papi, Sara and Cettolo, Mauro and Negri, Matteo and Bentivogli, Luisa},
  journal={arXiv},
  year={2025}
}