simulstream.metrics.score_quality.cli_main

simulstream.metrics.score_quality.cli_main()

Quality scoring script for Simulstream evaluation.

This module provides functionality to compute quality-based evaluation metrics on system outputs stored in JSONL log files. It uses pluggable scorers from the simulstream.metrics.scorers.quality registry and compares system outputs against references and/or transcripts.

It supports:

- Reference-based metrics (e.g., BLEU, COMET).
- Source-based metrics (e.g., reference-free COMET).
- Hybrid setups when both references and transcripts are available.

The script can be invoked as a standalone CLI:

$ python -m simulstream.metrics.score_quality \
    --eval-config config/speech-processor.yaml --log-file metrics.jsonl --references ref.en --transcripts src.it --audio-definition audio_def.yaml --scorer sacrebleu

Alternatively, the script can be invoked without --audio-definition. In that case, the names of the reference and transcript files (with the extension trimmed) must match the names of the audio files used (i.e. the names present in metrics.jsonl), e.g.:

$ python -m simulstream.metrics.score_quality \
    --eval-config config/speech-processor.yaml --log-file metrics.jsonl --references 1.en,2.en --transcripts 1.it,2.it --scorer sacrebleu
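The naming rule for runs without an audio definition can be checked before launching the script. The following is a minimal sketch, not part of simulstream: the helper `unmatched_names` is hypothetical, the audio names are made-up examples, and it assumes that "trimmed of the extension" means dropping only the final suffix of each file name.

```python
from pathlib import Path


def unmatched_names(audio_names, text_files):
    """Return audio names (extension trimmed) with no matching text file.

    Hypothetical helper: assumes matching is done on the file name with
    its final suffix removed, e.g. "1.wav" and "1.en" both map to "1".
    """
    text_stems = {Path(f).stem for f in text_files}
    audio_stems = (Path(a).stem for a in audio_names)
    return sorted(stem for stem in audio_stems if stem not in text_stems)


# Made-up audio names, standing in for the names found in metrics.jsonl.
audio = ["1.wav", "2.wav"]

print(unmatched_names(audio, ["1.en", "2.en"]))  # every audio has a reference
print(unmatched_names(audio, ["1.it"]))          # "2" has no transcript
```

An empty result means every audio file has a counterpart. Any name that is returned points to a reference or transcript file that would need renaming, or to a run that should use --audio-definition instead.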