simulstream.metrics.score_quality

Functions

`cli_main`()	Quality scoring script for Simulstream evaluation.
`main`(scorer_cls, args)	Main entry point for quality scoring.

simulstream.metrics.score_quality.cli_main()

Quality scoring script for Simulstream evaluation.

This module provides functionality to compute quality-based evaluation metrics on system outputs stored in JSONL log files. It uses pluggable scorers from the simulstream.metrics.scorers.quality registry and compares system outputs against references and/or transcripts.

It supports:

- Reference-based metrics (e.g., BLEU, COMET).
- Source-based metrics (e.g., reference-free COMET).
- Hybrid setups when both references and transcripts are available.

The script can be invoked as a standalone CLI:

$ python -m simulstream.metrics.score_quality \
    --eval-config config/speech-processor.yaml --log-file metrics.jsonl --references ref.en --transcripts src.it --audio-definition audio_def.yaml --scorer sacrebleu

Alternatively, the script can be invoked without --audio-definition. In this case, the names of the reference and transcript files (without the extension) must match the names of the audio files used (i.e. the names present in metrics.jsonl), e.g.:

$ python -m simulstream.metrics.score_quality \
    --eval-config config/speech-processor.yaml --log-file metrics.jsonl --references 1.en,2.en --transcripts 1.it,2.it --scorer sacrebleu
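The file-naming rule for invocations without an audio definition can be illustrated with a small sketch. This is not the library's own code: `pair_by_stem` is a hypothetical helper, and the `.wav` audio names are invented for the example. It also assumes that audio names are compared after removing their extension, which the text above does not state explicitly.

```python
from pathlib import Path


def pair_by_stem(files_arg: str, audio_names: list[str]) -> dict[str, str]:
    """Map each audio name to the file whose name, minus extension, matches it.

    Hypothetical helper for illustration only; not part of simulstream.
    """
    # Index the comma-separated files by their name without the extension.
    files_by_stem = {Path(f).stem: f for f in files_arg.split(",")}
    pairs = {}
    for audio in audio_names:
        # Assumption: audio names are compared without their extension.
        stem = Path(audio).stem
        if stem not in files_by_stem:
            raise ValueError(f"no file matches audio {audio!r}")
        pairs[audio] = files_by_stem[stem]
    return pairs


# Audio names as they might appear in metrics.jsonl (invented for this example).
references = pair_by_stem("1.en,2.en", ["1.wav", "2.wav"])
print(references)
```

With these inputs, `1.en` pairs with `1.wav` and `2.en` with `2.wav`. A file named `first.en` would match neither audio, which is the situation where an audio definition file is presumably needed.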

simulstream.metrics.score_quality.main(scorer_cls: type[QualityScorer], args: Namespace)

Main entry point for quality scoring.

This function loads the evaluation configuration, system hypotheses, and reference/transcript data (if required), then constructs scoring samples and computes the final quality score using the selected scorer.

The output is printed on standard output.

Parameters:

scorer_cls (type[QualityScorer]) – Class implementing the quality metric.
args (argparse.Namespace) – Parsed command-line arguments.
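
As a rough illustration of the `args` parameter, the sketch below builds an `argparse.Namespace` from the flags shown in the CLI examples above. The parser is an assumption made for this example: the real parser built by `cli_main` may declare different types, defaults, or required flags, and individual scorers may register further options.

```python
import argparse

# Stand-in parser mirroring the flags from the CLI examples (an assumption,
# not the parser actually defined in simulstream).
parser = argparse.ArgumentParser(prog="score_quality")
parser.add_argument("--eval-config")
parser.add_argument("--log-file")
parser.add_argument("--references")
parser.add_argument("--transcripts")
parser.add_argument("--audio-definition")
parser.add_argument("--scorer")

args = parser.parse_args([
    "--eval-config", "config/speech-processor.yaml",
    "--log-file", "metrics.jsonl",
    "--references", "1.en,2.en",
    "--transcripts", "1.it,2.it",
    "--scorer", "sacrebleu",
])

# argparse turns dashes in flag names into underscores in attribute names.
print(args.eval_config)
print(args.references.split(","))
# Flags that were not passed default to None.
print(args.audio_definition)

# With simulstream installed, a Namespace like this would be passed as
# main(scorer_cls, args), where scorer_cls is a QualityScorer subclass.
```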