simulstream.metrics.scorers.latency.mwersegmenter

Classes

`MWERSegmenterBasedLatencyScorer`(args)	Abstract base class for scorers that require aligned system outputs and references through MWER Segmenter alignment.
`ResegmentedLatencyScoringSample`(audio_name, ...)	A sample containing realigned hypotheses and references.

class simulstream.metrics.scorers.latency.mwersegmenter.MWERSegmenterBasedLatencyScorer(args)

Abstract base class for scorers that require aligned system outputs and references through MWER Segmenter alignment.

This class wraps a latency scorer and applies the MWER Segmenter alignment, as described in “Effects of automatic alignment on speech translation metrics”, to the hypotheses before scoring.

Subclasses must implement _do_score(), which operates on ResegmentedLatencyScoringSample instances where hypotheses and references are aligned.

Example

>>> class CustomLatencyScorer(MWERSegmenterBasedLatencyScorer):
...     def _do_score(self, samples):
...         # Compute a custom latency score
...         return LatencyScores(...)

class simulstream.metrics.scorers.latency.mwersegmenter.ResegmentedLatencyScoringSample(audio_name: str, hypothesis: List[OutputWithDelays], reference: List[ReferenceSentenceDefinition])

A sample containing realigned hypotheses and references.

audio_name

The identifier of the audio file.

Type: str

hypothesis

Hypothesis lines after realignment.

Type: List[OutputWithDelays]

reference

Reference lines aligned to the hypothesis.

Type: List[ReferenceSentenceDefinition]
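
Example

The following is a minimal, self-contained sketch of how code inside a _do_score() implementation might walk through the aligned segments of one sample. It does not import simulstream. StandInOutput, StandInReference and StandInSample are hypothetical stand-ins, and their field names (text, delays, content, start_time, duration) are assumptions made for illustration, not the documented attributes of OutputWithDelays or ReferenceSentenceDefinition. It also assumes that, after realignment, hypothesis and reference have the same length and can be paired by position. The statistic computed is a toy one, not a latency metric defined by the library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StandInOutput:
    """Hypothetical stand-in for OutputWithDelays (field names are assumed)."""
    text: str
    delays: List[float]  # assumed: one emission time per generated unit


@dataclass
class StandInReference:
    """Hypothetical stand-in for ReferenceSentenceDefinition (field names are assumed)."""
    content: str
    start_time: float
    duration: float


@dataclass
class StandInSample:
    """Mirrors the three documented attributes of ResegmentedLatencyScoringSample."""
    audio_name: str
    hypothesis: List[StandInOutput]
    reference: List[StandInReference]


def mean_offset_per_segment(sample: StandInSample) -> List[float]:
    """Toy statistic: mean emission delay relative to each reference segment start."""
    results = []
    # Assumption: segment i of the hypothesis is aligned to segment i of the reference.
    for hyp, ref in zip(sample.hypothesis, sample.reference):
        if not hyp.delays:
            # An empty realigned segment contributes a zero offset in this sketch.
            results.append(0.0)
            continue
        offsets = [delay - ref.start_time for delay in hyp.delays]
        results.append(sum(offsets) / len(offsets))
    return results


sample = StandInSample(
    audio_name="talk_01.wav",
    hypothesis=[
        StandInOutput("hello world", [1.0, 2.0]),
        StandInOutput("bye", [5.0]),
    ],
    reference=[
        StandInReference("hello world", 0.0, 2.0),
        StandInReference("goodbye", 3.0, 1.0),
    ],
)
print(mean_offset_per_segment(sample))  # [1.5, 2.0]
```

A real subclass would receive ResegmentedLatencyScoringSample instances from the base class and return a LatencyScores object. The per-segment pairing shown here only illustrates why the realignment step matters: each hypothesis segment can be compared against the timing of the reference segment it was aligned to.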