Notes
Measuring transcription latency: for each correctly transcribed word, measure the time from when the word ends in the audio stream (x) to when that same word first appears in the partial transcript received by the client (y). Latency = y − x. Only words that are transcribed correctly are included in the calculation.

The latency is measured using audio files balanced across:
- Clean speech: LibriSpeech excerpts
- Real-world calls: internal benchmark recordings (retail support, drive-thru, B2B triage)
- Stress clips: crafted edge cases (rapid speaker turns, burst noise, long silences)

This mix captures everyday usage and the extreme scenarios that typically break streaming systems.
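The per-word latency definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the benchmark's actual tooling: the function name `word_latencies`, the input layout, and the sample timings are hypothetical. It assumes audio is streamed in real time so that word end times (x) and partial receive times (y) share one clock, and it treats a word as "correct" when the final transcript has the same word at the same position. A real pipeline would align reference and hypothesis properly instead of matching by index.

```python
from statistics import mean


def word_latencies(reference, partials):
    """Return (word, latency_s) for each correctly transcribed word.

    reference: list of (word, end_time_s), the time x when each word ends in the audio.
    partials:  list of (receive_time_s, text), partial transcripts in arrival order.
    Assumption: correctness is judged by position against the last partial.
    """
    final_words = partials[-1][1].split() if partials else []
    latencies = []
    for i, (word, end_time) in enumerate(reference):
        # Skip words that are not transcribed correctly in the final transcript.
        if i >= len(final_words) or final_words[i] != word:
            continue
        # y = receive time of the first partial that contains the word at position i.
        for recv_time, text in partials:
            words = text.split()
            if i < len(words) and words[i] == word:
                latencies.append((word, recv_time - end_time))  # latency = y - x
                break
    return latencies


# Hypothetical timings in seconds, for illustration only.
reference = [("turn", 0.40), ("left", 0.85), ("now", 1.30), ("please", 1.75)]
partials = [
    (0.70, "turn"),
    (1.10, "turn let"),            # "left" is still wrong in this partial
    (1.60, "turn left now"),
    (2.10, "turn left now police"),  # "please" is never transcribed correctly
]

result = word_latencies(reference, partials)
for word, latency in result:
    print(f"{word}: {latency:.2f} s")
print(f"mean latency: {mean(lat for _, lat in result):.2f} s")
```

Here "please" is excluded because it is never transcribed correctly, and "left" is timed from the first partial in which it appears in its correct form, not from the earlier wrong guess.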
Measuring transcription accuracy: we use word error rate (WER) to measure accuracy. For comprehensive evaluation, we use datasets totaling more than 205 hours of audio. These datasets cover various domains, including meetings, broadcasts, and call centers, as well as a wide range of English accents. They also span a range of audio conditions, such as duration, signal-to-noise ratio, and speech-to-silence ratio.
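For reference, WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference, which is the word-level edit distance normalized by reference length. The following is a minimal sketch of that standard definition, not the evaluation code used for the numbers above. The function names are illustrative, and it assumes text has already been normalized (casing, punctuation) and can be split on whitespace.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, 1):
        curr = [i]
        for j, h in enumerate(hyp_words, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """WER for a single utterance; the reference must be non-empty."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def corpus_wer(pairs):
    """WER over many (reference, hypothesis) pairs: total errors / total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    total = sum(len(r.split()) for r, _ in pairs)
    return errors / total


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
print(wer("the cat sat on the mat", "the cat sit on mat"))
```

When scoring a whole dataset, the usual choice is the corpus-level form, which pools errors and reference words across utterances so that short utterances do not dominate the average.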