- OWSM v1, v2, and v3: refer to the paper
  - OWSM v1
    - AISHELL-1 [23]
    - CoVoST2 [24]
    - GigaSpeech [25]
    - LibriSpeech [26]
    - MuST-C [27]
    - SPGISpeech [28]
    - TEDLIUM3 [29]
  - OWSM v2: builds upon v1 and adds the following datasets:
    - GigaST [30]
    - Multilingual LibriSpeech [31]
    - WenetSpeech [32]
  - OWSM v3: extends v2 with further datasets:
    - AIDATATANG [33]
    - AMI [34]
    - Babel [35]
    - Common Voice [36]
    - Fisher (Switchboard) [37]
    - Fisher Callhome Spanish [38]
    - FLEURS [39]
    - Googlei18n
    - KsponSpeech [40]
    - MagicData [41]
    - ReazonSpeech [42]
    - Russian Open STT [43]
    - VCTK [44]
    - VoxForge [45]
    - VoxPopuli [46]
    - WSJ [47]
- OWSM v1
- NeMo-Public dataset
- Librispeech
- Fisher Corpus
- Switchboard-1 Dataset
- WSJ-0 and WSJ-1
- National Speech Corpus (Part 1, Part 6)
- VCTK
- VoxPopuli (EN)
- Europarl-ASR (EN)
- Multilingual Librispeech (MLS EN) - 2,000 hrs subset
- Mozilla Common Voice (v8.0)
- People's Speech - 12,000 hrs subset
- Librispeech
- SpeechStew
- Librispeech
- Common Voice v8.0
- TED-LIUM v3
- AMI
- English Broadcast News
- WSJ0 and WSJ1
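
The OWSM lists above are cumulative: v2 builds upon v1, and v3 extends v2. A minimal Python sketch of that relationship follows. The dataset names are copied from the lists above; the variable and function names (`OWSM_ADDED`, `cumulative_datasets`) are illustrative assumptions and do not come from the OWSM paper or any toolkit.

```python
# Datasets newly introduced by each OWSM version, as listed above.
# The dict and function names are hypothetical, chosen for illustration only.
OWSM_ADDED = {
    "v1": [
        "AISHELL-1", "CoVoST2", "GigaSpeech", "LibriSpeech",
        "MuST-C", "SPGISpeech", "TEDLIUM3",
    ],
    "v2": ["GigaST", "Multilingual LibriSpeech", "WenetSpeech"],
    "v3": [
        "AIDATATANG", "AMI", "Babel", "Common Voice",
        "Fisher (Switchboard)", "Fisher Callhome Spanish", "FLEURS",
        "Googlei18n", "KsponSpeech", "MagicData", "ReazonSpeech",
        "Russian Open STT", "VCTK", "VoxForge", "VoxPopuli", "WSJ",
    ],
}


def cumulative_datasets(version):
    """Return every dataset used up to and including the given version."""
    if version not in OWSM_ADDED:
        raise KeyError(f"unknown OWSM version: {version}")
    result = []
    # Dicts preserve insertion order, so versions are visited as v1, v2, v3.
    for name, added in OWSM_ADDED.items():
        result.extend(added)
        if name == version:
            break
    return result


if __name__ == "__main__":
    for v in OWSM_ADDED:
        print(v, len(cumulative_datasets(v)))
```

Counting the entries this way gives 7 datasets for v1, 10 for v2, and 26 for v3.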