Public Speech Datasets for ASR
KeepPersistStay
2023. 11. 18. 15:29
- OWSM v1, v2, and v3 (refer to the paper for details)
  - OWSM v1
    - AISHELL-1 [23]
    - CoVoST2 [24]
    - GigaSpeech [25]
    - LibriSpeech [26]
    - MuST-C [27]
    - SPGISpeech [28]
    - TEDLIUM3 [29]
  - OWSM v2
    - builds upon v1 and includes additional datasets:
    - GigaST [30]
    - Multilingual LibriSpeech [31]
    - WenetSpeech [32]
  - OWSM v3
    - extends v2 with even more datasets:
    - AIDATATANG [33]
    - AMI [34]
    - Babel [35]
    - Common Voice [36]
    - Fisher (Switchboard) [37]
    - Fisher Callhome Spanish [38]
    - FLEURS [39]
    - Googlei18n
    - KsponSpeech [40]
    - MagicData [41]
    - ReazonSpeech [42]
    - Russian Open STT [43]
    - VCTK [44]
    - VoxForge [45]
    - VoxPopuli [46]
    - WSJ [47]
- NeMo-Public dataset
  - LibriSpeech
  - Fisher Corpus
  - Switchboard-1 Dataset
  - WSJ-0 and WSJ-1
  - National Speech Corpus (Part 1, Part 6)
  - VCTK
  - VoxPopuli (EN)
  - Europarl-ASR (EN)
  - Multilingual LibriSpeech (MLS EN), 2,000 hrs subset
  - Mozilla Common Voice (v8.0)
  - People's Speech, 12,000 hrs subset
- Librispeech
- SpeechStew
  - LibriSpeech
  - Common Voice v8.0
  - TED-LIUM v3
  - AMI
  - English Broadcast News
  - WSJ0 and WSJ1
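
The OWSM entries above are cumulative: v2 builds on v1, and v3 extends v2. As a rough sketch of that relationship, the snippet below models each version as a Python set and takes unions. The dataset names are copied from the list; the variable names and the `datasets_for` helper are hypothetical and not part of any OWSM or ESPnet API.

```python
# Minimal sketch: OWSM training-data composition as cumulative sets.
# Dataset names come from the list above; all identifiers here are
# illustrative assumptions, not an official API.

OWSM_V1 = {
    "AISHELL-1", "CoVoST2", "GigaSpeech", "LibriSpeech",
    "MuST-C", "SPGISpeech", "TEDLIUM3",
}

# Datasets newly added in v2 (on top of v1).
OWSM_V2_ADDED = {"GigaST", "Multilingual LibriSpeech", "WenetSpeech"}

# Datasets newly added in v3 (on top of v2).
OWSM_V3_ADDED = {
    "AIDATATANG", "AMI", "Babel", "Common Voice", "Fisher (Switchboard)",
    "Fisher Callhome Spanish", "FLEURS", "Googlei18n", "KsponSpeech",
    "MagicData", "ReazonSpeech", "Russian Open STT", "VCTK", "VoxForge",
    "VoxPopuli", "WSJ",
}

# Each version is the union of the previous version and its additions.
OWSM_V2 = OWSM_V1 | OWSM_V2_ADDED
OWSM_V3 = OWSM_V2 | OWSM_V3_ADDED


def datasets_for(version: str) -> set[str]:
    """Return the full dataset set for 'v1', 'v2', or 'v3' (hypothetical helper)."""
    table = {"v1": OWSM_V1, "v2": OWSM_V2, "v3": OWSM_V3}
    return table[version]


if __name__ == "__main__":
    for v in ("v1", "v2", "v3"):
        print(v, len(datasets_for(v)), "datasets")
```

Set operations also make it easy to ask which corpora a later version adds, for example `datasets_for("v3") - datasets_for("v1")`.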