Public Speech Datasets for ASR

KeepPersistStay 2023. 11. 18. 15:29
  1. OWSM v1, v2, and v3: refer to the paper (bracketed numbers follow the paper's reference list)
    • OWSM v1
      • AISHELL-1 [23],
      • CoVoST2 [24],
      • GigaSpeech [25],
      • LibriSpeech [26],
      • MuST-C [27],
      • SPGISpeech [28],
      • TEDLIUM3 [29].
    • OWSM v2
      • builds upon v1 and includes additional datasets:
      • GigaST [30]
      • Multilingual LibriSpeech [31]
      • WenetSpeech [32].
    • OWSM v3
      • extends v2 with even more datasets:
      • AIDATATANG [33],
      • AMI [34],
      • Babel [35],
      • Common Voice [36],
      • Fisher (Switchboard) [37],
      • Fisher Callhome Spanish [38],
      • FLEURS [39],
      • Googlei18n,
      • KsponSpeech [40],
      • MagicData [41],
      • ReazonSpeech [42],
      • Russian Open STT [43],
      • VCTK [44],
      • VoxForge [45],
      • VoxPopuli [46],
      • WSJ [47].
  2. NeMo public datasets
    • LibriSpeech
    • Fisher Corpus
    • Switchboard-1 Dataset
    • WSJ-0 and WSJ-1
    • National Speech Corpus (Part 1, Part 6)
    • VCTK
    • VoxPopuli (EN)
    • Europarl-ASR (EN)
    • Multilingual LibriSpeech (MLS EN) - 2,000 hrs subset
    • Mozilla Common Voice (v8.0)
    • People's Speech - 12,000 hrs subset
  3. SpeechStew
    • LibriSpeech
    • Common Voice v8.0
    • TED-LIUM v3
    • AMI
    • English Broadcast News
    • WSJ-0 and WSJ-1
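
Because each OWSM version in item 1 extends the previous one, the full training mix for a given version is the union of its own additions and everything before it. The sketch below shows one way to represent that. The dataset names are copied from the list above; the `OWSM_ADDED` dict and the `cumulative_datasets` helper are hypothetical names introduced here for illustration and are not part of any OWSM or ESPnet API.

```python
# Datasets newly added at each OWSM version, as listed in item 1 above.
# The dict layout is an illustrative assumption, not an official format.
OWSM_ADDED = {
    "v1": ["AISHELL-1", "CoVoST2", "GigaSpeech", "LibriSpeech",
           "MuST-C", "SPGISpeech", "TEDLIUM3"],
    "v2": ["GigaST", "Multilingual LibriSpeech", "WenetSpeech"],
    "v3": ["AIDATATANG", "AMI", "Babel", "Common Voice",
           "Fisher (Switchboard)", "Fisher Callhome Spanish", "FLEURS",
           "Googlei18n", "KsponSpeech", "MagicData", "ReazonSpeech",
           "Russian Open STT", "VCTK", "VoxForge", "VoxPopuli", "WSJ"],
}

VERSION_ORDER = ["v1", "v2", "v3"]


def cumulative_datasets(version):
    """Return every dataset used up to and including the given version."""
    if version not in VERSION_ORDER:
        raise ValueError(f"unknown OWSM version: {version!r}")
    datasets = []
    # Walk the versions in order, accumulating each version's additions.
    for v in VERSION_ORDER[: VERSION_ORDER.index(version) + 1]:
        datasets.extend(OWSM_ADDED[v])
    return datasets


if __name__ == "__main__":
    for v in VERSION_ORDER:
        print(v, len(cumulative_datasets(v)), "datasets")
```

Keeping only the per-version additions avoids repeating the v1 list under v2 and v3, which mirrors how the list above is written.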