[speech recognition] Audio augmentation

2020. 6. 13. 18:28

1) VTLP based data augmentation

- Vocal tract length perturbation (VTLP) [3] has shown gains on the TIMIT phoneme recognition task. In [3], the VTLP warping factor for each utterance is chosen randomly from a range (e.g. [0.9, 1.1]), and these sampled warping factors improved TIMIT phoneme recognition. In [4], VTLP was extended to large vocabulary continuous speech recognition (LVCSR) tasks, where selecting the warping factors from a limited set of perturbation factors was observed to work better.
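The two selection strategies above, plus a frequency warp, can be sketched roughly as follows. This is a minimal illustration, not the implementation from [3] or [4]: the piecewise-linear warp is the form commonly associated with VTLP, and the function names, the 16 kHz sample rate, the `f_hi` boundary of 4800 Hz, and the three-element factor set are all assumptions made for the example.

```python
import random


def sample_warp_factor(low=0.9, high=1.1, rng=random):
    """Draw one warping factor per utterance, uniformly from [low, high]."""
    return rng.uniform(low, high)


def pick_warp_factor(factors=(0.9, 1.0, 1.1), rng=random):
    """Alternative: choose the factor from a small fixed set (hypothetical values)."""
    return rng.choice(factors)


def warp_frequency(f, alpha, sample_rate=16000, f_hi=4800.0):
    """Piecewise-linear warp of a frequency f (Hz) by factor alpha.

    Below a boundary frequency the axis is simply scaled by alpha; above it,
    a second line segment brings the mapping back so that the Nyquist
    frequency maps to itself. sample_rate and f_hi are assumed values.
    """
    nyquist = sample_rate / 2.0
    boundary = f_hi * min(alpha, 1.0) / alpha
    if f <= boundary:
        return f * alpha
    # Line from (boundary, boundary * alpha) to (nyquist, nyquist).
    slope = (nyquist - f_hi * min(alpha, 1.0)) / (nyquist - boundary)
    return nyquist - slope * (nyquist - f)


if __name__ == "__main__":
    alpha = sample_warp_factor()
    # Warped centre frequencies for a few example filterbank bins.
    print(alpha, [round(warp_frequency(f, alpha), 1) for f in (500, 2000, 6000)])
```

In a feature pipeline, a warp like this would typically be applied to the filterbank centre frequencies before computing the features for each utterance.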

2) Tempo perturbation based data augmentation

- Speech rate perturbation, where the speech rate of the audio is modified by a randomly selected factor, was investigated in [6]. In speech rate modification, the tempo of the signal is changed while the pitch and spectral envelope of the signal are kept unchanged. The WSOLA [16] based implementation in the tempo command of the SoX tool was used to achieve this perturbation.
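As a rough sketch of how such a perturbation could be scripted, the snippet below builds the argument list for SoX's `tempo` effect with a randomly drawn factor. The helper names, the file names, and the [0.9, 1.1] sampling range are assumptions for illustration; the text above does not state which range was used.

```python
import random


def random_tempo_factor(low=0.9, high=1.1, rng=random):
    """Draw a tempo factor; the default range is an assumed example."""
    return rng.uniform(low, high)


def sox_tempo_command(src, dst, factor):
    """Build the argument list for `sox <src> <dst> tempo <factor>`.

    A factor above 1 speeds the audio up, below 1 slows it down,
    without changing the pitch.
    """
    if factor <= 0:
        raise ValueError("tempo factor must be positive")
    return ["sox", src, dst, "tempo", f"{factor:g}"]


if __name__ == "__main__":
    cmd = sox_tempo_command("utt1.wav", "utt1_tempo.wav", random_tempo_factor())
    print(" ".join(cmd))
    # With SoX installed, the command could be executed with:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a shell string avoids quoting problems with unusual file names.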

3) Speed perturbation based data augmentation

- To modify the speed of a signal, we simply resample it. The speed function of SoX was used for this. Two additional copies of the original training data were created by modifying the speed to 90% and 110% of the original rate.
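To make the resampling idea concrete, here is a minimal pure-Python sketch that changes the speed of a sample sequence by linear interpolation and produces the 0.9x, 1.0x, and 1.1x copies. It is only an illustration under simplifying assumptions: the function names are hypothetical, and unlike SoX's resampler it applies no anti-aliasing filter. The equivalent SoX call would be along the lines of `sox in.wav out.wav speed 0.9`.

```python
def change_speed(samples, factor):
    """Resample `samples` so that playback at the original sample rate
    is `factor` times faster. Duration scales by 1/factor and the pitch
    shifts with it, unlike tempo modification.
    """
    if factor <= 0:
        raise ValueError("speed factor must be positive")
    n_out = int(round(len(samples) / factor))
    out = []
    for i in range(n_out):
        pos = i * factor          # fractional position in the input
        left = int(pos)
        if left >= len(samples) - 1:
            out.append(samples[-1])  # clamp at the final sample
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out


def three_way_copies(samples, factors=(0.9, 1.0, 1.1)):
    """Return the original plus the two speed-perturbed copies, keyed by factor."""
    return {f: change_speed(samples, f) for f in factors}


if __name__ == "__main__":
    signal = [float(i) for i in range(100)]
    for factor, copy in three_way_copies(signal).items():
        print(factor, len(copy))
```

Because the output length is the input length divided by the factor, the 0.9x copy is longer and the 1.1x copy shorter than the original, so frame-level alignments would need to be regenerated for the perturbed data.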

4) SpecAugment

** To implement speed perturbation, the signal is resampled using the speed function of the SoX audio manipulation tool. SoX, audio manipulation tool (accessed March 25, 2015). [Online]. Available: http://sox.sourceforge.net/

[1] T. Ko, V. Peddinti, D. Povey, and S. Khudanpur, “Audio augmentation for speech recognition.” in INTERSPEECH, 2015, pp. 3586–3589.
