Language Modeling
Technical Program Session SPE-4
CAPITALIZATION NORMALIZATION FOR LANGUAGE MODELING WITH AN ACCURATE AND EFFICIENT HIERARCHICAL RNN MODEL
Google Research
Problem
Capitalization normalization (truecasing) is the task of restoring the correct case (uppercase or lowercase) of noisy text.
Proposed method
A fast, accurate and compact two-level hierarchical word-and-character-based RNN
Used the truecaser to normalize user-generated text in a Federated Learning framework for language modeling.
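Below is a minimal sketch of the two-level word-and-character idea (a character RNN encodes each lowercased word, a word-level RNN adds sentence context, and a per-character classifier predicts the case label). It assumes PyTorch; all layer names and sizes are illustrative and do not reproduce the authors' model.

```python
# Hedged sketch of a two-level hierarchical word-and-character RNN truecaser.
import torch
import torch.nn as nn

class HierarchicalTruecaser(nn.Module):
    def __init__(self, n_chars, char_dim=32, char_hid=64, word_hid=128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Character-level encoder: one representation per (lowercased) word.
        self.char_enc = nn.GRU(char_dim, char_hid, batch_first=True, bidirectional=True)
        # Word-level RNN: propagates sentence context across words.
        self.word_rnn = nn.GRU(2 * char_hid, word_hid, batch_first=True, bidirectional=True)
        # Per-character classifier: UPPER vs lower, from char state + word context.
        self.classifier = nn.Linear(2 * char_hid + 2 * word_hid, 2)

    def forward(self, char_ids):
        # char_ids: (batch, n_words, n_chars) indices of lowercased characters
        b, w, c = char_ids.shape
        emb = self.char_emb(char_ids.reshape(b * w, c))            # (b*w, c, char_dim)
        char_states, _ = self.char_enc(emb)                        # (b*w, c, 2*char_hid)
        word_vecs = char_states.mean(dim=1).reshape(b, w, -1)      # (b, w, 2*char_hid)
        word_ctx, _ = self.word_rnn(word_vecs)                     # (b, w, 2*word_hid)
        ctx = word_ctx.unsqueeze(2).expand(b, w, c, -1)
        feats = torch.cat([char_states.reshape(b, w, c, -1), ctx], dim=-1)
        return self.classifier(feats)                              # (b, w, c, 2) case logits
```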
Key Findings
In a real-user A/B experiment, the authors demonstrated that the improvement translates into reduced prediction error rates in a virtual keyboard application.
NEURAL-FST CLASS LANGUAGE MODEL FOR END-TO-END SPEECH RECOGNITION
Facebook AI, USA
Proposed method
Neural-FST Class Language Model (NFCLM) for end-to-end speech recognition
A novel method that combines neural network language models (NNLMs) and finite state transducers (FSTs) in a mathematically consistent framework
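The sketch below illustrates only the class-LM decomposition behind such a model: the neural LM scores a token stream containing class tags (e.g. "@contact"), while a per-class grammar/FST scores the entity filling each tag. The uniform stand-ins for the NNLM and the class FST, the tag name, and the toy entity list are assumptions for illustration; this is not the paper's implementation.

```python
# Toy class-LM score decomposition: NNLM over a class-tagged stream + class scores.
import math

VOCAB_SIZE = 10_000                      # assumed NNLM vocabulary size
CLASS_GRAMMARS = {                       # assumed per-class entity lists (toy "FSTs")
    "@contact": ["alice", "bob", "charlie"],
}

def nnlm_logprob(tokens):
    # Stand-in for an NNLM: uniform probability per token.
    return len(tokens) * -math.log(VOCAB_SIZE)

def class_logprob(tag, entity_words):
    # Stand-in for a class FST: uniform over entities accepted by the grammar.
    entities = CLASS_GRAMMARS[tag]
    return -math.log(len(entities)) if " ".join(entity_words) in entities else float("-inf")

def nfclm_style_logprob(segments):
    """segments: plain words, or (class_tag, entity_words) tuples.
    Total log-prob = NNLM over the class-tagged stream + per-class entity scores."""
    stream, class_score = [], 0.0
    for seg in segments:
        if isinstance(seg, tuple):
            tag, entity = seg
            stream.append(tag)
            class_score += class_logprob(tag, entity)
        else:
            stream.append(seg)
    return nnlm_logprob(stream) + class_score

# Example: "call @contact(alice) now"
print(nfclm_style_logprob(["call", ("@contact", ["alice"]), "now"]))
```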
Key Findings
NFCLM significantly outperforms NNLM by 15.8% relative in terms of WER.
NFCLM achieves performance similar to traditional NNLM and FST shallow fusion while being less prone to overbiasing and 12 times more compact, making it more suitable for on-device usage.
ENHANCE RNNLMS WITH HIERARCHICAL MULTI-TASK LEARNING FOR ASR
University of Missouri, USA
Proposed method
Key Findings
RESCOREBERT: DISCRIMINATIVE SPEECH RECOGNITION RESCORING WITH BERT
Amazon Alexa AI, USA; Emory University, USA
Problem
Second-pass rescoring improves the outputs of a first-pass decoder through lattice rescoring or n-best re-ranking.
Proposed method (RescoreBERT)
The authors showed how to train a BERT-based rescoring model with a minimum WER (MWER) loss, incorporating the benefits of a discriminative loss into the fine-tuning of deep bidirectional pretrained models for ASR.
They also proposed a fusion strategy that incorporates the MLM into the discriminative training process to effectively distill knowledge from a pretrained model, and further proposed an alternative discriminative loss.
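As a rough illustration of MWER training for a second-pass rescorer: each n-best hypothesis gets an interpolated first-pass + rescorer score, a softmax over those scores gives a hypothesis posterior, and the loss is the expected (mean-normalized) number of word errors. The tensor layout and interpolation weight below are assumptions, not the RescoreBERT recipe.

```python
# Minimal MWER loss sketch for n-best re-ranking (PyTorch).
import torch

def mwer_loss(first_pass_scores, rescorer_scores, word_errors, alpha=0.5):
    """first_pass_scores, rescorer_scores, word_errors: (batch, n_best) tensors.
    Scores are log-probabilities (higher is better); word_errors are edit
    distances of each hypothesis against the reference transcript."""
    total = first_pass_scores + alpha * rescorer_scores      # interpolated score
    posterior = torch.softmax(total, dim=-1)                 # P(hyp | n-best)
    mean_err = word_errors.mean(dim=-1, keepdim=True)        # per-utterance baseline
    # Expected relative word errors; minimizing shifts mass toward low-WER hypotheses.
    return (posterior * (word_errors - mean_err)).sum(dim=-1).mean()

# Toy usage: 1 utterance, 3 hypotheses.
fp = torch.tensor([[-5.0, -6.0, -7.0]])
rs = torch.tensor([[-4.0, -3.5, -6.0]], requires_grad=True)
errs = torch.tensor([[1.0, 0.0, 2.0]])
loss = mwer_loss(fp, rs, errs)
loss.backward()
```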
Key Findings
Reduced WER by 6.6%/3.4% relative on the LibriSpeech clean/other test sets over a BERT baseline without a discriminative objective.
Found that it reduces both latency and WER (by 3 to 8% relative) over an LSTM rescoring model.
HYBRID SUB-WORD SEGMENTATION FOR HANDLING LONG TAIL IN MORPHOLOGICALLY RICH LOW RESOURCE LANGUAGES
Cognitive Systems Lab, University of Bremen, Germany
Problem
Dealing with out-of-vocabulary (OOV), i.e. unseen, words
For morphologically rich languages with a high type-token ratio, the OOV percentage is also quite high.
Sub-word segmentation is one of the major approaches to dealing with OOVs.
Proposed method
This paper presents a hybrid sub-word segmentation algorithm to deal with OOVs.
A sub-word segmentation evaluation methodology is also presented.
All experiments are conducted on a conversational code-switched Malayalam-English corpus.
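To illustrate how sub-word segmentation covers OOV words in general, the sketch below runs a greedy longest-match over a learned sub-word inventory, falling back to characters for unknown residue. The toy inventory, the "@@" continuation marker, and the Malayalam-like example word are assumptions for illustration; the paper's hybrid algorithm is not reproduced here.

```python
# Generic greedy longest-match sub-word segmentation over a toy inventory.
def segment(word, subword_vocab, marker="@@"):
    """Split a possibly-OOV word into known sub-words; single chars as fallback."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):          # try the longest candidate first
            if word[i:j] in subword_vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    # Mark non-final pieces so the original word can be reconstructed.
    return [p + marker for p in pieces[:-1]] + pieces[-1:]

vocab = {"para", "yunnu", "nd", "u"}               # toy sub-word inventory
print(segment("parayunnund", vocab))               # ['para@@', 'yunnu@@', 'nd']
```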