Enhancement of adaptive de-correlation filtering separation model for robust speech recognition
The development of automatic speech recognition (ASR) technology has enabled an increasing number of applications. However, the robustness of ASR in real acoustic environments remains a challenge for practical applications. Interfering speech and background noise severely degrade ASR performance. Speech source separation separates target speech from interfering speech, but its performance is affected by adverse environmental conditions, namely acoustic reverberation and background noise. This dissertation enhances a speech source separation technique, adaptive decorrelation filtering (ADF), for robust ASR applications. To overcome these difficulties and develop practical ADF speech separation algorithms for robust ASR, improvements are introduced in several aspects. Based on speech spectral characteristics, prewhitening procedures are applied to flatten the long-term speech spectrum, which improves adaptation robustness and decreases ADF estimation error. To speed up convergence, block-iterative implementation and variable step-size (VSS) methods are proposed. To exploit scenarios where multiple pairs of sensors are available, multi-ADF postprocessing is developed. To overcome the limitations of the ADF separation model under background noise, noise-compensation (NC) and adaptive speech enhancement procedures are proposed to improve robustness in diffuse noise. Speech separation simulations and speech recognition experiments are carried out using the TIMIT database and the ATR acoustic measurement database. Evaluations of the methods presented in this dissertation demonstrate significant performance improvements over the baseline ADF algorithm in both speech separation and recognition.
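The prewhitening idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the procedure used in the dissertation, so the following assumes a generic linear-prediction (LPC) whitening filter estimated from the long-term autocorrelation; the function names, the filter order, and the synthetic AR(1) test signal are all hypothetical choices made for illustration only.

```python
import numpy as np

def lpc_prewhitening_filter(x, order=8):
    """Estimate a prediction-error (whitening) FIR filter [1, -a1, ..., -ap].

    Assumption: a generic LPC whitening filter, not necessarily the
    dissertation's procedure.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Biased long-term autocorrelation for lags 0..order.
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)]) / n
    # Toeplitz normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # small regularisation for stability
    a = np.linalg.solve(R, r[1:])
    return np.concatenate(([1.0], -a))

def prewhiten(x, w):
    """Apply the whitening filter and keep the original signal length."""
    return np.convolve(x, w)[:len(x)]

def spectral_flatness(x):
    """Geometric mean over arithmetic mean of the power spectrum."""
    p = np.abs(np.fft.rfft(x)) ** 2 + 1e-12
    return np.exp(np.mean(np.log(p))) / np.mean(p)

# Synthetic stand-in for a signal with a strongly tilted long-term spectrum.
rng = np.random.default_rng(0)
e = rng.standard_normal(16000)
colored = np.zeros_like(e)
for t in range(1, len(e)):
    colored[t] = 0.95 * colored[t - 1] + e[t]

w = lpc_prewhitening_filter(colored, order=8)
white = prewhiten(colored, w)
```

In a two-channel setting, one would presumably apply the same whitening filter to both sensor signals so that the relative filtering between channels is preserved; that is an assumption of this sketch rather than a statement about the dissertation's method.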
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.