Performance Evaluation of Various Speech Enhancement Techniques


Prathamesh V. Phadke
V.M. Thakare, R.N. Khobragade

Abstract

Temporal dynamics and speaker characteristics are two important features that distinguish speech from noise. The aim of this paper is to propose a method that extracts these two features as fully as possible for speech enhancement. Doing so can reduce the need for prior information about the noise, which is difficult to estimate when the noise varies quickly. Given noisy speech, the new approach estimates clean speech by recognizing long segments of the clean speech as whole units. In the recognition step, clean speech sentences taken from a speech corpus are used as examples, and matching segments are identified between the noisy sentence and the corpus sentences. The a priori signal-to-noise ratio (SNR) plays an important role in many speech enhancement algorithms. It may be used with a wide range of speech enhancement techniques, such as the minimum mean square error (MMSE) (log) spectral amplitude estimator, the super-Gaussian joint maximum a posteriori (JMAP) estimator, or the Wiener filter. In addition, the discrete cosine transform (DCT) has been shown to be a good approximation to the Karhunen–Loève transform (KLT) and has properties similar to those of the discrete Fourier transform (DFT). This paper suggests that the DCT offers better energy compaction, which is advantageous for speech enhancement.
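To illustrate how the a priori SNR enters one of the estimators named above, the following is a minimal sketch of the standard Wiener gain, G = xi / (1 + xi), applied to each frequency bin of a noisy magnitude spectrum. The function names (db_to_linear, wiener_gain, enhance_frame) and the sample values are assumptions made for illustration; they are not taken from the paper, and the a priori SNR is assumed to be already estimated.

```python
def db_to_linear(snr_db):
    """Convert an SNR in decibels to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def wiener_gain(xi):
    """Wiener gain G = xi / (1 + xi) for a linear-scale a priori SNR xi."""
    if xi < 0:
        raise ValueError("a priori SNR must be non-negative")
    return xi / (1.0 + xi)


def enhance_frame(noisy_magnitudes, xi_per_bin):
    """Scale each noisy spectral magnitude by the Wiener gain of its bin.

    Both arguments are hypothetical inputs: one magnitude and one
    a priori SNR estimate per frequency bin of a single frame.
    """
    return [wiener_gain(xi) * y for y, xi in zip(noisy_magnitudes, xi_per_bin)]


if __name__ == "__main__":
    # Illustrative values only: two bins with a priori SNRs of 1 and 3.
    print(enhance_frame([2.0, 4.0], [1.0, 3.0]))
```

Bins with a high a priori SNR pass almost unchanged, while bins dominated by noise are attenuated toward zero, which is why the quality of the SNR estimate matters so much for this family of methods.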
