SST-00

SIGNAL PROCESSING & FEATURE ANALYSIS

IMPROVED AUDITORY MASKING MODELS
M. R. Flax, E. Ambikairajah, W. H. Holmes, J. S. Jin

Abstract  Hybrid masking models are defined and are informally judged to be improved masking models. Moore’s method for deriving spreading functions, a previously unused constituent, is closer to real-world observations than its counterparts. Moore’s method lowers the computational complexity of the masking model and suggests that people are auditorily more sensitive to high frequencies than previously assumed. The hybrid models are independent of cochlea mapping functions, hence the same model may be used for assessing auditory redundancy in a variety of mammals. The two best masking models are chosen and classed as models which (a) preserve auditory quality at the cost of spectral character discrimination and (b) discriminate spectral character at the cost of auditory quality.


REAL-TIME NOISE CANCELLING BASED ON SPECTRAL MINIMUM DETECTION AND DIFFUSIVE GAIN FACTORS
Hyoung-Gook Kim, Klaus Obermayer, Mathias Bode, Dietmar Ruwisch

Abstract  In this paper we propose an efficient algorithm for single-channel noise reduction in audio signals. One of the main objectives is to find a balanced trade-off between noise reduction and speech distortion in the processed signal. This is accomplished by a system based on spectral minimum detection and diffusive gain factors. Our approach to speech enhancement is capable of distinguishing between speech and noise interference in the microphone signal, even when they are located in the same frequency band.
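
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind spectral minimum detection: in each frequency bin, the minimum power seen over a short window of recent frames is taken as a noise-floor estimate. The function names, the `window` and `floor` parameters, and the gain rule (plain spectral subtraction, not the authors' diffusive gain factors) are all assumptions made for illustration.

```python
from collections import deque


def min_tracked_noise(power_frames, window=5):
    """Per-bin noise estimate: minimum power over the last `window` frames.

    `power_frames` is a list of frames, each a list of per-bin power values.
    (Hypothetical helper; the window length is an assumed parameter.)
    """
    history = deque(maxlen=window)
    estimates = []
    for frame in power_frames:
        history.append(frame)
        # zip(*history) groups the recent values of each frequency bin.
        estimates.append([min(bin_values) for bin_values in zip(*history)])
    return estimates


def subtraction_gain(power, noise, floor=0.1):
    """Generic spectral-subtraction gain per bin, clamped to [floor, 1].

    This stands in for the paper's diffusive gain factors, which the
    abstract does not define.
    """
    gains = []
    for p, n in zip(power, noise):
        g = 1.0 - n / p if p > 0 else floor
        gains.append(max(floor, min(1.0, g)))
    return gains
```

Bins whose power stays near the tracked minimum receive the floor gain, while bins that rise well above it pass almost unchanged.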


NONPARAMETRIC PEAK FEATURE EXTRACTION AND ITS APPLICATIONS TO SPEECH SIGNALS
HyunSoo Kim and W. Harvey Holmes

Abstract  We have developed a novel peak feature algorithm that can be used for endpoint detection, transient extraction and segmentation of speech. It uses binomial probabilities to measure peak characteristics and estimate sets of endpoints to characterize an utterance. In first tests our algorithm appears to outperform existing methods. It can also be used to improve the segmentation capabilities of existing segmentation methods, and can probably benefit most speech recognition approaches. In addition to the utility of the first order peaks, the statistics of the higher order peaks are also effective for feature extraction and segmentation. The second order peaks can also be used to devise an intelligent update procedure for the feature data windows, where the window update rate changes based on the type of speech signal present. The nonparametric peak feature algorithm is flexible, efficient and very robust in noise.
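
One plausible reading of "first order" and "higher order" peaks can be sketched as follows, assuming that first order peaks are the strict local maxima of the signal and that order-n peaks are the local maxima of the sequence of order-(n-1) peak values. This definition and the function names are assumptions; the abstract does not define the terms, and the binomial-probability measure is not shown.

```python
def local_maxima(values):
    """Indices i where values[i] is a strict local maximum."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]


def higher_order_peaks(signal, order):
    """Sample indices of the order-`order` peaks of `signal`.

    Order 1 gives ordinary local maxima; each further order keeps only
    the peaks of the previous order's peak sequence (assumed definition).
    """
    indices = list(range(len(signal)))
    for _ in range(order):
        values = [signal[i] for i in indices]
        indices = [indices[j] for j in local_maxima(values)]
    return indices
```

Each additional order thins the peak set, so higher orders follow progressively slower envelope structure in the signal.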


ADAPTIVE KALMAN FILTERING OF SPEECH SIGNALS, BASED ON A BLOCK MODEL IN THE STATE SPACE AND VECTOR QUANTIZATION OF AUTOREGRESSIVE FEATURES
A.A. Kovtonyuk, A.Ya. Kalyuzhny, V.Yu. Semenov

Abstract  A novel method of adaptive Kalman filtering (KF) of noisy speech is proposed. The method is based on a block model of the autoregressive (AR) signal in state space (SS). It is shown that this representation reduces computational cost and decreases filtering error compared with known methods. An essentially new method for estimating AR parameters in the presence of noise is also developed. This method is based on optimal Bayesian estimation and vector quantization.
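
For orientation, the conventional (sample-by-sample) state-space form of an order-p AR signal observed in additive noise is written out below. The symbols are chosen here for illustration and are not taken from the paper; the authors' block model, which the abstract does not detail, presumably differs from this standard form.

```latex
% AR(p) speech model with additive observation noise (notation assumed)
s_k = \sum_{i=1}^{p} a_i \, s_{k-i} + w_k , \qquad y_k = s_k + v_k

% Companion-form state space with state
% x_k = [\, s_k, \; s_{k-1}, \; \dots, \; s_{k-p+1} \,]^{T}
x_k = F x_{k-1} + g \, w_k , \qquad y_k = h^{T} x_k + v_k

F = \begin{bmatrix}
a_1 & a_2 & \cdots & a_{p-1} & a_p \\
1   & 0   & \cdots & 0       & 0   \\
0   & 1   & \cdots & 0       & 0   \\
\vdots &  & \ddots &         & \vdots \\
0   & 0   & \cdots & 1       & 0
\end{bmatrix} ,
\qquad
g = h = [\, 1, \; 0, \; \dots, \; 0 \,]^{T}
```

A Kalman filter built on this form needs the coefficients a_i and the variances of w_k and v_k, which is why estimating AR parameters from noisy speech matters for such a method.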


SPEECH ENHANCEMENT USING NEAR-FIELD SUPERDIRECTIVITY WITH AN ADAPTIVE SIDELOBE CANCELER AND POST-FILTER
Iain A. McGowan, Darren Moore, Sridha Sridharan

Abstract  This paper describes a new microphone array technique and investigates its effectiveness for speech enhancement. A system structure consisting of a fixed near-field superdirective beamformer and an adaptive sidelobe canceling path is proposed (NFSD-ASC). The effect of adding a post-filter is also examined. The system is evaluated in terms of speech quality measures in the context of a computer workstation in an office environment. The speaker is located directly in front of the computer monitor at a distance of 60 cm and the array is designed to fit across the top of a standard 17 inch monitor. The experiments show that the array is effective in decreasing both the noise level and the amount of signal distortion when compared with standard near-field superdirectivity and the generalised sidelobe canceler.


A NEW APPROACH IN DESIGNING AN ADAPTIVE LATTICE PREDICTOR FOR NONLINEAR AND NONSTATIONARY SPEECH SIGNALS IN ADPCM USING LYAPUNOV THEORY
Seng Kah Phooi, Zhihong Man, H. R. Wu

Abstract  In this paper, we present a computationally efficient adaptive lattice-ladder predictor for adaptive prediction of nonstationary speech signals in ADPCM. An important advantage of the proposed predictor is that it predicts the signal adaptively and its algorithm does not require a priori knowledge of the time dependence among the input data. The lattice reflection coefficients and the ladder weights are adaptively adjusted by algorithms that are designed using Lyapunov theory. The proposed scheme possesses distinct advantages in stability and speed of convergence over linear adaptive LMS or RLS lattice predictors in ADPCM. The theoretical derivation of the lattice predictor is further supported by simulation examples for speech signals.
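
As background, the textbook lattice prediction-error recursion that such predictors build on can be sketched with fixed reflection coefficients. This is not the authors' algorithm: the Lyapunov-based coefficient updates and the ladder section are not described in the abstract and are omitted, and the function name and sign convention (f_m(n) = f_{m-1}(n) + k_m b_{m-1}(n-1)) are assumptions.

```python
def lattice_prediction_error(x, k):
    """Forward prediction error of a lattice filter with fixed coefficients.

    x : input samples
    k : reflection coefficients k_1..k_M (held constant in this sketch)

    Stage recursion (assumed sign convention):
        f_m(n) = f_{m-1}(n)   + k_m * b_{m-1}(n-1)
        b_m(n) = b_{m-1}(n-1) + k_m * f_{m-1}(n)
    """
    stages = len(k)
    b_prev = [0.0] * stages  # b_m(n-1) for m = 0..M-1, zero initial state
    errors = []
    for sample in x:
        f = float(sample)    # f_0(n) = x(n)
        b = float(sample)    # b_0(n) = x(n)
        b_now = []
        for m in range(stages):
            b_now.append(b)                 # remember b_m(n) for the next sample
            f_next = f + k[m] * b_prev[m]
            b = b_prev[m] + k[m] * f
            f = f_next
        b_prev = b_now
        errors.append(f)
    return errors
```

In an ADPCM coder the forward error is the residual that gets quantized; an adaptive scheme would additionally update `k` at every sample.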


VOWEL IDENTIFICATION IN SINGING AT HIGH PITCH
CW Thorpe and CI Watson

Abstract  We present a new analysis method that represents the vowel space directly by a factorial analysis of the harmonic amplitudes, without requiring explicit identification of formant frequencies. Analyses of vowels sung by male and female singers across their pitch ranges are performed with this method and also by LP formant extraction. The results indicate that even at high pitch, vowels are well separated with our new method, even though the LP analysis produces clusters of formants locked onto harmonic frequencies. This result suggests that vowel identity at high pitch may be conveyed largely by the magnitudes of individual harmonics, and that some of the observations of "vowel modification" and "convergence" in acoustic analyses of high pitch vowels may be artefacts of formant analysis.
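
The input to such an analysis, the amplitude of each harmonic of the sung pitch, can be measured without any formant tracking. The sketch below assumes the fundamental frequency `f0` is already known and that the analysed segment spans a whole number of pitch periods; the function name and parameters are hypothetical, and the factorial analysis step itself is not shown.

```python
import cmath
import math


def harmonic_amplitudes(signal, sample_rate, f0, n_harmonics):
    """Estimate the amplitude of each harmonic h * f0, h = 1..n_harmonics.

    Each amplitude is read from a single-frequency DFT evaluated exactly
    at the harmonic frequency (f0 is assumed known in this sketch).
    """
    n = len(signal)
    amplitudes = []
    for h in range(1, n_harmonics + 1):
        omega = 2.0 * math.pi * h * f0 / sample_rate  # radians per sample
        acc = sum(s * cmath.exp(-1j * omega * i) for i, s in enumerate(signal))
        amplitudes.append(2.0 * abs(acc) / n)
    return amplitudes
```

For example, a 400 Hz tone sampled at 8000 Hz over 200 samples covers exactly ten periods, so each harmonic falls on its own DFT bin and the estimates are not smeared by leakage.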