Abstract
This dissertation proposes an effective feature compensation scheme based
on a speech model for robust speech recognition. RATZ (Multivariate
Gaussian Based Cepstral Normalization) is reviewed as the representative GMM
(Gaussian Mixture Model) based feature compensation method. For
implementation on platforms with limited resources, such as embedded systems,
several alternative versions of RATZ are proposed. The considerable computational
load of conventional RATZ is significantly reduced by employing a Gaussian
selection technique. The proposed algorithm is based on interpolated RATZ and is
modified to suit a frame-synchronous recognition system. It achieves
performance equivalent to that of the original isolated RATZ at a far lower
computational load.
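
As a rough illustration of the idea, the sketch below shows a RATZ-style MMSE compensation in which posteriors are evaluated only over a shortlist of mixture components. It is a minimal sketch under stated assumptions, not the dissertation's implementation: the diagonal covariances, the function names, and the `shortlist` argument (including how such a shortlist would be produced) are all hypothetical.

```python
import math

def log_gauss_diag(y, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector y."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v)
        for yi, m, v in zip(y, mean, var)
    )

def compensate(y, weights, noisy_means, noisy_vars, corrections, shortlist=None):
    """Estimate x_hat = y - sum_k P(k|y) * r_k over the selected components.

    `corrections[k]` is the mean correction r_k of component k.
    `shortlist` (hypothetical) lists the component indices to evaluate;
    None means every component is evaluated.
    """
    idx = list(range(len(weights))) if shortlist is None else list(shortlist)
    log_p = [
        math.log(weights[k]) + log_gauss_diag(y, noisy_means[k], noisy_vars[k])
        for k in idx
    ]
    top = max(log_p)  # subtract the maximum for numerical stability
    post = [math.exp(lp - top) for lp in log_p]
    total = sum(post)
    post = [p / total for p in post]
    return [
        yi - sum(p * corrections[k][d] for p, k in zip(post, idx))
        for d, yi in enumerate(y)
    ]
```

When the posterior mass is concentrated in a few components, evaluating only those components gives nearly the same estimate while skipping most of the likelihood computations.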
Conventional RATZ requires off-line training with a noisy speech database
and is not suitable for online adaptation. In the proposed scheme, we can eliminate
the need for the noisy speech database in the off-line training by employing the
parallel model combination technique for the estimation of correction factors. The
application of the model combination technique to the mixture model alone, as
opposed to the entire HMM, makes online model combination possible. Exploiting
the availability of noise models from off-line sources, we accomplish the online
adaptation via MAP (Maximum A Posteriori) estimation. In addition, a real-time
channel estimation procedure is derived within the proposed framework. For a
more efficient implementation, a selective model combination scheme is proposed,
which leads to a reduction of the computational complexity. Representative
experimental results indicate that the proposed algorithm is effective in realizing
robust speech recognition under the combined adverse conditions of additive
background noise and channel distortion.
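
The two ingredients named above can be sketched in a few lines. This is only an illustrative sketch under assumptions that the abstract does not state: the correction factor is computed for means only, in the log-spectral domain, using the log-add approximation of parallel model combination, and the noise mean is updated with a standard MAP rule for a Gaussian mean. The function names and the prior weight `tau` are hypothetical.

```python
import math

def log_add_correction(clean_log_mean, noise_log_mean):
    """Per-bin mean shift r = log(exp(mu_x) + exp(mu_n)) - mu_x.

    Rewritten as log(1 + exp(mu_n - mu_x)) for numerical stability.
    """
    return [
        math.log1p(math.exp(n - x))
        for x, n in zip(clean_log_mean, noise_log_mean)
    ]

def map_mean(prior_mean, tau, frames):
    """MAP update of a Gaussian mean: (tau * prior + sum of frames) / (tau + count)."""
    return (tau * prior_mean + sum(frames)) / (tau + len(frames))
```

Because the combination is applied to each mixture component rather than to every state of an HMM, the number of such updates stays small, which is what makes recomputing them online plausible.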
In the conventional GMM-based method, feature restoration is
accomplished by MMSE (Minimum Mean Squared Error) estimation, in which the
posterior probability determines the extent of compensation. Since the noisy
speech is "incomplete", compensation weighted by the posterior can result in an
obscure feature. In the proposed method, we identify the components that are
likely to diminish the discriminative property of the speech feature and
re-compose the mixture model by excluding these competing components.
Candidates for distinctive features are estimated from the re-composed model.
The final feature selection is based on two measures: the average likelihood
over similar states and the standard deviation of the likelihood across
dissimilar states. The experimental results show that the suggested algorithm is
effective in producing more distinctive features and thus leads to improved
recognition performance in noisy environments.
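
A toy example may help show why posterior-weighted restoration can blur a feature and what excluding a component changes. It is a hypothetical sketch: the abstract does not give the criterion for identifying competing components, so the `excluded` set here is a placeholder supplied by the caller, and the scalar estimates are made-up values.

```python
def mmse_estimate(posteriors, component_estimates, excluded=()):
    """Posterior-weighted estimate over the components that are kept.

    Renormalizing the kept posteriors is equivalent to recomputing them
    under a mixture re-composed without the excluded components.
    """
    kept = [k for k in range(len(posteriors)) if k not in excluded]
    total = sum(posteriors[k] for k in kept)
    return sum(posteriors[k] / total * component_estimates[k] for k in kept)

# Two equally likely components pull the estimate to a point matching neither.
blurred = mmse_estimate([0.5, 0.5], [0.0, 4.0])
# Excluding the (hypothetically) competing component yields a sharper estimate.
sharp = mmse_estimate([0.5, 0.5], [0.0, 4.0], excluded=(1,))
```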