Speech is composed of an excitation sequence convolved with the impulse response of the vocal tract model. It is often desirable to eliminate one of the components so that the other may be used in a recognition algorithm. The cepstrum is a common transform that can be used to separate the excitation signal (which carries the pitch) from the vocal tract transfer function (which shapes the spectral envelope, i.e. the formants that largely determine the phones and the voice quality). These two components are convolved in the time domain, but convolution in the time domain becomes multiplication in the frequency domain:

$$X(\omega) = G(\omega)H(\omega)$$

where $G(\omega)$ is the spectrum of the excitation and $H(\omega)$ is the vocal tract transfer function.
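The convolution-to-multiplication step can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses NumPy, and the signals `g` (a pulse-train excitation) and `h` (a decaying impulse response standing in for the vocal tract) are made-up toy data, not real speech.

```python
import numpy as np

# Toy stand-ins (assumptions, not real speech): a pulse-train excitation g
# and a short decaying impulse response h for the vocal-tract filter.
g = np.zeros(64)
g[::16] = 1.0                       # one pulse every 16 samples
h = np.exp(-np.arange(16) / 3.0)    # exponentially decaying response

x = np.convolve(g, h)               # time-domain convolution, length 64 + 16 - 1 = 79

# Zero-pad every DFT to the full output length so that the DFT's circular
# convolution coincides with the linear convolution above.
n = len(x)
G, H, X = np.fft.fft(g, n), np.fft.fft(h, n), np.fft.fft(x, n)

print(np.allclose(X, G * H))        # True
```

Zero-padding matters here: the DFT implements circular convolution, so without padding to `len(g) + len(h) - 1` samples the product `G * H` would correspond to a wrapped-around version of `x`.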
Taking the logarithm of the magnitude of both sides of $X(\omega) = G(\omega)H(\omega)$ turns the product into a sum:

$$\log|X(\omega)| = \log|G(\omega)| + \log|H(\omega)|$$
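With the same kind of hypothetical toy signals (NumPy assumed), the additivity of the log-magnitude spectra can be verified directly:

```python
import numpy as np

# Toy excitation (pulse train) and toy vocal-tract response; both are assumptions.
g = np.zeros(64)
g[::16] = 1.0
h = np.exp(-np.arange(16) / 3.0)
x = np.convolve(g, h)
n = len(x)

# Log-magnitude spectra. None of these spectra has an exact zero for this
# toy example, so no floor is added inside the logarithm.
log_X = np.log(np.abs(np.fft.fft(x, n)))
log_G = np.log(np.abs(np.fft.fft(g, n)))
log_H = np.log(np.abs(np.fft.fft(h, n)))

print(np.allclose(log_X, log_G + log_H))  # True
```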
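Taking the inverse DFT (IDFT) of both sides of the log-magnitude equation gives the cepstrum, in which the two components are simply added:

$$c_x[n] = \mathrm{IDFT}\{\log|X(\omega)|\} = c_g[n] + c_h[n]$$

where $c_g$ and $c_h$ are the cepstra of the excitation and of the transfer function. The index $n$ of the cepstrum is called “quefrency”; it is the x-axis of the cepstral domain and, despite the name, has units of time.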
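A minimal real-cepstrum sketch, assuming NumPy, an 8 kHz sampling rate and a toy frame made of a 100 Hz pulse train passed through a decaying filter. The helper name `real_cepstrum` is hypothetical. It checks that the cepstrum of the convolved signal equals the sum of the two components' cepstra, and builds the quefrency axis by dividing the sample index by the sampling rate.

```python
import numpy as np

def real_cepstrum(x, n_fft=None):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum."""
    n_fft = n_fft or len(x)
    log_mag = np.log(np.abs(np.fft.fft(x, n_fft)) + 1e-12)  # tiny floor guards against log(0)
    return np.fft.ifft(log_mag).real

fs = 8000                              # assumed sampling rate (Hz)
g = np.zeros(1024)
g[:: fs // 100] = 1.0                  # toy excitation: 100 Hz pulse train (period 80 samples)
h = np.exp(-np.arange(64) / 8.0)       # toy vocal-tract impulse response
x = np.convolve(g, h)[:len(g)]         # nothing is cut: last pulse at 960, 960 + 63 < 1024

c_x = real_cepstrum(x)
c_g = real_cepstrum(g)
c_h = real_cepstrum(h, n_fft=len(g))   # zero-padded to the frame length

quefrency = np.arange(len(c_x)) / fs   # quefrency axis in seconds
print(np.allclose(c_x, c_g + c_h))     # True
```

Practical front ends usually window each frame (for example with a Hamming window) before the DFT; the window is left out here so that the decomposition holds up to floating-point rounding.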
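This process is better understood with the help of a block diagram. The log spectrum of the vocal tract response is a smooth envelope, so its cepstrum is concentrated at low quefrencies, while a voiced excitation produces peaks at the pitch period and its multiples. A lifter (a filter applied in the quefrency domain) is therefore used to separate the high-quefrency part (the excitation) from the low-quefrency part (the transfer function).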
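A simple rectangular low-quefrency lifter can be sketched as follows. Assumptions: NumPy, an 8 kHz sampling rate, a toy frame (100 Hz pulse train through a decaying filter), a hypothetical cutoff of 20 samples (2.5 ms), and a pitch search range of roughly 50 to 400 Hz. The low-quefrency part is transformed back into a smoothed log spectrum, and the strongest high-quefrency peak gives the pitch period.

```python
import numpy as np

def real_cepstrum(x):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x)) + 1e-12)).real

fs = 8000                               # assumed sampling rate (Hz)
g = np.zeros(1024)
g[:: fs // 100] = 1.0                   # toy excitation: 100 Hz pulse train
h = np.exp(-np.arange(64) / 8.0)        # toy vocal-tract impulse response
x = np.convolve(g, h)[:len(g)]          # one analysis frame

c = real_cepstrum(x)
n = len(c)
cutoff = 20                             # hypothetical lifter cutoff: 20 samples = 2.5 ms

# Low-quefrency lifter. The real cepstrum is even (c[k] == c[n - k]),
# so the mirrored tail is kept as well.
low = np.zeros(n)
low[:cutoff] = c[:cutoff]
low[n - cutoff + 1:] = c[n - cutoff + 1:]
high = c - low                          # high-quefrency remainder (excitation)

# Smoothed log-magnitude spectrum: the transfer function's envelope,
# plus some leakage from the finite pulse train
envelope = np.fft.fft(low).real

# Pitch: strongest high-quefrency peak between 2.5 ms and ~20 ms
search = slice(cutoff, fs // 50)
peak = cutoff + int(np.argmax(high[search]))
pitch_hz = fs / peak
print(peak, pitch_hz)                   # 80 100.0
```

Keeping the mirrored tail is a deliberate choice: dropping it would break the even symmetry of the liftered cepstrum and make the smoothed spectrum complex.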