Thursday, September 10, 2015
LPC (Linear Predictive Coding)
Liftering
The liftering operation is similar to a filtering operation in the frequency domain: a desired quefrency region is selected for analysis by multiplying the whole cepstrum by a rectangular window at the desired position. Two types of liftering are performed, low-time liftering and high-time liftering. Low-time liftering extracts the vocal tract characteristics in the quefrency domain, and high-time liftering extracts the excitation characteristics of the analyzed speech frame.
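A minimal Python/NumPy sketch of the two rectangular lifters described above. It assumes the cepstrum being liftered is the real cepstrum (inverse DFT of the log magnitude spectrum); the helper names `real_cepstrum` and `lifter_window`, the cutoff of 30 quefrency bins and the random test frame are hypothetical choices for illustration, not values from this post.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    spectrum = np.fft.fft(frame)
    # Assumed tiny floor so log() never sees an exactly-zero bin.
    log_mag = np.log(np.abs(spectrum) + 1e-12)
    return np.fft.ifft(log_mag).real

def lifter_window(n_points, cutoff, kind="low"):
    """Rectangular lifter over quefrency bins 0..n_points-1 (hypothetical helper).

    The real cepstrum is symmetric (c[n] == c[N - n]), so the low-time
    window keeps bins n < cutoff and their mirror images n > N - cutoff.
    The high-time window keeps everything else.
    """
    n = np.arange(n_points)
    low = (n < cutoff) | (n > n_points - cutoff)
    if kind == "low":
        return low.astype(float)
    if kind == "high":
        return (~low).astype(float)
    raise ValueError("kind must be 'low' or 'high'")

# Usage on a random, Hamming-windowed test frame (stand-in for a speech frame).
rng = np.random.default_rng(0)
frame = rng.standard_normal(512) * np.hamming(512)
c = real_cepstrum(frame)
c_low = c * lifter_window(len(c), 30, "low")    # low-time part: vocal tract
c_high = c * lifter_window(len(c), 30, "high")  # high-time part: excitation

# DFT of the low-time part gives a smoothed log-magnitude spectrum (envelope).
envelope = np.fft.fft(c_low).real
```

Because the two windows are complementary, the low-time and high-time parts add back up to the original cepstrum, so nothing is lost by splitting it.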
Cepstrum Domain
Speech is composed of an excitation sequence convolved with the impulse response of the vocal system model. It is often desirable to eliminate one of these components so that the other can be used in a recognition algorithm. The cepstrum is a common transform that can be used to separate the excitation signal (which carries the pitch) from the transfer function (which carries the vocal tract characteristics, such as the formants that distinguish phones, and the voice quality). The two components are convolved in the time domain, but since convolution in the time domain becomes multiplication in the frequency domain, the speech spectrum can be represented as:
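X(ω) = G(ω)H(ω)

where G(ω) is the spectrum of the excitation and H(ω) is the vocal tract transfer function. Taking the logarithm of the magnitude of both sides turns the product into a sum:

log|X(ω)| = log|G(ω)| + log|H(ω)|

Taking the inverse DFT (IDFT) of both sides of this equation gives the cepstrum. Its independent variable, the x-axis of the cepstrum domain, is called “quefrency”. Quefrency has the dimension of time (samples or seconds), which is why the two kinds of liftering are called low-time and high-time liftering.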
This process is better understood with the help of a block diagram. A lifter is used to separate the high-quefrency part (excitation) from the low-quefrency part (transfer function).
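The additivity of the log spectra, and therefore of the cepstra, can be checked numerically. This sketch assumes a random sequence `g` standing in for the excitation and a decaying exponential `h` standing in for the vocal tract impulse response (both hypothetical), and zero-pads the DFT to `n_fft` points (an assumed size at least as long as the convolution) so that the DFT of the linear convolution equals the product of the two DFTs.

```python
import numpy as np

def real_cepstrum(x, n_fft):
    """Real cepstrum of x using an n_fft-point DFT (x is zero-padded)."""
    log_mag = np.log(np.abs(np.fft.fft(x, n_fft)))
    return np.fft.ifft(log_mag).real

rng = np.random.default_rng(1)
g = rng.standard_normal(64)      # hypothetical stand-in for the excitation
h = 0.9 ** np.arange(32)         # hypothetical decaying vocal tract impulse response
x = np.convolve(g, h)            # time-domain convolution, length 64 + 32 - 1 = 95

n_fft = 128                      # >= len(x), so DFT(x) == DFT(g) * DFT(h)
c_x = real_cepstrum(x, n_fft)
c_g = real_cepstrum(g, n_fft)
c_h = real_cepstrum(h, n_fft)

# Convolution became a product in frequency and a sum after the log,
# so the cepstra add up (to within floating-point rounding).
print(np.allclose(c_x, c_g + c_h))
```

Because the two contributions simply add in the quefrency domain, a window (lifter) over the quefrency axis can pull them apart when they occupy different quefrency ranges.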
Thursday, September 3, 2015
Frame Blocking of speech signal
In this step, the continuous speech signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N). The first frame consists of the first N samples. The second frame begins M samples after the first frame and overlaps it by N - M samples. Similarly, the third frame begins 2M samples after the first frame (or M samples after the second frame) and overlaps the first frame by N - 2M samples (when 2M < N). This process continues until all of the speech is accounted for within one or more frames.

Frame blocking is done because the characteristics of speech are fairly stationary when examined over a sufficiently short period of time, whereas over longer periods they change to reflect the different speech sounds being spoken. Overlapping frames are used to avoid losing information at the frame boundaries and to maintain correlation between adjacent frames.
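The framing scheme above can be sketched in Python/NumPy as follows. The function name `frame_blocking`, the example values N = 256 and M = 100, and zero-padding the last frame (one way to make sure all of the speech is accounted for) are assumptions for illustration.

```python
import numpy as np

def frame_blocking(signal, n, m):
    """Split a 1-D signal into overlapping frames of n samples spaced m apart.

    Frame i covers samples [i*m, i*m + n). The tail is zero-padded
    (an assumed choice) so that every sample falls in at least one frame.
    """
    if not 0 < m < n:
        raise ValueError("require 0 < M < N")
    signal = np.asarray(signal, dtype=float)
    if len(signal) <= n:
        num_frames = 1
    else:
        num_frames = 1 + int(np.ceil((len(signal) - n) / m))
    padded = np.zeros((num_frames - 1) * m + n)
    padded[:len(signal)] = signal
    # Row i holds the indices i*m, i*m + 1, ..., i*m + n - 1.
    idx = np.arange(n)[None, :] + m * np.arange(num_frames)[:, None]
    return padded[idx]

signal = np.arange(1000)          # toy "signal": each sample equals its index
frames = frame_blocking(signal, n=256, m=100)
print(frames.shape)               # (9, 256)
```

With these values, frame 1 shares its first N - M = 156 samples with the end of frame 0, and frame 2 shares N - 2M = 56 samples with frame 0.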
Tuesday, September 1, 2015
What is white noise?
White noise is a type of noise produced by combining sounds of all different frequencies together, with equal power at every frequency (its power spectrum is flat). If you took all of the tones that a human can hear and combined them together at equal intensity, you would have white noise.
The adjective "white" is used to describe this type of noise because of the way white light works. White light is light that is made up of all of the different colors (frequencies) of light combined together. In the same way, white noise is a combination of all of the different frequencies of sound. You can think of white noise as 20,000 tones all playing at the same time. Because white noise contains all frequencies, it is frequently used to mask other sounds.
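The "equal power at all frequencies" property can be illustrated with a short Python/NumPy sketch. It assumes Gaussian white noise, a 16 kHz sample rate, 1024-sample analysis blocks and two arbitrary comparison bands; the spectrum is estimated by averaging block periodograms (a simplified Welch estimate with no window and no overlap).

```python
import numpy as np

FS = 16000          # assumed sample rate in Hz
BLOCK = 1024        # assumed analysis block length in samples

rng = np.random.default_rng(0)
noise = rng.standard_normal(4 * FS)   # 4 s of zero-mean, unit-variance Gaussian white noise

# Average the periodograms of non-overlapping blocks to get a smooth power spectrum.
usable = len(noise) // BLOCK * BLOCK
blocks = noise[:usable].reshape(-1, BLOCK)
psd = np.mean(np.abs(np.fft.rfft(blocks, axis=1)) ** 2, axis=0) / BLOCK
freqs = np.fft.rfftfreq(BLOCK, d=1.0 / FS)

# For unit-variance white noise every bin's expected power is 1,
# so a low band and a high band should come out about equal.
low_band = psd[(freqs >= 100) & (freqs < 1000)].mean()
high_band = psd[(freqs >= 5000) & (freqs < 6000)].mean()
print(f"mean power 100-1000 Hz: {low_band:.3f}")
print(f"mean power 5-6 kHz:     {high_band:.3f}")
```

Any band chosen below the Nyquist frequency (8 kHz here) would show roughly the same average power, which is what "contains all frequencies" means in practice.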
Spectral and Temporal Features in Speech signal
There are two types of features of a speech signal:
- The temporal features (time-domain features), which are simple to extract and have an easy physical interpretation, such as the energy of the signal, zero-crossing rate, maximum amplitude, minimum energy, etc.
- The spectral features (frequency-based features), which are obtained by converting the time-based signal into the frequency domain using the Fourier transform, such as fundamental frequency, frequency components, spectral centroid, spectral flux, spectral density, spectral roll-off, etc. These features can be used to identify the notes, pitch, rhythm, and melody.
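A few of the listed features can be computed per frame with a short Python/NumPy sketch. The definitions used here are common ones but are assumptions, since several variants exist: energy as the sum of squared samples, zero-crossing rate as the fraction of adjacent sample pairs that change sign, spectral centroid as the magnitude-weighted mean frequency, and spectral roll-off as the frequency below which 85% of the spectral energy lies. The 1 kHz test tone at an 8 kHz sample rate is hypothetical.

```python
import numpy as np

def temporal_features(frame):
    """Simple time-domain features of one frame (assumed definitions)."""
    signs = np.sign(frame)
    # Note: exact zeros (sign 0) would also count as crossings here.
    crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return {
        "energy": float(np.sum(frame ** 2)),
        "zero_crossing_rate": crossings / (len(frame) - 1),
        "max_amplitude": float(np.max(np.abs(frame))),
    }

def spectral_features(frame, fs, rolloff_fraction=0.85):
    """Simple frequency-domain features of one frame via the FFT."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    centroid = np.sum(freqs * mag) / np.sum(mag)
    # Roll-off: first bin where cumulative energy reaches the given fraction.
    cumulative = np.cumsum(mag ** 2)
    rolloff_bin = np.searchsorted(cumulative, rolloff_fraction * cumulative[-1])
    return {
        "spectral_centroid_hz": float(centroid),
        "spectral_rolloff_hz": float(freqs[rolloff_bin]),
    }

fs = 8000                                                # assumed sample rate in Hz
n = np.arange(512)
frame = 0.5 * np.sin(2 * np.pi * 1000 * n / fs + 0.3)   # hypothetical 1 kHz test tone
tf = temporal_features(frame)
sf = spectral_features(frame, fs)
print(tf)
print(sf)
```

For a pure tone that fits the frame exactly, all of the spectral energy sits in one bin, so both the centroid and the roll-off land on the tone frequency; for real speech they spread out and track the spectral shape.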
On the other hand, some time-domain (temporal) features, such as the plosion index and the maximum correlation coefficient, are relatively more robust to noise.