Blind Source Separation
The term blind source separation can refer to many different things. In this short article, we limit the discussion to algorithms that separate multiple sources, or extract information from a linear mixture of multiple sources, with no training data and no prior knowledge of the signal parameters, whether or not multiple reception sensors are available.
Linear Mixture Model
For a set of independent sources, x(t) = {x1(t), x2(t), … , xN(t)}, we observe the output set, y(t) = {y1(t), y2(t), … , yM(t)}, given by

y(t) = H x(t) + N(t),

where H is the M × N linear mixing matrix and N(t) is the additive noise term for each channel.
The above is formulated as a multiple-input/multiple-output problem. If a single source is the target, beamforming is usually the appropriate approach. In general, blind source separation algorithms are techniques that separate all the component sources, or a subset of a few dominant sources, while suppressing the additive noise term N(t) in some statistical sense.
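As a concrete illustration of the model above, the following is a minimal sketch of an instantaneous linear mixture in Python; the source waveforms, mixing matrix, and noise level are arbitrary assumptions chosen only for demonstration.

```python
# Minimal sketch of the instantaneous linear mixture model y(t) = H x(t) + N(t).
import numpy as np

rng = np.random.default_rng(0)
T = 10_000                          # number of time samples (assumed)
t = np.arange(T) / 8000.0           # assume an 8 kHz sampling rate

# N = 2 independent sources: a sinusoid and a sawtooth-like waveform
x = np.vstack([
    np.sin(2 * np.pi * 440 * t),
    2 * ((200 * t) % 1.0) - 1.0,
])                                  # shape (N, T)

# M = 3 observation channels: y = H x + noise
H = rng.normal(size=(3, 2))         # 3 x 2 mixing matrix (M > N)
noise = 0.01 * rng.normal(size=(3, T))
y = H @ x + noise                   # shape (M, T)
```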
In the context of speech and acoustic signal processing, the above model must be extended to convolutive mixing rather than a simple matrix multiplication: room reverberation causes each microphone to observe filtered and delayed versions of every source, so each entry of the mixing matrix becomes an impulse response, as sketched below.
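The following sketch illustrates convolutive mixing; short exponentially decaying random filters stand in for measured room impulse responses, purely as an assumption for demonstration.

```python
# Minimal sketch of convolutive mixing: each microphone observes a sum of
# filtered sources rather than a simple weighted sum,
#   y_m(t) = sum_n (h_mn * x_n)(t).
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(1)
T = 10_000
x = rng.laplace(size=(2, T))        # two illustrative source signals

# h[m, n]: impulse response from source n to microphone m; exponentially
# decaying random taps stand in for a real room impulse response
L = 256
h = rng.normal(size=(3, 2, L)) * np.exp(-np.arange(L) / 50.0)

y = np.zeros((3, T + L - 1))
for m in range(3):
    for n in range(2):
        y[m] += fftconvolve(x[n], h[m, n])
```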
Overdetermined BSS
An intensively studied class of multichannel blind source separation problems arises when the number of sources, N, is smaller than the number of channels, M. Independent component analysis (ICA) is the main tool; it assumes the N sources are statistically mutually independent. The approach exploits spatial diversity to discriminate between the desired and undesired sources. It is essentially an unsupervised learning algorithm that adaptively forms a spatial null toward the undesired signal.
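Below is a minimal sketch of instantaneous ICA on a synthetic overdetermined mixture using scikit-learn's FastICA (a recent scikit-learn version is assumed). The sources, mixing matrix, and parameter choices are illustrative assumptions, and the recovered components are only defined up to permutation and scaling.

```python
# Minimal sketch: recover N = 2 independent sources from M = 3 mixtures.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
T = 10_000
t = np.arange(T) / 8000.0

# Two statistically independent, non-Gaussian sources
x = np.vstack([
    np.sin(2 * np.pi * 5 * t),
    np.sign(np.sin(2 * np.pi * 3 * t)),
])

# Overdetermined mixture: M = 3 channels, N = 2 sources
H = rng.normal(size=(3, 2))
y = H @ x + 0.01 * rng.normal(size=(3, T))

# FastICA expects samples in rows (n_samples x n_features)
ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
x_hat = ica.fit_transform(y.T).T    # shape (2, T), recovered only up to
                                    # permutation and scaling
```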
Underdetermined BSS
In this class of problems, there are more sources than channels, N > M. This is a challenging problem, and in theory ICA by itself cannot resolve all the sources. However, by exploiting the sparseness of speech in the time-frequency domain, in addition to the spatial domain, we can still untangle the sources in many situations, as sketched below.
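One common way to exploit time-frequency sparseness is DUET-style binary masking: cluster the bins of a two-channel short-time Fourier transform by inter-channel level and delay, then assign each bin to a single source. The sketch below follows that idea under assumed parameters (sampling rate, frame length, number of sources); it is an illustration rather than a tuned implementation.

```python
# Minimal DUET-style sketch: M = 2 channels, N = 3 sources.
import numpy as np
from scipy.signal import stft, istft
from sklearn.cluster import KMeans

def duet_separate(y, fs=8000, nperseg=512, n_sources=3):
    """Separate a 2-channel mixture y (shape (2, T)) by clustering T-F bins
    on inter-channel level/delay features and applying binary masks."""
    f, _, Y1 = stft(y[0], fs=fs, nperseg=nperseg)
    _, _, Y2 = stft(y[1], fs=fs, nperseg=nperseg)
    eps = 1e-12
    ratio = (Y2 + eps) / (Y1 + eps)
    level = np.clip(np.abs(ratio), 0.0, 10.0)       # level difference (clipped)
    delay = -np.angle(ratio) / (2 * np.pi * np.maximum(f[:, None], 1.0))
    feats = np.stack([level.ravel(), delay.ravel()], axis=1)
    labels = KMeans(n_clusters=n_sources, n_init=10,
                    random_state=0).fit_predict(feats)
    estimates = []
    for k in range(n_sources):
        mask = (labels == k).reshape(Y1.shape)      # binary T-F mask
        _, s_hat = istft(mask * Y1, fs=fs, nperseg=nperseg)
        estimates.append(s_hat)
    return estimates                                # N time-domain estimates
```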
Single Channel BSS
Single-channel BSS, or monaural BSS, deals with the case where only one channel is available but there is more than one source. Traditional approaches make use of time-frequency structures of speech, such as harmonicity and temporal burstiness, to build a probabilistic source model. The target source is then derived by maximizing the a posteriori probability of the source given the observations.
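As a highly simplified illustration of exploiting harmonicity (not the full probabilistic maximum a posteriori framework described above), the sketch below estimates a per-frame pitch by harmonic summation over the spectrogram and keeps only the time-frequency bins near the harmonics of that pitch; the candidate pitch range, frame length, and tolerance are assumptions.

```python
# Minimal sketch: extract the dominant harmonic source from one channel.
import numpy as np
from scipy.signal import stft, istft

def harmonic_mask_separate(y, fs=8000, nperseg=512, tol_hz=40.0):
    """Per-frame pitch estimation by harmonic summation, then binary masking."""
    f, _, Y = stft(y, fs=fs, nperseg=nperseg)
    mag = np.abs(Y)
    df = f[1] - f[0]                                # frequency resolution (Hz)
    f0_grid = np.arange(80.0, 400.0, 5.0)           # assumed candidate pitches
    mask = np.zeros_like(mag)
    for j in range(Y.shape[1]):                     # one frame at a time
        scores = []
        for f0 in f0_grid:
            harmonics = np.arange(f0, fs / 2, f0)
            bins = np.clip(np.round(harmonics / df).astype(int), 0, len(f) - 1)
            scores.append(mag[bins, j].sum())       # harmonic-summation score
        f0 = f0_grid[int(np.argmax(scores))]
        harmonics = np.arange(f0, fs / 2, f0)
        near = np.min(np.abs(f[:, None] - harmonics[None, :]), axis=1) < tol_hz
        mask[:, j] = near                           # keep bins near harmonics
    _, s_hat = istft(mask * Y, fs=fs, nperseg=nperseg)
    return s_hat
```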