In signal processing, independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents. This is done by assuming that the subcomponents are non-Gaussian signals and that they are statistically independent from each other. ICA is a special case of blind source separation. A common example application is the "cocktail party problem" of listening in on one person's speech in a noisy room.
Introduction
Independent component analysis attempts to decompose a multivariate signal into independent non-Gaussian signals. As an example, sound is usually a signal that is composed of the numerical addition, at each time t, of signals from several sources. The question then is whether it is possible to separate these contributing sources from the observed total signal. When the statistical independence assumption is correct, blind ICA separation of a mixed signal gives very good results. ICA is also used, for analysis purposes, on signals that are not supposed to have been generated by mixing.
A simple application of ICA is the "cocktail party problem", where the underlying speech signals are separated from sample data consisting of people talking simultaneously in a room. Usually the problem is simplified by assuming no time delays or echoes.
An important note to consider is that if N sources are present, at least N observations (e.g. microphones) are needed to recover the original signals. This constitutes the square case (J = D, where D is the input dimension of the data and J is the dimension of the model). The underdetermined (J > D) and overdetermined (J < D) cases have also been investigated.
That ICA separation of mixed signals gives very good results rests on two assumptions and three effects of mixing source signals. Two assumptions:
- The source signals are independent of each other.
- The values in each source signal have non-Gaussian distributions.
Three effects of mixing source signals:
- Independence: As per assumption 1, the source signals are independent; however, their signal mixtures are not. This is because the signal mixtures share the same source signals.
- Normality: According to the Central Limit Theorem, the distribution of a sum of independent random variables with finite variance tends towards a Gaussian distribution. Loosely speaking, a sum of two independent random variables usually has a distribution that is closer to Gaussian than any of the two original variables. Here we consider the value of each signal as the random variable.
- Complexity: The temporal complexity of any signal mixture is greater than that of its simplest constituent source signal.
Those principles contribute to the basic establishment of ICA. If the signals extracted from a set of mixtures are independent like source signals, have non-Gaussian histograms like source signals, or have low complexity like source signals, then they must be source signals.
Defining component independence
ICA finds the independent components (also called factors, latent variables or sources) by maximizing the statistical independence of the estimated components. We may choose one of many ways to define a proxy for independence, and this choice governs the form of the ICA algorithm. The two broadest definitions of independence for ICA are
- Minimization of mutual information
- Maximization of non-Gaussianity
The minimization-of-mutual-information (MMI) family of ICA algorithms uses measures such as Kullback–Leibler divergence and maximum entropy. The non-Gaussianity family of ICA algorithms, motivated by the central limit theorem, uses kurtosis and negentropy.
Typical algorithms for ICA use centering (subtract the mean to create a zero mean signal), whitening (usually with the eigenvalue decomposition), and dimensionality reduction as preprocessing steps in order to simplify and reduce the complexity of the problem for the actual iterative algorithm. Whitening and dimension reduction can be achieved with principal component analysis or singular value decomposition. Whitening ensures that all dimensions are treated equally a priori before the algorithm is run. Well-known algorithms for ICA include infomax, FastICA, JADE, and kernel-independent component analysis, among others. In general, ICA cannot identify the actual number of source signals, a uniquely correct ordering of the source signals, nor the proper scaling (including sign) of the source signals.
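As a concrete illustration of these preprocessing steps, the following sketch centers the data and whitens it with an eigenvalue decomposition (a minimal NumPy example; the function name and interface are illustrative, not taken from any particular ICA library):

```python
import numpy as np

def center_and_whiten(X):
    """Center and whiten a data matrix X (rows = signals, columns = samples).

    Subtract the mean of each signal, then use the eigenvalue decomposition of
    the covariance matrix to decorrelate and rescale the data so that it has
    identity covariance. Dimensionality reduction could be added here by
    keeping only the leading eigenvectors.
    """
    X_centered = X - X.mean(axis=1, keepdims=True)   # centering: zero-mean signals

    cov = np.cov(X_centered)                         # covariance of the signals
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalue decomposition

    # Whitening matrix E D^{-1/2} E^T
    whitening = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    Z = whitening @ X_centered                       # whitened data, identity covariance
    return Z, whitening
```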
ICA is important to blind signal separation and has many practical applications. It is closely related to (or even a special case of) the search for a factorial code of the data, i.e., a new vector-valued representation of each data vector such that it gets uniquely encoded by the resulting code vector (loss-free coding), but the code components are statistically independent.
Mathematical definitions
Linear independent component analysis can be divided into noiseless and noisy cases, where noiseless ICA is a special case of noisy ICA. Nonlinear ICA should be considered as a separate case.
General definition
The data are represented by the observed random vector $\mathbf{x}=(x_1,\ldots,x_m)^T$ and the hidden components by the random vector $\mathbf{s}=(s_1,\ldots,s_n)^T$. The task is to transform the observed data $\mathbf{x}$, using a linear static transformation $\mathbf{W}$ as $\mathbf{s}=\mathbf{W}\mathbf{x}$, into a vector of maximally independent components $\mathbf{s}$, measured by some function $F(s_1,\ldots,s_n)$ of independence.
Linear noiseless ICA
The components $x_i$ of the observed random vector $\mathbf{x}=(x_1,\ldots,x_m)^T$ are generated as a sum of the independent components $s_k$, $k=1,\ldots,n$,
$$x_i = a_{i,1} s_1 + \cdots + a_{i,k} s_k + \cdots + a_{i,n} s_n,$$
weighted by the mixing weights $a_{i,k}$.
The same generative model can be written in vector form as $\mathbf{x}=\sum_{k=1}^{n} s_k \mathbf{a}_k$, where the observed random vector $\mathbf{x}$ is represented by the basis vectors $\mathbf{a}_k=(a_{1,k},\ldots,a_{m,k})^T$. The basis vectors $\mathbf{a}_k$ form the columns of the mixing matrix $\mathbf{A}=(\mathbf{a}_1,\ldots,\mathbf{a}_n)$, and the generative formula can be written as $\mathbf{x}=\mathbf{A}\mathbf{s}$ with $\mathbf{s}=(s_1,\ldots,s_n)^T$.
Given the model and realizations (samples) $\mathbf{x}_1,\ldots,\mathbf{x}_N$ of the random vector $\mathbf{x}$, the task is to estimate both the mixing matrix $\mathbf{A}$ and the sources $\mathbf{s}$. This is done by adaptively calculating the $\mathbf{w}$ vectors and setting up a cost function which either maximizes the non-Gaussianity of the calculated $s_k=\mathbf{w}^T\mathbf{x}$ or minimizes the mutual information. In some cases, a priori knowledge of the probability distributions of the sources can be used in the cost function.
The original sources $\mathbf{s}$ can be recovered by multiplying the observed signals $\mathbf{x}$ with the inverse of the mixing matrix $\mathbf{W}=\mathbf{A}^{-1}$, also known as the unmixing matrix. Here it is assumed that the mixing matrix is square ($n=m$). If the number of basis vectors is greater than the dimensionality of the observed vectors, $n>m$, the task is overcomplete but is still solvable with the pseudoinverse.
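A small sketch of this generative model and its blind inversion is given below. The mixing matrix, source signals, and use of scikit-learn's FastICA are illustrative choices only; any ICA estimator could be substituted, and the recovered components are defined only up to permutation, sign, and scale.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_samples = 2000
t = np.linspace(0, 8, n_samples)

# Two independent, non-Gaussian source signals
s1 = np.sign(np.sin(3 * t))           # square wave
s2 = rng.laplace(size=n_samples)      # heavy-tailed noise
S = np.c_[s1, s2]                     # shape (n_samples, n_sources)

# Linear noiseless mixing: x = A s
A = np.array([[1.0, 0.5],
              [0.6, 1.2]])
X = S @ A.T                           # observed mixtures, one row per sample

# Blindly estimate the sources; order, sign and scale are not identifiable
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)          # estimated independent components
A_est = ica.mixing_                    # estimated mixing matrix
```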
Linear noisy ICA
With the added assumption of zero-mean and uncorrelated Gaussian noise $\mathbf{n}\sim N(0,\operatorname{diag}(\Sigma))$, the ICA model takes the form $\mathbf{x}=\mathbf{A}\mathbf{s}+\mathbf{n}$.
Nonlinear ICA
The mixing of the sources does not need to be linear. Using a nonlinear mixing function $f(\cdot\,|\,\theta)$ with parameters $\theta$, the nonlinear ICA model is $\mathbf{x}=f(\mathbf{s}\,|\,\theta)+\mathbf{n}$.
Identifiability
The independent components are identifiable up to a permutation and scaling of the sources. This identifiability requires that:
- at most one of the sources $s_k$ is Gaussian, and
- the number of observed mixtures, $m$, is at least as large as the number of estimated components $n$: $m \ge n$. Equivalently, the mixing matrix $\mathbf{A}$ must be of full column rank for its inverse to exist.
Binary independent component analysis
A special variant of ICA is Binary ICA in which both signal sources and monitors are in binary form and observations from monitors are disjunctive mixtures of binary independent sources. The problem was shown to have applications in many domains including medical diagnosis, multi-cluster assignment, network tomography and internet resource management.
Let $x_1, x_2, \ldots, x_m$ be the set of binary variables from $m$ monitors and $y_1, y_2, \ldots, y_n$ the set of binary variables from $n$ sources. Source-monitor connections are represented by the (unknown) mixing matrix $\mathbf{G}$, where $g_{ij}=1$ indicates that the signal from the $j$-th source can be observed by the $i$-th monitor. The system works as follows: at any time, if a source $j$ is active ($y_j=1$) and is connected to monitor $i$ ($g_{ij}=1$), then monitor $i$ observes some activity ($x_i=1$). Formally,
$$x_i = \bigvee_{j=1}^{n} \left(g_{ij} \wedge y_j\right), \quad i = 1, 2, \ldots, m,$$
where $\wedge$ is Boolean AND and $\vee$ is Boolean OR. Noise is not explicitly modelled and can instead be treated as additional independent sources.
The above problem can be heuristically solved by assuming the variables are continuous and running FastICA on the binary observation data to get a real-valued mixing matrix $\mathbf{G}$, then applying rounding techniques to $\mathbf{G}$ to obtain binary values. This approach has been shown to produce highly inaccurate results.
Another method is to use dynamic programming: recursively breaking the observation matrix $\mathbf{X}$ into sub-matrices and running the inference algorithm on these sub-matrices. The key observation which leads to this algorithm is that the sub-matrix of $\mathbf{X}$ containing the columns in which the $i$-th monitor is silent corresponds to an unbiased observation matrix of the hidden components that have no connection to the $i$-th monitor. This approach is accurate under moderate noise levels.
The Generalized Binary ICA framework introduces a broader problem formulation which does not require any knowledge of the generative model. In other words, this method attempts to decompose a source into its independent components (as much as possible, and without losing any information) with no prior assumption on the way it was generated. Although this problem appears quite complex, it can be accurately solved with a branch and bound search tree algorithm or tightly upper bounded with a single multiplication of a matrix with a vector.
Projection pursuit
Signal mixtures tend to have Gaussian probability density functions, and source signals tend to have non-Gaussian probability density functions. Each source signal can be extracted from a set of signal mixtures by taking the inner product of a weight vector with the signal mixtures; this inner product provides an orthogonal projection of the signal mixtures. The remaining challenge is finding such a weight vector. One type of method for doing so is projection pursuit.
Projection pursuit seeks one projection at a time such that the extracted signal is as non-Gaussian as possible. This contrasts with ICA, which typically extracts M signals simultaneously from M signal mixtures, which requires estimating an M × M unmixing matrix. One practical advantage of projection pursuit over ICA is that fewer than M signals can be extracted if required, where each source signal is extracted from M signal mixtures using an M-element weight vector.
We can use kurtosis to recover the multiple source signals by finding the correct weight vectors with the use of projection pursuit.
The kurtosis of the probability density function of a signal, for a finite sample, is computed as
$$K = \frac{\operatorname{E}\!\left[(\mathbf{y}-\mathbf{\overline{y}})^4\right]}{\left(\operatorname{E}\!\left[(\mathbf{y}-\mathbf{\overline{y}})^2\right]\right)^2} - 3,$$
where $\mathbf{\overline{y}}$ is the sample mean of the extracted signal $\mathbf{y}$. The constant 3 ensures that Gaussian signals have zero kurtosis, super-Gaussian signals have positive kurtosis, and sub-Gaussian signals have negative kurtosis. The denominator is the variance of $\mathbf{y}$, which ensures that the measured kurtosis takes account of signal variance.
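A minimal sketch of this sample kurtosis (plain NumPy; the function name is chosen only for illustration):

```python
import numpy as np

def sample_kurtosis(y):
    """Excess kurtosis: E[(y - mean)^4] / (E[(y - mean)^2])^2 - 3.

    Zero for Gaussian signals, positive for super-Gaussian (heavy-tailed)
    signals, negative for sub-Gaussian signals.
    """
    y = np.asarray(y, dtype=float)
    centered = y - y.mean()
    variance = np.mean(centered ** 2)
    return np.mean(centered ** 4) / variance ** 2 - 3.0
```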
Using kurtosis as a measure of non-normality, we can now examine how the kurtosis of a signal $\mathbf{y}=\mathbf{w}^T\mathbf{x}$ extracted from a set of M mixtures $\mathbf{x}=(x_1,x_2,\ldots,x_M)^T$ varies as the weight vector $\mathbf{w}$ is rotated around the origin. Given our assumption that each source signal is super-Gaussian, we would expect:
- the kurtosis of the extracted signal $\mathbf{y}$ to be maximal precisely when $\mathbf{y}=\mathbf{s}$, and
- the kurtosis of the extracted signal $\mathbf{y}$ to be maximal when $\mathbf{w}$ is orthogonal to the projected axes $S_1$ or $S_2$, because the optimal weight vector should be orthogonal to a transformed axis $S_1$ or $S_2$.
For multiple source mixture signals, we can use kurtosis and Gram-Schmidt orthogonalization (GSO) to recover the signals. Given M signal mixtures in an M-dimensional space, GSO projects these data points onto an (M-1)-dimensional space by using the weight vector. We can guarantee the independence of the extracted signals with the use of GSO.
In order to find the correct value of $\mathbf{w}$, we can use the gradient descent method. We first whiten the data, transforming $\mathbf{x}$ into a new mixture $\mathbf{z}$ with unit variance, $\mathbf{z}=(z_1,z_2)^T$; this can be achieved by applying the Cholesky or eigenvalue decomposition to the covariance matrix of $\mathbf{x}$.
Rescaling each component so that $\operatorname{E}[z_i^2]=1$, the kurtosis of the extracted signal $y=\mathbf{w}^T\mathbf{z}$ with a unit-norm $\mathbf{w}$ reduces to $K=\operatorname{E}[(\mathbf{w}^T\mathbf{z})^4]-3$, so maximizing non-Gaussianity amounts to maximizing $\operatorname{E}[(\mathbf{w}^T\mathbf{z})^4]$.
The updating process for $\mathbf{w}$ is
$$\mathbf{w}_{\text{new}} = \mathbf{w}_{\text{old}} + \eta\, \operatorname{E}\!\left[\mathbf{z}\,(\mathbf{w}_{\text{old}}^T\mathbf{z})^3\right],$$
where $\eta$ is a small step size that ensures $\mathbf{w}$ converges to the optimal solution. After each update, $\mathbf{w}_{\text{new}}$ is normalized to unit length, and the process is repeated until convergence.
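A minimal sketch of this update rule on whitened data (plain NumPy; the step size, iteration count, and function name are illustrative choices rather than prescribed values):

```python
import numpy as np

def extract_one_source(Z, eta=0.1, n_iter=500, seed=0):
    """Projection pursuit by kurtosis maximization on whitened data Z
    (rows = mixtures, columns = samples). Returns a unit weight vector w
    such that y = w @ Z is as non-Gaussian (high-kurtosis) as possible."""
    rng = np.random.default_rng(seed)
    m, n = Z.shape
    w = rng.normal(size=m)
    w /= np.linalg.norm(w)

    for _ in range(n_iter):
        y = w @ Z                              # current extracted signal
        grad = (Z * y ** 3).mean(axis=1)       # E[z (w^T z)^3]
        w = w + eta * grad                     # gradient ascent on kurtosis
        w /= np.linalg.norm(w)                 # re-normalize to unit length
    return w
```

Further components can be obtained by repeating the procedure while orthogonalizing each new weight vector against those already found, as in the Gram-Schmidt scheme described above.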
Another approach is to use negentropy instead of kurtosis. Negentropy is a robust alternative to kurtosis, which is very sensitive to outliers. The negentropy methods are based on an important property of the Gaussian distribution: a Gaussian variable has the largest entropy among all random variables of equal variance. This is also the reason why we want to find the most non-Gaussian variables. A simple proof can be found in Differential entropy.
Negentropy is defined as $J(x) = S(y) - S(x)$, where $y$ is a Gaussian random variable with the same covariance matrix as $x$ and $S$ denotes differential entropy.
An approximation for negentropy is
$$J(x) \approx \frac{1}{12}\left(\operatorname{E}[x^3]\right)^2 + \frac{1}{48}\,\operatorname{kurt}(x)^2.$$
A proof can be found on page 131 of the book Independent Component Analysis by Aapo Hyvärinen, Juha Karhunen, and Erkki Oja. This approximation suffers from the same problem as kurtosis (sensitivity to outliers), so other approximations were developed that replace the polynomial moments with more general non-quadratic functions $G$:
$$J(x) \approx k\left(\operatorname{E}[G(x)] - \operatorname{E}[G(v)]\right)^2,$$
where $v$ is a standard Gaussian variable and $k$ is a positive constant.
A choice of $G(u)=\frac{1}{a}\log\cosh(a u)$ with $1 \le a \le 2$, or $G(u)=-\exp(-u^2/2)$, has proved to be very useful.
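A hedged sketch of this approximation, assuming the log-cosh contrast with $a=1$ and $k=1$; the Gaussian expectation $\operatorname{E}[G(v)]$ is estimated here by sampling rather than taken from a closed form:

```python
import numpy as np

def negentropy_logcosh(x, n_gauss=100_000, seed=0):
    """Approximate negentropy J(x) ~ (E[G(x)] - E[G(v)])^2 with
    G(u) = log cosh(u) and v a standard Gaussian variable."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()            # standardize to zero mean, unit variance
    v = rng.standard_normal(n_gauss)        # reference Gaussian sample
    G = lambda u: np.log(np.cosh(u))
    return (G(x).mean() - G(v).mean()) ** 2
```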
Based on infomax
ICA is essentially a multivariate, parallel version of projection pursuit. Whereas projection pursuit extracts a series of signals one at a time from a set of M signal mixtures, ICA extracts M signals in parallel. This tends to make ICA more robust than projection pursuit.
The projection pursuit method uses Gram-Schmidt orthogonalization to ensure the independence of the extracted signals, while ICA uses infomax and maximum likelihood estimation to ensure independence. The non-normality of the extracted signal is achieved by assigning an appropriate model, or prior, for the signal.
The process of ICA based on infomax, in short, is: given a set of signal mixtures $\mathbf{x}$ and a set of identical independent model cumulative distribution functions (cdfs) $g$, we seek the unmixing matrix $\mathbf{W}$ which maximizes the joint entropy of the signals $\mathbf{Y}=g(\mathbf{y})$, where $\mathbf{y}=\mathbf{Wx}$ are the signals extracted by $\mathbf{W}$. Given the optimal $\mathbf{W}$, the signals $\mathbf{Y}$ have maximum entropy and are therefore independent, which ensures that the extracted signals $\mathbf{y}=g^{-1}(\mathbf{Y})$ are also independent. $g$ is an invertible function, and is the signal model. Note that if the source signal model pdf $p_{\mathbf{s}}$ matches the pdf of the extracted signals $p_{\mathbf{y}}$, then maximizing the joint entropy of $\mathbf{Y}$ also maximizes the mutual information between $\mathbf{x}$ and $\mathbf{Y}$. For this reason, using entropy to extract independent signals is known as infomax.
Consider the entropy of the vector variable $\mathbf{Y}=g(\mathbf{y})$, where $\mathbf{y}=\mathbf{Wx}$ is the set of signals extracted by the unmixing matrix $\mathbf{W}$. For a finite set of values sampled from a distribution with pdf $p_{\mathbf{y}}$, the entropy of $\mathbf{Y}$ can be estimated as
$$H(\mathbf{Y}) = -\frac{1}{N}\sum_{t=1}^{N} \ln p_{\mathbf{Y}}(\mathbf{Y}^t).$$
The joint pdf $p_{\mathbf{Y}}$ can be shown to be related to the joint pdf $p_{\mathbf{y}}$ of the extracted signals by the multivariate form
$$p_{\mathbf{Y}}(\mathbf{Y}) = \frac{p_{\mathbf{y}}(\mathbf{y})}{\left|\frac{\partial \mathbf{Y}}{\partial \mathbf{y}}\right|},$$
where $\mathbf{J}=\frac{\partial \mathbf{Y}}{\partial \mathbf{y}}$ is the Jacobian matrix. We have $|\mathbf{J}|=g'(\mathbf{y})$, and $g'$ is the pdf assumed for the source signals, $g'=p_{\mathbf{s}}$; therefore,
$$p_{\mathbf{Y}}(\mathbf{Y}) = \frac{p_{\mathbf{y}}(\mathbf{y})}{\left|\frac{\partial \mathbf{Y}}{\partial \mathbf{y}}\right|} = \frac{p_{\mathbf{y}}(\mathbf{y})}{p_{\mathbf{s}}(\mathbf{y})},$$
and therefore
$$H(\mathbf{Y}) = -\frac{1}{N}\sum_{t=1}^{N} \ln\frac{p_{\mathbf{y}}(\mathbf{y}^t)}{p_{\mathbf{s}}(\mathbf{y}^t)}.$$
We know that when $p_{\mathbf{y}}=p_{\mathbf{s}}$, $p_{\mathbf{Y}}$ is the uniform distribution and $H(\mathbf{Y})$ is maximized. Since
$$p_{\mathbf{y}}(\mathbf{y}) = \frac{p_{\mathbf{x}}(\mathbf{x})}{|\mathbf{W}|},$$
where $|\mathbf{W}|$ is the absolute value of the determinant of the unmixing matrix $\mathbf{W}$, it follows that
$$H(\mathbf{Y}) = -\frac{1}{N}\sum_{t=1}^{N} \ln\frac{p_{\mathbf{x}}(\mathbf{x}^t)}{|\mathbf{W}|\, p_{\mathbf{s}}(\mathbf{y}^t)},$$
so
$$H(\mathbf{Y}) = \frac{1}{N}\sum_{t=1}^{N} \ln p_{\mathbf{s}}(\mathbf{y}^t) + \ln|\mathbf{W}| + H(\mathbf{x}).$$
Since $H(\mathbf{x})$ does not depend on $\mathbf{W}$, we can maximize the function
$$h(\mathbf{Y}) = \frac{1}{N}\sum_{t=1}^{N} \ln p_{\mathbf{s}}(\mathbf{y}^t) + \ln|\mathbf{W}|$$
to achieve the independence of the extracted signals.
If the M marginal pdfs of the model joint pdf $p_{\mathbf{s}}$ are independent and we use the commonly adopted super-Gaussian model pdf for the source signals, $p_{\mathbf{s}}=(1-\tanh(\mathbf{s})^2)$, then we have
$$h(\mathbf{Y}) = \frac{1}{N}\sum_{i=1}^{M}\sum_{t=1}^{N} \ln\!\left(1-\tanh(\mathbf{w}_i^T\mathbf{x}^t)^2\right) + \ln|\mathbf{W}|.$$
In sum, given an observed signal mixture $\mathbf{x}$, the corresponding set of extracted signals $\mathbf{y}$, and the source signal model $p_{\mathbf{s}}=g'$, we can find the optimal unmixing matrix $\mathbf{W}$ and make the extracted signals independent and non-Gaussian. As in the projection pursuit situation, we can use the gradient descent method to find the optimal solution for the unmixing matrix.
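A minimal sketch of this optimization, assuming the $1-\tanh^2$ source model above and the natural-gradient form of the update (which avoids explicitly inverting $\mathbf{W}$ at each step); the step size, iteration count, and function name are illustrative:

```python
import numpy as np

def infomax_ica(X, eta=0.01, n_iter=1000, seed=0):
    """Infomax ICA by gradient ascent on h(Y) for whitened mixtures X
    (rows = mixtures, columns = samples), with p_s proportional to 1 - tanh(y)^2.
    Natural-gradient update: W <- W + eta * (I - 2 tanh(Y) Y^T / N) W."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.normal(size=(m, m))

    for _ in range(n_iter):
        Y = W @ X                                        # extracted signals y = W x
        grad = np.eye(m) - 2.0 * np.tanh(Y) @ Y.T / n    # natural gradient of h(Y)
        W = W + eta * grad @ W                           # ascent step
    return W
```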
Based on maximum likelihood estimation
Maximum likelihood estimation (MLE) is a standard statistical tool for finding parameter values (e.g. the unmixing matrix $\mathbf{W}$) that provide the best fit of some data (e.g., the extracted signals $\mathbf{y}$) to a given model (e.g., the assumed joint probability density function (pdf) $p_s$ of the source signals).
The ML "model" includes a specification of a pdf, which in this case is the pdf
MLE is thus based on the assumption that if the model pdf $p_s$ and the model parameters $\mathbf{A}$ are correct, then a high probability should be obtained for the data $\mathbf{x}$ that were actually observed. Conversely, if $\mathbf{A}$ is far from the correct parameter values, then a low probability of the observed data would be expected.
Using MLE, we call the probability of the observed data for a given set of model parameter values (e.g., a pdf $p_s$ and a matrix $\mathbf{A}$) the likelihood of the model parameter values given the observed data.
We define a likelihood function $L(\mathbf{W})$ of $\mathbf{W}$:
$$L(\mathbf{W}) = p_s(\mathbf{W}\mathbf{x})\,|\det\mathbf{W}|.$$
This equals the probability density at $\mathbf{x}$, since $\mathbf{s}=\mathbf{Wx}$.
Thus, if we wish to find a $\mathbf{W}$ that is most likely to have generated the observed mixtures $\mathbf{x}$ from the unknown source signals $\mathbf{s}$ with pdf $p_s$, then we need only find that $\mathbf{W}$ which maximizes the likelihood $L(\mathbf{W})$. The unmixing matrix that maximizes this likelihood is known as the MLE of the optimal unmixing matrix.
It is common practice to use the log likelihood, because it is easier to evaluate. As the logarithm is a monotonic function, the $\mathbf{W}$ that maximizes the function $L(\mathbf{W})$ also maximizes its logarithm $\ln L(\mathbf{W})$. This allows us to maximize
$$\ln L(\mathbf{W}) = \sum_{i}\sum_{t} \ln p_s(\mathbf{w}_i^T \mathbf{x}^t) + N\ln|\det\mathbf{W}|.$$
If we substitute a commonly used high-kurtosis model pdf for the source signals, $p_s=(1-\tanh(s)^2)$, then we have (after dividing by $N$)
$$\ln L(\mathbf{W}) = \frac{1}{N}\sum_{i=1}^{M}\sum_{t=1}^{N} \ln\!\left(1-\tanh(\mathbf{w}_i^T\mathbf{x}^t)^2\right) + \ln|\det\mathbf{W}|.$$
The matrix $\mathbf{W}$ that maximizes this function is the maximum likelihood estimate of the optimal unmixing matrix.
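A small sketch evaluating this log likelihood for a candidate unmixing matrix, assuming the $1-\tanh^2$ source model above (names are illustrative); any gradient-based optimizer, such as the infomax update sketched earlier, could then be used to maximize it:

```python
import numpy as np

def log_likelihood(W, X):
    """ln L(W) = (1/N) sum_i sum_t ln(1 - tanh(w_i^T x_t)^2) + ln|det W|
    for mixtures X (rows = mixtures, columns = samples)."""
    n = X.shape[1]
    Y = W @ X                                  # candidate extracted signals
    return np.sum(np.log(1.0 - np.tanh(Y) ** 2)) / n + np.log(abs(np.linalg.det(W)))
```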
History and background
The early general framework for independent component analysis was introduced by Jeanny Hérault and Bernard Ans in 1984, later joined by Christian Jutten from 1985, and was most clearly stated by Pierre Comon in 1994. In 1995, Tony Bell and Terry Sejnowski introduced a fast and efficient ICA algorithm based on infomax, a principle introduced by Ralph Linsker in 1987.
There are many algorithms available in the literature which perform ICA. A widely used one, including in industrial applications, is the FastICA algorithm, developed by Aapo Hyvärinen and Erkki Oja, which uses kurtosis as the cost function. Other examples are more closely related to blind source separation, where a more general approach is used. For example, one can drop the independence assumption and separate mutually correlated, thus statistically "dependent", signals. Sepp Hochreiter and Jürgen Schmidhuber showed how to obtain nonlinear ICA or source separation as a by-product of regularization (1999). Their method does not require a priori knowledge about the number of independent sources.
Applications
ICA can be extended to analyze non-physical signals. For instance, ICA has been applied to discover discussion topics on a bag of news list archives.
Some ICA applications are listed below: