Multiple kernel learning refers to a set of machine learning methods that use a predefined set of kernels and learn an optimal linear or non-linear combination of kernels as part of the algorithm. Reasons to use multiple kernel learning include a) the ability to select an optimal kernel and parameters from a larger set of kernels, reducing bias due to kernel selection while allowing for more automated machine learning methods, and b) the ability to combine data from different sources (e.g., sound and images from a video) that have different notions of similarity and thus require different kernels. Instead of creating a new kernel, multiple kernel algorithms can be used to combine kernels already established for each individual data source.
Multiple kernel learning approaches have been used in many applications, such as event recognition in video, object recognition in images, and biomedical data fusion.
Algorithms
Multiple kernel learning algorithms have been developed for supervised, semi-supervised, as well as unsupervised learning. Most work has been done on the supervised learning case with linear combinations of kernels; however, many algorithms have been developed. The basic idea behind multiple kernel learning algorithms is to add an extra parameter to the minimization problem of the learning algorithm. As an example, consider the case of supervised learning of a linear combination of a set of $n$ kernels

$$K' = \sum_{i=1}^{n} \beta_i K_i$$

where $\beta$ is a vector of coefficients for each kernel. Because the kernels are additive (due to properties of reproducing kernel Hilbert spaces), this new function is still a kernel. For a set of data $X$ with labels $Y$, the minimization problem can then be written as

$$\min_{\beta, c} \mathrm{E}(Y, K'c) + R(K, c)$$

where $\mathrm{E}$ is an error function and $R$ is a regularization term. $\mathrm{E}$ is typically the square loss function (Tikhonov regularization) or the hinge loss function (for SVM algorithms), and $R$ is usually an $\ell_n$ norm or some combination of the norms (i.e., elastic net regularization). This optimization problem can then be solved by standard optimization methods.
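As an illustration, here is a minimal sketch of this setup with fixed weights $\beta$: three base kernels are combined into $K'$, and the resulting single-kernel SVM problem (hinge loss $\mathrm{E}$ with the usual SVM regularizer $R$) is solved with scikit-learn. The kernel choices, weights, and toy data are assumptions for illustration only.

```python
# A minimal sketch of a fixed linear kernel combination for supervised MKL,
# solved via scikit-learn's precomputed-kernel SVM. Kernels, weights, and
# data are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel, linear_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                # toy training data
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # toy labels

# Base kernels K_1 ... K_n evaluated on the training data
kernels = [linear_kernel(X), polynomial_kernel(X, degree=2), rbf_kernel(X, gamma=0.5)]

# Combined kernel K' = sum_i beta_i K_i with fixed nonnegative weights
beta = np.array([0.2, 0.3, 0.5])
K_combined = sum(b * K for b, K in zip(beta, kernels))

# Solve the resulting single-kernel SVM problem
clf = SVC(kernel="precomputed").fit(K_combined, y)
print("training accuracy:", clf.score(K_combined, y))
```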
Supervised learning
For supervised learning, there are many other algorithms that use different methods to learn the form of the kernel. The following categorization has been proposed by Gönen and Alpaydın (2011).
Fixed rules approaches
Fixed rules approaches such as the linear combination algorithm described above use rules to set the combination of the kernels. These do not require parameterization and use rules like summation and multiplication to combine the kernels. The weighting is learned in the algorithm. Other examples of fixed rules include pairwise kernels, which are of the form

$$k\big((x_{1i}, x_{1j}), (x_{2i}, x_{2j})\big) = k(x_{1i}, x_{2i})\,k(x_{1j}, x_{2j}) + k(x_{1i}, x_{2j})\,k(x_{1j}, x_{2i})$$
These pairwise approaches have been used in predicting protein-protein interactions.
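A minimal sketch of such a pairwise kernel, instantiated here with an RBF base kernel (an illustrative assumption), might look as follows:

```python
# A minimal sketch of the pairwise fixed-rule kernel above; the RBF base
# kernel and the toy feature vectors are illustrative assumptions.
import numpy as np

def rbf(a, b, gamma=1.0):
    """Base kernel k(a, b) between two single examples."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def pairwise_kernel(pair1, pair2, k=rbf):
    """k((x1i,x1j),(x2i,x2j)) = k(x1i,x2i)k(x1j,x2j) + k(x1i,x2j)k(x1j,x2i)."""
    x1i, x1j = pair1
    x2i, x2j = pair2
    return k(x1i, x2i) * k(x1j, x2j) + k(x1i, x2j) * k(x1j, x2i)

# Example: similarity between two candidate interacting pairs, each member
# represented by a toy feature vector.
rng = np.random.default_rng(1)
pair_a = (rng.normal(size=4), rng.normal(size=4))
pair_b = (rng.normal(size=4), rng.normal(size=4))
print(pairwise_kernel(pair_a, pair_b))
```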
Heuristic approaches
These algorithms use a combination function that is parameterized. The parameters are generally defined for each individual kernel based on single-kernel performance or some computation from the kernel matrix. Examples of these include the kernel from Tanabe et al. (2008). Letting $\pi_m$ be the accuracy obtained using only $K_m$, and letting $\delta$ be a threshold less than the minimum of the single-kernel accuracies, we can define

$$\beta_m = \frac{\pi_m - \delta}{\sum_{h=1}^{n} (\pi_h - \delta)}$$
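As a sketch, this weighting rule is a few lines of NumPy; the single-kernel accuracies $\pi_m$ below are placeholder values, not results from any experiment.

```python
# A minimal sketch of the accuracy-based heuristic weighting:
# beta_m = (pi_m - delta) / sum_h (pi_h - delta).
import numpy as np

pi = np.array([0.85, 0.78, 0.92])   # placeholder single-kernel accuracies
delta = 0.75                        # threshold below min(pi), chosen by hand

beta = (pi - delta) / np.sum(pi - delta)
print(beta, beta.sum())             # nonnegative weights summing to 1
```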
Other approaches use a definition of kernel similarity, such as

$$A(K_1, K_2) = \frac{\langle K_1, K_2 \rangle}{\sqrt{\langle K_1, K_1 \rangle \langle K_2, K_2 \rangle}}$$
Using this measure, Qiu and Lane (2009) used the following heuristic to define

$$\beta_m = \frac{A(K_m, YY^T)}{\sum_{h=1}^{n} A(K_h, YY^T)}$$
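A sketch of this alignment-based heuristic follows; the toy data and kernel choices are assumptions for illustration.

```python
# A minimal sketch of the kernel-alignment heuristic: each weight is the
# alignment of K_m with the ideal kernel YY^T, normalized over all kernels.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

def alignment(K1, K2):
    """A(K1, K2) = <K1, K2>_F / sqrt(<K1, K1>_F <K2, K2>_F)."""
    return np.sum(K1 * K2) / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 4))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
ideal = np.outer(y, y)                      # YY^T, the "target" kernel

kernels = [linear_kernel(X), rbf_kernel(X, gamma=0.1), rbf_kernel(X, gamma=2.0)]
scores = np.array([alignment(K, ideal) for K in kernels])
beta = scores / scores.sum()
print(beta)
```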
Optimization approaches
These approaches solve an optimization problem to determine parameters for the kernel combination function. This has been done with similarity measures and structural risk minimization approaches. For similarity measures such as the one defined above, the problem can be formulated as follows:

$$\max_{\beta} \; A(K'_{\text{tra}}, YY^T)$$

where $K'_{\text{tra}} = \sum_{m} \beta_m K_m$ is the combined kernel of the training data.
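One way to sketch this optimization numerically is with a generic bounded optimizer over nonnegative weights; the solver choice and toy kernels below are illustrative assumptions, not the method of any particular paper.

```python
# A minimal sketch of alignment maximization over nonnegative kernel weights,
# solved numerically with SciPy.
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

def alignment(K1, K2):
    return np.sum(K1 * K2) / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))
y = np.where(X[:, 1] > 0, 1.0, -1.0)
ideal = np.outer(y, y)
kernels = [linear_kernel(X), rbf_kernel(X, gamma=0.1), rbf_kernel(X, gamma=1.0)]

def neg_alignment(beta):
    K = sum(b * K_m for b, K_m in zip(beta, kernels))
    return -alignment(K, ideal)

res = minimize(neg_alignment, x0=np.ones(3) / 3,
               bounds=[(0, None)] * 3)   # enforce beta_m >= 0
print(res.x / res.x.sum())               # normalized learned weights
```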
Structural risk minimization approaches that have been used include linear approaches, such as that used by Lanckriet et al. (2002). We can define the implausibility of a kernel, $\omega(K)$, to be the value of the objective function after solving a canonical SVM problem. We can then solve the following minimization problem:

$$\min_{K' \succeq 0} \; \omega(K'_{\text{tra}})$$

where $K'_{\text{tra}}$ is the combined kernel restricted to the training data and the combined kernel is constrained to be positive semidefinite (with a trace constraint to bound its scale); Lanckriet et al. solve this problem with semidefinite programming.
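As a sketch, the implausibility $\omega(K)$ can be read off a fitted SVM as the optimal value of its dual objective; recovering it from scikit-learn's dual coefficients, as below, is an implementation assumption.

```python
# A minimal sketch of omega(K): the optimal SVM dual objective for a fixed K.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def omega(K, y, C=1.0):
    clf = SVC(kernel="precomputed", C=C).fit(K, y)
    sv = clf.support_                  # indices of support vectors
    ay = clf.dual_coef_.ravel()        # alpha_i * y_i for support vectors
    alpha = np.abs(ay)
    # Dual objective: sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K_ij
    return alpha.sum() - 0.5 * ay @ K[np.ix_(sv, sv)] @ ay

rng = np.random.default_rng(4)
X = rng.normal(size=(80, 3))
y = np.where(X[:, 0] + X[:, 2] > 0, 1, -1)
for gamma in (0.01, 0.5, 5.0):         # compare implausibility across kernels
    print(gamma, omega(rbf_kernel(X, gamma=gamma), y))
```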
Bayesian approaches
Bayesian approaches put priors on the kernel parameters and learn the parameter values from the priors and the base algorithm. For example, the decision function can be written as

$$f(x) = \sum_{i=0}^{n} \alpha_i \sum_{m=1}^{p} \eta_m K_m(x_i^m, x^m)$$

where the kernel weights $\eta$ can be modeled with a Dirichlet prior and the sample weights $\alpha$ with a zero-mean Gaussian and an inverse-gamma variance prior. The model can then be estimated, for example, with a customized multinomial probit approach using a Gibbs sampler.
These methods have been used successfully in applications such as protein fold recognition and protein homology problems.
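To make the generative form concrete, here is a sketch that draws $\eta$ from a Dirichlet prior and $\alpha$ from a zero-mean Gaussian and evaluates $f(x)$ once; posterior inference (e.g., the Gibbs sampler) is omitted, and the kernels and data are assumptions.

```python
# A minimal sketch of the Bayesian decision function under one prior draw;
# real methods infer the posterior over eta and alpha, which is omitted here.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

rng = np.random.default_rng(5)
X = rng.normal(size=(40, 3))               # training inputs x_i
x_new = rng.normal(size=(1, 3))            # test input x

# K_m(x_i, x) for each base kernel m
K_new = [linear_kernel(X, x_new).ravel(), rbf_kernel(X, x_new, gamma=0.5).ravel()]

eta = rng.dirichlet(alpha=np.ones(len(K_new)))  # prior draw of kernel weights
alpha = rng.normal(scale=1.0, size=len(X))      # prior draw of sample weights

f = alpha @ sum(e * k for e, k in zip(eta, K_new))
print("f(x) under one prior draw:", f)
```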
Boosting approaches
Boosting approaches add new kernels iteratively until some stopping criterion that is a function of performance is reached. An example of this is the MARK model developed by Bennett et al. (2002):

$$f(x) = \sum_{i=1}^{N} \sum_{m=1}^{P} \alpha_i^m K_m(x_i^m, x^m) + b$$

The parameters $\alpha_i^m$ and $b$ are learned by gradient descent on a coordinate basis. In this way, each iteration of the descent algorithm identifies the best kernel column to add to the combined kernel, and the model is then rerun to generate the optimal weights.
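A sketch of a boosting-style loop in this spirit: greedily add whichever base kernel most improves a fit criterion, and stop when improvement stalls. The least-squares base learner and stopping rule below are simplifications for illustration, not MARK itself.

```python
# A minimal sketch of greedy, boosting-style kernel selection.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel, polynomial_kernel

rng = np.random.default_rng(6)
X = rng.normal(size=(70, 4))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=70)

candidates = [linear_kernel(X), polynomial_kernel(X, degree=2),
              rbf_kernel(X, gamma=0.2), rbf_kernel(X, gamma=2.0)]

def loss(K):
    model = KernelRidge(kernel="precomputed", alpha=1.0).fit(K, y)
    return np.mean((model.predict(K) - y) ** 2)

chosen, K_sum, best = [], np.zeros_like(candidates[0]), np.inf
for _ in range(len(candidates)):
    trials = [loss(K_sum + K) for K in candidates]
    m = int(np.argmin(trials))
    if trials[m] >= best - 1e-6:   # stopping criterion: no improvement
        break
    best, K_sum = trials[m], K_sum + candidates[m]
    chosen.append(m)
print("kernels added in order:", chosen, "final MSE:", best)
```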
Semisupervised learning
Semisupervised learning approaches to multiple kernel learning are similar to other extensions of supervised learning approaches. An inductive procedure has been developed that uses a log-likelihood empirical loss and group LASSO regularization with conditional expectation consensus on unlabeled data for image categorization. We can define the problem as follows. Let $L = \{(x_i, y_i)\}$ be the labeled data and $U = \{x_i\}$ the unlabeled data. The decision function can then be written as

$$f(x) = \alpha_0 + \sum_{i=1}^{|L|} \alpha_i K(x_i, x)$$

The problem can be written as

$$\min_{f} \; L(f) + \lambda R(f) + \gamma \Theta(f)$$

where $L$ is the loss function (here, a weighted negative log-likelihood), $R$ is the regularization term (here, group LASSO), and $\Theta$ is the conditional expectation consensus (CEC) penalty on the unlabeled data. The CEC penalty enforces agreement between a reference class expectation $q_m^{\pi}$ and the model's expectation $p_m^{\pi}$, each computed from the marginal kernel density of kernel $m$ under projection $\pi$:

$$\Theta = \frac{1}{\Pi} \sum_{\pi=1}^{\Pi} \sum_{m=1}^{M} D\left(q_m^{\pi} \,\middle\|\, p_m^{\pi}\right)$$

where

$$D(Q \,\|\, P) = \sum_{i} Q(i) \ln \frac{Q(i)}{P(i)}$$

is the Kullback-Leibler divergence. The combined minimization problem can be optimized using a modified block gradient descent algorithm.
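A sketch of the Kullback-Leibler term used in the CEC penalty follows; the discrete distributions below are placeholders standing in for $q_m^{\pi}$ and $p_m^{\pi}$.

```python
# A minimal sketch of the KL-divergence penalty averaged over kernels and
# projections; the expectations here are placeholder values.
import numpy as np

def kl_divergence(q, p, eps=1e-12):
    """D(Q || P) = sum_i Q(i) ln(Q(i) / P(i)) for discrete distributions."""
    q = np.asarray(q, dtype=float) + eps
    p = np.asarray(p, dtype=float) + eps
    q, p = q / q.sum(), p / p.sum()
    return float(np.sum(q * np.log(q / p)))

# Placeholder expectations over two classes for two kernel/projection pairs
q_all = np.array([[0.7, 0.3], [0.55, 0.45]])   # reference expectations q
p_all = np.array([[0.6, 0.4], [0.5, 0.5]])     # model expectations p

theta = np.mean([kl_divergence(q, p) for q, p in zip(q_all, p_all)])
print("CEC penalty:", theta)
```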
Unsupervised learning
Unsupervised multiple kernel learning algorithms have also been proposed by Zhuang et al. The problem is defined as follows. Let $U = \{x_i\}$ be a set of unlabeled data, and let the combined kernel be $K' = \sum_{m=1}^{M} \beta_m K_m$. The goal is to group the data based on kernel distances. Let $B_i$ be the group (neighborhood) of which $x_i$ is a member. The optimization problem can then be written as

$$\min_{\beta, B} \; \sum_{i=1}^{n} \Big\| x_i - \sum_{x_j \in B_i} K(x_i, x_j)\, x_j \Big\|^2 + \gamma_1 \sum_{i=1}^{n} \sum_{x_j \in B_i} K(x_i, x_j) \left\| x_i - x_j \right\|^2 + \gamma_2 \sum_{i} |B_i|$$

where the first term rewards reconstructing each point from its neighbors, the second penalizes kernel-weighted distortion, and the third regularizes the neighborhood sizes. One formulation of this is defined as follows. Let $D \in \{0,1\}^{n \times n}$ be a matrix such that $D_{ij} = 1$ if $x_i$ and $x_j$ are neighbors, so that $B_i = \{x_j : D_{ij} = 1\}$. The groups $B$ and the weights $\beta$ must be learned jointly, which Zhuang et al. do by alternating minimization.
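As a sketch, the objective can be evaluated for fixed neighborhoods as below; choosing $B_i$ as the $k$ most kernel-similar points and the values of $\gamma_1, \gamma_2$ are illustrative assumptions.

```python
# A minimal sketch of the unsupervised MKL objective for fixed neighborhoods:
# reconstruction error + kernel-weighted distortion + neighborhood-size penalty.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel

rng = np.random.default_rng(7)
X = rng.normal(size=(30, 3))
kernels = [linear_kernel(X), rbf_kernel(X, gamma=0.5)]

def objective(beta, n_neighbors=5, gamma1=0.1, gamma2=0.01):
    K = sum(b * Km for b, Km in zip(beta, kernels))
    total = 0.0
    for i in range(len(X)):
        # B_i: most similar points under the combined kernel (skip the top
        # entry, which is typically the point itself)
        nbrs = np.argsort(K[i])[::-1][1:n_neighbors + 1]
        recon = X[i] - K[i, nbrs] @ X[nbrs]          # x_i - sum_j K_ij x_j
        distort = np.sum(K[i, nbrs] * np.sum((X[i] - X[nbrs]) ** 2, axis=1))
        total += recon @ recon + gamma1 * distort + gamma2 * len(nbrs)
    return total

# One could alternate: fix B and optimize beta, then refresh B, and repeat.
print(objective(np.array([0.5, 0.5])))
```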
MKL Libraries
Available MKL libraries include:
- SPG-GMKL: a scalable C++ MKL SVM library that can handle a very large number of kernels.
- GMKL: generalized multiple kernel learning code in MATLAB, supporting $\ell_1$ and $\ell_2$ regularization for supervised learning.
- SMO-MKL: C++ source code of a sequential minimal optimization MKL algorithm with $p$-norm regularization.
- SimpleMKL: MATLAB code based on the SimpleMKL algorithm for MKL SVM.
- MKLpy: a Python framework for MKL and kernel machines, compatible with scikit-learn, with algorithms such as EasyMKL.
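As a usage sketch, an MKL library is typically given a list of precomputed base kernels. The EasyMKL call below follows MKLpy's published examples, but the exact API is recalled, not guaranteed; check the library's documentation.

```python
# A hedged sketch of calling an MKL library (MKLpy's EasyMKL); the call
# signatures below are assumptions to verify against MKLpy's docs.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel
from MKLpy.algorithms import EasyMKL  # assumes MKLpy is installed

rng = np.random.default_rng(8)
X_tr, X_te = rng.normal(size=(60, 4)), rng.normal(size=(20, 4))
y_tr = np.where(X_tr[:, 0] > 0, 1, 0)

# Lists of base kernels: train x train, and test x train
KL_tr = [linear_kernel(X_tr), rbf_kernel(X_tr, gamma=0.5)]
KL_te = [linear_kernel(X_te, X_tr), rbf_kernel(X_te, X_tr, gamma=0.5)]

mkl = EasyMKL(lam=0.1).fit(KL_tr, y_tr)  # learns the kernel weights
print(mkl.predict(KL_te))
```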