The kernel trick was first published in the paper

M. Aizerman, E. Braverman, and L. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821–837, 1964.

The kernel trick relies on Mercer's theorem, which states that any continuous, symmetric, positive semi-definite kernel K(x, y) can be expressed as an inner product in a high-dimensional space.

More specifically, if a kernel K is positive semi-definite, i.e.,

    Σ_{i=1}^{n} Σ_{j=1}^{n} K(x_i, x_j) c_i c_j ≥ 0

for every finite set of points x_1, …, x_n and real numbers c_1, …, c_n, then there exists a function φ whose image lies in an inner product space of possibly high dimension, such that

    K(x, y) = ⟨φ(x), φ(y)⟩.
The kernel trick transforms any algorithm that solely depends on the dot product between two vectors. Wherever a dot product is used, it is replaced with the kernel function. Thus, a linear algorithm can easily be transformed into a non-linear algorithm. This non-linear algorithm is the linear algorithm operating in the range space of φ. However, because kernels are used, the φ function is never explicitly computed. This is desirable, because the high-dimensional space may be infinite-dimensional (as is the case when the kernel is a Gaussian).

The kernel trick has been applied to several algorithms in machine learning and statistics, including:

- support vector machines (SVMs)
- kernel principal component analysis (kernel PCA)
- Gaussian processes
- ridge regression (kernel ridge regression)
- spectral clustering

The coiner of the term kernel trick is unknown.
