The kernel trick was first published in the following paper:
M. Aizerman, E. Braverman, and L. Rozonoer. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control, 25:821–837, 1964.
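The kernel trick uses Mercer's theorem, which states that any continuous, symmetric, positive semi-definite kernel K(x, y) can be expressed as a dot product in a high-dimensional (possibly infinite-dimensional) space.
More specifically, if a kernel is positive semi-definite, i.e.,
$$\sum_{i=1}^{n} \sum_{j=1}^{n} K(x_i, x_j)\, c_i c_j \geq 0$$
for every finite set of points x_1, ..., x_n and all real coefficients c_1, ..., c_n, then there exists a function φ(x), mapping into an inner product space of possibly high dimension, such that K(x, y) = φ(x) · φ(y).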
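As a concrete illustration of the statement above, the following minimal sketch checks numerically that one particular kernel equals a dot product in a higher-dimensional space. The choice of kernel (the degree-2 homogeneous polynomial kernel on 2-D inputs), the helper names `poly_kernel` and `feature_map`, and the random test data are assumptions made for this example only; they do not come from the original text.

```python
import numpy as np


def poly_kernel(x, y):
    """Degree-2 homogeneous polynomial kernel: K(x, y) = (x . y)^2."""
    return float(np.dot(x, y)) ** 2


def feature_map(x):
    """Explicit feature map phi for the kernel above, for 2-D inputs only.

    phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2), so phi(x) . phi(y) = (x . y)^2.
    """
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])


# Hypothetical sample data: five random points in the plane.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))

# Gram matrix computed with the kernel, never leaving the 2-D input space.
gram_kernel = np.array([[poly_kernel(a, b) for b in X] for a in X])

# Gram matrix computed as ordinary dot products in the 3-D feature space.
Phi = np.array([feature_map(a) for a in X])
gram_explicit = Phi @ Phi.T

# The two agree up to floating-point error.
kernel_matches_dot_product = np.allclose(gram_kernel, gram_explicit)

# Positive semi-definiteness: no eigenvalue is meaningfully below zero.
min_eigenvalue = np.linalg.eigvalsh(gram_kernel).min()
```

Here the feature space is only three-dimensional, so the explicit map is cheap. The practical value of the kernel trick appears when φ maps into a space too large to compute in directly, while K(x, y) remains inexpensive to evaluate.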
The kernel trick has been applied to several algorithms in machine learning and statistics, including:
- Support vector machine
- Principal component analysis
- Fisher's linear discriminant analysis
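To show how one of the algorithms listed above can be expressed purely in terms of kernel evaluations, here is a rough sketch of kernel principal component analysis. It assumes a Gaussian (RBF) kernel with a hypothetical width parameter `gamma`, and the function names `rbf_kernel_matrix` and `kernel_pca` are invented for this illustration. It is a simplified sketch under those assumptions, not a reference implementation.

```python
import numpy as np


def rbf_kernel_matrix(X, gamma=1.0):
    """Gram matrix of the Gaussian kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kernel_pca(X, n_components=2, gamma=1.0):
    """Project the training points onto the leading kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel_matrix(X, gamma)

    # Center the data in feature space using only the Gram matrix.
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # For unit-norm eigenvectors, the projection of training point i onto
    # component k is sqrt(lambda_k) * eigvecs[i, k].
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


# Hypothetical sample data: 20 random points in three dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Z = kernel_pca(X, n_components=2, gamma=0.5)
```

Note that the feature map φ never appears in the code: every step uses only the Gram matrix of kernel values, which is what allows the kernel to be swapped without changing the algorithm.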
See also: