Yoshua Bengio

Basically, a kernel-based SVM requires on the order of n^2 computations for training and on the order of nd computations for classification, where n is the number of training examples and d the input dimension (assuming that the number of support vectors ends up being a fraction of n, which both theory and practice suggest is to be expected). A two-class linear SVM, by contrast, requires on the order of nd computations for training (times the number of training iterations, which remains small even for large n) and on the order of d computations for classification.

So when the number of training examples is large (e.g., millions of documents, images, or customer records), kernel SVMs are too expensive for training and very expensive for classification (unless one uses one of the approximations that have been proposed, but these have not yet become widely used as far as I know; someone more expert in this area could correct me). In the case of sparse inputs, linear SVMs are also convenient because d in the above can be replaced by the average number of non-zeros in each example's input vector.
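To make the training-cost contrast concrete, here is a minimal sketch using scikit-learn (my choice of library; the answer above names none). `SVC` with an RBF kernel has roughly quadratic training cost in n, while `LinearSVC` scales roughly linearly in n, so the timing gap should widen quickly as n grows; exact numbers will vary by machine.

```python
import time
import numpy as np
from sklearn.svm import SVC, LinearSVC

rng = np.random.default_rng(0)

d = 100                                        # input dimension
for n in (1_000, 4_000, 16_000):               # growing training-set sizes
    X = rng.normal(size=(n, d))
    y = (X[:, 0] > 0).astype(int)              # simple 2-class labels

    t0 = time.perf_counter()
    SVC(kernel="rbf").fit(X, y)                # kernel SVM: ~O(n^2) training
    t_kernel = time.perf_counter() - t0

    t0 = time.perf_counter()
    LinearSVC(dual=False).fit(X, y)            # linear SVM: ~O(n*d) per pass
    t_linear = time.perf_counter() - t0

    print(f"n={n:6d}  kernel SVM: {t_kernel:7.2f}s  linear SVM: {t_linear:6.2f}s")
```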

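And to illustrate the sparse-input point, again a sketch under the same scikit-learn assumption (plus SciPy): `LinearSVC` accepts `scipy.sparse` matrices directly, so the per-example cost tracks the number of stored non-zeros rather than the nominal dimension d.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# 10,000 examples in d = 100,000 dimensions with ~50 non-zeros each:
# training touches the ~5e5 stored non-zeros, not the 1e9 dense entries.
X = sp.random(10_000, 100_000, density=0.0005, format="csr", random_state=0)
y = rng.integers(0, 2, size=10_000)

clf = LinearSVC(dual=True).fit(X, y)   # liblinear handles CSR input natively
print(clf.score(X, y))
```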