Gradient descent (GD) is a universal method for devising training algorithms for many different types of problems and models. It came to data mining from neural networks, where GD is the foundation of the Backpropagation algorithm - the primary method for training neural nets. However, GD is much more general than that. You can come up with any type of model that includes unknown (trainable) parameters, and by calculating derivatives of the model error with respect to those parameters you obtain training rules for them.
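As a minimal sketch of this recipe (the model, loss, and names below are purely illustrative, not taken from any particular project): pick a parametric model, write down its squared error, differentiate with respect to each parameter, and repeatedly step against the gradient.

```python
import numpy as np

# Illustrative model: y_hat = a * x + b, squared error E = (y_hat - y)^2.
# Hand-derived gradients: dE/da = 2 * (y_hat - y) * x,  dE/db = 2 * (y_hat - y).

def train(xs, ys, lr=0.01, epochs=200):
    a, b = 0.0, 0.0                      # trainable parameters
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            y_hat = a * x + b            # forward pass
            err = y_hat - y
            a -= lr * 2 * err * x        # gradient step for a
            b -= lr * 2 * err            # gradient step for b
    return a, b

xs = np.linspace(0, 1, 50)
ys = 3.0 * xs + 1.0 + 0.05 * np.random.randn(50)
print(train(xs, ys))                     # roughly recovers a ~ 3, b ~ 1
```

Nothing here depends on the model being linear: swap in any differentiable expression for y_hat, redo the derivatives, and the same loop trains it.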
Derivatives are relatively easy to calculate - they always have a closed-form expression, in contrast to integrals - so anyone with a little patience and basic mathematical skills can derive them. The only trick is that sometimes you have to differentiate chained transformations or work over multi-dimensional spaces - in such cases something more than basic math skills is desirable.
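To illustrate the chained case (again a toy example with made-up names): for a composed model such as y_hat = tanh(w * x), the chain rule factors the error derivative into the outer loss derivative times the derivative of each inner transformation, and the result is easy to sanity-check numerically.

```python
import numpy as np

# Toy chained model: y_hat = tanh(w * x), squared error E = (y_hat - y)^2.
# Chain rule: dE/dw = 2 * (y_hat - y) * (1 - tanh(w*x)**2) * x
#             (outer loss)            (tanh derivative)     (inner part)

def analytic_grad(w, x, y):
    y_hat = np.tanh(w * x)
    return 2 * (y_hat - y) * (1 - y_hat ** 2) * x

def numeric_grad(w, x, y, eps=1e-6):
    E = lambda w_: (np.tanh(w_ * x) - y) ** 2
    return (E(w + eps) - E(w - eps)) / (2 * eps)

w, x, y = 0.7, 1.3, 0.5
print(analytic_grad(w, x, y), numeric_grad(w, x, y))  # both print ~0.276
```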
This approach is very powerful: you are not limited to a handful of general model structures (neural nets, SVMs, trees, ...), but can tailor the structure (not only the parameters) precisely to the problem at hand, and with GD derive the appropriate equations for training it. This is what I do in literally all my data mining projects: recommender systems, image recognition, market forecasting, fitting physical models to measurement data ... And I can say from experience that every real-world problem is somehow specific and exceptional, and general-purpose models do not behave as well as ones whose structure has been adapted with the help of GD.
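To make the "tailored structure" idea concrete, here is a hedged sketch in the recommender-system spirit (all names and hyperparameters are illustrative): a matrix-factorization model r_hat(u, i) = p[u] . q[i], whose update rules fall straight out of differentiating the squared rating error.

```python
import numpy as np

# Illustrative matrix-factorization recommender: r_hat(u, i) = p[u] . q[i].
# Differentiating E = (r_hat - r)^2 + reg * (|p[u]|^2 + |q[i]|^2) gives the
# per-rating updates below (the constant factor 2 is absorbed into lr).

def train_mf(ratings, n_users, n_items, k=8, lr=0.01, reg=0.02, epochs=30):
    rng = np.random.default_rng(0)
    p = 0.1 * rng.standard_normal((n_users, k))   # user factors
    q = 0.1 * rng.standard_normal((n_items, k))   # item factors
    for _ in range(epochs):
        for u, i, r in ratings:                   # (user, item, rating) triples
            err = p[u] @ q[i] - r
            p_u_new = p[u] - lr * (err * q[i] + reg * p[u])
            q[i]   -= lr * (err * p[u] + reg * q[i])
            p[u]    = p_u_new
    return p, q

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0)]
p, q = train_mf(ratings, n_users=2, n_items=2)
print(p[0] @ q[0])                                # approaches the rating 5.0
```

The point is not this particular model but the workflow: invent whatever structure fits the data, differentiate its error, and GD hands you the training procedure.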