
A fold is a set of (usually consecutive) records of the dataset.

The idea of k-fold cross-validation is to split the dataset into a fixed number of folds. For example, if we have 100 records we can split them into 2 folds of 50 records each, 10 folds of 10 records each, 4 folds of 25 records each, etc.

Then we use each fold as a validation set and the rest of the folds as a training set. If k is the number of folds, we always train with k-1 folds and compute our metric of interest over the remaining fold. The metric values from the k runs are then averaged to get the evaluation of the algorithm.
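To make the procedure concrete, here is a minimal sketch of that loop in Python (the k_fold_cv name, the NumPy shuffling, and the scikit-learn-style fit/predict model interface are my assumptions, not part of the original answer):

```python
import numpy as np

def k_fold_cv(model, X, y, metric, k=5, seed=0):
    """Shuffle the data, split it into k folds, train on k-1 folds,
    evaluate on the held-out fold, and average the metric."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))           # shuffle so the original ordering is not meaningful
    folds = np.array_split(indices, k)          # k folds of (nearly) equal size

    scores = []
    for i in range(k):
        val_idx = folds[i]                                   # the held-out fold
        train_idx = np.concatenate(folds[:i] + folds[i+1:])  # the remaining k-1 folds
        model.fit(X[train_idx], y[train_idx])
        scores.append(metric(y[val_idx], model.predict(X[val_idx])))
    return np.mean(scores)                      # the averaged metric is the evaluation
```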

For example, suppose we use 2-fold cross-validation for a classification algorithm and our metric of interest is accuracy.

We split the dataset into two folds of equal size; to avoid the ordering of the data being meaningful, we first shuffle the entire dataset. Once we have the two folds f1 and f2, we train with f1 and use f2 to compute our metric (accuracy). After this we train with f2 and compute accuracy over f1. The final accuracy is the average of the two computed accuracy values.
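Continuing the sketch above, a 2-fold run for this accuracy example might look like the following (the logistic-regression classifier and the synthetic data are illustrative assumptions, not part of the answer):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic classification data, only for illustration.
X, y = make_classification(n_samples=100, random_state=0)

acc = k_fold_cv(LogisticRegression(max_iter=1000), X, y,
                metric=accuracy_score, k=2)
print(f"2-fold cross-validated accuracy: {acc:.3f}")
```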

In general, cross-validation is used to tune hyperparameters in machine learning algorithms: we use it to find the hyperparameter values that produce the best result (best metric).
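As a rough sketch of that use, assuming the k_fold_cv helper above and taking the regularization strength C of a logistic regression as the hyperparameter being tuned (a hypothetical choice, not from the answer):

```python
candidate_C = [0.01, 0.1, 1.0, 10.0]     # hypothetical grid of hyperparameter values
scores = {C: k_fold_cv(LogisticRegression(C=C, max_iter=1000), X, y,
                       metric=accuracy_score, k=5)
          for C in candidate_C}
best_C = max(scores, key=scores.get)     # keep the value with the best averaged metric
```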

The idea is that every record is used exactly once for validation, which is better than just doing a single split of the data and using some records for training and others for validation.

K-fold cross-validation is closely related to leave-n-out cross-validation; the difference is that leave-n-out tests all possible combinations that leave n records out.

As an example, consider a simple case of only 4 records: r0, r1, r2 and r3.

If we use 2-fold cross-validation we first shuffle; say we obtain r2, r0, r1, r3, and then split so that fold1 is {r2,r0} and fold2 is {r1,r3}. We then train twice, once with {r2,r0} and once with {r1,r3}, using the remaining fold to compute our metric. In leave-2-out cross-validation we would have to train with {r0,r1}, {r0,r2}, {r0,r3}, {r1,r2}, {r1,r3} and {r2,r3}, so we would have to run the algorithm six times instead of two. Notice that in the particular case of a dataset containing 4 records, 4-fold cross-validation is the same as leave-1-out cross-validation because each fold would contain exactly one record.
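To see where the six runs come from, here is a small sketch that enumerates the leave-2-out splits with itertools (the record names follow the example above):

```python
from itertools import combinations

records = ["r0", "r1", "r2", "r3"]

# In leave-2-out, every pair of records is held out exactly once,
# so there are C(4, 2) = 6 train/validation splits.
for held_out in combinations(records, 2):
    train = [r for r in records if r not in held_out]
    print(f"train on {train}, validate on {list(held_out)}")
```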
