Taofeek Iyanda's answer to What is the difference between public and private leaderboard in Kaggle?

Studied Fisheries & Data Science at Lagos State University (LASU) (Graduated 2020) · 5y ·

I came here for this question, but it had not been answered yet, so I decided to share the answer I found through further research. I hope it is helpful.

How do the public and private leaderboards work?

Kaggle competitions are decided by your model's performance on a test data set. Kaggle has the answers for this data set, but withholds them to compare with your predictions. Your Public score is what you receive back upon each submission (that score is calculated using a statistical evaluation metric, which is always described on the Evaluation page). But your Public score is determined from only a fraction of the test data set, usually between 25% and 33%. This is the Public leaderboard, and it shows relative performance during the competition.

When the competition ends, we take your selected submissions (see below) and score your predictions against the remaining fraction of the test set, or the private portion. You never receive ongoing feedback about your score on this portion, so it is the Private leaderboard. Final competition results are based on the Private leaderboard, and the winner is the person(s) at the top of the Private leaderboard.

Why? This separation of the test set into public and private portions is what ensures that the most accurate but generalized model is the one that wins the challenge. If you based your model solely on the data which gave you constant feedback, you would run the danger of building a model that overfits to the specific noise in that data. One of the hard challenges in data science is to avoid overfitting, by leaving your model flexible to out-of-sample data.

Source: Kaggle
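
To make the idea concrete, here is a small sketch of my own (not Kaggle's code) that scores one submission separately on a public and a private part of a test set. The 30% public fraction, the accuracy metric, the random split, and the function names are all assumptions for illustration. Kaggle does not publish how it splits the rows, and each competition uses its own metric.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


def split_public_private(n_rows, public_fraction=0.3, seed=0):
    """Randomly assign test-row indices to a public and a private part.

    The 0.3 default is an assumed value inside the 25% to 33% range
    mentioned above, not a figure taken from Kaggle.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_public = round(n_rows * public_fraction)
    return indices[:n_public], indices[n_public:]


def score_submission(y_true, y_pred, public_idx, private_idx):
    """Score one submission separately on the public and private rows."""
    public = accuracy([y_true[i] for i in public_idx],
                      [y_pred[i] for i in public_idx])
    private = accuracy([y_true[i] for i in private_idx],
                       [y_pred[i] for i in private_idx])
    return public, private


# Toy test set: 1000 hidden binary labels (made-up data).
rng = random.Random(42)
y_true = [rng.randint(0, 1) for _ in range(1000)]

# A hypothetical submission that is right about 80% of the time.
y_pred = [t if rng.random() < 0.8 else 1 - t for t in y_true]

public_idx, private_idx = split_public_private(len(y_true))
public_score, private_score = score_submission(
    y_true, y_pred, public_idx, private_idx
)

# The public score is what you would see during the competition;
# the private score is what would decide the final ranking.
print(f"Public score:  {public_score:.3f}")
print(f"Private score: {private_score:.3f}")
```

The two scores usually come out close but not identical. If you keep tuning a model only to push the public number up, that gap is where overfitting shows once the private scores are revealed.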