Don’t Overfit! II is a challenging problem in which we must keep models from overfitting (that is, from learning spurious patterns) when only a very small number of training samples is available.
As Kaggle puts it, "It was a competition that challenged mere mortals to model a 20,000x200 matrix of continuous variables using only 250 training samples… without overfitting."
The dataset can be downloaded here: https://www.kaggle.com/c/dont-overfit-ii/overview (You can also download data.rar directly here)
Shape of train.csv: 250 samples with 300 features, 1 class label, and 1 Id column: (250, 302)
Shape of test.csv: 19750 samples with 300 features and 1 Id column: (19750, 301)
With so little training data, we must approach the task carefully, because it is very easy to overfit. What do we need to predict? We predict the binary target value (binary classification) for each row, where every row contains 300 continuous feature values. The model has to generalize to the much larger test set even though it sees only this minimal set of training samples.
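To make the data layout concrete, here is a minimal sketch of separating the Id, the class label, and the feature values. It assumes the header names the columns `id` and `target` and that every other column is a numeric feature; those names, the helper `split_columns`, and the tiny sample data are illustrative assumptions rather than something taken from this repository.

```python
import csv
import io


def split_columns(csv_text, has_target=True):
    """Split CSV text into ids, targets, and feature rows.

    Assumes a header row with an `id` column, an optional `target`
    column, and every remaining column being a numeric feature.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    feature_names = [
        name for name in reader.fieldnames if name not in ("id", "target")
    ]
    ids, targets, features = [], [], []
    for row in reader:
        ids.append(int(row["id"]))
        if has_target:
            # Labels may be stored as "1.0"/"0.0", so parse via float first.
            targets.append(int(float(row["target"])))
        features.append([float(row[name]) for name in feature_names])
    return ids, targets, features


# Tiny made-up sample with 3 features instead of 300.
sample = "id,target,0,1,2\n0,1.0,0.5,-1.2,0.3\n1,0.0,-0.7,0.1,2.0\n"
ids, targets, features = split_columns(sample)
print(len(features), len(features[0]))
```

For the real files, the same function could be given the contents of train.csv, or of test.csv with `has_target=False`, and the resulting row and feature counts compared against the shapes listed above.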
- Per the Kaggle problem statement, submissions are scored by AUROC (area under the ROC curve) between the predicted target and the actual target.
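As a rough illustration of the metric, AUROC can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it by direct pairwise comparison; the function name `auroc` and the example values are illustrative, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def auroc(y_true, y_score):
    """AUROC via pairwise comparison of positive and negative scores.

    Returns the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted scores: 3 of the 4 pairs are ranked correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because only the ordering of the scores matters, predicted probabilities do not need to be thresholded into 0/1 labels before scoring.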
sahilsharma884/Dont_OverfitII