sahilsharma884/Dont_OverfitII


Dont_OverfitII

  • Kaggle problem:

    Don’t Overfit! II is a challenging problem in which the model must not overfit (that is, learn spurious patterns) despite being given only a very small number of training samples.

    In Kaggle's words: "It was a competition that challenged mere mortals to model a 20,000x200 matrix of continuous variables using only 250 training samples… without overfitting."

    The dataset can be downloaded from https://www.kaggle.com/c/dont-overfit-ii/overview (data.rar can also be downloaded directly here).

    Dimensions of train.csv: 250 samples with 300 features, 1 class label, and 1 Id column, giving shape (250, 302).

    Dimensions of test.csv: 19750 samples with 300 features and 1 Id column, giving shape (19750, 301).
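
    A quick sanity check of these dimensions can be sketched with the standard library alone. The helper name `csv_shape` and the column names in the in-memory sample are illustrative assumptions, not part of this repository; the sample merely stands in for the real files.

```python
import csv
import io


def csv_shape(handle):
    """Return (n_rows, n_columns) for a CSV stream with one header row."""
    reader = csv.reader(handle)
    header = next(reader)             # first line holds the column names
    n_rows = sum(1 for _ in reader)   # remaining lines are data rows
    return n_rows, len(header)


# Tiny in-memory stand-in for train.csv (column names are assumed here):
# one Id column, one label column, and two feature columns.
sample = io.StringIO("id,target,0,1\n0,1.0,0.5,-0.2\n1,0.0,1.3,0.7\n")
shape = csv_shape(sample)
print(shape)
```

    Opening the real file with `open("train.csv", newline="")` and passing the handle to `csv_shape` should return (250, 302) if the file matches the dimensions listed above.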

    With so little training data, every modelling step must be done carefully, because overfitting is easy. What do we need to predict? The binary target value (binary classification) associated with each row, where each row contains 300 continuous feature values. The model must do this without overfitting to the small set of training samples.
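
    One common way to monitor overfitting on so few samples is to validate on stratified folds, so that every fold keeps roughly the same class balance. The following is a minimal sketch of that idea, not the method used in this repository; the function name `stratified_folds` and the 160/90 class split are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Split sample indices into n_folds folds with similar class balance."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds.
        for position, index in enumerate(indices):
            folds[(offset + position) % n_folds].append(index)
        offset += len(indices)
    return folds


# Hypothetical labels: 250 samples with an illustrative 160/90 class split.
labels = [1] * 160 + [0] * 90
folds = stratified_folds(labels, n_folds=5)
for fold in folds:
    positives = sum(labels[i] for i in fold)
    print(len(fold), positives)
```

    Each fold can then serve once as the held-out set while a model is fitted on the remaining four, which gives a less optimistic performance estimate than scoring on the training rows themselves.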

  • Evaluation:

    As per the Kaggle problem statement, submissions are scored by AUROC (area under the ROC curve) between the predicted target and the actual target.
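
    To make the metric concrete, AUROC can be computed from ranks: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below uses that rank-sum (Mann-Whitney U) formulation in plain Python; the function name `auroc` and the toy inputs are illustrative assumptions, and this is not Kaggle's scoring code.

```python
def auroc(y_true, y_score):
    """AUROC via the rank-sum (Mann-Whitney U) formulation, ties averaged."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of equal scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example: three of the four positive/negative pairs are ordered correctly.
score = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(score)
```

    Because only the ordering of the scores matters, predicted probabilities do not need to be calibrated or thresholded before scoring.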
