Master Kaggle by Competing Consistently
How do you get competitive at Kaggle competitions?
It is a common question that crops up. The best advice for getting started and getting good is to take part in competitions consistently. You cannot help but improve at machine learning.
A post by Triskelion entitled “Reflecting back on one year on Kaggle contests” bears this out. He started as a machine learning beginner and finished as a “master” level Kaggle competitor (achieving a top 10% and a top 10 finish).
In this post, we will review Triskelion’s lesson of consistent participation as a strategy to start on and master Kaggle.
Key to a Good Start
We believe the key to Triskelion starting well and having the confidence to continue is twofold:
1. Reproduced results: He reproduced results described in forum and blog posts.
2. Used tools: While reproducing results, he discovered and began to use tools such as Vowpal Wabbit and scikit-learn.
Reproduce Results
This is an obvious but really underrated approach.
There is a lack of good machine learning tutorials. The best substitute (and better than tutorials on toy datasets) is the “how to beat the benchmark” posts on the forums and the “how I did it” posts at the end of a competition.
The reason is that these quasi-tutorials give you insight into how a world-class analyst thinks about and solves a problem: the tools they use, how they set up their pipeline, the parameters they choose, the process, everything.
Emulating these elements is a smart way to bootstrap your machine learning skills.
Use Good Tools
A beginner's mistake is reimplementing algorithms from scratch.
There is a huge range of powerful tools available and you should take advantage of them. You will get better results, faster. This will motivate you to push further.
Triskelion discovered Vowpal Wabbit early on and was not afraid to play with it. VW is a very powerful tool that even pros have a difficult time with.
In fact, a problem we see with “experts” trained in machine learning is that they ignore or even scoff at sophisticated or unfamiliar tools. They learned machine learning in R or Weka, so they assume every problem can be tackled with their weapon of choice.
The more tools you know and can use, the more ways you have to think about and tackle your problem.
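As a minimal sketch of leaning on an existing library rather than reimplementing an algorithm, the following uses scikit-learn (one of the tools mentioned earlier) to get a cross-validated baseline score in a few lines. The synthetic dataset, the choice of logistic regression, and every parameter value are illustrative assumptions, not Triskelion's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a competition dataset (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# An off-the-shelf model: nothing is reimplemented by hand.
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validated AUC gives a quick baseline to improve on.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
mean_auc = scores.mean()
print(f"Mean cross-validated AUC: {mean_auc:.3f}")
```

Swapping in a different estimator only changes the `model` line, which is what makes it cheap to try several tools on the same problem.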
Crucial to Getting Good
Competing consistently is the key to getting good.
Good is relative, but Triskelion is clearly much better now than he was a year ago (better than approximately 200,000 other competitors), owing largely to his aggressive participation schedule.
He describes some seven specific competitions, but his profile shows a total of 15 competitions in which he has taken part.
If you wish to get good at machine learning competitions, follow his lead and take part in a lot of contests. Even if you only manage the benchmark in the first few, you will learn a lot about data preparation and tools.
If you also reproduce the results you see posted on blogs and forums for those competitions, the gains will be non-linear.
Competition Tips
Finally, Triskelion finishes with a list of tips.
- Practice a lot: Complete as many challenges as you can and aim for incremental improvements.
- Study evaluation metrics: Really understand AUC and the other metrics used for scoring.
- Study the domain: Business cases, papers, state of the art, feature engineering.
- Team up: A top 10 finish is difficult, and collaboration is often what it takes to achieve one.
- Read the forums: Post to contest threads and understand winning solutions.
- Share on forums: There are many angles to a given problem, but don’t share too much.
- Use ensembles: They almost always improve results and can get you a top 10 finish with simple models.
- Experiment: Try new ideas rather than just thinking about them.
- Creativity: Foster lateral thinking and think outside the box.
- Tools: Find and use appropriate algorithms.
- Tuning: Use cross-validation and tune all model parameters.
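Two of these tips, understanding AUC and using ensembles, can be made concrete with a small sketch. AUC is the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, and the simplest ensemble just averages the scores of several models. The function names `auc` and `average_ensemble` and the toy scores below are hypothetical, written out by hand only to show the idea; in practice you would use a library implementation.

```python
def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_ensemble(*model_scores):
    """Blend several models by averaging their scores example by example."""
    return [sum(column) / len(column) for column in zip(*model_scores)]


# Hand-picked toy data (an assumption): two models that make different mistakes.
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.6, 0.4, 0.3, 0.8, 0.2]
model_b = [0.5, 0.1, 0.7, 0.6, 0.9, 0.3]
blend = average_ensemble(model_a, model_b)

print("model A AUC:", auc(labels, model_a))
print("model B AUC:", auc(labels, model_b))
print("blend AUC:  ", auc(labels, blend))
```

On this toy data each model ranks 8 of the 9 positive-negative pairs correctly, while the blend ranks all 9 correctly because the two models err on different examples. The data was chosen to show that effect; an average is not guaranteed to beat its members on real problems.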
His last tip is to have fun.
This might well be the most important point. Competitive machine learning is amazingly fun. Find the fun in it. Some perseverance is required to get over the knowledge hump when starting out. The very act of doing “OK” (beating the benchmark) can be the fun part at the start.