Kaggle's Cause-Effect Pairs competition has ended, and I ranked 19th out of 269 teams, my second top-10%
finish. Here is a link to the competition: http://www.kaggle.com/c/cause-effect-pairs
Here is a brief description of the contest: you are given a large number of A-B observation pairs, where each pair is itself a list of samples (a_i, b_i). Each pair carries one of three labels: A->B (A is a cause of B), B->A (B is a cause of A), or A-B (neither; the two could be independent, or both affected by a common third cause, etc.). The aim of the contest was to predict the same labels (A->B, B->A, or A-B) for new, unseen pairs.
The competition involved a lot of feature engineering. The learning part was mostly handled by off-the-shelf learning algorithms (I used gradient-boosted trees), so what separated the teams was the quality of their features. (For example, one of the winners developed thousands of features from each pair.) Here is a discussion of everyone's approaches: http://www.kaggle.com/c/cause-effect-pairs/forums/t/5643/sharing-methods
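To give a flavor of what "features out of a pair" means, here is a minimal sketch in plain Python. It is not my actual feature set or anyone else's from the competition: the function names (skewness, abs_correlation, pair_features) and the particular features are assumptions chosen for illustration. The point is that some features are symmetric in A and B (like correlation), while others change when you swap A and B (like the skewness difference), and only the second kind can hint at a direction.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def skewness(xs):
    """Population skewness of a sample; 0.0 if the sample is constant."""
    m = mean(xs)
    var = mean([(x - m) ** 2 for x in xs])
    if var == 0:
        return 0.0
    return mean([(x - m) ** 3 for x in xs]) / var ** 1.5


def abs_correlation(a, b):
    """Absolute Pearson correlation; 0.0 if either sample is constant."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return abs(cov) / math.sqrt(va * vb)


def pair_features(a, b):
    """Turn one observation pair (two equal-length lists) into named features.

    Hypothetical feature set, for illustration only.
    """
    sa, sb = skewness(a), skewness(b)
    return {
        "n_samples": len(a),
        "abs_corr": abs_correlation(a, b),   # symmetric in A and B
        "skew_a": sa,
        "skew_b": sb,
        "skew_diff": sa - sb,                # flips sign when A and B are swapped
        "unique_ratio_a": len(set(a)) / len(a),
        "unique_ratio_b": len(set(b)) / len(b),
    }


# Toy pair where b is a deterministic function of a.
a = [1.0, 2.0, 3.0, 4.0, 10.0]
b = [x ** 2 for x in a]

forward = pair_features(a, b)
backward = pair_features(b, a)
```

Each pair becomes one row of such features, and the rows (with their A->B, B->A, A-B labels) can then be handed to whatever off-the-shelf classifier you like.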
Finally, here is a comprehensive list of what worked and what did not, compiled by the hosts: http://clopinet.com/causality/FactSheetv1.pdf