Sudeep's Blog

Disorganized Thoughts in Organized Manner

Kaggle - Whale Detection Challenge
Today marks the end of Kaggle's MarineExplore Whale Detection challenge. The challenge, simply stated, is this: you are given a set of 2-second .aiff sound files, some containing calls from a particular species of whale and others containing only ambient sea noise (possibly including sounds from different species of whale). The dataset consists of a training set with 0/1 labels (30000 samples) and an unlabelled test set (54503 samples). The challenge was to predict the presence of the relevant species of whale in each test clip.

Like many, my initial approach was to read the .aiff files and directly use the sound frequencies from each file as features. This approach is enough to break into the 0.90 AUC (area under the ROC curve) range. Some of the most successful submissions, however, treat the problem as an image-processing problem, using the audio spectrogram as the relevant feature. Check this forum for more information on these approaches.

Using the spectrogram approach, I was able to obtain an AUC of 0.96016, good for a respectable 56th place out of 249 participants. This gives me a (sorta) coveted Top 25% badge on Kaggle. Click here to check out my code on GitHub.
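For readers who have not seen the spectrogram-as-image idea before, here is a minimal sketch of it. This is not the code from my GitHub repository, just an illustration under stated assumptions: the 2000 Hz sample rate, the 256-sample frame, the 64-sample hop and the helper names `spectrogram` and `auc` are all hypothetical choices, and a synthetic tone stands in for a real .aiff clip. It also shows one simple way the AUC score could be computed by hand.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=64):
    """Magnitude spectrogram of shape (freq_bins, frames) via a short-time FFT.

    frame_len and hop are illustrative values, not tuned settings.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # One FFT per frame; transpose so rows are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T

def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half.
    Quadratic in the number of samples, so only suitable for small checks.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Synthetic stand-in for one clip: 2 seconds at an assumed 2000 Hz sample rate.
sample_rate = 2000
t = np.arange(2 * sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 200 * t)  # a plain 200 Hz tone, not a whale call

spec = spectrogram(clip)
# The 2-D array can be treated like a grayscale image; flattening it gives
# one feature vector per clip for whatever classifier you prefer.
features = spec.ravel()
print(spec.shape, features.shape)
```

The resulting array is what gets handed to the image-style feature extraction or directly to a classifier. In practice you would read each .aiff file into a NumPy array first and most likely use a library routine for the spectrogram and the AUC rather than these hand-rolled versions.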