Today marks the end of Kaggle's MarineExplore Whale Detection challenge. The challenge, simply stated, is this: you are given a set of 2-second .aiff sound files, some containing sounds from a particular species of whale, while others contain other ambient noises in the sea (possibly including sounds from different species of whale).
The dataset consists of a training set with 0/1 labels (30000 samples) and an unlabelled test set (54503 samples). The challenge was to predict the presence of the relevant species of whale in each test sample.
Like many, my initial approach was to read the .aiff files and use the sound frequencies from each file directly as features. This approach is enough to break into the 0.90 AUC (area under the curve) range. Some of the most successful submissions, however, treat the problem as an image-processing problem, using the audio spectrogram as the relevant feature. Check this forum for more information on these approaches.
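To make the spectrogram idea concrete, here is a rough sketch of how a clip can be turned into an image-like feature matrix. This is not the code from my submission: the `spectrogram` helper, the frame length, the hop size, the sample rate and the synthetic chirp standing in for a decoded .aiff file are all assumptions made up for illustration.

```python
import numpy as np


def spectrogram(signal, frame_len=256, hop=128):
    """Return a (frames x frequency bins) log-magnitude spectrogram.

    frame_len and hop are illustrative defaults, not tuned values.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the real FFT of each frame, compressed with log(1 + x).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude)


# Synthetic stand-in for a decoded clip (assumption): 2 seconds at 2 kHz,
# with a tone sweeping upward from about 100 Hz to about 200 Hz.
sample_rate = 2000
t = np.arange(2 * sample_rate) / sample_rate
clip = np.sin(2 * np.pi * (100 * t + 25 * t ** 2))

spec = spectrogram(clip)   # 2-D "image" of the sound
features = spec.ravel()    # flattened vector for a generic classifier
print(spec.shape, features.shape)
```

Each row of `spec` is one time slice and each column one frequency band, so an upward sweep shows up as a diagonal streak. Either the flattened vector or the 2-D matrix itself can then be handed to whatever classifier you like.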
Using this approach, I was able to obtain an AUC of 0.96016, good for a respectable 56th place out of 249 participants.
This gives me a (sorta) coveted Top 25% badge on Kaggle. Click here to check out my code on GitHub.