Combining audio and non-audio inputs in evolved neural networks for ovenbird classification

Sergio Poo Hernandez, Vadim Bulitko & Erin Bayne (2024). Combining audio and non-audio inputs in evolved neural networks for ovenbird classification. Bioacoustics, Volume 33 (3):
Abstract: 

In the last several years, the use of neural networks as tools to automate species classification from digital data has increased. This is due in part to the high accuracy of image classification with convolutional neural networks (CNNs). In the case of audio data, CNN-based recognisers automate the classification of species in audio recordings by using information from sound visualisations (i.e. spectrograms). It is common for these recognisers to use the spectrogram as their sole input. However, researchers have other, non-audio data, such as the habitat preferences of a species, phenology, and range information, which could improve species classification. We show how the accuracy of a single-species recogniser neural network can be improved by using non-audio data as inputs in addition to spectrogram information. We analyse the cause of the improvements: are they the result of a neural network with a higher number of parameters, or of the use of the two inputs? We find that networks that use the two different inputs achieve a higher classification accuracy. This suggests that the accuracy of classifiers can be improved by giving them non-audio information about the location and conditions where the recordings were obtained.
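
The idea of feeding a recogniser both spectrogram and non-audio inputs can be illustrated with a minimal Python sketch. This is not the evolved architecture from the paper: the convolutional branch is replaced by simple mean pooling over time, the weights are random and untrained, and all names (FusionClassifier, mean_pool, the example habitat values) are hypothetical. It only shows one plausible way to fuse the two inputs, by concatenating the non-audio features with the audio features before the output layer, and how parameter counts could be compared between an audio-only and a two-input network.

```python
import math
import random


def mean_pool(spectrogram):
    """Collapse a (frequency x time) spectrogram to one mean value per frequency bin.

    Stand-in for a convolutional feature extractor (an assumption of this sketch).
    """
    return [sum(row) / len(row) for row in spectrogram]


def dense(inputs, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """Random, untrained weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class FusionClassifier:
    """Hypothetical two-input recogniser: audio branch plus non-audio features."""

    def __init__(self, n_freq_bins, n_non_audio, hidden=4, seed=0):
        rng = random.Random(seed)
        self.audio_layer = init_layer(n_freq_bins, hidden, rng)
        # The output layer sees the audio features and the non-audio inputs together.
        self.output_layer = init_layer(hidden + n_non_audio, 1, rng)

    def predict(self, spectrogram, non_audio):
        """Return a score in (0, 1) for 'target species present'."""
        audio_features = relu(dense(mean_pool(spectrogram), *self.audio_layer))
        fused = audio_features + list(non_audio)  # concatenation of the two inputs
        (logit,) = dense(fused, *self.output_layer)
        return sigmoid(logit)

    def n_params(self):
        """Total number of weights and biases."""
        layers = [self.audio_layer, self.output_layer]
        return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


# Toy inputs: a 2 x 3 spectrogram and two made-up non-audio values
# (e.g. a habitat score and a scaled day of year).
spectrogram = [[0.1, 0.4, 0.3], [0.8, 0.9, 0.7]]
non_audio = [0.6, 0.25]

two_input = FusionClassifier(n_freq_bins=2, n_non_audio=2)
audio_only = FusionClassifier(n_freq_bins=2, n_non_audio=0)

score = two_input.predict(spectrogram, non_audio)
baseline_score = audio_only.predict(spectrogram, [])
```

In this sketch the two-input network has slightly more parameters than the audio-only one, which is why a fair comparison would also need an audio-only network of matching size, in the spirit of the parameter-count question raised in the abstract.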

Keywords: 

Birdsong, classification, machine learning, spectrogram