Spect2bird
2019
This project acts as a proof of concept for the use of spectrograms and pix2pix as a New Zealand bird identification and visualisation tool. Images of bird sound spectrograms were paired with images of birds and used to train a pix2pix model. Audio files were collected from eBird (https://ebird.org/media/catalog), edited to reduce noise, manually scrubbed for audio events and saved in sections as spectrogram images. Bird images were also collected from eBird and filtered down to the top 20 images for each bird category, using the criteria of image clarity, exposure of the bird, body position and side-profile perspective. Tui, Fantail, Bellbird and Rosella were chosen as the final birds for the dataset due to their availability of data and variety of visual features. Different combinations of spectrogram and image quantities were tested, as well as different bird types. These tests showed that a high quantity of spectrograms with some variation, combined with a smaller quantity of uniform bird images, gave the best results. There are limitations to using New Zealand birds as a dataset, as a good range of quality data can be difficult to find because of the size and scarcity of the bird populations. Results showed that, through the pipeline this paper outlines, pix2pix can successfully train a model that identifies and visualises birds when given a spectrogram of their vocalisations.
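To make the audio-to-spectrogram step concrete, here is a minimal sketch of a magnitude spectrogram built with a short-time Fourier transform, using only the Python standard library. It is not the tooling used in the project, which the text does not name. The function names, the frame size, the hop length and the synthetic 2 kHz tone standing in for a bird call are all assumptions made for illustration.

```python
import cmath
import math


def hann(n):
    """Hann window of length n, used to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame_size=64, hop=32):
    """Return a magnitude spectrogram as a list of frames.

    Each frame holds frame_size // 2 + 1 magnitudes (bin 0 up to Nyquist).
    """
    window = hann(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        # Apply the window to one overlapping slice of the signal.
        chunk = [samples[start + i] * window[i] for i in range(frame_size)]
        mags = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for bin k; fine for a sketch, slow for real audio.
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Synthetic stand-in for a bird call: a pure 2 kHz tone sampled at 16 kHz.
SAMPLE_RATE = 16000
FRAME_SIZE = 64
tone = [math.sin(2 * math.pi * 2000 * t / SAMPLE_RATE) for t in range(512)]

spec = spectrogram(tone, frame_size=FRAME_SIZE, hop=32)

# Find the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
```

Each inner list is one column of the spectrogram image: time runs across frames and frequency runs across bins. For real recordings, a library FFT and a log-scaled colour map would normally replace the naive transform shown here.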

The full paper, including the method and experiment results, is available together with interactive results at https://staceywillcox.github.io/spect2birds/
Above is an example of the training dataset. 4000 images were used: 1000 pairs for each bird, built from 20 bird images and 50 spectrograms. The coloured square in the bottom right corner was included as a key to help interpret test results where the generated images were unclear. The bird types used in the tests (Tui, Fantail, Bellbird, Eastern Rosella and Saddleback) were chosen based on the availability of data as well as visual considerations such as variety of colour and shape. All birds used can be found in New Zealand.
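The dataset arithmetic works out as 20 images x 50 spectrograms = 1000 pairs per bird, and 4 birds x 1000 = 4000 pairs. A minimal sketch of that pairing follows. It assumes every spectrogram of a bird is paired with every image of the same bird, which matches the numbers above but is not stated explicitly in the text. It also assumes the four birds of the final model, and the file names are hypothetical.

```python
from itertools import product

# The four birds of the final model; file names below are hypothetical.
BIRDS = ["tui", "fantail", "bellbird", "eastern_rosella"]
IMAGES_PER_BIRD = 20
SPECTROGRAMS_PER_BIRD = 50


def build_pairs(bird):
    """Pair every spectrogram of a bird with every image of that bird."""
    spectrograms = [
        f"{bird}/spec_{i:02d}.png" for i in range(SPECTROGRAMS_PER_BIRD)
    ]
    images = [f"{bird}/bird_{j:02d}.png" for j in range(IMAGES_PER_BIRD)]
    # Cross product: 50 spectrograms x 20 images = 1000 (input, target) pairs.
    return list(product(spectrograms, images))


dataset = {bird: build_pairs(bird) for bird in BIRDS}
total_pairs = sum(len(pairs) for pairs in dataset.values())
```

Each tuple is one training example, with the spectrogram as the input image and the bird photograph as the target image.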
Some notable results from the final model show how the vocalisation similarities of a Bellbird and a Tui can be represented by a hybrid bird (right). Generally, the images are very clearly one type of bird, but there was the occasional crossover. This demonstrates an unintentional visual feature of the results that may be a useful tool for recognising similarities across bird calls. It may also be useful as a way of classifying unidentified bird sounds. Some less defined spectrograms resulted in less defined birds and combinations. This is where the coloured square is beneficial, as it communicates more clearly which birds the model is combining.
Tui and Bellbird Hybrid
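One way the coloured key could be read programmatically is to compare the colour of the generated square with each bird's key colour and rank the birds by distance. This is only a sketch of that idea: the RGB values below are invented for illustration, since the text does not state the actual key colours, and the function name is hypothetical.

```python
# Hypothetical key colours (RGB); the actual colours are not stated in the text.
KEY_COLOURS = {
    "tui": (0, 0, 255),
    "fantail": (255, 255, 0),
    "bellbird": (0, 255, 0),
    "eastern_rosella": (255, 0, 0),
}


def rank_birds(square_rgb):
    """Rank birds from closest to furthest key colour (squared RGB distance)."""

    def dist(colour):
        return sum((a - b) ** 2 for a, b in zip(colour, square_rgb))

    return sorted(KEY_COLOURS, key=lambda bird: dist(KEY_COLOURS[bird]))


# A generated square lying between the assumed Tui and Bellbird colours,
# slightly closer to Tui.
hybrid_square = (0, 120, 135)
ranking = rank_birds(hybrid_square)
```

With these assumed colours, a blended square ranks Tui and Bellbird ahead of the other two birds, which mirrors how the key helps to read a hybrid output.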
The final model can successfully identify a Tui, Fantail, Bellbird and Eastern Rosella when given a spectrogram of those birds' sounds, with very few mistakes, and produce recognisable bird images. At a larger scale, this model would allow for nationwide bird identification as well as bird appearance predictions for unknown bird sound inputs. There are limitations due to the availability of audio and visual data, especially for some native New Zealand birds, although this concept is not limited to birds. There is also the possibility of a reverse pipeline. Using the same process, spectrograms could be generated from bird images and converted back to an audio file, allowing bird vocalisations to be experienced from only an image input.

Below is a screenshot of the website mentioned earlier, with its interactive results.