BirdSongs


A Python package to analyze, visualize, and generate synthetic birdsongs.

Objective

Design, development, and evaluation of a computational-physical model to generate synthetic birdsongs from recorded samples.

Overview

Study and Python implementation of the motor gestures for birdsongs model created by Prof. G. Mindlin. This model explains the physics of birdsongs by modeling the organs involved in sound production in birds (syrinx, trachea, glottis, Oro-Oesophageal Cavity (OEC), and beak) using ordinary differential equations (ODEs).
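
To give a feel for how such ODEs behave, here is a minimal, hypothetical sketch: a driven nonlinear oscillator for the labial position, integrated with the explicit Euler method. The functional form and coefficients are illustrative stand-ins loosely mapped onto the model's control parameters (pressure, tension, time-scale constant); they are NOT the exact motor gestures equations.

```python
# Simplified stand-in for the syrinx labia dynamics: x is the labial
# position, y its velocity. alpha plays the role of air-sac pressure,
# beta of labial tension, and gamma of the time-scale constant.
# Illustrative only -- not the exact motor gestures ODEs.

def simulate(alpha, beta, gamma=2.4e4, dt=1e-6, steps=20000):
    """Euler-integrate x' = y, y' = gamma^2*(-alpha - beta*x - x**3) - gamma*x**2*y."""
    x, y = 0.01, 0.0
    xs = []
    for _ in range(steps):
        dx = y
        dy = gamma**2 * (-alpha - beta * x - x**3) - gamma * x**2 * y
        x += dt * dx
        y += dt * dy
        xs.append(x)
    return xs

waveform = simulate(alpha=-0.15, beta=-0.15)
print(len(waveform))  # 20000 samples of the labial position
```

With these toy values the oscillation settles in the kilohertz range, which is why the time-scale constant matters: it stretches or compresses the whole gesture in time.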

This work presents an automated model to generate comparable synthetic birdsongs, in both the spectral and temporal domains, using the motor gestures model and an audio-recorded file (a real birdsong) as input. The automation is done by formulating a minimization problem with three control parameters: the air sac pressure of the bird's bronchi, the labial tension of the syrinx walls, and a time-scale constant. This optimization problem is solved using numerical methods, signal processing tools, and numerical optimization techniques. The objective function depends on the Fundamental Frequency (also called pitch and denoted FF or F0) and the Spectral Content Index (SCI) of both the synthetic and real syllables.
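
A minimal sketch of such an objective function follows; the exact weighting and implementation in the package may differ, so treat the function below as an illustrative assumption:

```python
# Hypothetical objective: score a synthetic syllable against the real one by
# comparing their FF contours and SCI values via relative squared errors.

def objective(ff_real, ff_synth, sci_real, sci_synth):
    """Mean squared relative FF error plus squared relative SCI error."""
    n = min(len(ff_real), len(ff_synth))
    ff_err = sum(((ff_real[i] - ff_synth[i]) / ff_real[i]) ** 2
                 for i in range(n)) / n
    sci_err = ((sci_real - sci_synth) / sci_real) ** 2
    return ff_err + sci_err

# A synthetic syllable identical to the real one scores zero:
print(objective([2000.0, 2100.0], [2000.0, 2100.0], 1.3, 1.3))  # 0.0
```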

The package is tested and evaluated on three Colombian bird species: Zonotrichia capensis, Ocellated Tapaculo, and Mimus gilvus; recorded samples are downloaded from the Xeno-Canto and eBird audio libraries. The results give FF and SCI relative errors of less than % and spectral harmonics comparable in number and frequency, as shown in the Results section.

The PDF dissertation document is written in LaTeX and stored on GitHub at www.github.com/saguileran/birdsongs/tree/dissertation, in the dissertation branch. You can download and compile it using LaTeX software (Texmaker, Overleaf, TeXstudio).

Dissertation document.
Figure 1. PDF document of the bachelor's thesis.

Python Implementation

Physical Model

Schematic description of the physical model motor gestures for birdsongs, showing the organs involved in sound production (syrinx, trachea, glottis, OEC, and beak) and their corresponding ODEs.

Model
Figure 2. Diagram of the physical model motor gestures for birdsongs.

Object-Oriented Programming (OOP)

By taking advantage of the Object-Oriented Programming (OOP) paradigm, long, repetitive scripts can be avoided. Furthermore, the execution and implementation of the model are fast and easy, making it possible to create and compare several syllables with a single line of code. To solve the optimization problem and to analyze and compare real and synthetic birdsongs, five objects are defined:

  • BirdSong: Reads an audio file given its file name and a Paths object, and computes the audio's spectral and temporal features. It can also split the audio into syllables.
  • Syllable: Creates a birdsong syllable from a BirdSong object using a time interval, which can be selected in the plot or defined as a list. The spectral and temporal features of the syllable are computed automatically.
  • Optimizer: Solves the minimization problem using the chosen method (the default is brute force, but it can be changed to leastsq, bfgs, newton, etc.; see lmfit for further information) over a feasible region that can be modified.
  • Plot: Visualizes the BirdSong and Syllable objects and their spectral and temporal features.
  • Paths: Manages the package paths: the audio files and results directories.
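
A hypothetical usage sketch of how these objects interact is shown below; the class and method names are stubs written from the descriptions above, not the package's actual API:

```python
# Minimal self-contained stubs mirroring the five-object design.
# Illustrative only -- not the real BirdSongs implementation.

class Paths:
    """Manages the audio-files and results directories."""
    def __init__(self, audios="./audios", results="./results"):
        self.audios, self.results = audios, results

class Syllable:
    """Holds a selected interval; spectral/temporal features would be computed here."""
    def __init__(self, samples):
        self.samples = samples

class BirdSong:
    """Reads an audio file and can split it into syllables."""
    def __init__(self, paths, file_name):
        self.paths, self.file_name = paths, file_name
        self.samples = [0.0, 0.1, 0.3, 0.1, 0.0, -0.2]  # placeholder waveform

    def syllable(self, i0, i1):
        # One line of "user code" turns an interval into a Syllable object.
        return Syllable(self.samples[i0:i1])

song = BirdSong(Paths(), "zonotrichia.wav")
syllable = song.syllable(1, 4)
print(len(syllable.samples))  # 3
```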

For each object an icon is defined as follows:

OOP
Figure 3. Icons of the implemented objects.

This facilitates the reading of the methodology diagram: each icon represents an object that deals with different tasks. The major advantage of this implementation is that it makes it easy to compare features between syllable or chunk (a small part of a syllable) objects.

Methodology

Using the objects defined above, the optimization problem is solved by following the steps below:

Methodology
Figure 4. Diagram of the physical-computational model methodology.

Each step shows the icon of the object involved. The final output is a parameters object (from the lmfit library) containing the optimal control-parameter coefficients of the motor gesture that best reproduces the real birdsong.
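
The optimization step itself can be sketched as the Optimizer's default brute-force search: evaluate the objective on a grid over the three control parameters and keep the best point. The grid, ranges, and toy objective below are illustrative assumptions, not the package's actual settings:

```python
# Brute-force grid search over (alpha, beta, gamma): air-sac pressure,
# labial tension, and time-scale constant. The toy objective stands in
# for the real FF/SCI-based one; its minimum is placed (arbitrarily)
# at (-0.1, -0.2, 24000).
import itertools

def toy_objective(alpha, beta, gamma):
    return (alpha + 0.1) ** 2 + (beta + 0.2) ** 2 + ((gamma - 24000) / 24000) ** 2

alphas = [-0.2, -0.1, 0.0]
betas = [-0.3, -0.2, -0.1]
gammas = [20000, 24000, 28000]

best = min(itertools.product(alphas, betas, gammas),
           key=lambda p: toy_objective(*p))
print(best)  # (-0.1, -0.2, 24000)
```

In the package this role is played by lmfit, which also provides gradient-based alternatives (leastsq, bfgs, newton) once a good region has been located.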


Conclusions

  • The SCI score gives comparable results for finding the optimal pressure-parameter coefficients; however, it is not always sufficient, since noise can be interpreted as harmonics or spectral content. A possible improvement is to refine the objective function used to find these coefficients.
  • The model successfully simulated several syllables of Zonotrichia capensis of varying sound quality. The best sounds to generate are the longer, simpler, and clearer syllables, which were simulated with high accuracy. Trilled syllables can be well generated using chunks (small parts of syllables), but this requires tuning the pitch threshold.
  • The most problematic and difficult syllables are noisy audios with high spectral content, in which strong harmonics make the pitch hard or even impossible to compute correctly. Although for some audios it is sufficient to adjust the pitch-detection threshold, this does not work for all of them.

Applications

Some of the applications of this model are:

  • Data augmentation: Use the model to create numerous synthetic syllables; this can be done by creating a synthetic birdsong and then varying its motor gesture parameters to obtain similar birdsongs.
  • Birdsong description: Characterize and compare birdsongs using the motor gesture parameters.
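
The data-augmentation idea can be sketched as jittering an optimal parameter set with small relative noise; each jittered set could then be synthesized into a new, similar birdsong. The parameter names and noise scale below are illustrative assumptions:

```python
# Hypothetical augmentation: perturb optimal motor-gesture parameters with
# relative Gaussian noise to obtain a family of similar parameter sets.
import random

def augment(params, scale=0.05, n=10, seed=42):
    """Return n copies of a parameter dict with relative Gaussian jitter."""
    rng = random.Random(seed)
    return [{k: v * (1 + rng.gauss(0, scale)) for k, v in params.items()}
            for _ in range(n)]

optimal = {"alpha": -0.1, "beta": -0.2, "gamma": 24000.0}
variants = augment(optimal)
print(len(variants))  # 10
```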

References

Literature

[1] Amador, A., Perl, Y. S., Mindlin, G. B., & Margoliash, D. (2013). Elemental gesture dynamics are encoded by song premotor cortical neurons. Nature 2013 495:7439, 495(7439), 59–64. https://doi.org/10.1038/nature11967

Software

[2] Newville, M., Stensitzki, T., Allen, D. B., & Ingargiola, A. (2014). LMFIT: Non-Linear Least-Square Minimization and Curve-Fitting for Python. https://doi.org/10.5281/ZENODO.11813

[3] Ulloa, J. S., Haupert, S., Latorre, J. F., Aubin, T., & Sueur, J. (2021). scikit-maad: An open-source and modular toolbox for quantitative soundscape analysis in Python. Methods in Ecology and Evolution, 12(12), 2334–2340. https://doi.org/10.1111/2041-210X.13711

[4] McFee, B., Raffel, C., Liang, D., Ellis, D. P., McVicar, M., Battenberg, E., & Nieto, O. (2015). librosa: Audio and music signal analysis in Python. In Proceedings of the 14th Python in Science Conference (Vol. 8).

Audios Dataset

[5] Xeno-canto Foundation and Naturalis Biodiversity Center (2005). xeno-canto: Sharing bird sounds from around the world. Dissertation audio dataset.

[6] The Cornell Lab of Ornithology (2005). Macaulay Library - eBird, ebird.com