Fall Research Expo 2022

Automatic Detection of Brown-headed Cowbird Song in Urban Environments

Brown-headed cowbirds have an extremely rich social structure, and during the mating season they exhibit many socially relevant behaviors, e.g., calls, wingstrokes, and copulation solicitation displays. The goal of the Schmidt lab is to quantify how this social context is represented in the cowbird brain; to this end, the lab aims to construct a "smart aviary" that tracks the birds' behaviors over time. Because human observers can follow only one or two birds at a time, cowbird studies have traditionally been limited in scope, but with 24/7 camera and audio monitoring, the smart aviary will be able to track the behavior of an entire flock simultaneously, providing a much richer dataset. The first task of this "audio-to-behavior pipeline" is to automatically detect the occurrence of birdsong in the aviary audio. This aim is complicated, however, by the aviary's location in a noisy urban environment, which introduces significant interference into the microphone signal.

To this end, we use a sparse-representation-based approach to quantify how much cowbird song is present in a signal. First, examples of cowbird whistles and chatters are manually annotated and collected into a call "dictionary." Incoming signals are then preprocessed by removing all information outside a fixed frequency band and windowed into fragments approximately one second long. Using the orthogonal matching pursuit algorithm, we compute the correlation between each fragment and the dictionary vectors and represent the fragment as a weighted sum of the dictionary vectors with which it correlates most strongly (its sparse representation). The coefficients of this weighted sum measure how much song is present in the fragment, from which we calculate the signal-to-interference-plus-noise ratio, or SINR. We then classify the fragment as song or non-song according to whether its SINR exceeds a chosen threshold. Using this method, we achieved a success rate above 90% and an error rate below 10% for both whistles and chatters. Future directions include implementing audio enhancement and noise cancellation to extract more precise information about call types, as well as collecting simultaneous data on both calls and neural events. The hope is that we can link neural signals to different social situations and elucidate how the brain understands social context.
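The pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the lab's implementation: the dictionary here is a set of random unit-norm vectors standing in for annotated whistle templates, the fragments are synthetic, and the 0 dB threshold is a hypothetical value chosen for the example. The SINR is computed as the ratio of the energy captured by the sparse fit to the energy of the leftover residual.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal matching pursuit: greedily approximate x as a
    weighted sum of at most k dictionary columns (atoms)."""
    residual = x.copy()
    support = []
    for _ in range(k):
        # Pick the atom most correlated with what is still unexplained.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        w, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ w
    return support, w, residual

def sinr_db(x, residual):
    """Energy explained by the dictionary (signal) versus the
    leftover residual (interference plus noise), in decibels."""
    noise_energy = np.sum(residual ** 2)
    signal_energy = np.sum(x ** 2) - noise_energy
    return 10.0 * np.log10(signal_energy / noise_energy)

rng = np.random.default_rng(0)
n_samples, n_atoms = 256, 8

# Hypothetical dictionary: unit-norm stand-ins for whistle templates.
D = rng.standard_normal((n_samples, n_atoms))
D /= np.linalg.norm(D, axis=0)

fragments = {
    # A "song" fragment: one dictionary atom plus mild background noise.
    "song": 3.0 * D[:, 3] + 0.1 * rng.standard_normal(n_samples),
    # A "noise" fragment: background interference with no whistle.
    "noise": rng.standard_normal(n_samples),
}

THRESHOLD_DB = 0.0  # illustrative threshold, not the lab's tuned value
labels = {}
for name, frag in fragments.items():
    _, _, residual = omp(D, frag, k=2)
    labels[name] = bool(sinr_db(frag, residual) > THRESHOLD_DB)
```

Because least squares makes the residual orthogonal to the selected atoms, a fragment dominated by a dictionary atom leaves little residual energy and scores a high SINR, while a pure-noise fragment is poorly explained by any two atoms and falls below the threshold.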

PRESENTED BY
PURM - Penn Undergraduate Research Mentoring Program
College of Arts & Sciences 2025
Advised By
Marc Schmidt
Professor of Biology; Co-Director, Undergraduate Neuroscience Program
