Abstract
Vocal interactions are fundamental for social functioning across animals, including humans. The diverse rules underlying these exchanges remain largely unknown, and emerging AI technologies offer promising avenues for investigation. We used computational tools to collect and analyze >1,000 hours of vocal interactions between female zebra finches and discovered that their interactions were characterized by correlated call production and structure, rapid acoustic modulation, and response selectivity. To test these interaction rules, we developed a generative audio large language model (ZF-AIM Acoustic Interaction Model) that engaged in real-time vocal exchanges with birds. When birds interacted with ZF-AIM, their vocal production and flexibility recapitulated key naturalistic features, which did not happen with non-interactive playbacks. Targeted ablations of ZF-AIM revealed that call timing and structure differentially contribute to natural vocal interactions. Using these AI-animal interactions, we demonstrate how AI can be leveraged to reveal fundamental rules underlying animal communication.
Competing Interest Statement
The authors have declared no competing interest.
Footnotes
‡ These authors listed alphabetically.
†† Co-supervising authors.
We have updated the title and abstract
3 Implementation from https://github.com/lucidrains/recurrent-memory-transformer-pytorch.
4 Implementation from https://github.com/facebookresearch/audiocraft.
5 https://github.com/facebookresearch/audiocraft/blob/main/config/model/encodec/encodec_base_causal.yaml
6 https://github.com/facebookresearch/audiocraft/blob/main/config/solver/compression/default.yaml
7 https://github.com/facebookresearch/audiocraft/blob/main/config/model/encodec/encodec_base_causal.yaml
8 https://github.com/facebookresearch/audiocraft/blob/main/config/solver/compression/default.yaml
9 We use the FAD implementation from https://github.com/gudgud96/frechet-audio-distance
11 https://osf.io/wf2x8/overview?view_only=7d7b699afec949a089c8b7abfd8e71c2
12 https://osf.io/wf2x8/overview?view_only=7d7b699afec949a089c8b7abfd8e71c2
13 https://osf.io/wf2x8/overview?view_only=7d7b699afec949a089c8b7abfd8e71c2