Modelling temporal shift-invariance in self-supervised generative models improves accuracy and interpretability of species detection in soundscape recordings

doi:10.64898/2025.12.09.693207

2025 · doi:10.64898/2025.12.09.693207

preprint OA: closed


Abstract

Realising the potential of passive acoustic monitoring (PAM) to deliver biodiversity insight at scale requires new approaches to the automated analysis of PAM recordings that are trustworthy as well as cost-effective. Discriminative models trained on annotated species data are gaining popularity but are labour-intensive, notoriously opaque and biased. Self-supervised generative models such as Variational Autoencoders (VAEs) offer great potential for learning compact yet expressive representations of data, which can be used for subsequent discriminative tasks and are intrinsically interpretable. However, the default learning algorithm yields weakly discriminative data representations because the generative task is under-specified. We propose and evaluate a novel modification to the VAE learning algorithm that models intra-frame shift-invariance. We demonstrate that this modification provides representations that are more interpretable and consistent, and that improve classification performance. Accuracy is evaluated on species detection tasks using two weakly annotated data sets spanning temperate and tropical terrestrial habitats, and is compared with the leading discriminative models BirdNet and Perch, as well as the classic VAE. Whilst demonstrated on terrestrial recordings, the approach is transferable to marine, freshwater, and soil habitats. These innovations set the path for trustworthy, data- and time-efficient tools to support solid ecological inference from large-scale passive acoustic monitoring surveys.

Competing Interest Statement

The authors have declared no competing interest.

Footnotes

alicee{at}sussex.ac.uk
a.shuaibu{at}sussex.ac.uk
i.simpson{at}sussex.ac.uk
https://github.com/m4gpi/interpretable_bioacoustic_classifiers/
https://m4gpi.github.io/interpretable_bioacoustic_classifiers/
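
The abstract does not say how intra-frame shift-invariance is modelled, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a spectrogram frame stored as a list of frequency rows over time, and it scores a reconstruction by the lowest mean squared error over a small range of circular time shifts, so that a call reconstructed slightly early or late within the frame is not penalised. The function names (`shift_time`, `mse`, `shift_invariant_error`) and the `max_shift` parameter are hypothetical.

```python
def shift_time(frame, k):
    """Circularly shift every frequency row of a (freq x time) frame by k steps.

    Positive k shifts content later in time, negative k earlier.
    """
    shifted = []
    for row in frame:
        k_mod = k % len(row)
        shifted.append(row[-k_mod:] + row[:-k_mod] if k_mod else list(row))
    return shifted


def mse(a, b):
    """Mean squared error between two frames of identical shape."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def shift_invariant_error(target, recon, max_shift):
    """Return (lowest error, shift achieving it) over shifts in [-max_shift, max_shift].

    Assumption for illustration only: the shift is found by exhaustive search.
    """
    best = None
    for k in range(-max_shift, max_shift + 1):
        err = mse(target, shift_time(recon, k))
        if best is None or err < best[0]:
            best = (err, k)
    return best


# Toy frame: 2 frequency bins x 4 time steps, with the same event one step apart.
target = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 2.0, 0.0]]
recon = [[0.0, 1.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]

plain_error = mse(target, recon)
best_error, best_shift = shift_invariant_error(target, recon, max_shift=2)
print(plain_error, best_error, best_shift)
```

In this toy case the plain error is non-zero even though the two frames contain the same event, while the shift-tolerant error vanishes once the reconstruction is moved one step later. A learned model would need a differentiable treatment of the shift, which this sketch does not attempt.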


Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00