This is a Preprint and has not been peer reviewed. This is version 3 of this Preprint.
Social media platforms have emerged as a promising source of data for biodiversity monitoring because of the vast amounts of user-generated visual content they host. However, the unstructured and noisy nature of social media data poses challenges for accurate species identification. Foundation vision models offer a new way to identify a large diversity of species from photographs, but they have yet to be robustly tested on messy social media data. This study explores the utility of foundation vision models in identifying species from social media images, focusing on charismatic species such as lions, cheetahs, and gorillas. We manually labeled a dataset of images from Flickr, taken in zoos across the United States, to establish a ground truth for species presence. We evaluated the performance of three models: (i) CLIP with binary prompts ("species name is present/species name is not present"), (ii) a categorical model with common object categories (e.g., "plant," "building," "vehicle," and "expected species name"), and (iii) BioCLIP, a fine-tuned version of CLIP designed specifically for species identification. Our analysis revealed that the binary presence/absence model struggled with the noisy social media data, leading to low accuracy. The categorical model improved the true positive rate but continued to produce a large number of false positives. BioCLIP, while not achieving the highest accuracy, performed best at minimizing false positives, which is crucial for biodiversity monitoring, where incorrect detections can have significant consequences. Precision-recall analysis using presence-only data indicates the models' potential in real-world applications where presence detection is prioritized. Our findings suggest that foundation vision
https://doi.org/10.32942/X21935
Life Sciences
Artificial Intelligence, social media, biodiversity
Published: 2025-04-22 17:03
Last Updated: 2025-04-22 17:03
CC BY Attribution 4.0 International
Conflict of interest statement:
None
Data and Code Availability Statement:
Open data/code are not available.
Language:
English
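The binary and categorical prompt setups named in the abstract, and the false-positive and precision-recall comparison it reports, can be illustrated with a minimal sketch. This is not the authors' code (the page states that open data and code are not available). The function names `softmax`, `species_present`, and `precision_recall`, the 0.5 threshold, the prompt wording for "lion", and all similarity scores are assumptions made up for illustration. In practice the scores would come from a model such as CLIP or BioCLIP.

```python
import math

def softmax(scores):
    """Convert raw image-text similarity scores into probabilities."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def species_present(similarities, prompts, target_prompt, threshold=0.5):
    """Hypothetical decision rule: the species counts as present when the
    target prompt's softmax probability reaches the threshold."""
    probs = softmax(similarities)
    return probs[prompts.index(target_prompt)] >= threshold

def precision_recall(predicted, actual):
    """Return (precision, recall, false_positives) for boolean labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, fp

# (i) Binary prompts, following the "is present / is not present" pattern.
binary_prompts = ["lion is present", "lion is not present"]
# (ii) Categorical prompts: common object categories plus the expected species.
categorical_prompts = ["plant", "building", "vehicle", "lion"]

# Placeholder similarity scores for one image (invented, not model output).
binary_scores = [2.0, 0.0]
categorical_scores = [1.0, 1.0, 1.0, 1.2]

binary_call = species_present(binary_scores, binary_prompts, "lion is present")
categorical_call = species_present(categorical_scores, categorical_prompts, "lion")

# Compare predictions against manually labeled ground truth (invented labels).
predicted = [True, True, False, False]
actual = [True, False, True, False]
precision, recall, false_positives = precision_recall(predicted, actual)
```

With more prompts competing for probability mass, the same threshold becomes harder to reach, which is one reason the threshold would likely need tuning separately for each prompt setup.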