Computer Vision Models Offer Scalable Species Detection From Social Media Photographs

preprint OA: closed CC-BY-4.0
This is a Preprint and has not been peer reviewed. This is version 3 of this Preprint.

Social media platforms have emerged as a promising source of data for biodiversity monitoring because of the vast amounts of user-generated visual content. However, the unstructured and noisy nature of social media data poses challenges for accurate species identification. Foundation vision models offer a new way to identify a large diversity of species from photographs, but they have yet to be robustly tested on messy social media data. This study explores the utility of foundation vision models in identifying species from social media images, focusing on charismatic species such as lions, cheetahs, and gorillas. We manually labeled a dataset of images from Flickr, taken in zoos across the United States, to establish a ground truth for species presence. We evaluated the performance of three models: (i) CLIP with binary prompts ("species name is present/species name is not present"), (ii) a categorical model with common object categories (e.g., "plant," "building," "vehicle," and "expected species name"), and (iii) BioCLIP, a fine-tuned version of CLIP designed specifically for species identification. Our analysis revealed that the binary presence/absence model struggled with the noisy social media data, leading to low accuracy. The categorical model improved the true positive rate but continued to produce a large number of false positives. BioCLIP, while not achieving the highest accuracy, was the best of the three at minimizing false positives, which is crucial for biodiversity monitoring, where incorrect detections can have significant consequences. Precision-recall analysis using presence-only data indicates the models' potential in real-world applications where presence detection is prioritized. Our findings suggest that foundation vision [...]

DOI: https://doi.org/10.32942/X21935
Subjects: Life Sciences
Keywords: Artificial Intelligence, social media, biodiversity
Published: 2025-04-22 17:03
Last Updated: 2025-04-22 17:03
License: CC BY Attribution 4.0 International
Conflict of interest statement: None
Data and Code Availability Statement: Open data/code are not available.
Language: English
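
The abstract describes two prompt setups (binary and categorical) and an evaluation built on false positives, precision, and recall. The sketch below illustrates that logic in plain Python. It is not the authors' code, which the page says is not available. All function names, the softmax step, the 0.5 threshold, and the toy values in the usage note are assumptions made for illustration. The image-to-prompt similarity scores are taken as given inputs, so no model is loaded here.

```python
import math


def binary_prompts(species):
    """Two prompts in the style quoted in the abstract (hypothetical helper)."""
    return [f"{species} is present", f"{species} is not present"]


def categorical_prompts(species, distractors=("plant", "building", "vehicle")):
    """Common object categories plus the expected species name, listed last."""
    return list(distractors) + [species]


def softmax(scores):
    """Turn raw similarity scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_present(scores, target_index, threshold=0.5):
    """Assumed decision rule: present if the target prompt's probability
    reaches the threshold. The source does not state the rule it used."""
    return softmax(scores)[target_index] >= threshold


def precision_recall(predicted, actual):
    """Compare predictions with manual labels.

    Returns (precision, recall, false_positive_count).
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, fp
```

As a usage example with made-up values, `predict_present([2.0, 0.0], target_index=0)` scores an image against `binary_prompts("lion")` and treats index 0 as the "present" prompt. Sweeping `threshold` and recomputing `precision_recall` at each value is one way to trace a precision-recall curve like the one the abstract mentions.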

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall
last seen: 2026-05-24T02:00:01.246996+00:00
License: CC-BY-4.0