Mapping Cultural Ecosystem Service Flows from Social Media Imagery with Vision–Language Models: A Zero-Shot CLIP Framework

doi:10.32942/x29s8c

2025 · doi:10.32942/x29s8c

preprint OA: closed CC-BY-4.0

This is a Preprint and has not been peer reviewed. This is version 1 of this Preprint.

Geotagged social media imagery provides a valuable source for mapping cultural ecosystem service (CES) flows, which represent realized human interactions with nature, yet its open-world user-generated content poses challenges to automated content analysis. Supervised models require large labeled datasets and show limited generalization across contexts, whereas unsupervised approaches often need post-hoc interpretation. Vision–language models offer a promising alternative but remain largely unexplored in CES research. We present a label-efficient framework that leverages the open-source Contrastive Language–Image Pretraining (CLIP) model to classify and map 12 CES flows across Florida using only 120 labeled images. Five CLIP variants and three prompt strategies were benchmarked to evaluate zero-shot performance under closed-set conditions, and three CLIP-based pipelines with differing supervision levels were compared to address the open-set challenge of filtering irrelevant content. Mixed class-specific prompts increased closed-set accuracy to 97%. Under open-set conditions, a hybrid pipeline combining a lightweight binary classifier with zero-shot CLIP inference achieved the strongest performance (accuracy = 88%; F1-macro = 0.88; F1-other = 0.91), demonstrating major gains in label-efficiency and open-set robustness. Statewide flow maps reveal consistent hotspots for outdoor recreation, wildlife viewing, and landscape aesthetics along coastal areas and major inland greenspaces, extending beyond formal park systems into urban greenspaces and other natural and working lands. The resulting map products and interactive web application provide actionable tools for identifying CES hotspots and the landscapes that support human–nature interactions. Overall, this study demonstrates the transformative potential of foundation VLMs for large-scale CES assessment using social media imagery.

https://doi.org/10.32942/X29S8C

Subjects: Computational Engineering, Computer Sciences, Natural Resources and Conservation, Nature and Society Relations, Sustainability

Keywords: cultural ecosystem services, Contrastive Language-Image Pre-training, vision–language model, Zero-Shot Learning, open-set recognition, social media imagery, natural and working landscapes

Published: 2025-12-10 22:10
Last Updated: 2025-12-10 22:10

License: CC BY Attribution 4.0 International

Data and Code Availability Statement: Interactive CES maps related to this study are available at: https://es-geoai.rc.ufl.edu/agroes-ces-clip/. Open data/code may be released in a future update.

Language: English
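
The hybrid open-set pipeline summarized in the abstract (a binary relevance filter followed by zero-shot inference, scored with macro and per-class F1) can be illustrated with a minimal sketch. This is not the authors' code, which had not been released at the time of posting. Everything below is an assumption made for illustration: the 3-dimensional toy vectors stand in for real CLIP image and text embeddings, `toy_gate` stands in for the lightweight binary classifier, averaging several prompt embeddings per class is one common way to combine prompts and may differ from the paper's "mixed class-specific prompts", and all function names are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def class_prototypes(prompt_embeddings):
    """Average the unit-length embeddings of each class's prompts (assumed strategy)."""
    protos = {}
    for label, vecs in prompt_embeddings.items():
        unit = [normalize(v) for v in vecs]
        mean = [sum(col) / len(unit) for col in zip(*unit)]
        protos[label] = normalize(mean)
    return protos

def hybrid_predict(image_vec, protos, is_relevant, other_label="other"):
    """Stage 1: binary relevance gate. Stage 2: zero-shot nearest-prototype label."""
    if not is_relevant(image_vec):
        return other_label
    return max(protos, key=lambda label: cosine(image_vec, protos[label]))

def f1_per_class(y_true, y_pred):
    """Per-class F1 computed as 2TP / (2TP + FP + FN)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)

# Toy stand-ins for text embeddings of two prompts per class (hypothetical values).
PROMPTS = {
    "outdoor recreation": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]],
    "wildlife viewing": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
}

def toy_gate(v):
    """Hypothetical relevance filter: treat the third axis as 'irrelevant content'."""
    return normalize(v)[2] < 0.5

# Toy stand-ins for image embeddings, with their assumed true labels.
images = [[0.8, 0.2, 0.1], [0.1, 0.7, 0.2], [0.1, 0.1, 0.9], [0.6, 0.5, 0.1]]
y_true = ["outdoor recreation", "wildlife viewing", "other", "wildlife viewing"]

protos = class_prototypes(PROMPTS)
y_pred = [hybrid_predict(v, protos, toy_gate) for v in images]
scores = f1_per_class(y_true, y_pred)

print(y_pred)
print(scores, macro_f1(y_true, y_pred))
```

In a real setting the toy vectors would be replaced by embeddings from a CLIP image encoder and text encoder, and the gate by a classifier trained on a small labeled set. The two-stage split keeps the supervised part limited to the relevant-versus-irrelevant decision, which is consistent with the label-efficiency argument made in the abstract.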

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall: last seen: 2026-05-23T02:00:01.238055+00:00

License: CC-BY-4.0