RuHere (Are You Here?): An R package to obtain, validate, and clean species records using metadata and specialist range information

preprint OA: closed
Abstract

Species occurrence data are fundamental to understanding, predicting, and conserving global biodiversity. However, biodiversity datasets remain affected by substantial data-quality issues, particularly erroneous or imprecise geographic coordinates. Most available tools for identifying problematic records rely primarily on automated spatial or metadata-based checks and rarely integrate expert-curated species range information, which can reveal introductions or geographic errors that often escape standard validation procedures. Here, we introduce RuHere, an R package designed to manage species occurrence data, flag potential errors, and support the iterative exploration of problematic records. RuHere streamlines the data-cleaning process by integrating six main steps: (1) obtaining species occurrence records; (2) merging datasets and standardizing spatial information; (3) flagging records based on metadata; (4) flagging records using expert-derived distribution data; (5) visualizing, investigating, and summarizing flagged issues in the final datasets; and (6) exploring and reducing sampling bias. We demonstrate the applicability of RuHere using occurrence data for a plant species (Araucaria angustifolia) and an animal species (Cyanocorax caeruleus). Nearly 75% of records were flagged as potentially problematic, including records identified exclusively by functions relying on specialist range information. The main strengths of RuHere lie in its integrated and computationally efficient workflow, its tools for exploring and evaluating flagged records, and its ability to incorporate expert-derived distribution data to identify occurrences outside a species' known natural range. By combining metadata-based checks, coordinate validation, and specialist knowledge, RuHere provides a robust and reproducible framework for improving the quality of species occurrence datasets.

Competing Interest Statement

The authors have declared no competing interest.
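The range-based check in step (4) of the workflow amounts to asking whether each georeferenced record falls inside an expert-derived distribution polygon. The sketch below illustrates that idea in plain Python, together with a few generic coordinate checks (missing, out-of-bounds and zero coordinates) chosen here for illustration. It is not RuHere's implementation or API: RuHere is an R package and the abstract does not name its functions or list its exact flags. All names (`Record`, `point_in_polygon`, `flag_record`, `toy_range`) are hypothetical, coordinates are treated as planar longitude/latitude pairs, and the rectangular "range" is a toy polygon, not the distribution of any real species.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# A polygon is a list of (longitude, latitude) vertices; the ring closes implicitly.
Polygon = Sequence[Tuple[float, float]]


@dataclass
class Record:
    """Hypothetical minimal occurrence record (not RuHere's data structure)."""
    species: str
    lon: Optional[float]
    lat: Optional[float]


def point_in_polygon(lon: float, lat: float, poly: Polygon) -> bool:
    """Ray casting: count how many edges a ray cast eastward from the point crosses."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed;
        # this also skips horizontal edges, so the division below never divides by zero.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge meets that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def flag_record(rec: Record, expert_range: List[Polygon]) -> List[str]:
    """Return flag labels for one record; an empty list means no issue was detected."""
    if rec.lon is None or rec.lat is None:
        return ["missing_coordinates"]
    if not (-180 <= rec.lon <= 180 and -90 <= rec.lat <= 90):
        return ["invalid_coordinates"]
    flags: List[str] = []
    if rec.lon == 0 and rec.lat == 0:
        # (0, 0) is a common placeholder value rather than a true location.
        flags.append("zero_coordinates")
    # Expert-range check: the record must fall inside at least one range polygon.
    if not any(point_in_polygon(rec.lon, rec.lat, p) for p in expert_range):
        flags.append("outside_expert_range")
    return flags


# Toy "expert range": a single rectangle, NOT the real range of any species.
toy_range = [[(-55.0, -30.0), (-45.0, -30.0), (-45.0, -20.0), (-55.0, -20.0)]]

records = [
    Record("sp1", -50.0, -25.0),  # inside the toy range -> []
    Record("sp1", 0.0, 0.0),      # -> ["zero_coordinates", "outside_expert_range"]
    Record("sp1", -70.0, -25.0),  # valid coordinates, outside the range
    Record("sp1", None, -25.0),   # missing longitude
    Record("sp1", 200.0, 10.0),   # longitude out of bounds
]

for r in records:
    print(r.lon, r.lat, flag_record(r, toy_range))
```

Because the checks return lists of labels rather than deleting records, flagged occurrences stay available for the kind of inspection described in step (5). A real workflow would more likely read range maps from a GIS file and use a geometry library that handles projections, holes and multipart polygons; the ray-casting test above assumes simple, non-self-intersecting rings and ignores the antimeridian.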

Year: 2026

Source provenance

europepmc