Abstract
Artificial Intelligence (AI) is rapidly transforming healthcare, but it is also raising concerns about algorithmic biases that mostly stem from the training data. It is widely accepted that transparent dataset documentation is key to enabling responsible AI development. Several standardized dataset documentation approaches have been established, such as Datasheet, Dataset Nutrition Label, Accountability Documentation, Healthsheet, and Data Card. However, their suitability and usage for health datasets remain unclear. In this work, we compared all five approaches and evaluated their alignment with the STANDING Together Recommendations for Documentation of Health Datasets. We also investigated their real-world usage and gathered insights from generators and consumers of health datasets. Our findings reveal that none of these documentation approaches is widely used or fully suited to health datasets. We recommend developing a standard documentation approach for health datasets along with clear guidelines and automation tools to support adoption.
Competing Interest Statement
The authors have declared no competing interest.
Data availability
The data associated with this manuscript consists of several Excel files (mentioned in the Methods and Results sections). Since no FAIR guidelines were found for structuring such data, we structured it according to the SPARC Data Structure (SDS), which provides a broad data and metadata structure to organize biomedical research data in line with the FAIR principles.27 The SPARC data curation software SODA for SPARC was used to organize the data and prepare the metadata files.28,29 The dataset is maintained in a GitHub repository called “dataset-documentation-paper-data” in the AI-READI GitHub organization, and the version associated with this manuscript (v1.0.0) is also archived on Zenodo.30 This data is shared under the permissive Creative Commons Attribution 4.0 International (CC-BY) license.