Abstract
BACKGROUND Diagnostic delays are common in primary ciliary dyskinesia (PCD), a rare and substantially underdiagnosed disease. Scalable screening methods could improve early identification and health outcomes.
RESEARCH QUESTION Can machine learning (ML) be used to screen for PCD in pediatric patients?
STUDY DESIGN AND METHODS We evaluated the feasibility of a random forest model to screen for PCD using data from the PCD Foundation Registry and a national claims database. We identified a cohort of pediatric patients with diagnostic codes indicative of conditions potentially associated with PCD, and studied diagnostic, procedural, and pharmaceutical codes associated with PCD to develop ML features. Models were trained on composite claims data from three groups: confirmed patients with PCD; patients with a Q34.8 (Other Specified Congenital Malformations of Respiratory System) diagnosis code within six months of an electron microscopy procedure (Q34.8+EM); and a randomly selected, matched control group. Model performance was tested through 5-fold cross-validation.
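The Q34.8+EM cohort rule described above can be illustrated as a date-window check over claims. This is only a minimal sketch, not the study's pipeline: the `Claim` record layout, the `EM_PROC` placeholder procedure code, and the approximation of "six months" as 183 days are all assumptions made for illustration. The abstract does not say whether the window applies before or after the procedure, so the sketch treats it as symmetric.

```python
from dataclasses import dataclass
from datetime import date

# Assumptions (not from the study): the claim layout, the placeholder EM code,
# and "six months" approximated as 183 days in either direction.
Q34_8 = "Q34.8"
EM_PROCEDURE = "EM_PROC"  # hypothetical stand-in for an electron microscopy procedure code
WINDOW_DAYS = 183


@dataclass(frozen=True)
class Claim:
    patient_id: str
    code: str
    service_date: date


def q348_em_patients(claims):
    """Return IDs of patients with a Q34.8 code within WINDOW_DAYS of an EM procedure."""
    dx_dates = {}
    em_dates = {}
    # Collect the dates of each relevant code per patient.
    for c in claims:
        if c.code == Q34_8:
            dx_dates.setdefault(c.patient_id, []).append(c.service_date)
        elif c.code == EM_PROCEDURE:
            em_dates.setdefault(c.patient_id, []).append(c.service_date)

    flagged = set()
    for pid, dxs in dx_dates.items():
        ems = em_dates.get(pid, [])
        # Flag the patient if any diagnosis/procedure pair falls inside the window.
        if any(abs((dx - em).days) <= WINDOW_DAYS for dx in dxs for em in ems):
            flagged.add(pid)
    return flagged


claims = [
    Claim("A", Q34_8, date(2021, 1, 10)),
    Claim("A", EM_PROCEDURE, date(2021, 4, 2)),  # 82 days apart: qualifies
    Claim("B", Q34_8, date(2021, 1, 10)),
    Claim("B", EM_PROCEDURE, date(2022, 3, 1)),  # more than a year apart: excluded
    Claim("C", EM_PROCEDURE, date(2021, 5, 5)),  # no Q34.8 code: excluded
]
print(sorted(q348_em_patients(claims)))
```

In this toy input only patient A satisfies the rule. A real claims extract would also need de-duplication and handling of missing service dates.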
RESULTS Using 82 confirmed PCD cases and 4,161 matched controls, the model demonstrated variable performance (positive predictive value 0.45–0.73, sensitivity 0.75–0.94). Synthetic data augmentation did not improve results (positive predictive value 0.45–0.67, sensitivity 0.71–1.00). Expanding the dataset to include 319 Q34.8+EM patients and 8,214 controls improved performance (positive predictive value 0.51–0.54, sensitivity 0.82–0.90) to a level suitable for screening. In a cohort of 1.32 million pediatric patients, 7,705 were classified as positive, consistent with the estimated prevalence of PCD (1:7,554).
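For readers less familiar with the two metrics reported above, the following sketch shows how positive predictive value and sensitivity are computed from confusion counts and summarized as a range across cross-validation folds. The per-fold counts are invented for illustration and are not the study's data. Reading the reported ranges as minimum to maximum across folds is also an assumption, since the abstract does not state how the ranges were derived.

```python
def ppv(tp, fp):
    """Positive predictive value: share of positive calls that are true cases."""
    return tp / (tp + fp) if (tp + fp) else float("nan")


def sensitivity(tp, fn):
    """Sensitivity (recall): share of true cases that the model flags."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


# Hypothetical (tp, fp, fn) counts for five folds; NOT taken from the study.
folds = [(15, 10, 2), (14, 12, 2), (16, 9, 1), (13, 11, 3), (15, 13, 1)]

ppv_per_fold = [ppv(tp, fp) for tp, fp, _ in folds]
sens_per_fold = [sensitivity(tp, fn) for tp, _, fn in folds]

# Summarize each metric as a (min, max) range across folds.
ppv_range = (min(ppv_per_fold), max(ppv_per_fold))
sens_range = (min(sens_per_fold), max(sens_per_fold))

print(f"PPV range: {ppv_range[0]:.2f} to {ppv_range[1]:.2f}")
print(f"Sensitivity range: {sens_range[0]:.2f} to {sens_range[1]:.2f}")
```

With few confirmed cases per fold, a single misclassified patient shifts both metrics noticeably, which is one plausible reason a larger training set produces narrower ranges.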
INTERPRETATION This study demonstrates the feasibility of using ML to screen for PCD using claims data, even in the absence of a specific International Classification of Diseases (ICD) code. Such screening approaches could help identify individuals who may benefit from timely diagnostic testing and targeted interventions.
Competing Interest Statement
Dr. Shapiro is a member of the Advisory Boards for the Primary Ciliary Dyskinesia Foundation, Parion Sciences, Ethris GmbH, and ReCode Therapeutics. He receives salary support from the Primary Ciliary Dyskinesia Foundation and grant funding from the Chest Foundation and the National Institutes of Health.
Funding Statement
Shapiro and Milla - Research funding support: US NIH/ORDR/NCATS/NHLBI - U54HL096458, 1U01HL172658-01
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Protocol number PCDFR001 approved by the Genetic Alliance Institutional Review Board. This study was not registered.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
ABBREVIATION LIST
- PCD: primary ciliary dyskinesia
- Q34.8+EM: presence of a Q34.8 diagnosis code within six months of an electron microscopy procedure code
- ML: machine learning
- PCDFR: PCD Foundation Registry
- ICD: International Classification of Diseases
- ADASYN: adaptive synthetic sampling