Evidence-based recommendations for application of construct-based splicing data in clinical variant classification

Research Square | Research Article

Daffodil M Canson, George A R Wiggins, Eladio A Velasco-Sampedro, and 3 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9081705/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 9

Abstract

Background: Minigene RT-PCR assays are widely used to assess variant impact on splicing, with increasing reports of massively parallel splicing assays (MPSAs) demonstrating potential to upscale diagnostic use of construct-based data. This study conducted a comprehensive evaluation of > 41,000 variants from construct-based splicing assays, to build evidence-based recommendations to support the consistent application of such assays in clinical variant interpretation.
Methods: Seven MPSAs were reviewed for design limitations, and their discriminatory performance evaluated by assessing: assay score distribution; consistency of splice-impact thresholds with expectations based on SpliceAI predictions. A traditional minigene RT-PCR dataset comprising 673 variants from 14 studies was analysed to: assess potential for SpliceAI score to predict level of aberration; demonstrate performance of SAI-10k-calc to accurately predict variant-induced splicing events; calibrate evidence strength towards or against spliceogenicity based on SpliceAI score. Traditional minigene results were compared to patient-derived RNA results, and calibrated for evidence strength towards or against pathogenicity using assertions from ClinVar. Results: MPSA datasets lacked specific information on variant-induced transcripts and had design limitations preventing detection of some aberrant splice events. Assay scores generated by five of seven MPSAs were unable to differentiate aberrant from natural splicing events. Traditional minigene results showed high predictive agreement between predicted and observed variant-induced events: SpliceAI score of 0.285 showed 90% sensitivity and 90% specificity for separating no/low versus intermediate-to-complete aberration, and agreement was 87% for aberration type using SAI-10k-calc. Minigene and patient-derived RNA results showed high agreement (45% complete concordance, 44% high concordance for predominant transcripts). Clinical calibration of minigene results showed that high (≥ 80%) or low (≤ 20%) expression of variant-induced LOF transcripts provide strong evidence towards or against pathogenicity, respectively. Evidence-based recommendations for design and critique of construct-based assays were built based on these and other findings. Conclusions: MPSA screens require review of design limitations and performance evaluation before considering their suitability for clinical variant interpretation. 
Well-designed multi-exon minigene assays can provide quantitative RNA results to supplement patient-derived RNA findings, and clinical calibration justifies their use in the diagnostic setting. Altogether, these findings support more consistent application of RNA evidence in clinical practice.

Keywords: RNA splicing; Variant interpretation; Construct-based assays; Massively parallel splicing assays (MPSA); Minigenes; SpliceAI; SAI-10k-calc; Clinical calibration; ACMG/AMP guidelines; Loss-of-function (LOF)

BACKGROUND

Identifying germline pathogenic variants is important for directing clinical care of patients and their families. In 2015, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) published a framework for classifying variants using multiple categories and degrees of evidence [1]. However, interpreting the clinical significance of variants in disease susceptibility genes remains a significant diagnostic challenge. Furthermore, the body of genetic knowledge contributing to variant classification is notably biased according to ethnicity. For example, allele frequency data used to guide the assessment of genetic variants are not representative of minority population groups, contributing to inequities in the delivery of genetic health. RNA diagnostics has emerged as an important strategy for establishing the clinical relevance of gene variants. Recent studies have shown that RNA sequencing (RNA-seq) improves the diagnostic yield by up to 25%, enabling more informed clinical management of patients [2–6]. Importantly, RNA diagnostics also helps mitigate the reliance on biased population-specific reference data, offering an orthogonal approach to DNA-based variant interpretation.
The current ACMG/AMP framework [1], supplemented with splicing-specific recommendations [7], denotes five ACMG/AMP codes for capturing potential splicing impact based on variant location, including: 1) splice donor/acceptor ± 1,2 dinucleotide positions (PVS1); 2) variants located at other positions (PP3, BP4); 3) additional evidence against pathogenicity for synonymous or intronic variants with no predicted impact on splicing and thus no other likely mechanism (BP7); and 4) similarity of predicted impact compared to a known (likely) pathogenic variant (PS1). While bioinformatic tools that predict variant impact on splicing play a critical role as the initial step in variant assessment, RNA assay data adds value by: 1) confirming that a variant is spliceogenic i.e. alters native splicing profile compared to controls; 2) determining whether the impact on splicing is complete or partial; and 3) revealing the altered transcript profile, to infer protein-level impact e.g., spliceogenic predicted loss of function (LOF), functional or uncertain function [8]. Assay-based detection of splicing using RT-PCR and RNA-seq of RNA from patient samples has been used to measure impact on splicing associated with the variant allele, ranging from complete (i.e., no reference transcript), partial (i.e., varying levels of reference transcript), to none [9–11]. However, results can vary based on factors such as tissue source, transcript isoform complexity, allele expression bias, and assay sensitivity [12], and it is difficult to track a partial effect for intronic variants using patient RNA. These limitations have led to the application of allele-specific minigene models in the research and diagnostic setting, not only to quantify allele-specific variant effects on splicing, but to enable variant impact assessment where patient RNA is unavailable [13–15]. 
More recently, massively parallel splicing assays (MPSAs) have enabled high-throughput functional assessment of thousands of variants in a single experiment [16–22]. These studies generate large datasets capturing quantitative effects of sequence variation on splicing and potentially provide a rich resource for benchmarking computational prediction tools and informing clinical variant interpretation. The existing ClinGen recommendations for use of splicing prediction and assay data provide high-level considerations for application of construct data for variant classification, and suggest “a conservative approach would be to apply information from construct data alone at lower weight in the absence of calibration of an experimental system against clinical data for proven spliceogenic and non-spliceogenic variants” [7]. Despite the increased availability of splicing data from construct-based assays, and their potential to provide valuable insights into the type and level of impact induced by sequence variants, for MPSA data especially there has been limited evaluation of splicing data consistency across assay platforms, cell models, and the pipelines used to process RNA data. In this study, we conducted a comprehensive analysis of the spliceogenic effects of approximately 41,000 variants that utilised a construct-based MPSA method or RT-PCR detection of traditional minigenes, in order to provide recommendations about the utility of such data for diagnostic variant classification following the ACMG/AMP framework. METHODS Study datasets RNA splicing data used in this study were accessed from published MPSA datasets and strategically selected construct-based RT-PCR datasets. We analyzed a total of 21 published datasets − 7 MPSAs and 14 RT-PCR (details provided in Additional file 1: Table S1 ). 
The MPSA datasets selected for analysis met the following inclusion criteria: 1) used a construct-based assay approach to assess the impact of sequence variants on RNA splicing; and 2) analysed more than 100 variants. To limit potential variability due to high-level differences in assay design and execution, the RT-PCR datasets selected for inclusion were from a single laboratory (author E.A.V.-S.), all of which used multi-exon minigene constructs to assess gene variants in established cancer susceptibility genes and applied the same method to semi-quantify the transcript products: 1) transfected cells were treated with cycloheximide, an inhibitor of nonsense-mediated decay (NMD); and 2) fluorescent RT-PCR products were assessed by capillary electrophoresis on an automated DNA sequencer. We refer to the combined RT-PCR datasets as the traditional minigene assay dataset in this study.

MPSA data preparation

Data from each study were downloaded from either the manuscript, supplementary information or an online repository (Additional file 1: Table S1). A step-wise and study-specific approach was taken to identify the relevant transcript for SpliceAI prediction. O’Neill et al. (2024) [18] and Rong et al. (2023) [21] provided the relevant Ensembl or RefSeq transcript identifiers. For Adamson et al. (2018) [16] and Chong et al. (2019) [17], transcripts were identified by matching the provided exon coordinates with MANE and RefSeq transcript exon coordinates. Patel et al. (2021) [19], Soemedi et al. (2017) [22] and Rhine et al. (2022) [20] provided no exon coordinates for the respective assay designs, therefore transcripts were selected based on the variant coordinate, prioritising the MANE transcripts. Additionally, for Soemedi et al. (2017) [22], we assumed alleles were provided for the RNA transcript and not DNA. As SpliceAI requires variants to be annotated based on the DNA sequence, the allele information provided by Soemedi et al.
(2017) [22] was compared to the DNA reference at the given locus; for variants discordant to the DNA reference (n = 2,483), the complement of the reported reference and variant alleles was used. See Additional file 1: Table S2 for a detailed summary of MPSA designs, characteristics and study-reported thresholds, including limitations to detect certain types of aberrations.

Evaluation of MPSA data for clinical application

We employed a multi-step approach to evaluate the discriminatory performance of MPSAs and establish their potential clinical utility. To investigate whether each MPSA effectively differentiated spliceogenic from non-spliceogenic variants using study-reported thresholds, the distribution of assay scores (e.g. delta percentage spliced in (PSI)) was visually assessed using a histogram; the presence of a bimodal or multi-modal distribution was considered as evidence for the ability of the assay to demarcate spliceogenic and non-spliceogenic variants. Unimodality was tested using Hartigan’s dip test with the dip.test function from the diptest package in R. We assessed whether the study-reported thresholds for detecting spliceogenic variants for different MPSA datasets were consistent with expectations based on SpliceAI predictions, using max delta score (hereafter referred to as SpliceAI score) thresholds from previous calibration of this tool for use in ACMG/AMP variant classification [7]. Splicing impact was only calculated for the transcript used in the relevant construct (see section ‘MPSA data preparation’). To determine whether spliceogenic variants with predicted high impact were enriched within the splice site motifs as expected, particularly at the splice donor/acceptor ± 1,2 dinucleotide positions, we visually examined the assay score distribution based on variant location.
To further examine the capacity of MPSAs to detect variants highly likely to be spliceogenic, we selected the subset of variants located at the splice donor/acceptor ± 1,2 dinucleotide positions, and plotted their assay and SpliceAI scores to determine whether the MPSA consistently identified variants predicted to have impact as spliceogenic.

Data formatting and standardization for traditional minigene studies

Minigene splicing data for 673 variants in seven cancer susceptibility genes (ATM, BRCA2, CHEK2, PALB2, RAD51C, RAD51D, and TP53) were extracted from 14 publications (Additional file 1: Table S1), and formatted for consistent presentation of semi-quantitative RT-PCR results (Additional file 1: Table S3). Assay results indicated the proportions of specific transcript products and the corresponding functional consequence in terms of effect on reading frame and the introduction of a premature termination codon (PTC). To determine the level of splicing aberration induced by a variant relative to the level produced by the wild type (WT) construct, the reduction of canonical transcript (CT) was computed using this formula: $\text{CT reduction} = \dfrac{\text{CT}_{\text{WT}} - \text{CT}_{\text{variant}}}{\text{CT}_{\text{WT}}} \times 100$. A 100% CT reduction indicated complete splicing aberration (that is, no canonical transcript produced by the variant allele). The variants were binned into four levels of splicing aberration based on CT reduction: high/complete (≥ 80% to 100%), intermediate (> 20% to < 80%), low (> 0% to ≤ 20%), or no aberration (≤ 0%). The no aberration category included variants that increased the level of canonical transcript (that is, negative measure of CT reduction).

Performance evaluation of splicing prediction tools using the traditional minigene dataset

Prediction tool evaluation was designed to build on our previous work, where SpliceAI was selected as the best-performing tool, and used to exemplify tool calibration providing evidence against/towards spliceogenicity [7].
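As an illustration of the CT reduction calculation and binning described under "Data formatting and standardization for traditional minigene studies", the following is a minimal Python sketch, not the authors' code. The function names are hypothetical, and the bin edges assume that the intermediate level spans > 20% to < 80% and the low level spans > 0% to ≤ 20%, consistent with the ≤ 20% and > 20% cut-offs used elsewhere in the text.

```python
def ct_reduction(ct_wt: float, ct_variant: float) -> float:
    """Percent reduction of canonical transcript (CT) relative to the WT construct.

    Both inputs are percentages of canonical transcript (0-100).
    A negative result means the variant increased the canonical transcript level.
    """
    if ct_wt <= 0:
        raise ValueError("WT canonical transcript level must be positive")
    return (ct_wt - ct_variant) / ct_wt * 100.0


def aberration_level(reduction: float) -> str:
    """Bin a CT reduction (%) into one of four aberration levels.

    Bin edges are an assumption of this sketch (see lead-in).
    """
    if reduction >= 80:
        return "high/complete"
    if reduction > 20:
        return "intermediate"
    if reduction > 0:
        return "low"
    return "none"  # includes variants that increase canonical transcript


if __name__ == "__main__":
    # Hypothetical example: WT construct gives 90% CT, variant gives 45% CT.
    r = ct_reduction(90.0, 45.0)
    print(r, aberration_level(r))
```

Normalising by the WT level means that a construct with background alternative splicing (WT canonical transcript below 100%) is not penalised: only the change attributable to the variant is counted.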
Performance evaluation covered two aspects critical for the use of prediction data in variant curation. First we assessed if SpliceAI [23] score could be used to predict different levels of aberration: no vs. low-to-complete, no/low vs. intermediate-to-complete, and no-to-intermediate vs. high/complete. The maximum raw SpliceAI score was defined as the maximum probability of altered splicing across the four output probabilities at a maximum distance of 10,000 nucleotides (± 4,999 nucleotides from the variant of interest). To determine the optimal thresholds for separating the aberration levels, we performed Receiver Operating Characteristic (ROC) analysis using the pROC R package [24] and obtained the Youden index, which helped identify the optimal binary cut-off point on the ROC curve where the test performs best, balancing sensitivity and specificity. To determine the magnitude and direction of the correlation between the SpliceAI prediction and level of splicing aberration, we computed Kendall’s tau. We then examined the ability of the SAI-10k calculator (SAI-10k-calc) [25], recently modified to incorporate SpliceAI alternate scores [26], to accurately predict the variant-induced splicing events. For this assessment, variants with intermediate to complete aberration (> 20% CT reduction) were used as the positive control set, and no/low aberration (≤ 20% CT reduction) as the negative control set. The concordance of SAI-10k-calc transcript predictions with the minigene assay results were calculated based on the following criteria: concordance of positive predictions (one or more predicted events were detected at > 20% of total transcript pool), concordance of negative predictions (agreement between predicted no aberrant transcript and no/low aberration detected in the assay), and overall concordance (total concordant positive and negative predictions). 
The SAI-10k-calc algorithm for predicting exon skipping and intron retention requires the loss of both native acceptor and donor splice sites. In instances where at least one of the SpliceAI-predicted losses affects a cryptic splice site, SAI-10k-calc flags the variant with an annotation “lost site/s do not match consensus”; in this study, we modified the flag into “lost site/s do not match native site” for clarity. For variants flagged by SAI-10k-calc with “lost site/s do not match native site” under the “Exon_skipping_aaseq” or “Intron_retention_aaseq” fields, we manually inspected the SpliceAI scores using SpliceAI-visual [27] based on the decision flowchart in Additional file 2: Fig. S1 to obtain their final transcript prediction. For variants where the predicted aberrant transcript was absent or expressed at low level and other dominant transcript/s were detected in the assay, we additionally checked if the predicted and observed transcripts had the same functional consequence (PTC or in-frame), and considered them as concordant in a separate comparison. We also assessed whether knowledge of naturally occurring splicing events improved the performance of SAI-10k-calc to predict splicing events, by considering information captured in the SpliceVault resource [28], a database of common splicing events observed in reference RNA-sequencing samples. To obtain the SpliceVault 300K-RNA Top-4 alternative splicing events [28], we used the Ensembl Variant Effect Predictor (VEP) v113.4 [29] with the SpliceVault VEP plugin ( https://github.com/Ensembl/VEP_plugins/blob/release/113/SpliceVault.pm ) , which annotates variants with SpliceAI-predicted loss of native splice sites only. We chose the Ensembl transcript identifier equivalent to the RefSeq MANE Select transcript. SpliceVault results were considered as concordant if the Top-4 events matched with at least one aberrant transcript exhibiting > 20% expression in the minigene assay. 
Since native splice site loss is mostly caused by splice site motif disruption, SpliceVault concordance calculations were limited to variants in the splice donor and acceptor motifs, defined as in Walker et al., 2023 [7] (donor motif - the last three nucleotides of the exon to six nucleotides downstream of the exon; acceptor motif - the first nucleotide of the exon up to 20 nucleotides upstream of the exon). Calibration of SpliceAI for computational code application, using quantitative RNA results The likelihood ratio (LR) of spliceogenicity for variants located outside the splice donor/acceptor ± 1,2 dinucleotide positions was estimated using the formula previously reported [30]. We initially used the previously established SpliceAI score thresholds [7] based on analysis of an RNA splicing data truth set that did not capture quantification of transcript levels (that is, categorization of splicing aberration was binary, recorded simply as Yes or No). Quantitative splicing data from the traditional minigene dataset analyzed in this study enabled an assessment of SpliceAI thresholds for optimal prediction of no aberration versus aberration at a given minimum level, using the Youden index to estimate the optimal thresholds. Thresholds were estimated for best separation of no aberration versus low-to-complete, and for no/low aberration versus intermediate-to-complete aberration. We then used these thresholds to estimate LRs towards spliceogenicity (at the denoted minimum levels), and equated each LR with an evidence strength category according to the Bayesian framework of the ACMG/AMP variant classification guidelines [31]. Evaluation of traditional minigene RNA results for diagnostic variant classification Comparison of minigene and patient-derived RNA results: Traditional minigene assay results were compared with previously published splicing data derived from patient RNA. 
For positive patient RNA assay outcomes, concordance was assessed between the traditional minigene assay dataset and reported patient findings for all seven genes. Concordant results were defined as either complete concordance of all detected transcripts or high concordance where a single patient-derived transcript corresponded to the most abundant transcript observed in the minigene assay, or vice versa. Discordant results were defined as complete disagreement, where no transcripts overlapped between the two systems, or low concordance where patient transcript(s) aligned only with minor transcripts from the minigene assay. For variants with multiple patient RNA findings, outcomes were prioritised in the following order when assigning concordance: complete, high, low, and no concordance. For example, if a variant had two patient findings, one a complete concordance and the other a high concordance involving the most abundant transcript, it was classified as a complete concordance. Negative patient RNA assay outcomes were evaluated only for BRCA2 , leveraging our previously published in-house resource of splicing data extracted from publications, that provides comprehensive splicing information for this gene [7]. Clinical calibration to derive strength of evidence applicable to construct-based RNA findings All variant annotations relevant for clinical calibration are provided in Additional file 1: Table S4. Classifications were extracted from ClinVar (October 2025), and crossmatched to the variants in the traditional minigene assay dataset. A total of 513 variants were reported at least once in ClinVar, with summary classification as follows: 12 Benign (B); 10 Benign/Likely benign (B/LB); 43 Likely benign (LB); 115 Uncertain significance (VUS); 111 conflicting classifications; 41 Likely pathogenic (LP); 81 Pathogenic/Likely pathogenic (P/LP); 100 Pathogenic (P). 
For variants with conflicting classifications of pathogenicity, application of a simple majority rule resolved summary classification for 109 variants: 13 B/LB, 6 LB, 73 VUS, 9 LP, 8 P/LP. As a critical step to perform a calibration that correlates the construct-based findings with clinical evidence of variant pathogenicity (and not simply variant spliceogenicity), RNA findings were reviewed to determine the percentage of transcripts predicted to cause gene LOF (RNA transcripts predicted non-coding, predicted protein truncating leading to NMD, predicted protein truncating without NMD but impacting a critical domain, and/or predicted to encode proteins lacking critical structural/functional motifs) [8]. A transcript was assigned as predicted LOF if annotated as a PTC transcript, or if in-frame and eligible to be assigned at least moderate weight as reported in the original minigene RT-PCR publications (Additional file 1: Table S1), or for BRCA2, following the ClinGen ENIGMA Variant Curation Expert Panel (VCEP) specifications V1.2 [32]. The summed percentage of all predicted LOF transcripts observed for each variant was categorised as follows: ≤ 20%; > 20% and < 50%; ≥ 50% and < 80%; ≥ 80%. Variants were assigned a variant type using VEP consequence annotation, to allow consideration of an alternative mechanism of pathogenicity for variants annotated as nonsense, frameshift or missense variants at the DNA level, and stratification of splice region variants according to location inside or outside the highly conserved splice site dinucleotide positions.
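The summing and categorisation of predicted LOF transcripts described above can be sketched as follows. This is an illustrative Python example under stated assumptions, not the authors' pipeline: the function name and the input layout (a list of percentage and LOF-flag pairs per variant) are hypothetical, and whether a transcript counts as predicted LOF is taken as a given input rather than derived.

```python
from typing import Iterable, Tuple


def lof_category(transcripts: Iterable[Tuple[float, bool]]) -> Tuple[float, str]:
    """Sum the percentages of predicted-LOF transcripts and assign a category.

    Each item is (percent_of_total_transcripts, is_predicted_lof).
    Category labels mirror the four ranges given in the text.
    """
    total = sum(pct for pct, is_lof in transcripts if is_lof)
    if total <= 20:
        label = "<=20%"
    elif total < 50:
        label = ">20% and <50%"
    elif total < 80:
        label = ">=50% and <80%"
    else:
        label = ">=80%"
    return total, label


if __name__ == "__main__":
    # Hypothetical variant: two PTC transcripts (70% and 15%) plus 15% canonical.
    print(lof_category([(70.0, True), (15.0, True), (15.0, False)]))
```

Summing across all predicted LOF transcripts, rather than taking the most abundant one, reflects the text's use of the "summed percentage" for each variant.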
Variants were excluded from the calibration reference sets if they met the following criteria: summary classification as VUS (n = 188) or conflicting (n = 2); summary classification P/LP and DNA level consequence as nonsense, frameshift or deletion within a critical domain (n = 43), of which only 5 displayed a high level of splice impact; or summary classification P/LP and missense consequence (n = 28), of which only 7 displayed high level of splice impact. Variants considered to be potential outliers for calibration were then reviewed for pathogenicity. This resulted in the exclusion of two variants with a (likely) pathogenic assertion and ≤ 20% predicted LOF transcripts. BRCA2 c.9501G > A (p.Glu3167=), with a single submitter (BIC), no assertion criteria, no star; classification VUS following the ClinGen ENIGMA BRCA1/2 VCEP specifications V1.2 [32], meeting PM2_Supporting (1 point), and no bioinformatic code (SpliceAI score 0.16). PALB2 c.108 + 2T > C, LP with a single submitter and no RNA data; this variant is designated as no PVS1 code applicable following the ClinGen Hereditary Breast Ovarian Pancreatic VCEP specifications V1.2 [33]. Classification remained unchanged for another five potential outlier variants: two B/LB classification and ≥ 50% predicted LOF transcript from construct-based assays ( ATM c.1066-6T > G; BRCA2 c.441A > G p.Gln147=); two P/LP PALB2 variants with complete impact on splicing in construct-based assays, but with A and c.211 + 2T > C); and similarly the BRCA2 c.9257-1G > C variant with multiple P submissions (some supported by internal laboratory data), and complete impact on splicing resulting in an in-frame transcript reaching only PVS1_Supporting level. 
Additional review of ClinVar entries was then performed for all reference set variants located outside of the splice donor/acceptor ± 1,2 dinucleotide positions, to assess potential for circularity due to use of the minigene data as the only RNA source for an individual variant classification (Additional file 1: Table S5). An additional variant, BRCA2 c.7875A > G (p.Arg2625=), was excluded since the two submitters that did not use the minigene data had conflicting assertions of VUS or LB. The final reference sets for clinical calibration comprised 83 B/LB variants (all outside the splice donor/acceptor ± 1,2 dinucleotide positions), and 166 P/LP variants (28 outside the splice donor/acceptor ± 1,2 dinucleotide positions). Likelihood ratio (LR) estimates towards or against pathogenicity were derived by comparing the proportion of variants meeting different splice percentage categories within the benign versus pathogenic reference sets, using the statistical method detailed previously [30]. The LR estimates were then used to assign weights for or against pathogenicity following recommendations arising from Bayesian modelling of the ACMG/AMP guidelines [31].

RESULTS

Characterisation of MPSA study designs

To determine whether MPSA results can be used for clinical variant classification, we evaluated two types of splicing studies: 1) gene-specific assays that assessed the impact of variants in one or a few genes for which the clinical validity of the gene-disease relationship is established [18, 19] and 2) broad MPSA screens including hundreds of genes that were selected irrespective of gene-disease mechanism [16, 17, 20–22]. The study by Patel and colleagues used a construct with a 500 bp exon-intron-exon test sequence including partial exonic and intronic sequences surrounding the splice region (Fig. 1A), and all remaining studies evaluated constructs with a single test exon inserted between constant vector exons (Fig. 1B).
The gene-specific ParSE-seq assay [18] used minigene constructs with a similar design to traditional single test exon splicing assays such that a wide range of exon sizes of up to ~ 350 bp and 125–250 bp of flanking intronic sequences were included (Additional file 1: Table S2 ). The broad MPSA screens, including Massively Parallel Splicing Assay (MaPSy) [20–22], Variant exon sequencing (Vex-seq) [16], and Multiplexed Functional Assay of Splicing using Sort-seq (MFASS) [17], used minigene constructs with inserted sequences limited to ~ 170 bp. These broad screens included test exons of ≤ 120 bp in size and shorter flanking intronic sequences with a lower limit of 40–50 bp upstream and 15–30 bp downstream of the exon. To allow for the inclusion of variants in larger exons (120–500 bp), Rong and colleagues designed a ‘half exon’ construct that incorporated the 100 bp of context-specific sequence (including the nearest splice site) with a common exon sequence and the other splice site. Generally, the minigene constructs were transfected into cells, and the splicing outcomes were measured in a pooled manner with sequencing data as quantitative readout of the splicing impact of variants (Fig. 1 C). For the MFASS method, splicing impact quantification was done by fluorescence-activated cell sorting based on green fluorescent protein and mCherry reporters [17] (Fig. 1 C). Detailed characterisation of study designs is shown in Additional file 1: Table S2 . MPSA performance evaluation All seven published MPSA datasets lacked information on specific splice isoforms produced by each variant assayed. For six of the seven MPSA datasets, the splicing assay scores were reported as the difference in the level of splicing (e.g. delta PSI) between the reference and variant constructs [16–18] or as an allelic ratio [20–22], and assay-specific thresholds determined whether or not a variant was considered to alter splicing (Additional file 1: Table S2 ). 
It should be noted that, the Vex-Seq and MFASS assays were limited to detecting exon skipping/inclusion, and at least three of the assays would (potentially) misclassify some splicing events as normal (Fig. 1 D). Further, although assay scores were intended to provide a measure of splicing impact, outputs from all assays did not differentiate between the types of splicing aberration detected. The remaining dataset by Patel and colleagues used a Fisher Exact test of the proportion of reads supporting “no splicing”, “normal splicing” and “aberrant splicing ” produced from the variant construct compared to the WT construct with “splice-affecting” variants defined by a p < 0.001 [19]. More detailed review of the study of Patel et al (2021) [19] revealed several issues that may compromise the clinical utility of their published assay results. The exon-intron-exon construct design could detect splice site loss or gain induced by splice region variants, but would be less likely to recapitulate the splicing aberrant transcripts in vivo . For example, this design could not detect variant-induced exon skipping (Fig. 1 A). The absence of splice donor/acceptor ± 1,2 dinucleotide variants in the assay precluded their use for assay readout calibration, and a previous publication [34] did not show data to support their report of 100% concordance for 10 positive control variants known to alter splicing. To verify concordance of predicted spliceogenic variants with splice assay results, we recalculated the p values using the Fisher exact test with normalised counts as described by the authors, but could not replicate the findings using the authors’ selected threshold for splice-affecting variants; 12 variants would have had different classification with our analysis. To reliably use splicing data generated by MPSA for variant curation, the assay scores from spliceogenic variants should be distinct from the distributions of non-spliceogenic variants. 
Only two studies [17, 18] had a multimodal distribution (Hartigan’s dip test, $p < 2 \times 10^{-16}$) of assay scores that were separated by thresholds defined by the study (Fig. 2). Additionally, O’Neill et al. defined three categories to distinguish “abnormal”, “indeterminate” and “normal” impact on splicing for the ParSE-seq dataset (Fig. 2). As a second approach to assess the assay reliability, correlation of SpliceAI predictions with assay-defined spliceogenic and non-spliceogenic variants was performed (Fig. 2). Only data from ParSE-seq correlated well with SpliceAI predictions: spliceogenic variants reported high SpliceAI scores (mean = 0.79) and non-spliceogenic variants reported low SpliceAI scores (mean = 0.14). “Indeterminate” variants from ParSE-seq had a bimodal SpliceAI score distribution with variants at each end of the spectrum. All other MPSA studies had notable overlap of SpliceAI score distributions between reported spliceogenic and non-spliceogenic variants, suggesting misclassification. To evaluate the suitability of MPSA data for clinical application, we assessed whether each MPSA categorized variants as spliceogenic based on prior expectations, namely variant location in relation to splice donor and acceptor motifs, and SpliceAI score. An overview of findings is summarized in Additional file 2: Fig. 2. Spliceogenic variants with predicted impact were enriched within the splice site motifs as expected, particularly at the splice donor/acceptor ± 1,2 dinucleotide positions (Fig. 3, Additional file 2: Fig. S2). For those studies that assessed variants located within the splice donor/acceptor ± 1,2 dinucleotide positions [16–18], despite enrichment in spliceogenic variants, comparison of reported splice impact against SpliceAI predictions suggested differences in performance across the three methods (Fig. 3).
For ParSE-seq, 97% (30/31) of splice donor/acceptor ± 1,2 dinucleotide variants were spliceogenic, with only one variant (SCN5A c.3957_3963+1dup) observed not to impact splicing; the SpliceAI score for this variant also predicted no splicing impact, highlighting the accuracy of ParSE-seq for the splice donor/acceptor ± 1,2 dinucleotides. For Vex-seq, only 76% (16/21) of splice donor/acceptor ± 1,2 dinucleotide variants impacted splicing, and six variants had splicing assay results discordant with SpliceAI predictions (five predicted spliceogenic but with no observed splicing impact). For MFASS, 83% (180/217) of splice donor/acceptor ± 1,2 dinucleotide variants impacted splicing, with 35 (19%) of these ± 1,2 dinucleotide variants predicted as non-spliceogenic (SpliceAI score < 0.1). While specificity measures (Supplementary Table S6) could be interpreted to indicate that all MPSAs were more accurate at identifying variants that did not impact splicing than those that did, this reflects the fact that most variants assayed (87%) were located outside the splice region, where variation has a low prior probability of altering splicing.

Traditional minigene assay dataset

The dataset consisted of 600 single nucleotide variants (SNVs) and 73 indels from seven cancer susceptibility genes (Fig. 4 A): 359 were exonic, 307 were intronic, and seven were indels spanning the exonic and intronic regions. Of the 673 variants, 23% (135 SNVs and 20 indels) were located at or disrupted the splice donor/acceptor ± 1,2 dinucleotide positions. Seven of the 16 WT minigene constructs produced 100% canonical transcript, and the other nine exhibited background alternative splicing (Supplementary Table S7). To account for background splicing, we normalised the level of canonical transcript induced by all variants by calculating the CT reduction (Fig. 4 B). CT reduction distribution showed that the vast majority of variants resulted in either no or complete aberration (Fig. 4 C).
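As an illustration of the normalisation step, the following minimal Python sketch expresses CT reduction as the relative loss of canonical transcript compared with the WT construct, then bins it into the aberration levels used in this section. The formula and the function names (`ct_reduction`, `aberration_level`) are assumptions made for illustration; the calculation actually used is the one defined in the Methods and Fig. 4 B. The bin edges follow the categories reported in the Results, with the upper edge of the intermediate bin inferred from the ≥ 80% cut-off for high/complete aberration.

```python
def ct_reduction(ct_wt: float, ct_variant: float) -> float:
    """Percent reduction in canonical transcript (CT) relative to the WT construct.

    Assumed formula for illustration: 100 * (CT_WT - CT_variant) / CT_WT.
    Both inputs are percentages of total transcripts from the same minigene.
    """
    if ct_wt <= 0:
        raise ValueError("WT construct must express some canonical transcript")
    return 100.0 * (ct_wt - ct_variant) / ct_wt


def aberration_level(reduction: float) -> str:
    """Bin a CT reduction (%) into the aberration categories used in the text."""
    if reduction <= 0:        # unchanged or increased canonical transcript
        return "no"
    if reduction <= 20:       # > 0% to <= 20%
        return "low"
    if reduction < 80:        # > 20% to < 80% (upper edge inferred)
        return "intermediate"
    return "high/complete"    # >= 80% to 100%


# Hypothetical example: a WT construct with background alternative splicing
# (80% canonical transcript) and a variant construct with 40%.
reduction = ct_reduction(ct_wt=80.0, ct_variant=40.0)
print(reduction, aberration_level(reduction))
```

Normalising against the WT construct rather than against 100% means that constructs with background alternative splicing are not penalised for transcripts that are present without the variant.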
Specifically, 37% (252/673) of the variants resulted in no aberration or increased the canonical transcript level, while 63% (421/673) had splicing impact (82 low, 73 intermediate, and 266 high/complete) (Additional file 1: Table S3). All but one of the splice donor/acceptor ± 1,2 dinucleotide variants showed a high/complete aberration (≥ 80% to 100% CT reduction). The exception, PALB2 c.108+2T>C, showed low impact on splicing with evidence of alternative splicing; this variant converts the GT donor to GC, which retains the ability to be processed by the U2-type spliceosome [35].

Evaluation of SpliceAI performance for predicting level of aberration

Our analysis of the traditional minigene dataset showed that the SpliceAI score for a variant generally increases with the categorized level of variant-induced splicing aberration (Fig. 5 A, 5 B). There was a significant difference in SpliceAI scores (Kruskal-Wallis p < 0.0001) across the no (median = 0.025), low (median = 0.125), intermediate (median = 0.38), and high/complete (median = 0.96) aberration categories. Moreover, ROC analysis showed that SpliceAI can distinguish between different aberration level categories with good accuracy (Fig. 5 C, 5 D). The optimal thresholds to predict different ranges of aberration levels were: 0.185 for “no versus low-to-complete” (AUC = 0.924), 0.285 for “no/low versus intermediate-to-complete” (AUC = 0.955), and 0.45 for “no-to-intermediate versus high/complete” (AUC = 0.967). Importantly, the 0.285 threshold showed 90% sensitivity and 90% specificity for separating no/low versus intermediate-to-complete aberration, while the 0.45 threshold achieved 95% sensitivity and 89% specificity for separating high/complete from no-to-intermediate aberration (Fig. 5 D).
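The threshold evaluation described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis pipeline used in the study: the scores and labels are hypothetical toy data, and selecting the threshold by Youden's J (sensitivity + specificity - 1) is an assumption, since the criterion used to define the "optimal" thresholds is not stated in this section.

```python
from typing import Sequence, Tuple


def sens_spec(scores: Sequence[float], labels: Sequence[bool],
              threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= threshold predicts aberration."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if not y and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(scores: Sequence[float], labels: Sequence[bool],
                   candidates: Sequence[float]) -> float:
    """Candidate threshold maximising Youden's J (assumed selection criterion)."""
    return max(candidates, key=lambda t: sum(sens_spec(scores, labels, t)) - 1)


# Hypothetical toy data: SpliceAI max delta scores, with True meaning
# intermediate-to-complete aberration and False meaning no/low aberration.
scores = [0.01, 0.04, 0.12, 0.22, 0.35, 0.31, 0.42, 0.75, 0.96, 0.99]
labels = [False, False, False, False, False, True, True, True, True, True]

for t in (0.185, 0.285, 0.45):  # thresholds quoted in the text
    sensitivity, specificity = sens_spec(scores, labels, t)
    print(f"threshold {t}: sensitivity {sensitivity:.2f}, specificity {specificity:.2f}")
print("selected:", best_threshold(scores, labels, (0.185, 0.285, 0.45)))
```

With real data, the same two functions would be applied once per comparison ("no versus low-to-complete", and so on) by changing which aberration categories are labelled as positive.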
We then tabulated results to assess the proportion of variants falling into the different aberration level categories for different SpliceAI score bins (Additional file 1: Tables S8 and S9), some selected to capture thresholds recommended previously e.g. < 0.05, 0.1, 0.2, 0.5 and 0.8 [7, 23, 36]. For the 518 variants located outside the splice donor/acceptor ± 1,2 dinucleotide positions, there was convincing evidence for a correlation between SpliceAI score and level of aberration (τ = 0.66, p = 3.2e-72) (Table 1, Additional file 1: Table S8). Of the spliceogenic variants with SpliceAI score ≤ 0.05, 78% had low level impact and only 8% (3/36) had high/complete impact. For the latter, the results for two of three variants are indicative of impact via splicing regulatory elements, which remain poorly predicted by SpliceAI and other tools [26, 36]. In contrast, for spliceogenic variants with SpliceAI score ≥ 0.2 and < 0.5, 24% had low, 50% had intermediate, and 26% had high/complete aberration. At SpliceAI score ≥ 0.5, 72% of spliceogenic variants had high/complete aberration: 56% for those with score ≥ 0.5 and < 0.8, and 83% for those with score ≥ 0.8. Notably, no variants with SpliceAI score ≥ 0.8 were shown to be non-spliceogenic in this dataset. Unsurprisingly, the correlation was even more striking when analyses included variants located at the splice donor/acceptor ± 1,2 dinucleotide positions (Additional file 1: Table S9), the vast majority of which had SpliceAI score ≥ 0.8 (τ = 0.76, p = 1.6e-121).

Table 1 Distribution of variants outside the splice donor/acceptor ± 1,2 dinucleotide positions across different ranges of SpliceAI max delta scores.

| SpliceAI max delta score | No. of variants: non-spliceogenic set | No. of variants: spliceogenic set a | No. of variants: low | No. of variants: intermediate | No. of variants: high/complete | Proportion in the spliceogenic set: low | Proportion in the spliceogenic set: intermediate | Proportion in the spliceogenic set: high/complete |
|---|---|---|---|---|---|---|---|---|
| ≤ 0.05 | 167 | 36 | 28 | 5 | 3 b | 0.78 | 0.14 | 0.08 |
| > 0.05 & ≤ 0.1 | 31 | 14 | 7 | 7 | 0 | 0.50 | 0.50 | 0.00 |
| > 0.1 & < 0.2 | 30 | 32 | 22 | 9 | 1 | 0.69 | 0.28 | 0.03 |
| ≥ 0.2 & < 0.5 | 17 | 54 | 13 | 27 | 14 | 0.24 | 0.50 | 0.26 |
| ≥ 0.5 & [...] | | | | | | | | |

a [...] 0% to ≤ 20%), intermediate (> 20% to [...]
b [...] G and c.451G>A) were located outside the donor/acceptor motifs and led to (multi-)exon skipping, indicating effect on splicing regulatory elements; the remaining false-negative (PALB2 c.3113+3A>G) resulted in donor loss leading to cryptic donor activation and exon skipping.
Correlation between SpliceAI score and level of aberration: τ = 0.66, p = 3.2e-72.

For variants outside the splice donor/acceptor ± 1,2 dinucleotide positions, further stratification of the bins for scores ≥ 0.2 indicated that, based on similarity of proportions in the spliceogenic set for different bin strata, there was slightly improved correlation between SpliceAI score and level of aberration when scores were binned as: ≥ 0.2 & < 0.3; ≥ 0.3 & < 0.4; ≥ 0.4 & < 0.75; ≥ 0.75 (τ = 0.66, p = 1.4e-73; Additional file 1: Table S9).

Calibration of SpliceAI for refined ACMG/AMP computational code application using quantitative traditional minigene assay data

The ClinGen SVI Splicing Subgroup [7] previously performed a calibration to demonstrate the utility of SpliceAI to provide evidence towards or against spliceogenicity for variants located outside of the splice donor/acceptor ± 1,2 dinucleotide positions, providing the basis for selecting score thresholds for conservatively assigning computational codes BP4 and PP3. However, this previous analysis only considered whether a variant may result in aberrant splicing, since information about the level of the variant-induced event/s was either lacking or measured heterogeneously across the different studies forming the reference datasets.
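As a worked illustration of the likelihood ratio (LR) arithmetic behind calibrations of this kind, the Python sketch below computes an LR and a 95% confidence interval from bin counts, then maps the LR to an evidence strength. The counts are those reported in Table 2 for SpliceAI score ≥ 0.2 (184 of 266 variants in the positive set, 24 of 252 in the negative set). Two points are assumptions rather than statements of the study's method: the log-scale (Katz) interval, which happens to reproduce the interval reported for this bin to within rounding, and the cut-offs 2.08, 4.33, 18.7 and 350, taken here as the commonly cited values from the Bayesian framework of Tavtigian et al. [31]. The function names are hypothetical.

```python
import math
from typing import Tuple

# Assumed LR cut-offs (supporting, moderate, strong, very strong); reciprocals
# are used for evidence in the opposite direction.
CUTOFFS = [(350.0, "very strong"), (18.7, "strong"),
           (4.33, "moderate"), (2.08, "supporting")]


def likelihood_ratio(pos_in_bin: int, pos_total: int,
                     neg_in_bin: int, neg_total: int,
                     z: float = 1.96) -> Tuple[float, float, float]:
    """LR for a score bin with a log-scale (Katz) confidence interval.

    LR = (proportion of the positive set in the bin) /
         (proportion of the negative set in the bin).
    Requires non-zero counts in both sets; empty cells need a correction
    that is not shown here.
    """
    lr = (pos_in_bin / pos_total) / (neg_in_bin / neg_total)
    se = math.sqrt(1 / pos_in_bin - 1 / pos_total
                   + 1 / neg_in_bin - 1 / neg_total)
    return lr, lr * math.exp(-z * se), lr * math.exp(z * se)


def evidence_strength(lr: float) -> str:
    """Map an LR to an evidence strength label using the assumed cut-offs."""
    for cutoff, label in CUTOFFS:
        if lr >= cutoff:
            return f"{label} (towards)"
        if lr <= 1 / cutoff:
            return f"{label} (against)"
    return "indeterminate"


lr, low, high = likelihood_ratio(pos_in_bin=184, pos_total=266,
                                 neg_in_bin=24, neg_total=252)
print(f"LR = {lr:.2f} (95% CI {low:.2f} to {high:.2f}): {evidence_strength(lr)}")
```

The sketch labels evidence by the point estimate alone. A more conservative reading would also check whether the confidence interval stays within a single strength category before assigning a code.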
Building on the findings above (Table 1; Additional file 1: Tables S8 and S9), we performed LR analysis to estimate the evidence strength applicable for SpliceAI score thresholds to distinguish different ranges of splicing aberration levels. Results are summarized in Table 2. Using “no aberration” as the negative truthset, and thresholds previously shown to yield moderate evidence towards or against spliceogenicity [7], the LR for a SpliceAI score threshold of ≥ 0.2 for predicted aberrant splicing equated to moderate evidence for spliceogenicity (low to complete aberration) and a score > 0.1 and < 0.2 equated to indeterminate evidence, whereas a score ≤ 0.1 equated to only supporting evidence for no aberration. Given that these semi-quantitative data would be expected to place previously undetected “low” level events in the positive reference set, we estimated LRs using alternative score thresholds and impact-level groups. Altering the lower threshold to ≤ 0.05 gave an LR equating to moderate evidence for no versus low-to-complete aberration, with scores > 0.05 and < 0.2 providing no evidence. Then, since there are data to indicate that a low level of aberrant splicing (≤ 20% expression) is very unlikely to confer pathogenicity, in the context of hereditary cancer genes at least [37, 38], while intermediate level aberrant splicing is adequate to confer pathogenicity for at least some genes [39, 40], we compared no/low aberration as the negative reference set versus intermediate-to-complete aberration as the positive control set. Our analysis showed that a SpliceAI score threshold of ≥ 0.3 provides moderate evidence for intermediate to complete aberration, and a score ≤ 0.2 provides moderate evidence for no/low aberration. These thresholds showed 82% sensitivity and 86% specificity for predicting intermediate to complete aberration due to variants located outside the splice donor/acceptor ± 1,2 dinucleotide positions. For genes where lower levels of aberrant splicing (e.g.
10–20% expression) are proven to be associated with disease predisposition, other thresholds may be considered relevant for LR estimation and application of bioinformatic evidence of disease-associated splicing aberrations.

Table 2 Likelihood ratio analysis of the SpliceAI max delta score for variants outside the splice donor/acceptor ± 1,2 dinucleotide positions.

| SpliceAI max delta score | Negative set a: n | Negative set a: proportion | Positive set a: n | Positive set a: proportion | LR | Low CI | High CI | Evidence strength b |
|---|---|---|---|---|---|---|---|---|
| no aberration vs. low-to-complete aberration, using score thresholds from previous calibration | | | | | | | | |
| ≤ 0.1 | 198 | 0.79 | 50 | 0.19 | 0.24 | 0.18 | 0.31 | Supporting (no aberration) |
| > 0.1 & < 0.2 | 30 | 0.12 | 32 | 0.12 | 1.01 | 0.63 | 1.61 | Indeterminate |
| ≥ 0.2 | 24 | 0.10 | 184 | 0.69 | 7.26 | 4.92 | 10.72 | Moderate (low-to-complete aberration) |
| TOTAL | 252 | | 266 | | | | | |
| no aberration vs. low-to-complete aberration, using score thresholds set for this semiquantitative dataset | | | | | | | | |
| ≤ 0.05 | 167 | 0.66 | 36 | 0.14 | 0.20 | 0.15 | 0.28 | Moderate (no aberration) |
| > 0.05 & < 0.2 | 61 | 0.24 | 46 | 0.17 | 0.71 | 0.51 | 1.01 | Indeterminate |
| ≥ 0.2 | 24 | 0.10 | 184 | 0.69 | 7.26 | 4.92 | 10.72 | Moderate (low-to-complete aberration) |
| TOTAL | 252 | | 266 | | | | | |
| no/low aberration vs. intermediate-to-complete aberration, using score thresholds set for this semiquantitative dataset | | | | | | | | |
| ≤ 0.2 | 287 | 0.86 | 26 | 0.14 | 0.16 | 0.11 | 0.23 | Moderate (no/low aberration) |
| > 0.2 & < 0.3 | 13 | 0.04 | 8 | 0.04 | 1.11 | 0.47 | 2.62 | Indeterminate |
| ≥ 0.3 | 33 | 0.10 | 151 | 0.82 | 8.24 | 5.92 | 11.47 | Moderate (intermediate-to-complete aberration) |
| TOTAL | 333 | | 185 | | | | | |

Abbreviation: LR, likelihood ratio; CI, confidence interval; n, number.
a The negative and positive sets were adjusted according to the level of aberration being tested. Data drawn from traditional minigene assay dataset only.
b Criteria thresholds as defined in Tavtigian et al., 2018 [31].

Evaluation of SAI-10k-calc and SpliceVault performance for predicting aberration type

Of the 673 variants, 600 were SNVs enabling computational splicing assessment with SAI-10k-calc (Fig.
5 E), which automatically predicts the transcript type based on the combination of SpliceAI scores. For this assessment, we designated the variants with intermediate-to-complete aberration as the positive control set (n = 299), and no/low aberration as the negative control set (n = 301). This grouping was selected based on results shown in Table 2 , where the proportion of true positives (82%) and true negatives (86%) using SpliceAI score thresholds ≤ 0.2 and ≥ 0.3 were markedly higher than true positives (69%) and true negatives (79%) observed using previously recommended thresholds (≤ 0.1 and ≥ 0.2) from calibration using non-quantitative RNA findings. Detailed descriptions of concordance and discordance between predicted and observed transcripts are provided in Additional file 1: Tables S3 and S10. SAI-10k-calc aberrant transcript prediction for SNVs was 83% (496/600) concordant with minigene assay results (Fig. 5 F). Another 8% (50/600) of the SNVs were flagged by SAI-10k-calc and underwent manual inspection using SpliceAI-visual [27] to obtain the final transcript prediction (see Methods). SAI-10k-calc transcript prediction with additional SpliceAI-visual manual inspection achieved a slightly greater overall concordance of 84% (506/600) (Fig. 5 F); concordance was 79% (235/299) for positive prediction of aberration (as defined above, at least intermediate (> 20%) expression level), and 90% (271/301) for negative prediction of aberration (no/low aberration). For 18/27 variants assigned as discordant or partial concordance, where the predicted aberrant transcript was absent or expressed at low level and other dominant transcript/s were detected in the assay (coded as 3 + and 4 + in Additional file 1: Table S10), the predicted and observed transcripts had the same functional consequence (PTC or in-frame); considering these additional 18 variants with concordance in functional consequence, SAI-10k-calc overall concordance increased to 87% (524/600) (Fig. 5 F). 
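The concordance figures quoted above follow directly from the reported counts, as the short Python check below shows. It uses only numbers stated in the text; the variable names and the `percent` helper are hypothetical.

```python
def percent(numerator: int, denominator: int) -> int:
    """Return a proportion as a whole-number percentage."""
    return round(100 * numerator / denominator)


TOTAL_SNVS = 600
POSITIVE_SET = 299   # intermediate-to-complete aberration
NEGATIVE_SET = 301   # no/low aberration

calc_only = 496              # concordant with SAI-10k-calc alone
concordant_positive = 235    # after SpliceAI-visual inspection, positive set
concordant_negative = 271    # after SpliceAI-visual inspection, negative set
same_consequence = 18        # discordant transcript, same functional consequence

with_visual = concordant_positive + concordant_negative           # 506
with_consequence = with_visual + same_consequence                 # 524

print(percent(calc_only, TOTAL_SNVS))               # SAI-10k-calc alone
print(percent(with_visual, TOTAL_SNVS))             # plus SpliceAI-visual
print(percent(concordant_positive, POSITIVE_SET))   # positive prediction
print(percent(concordant_negative, NEGATIVE_SET))   # negative prediction
print(percent(with_consequence, TOTAL_SNVS))        # functional consequence
```

The positive and negative concordant counts sum to the overall 506/600, which confirms that the per-set percentages and the overall figure describe the same classification.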
We also assessed the concordance of SpliceVault [28] Top-4 alternative splicing events with minigene assay results (Additional file 1: Table S3). The SpliceVault VEP plugin returned Top-4 events for 39% (261/673) of variants, including 242 SNVs and 19 indels (Fig. 5 E); of these, 254 variants were located in the splice donor and acceptor motifs. Since SpliceVault predicts the alternative splicing events, including effect on reading frame, chiefly when the native splice site motif is abrogated, we calculated the concordance of SpliceVault Top-4 using 291 positive controls (257 SNVs, 34 indels) located in the splice donor and acceptor motifs only. SpliceVault Top-4 events matched with at least one transcript with > 20% expression for 75% (192/257) of the positive control SNVs (Fig. 5 G) and 50% (17/34) of the indels. All variants returning SpliceVault predictions had SpliceAI score ≥ 0.2. For the same set of 257 positive control SNVs, we observed greater concordance for SAI-10k-calc alone (79%) and SAI-10k-calc with SpliceAI-visual (82%) (Fig. 5 G).

Comparison of minigene and patient RNA results

Our literature search identified 114 variants in the traditional minigene assay dataset with published patient RNA splicing assay results (Fig. 6 A), of which 91 variants demonstrated impact on splicing in patient assays (Additional file 1: Table S11) and 23 variants did not (Additional file 1: Table S12). Patient assays were conducted without NMD inhibition for 55%, only with NMD inhibition for 28%, and with or without NMD inhibition across different studies for 17%. For the 34 variants assayed in multiple patient RNA studies, 47% showed complete concordance, 32% high concordance, 12% low concordance (consistent outcomes in only a subset of assays), and 9% no concordance across all assays (Fig. 6 B).
Variability across patient RNA results for the same variant could be ascribed to methodological issues, in particular failure to detect transcripts due to lack of NMD inhibition or detection limits (Additional file 1: Table S11). Patient genomic context such as common polymorphisms may also cause upregulation of naturally occurring alternative transcripts, which could contribute to patient RNA results variability. For variants with impact on splicing from the patient RNA assays, comparison of minigene and patient-derived RNA results revealed a high level of agreement across all seven genes (Fig. 6 C, Additional file 1: Table S11). Overall, 89% of variants had concordant splicing outcomes (45% complete concordance, and 44% high concordance where the predominant transcripts agreed across the two systems). The remaining 11% were discordant, comprising 8% with low concordance (where agreement was limited to minor transcripts only) and 3% with no concordance. Among the 40 highly concordant cases, two involved a single aberrant minigene transcript matching with the most abundant patient-derived transcript, and 38 had a single patient-derived transcript matching the most abundant transcript detected in the minigene assay. Of the latter, the minigene assay revealed additional transcripts in comparison to patient RNA assays, either lowly expressed (≤ 20% of the total RNA pool, n = 30) or moderately expressed transcripts (23–48%, n = 8) that were likewise absent in patient RNA assays. Possible explanations for the observed discrepancies between the minigene and patient findings are detailed in Additional file 1: Table S11. In general, the splicing outcome pattern likely reflects the greater sensitivity of the minigene semi-quantification method in detecting low-abundance splice products as these are probably missed in patient RNA assays using agarose gel electrophoresis and Sanger sequencing. 
In addition to the detection limit, agarose gel lacks the resolution to distinguish transcripts that differ by only a few nucleotides from the FL transcript and therefore may not be flagged for further characterisation. Moreover, most of the patient RNA assays were conducted without NMD inhibition, which likely contributed to the absence of certain PTC-containing transcripts that were detected in the minigene assay under NMD-inhibited conditions. Other possible explanations include differences in the location of primers, alternative splicing specific to leukocytes or lymphoblastoid cell lines from patients, minigene construct design limitation (e.g. removal of gene context important to splicing), and use of different techniques and reagents. For BRCA2 , where comprehensive splicing assay data were available through our in-house database, assay outcomes indicating no aberration also demonstrated high agreement with patient RNA data. Of the 23 BRCA2 variants with no aberration in patient RNA assays (Additional file 1: Table S12), 22 (96%) showed no/low aberration in the minigene assay. One variant ( BRCA2 c.441A > G), showed complete splicing impact (100% expression of PTC-containing transcripts) in the minigene assay but no detectable impact in the assay of patient RNA from blood without NMD inhibition [9]. This variant has a low SpliceAI score of 0.04; however, it was previously identified to lie within an exonic splicing enhancer (ESE) motif based on microdeletion analysis [41]. In addition to possible degradation of PTC-containing transcripts in the patient sample, differences in splicing factor activity between the minigene system and blood may explain the discordance. These findings indicate that the minigene assay reliably recapitulates patient-derived splicing patterns in most cases, with discrepancies occurring primarily in low-abundance splicing events. 
Clinical calibration to derive evidence strength applicable for construct-based RNA results

Clinical calibration of the collated minigene dataset showed that variants leading to high (≥ 80%) or low (≤ 20%) expression of transcripts with predicted LOF consequence provide strong evidence towards or against pathogenicity, respectively (Table 3). Evidence strength was unchanged after exclusion of higher-scoring variants at splice donor/acceptor ± 1,2 dinucleotide positions (Table 3). Given the far lower confidence in the evidence strength estimate for the remaining two categories (> 20 and < 50; ≥ 50 and < 80), we suggest applying no evidence for these categories based on the current findings. Sensitivity analysis limiting the B/LB reference set to variants with at least one B classification, and the P/LP reference set to variants with at least one P classification, did not alter findings (data not shown).

Table 3 Clinical calibration of traditional minigene assay results considering levels of predicted LOF transcripts

| Percentage of predicted LOF transcripts | B/LB reference set: n | B/LB reference set: proportion | P/LP reference set: n | P/LP reference set: proportion | LR | Low CI | High CI | Evidence strength based on LR | Suggested evidence strength considering CI | Points |
|---|---|---|---|---|---|---|---|---|---|---|
| Calibration including variants at splice donor/acceptor ± 1,2 dinucleotide positions | | | | | | | | | | |
| ≤ 20 | 76 | 0.92 | 1 | 0.01 | 0.01 | 0.00 | 0.05 | Benign_strong | Benign_strong | -4 |
| > 20 & < 50 | 5 | 0.06 | 2 | 0.01 | 0.20 | 0.04 | 1.01 | Benign_moderate | No evidence | 0 |
| ≥ 50 & < 80 | 1 | 0.01 | 7 | 0.04 | 3.50 | 0.44 | 27.98 | Pathogenic_supporting | No evidence | 0 |
| ≥ 80 | 1 | 0.01 | 156 | 0.94 | 78.00 | 11.11 | 547.44 | Pathogenic_strong | Pathogenic_strong | +4 |
| TOTAL (all reference set variants, n = 249) | 83 | | 166 | | | | | | | |
| Calibration excluding variants at splice donor/acceptor ± 1,2 dinucleotide positions | | | | | | | | | | |
| ≤ 20 | 76 | 0.92 | 0 | 0.00 | 0.02 | 0.00 | 0.30 | Benign_strong | Benign_strong | -4 |
| > 20 & < 50 | 5 | 0.06 | 0 | 0.00 | 0.26 | 0.02 | 4.62 | Benign_supporting | No evidence | 0 |
| ≥ 50 & < 80 | 1 | 0.01 | 1 | 0.04 | 2.96 | 0.19 | 45.84 | Pathogenic_supporting | No evidence | 0 |
| ≥ 80 | 1 | 0.01 | 27 | 0.96 | 80.04 | 11.39 | 562.24 | Pathogenic_strong | Pathogenic_strong | +4 |
| TOTAL (all reference set variants, n = 111) | 83 | | 28 | | | | | | | |

Abbreviation: B, Benign; LB, Likely Benign; P, Pathogenic; LP, Likely Pathogenic; LR, likelihood ratio; CI, confidence interval; n, number.

DISCUSSION

We conducted a comprehensive evaluation of splicing assay datasets from MPSAs (7 datasets, 41,178 variants in total) and traditional minigene RT-PCR assays (14 selected datasets, 673 variants in total), and summarise below how the findings may inform recommendations for the application of construct-based splicing assay and bioinformatic prediction data for clinical variant classification.

Caution related to use of MPSA data for clinical variant classification

Our analysis of MPSA datasets ranging from gene-specific multiplexed assays (ParSE-seq and TTN assay) to broad MPSA screens (MaPSy, Vex-seq, and MFASS) has highlighted the need for rigorous evaluation of design and performance of individual datasets to justify use (or not) of MPSA-derived RNA results in clinical variant classification. The MPSAs all lacked information on specific alternatively spliced transcripts produced by each variant assayed; this absence of detailed characterisation of aberrant transcripts in MPSAs prevents the determination of effect on the amino acid sequence, a crucial factor for predicting the ultimate functional consequence of a spliceogenic variant, and thus its likelihood to cause disease. All MPSA datasets were also limited by assay design, restricting detection to specific splicing events and thereby capturing only a subset of RNA alterations (Fig. 1 D; Additional file 1: Table S3). Even considering the assay design limitations, the assay scores generated by five of seven MPSA datasets (MaPSy (3 datasets), Vex-seq, and TTN assay) were not able to clearly differentiate aberrant splicing events from normal splicing events, based on both overall distribution and correlation of reported splicing impact with SpliceAI score (Fig. 2).
Notably, our scrutiny of assay data for splice donor/acceptor ± 1,2 dinucleotide variants with high probability to impact splicing revealed that a considerable number had no observed splicing impact for the Vex-seq (24%) and MFASS (17%) datasets. In summary, all these assays exhibited significant experimental noise that contributed to the uncertainty of assay results. For Vex-seq and MaPSy, this was previously highlighted by The Critical Assessment of Genome Interpretation Consortium [42]. For MaPSy, this noise may be in part due to the assay normalisation. MaPSy used the variant sequence to de-multiplex and could only measure reads that mapped to normal exon inclusion. These counts were normalised to the DNA input (the amount of minigene added into the experiment), which makes the large assumption that the transfection efficiency and transcription rate of each minigene in each cell are equal. In contrast, the ParSE-seq dataset showed all the characteristics of a well-performing assay, with a bimodal distribution, expected correlation with SpliceAI score, and complete concordance between observed aberration and prediction of splicing impact for variants at the splice donor/acceptor ± 1,2 dinucleotide positions. This likely reflects a key experimental advantage of this SCN5A gene-specific assay that was absent in broad MPSA screens: the ParSE-seq minigene constructs included longer flanking intronic sequences and thus are expected to have captured more intronic splicing signals than the minigene constructs used in broad MPSA screens. In addition, the study design itself allowed for assessment of assay performance: the ParSE-seq approach utilized two cell lines, one of which was physiologically relevant; validation was carried out for selected variants using orthogonal methods; and the clinical validity of the assay was evaluated by including ClinVar P/LP and B/LB variants as validation controls.
The ClinVar controls formed the basis for clinical calibration of the assay according to the framework developed by Brnich et al. (2019) [43], reaching a strong level of evidence towards or against pathogenicity. However, we note that the subsequent application of RNA-based evidence for clinical variant classification as part of the study by O’Neill et al. (2024) [18] did not explicitly consider aberration type and effect on reading frame, and assumed that all variants that induced abnormal splicing had deleterious biological consequences. Characterisation of the aberrant transcript(s) produced by each variant assayed was not included in the published dataset to confirm this assumption. Given the poor performance of specific MPSA datasets, any inferences from these data (including benchmarking) should be viewed with caution. For example, AlphaGenome used the MFASS dataset to benchmark its algorithm against SpliceAI and other splicing predictors [44], and demonstrated poor performance for all predictors (AUC < 0.54), in line with our findings.

Value of selected traditional minigene data for informing clinical variant classification

In contrast to the findings from our evaluation of MPSA datasets, we illustrate that strategically selected traditional minigene studies generating semi-quantitative RT-PCR data provide information that can benefit clinical variant classification - either directly as RNA evidence of variant impact or by informing the application of splicing aberration prediction data in classification algorithms. First, we demonstrate that the selected traditional minigene assays perform very well in measuring variant impact on splicing, from comparison of the experimentally observed variant-induced splicing aberration versus those expected based on SpliceAI prediction - using a variety of score thresholds previously recommended to separate variants according to likelihood to impact splicing.
This is unsurprising, since the studies were specifically selected because they used multi-exon construct design (ranging from three to nine exons) that delivered good experimental reproducibility due to highly standardised protocols using the same reagents; all assays were performed in a minimum of three replicates, producing results with low standard deviation. Second, constructs composed of three or more exons generally show similar or identical results to those derived from patient RNA (where these are available) when the baseline splicing pattern of the minigene construct has been confirmed to match that observed in patient-derived RNA. As a specific example, variant BRCA2 c.8488-1G > A showed different splicing outcomes in 2-exon [45] and 9-exon [46] minigenes, with the latter emulating results from patient material. Here, we demonstrate that results from the selected construct-based assays were highly concordant with those generated from patient RNA; the single outlier variant (complete impact for construct, no impact for patient RNA) was located within an ESE, with splice impact potentially arising due to the truncated genomic context of the construct and absence of sequence essential for regulating splicing activity in vivo. Moreover, there was considerable variation between different patient RNA results for a given variant, likely due to differences in experimental methods. Assuming that patient RNA results remain the gold-standard as the source of information for variant impact on splicing, these findings indicate that minigene assay results can provide a reliable source of information where “validation” of construct-based assay findings is achieved through faithful replication of results from well-designed patient RNA assays for a subset of minigene results. Overall, these results also justify the use of construct-based data as a means to quantify allele-specific expression of aberrant transcripts to supplement patient-derived RNA results. 
Third, we have demonstrated through clinical calibration that multi-exon construct-based RNA results, after considering the aberration type and level, can provide strong evidence towards or against pathogenicity. Our calibration findings provide guidance, for hereditary cancer genes at least, regarding the association between level of predicted LOF transcripts and pathogenicity; the LR estimates indicate that more extreme expression level categories (≤ 20% and ≥ 80%) are required to confidently assign evidence against or towards pathogenicity. Although there appeared to be an ordered trend in evidence strength for bins representing increasing levels of predicted LOF transcript/s induced by a variant, statistical power was constrained by the limited number of variants for the inner bins in particular. We believe that larger studies - preferably gene-specific - are needed to provide more clarity about strength of evidence associated with “intermediate” levels of aberrant splicing. Importantly, we have also shown the extensive value of minigene assay results for informing bioinformatic prediction of both level and type of aberrant splicing induced by a variant. Our results provide a baseline for selecting SpliceAI score bin thresholds that provide not only a probability that a variant will be spliceogenic, but how likely the level of splicing will be high-to-complete. This allows more nuanced estimation of the LRs for bioinformatic bins, with potential to inform gene-specific score thresholds selected based on gene-disease knowledge, to separate disease-associated levels of aberrant splicing from those that are tolerated. It is also possible to better utilise SpliceAI predictions by using SAI-10k-calc to predict the variant-induced aberration(s), which in our analysis demonstrated 79% concordance for positive prediction of aberration type. 
This approach may advance the use of such predictions beyond spliceogenicity to determine pathogenicity (or benignity) through the application of gene-specific knowledge about transcript structure and clinically important protein domains. Moreover, our results suggest that concordance in functional consequence assigned to predicted transcripts versus observed transcripts may reach as high as 87%, with no material difference in the expected clinical relevance for ~⅔ of variants assigned as partial or complete discordance from our analysis. Another important outcome of the results from our minigene analysis is their implications for practical application of the PS1 code in the context of predicted splicing impact. Namely, it is possible to infer pathogenicity for a predicted spliceogenic variant based on pathogenic classification for another variant with the same predicted impact on splicing [7]. As currently stated, the prerequisite for applying the PS1 codes is as follows: the predicted event of the variant under assessment must “precisely match the predicted event of the comparison (likely) pathogenic variant (e.g., both predicted to lead to exon skipping, or both to lead to enhanced use of a cryptic splice motif), AND the “strength of the prediction for the variant under assessment must be of similar or higher strength than the strength of the prediction for the comparison (likely) pathogenic variant” [7]. The demonstrated good performance of SAI-10k-calc to predict aberration events provides confidence that it is possible to assess if two different variants have matching predicted events. Further, the observation that SpliceAI score range categories can be used to infer level of variant-induced aberration provides justification that variants falling into the same SpliceAI score bin might be considered to have similar strength of prediction for both variant type and level of aberration. 
We acknowledge that the actual measurement of aberration level might differ between different assay methods, but this is irrelevant for application of PS1 which is based on comparing predicted splicing impact, and relies on the fact that the comparison pathogenic variant reaches this classification due to consideration of clinical data (as is the situation for applying PS1 in the context of missense variants). Likewise, between-gene differences in the level of aberrant splicing required to confer pathogenicity are accommodated by the fact that PS1 application is restricted to variants in the same splice region of the same gene.

Using construct-based splicing assay results in variant curation

The use of RNA splicing data from traditional minigene RT-PCR assays as evidence for classification is already being implemented by some ClinGen VCEPs e.g. Criteria Specifications for BRCA1/2 v1.2, CTLA4 v1.0, LDLR v1.2, PTEN v3.1, RS1 v1.0. Moreover, the InSiGHT Hereditary Colorectal Cancer/Polyposis and ENIGMA BRCA1 and BRCA2 VCEPs specifically recognise the value of minigene assays to provide quantitative information for variants located outside of the splice donor/acceptor ± 1,2 dinucleotide positions. RT-PCR assays using multi-exon minigene constructs can measure aberrant and naturally occurring alternative splicing spanning multiple exons, and can also characterise complex splicing aberrations and multiple transcripts arising from one variant allele. When well-designed, these assays can precisely define the type and level of splicing aberration, including whether the effect is complete or partial, its impact on the reading frame, the affected functional domains, and the presence of rescue transcripts from alternative splicing. That is, traditional minigene splicing assay studies offer a considerable level of detail in the characterisation of splicing products.
They also provide an avenue to provide gene-specific recommendations on the minimum level of aberrant splicing that is associated with disease presentation. Altogether, a well-validated assay can generate all information necessary to assign an ACMG/AMP evidence code according to a decision flowchart proposed by the ClinGen SVI Splicing Subgroup, which allows weights up to Very Strong to be assigned based on RNA evidence for variants both inside and outside the splice donor/acceptor ± 1,2 dinucleotides [7], based on alignment with a gene-specific or generic PVS1 decision tree [47]. Importantly, this splicing-focussed flowchart weights the evidence that a spliceogenic variant may be pathogenic, downweighting from the highest PVS1 (RNA) code if assay results indicate less severe functional consequences such as incomplete splicing impact, presence of rescue transcripts, or in-frame amino acid deletion outside clinically relevant domains. For RNA results showing multiple transcripts, the recommended approach is to assign a PVS1 strength to each transcript, group transcripts with the same strength, and then apply a conservative overall PVS1 level that reflects their relative contribution to total expression (Walker et al., 2023). With respect to application of construct-based evidence in variant curation and classification, current recommendations state that results for synonymous and intronic variants outside the splice region showing no impact relative to controls can be assigned a BP7_Strong (RNA) code [7]. However, there has been no ClinGen general recommendation on the maximum evidence weight towards pathogenicity applicable to construct-based data in the absence of calibration of an experimental system against clinical data for proven spliceogenic and non-spliceogenic variants. The decision tree for RNA analyses proposed by Buisine and colleagues (2025) [15] allowed a maximum weight of Very Strong for minigene RT-PCR data. 
The ClinGen ENIGMA BRCA1 and BRCA2 VCEP (Parsons et al., 2024) recommended that the overall evidence strength applicable to minigene RT-PCR data be downweighted and may not exceed PVS1_Strong (RNA) due to the artificial nature of minigene systems. Previous calibration analysis by O’Neill et al. [18] indicated that a strong level of evidence was applicable for impact on splicing for SCN1A variants, as measured by their MPSA dataset. The results from the clinical calibration conducted in this study, which additionally assessed the level of predicted LOF transcripts induced by the variant, provide further justification to support application of a strong level of evidence towards or against pathogenicity (PVS1_Strong (RNA), BP7_Strong (RNA) following current code recommendations) for RNA results from well-designed multi-exon minigene experiments. We anticipate that the revisions to the ACMG/AMP guidelines, introduced in v4.0, will limit to 6 points the maximum weight applicable for a variant that leads to transcript/s for which NMD is predicted, but the overall consideration of how to assess whether RNA results from constructs may be equivalent to those from patient material remains the same. Following current recommended processes, clinical calibration is not required to assign a PVS1 (RNA) code if the result is complete impact or no impact from assays using RNA from patient material, although there is high-level advice to consider adaptive weighting based on factors that may influence splicing assay results [7]. The value of clinical calibration is to set the aberration level threshold that is considered pathogenic (or benign) for variants that have partial impact on splicing, and have been classified without use of RNA data. Construct-based studies can play a significant role here in providing allele-specific quantitative information to set the aberration level threshold to assign evidence towards or against pathogenicity for different gene-disease entities.
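As an illustration of how an aberration-level bin might be converted into an evidence strength, the following minimal sketch computes a likelihood ratio (LR) from counts of pathogenic and benign reference variants and maps it to an ACMG/AMP strength using the odds-of-pathogenicity thresholds of the Bayesian framework (Tavtigian et al., 2018 [31]). The function names and the 0.5 continuity correction are assumptions of this sketch and do not reproduce the calibration pipeline used in this study.

```python
# Odds-of-pathogenicity thresholds from the Bayesian adaptation of the
# ACMG/AMP framework (Tavtigian et al., 2018), strongest first.
STRENGTH_THRESHOLDS = (
    ("very strong", 350.0),
    ("strong", 18.7),
    ("moderate", 4.33),
    ("supporting", 2.08),
)


def likelihood_ratio(path_in_bin: int, path_total: int,
                     benign_in_bin: int, benign_total: int) -> float:
    """LR towards pathogenicity for one bin of LOF-transcript level.

    Assumption of this sketch: 0.5 is added to each cell of the 2x2
    table so that empty cells do not give zero or infinite LRs.
    """
    prop_pathogenic = (path_in_bin + 0.5) / (path_total + 1.0)
    prop_benign = (benign_in_bin + 0.5) / (benign_total + 1.0)
    return prop_pathogenic / prop_benign


def evidence_strength(lr: float) -> str:
    """Map an LR to an evidence strength towards or against pathogenicity."""
    for label, threshold in STRENGTH_THRESHOLDS:
        if lr >= threshold:
            return f"{label} towards pathogenicity"
        if lr <= 1.0 / threshold:
            return f"{label} against pathogenicity"
    return "no evidence"
```

For example, with purely hypothetical counts, `evidence_strength(likelihood_ratio(30, 40, 1, 40))` evaluates a bin containing 30 of 40 pathogenic and 1 of 40 benign reference variants. In practice, confidence intervals around each LR would also be needed, since the inner bins are expected to contain few variants.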
For MPSAs that do not return exact details on aberration type, assay-level calibration approaches are very important to derive an overall evidence weight for an individual study. As a first step, the LR can be calculated to assign an evidence strength towards or against spliceogenicity for each splicing impact category. However for a true clinical calibration, several factors have to be considered: spliceogenic is not necessarily pathogenic (so some spliceogenic variants may justifiably be annotated as VUS or B/LB in ClinVar); non-spliceogenic is not necessarily benign (e.g. missense variants may justifiably be annotated as VUS or P/LP in ClinVar); ClinVar P/LP controls may have included splicing data in their classification introducing circularity (especially important for retrospective calibration); ClinVar LB controls may not be ideal negative clinical controls since variants can reach this classification with no clinical data (e.g. variant type and/or position (BP7) and bioinformatic prediction (BP4)). Having performed a study-wide calibration, assay-level and variant-level prediction information should preferably be used to inform further downweighting or upweighting of individual variant results from construct-based assays. For example, if there is a possibility of a multi-exon alternative event (based on literature, SAI-10k-calc or SpliceVault), then a single-exon construct design would not be able to capture this type of aberration, and a “no-impact” result should be discounted. Nevertheless, the same assay could provide valuable information to confirm predicted single exon skipping events, or predicted absence of an RNA impact. To allow increased clinical application of MPSA data consisting solely of assay scores such as delta PSI, we recommend the use of variant location information and bioinformatic predictions to assess likelihood of complete splicing and also the aberration type and relationship to pathogenesis. 
For example, given gene-specific knowledge, an in-frame deletion within a functionally critical domain may be reasonably assigned evidence towards pathogenicity based on RNA findings, whereas an in-frame deletion outside a clinically important functional domain should not be considered evidence towards pathogenicity.

Guidance for design and critique of construct-based assays for application in variant classification

Good experimental design, data quality control, and clinical validation are essential for the application of minigene and multiplex splicing assay results in clinical variant classification. General recommendations exist for the use of multiplexed functional data for clinical variant classification, but these are not specific for RNA findings, and cover a broad range of assays primarily measuring variant effects on protein function, cell growth, or cell viability [48]. In contrast, a recent publication from a multidisciplinary French network includes detailed recommendations for the experimental design of minigene splicing assays for diagnostic implementation [15]. Based on these previous publications, our critique and consideration of the performance of the MPSAs reviewed and our selected minigene dataset, and our observations about reporting requirements to facilitate clinical application of RNA findings, we have compiled guidance for the application of minigene or multiplexed splicing assay results in clinical variant classification. This guidance highlights modifications necessary to account for construct design limitations, different quantification methods, and the large-scale nature of MPSAs. We provide a broad overview of the most important considerations below, and more detailed information in Additional file 1: Table S13.

Assay design

A clear understanding of the assay design, including knowledge of the types of splicing aberration that can (and cannot) be detected, is essential for accurately interpreting assay results.
Sufficient intronic sequence flanking the exons is needed to ensure correct RNA splicing. Existing recommendations for traditional minigenes are that intronic sequence at both ends of the insert must be at least 150–200 bp [14, 15, 49], and this rationale holds true for MPSAs; O’Neill and colleagues (2024) [18] have demonstrated that it is possible to generate reliable MPSA results using constructs containing a single test exon with 125–250 bp of flanking intronic sequences in a multiplexed assay. When designing new assays, we strongly recommend identification and formal quantification of transcripts. Since minigene assays do not fully preserve the physiological and genomic contexts, it is critical to assess the quality of the assay before it can be used for clinical variant classification.

Experimental controls

Use of appropriate experimental controls is fundamental for establishing assay reliability, and comparison of minigene results with pre-existing splicing assay data is important to establish confidence in assay results. Both traditional RT-PCR and multiplexed minigene assays should include:
1) WT construct control to determine baseline splicing; in the context of residual alternative splicing events, normalising the level of spliced products from variant constructs to the WT value will be necessary for accurate interpretation of results;
2) No-template control to assess whether sample contamination has occurred;
3) Positive control variant(s) known to be spliceogenic, to assess the ability to detect splicing aberration relative to WT;
4) Negative control variant(s) known to have no impact on splicing relative to WT (non-spliceogenic) to assess likelihood of false positives in a minigene context.
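The WT normalisation mentioned for control 1 can be illustrated with a short sketch. It shows one possible approach, assumed here for illustration only: the full-length transcript percentage of the variant construct is expressed relative to the WT construct, so that alternative splicing already present at baseline is not counted as variant-induced. The function names are hypothetical and other normalisation schemes are equally plausible.

```python
def percent_full_length(full_length_signal: float, total_signal: float) -> float:
    """Percentage of full-length transcript among all measured transcripts.

    The signal may be read counts or peak areas, depending on the method.
    """
    if total_signal <= 0:
        raise ValueError("total_signal must be positive")
    return 100.0 * full_length_signal / total_signal


def wt_normalised_aberration(variant_fl_pct: float, wt_fl_pct: float) -> float:
    """Percentage of aberrant splicing attributable to the variant.

    The variant full-length level is scaled to the WT baseline and capped
    at 100%, so a variant with at least WT-level full-length expression
    is reported as 0% aberration.
    """
    if wt_fl_pct <= 0:
        raise ValueError("WT construct shows no full-length transcript")
    relative_full_length = min(variant_fl_pct / wt_fl_pct, 1.0)
    return 100.0 * (1.0 - relative_full_length)
```

With this scheme, a WT construct producing 90% full-length transcript and a variant construct producing 45% would give a normalised aberration level of 50%, rather than the 55% obtained if the WT baseline were ignored.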
Note that while results using RNA from patient-derived material are generally set as the benchmark for detecting splicing aberration types, since this is the clinical standard for validation, it should be recognised that every assay comes with its caveats, as demonstrated in this study.

Reporting

We provide the following guidance for reporting the results of minigene assays, in particular MPSAs, to enhance their usability by clinicians and variant curators in diagnostic laboratories:
1) For variant reporting, genomic coordinates should be provided and defined according to a standard genome build [1], and the HGVS c. nomenclature must be reported based on the MANE Select transcript or the clinically relevant transcript [7].
2) If the method is capable of characterising the spliced transcripts, we recommend reporting the HGVS r. nomenclature based on the MANE Select transcript or the clinically relevant transcript. In addition, building on previous reports by the ENIGMA (Evidence-based Network for the Interpretation of Germline Mutant Alleles) consortium, we advise use of the shorthand nomenclature adapted and extended from Colombo et al. (2014) [50] to help simplify transcript annotation and facilitate variant curation. This additional aberrant transcript nomenclature, described in detail in Additional file 1: Table S14, enables easy identification of affected exons or introns and the sizes of deleted or retained sequences. Since this nomenclature style does not indicate the exact change in the transcript nucleotide sequence, it must always be accompanied by the HGVS description. We also recommend that transcripts be annotated to indicate whether they result in LOF or not.
3) If the method can formally quantify transcripts, we recommend reporting the percentages of variant-induced full-length and aberrant transcripts.
For MPSAs, thresholds enabling categorical annotations of splicing impact, such as “abnormal splicing”, “indeterminate”, or “normal splicing”, must be reported to facilitate the interpretation of assay scores.
4) Data and experimental details must be deposited in publicly accessible databases and repositories. Further information on disclosure of datasets and statistical methods for multiplexed assays is detailed in Gelman et al. (2019) [48].

We anticipate this guidance may aid the design of new minigene splicing assays intended to have clinical application, and the quality assessment of data from existing minigene splicing assays.

CONCLUSIONS

There is no formal guidance on optimal assay design and validation of construct-based RNA results for application within the ACMG/AMP variant interpretation framework. We highlight that it is essential to evaluate the design limitations and performance of broad MPSA screens before considering their use in clinical variant classification, either directly or indirectly. We have shown that only one of seven MPSA datasets evaluated demonstrated performance suitable for clinical application of the RNA evidence in variant classification. These findings also raise concerns about use of MPSA datasets for benchmarking of new prediction tools. By extension, our findings have important implications for use of public collated splicing datasets in research or clinical studies. For example, the SpliceVarDB repository of splicing assay data was created to improve access to variant-specific splicing information for curators and researchers [51], and includes splicing assay data from multiple MPSA studies. Our findings suggest the value of quality control assessment for any studies included in such a repository.
We have also shown that well-designed multi-exon minigene assays reliably recapitulate patient splicing outcomes and, when clinically calibrated, can provide strong evidence towards or against pathogenicity using the ACMG/AMP variant classification framework. Analysis of these quantitative construct-based data highlighted potential for a more nuanced interpretation of splicing impact prediction, by demonstrating that SpliceAI score bins predict the level of splicing impact, and SAI-10k-calc accurately predicts variant-induced aberration types and their functional consequences.

Abbreviations

ACMG: American College of Medical Genetics and Genomics
AMP: Association for Molecular Pathology
B/LB: Benign / Likely Benign
CE: Capillary Electrophoresis
ClinVar: Clinical Variant database
ESE: Exonic Splicing Enhancer
gnomAD: Genome Aggregation Database
LOF: Loss of Function
LR: Likelihood Ratio
MANE: Matched Annotation from NCBI and EMBL-EBI
MaPSy: Massively Parallel Splicing (Yeast-based) Assay
MFASS: Multiplexed Functional Assay of Splicing using Sort-seq
MPSA: Massively Parallel Splicing Assay
NMD: Nonsense-Mediated Decay
P/LP: Pathogenic / Likely Pathogenic
PSI: Percent Spliced In
PTC: Premature Termination Codon
RNA-seq: RNA sequencing
ROC: Receiver Operating Characteristic
RT-PCR: Reverse Transcription Polymerase Chain Reaction
siRNA: Small Interfering RNA
SVI: Sequence Variant Interpretation (ClinGen subgroup)
VCEP: Variant Curation Expert Panel
VEP: Variant Effect Predictor
Vex-seq: Variant Exon Sequencing
VUS: Variant of Uncertain Significance
WT: Wild Type

Declarations

Ethics approval and consent to participate
This work exclusively used publicly available data.

Consent for publication
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Funding
ABS was supported in part by an NHMRC Investigator Fellowship (APP177524).
The work of D.C was supported in part by funding to QIMR Berghofer from an anonymous donor, and support from the Estate of Pamela G. Webb, in honour of William Alexander (Alec) McKay. GARW and LCW were supported by funding from the Health Research Council of New Zealand (22/187). EAV-S is supported by a grant from the Spanish Ministry of Science and Innovation, Plan Nacional de I+D+I 2023, ISCIII (ref. PI23/00047), co-funded by FEDER from Regional Development European Funds (European Union).

Authors' contributions
D.M.C., G.A.R.W., L.C.W. and A.B.S. conceived the study design. E.A.V-S. provided the traditional minigene dataset. Data analysis was performed by all authors. All authors read, contributed to and approved the final manuscript.

Acknowledgements
We thank Michael Parsons for advice relating to BRCA1 and BRCA2 variant classification following VCEP specifications.

References

1. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17(5):405-24.
2. Fresard L, Smail C, Ferraro NM, Teran NA, Li X, Smith KS, et al. Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts. Nat Med. 2019;25(6):911-9.
3. Karam R, Conner B, LaDuca H, McGoldrick K, Krempely K, Richardson ME, et al. Assessment of Diagnostic Outcomes of RNA Genetic Testing for Hereditary Cancer. JAMA Netw Open. 2019;2(10):e1913900.
4. Yamada M, Suzuki H, Shiraishi Y, Kosaki K. Effectiveness of integrated interpretation of exome and corresponding transcriptome data for detecting splicing variants of genes associated with autosomal recessive disorders. Mol Genet Metab Rep. 2019;21:100531.
5. Yepez VA, Gusic M, Kopajtich R, Mertes C, Smith NH, Alston CL, et al. Clinical implementation of RNA sequencing for Mendelian disease diagnostics. Genome Med. 2022;14(1):38.
6. Jaramillo Oquendo C, Wai HA, Rich WI, Bunyan DJ, Thomas NS, Hunt D, et al. Identification of diagnostic candidates in Mendelian disorders using an RNA sequencing-centric approach. Genome Med. 2024;16(1):110.
7. Walker LC, Hoya M, Wiggins GAR, Lindy A, Vincent LM, Parsons MT, et al. Using the ACMG/AMP framework to capture evidence related to predicted and observed impact on splicing: Recommendations from the ClinGen SVI Splicing Subgroup. Am J Hum Genet. 2023;110(7):1046-67.
8. Spurdle AB, Greville-Heygate S, Antoniou AC, Brown M, Burke L, de la Hoya M, et al. Towards controlled terminology for reporting germline cancer susceptibility variants: an ENIGMA report. J Med Genet. 2019;56(6):347-57.
9. Wai HA, Lord J, Lyon M, Gunning A, Kelly H, Cibin P, et al. Blood RNA analysis can increase clinical diagnostic rate and resolve variants of uncertain significance. Genet Med. 2020;22(6):1005-14.
10. Lord J, Baralle D. Splicing in the Diagnosis of Rare Disease: Advances and Challenges. Front Genet. 2021;12:689892.
11. Wai H, Douglas AGL, Baralle D. RNA splicing analysis in genomic medicine. Int J Biochem Cell Biol. 2019;108:61-71.
12. Walker LC, Whiley PJ, Houdayer C, Hansen TV, Vega A, Santamarina M, et al. Evaluation of a 5-tier scheme proposed for classification of sequence variants using bioinformatic and splicing assay data: inter-reviewer variability and promotion of minimum reporting guidelines. Hum Mutat. 2013;34(10):1424-31.
13. Rhine CL, Neil C, Glidden DT, Cygan KJ, Fredericks AM, Wang J, et al. Future directions for high-throughput splicing assays in precision medicine. Hum Mutat. 2019;40(9):1225-34.
14. Gaildrat P, Killian A, Martins A, Tournier I, Frebourg T, Tosi M. Use of splicing reporter minigene assay to evaluate the effect on splicing of unclassified genetic variants. Methods Mol Biol. 2010;653:249-57.
15. Buisine MP, Bellanne-Chantelot C, Calmels N, Vaché C, Besnard T, Cogne B, et al. RNA-based diagnostic studies in genetics: Review and guidance from a multidisciplinary French network. Eur J Hum Genet. 2025;33(10):1219-27.
16. Adamson SI, Zhan L, Graveley BR. Vex-seq: high-throughput identification of the impact of genetic variation on pre-mRNA splicing efficiency. Genome Biol. 2018;19(1):71.
17. Chong R, Insigne KD, Yao D, Burghard CP, Wang J, Hsiao YE, et al. A Multiplexed Assay for Exon Recognition Reveals that an Unappreciated Fraction of Rare Genetic Variants Cause Large-Effect Splicing Disruptions. Mol Cell. 2019;73(1):183-94.e8.
18. O'Neill MJ, Yang T, Laudeman J, Calandranis ME, Harvey ML, Solus JF, et al. ParSE-seq: a calibrated multiplexed assay to facilitate the clinical classification of putative splice-altering variants. Nat Commun. 2024;15(1):8320.
19. Patel PN, Ito K, Willcox JAL, Haghighi A, Jang MY, Gorham JM, et al. Contribution of Noncanonical Splice Variants to TTN Truncating Variant Cardiomyopathy. Circ Genom Precis Med. 2021;14(5):e003389.
20. Rhine CL, Neil C, Wang J, Maguire S, Buerer L, Salomon M, et al. Massively parallel reporter assays discover de novo exonic splicing mutants in paralogs of Autism genes. PLoS Genet. 2022;18(1):e1009884.
21. Rong S, Neil CR, Welch A, Duan C, Maguire S, Meremikwu IC, et al. Large-scale functional screen identifies genetic variants with splicing effects in modern and archaic humans. Proc Natl Acad Sci U S A. 2023;120(21):e2218308120.
22. Soemedi R, Cygan KJ, Rhine CL, Wang J, Bulacan C, Yang J, et al. Pathogenic variants that alter protein code often disrupt splicing. Nat Genet. 2017;49(6):848-55.
23. Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell. 2019;176(3):535-48.e24.
24. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77.
25. Canson DM, Davidson AL, de la Hoya M, Parsons MT, Glubb DM, Kondrashova O, et al. SpliceAI-10k calculator for the prediction of pseudoexonization, intron retention, and exon deletion. Bioinformatics. 2023;39(4).
26. Canson DM, Parsons MT, Moir-Meyer G, Dumenil T, Montalban G, Lin E, et al. The SeqSplice multiplexed minigene splicing assay for characterization and quantitation of variant-induced BRCA1 and BRCA2 splice isoforms. Genome Res. 2025;35(9):2104-15.
27. de Sainte Agathe JM, Filser M, Isidor B, Besnard T, Gueguen P, Perrin A, et al. SpliceAI-visual: a free online tool to improve SpliceAI splicing variant interpretation. Hum Genomics. 2023;17(1):7.
28. Dawes R, Bournazos AM, Bryen SJ, Bommireddipalli S, Marchant RG, Joshi H, et al. SpliceVault predicts the precise nature of variant-associated mis-splicing. Nat Genet. 2023;55(2):324-32.
29. McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17(1):122.
30. O'Mahony DG, Ramus SJ, Southey MC, Meagher NS, Hadjisavvas A, John EM, et al. Ovarian cancer pathology characteristics as predictors of variant pathogenicity in BRCA1 and BRCA2. Br J Cancer. 2023;128(12):2283-94.
31. Tavtigian SV, Greenblatt MS, Harrison SM, Nussbaum RL, Prabhu SA, Boucher KM, et al. Modeling the ACMG/AMP variant classification guidelines as a Bayesian classification framework. Genet Med. 2018;20(9):1054-60.
32. Parsons MT, de la Hoya M, Richardson ME, Tudini E, Anderson M, Berkofsky-Fessler W, et al. Evidence-based recommendations for gene-specific ACMG/AMP variant classification from the ClinGen ENIGMA BRCA1 and BRCA2 Variant Curation Expert Panel. Am J Hum Genet. 2024;111(9):2044-58.
33. Richardson ME, Bishop MFH, Holdren MA, de la Hoya M, Spurdle AB, Tavtigian SV, et al. Specifications of the ACMG/AMP variant curation guidelines for the analysis of germline PALB2 sequence variants. Am J Hum Genet. 2025;112(10):2266-80.
34. Ito K, Patel PN, Gorham JM, McDonough B, DePalma SR, Adler EE, et al. Identification of pathogenic gene mutations in LMNA and MYBPC3 that alter RNA splicing. Proc Natl Acad Sci U S A. 2017;114(29):7689-94.
35. Sheth N, Roca X, Hastings ML, Roeder T, Krainer AR, Sachidanandam R. Comprehensive splice-site analysis using comparative genomics. Nucleic Acids Res. 2006;34(14):3955-67.
36. Moles-Fernandez A, Domenech-Vivo J, Tenes A, Balmana J, Diez O, Gutierrez-Enriquez S. Role of Splicing Regulatory Elements and In Silico Tools Usage in the Identification of Deep Intronic Splicing Variants in Hereditary Breast/Ovarian Cancer Genes. Cancers (Basel). 2021;13(13).
37. de la Hoya M, Soukarieh O, López-Perolio I, Vega A, Walker LC, van Ierland Y, et al. Combined genetic and splicing analysis of BRCA1 c.[594-2A>C; 641A>G] highlights the relevance of naturally occurring in-frame transcripts for developing disease gene variant classification algorithms. Hum Mol Genet. 2016;25(11):2256-68.
38. Thompson BA, Martins A, Spurdle AB. A review of mismatch repair gene transcripts: issues for interpretation of mRNA splicing assays. Clin Genet. 2015;87(2):100-8.
39. Fortuno C, Llinares-Burguet I, Canson DM, de la Hoya M, Bueno-Martínez E, Sanoguera-Miralles L, et al. Exploring the role of splicing in TP53 variant pathogenicity through predictions and minigene assays. Hum Genomics. 2025;19(1):2.
40. Minnerop M, Kurzwelly D, Wagner H, Soehn AS, Reichbauer J, Tao F, et al. Hypomorphic mutations in POLR3A are a frequent cause of sporadic and recessive spastic ataxia. Brain. 2017;140(6):1561-78.
41. Fraile-Bethencourt E, Valenzuela-Palomo A, Díez-Gómez B, Goina E, Acedo A, Buratti E, et al. Mis-splicing in breast cancer: identification of pathogenic BRCA2 variants by systematic minigene assays. J Pathol. 2019;248(4):409-20.
42. CAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol. 2024;25(1):53.
43. Brnich SE, Rivera-Munoz EA, Berg JS. Quantifying the potential of functional evidence to reclassify variants of uncertain significance in the categorical and Bayesian interpretation frameworks. Hum Mutat. 2018;39(11):1531-41.
44. Avsec Ž, Latysheva N, Cheng J, Novati G, Taylor KR, Ward T, et al. Advancing regulatory variant effect prediction with AlphaGenome. Nature. 2026;649(8099):1206-18.
45. Acedo A, Sanz DJ, Durán M, Infante M, Pérez-Cabornero L, Miner C, et al. Comprehensive splicing functional analysis of DNA variants of the BRCA2 gene by hybrid minigenes. Breast Cancer Res. 2012;14(3):R87.
46. Acedo A, Hernández-Moro C, Curiel-García Á, Díez-Gómez B, Velasco EA. Functional classification of BRCA2 DNA variants by splicing assays in a large minigene with 9 exons. Hum Mutat. 2015;36(2):210-21.
47. Abou Tayoun AN, Pesaran T, DiStefano MT, Oza A, Rehm HL, Biesecker LG, et al. Recommendations for interpreting the loss of function PVS1 ACMG/AMP variant criterion. Hum Mutat. 2018;39(11):1517-24.
48. Gelman H, Dines JN, Berg J, Berger AH, Brnich S, Hisama FM, et al. Recommendations for the collection and use of multiplexed functional data for clinical variant interpretation. Genome Med. 2019;11(1):85.
49. Riedmayr LM, Böhm S, Michalakis S, Becirovic E. Construction and Cloning of Minigenes for in vivo Analysis of Potential Splice Mutations. Bio Protoc. 2018;8(5):e2760.
50. Colombo M, Blok MJ, Whiley P, Santamariña M, Gutiérrez-Enríquez S, Romero A, et al. Comprehensive annotation of splice junctions supports pervasive alternative splicing at the BRCA1 locus: a report from the ENIGMA consortium. Hum Mol Genet. 2014;23(14):3666-80.
51. Sullivan PJ, Quinn JMW, Wu W, Pinese M, Cowley MJ. SpliceVarDB: A comprehensive database of experimentally validated human splicing variants. Am J Hum Genet. 2024;111(10):2164-75.

Additional Declarations
No competing interests reported.
Supplementary Files
Additionalfile1.xlsx
AdditionalFile2.docx

Editorial history: first submitted to journal 10 Mar, 2026; submission checks completed 10 Mar, 2026; editor assigned 20 Mar, 2026; reviewers invited 25 Mar, 2026; reviewers agreed 25 Mar and 02 Apr, 2026; reviews received 03 Apr and 14 Apr, 2026; editorial decision (revision requested) 16 Apr, 2026.
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-9081705","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":621486845,"identity":"2bf9cadc-73d0-4d25-bdcc-c01e43a8cbf1","order_by":0,"name":"Daffodil M Canson","email":"","orcid":"","institution":"QIMR Berghofer","correspondingAuthor":false,"prefix":"","firstName":"Daffodil","middleName":"M","lastName":"Canson","suffix":""},{"id":621486847,"identity":"8420d2a6-5400-4782-b789-2f5ae7c9565b","order_by":1,"name":"George A R Wiggins","email":"","orcid":"","institution":"University of Otago","correspondingAuthor":false,"prefix":"","firstName":"George","middleName":"A R","lastName":"Wiggins","suffix":""},{"id":621486849,"identity":"f39c4116-8a33-40cd-932c-cd02560e91f0","order_by":2,"name":"Eladio A Velasco-Sampedro","email":"","orcid":"","institution":"Instituto de Biomedicina y Genética Molecular de Valladolid (IBGM), Consejo Superior de Investigaciones Científicas - Universidad de Valladolid (CSIC-UVa)","correspondingAuthor":false,"prefix":"","firstName":"Eladio","middleName":"A","lastName":"Velasco-Sampedro","suffix":""},{"id":621486850,"identity":"8148254e-b644-4f4a-9396-3d145914cf8d","order_by":3,"name":"John F Pearson","email":"","orcid":"","institution":"University of Otago","correspondingAuthor":false,"prefix":"","firstName":"John","middleName":"F","lastName":"Pearson","suffix":""},{"id":621486851,"identity":"81442004-96d2-4c4e-b1c9-b18222d249cd","order_by":4,"name":"Logan 
Walker","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABB0lEQVRIiWNgGAWjYBACPjiDB0h8IEYLG5wB1MI4g8GARC3MPERpYW+/+LmA4bA8G8/hY59tKv7Iy7s3MH74wXDHHqcWnjPF0jMYDhu28bYlz845Y2C48cwBZskehmfMOLVI5CRI8zDcZmzj5zFmzm0zYNw4I4FBmoHhMBseLcm/gVrs2/j5PzNbthnYA7Uw/wZq4cGtJf0YyJbENt4eZmbGNoPE+RIJbCBbJPD4hc2ax+B/chvPMWPGnjPGyRt4DrZZ9hgcxhl2/Oztj2/zVKTZ9vMkP2b4USFnO7+9+fCNHxWHcYYYAwMP0DhkEw0OMDYw4I8f9geofPkGfKpHwSgYBaNgJAIAvStKEHNCd9IAAAAASUVORK5CYII=","orcid":"","institution":"University of Otago","correspondingAuthor":true,"prefix":"","firstName":"Logan","middleName":"","lastName":"Walker","suffix":""},{"id":621486853,"identity":"c8bb6437-90a5-4b1a-8f5b-25346fd19bd5","order_by":5,"name":"Amanda B Spurdle","email":"","orcid":"","institution":"QIMR Berghofer","correspondingAuthor":false,"prefix":"","firstName":"Amanda","middleName":"B","lastName":"Spurdle","suffix":""}],"badges":[],"createdAt":"2026-03-10 09:08:59","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-9081705/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9081705/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":106901601,"identity":"69469785-9075-4fbf-b692-babc2a42e317","added_by":"auto","created_at":"2026-04-14 15:03:33","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":466627,"visible":true,"origin":"","legend":"\u003cp\u003eMPSA minigene construct design and assay workflow. A. An exon-intron-exon construct design including only a portion of exonic and intronic sequences surrounding the donor and acceptor splice site pair as test sequence. Deep intronic sequences are deleted when necessary to fit the inserted test sequence within the 500-bp size limit. 
\u003cstrong\u003eB.\u003c/strong\u003e A chimeric three-exon construct design containing a single test exon with flanking intronic sequences (blue) inserted between two constant vector exons (V1 and V2). \u003cstrong\u003eC.\u003c/strong\u003e In MPSA, numerous wild type (WT) and variant minigene constructs are transfected into cells, then the RNA splicing products are pooled and sequenced, and the final splicing outcomes are bioinformatically determined. Alternatively, exon skipping/inclusion are measured by cell sorting based on fluorescence. \u003cstrong\u003eD.\u003c/strong\u003e Splicing products measured by MPSAs included in this study. \u003csup\u003e1\u003c/sup\u003eIndirectly measures all events possible within the limitation of a single-exon minigene. \u003csup\u003e2\u003c/sup\u003eMeasured by counting the number of introns after alignment (based on code review). \u003csup\u003e3\u003c/sup\u003eUnclear if pipeline handles previously unannotated splice junctions.\u003c/p\u003e","description":"","filename":"Figure1.jpg","url":"https://assets-eu.researchsquare.com/files/rs-9081705/v1/b48254b4016195e89d32069f.jpg"},{"id":106901602,"identity":"7ed27916-d73b-46a5-8a9f-0a4571dd8d06","added_by":"auto","created_at":"2026-04-14 15:03:33","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":4278804,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of SpliceAI scores and MPSA assay data. Left) Histograms of study-reported assay results with the respective thresholds (red dashed lines) used for determining aberrant splicing. The three MaPSy and Vex-seq assays identified variants that impacted splicing that increased or decreased normal exon inclusion. 
Additionally, Rhine et al. used multiple constructs with different common exons designated by the authors as having different splice site strength (\u003cem\u003eVCP\u003c/em\u003e exon 15 [Strong], \u003cem\u003eEMC\u003c/em\u003e exon 7 [Intermediate], or \u003cem\u003eVCP\u003c/em\u003e exon 10 [Weak]). O’Neill et al. suggested a threshold for no impact on splicing (green dashed line) and a grey zone of ‘intermediate’ impact (grey shaded box). No histogram was possible for the \u003cem\u003eTTN\u003c/em\u003e assay because a quantitative assay score was not used in that study. Right) Density plots of SpliceAI max delta scores for assay-defined spliceogenic (red), non-spliceogenic (blue) and unknown impact on splicing (grey) variants.\u003c/p\u003e","description":"","filename":"Figure2.png","url":"https://assets-eu.researchsquare.com/files/rs-9081705/v1/b79b2aebccead2c6549083fa.png"},{"id":106994379,"identity":"0bd6202d-4ad6-43b8-a612-efe06513b520","added_by":"auto","created_at":"2026-04-15 15:08:08","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":7473205,"visible":true,"origin":"","legend":"\u003cp\u003eEvaluation of reported splice outcomes for variants at splice motifs.\u003cstrong\u003e A. \u003c/strong\u003eSpliceAI max delta score (top row) and assay readout (bottom row) for variants at the ±1,2 dinucleotides (red, SS), at the splice region (gold, SR), as defined in Walker et al., 2023 [7], and all other sites (teal). \u003cstrong\u003eB.\u003c/strong\u003e The delta percent spliced in (PSI) and SpliceAI max delta scores for splice donor/acceptor ±1,2 dinucleotide variants tested by Adamson (left, Vex-Seq) [16], Chong (middle, MFASS) [17] and O’Neill (right, ParSE-seq) [18]. Vertical dashed lines mark the SpliceAI max delta score threshold for predicted splicing impact (≥0.2). 
Horizontal dashed lines are the assay-specific thresholds used to define aberrant and normal splicing.\u003c/p\u003e","description":"","filename":"Figure3.png","url":"https://assets-eu.researchsquare.com/files/rs-9081705/v1/8d0377f3a12fb7341b1c0b52.png"},{"id":106961432,"identity":"332f50d5-23c6-4c41-ac76-15e08563d9ee","added_by":"auto","created_at":"2026-04-15 09:25:32","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":113313,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of variants in the traditional minigene assay dataset and their splicing aberration levels. \u003cstrong\u003eA.\u003c/strong\u003eProportion of variants across the seven cancer susceptibility genes included in the dataset, n=673. \u003cstrong\u003eB.\u003c/strong\u003e Levels of aberration categorized based on canonical transcript (CT) reduction. \u003cstrong\u003eC.\u003c/strong\u003e Distribution of variants based on CT reduction score.\u003c/p\u003e","description":"","filename":"Figure4.jpg","url":"https://assets-eu.researchsquare.com/files/rs-9081705/v1/75bd967ee8e3e680083e3e3d.jpg"},{"id":106901606,"identity":"eee162e5-ad6d-497d-81bb-f53c8395c196","added_by":"auto","created_at":"2026-04-14 15:03:33","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":428159,"visible":true,"origin":"","legend":"\u003cp\u003eEvaluation of computational tool performance for predicting the level and type of splicing aberration. \u003cstrong\u003eA. \u003c/strong\u003eDistribution of SpliceAI max delta score for variants in the different aberration categories: no (blue; Canonical transcript [CT] reduction ≤ 0%), low (gray; CT \u0026gt;0% to ≤20%), intermediate (pink; \u0026gt;20% to \u0026lt;80%), and high/complete (red; ≥80% to 100%). Median SpliceAI scores are represented by black horizontal lines. 
\u003cstrong\u003eB.\u003c/strong\u003e Density of categorised levels of splicing aberration across the full range of SpliceAI max delta score. \u003cstrong\u003eC. \u003c/strong\u003eROC analysis of SpliceAI max delta score and different levels of aberration. \u003cstrong\u003eD.\u003c/strong\u003eOptimal SpliceAI max delta thresholds (Youden index) for separating different levels of aberration. \u003cstrong\u003eE.\u003c/strong\u003e Number of variants with transcript prediction (shades of blue) and without prediction (shades of gray) by SAI-10k-calc and SpliceVault, n=673. SAI-10k-calc returned positive and negative predictions for SNVs only. SpliceVault returned positive predictions only for SNVs and indels, nearly all located in the splice region. \u003cstrong\u003eF. \u003c/strong\u003eConcordance of SAI-10k-calc transcript prediction with minigene assay results for all SNVs, n=600. The positive control set included SNVs inducing intermediate to complete aberration, while the negative control set included SNVs with no/low aberration. SAI-10k-calc with SpliceAI-visual inspection of 50 flagged variants slightly increased the concordance. *Inclusion of additional SNVs with concordant functional consequence (PTC or in-frame transcript) further increased the concordance (teal). \u003cstrong\u003eG. \u003c/strong\u003eConcordance of positive transcript prediction by SAI-10k-calc and SpliceVault with minigene assay results for SNVs located in the splice donor and acceptor motifs, n=257.\u003c/p\u003e","description":"","filename":"Figure5.jpg","url":"https://assets-eu.researchsquare.com/files/rs-9081705/v1/f780841450d85d430a1b777e.jpg"},{"id":106960473,"identity":"0c64eaf2-c1d9-4110-9c4f-027124c98e0d","added_by":"auto","created_at":"2026-04-15 09:21:17","extension":"jpg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":328247,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of minigene and patient RNA results. 
\u003cstrong\u003eA.\u003c/strong\u003e Number of variants with published patient RNA splicing assay results (n=114), drawn from 52 publications (Additional file 1: Tables S11 and S12). \u003cstrong\u003eB.\u003c/strong\u003e Comparison of patient RNA results for 34 variants assayed in multiple studies and shown to alter splicing in at least one study. \u003cstrong\u003eC.\u003c/strong\u003eComparison of minigene and patient RNA results for 91 variants that demonstrated altered splicing in at least one patient RNA assay.\u003c/p\u003e","description":"","filename":"Figure6.jpg","url":"https://assets-eu.researchsquare.com/files/rs-9081705/v1/c46bbaed79373077bf9a3149.jpg"},{"id":107704952,"identity":"09b86248-0345-41d9-9d3d-cff6c11b398c","added_by":"auto","created_at":"2026-04-24 09:05:01","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":11863874,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9081705/v1/409c7026-d0f6-4ab4-924d-7464406aad44.pdf"},{"id":106901604,"identity":"44ffe1ee-d558-40ce-81e5-68e3edb714c0","added_by":"auto","created_at":"2026-04-14 15:03:33","extension":"xlsx","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":1088547,"visible":true,"origin":"","legend":"","description":"","filename":"Additionalfile1.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-9081705/v1/6ac53ab006e674852bd1b2bd.xlsx"},{"id":106961384,"identity":"15066225-8264-4c83-bdef-9373ec57cf8f","added_by":"auto","created_at":"2026-04-15 09:25:24","extension":"docx","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":847307,"visible":true,"origin":"","legend":"","description":"","filename":"AdditionalFile2.docx","url":"https://assets-eu.researchsquare.com/files/rs-9081705/v1/1d32d5f065b550a95a345f65.docx"}],"financialInterests":"No competing interests 
reported.","formattedTitle":"Evidence-based recommendations for application of construct-based splicing data in clinical variant classification","fulltext":[{"header":"BACKGROUND","content":"\u003cp\u003eIdentifying germline pathogenic variants is important for directing clinical care of patients and their families. In 2015, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) published a framework for classifying variants using multiple categories and degrees of evidence [1]. However, interpreting the clinical significance of variants in disease susceptibility genes remains a significant diagnostic challenge. Furthermore, the body of genetic knowledge contributing to variant classification is notably biased according to ethnicity. For example, allele frequency data used to guide the assessment of genetic variants are not representative of minority population groups, contributing to inequities in the delivery of genetic health.\u003c/p\u003e \u003cp\u003eRNA diagnostics has emerged as an important strategy for establishing the clinical relevance of gene variants. Recent studies have shown that RNA sequencing (RNA-seq) improves the diagnostic yield by up to 25%, enabling more informed clinical management of patients [2\u0026ndash;6]. Importantly, RNA diagnostics also helps mitigate the reliance on biased population-specific reference data, offering an orthogonal approach to DNA-based variant interpretation. 
The current ACMG/AMP framework [1], supplemented with splicing-specific recommendations [7], denotes five ACMG/AMP codes for capturing potential splicing impact based on variant location, including: 1) splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions (PVS1); 2) variants located at other positions (PP3, BP4); 3) additional evidence against pathogenicity for synonymous or intronic variants with no predicted impact on splicing and thus no other likely mechanism (BP7); and 4) similarity of predicted impact compared to a known (likely) pathogenic variant (PS1). While bioinformatic tools that predict variant impact on splicing play a critical role as the initial step in variant assessment, RNA assay data adds value by: 1) confirming that a variant is spliceogenic i.e. alters native splicing profile compared to controls; 2) determining whether the impact on splicing is complete or partial; and 3) revealing the altered transcript profile, to infer protein-level impact e.g., spliceogenic predicted loss of function (LOF), functional or uncertain function [8].\u003c/p\u003e \u003cp\u003eAssay-based detection of splicing using RT-PCR and RNA-seq of RNA from patient samples has been used to measure impact on splicing associated with the variant allele, ranging from complete (i.e., no reference transcript), partial (i.e., varying levels of reference transcript), to none [9\u0026ndash;11]. However, results can vary based on factors such as tissue source, transcript isoform complexity, allele expression bias, and assay sensitivity [12], and it is difficult to track a partial effect for intronic variants using patient RNA. These limitations have led to the application of allele-specific minigene models in the research and diagnostic setting, not only to quantify allele-specific variant effects on splicing, but to enable variant impact assessment where patient RNA is unavailable [13\u0026ndash;15]. 
More recently, massively parallel splicing assays (MPSAs) have enabled high-throughput functional assessment of thousands of variants in a single experiment [16\u0026ndash;22]. These studies generate large datasets capturing quantitative effects of sequence variation on splicing and potentially provide a rich resource for benchmarking computational prediction tools and informing clinical variant interpretation.\u003c/p\u003e \u003cp\u003eThe existing ClinGen recommendations for use of splicing prediction and assay data provide high-level considerations for application of construct data for variant classification, and suggest \u0026ldquo;a conservative approach would be to apply information from construct data alone at lower weight in the absence of calibration of an experimental system against clinical data for proven spliceogenic and non-spliceogenic variants\u0026rdquo; [7]. Despite the increased availability of splicing data from construct-based assays, and their potential to provide valuable insights into the type and level of impact induced by sequence variants, for MPSA data especially there has been limited evaluation of splicing data consistency across assay platforms, cell models, and the pipelines used to process RNA data. In this study, we conducted a comprehensive analysis of the spliceogenic effects of approximately 41,000 variants that utilised a construct-based MPSA method or RT-PCR detection of traditional minigenes, in order to provide recommendations about the utility of such data for diagnostic variant classification following the ACMG/AMP framework.\u003c/p\u003e"},{"header":"METHODS","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eStudy datasets\u003c/h2\u003e \u003cp\u003eRNA splicing data used in this study were accessed from published MPSA datasets and strategically selected construct-based RT-PCR datasets. 
We analyzed a total of 21 published datasets, comprising 7 MPSA and 14 RT-PCR datasets (details provided in Additional file 1: Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e). The MPSA datasets selected for analysis met the following inclusion criteria: 1) used a construct-based assay approach to assess the impact of sequence variants on RNA splicing; and 2) analysed more than 100 variants.\u003c/p\u003e \u003cp\u003eTo limit potential variability due to high-level differences in assay design and execution, the RT-PCR datasets selected for inclusion were from a single laboratory (author E.A.V.-S.), all of which used multi-exon minigene constructs to assess gene variants in established cancer susceptibility genes and applied the same method to semi-quantify the transcript products: 1) transfected cells were treated with cycloheximide, an inhibitor of nonsense-mediated decay (NMD); and 2) fluorescent RT-PCR products were assessed by capillary electrophoresis on an automated DNA sequencer. We refer to the combined RT-PCR datasets as the traditional minigene assay dataset in this study.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eMPSA data preparation\u003c/h3\u003e\n\u003cp\u003eData from each study was downloaded from either the manuscript, supplementary information or an online repository (Additional file 1: Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e). A step-wise and study-specific approach was taken to identify the relevant transcript for SpliceAI prediction. O\u0026rsquo;Neill et al. (2024) [18] and Rong et al. (2023) [21] provided the relevant Ensembl or RefSeq transcript identifiers. For Adamson et al. (2018) [16] and Chong et al. (2019) [17], transcripts were identified by matching the provided exon coordinates with MANE and RefSeq transcript exon coordinates. Patel et al. (2021) [19], Soemedi et al. (2017) [22] and Rhine et al. 
(2022) [20] provided no exon coordinates for the respective assay designs; therefore, transcripts were selected based on the variant coordinate, prioritising the MANE transcripts. Additionally, for Soemedi et al. (2017) [22], we assumed alleles were provided for the RNA transcript and not DNA. As SpliceAI requires variants to be annotated based on the DNA sequence, the allele information provided by Soemedi et al. (2017) [22] was compared to the DNA reference at the given locus; for variants discordant with the DNA reference (n\u0026thinsp;=\u0026thinsp;2,483), the complements of the reported reference and variant alleles were used. See Additional file 1: Table \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e for a detailed summary of MPSA designs, characteristics and study-reported thresholds, including limitations in detecting certain types of aberrations.\u003c/p\u003e\n\u003ch3\u003eEvaluation of MPSA data for clinical application\u003c/h3\u003e\n\u003cp\u003eWe employed a multi-step approach to evaluate the discriminatory performance of MPSAs and establish their potential clinical utility.\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eTo investigate whether each MPSA effectively differentiated spliceogenic from non-spliceogenic variants using study-reported thresholds, the distribution of assay scores (e.g. delta percent spliced in [PSI]) was visually assessed using a histogram; the presence of a bimodal or multi-modal distribution was considered as evidence for the ability of the assay to demarcate spliceogenic and non-spliceogenic variants. 
Unimodality was tested using Hartigan\u0026rsquo;s dip test with the dip.test function from the diptest package in R.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eWe assessed whether the study-reported thresholds for detecting spliceogenic variants for different MPSA datasets were consistent with expectations based on SpliceAI predictions, using max delta score (hereafter referred to as SpliceAI score) thresholds from previous calibration of this tool for use in ACMG/AMP variant classification [7]. Splicing impact was calculated only for the transcript used in the relevant construct (see section \u0026lsquo;MPSA data preparation\u0026rsquo;).\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eTo determine whether spliceogenic variants with predicted high impact were enriched within the splice site motifs as expected, particularly at the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions, we visually examined the assay score distribution based on variant location. 
To further examine the capacity of MPSAs to detect variants highly likely to be spliceogenic, we selected the subset of variants located at the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions, and plotted their assay and SpliceAI scores to determine if the MPSA consistently identified variants predicted to have impact as spliceogenic.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e\n\u003ch3\u003eData formatting and standardization for traditional minigene studies\u003c/h3\u003e\n\u003cp\u003eMinigene splicing data for 673 variants in seven cancer susceptibility genes (\u003cem\u003eATM, BRCA2, CHEK2, PALB2, RAD51C, RAD51D\u003c/em\u003e, and \u003cem\u003eTP53\u003c/em\u003e) were extracted from 14 publications (Additional file 1: Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e), and formatted for consistent presentation of semi-quantitative RT-PCR results (Additional file 1: Table S3). Assay results indicated the proportions of specific transcript products and the corresponding functional consequence in terms of effect on reading frame and the introduction of premature termination codon (PTC). To determine the level of splicing aberration induced by a variant relative to the level produced by the wild type (WT) construct, the reduction of canonical transcript (CT) was computed using this formula: [(CT\u003csub\u003eWT\u003c/sub\u003e \u0026ndash; CT\u003csub\u003evariant\u003c/sub\u003e)/CT\u003csub\u003eWT\u003c/sub\u003e] x 100. A 100% CT reduction indicated complete splicing aberration (that is, no canonical transcript produced by the variant allele). 
The variants were binned into four levels of splicing aberration based on CT reduction: high/complete (\u0026ge;\u0026thinsp;80% to 100%), intermediate (\u0026gt;\u0026thinsp;20% to \u0026lt;\u0026thinsp;80%), low (\u0026gt;\u0026thinsp;0% to \u0026le;\u0026thinsp;20%), or no aberration (\u0026le;\u0026thinsp;0%). The no aberration category included variants that increased the level of canonical transcript (that is, negative measure of CT reduction).\u003c/p\u003e\n\u003ch3\u003ePerformance evaluation of splicing prediction tools using the traditional minigene dataset\u003c/h3\u003e\n\u003cp\u003ePrediction tool evaluation was designed to build on our previous work, where SpliceAI was selected as the best-performing tool, and used to exemplify tool calibration providing evidence against/towards spliceogenicity [7]. Performance evaluation covered two aspects critical for the use of prediction data in variant curation. First we assessed if SpliceAI [23] score could be used to predict different levels of aberration: no vs. low-to-complete, no/low vs. intermediate-to-complete, and no-to-intermediate vs. high/complete. The maximum raw SpliceAI score was defined as the maximum probability of altered splicing across the four output probabilities at a maximum distance of 10,000 nucleotides (\u0026plusmn;\u0026thinsp;4,999 nucleotides from the variant of interest). To determine the optimal thresholds for separating the aberration levels, we performed Receiver Operating Characteristic (ROC) analysis using the pROC R package [24] and obtained the Youden index, which helped identify the optimal binary cut-off point on the ROC curve where the test performs best, balancing sensitivity and specificity. 
To determine the magnitude and direction of the correlation between the SpliceAI prediction and level of splicing aberration, we computed Kendall\u0026rsquo;s tau.\u003c/p\u003e \u003cp\u003eWe then examined the ability of the SAI-10k calculator (SAI-10k-calc) [25], recently modified to incorporate SpliceAI alternate scores [26], to accurately predict the variant-induced splicing events. For this assessment, variants with intermediate to complete aberration (\u0026gt;\u0026thinsp;20% CT reduction) were used as the positive control set, and those with no/low aberration (\u0026le;\u0026thinsp;20% CT reduction) as the negative control set. The concordance of SAI-10k-calc transcript predictions with the minigene assay results was calculated based on the following criteria: concordance of positive predictions (one or more predicted events were detected at \u0026gt;\u0026thinsp;20% of the total transcript pool), concordance of negative predictions (agreement between a prediction of no aberrant transcript and no/low aberration detected in the assay), and overall concordance (total concordant positive and negative predictions). The SAI-10k-calc algorithm for predicting exon skipping and intron retention requires the loss of both native acceptor and donor splice sites. In instances where at least one of the SpliceAI-predicted losses affects a cryptic splice site, SAI-10k-calc flags the variant with the annotation \u0026ldquo;lost site/s do not match consensus\u0026rdquo;; in this study, we renamed the flag to \u0026ldquo;lost site/s do not match native site\u0026rdquo; for clarity. For variants flagged by SAI-10k-calc with \u0026ldquo;lost site/s do not match native site\u0026rdquo; under the \u0026ldquo;Exon_skipping_aaseq\u0026rdquo; or \u0026ldquo;Intron_retention_aaseq\u0026rdquo; fields, we manually inspected the SpliceAI scores using SpliceAI-visual [27] based on the decision flowchart in Additional file 2: Fig. 
\u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e to obtain their final transcript prediction. For variants where the predicted aberrant transcript was absent or expressed at low level and other dominant transcript/s were detected in the assay, we additionally checked if the predicted and observed transcripts had the same functional consequence (PTC or in-frame), and considered them as concordant in a separate comparison.\u003c/p\u003e \u003cp\u003eWe also assessed whether knowledge of naturally occurring splicing events improved the performance of SAI-10k-calc to predict splicing events, by considering information captured in the SpliceVault resource [28], a database of common splicing events observed in reference RNA-sequencing samples. To obtain the SpliceVault 300K-RNA Top-4 alternative splicing events [28], we used the Ensembl Variant Effect Predictor (VEP) v113.4 [29] with the SpliceVault VEP plugin (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/Ensembl/VEP_plugins/blob/release/113/SpliceVault.pm\u003c/span\u003e\u003cspan address=\"https://github.com/Ensembl/VEP_plugins/blob/release/113/SpliceVault.pm\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003e)\u003c/span\u003e, which annotates variants with SpliceAI-predicted loss of native splice sites only. We chose the Ensembl transcript identifier equivalent to the RefSeq MANE Select transcript. SpliceVault results were considered as concordant if the Top-4 events matched with at least one aberrant transcript exhibiting\u0026thinsp;\u0026gt;\u0026thinsp;20% expression in the minigene assay. 
Since native splice site loss is mostly caused by splice site motif disruption, SpliceVault concordance calculations were limited to variants in the splice donor and acceptor motifs, defined as in Walker et al., 2023 [7] (donor motif - the last three nucleotides of the exon to six nucleotides downstream of the exon; acceptor motif - the first nucleotide of the exon up to 20 nucleotides upstream of the exon).\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eCalibration of SpliceAI for computational code application, using quantitative RNA results\u003c/h2\u003e \u003cp\u003eThe likelihood ratio (LR) of spliceogenicity for variants located outside the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions was estimated using the formula previously reported [30]. We initially used the previously established SpliceAI score thresholds [7] based on analysis of an RNA splicing data truth set that did not capture quantification of transcript levels (that is, categorization of splicing aberration was binary, recorded simply as Yes or No). Quantitative splicing data from the traditional minigene dataset analyzed in this study enabled an assessment of SpliceAI thresholds for optimal prediction of no aberration versus aberration at a given \u003cspan type=\"Underline\" class=\"Underline\" name=\"Emphasis\"\u003eminimum\u003c/span\u003e level, using the Youden index to estimate the optimal thresholds. Thresholds were estimated for best separation of no aberration versus low-to-complete, and for no/low aberration versus intermediate-to-complete aberration. 
We then used these thresholds to estimate LRs towards spliceogenicity (at the denoted minimum levels), and equated each LR with an evidence strength category according to the Bayesian framework of the ACMG/AMP variant classification guidelines [31].\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eEvaluation of traditional minigene RNA results for diagnostic variant classification\u003c/h3\u003e\n\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003eComparison of minigene and patient-derived RNA results:\u003c/h2\u003e \u003cp\u003eTraditional minigene assay results were compared with previously published splicing data derived from patient RNA. For positive patient RNA assay outcomes, concordance was assessed between the traditional minigene assay dataset and reported patient findings for all seven genes. Concordant results were defined as either complete concordance of all detected transcripts or high concordance where a single patient-derived transcript corresponded to the most abundant transcript observed in the minigene assay, or vice versa. Discordant results were defined as complete disagreement, where no transcripts overlapped between the two systems, or low concordance where patient transcript(s) aligned only with minor transcripts from the minigene assay. For variants with multiple patient RNA findings, outcomes were prioritised in the following order when assigning concordance: complete, high, low, and no concordance. For example, if a variant had two patient findings, one a complete concordance and the other a high concordance involving the most abundant transcript, it was classified as a complete concordance. 
Negative patient RNA assay outcomes were evaluated only for \u003cem\u003eBRCA2\u003c/em\u003e, leveraging our previously published in-house resource of splicing data extracted from publications, which provides comprehensive splicing information for this gene [7].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eClinical calibration to derive strength of evidence applicable to construct-based RNA findings\u003c/h2\u003e \u003cp\u003eAll variant annotations relevant for clinical calibration are provided in Additional file 1: Table S4. Classifications were extracted from ClinVar (October 2025), and cross-matched to the variants in the traditional minigene assay dataset. A total of 513 variants were reported at least once in ClinVar, with summary classification as follows: 12 Benign (B); 10 Benign/Likely benign (B/LB); 43 Likely benign (LB); 115 Uncertain significance (VUS); 111 conflicting classifications; 41 Likely pathogenic (LP); 81 Pathogenic/Likely pathogenic (P/LP); 100 Pathogenic (P). For variants with conflicting classifications of pathogenicity, application of a simple majority rule resolved summary classification for 109 variants: 13 B/LB, 6 LB, 73 VUS, 9 LP, 8 P/LP.\u003c/p\u003e \u003cp\u003eAs a critical step in performing a calibration that correlates the construct-based findings with clinical evidence of variant pathogenicity (and not simply variant spliceogenicity), RNA findings were reviewed to determine the percentage of transcripts predicted to cause gene LOF (RNA transcripts predicted non-coding, predicted protein truncating leading to NMD, predicted protein truncating without NMD but impacting a critical domain, and/or predicted to encode proteins lacking critical structural/functional motifs) [8]. 
A transcript was assigned as predicted LOF if annotated as a PTC transcript, or if in-frame and eligible to be assigned at least moderate weight as reported in the original minigene RT-PCR publications (Additional file 1: Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e), or for \u003cem\u003eBRCA2\u003c/em\u003e, following the ClinGen ENIGMA Variant Curation Expert Panel (VCEP) specifications V1.2 [32]. The summed percentage of all predicted LOF transcripts observed for each variant was categorised as follows: \u0026le;20%; \u0026gt;20% and \u0026lt;\u0026thinsp;50%; \u0026ge;50% and \u0026lt;\u0026thinsp;80%; \u0026ge;80%.\u003c/p\u003e \u003cp\u003eVariants were assigned a variant type using VEP consequence annotation, to allow: 1) consideration of an alternative mechanism of pathogenicity for variants annotated as nonsense, frameshift or missense variants at the DNA level; and 2) stratification of splice region variants according to location inside or outside the highly conserved splice site dinucleotide positions. Variants were excluded from the calibration reference sets if they met any of the following criteria: summary classification as VUS (n\u0026thinsp;=\u0026thinsp;188) or conflicting (n\u0026thinsp;=\u0026thinsp;2); summary classification P/LP and DNA level consequence as nonsense, frameshift or deletion within a critical domain (n\u0026thinsp;=\u0026thinsp;43), of which only 5 displayed a high level of splice impact; or summary classification P/LP and missense consequence (n\u0026thinsp;=\u0026thinsp;28), of which only 7 displayed a high level of splice impact.\u003c/p\u003e \u003cp\u003eVariants considered to be potential outliers for calibration were then reviewed for pathogenicity. This resulted in the exclusion of two variants with a (likely) pathogenic assertion and \u0026le;\u0026thinsp;20% predicted LOF transcripts. 
The first, \u003cem\u003eBRCA2\u003c/em\u003e c.9501G\u0026thinsp;\u0026gt;\u0026thinsp;A (p.Glu3167=), had a single submitter (BIC) with no assertion criteria and no star rating; it is classified as VUS following the ClinGen ENIGMA BRCA1/2 VCEP specifications V1.2 [32], meeting PM2_Supporting (1 point) with no bioinformatic code applicable (SpliceAI score 0.16). The second, \u003cem\u003ePALB2\u003c/em\u003e c.108\u0026thinsp;+\u0026thinsp;2T\u0026thinsp;\u0026gt;\u0026thinsp;C, was classified LP by a single submitter with no RNA data; no PVS1 code is applicable to this variant following the ClinGen Hereditary Breast Ovarian Pancreatic VCEP specifications V1.2 [33]. Classification remained unchanged for another five potential outlier variants: two variants with B/LB classification and \u0026ge;\u0026thinsp;50% predicted LOF transcripts from construct-based assays (\u003cem\u003eATM\u003c/em\u003e c.1066-6T\u0026thinsp;\u0026gt;\u0026thinsp;G; \u003cem\u003eBRCA2\u003c/em\u003e c.441A\u0026thinsp;\u0026gt;\u0026thinsp;G p.Gln147=); two P/LP \u003cem\u003ePALB2\u003c/em\u003e variants with complete impact on splicing in construct-based assays, but with \u0026lt;\u0026thinsp;50% predicted LOF transcripts due to exclusion of a transcript reaching only PVS1_Supporting (\u003cem\u003ePALB2\u003c/em\u003e c.211\u0026thinsp;+\u0026thinsp;1G\u0026thinsp;\u0026gt;\u0026thinsp;A and c.211\u0026thinsp;+\u0026thinsp;2T\u0026thinsp;\u0026gt;\u0026thinsp;C); and similarly the \u003cem\u003eBRCA2\u003c/em\u003e c.9257-1G\u0026thinsp;\u0026gt;\u0026thinsp;C variant, which has multiple P submissions (some supported by internal laboratory data) and complete impact on splicing resulting in an in-frame transcript reaching only PVS1_Supporting level. 
Additional review of ClinVar entries was then performed for all reference set variants located outside of the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions, to assess potential for circularity due to use of the minigene data as the only RNA source for an individual variant classification (Additional file 1: Table S5). An additional variant, \u003cem\u003eBRCA2\u003c/em\u003e c.7875A\u0026thinsp;\u0026gt;\u0026thinsp;G (p.Arg2625=), was excluded since the two submitters that did not use the minigene data had conflicting assertions of VUS or LB. The final reference sets for clinical calibration comprised 83 B/LB variants (all outside the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions), and 166 P/LP variants (28 outside the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions).\u003c/p\u003e \u003cp\u003eLikelihood ratios (LRs) towards or against pathogenicity were estimated by comparing the proportion of variants meeting different splice percentage categories within the benign versus pathogenic reference sets, using the statistical method detailed previously [30]. 
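\u003c/p\u003e \u003cp\u003eThe comparison of proportions described above can be illustrated with a minimal sketch. It is not the statistical method of [30], whose details are not reproduced here: it assumes a plain ratio of proportions with no confidence interval, the function names are hypothetical, and the cut-offs used to map an LR to an evidence level (2.08, 4.33, 18.7 and 350, with reciprocals for benign evidence) are assumed to correspond to the Bayesian ACMG/AMP framework. The example counts are invented; only the reference set sizes (166 P/LP and 83 B/LB) come from this study.\u003c/p\u003e

```python
# Illustrative sketch only. Function names are hypothetical and the
# ratio-of-proportions LR is an assumption, not the method of [30].

def likelihood_ratio(n_path_in_cat, n_path_total, n_benign_in_cat, n_benign_total):
    # Proportion of pathogenic reference variants falling in a category,
    # divided by the proportion of benign reference variants in that category.
    p_path = n_path_in_cat / n_path_total
    p_benign = n_benign_in_cat / n_benign_total
    if p_benign == 0:
        raise ValueError('no benign variants in category; LR undefined without a correction')
    return p_path / p_benign


def evidence_strength(lr):
    # Map an LR to an evidence level using assumed cut-offs
    # (supporting 2.08, moderate 4.33, strong 18.7, very strong 350).
    if lr <= 0:
        raise ValueError('LR must be positive')
    direction = 'pathogenic' if lr >= 1 else 'benign'
    magnitude = lr if lr >= 1 else 1 / lr
    cutoffs = [(350, 'very strong'), (18.7, 'strong'), (4.33, 'moderate'), (2.08, 'supporting')]
    for cutoff, label in cutoffs:
        if magnitude >= cutoff:
            return label + ' ' + direction
    return 'indeterminate'


# Invented counts for one category; 166 and 83 are the reference set sizes.
example_lr = likelihood_ratio(120, 166, 2, 83)
print(round(example_lr, 1), evidence_strength(example_lr))
```

\u003cp\u003e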
The LR estimates were then used to assign weights for or against pathogenicity following recommendations arising from Bayesian modelling of the ACMG/AMP guidelines [31].\u003c/p\u003e \u003c/div\u003e"},{"header":"RESULTS","content":"\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eCharacterisation of MPSA study designs\u003c/h2\u003e \u003cp\u003eTo determine whether MPSA results can be used for clinical variant classification, we evaluated two types of splicing studies: 1) gene-specific assays that assessed the impact of variants in one or a few genes for which the clinical validity of gene-disease relationship is established [18, 19] and 2) broad MPSA screens including hundreds of genes that were selected irrespective of gene-disease mechanism [16, 17, 20\u0026ndash;22]. The study by Patel and colleagues used a construct with a 500 bp exon-intron-exon test sequence including partial exonic and intronic sequences surrounding the splice region (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e1\u003c/span\u003eA), and all remaining studies evaluated constructs with a single test exon inserted between constant vector exons (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e1\u003c/span\u003eB). The gene-specific ParSE-seq assay [18] used minigene constructs with a similar design to traditional single test exon splicing assays such that a wide range of exon sizes of up to ~\u0026thinsp;350 bp and 125\u0026ndash;250 bp of flanking intronic sequences were included (Additional file 1: Table \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e). The broad MPSA screens, including Massively Parallel Splicing Assay (MaPSy) [20\u0026ndash;22], Variant exon sequencing (Vex-seq) [16], and Multiplexed Functional Assay of Splicing using Sort-seq (MFASS) [17], used minigene constructs with inserted sequences limited to ~\u0026thinsp;170 bp. 
These broad screens included test exons of \u0026le;\u0026thinsp;120 bp in size and shorter flanking intronic sequences with a lower limit of 40\u0026ndash;50 bp upstream and 15\u0026ndash;30 bp downstream of the exon. To allow for the inclusion of variants in larger exons (120\u0026ndash;500 bp), Rong and colleagues designed a \u0026lsquo;half exon\u0026rsquo; construct that incorporated the 100 bp of context-specific sequence (including the nearest splice site) with a common exon sequence and the other splice site. Generally, the minigene constructs were transfected into cells, and the splicing outcomes were measured in a pooled manner with sequencing data as quantitative readout of the splicing impact of variants (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e1\u003c/span\u003eC). For the MFASS method, splicing impact quantification was done by fluorescence-activated cell sorting based on green fluorescent protein and mCherry reporters [17] (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e1\u003c/span\u003eC). Detailed characterisation of study designs is shown in Additional file 1: Table \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eMPSA performance evaluation\u003c/h2\u003e \u003cp\u003eAll seven published MPSA datasets lacked information on specific splice isoforms produced by each variant assayed. For six of the seven MPSA datasets, the splicing assay scores were reported as the difference in the level of splicing (e.g. delta PSI) between the reference and variant constructs [16\u0026ndash;18] or as an allelic ratio [20\u0026ndash;22], and assay-specific thresholds determined whether or not a variant was considered to alter splicing (Additional file 1: Table \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e). 
Notably, the Vex-seq and MFASS assays were limited to detecting exon skipping/inclusion, and at least three of the assays would (potentially) misclassify some splicing events as normal (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e1\u003c/span\u003eD). Further, although assay scores were intended to provide a measure of splicing impact, none of the assay outputs differentiated between the types of splicing aberration detected.\u003c/p\u003e \u003cp\u003eThe remaining dataset by Patel and colleagues used a Fisher exact test of the proportion of reads supporting \u0026ldquo;no splicing\u0026rdquo;, \u0026ldquo;normal splicing\u0026rdquo; and \u0026ldquo;aberrant splicing\u0026rdquo; produced from the variant construct compared to the WT construct, with \u0026ldquo;splice-affecting\u0026rdquo; variants defined by p\u0026thinsp;\u0026lt;\u0026thinsp;0.001 [19]. More detailed review of the study of Patel et al. (2021) [19] revealed several issues that may compromise the clinical utility of their published assay results. The exon-intron-exon construct design could detect splice site loss or gain induced by splice region variants, but would be less likely to recapitulate the aberrant transcripts produced \u003cem\u003ein vivo\u003c/em\u003e. For example, this design could not detect variant-induced exon skipping (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e1\u003c/span\u003eA). The absence of splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide variants in the assay precluded their use for assay readout calibration, and a previous publication [34] did not show data to support their report of 100% concordance for 10 positive control variants known to alter splicing. 
To verify concordance of predicted spliceogenic variants with splice assay results, we recalculated the p values using the Fisher exact test with normalised counts as described by the authors, but could not replicate the findings using the authors\u0026rsquo; selected threshold for splice-affecting variants; 12 variants would have been classified differently in our analysis.\u003c/p\u003e \u003cp\u003eTo reliably use splicing data generated by MPSA for variant curation, the distribution of assay scores from spliceogenic variants should be distinct from that of non-spliceogenic variants. Only two studies [17, 18] had a multimodal distribution (Hartigan\u0026rsquo;s p\u0026thinsp;\u0026lt;\u0026thinsp;2\u0026thinsp;\u0026times;\u0026thinsp;10\u003csup\u003e\u0026minus;16\u003c/sup\u003e) of assay scores that were separated by thresholds defined by the study (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Additionally, O\u0026rsquo;Neill et al. defined three categories to distinguish \u0026ldquo;abnormal\u0026rdquo;, \u0026ldquo;indeterminate\u0026rdquo; and \u0026ldquo;normal\u0026rdquo; impact on splicing for the ParSE-seq dataset (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eAs a second approach to assess assay reliability, we compared SpliceAI predictions between assay-defined spliceogenic and non-spliceogenic variants (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Only data from ParSE-seq correlated well with SpliceAI predictions: spliceogenic variants had high SpliceAI scores (mean\u0026thinsp;=\u0026thinsp;0.79) and non-spliceogenic variants had low SpliceAI scores (mean\u0026thinsp;=\u0026thinsp;0.14). \u0026ldquo;Indeterminate\u0026rdquo; variants from ParSE-seq had a bimodal SpliceAI score distribution with variants at each end of the spectrum. 
All other MPSA studies had notable overlap of SpliceAI score distributions between reported spliceogenic and non-spliceogenic variants, suggesting misclassification.\u003c/p\u003e \u003cp\u003eTo evaluate the suitability of MPSA data for clinical application, we assessed whether each MPSA categorized variants as spliceogenic based on prior expectations, namely variant location in relation to splice donor and acceptor motifs, and SpliceAI score. An overview of findings is summarized in Additional file 2: Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e2\u003c/span\u003e. Spliceogenic variants with predicted impact were enriched within the splice site motifs as expected, particularly at the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e3\u003c/span\u003e, Additional file 2: Fig. \u003cspan refid=\"MOESM2\" class=\"InternalRef\"\u003eS2\u003c/span\u003e). For those studies that assessed variants located within the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions [16\u0026ndash;18], despite enrichment in spliceogenic variants, comparison of reported splice impact against SpliceAI predictions suggested differences in performance across the three methods (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e3\u003c/span\u003e). For ParSE-seq, 97% (30/31) of splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide variants were spliceogenic with only one variant (\u003cem\u003eSCN5A\u003c/em\u003e c.3957_3963\u0026thinsp;+\u0026thinsp;1dup) observed not to impact splicing; the SpliceAI score predicted no splicing impact, highlighting the accuracy of ParSE-seq for the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotides. 
For Vex-seq, only 76% (16/21) of splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide variants impacted splicing, and six variants had splicing assay results discordant with SpliceAI predictions (five predicted spliceogenic but with no observed splicing impact). For MFASS, 83% (180/217) of splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide variants impacted splicing, with 35 (19%) of these \u0026plusmn;\u0026thinsp;1,2 dinucleotide variants predicted as non-spliceogenic (SpliceAI score\u0026thinsp;\u0026lt;\u0026thinsp;0.1). While specificity measures (Supplementary Table S6) could be interpreted to indicate that all MPSA were more accurate at identifying variants that did not impact splicing compared to those that did impact splicing, this reflects the fact that most variants assayed (87%) were located outside the splice region where variation has low prior probability to alter splicing.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eTraditional minigene assay dataset\u003c/h2\u003e \u003cp\u003eThe dataset consisted of 600 single nucleotide variants (SNVs) and 73 indels from seven cancer susceptibility genes (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e4\u003c/span\u003eA): 359 were exonic, 307 were intronic, and seven were indels spanning the exonic and intronic regions. Of the 673 variants, 23% (135 SNVs and 20 indels) were located at or disrupted the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions. Seven of the 16 WT minigene constructs produced 100% canonical transcript, and the other nine exhibited background alternative splicing (Supplementary Table S7). To account for background splicing, we normalised the level of canonical transcript induced by all variants by calculating the CT reduction (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e4\u003c/span\u003eB). 
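\u003c/p\u003e \u003cp\u003eThe normalisation and the aberration categories used in this study (no aberration; low: \u0026gt;\u0026thinsp;0% to \u0026le;\u0026thinsp;20%; intermediate: \u0026gt;\u0026thinsp;20% to \u0026lt;\u0026thinsp;80%; high/complete: \u0026ge;\u0026thinsp;80% to 100% CT reduction) can be sketched as follows. This is a minimal illustration under a stated assumption: CT reduction is taken to be the relative drop in canonical transcript percentage from the WT construct to the variant construct. The function names are hypothetical.\u003c/p\u003e

```python
# Illustrative sketch only. The relative-drop formula is an assumption;
# the category boundaries follow the Table 1 footnote of this study.

def ct_reduction(wt_ct_pct, variant_ct_pct):
    # Percentage reduction in canonical transcript (CT) relative to the
    # WT construct, so that background alternative splicing is factored out.
    if wt_ct_pct <= 0:
        raise ValueError('WT construct must produce some canonical transcript')
    return (wt_ct_pct - variant_ct_pct) / wt_ct_pct * 100


def aberration_level(reduction_pct):
    # No aberration also covers variants that increase the CT level.
    if reduction_pct <= 0:
        return 'none'
    if reduction_pct <= 20:
        return 'low'
    if reduction_pct < 80:
        return 'intermediate'
    return 'high/complete'


# Invented example: WT construct yields 80% CT, variant construct 40% CT.
reduction = ct_reduction(80, 40)
print(reduction, aberration_level(reduction))
```

\u003cp\u003e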
CT reduction distribution showed that the vast majority of variants resulted in either no or complete aberration (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e4\u003c/span\u003eC). Specifically, 37% (252/673) of the variants resulted in no aberration or increased the canonical transcript level, while 63% (421/673) had splicing impact (82 low, 73 intermediate, and 266 high/complete) (Additional file 1: Table S3). All but one of the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide variants showed a high/complete aberration (\u0026ge;\u0026thinsp;80% to 100% CT reduction). The \u003cem\u003ePALB2\u003c/em\u003e c.108\u0026thinsp;+\u0026thinsp;2T\u0026thinsp;\u0026gt;\u0026thinsp;C variant showed low impact on splicing with evidence of alternative splicing in \u0026lt;\u0026thinsp;15% of transcripts. Notably, this variant resulted from a donor GT\u0026thinsp;\u0026gt;\u0026thinsp;GC substitution and therefore retains the ability to be processed by the U2-type spliceosome [35].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eEvaluation of SpliceAI performance for predicting level of aberration\u003c/h2\u003e \u003cp\u003eOur analysis of the traditional minigene dataset showed that SpliceAI score for a variant generally increases with the categorized levels of variant-induced splicing aberration (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e5\u003c/span\u003eA, \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e5\u003c/span\u003eB). There was a significant difference in SpliceAI scores (Kruskal-Wallis p\u0026thinsp;\u0026lt;\u0026thinsp;0.0001) across the no (median\u0026thinsp;=\u0026thinsp;0.025), low (median\u0026thinsp;=\u0026thinsp;0.125), intermediate (median\u0026thinsp;=\u0026thinsp;0.38), and high/complete (median\u0026thinsp;=\u0026thinsp;0.96) aberration categories. 
Moreover, ROC analysis showed that SpliceAI can distinguish between different aberration level categories with good accuracy (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e5\u003c/span\u003eC, \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e5\u003c/span\u003eD). The optimal thresholds to predict different ranges of aberration levels were: 0.185 for \u0026ldquo;no versus low-to-complete\u0026rdquo; (AUC\u0026thinsp;=\u0026thinsp;0.924), 0.285 for \u0026ldquo;no/low versus intermediate-to-complete\u0026rdquo; (AUC\u0026thinsp;=\u0026thinsp;0.955), and 0.45 for \u0026ldquo;no-to-intermediate versus high/complete\u0026rdquo; (AUC\u0026thinsp;=\u0026thinsp;0.967). Importantly, the 0.285 threshold showed 90% sensitivity and 90% specificity for separating no/low versus intermediate-to-complete aberration, while the 0.45 threshold achieved 95% sensitivity and 89% specificity for separating high/complete from no-to-intermediate aberration (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e5\u003c/span\u003eD).\u003c/p\u003e \u003cp\u003eWe then tabulated results to assess the proportion of variants falling into the different aberration level categories for different SpliceAI score bins (Additional file 1: Tables S8 and S9), some selected to capture thresholds recommended previously, e.g. \u0026lt;\u0026thinsp;0.05, 0.1, 0.2, 0.5 and 0.8 [7, 23, 36]. For the 518 variants located outside the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions, there was strong evidence for a correlation between SpliceAI score and level of aberration (τ\u0026thinsp;=\u0026thinsp;0.66, p\u0026thinsp;=\u0026thinsp;3.2e-72) (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, Additional file 1: Table S8). Of the spliceogenic variants with SpliceAI score\u0026thinsp;\u0026le;\u0026thinsp;0.05, 78% (28/36) had low level impact and only 8% (3/36) had high/complete impact. 
For the latter, the results for two of three variants are indicative of impact via splicing regulatory elements which remain poorly predicted by SpliceAI and other tools [26, 36]. In contrast, for spliceogenic variants with SpliceAI score between \u0026ge;\u0026thinsp;0.2 and \u0026lt;\u0026thinsp;0.5, 24% had low, 50% had intermediate, and 26% had high/complete aberration. At SpliceAI score\u0026thinsp;\u0026ge;\u0026thinsp;0.5, 72% of spliceogenic variants had high/complete aberration: 56% for those from \u0026ge;\u0026thinsp;0.5 and \u0026lt;\u0026thinsp;0.8, and 83% for those with score\u0026thinsp;\u0026ge;\u0026thinsp;0.8. Notably, no variants with SpliceAI score\u0026thinsp;\u0026ge;\u0026thinsp;0.8 were shown to be non-spliceogenic in this dataset. Unsurprisingly, the correlation was even more striking when analyses included variants located at the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions (Additional file 1: Table S9), the vast majority of which had SpliceAI score\u0026thinsp;\u0026ge;\u0026thinsp;0.8 (τ\u0026thinsp;=\u0026thinsp;0.76, p\u0026thinsp;=\u0026thinsp;1.6e-121).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDistribution of variants outside the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions across different ranges of SpliceAI max delta scores.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"9\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv 
align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eSpliceAI max delta score\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"5\" nameend=\"c6\" namest=\"c2\"\u003e \u003cp\u003eNo. of variants\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c9\" namest=\"c7\"\u003e \u003cp\u003eProportion in the spliceogenic set\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNon-spliceogenic set\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSpliceogenic set\u003csup\u003e\u003cem\u003ea\u003c/em\u003e\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eLow\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eIntermediate\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eHigh/\u003c/p\u003e \u003cp\u003ecomplete\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eLow\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eIntermediate\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eHigh/\u003c/p\u003e \u003cp\u003ecomplete\u003c/p\u003e 
\u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026le;\u0026thinsp;0.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e167\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e3\u003csup\u003e\u003cem\u003eb\u003c/em\u003e\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.78\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.08\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;0.05 \u0026amp; \u0026le;0.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e 
\u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;0.1 \u0026amp; \u0026lt;0.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e22\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.03\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026ge;\u0026thinsp;0.2 \u0026amp; \u0026lt;0.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e17\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e54\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e27\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.26\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026ge;\u0026thinsp;0.5 \u0026amp; \u0026lt;0.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e52\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e29\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.56\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026ge;\u0026thinsp;0.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e78\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e65\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c8\"\u003e \u003cp\u003e0.12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e0.83\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTOTAL =\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e252\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e266\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e73\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e112\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"9\"\u003e\u003csup\u003e\u003cem\u003ea\u003c/em\u003e\u003c/sup\u003eThe spliceogenic set is subdivided into three levels of splicing impact based on CT reduction - low (\u0026gt;\u0026thinsp;0% to \u0026le;\u0026thinsp;20%), intermediate (\u0026gt;\u0026thinsp;20% to \u0026lt;\u0026thinsp;80%), and high/complete (\u0026ge;\u0026thinsp;80% to 100%). \u003csup\u003e\u003cem\u003eb\u003c/em\u003e\u003c/sup\u003eTwo of the three false-negatives (\u003cem\u003eBRCA2\u003c/em\u003e c.441A\u0026thinsp;\u0026gt;\u0026thinsp;G and c.451G\u0026thinsp;\u0026gt;\u0026thinsp;A) were located outside the donor/acceptor motifs and led to (multi-)exon skipping, indicating effect on splicing regulatory elements; the remaining false-negative (\u003cem\u003ePALB2\u003c/em\u003e c.3113\u0026thinsp;+\u0026thinsp;3A\u0026thinsp;\u0026gt;\u0026thinsp;G) resulted in donor loss leading to cryptic donor activation and exon skipping. 
Correlation between SpliceAI score and level of aberration: τ\u0026thinsp;=\u0026thinsp;0.66, p\u0026thinsp;=\u0026thinsp;3.2e-72.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eFor variants outside the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions, further stratification of the bins for scores\u0026thinsp;\u0026ge;\u0026thinsp;0.2 indicated that, based on similarity of proportions in the spliceogenic set for different bin strata, there was slightly improved correlation between SpliceAI score and level of aberration when scores were binned as: \u0026ge;\u0026thinsp;0.2 \u0026amp; \u0026lt;\u0026thinsp;0.3; \u0026ge;\u0026thinsp;0.3 \u0026amp; \u0026lt;\u0026thinsp;0.4; \u0026ge;\u0026thinsp;0.4 \u0026amp; \u0026lt;\u0026thinsp;0.75; \u0026ge;\u0026thinsp;0.75 (τ\u0026thinsp;=\u0026thinsp;0.66, p\u0026thinsp;=\u0026thinsp;1.4e-73; Additional file 1: Table S9).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eCalibration of SpliceAI for refined ACMG/AMP computational code application using quantitative traditional minigene assay data\u003c/h2\u003e \u003cp\u003eThe ClinGen SVI Splicing Subgroup [7] previously performed a calibration to demonstrate the utility of SpliceAI to provide evidence towards or against spliceogenicity for variants located outside of the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions, providing the basis for selecting score thresholds for conservatively assigning computational codes BP4 and PP3. 
However, this previous analysis only considered whether a variant may result in aberrant splicing, since information about the level of the variant-induced event/s was either lacking or measured heterogeneously across the different studies forming the reference datasets.\u003c/p\u003e \u003cp\u003eBuilding on the findings above (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e; Additional file 1: Tables S8 and S9), we performed LR analysis to estimate the evidence strength applicable for SpliceAI score thresholds to distinguish different ranges of splicing aberration levels. Results are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. Using \u0026ldquo;no aberration\u0026rdquo; as the negative truthset, and thresholds previously shown to yield moderate evidence towards or against spliceogenicity [7], the LR for a SpliceAI score threshold of \u0026ge;\u0026thinsp;0.2 for predicted aberrant splicing equated to moderate evidence for spliceogenicity (low to complete aberration), score\u0026thinsp;\u0026gt;\u0026thinsp;0.1 and \u0026lt;\u0026thinsp;0.2 equated to indeterminate evidence, but score\u0026thinsp;\u0026le;\u0026thinsp;0.1 equated to only supporting evidence for no aberration. Given that these semi-quantitative data would be expected to place previously undetected \u0026ldquo;low\u0026rdquo; level events in the positive reference set, we estimated LRs using alternative score thresholds and impact-level groups. Altering the lower threshold to \u0026le;\u0026thinsp;0.05 gave an LR equating to moderate evidence for no versus low-to-complete aberration, with \u0026gt;\u0026thinsp;0.05 and \u0026lt;\u0026thinsp;0.2 providing no evidence. 
Next, because available data indicate that low-level aberrant splicing (\u0026le;\u0026thinsp;20% expression) is very unlikely to confer pathogenicity, at least in the context of hereditary cancer genes [37, 38], whereas intermediate-level aberrant splicing is sufficient to confer pathogenicity for at least some genes [39, 40], we compared no/low aberration as the negative reference set versus intermediate-to-complete aberration as the positive reference set. Our analysis showed that a SpliceAI score threshold of \u0026ge;\u0026thinsp;0.3 provides moderate evidence for intermediate-to-complete aberration, and a score\u0026thinsp;\u0026le;\u0026thinsp;0.2 provides moderate evidence for no/low aberration. These thresholds showed 82% sensitivity and 86% specificity for predicting intermediate-to-complete aberration due to variants located outside the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions. For genes where lower levels of aberrant splicing (e.g. 10\u0026ndash;20% expression) are proven to be associated with disease predisposition, other thresholds may be more appropriate for LR estimation and for application of bioinformatic evidence of disease-associated splicing aberrations.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eLikelihood ratio analysis of the SpliceAI max delta score for variants outside the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"9\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" 
colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eSpliceAI max delta score\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e \u003cp\u003eNegative set\u003csup\u003e\u003cem\u003ea\u003c/em\u003e\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e \u003cp\u003ePositive set\u003csup\u003e\u003cem\u003ea\u003c/em\u003e\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eLR\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eLow CI\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eHigh CI\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eEvidence strength\u003csup\u003e\u003cem\u003eb\u003c/em\u003e\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003en\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eProportion\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e 
\u003cp\u003en\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eProportion\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"9\" nameend=\"c9\" namest=\"c1\"\u003e \u003cp\u003eno aberration vs. low-to-complete aberration using score thresholds from previous calibration\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026le;\u0026thinsp;0.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e198\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eSupporting (no aberration)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;0.1 \u0026amp; \u0026lt;0.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c7\"\u003e \u003cp\u003e0.63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e1.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eIndeterminate\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026ge;\u0026thinsp;0.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e184\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e7.26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e4.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e10.72\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eModerate (low-to-complete aberration)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTOTAL =\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e252\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e266\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"9\" 
nameend=\"c9\" namest=\"c1\"\u003e \u003cp\u003eno aberration vs. low-to-complete aberration, using score thresholds set for this semiquantitative dataset\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026le;\u0026thinsp;0.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e167\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.66\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e36\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.20\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eModerate (no aberration)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;0.05 \u0026amp; \u0026lt;0.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e46\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.17\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.71\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.51\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e1.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eIndeterminate\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026ge;\u0026thinsp;0.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e184\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.69\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e7.26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e4.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e10.72\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eModerate (low-to-complete aberration)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTOTAL\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e252\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e266\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"9\" nameend=\"c9\" namest=\"c1\"\u003e \u003cp\u003eno/low aberration vs. 
intermediate-to-complete aberration, using score thresholds set for this semiquantitative dataset\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026le;\u0026thinsp;0.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e287\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.23\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eModerate (no/low aberration)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;0.2 \u0026amp; \u0026lt;0.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.47\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e2.62\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eIndeterminate\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026ge;\u0026thinsp;0.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e33\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e151\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.82\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e8.24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e5.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e11.47\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eModerate (intermediate-to-complete aberration)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTOTAL\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e333\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e185\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"9\"\u003eAbbreviation: LR, likelihood ratio; CI, confidence interval; n, number.\u003c/td\u003e\u003c/tr\u003e \u003ctr\u003e\u003ctd colspan=\"9\"\u003e\u003csup\u003e\u003cem\u003ea\u003c/em\u003e\u003c/sup\u003eThe negative and positive sets were adjusted 
according to the level of aberration being tested. Data drawn from traditional minigene assay dataset only.\u003c/td\u003e\u003c/tr\u003e \u003ctr\u003e\u003ctd colspan=\"9\"\u003e\u003csup\u003e\u003cem\u003eb\u003c/em\u003e\u003c/sup\u003eCriteria thresholds as defined in Tavtigian et al., 2018 [31].\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eEvaluation of SAI-10k-calc and SpliceVault performance for predicting aberration type\u003c/h2\u003e \u003cp\u003eOf the 673 variants, 600 were SNVs enabling computational splicing assessment with SAI-10k-calc (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e5\u003c/span\u003eE), which automatically predicts the transcript type based on the combination of SpliceAI scores. For this assessment, we designated the variants with intermediate-to-complete aberration as the positive control set (n\u0026thinsp;=\u0026thinsp;299), and no/low aberration as the negative control set (n\u0026thinsp;=\u0026thinsp;301). This grouping was selected based on results shown in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, where the proportion of true positives (82%) and true negatives (86%) using SpliceAI score thresholds\u0026thinsp;\u0026le;\u0026thinsp;0.2 and \u0026ge;\u0026thinsp;0.3 were markedly higher than true positives (69%) and true negatives (79%) observed using previously recommended thresholds (\u0026le;\u0026thinsp;0.1 and \u0026ge;\u0026thinsp;0.2) from calibration using non-quantitative RNA findings. Detailed descriptions of concordance and discordance between predicted and observed transcripts are provided in Additional file 1: Tables S3 and S10. 
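The true-positive and true-negative proportions quoted above follow directly from the Table 2 counts. The short sketch below recomputes them; the dictionary layout and names are illustrative assumptions, and only the counts come from the table.

```python
# Counts from Table 2. Scores in the indeterminate zone count as neither a true positive
# nor a true negative, which is why the two proportions are computed on separate bins.
GROUPINGS = {
    "previous thresholds (<=0.1, >=0.2), no vs low-to-complete": {
        "pos_above": 184, "pos_total": 266, "neg_below": 198, "neg_total": 252,
    },
    "new thresholds (<=0.2, >=0.3), no/low vs intermediate-to-complete": {
        "pos_above": 151, "pos_total": 185, "neg_below": 287, "neg_total": 333,
    },
}


def true_rates(counts):
    """Return (true-positive proportion, true-negative proportion) for one grouping."""
    sensitivity = counts["pos_above"] / counts["pos_total"]
    specificity = counts["neg_below"] / counts["neg_total"]
    return sensitivity, specificity


for name, counts in GROUPINGS.items():
    sens, spec = true_rates(counts)
    print(f"{name}: true positives {sens:.0%}, true negatives {spec:.0%}")
```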
SAI-10k-calc aberrant transcript prediction for SNVs was 83% (496/600) concordant with minigene assay results (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e5\u003c/span\u003eF). Another 8% (50/600) of the SNVs were flagged by SAI-10k-calc and underwent manual inspection using SpliceAI-visual [27] to obtain the final transcript prediction (see Methods). SAI-10k-calc transcript prediction with additional SpliceAI-visual manual inspection achieved a slightly greater overall concordance of 84% (506/600) (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e5\u003c/span\u003eF); concordance was 79% (235/299) for positive prediction of aberration (as defined above, at least intermediate (\u0026gt;\u0026thinsp;20%) expression level), and 90% (271/301) for negative prediction of aberration (no/low aberration). For 18 of the 27 variants classified as discordant or partially concordant, where the predicted aberrant transcript was absent or expressed at a low level and other dominant transcript/s were detected in the assay (coded as 3\u0026thinsp;+ and 4\u0026thinsp;+ in Additional file 1: Table S10), the predicted and observed transcripts had the same functional consequence (PTC or in-frame). Counting these additional 18 variants as concordant in functional consequence, SAI-10k-calc overall concordance increased to 87% (524/600) (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e5\u003c/span\u003eF).\u003c/p\u003e \u003cp\u003eWe also assessed the concordance of SpliceVault [28] Top-4 alternative splicing events with minigene assay results (Additional file 1: Table S3). The SpliceVault VEP plugin returned Top-4 events for 39% (261/673) of variants, including 242 SNVs and 19 indels (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e5\u003c/span\u003eE); of these, 254 variants were located in the splice donor and acceptor motifs. 
Since SpliceVault predicts the alternative splicing events including effect on reading frame chiefly when the native splice site motif is abrogated, we calculated the concordance of SpliceVault Top-4 using 291 positive controls (257 SNVs, 34 indels) located in the splice donor and acceptor motifs only. SpliceVault Top-4 events matched with at least one transcript with \u0026gt;\u0026thinsp;20% expression for 75% (192/257) of the positive control SNVs (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e5\u003c/span\u003eG) and 50% (17/34) of the indels. All variants returning SpliceVault predictions had SpliceAI score\u0026thinsp;\u0026ge;\u0026thinsp;0.2. For the same set of 257 positive control SNVs, we observed greater concordance for SAI-10k-calc alone (79%) and SAI-10k-calc with SpliceAI-visual (82%) (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e5\u003c/span\u003eG).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eComparison of minigene and patient RNA results\u003c/h2\u003e \u003cp\u003eOur literature search identified 114 variants in the traditional minigene assay dataset with published patient RNA splicing assay results (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e6\u003c/span\u003eA), of which 91 variants demonstrated impact on splicing in patient assays (Additional file 1: Table S11) and 23 variants did not (Additional file 1: Table S12). Patient assays were conducted without NMD inhibition for 55%, only with NMD inhibition for 28%, and with or without NMD inhibition across different studies for 17%. For the 34 variants assayed in multiple patient RNA studies, 47% showed complete concordance, 32% high concordance, 12% low concordance (consistent outcomes in only a subset of assays), and 9% no concordance across all assays (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e6\u003c/span\u003eB). 
Variability across patient RNA results for the same variant could be ascribed to methodological issues, in particular failure to detect transcripts due to lack of NMD inhibition or detection limits (Additional file 1: Table S11). Patient genomic context, such as common polymorphisms, may also cause upregulation of naturally occurring alternative transcripts, which could contribute to variability in patient RNA results.\u003c/p\u003e \u003cp\u003eFor variants shown to affect splicing in patient RNA assays, comparison of minigene and patient-derived RNA results revealed a high level of agreement across all seven genes (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e6\u003c/span\u003eC, Additional file 1: Table S11). Overall, 89% of variants had concordant splicing outcomes (45% complete concordance, and 44% high concordance where the predominant transcripts agreed across the two systems). The remaining 11% were discordant, comprising 8% with low concordance (where agreement was limited to minor transcripts only) and 3% with no concordance.\u003c/p\u003e \u003cp\u003eAmong the 40 highly concordant cases, two involved a single aberrant minigene transcript matching the most abundant patient-derived transcript, and 38 had a single patient-derived transcript matching the most abundant transcript detected in the minigene assay. For the latter 38, the minigene assay revealed additional transcripts that were absent from the patient RNA assays, expressed at either low (\u0026le;\u0026thinsp;20% of the total RNA pool, n\u0026thinsp;=\u0026thinsp;30) or moderate (23\u0026ndash;48%, n\u0026thinsp;=\u0026thinsp;8) levels.\u003c/p\u003e \u003cp\u003ePossible explanations for the observed discrepancies between the minigene and patient findings are detailed in Additional file 1: Table S11. 
In general, the splicing outcome pattern likely reflects the greater sensitivity of the minigene semi-quantification method in detecting low-abundance splice products, which are probably missed in patient RNA assays using agarose gel electrophoresis and Sanger sequencing. In addition to the detection limit, agarose gel lacks the resolution to distinguish transcripts that differ by only a few nucleotides from the FL transcript, so such transcripts may not be flagged for further characterisation. Moreover, most of the patient RNA assays were conducted without NMD inhibition, which likely contributed to the absence of certain PTC-containing transcripts that were detected in the minigene assay under NMD-inhibited conditions. Other possible explanations include differences in the location of primers, alternative splicing specific to leukocytes or lymphoblastoid cell lines from patients, minigene construct design limitations (e.g. removal of gene context important to splicing), and use of different techniques and reagents.\u003c/p\u003e \u003cp\u003eFor \u003cem\u003eBRCA2\u003c/em\u003e, where comprehensive splicing assay data were available through our in-house database, assay outcomes indicating no aberration also demonstrated high agreement with patient RNA data. Of the 23 \u003cem\u003eBRCA2\u003c/em\u003e variants with no aberration in patient RNA assays (Additional file 1: Table S12), 22 (96%) showed no/low aberration in the minigene assay. One variant (\u003cem\u003eBRCA2\u003c/em\u003e c.441A\u0026thinsp;\u0026gt;\u0026thinsp;G) showed complete splicing impact (100% expression of PTC-containing transcripts) in the minigene assay but no detectable impact in the assay of patient RNA from blood without NMD inhibition [9]. This variant has a low SpliceAI score of 0.04; however, it was previously identified to lie within an exonic splicing enhancer (ESE) motif based on microdeletion analysis [41]. 
In addition to possible degradation of PTC-containing transcripts in the patient sample, differences in splicing factor activity between the minigene system and blood may explain the discordance.\u003c/p\u003e \u003cp\u003eThese findings indicate that the minigene assay reliably recapitulates patient-derived splicing patterns in most cases, with discrepancies occurring primarily in low-abundance splicing events.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003eClinical calibration to derive evidence strength applicable for construct-based RNA results\u003c/h2\u003e \u003cp\u003eClinical calibration of the collated minigene dataset showed that high (\u0026ge;\u0026thinsp;80%) or low (\u0026le;\u0026thinsp;20%) expression of transcripts with a predicted LOF consequence provides strong evidence towards or against pathogenicity, respectively (Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). Evidence strength was unchanged after exclusion of higher-scoring variants at splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions (Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). Given the far lower confidence in the evidence strength estimates for the remaining two categories (\u0026gt;\u0026thinsp;20% and \u0026lt;\u0026thinsp;50%; \u0026ge;\u0026thinsp;50% and \u0026lt;\u0026thinsp;80%), we suggest applying no evidence for these categories based on the current findings. 
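As a worked illustration of the calibration in Table 3, the sketch below recomputes each LR with a 95% interval from the tabulated counts (first block, including the dinucleotide positions). The interval method is not stated in this section; the standard log-method interval for a ratio of two proportions is assumed here, and it appears to reproduce the tabulated bounds to two decimals. The function names and the "CI spans 1" check are hypothetical conveniences for reading the suggested downgrade to no evidence, not the authors' stated rule.

```python
import math


def lr_with_ci(pos_n, pos_total, neg_n, neg_total, z=1.96):
    """LR = (pos_n/pos_total) / (neg_n/neg_total), with an assumed log-method 95% interval."""
    if pos_n == 0 or neg_n == 0:
        # A zero cell (as in the second block of Table 3) would need a continuity correction.
        raise ValueError("zero count: log-method interval is undefined")
    lr = (pos_n / pos_total) / (neg_n / neg_total)
    se_log = math.sqrt(1 / pos_n - 1 / pos_total + 1 / neg_n - 1 / neg_total)
    margin = math.exp(z * se_log)
    return lr, lr / margin, lr * margin


def ci_spans_one(low, high):
    """True when the interval includes LR = 1, i.e. the direction of evidence is uncertain."""
    return low < 1.0 < high


# (% predicted LOF transcripts, n in B/LB set, n in P/LP set), Table 3 first block
B_TOTAL, P_TOTAL = 83, 166
ROWS = [("<=20", 76, 1), (">20 & <50", 5, 2), (">=50 & <80", 1, 7), (">=80", 1, 156)]

for label, b_n, p_n in ROWS:
    lr, low, high = lr_with_ci(p_n, P_TOTAL, b_n, B_TOTAL)
    note = "CI spans 1" if ci_spans_one(low, high) else "CI excludes 1"
    print(f"{label:>10}: LR = {lr:.2f} (95% CI {low:.2f} to {high:.2f}), {note}")
```

Under this assumption, the two intermediate categories are the only ones whose intervals include 1, which is consistent with assigning them no evidence.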
Sensitivity analysis limiting the B/LB reference set to variants with at least one B classification, and the P/LP reference set to variants with at least one P classification, did not alter findings (data not shown).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eClinical calibration of traditional minigene assay results considering levels of predicted LOF transcripts\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"11\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003ePercentage of predicted LOF transcripts\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e \u003cp\u003eB/LB reference 
set\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e \u003cp\u003eP/LP reference set\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eLR\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eLow CI\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eHigh CI\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eEvidence strength based on LR\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eSuggested evidence strength considering CI\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c11\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003ePoints\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003en\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eProportion\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003en\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eProportion\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"11\" nameend=\"c11\" namest=\"c1\"\u003e \u003cp\u003eCalibration including variants at splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026le;\u0026thinsp;20\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e76\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.92\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.05\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eBenign_strong\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003eBenign_strong\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e-4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;20 \u0026amp; \u0026lt;50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.20\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e1.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eBenign_moderate\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003eNo evidence\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026ge;\u0026thinsp;50 
\u0026amp; \u0026lt;80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e3.50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.44\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e27.98\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ePathogenic_supporting\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003eNo evidence\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026ge;\u0026thinsp;80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e156\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.94\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e78.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e11.11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e547.44\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ePathogenic_strong\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003ePathogenic_strong\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e+\u0026thinsp;4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTOTAL =\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e166\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"11\" nameend=\"c11\" namest=\"c1\"\u003e \u003cp\u003eAll reference set variants, n\u0026thinsp;=\u0026thinsp;249\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"11\" nameend=\"c11\" namest=\"c1\"\u003e \u003cp\u003eCalibration excluding variants at splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026le;\u0026thinsp;20\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e76\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.92\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eBenign_strong\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003eBenign_strong\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e-4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026gt;\u0026thinsp;20 \u0026amp; \u0026lt;50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.06\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.02\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e4.62\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003eBenign_supporting\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003eNo evidence\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026ge;\u0026thinsp;50 \u0026amp; \u0026lt;80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e2.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.19\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e45.84\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ePathogenic_supporting\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003eNo evidence\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u0026ge;\u0026thinsp;80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.01\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e27\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e80.04\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e11.39\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e562.24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ePathogenic_strong\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003ePathogenic_strong\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003e+\u0026thinsp;4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eTOTAL\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e83\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"11\"\u003eAll reference set variants, n\u0026thinsp;=\u0026thinsp;111\u003c/td\u003e\u003c/tr\u003e \u003ctr\u003e\u003ctd colspan=\"11\"\u003eAbbreviation: B, Benign; LB, Likely Benign; P, Pathogenic; LP, Likely Pathogenic; LR, likelihood ratio; CI, confidence interval; n, number.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"DISCUSSION","content":"\u003cp\u003eWe conducted a comprehensive evaluation of splicing assay datasets from MPSAs (7 datasets, 41,178 variants in total) and traditional minigene RT-PCR assays (14 selected datasets, 673 variants in total), and summarise below how the findings may inform recommendations for the application of construct-based splicing assay and bioinformatic prediction data for clinical variant classification.\u003c/p\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003eCaution related to use of MPSA data for clinical variant classification\u003c/h2\u003e \u003cp\u003eOur analysis of MPSA datasets ranging from 
gene-specific multiplexed assays (ParSE-seq and \u003cem\u003eTTN\u003c/em\u003e assay) to broad MPSA screens (MaPSy, Vex-seq, and MFASS) has highlighted the need for rigorous evaluation of design and performance of individual datasets to justify use (or not) of MPSA-derived RNA results in clinical variant classification. The MPSAs all lacked information on specific alternatively spliced transcripts produced by each variant assayed; this absence of detailed characterisation of aberrant transcripts in MPSAs prevents the determination of effect on the amino acid sequence, a crucial factor for predicting the ultimate functional consequence of a spliceogenic variant, and thus its likelihood to cause disease. All MPSA datasets were also limited by assay design, restricting detection to specific splicing events and thereby capturing only a subset of RNA alterations (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e1\u003c/span\u003eD; Additional file 1: Table S3). Even considering the assay design limitations, the assay scores generated by five of seven MPSA datasets (MaPSy (3 datasets), Vex-seq, and \u003cem\u003eTTN\u003c/em\u003e assay) were not able to clearly differentiate aberrant splicing events from normal splicing events, based on both overall distribution and correlation of reported splicing impact with SpliceAI score (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Notably, our scrutiny of assay data for splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide variants with high probability to impact splicing revealed that a considerable number had no observed splicing impact for the Vex-seq (24%) and MFASS (17%) datasets. In summary, all these assays exhibited significant experimental noise that contributed to the uncertainty of assay results. For Vex-seq and MaPSy, this was previously highlighted by The Critical Assessment of Genome Interpretation Consortium [42]. 
For MaPSy, this noise may be in part due to the assay normalisation. MaPSy used the variant sequence to de-multiplex and could only measure reads that mapped to normal exon inclusion. These counts were normalised to the DNA input (the amount of minigene added into the experiment), an approach that makes the strong assumption that the transfection efficiency and transcription rate of each minigene in each cell are equal.\u003c/p\u003e \u003cp\u003eIn contrast, the ParSE-seq dataset showed all the characteristics of a well-performing assay, with a bimodal distribution, expected correlation with SpliceAI score, and complete concordance between observed aberration and prediction of splicing impact for variants at the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions. This likely reflects that this \u003cem\u003eSCN5A\u003c/em\u003e gene-specific assay offered a key experimental advantage that was absent in broad MPSA screens. The ParSE-seq minigene constructs included longer flanking intronic sequences and thus are expected to have captured more intronic splicing signals than the minigene constructs used in broad MPSA screens. In addition, the study design itself allowed for assessment of assay performance: the ParSE-seq approach utilized two cell lines, one of which was physiologically relevant; validation was carried out for selected variants using orthogonal methods; the clinical validity of the assay was evaluated by including ClinVar P/LP and B/LB variants as validation controls. The ClinVar controls formed the basis for clinical calibration of the assay according to the framework developed by Brnich et al. (2019) [43], reaching a strong level of evidence towards or against pathogenicity. However, we note that the subsequent application of RNA-based evidence for clinical variant classification as part of the study by O\u0026rsquo;Neill et al. 
(2024) [18] did not openly consider aberration type and effect on reading frame, and assumed that all variants that induced abnormal splicing had deleterious biological consequences. Characterisation of the aberrant transcript(s) produced by each variant assayed was not included in the published dataset to confirm this assumption.\u003c/p\u003e \u003cp\u003eGiven the poor performance of specific MPSA datasets, any inferences from these data (including benchmarking) should be viewed with caution. For example, AlphaGenome used the MFASS dataset to benchmark its algorithm against SpliceAI and other splicing predictors [44], and demonstrated poor performance for all predictors (AUC\u0026thinsp;\u0026lt;\u0026thinsp;0.54), in line with our findings.\u003c/p\u003e \u003cdiv id=\"Sec23\" class=\"Section3\"\u003e \u003ch2\u003eValue of selected traditional minigene data for informing clinical variant classification\u003c/h2\u003e \u003cp\u003eIn contrast to the findings from our evaluation of MPSA datasets, we illustrate that strategically selected traditional minigene studies generating semi-quantitative RT-PCR data provide information that can benefit clinical variant classification - either directly as RNA evidence of variant impact or by informing the application of splicing aberration prediction data in classification algorithms.\u003c/p\u003e \u003cp\u003eFirst, we demonstrate that the selected traditional minigene assays perform very well in measuring variant impact on splicing, from comparison of the experimentally observed variant-induced splicing aberration versus those expected based on SpliceAI prediction - using a variety of score thresholds previously recommended to separate variants according to likelihood to impact splicing. 
This is unsurprising, since the studies were specifically selected because they used multi-exon construct design (ranging from three to nine exons) that delivered good experimental reproducibility due to highly standardised protocols using the same reagents; all assays were performed in a minimum of three replicates, producing results with low standard deviation.\u003c/p\u003e \u003cp\u003eSecond, constructs composed of three or more exons generally show similar or identical results to those derived from patient RNA (where these are available) when the baseline splicing pattern of the minigene construct has been confirmed to match that observed in patient-derived RNA. As a specific example, variant \u003cem\u003eBRCA2\u003c/em\u003e c.8488-1G\u0026thinsp;\u0026gt;\u0026thinsp;A showed different splicing outcomes in 2-exon [45] and 9-exon [46] minigenes, with the latter emulating results from patient material. Here, we demonstrate that results from the selected construct-based assays were highly concordant with those generated from patient RNA; the single outlier variant (complete impact for construct, no impact for patient RNA) was located within an ESE, with splice impact potentially arising due to the truncated genomic context of the construct and absence of sequence essential for regulating splicing activity in vivo. Moreover, there was considerable variation between different patient RNA results for a given variant, likely due to differences in experimental methods. Assuming that patient RNA results remain the gold-standard as the source of information for variant impact on splicing, these findings indicate that minigene assay results can provide a reliable source of information where \u0026ldquo;validation\u0026rdquo; of construct-based assay findings is achieved through faithful replication of results from \u003cem\u003ewell-designed\u003c/em\u003e patient RNA assays for a subset of minigene results. 
Overall, these results also justify the use of construct-based data as a means to quantify allele-specific expression of aberrant transcripts to supplement patient-derived RNA results.\u003c/p\u003e \u003cp\u003eThird, we have demonstrated through clinical calibration that multi-exon construct-based RNA results, after considering the aberration type and level, can provide strong evidence towards or against pathogenicity. Our calibration findings provide guidance, for hereditary cancer genes at least, regarding the association between level of predicted LOF transcripts and pathogenicity; the LR estimates indicate that more extreme expression level categories (\u0026le;\u0026thinsp;20% and \u0026ge;\u0026thinsp;80%) are required to confidently assign evidence against or towards pathogenicity. Although there appeared to be an ordered trend in evidence strength for bins representing increasing levels of predicted LOF transcript/s induced by a variant, statistical power was constrained by the limited number of variants for the inner bins in particular. We believe that larger studies - preferably gene-specific - are needed to provide more clarity about strength of evidence associated with \u0026ldquo;intermediate\u0026rdquo; levels of aberrant splicing.\u003c/p\u003e \u003cp\u003eImportantly, we have also shown the extensive value of minigene assay results for informing bioinformatic prediction of both level and type of aberrant splicing induced by a variant. Our results provide a baseline for selecting SpliceAI score bin thresholds that provide not only a probability that a variant will be spliceogenic, but how likely the level of splicing will be high-to-complete. This allows more nuanced estimation of the LRs for bioinformatic bins, with potential to inform gene-specific score thresholds selected based on gene-disease knowledge, to separate disease-associated levels of aberrant splicing from those that are tolerated. 
It is also possible to better utilise SpliceAI predictions by using SAI-10k-calc to predict the variant-induced aberration(s), which in our analysis demonstrated 79% concordance for positive prediction of aberration type. This approach may advance the use of such predictions beyond spliceogenicity to determine pathogenicity (or benignity) through the application of gene-specific knowledge about transcript structure and clinically important protein domains. Moreover, our results suggest that concordance in functional consequence assigned to predicted transcripts versus observed transcripts may reach as high as 87%, with no material difference in the expected clinical relevance for ~⅔ of variants assigned as partial or complete discordance from our analysis.\u003c/p\u003e \u003cp\u003eAnother important outcome of the results from our minigene analysis is their implications for practical application of the PS1 code in the context of predicted splicing impact. Namely, it is possible to infer pathogenicity for a predicted spliceogenic variant based on pathogenic classification for another variant with the same predicted impact on splicing [7]. As currently stated, the prerequisite for applying the PS1 code is as follows: the predicted event of the variant under assessment must \u0026ldquo;precisely match the predicted event of the comparison (likely) pathogenic variant (e.g., both predicted to lead to exon skipping, or both to lead to enhanced use of a cryptic splice motif)\u0026rdquo;, AND the \u0026ldquo;strength of the prediction for the variant under assessment must be of similar or higher strength than the strength of the prediction for the comparison (likely) pathogenic variant\u0026rdquo; [7]. The demonstrated good performance of SAI-10k-calc to predict aberration events provides confidence that it is possible to assess if two different variants have matching predicted events. 
Further, the observation that SpliceAI score range categories can be used to infer level of variant-induced aberration provides justification that variants falling into the same SpliceAI score bin might be considered to have similar strength of prediction for both variant type and level of aberration. We acknowledge that the actual measurement of aberration level might differ between different assay methods, but this is irrelevant for application of PS1 which is based on comparing predicted splicing impact, and relies on the fact that the comparison pathogenic variant reaches this classification due to consideration of clinical data (as is the situation for applying PS1 in the context of missense variants). Likewise, between-gene differences in the level of aberrant splicing required to confer pathogenicity are accommodated by the fact that PS1 application is restricted to variants in the same splice region of the same gene.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec24\" class=\"Section2\"\u003e \u003ch2\u003eUsing construct-based splicing assay results in variant curation\u003c/h2\u003e \u003cp\u003eThe use of RNA splicing data from traditional minigene RT-PCR assays as evidence for classification is already being implemented by some ClinGen VCEPs e.g. Criteria Specifications for BRCA1/2 v1.2, CTLA4 v1.0, LDLR v1.2, PTEN v3.1, RS1 v1.0. Moreover, the InSiGHT Hereditary Colorectal Cancer/Polyposis and ENIGMA BRCA1 and BRCA2 VCEPs specifically recognise the value of minigene assays to provide quantitative information for variants located outside of the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotide positions. RT-PCR assays using multi-exon minigene constructs can measure aberrant and naturally occurring alternative splicing spanning multiple exons, and can also characterise complex splicing aberrations and multiple transcripts arising from one variant allele. 
When well-designed, these assays can precisely define the type and level of splicing aberration, including whether the effect is complete or partial, its impact on the reading frame, the affected functional domains, and the presence of rescue transcripts from alternative splicing. That is, traditional minigene splicing assay studies offer a considerable level of detail in the characterisation of splicing products. They also provide an avenue to provide gene-specific recommendations on the minimum level of aberrant splicing that is associated with disease presentation. Altogether, a well-validated assay can generate all information necessary to assign an ACMG/AMP evidence code according to a decision flowchart proposed by the ClinGen SVI Splicing Subgroup, which allows weights up to Very Strong to be assigned based on RNA evidence for variants both inside and outside the splice donor/acceptor\u0026thinsp;\u0026plusmn;\u0026thinsp;1,2 dinucleotides [7], based on alignment with a gene-specific or generic PVS1 decision tree [47]. Importantly, this splicing-focussed flowchart weights the evidence that a spliceogenic variant may be pathogenic, downweighting from the highest PVS1 (RNA) code if assay results indicate less severe functional consequences such as incomplete splicing impact, presence of rescue transcripts, or in-frame amino acid deletion outside clinically relevant domains. For RNA results showing multiple transcripts, the recommended approach is to assign a PVS1 strength to each transcript, group transcripts with the same strength, and then apply a conservative overall PVS1 level that reflects their relative contribution to total expression (Walker et al., 2023). With respect to application of construct-based evidence in variant curation and classification, current recommendations state that results for synonymous and intronic variants outside the splice region showing no impact relative to controls can be assigned a BP7_Strong (RNA) code [7]. 
However, there has been no ClinGen general recommendation on the maximum evidence weight towards pathogenicity applicable to construct-based data in the absence of calibration of an experimental system against clinical data for proven spliceogenic and non-spliceogenic variants. The decision tree for RNA analyses proposed by Buisine and colleagues (2025) [15] allowed a maximum weight of Very Strong for minigene RT-PCR data. The ClinGen ENIGMA BRCA1 and BRCA2 VCEP (Parsons et al., 2024) recommended that the overall evidence strength applicable to minigene RT-PCR data be downweighted and not exceed PVS1_Strong (RNA) due to the artificial nature of minigene systems. Previous calibration analysis by O\u0026rsquo;Neill et al. [18] indicated that a strong level of evidence was applicable for impact on splicing for \u003cem\u003eSCN5A\u003c/em\u003e variants, as measured by their MPSA dataset. The results from the clinical calibration conducted in this study, which additionally assessed the level of predicted LOF transcripts induced by the variant, provide further justification to support application of a strong level of evidence towards or against pathogenicity (PVS1_Strong (RNA), BP7_Strong (RNA) following current code recommendations) for RNA results from well-designed multi-exon minigene experiments.\u003c/p\u003e \u003cp\u003eWe anticipate that the revisions to the ACMG/AMP guidelines, introduced in v4.0, will limit to 6 points the maximum weight applicable for a variant that leads to transcript/s predicted to undergo NMD, but the overall consideration of how to assess whether RNA results from constructs may be equivalent to those from patient material remains the same. 
Following current recommended processes, clinical calibration is not required to assign a PVS1 (RNA) code if the result is complete impact or no impact from assays using RNA from patient material, although there is high-level advice to consider adaptive weighting based on factors that may influence splicing assay results [7]. The value of clinical calibration is to set the aberration level threshold that is considered pathogenic (or benign) for variants that have partial impact on splicing, and have been classified without use of RNA data. Construct-based studies can play a significant role here in providing allele-specific quantitative information to set the aberration level threshold to assign evidence towards or against pathogenicity for different gene-disease entities.\u003c/p\u003e \u003cp\u003eFor MPSAs that do not return exact details on aberration type, assay-level calibration approaches are very important to derive an overall evidence weight for an individual study. As a first step, the LR can be calculated to assign an evidence strength towards or against spliceogenicity for each splicing impact category. However, for a true clinical calibration, several factors have to be considered: spliceogenic is not necessarily pathogenic (so some spliceogenic variants may justifiably be annotated as VUS or B/LB in ClinVar); non-spliceogenic is not necessarily benign (e.g. missense variants may justifiably be annotated as VUS or P/LP in ClinVar); ClinVar P/LP controls may have included splicing data in their classification, introducing circularity (especially important for retrospective calibration); ClinVar LB controls may not be ideal negative clinical controls since variants can reach this classification with no clinical data (e.g. variant type and/or position (BP7) and bioinformatic prediction (BP4)). 
Having performed a study-wide calibration, assay-level and variant-level prediction information should preferably be used to inform further downweighting or upweighting of individual variant results from construct-based assays. For example, if there is a possibility of a multi-exon alternative event (based on literature, SAI-10k-calc or SpliceVault), then a single-exon construct design would not be able to capture this type of aberration, and a \u0026ldquo;no-impact\u0026rdquo; result should be discounted. Nevertheless, the same assay could provide valuable information to confirm predicted single exon skipping events, or predicted absence of an RNA impact. To allow increased clinical application of MPSA data consisting solely of assay scores such as delta PSI, we recommend the use of variant location information and bioinformatic predictions to assess likelihood of complete splicing impact and also the aberration type and relationship to pathogenesis. For example, given gene-specific knowledge, an in-frame deletion within a functionally critical domain may be reasonably assigned evidence towards pathogenicity based on RNA findings, whereas an in-frame deletion outside a clinically important functional domain should not be considered evidence towards pathogenicity.\u003c/p\u003e \u003cdiv id=\"Sec25\" class=\"Section3\"\u003e \u003ch2\u003eGuidance for design and critique of construct-based assays for application in variant classification\u003c/h2\u003e \u003cp\u003eGood experimental design, data quality control, and clinical validation are essential for the application of minigene and multiplex splicing assay results in clinical variant classification. General recommendations exist for the use of multiplexed functional data for clinical variant classification, but these are not specific for RNA findings, and cover a broad range of assays primarily measuring variant effects on protein function, cell growth, or cell viability [48]. 
In contrast, a recent publication from a multidisciplinary French network includes detailed recommendations for the experimental design of minigene splicing assays for diagnostic implementation [15]. Based on these previous publications, our critique and consideration of the performance of the MPSAs reviewed and our selected minigene dataset, and our observations about reporting requirements to facilitate clinical application of RNA findings, we have compiled guidance for the application of minigene or multiplexed splicing assay results in clinical variant classification. This guidance highlights modifications necessary to account for construct design limitations, different quantification methods, and the large-scale nature of MPSAs. We provide a broad overview of the most important considerations below, and more detailed information in Additional file 1: Table S13.\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eAssay design\u003c/strong\u003e \u003cp\u003eA clear understanding of the assay design, including knowledge of the types of splicing aberration that can (and cannot) be detected, is essential for accurately interpreting assay results. Sufficient intronic sequence flanking the exons is needed to ensure correct RNA splicing. Existing recommendations for traditional minigenes are that intronic sequence at both ends of the insert must be at least 150\u0026ndash;200 bp [14, 15, 49], and this rationale holds true for MPSAs; O\u0026rsquo;Neill and colleagues (2024) [18] have demonstrated that it is possible to generate reliable MPSA results using constructs containing a single test exon with 125\u0026ndash;250 bp of flanking intronic sequences in a multiplexed assay. When designing new assays, we strongly recommend identification and formal quantification of transcripts. 
Since minigene assays do not fully preserve the physiological and genomic contexts, it is critical to assess the quality of the assay before it can be used for clinical variant classification.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eExperimental controls\u003c/strong\u003e \u003cp\u003eUse of appropriate experimental controls is fundamental for establishing assay reliability, and comparison of minigene results with pre-existing splicing assay data is important to establish confidence in assay results. Both traditional RT-PCR and multiplexed minigene assays should include:\u003c/p\u003e \u003c/p\u003e \u003cp\u003e1) WT construct control to determine baseline splicing; in the context of residual alternative splicing events, normalising the level of spliced products from variant constructs to the WT value will be necessary for accurate interpretation of results;\u003c/p\u003e \u003cp\u003e2) No-template control to assess whether sample contamination has occurred;\u003c/p\u003e \u003cp\u003e3) Positive control variant(s) known to be spliceogenic, to assess the ability to detect splicing aberration relative to WT;\u003c/p\u003e \u003cp\u003e4) Negative control variant(s) known to have no impact on splicing relative to WT (non-spliceogenic) to assess likelihood of false positives in a minigene context.\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eNote\u003c/strong\u003e \u003cp\u003eWhile results using RNA from patient-derived material are generally set as the benchmark for detecting splicing aberration types since this is the clinical standard for validation, it should be recognised that every assay comes with its caveats, as demonstrated in this study.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eReporting\u003c/strong\u003e \u003cp\u003eWe provide the following guidance for reporting the results of minigene assays, in particular MPSAs, to enhance their usability by clinicians and variant curators in diagnostic 
laboratories\u003c/p\u003e \u003c/p\u003e \u003cp\u003e1) For variant reporting, genomic coordinates should be provided and defined according to a standard genome build [1], and the HGVS c. nomenclature must be reported based on the MANE Select transcript or the clinically relevant transcript [7].\u003c/p\u003e \u003cp\u003e2) If the method is capable of characterising the spliced transcripts, we recommend reporting the HGVS r. nomenclature based on the MANE Select transcript or the clinically relevant transcript. In addition, building on previous reports by the ENIGMA (Evidence-based Network for the Interpretation of Germline Mutant Alleles) consortium, we advise use of the shorthand nomenclature adapted and extended from Colombo et al. (2014)[50] to help simplify transcript annotation and facilitate variant curation. This additional aberrant transcript nomenclature, described in detail in Additional file 1: Table S14, enables easy identification of affected exons or introns and the sizes of deleted or retained sequences. Since this nomenclature style does not indicate the exact change in the transcript nucleotide sequence, it must always be accompanied by the HGVS description. We also recommend that transcripts be annotated to indicate whether they result in LOF or not.\u003c/p\u003e \u003cp\u003e3) If the method can formally quantify transcripts, we recommend reporting the percentages of variant-induced full-length and aberrant transcripts. For MPSAs, thresholds enabling categorical annotations of splicing impact, such as \u0026ldquo;abnormal splicing\u0026rdquo;, \u0026ldquo;indeterminate\u0026rdquo;, or \u0026ldquo;normal splicing\u0026rdquo;, must be reported to facilitate the interpretation of assay scores.\u003c/p\u003e \u003cp\u003e4) Data and experimental details must be deposited on publicly accessible databases and repositories. Further information on disclosure of datasets and statistical methods for multiplexed assays is detailed in Gelman et al. 
(2019) [48].\u003c/p\u003e \u003cp\u003eWe anticipate this guidance may aid the design of new minigene splicing assays intended to have clinical application, and the quality assessment of data from existing minigene splicing assays.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"CONCLUSIONS","content":"\u003cp\u003eThere is no formal guidance on optimal assay design and validation of construct-based RNA results for application within the ACMG/AMP variant interpretation framework. We highlight that it is essential to evaluate the design limitations and performance of broad MPSA screens before considering their use in clinical variant classification, either directly or indirectly. We have shown that only one of seven MPSA datasets evaluated demonstrated performance suitable for clinical application of the RNA evidence in variant classification.\u003c/p\u003e \u003cp\u003eThese findings also raise concerns about use of MPSA datasets for benchmarking of new prediction tools. By extension, our findings have important implications for use of collated public splicing datasets in research or clinical studies. For example, the SpliceVarDB repository of splicing assay data was created to improve access to variant-specific splicing information for curators and researchers [51], and includes splicing assay data from multiple MPSA studies. Our findings underline the value of quality control assessment of any study included in such a repository.\u003c/p\u003e \u003cp\u003eWe have also shown that well-designed multi-exon minigene assays reliably recapitulate patient splicing outcomes and, when clinically calibrated, can provide strong evidence towards or against pathogenicity within the ACMG/AMP variant classification framework. 
Analysis of this quantitative construct-based data highlighted potential for a more nuanced interpretation of splicing impact prediction, by demonstrating that SpliceAI score bins predict the level of splicing impact, and SAI-10k-calc accurately predicts variant-induced aberration types and their functional consequences.\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003cdiv class=\"DefinitionList\"\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eACMG\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eAmerican College of Medical Genetics and Genomics\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eAMP\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eAssociation for Molecular Pathology\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eB/LB\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eBenign / Likely Benign\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eCE\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eCapillary Electrophoresis\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eClinVar\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eClinical Variant database\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eESE\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eExonic Splicing Enhancer\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv 
class=\"Term\"\u003e\u003cb\u003egnomAD\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eGenome Aggregation Database\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eLOF\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eLoss of Function\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eLR\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eLikelihood Ratio\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eMANE\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eMatched Annotation from NCBI and EMBL-EBI\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eMaPSy\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eMassively Parallel Splicing (Yeast-based) Assay\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eMFASS\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eMultiplexed Functional Assay of Splicing using Sort-seq\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eMPSA\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eMassively Parallel Splicing Assay\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eNMD\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eNonsense-Mediated 
Decay\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eP/LP\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003ePathogenic / Likely Pathogenic\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003ePSI\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003ePercent Spliced In\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003ePTC\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003ePremature Termination Codon\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eRNA-seq\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eRNA sequencing\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eROC\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eReceiver Operating Characteristic\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eRT-PCR\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eReverse Transcription Polymerase Chain Reaction\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003esiRNA\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eSmall Interfering RNA\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv 
class=\"Term\"\u003e\u003cb\u003eSVI\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eSequence Variant Interpretation (ClinGen subgroup)\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eVCEP\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eVariant Curation Expert Panel\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eVEP\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eVariant Effect Predictor\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eVex-seq\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eVariant Exon Sequencing\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eVUS\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eVariant of Uncertain Significance\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv class=\"DefinitionListEntry\"\u003e \u003cdiv class=\"Term\"\u003e\u003cb\u003eWT\u003c/b\u003e\u003c/div\u003e \u003cdiv class=\"Description\"\u003e \u003cp\u003eWild Type\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work exclusively used publicly available data.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they 
have no competing interests\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eABS was supported in part by an NHMRC Investigator Fellowship (APP177524). The work of D.C was supported in part by funding to QIMR Berghofer from an anonymous donor, and support from the Estate of Pamela G. Webb, in honour of William Alexander (Alec) McKay. GARW and LCW were supported by funding from the Health Research Council of New Zealand (22/187). EAV-S is supported by a grant from the Spanish Ministry of Science and Innovation, Plan Nacional de I+D+I 2023, ISCIII (ref. PI23/00047), co-funded by FEDER from Regional Development European Funds (European Union).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026apos; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eD.M.C, G.A.R.W., L.C.W. and A.B.S. conceived the study design. E.A.V-S. provided the traditional minigene dataset. Data analysis was performed by all authors. All authors read, contributed to and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe thank Michael Parsons for advice relating to \u003cem\u003eBRCA1\u0026nbsp;\u003c/em\u003eand \u003cem\u003eBRCA2\u0026nbsp;\u003c/em\u003evariant classification following VCEP specifications.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eRichards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med. 2015;17(5):405-24.\u003c/li\u003e\n\u003cli\u003eFresard L, Smail C, Ferraro NM, Teran NA, Li X, Smith KS, et al. Identification of rare-disease genes using blood transcriptome sequencing and large control cohorts. Nat Med. 
2019;25(6):911-9.\u003c/li\u003e\n\u003cli\u003eKaram R, Conner B, LaDuca H, McGoldrick K, Krempely K, Richardson ME, et al. Assessment of Diagnostic Outcomes of RNA Genetic Testing for Hereditary Cancer. JAMA Netw Open. 2019;2(10):e1913900.\u003c/li\u003e\n\u003cli\u003eYamada M, Suzuki H, Shiraishi Y, Kosaki K. Effectiveness of integrated interpretation of exome and corresponding transcriptome data for detecting splicing variants of genes associated with autosomal recessive disorders. Mol Genet Metab Rep. 2019;21:100531.\u003c/li\u003e\n\u003cli\u003eYepez VA, Gusic M, Kopajtich R, Mertes C, Smith NH, Alston CL, et al. Clinical implementation of RNA sequencing for Mendelian disease diagnostics. Genome Med. 2022;14(1):38.\u003c/li\u003e\n\u003cli\u003eJaramillo Oquendo C, Wai HA, Rich WI, Bunyan DJ, Thomas NS, Hunt D, et al. Identification of diagnostic candidates in Mendelian disorders using an RNA sequencing-centric approach. Genome Med. 2024;16(1):110.\u003c/li\u003e\n\u003cli\u003eWalker LC, Hoya M, Wiggins GAR, Lindy A, Vincent LM, Parsons MT, et al. Using the ACMG/AMP framework to capture evidence related to predicted and observed impact on splicing: Recommendations from the ClinGen SVI Splicing Subgroup. Am J Hum Genet. 2023;110(7):1046-67.\u003c/li\u003e\n\u003cli\u003eSpurdle AB, Greville-Heygate S, Antoniou AC, Brown M, Burke L, de la Hoya M, et al. Towards controlled terminology for reporting germline cancer susceptibility variants: an ENIGMA report. J Med Genet. 2019;56(6):347-57.\u003c/li\u003e\n\u003cli\u003eWai HA, Lord J, Lyon M, Gunning A, Kelly H, Cibin P, et al. Blood RNA analysis can increase clinical diagnostic rate and resolve variants of uncertain significance. Genet Med. 2020;22(6):1005-14.\u003c/li\u003e\n\u003cli\u003eLord J, Baralle D. Splicing in the Diagnosis of Rare Disease: Advances and Challenges. Front Genet. 2021;12:689892.\u003c/li\u003e\n\u003cli\u003eWai H, Douglas AGL, Baralle D. RNA splicing analysis in genomic medicine. 
Int J Biochem Cell Biol. 2019;108:61-71.\u003c/li\u003e\n\u003cli\u003eWalker LC, Whiley PJ, Houdayer C, Hansen TV, Vega A, Santamarina M, et al. Evaluation of a 5-tier scheme proposed for classification of sequence variants using bioinformatic and splicing assay data: inter-reviewer variability and promotion of minimum reporting guidelines. Hum Mutat. 2013;34(10):1424-31.\u003c/li\u003e\n\u003cli\u003eRhine CL, Neil C, Glidden DT, Cygan KJ, Fredericks AM, Wang J, et al. Future directions for high-throughput splicing assays in precision medicine. Hum Mutat. 2019;40(9):1225-34.\u003c/li\u003e\n\u003cli\u003eGaildrat P, Killian A, Martins A, Tournier I, Frebourg T, Tosi M. Use of splicing reporter minigene assay to evaluate the effect on splicing of unclassified genetic variants. Methods Mol Biol. 2010;653:249-57.\u003c/li\u003e\n\u003cli\u003eBuisine MP, Bellanne-Chantelot C, Calmels N, Vaché C, Besnard T, Cogne B, et al. RNA-based diagnostic studies in genetics: Review and guidance from a multidisciplinary French network. Eur J Hum Genet. 2025;33(10):1219-27.\u003c/li\u003e\n\u003cli\u003eAdamson SI, Zhan L, Graveley BR. Vex-seq: high-throughput identification of the impact of genetic variation on pre-mRNA splicing efficiency. Genome Biol. 2018;19(1):71.\u003c/li\u003e\n\u003cli\u003eChong R, Insigne KD, Yao D, Burghard CP, Wang J, Hsiao YE, et al. A Multiplexed Assay for Exon Recognition Reveals that an Unappreciated Fraction of Rare Genetic Variants Cause Large-Effect Splicing Disruptions. Mol Cell. 2019;73(1):183-94.e8.\u003c/li\u003e\n\u003cli\u003eO'Neill MJ, Yang T, Laudeman J, Calandranis ME, Harvey ML, Solus JF, et al. ParSE-seq: a calibrated multiplexed assay to facilitate the clinical classification of putative splice-altering variants. Nat Commun. 2024;15(1):8320.\u003c/li\u003e\n\u003cli\u003ePatel PN, Ito K, Willcox JAL, Haghighi A, Jang MY, Gorham JM, et al. Contribution of Noncanonical Splice Variants to TTN Truncating Variant Cardiomyopathy. 
Circ Genom Precis Med. 2021;14(5):e003389.\u003c/li\u003e\n\u003cli\u003eRhine CL, Neil C, Wang J, Maguire S, Buerer L, Salomon M, et al. Massively parallel reporter assays discover de novo exonic splicing mutants in paralogs of Autism genes. PLoS Genet. 2022;18(1):e1009884.\u003c/li\u003e\n\u003cli\u003eRong S, Neil CR, Welch A, Duan C, Maguire S, Meremikwu IC, et al. Large-scale functional screen identifies genetic variants with splicing effects in modern and archaic humans. Proc Natl Acad Sci U S A. 2023;120(21):e2218308120.\u003c/li\u003e\n\u003cli\u003eSoemedi R, Cygan KJ, Rhine CL, Wang J, Bulacan C, Yang J, et al. Pathogenic variants that alter protein code often disrupt splicing. Nat Genet. 2017;49(6):848-55.\u003c/li\u003e\n\u003cli\u003eJaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, et al. Predicting Splicing from Primary Sequence with Deep Learning. Cell. 2019;176(3):535-48.e24.\u003c/li\u003e\n\u003cli\u003eRobin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77.\u003c/li\u003e\n\u003cli\u003eCanson DM, Davidson AL, de la Hoya M, Parsons MT, Glubb DM, Kondrashova O, et al. SpliceAI-10k calculator for the prediction of pseudoexonization, intron retention, and exon deletion. Bioinformatics. 2023;39(4).\u003c/li\u003e\n\u003cli\u003eCanson DM, Parsons MT, Moir-Meyer G, Dumenil T, Montalban G, Lin E, et al. The SeqSplice multiplexed minigene splicing assay for characterization and quantitation of variant-induced BRCA1 and BRCA2 splice isoforms. Genome Res. 2025;35(9):2104-15.\u003c/li\u003e\n\u003cli\u003ede Sainte Agathe JM, Filser M, Isidor B, Besnard T, Gueguen P, Perrin A, et al. SpliceAI-visual: a free online tool to improve SpliceAI splicing variant interpretation. Hum Genomics. 
2023;17(1):7.\u003c/li\u003e\n\u003cli\u003eDawes R, Bournazos AM, Bryen SJ, Bommireddipalli S, Marchant RG, Joshi H, et al. SpliceVault predicts the precise nature of variant-associated mis-splicing. Nat Genet. 2023;55(2):324-32.\u003c/li\u003e\n\u003cli\u003eMcLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17(1):122.\u003c/li\u003e\n\u003cli\u003eO'Mahony DG, Ramus SJ, Southey MC, Meagher NS, Hadjisavvas A, John EM, et al. Ovarian cancer pathology characteristics as predictors of variant pathogenicity in BRCA1 and BRCA2. Br J Cancer. 2023;128(12):2283-94.\u003c/li\u003e\n\u003cli\u003eTavtigian SV, Greenblatt MS, Harrison SM, Nussbaum RL, Prabhu SA, Boucher KM, et al. Modeling the ACMG/AMP variant classification guidelines as a Bayesian classification framework. Genet Med. 2018;20(9):1054-60.\u003c/li\u003e\n\u003cli\u003eParsons MT, de la Hoya M, Richardson ME, Tudini E, Anderson M, Berkofsky-Fessler W, et al. Evidence-based recommendations for gene-specific ACMG/AMP variant classification from the ClinGen ENIGMA BRCA1 and BRCA2 Variant Curation Expert Panel. Am J Hum Genet. 2024;111(9):2044-58.\u003c/li\u003e\n\u003cli\u003eRichardson ME, Bishop MFH, Holdren MA, de la Hoya M, Spurdle AB, Tavtigian SV, et al. Specifications of the ACMG/AMP variant curation guidelines for the analysis of germline PALB2 sequence variants. Am J Hum Genet. 2025;112(10):2266-80.\u003c/li\u003e\n\u003cli\u003eIto K, Patel PN, Gorham JM, McDonough B, DePalma SR, Adler EE, et al. Identification of pathogenic gene mutations in LMNA and MYBPC3 that alter RNA splicing. Proc Natl Acad Sci U S A. 2017;114(29):7689-94.\u003c/li\u003e\n\u003cli\u003eSheth N, Roca X, Hastings ML, Roeder T, Krainer AR, Sachidanandam R. Comprehensive splice-site analysis using comparative genomics. Nucleic Acids Res. 
2006;34(14):3955-67.\u003c/li\u003e\n\u003cli\u003eMoles-Fernandez A, Domenech-Vivo J, Tenes A, Balmana J, Diez O, Gutierrez-Enriquez S. Role of Splicing Regulatory Elements and In Silico Tools Usage in the Identification of Deep Intronic Splicing Variants in Hereditary Breast/Ovarian Cancer Genes. Cancers (Basel). 2021;13(13).\u003c/li\u003e\n\u003cli\u003ede la Hoya M, Soukarieh O, López-Perolio I, Vega A, Walker LC, van Ierland Y, et al. Combined genetic and splicing analysis of BRCA1 c.[594-2A\u0026gt;C; 641A\u0026gt;G] highlights the relevance of naturally occurring in-frame transcripts for developing disease gene variant classification algorithms. Hum Mol Genet. 2016;25(11):2256-68.\u003c/li\u003e\n\u003cli\u003eThompson BA, Martins A, Spurdle AB. A review of mismatch repair gene transcripts: issues for interpretation of mRNA splicing assays. Clin Genet. 2015;87(2):100-8.\u003c/li\u003e\n\u003cli\u003eFortuno C, Llinares-Burguet I, Canson DM, de la Hoya M, Bueno-Martínez E, Sanoguera-Miralles L, et al. Exploring the role of splicing in TP53 variant pathogenicity through predictions and minigene assays. Hum Genomics. 2025;19(1):2.\u003c/li\u003e\n\u003cli\u003eMinnerop M, Kurzwelly D, Wagner H, Soehn AS, Reichbauer J, Tao F, et al. Hypomorphic mutations in POLR3A are a frequent cause of sporadic and recessive spastic ataxia. Brain. 2017;140(6):1561-78.\u003c/li\u003e\n\u003cli\u003eFraile-Bethencourt E, Valenzuela-Palomo A, Díez-Gómez B, Goina E, Acedo A, Buratti E, et al. Mis-splicing in breast cancer: identification of pathogenic BRCA2 variants by systematic minigene assays. J Pathol. 2019;248(4):409-20.\u003c/li\u003e\n\u003cli\u003eCAGI, the Critical Assessment of Genome Interpretation, establishes progress and prospects for computational genetic variant interpretation methods. Genome Biol. 2024;25(1):53.\u003c/li\u003e\n\u003cli\u003eBrnich SE, Rivera-Munoz EA, Berg JS. 
Quantifying the potential of functional evidence to reclassify variants of uncertain significance in the categorical and Bayesian interpretation frameworks. Hum Mutat. 2018;39(11):1531-41.\u003c/li\u003e\n\u003cli\u003eAvsec Ž, Latysheva N, Cheng J, Novati G, Taylor KR, Ward T, et al. Advancing regulatory variant effect prediction with AlphaGenome. Nature. 2026;649(8099):1206-18.\u003c/li\u003e\n\u003cli\u003eAcedo A, Sanz DJ, Durán M, Infante M, Pérez-Cabornero L, Miner C, et al. Comprehensive splicing functional analysis of DNA variants of the BRCA2 gene by hybrid minigenes. Breast Cancer Res. 2012;14(3):R87.\u003c/li\u003e\n\u003cli\u003eAcedo A, Hernández-Moro C, Curiel-García Á, Díez-Gómez B, Velasco EA. Functional classification of BRCA2 DNA variants by splicing assays in a large minigene with 9 exons. Hum Mutat. 2015;36(2):210-21.\u003c/li\u003e\n\u003cli\u003eAbou Tayoun AN, Pesaran T, DiStefano MT, Oza A, Rehm HL, Biesecker LG, et al. Recommendations for interpreting the loss of function PVS1 ACMG/AMP variant criterion. Hum Mutat. 2018;39(11):1517-24.\u003c/li\u003e\n\u003cli\u003eGelman H, Dines JN, Berg J, Berger AH, Brnich S, Hisama FM, et al. Recommendations for the collection and use of multiplexed functional data for clinical variant interpretation. Genome Med. 2019;11(1):85.\u003c/li\u003e\n\u003cli\u003eRiedmayr LM, Böhm S, Michalakis S, Becirovic E. Construction and Cloning of Minigenes for in vivo Analysis of Potential Splice Mutations. Bio Protoc. 2018;8(5):e2760.\u003c/li\u003e\n\u003cli\u003eColombo M, Blok MJ, Whiley P, Santamariña M, Gutiérrez-Enríquez S, Romero A, et al. Comprehensive annotation of splice junctions supports pervasive alternative splicing at the BRCA1 locus: a report from the ENIGMA consortium. Hum Mol Genet. 2014;23(14):3666-80.\u003c/li\u003e\n\u003cli\u003eSullivan PJ, Quinn JMW, Wu W, Pinese M, Cowley MJ. SpliceVarDB: A comprehensive database of experimentally validated human splicing variants. Am J Hum Genet. 
2024;111(10):2164-75.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":true,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"genome-medicine","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"Learn more about [Genome Medicine](https://genomemedicine.biomedcentral.com/)","snPcode":"13073","submissionUrl":"https://submission.springernature.com/new-submission/13073/3","title":"Genome Medicine","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"BMC/SO AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"RNA splicing, Variant interpretation, Construct-based assays, Massively parallel splicing assays (MPSA), Minigenes, SpliceAI, SAI‑10k‑calc, Clinical calibration, ACMG/AMP guidelines, Loss‑of‑function (LOF)","lastPublishedDoi":"10.21203/rs.3.rs-9081705/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9081705/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground:\u003c/h2\u003e \u003cp\u003eMinigene RT-PCR assays are widely used to assess variant impact on splicing, with increasing reports of massively parallel splicing assays (MPSAs) demonstrating potential to upscale diagnostic use of construct-based data. This study conducted a comprehensive evaluation of \u0026gt;\u0026thinsp;41,000 variants from construct-based splicing assays, to build evidence-based recommendations to support the consistent application of such assays in clinical variant interpretation.\u003c/p\u003e\u003ch2\u003eMethods:\u003c/h2\u003e \u003cp\u003eSeven MPSAs were reviewed for design limitations, and their discriminatory performance evaluated by assessing: assay score distribution; consistency of splice-impact thresholds with expectations based on SpliceAI predictions. 
A traditional minigene RT-PCR dataset comprising 673 variants from 14 studies was analysed to: assess potential for SpliceAI score to predict level of aberration; demonstrate performance of SAI-10k-calc to accurately predict variant-induced splicing events; calibrate evidence strength towards or against spliceogenicity based on SpliceAI score. Traditional minigene results were compared to patient-derived RNA results, and calibrated for evidence strength towards or against pathogenicity using assertions from ClinVar.\u003c/p\u003e\u003ch2\u003eResults:\u003c/h2\u003e \u003cp\u003eMPSA datasets lacked specific information on variant-induced transcripts and had design limitations preventing detection of some aberrant splice events. Assay scores generated by five of seven MPSAs were unable to differentiate aberrant from natural splicing events. Traditional minigene results showed high predictive agreement between predicted and observed variant-induced events: SpliceAI score of 0.285 showed 90% sensitivity and 90% specificity for separating no/low versus intermediate-to-complete aberration, and agreement was 87% for aberration type using SAI-10k-calc. Minigene and patient-derived RNA results showed high agreement (45% complete concordance, 44% high concordance for predominant transcripts). Clinical calibration of minigene results showed that high (\u0026ge;\u0026thinsp;80%) or low (\u0026le;\u0026thinsp;20%) expression of variant-induced LOF transcripts provide strong evidence towards or against pathogenicity, respectively. Evidence-based recommendations for design and critique of construct-based assays were built based on these and other findings.\u003c/p\u003e\u003ch2\u003eConclusions:\u003c/h2\u003e \u003cp\u003eMPSA screens require review of design limitations and performance evaluation before considering their suitability for clinical variant interpretation. 
Well-designed multi-exon minigene assays can provide quantitative RNA results to supplement patient-derived RNA findings, and clinical calibration justifies their use in the diagnostic setting. Altogether, these findings support more consistent application of RNA evidence in clinical practice.\u003c/p\u003e","manuscriptTitle":"Evidence-based recommendations for application of construct-based splicing data in clinical variant classification","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-04-14 15:03:27","doi":"10.21203/rs.3.rs-9081705/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2026-04-16T14:07:45+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-04-14T21:42:46+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-04-03T19:52:27+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"108933873881233943166449389454127109419","date":"2026-04-02T11:45:48+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"99270066827729678765847678720270769108","date":"2026-03-25T18:15:32+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-03-25T13:47:47+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-03-20T15:04:58+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-03-10T14:46:12+00:00","index":"","fulltext":""},{"type":"submitted","content":"Genome Medicine","date":"2026-03-10T08:51:38+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"genome-medicine","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"","sideBox":"Learn more about [Genome Medicine](https://genomemedicine.biomedcentral.com/)","snPcode":"13073","submissionUrl":"https://submission.springernature.com/new-submission/13073/3","title":"Genome Medicine","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"BMC/SO AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"4f0fd349-b3db-455f-a4ac-e3bc4de04440","owner":[],"postedDate":"April 14th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2026-05-05T09:38:29+00:00","versionOfRecord":[],"versionCreatedAt":"2026-04-14 15:03:27","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9081705","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9081705","identity":"rs-9081705","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}