False discovery rate control: Moving beyond the Benjamini–Hochberg method | Research Square

Salil Koner, Navonil Sarkar, Nilanjana Laha

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-3861673/v1

This work is licensed under a CC BY 4.0 License.

Status: Posted. Version 1 posted.

Abstract

Modern bioinformatics studies often involve numerous simultaneous statistical tests, increasing the risk of false discoveries. To control the false discovery rate (FDR), these studies typically employ the Benjamini–Hochberg (BH) method. However, the BH method tends to be overly conservative and overlooks valuable biological information carried by the structure of the data, particularly group structure. Group structures can arise when closely located genomic coordinates are functionally active and closely related because of co-regulation.
Recent statistical advances have produced modified BH methods tailored to datasets with pre-existing group structures. These methods can improve statistical power and potentially enhance scientific discovery. In this study, we illustrate the advantages of contemporary group-aware BH methods using a previously published microRNA (miRNA) dataset. For this dataset, group-aware BH methods identified a larger set of miRNAs with significantly deregulated expression (p-value < 0.05) than the traditional BH method. These new findings are supported by existing literature on miRNAs and by a related 2017 study. Our results underscore the potential of specialized BH methods for controlling the FDR in high-throughput omics studies with predefined group structures.

Subject areas: Physical sciences/Mathematics and computing/Computational science; Physical sciences/Mathematics and computing/Scientific data; Biological sciences/Computational biology and bioinformatics/Machine learning; Biological sciences/Computational biology and bioinformatics/Statistical methods

1 INTRODUCTION

High-throughput, massively parallel, genome-scale molecular profiling technologies have transformed genomic and precision medicine by making top-down molecular mapping approaches feasible. For example, RNA-seq and microarray studies routinely analyze several thousand genes and gene regulatory elements to measure mRNA expression in both normal and aberrant tissues under disease conditions [1, 2]. In genome-wide association studies (GWASs), numerous genetic variants are examined simultaneously at the population level to explore their potential associations with specific traits [3]. Statistical analysis of such data often requires the simultaneous evaluation of many hypotheses, each corresponding to one biological element, such as the expression of one gene in a microarray or RNA-seq study or a single nucleotide polymorphism in a GWAS.
Rejecting a hypothesis generally flags the corresponding biological element as relevant, indicating a statistical "discovery." However, testing many hypotheses simultaneously increases the likelihood of falsely rejecting some of them, which produces many false discoveries and calls for additional statistical methods to control them [4]. Although several strategies have been proposed to address false discoveries in multiple hypothesis testing, the false discovery rate (FDR) control method introduced by Benjamini and Hochberg [5] has become the most widely used in high-throughput bioinformatics analysis [6]. The FDR is the expected proportion of false rejections among all rejected hypotheses. For example, if the expression of 100 genes is declared significantly deregulated in a microarray study analyzed at a 5% FDR, then on average about 5 of these 100 genes are expected to be false discoveries, that is, genes whose expression is not truly deregulated. Over the past decades, various statistical methods have been proposed for controlling the FDR; however, the Benjamini–Hochberg (BH) procedure remains the most prominent tool for this purpose in bioinformatics [5]. The BH method, initially described for independent hypotheses, also controls the FDR when the hypotheses are weakly dependent [7], an assumption that generally holds in most biological applications [7]. The BH method relies solely on the p-values of the individual tests: it determines a p-value cutoff, and only the tests with p-values at or below this cutoff are rejected. Although the BH method reliably controls the FDR, it is known to be conservative when the number of hypotheses is much larger than the sample size, which is common in modern biological studies, including high-throughput studies [8, 9].
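The BH step-up rule described above can be sketched in a few lines of Python. This is an illustrative, dependency-free sketch (the function names are ours, not from any library): it sorts the m p-values, finds the largest rank k with \(p_{(k)} \le k\alpha/m\), and rejects the k smallest p-values. The companion function computes BH-adjusted p-values in the same way as R's `p.adjust(method = "BH")`, so a hypothesis is rejected at level α exactly when its adjusted p-value is at most α.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[int]:
    """Return the (0-based) indices rejected by the BH step-up procedure at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by increasing p-value
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up: keep the LARGEST rank whose p-value meets its threshold,
        # even if some smaller ranks fail the inequality.
        if pvalues[idx] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])


def bh_adjusted(pvalues: Sequence[float]) -> List[float]:
    """BH-adjusted p-values: min over ranks r >= own rank of m * p_(r) / r, capped at 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)  # largest first
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # rank of this p-value in increasing order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    p = [0.001, 0.025, 0.039, 0.035, 0.6]
    print(benjamini_hochberg(p, alpha=0.05))  # [0, 1, 2, 3]
    print(bh_adjusted(p))
```

In this toy example, 0.025 and 0.035 fail their own rank thresholds (0.02 and 0.03), yet they are still rejected because 0.039 passes at rank 4 (threshold 0.04); this is what makes BH a step-up procedure.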
The high dimensionality of the data increases noise, making it harder to distinguish weak signals from that noise. The conservativeness of the BH method can therefore hinder scientific discoveries based on modern data. This conservativeness stems from the method's inability to use available biological information about the hypotheses [6]: it treats all hypotheses as exchangeable and ignores any additional information on the dependence between them [10]. In some domains, such as bioinformatics, extra information for distinguishing the hypotheses may nevertheless be available. This study focuses on hypotheses with pre-existing group structures, which often arise naturally in biological data. For example, closely located genes are often correlated and may be co-expressed because of shared transcription programs and regulatory switches, thus forming positional clusters [11]. Several studies have shown that grouped FDR control methods, which exploit group information, have higher statistical power than the BH method for such data [6, 8, 9]. Higher statistical power means the ability to detect weaker signals, potentially increasing scientific discoveries without compromising FDR control. The grouped BH methods achieve this by reweighting the p-values to place more importance on potentially critical groups. However, such group-aware FDR methods are rarely adopted by bioinformaticians in high-throughput data analysis. Briefly, a group's criticality is determined by the proportion of true null hypotheses within it, with smaller proportions indicating greater criticality. Grouped BH methods prioritize rejections from potentially critical groups, estimating the groupwise null proportions with statistical methods such as the two-stage method (TST), the least-slope method (LSL), and likelihood-based methods.
These estimators give rise to the grouped BH methods TST-GBH, LSL-GBH, and the structure adaptive BH algorithm (SABHA), respectively [8, 9]. The TST-GBH and LSL-GBH methods were previously applied by Zhang et al. [12] to discover differentially methylated regions in the human genome and by Liu et al. [13] to discover differentially expressed unigenes during the blooming process of Asteraceae flowers. SABHA is a more general method that applies to a range of structured data and is not limited to grouped hypotheses. When the p-values are independent or weakly correlated, the grouped BH methods reportedly control the FDR at the desired level if the groupwise null proportions are reasonably well estimated [8, 9]. Apart from the grouped BH methods, an emerging statistical technique called the knockoff has also shown promise for FDR control and is flexible enough to incorporate group structures [3]. Notably, the abovementioned methods do not use groups as the unit of rejection. Instead, rejections are made at the level of individual hypotheses, and group information is used to improve statistical power. FDR control methods that reject hypotheses at the group level are also available [14, 15].

This study demonstrates the advantages of grouped BH methods using the microRNA (miRNA) data from a study by De Sarkar et al. [16] of 18 patients with gingival buccal squamous cell carcinoma (GBSCC). In that study, the authors measured the expression of several miRNAs to investigate their role in this type of oral carcinogenesis. They compared miRNA expression in histopathologically confirmed malignant tissues and healthy normal (control) tissues using one-sided paired t-tests. Because the reverse transcription polymerase chain reaction-based miRNAome panel contained 522 miRNA assays, 522 t-tests were performed.
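Section 4.2 specifies how each of these one-sided paired t-tests was set up: the direction comes from the sign of the median ΔΔCt, and the null hypothesis includes a 2-fold margin. The sketch below is our reading of that description, not the original analysis code, and all function names are hypothetical. On the log2 (ΔΔCt) scale a 2-fold change is a shift of 1, so for a candidate downregulated miRNA it tests H0: mean ΔΔCt ≤ 1 against H1: mean ΔΔCt > 1, and symmetrically with −1 for upregulation. Missing pairs are dropped (complete cases). To stay dependency-free, the Student t tail probability is evaluated through the regularized incomplete beta function; in practice, `scipy.stats.ttest_1samp(x, popmean, alternative=...)` performs the same one-sided test.

```python
import math
import statistics
from typing import Optional, Sequence, Tuple


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    tiny, eps = 1e-300, 3e-14
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 500):
        m2 = 2 * m
        # Even and odd continued-fraction coefficients for this m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor ~1: converged
            break
    return h


def _reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_sf(t: float, df: float) -> float:
    """Upper-tail probability P(T > t) of Student's t distribution."""
    upper_abs = 0.5 * _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))  # P(T > |t|)
    return upper_abs if t > 0 else 1.0 - upper_abs


def one_sided_paired_test(ddct: Sequence[Optional[float]],
                          log2_margin: float = 1.0) -> Tuple[str, float, float]:
    """Return (direction, t statistic, one-sided p-value) for one miRNA.

    ddct holds per-patient ΔΔCt values; None/NaN marks a missing pair.
    log2_margin=1.0 encodes the 2-fold margin; use 0.0 for a plain test of no change.
    """
    x = [v for v in ddct if v is not None and not math.isnan(v)]  # complete cases
    n = len(x)
    mean = statistics.fmean(x)
    se = statistics.stdev(x) / math.sqrt(n)
    if statistics.median(x) > 0:
        # Positive ΔΔCt = higher Ct in tumour = candidate DOWNregulation.
        # H0: mean ΔΔCt <= +margin  vs  H1: mean ΔΔCt > +margin
        t = (mean - log2_margin) / se
        return "down", t, t_sf(t, n - 1)
    # Otherwise candidate UPregulation.
    # H0: mean ΔΔCt >= -margin  vs  H1: mean ΔΔCt < -margin
    t = (mean + log2_margin) / se
    return "up", t, t_sf(-t, n - 1)


if __name__ == "__main__":
    # Hypothetical ΔΔCt values for one miRNA in six patients; one pair is missing.
    direction, t, p = one_sided_paired_test([3.0, 2.5, 4.0, None, 3.5, 2.0])
    print(direction, round(t, 3), p)
```

Applying this function to each of the 522 miRNAs would produce the vector of p-values that the FDR procedures then operate on.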
The BH method was used to control the FDR at 5%, and seven miRNAs with significantly deregulated expression in the malignant tissues were identified. However, some miRNAs known to be important in oral squamous cell carcinoma (OSCC), including hsa-miR-21-5p and hsa-miR-99a-3p, were borderline or did not pass the BH threshold. Some target genes of these miRNAs were found to be significantly deregulated by Singh et al. [17] in a follow-up whole-transcriptome study of a case series that substantially overlapped with the current one. This observation raised the question of whether the BH method had enough statistical power to detect miRNAs with deregulated expression in the De Sarkar et al. [16] case series. Considering the limited sample size (n = 18), the number of miRNAs analyzed (522), and their genome-wide distribution, we opted for a relatively straightforward grouping approach in this proof-of-concept analysis, based on the physical location of the miRNAs on the chromosomes. The broad assumption was that proximal miRNAs tend to be co-expressed [11]. We therefore divided the miRNAs into positional groups based on shared chromosome number, arm, and strand. We incorporated this group information into the statistical analysis via the grouped BH methods LSL-GBH, TST-GBH, and SABHA, and compared their performance with that of the BH method. Existing literature and the whole-transcriptome analysis by Singh et al. [17] were used to support the findings.

2 RESULTS

The study participants were 18 unrelated Indians aged 39–80 years with tobacco habits, with a male:female ratio of 5:4. All patients had histopathologically confirmed GBSCC, a type of OSCC prevalent in the tobacco-chewing population of South Asia. The dataset comprised ΔCt values for 522 miRNAs derived from 18 tumor–normal sample pairs from the 18 patients.
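The ΔCt and ΔΔCt quantities used throughout are defined in Section 4.1. The short sketch below makes the arithmetic concrete under the usual comparative-Ct assumption (ours, not stated in the source) that relative expression is proportional to \(2^{-\mathrm{Ct}}\), i.e., roughly 100% PCR amplification efficiency. Under that assumption, the Ct corresponding to the geometric mean expression of the three endogenous controls is simply the arithmetic mean of their Ct values, and a ΔΔCt of d corresponds to a tumour/normal fold change of \(2^{-d}\). All Ct numbers and function names are hypothetical.

```python
import statistics
from typing import Sequence


def control_ct(control_cts: Sequence[float]) -> float:
    """Ct of the geometric-mean expression of the endogenous controls.

    If expression is proportional to 2**(-Ct), the geometric mean of 2**(-c_k)
    equals 2**(-mean(c_k)), so this is just the arithmetic mean of the Ct values.
    """
    return statistics.fmean(control_cts)


def delta_ct(target_ct: float, control_cts: Sequence[float]) -> float:
    """ΔCt of one miRNA in one tissue sample."""
    return target_ct - control_ct(control_cts)


def delta_delta_ct(tumour_ct: float, tumour_controls: Sequence[float],
                   normal_ct: float, normal_controls: Sequence[float]) -> float:
    """ΔΔCt = ΔCt(tumour) - ΔCt(matched normal) for one patient."""
    return delta_ct(tumour_ct, tumour_controls) - delta_ct(normal_ct, normal_controls)


def fold_change(ddct: float) -> float:
    """Tumour/normal expression ratio implied by a ΔΔCt value."""
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical Ct values for one miRNA and three controls in one patient.
    ddct = delta_delta_ct(30.0, [24.0, 25.0, 26.0], 27.5, [24.5, 25.0, 25.5])
    print(ddct)               # 2.5 -> positive ΔΔCt, i.e. lower expression in tumour
    print(fold_change(ddct))  # about 0.177, i.e. more than 2-fold down
```

This sign convention matches Table 1, where positive ΔΔCt values are marked as downregulated (↓).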
ΔCt values serve as surrogate estimates of relative miRNA expression levels (relative to the geometric mean expression of three endogenous control genes). The data collection method was described previously [16].

2.1 Missingness

Some miRNAs were not detected in all 18 patients in the data from De Sarkar et al. [16]. For each miRNA, if one value of a tumor–normal pair fell below the detection threshold, De Sarkar et al. [16] excluded both values of that pair to reduce complexity. Consequently, some miRNA pairs were missing from the miRNA expression dataset: 59.77% of miRNAs had at least one missing pair, and 30% had more than two missing pairs. Each patient had 12 or more missing miRNA expression pairs, with the majority having fewer than 50. For further details on missingness, refer to Supplementary Section S1. The missingness may have stemmed from the limited detection sensitivity of the assay, which prevented low miRNA expression from being detected. The parallel miRNA expression assay was conducted using TLDA-A (V2) and TLDA-B (V3) cards on the 7900HT FAST Real-Time PCR system (Applied Biosystems, USA) with the TLDA flat block, which is known to have a fixed lower detection limit [16]. Although this could be one reason for the missingness, various other factors could also have contributed. Nevertheless, it is evident that the missingness in the miRNAome data was not at random (MNAR) [18]. Although De Sarkar et al. [16] imputed the missing observations using a simple median imputation strategy in their statistical analysis, we observed a sharp increase in the number of discoveries, regardless of the FDR control method, when median or mean imputation was used.
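The mechanism behind this inflation can be made concrete with a small, made-up example (the ΔΔCt values below are hypothetical, not taken from De Sarkar et al.). Filling the gaps with the median of the observed values adds points at the centre of the sample, which shrinks the sample standard deviation and increases the nominal sample size, so the t-statistic grows although no new information has been collected. For simplicity, this sketch tests against zero change rather than the 2-fold margin of Section 4.2.

```python
import math
import statistics
from typing import Optional, Sequence


def t_statistic(x: Sequence[float]) -> float:
    """One-sample t statistic for H0: mean = 0 (paired differences)."""
    return statistics.fmean(x) / (statistics.stdev(x) / math.sqrt(len(x)))


# Hypothetical ΔΔCt values for one miRNA in ten patients; None marks pairs
# dropped because one member fell below the detection limit.
observed: Sequence[Optional[float]] = [1.2, -0.4, 2.5, 0.3, 1.8, None, None, None, None, 0.9]

complete = [v for v in observed if v is not None]         # complete-case analysis
median = statistics.median(complete)
imputed = [median if v is None else v for v in observed]  # median imputation

sd_complete, sd_imputed = statistics.stdev(complete), statistics.stdev(imputed)
t_complete, t_imputed = t_statistic(complete), t_statistic(imputed)

if __name__ == "__main__":
    print(f"complete cases: n={len(complete)}, sd={sd_complete:.3f}, t={t_complete:.2f}")
    print(f"median-imputed: n={len(imputed)}, sd={sd_imputed:.3f}, t={t_imputed:.2f}")
```

Here the complete-case analysis gives sd ≈ 1.037 and t ≈ 2.48 on 5 degrees of freedom, whereas median imputation gives sd ≈ 0.773 and t ≈ 4.30 on 9 degrees of freedom, a much smaller p-value driven entirely by the imputed values.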
This is unsurprising, because imputation with the median or mean is well known to artificially decrease the sample variance, especially when the sample size is small, leading to false discoveries [19, 20]. Given our limited sample size and the complexity of measuring all the biological factors behind miRNA expression, we chose not to impute missing values and instead conducted our statistical analysis using only complete cases. Hughes et al. [18] showed that complete case analysis can be statistically valid for MNAR data.

2.2 Grouping

As mentioned previously, we grouped the miRNAs based on shared chromosome number, arm, and strand, which resulted in 67 groups after excluding groups with no members among our 522 miRNAs. Supplementary Table S1 shows the number of miRNAs in each group. Some groups were thinly populated, with only two to five members, but merging them with the group on the other arm of the same strand made only a nominal difference: only hsa-miR-486-5p ceased to be statistically significant after merging. Most groups were of small to moderate size, but three groups had more than 30 miRNAs each, located on chromosome 14 (positive strand), chromosome 19 (positive strand), and the X chromosome (negative strand); see Supplementary Table S2. Splitting these large groups into adjacent equal halves did not alter the outcome of the subsequent statistical analyses. In view of these considerations, we implemented the grouped BH methods using the original 67 groups.

Groupings are known to be statistically meaningful only when the intra-group association (association between test units within the same group) is stronger than the inter-group association (association between test units in different groups) [21]. Hu et al. [8] observed that grouped BH methods had higher statistical power than the BH method in such scenarios.
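Under the simple random effect model detailed in Section 4.3, the intra-group and inter-group correlations are simple functions of the three variance components. The sketch below, an illustration rather than the SAS PROC MIXED fit used in this study, computes both correlations from assumed variance components and checks them by simulating from the model. The variance components are hypothetical, chosen only so that the implied correlations (0.14 and 0.04) match the estimates reported in this section; they are not fitted values. The overall mean µ is omitted because it does not affect correlations.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def model_correlations(var_s: float, var_g: float, var_e: float) -> Tuple[float, float]:
    """(intra-group, inter-group) correlations implied by Y = mu + S_i + G_ig(j) + E_ij."""
    total = var_s + var_g + var_e
    return (var_s + var_g) / total, var_s / total


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n_patients: int, var_s: float, var_g: float, var_e: float,
             seed: int = 2024) -> Tuple[float, float]:
    """Empirical (intra, inter) correlations from patients simulated under the model.

    Each simulated patient has three miRNAs: miRNAs 0 and 1 share group A,
    and miRNA 2 belongs to group B.
    """
    rng = random.Random(seed)
    sd_s, sd_g, sd_e = math.sqrt(var_s), math.sqrt(var_g), math.sqrt(var_e)
    y0: List[float] = []
    y1: List[float] = []
    y2: List[float] = []
    for _ in range(n_patients):
        s = rng.gauss(0.0, sd_s)    # patient effect S_i
        g_a = rng.gauss(0.0, sd_g)  # group-A effect for this patient
        g_b = rng.gauss(0.0, sd_g)  # group-B effect for this patient
        y0.append(s + g_a + rng.gauss(0.0, sd_e))
        y1.append(s + g_a + rng.gauss(0.0, sd_e))
        y2.append(s + g_b + rng.gauss(0.0, sd_e))
    return pearson(y0, y1), pearson(y0, y2)


if __name__ == "__main__":
    components = (0.04, 0.10, 0.86)  # hypothetical sigma_s^2, sigma_g^2, sigma_e^2
    print(model_correlations(*components))  # approximately (0.14, 0.04)
    print(simulate(50_000, *components))    # close to the line above
```

The simulation only confirms the algebra; with 18 patients, the sampling variability of such correlation estimates would be far larger than in this 50,000-patient check.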
We heuristically compared the intra-group and inter-group associations between the ΔΔCt values using a widely used statistical approach, a simple random effect model [22]. This model allows the intra-group and inter-group correlations, which measure the association within and across groups, respectively, to be quantified and estimated [23, 24]. The intra-group correlation is the correlation between the ΔΔCt values of miRNAs within the same group, whereas the inter-group correlation is the correlation between the ΔΔCt values of miRNAs belonging to two different groups. The simple random effect model assumes that the inter-group and intra-group correlations are constant across all groups. Even if these assumptions are violated, the estimated correlations provide meaningful summaries of the overall strength of the inter-group and intra-group associations. Details on fitting the simple random effect model are given in Section 4.3. The estimated inter-group correlation was 0.04, indicating a weak inter-group association. In contrast, the intra-group correlation was estimated to be 0.14, noticeably larger than the inter-group correlation. Thus, the simple random effect model suggested a weaker inter-group association than intra-group association. Visual examination of the pairwise correlations between ΔΔCt values also revealed that miRNAs within the same group tended to exhibit higher correlations than those in different groups, supporting the findings of the heuristic analysis. Figure 1 illustrates two cases in which within-group correlations are overall higher than the correlations between miRNA pairs belonging to different groups. In these instances, we also observed that the ΔΔCt values of adjacent miRNAs tended to cluster, consistent with the assertion that adjacent miRNAs are co-expressed.

2.3 Significantly deregulated miRNAs

Table 1 lists the deregulated miRNAs identified by the TST-GBH, LSL-GBH, SABHA, and BH methods.
Figure 2 shows a heatmap of the ΔΔCt values of these miRNAs. As reported by Sarkar and Zhao [20], the TST-GBH method was the most liberal of the four methods, yielding the highest number of discoveries. Table 1 also lists the number of missing pairs for each identified miRNA. Notably, 50% of the data for hsa-miR-1293 were missing, which makes any conclusion about this miRNA questionable. The LSL-GBH and SABHA methods agreed with each other, with five rejections each. The BH method yielded the fewest discoveries, with only three rejections. De Sarkar et al. [16] also used the BH method but reported seven discoveries because they analyzed median-imputed data; as mentioned previously, median imputation inflates the number of discoveries for all methods. Supplementary Table S1 lists the miRNAs detected by the four FDR control methods using the median-imputed data. The grouped BH methods identified seven additional miRNAs alongside the seven miRNAs previously identified by De Sarkar et al. [16] (Table 1). Among the newly discovered miRNAs, hsa-miR-1247-5p, hsa-miR-1, hsa-miR-99a-3p, and hsa-miR-486-3p/5p were downregulated, whereas hsa-miR-21-5p and hsa-miR-147b were upregulated in GBSCC tissues. Although De Sarkar et al. [16] did not identify these miRNAs, the p-value of hsa-miR-1 was just below the BH-corrected level of

Table 1 MicroRNAs (miRNAs) with significantly deregulated expression. The miRNAs listed here were detected by at least one of the four methods tested: BH, TST-GBH (abbreviated as TST), LSL-GBH (abbreviated as LSL), and SABHA. The "Chr." column indicates the chromosome, the "Strand" column the transcribed strand, and the "Arm" column the chromosome arm of the miRNA gene. ΔΔCt values are used as surrogate estimates of differential expression for each miRNA. Because TST-GBH, LSL-GBH, and SABHA do not yield FDR-corrected p-values, raw p-values before FDR correction are reported. The arrows after the p-values indicate whether the expression of the miRNA was upregulated (↑) or downregulated (↓) (per the sign of ΔΔCt). The "Missing pairs" column shows the number of missing observation pairs for each miRNA. The "Significant tests" column indicates the methods that detected the miRNA as significantly deregulated. miRNAs marked with an asterisk were reported as deregulated by De Sarkar et al. [16]. BH, Benjamini–Hochberg; SABHA, structure adaptive Benjamini–Hochberg algorithm; TST-GBH, two-stage step-up group Benjamini–Hochberg; LSL-GBH, least-slope group Benjamini–Hochberg.

| miRNA | Chr. | Strand | Arm | ΔΔCt | p-value | Missing pairs | Significant tests (BH, TST, LSL, SABHA) |
|---|---|---|---|---|---|---|---|
| hsa-miR-133a-3p* | chr18 | - | q | 6.7 | 5.32E-05 ↓ | 0 | x x x x |
| hsa-miR-206* | chr6 | + | p | 6.0 | 1.97E-04 ↓ | 0 | x x x |
| hsa-miR-31-3p* | chr9 | - | p | -3.8 | 1.31E-04 ↑ | 2 | x x x x |
| hsa-miR-1293* | chr12 | - | q | -4.8 | 2.77E-03 ↑ | 9 | x |
| hsa-miR-1247-5p | chr14 | - | q | 2.4 | 8.93E-03 ↓ | 2 | x |
| hsa-miR-147b | chr15 | + | q | -2.1 | 5.52E-03 ↑ | 2 | x |
| hsa-miR-7-5p* | chr15 | + | q | -3.1 | 9.50E-04 ↑ | 2 | x x |
| hsa-miR-21-5p | chr17 | + | q | -2.2 | 3.67E-03 ↑ | 0 | x |
| hsa-miR-1 | chr18 | - | q | 5.2 | 9.38E-04 ↓ | 0 | x x x |
| hsa-miR-99a-3p | chr21 | + | q | 2.8 | 5.83E-03 ↓ | 0 | x |
| hsa-miR-486-3p | chr8 | - | p | 2.4 | 2.81E-03 ↓ | 0 | x |
| hsa-miR-486-5p+ | chr8 | - | p | 2.0 | 1.78E-02 ↓ | 0 | x |
| hsa-miR-31-5p* | chr9 | - | q | -3.4 | 6.18E-04 ↑ | 0 | x x x |
| hsa-miR-204-5p* | chr9 | - | q | 4.6 | 8.81E-04 ↓ | 2 | x |

2.4 Relevance of the newly discovered miRNAs

Although it would have been desirable to experimentally validate the pathways downstream of the newly detected deregulated miRNAs, such analyses were beyond the scope of the current study because the biospecimens used by De Sarkar et al. [16] had been exhausted. However, several other functional studies provide supporting evidence for the role of these newly "discovered" miRNAs in carcinogenesis.
In particular, the availability of whole-transcriptome data from Singh et al. (2017) allowed a correlative analysis, and our findings were concordant with previously published onco-miRNAome work. We used these published studies and the whole-transcriptome analysis of Singh et al. [17] to evaluate our findings. Singh et al. [17] evaluated 12 cancer tissues, 10 of which were included in the miRNAome study of De Sarkar et al. [16]. As expected, five of the seven newly detected miRNAs were deregulated in the direction opposite to that of their target genes reported by Singh et al. [17] (Table 2). This concordance supports the validity of our discoveries. Below, we summarize the relevance of these five miRNAs. The remaining two miRNAs, hsa-miR-1247-5p and hsa-miR-147b, have also been reported to exhibit oncogenic activity; however, there is limited evidence supporting their involvement in OSCC.

Table 2 Target genes of the newly discovered miRNAs with deregulated expression, as identified by Singh et al. (2017). Arrows next to the genes indicate the direction of deregulation. Information on the target genes of each miRNA was obtained from the corresponding references.

| miRNA | Deregulated target genes found by Singh et al. (2017) | References |
|---|---|---|
| hsa-miR-21-5p | MEF2C (↓), PDCD4 (↓), TIMP3 (↓), PPARA (↓), RECK (↓), SPRY1 (↓), SPRY2 (↓), THRB (↓) | [28] |
| hsa-miR-1 | FAM101B (↑), TRANK1 (↑), ZNF281 (↑), Runx2 (↑), CXCR4 (↑) | [36], [38], [53], [54] |
| hsa-miR-486-3p | FLNA (↑) | [43] |
| hsa-miR-486-5p | KIAA1199 (↑) | [38], [43], [44] |
| hsa-miR-99a-3p | BCAT1 (↑), MTHFD2 (↑), RRM2 (↑) | [38] |

Among the newly detected miRNAs, hsa-miR-21-5p (upregulated; Table 1) and hsa-miR-99a-3p (downregulated; Table 1) have been extensively studied in OSCC [25–27]. hsa-miR-21 is a ubiquitous oncogenic miRNA with prognostic potential in OSCC [28–31]. The analysis by Singh et al. [17] identified eight target genes of hsa-miR-21 (MEF2C, PDCD4, TIMP3, PPARA, RECK, SPRY1, SPRY2, and THRB) with downregulated expression, further supporting our new "discovery" (Table 2). Downregulation of these genes has been linked to various hallmarks of cancer [32]. Notably, PDCD4 is one of the main target genes of hsa-miR-21 [33]. hsa-miR-99a-3p acts as a tumor suppressor in some squamous cell carcinomas. Three oncogenes regulated by this miRNA (BCAT1, MTHFD2, and RRM2) were significantly upregulated in Singh et al. [17]; these oncogenes are associated with poor cancer prognosis [34, 35]. The seven newly detected miRNAs also included three miRNAs with potential tumor-suppressing activity: hsa-miR-1, hsa-miR-486-3p, and hsa-miR-486-5p [36, 37]. Among them, hsa-miR-1 is a well-established tumor suppressor and is considered a cancer biomarker [38]. Its target genes listed in Table 2, including Runx2 and CXCR4, showed significantly upregulated expression in the analysis by Singh et al. [17]. RUNX2 is implicated in breast cancer [39, 40], CXCR4 in thyroid cancer and osteosarcoma [41], and ZNF281 in pancreatic cancer [42]. Likewise, FLNA and KIAA1199, target genes of hsa-miR-486-3p and hsa-miR-486-5p, respectively, were upregulated in Singh et al. [17] (Table 2).
FLNA promotes cell migration in laryngeal squamous cell carcinoma [43], and KIAA1199 is a well-known promoter of invasion and metastasis in thyroid cancer [44].

3 DISCUSSION

The importance of adopting an accurate statistical approach is well recognized in high-throughput biological data analytics. Such analyses often require a large number of hypothesis tests to be conducted simultaneously, increasing the risk of false discoveries. When numerous genes, proteins, and other biomarkers are examined simultaneously, controlling the FDR allows investigators to strike a balance between identifying potentially meaningful findings (true positive discoveries) and minimizing the likelihood of spurious results. The BH method is arguably the most widely used approach for FDR control in high-throughput bioinformatics analysis [45]. However, unless the tests are independent or only weakly dependent, the BH method sacrifices statistical power because it is agnostic to any information on the association between test units [8, 9]. For example, in the miRNA study of De Sarkar et al. [16], well-known cancer-related miRNAs, such as the tumor suppressor hsa-miR-1, failed to meet the FDR cutoff of the stringent BH method. Nevertheless, a subsequent study by Singh et al. [17] revealed that several key oncogenes targeted by hsa-miR-1 were significantly upregulated in GBSCC tumors (the two studies share a substantial overlap of samples) [16, 17]. The grouped BH method is a modification of the BH method that incorporates information on group structures to improve power [8, 9]. We used grouped BH methods, incorporating a simple miRNA grouping strategy into the FDR control step, to improve statistical power and demonstrate a proof of principle. In this proof-of-principle study, we chose a dataset in which the miRNAome data were generated independently of the total transcriptome.
The sample size was smaller than the number of hypothesis tests, and data generation followed rigorous analysis techniques. The dataset of De Sarkar et al. [16] fulfilled all the criteria required for our study. In our analysis, we considered a simple group structure for miRNA expression, grouping neighboring miRNAs on the chromosome that could potentially be co-regulated. Although we acknowledge that miRNA associations can arise for different reasons, we opted for this simple approach given the limited sample size. We also recognize that the positional clustering approach adopted here is overly simplistic, but our intent was to illustrate how relevant biological relatedness can improve statistical inference. Despite this simplistic grouping strategy, the grouped BH methods detected a few additional miRNAs whose expression was speculated to be deregulated in OSCC (Table 1). These new statistical findings were supported by follow-up transcriptome results and by independent oral cancer miRNAome studies. The new detections may be attributed to the gain in statistical power from incorporating positional information on the miRNAs. However, the sensitivity of the TST-GBH method to missing data, as evidenced by the inclusion of hsa-miR-1293 in Table 1, may be of concern when the sample size is small. In the present study, the grouping of miRNAs was based on broad spatial information. However, groupings can also be based on other axes of functional or genetic relatedness (e.g., ancestral relatedness). Liu et al. [13] used a gene ontology system for group assignments. In addition, Hu et al. [8] suggested using clustering algorithms, such as k-means, to partition the test units into groups when the group structure is less apparent. However, recent studies have found that reusing the same data for clustering and downstream testing can inflate false discoveries [46, 47].
We remind readers that independence between groups is not a requirement for using grouped FDR methods. In our motivating example, the miRNAs did not function independently. However, for practitioners to fully benefit from grouped FDR methods, the association between units within groups must be stronger than that between units across groups [21]. Our heuristic analyses suggested that, for our groups, the intra-group association was stronger than the inter-group association. Grouped BH methods are also advantageous over the traditional BH method when (a) most hypotheses are null, that is, when the expression of the majority of miRNAs is not deregulated, and (b) the groups are heterogeneous in their proportions of nulls [8, 48]. Based on the findings reported by De Sarkar et al. [16], the first scenario applies to this dataset. Verifying the second scenario is challenging. However, the physical separation of our groups, coupled with stronger intra-group than inter-group associations, does not rule out heterogeneous groups in our case. Our results underscore the promise of grouped BH methods for controlling the FDR in bioinformatics studies with pre-existing group structures. However, the extrapolation of our conclusions to future datasets may be limited by our small sample size. Future experimental designs are required to ascertain the generalizability of our conclusions to different populations and types of datasets, and the replicability of our results in such datasets remains a promising avenue for further investigation. Notably, although we applied the grouped BH multiple testing correction to a miRNAome dataset, this strategy could be applied to any multiple testing scenario with pre-existing groups among the hypotheses. Controlled future analyses are needed to test such generalizability.
Of particular interest are datasets pertaining to treatment-emergent cancer subtypes, which often have small sample sizes similar to ours [49, 50]. These cancers arise from treatment-related factors, either as a consequence of the treatment itself or because of the underlying conditions being treated. Because the genes responsible for such cancer subtypes are still not fully elucidated, future studies aiming to uncover these genetic factors will require FDR control for multiple hypothesis testing. Additionally, in the advancing era of precision medicine, the importance of a systematic approach to ensuring durable therapeutic benefit with minimal side effects is increasingly recognized. Such an approach provides a new dimension of information on gene–gene relatedness, making grouped BH (GBH) methods more practical and highly important for precision discovery.

4 METHODS

Here we provide a brief overview of the data generation strategies and data preprocessing techniques used by De Sarkar et al. [16], and of the implementation of the statistical methods discussed.

4.1 miRNAome data generation using TLDA assay

We downloaded publicly available ΔΔCt data from the De Sarkar et al. study [16]. Briefly, their data generation approach was as follows. TLDA is a high-throughput quantitative PCR assay for the relative quantification of standard and low-expressing miRNAs in biospecimens [51]. De Sarkar et al. [16] followed the recommended RNA isolation protocol using the mirVana kit (Life Technologies, USA). Samples were quality-checked for RNA integrity number (RIN), and those with RIN ≥ 6.9 were selected for this analysis.
Ct, ΔCt, and ΔΔCt values were derived using the SDS and Data Assist (Life Technologies, USA) software packages. Here, Ct is the number of PCR cycles at which the product quantity reaches a defined threshold; ΔCt is the Ct of a miRNA in a tissue sample minus the Ct corresponding to the geometric mean expression of the three most stable endogenous control miRNAs in that sample; and ΔΔCt is the ΔCt of a miRNA in cancer tissue minus the ΔCt of the same miRNA in control tissue. Assayed miRNA expression was normalized and trimmed according to the recommended TLDA data analysis standard operating procedure (SOP). De Sarkar et al. [ 16 ] pruned the data down to the expression of 531 miRNAs and recommended using 522 high-quality miRNA expression values (ΔΔCt) for future analysis. 4.2 Paired t-tests A miRNA was considered up- or downregulated if its expression changed by more than 2-fold relative to the control. As described in De Sarkar et al. [ 16 ], we performed 522 one-sided paired t-tests to determine whether the expression of each miRNA was significantly deregulated. The direction of each one-sided test was determined by the sign (positive or negative) of the median ΔΔCt value. The null hypothesis of each test was that the expression of the miRNA is not upregulated (or downregulated) by more than 2-fold. 4.3 Exploratory analysis using the simple random effect model We used a simple random effect model to quantify and compare the intra-group and inter-group associations. To formalize the setup, we introduce some notation. Let \(Y_{ij}\) denote the ΔΔCt value of the jth miRNA (j = 1, …, 522) for the ith patient (i = 1, …, 18). Let g(j) be the group indicator of the jth miRNA; for example, if the jth miRNA belongs to the first group, then g(j) = 1. Because there are 67 groups, g(j) takes values between 1 and 67.
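The per-miRNA computations of Sections 4.1 and 4.2 can be sketched as below. This is a hedged illustration, not the code of De Sarkar et al. [ 16 ] or of this study. It assumes that (i) relative expression is proportional to \(2^{-\mathrm{Ct}}\), so normalizing to the geometric mean expression of the three controls amounts to subtracting the arithmetic mean of their Ct values; (ii) the fold change in tumor versus control tissue is \(2^{-\Delta\Delta\mathrm{Ct}}\), so a 2-fold change corresponds to a shift of 1 on the ΔΔCt scale; and (iii) a negative median ΔΔCt signals candidate upregulation. The function names `delta_ct` and `fold_change_test` are hypothetical, and the test uses `scipy.stats.ttest_1samp` with its `alternative` argument (SciPy 1.6 or later). Because ΔΔCt is already a tumor-minus-control difference, a one-sample t-test on the ΔΔCt values is the paired t-test.

```python
import numpy as np
from scipy import stats


def delta_ct(ct_mirna, ct_controls):
    """dCt of one miRNA in one tissue sample.

    Assumes expression is proportional to 2**(-Ct), so subtracting the arithmetic mean of
    the control Ct values normalizes to the geometric mean of the controls' expression.
    """
    return ct_mirna - np.mean(ct_controls)


def fold_change_test(ddct, fold=2.0):
    """One-sided paired t-test for a more-than-`fold` change, run on the ddCt scale.

    ddct : ddCt values (tumor dCt minus control dCt) of one miRNA, one per patient,
           with missing pairs already dropped (complete cases).
    Assumes fold change = 2**(-ddCt), so a 2-fold change is a shift of log2(2) = 1.
    Returns (direction, one-sided p-value).
    """
    ddct = np.asarray(ddct, dtype=float)
    margin = np.log2(fold)
    if np.median(ddct) < 0:
        # Candidate upregulation. H0: mean ddCt >= -margin vs. H1: mean ddCt < -margin.
        res = stats.ttest_1samp(ddct, popmean=-margin, alternative="less")
        return "up", res.pvalue
    # Candidate downregulation (a median of exactly 0 also lands here).
    # H0: mean ddCt <= margin vs. H1: mean ddCt > margin.
    res = stats.ttest_1samp(ddct, popmean=margin, alternative="greater")
    return "down", res.pvalue


# Hypothetical Ct values: one miRNA and three endogenous controls in each tissue.
dct_tumor = delta_ct(24.0, [20.0, 21.0, 22.0])    # 24.0 - 21.0 = 3.0
dct_normal = delta_ct(27.5, [20.5, 21.0, 21.5])   # 27.5 - 21.0 = 6.5
print(dct_tumor - dct_normal)                     # -3.5: about 2**3.5, i.e. ~11-fold higher in tumor

# Hypothetical ddCt values of one miRNA in six patients.
print(fold_change_test([-3.5, -2.8, -4.1, -3.0, -2.5, -3.9]))
```

The sign convention in assumption (iii) follows from assumption (ii); if a dataset reports ΔΔCt with the opposite sign, the two branches swap.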
We posit a random effect model [ 23 ] for the ΔΔCt values, \(Y_{ij} = \mu + S_i + G_{ig(j)} + E_{ij}\), where \(\mu\) is the overall mean, \(S_i\) is the random effect of the ith patient, \(G_{ig(j)}\) is the random effect of group g(j) for the ith patient, and \(E_{ij}\) is the random error for the ith patient and the jth miRNA. The random error can represent independent measurement errors or other underlying biological factors. We assume that the collections \(\{S_i : 1 \le i \le 18\}\), \(\{G_{ig} : 1 \le i \le 18,\ 1 \le g \le 67\}\), and \(\{E_{ij} : 1 \le i \le 18,\ 1 \le j \le 522\}\) are independent of each other. We also assume that \(S_1, \dots, S_{18}\) are independent and identically distributed (i.i.d.) centered Gaussian random variables with variance \(\sigma_s^2\), that the \(G_{ig}\)'s are i.i.d. centered Gaussian random variables with variance \(\sigma_g^2\), and that the \(E_{ij}\)'s are i.i.d. centered Gaussian random variables with variance \(\sigma_e^2\). Under this model, the correlation between the ΔΔCt values of two miRNAs that belong to different groups, measured in the same patient, is the same for every such pair; it is therefore a natural candidate for the inter-group correlation. Two such values share only the patient effect \(S_i\), so their covariance is \(\sigma_s^2\), while each has variance \(\sigma_s^2 + \sigma_g^2 + \sigma_e^2\); hence the inter-group correlation equals \(\sigma_s^2 / (\sigma_s^2 + \sigma_g^2 + \sigma_e^2)\). Likewise, the ΔΔCt values of two miRNAs in the same group, measured in the same patient, share both \(S_i\) and \(G_{ig(j)}\), so their correlation is the same across patients and miRNAs and is a natural candidate for the intra-group correlation: \((\sigma_s^2 + \sigma_g^2) / (\sigma_s^2 + \sigma_g^2 + \sigma_e^2)\). We fit this random effect model using SAS PROC MIXED [ 52 ]. 4.4 Implementation of BH and grouped BH methods The BH method was implemented using the stats package in R.
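For readers who want to see how the grouped procedures differ from BH, here is a minimal Python sketch; it is not the code used in this study. `bh_reject` applies the BH step-up rule, which rejects the same hypotheses as keeping those whose `p.adjust(p, method = "BH")` values in R's stats package are at most α. `tst_gbh_reject` is our reading of the adaptive two-stage grouped BH (TST-GBH) procedure of Hu et al. [ 8 ]: within each group g, BH at level α/(1 + α) gives \(r_g\) rejections and the null-proportion estimate \(\hat\pi_{g,0} = (n_g - r_g)/n_g\); each p-value is multiplied by the weight \(\hat\pi_{g,0}/(1 - \hat\pi_{g,0})\) (infinite when \(\hat\pi_{g,0} = 1\)); and BH is applied to the pooled weighted p-values at level \(\alpha/(1 - \hat\pi_0)\), where \(\hat\pi_0\) is the size-weighted average of the group estimates. Both function names are hypothetical, the LSL estimator and SABHA are not implemented, and edge cases should be checked against [ 8 ] before any real use.

```python
import numpy as np


def bh_reject(p, alpha):
    """Benjamini-Hochberg step-up rule; returns a boolean mask of rejected hypotheses."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m   # p_(i) <= i * alpha / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()    # largest sorted index meeting its threshold
        reject[order[: k + 1]] = True     # reject the k + 1 smallest p-values
    return reject


def tst_gbh_reject(p, groups, alpha=0.05):
    """Sketch of adaptive TST-GBH (our reading of Hu et al.); returns a rejection mask."""
    p = np.asarray(p, dtype=float)
    groups = np.asarray(groups)
    pi0_hat = np.empty(p.size)            # each hypothesis gets its group's null-proportion estimate
    for g in np.unique(groups):
        idx = groups == g
        n_g = idx.sum()
        r_g = bh_reject(p[idx], alpha / (1 + alpha)).sum()   # stage 1: BH within the group
        pi0_hat[idx] = (n_g - r_g) / n_g
    pi0_overall = pi0_hat.mean()          # equals sum_g n_g * pi0_g / N
    if pi0_overall >= 1:                  # every group looks fully null: reject nothing
        return np.zeros(p.size, dtype=bool)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Weight p-values by pi0_g / (1 - pi0_g); groups estimated fully null get +inf.
        p_weighted = np.where(pi0_hat < 1, pi0_hat / (1 - pi0_hat) * p, np.inf)
    return bh_reject(p_weighted, alpha / (1 - pi0_overall))   # stage 2: pooled weighted BH


# Hypothetical p-values: group 0 looks signal-rich, group 1 looks null.
p = np.array([0.001, 0.002, 0.003, 0.004, 0.03, 0.5, 0.6, 0.7, 0.8, 0.9])
groups = np.array([0] * 5 + [1] * 5)
print(bh_reject(p, 0.05).sum())                # 4
print(tst_gbh_reject(p, groups, 0.05).sum())   # 5
```

In this toy example, the fifth p-value in the signal-rich group (0.03) is rejected by TST-GBH but not by BH, illustrating how borrowing strength within groups can add discoveries. The example is deliberately extreme: every hypothesis in group 0 passes the first-stage BH, so its estimated null proportion is 0 and all of its weighted p-values become 0.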
The TST-GBH and LSL-GBH methods were implemented following the algorithms described by Hu et al. [ 8 ]. To implement SABHA, we used the R code provided by Li and Barber [ 9 ]. SABHA requires two tuning parameters: (1) ϵ, a lower bound on the proportion of true null hypotheses, and (2) a threshold τ, used to calibrate the individual p-value thresholds. Following Li and Barber [ 9 ], we used ϵ = 0.1 and τ = 0.5. The discoveries were not sensitive to small perturbations of either tuning parameter. Declarations ACKNOWLEDGEMENTS Nilanjana Laha's research was partially supported by the NSF-DMS grant DMS-2311098. The authors are thankful to Editage for providing editing support. AUTHOR CONTRIBUTIONS Salil Koner contributed to the statistical analyses. Navonil De Sarkar contributed to the paper's writing and conceived the idea. Nilanjana Laha supervised the project, secured funding for editing-related services, wrote the paper, and conceptualized the statistical part of the project. COMPETING INTERESTS The authors declare that there are no competing interests associated with this manuscript. DATA AVAILABILITY All data supporting the findings are publicly available in the supplementary files of De Sarkar et al. [ 16 ] (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4134240/). References Dudoit, S., Shaffer, J. P. & Boldrick, J. C. Multiple hypothesis testing in microarray experiments. Stat. Sci. 18 , 71–103 (2003). Goeman, J. J. & Solari, A. Multiple hypothesis testing in genomics. Stat. Med. 33 , 1946–1978 (2014). Sesia, M., Bates, S., Candes, E., Marchini, J. & Sabatti, C. False discovery rate control in genome-wide association studies with population structure. Proc. Natl. Acad. Sci. U.S.A. 118 , e2105841118 (2021). Menyhart, O., Weltz, B. & Gyorffy, B.
Multipletesting.com: A tool for life science researchers for multiple hypothesis testing correction. PLoS One 16 , e0245824 (2021). Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Methodol. 57 , 289–300 (1995). Korthauer, K., et al. A practical guide to methods controlling false discoveries in computational biology. Genome Biol. 20 , 118 (2019). Benjamini, Y. & Yekutieli, D. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29 , 1165–1188 (2001). Hu, J. X., Zhao, H. & Zhou, H. H. False discovery rate control with groups. J. Am. Stat. Assoc. 105 , 1215–1227 (2010). Li, A. & Barber, R. F. Multiple testing with the structure-adaptive Benjamini–Hochberg algorithm. J. R. Stat. Soc. Methodol. 81 , 45–74 (2019). Genovese, C. R., Roeder, K. & Wasserman, L. False discovery control with p-value weighting. Biometrika 93 , 509–524 (2006). Koutna, I., et al. New insights into gene positional clustering and its properties supported by large-scale analysis of various differentiation pathways. Genomics 89 , 81–88 (2007). Zhang, B., et al. Functional DNA methylation differences between tissues, cell types, and across individuals discovered using the M&M algorithm. Genome Res. 23 , 1522–1540 (2013). Liu, H., et al. Whole-transcriptome analysis of differentially expressed genes in the vegetative buds, floral buds and buds of Chrysanthemum morifolium. PLoS One 10 , e0128009 (2015). Efron, B. Simultaneous inference: When should hypothesis testing problems be combined? Ann. Appl. Stat. 2 , 197–223 (2008). Benjamini, Y. & Heller, R. False discovery rates for spatial signals. J. Am. Stat. Assoc. 102 , 1272–1281 (2007). De Sarkar, N., et al. A quest for miRNA bio-marker: a track back approach from gingivo buccal cancer to two different types of precancers. PLoS One 9 , e104839 (2014). Singh, R., et al.
Analysis of the whole transcriptome from gingivo-buccal squamous cell carcinoma reveals deregulated immune landscape and suggests targets for immunotherapy. PLoS One 12 , e0183606 (2017). Hughes, R. A., Heron, J., Sterne, J. A. C. & Tilling, K. Accounting for missing data in statistical analyses: multiple imputation is not always the answer. Int. J. Epidemiol. 48 , 1294–1304 (2019). Zhang, Z. Missing data imputation: focusing on single imputation. Ann. Transl. Med. 4 (2016). Sarkar, S. K. & Zhao, Z. Local false discovery rate based methods for multiple testing of one-way classified hypotheses. Electron. J. Stat. 16 , 6043–6085 (2022). Chu, Y., et al. miR-1247-5p functions as a tumor suppressor in human hepatocellular carcinoma by targeting Wnt3. Oncol. Rep. 38 , 343–351 (2017). Nakagawa, S., Johnson, P. C. & Schielzeth, H. The coefficient of determination R² and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. J. R. Soc. Interface 14 , 20170213 (2017). Searle, S. R., Casella, G. & McCulloch, C. E. Variance Components (John Wiley & Sons, 2009). Montgomery, D. C., Peck, E. A. & Vining, G. G. Introduction to Linear Regression Analysis (John Wiley & Sons, 2021). Troiano, G., et al. Circulating miRNAs from blood, plasma or serum as promising clinical biomarkers in oral squamous cell carcinoma: A systematic review of current findings. Oral Oncol. 63 , 30–37 (2016). Setién-Olarra, A., et al. Genomewide miRNA profiling of oral lichenoid disorders and oral squamous cell carcinoma. Oral Dis. 22 , 754–760 (2016). Chamorro Petronacci, C. M., et al. miRNAs expression of oral squamous cell carcinoma patients: Validation of two putative biomarkers. Medicine 98 (2019). Buscaglia, L. E. B. & Li, Y. Apoptosis and the target genes of microRNA-21. Chin. J. Cancer 30 , 371–380 (2011). Dioguardi, M., et al. MicroRNA-21 expression as a prognostic biomarker in oral cancer: Systematic review and meta-analysis. Int. J.
Environ. Res. Public Health 19 , 3396 (2022). Troiano, G., et al. Predictive prognostic value of tissue-based microRNA expression in oral squamous cell carcinoma: a systematic review and meta-analysis. J. Dent. Res. 97 , 759–766 (2018). McQueen, C. Comprehensive Toxicology (Elsevier, 2017). Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144 , 646–674 (2011). Jenike, A. E. & Halushka, M. K. miR-21: a non-specific biomarker of all maladies. Biomarker Res. 9 , 1–7 (2021). Okada, R., et al. Regulation of oncogenic targets by miR-99a-3p (passenger strand of miR-99a-duplex) in head and neck squamous cell carcinoma. Cells 8 , 1535 (2019). Osako, Y., et al. Potential tumor-suppressive role of microRNA-99a-3p in sunitinib-resistant renal cell carcinoma cells through the regulation of RRM2. Int. J. Oncol. 54 , 1759–1770 (2019). Khan, P., et al. MicroRNA-1: Diverse role of a small player in multiple cancers. Semin. Cell Dev. Biol. 124 , 114–126 (2022). Yang, H., et al. miR-486-3p inhibits the proliferation, migration and invasion of retinoblastoma cells by targeting ECM1. Biosci. Rep. 40 (2020). Safa, A., et al. miR-1: A comprehensive review of its role in normal development and diverse disorders. Biomed. Pharmacother. 132 , 110903 (2020). Pratap, J., et al. Regulatory roles of RUNX2 in metastatic tumor and cancer cell interactions with bone. Cancer Metastasis Rev. 25 , 589–600 (2006). Wysokinski, D., Blasiak, J. & Pawlowska, E. Role of RUNX2 in breast carcinogenesis. Int. J. Mol. Sci. 16 , 20969–20993 (2015). Li, B., et al. Epigenetic regulation of CXCL12 plays a critical role in mediating tumor progression and the immune response in osteosarcoma. Cancer Res. 78 , 3938–3953 (2018). Qian, Y., Li, J. & Xia, S. ZNF281 promotes growth and invasion of pancreatic cancer cells by activating Wnt/β-catenin signaling. Dig. Dis. Sci. 62 , 2011–2020 (2017). ElKhouly, A. M., Youness, R. & Gad, M.
MicroRNA-486-5p and microRNA-486-3p: Multifaceted pleiotropic mediators in oncological and non-oncological conditions. Non-coding RNA Res. 5 , 11–21 (2020). Jiao, X., et al. KIAA1199, a target of microRNA-486-5p, promotes papillary thyroid cancer invasion by influencing epithelial-mesenchymal transition (EMT). Med. Sci. Monit. Basic Res. 25 , 6788–6796 (2019). Haynes, W. Benjamini–Hochberg Method (Springer New York, 2013). Lähnemann, D., et al. Eleven grand challenges in single-cell data science. Genome Biol. 21 , 1–35 (2020). Gao, L. L., Bien, J. & Witten, D. Selective inference for hierarchical clustering. arXiv Preprint arXiv:2012.02936 (2020). Francois, O., Martins, H., Caye, K. & Schoville, S. D. Controlling false discoveries in genome scans for selection. Mol. Ecol. 25 , 454–469 (2016). Aggarwal, R. R., et al. Whole-genome and transcriptional analysis of treatment-emergent small-cell neuroendocrine prostate cancer demonstrates intraclass heterogeneity. Mol. Cancer Res. 17 , 1235–1240 (2019). Clermont, P. L., Ci, X., Pandha, H., Wang, Y. & Crea, F. Treatment-emergent neuroendocrine prostate cancer: molecularly driven clinical guidelines. Int. J. Endocrinol. 6 , IJE20 (2019). Wang, B., et al. Systematic evaluation of three microRNA profiling platforms: Microarray, beads array, and quantitative real-time PCR array. PLoS One 6 , e17167 (2011). SAS Institute Inc. SAS/STAT® 9.2 User’s Guide (SAS Institute Inc., 2008). Leone, V., et al. MiR-1 is a tumor suppressor in thyroid carcinogenesis targeting CCND2, CXCR4, and SDF-1α. J. Clin. Endocrinol. Metab. 96 , E1388–E1398 (2011). doi:10.1210/jc.2011-0345. Nohata, N., et al. miR-1 as a tumor suppressive microRNA targeting TAGLN2 in head and neck squamous cell carcinoma. Oncotarget 2 , 29 (2011). Genovese, C. R., Lazar, N. A. & Nichols, T. Thresholding of statistical maps in functional neuroimaging using the false discovery rate. Neuroimage 15 , 870–878 (2002). Benjamini, Y., Krieger, A. M. & Yekutieli, D.
Adaptive linear step-up procedures that control the false discovery rate. Biometrika 93 , 491–507 (2006). Additional Declarations No competing interests reported. Supplementary Files SupplementV4.pdf © Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-3861673","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":267205850,"identity":"2c2aa180-257e-4789-bf7a-de7942fce44d","order_by":0,"name":"Salil Koner","email":"","orcid":"","institution":"Duke University","correspondingAuthor":false,"prefix":"","firstName":"Salil","middleName":"","lastName":"Koner","suffix":""},{"id":267205851,"identity":"a6f5bb7b-7673-41aa-93b0-3c5faa377b45","order_by":1,"name":"Navonil Sarkar","email":"","orcid":"","institution":"Medical College of Wisconsin","correspondingAuthor":false,"prefix":"","firstName":"Navonil","middleName":"","lastName":"Sarkar","suffix":""},{"id":267205852,"identity":"597da589-ac3b-46f4-877f-476de1d8782e","order_by":2,"name":"Nilanjana
Laha","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAqUlEQVRIiWNgGAWjYDACdgY2BoYKKIeHKC3MIC1nDEjVwthGihZzZuZnD37O+yOvOyOB8cHbNiK0WDazmRv2bjMw3HYjgdlwLjFaDA4zmEnwbjNgBGphk+YlTgv7N8m/cwzsgVrYfxOphcdMmrfBIBFkCzOxWsqkZY4ZJ28787BZcs45YrQcb98m+aZGznbb8eSDH96UEaEFCTA2kKZ+FIyCUTAKRgFuAAAo4TMIjqfYyQAAAABJRU5ErkJggg==","orcid":"","institution":"Texas A\u0026M University","correspondingAuthor":true,"prefix":"","firstName":"Nilanjana","middleName":"","lastName":"Laha","suffix":""}],"badges":[],"createdAt":"2024-01-14 01:29:08","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-3861673/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-3861673/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":49729062,"identity":"7710448a-b6b1-4531-8d2e-6b2ad509ba9f","added_by":"auto","created_at":"2024-01-17 05:18:21","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":88452,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eHeatmap of the absolute correlations between the ∆∆Ct values of the miRNAs from two groups.\u003c/strong\u003e (a) The first group corresponds to the q arm of the positive strand of chromosome 9 and the second group corresponds to the q arm of the negative strand of chromosome 8. (b) The first group corresponds to the p arm of the positive strand of chromosome 12 and the second group corresponds to the miRNAs on the q arm of the positive strand of chromosome 13. The diagonal blocks in Figures 1a and 1b provide the heatmaps of the absolute correlation between the ∆∆Ct values of miRNAs within the same group, whereas the off-diagonal blocks showcase the absolute correlation between ∆∆Ct values of miRNAs from different groups. Only absolute correlations above 0.5 are presented in the heatmaps. 
The ordering of miRNAs is based on their positions on the chromosome, ensuring that adjacent miRNAs are positioned next to each other in the heatmap. The sidebars below the heatmaps provide the start coordinates of the miRNAs.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-3861673/v1/cafa2dbf99cd3d4a10e81f5c.png"},{"id":49729060,"identity":"eb3361b3-e0dd-4e58-843d-4c40c14b2fea","added_by":"auto","created_at":"2024-01-17 05:18:21","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":112902,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003e\u003cstrong\u003eHeatmap representing the ∆∆Ct values of the deregulated miRNAs in Table 1 displaying their patterns of deregulation in cancer tissue samples from 18 gingival buccal squamous cell carcinoma patients\u003c/strong\u003e\u003c/em\u003e\u003cem\u003e. The miRNAs are ordered as per unadjusted p-values obtained from the paired t-tests.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-3861673/v1/f0001bd9e0e947236cd043c2.png"},{"id":57577101,"identity":"a5305cf8-7af8-498e-86bd-57733a13dd84","added_by":"auto","created_at":"2024-06-02 20:16:38","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":960628,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-3861673/v1/b47660c4-4cbc-4118-a5da-d4dcf960f06c.pdf"},{"id":49729063,"identity":"deacb195-9a16-4dd7-aebb-c84b75d6632b","added_by":"auto","created_at":"2024-01-17 
05:18:21","extension":"pdf","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":112568,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementV4.pdf","url":"https://assets-eu.researchsquare.com/files/rs-3861673/v1/ec5376f3442bc429a6765945.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"False discovery rate control: Moving beyond the Benjamini–Hochberg method","fulltext":[{"header":"1 INTRODUCTION","content":"\u003cp\u003eHigh throughput and massively parallel genome-scale molecular profiling technologies transformed the field of genomic and precision medicine by making top-down molecular mapping approaches feasible. For example, RNA-seq or microarray studies routinely analyze several thousands of genes and gene regulatory elements to identify mRNA expression in both normal and aberrant tissues under disease conditions [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. In genome-wide association studies (GWASs), numerous genetic variants are simultaneously examined at the population level to explore their potential associations with specific traits [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Statistical analysis of such data often requires simultaneous evaluation of many hypotheses, each corresponding to one biological element, such as one gene expression in a microarray or RNA-seq study or a single nucleotide polymorphism in a GWAS. 
The rejection of a hypothesis is generally associated with biologically relevant elements, indicating a statistical \u0026ldquo;discovery.\u0026rdquo; However, testing many hypotheses simultaneously increases the likelihood of falsely rejecting some of them, which can produce a large number of false discoveries and necessitates additional statistical methods for their control [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. Although several strategies have been proposed for controlling false discoveries in multiple hypothesis-testing problems, the false discovery rate (FDR) control method introduced by Benjamini and Hochberg [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e] has become the most widely used method in high throughput bioinformatics analysis [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. The FDR is the expected proportion of false rejections among all rejected hypotheses. For example, if the expression of 100 genes is found to be significantly deregulated following the statistical analysis of a microarray study, an FDR of 5% means that, on average, 5% of such discoveries (here, 5 of the 100 genes) would be false discoveries whose expression is not truly deregulated.\u003c/p\u003e \u003cp\u003eIn the past decades, various statistical methods have been proposed for controlling the FDR. However, the Benjamini and Hochberg (BH) procedure remains the most prominent tool for this purpose in bioinformatics [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. The BH method, initially described for independent hypotheses, preserves the FDR provided the hypotheses are weakly dependent [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. This assumption generally holds for most biological applications [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e].
The BH method relies solely on the p-values of the individual hypothesis tests. It computes a data-dependent p-value cutoff, and only the hypotheses with p-values at or below this cutoff are rejected. Although the BH method reliably controls the FDR, it is known to be conservative when the number of hypotheses to be tested is much larger than the sample size, which is common in many modern biological studies, including high throughput studies [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. The high dimensionality of the data increases the noise within the data, making it harder to distinguish weak signals from the noise. The conservativeness of the BH method can therefore hinder scientific discoveries based on modern data.\u003c/p\u003e \u003cp\u003eThe conservativeness of the BH method stems from its inability to use the available biological information about the hypotheses [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. It treats all hypotheses as exchangeable, ignoring any extra information on the dependence between the hypotheses [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. However, in some domains, such as bioinformatics, extra information for distinguishing the hypotheses may be available. This study focuses on hypotheses with pre-existing group structures, which often arise naturally in biological data. For example, closely located genes are often correlated and may be co-expressed because of their shared transcription programs and regulatory switches, thus forming positional clusters [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e].
Several studies have shown that grouped FDR control methods, which exploit the group information, have higher statistical power than the BH method on such data [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. Higher statistical power means the ability to detect weaker signals, potentially increasing scientific discoveries without compromising the FDR. The grouped BH methods achieve this by reweighting the p-values to place more importance on potentially critical groups. However, such group-aware FDR methods are rarely adopted by bioinformaticians in high throughput data analysis.\u003c/p\u003e \u003cp\u003eBriefly, a group's criticality is determined by the proportion of true null hypotheses within it, with smaller proportions indicating greater criticality. Grouped BH methods prioritize rejections from potentially critical groups, estimating groupwise null proportions via statistical methods such as the two-stage method (TST), the least-slope method (LSL), and likelihood-based methods. These yield the grouped BH methods TST-GBH, LSL-GBH, and the structure-adaptive BH algorithm (SABHA), respectively [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. The TST-GBH and LSL-GBH methods were previously applied by Zhang et al. [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e] to discover differentially methylated regions in the human genome and by Liu et al. [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] to discover differentially expressed unigenes during the blooming process of Asteraceae flowers.
SABHA is a more general method that works for a range of structured data and is not limited to grouped hypotheses.\u003c/p\u003e \u003cp\u003eWhen the p-values are independent or weakly correlated, the grouped BH methods reportedly control the FDR at the desired level if group-wise null proportions are reasonably well-estimated [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. Apart from the grouped BH methods, an emerging statistical technique called knock-off has also shown promise for FDR control and is flexible enough to incorporate group structures [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Notably, the abovementioned methods do not use groups as the unit of rejection. Instead, rejections are conducted at the level of the individual hypotheses, where information on the groups is used to improve statistical power. However, FDR control methods that reject hypotheses at the group level are also available [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThis study demonstrates the advantage of grouped BH methods by utilizing the microRNA (miRNA) data from a study by De Sarkar et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] on 18 cancer patients with gingival buccal squamous cell carcinoma (GBSCC). In the study, the authors determined the expression of several miRNAs to investigate their role in this type of oral carcinogenesis. They compared the miRNA expression in histopathologically confirmed malignant and healthy normal (control) tissues using one-sided paired t-tests. As there were 522 miRNA assays in the reverse transcription polymerase chain reaction-based miRNAome panel, 522 t-tests were performed. 
The BH method was used to control the FDR at 5%, and seven miRNAs whose expression was significantly deregulated in the malignant tissues were identified. However, some miRNAs critical for oral squamous cell carcinoma (OSCC), including hsa-miR-21-5p and hsa-miR-99a-3p, were borderline or did not meet the BH threshold. Some of the target genes of these miRNAs were found to be significantly deregulated by Singh et al. [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] in a follow-up whole transcriptome study of a case series that substantially overlapped with the present one. This observation raised the question of whether the BH method had enough statistical power to detect miRNAs with deregulated expression in the De Sarkar et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] case series.\u003c/p\u003e \u003cp\u003eConsidering our limited sample size (n\u0026thinsp;=\u0026thinsp;18), the number of miRNAs analyzed (n\u0026thinsp;=\u0026thinsp;522), and their genome-wide distribution pattern, we opted for a relatively straightforward grouping approach in this proof-of-concept analysis, based on the physical location of miRNAs on the chromosome. The broad assumption was that proximal miRNAs tend to be co-expressed [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Therefore, we divided the miRNAs into positional groups based on their shared chromosome number, arm, and strand. We incorporated this group information into the statistical analysis of our data via the grouped BH methods LSL-GBH, TST-GBH, and SABHA, and compared their performance to that of the BH method. Existing literature and the whole transcriptome analysis by Singh et al.
[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] were used to support the findings.\u003c/p\u003e"},{"header":"2 RESULTS","content":"\u003cp\u003eThe study participants were 18 unrelated Indians aged 39\u0026ndash;80 years with tobacco habits, with a male:female ratio of 5:4. All patients had histopathologically confirmed GBSCC, a type of OSCC prevalent in the tobacco-chewing population in South Asia. The dataset comprised ∆Ct values for 522 miRNAs derived from 18 tumor-normal sample pairs of the 18 patients. ∆Ct values serve as surrogate estimates for relative miRNA expression levels (relative to the geometric mean expression of three endogenous control genes). The data collection method was previously described [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e].\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Missingness:\u003c/h2\u003e \u003cp\u003eSome miRNAs were not expressed in all 18 patients in the data from De Sarkar et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. For each miRNA, if one data point in the paired miRNA expression data fell below the detection threshold, De Sarkar et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] excluded both data points to reduce complexity. Consequently, some miRNA pairs were missing from the miRNA expression dataset, with 59.77% of miRNAs having at least one pair of missing observations. Moreover, 30% of miRNAs had more than two pairs of missing observations. Each patient had 12 or more missing miRNA expression pairs, with the majority having fewer than 50 missing pairs. For further details on missingness, refer to Supplementary Section S1.\u003c/p\u003e \u003cp\u003eThe missingness in the miRNA expression data could have stemmed from the detection threshold of the assay, which prevented low miRNA expression from being detected.
The parallel miRNA expression assay was conducted using TLDA-A (V2) and TLDA-B (V3) cards on the 7900HT FAST Real Time PCR system (Applied Biosystems, USA) with the TLDA flat block, which is known to have a fixed lower detection limit [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. Although this could explain part of the missingness, other factors could also have contributed. Nevertheless, missingness is driven at least in part by expression falling below the detection limit. The probability that a value is missing therefore depends on the unobserved value itself; that is, the miRNAome data were missing not at random (MNAR) [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. De Sarkar et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] imputed the missing observations using a simple median imputation strategy. However, we observed a sharp increase in the number of discoveries, regardless of the FDR control method, whenever median or mean imputation was used. This is unsurprising: imputation with the median or mean is well known to artificially decrease the sample variance, especially when the sample size is small, which inflates test statistics and leads to false discoveries [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. Given our limited sample size and the difficulty of measuring all the biological factors behind miRNA expression, we chose not to impute missing values and instead analyzed only complete cases. Hughes et al.
[\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] showed that complete case analysis can be statistically valid for MNAR data.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Grouping\u003c/h2\u003e \u003cp\u003eAs mentioned previously, we grouped the miRNAs based on shared chromosome number, arm, and strand. This produced 67 groups after excluding empty groups (groups containing none of our 522 miRNAs). Supplementary Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e shows the number of miRNAs in each group. Some groups were sparsely populated, with only two to five members. Merging them with the group on the other arm of the same chromosome and strand made only a nominal difference: only hsa-miR-486-5p ceased to be statistically significant after merging. Most groups were of small to moderate size, but three groups had more than 30 miRNAs each, located on chromosome 14 (positive strand), chromosome 19 (positive strand), and the X chromosome (negative strand); see Supplementary Table S2. Splitting these large groups into adjacent equal halves did not alter the outcome of the subsequent statistical analyses. Given these findings, we implemented the grouped BH methods using the original 67 groups.\u003c/p\u003e \u003cp\u003eA grouping is known to be statistically meaningful only when the intra-group association (association between test units within the same group) is stronger than the inter-group association (association between test units in different groups) [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. Hu et al. [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] observed that grouped BH methods demonstrated higher statistical power than the BH method in such scenarios.
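\u003c/p\u003e \u003cp\u003eTo make the grouping and reweighting idea concrete, the following is a minimal Python sketch of an adaptive grouped BH procedure in the spirit of LSL-GBH. It is based on our reading of the GBH framework and is not the implementation used in this study. The procedure works in four steps:\u003c/p\u003e \u003cul\u003e \u003cli\u003eTests are grouped by shared chromosome, arm, and strand.\u003c/li\u003e \u003cli\u003eThe proportion of true nulls in each group, π0,g, is estimated with the least-slope (LSL) estimator.\u003c/li\u003e \u003cli\u003eEach p-value is multiplied by π0,g/(1 − π0,g).\u003c/li\u003e \u003cli\u003eThe BH step-up rule is applied to the weighted p-values at level α/(1 − π0), where π0 is the size-weighted average of the group estimates.\u003c/li\u003e \u003c/ul\u003e \u003cp\u003eThe edge-case handling (groups estimated to be entirely null receive infinite weights), the function names, and the toy p-values are assumptions for illustration. The TST-GBH and SABHA variants are not shown.\u003c/p\u003e

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (name, chromosome, arm, strand, raw p-value); the toy values below are hypothetical.
Test = Tuple[str, str, str, str, float]


def bh_reject(pvals: List[float], alpha: float) -> List[bool]:
    # Benjamini-Hochberg step-up: find the largest rank k whose threshold
    # k * alpha / m is at least p_(k), then reject the k smallest p-values.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if rank * alpha / m >= pvals[i]:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


def lsl_pi0(pvals: List[float]) -> float:
    # Least-slope (LSL) estimate of the proportion of true nulls in one group:
    # l_i = (n + 1 - i) / (1 - p_(i)); stop at the first i where l_i exceeds
    # l_(i-1), then return min((floor(l_i) + 1) / n, 1).
    n = len(pvals)
    p = sorted(pvals)
    prev: Optional[float] = None
    l_i = math.inf
    for i in range(1, n + 1):
        denom = 1.0 - p[i - 1]
        l_i = (n + 1 - i) / denom if denom > 0 else math.inf
        if prev is not None and l_i > prev:
            break
        prev = l_i
    if math.isinf(l_i):
        return 1.0
    return min((math.floor(l_i) + 1) / n, 1.0)


def group_key(test: Test) -> Tuple[str, str, str]:
    # Positional group: shared chromosome, arm, and strand.
    _, chrom, arm, strand, _ = test
    return (chrom, arm, strand)


def gbh(tests: List[Test], alpha: float = 0.05) -> Dict[str, bool]:
    # Adaptive grouped BH using LSL estimates of each group's null proportion.
    n_total = len(tests)
    groups: Dict[Tuple[str, str, str], List[int]] = defaultdict(list)
    for idx, test in enumerate(tests):
        groups[group_key(test)].append(idx)

    weighted = [math.inf] * n_total
    pi0 = 0.0  # overall null proportion: size-weighted average of group estimates
    for members in groups.values():
        pi0_g = lsl_pi0([tests[i][4] for i in members])
        pi0 += pi0_g * len(members) / n_total
        if pi0_g >= 1.0:
            continue  # group judged all-null: infinite weight, never rejected
        for i in members:
            weighted[i] = tests[i][4] * pi0_g / (1.0 - pi0_g)

    if pi0 >= 1.0:
        return {t[0]: False for t in tests}
    reject = bh_reject(weighted, alpha / (1.0 - pi0))
    return {tests[i][0]: reject[i] for i in range(n_total)}


toy = [
    ('miR-A1', 'chr14', 'q', '+', 0.004), ('miR-A2', 'chr14', 'q', '+', 0.01),
    ('miR-A3', 'chr14', 'q', '+', 0.02), ('miR-A4', 'chr14', 'q', '+', 0.04),
    ('miR-B1', 'chr19', 'q', '+', 0.3), ('miR-B2', 'chr19', 'q', '+', 0.5),
    ('miR-B3', 'chr19', 'q', '+', 0.7), ('miR-B4', 'chr19', 'q', '+', 0.9),
]
print(sum(bh_reject([t[4] for t in toy], 0.05)))  # 2 (plain BH)
print(sum(gbh(toy, 0.05).values()))                # 4 (LSL-based grouped BH)
```

\u003cp\u003eIn this toy example, plain BH rejects two hypotheses, whereas the grouped procedure rejects all four tests in the signal-rich group. Because the second group is estimated to contain only nulls, the effective BH level for the remaining tests is raised. This is a deliberately exaggerated picture of why group-aware methods can gain power when groups differ in their proportion of true signals.\u003c/p\u003e \u003cp\u003e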
We heuristically compared the intra-group and inter-group associations among the ∆∆Ct values using a widely used statistical approach, namely a simple random effects model [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]. This model allows the intra-group and inter-group correlations, which measure the association within and across groups, respectively, to be quantified and estimated [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. The intra-group correlation is the correlation between ∆∆Ct values of miRNAs within the same group, whereas the inter-group correlation is the correlation between ∆∆Ct values of miRNAs belonging to two different groups. The simple random effects model assumes that the inter-group and intra-group correlations are constant across all groups. Even if this assumption is violated, the estimated correlations provide a meaningful summary of the overall strength of the inter-group and intra-group associations. Details on fitting the simple random effects model can be found in Section \u003cspan refid=\"Sec11\" class=\"InternalRef\"\u003e4.3\u003c/span\u003e. The estimated inter-group correlation was 0.04, indicating a weak inter-group association. In contrast, the intra-group correlation was estimated to be 0.14, noticeably larger than the inter-group correlation. Thus, the simple random effects model suggested that the inter-group association was weaker than the intra-group association. Consistent with this, visual examination of pairwise correlations between ∆∆Ct values showed that miRNAs within the same group tended to exhibit higher correlations than those in different groups, supporting the findings of the heuristic analysis.
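\u003c/p\u003e \u003cp\u003eAs a rough illustration of what the intra-group versus inter-group comparison measures, the sketch below averages pairwise Pearson correlations of ∆∆Ct values, separately for miRNA pairs in the same group and pairs in different groups. This is a simpler empirical analogue, not the random effects model described in Section 4.3. It assumes complete, non-constant ∆∆Ct vectors (one value per patient) and uses hypothetical toy data and function names.\u003c/p\u003e

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    # Pearson correlation of two equally long, non-constant vectors.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_intra_inter(ddct: Dict[str, List[float]],
                     group: Dict[str, str]) -> Tuple[float, float]:
    # Average pairwise correlation within groups and across groups.
    intra: List[float] = []
    inter: List[float] = []
    for a, b in combinations(sorted(ddct), 2):
        r = pearson(ddct[a], ddct[b])
        if group[a] == group[b]:
            intra.append(r)
        else:
            inter.append(r)
    return sum(intra) / len(intra), sum(inter) / len(inter)


# Hypothetical ddCt vectors for 4 miRNAs in 5 patients, in 2 positional groups.
ddct = {
    'm1': [1.0, 2.0, 3.0, 4.0, 5.0],
    'm2': [2.0, 4.0, 6.0, 8.0, 10.0],
    'm3': [3.0, 1.0, 2.0, 1.0, 3.0],
    'm4': [6.0, 2.0, 4.0, 2.0, 6.0],
}
group = {'m1': 'g1', 'm2': 'g1', 'm3': 'g2', 'm4': 'g2'}
print(mean_intra_inter(ddct, group))  # (1.0, 0.0)
```

\u003cp\u003e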
Figure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e illustrates two cases in which within-group correlations are generally higher than correlations between miRNA pairs belonging to different groups. In these instances, we also observed that the ∆∆Ct values of adjacent miRNAs tended to cluster, consistent with the premise that adjacent miRNAs are co-expressed.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3 Significantly deregulated miRNAs\u003c/h2\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e lists the deregulated miRNAs identified by the TST-GBH, LSL-GBH, SABHA, and BH methods. Figure\u0026nbsp;2 shows a heatmap of the ∆∆Ct values of these miRNAs. Consistent with the report of Sarkar and Zhao [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e], the TST-GBH method was the most liberal of the four methods, yielding the highest number of discoveries. Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e also reports the number of missing pairs for each identified miRNA. Notably, 50% of the data for hsa-miR-1293 (9 of 18 pairs) were missing, which makes conclusions about this miRNA questionable. The LSL-GBH and SABHA methods each made five rejections, four of which were shared. The BH method had the lowest number of discoveries, with only three rejections. De Sarkar et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] also used the BH method but reported seven discoveries because they used median-imputed data. As mentioned previously, median imputation inflates the number of discoveries for all methods.
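\u003c/p\u003e \u003cp\u003eThe variance-shrinkage argument against median imputation can be illustrated with a few hypothetical numbers. The sketch below is a toy under stated assumptions, not this study's analysis pipeline. For simplicity, it imputes the paired ∆∆Ct differences of a single miRNA directly, then compares the paired t-statistic under complete-case analysis and under median imputation. The imputed values sit at the center of the observed data, so they add little or nothing to the sum of squared deviations while increasing n. Both the standard deviation and the standard error therefore shrink.\u003c/p\u003e

```python
import math
import statistics
from typing import List, Optional


def complete_case(diffs: List[Optional[float]]) -> List[float]:
    # Keep only observed paired differences (None = missing pair).
    return [d for d in diffs if d is not None]


def median_impute(diffs: List[Optional[float]]) -> List[float]:
    # Replace every missing difference with the median of the observed ones.
    med = statistics.median(complete_case(diffs))
    return [med if d is None else d for d in diffs]


def paired_t(diffs: List[float]) -> float:
    # Paired (one-sample) t-statistic on the differences: mean / (sd / sqrt(n)).
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


# Hypothetical ddCt differences for one miRNA: 3 observed pairs, 3 missing pairs.
diffs = [1.0, 2.0, 3.0, None, None, None]
cc, imp = complete_case(diffs), median_impute(diffs)
print(statistics.stdev(cc), round(statistics.stdev(imp), 3))  # 1.0 0.632
print(round(paired_t(cc), 2), round(paired_t(imp), 2))        # 3.46 7.75
```

\u003cp\u003eWith no new information added, imputation more than doubles the t-statistic in this example, which would translate into a much smaller p-value and hence more discoveries under any FDR procedure.\u003c/p\u003e \u003cp\u003e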
Supplementary Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e lists the miRNAs detected by the four FDR control methods using the median-imputed data.\u003c/p\u003e \u003cp\u003eThe grouped BH methods identified seven additional miRNAs beyond the seven previously identified by De Sarkar et al. (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e) [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. Among the newly discovered miRNAs, hsa-miR-1247-5p, hsa-miR-1, hsa-miR-99a-3p, and hsa-miR-486-3p/5p were downregulated, whereas hsa-miR-21-5p and hsa-miR-147b were upregulated in GBSCC tissues. Although De Sarkar et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] did not identify these miRNAs, the p-value of hsa-miR-1 was just below the BH-corrected level of\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e\u003cb\u003emicroRNAs (miRNAs) with significantly deregulated expression.\u003c/b\u003e The miRNAs listed here were detected by at least one of the four methods tested: BH, TST-GBH (abbreviated as TST), LSL-GBH (abbreviated as LSL), and SABHA. The \"Chr.\" column indicates the chromosome; the \"Strand\" column indicates the transcribing strand; and the \u0026ldquo;Arm\u0026rdquo; column indicates the chromosome arm on which the miRNA gene is located. The ∆∆Ct values are used as surrogate estimates of differential expression for each miRNA. As TST-GBH, LSL-GBH, and SABHA do not yield FDR-corrected p-values, we report the raw p-values before FDR correction.
The arrows after the p-values indicate whether miRNA expression was upregulated (\u0026uarr;) or downregulated (\u0026darr;), as determined by the sign of ΔΔCt. The \"Missing pairs\" column shows the number of missing tumor-normal pairs (out of 18) for each miRNA. An \"x\" in the BH, TST, LSL, or SABHA column indicates that the method detected the miRNA as significantly deregulated. miRNAs marked with an asterisk were reported as deregulated by De Sarkar et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. BH, Benjamini\u0026ndash;Hochberg; SABHA, structure-adaptive Benjamini\u0026ndash;Hochberg algorithm; TST-GBH, two-stage step-up group Benjamini\u0026ndash;Hochberg; LSL-GBH, least-slope group Benjamini\u0026ndash;Hochberg.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"11\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c10\" colnum=\"10\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c11\" colnum=\"11\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\"
colname=\"c1\"\u003e \u003cp\u003emiRNA\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eChr.\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eStrand\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eArm\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eΔΔCt\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003ep-value\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eMissing pairs\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eBH\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003eTST\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c10\"\u003e \u003cp\u003eLSL\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c11\"\u003e \u003cp\u003eSABHA\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-133a-3p*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003echr18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eq\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e6.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e5.32E\u003csup\u003e\u0026minus;\u0026thinsp;05\u003c/sup\u003e\u0026darr;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ex\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-206*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003echr6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e+\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ep\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e6.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.97E\u003csup\u003e\u0026minus;\u0026thinsp;04\u003c/sup\u003e\u0026darr;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-31-3p*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003echr9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ep\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e-3.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.31E\u003csup\u003e\u0026minus;\u0026thinsp;04\u003c/sup\u003e\u0026uarr;\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-1293*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003echr12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eq\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e-4.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e2.77E\u003csup\u003e\u0026minus;\u0026thinsp;03\u003c/sup\u003e\u0026uarr;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-1247-5p\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003echr14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eq\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e 
\u003cp\u003e2.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e8.93E\u003csup\u003e\u0026minus;\u0026thinsp;03\u003c/sup\u003e\u0026darr;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-147b\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003echr15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e+\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eq\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e-2.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e5.52E\u003csup\u003e\u0026minus;\u0026thinsp;03\u003c/sup\u003e\u0026uarr;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-7-5p*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003echr15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e+\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eq\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e-3.1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e9.50E\u003csup\u003e\u0026minus;\u0026thinsp;04\u003c/sup\u003e\u0026uarr;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-21-5p\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003echr17\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e+\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eq\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e-2.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e3.67E\u003csup\u003e\u0026minus;\u0026thinsp;03\u003c/sup\u003e\u0026uarr;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-1\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003echr18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eq\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e5.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e9.38E\u003csup\u003e\u0026minus;\u0026thinsp;04\u003c/sup\u003e\u0026darr;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-99a-3p\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003echr21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e+\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eq\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e2.8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e5.83E\u003csup\u003e\u0026minus;\u0026thinsp;03\u003c/sup\u003e\u0026darr;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-486-3p\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003echr8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ep\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e2.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e2.81E\u003csup\u003e\u0026minus;\u0026thinsp;03\u003c/sup\u003e\u0026darr;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-486-5p\u003csup\u003e+\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003echr8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003ep\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e2.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1.78E\u003csup\u003e\u0026minus;\u0026thinsp;02\u003c/sup\u003e\u0026darr;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-31-5p*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003echr9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eq\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e-3.4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e6.18E\u003csup\u003e\u0026minus;\u0026thinsp;04\u003c/sup\u003e\u0026uarr;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-204-5p*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003echr9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eq\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e4.6\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e 
\u003cp\u003e8.81E\u003csup\u003e\u0026minus;\u0026thinsp;04\u003c/sup\u003e\u0026darr;\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003ex\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c10\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c11\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e2.4 Relevance of the newly discovered miRNAs\u003c/h2\u003e \u003cp\u003eAlthough it would have been desirable to experimentally validate the pathways downstream of the newly detected deregulated miRNAs, such analyses were beyond the scope of the current study because the biospecimens used by De Sarkar et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] had been exhausted. However, several functional studies provide supporting evidence for the role of these newly \u0026ldquo;discovered\u0026rdquo; miRNAs in carcinogenesis. In addition, the availability of whole transcriptome data from Singh et al. [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] allowed a correlative analysis, and our findings were concordant with previously published onco-miRNAome work. Singh et al. assessed 12 cancer tissues, 10 of which overlapped with the De Sarkar et al. miRNAome study [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e].
Notably, five of the seven newly detected miRNAs were deregulated in the direction opposite to that of their target genes, as reported by Singh et al. [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). This concordance lends support to the validity of our discoveries. Below, we summarize the relevance of these five miRNAs. The remaining two miRNAs, hsa-miR-1247-5p and hsa-miR-147b, have also been associated with oncogenic activity; however, evidence for their involvement in OSCC is limited.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e\u003cb\u003eDeregulated target genes of the newly discovered miRNAs, as identified by Singh et al. (2017).\u003c/b\u003e Arrows next to the genes indicate the direction of deregulation.
Information on the target genes of each miRNA was obtained from the corresponding references.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003emiRNA\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eDeregulated target genes found by Singh et al. (2017)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eReferences\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-21-5p\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMEF2C (\u0026darr;), PDCD4 (\u0026darr;), TIMP3 (\u0026darr;), PPARA (\u0026darr;), RECK (\u0026darr;), SPRY1 (\u0026darr;), SPRY2 (\u0026darr;), THRB (\u0026darr;)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e[28]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFAM101B (\u0026uarr;), TRANK1 (\u0026uarr;), ZNF281 (\u0026uarr;), Runx2 (\u0026uarr;), CXCR4 (\u0026uarr;)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e36\u003c/span\u003e], [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e38\u003c/span\u003e], [\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e53\u003c/span\u003e], [\u003cspan citationid=\"CR54\"
class=\"CitationRef\"\u003e54\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-486-3p\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFLNA (\u0026uarr;)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e[43]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-486-5p\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eKIAA1199 (\u0026uarr;)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e38\u003c/span\u003e], [43], [\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e44\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ehsa-miR-99a-3p\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBCAT1 (\u0026uarr;), MTHFD2 (\u0026uarr;), RRM2 (\u0026uarr;)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e[\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e38\u003c/span\u003e]\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eAmong the newly detected miRNAs, hsa-miR-21-5p (upregulated; Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e) and hsa-miR-99a-3p (downregulated; Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e) have been extensively studied in OSCC [\u003cspan additionalcitationids=\"CR26\" citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e27\u003c/span\u003e].
The miRNA hsa-miR-21 is a ubiquitous oncogene with prognostic potential in OSCC [28\u0026ndash;31]. The analysis by Singh et al. [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] identified eight target genes of hsa-miR-21 (MEF2C, PDCD4, TIMP3, PPARA, RECK, SPRY1, SPRY2, and THRB) with downregulated expression, further supporting our new \u0026ldquo;discovery\u0026rdquo; (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Downregulation of these genes has been linked to various hallmarks of cancer [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]. Notably, PDCD4 is one of the main target genes of hsa-miR-21 [33]. The miRNA hsa-miR-99a-3p acts as a tumor suppressor in some squamous cell carcinomas. Three oncogenes regulated by this miRNA (BCAT1, MTHFD2, and RRM2) were significantly upregulated in Singh et al. [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e], and these oncogenes are associated with poor cancer prognosis [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e34\u003c/span\u003e, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e35\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe seven newly detected miRNAs also included three miRNAs with potential tumor-suppressing activity: hsa-miR-1, hsa-miR-486-3p, and hsa-miR-486-5p [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e36\u003c/span\u003e, \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e37\u003c/span\u003e]. Among them, hsa-miR-1 is a well-established tumor suppressor and is considered a cancer biomarker [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e38\u003c/span\u003e]. Its target genes \u003cem\u003eRUNX2\u003c/em\u003e and \u003cem\u003eCXCR4\u003c/em\u003e exhibited significantly upregulated expression in the analysis by Singh et al.
[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). RUNX2 is implicated in breast cancer [39, 40], CXCR4 in thyroid cancer and osteosarcoma [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e41\u003c/span\u003e], and ZNF281 in pancreatic cancer [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e]. Similarly, FLNA and KIAA1199, target genes of hsa-miR-486-3p and hsa-miR-486-5p, respectively, were upregulated in Singh et al. [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). FLNA promotes cell migration in laryngeal squamous cell carcinoma [43], and KIAA1199 is a well-known promoter of invasion and metastasis in thyroid cancer [\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e44\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e"},{"header":"3 DISCUSSION","content":"\u003cp\u003eThe importance of adopting an accurate statistical approach is well recognized in high-throughput biological data analytics. Such analyses often involve a large number of simultaneous hypothesis tests, one for each of numerous genes, proteins, or other biomarkers, which increases the risk of false discoveries. Controlling the FDR allows investigators to strike a balance between identifying potentially meaningful findings (true discoveries) and minimizing the likelihood of spurious results. The BH method is arguably the most widely used approach for FDR control in high-throughput bioinformatics analysis [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e45\u003c/span\u003e].
However, because the BH method is agnostic to any information on the association between test units, it can sacrifice statistical power unless the tests are independent or only weakly dependent [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. For example, in the miRNA study of De Sarkar et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e], well-known cancer-associated miRNAs, such as the tumor suppressor hsa-miR-1, failed to meet the FDR cutoff under the stringent BH method. Nevertheless, a subsequent study by Singh et al. [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e], whose samples overlapped substantially with those of De Sarkar et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e], revealed that several key oncogenic targets of hsa-miR-1 were significantly upregulated in GBSCC tumors.\u003c/p\u003e \u003cp\u003eThe grouped BH method is a modification of the BH method that incorporates information on group structures to improve power [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. To demonstrate a proof of principle, we applied grouped BH methods that incorporate a simple miRNA grouping strategy into the FDR control step. For this purpose, we chose a dataset in which the miRNAome data were generated independently of the total transcriptome, the sample size was smaller than the number of hypothesis tests, and data generation followed rigorous protocols. The dataset of De Sarkar et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] fulfilled all these criteria. During analysis, we considered a simple group structure in miRNA expression: we grouped neighboring miRNAs on the same chromosome, which could potentially be co-regulated.
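To show how group information can enter the FDR step, below is a minimal Python sketch of the BH step-up rule and a TST-GBH-style weighted variant, written from our reading of Hu et al. [8]; it is an illustration, not the code used in this study. Assumptions (check against Hu et al. [8] before relying on it): group labels are supplied by the user, for example from positional clustering; each group's null proportion is estimated as (n_g − r_g)/n_g, where r_g is the number of within-group BH rejections at level α' = α/(1 + α); p-values in group g are multiplied by π̂_g0/(1 − π̂_g0); and the pooled weighted p-values are passed to BH at level α'/(1 − π̂_0), where π̂_0 is the size-weighted average of the group estimates. Using α' rather than α in the final step is a conservative assumption of this sketch. The function names `bh_reject` and `tst_gbh_reject` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Hashable, List, Sequence


def bh_reject(pvals: Sequence[float], alpha: float) -> List[bool]:
    """Benjamini-Hochberg step-up procedure.

    Rejects the k smallest p-values, where k is the largest rank such that
    p_(k) <= k * alpha / m (m = number of tests).
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank  # keep the largest passing rank (step-up)
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


def tst_gbh_reject(pvals: Sequence[float], groups: Sequence[Hashable],
                   alpha: float) -> List[bool]:
    """Sketch of a TST-style grouped BH procedure (levels are assumptions)."""
    m = len(pvals)
    if m == 0:
        return []
    alpha_prime = alpha / (1 + alpha)

    # Collect the indices of the hypotheses in each (hypothetical) group.
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)

    # Stage 1: estimate each group's null proportion from within-group BH.
    pi0 = {}
    for g, idx in members.items():
        r_g = sum(bh_reject([pvals[i] for i in idx], alpha_prime))
        pi0[g] = (len(idx) - r_g) / len(idx)
    pi0_all = sum(len(idx) * pi0[g] for g, idx in members.items()) / m
    if pi0_all >= 1.0:
        return [False] * m  # every group looks entirely null

    # Stage 2: weight p-values by pi0_g / (1 - pi0_g); all-null groups get inf.
    weighted = [
        math.inf if pi0[g] >= 1.0 else p * pi0[g] / (1.0 - pi0[g])
        for p, g in zip(pvals, groups)
    ]

    # Stage 3: BH on the pooled weighted p-values at the adjusted level.
    return bh_reject(weighted, alpha_prime / (1.0 - pi0_all))


if __name__ == "__main__":
    p = [0.001, 0.002, 0.003, 0.04, 0.5, 0.6, 0.7, 0.8]
    g = ["a"] * 4 + ["b"] * 4  # hypothetical group labels
    print(bh_reject(p, 0.05))
    print(tst_gbh_reject(p, g, 0.05))
```

Running the demo prints `[True, True, True, False, False, False, False, False]` for BH and `[True, True, True, True, False, False, False, False]` for the grouped variant. Group `a` looks signal-rich (estimated null proportion 0), so its borderline p-value of 0.04 is also rejected, while the all-null group `b` receives infinite weighted p-values and contributes no rejections.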
Although we acknowledge that miRNA associations can arise for different reasons, we opted for this simple approach given the limited sample size. We also recognize that positional clustering is overly simplistic, but our intent was to illustrate how incorporating relevant biological relatedness can improve statistical inference. Despite this simplistic grouping strategy, the grouped BH methods yielded q-values below the FDR cutoff for a few additional miRNAs whose expression was suspected to be deregulated in OSCC (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). These new statistical findings were supported by follow-up transcriptome results and by the results of independent oral cancer miRNAome studies. The new detections may be attributed to the gain in statistical power from incorporating positional information on the miRNAs. However, the sensitivity of the TST-GBH method to missing data, as evidenced by the inclusion of hsa-miR-1293 in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, may be a concern when the sample size is small.\u003c/p\u003e \u003cp\u003eIn the present study, miRNAs were grouped based on broad spatial information. However, groups can also be defined along other axes of functional or genetic relatedness (e.g., ancestral relatedness). Liu et al. [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] used a gene ontology system for group assignment. In addition, Hu et al. [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] suggested using clustering algorithms, such as k-means, to partition the test units into groups when the group structure is less apparent.
However, recent studies have found that reusing the same data for clustering and downstream testing can inflate false discoveries [46, 47].\u003c/p\u003e \u003cp\u003eWe remind readers that grouped FDR methods do not require independence between groups; in our motivating example, the miRNAs did not function independently. However, for practitioners to fully benefit from grouped FDR methods, the association between units within groups must be stronger than that between units across groups [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. Our heuristic analyses suggested that the intra-group association among our groups was stronger than the inter-group association. Grouped BH methods are also particularly advantageous over the traditional BH method when (a) most hypotheses are null, that is, when the expression of most miRNAs is not deregulated, and (b) the groups are heterogeneous in their proportions of null hypotheses [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e48\u003c/span\u003e]. Based on the findings reported by De Sarkar et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e], the first scenario applies to this dataset. Verifying the second scenario is challenging; however, the physical separation of our groups, coupled with stronger intra-group than inter-group associations, does not rule out heterogeneity across our groups.\u003c/p\u003e \u003cp\u003eOur results underscore the promise of grouped BH methods for controlling the FDR in bioinformatics studies with pre-existing group structures. However, because of our limited sample size, the extent to which our conclusions extrapolate to future datasets may be restricted.
Future studies are required to assess whether our conclusions generalize to different populations and types of datasets, and the replicability of our results in such datasets remains a promising avenue for further investigation. Although we applied grouped BH multiple-testing correction to a miRNAome dataset, this strategy could be applied to any multiple-testing scenario with pre-existing groups among the hypotheses; controlled future analyses are needed to test such generalizability. Of particular interest are datasets on treatment-emergent cancer subtypes, which often have small sample sizes similar to that of our study [49, 50]. These cancers arise from treatment-related factors, either as a consequence of the treatment itself or of the underlying conditions being treated. Because the genes responsible for such cancer subtypes are not yet fully elucidated, future studies aiming to identify these genetic factors will require FDR control for multiple hypothesis testing. Additionally, in the advancing era of precision medicine, the importance of a systematic approach that ensures durable therapeutic benefit with minimal side effects is increasingly recognized. Such an approach provides additional information on gene-gene relatedness, which could make grouped BH methods increasingly practical and important for precision discovery.\u003c/p\u003e"},{"header":"4 METHODS","content":"\u003cp\u003eHere we provide a brief overview of the data generation strategies and data preprocessing techniques used by De Sarkar et al.
[\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e], and the implementation of the statistical methods discussed above.\u003c/p\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e4.1 miRNAome data generation using TLDA assay\u003c/h2\u003e \u003cp\u003eWe downloaded the publicly available ΔΔCt data from the study by De Sarkar et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. Briefly, their data were generated as follows. The TaqMan Low Density Array (TLDA) is a high-throughput quantitative PCR assay for the relative quantification of both standard and low-expressing miRNAs in biospecimens [\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e51\u003c/span\u003e]. De Sarkar et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] followed the recommended RNA isolation protocol using the mirVana kit (Life Technologies, USA). Samples were quality-checked for RNA integrity number (RIN), and those with RIN \u0026ge;\u0026thinsp;6.9 were retained for analysis. Ct, ΔCt, and ΔΔCt values were derived using the SDS and Data Assist software packages (Life Technologies, USA). Here, Ct is the PCR cycle at which the quantity of PCR product reaches a defined threshold; ΔCt is the Ct of a miRNA in a cancer tissue minus the Ct corresponding to the geometric mean expression of the three most stable endogenous control miRNAs in that tissue; and ΔΔCt is the ΔCt of a miRNA in cancer tissue minus the ΔCt of the same miRNA in control tissue, that is, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\Delta \\Delta {C}_{t}=\\Delta {C}_{t}^{\\text{cancer}}-\\Delta {C}_{t}^{\\text{control}}\\)\u003c/span\u003e\u003c/span\u003e. The assayed miRNA expression values were normalized and trimmed according to the recommended TLDA data analysis standard operating procedure (SOP). De Sarkar et al.
[\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] pruned the data to 531 miRNAs and recommended using the ΔΔCt values of 522 high-quality miRNAs for further analysis.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e4.2 Paired t-tests\u003c/h2\u003e \u003cp\u003eA miRNA was considered upregulated or downregulated if its expression changed by more than 2-fold relative to the control. As described in De Sarkar et al. [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e], we performed 522 one-sided paired t-tests to determine whether the expression of each miRNA was significantly deregulated. The direction of each one-sided test was set according to whether the median ΔΔCt value of that miRNA was positive or negative. The null hypothesis of each one-sided paired t-test was \u0026ldquo;the expression of this miRNA is not more than 2-fold upregulated (or downregulated).\u0026rdquo;\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e4.3 Exploratory analysis using a simple random-effect model\u003c/h2\u003e \u003cp\u003eWe used a simple random-effect model to quantify and compare the intra-group and inter-group associations. To formalize this setup, we introduce some notation. Let Y\u003csub\u003eij\u003c/sub\u003e denote the ΔΔCt value of the jth miRNA (j\u0026thinsp;=\u0026thinsp;1, \u0026hellip;, 522) for the ith patient (i\u0026thinsp;=\u0026thinsp;1, \u0026hellip;, 18), and let g(j) denote the group of the jth miRNA. For example, if the jth miRNA belongs to the first group, then g(j)\u0026thinsp;=\u0026thinsp;1.
Because there are 67 groups, the group index g(j) takes values between 1 and 67.\u003c/p\u003e \u003cp\u003eWe posit a random-effect model [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e] for the ΔΔCt values:\u003cdiv class=\"BlockQuote\"\u003e\u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({Y}_{ij}=\\mu +{S}_{i}+{G}_{ig\\left(j\\right)}+{E}_{ij},\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003ewhere \u003cem\u003e\u0026mu;\u003c/em\u003e is the overall mean, \u003cem\u003eS\u003c/em\u003e\u003csub\u003e\u003cem\u003ei\u003c/em\u003e\u003c/sub\u003e is the random effect of the \u003cem\u003ei\u003c/em\u003eth patient, \u003cem\u003eG\u003c/em\u003e\u003csub\u003e\u003cem\u003eig\u003c/em\u003e(\u003cem\u003ej\u003c/em\u003e)\u003c/sub\u003e is the random effect of the \u003cem\u003eg\u003c/em\u003e(\u003cem\u003ej\u003c/em\u003e)th group for the \u003cem\u003ei\u003c/em\u003eth patient, and \u003cem\u003eE\u003c/em\u003e\u003csub\u003e\u003cem\u003eij\u003c/em\u003e\u003c/sub\u003e is the random error for the \u003cem\u003ei\u003c/em\u003eth patient and the \u003cem\u003ej\u003c/em\u003eth miRNA. The random error can represent independent measurement errors or other underlying biological factors. We assume that the collections of random variables {S\u003csub\u003ei\u003c/sub\u003e : 1\u0026thinsp;\u0026le;\u0026thinsp;i\u0026thinsp;\u0026le;\u0026thinsp;18}, {G\u003csub\u003eig\u003c/sub\u003e : 1\u0026thinsp;\u0026le;\u0026thinsp;i\u0026thinsp;\u0026le;\u0026thinsp;18, 1\u0026thinsp;\u0026le;\u0026thinsp;g\u0026thinsp;\u0026le;\u0026thinsp;67}, and {E\u003csub\u003eij\u003c/sub\u003e : 1\u0026thinsp;\u0026le;\u0026thinsp;i\u0026thinsp;\u0026le;\u0026thinsp;18, 1\u0026thinsp;\u0026le;\u0026thinsp;j\u0026thinsp;\u0026le;\u0026thinsp;522} are independent of each other.
We also assume that \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({S}_{1},\\dots ,{S}_{18}\\)\u003c/span\u003e\u003c/span\u003e are independent and identically distributed centered Gaussian random variables with variance \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\sigma }_{s}^{2}\\)\u003c/span\u003e\u003c/span\u003e, that the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({G}_{ig}\\)\u003c/span\u003e\u003c/span\u003e\u0026rsquo;s are independent and identically distributed centered Gaussian random variables with variance \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\sigma }_{g}^{2}\\)\u003c/span\u003e\u003c/span\u003e, and that the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({E}_{ij}\\)\u003c/span\u003e\u003c/span\u003e\u0026rsquo;s are independent and identically distributed centered Gaussian random variables with variance \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\sigma }_{e}^{2}\\)\u003c/span\u003e\u003c/span\u003e. Under this model, the correlation between the ΔΔCt values of any two miRNAs that belong to different groups, measured in the same patient, is constant; this correlation is therefore a natural measure of the inter-group correlation. Straightforward algebra shows that the inter-group correlation equals \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({\\sigma }_{s}^{2}/({\\sigma }_{s}^{2}+{\\sigma }_{g}^{2}+{\\sigma }_{e}^{2})\\)\u003c/span\u003e\u003c/span\u003e.
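For readers who want the omitted step, the inter-group correlation follows directly from the model and the independence assumptions above. Take a patient i and two miRNAs j ≠ j' in different groups, g(j) ≠ g(j'); the constant μ does not contribute to variances or covariances, and every cross-term vanishes because the random effects and errors involved are distinct, independent variables:

```latex
\begin{aligned}
\operatorname{Var}(Y_{ij}) &= \operatorname{Var}(S_i) + \operatorname{Var}(G_{ig(j)}) + \operatorname{Var}(E_{ij})
  = \sigma_s^2 + \sigma_g^2 + \sigma_e^2, \\
\operatorname{Cov}(Y_{ij}, Y_{ij'}) &= \operatorname{Cov}(S_i, S_i) = \sigma_s^2
  \qquad \text{since } G_{ig(j)} \perp G_{ig(j')} \text{ and } E_{ij} \perp E_{ij'}, \\
\operatorname{Corr}(Y_{ij}, Y_{ij'}) &= \frac{\sigma_s^2}{\sigma_s^2 + \sigma_g^2 + \sigma_e^2}.
\end{aligned}
```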
Likewise, under this model, the ΔΔCt values of any two miRNAs that belong to the same group, measured in the same patient, have a common correlation across patients and miRNAs; this correlation is a natural measure of the intra-group correlation and equals \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(({\\sigma }_{s}^{2}+{\\sigma }_{g}^{2})/({\\sigma }_{s}^{2}+{\\sigma }_{g}^{2}+{\\sigma }_{e}^{2})\\)\u003c/span\u003e\u003c/span\u003e. We fit the simple random-effect model using SAS PROC MIXED [\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e52\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e4.4 Implementation of BH and grouped BH methods\u003c/h2\u003e \u003cp\u003eThe BH method was implemented using the \u003cem\u003estats\u003c/em\u003e package in R. The TST-GBH and LSL-GBH methods were implemented following the algorithms provided by Hu et al. [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. To implement SABHA, we used the R code provided by Li and Barber [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. SABHA requires two tuning parameters: (1) \u003cem\u003eϵ\u003c/em\u003e, a lower bound on the proportion of true null hypotheses, and (2) a threshold \u003cem\u003eτ\u003c/em\u003e, used for calibrating the individual p-value thresholds.
Similar to Li and Barber [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], we used \u003cem\u003eϵ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.1 and \u003cem\u003eτ\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.5. We found that the discoveries were not sensitive to small perturbations of either tuning parameter.\u003c/p\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003cp\u003eACKNOWLEDGEMENTS\u003c/p\u003e\n\u003cp\u003eNilanjana Laha\u0026apos;s research was partially supported by the NSF-DMS grant DMS-2311098. The authors are thankful to Editage for providing editing support.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAUTHOR CONTRIBUTIONS\u003c/p\u003e\n\u003cp\u003eSalil Koner contributed to the statistical analyses. Navonil De Sarkar contributed to the paper\u0026apos;s writing and conceived the idea. Nilanjana Laha supervised the project, secured funding for editing-related services, wrote the paper, and conceptualized the statistical part of the project.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCOMPETING INTERESTS\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that there are no competing interests associated with this manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDATA AVAILABILITY\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll data supporting the findings are publicly available. The relevant data can be accessed in the supplementary files of De Sarkar et al. on PubMed Central (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4134240/).\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eDudoit, S., Shaffer, J. P. \u0026amp; Boldrick, J. C. Multiple hypothesis testing in microarray experiments. \u003cem\u003eStat. Sci.\u003c/em\u003e \u003cstrong\u003e18\u003c/strong\u003e, 71\u0026ndash;103 (2003).\u003c/li\u003e\n\u003cli\u003eGoeman, J. J. \u0026amp; Solari, A. Multiple hypothesis testing in genomics. \u003cem\u003eStat. Med.
\u003c/em\u003e\u003cstrong\u003e33\u003c/strong\u003e, 1946\u0026ndash;1978 (2014).\u003c/li\u003e\n\u003cli\u003eSesia, M., Bates, S., Candes, E., Marchini, J. \u0026amp; Sabatti, C. False discovery rate control in genome-wide association studies with population structure. \u003cem\u003eProc. Natl. Acad. Sci. U.S.A.\u003c/em\u003e \u003cstrong\u003e118\u003c/strong\u003e, e2105841118 (2021).\u003c/li\u003e\n\u003cli\u003eMenyhart, O., Weltz, B. \u0026amp; Gyorffy, B. Multipletesting.com: A tool for life science researchers for multiple hypothesis testing correction. \u003cem\u003ePLoS One\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, e0245824 (2021).\u003c/li\u003e\n\u003cli\u003eBenjamini, Y. \u0026amp; Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. \u003cem\u003eJ. R. Stat. Soc. Methodol.\u003c/em\u003e \u003cstrong\u003e57\u003c/strong\u003e, 289\u0026ndash;300 (1995).\u003c/li\u003e\n\u003cli\u003eKorthauer, K., et al. A practical guide to methods controlling false discoveries in computational biology. \u003cem\u003eGenome Biol.\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 118 (2019).\u003c/li\u003e\n\u003cli\u003eBenjamini, Y. \u0026amp; Yekutieli, D. The control of the false discovery rate in multiple testing under dependency. \u003cem\u003eAnn. Stat.\u003c/em\u003e \u003cstrong\u003e29\u003c/strong\u003e, 1165\u0026ndash;1188 (2001).\u003c/li\u003e\n\u003cli\u003eHu, J. X., Zhao, H. \u0026amp; Zhou, H. H. False discovery rate control with groups. \u003cem\u003eJ. Am. Stat. Assoc.\u003c/em\u003e \u003cstrong\u003e105\u003c/strong\u003e, 1215\u0026ndash;1227 (2010).\u003c/li\u003e\n\u003cli\u003eLi, A. \u0026amp; Barber, R. F. Multiple testing with the structure-adaptive Benjamini\u0026ndash;Hochberg algorithm. \u003cem\u003eJ. R. Stat. Soc.
Methodol.\u003c/em\u003e \u003cstrong\u003e81\u003c/strong\u003e, 45\u0026ndash;74 (2019).\u003c/li\u003e\n\u003cli\u003eGenovese, C. R., Roeder, K. \u0026amp; Wasserman, L. False discovery control with p-value weighting. \u003cem\u003eBiometrika\u003c/em\u003e \u003cstrong\u003e93\u003c/strong\u003e, 509\u0026ndash;524 (2006).\u003c/li\u003e\n\u003cli\u003eKoutna, I., et al. New insights into gene positional clustering and its properties supported by large-scale analysis of various differentiation pathways. \u003cem\u003eGenomics\u003c/em\u003e \u003cstrong\u003e89\u003c/strong\u003e, 81\u0026ndash;88 (2007).\u003c/li\u003e\n\u003cli\u003eZhang, B., et al. Functional DNA methylation differences between tissues, cell types, and across individuals discovered using the M\u0026amp;M algorithm. \u003cem\u003eGenome Res.\u003c/em\u003e \u003cstrong\u003e23\u003c/strong\u003e, 1522\u0026ndash;1540 (2013).\u003c/li\u003e\n\u003cli\u003eLiu, H., et al. Whole-transcriptome analysis of differentially expressed genes in the vegetative buds, floral buds and buds of Chrysanthemum morifolium. \u003cem\u003ePLoS One\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, e0128009 (2015).\u003c/li\u003e\n\u003cli\u003eEfron, B. Simultaneous inference: When should hypothesis testing problems be combined? \u003cem\u003eAnn. Appl. Stat.\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, 197\u0026ndash;223 (2008).\u003c/li\u003e\n\u003cli\u003eBenjamini, Y. \u0026amp; Heller, R. False discovery rates for spatial signals. \u003cem\u003eJ. Am. Stat. Assoc.\u003c/em\u003e \u003cstrong\u003e102\u003c/strong\u003e, 1272\u0026ndash;1281 (2007).\u003c/li\u003e\n\u003cli\u003eDe Sarkar, N., et al. A quest for miRNA bio-marker: a track back approach from gingivo buccal cancer to two different types of precancers.
\u003cem\u003ePLoS One\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, e104839 (2014).\u003c/li\u003e\n\u003cli\u003eSingh, R., et al. Analysis of the whole transcriptome from gingivo-buccal squamous cell carcinoma reveals deregulated immune landscape and suggests targets for immunotherapy. \u003cem\u003ePLoS One\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, e0183606 (2017).\u003c/li\u003e\n\u003cli\u003eHughes, R. A., Heron, J., Sterne, J. A. C. \u0026amp; Tilling, K. Accounting for missing data in statistical analyses: multiple imputation is not always the answer. \u003cem\u003eInt. J. Epidemiol.\u003c/em\u003e \u003cstrong\u003e48\u003c/strong\u003e, 1294\u0026ndash;1304 (2019).\u003c/li\u003e\n\u003cli\u003eZhang, Z. Missing data imputation: focusing on single imputation. \u003cem\u003eAnn. Transl. Med.\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e (2016).\u003c/li\u003e\n\u003cli\u003eSarkar, S. K. \u0026amp; Zhao, Z. Local false discovery rate based methods for multiple testing of one-way classified hypotheses. \u003cem\u003eElectron. J. Stat.\u003c/em\u003e \u003cstrong\u003e16\u003c/strong\u003e, 6043\u0026ndash;6085 (2022).\u003c/li\u003e\n\u003cli\u003eChu, Y., et al. miR-1247-5p functions as a tumor suppressor in human hepatocellular carcinoma by targeting Wnt3. \u003cem\u003eOncol. Rep.\u003c/em\u003e \u003cstrong\u003e38\u003c/strong\u003e, 343\u0026ndash;351 (2017).\u003c/li\u003e\n\u003cli\u003eNakagawa, S., Johnson, P. C. \u0026amp; Schielzeth, H. The coefficient of determination R\u003csup\u003e2\u003c/sup\u003e and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded. \u003cem\u003eJ. R. Soc. Interface\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 20170213 (2017).\u003c/li\u003e\n\u003cli\u003eSearle, S. R., Casella, G. \u0026amp; McCulloch, C. E. \u003cem\u003eVariance Components\u003c/em\u003e.
(John Wiley \u0026amp; Sons, 2009).\u003c/li\u003e\n\u003cli\u003eMontgomery, D. C., Peck, E. A. \u0026amp; Vining, G. G. \u003cem\u003eIntroduction to Linear Regression Analysis.\u003c/em\u003e (John Wiley \u0026amp; Sons, 2021).\u003c/li\u003e\n\u003cli\u003eTroiano, G., et al. Circulating miRNAs from blood, plasma or serum as promising clinical biomarkers in oral squamous cell carcinoma: A systematic review of current findings. \u003cem\u003eOral Oncol.\u003c/em\u003e \u003cstrong\u003e63\u003c/strong\u003e, 30\u0026ndash;37 (2016).\u003c/li\u003e\n\u003cli\u003eSeti\u0026eacute;n-Olarra, A., et al. Genomewide miRNA profiling of oral lichenoid disorders and oral squamous cell carcinoma. \u003cem\u003eOral Dis.\u003c/em\u003e \u003cstrong\u003e22\u003c/strong\u003e, 754\u0026ndash;760 (2016).\u003c/li\u003e\n\u003cli\u003eChamorro Petronacci, C. M., et al. miRNAs expression of oral squamous cell carcinoma patients: Validation of two putative biomarkers. \u003cem\u003eMedicine\u003c/em\u003e \u003cstrong\u003e98\u003c/strong\u003e (2019).\u003c/li\u003e\n\u003cli\u003eBuscaglia, L. E. B. \u0026amp; Li, Y. Apoptosis and the target genes of microRNA-21. \u003cem\u003eChin. J. Cancer\u003c/em\u003e \u003cstrong\u003e30\u003c/strong\u003e, 371\u0026ndash;380 (2011).\u003c/li\u003e\n\u003cli\u003eDioguardi, M., et al. MicroRNA-21 expression as a prognostic biomarker in oral cancer: Systematic review and meta-analysis. \u003cem\u003eInt. J. Environ. Res. Public Health\u003c/em\u003e \u003cstrong\u003e19\u003c/strong\u003e, 3396 (2022).\u003c/li\u003e\n\u003cli\u003eTroiano, G., et al. Predictive prognostic value of tissue-based microRNA expression in oral squamous cell carcinoma: a systematic review and meta-analysis. \u003cem\u003eJ. Dent. Res.\u003c/em\u003e \u003cstrong\u003e97\u003c/strong\u003e, 759\u0026ndash;766 (2018).\u003c/li\u003e\n\u003cli\u003eMcQueen, C. \u003cem\u003eComprehensive Toxicology\u003c/em\u003e (Elsevier, 2017).\u003c/li\u003e\n\u003cli\u003eHanahan, D.
\u0026amp; Weinberg, R. A. Hallmarks of cancer: the next generation. \u003cem\u003eCell\u003c/em\u003e \u003cstrong\u003e144\u003c/strong\u003e, 646\u0026ndash;674 (2011).\u003c/li\u003e\n\u003cli\u003eJenike, A. E. \u0026amp; Halushka, M. K. miR-21: a non-specific biomarker of all maladies. \u003cem\u003eBiomarker Res.\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 1\u0026ndash;7 (2021).\u003c/li\u003e\n\u003cli\u003eOkada, R., et al. Regulation of oncogenic targets by miR-99a-3p (passenger strand of miR-99a-duplex) in head and neck squamous cell carcinoma. \u003cem\u003eCells\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 1535 (2019).\u003c/li\u003e\n\u003cli\u003eOsako, Y., et al. Potential tumor-suppressive role of microRNA-99a-3p in sunitinib-resistant renal cell carcinoma cells through the regulation of RRM2. \u003cem\u003eInt. J. Oncol.\u003c/em\u003e \u003cstrong\u003e54\u003c/strong\u003e, 1759\u0026ndash;1770 (2019).\u003c/li\u003e\n\u003cli\u003eKhan, P., et al. MicroRNA-1: Diverse role of a small player in multiple cancers. \u003cem\u003eSemin. Cell Dev. Biol.\u003c/em\u003e \u003cstrong\u003e124\u003c/strong\u003e, 114\u0026ndash;126 (2022).\u003c/li\u003e\n\u003cli\u003eYang, H., et al. miR-486-3p inhibits the proliferation, migration and invasion of retinoblastoma cells by targeting ECM1. \u003cem\u003eBiosci. Rep.\u003c/em\u003e \u003cstrong\u003e40\u003c/strong\u003e (2020).\u003c/li\u003e\n\u003cli\u003eSafa, A., et al. miR-1: A comprehensive review of its role in normal development and diverse disorders. \u003cem\u003eBiomed. Pharmacother.\u003c/em\u003e \u003cstrong\u003e132\u003c/strong\u003e, 110903 (2020).\u003c/li\u003e\n\u003cli\u003ePratap, J., et al. Regulatory roles of RUNX2 in metastatic tumor and cancer cell interactions with bone. \u003cem\u003eCancer Metastasis Rev.\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 589\u0026ndash;600 (2006).\u003c/li\u003e\n\u003cli\u003eWysokinski, D., Blasiak, J.
\u0026amp; Pawlowska, E. Role of runx2 in breast carcinogenesis. \u003cem\u003eInt. J. Mol. Sci. \u003c/em\u003e\u003cstrong\u003e16\u003c/strong\u003e, 20969\u0026ndash;20993 (2015).\u003c/li\u003e\n\u003cli\u003eLi, B., et al. Epigenetic regulation of CXCL12 plays a critical role in mediating tumor progression and the immune response in osteosarcomaos fate determined by epigenetic regulation of cxcl12. \u003cem\u003eCancer Res.\u003c/em\u003e \u003cstrong\u003e78\u003c/strong\u003e, 3938\u0026ndash;3953 (2018).\u003c/li\u003e\n\u003cli\u003eQian, Y., Li, J. \u0026amp; Xia, S. Znf281 promotes growth and invasion of pancreatic cancer cells by activating wnt/\u0026beta;-catenin signaling. \u003cem\u003eDig. Dis. Sci.\u003c/em\u003e \u003cstrong\u003e62\u003c/strong\u003e, 2011\u0026ndash;2020 (2017).\u003c/li\u003e\n\u003cli\u003eElKhouly, A. M., Youness, R. \u0026amp; Gad, M. Microrna-486-5p and microrna-486-3p: Multifaceted pleiotropic mediators in oncological and non-oncological conditions. \u003cem\u003eNon-coding RNA Res. \u003c/em\u003e\u003cstrong\u003e5\u003c/strong\u003e, 11\u0026ndash;21 (2020).\u003c/li\u003e\n\u003cli\u003eJiao, X., et al. Kiaa1199, a target of micorna-486-5p, promotes papillary thyroid cancer invasion by influencing epithelial-mesenchymal transition (emt). \u003cem\u003eMed. Sci. Monit. Basic Res. \u003c/em\u003e\u003cstrong\u003e25\u003c/strong\u003e, 6788\u0026ndash;6796 (2019).\u003c/li\u003e\n\u003cli\u003eHaynes, W. \u003cem\u003eBenjamini\u0026ndash;Hochberg Method \u003c/em\u003e(Springer New York, 2013). \u003c/li\u003e\n\u003cli\u003eL\u0026auml;hnemann, D., et al. Eleven grand challenges in single-cell data science. \u003cem\u003eGenome Biol. \u003c/em\u003e\u003cstrong\u003e21\u003c/strong\u003e, 1\u0026ndash;35 (2020).\u003c/li\u003e\n\u003cli\u003eGao, L. L., Bien, J. \u0026amp; Witten, D. Selective inference for hierarchical clustering. 
\u003cem\u003earXiv Preprint arXiv:2012.02936 \u003c/em\u003e(2020).\u003c/li\u003e\n\u003cli\u003eFrancois, O., Martins, H., Caye, K., and Schoville, S. D. (2016). Controlling false discoveries in genome scans for selection. \u003cem\u003eMol Ecol. \u003c/em\u003e\u003cstrong\u003e25\u003c/strong\u003e, 454\u0026ndash;469\u003c/li\u003e\n\u003cli\u003eAggarwal, R. R., et al. Whole-genome and transcriptional analysis of treatment-emergent small-cell neuroendocrine prostate cancer demonstrates intraclass heterogeneity. \u003cem\u003eMol. Cancer Res\u003c/em\u003e. \u003cstrong\u003e17\u003c/strong\u003e, 1235\u0026ndash;1240 (2019).\u003c/li\u003e\n\u003cli\u003eClermont, P. L., Ci, X., Pandha, H., Wang, Y. \u0026amp; Crea, F. Treatment-emergent neuroendocrine prostate cancer: molecularly driven clinical guidelines. \u003cem\u003eInt. J. Endocrinol.\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, IJE20. (2019).\u003c/li\u003e\n\u003cli\u003eWang B, et al. Systematic evaluation of three microRNA profiling platforms: Microarray, beads array, and quantitative real-time PCR array. \u003cem\u003ePLoS One\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e e17167 (2011).\u003c/li\u003e\n\u003cli\u003eSAS Institute Inc. \u003cem\u003eSAS/STAT\u0026reg; 9.2 User\u0026rsquo;s Guide.\u003c/em\u003e (SAS Institute Inc., 2008).\u003c/li\u003e\n\u003cli\u003eLeone, V., et al. MiR-1 Is a tumor suppressor in thyroid carcinogenesis targeting CCND2, CXCR4, and SDF-1\u0026OElig;\u0026plusmn;. \u003cem\u003eJ. Clin. Endocrinol. Metab.\u003c/em\u003e \u003cstrong\u003e96\u003c/strong\u003e, E1388\u0026ndash;E1398 (2011). doi:10.1210/jc.2011-0345\u003c/li\u003e\n\u003cli\u003eNohata, N., et al. mir-1 as a tumor suppressive microRNA targeting tagln2 in head and neck squamous cell carcinoma. \u003cem\u003eOncotarget \u003c/em\u003e\u003cstrong\u003e2\u003c/strong\u003e, 29 (2011). \u003c/li\u003e\n\u003cli\u003eGenovese, C. R., Lazar, N. A. \u0026amp; Nichols, T. 
Thresholding of statistical maps in functional neuroimaging using the false discovery rate. \u003cem\u003eNeuroimage \u003c/em\u003e\u003cstrong\u003e15\u003c/strong\u003e, 870\u0026ndash;878 (2002).\u003c/li\u003e\n\u003cli\u003eBenjamini, Y., Krieger, A. M. \u0026amp; Yekutieli, D. Adaptive linear step-up procedures that control the false discovery rate. \u003cem\u003eBiometrika \u003c/em\u003e\u003cstrong\u003e93\u003c/strong\u003e, 491\u0026ndash;507 (2006).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
Version 1 posted January 17th, 2024.