GWAS and GS analysis revealed the selection and prediction efficiency for yield, plant morphological, and fiber quality in Gossypium barbadense

Research Article

Tao Yang, Honggang Wang, Jikun Song, Kang Zhao, Bo Pang, Yongpan Wang, and 9 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-5667934/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 09 Jun, 2025 in Theoretical and Applied Genetics. Version 1.

Abstract

Sea Island cotton (Gossypium barbadense), a premier tetraploid cotton species, is globally renowned for its fibers, which exhibit thermal expansion and contraction properties similar to those of animal fibers such as cashmere.
Despite its significance, there remains a limited understanding of how genes influence primary traits across germplasms and of the relationship between predictive factors identified through genomic selection (GS) and heritability. This study aimed to address this gap. A total of 203 Sea Island cotton accessions were resequenced. Population evolution analysis revealed three distinct groups, which were largely shaped by geographical distribution and breeding objectives. A genome-wide association study (GWAS) was then performed on 15 traits related to yield, fiber quality, and plant morphology, identifying a greater number of loci associated with fiber quality traits, which exhibited higher broad-sense heritability. Transcriptomic and gene expression analysis identified six key genes involved in regulating fiber length (GB_A05G1764 and GB_A05G1761), fiber micronaire (GB_A05G1895 and GB_A05G1771), and fiber elongation (GB_A05G1702 and GB_A05G1707). Furthermore, geographical and temporal analyses indicated that these traits underwent directional selection in Sea Island cotton. In addition, this study explored the effects of marker density and population size on GS prediction accuracy, finding that traits with higher broad-sense heritability, such as fiber quality, achieved higher prediction accuracy, while those with lower broad-sense heritability, such as plant morphology, showed reduced accuracy. This study provides an important reference for future GS breeding, in addition to deepening the scientific understanding of the genetic evolution of cotton.

Keywords: Gossypium barbadense, GWAS, yield, plant morphological, fiber quality, GS

Key message

Genetic variation in a Gossypium barbadense population was revealed using resequencing.
GWAS and RNA-seq on Gossypium barbadense population identified several candidate genes associated with fiber length, micronaire and elongation. Introduction Cotton ( Gossypium L. ) is one of the world's premier sources of high-quality plant fibers. Its exceptional adaptability and thermal insulation properties make it an ideal choice for textile production (Jiang et al. 2021 ; Li et al. 2019 ; Wang et al. 2021 ). Two primary cultivated tetraploid species ( G. hirsutum , AD 1 and G. barbadense , AD 2 ) are extensively cultivated in tropical and temperate regions. Key producing countries include India, China, the United States, Pakistan, and Brazil (Su et al. 2020 ; Zafar et al. 2021 ; Zaidi et al. 2020 ). Upland cotton ( G. hirsutum ) dominates global cotton production due to its broad environmental adaptability, accounting for 90% of output. In contrast, Sea Island cotton ( G. barbadense ) represents only about 2% of total production because of its distinct regionalism (Sun et al. 2017 ; Wang et al. 2021 ). Sea island cotton, characterized by fine, long, and strong fibers, possesses unique thermal properties akin to those of animal fibers like cashmere, earning it the title "the gem of fibers." Consequently, there is a strong demand to develop high-yield, high-quality Sea Island cotton varieties, driving significant interest in the genetic study of fiber quality and yield-related traits. Meanwhile, the complexity of Sea Island cotton's genetic traits presents enormous challenges. Currently, marker-assisted selection (MAS) and genomic selection (GS) are pivotal techniques for incorporating desirable traits. Notably, even when prediction accuracy is low, GS has proven superior to MAS (Cerrudo et al. 2018 ; Guo et al. 2012). The two tetraploid cotton species ( G. hirsutum and G. barbadense, AD ) arose from the hybridization of the A and D genomes approximately 1–2 million years ago, followed by independent domestication across various regions (Hu et al. 
2019 ; Paterson et al. 2012 ; Wang et al. 2017 ). To date, fifteen genome assemblies have been completed for nine Upland cotton varieties (TM-1, ZM24, NDM8, JBM, Zhongzhimian No. 2, B371, YZ1, Yuanmian11, and CSX8308) (Chen et al. 2020 ; Hu et al. 2019 ; Li et al. 2015 ; Ma et al. 2021 ; Sreedasyam et al. 2024 ; Wang et al. 2019 ; Zhang et al. 2015 ). In contrast, only three varieties of Sea Island cotton have undergone five genome assemblies. The genome assembly of Sea Island cotton (line 3–79) was completed for the first time in 2015 (Yuan et al. 2015 ). Since then, it has been continually updated (Chen et al. 2020 ; Wang et al. 2019 ). Subsequent assemblies of the Hai7124 genome in 2019 (Hu et al. 2019 ) and the Pima90 genome in 2021 (Ma et al. 2021 ) further advanced Sea Island cotton genomics. Research on cotton genomics has rapidly expanded in recent years, including sequencing efforts for nearly 10,000 accessions (Cheng et al. 2024 ; Geng et al. 2020 ; He et al. 2020 ; Huang et al. 2017 ; Li et al. 2020 ; Li et al. 2014 ; Li et al. 2023 ; Li et al. 2020 ; Nie et al. 2020 ; Wang et al. 2023 ). Conversely, Sea Island cotton remains in an early exploratory phase. Its narrow genetic variation and limited sample size have impeded the identification of specific genomic variations (Fang et al. 2021 ; Song et al. 2024 ; Yu et al. 2021). Genome selection (GS) has demonstrated significant potential in reducing breeding costs and enhancing breeding efficiency (Ez et al. 2023 ; Fu et al. 2022 ; Mir et al. 2023 ). It was initially proposed by Meuwissen et al. in 2001 (Meuwissen et al. 2001 ). CIMMYT first implemented GS in maize to develop technical models, evaluate factors influencing prediction accuracy, and establish GS protocols. It was found that GS models incorporating gene-environment interactions substantially improve prediction efficiency for complex traits. High-density markers further enhance the accuracy of phenotypic prediction (Zhang et al. 2015 ). 
Moreover, the optimal GS model depends on the genetic architecture of the target traits (Montesinos-Lopez et al. 2024 ; Velez-Torres et al. 2018 ), and incorporating multi-year data can increase prediction accuracy. To date, GS breeding has been successfully applied in crops such as soybean (Canella et al. 2022 ), wheat (Rabieyan et al. 2022 ), maize (Technow et al. 2013 ), and canola (Werner et al. 2018 ), along with the establishment of several intelligent breeding platforms (Li et al. 2024 ; Xu et al. 2022 ). However, the scarcity of GS in cotton research remarkably impedes improving cotton breeding efficiency. Despite the exceptional fiber quality of Sea Island cotton, which makes it a highly prized resource, its relatively low yield significantly hinders widespread commercialization. Consequently, genomic research on yield, fiber quality, and plant morphological in Sea Island cotton is rare. Fan (Fan et al. 2018 ), utilizing GBS-SNP technology, constructed the first intra-species linkage map for Sea Island cotton (5917 × Pima S-7), spanning 3,076.23 cM with an average marker density of 1.09 cM. This study identified 24 quantitative trait loci (QTLs) related to fiber quality and 18 QTLs associated with yield using 143 recombinant inbred lines (RILs). Su (Su et al. 2020 ) employed the CottonSNP80K array and identified the gene GB_A03G0335 (encoding E3 ubiquitin-protein ligase) as being linked to FL, fiber strength (FS), fiber uniformity (FU), and FE. Yu (Yu et al. 2021) through GWAS of 240 varieties, discovered three genes associated with FS: GB_D11G3437 (encoding casein kinase 1-like protein HD16, involved in regulating flowering time via gibberellin signaling), GB_D11G3460 (encoding a WVD2/WDL family microtubule-associated protein that modulates cortical microtubule orientation), and GB_D11G3471 (encoding tubulin alpha-1 chain (TUBA1), a core component of cytoskeletal microtubules). 
Additionally, two genes related to lint percentage (LP) were identified: GB_A07G1034 (HERK1, a receptor kinase essential for BR-regulated cell elongation) and GB_A13G0822 (GbTCP, which regulates fiber and root hair development through jasmonic acid biosynthesis and other pathways). Zhao (Zhao et al. 2022 ), through GWAS analysis of 336 varieties, identified five genes associated with four traits (Fusarium wilt resistance, FL, FS, and LP), including Gbar_A05G017500 (encoding a PUB4 ubiquitin ligase), Gbar_D11G032670 (encoding HD16 protein), Gbar_A05G014160 (encoding a RING-type zinc finger E3 ubiquitin ligase from the RBR family), Gbar_D03G001430 (encoding a putative ZHD6 protein), and Gbar_D03G001910 (encoding a predicted WAKL14 receptor kinase). Song (Song et al. 2024 ), carried out GWAS on 269 varieties and identified the gene GB_D03G0092 . A comparison between GB_D03G0092 H and GB_D03G0092 B revealed that frameshift mutation caused by 1-bp deletion significantly enhanced fiber quality in Sea Island cotton. This study leveraged GWAS to investigate the genetic associations between phenotypes and genotypes in Sea Island cotton. A comprehensive phenotypic assessment was conducted for 203 Sea Island cotton accessions across 15 traits over five years and four locations. GWAS was performed based on phenotypic data with resequencing results involving yield, fiber quality, and plant morphological. Moreover, RNA sequencing identified six candidate genes. For the first time, GS was applied to evaluate the impact of training population size and marker density on prediction accuracy. These findings offer a valuable reference for the genetic improvement of Sea Island cotton and the efficient breeding of high-quality cotton varieties. 
Materials and Methods Plant Material The study incorporated 203 Sea Island cotton ( Gossypium barbadense ) accessions sourced from the germplasm repository at the College of Agriculture, Xinjiang Agricultural University (Cotton Breeding Center, Ministry of Education). These accessions represent cotton-producing regions worldwide, including China (Xinjiang, the Yangtze River Basin, the Yellow River Basin, and the Pearl River Basin), the United States, the former Soviet Union, Albania, and Egypt, spanning Asia, Europe, Africa, and North America (Table S1). Prior to experimentation, all accessions had undergone multiple generations of self-pollination and exhibited normal growth and development under natural field conditions. Field Trial and Phenotypic Evaluation Field trials were conducted over five years at four locations in Xinjiang (totaling nine environments) to evaluate 15 phenotypic traits (Table S5 Table S6 and Table S7). The four trial sites included southern Xinjiang (S) (Korla and Aral) and northern Xinjiang (N) (Shihezi and Changji) (Table S14). A completely randomized design was employed, with two replicates per accession. Each accession was planted in two rows, with a row length of 2.50 meters, a row spacing of 0.66 meters, and a plant spacing of 0.10 meters. Sowing occurred in mid-April, and harvesting was completed by late October in southern Xinjiang and early to mid-October in northern Xinjiang. At the time of harvest, phenotypic data were collected for the following traits: plant height (PH), fruit-branch number per plant (FBN), effective fruit-branch number per plant (EFBN), boll number per plant (BN), effective boll number per plant (EBN), number of boll drops (NBD), height of first fruit-branch node (HFFBN) and first fruit-branch nodes (FFBN). Twenty naturally opened bolls were harvested for variety testing and flowering evaluation to assess boll weight (BW) and lint percentage (LP). 
Fiber quality was evaluated at the China Colored Cotton Company in Urumqi, utilizing the HFT9000 instrument to measure fiber length (FL), fiber strength (FS), fiber micronaire (MIC), fiber uniformity (FU), and fiber elongation (FE). Data collected over five years from nine distinct environments were statistically analyzed using Excel 2020. The broad sense heritability and the best linear unbiased prediction (BLUP) of breeding value were obtained using R 4.0 (the Matrix and lme4 packages). The BLUP values were categorized into three types: southern Xinjiang BLUP values (S-traits), northern Xinjiang BLUP values (N-traits), and overall environmental BLUP values (Traits). DNA Extraction and Sequencing Seeds were cultivated indoors until the trefoil stage. Then, fresh leaf samples were collected and immediately snap-frozen in liquid nitrogen. Genomic DNA was extracted using a plant DNA extraction kit, verifying the purity and integrity (Song et al. 2024 ). The qualified genomic DNA samples were dispatched to Biomarker Technologies in Beijing for library construction and sequenced on an Illumina HiSeq PE 150 platform. The sequenced reads underwent quality control and filtering for subsequent analysis. Reads containing adapters and exhibiting low quality (single-end reads with more than 10% N bases or having more than 50% of bases with a quality score (Q) below 5) were eliminated (Yu et al. 2021). Sequence Alignment An index file for the reference genome "Hai 7124" was constructed. The qualified sequencing data were aligned to the reference using BWA v2.2.2 (Li and Durbin 2010 ). The Haplotype Caller functionality of SAMtools v1.9, mosdepth v0.3.1, and GATK v3.8 was employed for variant detection and statistical analysis, including metrics such as sample alignment rate, sequencing depth, and genome coverage (Li et al. 2009 ; McKenna et al. 2010 ; Pedersen and Quinlan 2018 ). 
SNP filtering was conducted based on the following criteria: QD < 5.0, MQ 60.0, QUAL < 30.0, MQRankSum < -12.5, and ReadPosRankSum < -8.0. All other parameters were left at their defaults. For InDels, the filtering criteria were QD < 2.0, MQ 100.0, MQRankSum < -10.0, ReadPosRankSum < -10.0, and QUAL < 30.0 (Yu et al. 2021). VCFtools v0.1.13 ( https://vcftools.github.io/examples.html ) was used to retain SNPs with minor allele frequency (MAF) ≥ 0.05 and missing data ≤ 20%, excluding low-quality SNPs from downstream analysis. Genotypes were then imputed with Beagle.

Variant Annotation

The annotation information from the reference genome "Hai7124" was used to annotate SNPs based on their physical positions with SnpEff v3.6c ( https://pcingola.github.io/SnpEff/ ) (Ai et al. 2022 ; Liggett et al. 2022 ). The SNPs were categorized into intergenic, upstream, downstream, exonic, and intronic regions. The exonic SNPs were further divided into start lost, stop gain, stop lost, synonymous stop, synonymous, and nonsynonymous SNPs.

Population Structure and Principal Component Analysis

PLINK v1.90 ( http://www.cog-genomics.org/plink2 ) was used to format the data and select effective SNPs with the following parameters: --indep-pairwise with a window size of 50, step size of 50, and r² threshold of 0.2. Admixture v1.30 ( http://software.genetics.ucla.edu/admixture/ ) was employed for population structure analysis, with a convergence threshold (C) of 0.01 and five-fold cross-validation (Song et al. 2024 ). Principal component analysis (PCA) was performed using GCTA 1.92.4, and significance was calculated using the twstats function in EIG 6.1.4 (Yu et al. 2021).

Linkage Disequilibrium Analysis

Linkage disequilibrium (LD) analysis for the Sea Island cotton population and its subgroups was performed using PLINK v1.90 (r² values were used to assess LD).
The parameters were set as follows: --ld-window 999999 --ld-window-kb 2000 --ld-window-r2 0. Statistical analysis of the results was performed using a Perl script.

Phylogenetic Tree Construction and Genetic Diversity Analysis

Based on 2,718,759 high-quality SNP markers covering the entire genome of Sea Island cotton, a neighbor-joining (NJ) phylogenetic tree was constructed using VCF2Dis v1.50 ( https://github.com/BGI-shenzhen/VCF2Dis ) (Zhang et al. 2024 ). Nucleotide diversity (θπ) was computed using VCFtools 0.1.13 with a window size of 100 kb to evaluate the genetic diversity of the Sea Island cotton population (Yu et al. 2021).

GWAS

To mitigate environmental effects on the association analysis, BLUP values were calculated, resulting in three datasets: S-BLUP, N-BLUP, and BLUP, which were subsequently used for GWAS. GEMMA 0.98.5 ( http://www.xzlab.org/software.html ) was employed to calculate the standardized kinship matrix with the following parameters: -bfile gene -gk 2 -o kin. Significant principal components were incorporated as covariates for GWAS to correct for the influence of population structure on the results. The parameters were -bfile gene -lmm 1 -n Traits -c PC -k kin -o (where "gene" denotes the genotype file, "Traits" represents the column numbers of the traits, "PC" refers to the principal component covariate file, and "kin" indicates the kinship matrix).

Transcriptome Data Analysis

Transcriptome data for various tissues, ovules, and fiber developmental stages of the Sea Island cotton 'Hai7124' (reference genome) were acquired, in addition to transcriptome data for 'XH58' (FL_Long) and 'Ashi' (FL_Short) at different fiber developmental stages (accession numbers: PRJNA490626 and GSE184965). The data were converted to the fastq format, followed by quality control, filtering, and alignment to calculate FPKM values (Pertea et al. 2016 ; Song et al. 2024 ).
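For readers unfamiliar with the unit, FPKM scales a gene's fragment count by transcript length and sequencing depth. The sketch below assumes the standard definition; the function and argument names are illustrative and are not taken from the study's pipeline.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    Standard definition (an assumption here, not quoted from the paper):
    FPKM = fragments * 1e9 / (gene_length_bp * total_mapped_fragments)
    """
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical gene: 500 fragments, 2 kb long, 20 million mapped fragments.
    print(fpkm(500, 2000, 20_000_000))
```

Because the value is normalized by both length and depth, genes of different sizes and libraries of different depths become roughly comparable within an experiment.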
Expression Analysis

Expression analysis was performed at various stages of boll growth (0, 5, 10, 15, 20, 25, and 30 days post-anthesis (DPA)) using contrasting accessions harvested from the bolls. For FL, So717 (long fiber length) and Ashi (short fiber length) were selected. For MIC, Ashi (high fiber micronaire) and Tu79-713 (low fiber micronaire) were chosen. For FE, 65-3049-6 (high fiber elongation) and XH25 (low fiber elongation) were used. Total RNA was extracted, followed by reverse transcription quantitative PCR (qRT-PCR) analysis. GbUBQ7 served as the internal reference gene. Each sample had three biological replicates with technical replicates (Table S15).

Key Gene Selection

The association analysis results underwent Bonferroni correction, with a significance threshold of -log10(P) > 5.4. Key genes were selected based on their consistent identification across at least two environments, accompanied by prominent peaks. Gene annotation for selected regions was conducted using ANNOVAR v1.0.0 (Wang et al. 2010 ), filtering for nonsynonymous SNPs (those resulting in amino acid changes). Finally, critical genes of interest were identified by integrating expression profiling, transcriptome data, phenotypic traits, and quantitative PCR analysis.

GS

PLINK v1.90 was used to filter out redundant SNPs, yielding 268,995 effective markers. R 4.3.1 with the rrBLUP package was employed for GS (Endelman 2011 ). A five-fold cross-validation approach was implemented to evaluate the influence of training population size and marker number on the prediction accuracy of GS. Eighty percent of the samples were randomly designated as the training population, and the remaining twenty percent served as the prediction population. This process was repeated 100 times to increase the likelihood that every sample was included, providing a comprehensive assessment of the factors affecting prediction accuracy.
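The 80/20 resampling scheme described above can be sketched in Python. This is a simplified illustration, not the authors' R code: it swaps in ridge regression with a fixed penalty `lam` for rrBLUP's REML-estimated variance components, and all function names, the genotype coding, and the default values are assumptions made for the example.

```python
import numpy as np


def ridge_marker_effects(Z: np.ndarray, y: np.ndarray, lam: float):
    """Ridge estimate of marker effects on mean-centered markers.

    Uses the n x n dual form u = Z'(ZZ' + lam*I)^-1 (y - mean), which is
    cheaper than the p x p form when markers far outnumber samples.
    """
    col_means = Z.mean(axis=0)
    Zc = Z - col_means
    mu = y.mean()
    alpha = np.linalg.solve(Zc @ Zc.T + lam * np.eye(Zc.shape[0]), y - mu)
    return mu, col_means, Zc.T @ alpha


def repeated_holdout_accuracy(Z, y, train_frac=0.8, n_repeats=100, lam=1.0, seed=0):
    """Mean Pearson correlation between predicted and observed phenotypes
    over repeated random train/prediction splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    accuracies = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        mu, col_means, u = ridge_marker_effects(Z[train], y[train], lam)
        predicted = mu + (Z[test] - col_means) @ u
        accuracies.append(np.corrcoef(predicted, y[test])[0, 1])
    return float(np.mean(accuracies))


if __name__ == "__main__":
    # Simulated data only: 203 accessions, 500 markers coded -1/0/1.
    rng = np.random.default_rng(1)
    Z = rng.integers(0, 3, size=(203, 500)).astype(float) - 1.0
    y = Z @ rng.normal(size=500) + rng.normal(size=203)
    print(repeated_holdout_accuracy(Z, y, n_repeats=20))
```

Varying `train_frac`, or passing a random subset of the columns of `Z`, gives the same kind of sensitivity analysis for training population size and marker number.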
Moreover, the effects of different training population ratios on GS prediction accuracy were investigated. The training population was randomly selected at proportions from 10% to 90% (in 10% increments), and the remaining varieties served as the prediction population; each setting was repeated 100 times. Additionally, the impact of marker number on prediction accuracy was examined. The marker count was set at 10, 50, 100, 500, 1,000, 5,000, 10,000, and 50,000, with markers drawn at random and each setting again repeated 100 times.

Results

Genomic Variability and Population Structure of Sea Island Cotton

A total of 5.4 Tb of sequencing data was generated, with a Q30 score of 93.31%. The average alignment rate of the Sea Island cotton population to the reference genome was 97.33%, with an average coverage depth of 11.02× and a coverage ratio of 97.96% (at least one base covered) (Table S1). This study identified 2,718,759 high-quality SNPs (Table S2 and Table 1 , Fig. S1) and 353,901 high-quality InDels (Table S3, Fig. S2), which were unevenly distributed across the 26 chromosomes of Sea Island cotton. Specifically, 1,633,794 SNPs were located in the At subgenome (60.10%), approximately 1.5 times the number in the Dt subgenome (1,084,965 SNPs, 39.90%). This is consistent with previous findings that the At subgenome is approximately twice as large as the Dt subgenome (Li et al. 2014 ; Zhao et al. 2022 ). SNP annotation revealed that most SNPs lay in intergenic regions, which accounted for 71.11% of all SNPs. Intronic regions accounted for only 4.57%. Exonic regions contained 54,028 SNPs, merely 1.99% of the total. Of these, 33,793 were nonsynonymous mutations.
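The counts reported in this subsection and in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers given in the text; the variable names are illustrative.

```python
# SNP counts as reported in the text and in Table 1.
total_snps = 2_718_759
at_snps, dt_snps = 1_633_794, 1_084_965
intergenic, intronic = 1_933_286, 124_383
exonic_classes = {
    "start lost": 71,
    "stop gain": 794,
    "stop lost": 230,
    "synonymous": 19_099,
    "nonsynonymous": 33_793,
    "synonymous stop": 41,
}


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


exonic_total = sum(exonic_classes.values())

if __name__ == "__main__":
    print("At + Dt equals total:", at_snps + dt_snps == total_snps)
    print("At/Dt ratio:", round(at_snps / dt_snps, 2))
    print("Exonic SNPs:", exonic_total)
    for name, count in [("intergenic", intergenic),
                        ("intronic", intronic),
                        ("exonic", exonic_total)]:
        print(name, round(percent(count, total_snps), 2), "%")
```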
Table 1 Summary of SNP information for Sea Island cotton

Region       Category          SNPs
Downstream                     262,205
Upstream                       342,830
Intergenic                     1,933,286
Intronic                       124,383
Splice                         2,026
5' UTR                         1
Exonic       Start lost        71
Exonic       Stop gain         794
Exonic       Stop lost         230
Exonic       Synonymous        19,099
Exonic       Nonsynonymous     33,793
Exonic       Synonymous stop   41
Total                          2,718,759

A neighbor-joining (NJ) phylogenetic tree was constructed (Fig. 1 a), along with population structure analysis (Fig. 1 e) and PCA (Fig. 1 b and Table S4), to elucidate the evolutionary relationships among the 203 Sea Island cotton varieties. The Sea Island cotton population was classified into three groups. G1 comprised 134 varieties with diverse origins. Modern Sea Island cotton varieties were dominant, including 37 from the former Soviet Union, 26 from Xinjiang, 45 from the Yangtze-Huanghe and Pearl River basins, six from the United States, 16 from Egypt, two from Albania, and two from unidentified sources. G2 consisted of 42 varieties derived from early materials in Xinjiang and the former Soviet Union. G3 had 27 varieties, mainly from early materials in the United States. The average nucleotide diversity (θπ) across all varieties was 3.96 × 10⁻⁴, with G1 at 3.85 × 10⁻⁴, G2 at 2.95 × 10⁻⁴, and G3 at 2.53 × 10⁻⁴, suggesting a high level of diversity in modern varieties (Fig. 1 c). The FST values calculated between subgroups were 0.110, 0.108, and 0.154, revealing a greater genetic differentiation between G2 and G3 (Fig. 1 c). LD analysis indicated that the r² value declined to 0.5 at a distance of 442 kb for the overall population, which is considerably shorter than the distances reported by Song (Song et al. 2024 ) (2000 kb) and Yu (Yu et al. 2021) (1000 kb), but greater than the 388 kb observed by Zhao (Zhao et al. 2022 ) (half of the maximum value). The decay distances for G3 and G2 were greater, measuring 1271 kb and 922 kb, respectively.
G1 and the overall population exhibited a consistent decay trend with a distance of 344 kb (Fig. 1 d).

Phenotypic Analysis

BLUP values for 15 traits were calculated separately for the northern and southern regions of Xinjiang. The northern region exhibited significantly higher variability than the southern region (Table S5 and Table S6). A t-test on traits from both regions revealed that PH, FBN, EFBN, BN, EBN, BW, HFFBN, and NBD were significantly lower in southern Xinjiang ( P < 0.001). In contrast, LP, FFBN, FL, FS, and FU were significantly higher in the southern region ( P < 0.001). No significant differences were observed in MIC and FE (Fig. 2 , Table S5 and S6). These results indicate that Sea Island cotton grown in southern Xinjiang produces superior fiber with a higher lint percentage, making the region better suited to its cultivation. To further elucidate the relationships among traits, breeding values (BLUP) were computed across the nine environments. Significant positive correlations were found between LP and several traits, including PH, FBN, EFBN, BN, EBN, BW, NBD, HFFBN, FFBN, FL, FS, and FU. Conversely, LP exhibited a significant negative correlation with FE. Furthermore, MIC was positively correlated with FL, FS, and FU, and FL was positively correlated with FS (Fig. S3). Broad-sense heritability was high for fiber quality traits and low for plant morphological traits (Table S7).

Association Analysis

GWAS was conducted on three datasets: S-BLUP, N-BLUP, and BLUP. The results were subjected to Bonferroni correction, with a significance threshold of -log10( P ) > 5.4. Loci were retained only if they were detected in at least two environments with distinct co-located peaks. A total of 26 significant SNPs were associated with yield, and 216 significant SNPs were linked to fiber quality. Notably, the At and Dt subgenomes contained 192 and 50 significant signals, respectively (Table S8).
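To make the significance threshold concrete, the sketch below converts between a Bonferroni-style cutoff and the -log10 scale. The source does not state which α or marker count produced 5.4; the pairing with the 268,995 effective markers reported in the Methods is an assumption used only to illustrate the arithmetic, and the function names are hypothetical.

```python
import math


def bonferroni_neg_log10(alpha: float, n_tests: int) -> float:
    """-log10 of the Bonferroni-adjusted P-value cutoff alpha / n_tests."""
    return -math.log10(alpha / n_tests)


def p_from_neg_log10(score: float) -> float:
    """P-value corresponding to a -log10(P) score."""
    return 10.0 ** (-score)


if __name__ == "__main__":
    n_markers = 268_995  # assumed number of independent tests
    print("cutoff for 1/n:   ", round(bonferroni_neg_log10(1.0, n_markers), 2))
    print("cutoff for 0.05/n:", round(bonferroni_neg_log10(0.05, n_markers), 2))
    print("P at -log10(P) = 5.4:", p_from_neg_log10(5.4))
```

Under this assumption, a cutoff of 1/n sits close to the reported 5.4, whereas 0.05/n would be more stringent.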
An LD threshold of 500 kb (slightly larger than the 442 kb LD decay distance) (Zhao et al. 2022 ) was used to screen selected regions. For PH, two selected regions were identified at A02: 3,129,702–3,129,709 and D09: 3,799,753–3,980,726. For FBN, two selected regions were identified at D07: 19,238,177–20,094,267 and D07: 25,323,417–28,205,761. The selected region for LP was located at D05: 35,622,366–36,321,572. Significant selected regions for FL were identified at A05: 16,864,112–18,143,374 and A06: 4,479,110–6,077,420. For FS, significant selected regions were found at four locations: A02: 50,299,487–50,299,496, D01: 6,732,815–6,732,837, D05: 62,911,399–63,597,521, and D13: 50,210,965–50,211,014. The selected region for MIC was located at A05: 16,903,207–18,187,089, and that for FE at A05: 15,688,837–16,889,136 (Table S8). Notably, the selected regions for FL and MIC overlapped considerably on chromosome A05. Meanwhile, the selected region for FE at A05: 15,688,837–16,889,136 partially overlapped with that of FL and lay approximately 14 kb from the selected region of MIC (Table S8). Zhao (Zhao et al. 2022 ) identified an FL-associated gene ( Gbar_A05G017250/GB_A05G1753 ) situated within the FE region identified in this study. The FL region reported by Zhao overlapped with the regions identified for FL, FE, and MIC in this study. Furthermore, Su (Su et al. 2020 ) reported one quantitative trait nucleotide (QTN) each in the FL and FE regions (TM10754 and TM10723), which also partially overlapped with the FL, FE, and MIC regions identified in this research (Table S8). A total of 242 related genes were annotated within the candidate regions, 153 of which carried nonsynonymous mutations leading to amino acid changes. Subsequent expression profiling, transcriptome analysis, and qRT-PCR narrowed these 153 genes down to six target genes.
Fiber length

GB_A05G1764 is a homolog of the Arabidopsis gene AT4G05530 , which encodes a peroxisomal member of the short-chain dehydrogenase family (IBR1). IBR1 serves as a catalyst for the deoxidation process involved in the conversion of indole-3-butyric acid (IBA) to indole-3-acetic acid (IAA). Additionally, IAA derived from IBA facilitates the expansion of root hairs and cotyledon cells during the developmental stages of Arabidopsis seedlings (Spiess et al. 2014 ; Strader et al. 2010 ). Zhao (Zhao et al. 2021 ) showed that IBR1-mediated IBA-to-IAA conversion can promote root hair elongation in Arabidopsis . In the genomic region spanning 16.88–16.90 Mb on chromosome A05, two nonsynonymous SNPs were identified (Fig. 3 a, c). The first (A/T) results in a codon substitution of aspartic acid with lysine. The second (T/A) leads to the replacement of lysine with methionine, thereby influencing FL (Fig. 3 b, f). Most early varieties (G2 and G3) predominantly harbor the haplotype AT (long fiber length). In contrast, modern varieties (G1) exhibit a significant increase in the haplotype TA (short fiber length) (Fig. 3 d). This shift may be attributed to targeted selection in contemporary Sea Island cotton breeding, where linkage between traits altered haplotype proportions. Expression profiling revealed that GB_A05G1764 was expressed at low levels during the rapid elongation phase of ovule and fiber (Fig. 3 e). Transcriptomic analysis further demonstrated significant differential expression between short fiber length (Ashi) and long fiber length (So717) varieties in this period, with short fiber length varieties exhibiting higher expression levels (Fig. 3 g). qRT-PCR assays conducted on fibers from long fiber length (So717) and short fiber length (Ashi) varieties at 0–30 days post-anthesis (DPA) corroborated the transcriptomic findings (Fig. 3 h). It can be inferred that GB_A05G1764 mediates the negative regulatory effect of IBR1 on FL.
GB_A05G1761 encodes a carboxylesterase. Within the genomic region of 16.86–16.88 Mb on chromosome A05, one nonsynonymous SNP was identified (Fig. S4a, c). The A (long fiber length) to T (short fiber length) substitution results in a codon change from threonine to serine, which also influences FL (Fig. S4b, f). Most early varieties (G2 and G3) have a higher proportion of the haplotype (A), whereas modern varieties show a marked increase in the proportion of the haplotype (T) (Fig. S4d). Expression profiling showed that GB_A05G1761 was significantly upregulated during the phases of FE and secondary wall synthesis, notably higher than those in vegetative organs, floral organs, and ovules (Fig. S4e). This indicates that GB_A05G1761 mainly affects fiber development. Transcriptomic analysis revealed a rapid increase in the expression of GB_A05G1761 in fibers during the 5–10 DPA period, followed by a gradual decrease in expression from 10–25 DPA, with long fiber varieties displaying consistently higher expression levels (Fig. S4g). qRT-PCR experiments performed on fibers from long fiber length (So717) and short fiber length (Ashi) varieties at 0–30 DPA yielded results consistent with the transcriptomic data (Fig. S4h). Thus, it can be concluded that GB_A05G1761 positively regulates fiber elongation. Fiber micronaire GB_A05G1895 encodes a protein from the abscisic acid-responsive family (TB2/DP1, HVA22). Abscisic acid (ABA) is critical for regulating plant growth and senescence. It plays a significant role in cotton fiber development and has a negative correlation with fiber elongation (Dou et al. 2022 ; S. H. Dasani 2006 ; Yang et al. 2023 ). Within the region spanning 18.0999 to 18.1015 Mb on chromosome A05, a nonsynonymous SNP was identified (Fig. 4 a, b). The transition from T (low fiber micronaire) to A (high fiber micronaire) alters phenylalanine to tyrosine. 
Most early cotton varieties carry haplotype T, while modern varieties show a significant increase in the proportion of haplotype A. This may be attributed to directional selection for fiber micronaire in contemporary Sea Island cotton breeding (Fig. 4b, f). Expression profile analysis showed high expression levels of GB_A05G1895 in ovules (5–10 DPA) and fibers (10 DPA), suggesting a primary role in fiber development and reproductive processes in Sea Island cotton (Fig. 4e). qRT-PCR experiments conducted on fibers from the high-micronaire (Ashi) and low-micronaire (Tu79-713) varieties during the 0–30 DPA period revealed a consistent increase in relative expression during the rapid elongation phase (10–20 DPA), peaking at 20 DPA. Notably, relative expression in Ashi was significantly higher than in Tu79-713 (Fig. 4g). Therefore, it can be inferred that GB_A05G1895 positively regulates MIC in response to ABA.

GB_A05G1771 encodes an early nodulin-like protein (ENOD), which is associated with the differentiation of specialized sieve tube cells and the regulation of cellular dimensions. The ENOD40 gene has been shown to reduce cell size, and nodulin-like proteins in species such as watermelon and tomato promote fruit development and maturation (M et al. 2005; Wechter et al. 2008). Within the region from 16.92 to 17.00 Mb on chromosome A05, two nonsynonymous SNPs were identified (Fig. S5a, c). The first SNP (T/G) leads to a codon change from valine to glycine, while the second SNP (A/G) results in a substitution from lysine to arginine, thereby affecting fiber micronaire (Fig. S5b, f). Most early varieties primarily possess haplotype TA (low fiber micronaire), whereas modern varieties exhibit a significant increase in haplotype GG (high fiber micronaire), in line with the emphasis on MIC in recent breeding efforts for Sea Island cotton.
Expression profiling showed that GB_A05G1771 was highly expressed in ovules (5–20 DPA) and fibers (10–20 DPA), particularly in fibers (Fig. S5e), implying a predominant influence on fiber development. qRT-PCR experiments on fibers from the high-micronaire (Ashi) and low-micronaire (Tu79-713) varieties during the 0–30 DPA period demonstrated that relative expression in Ashi was significantly higher than in Tu79-713 (Fig. S5g). Consequently, it is concluded that GB_A05G1771 positively influences fiber micronaire by regulating cell size and fiber maturation.

Fiber elongation

GB_A05G1702 belongs to the structural protein family known as the NAD(P)-binding Rossmann-fold superfamily (BAN). BAN is indirectly related to the dynamics of flax cell walls, playing a critical role in fiber morphology and mechanical properties (Chabi et al. 2023). Within the region spanning 16.3 to 16.5 Mb on chromosome A05 (Fig. 5a, c), two nonsynonymous SNPs were identified (Fig. 5b, f). The first SNP (G/A) results in a codon change from threonine to isoleucine, while the second SNP (C/A) causes a substitution from alanine to serine, thereby altering FE. The proportions of haplotypes GC (high fiber elongation) and AA (low fiber elongation) are approximately equal (1:1) in both early and modern varieties, indicating limited selection pressure on fiber elongation in recent breeding practices (Fig. 5d). Expression analysis demonstrated that GB_A05G1702 was highly expressed in ovules (3–10 DPA) and fibers (10 DPA), at levels significantly higher than in vegetative and floral organs (Fig. 5e). This suggests a primary regulatory role in fiber growth and development.
qRT-PCR experiments conducted on fibers from the high-elongation (65-3049-6) and low-elongation (XH25) varieties during the 0–30 DPA period indicated a continuous decrease in the relative expression of GB_A05G1702 from 10 to 30 DPA, with significantly higher expression in 65-3049-6 than in XH25 (Fig. 5g). Thus, it is inferred that GB_A05G1702 positively influences FE through indirect effects on cell wall structure.

GB_A05G1707 encodes a basic helix-loop-helix (bHLH) transcription factor, which is implicated in brassinosteroid (BR) signaling during the development of cotton fibers (Lu et al. 2022; Lu et al. 2018; Wang et al. 2020). Within the region of chromosome A05 spanning 16.3 to 16.5 Mb, a nonsynonymous SNP was identified (Fig. S6a, c). The transition from G (high fiber elongation) to A (low fiber elongation) changes the encoded amino acid from alanine to threonine, thereby affecting fiber elongation (Fig. S6b, f). The haplotype distribution (G/A) in both early and modern varieties is approximately 1:1 (Fig. S6d), suggesting a potential linkage to the relevant traits. Expression profiling showed low expression levels of GB_A05G1707 in ovules and during the rapid elongation phase of fibers (Fig. S6e). The elevated expression in fibers at 20 DPA suggests a predominant role in fiber development. qRT-PCR analyses of fibers from the high-elongation (65-3049-6) and low-elongation (XH25) varieties during the 0–30 DPA period revealed a consistent decrease in relative expression from 5 to 10 DPA, followed by a progressive increase from 15 to 25 DPA, peaking at 25 DPA. Notably, relative expression in 65-3049-6 was significantly higher than in XH25 (Fig. S6g). Thus, it is inferred that GB_A05G1707 positively influences FE by indirectly affecting BR signaling.
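The haplotype comparisons reported for each candidate gene rest on two simple tabulations: the proportion of each haplotype within a breeding period, and the mean trait value per haplotype. The sketch below illustrates both under stated assumptions. The group labels, haplotype strings, and fiber length values are hypothetical toy data, not values from this study, and the function names are ours.

```python
from collections import Counter, defaultdict


def haplotype_proportions(records):
    """Return {group: {haplotype: proportion}} from (group, haplotype, phenotype) records."""
    counts = defaultdict(Counter)
    for group, haplotype, _ in records:
        counts[group][haplotype] += 1
    return {
        group: {hap: n / sum(c.values()) for hap, n in c.items()}
        for group, c in counts.items()
    }


def mean_phenotype_by_haplotype(records):
    """Return {haplotype: mean phenotype}, pooled over all groups."""
    values = defaultdict(list)
    for _, haplotype, phenotype in records:
        values[haplotype].append(phenotype)
    return {hap: sum(v) / len(v) for hap, v in values.items()}


# Hypothetical toy records: (breeding period, two-SNP haplotype, fiber length in mm).
toy = [
    ("early", "AT", 36.0), ("early", "AT", 35.0), ("early", "TA", 33.0),
    ("modern", "TA", 32.0), ("modern", "TA", 34.0),
    ("modern", "AT", 36.0), ("modern", "TA", 33.0),
]

props = haplotype_proportions(toy)
means = mean_phenotype_by_haplotype(toy)
print(props)
print(means)
```

A formal comparison between haplotype groups would add a significance test on the per-haplotype phenotype values; that step is omitted here.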
Genome-wide selection

The Impact of Training Population Proportions on Prediction Accuracy

Various proportions of the training population (10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%) were each evaluated over 100 iterations to assess their effect on prediction accuracy. The mean across iterations was taken as the prediction accuracy for each trait. At a training population proportion of 10%, prediction accuracy was relatively low for all Sea Island cotton phenotypes: PH 0.68, FBN 0.49, EFBN 0.54, BN 0.34, EBN 0.22, BW 0.20, LP 0.39, NBD 0.12, HFFBN 0.30, FFBN 0.18, FL 0.49, FS 0.75, MIC 0.28, FU 0.50, and FE 0.60. As the proportion increased, prediction accuracy gradually improved until it stabilized. The proportions at which accuracy stabilized, determined by t-test as the optimal prediction ratios, were PH 50%, FBN 60%, EFBN 50%, BN 60%, EBN 50%, BW 50%, LP 60%, NBD 90%, HFFBN 70%, FFBN 80%, FL 50%, FS 60%, MIC 50%, FU 60%, and FE 60% (Fig. 6). For traits exhibiting high heritability (PH, FBN, EFBN, BN, FL, FS, MIC, FU, and FE), the optimal training population proportion was between 50% and 60%, yielding higher prediction accuracy. For traits with moderate heritability (NBD, HFFBN, and FFBN), the optimal training population proportion ranged from 70% to 90%, with lower prediction accuracy.

The Effect of Varying Marker Quantities on Prediction Accuracy

Marker subsets of various sizes (10, 50, 100, 500, 1,000, 5,000, 10,000, and 50,000) were randomly drawn from the 268,995 SNPs to assess the influence of marker quantity on prediction accuracy. Each subset size was evaluated over 100 iterations, and the mean was taken as the prediction accuracy.
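The two resampling schemes described above (varying the training proportion and varying the number of randomly drawn markers, each repeated over many iterations) can be sketched as follows. This is a minimal illustration under several assumptions, not the pipeline used in this study. Ridge regression with a fixed penalty `lam` stands in for rrBLUP, which would normally estimate the penalty from the data. Prediction accuracy is assumed to be the Pearson correlation between predicted and observed phenotypes in the held-out lines. The genotypes and phenotypes are simulated, and all function names are ours.

```python
import numpy as np


def ridge_predict(Z_train, y_train, Z_test, lam):
    """Predict phenotypes from markers by ridge regression with a fixed penalty."""
    mu = y_train.mean()
    col_means = Z_train.mean(axis=0)
    Zt = Z_train - col_means
    # Dual form u = Z'(ZZ' + lam*I)^-1 (y - mu): cheap when markers outnumber lines.
    K = Zt @ Zt.T + lam * np.eye(Zt.shape[0])
    u = Zt.T @ np.linalg.solve(K, y_train - mu)
    return mu + (Z_test - col_means) @ u


def prediction_accuracy(Z, y, train_frac, n_markers, n_iter=100, lam=1.0, seed=0):
    """Mean and SD of the predicted-vs-observed correlation over random splits."""
    rng = np.random.default_rng(seed)
    n, m = Z.shape
    n_train = max(2, int(round(train_frac * n)))
    accs = []
    for _ in range(n_iter):
        idx = rng.permutation(n)
        train, test = idx[:n_train], idx[n_train:]
        cols = rng.choice(m, size=min(n_markers, m), replace=False)
        y_hat = ridge_predict(Z[np.ix_(train, cols)], y[train],
                              Z[np.ix_(test, cols)], lam)
        accs.append(np.corrcoef(y_hat, y[test])[0, 1])
    return float(np.mean(accs)), float(np.std(accs))


# Simulated data (assumption): 203 lines, 2,000 markers coded -1/0/1.
rng = np.random.default_rng(1)
Z = rng.integers(0, 3, size=(203, 2000)).astype(float) - 1.0
beta = rng.normal(0.0, 0.05, size=2000)
y = Z @ beta + rng.normal(0.0, 1.0, size=203)

# lam = 400 is roughly sigma_e^2 / sigma_u^2 for this simulation.
for frac in (0.1, 0.5, 0.9):
    mean_acc, sd_acc = prediction_accuracy(Z, y, frac, n_markers=2000,
                                           n_iter=20, lam=400.0)
    print(f"training proportion {frac:.0%}: accuracy {mean_acc:.2f} (SD {sd_acc:.2f})")
```

Fixing `train_frac` and looping over `n_markers` instead gives the marker-quantity experiment. Simulated markers are independent, so the accuracies printed here say nothing about the values reported for the real panel.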
When the number of markers was set to 10, the standard deviation was considerable and prediction accuracy was low for all Sea Island cotton phenotypes: PH 0.33, FBN 0.28, EFBN 0.27, BN 0.19, EBN 0.13, BW 0.16, LP 0.23, NBD 0.10, HFFBN 0.18, FFBN 0.12, FL 0.25, FS 0.37, MIC 0.15, FU 0.26, and FE 0.27. As the number of markers increased, the standard deviation gradually diminished, and prediction accuracy steadily improved until it stabilized. For all traits, stability was observed at a marker quantity of 5,000, where the standard deviation was relatively small (Fig. 7). High-heritability traits (PH, FBN, EFBN, BN, FL, FS, MIC, FU, and FE) showed high prediction accuracy, while traits with moderate heritability (NBD, HFFBN, and FFBN) showed low prediction accuracy.

Discussion

The Influence of Genotype on Phenotypic Diversity

Cotton is one of the primary renewable sources of natural fiber and plays a pivotal role in the textile industry. GS breeding has the potential to expedite the breeding process while reducing costs, and it has been rapidly adopted in maize and wheat. However, its application in cotton remains largely unexplored. In 2015, the genomes of both Sea Island cotton and Upland cotton were successfully assembled (Yuan et al. 2015; Zhang et al. 2015). Whole-genome resequencing has since been extensively employed for high-density mapping (Geng et al. 2020; Song et al. 2024; Yu et al. 2021). Resequencing has also been used for GWAS in cotton; however, only four publications specifically address Sea Island cotton (Table S9) (Fang et al. 2021; Song et al. 2024; Yu et al. 2021; Zhao et al. 2022). This study resequenced 203 Sea Island cotton accessions at 11.02× coverage.
Although some preliminary findings have been published (Fang et al. 2021; Song et al. 2024; Yu et al. 2021; Zhao et al. 2022), the majority of the materials in this study (71.92%) had not previously been sequenced (Table S10 and Table S11). Fifteen traits were assessed across nine environments (comprising various years and locations); of these, EFBN, EBN, and NBD had not been investigated in previous Sea Island cotton studies. The findings differed slightly from earlier phenotypic research (Table S12) (Fang et al. 2021; Song et al. 2024; Yu et al. 2021; Zhao et al. 2022). Notably, the classification of Sea Island cotton based on geographic origin proved to be more ambiguous than that of other crops (Huang et al. 2012; Qu et al. 2022; Zhou et al. 2021), with patterns of genetic diversity reflecting differing breeding histories. This study identified 26 significant SNPs associated with yield and 216 significant SNPs linked to fiber quality across two or more environments. However, no significant SNPs were identified for plant morphological traits (Table S8). This suggests that future studies on Sea Island cotton should incorporate larger populations. The analysis revealed 192 significant signals in the At subgenome and 50 in the Dt subgenome. This diverges from previous research, highlighting that variations in population composition can significantly influence mapping results (Ma et al. 2018; Zhao et al. 2022).

The Impact of the A05: 15,688,837–18,187,089 Region on Fiber Development in Sea Island Cotton

The regions associated with FL and MIC exhibited a high degree of overlap, with 134 shared genes identified. The FL region also partially overlapped with the FE region, sharing six common genes. Both the FL and MIC regions showed some overlap with regions reported in prior research (Table S13). Notably, the FL genes GB_A05G1761 and GB_A05G1764 were situated within the FL region identified by Zhao (Zhao et al.
2022) (A05: 15,773,942–16,773,942, 3-79), encompassing 33 shared genes. These genes additionally overlapped with the FL and FE regions identified by Su et al. (2020) (A05: 16,354,758–17,099,948, Hai7124), which include 40 shared genes. The MIC gene GB_A05G1771 was located within the FL region previously reported by Zhao et al. (2022), which contains 27 shared genes. It was also situated in the FL and FE regions described by Su et al. (2020), which comprise 34 shared genes (Table S13). The FE genes GB_A05G1702 and GB_A05G1707 were found within the FL region identified by Zhao et al. (2022) (A05: 15,773,942–16,773,942, 3-79), including 60 shared genes. They also overlapped with the FE and FL regions reported by Su et al. (2020), sharing 64 genes (Table S13). In conclusion, chromosome A05 plays a critical role in fiber development in Sea Island cotton, and the identified region A05: 15,688,837–18,187,089 is a key interval for fiber development. In addition, two further genes on chromosome A05 may be associated with FE in Sea Island cotton. GB_A05G1840 encodes SKU5, a protein involved in sucrose transport in plants. The accumulation of sucrose promotes cellulose synthesis. Moreover, SKU5 mediates the inhibitory effect of ABA on cotton FE (Beasley and Ting 1973; J. et al. 1998; S. H. Dasani 2006). GB_A05G1829 encodes BR-signaling kinase 1 (BSK1), which plays a role in responding to BR signaling during cotton fiber development (Lu et al. 2022; Wang et al. 2020). The identification of these genes will provide valuable insights for molecular breeding in cotton.

Marker density and predicted population size affect prediction accuracy

GS represents one of the most efficient breeding methodologies available today. In contrast to phenotype-based selection, GS significantly reduces both breeding cycles and costs while maintaining comparable selection gains (Beyene et al. 2019).
Furthermore, enhancing selection intensity is a critical strategy for accelerating the breeding process and improving genetic gain without substantially increasing the scale of breeding operations (Bangera et al. 2017; Li et al. 2024; Yang et al. 2020). Accurate phenotypic estimation is pivotal for GS, and longitudinal data can further enhance prediction accuracy (Wang et al. 2020). Additionally, prediction accuracy is influenced by factors such as the genetic architecture of target traits (Velez-Torres et al. 2018; Zhang et al. 2017), statistical modeling approaches (Wang et al. 2023), marker density (Zhang et al. 2015), population size (Combs and Bernardo 2013), and SNPs associated with the target traits (Zhang et al. 2017; Zhang et al. 2015b). This study assessed the impact of training population size and marker quantity on prediction accuracy. As both the number of markers and the proportion of the training population increased, prediction accuracy improved and then stabilized. With a training population proportion between 50% and 60% and a marker count of 5,000, traits with high heritability exhibited high prediction accuracy (Cui et al. 2020; Hu et al. 2022; Lan et al. 2020). However, the heritability estimates for EBN and BW were inconsistent with their respective prediction accuracy values. The rrBLUP model assumes that marker effects are normally distributed with homogeneous variance; the choice of appropriate statistical models and population sizes is therefore critical for accurately predicting EBN and BW (Zhang et al. 2017). When the training population proportion was between 70% and 90% and the marker count reached 5,000, traits with moderate heritability still had low prediction accuracy, potentially attributable to population LD decay and variations in population composition (Guo et al. 2020).
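For reference, the rrBLUP assumption mentioned above is usually written as the following mixed model (see Endelman 2011). The notation is ours and is given only as a sketch: y is the vector of phenotypes, Z the marker genotype matrix, u the vector of marker effects, and lambda the ratio of residual to marker-effect variance.

```latex
\[
\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon},
\qquad
\mathbf{u} \sim N(\mathbf{0}, \mathbf{I}\sigma_u^2),
\qquad
\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \mathbf{I}\sigma_e^2)
\]
\[
\hat{\mathbf{u}} = \mathbf{Z}^{\top}\left(\mathbf{Z}\mathbf{Z}^{\top} + \lambda\mathbf{I}\right)^{-1}\left(\mathbf{y} - \mathbf{1}\hat{\mu}\right),
\qquad
\lambda = \frac{\sigma_e^2}{\sigma_u^2}
\]
```

Because every marker effect shares the single variance \(\sigma_u^2\), all markers are shrunk equally. This is one reason the model may fit poorly when a trait is controlled by a few loci of large effect.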
In summary, this study examined the growth characteristics, population structure, and genetic diversity of Sea Island cotton in the northern and southern regions of Xinjiang and identified significant variation. Further analysis of candidate intervals identified A05: 15,688,837–18,187,089 as a critical region influencing fiber development in Sea Island cotton. This finding can serve as a reference for molecular-assisted breeding aimed at producing high-yield, high-quality Sea Island cotton. Finally, the investigation into the effects of training population size and marker density on prediction accuracy lays a theoretical foundation for selective breeding in cotton.

Declarations

Supplementary Information

The online version contains supplementary material available at *.

Author contribution statement

T. Yang analysed and summarized all the data, drew the figures, and wrote the manuscript. H. Wang, J. Song, and K. Zhao oversaw the gathering and categorization of empirical data. B. Pang, Y. Wang, P. Luo, W. Liang, S. Shi, J. Wang, Y. Lin, J. Li, Z. Wang, and Y. Guo participated in preliminary work preparation, experimental data collection, and discussions. W. Gao directed the experiments and revised the manuscript. All authors read and approved the final manuscript.

Funding

This work was supported by the National Key Research and Development Program of China (2021YFD1900802-4), the Natural Science Foundation project of Xinjiang (2019D01A41), and the Tianshan Youth Project (2018Q016).

All data and materials supporting our findings are included in the Materials and Methods section. Details are provided in the attached files. All resequencing raw data generated in this study were deposited in the NCBI Sequence Read Archive (SRA; accession number: PRJNA1179725). The transcriptome sequencing raw data were downloaded from the NCBI Gene Expression Omnibus (GEO) under the accession number GSE184965.
Data availability The genomic resequencing datasets generated during the current study are available in the NCBI Sequence Read Archive repository under accession number PRJNA1179725. The transcriptome sequencing raw data were downloaded from the NCBI Gene Expression Omnibus (GEO) under the accession number GSE184965. The authors declare that they have no conflict of interest. References Ai Q, Pan W, Zeng Y, Li Y, Cui L (2022) CCCH Zinc finger genes in Barley: genome-wide identification, evolution, expression and haplotype analysis. BMC PLANT BIOL 22:117 Bangera R, Correa K, Lhorente JP, Figueroa R, Yanez JM (2017) Genomic predictions can accelerate selection for resistance against Piscirickettsia salmonis in Atlantic salmon (Salmo salar). BMC GENOMICS 18:121 Beasley CA, Ting IP (1973) The effects of plant growth substances on in vitro fiber development from fertilized cotton ovules. AM J BOT 60:130-139 Beyene Y, Gowda M, Olsen M, Robbins KR, Perez-Rodriguez P, Alvarado G, Dreher K, Gao SY, Mugo S, Prasanna BM, Crossa J (2019) Empirical Comparison of Tropical Maize Hybrids Selected Through Genomic and Phenotypic Selections. FRONT PLANT SCI 10:1502 Canella VC, Persa R, Chen P, Jarquin D (2022) Incorporation of Soil-Derived Covariates in Progeny Testing and Line Selection to Enhance Genomic Prediction Accuracy in Soybean Breeding. FRONT GENET 13:905824 Cerrudo D, Cao S, Yuan Y, Martinez C, Suarez EA, Babu R, Zhang X, Trachsel S (2018) Genomic Selection Outperforms Marker Assisted Selection for Grain Yield and Physiological Traits in a Maize Doubled Haploid Population Across Water Treatments. FRONT PLANT SCI 9:336 Chabi M, Goulas E, Galinousky D, Blervacq AS, Lucau-Danila A, Neutelings G, Grec S, Day A, Chabbert B, Haag K, Mussig J, Arribat S, Planchon S, Renaut J, Hawkins S (2023) Identification of new potential molecular actors related to fiber quality in flax through Omics. 
FRONT PLANT SCI 14:1204016 Chen ZJ, Sreedasyam A, Ando A, Song Q, De Santiago LM, Hulse-Kemp AM, Ding M, Ye W, Kirkbride RC, Jenkins J, Plott C, Lovell J, Lin YM, Vaughn R, Liu B, Simpson S, Scheffler BE, Wen L, Saski CA, Grover CE, Hu G, Conover JL, Carlson JW, Shu S, Boston LB, Williams M, Peterson DG, McGee K, Jones DC, Wendel JF, Stelly DM, Grimwood J, Schmutz J (2020) Genomic diversifications of five Gossypium allopolyploid species and their impact on cotton improvement. NAT GENET 52:525-533 Cheng Y, Huang C, Hu Y, Jin S, Zhang X, Si Z, Zhao T, Chen J, Fang L, Dai F, Yang W, Wang P, Mei G, Guan X, Zhang T (2024) Gossypium purpurascens genome provides insight into the origin and domestication of upland cotton. J ADV RES 56:15-29 Combs E, Bernardo R (2013) Accuracy of Genomewide Selection for Different Traits with Constant Population Size, Heritability, and Number of Markers. PLANT GENOME-US 6:120 Cui Y, Li R, Li G, Zhang F, Zhu T, Zhang Q, Ali J, Li Z, Xu S (2020) Hybrid breeding of rice via genomic selection. PLANT BIOTECHNOL J 18:57-67 Dou L, Li Z, Wang H, Li H, Xiao G, Zhang X (2022) The hexokinase Gene Family in Cotton: Genome-Wide Characterization and Bioinformatics Analysis. FRONT PLANT SCI 13:882587 Endelman JB (2011) Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. PLANT GENOME-US 4:250-255 Ez JMY, Barría A, López ME, Moen T, Garcia BF, Yoshida GM, Xu P (2023) Genome‐wide association and genomic selection in aquaculture. Review in aquaculture 15:645-675 Fan L, Wang L, Wang X, Zhang H, Zhu Y, Guo J, Gao W, Geng H, Chen Q, Qu Y (2018) A high-density genetic map of extra-long staple cotton (Gossypium barbadense) constructed using genotyping-by-sequencing based single nucleotide polymorphic markers and identification of fiber traits-related QTL in a recombinant inbred line population. 
BMC GENOMICS 19:489 Fang L, Zhao T, Hu Y, Si Z, Zhu X, Han Z, Liu G, Wang S, Ju L, Guo M, Mei H, Wang L, Qi B, Wang H, Guan X, Zhang T (2021) Divergent improvement of two cultivated allotetraploid cotton species. PLANT BIOTECHNOL J 19:1325-1336 Fu J, Hao Y, Li H, Reif JC, Chen S, Huang C, Wang G, Li X, Xu Y, Li L (2022) Integration of genomic selection with doubled-haploid evaluation in hybrid breeding: From GS 1.0 to GS 4.0 and beyond. MOL PLANT 15:577-580 Geng X, Sun G, Qu Y, Sarfraz Z, Jia Y, He S, Pan Z, Sun J, Iqbal MS, Wang Q, Qin H, Liu J, Liu H, Yang J, Ma Z, Xu D, Yang J, Zhang J, Li Z, Cai Z, Zhang X, Zhang X, Zhou G, Li L, Zhu H, Wang L, Pang B, Du X (2020) Genome-wide dissection of hybridization for fiber quality and yield-related traits in upland cotton. PLANT J 104:1285-1300 Guo R, Dhliwayo T, Mageto EK, Palacios-Rojas N, Lee M, Yu D, Ruan Y, Zhang A, San VF, Olsen M, Crossa J, Prasanna BM, Zhang L, Zhang X (2020) Genomic Prediction of Kernel Zinc Concentration in Multiple Maize Populations Using Genotyping-by-Sequencing and Repeat Amplification Sequencing Markers. FRONT PLANT SCI 11:534 Guo Z, Tucker DM, Lu J, Kishore V, Gay G (2012) Evaluation of genome-wide selection efficiency in maize nested association mapping populations. THEOR APPL GENET 124:261-275 He P, Zhang Y, Xiao G (2020) Origin of a Subgenome and Genome Evolution of Allotetraploid Cotton Species. MOL PLANT 13:1238-1240 Hu J, Chen B, Zhao J, Zhang F, Xie T, Xu K, Gao G, Yan G, Li H, Li L, Ji G, An H, Li H, Huang Q, Zhang M, Wu J, Song W, Zhang X, Luo Y, Chris PJ, Batley J, Tian S, Wu X (2022) Genomic selection and genetic architecture of agronomic traits during modern rapeseed breeding. 
NAT GENET 54:694-704 Hu Y, Chen J, Fang L, Zhang Z, Ma W, Niu Y, Ju L, Deng J, Zhao T, Lian J, Baruch K, Fang D, Liu X, Ruan YL, Rahman MU, Han J, Wang K, Wang Q, Wu H, Mei G, Zang Y, Han Z, Xu C, Shen W, Yang D, Si Z, Dai F, Zou L, Huang F, Bai Y, Zhang Y, Brodt A, Ben-Hamo H, Zhu X, Zhou B, Guan X, Zhu S, Chen X, Zhang T (2019) Gossypium barbadense and Gossypium hirsutum genomes provide insights into the origin and evolution of allotetraploid cotton. NAT GENET 51:739-748 Huang C, Nie X, Shen C, You C, Li W, Zhao W, Zhang X, Lin Z (2017) Population structure and genetic basis of the agronomic traits of upland cotton in China revealed by a genome-wide association study using high-density SNPs. PLANT BIOTECHNOL J 15:1374-1386 Huang X, Kurata N, Wei X, Wang ZX, Wang A, Zhao Q, Zhao Y, Liu K, Lu H, Li W, Guo Y, Lu Y, Zhou C, Fan D, Weng Q, Zhu C, Huang T, Zhang L, Wang Y, Feng L, Furuumi H, Kubo T, Miyabayashi T, Yuan X, Xu Q, Dong G, Zhan Q, Li C, Fujiyama A, Toyoda A, Lu T, Feng Q, Qian Q, Li J, Han B (2012) A map of rice genome variation reveals the origin of cultivated rice. NATURE 490:497-501 J. GS, R. K, S. TV (1998) Potential Role of Abscisic Acid in Cotton Fiber and Ovule Development. J PLANT GROWTH REGUL 17:1-5 Jiang X, Gong J, Zhang J, Zhang Z, Shi Y, Li J, Liu A, Gong W, Ge Q, Deng X, Fan S, Chen H, Kuang Z, Pan J, Che J, Zhang S, Jia T, Wei R, Chen Q, Wei S, Shang H, Yuan Y (2021) Quantitative Trait Loci and Transcriptome Analysis Reveal Genetic Basis of Fiber Quality Traits in CCRI70 RIL Population of Gossypium hirsutum. FRONT PLANT SCI 12:753755 Lan S, Zheng C, Hauck K, McCausland M, Duguid SD, Booker HM, Cloutier S, You FM (2020) Genomic Prediction Accuracy of Seven Breeding Selection Traits Improved by QTL Identification in Flax. INT J MOL SCI 21:1577 Li B, Chen L, Sun W, Wu D, Wang M, Yu Y, Chen G, Yang W, Lin Z, Zhang X, Duan L, Yang X (2020) Phenomics-based GWAS analysis reveals the genetic architecture for drought resistance in cotton. 
PLANT BIOTECHNOL J 18:2533-2544 Li F, Fan G, Lu C, Xiao G, Zou C, Kohel RJ, Ma Z, Shang H, Ma X, Wu J, Liang X, Huang G, Percy RG, Liu K, Yang W, Chen W, Du X, Shi C, Yuan Y, Ye W, Liu X, Zhang X, Liu W, Wei H, Wei S, Huang G, Zhang X, Zhu S, Zhang H, Sun F, Wang X, Liang J, Wang J, He Q, Huang L, Wang J, Cui J, Song G, Wang K, Xu X, Yu JZ, Zhu Y, Yu S (2015) Genome sequence of cultivated Upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution. NAT BIOTECHNOL 33:524-530 Li F, Fan G, Wang K, Sun F, Yuan Y, Song G, Li Q, Ma Z, Lu C, Zou C, Chen W, Liang X, Shang H, Liu W, Shi C, Xiao G, Gou C, Ye W, Xu X, Zhang X, Wei H, Li Z, Zhang G, Wang J, Liu K, Kohel RJ, Percy RG, Yu JZ, Zhu YX, Wang J, Yu S (2014) Genome sequence of the cultivated cotton Gossypium arboreum. NAT GENET 46:567-572 Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. BIOINFORMATICS 26:589-595 Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009) The Sequence Alignment/Map format and SAMtools. BIOINFORMATICS 25:2078-2079 Li H, Li X, Zhang P, Feng Y, Mi J, Gao S, Sheng L, Ali M, Yang Z, Li L, Fang W, Wang W, Qian Q, Gu F, Zhou W (2024) Smart Breeding Platform: A web-based tool for high-throughput population genetics, phenomics, and genomic selection. MOL PLANT 17:677-681 Li S, Kong L, Xiao X, Li P, Liu A, Li J, Gong J, Gong W, Ge Q, Shang H, Pan J, Chen H, Peng Y, Zhang Y, Lu Q, Shi Y, Yuan Y (2023) Genome-wide artificial introgressions of Gossypium barbadense into G. hirsutum reveal superior loci for simultaneous improvement of cotton fiber quality and yield traits. J ADV RES 53:1-16 Li W, Li W, Song Z, Gao Z, Xie K, Wang Y, Wang B, Hu J, Zhang Q, Ning C, Wang D, Fan X (2024) Marker Density and Models to Improve the Accuracy of Genomic Selection for Growth and Slaughter Traits in Meat Rabbits. 
Genes (Basel) 15:454 Li Y, Qin T, Wei C, Sun J, Dong T, Zhou R, Chen Q, Wang Q (2019) Using Transcriptome Analysis to Screen for Key Genes and Pathways Related to Cytoplasmic Male Sterility in Cotton (Gossypium hirsutum L.). INT J MOL SCI 20:5120 Li Z, Wang P, You C, Yu J, Zhang X, Yan F, Ye Z, Shen C, Li B, Guo K, Liu N, Thyssen GN, Fang DD, Lindsey K, Zhang X, Wang M, Tu L (2020) Combined GWAS and eQTL analysis uncovers a genetic regulatory network orchestrating the initiation of secondary cell wall development in cotton. NEW PHYTOL 226:1738-1752 Liggett LA, Cato LD, Weinstock JS, Zhang Y, Nouraie SM, Gladwin MT, Garrett ME, Ashley-Koch A, Telen MJ, Custer B, Kelly S, Dinardo CL, Sabino EC, Loureiro P, Carneiro-Proietti AB, Maximo C, Reiner AP, Abecasis GR, Williams DA, Natarajan P, Bick AG, Sankaran VG (2022) Clonal hematopoiesis in sickle cell disease. J CLIN INVEST 132:138 Lu R, Li Y, Zhang J, Wang Y, Zhang J, Li Y, Zheng Y, Li XB (2022) The bHLH/HLH transcription factors GhFP2 and GhACE1 antagonistically regulate fiber elongation in cotton. PLANT PHYSIOL 189:628-643 Lu R, Zhang J, Liu D, Wei YL, Wang Y, Li XB (2018) Characterization of bHLH/HLH genes that are involved in brassinosteroid (BR) signaling in fiber development of cotton (Gossypium hirsutum). BMC PLANT BIOL 18:304 M L, J P, V G, D J, P B, V G, M F, M M, C C, C R (2005) Changes in transcriptional profiles are associated with early fruit tissue specialization in tomato. PLANT PHYSIOL 139:750-769 Ma Z, He S, Wang X, Sun J, Zhang Y, Zhang G, Wu L, Li Z, Liu Z, Sun G, Yan Y, Jia Y, Yang J, Pan Z, Gu Q, Li X, Sun Z, Dai P, Liu Z, Gong W, Wu J, Wang M, Liu H, Feng K, Ke H, Wang J, Lan H, Wang G, Peng J, Wang N, Wang L, Pang B, Peng Z, Li R, Tian S, Du X (2018) Resequencing a core collection of upland cotton identifies genomic variation and loci influencing fiber quality and yield. 
NAT GENET 50:803-813 Ma Z, Zhang Y, Wu L, Zhang G, Sun Z, Li Z, Jiang Y, Ke H, Chen B, Liu Z, Gu Q, Wang Z, Wang G, Yang J, Wu J, Yan Y, Meng C, Li L, Li X, Mo S, Wu N, Ma L, Chen L, Zhang M, Si A, Yang Z, Wang N, Wu L, Zhang D, Cui Y, Cui J, Lv X, Li Y, Shi R, Duan Y, Tian S, Wang X (2021) High-quality genome assembly and resequencing of modern cotton cultivars provide resources for crop improvement. NAT GENET 53:1385-1391 McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. GENOME RES 20:1297-1303 Meuwissen TH, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. GENETICS 157:1819-1829 Mir ZA, Chandra T, Saharan A, Budhlakoti N, Mishra DC, Saharan MS, Mir RR, Singh AK, Sharma S, Vikas VK, Kumar S (2023) Recent advances on genome-wide association studies (GWAS) and genomic selection (GS); prospects for Fusarium head blight research in Durum wheat. MOL BIOL REP 50:3885-3901 Montesinos-Lopez A, Crespo-Herrera L, Dreisigacker S, Gerard G, Vitale P, Saint PC, Govindan V, Tarekegn ZT, Flores MC, Perez-Rodriguez P, Ramos-Pulido S, Lillemo M, Li H, Montesinos-Lopez OA, Crossa J (2024) Deep learning methods improve genomic prediction of wheat breeding. FRONT PLANT SCI 15:1324090 Nie X, Wen T, Shao P, Tang B, Nuriman-Guli A, Yu Y, Du X, You C, Lin Z (2020) High-density genetic variation maps reveal the correlation between asymmetric interspecific introgressions and improvement of agronomic traits in Upland and Pima cotton varieties developed in Xinjiang, China. 
PLANT J 103:677-689 Paterson AH, Wendel JF, Gundlach H, Guo H, Jenkins J, Jin D, Llewellyn D, Showmaker KC, Shu S, Udall J, Yoo MJ, Byers R, Chen W, Doron-Faigenboim A, Duke MV, Gong L, Grimwood J, Grover C, Grupp K, Hu G, Lee TH, Li J, Lin L, Liu T, Marler BS, Page JT, Roberts AW, Romanel E, Sanders WS, Szadkowski E, Tan X, Tang H, Xu C, Wang J, Wang Z, Zhang D, Zhang L, Ashrafi H, Bedon F, Bowers JE, Brubaker CL, Chee PW, Das S, Gingle AR, Haigler CH, Harker D, Hoffmann LV, Hovav R, Jones DC, Lemke C, Mansoor S, Ur RM, Rainville LN, Rambani A, Reddy UK, Rong JK, Saranga Y, Scheffler BE, Scheffler JA, Stelly DM, Triplett BA, Van Deynze A, Vaslin MF, Waghmare VN, Walford SA, Wright RJ, Zaki EA, Zhang T, Dennis ES, Mayer KF, Peterson DG, Rokhsar DS, Wang X, Schmutz J (2012) Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. NATURE 492:423-427 Pedersen BS, Quinlan AR (2018) Mosdepth: quick coverage calculation for genomes and exomes. BIOINFORMATICS 34:867-868 Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL (2016) Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. NAT PROTOC 11:1650-1667 Qu Z, Wu Y, Hu D, Li T, Liang H, Ye F, Xue J, Xu S (2022) Genome-Wide Association Analysis for Candidate Genes Contributing to Kernel-Related Traits in Maize. FRONT PLANT SCI 13:872292 Rabieyan E, Bihamta MR, Moghaddam ME, Mohammadi V, Alipour H (2022) Genome-wide association mapping and genomic prediction of agronomical traits and breeding values in Iranian wheat under rain-fed and well-watered conditions. BMC GENOMICS 23:831 S. H. Dasani VST (2006) Role of abscisic acid in cotton fiber development. Russian Journal of Plant Physiology 53:62-67 Song X, Zhu G, Su X, Yu Y, Duan Y, Wang H, Shang X, Xu H, Chen Q, Guo W (2024) Combined genome and transcriptome analysis of elite fiber quality in Gossypium barbadense. 
PLANT PHYSIOL 195:2158-2175 Spiess GM, Hausman A, Yu P, Cohen JD, Rampey RA, Zolman BK (2014) Auxin Input Pathway Disruptions Are Mitigated by Changes in Auxin Biosynthetic Gene Expression in Arabidopsis. PLANT PHYSIOL 165:1092-1104 Sreedasyam A, Lovell JT, Mamidi S, Khanal S, Jenkins JW, Plott C, Bryan KB, Li Z, Shu S, Carlson J, Goodstein D, De Santiago L, Kirkbride RC, Calleja S, Campbell T, Koebernick JC, Dever JK, Scheffler JA, Pauli D, Jenkins JN, Mccarty JC, Williams M, Boston L, Webber J, Udall JA, Chen ZJ, Bourland F, Stiller WN, Saski CA, Grimwood J, Chee PW, Jones DC, Schmutz J (2024) Genome resources for three modern cotton lines guide future breeding efforts. NAT PLANTS 10:1039-1051 Strader LC, Culler AH, Cohen JD, Bartel B (2010) Conversion of endogenous indole-3-butyric acid to indole-3-acetic acid drives cell expansion in Arabidopsis seedlings. PLANT PHYSIOL 153:1577-1586 Su J, Wang C, Ma Q, Zhang A, Shi C, Liu J, Zhang X, Yang D, Ma X (2020) An RTM-GWAS procedure reveals the QTL alleles and candidate genes for three yield-related traits in upland cotton. BMC PLANT BIOL 20:416 Su X, Zhu G, Song X, Xu H, Li W, Ning X, Chen Q, Guo W (2020) Genome-wide association analysis reveals loci and candidate genes involved in fiber quality traits in sea island cotton (Gossypium barbadense). BMC PLANT BIOL 20:289 Sun Z, Wang X, Liu Z, Gu Q, Zhang Y, Li Z, Ke H, Yang J, Wu J, Wu L, Zhang G, Zhang C, Ma Z (2017) Genome-wide association study discovered genetic variation and candidate genes of fibre quality traits in Gossypium hirsutum L. PLANT BIOTECHNOL J 15:982-996 Technow F, Burger A, Melchinger AE (2013) Genomic prediction of northern corn leaf blight resistance in maize with combined or separated training sets for heterotic groups. 
G3 (Bethesda) 3:197-203 Velez-Torres M, Jesus Garcia-Zavala J, Hernandez-Rodriguez M, Lobato-Ortiz R, Jesus Lopez-Reynoso J, Benitez-Riquelme I, Apolinar Mejia-Contreras J, Esquivel-Esquivel G, Domingo Molina-Galan J, Perez-Rodriguez P, Zhang X (2018) Genomic prediction of the general combining ability of maize lines (Zea mays L.) and the performance of their single crosses. PLANT BREEDING 137:379-387 Wang K, Abid MA, Rasheed A, Crossa J, Hearne S, Li H (2023) DNNGP, a deep neural network-based method for genomic prediction using multi-omics data in plants. MOL PLANT 16:279-293 Wang K, Li M, Hakonarson H (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. NUCLEIC ACIDS RES 38:e164 Wang L, Cheng H, Xiong F, Ma S, Zheng L, Song Y, Deng K, Wu H, Li F, Yang Z (2020) Comparative phosphoproteomic analysis of BR-defective mutant reveals a key role of GhSK13 in regulating cotton fiber development. SCI CHINA LIFE SCI 63:1905-1917 Wang L, Wang X, Maimaitiaili B, Kafle A, Khan KS, Feng G (2021) Breeding Practice Improves the Mycorrhizal Responsiveness of Cotton (Gossypium spp. L.). FRONT PLANT SCI 12:780454 Wang M, Tu L, Lin M, Lin Z, Wang P, Yang Q, Ye Z, Shen C, Li J, Zhang L, Zhou X, Nie X, Li Z, Guo K, Ma Y, Huang C, Jin S, Zhu L, Yang X, Min L, Yuan D, Zhang Q, Lindsey K, Zhang X (2017) Asymmetric subgenome selection and cis-regulatory divergence during cotton domestication. NAT GENET 49:579-587 Wang M, Tu L, Yuan D, Zhu D, Shen C, Li J, Liu F, Pei L, Wang P, Zhao G, Ye Z, Huang H, Yan F, Ma Y, Zhang L, Liu M, You J, Yang Y, Liu Z, Huang F, Li B, Qiu P, Zhang Q, Zhu L, Jin S, Yang X, Min L, Li G, Chen LL, Zheng H, Lindsey K, Lin Z, Udall JA, Zhang X (2019) Reference genome sequences of two cultivated allotetraploid cottons, Gossypium hirsutum and Gossypium barbadense. 
NAT GENET 51:224-229 Wang N, Li Y, Chen YH, Lu R, Zhou L, Wang Y, Zheng Y, Li XB (2021) Phosphorylation of WRKY16 by MPK3-1 is essential for its transcriptional activity during fiber initiation and elongation in cotton (Gossypium hirsutum). PLANT CELL 33:2736-2752 Wang N, Li Y, Meng Q, Chen M, Wu M, Zhang R, Xu Z, Sun J, Zhang X, Nie X, Yuan D, Lin Z (2023) Genome and haplotype provide insights into the population differentiation and breeding improvement of Gossypium barbadense. J ADV RES 54:15-27 Wang N, Wang H, Zhang A, Liu Y, Yu D, Hao Z, Ilut D, Glaubitz JC, Gao Y, Jones E, Olsen M, Li X, San Vicente F, Prasanna BM, Crossa J, Perez-Rodriguez P, Zhang X (2020) Genomic prediction across years in a maize doubled haploid breeding program to accelerate early-stage testcross testing. Theoretical and Applied Genetics: International Journal of Breeding Research and Cell Genetics 133:2869-2879 Wechter WP, Levi A, Harris KR, Davis AR, Fei Z, Katzir N, Giovannoni JJ, Salman-Minkov A, Hernandez A, Thimmapuram J, Tadmor Y, Portnoy V, Trebitsh T (2008) Gene expression in developing watermelon fruit. BMC GENOMICS 9:275 Werner CR, Qian L, Voss-Fels KP, Abbadi A, Leckband G, Frisch M, Snowdon RJ (2018) Genome-wide regression models considering general and specific combining ability predict hybrid performance in oilseed rape with similar accuracy regardless of trait architecture. Theoretical and Applied Genetics: International Journal of Breeding Research and Cell Genetics 131:299-317 Xu Y, Zhang X, Li H, Zheng H, Zhang J, Olsen MS, Varshney RK, Prasanna BM, Qian Q (2022) Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. MOL PLANT 15:1664-1695 Yang AQ, Chen B, Ran ML, Yang GM, Zeng C (2020) The application of genomic selection in pig cross breeding. 
Yi Chuan 42:145-152 Yang Y, Lai W, Long L, Gao W, Xu F, Li P, Zhou S, Ding Y, Hu H (2023) Comparative proteomic analysis identified proteins and the phenylpropanoid biosynthesis pathway involved in the response to ABA treatment in cotton fiber development. Sci Rep 13:1488 Yu J, Hui Y, Chen J, Yu H, Gao X, Zhang Z, Li Q, Zhu S, Zhao T (2021) Whole-genome resequencing of 240 Gossypium barbadense accessions reveals genetic variation and genes associated with fiber strength and lint percentage. THEOR APPL GENET 134:3249-3261 Yuan D, Tang Z, Wang M, Gao W, Tu L, Jin X, Chen L, He Y, Zhang L, Zhu L, Li Y, Liang Q, Lin Z, Yang X, Liu N, Jin S, Lei Y, Ding Y, Li G, Ruan X, Ruan Y, Zhang X (2015) The genome sequence of Sea-Island cotton (Gossypium barbadense) provides insights into the allopolyploidization and development of superior spinnable fibres. Sci Rep 5:17662 Zafar MM, Jia X, Shakeel A, Sarfraz Z, Manan A, Imran A, Mo H, Ali A, Youlu Y, Razzaq A, Iqbal MS, Ren M (2021) Unraveling Heat Tolerance in Upland Cotton (Gossypium hirsutum L.) Using Univariate and Multivariate Analysis. FRONT PLANT SCI 12:727835 Zaidi SS, Naqvi RZ, Asif M, Strickler S, Shakir S, Shafiq M, Khan AM, Amin I, Mishra B, Mukhtar MS, Scheffler BE, Scheffler JA, Mueller LA, Mansoor S (2020) Molecular insight into cotton leaf curl geminivirus disease resistance in cultivated cotton (Gossypium hirsutum). PLANT BIOTECHNOL J 18:691-706 Zhang A, Wang H, Beyene Y, Semagn K, Liu Y, Cao S, Cui Z, Ruan Y, Burgueno J, San VF, Olsen M, Prasanna BM, Crossa J, Yu H, Zhang X (2017) Effect of Trait Heritability, Training Population Size and Marker Density on Genomic Prediction Accuracy Estimation in 22 bi-parental Tropical Maize Populations. 
FRONT PLANT SCI 8:1916 Zhang T, Hu Y, Jiang W, Fang L, Guan X, Chen J, Zhang J, Saski CA, Scheffler BE, Stelly DM, Hulse-Kemp AM, Wan Q, Liu B, Liu C, Wang S, Pan M, Wang Y, Wang D, Ye W, Chang L, Zhang W, Song Q, Kirkbride RC, Chen X, Dennis E, Llewellyn DJ, Peterson DG, Thaxton P, Jones DC, Wang Q, Xu X, Zhang H, Wu H, Zhou L, Mei G, Chen S, Tian Y, Xiang D, Li X, Ding J, Zuo Q, Tao L, Liu Y, Li J, Lin Y, Hui Y, Cao Z, Cai C, Zhu X, Jiang Z, Zhou B, Guo W, Li R, Chen ZJ (2015) Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. NAT BIOTECHNOL 33:531-537 Zhang W, Lin K, Fu W, Xie J, Fan X, Zhang M, Luo H, Yin Y, Guo Q, Huang H, Chen T, Lin X, Yuan Y, Huang C, Du S (2024) Insights for the Captive Management of South China Tigers Based on a Large-Scale Genetic Survey. Genes (Basel) 15:398 Zhang X, Perez-Rodriguez P, Semagn K, Beyene Y, Babu R, Lopez-Cruz MA, San VF, Olsen M, Buckler E, Jannink JL, Prasanna BM, Crossa J (2015) Genomic prediction in biparental tropical maize populations in water-stressed and well-watered environments using low-density and GBS SNPs. Heredity (Edinb) 114:291-299 Zhao H, Wang Y, Zhao S, Fu Y, Zhu L (2021) HOMEOBOX PROTEIN 24 mediates the conversion of indole-3-butyric acid to indole-3-acetic acid to promote root hair elongation. NEW PHYTOL 232:2057-2070 Zhao N, Wang W, Grover CE, Jiang K, Pan Z, Guo B, Zhu J, Su Y, Wang M, Nie H, Xiao L, Guo A, Yang J, Cheng C, Ning X, Li B, Xu H, Adjibolosoo D, Aierxi A, Li P, Geng J, Wendel JF, Kong J, Hua J (2022) Genomic and GWAS analyses demonstrate phylogenomic relationships of Gossypium barbadense in China and selection for fibre length, lint percentage and Fusarium wilt resistance. PLANT BIOTECHNOL J 20:691-710 Zhou Z, Guan H, Liu C, Zhang Z, Geng S, Qin M, Li W, Shi X, Dai Z, Lei Z, Wu Z, Tian B, Hou J (2021) Identification of genomic regions affecting grain peroxidase activity in bread wheat using genome-wide association study. 
BMC PLANT BIOL 21:523
Supplementary Files
SupplementalandFigureslegeds.docx TableS1.xlsx TableS2.xlsx TableS3.xlsx TableS4.xlsx TableS5.xlsx TableS6.xlsx TableS7.xlsx TableS8.xlsx TableS9.xlsx TableS10.xlsx TableS11.xlsx TableS12.xlsx TableS13.xlsx TableS14.xlsx TableS15.xlsx Fig.S1.tif Fig.S2.tif Fig.S3.tif Fig.S4.tif Fig.S5.tif Fig.S6.tif
Status: Published Journal Publication published 09 Jun, 2025 Read the published version in Theoretical and Applied Genetics → Version 1 posted Reviewers agreed at journal 04 Apr, 2025 Reviewers invited by journal 02 Apr, 2025 Editor assigned by journal 02 Apr, 2025 First submitted to journal 01 Apr, 2025 Editorial decision: Minor revisions 27 Mar, 2025
Authors: Tao Yang, Honggang Wang, Jikun Song, Kang Zhao, Bo Pang, Yongpan Wang, Ping Luo, Weiwei Liang, Shunyu Shi, Jie Wang, Yifeng Lin, Jing Li, Zhenrui Wang, Yongqin Guo, Wenwei Gao (corresponding author, https://orcid.org/0009-0003-6524-1526). All authors are affiliated with Xinjiang Agricultural University except Jikun Song (State Key Laboratory of Cotton Biology, Institute of Cotton Research of Chinese Academy of Agricultural Sciences, Key Laboratory of Cotton Genetic Improvement, Ministry of Agriculture, Anyang 455000, China).
Preprint DOI: https://doi.org/10.21203/rs.3.rs-5667934/v1
Published version: https://doi.org/10.1007/s00122-025-04911-1 (published 09 Jun, 2025)

Figure 1. Population structure of Sea Island cotton germplasm resources. (a) Neighbor-Joining (NJ) phylogenetic tree at k = 3; each branch represents an individual accession, and distinct colors denote different subpopulations. (b) Principal Component Analysis (PCA) plot of the subpopulations; each small triangle represents an individual accession, and different colors distinguish the subpopulations. (c) G1, G2, and G3 represent distinct subpopulations. The numbers within circles indicate the θπ values for each subpopulation, and the number within the triangle is the θπ value for the entire "All" population. The numbers surrounding the triangle are the FST values between pairs of subpopulations, reflecting the degree of genetic differentiation. (d) Linkage Disequilibrium (LD) decay plot for the subpopulations (G1, G2, G3) and the entire "All" population. Different colors correspond to different populations, and the gray section in the middle indicates the decay distance at which r² diminishes to 0.5, a key metric for assessing the extent of linkage among loci within a population. (e) Ancestry composition of each accession at k = 3; distinct colors represent different ancestral components.

Figure 2. t-test analysis of phenotypic traits. Single (*), double (**), and triple (***) asterisks mark statistical significance levels of P < 0.05, 0.01, and 0.001, respectively. PH, plant height; FBN, fruit-branch number per plant; EFBN, effective fruit-branch number per plant; BN, boll number per plant; EBN, effective boll number per plant; BW, boll weight; LP, lint percentage; NBD, number of boll drops; HFFBN, height of first fruit-branch node; FFBN, first fruit-branch nodes; FL, fiber length; FS, fiber strength; MIC, fiber micronaire; FU, fiber uniformity; FE, fiber elongation.

Figure 3. Identification of the FL gene GB_A05G1764 on chromosome A05. (a) Manhattan plot for FL. The dashed line represents the significance threshold (-log10 P = 5.4). Marker effects were corrected for multiple testing using the Bonferroni method. (b) Gene structure of GB_A05G1764. Blue rectangles, blue lines, and numbers indicate exons, introns, and chromosomal locations, respectively. (c) Manhattan plot within 1 Mb of the chromosome A05 peak (top) and LD heat map (bottom). The dashed line represents the significance threshold (-log10 P = 5.4). Red dots mark non-synonymous SNP locations. The red dotted lines mark the candidate regions. Red indicates an R² value of 1 and yellow an R² value of 0. (d) Haplotype distribution in different geographic regions and subpopulations. The bar chart on the left shows the haplotype distribution in different geographic regions. The bottom bar chart shows the haplotype distribution of different subpopulations. (e) Expression levels of GB_A05G1764 in different tissues and fiber development stages of Hai 7124, the Sea Island cotton reference genome accession. Red indicates an R² value of 1.5 and yellow an R² value of -1.5. (f) Box plot for FL. The center line indicates the median, the box limits indicate the upper and lower quartiles, and the dots indicate the distribution of accessions with the same genotype. Significance was tested with a t-test. (g) Expression of GB_A05G1764 by RNA-seq (FPKM) in accessions differing in FL (long: XH58, short: Ashi) at fiber development stages (0, 5, 10, 15, 20, 25 DPA), with three technical repeats. Single (*), double (**), and triple (***) asterisks mark statistical significance levels of P < 0.05, 0.01, and 0.001, respectively. (h) qRT-PCR results for GB_A05G1764 in long-FL (So717) and short-FL (Ashi) accessions at fiber developmental stages (5, 10, 15, 20, 25, 30 DPA), with three technical repeats. Single (*), double (**), and triple (***) asterisks mark statistical significance levels of P < 0.05, 0.01, and 0.001, respectively.

Figure 4. Identification of the MIC gene GB_A05G1895 on chromosome A05. (a) Manhattan plot for FL. The dashed line represents the significance threshold (-log10 P = 5.4). Marker effects were corrected for multiple testing using the Bonferroni method. (b) Gene structure of GB_A05G1895. Blue rectangles, blue lines, and numbers indicate exons, introns, and chromosomal locations, respectively. (c) Manhattan plot within 1 Mb of the chromosome A05 peak (top) and LD heat map (bottom). The dashed line represents the significance threshold (-log10 P = 5.4). Red dots mark non-synonymous SNP locations. The red dotted lines mark the candidate regions. Red indicates an R² value of 1 and yellow an R² value of 0. (d) Haplotype distribution in different geographic regions and subpopulations. The bar chart on the left shows the haplotype distribution in different geographic regions. The bottom bar chart shows the haplotype distribution of different subpopulations. (e) Expression levels of GB_A05G1895 in different tissues and fiber development stages of Hai 7124, the Sea Island cotton reference genome accession. Red indicates an R² value of 2 and yellow an R² value of -2. (f) Box plot for MIC. The center line indicates the median, the box limits indicate the upper and lower quartiles, and the dots indicate the distribution of accessions with the same genotype. Significance was tested with a t-test. (g) qRT-PCR results for GB_A05G1895 in high-MIC (Ashi) and low-MIC (Tu79-713) accessions at fiber developmental stages (5, 10, 15, 20, 25, 30 DPA), with three technical repeats. Single (*), double (**), and triple (***) asterisks mark statistical significance levels of P < 0.05, 0.01, and 0.001, respectively.

Figure 5. Identification of the FE gene GB_A05G1702 on chromosome A05. (a) Manhattan plot for FL. The dashed line represents the significance threshold (-log10 P = 5.4). Marker effects were corrected for multiple testing using the Bonferroni method. (b) Gene structure of GB_A05G1702. Blue rectangles, blue lines, and numbers indicate exons, introns, and chromosomal locations, respectively. (c) Manhattan plot within 1 Mb of the chromosome A05 peak (top) and LD heat map (bottom). The dashed line represents the significance threshold (-log10 P = 5.4). Red dots mark non-synonymous SNP locations. The red dotted lines mark the candidate regions. Red indicates an R² value of 1 and yellow an R² value of 0. (d) Haplotype distribution in different geographic regions and subpopulations. The bar chart on the left shows the haplotype distribution in different geographic regions. The bottom bar chart shows the haplotype distribution of different subpopulations. (e) Expression levels of GB_A05G1702 in different tissues and fiber development stages of Hai 7124, the Sea Island cotton reference genome accession. Red indicates an R² value of 2 and yellow an R² value of -2. (f) Box plot for FE. The center line indicates the median, the box limits indicate the upper and lower quartiles, and the dots indicate the distribution of accessions with the same genotype. Significance was tested with a t-test. (g) qRT-PCR results for GB_A05G1702 in high-FE (65-3049-6) and low-FE (XH25) accessions at fiber developmental stages (5, 10, 15, 20, 25, 30 DPA), with three technical repeats. Single (*), double (**), and triple (***) asterisks mark statistical significance levels of P < 0.05, 0.01, and 0.001, respectively.

Figure 6. Genome-wide predictions of different population sizes. (a) Plant height (PH). (b) Fruit-branch number per plant (FBN). (c) Effective fruit-branch number per plant (EFBN). (d) Boll number per plant (BN). (e) Effective boll number per plant (EBN). (f) Boll weight (BW). (g) Lint percentage (LP). (h) Number of boll drops (NBD). (i) Height of first fruit-branch node (HFFBN). (j) First fruit-branch nodes (FFBN). (k) Fiber length (FL). (l) Fiber strength (FS). (m) Fiber micronaire (MIC). (n) Fiber uniformity (FU). (o) Fiber elongation (FE).

Figure 7. Genome-wide prediction of different marker densities. (a) Plant height (PH). (b) Fruit-branch number per plant (FBN). (c) Effective fruit-branch number per plant (EFBN). (d) Boll number per plant (BN). (e) Effective boll number per plant (EBN). (f) Boll weight (BW). (g) Lint percentage (LP). (h) Number of boll drops (NBD). (i) Height of first fruit-branch node (HFFBN). (j) First fruit-branch nodes (FFBN). (k) Fiber length (FL). (l) Fiber strength (FS). (m) Fiber micronaire (MIC). (n) Fiber uniformity (FU). (o) Fiber elongation (FE).
09:09:11","extension":"xlsx","order_by":14,"title":"","display":"","copyAsset":false,"role":"supplement","size":51460,"visible":true,"origin":"","legend":"","description":"","filename":"TableS13.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5667934/v1/a0056085e573b75a7fe53b6a.xlsx"},{"id":79823233,"identity":"94cb6c9c-147a-44b7-897e-b788e4d3d8c5","added_by":"auto","created_at":"2025-04-03 09:09:11","extension":"xlsx","order_by":15,"title":"","display":"","copyAsset":false,"role":"supplement","size":10058,"visible":true,"origin":"","legend":"","description":"","filename":"TableS14.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5667934/v1/3d62de7ce390de72838a37a8.xlsx"},{"id":79823191,"identity":"128f3a17-b4cf-4476-993a-b3ed22ea7e8f","added_by":"auto","created_at":"2025-04-03 09:09:09","extension":"xlsx","order_by":16,"title":"","display":"","copyAsset":false,"role":"supplement","size":10467,"visible":true,"origin":"","legend":"","description":"","filename":"TableS15.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5667934/v1/308124dd033fa09094c1032b.xlsx"},{"id":79823282,"identity":"421fd39b-da79-44cc-b0a2-09bb86c09349","added_by":"auto","created_at":"2025-04-03 09:09:14","extension":"tif","order_by":17,"title":"","display":"","copyAsset":false,"role":"supplement","size":3391448,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S1.tif","url":"https://assets-eu.researchsquare.com/files/rs-5667934/v1/42992d6ee6c4ca4974103d68.tif"},{"id":79823241,"identity":"5e4bcae1-c5fd-4f3c-b497-84ed58dc567c","added_by":"auto","created_at":"2025-04-03 
09:09:11","extension":"tif","order_by":18,"title":"","display":"","copyAsset":false,"role":"supplement","size":4117124,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S2.tif","url":"https://assets-eu.researchsquare.com/files/rs-5667934/v1/0a7d27dd9367df48312fc858.tif"},{"id":79823272,"identity":"ae9de115-2498-4945-9a28-7506ebed627b","added_by":"auto","created_at":"2025-04-03 09:09:14","extension":"tif","order_by":19,"title":"","display":"","copyAsset":false,"role":"supplement","size":1133580,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S3.tif","url":"https://assets-eu.researchsquare.com/files/rs-5667934/v1/6302999a13fcd9003441f917.tif"},{"id":79823242,"identity":"9e15ef68-7b75-439e-89b3-fa3a9bf2c323","added_by":"auto","created_at":"2025-04-03 09:09:11","extension":"tif","order_by":20,"title":"","display":"","copyAsset":false,"role":"supplement","size":2947528,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S4.tif","url":"https://assets-eu.researchsquare.com/files/rs-5667934/v1/476507132651e06f929f3d1a.tif"},{"id":79823292,"identity":"da7d23ea-52ab-4c97-9720-44b38f439001","added_by":"auto","created_at":"2025-04-03 09:09:15","extension":"tif","order_by":21,"title":"","display":"","copyAsset":false,"role":"supplement","size":3089424,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S5.tif","url":"https://assets-eu.researchsquare.com/files/rs-5667934/v1/615c4b735bb2ae201a483a77.tif"},{"id":79823281,"identity":"45ab429f-00d1-4312-8c41-a6a31adbf4b3","added_by":"auto","created_at":"2025-04-03 09:09:14","extension":"tif","order_by":22,"title":"","display":"","copyAsset":false,"role":"supplement","size":3037640,"visible":true,"origin":"","legend":"","description":"","filename":"Fig.S6.tif","url":"https://assets-eu.researchsquare.com/files/rs-5667934/v1/3fa9aac40f728e22bdeb7c8d.tif"}],"financialInterests":"","formattedTitle":"GWAS and GS analysis revealed the selection and 
prediction efficiency for yield, plant morphological, and fiber quality in Gossypium barbadense","fulltext":[{"header":"Key message ","content":"\u003cp\u003e\u003cstrong\u003eGenetic variation in a \u003cem\u003eGossypium barbadense\u003c/em\u003e population was revealed using resequencing. GWAS and RNA-seq on the \u003cem\u003eGossypium barbadense\u003c/em\u003e population identified several candidate genes associated with fiber length, micronaire, and elongation.\u003c/strong\u003e\u003c/p\u003e\n"},{"header":"Introduction","content":"\u003cp\u003eCotton (\u003cem\u003eGossypium L.\u003c/em\u003e) is one of the world's premier sources of high-quality plant fibers. Its exceptional adaptability and thermal insulation properties make it an ideal choice for textile production (Jiang et al. \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Li et al. \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Wang et al. \u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). The two main cultivated tetraploid species (\u003cem\u003eG. hirsutum\u003c/em\u003e, AD\u003csub\u003e1\u003c/sub\u003e and \u003cem\u003eG. barbadense\u003c/em\u003e, AD\u003csub\u003e2\u003c/sub\u003e) are widely grown in tropical and temperate regions. Key producing countries include India, China, the United States, Pakistan, and Brazil (Su et al. \u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Zafar et al. \u003cspan citationid=\"CR81\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Zaidi et al. \u003cspan citationid=\"CR82\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Upland cotton (\u003cem\u003eG. hirsutum\u003c/em\u003e) dominates global cotton production due to its broad environmental adaptability, accounting for 90% of output. In contrast, Sea Island cotton (\u003cem\u003eG. 
barbadense\u003c/em\u003e) represents only about 2% of total production because it is adapted to a narrow range of growing regions (Sun et al. \u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Wang et al. \u003cspan citationid=\"CR71\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Sea Island cotton, characterized by fine, long, and strong fibers, possesses unique thermal properties akin to those of animal fibers like cashmere, earning it the title \"the gem of fibers.\" Consequently, there is a strong demand to develop high-yield, high-quality Sea Island cotton varieties, driving significant interest in the genetic study of fiber quality and yield-related traits. However, the genetic complexity of these traits poses considerable challenges. Currently, marker-assisted selection (MAS) and genomic selection (GS) are pivotal techniques for incorporating desirable traits. Notably, even when prediction accuracy is low, GS has proven superior to MAS (Cerrudo et al. \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Guo et al. 2012).\u003c/p\u003e \u003cp\u003eThe two tetraploid cotton species (\u003cem\u003eG. hirsutum\u003c/em\u003e and \u003cem\u003eG. barbadense\u003c/em\u003e, both AD) arose from the hybridization of the A and D genomes approximately 1\u0026ndash;2\u0026nbsp;million years ago, followed by independent domestication across various regions (Hu et al. \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Paterson et al. \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e2012\u003c/span\u003e; Wang et al. \u003cspan citationid=\"CR69\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). To date, fifteen genome assemblies have been completed for nine Upland cotton varieties (TM-1, ZM24, NDM8, JBM, Zhongzhimian No. 2, B371, YZ1, Yuanmian11, and CSX8308) (Chen et al. \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Hu et al. 
\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Li et al. \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Ma et al. \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Sreedasyam et al. \u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Wang et al. \u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Zhang et al. \u003cspan citationid=\"CR84\" class=\"CitationRef\"\u003e2015\u003c/span\u003e). In contrast, only three varieties of Sea Island cotton have undergone five genome assemblies. The genome assembly of Sea Island cotton (line 3\u0026ndash;79) was completed for the first time in 2015 (Yuan et al. \u003cspan citationid=\"CR80\" class=\"CitationRef\"\u003e2015\u003c/span\u003e). Since then, it has been continually updated (Chen et al. \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Wang et al. \u003cspan citationid=\"CR70\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). Subsequent assemblies of the Hai7124 genome in 2019 (Hu et al. \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2019\u003c/span\u003e) and the Pima90 genome in 2021 (Ma et al. \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e2021\u003c/span\u003e) further advanced Sea Island cotton genomics. Research on cotton genomics has rapidly expanded in recent years, including sequencing efforts for nearly 10,000 accessions (Cheng et al. \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Geng et al. \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; He et al. \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Huang et al. \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Li et al. \u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Li et al. 
\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Li et al. \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Nie et al. \u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Wang et al. \u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). By contrast, genomic research on Sea Island cotton remains at an early exploratory stage. Its narrow genetic diversity and the limited sample sizes of existing studies have impeded the identification of specific genomic variations (Fang et al. \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Song et al. \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Yu et al. 2021).\u003c/p\u003e \u003cp\u003eGenomic selection (GS) has demonstrated significant potential in reducing breeding costs and enhancing breeding efficiency (Ez et al. \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Fu et al. \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Mir et al. \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). It was first proposed in 2001 (Meuwissen et al. \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e2001\u003c/span\u003e). CIMMYT first implemented GS in maize to develop technical models, evaluate factors influencing prediction accuracy, and establish GS protocols. That work showed that GS models incorporating genotype-by-environment interactions substantially improve prediction efficiency for complex traits. High-density markers further enhance the accuracy of phenotypic prediction (Zhang et al. \u003cspan citationid=\"CR84\" class=\"CitationRef\"\u003e2015\u003c/span\u003e). Moreover, the optimal GS model depends on the genetic architecture of the target traits (Montesinos-Lopez et al. 
\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Velez-Torres et al. \u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e2018\u003c/span\u003e), and incorporating multi-year data can increase prediction accuracy. To date, GS breeding has been successfully applied in crops such as soybean (Canella et al. \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), wheat (Rabieyan et al. \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), maize (Technow et al. \u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e2013\u003c/span\u003e), and canola (Werner et al. \u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e2018\u003c/span\u003e), and several intelligent breeding platforms have been established (Li et al. \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Xu et al. \u003cspan citationid=\"CR76\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). However, GS studies in cotton remain scarce, which limits improvements in cotton breeding efficiency.\u003c/p\u003e \u003cp\u003eDespite the exceptional fiber quality of Sea Island cotton, which makes it a highly prized resource, its relatively low yield significantly hinders widespread commercialization. Consequently, genomic research on yield, fiber quality, and plant morphology in Sea Island cotton is rare. Fan (Fan et al. \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2018\u003c/span\u003e), utilizing GBS-SNP technology, constructed the first intra-species linkage map for Sea Island cotton (5917 \u0026times; Pima S-7), spanning 3,076.23 cM with an average marker interval of 1.09 cM. This study identified 24 quantitative trait loci (QTLs) related to fiber quality and 18 QTLs associated with yield using 143 recombinant inbred lines (RILs). Su (Su et al. 
\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) employed the CottonSNP80K array and identified the gene \u003cem\u003eGB_A03G0335\u003c/em\u003e (encoding an E3 ubiquitin-protein ligase) as being linked to fiber length (FL), fiber strength (FS), fiber uniformity (FU), and fiber elongation (FE). Yu (Yu et al. 2021), through GWAS of 240 varieties, discovered three genes associated with FS: \u003cem\u003eGB_D11G3437\u003c/em\u003e (encoding casein kinase 1-like protein HD16, involved in regulating flowering time via gibberellin signaling), \u003cem\u003eGB_D11G3460\u003c/em\u003e (encoding a WVD2/WDL family microtubule-associated protein that modulates cortical microtubule orientation), and \u003cem\u003eGB_D11G3471\u003c/em\u003e (encoding tubulin alpha-1 chain (TUBA1), a core component of cytoskeletal microtubules). Additionally, two genes related to lint percentage (LP) were identified: \u003cem\u003eGB_A07G1034\u003c/em\u003e (HERK1, a receptor kinase essential for BR-regulated cell elongation) and \u003cem\u003eGB_A13G0822\u003c/em\u003e (GbTCP, which regulates fiber and root hair development through jasmonic acid biosynthesis and other pathways). Zhao (Zhao et al. \u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), through GWAS analysis of 336 varieties, identified five genes associated with four traits (Fusarium wilt resistance, FL, FS, and LP), including \u003cem\u003eGbar_A05G017500\u003c/em\u003e (encoding a PUB4 ubiquitin ligase), \u003cem\u003eGbar_D11G032670\u003c/em\u003e (encoding HD16 protein), \u003cem\u003eGbar_A05G014160\u003c/em\u003e (encoding a RING-type zinc finger E3 ubiquitin ligase from the RBR family), \u003cem\u003eGbar_D03G001430\u003c/em\u003e (encoding a putative ZHD6 protein), and \u003cem\u003eGbar_D03G001910\u003c/em\u003e (encoding a predicted WAKL14 receptor kinase). Song (Song et al. 
\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) carried out GWAS on 269 varieties and identified the gene \u003cem\u003eGB_D03G0092\u003c/em\u003e. A comparison between \u003cem\u003eGB_D03G0092\u003c/em\u003e\u003csup\u003eH\u003c/sup\u003e and \u003cem\u003eGB_D03G0092\u003c/em\u003e\u003csup\u003eB\u003c/sup\u003e revealed that a frameshift mutation caused by a 1-bp deletion significantly enhanced fiber quality in Sea Island cotton.\u003c/p\u003e \u003cp\u003eThis study leveraged GWAS to investigate the genetic associations between phenotypes and genotypes in Sea Island cotton. A comprehensive phenotypic assessment was conducted for 203 Sea Island cotton accessions across 15 traits over five years and four locations. GWAS was performed by combining the phenotypic data for yield, fiber quality, and plant morphology traits with the resequencing results. Moreover, RNA sequencing identified six candidate genes. For the first time, GS was applied to evaluate the impact of training population size and marker density on prediction accuracy. These findings offer a valuable reference for the genetic improvement of Sea Island cotton and the efficient breeding of high-quality cotton varieties.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cp\u003ePlant Material\u003c/p\u003e \u003cp\u003eThe study incorporated 203 Sea Island cotton (\u003cem\u003eGossypium barbadense\u003c/em\u003e) accessions sourced from the germplasm repository at the College of Agriculture, Xinjiang Agricultural University (Cotton Breeding Center, Ministry of Education). These accessions represent cotton-producing regions worldwide, including China (Xinjiang, the Yangtze River Basin, the Yellow River Basin, and the Pearl River Basin), the United States, the former Soviet Union, Albania, and Egypt, spanning Asia, Europe, Africa, and North America (Table S1). 
Prior to experimentation, all accessions had undergone multiple generations of self-pollination and exhibited normal growth and development under natural field conditions.\u003c/p\u003e \u003cp\u003eField Trial and Phenotypic Evaluation\u003c/p\u003e \u003cp\u003eField trials were conducted over five years at four locations in Xinjiang (totaling nine environments) to evaluate 15 phenotypic traits (Table S5, Table S6, and Table S7). The four trial sites included southern Xinjiang (S) (Korla and Aral) and northern Xinjiang (N) (Shihezi and Changji) (Table S14). A completely randomized design was employed, with two replicates per accession. Each accession was planted in two rows, with a row length of 2.50 meters, a row spacing of 0.66 meters, and a plant spacing of 0.10 meters. Sowing occurred in mid-April, and harvesting was completed by late October in southern Xinjiang and early to mid-October in northern Xinjiang.\u003c/p\u003e \u003cp\u003eAt the time of harvest, phenotypic data were collected for the following traits: plant height (PH), fruit-branch number per plant (FBN), effective fruit-branch number per plant (EFBN), boll number per plant (BN), effective boll number per plant (EBN), number of boll drops (NBD), height of first fruit-branch node (HFFBN), and first fruit-branch nodes (FFBN). Twenty naturally opened bolls were harvested for variety testing and flowering evaluation to assess boll weight (BW) and lint percentage (LP). Fiber quality was evaluated at the China Colored Cotton Company in Urumqi, utilizing the HFT9000 instrument to measure fiber length (FL), fiber strength (FS), fiber micronaire (MIC), fiber uniformity (FU), and fiber elongation (FE). Data collected over five years from nine distinct environments were statistically analyzed using Excel 2020. Broad-sense heritability and best linear unbiased predictions (BLUPs) of breeding values were obtained using R 4.0 (the Matrix and lme4 packages). 
The BLUP values were categorized into three types: southern Xinjiang BLUP values (S-traits), northern Xinjiang BLUP values (N-traits), and BLUP values across all environments (Traits).\u003c/p\u003e \u003cp\u003eDNA Extraction and Sequencing\u003c/p\u003e \u003cp\u003eSeeds were cultivated indoors until the trefoil stage. Then, fresh leaf samples were collected and immediately snap-frozen in liquid nitrogen. Genomic DNA was extracted using a plant DNA extraction kit, and its purity and integrity were verified (Song et al. \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). The qualified genomic DNA samples were dispatched to Biomarker Technologies in Beijing for library construction and sequenced on an Illumina HiSeq PE 150 platform. The sequenced reads underwent quality control and filtering for subsequent analysis. Reads containing adapters or of low quality (single-end reads with more than 10% N bases or having more than 50% of bases with a quality score (Q) below 5) were eliminated (Yu et al. 2021).\u003c/p\u003e \u003cp\u003eSequence Alignment\u003c/p\u003e \u003cp\u003eAn index file for the reference genome \"Hai7124\" was constructed. The qualified sequencing data were aligned to the reference using BWA v2.2.2 (Li and Durbin \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2010\u003c/span\u003e). SAMtools v1.9, mosdepth v0.3.1, and the HaplotypeCaller module of GATK v3.8 were employed for variant detection and statistical analysis, including metrics such as sample alignment rate, sequencing depth, and genome coverage (Li et al. \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2009\u003c/span\u003e; McKenna et al. \u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e2010\u003c/span\u003e; Pedersen and Quinlan \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). 
SNPs meeting any of the following criteria were filtered out: QD\u0026thinsp;\u0026lt;\u0026thinsp;5.0, MQ\u0026thinsp;\u0026lt;\u0026thinsp;40.0, FS\u0026thinsp;\u0026gt;\u0026thinsp;60.0, QUAL\u0026thinsp;\u0026lt;\u0026thinsp;30.0, MQRankSum \u0026lt; -12.5, and ReadPosRankSum \u0026lt; -8.0. All other parameters were left at their default values. For InDels, the exclusion criteria were QD\u0026thinsp;\u0026lt;\u0026thinsp;2.0, MQ\u0026thinsp;\u0026lt;\u0026thinsp;40.0, FS\u0026thinsp;\u0026gt;\u0026thinsp;100.0, MQRankSum \u0026lt; -10.0, ReadPosRankSum \u0026lt; -10.0, and QUAL\u0026thinsp;\u0026lt;\u0026thinsp;30.0 (Yu et al. 2021). VCFtools v0.1.13 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://vcftools.github.io/examples.html\u003c/span\u003e\u003cspan address=\"https://vcftools.github.io/examples.html\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) was used to retain SNPs with a minor allele frequency (MAF)\u0026thinsp;\u0026ge;\u0026thinsp;0.05 and missing data\u0026thinsp;\u0026le;\u0026thinsp;20%, excluding low-quality SNPs from further analysis. Missing genotypes were then imputed using Beagle for downstream analysis.\u003c/p\u003e \u003cp\u003eVariant Annotation\u003c/p\u003e \u003cp\u003eThe annotation information from the reference genome \"Hai7124\" was adopted to annotate SNPs based on their physical positions utilizing SnpEff v3.6c (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://pcingola.github.io/SnpEff/\u003c/span\u003e\u003cspan address=\"https://pcingola.github.io/SnpEff/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) (Ai et al. \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Liggett et al. \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). The SNPs were categorized into intergenic, upstream, downstream, exonic, and intronic regions. 
The exonic SNPs were further divided into start lost, stop gain, stop lost, synonymous stop, synonymous SNPs, and nonsynonymous SNPs.\u003c/p\u003e \u003cp\u003ePopulation Structure and Principal Component Analysis\u003c/p\u003e \u003cp\u003ePLINK v1.90 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.cog-genomics.org/plink2\u003c/span\u003e\u003cspan address=\"http://www.cog-genomics.org/plink2\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) was utilized to format the data and select effective SNPs, employing the following parameters: --indep-pairwise with a window size of 50, a step size of 50, and an r\u0026sup2; threshold of 0.2. Admixture v1.30 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://software.genetics.ucla.edu/admixture/\u003c/span\u003e\u003cspan address=\"http://software.genetics.ucla.edu/admixture/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) was employed for population structure analysis, with a convergence threshold (C) of 0.01 and five-fold cross-validation (Song et al. \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Principal component analysis (PCA) was performed using GCTA 1.92.4, and the significance of the components was calculated using the twstats program in EIG 6.1.4 (Yu et al. 2021).\u003c/p\u003e \u003cp\u003eLinkage Disequilibrium Analysis\u003c/p\u003e \u003cp\u003eLinkage disequilibrium (LD) analysis for the Sea Island cotton population and its subgroups was performed using PLINK v1.90 (r\u0026sup2; values were used to assess LD). The parameters were set as follows: --ld-window 999999 --ld-window-kb 2000 --ld-window-r2 0. 
Statistical analysis of the results was performed using a Perl script.\u003c/p\u003e \u003cp\u003ePhylogenetic Tree Construction and Genetic Diversity Analysis\u003c/p\u003e \u003cp\u003eBased on 2,717,759 high-quality SNP markers covering the entire genome of Sea Island cotton, a neighbor-joining (NJ) phylogenetic tree was constructed using VCF2Dis v1.50 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/BGI-shenzhen/VCF2Dis\u003c/span\u003e\u003cspan address=\"https://github.com/BGI-shenzhen/VCF2Dis\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) (Zhang et al. \u003cspan citationid=\"CR85\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Nucleotide diversity (θπ) was computed using VCFtools v0.1.13 with a window size of 100 kb to evaluate the genetic diversity of the Sea Island cotton population (Yu et al. 2021).\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eGWAS\u003c/h2\u003e \u003cp\u003eTo reduce environmental effects on the association analysis, BLUP values were calculated, resulting in three datasets: S-BLUP, N-BLUP, and BLUP, which were subsequently used for GWAS. GEMMA 0.98.5 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://www.xzlab.org/software.html\u003c/span\u003e\u003cspan address=\"http://www.xzlab.org/software.html\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) was employed to calculate the standardized kinship matrix with the following parameters: -bfile gene -gk 2 -o kin. Significant principal components were incorporated as covariates for GWAS to correct for the influence of population structure on the results. 
The parameters included -bfile gene -lmm 1 -n Traits -c PC -k kin -o (where \"gene\" denotes the genotype file, \"Traits\" represents the column numbers of the traits, \"PC\" refers to the principal component covariate file, and \"kin\" indicates the kinship matrix).\u003c/p\u003e \u003cp\u003eTranscriptome Data Analysis\u003c/p\u003e \u003cp\u003eTranscriptome data for various tissues, ovules, and fiber developmental stages of the Sea Island cotton 'Hai7124' (reference genome) were acquired, in addition to transcriptome data for 'XH58' (FL_Long) and 'Ashi' (FL_Short) at different fiber developmental stages (accession numbers: PRJNA490626 and GSE184965). The data were converted to FASTQ format, followed by quality control, filtering, and alignment to calculate FPKM values (Pertea et al. \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Song et al. \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eExpression Analysis\u003c/p\u003e \u003cp\u003eExpression analysis was performed at various stages of boll growth (0, 5, 10, 15, 20, 25, and 30 days post-anthesis (DPA)) using bolls harvested from accessions with contrasting phenotypes. For FL, So717 (long fiber length) and Ashi (short fiber length) were selected. For MIC, Ashi (high fiber micronaire) and Tu79-713 (low fiber micronaire) were chosen. For FE, 65-3049-6 (high fiber elongation) and XH25 (low fiber elongation) were utilized. Total RNA was extracted, followed by reverse transcription quantitative PCR (qRT-PCR) analysis. \u003cem\u003eGbUBQ7\u003c/em\u003e served as the internal reference gene. Each sample had three biological replicates and technical replicates (Table S15).\u003c/p\u003e \u003cp\u003eKey Gene Selection\u003c/p\u003e \u003cp\u003eThe association analysis results underwent Bonferroni correction, with a significance threshold of -log\u003csub\u003e10\u003c/sub\u003e(\u003cem\u003eP\u003c/em\u003e)\u0026thinsp;\u0026gt;\u0026thinsp;5.4. 
Key genes were selected based on their consistent identification across at least two environments, accompanied by prominent peaks. Gene annotation for selected regions was conducted using ANNOVAR v1.0.0 (Wang et al. \u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e2010\u003c/span\u003e), retaining nonsynonymous SNPs (those resulting in amino acid changes). Finally, critical genes of interest were identified by integrating expression profiling, transcriptome data, phenotypic traits, and quantitative PCR analysis.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eGS\u003c/h3\u003e\n\u003cp\u003ePLINK v1.90 was utilized to filter out redundant SNPs, yielding 268,995 effective markers. R 4.3.1 with the rrBLUP package was employed for GS (Endelman \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2011\u003c/span\u003e). A five-fold cross-validation approach was implemented to evaluate the influence of training population size and marker quantity on the prediction accuracy of GS. Eighty percent of the samples were randomly designated as the training population, and the remaining twenty percent served as the prediction population. This process was repeated 100 times to enhance the likelihood of incorporating all samples, providing a comprehensive assessment of various factors affecting prediction accuracy. Moreover, the effects of different training population ratios on GS prediction accuracy were investigated. The training population was randomly selected at proportions ranging from 10% to 90% (in 10% increments), and the remaining varieties served as the prediction population; each split was repeated 100 times. Additionally, the impact of varying marker counts on prediction accuracy was examined. The marker count was set at 10, 50, 100, 500, 1,000, 5,000, 10,000, and 50,000. 
This random sampling was likewise repeated 100 times.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eGenomic Variability and Population Structure of Sea Island Cotton\u003c/p\u003e \u003cp\u003eA total of 5.4 Tb of sequencing data was generated, achieving a Q30 score of 93.31%. The average alignment rate of the Sea Island cotton population to the reference genome was 97.33%, with an average coverage depth of 11.02\u0026times; and a coverage ratio of 97.96% (positions covered by at least one read) (Table S1). This study identified 2,718,759 high-quality SNPs (Table S2 and Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, Fig. S1) and 353,901 high-quality InDels (Table S3, Fig. S2), which were unevenly distributed across the 26 chromosomes of Sea Island cotton. Specifically, 1,633,794 SNPs were located in the At subgenome (60.10%), approximately 1.5 times the number in the Dt subgenome (1,084,965 SNPs, 39.90%). This is consistent with previous findings that the At subgenome is approximately twice as large as the Dt subgenome (Li et al. \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Zhao et al. \u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). The annotation of SNPs revealed that they were concentrated in intergenic regions, which accounted for 71.11% of all SNPs. The intronic regions accounted for only 4.57%. The exonic regions contained 54,028 SNPs, constituting merely 1.99% of all SNPs. 
Meanwhile, 33,793 nonsynonymous mutations were identified.\u003c/p\u003e \u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eSummary of SNPs information of Sea Island cotton\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eRegion\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eCategory\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eSNP\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\" morerows=\"5\" rowspan=\"6\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eDownstream\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e262,205\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eUpstream\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e342,830\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eIntergenic\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e1,933,286\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c2\"\u003e\u003cp\u003eIntronic\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e124,383\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSplice\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e2,026\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e5\u0026rsquo; UTR\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e1\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eExonic\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eStart lost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e71\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eExonic\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eStop gain\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e794\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eExonic\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eStop lost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e230\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eExonic\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSynonymous\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e19,099\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003eExonic\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eNonsynonymous\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e33,793\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eExonic\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSynonymous stop\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e41\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eTotal\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e2,718,759\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e \u003cp\u003eA neighbor-joining (NJ) phylogenetic tree was constructed (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e1\u003c/span\u003ea), along with population structure analysis (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e1\u003c/span\u003ee) and PCA (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e1\u003c/span\u003eb and Table S4) to elucidate the evolutionary relationships among the 203 Sea Island cotton varieties. The Sea Island cotton population was classified into three groups: G1 comprised 134 varieties with diverse origins. Modern Sea Island cotton varieties were dominant, including 37 from the former Soviet Union, 26 from Xinjiang, 45 from the Yangtze-Huanghe and Pearl River basins, six from the United States, 16 from Egypt, two from Albania, and two from unidentified sources. G2 consisted of 42 varieties derived from early materials in Xinjiang and the former Soviet Union. 
G3 had 27 varieties, mainly from early materials in the United States. Based on the nucleotide diversity index (θπ), the average θπ across all varieties was 3.96\u0026times;10\u003csup\u003e\u0026minus;\u0026thinsp;4\u003c/sup\u003e, with G1 at 3.85\u0026times;10\u003csup\u003e\u0026minus;\u0026thinsp;4\u003c/sup\u003e, G2 at 2.95\u0026times;10\u003csup\u003e\u0026minus;\u0026thinsp;4\u003c/sup\u003e, and G3 at 2.53\u0026times;10\u003csup\u003e\u0026minus;\u0026thinsp;4\u003c/sup\u003e, suggesting a higher level of diversity in modern varieties (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e1\u003c/span\u003ec). The FST values calculated between subgroups were 0.110, 0.108, and 0.154, revealing a greater genetic differentiation between G2 and G3 (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e1\u003c/span\u003ec). LD analysis of the three subgroups indicated that the r\u0026sup2; value declined to 0.5 at a distance of 442 kb for the overall population, which is considerably shorter than the distances reported by Song (Song et al. \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) (2000 kb) and Yu (Yu et al. 2021) (1000 kb), but greater than the 388 kb observed by Zhao (Zhao et al. \u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) (half of the maximum value). The decay distances for G3 and G2 were greater, measuring 1271 kb and 922 kb, respectively. G1 and the overall population exhibited a consistent decay trend with a distance of 344 kb (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e1\u003c/span\u003ed).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003ePhenotypic Analysis\u003c/p\u003e \u003cp\u003eBLUP values for 15 traits were calculated separately for the northern and southern regions of Xinjiang. 
The northern region exhibited significantly higher variability than the southern region (Table S5 and Table S6). A t-test conducted on traits from both regions revealed that PH, FBN, EFBN, BN, EBN, BW, HFFBN, and NBD were significantly lower in the southern region of Xinjiang (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001). In contrast, LP, FFBN, FL, FS, and FU were significantly higher in the southern region (\u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001). No significant differences were observed in MIC and FE (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e2\u003c/span\u003e, Table S5 and S6). Together, these results indicate that Sea Island cotton fibers from the southern region are superior, characterized by higher seed cotton percentages and greater suitability for cultivation.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eTo further elucidate the relationships among traits, breeding values (BLUP) were computed across nine environmental conditions. Notable positive correlations were found between LP and several traits, including PH, FBN, EFBN, BN, EBN, BW, NBD, HFFBN, FFBN, FL, FS, and FU. Conversely, LP exhibited a significant negative correlation with FE. Furthermore, MIC demonstrated positive correlations with FL, FS, and FU. FL was positively correlated with FS (Fig. S3). The assessment of broad-sense heritability revealed that it was high for fiber quality traits and low for plant morphological traits (Table S7).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eAssociation Analysis\u003c/p\u003e \u003cp\u003eGWAS was conducted on three types of data: S-BLUP, N-BLUP, and BLUP. The results were subjected to Bonferroni correction, with a significance threshold of -log\u003csub\u003e10\u003c/sub\u003e(\u003cem\u003eP\u003c/em\u003e)\u0026thinsp;\u0026gt;\u0026thinsp;5.4. 
Detection in at least two environments, together with distinct co-located peaks, was set as the selection criterion. A total of 26 significant SNPs were identified as being associated with yield, and 216 significant SNPs were linked to fiber quality. Notably, the At and Dt subgenomes contained 192 and 50 significant signals, respectively (Table S8).\u003c/p\u003e \u003cp\u003eAn LD threshold of 500 kb (slightly larger than the 442 kb LD decay distance) (Zhao et al. \u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) was used to screen selected regions. For PH, two selected regions were identified at (A02: 3,129,702\u0026thinsp;\u0026minus;\u0026thinsp;3,129,709) and (D09: 3,799,753\u0026thinsp;\u0026minus;\u0026thinsp;3,980,726). For FBN, two selected regions were identified at (D07: 19,238,177\u0026thinsp;\u0026minus;\u0026thinsp;20,094,267) and (D07: 25,323,417\u0026thinsp;\u0026minus;\u0026thinsp;28,205,761). The selected region for LP was located at (D05: 35,622,366\u0026thinsp;\u0026minus;\u0026thinsp;36,321,572). Significant selected regions for FL were identified at A05: 16,864,112\u0026thinsp;\u0026minus;\u0026thinsp;18,143,374 and A06: 4,479,110\u0026thinsp;\u0026minus;\u0026thinsp;6,077,420. For FS, significant selected regions were found at four locations: A02: 50,299,487\u0026thinsp;\u0026minus;\u0026thinsp;50,299,496, D01: 6,732,815\u0026thinsp;\u0026minus;\u0026thinsp;6,732,837, D05: 62,911,399\u0026thinsp;\u0026minus;\u0026thinsp;63,597,521, and D13: 50,210,965\u0026thinsp;\u0026minus;\u0026thinsp;50,211,014. The selected region for MIC was located at A05: 16,903,207\u0026thinsp;\u0026minus;\u0026thinsp;18,187,089, and that for FE was found at A05: 15,688,837\u0026thinsp;\u0026minus;\u0026thinsp;16,889,136 (Table S8). Notably, the selected regions for FL and MIC overlapped considerably in physical position on chromosome A05. Meanwhile, the selected region of FE at A05: 15,688,837\u0026thinsp;\u0026minus;\u0026thinsp;16,889,136 partially overlapped with that of FL and lay approximately 14 kb away from the selected region of MIC (Table S8). Zhao (Zhao et al. 
\u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) identified an FL-associated gene (\u003cem\u003eGbar_A05G017250/GB_A05G1753\u003c/em\u003e) situated within the FE region identified in this study. That FL region overlapped with the FL, FE, and MIC regions identified in this study. Furthermore, Su (Su et al. \u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) reported one quantitative trait nucleotide (QTN) each in the FL and FE regions (TM10754 and TM10723), which also partially overlapped with the FL, FE, and MIC regions identified in this research (Table S8).\u003c/p\u003e \u003cp\u003eA total of 242 related genes were annotated within the candidate regions, 153 of which carried nonsynonymous mutations leading to amino acid changes. Subsequent expression profiling, transcriptome analysis, and qRT-PCR narrowed these 153 genes to six target genes.\u003c/p\u003e \u003cp\u003eFiber length\u003c/p\u003e \u003cp\u003e \u003cem\u003eGB_A05G1764\u003c/em\u003e is a homolog of the \u003cem\u003eArabidopsis\u003c/em\u003e gene \u003cem\u003eAT4G05530\u003c/em\u003e, which encodes a peroxisomal member of the short-chain dehydrogenase family (IBR1). IBR1 serves as a catalyst for the deoxidation process involved in the conversion of indole-3-butyric acid (IBA) to indole-3-acetic acid (IAA). Additionally, IAA derived from IBA facilitates the expansion of root hairs and cotyledon cells during the developmental stages of \u003cem\u003eArabidopsis\u003c/em\u003e seedlings (Spiess et al. \u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e2014\u003c/span\u003e; Strader et al. \u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e2010\u003c/span\u003e). Zhao (Zhao et al. \u003cspan citationid=\"CR87\" class=\"CitationRef\"\u003e2021\u003c/span\u003e) reported that IBR1-mediated IBA-to-IAA conversion can promote root hair elongation in \u003cem\u003eArabidopsis\u003c/em\u003e. 
In the genomic region spanning 16.88\u0026ndash;16.90 Mb on chromosome A05, two nonsynonymous SNPs were identified (Fig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e3\u003c/span\u003ea, c). The first (A/T) results in a codon substitution of aspartic acid with lysine. The second (T/A) leads to the replacement of lysine with methionine, thereby influencing FL (Fig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e3\u003c/span\u003eb, f). Early varieties (G2 and G3) predominantly harbor the haplotype AT (long fiber length). In contrast, modern varieties (G1) exhibit a significant increase in the haplotype TA (short fiber length) (Fig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e3\u003c/span\u003ed). This shift may be attributed to targeted selection in contemporary Sea Island cotton breeding, where linkage between traits altered haplotype proportions. Expression profiling revealed that \u003cem\u003eGB_A05G1764\u003c/em\u003e was expressed at low levels during the rapid elongation phase of ovule and fiber (Fig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e3\u003c/span\u003ee). Transcriptomic analysis further demonstrated significant differential expression between short fiber length (Ashi) and long fiber length (So717) varieties in this period. Short fiber length varieties exhibited higher expression levels (Fig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e3\u003c/span\u003eg). qRT-PCR assays conducted on fibers from long fiber length (So717) and short fiber length (Ashi) varieties at 0\u0026ndash;30 days post-anthesis (DPA) corroborated the transcriptomic findings (Fig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e3\u003c/span\u003eh). 
It can be inferred that \u003cem\u003eGB_A05G1764\u003c/em\u003e mediates the negative regulatory effect of IBR1 on FL.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cem\u003eGB_A05G1761\u003c/em\u003e encodes a carboxylesterase. Within the genomic region of 16.86\u0026ndash;16.88 Mb on chromosome A05, one nonsynonymous SNP was identified (Fig. S4a, c). The A (long fiber length) to T (short fiber length) substitution results in a codon change from threonine to serine, which also influences FL (Fig. S4b, f). Most early varieties (G2 and G3) have a higher proportion of the haplotype (A), whereas modern varieties show a marked increase in the proportion of the haplotype (T) (Fig. S4d). Expression profiling showed that \u003cem\u003eGB_A05G1761\u003c/em\u003e was significantly upregulated during the phases of FE and secondary wall synthesis, notably higher than those in vegetative organs, floral organs, and ovules (Fig. S4e). This indicates that \u003cem\u003eGB_A05G1761\u003c/em\u003e mainly affects fiber development. Transcriptomic analysis revealed a rapid increase in the expression of \u003cem\u003eGB_A05G1761\u003c/em\u003e in fibers during the 5\u0026ndash;10 DPA period, followed by a gradual decrease in expression from 10\u0026ndash;25 DPA, with long fiber varieties displaying consistently higher expression levels (Fig. S4g). qRT-PCR experiments performed on fibers from long fiber length (So717) and short fiber length (Ashi) varieties at 0\u0026ndash;30 DPA yielded results consistent with the transcriptomic data (Fig. S4h). Thus, it can be concluded that \u003cem\u003eGB_A05G1761\u003c/em\u003e positively regulates fiber elongation.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFiber micronaire\u003c/p\u003e \u003cp\u003e \u003cem\u003eGB_A05G1895\u003c/em\u003e encodes a protein from the abscisic acid-responsive family (TB2/DP1, HVA22). 
Abscisic acid (ABA) is critical for regulating plant growth and senescence. It plays a significant role in cotton fiber development and has a negative correlation with fiber elongation (Dou et al. \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; S. H. Dasani \u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e2006\u003c/span\u003e; Yang et al. \u003cspan citationid=\"CR78\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Within the region spanning 18.0999 to 18.1015 Mb on chromosome A05, a nonsynonymous SNP was identified (Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e4\u003c/span\u003ea, b). The transition from T (low fiber micronaire) to A (high fiber micronaire) alters phenylalanine to tyrosine. Most early cotton varieties carry the haplotype (T), while modern varieties show a significant increase in the proportion of haplotype (A). This may be attributed to the directional selection for fiber micronaire in contemporary Sea Island cotton breeding (Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e4\u003c/span\u003eb, f). Expression profile analysis presented high expression levels of \u003cem\u003eGB_A05G1895\u003c/em\u003e in ovules (5\u0026ndash;10 DPA) and fibers (10 DPA), suggesting its primary role in fiber development and reproductive processes in Sea Island cotton (Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e4\u003c/span\u003ee). qRT-PCR experiments conducted on fibers from high fiber micronaire (Ashi) and low fiber micronaire (Tu79-713) varieties during the 0\u0026ndash;30 DPA period revealed a consistent increase in relative expression levels during the rapid elongation phase (10\u0026ndash;20 DPA), peaking at 20 DPA. Notably, the relative expression in Ashi was significantly higher than that in Tu79-713 (Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e4\u003c/span\u003eg). 
Therefore, it can be inferred that \u003cem\u003eGB_A05G1895\u003c/em\u003e positively regulates MIC in response to ABA.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cem\u003eGB_A05G1771\u003c/em\u003e encodes an early nodulin-like protein (ENOD), which is associated with the differentiation of specialized sieve tube cells and the regulation of cellular dimensions. The ENOD40 gene has been shown to reduce cell size, and nodulin-like proteins in species such as watermelons and tomatoes promote fruit development and maturation (M et al. 2005; Wechter et al. \u003cspan citationid=\"CR74\" class=\"CitationRef\"\u003e2008\u003c/span\u003e). Within the region from 16.92 to 17.00 Mb on chromosome A05, two nonsynonymous SNPs were identified (Fig. S5a, c). The first SNP (T/G) leads to a codon change from valine to glycine, while the second SNP (A/G) results in a substitution from lysine to arginine, thereby affecting fiber micronaire (Fig. S5b, f). Early varieties primarily possess the haplotype TA (low fiber micronaire), whereas modern varieties exhibit a significant increase in the haplotype GG (high fiber micronaire), aligning with the emphasis on MIC in recent breeding efforts for Sea Island cotton. Expression profiling showed that \u003cem\u003eGB_A05G1771\u003c/em\u003e was highly expressed in ovules (5\u0026ndash;20 DPA) and fibers (10\u0026ndash;20 DPA), particularly in fibers (Fig. S5e), implying its predominant influence on fiber development. Results from qRT-PCR experiments on fibers from high fiber micronaire (Ashi) and low fiber micronaire (Tu79-713) varieties during the 0\u0026ndash;30 DPA period demonstrated that the relative expression levels in Ashi were significantly higher than those in Tu79-713 (Fig. S5g). 
Consequently, it is concluded that \u003cem\u003eGB_A05G1771\u003c/em\u003e positively influences fiber micronaire by regulating cell size and fiber maturation.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFiber elongation\u003c/p\u003e \u003cp\u003e \u003cem\u003eGB_A05G1702\u003c/em\u003e belongs to the structural protein family known as NAD(P)-binding Rossmann-fold superfamily (BAN). BAN is indirectly related to the dynamics of flax cell walls, playing a critical role in fiber morphology and mechanical properties (Chabi et al. \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Within the region spanning 16.3 to 16.5 Mb on chromosome A05 (Fig.\u0026nbsp;\u003cspan refid=\"Fig20\" class=\"InternalRef\"\u003e5\u003c/span\u003ea, c), two nonsynonymous SNPs were identified (Fig.\u0026nbsp;\u003cspan refid=\"Fig20\" class=\"InternalRef\"\u003e5\u003c/span\u003eb, f). The first SNP (G/A) results in a codon change from threonine to isoleucine, while the second SNP (C/A) causes a substitution from alanine to serine, thereby altering FE. The proportions of haplotypes GC (high fiber elongation) and AA (low fiber elongation) are approximately equal (1:1) in both early and modern varieties, indicating limited selection pressure on fiber elongation in recent breeding practices (Fig.\u0026nbsp;\u003cspan refid=\"Fig20\" class=\"InternalRef\"\u003e5\u003c/span\u003ed). Expression analysis demonstrated that \u003cem\u003eGB_A05G1702\u003c/em\u003e was highly expressed in ovules (3\u0026ndash;10 DPA) and fibers (10 DPA), at levels significantly exceeding those in vegetative and floral organs (Fig.\u0026nbsp;\u003cspan refid=\"Fig20\" class=\"InternalRef\"\u003e5\u003c/span\u003ee). This suggests its primary regulatory role in fiber growth and development. 
qRT-PCR experiments conducted on fibers from high fiber elongation (65-3049-6) and low fiber elongation (XH25) varieties during the 0\u0026ndash;30 DPA period indicated a continuous decrease in the relative expression levels of \u003cem\u003eGB_A05G1702\u003c/em\u003e (10\u0026ndash;30 DPA), with significantly higher expression levels in 65-3049-6 compared to XH25 (Fig.\u0026nbsp;\u003cspan refid=\"Fig20\" class=\"InternalRef\"\u003e5\u003c/span\u003eg). Thus, it is inferred that \u003cem\u003eGB_A05G1702\u003c/em\u003e positively influences FE through indirect effects on cell wall structure.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cem\u003eGB_A05G1707\u003c/em\u003e encodes a basic helix-loop-helix (bHLH) transcription factor, which is implicated in brassinosteroid (BR) signaling during the development of cotton fibers (Lu et al. \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Lu et al. \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Wang et al. \u003cspan citationid=\"CR73\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Within the region of chromosome A05 spanning 16.3 to 16.5 Mb, a nonsynonymous SNP was identified (Fig. S6a, c). The transition from G (high fiber elongation) to A (low fiber elongation) results in a codon change that alters the amino acid from alanine to threonine, thereby impacting fiber elongation rates (Fig. S6b, f). The haplotype distribution (G/A) in both early and modern varieties is approximately 1:1 (Fig. S6d), suggesting a potential linkage to the relevant traits. Expression profiling showed low expression levels of \u003cem\u003eGB_A05G1707\u003c/em\u003e in ovules and during the rapid elongation phase of fibers (Fig. S6e). The elevated expression in fibers at 20 DPA suggests its predominant role in fiber development. 
qRT-PCR analyses of fibers from high fiber elongation (65-3049-6) and low fiber elongation (XH25) varieties during the 0\u0026ndash;30 DPA period revealed a consistent decrease in relative expression from 5 to 10 DPA, followed by a progressive increase from 15 to 25 DPA, peaking at 25 DPA. Notably, the relative expression level in 65-3049-6 was significantly higher than in XH25 (Fig. S6g). Thus, it is inferred that \u003cem\u003eGB_A05G1707\u003c/em\u003e positively influences FE rates by indirectly affecting BR signaling.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eGenome-wide selection\u003c/p\u003e \u003cp\u003eThe Impact of Training Population Proportions on Prediction Accuracy\u003c/p\u003e \u003cp\u003eVarious proportions of the training population (10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90%) were each evaluated across 100 iterations to assess their effect on prediction accuracy. The mean across iterations was taken as the prediction accuracy for each trait.\u003c/p\u003e \u003cp\u003eAt a training population proportion of 10% for Sea Island cotton phenotypes (PH, FBN, EFBN, BN, EBN, BW, LP, NBD, HFFBN, FFBN, FL, FS, MIC, FU, and FE), prediction accuracy values were relatively low, recorded as 0.68, 0.49, 0.54, 0.34, 0.22, 0.20, 0.39, 0.12, 0.30, 0.18, 0.49, 0.75, 0.28, 0.50, and 0.60, respectively. As the proportion increased, the prediction accuracy gradually improved until stabilization. The proportions at which accuracy stabilized, as determined by t-tests, were 50%, 60%, 50%, 60%, 50%, 50%, 60%, 90%, 70%, 80%, 50%, 60%, 50%, 60%, and 60% for PH, FBN, EFBN, BN, EBN, BW, LP, NBD, HFFBN, FFBN, FL, FS, MIC, FU, and FE, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig24\" class=\"InternalRef\"\u003e6\u003c/span\u003e). Overall, as the training population proportion increased, the prediction accuracy consistently improved and ultimately stabilized. 
For traits exhibiting high heritability (PH, FBN, EFBN, BN, FL, FS, MIC, FU, and FE), the optimal training population proportion was determined to be between 50% and 60%, yielding higher prediction accuracy. For traits with moderate heritability (NBD, HFFBN, and FFBN), the optimal training population proportion ranged from 70% to 90%, resulting in lower prediction accuracy.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe Effect of Varying Marker Quantities on Prediction Accuracy\u003c/p\u003e \u003cp\u003eMarker subsets of varying size (10, 50, 100, 500, 1,000, 5,000, 10,000, and 50,000) were randomly selected from the 268,995 SNPs to assess the influence of marker quantity on prediction accuracy. The mean over 100 iterations was taken as the prediction accuracy. For Sea Island cotton phenotypes (PH, FBN, EFBN, BN, EBN, BW, LP, NBD, HFFBN, FFBN, FL, FS, MIC, FU, and FE), when the number of markers was set to 10, the standard deviation was large and prediction accuracy was low, with values of 0.33, 0.28, 0.27, 0.19, 0.13, 0.16, 0.23, 0.10, 0.18, 0.12, 0.25, 0.37, 0.15, 0.26, and 0.27, respectively. As the number of markers increased, the standard deviation gradually diminished, and prediction accuracy steadily improved until stabilizing. For all traits, stability was observed at a marker quantity of 5,000, where the standard deviation was relatively small (Fig.\u0026nbsp;\u003cspan refid=\"Fig26\" class=\"InternalRef\"\u003e7\u003c/span\u003e). In summary, as the number of markers increased, prediction accuracy consistently improved until reaching stability. 
High heritability traits (PH, FBN, EFBN, BN, FL, FS, MIC, FU, and FE) demonstrated high prediction accuracy, while traits with moderate heritability (NBD, HFFBN, and FFBN) exhibited low prediction accuracy.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThe Influence of Genotype on Phenotypic Diversity\u003c/p\u003e \u003cp\u003eCotton is one of the primary renewable sources of natural fiber, playing a pivotal role in the textile industry. GS breeding has the potential to expedite the breeding process while reducing costs, and it has been rapidly adopted in maize and wheat. However, its application in cotton remains largely unexplored. In 2015, the genomes of both Sea Island cotton and Upland cotton were successfully assembled (Yuan et al. \u003cspan citationid=\"CR80\" class=\"CitationRef\"\u003e2015\u003c/span\u003e; Zhang et al. \u003cspan citationid=\"CR84\" class=\"CitationRef\"\u003e2015\u003c/span\u003e). Whole-genome resequencing has since been extensively employed for high-density mapping (Geng et al. \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Song et al. \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Yu et al. 2021). Resequencing-based GWAS has been applied in cotton. By comparison, only four publications specifically address Sea Island cotton (Table S9) (Fang et al. \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Song et al. \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Yu et al. 2021; Zhao et al. \u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThis study conducted resequencing on 203 Sea Island cotton accessions, achieving 11.02\u0026times; coverage. Although some preliminary findings have been published (Fang et al. 
\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Song et al. \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Yu et al. 2021; Zhao et al. \u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), the majority of the materials in this study (71.92%) had not been sequenced previously (Table S10 and Table S11). Among the 15 traits assessed across nine environments (comprising various years and locations), EFBN, EBN, and NBD had not been investigated in previous Sea Island cotton studies. The findings exhibited minor discrepancies compared to earlier phenotypic research (Table S12) (Fang et al. \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Song et al. \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Yu et al. 2021; Zhao et al. \u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Notably, the classification of Sea Island cotton based on geographic origin proved to be more ambiguous than that of other crops (Huang et al. \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2012\u003c/span\u003e; Qu et al. \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Zhou et al. \u003cspan citationid=\"CR89\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), with patterns of genetic diversity reflecting differing breeding histories. This study identified 26 significant SNPs associated with yield and 216 significant SNPs linked to fiber quality across two or more environments. However, no significant SNPs were identified for plant morphological traits (Table S8). This indicates that future studies on Sea Island cotton should incorporate larger populations. The analysis revealed 192 significant signals in the At subgenome and 50 in the Dt subgenome. 
This distribution differs from that reported in previous research, indicating that variation in population composition can substantially influence mapping results (Ma et al. \u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Zhao et al. \u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe Impact of the A05: 15,688,837\u0026thinsp;\u0026minus;\u0026thinsp;18,187,089 Region on Fiber Development in Sea Island Cotton\u003c/p\u003e \u003cp\u003eThe regions associated with FL and MIC overlapped extensively, with 134 shared genes identified. The FL region also partially overlapped with the FE region, sharing six genes. Both the FL and MIC regions partly overlapped with regions reported in prior research (Table S13). Notably, the FL genes \u003cem\u003eGB_A05G1761\u003c/em\u003e and \u003cem\u003eGB_A05G1764\u003c/em\u003e were located within the FL region identified by Zhao et al. (\u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) (A05: 15,773,942\u0026thinsp;\u0026minus;\u0026thinsp;16,773,942, 3\u0026ndash;79), which contains 33 shared genes. These genes also fell within the FL and FE regions identified by Su et al. (\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) (A05: 16,354,758\u0026thinsp;\u0026minus;\u0026thinsp;17,099,948, Hai7124), which contain 40 shared genes. The MIC gene \u003cem\u003eGB_A05G1771\u003c/em\u003e was located within the FL region previously reported by Zhao et al. (\u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), which contains 27 shared genes. It was also located in the FL and FE regions described by Su et al. (\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e2020\u003c/span\u003e), which contain 34 shared genes (Table S13). 
The FE genes \u003cem\u003eGB_A05G1702\u003c/em\u003e and \u003cem\u003eGB_A05G1707\u003c/em\u003e were located within the FL region identified by Zhao et al. (\u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) (A05: 15,773,942\u0026thinsp;\u0026minus;\u0026thinsp;16,773,942, 3\u0026ndash;79), which contains 60 shared genes. They also overlapped with the FE and FL regions reported by Su et al. (\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e2020\u003c/span\u003e), sharing 64 genes (Table S13). In conclusion, chromosome A05 is critical for fiber development in Sea Island cotton, and the A05: 15,688,837\u0026thinsp;\u0026minus;\u0026thinsp;18,187,089 region identified here is a key interval for fiber development.\u003c/p\u003e \u003cp\u003eIn addition, two other genes on chromosome A05 may be associated with FE in Sea Island cotton. \u003cem\u003eGB_A05G1840\u003c/em\u003e encodes SKU5, a protein involved in sucrose transport in plants. The accumulation of sucrose promotes cellulose synthesis. Moreover, SKU5 mediates the inhibitory effect of ABA on cotton FE (Beasley and Ting \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e1973\u003c/span\u003e; Gokani et al. 1998; Dasani and Thaker \u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e2006\u003c/span\u003e). \u003cem\u003eGB_A05G1829\u003c/em\u003e encodes BR-signaling kinase 1 (BSK1), which plays a role in responding to BR signaling during cotton fiber development (Lu et al. \u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Wang et al. \u003cspan citationid=\"CR73\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). 
The identification of these genes will provide valuable insights for molecular breeding in cotton.\u003c/p\u003e \u003cp\u003eMarker Density and Training Population Size Affect Prediction Accuracy\u003c/p\u003e \u003cp\u003eGS is one of the most efficient breeding methods currently available. Compared with phenotype-based selection, GS substantially reduces both breeding cycle length and cost while maintaining comparable selection gains (Beyene et al. \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2019\u003c/span\u003e). Furthermore, increasing selection intensity is a critical strategy for accelerating the breeding process and improving genetic gain without substantially increasing the scale of breeding operations (Bangera et al. \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Li et al. \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Yang et al. \u003cspan citationid=\"CR77\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Accurate phenotypic estimation is pivotal for GS, and longitudinal data can further improve prediction accuracy (Wang et al. \u003cspan citationid=\"CR73\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). Prediction accuracy is also influenced by the genetic architecture of the target traits (Velez-Torres et al. \u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Zhang et al. \u003cspan citationid=\"CR83\" class=\"CitationRef\"\u003e2017\u003c/span\u003e), the statistical model (Wang et al. \u003cspan citationid=\"CR72\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), marker density (Zhang et al. \u003cspan citationid=\"CR84\" class=\"CitationRef\"\u003e2015\u003c/span\u003e), population size (Combs and Bernardo \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2013\u003c/span\u003e), and the SNPs associated with the target traits (Zhang et al. 
\u003cspan citationid=\"CR83\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Zhang et al. \u003cspan citationid=\"CR84\" class=\"CitationRef\"\u003e2015\u003c/span\u003eb).\u003c/p\u003e \u003cp\u003eThis study assessed the impact of training population size and marker number on prediction accuracy. As both the number of markers and the training population proportion increased, prediction accuracy first improved and then plateaued. With a training population proportion of 50% to 60% and 5,000 markers, traits with high heritability showed high prediction accuracy (Cui et al. \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Hu et al. \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Lan et al. \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). However, the heritability estimates for EBN and BW were inconsistent with their respective prediction accuracies. The rrBLUP model assumes that marker effects are normally distributed with homogeneous variance, an assumption that may not hold for these two traits. This suggests that the choice of statistical model and population size is critical for accurately estimating the prediction accuracy of EBN and BW (Zhang et al. \u003cspan citationid=\"CR83\" class=\"CitationRef\"\u003e2017\u003c/span\u003e). Even with a training population proportion of 70% to 90% and 5,000 markers, traits with moderate heritability had low prediction accuracy, which may be attributable to LD decay in the population and to variation in population composition (Guo et al. 
\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2020\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eIn summary, this study examined the growth characteristics, population structure, and genetic diversity of Sea Island cotton in the northern and southern regions of Xinjiang and identified significant variation. Further analysis of candidate intervals identified A05: 15,688,837\u0026thinsp;\u0026minus;\u0026thinsp;18,187,089 as a critical region influencing fiber development in Sea Island cotton. This finding can serve as a reference for molecular-assisted breeding aimed at producing high-yield, high-quality Sea Island cotton. Finally, the investigation of how training population size and marker density affect prediction accuracy provides a theoretical foundation for selective breeding in cotton.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eSupplementary Information\u003c/strong\u003e The online version contains supplementary material available at *.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor contribution statement\u0026nbsp;\u003c/strong\u003eT. Yang: Analysed and summarized all the data, drew the figures, and wrote the manuscript. H. Wang, J. Song, K. Zhao: Oversaw the collection and categorization of the empirical data. B. Pang, Y. Wang, P. Luo, W. Liang, S. Shi, J. Wang, Y. Lin, J. Li, Z. Wang, Y. Guo: Participated in preliminary preparation, experimental data collection, and discussions. W. Gao: Directed the experiments and revised the manuscript. All authors read and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u0026nbsp;\u003c/strong\u003eThis work was supported by the National Key Research and Development Program of China (2021YFD1900802-4), the Natural Science Foundation project of Xinjiang (2019D01A41), and the Tianshan Youth Project (2018Q016).\u003c/p\u003e\n
\u003cp\u003e\u003cstrong\u003eData availability\u0026nbsp;\u003c/strong\u003eAll data and materials supporting our findings are included in the Materials and Methods section, and details are provided in the attached files. The genomic resequencing datasets generated during the current study are available in the NCBI Sequence Read Archive (SRA) under accession number PRJNA1179725. The transcriptome sequencing raw data were downloaded from the NCBI Gene Expression Omnibus (GEO) under the accession number GSE184965.\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no conflict of interest.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAi Q, Pan W, Zeng Y, Li Y, Cui L (2022) CCCH Zinc finger genes in Barley: genome-wide identification, evolution, expression and haplotype analysis. BMC PLANT BIOL 22:117\u003c/li\u003e\n\u003cli\u003eBangera R, Correa K, Lhorente JP, Figueroa R, Yanez JM (2017) Genomic predictions can accelerate selection for resistance against Piscirickettsia salmonis in Atlantic salmon (Salmo salar). BMC GENOMICS 18:121\u003c/li\u003e\n\u003cli\u003eBeasley CA, Ting IP (1973) The effects of plant growth substances on in vitro fiber development from fertilized cotton ovules. AM J BOT 60:130-139\u003c/li\u003e\n\u003cli\u003eBeyene Y, Gowda M, Olsen M, Robbins KR, Perez-Rodriguez P, Alvarado G, Dreher K, Gao SY, Mugo S, Prasanna BM, Crossa J (2019) Empirical Comparison of Tropical Maize Hybrids Selected Through Genomic and Phenotypic Selections. 
FRONT PLANT SCI 10:1502\u003c/li\u003e\n\u003cli\u003eCanella VC, Persa R, Chen P, Jarquin D (2022) Incorporation of Soil-Derived Covariates in Progeny Testing and Line Selection to Enhance Genomic Prediction Accuracy in Soybean Breeding. FRONT GENET 13:905824\u003c/li\u003e\n\u003cli\u003eCerrudo D, Cao S, Yuan Y, Martinez C, Suarez EA, Babu R, Zhang X, Trachsel S (2018) Genomic Selection Outperforms Marker Assisted Selection for Grain Yield and Physiological Traits in a Maize Doubled Haploid Population Across Water Treatments. FRONT PLANT SCI 9:336\u003c/li\u003e\n\u003cli\u003eChabi M, Goulas E, Galinousky D, Blervacq AS, Lucau-Danila A, Neutelings G, Grec S, Day A, Chabbert B, Haag K, Mussig J, Arribat S, Planchon S, Renaut J, Hawkins S (2023) Identification of new potential molecular actors related to fiber quality in flax through Omics. FRONT PLANT SCI 14:1204016\u003c/li\u003e\n\u003cli\u003eChen ZJ, Sreedasyam A, Ando A, Song Q, De Santiago LM, Hulse-Kemp AM, Ding M, Ye W, Kirkbride RC, Jenkins J, Plott C, Lovell J, Lin YM, Vaughn R, Liu B, Simpson S, Scheffler BE, Wen L, Saski CA, Grover CE, Hu G, Conover JL, Carlson JW, Shu S, Boston LB, Williams M, Peterson DG, McGee K, Jones DC, Wendel JF, Stelly DM, Grimwood J, Schmutz J (2020) Genomic diversifications of five Gossypium allopolyploid species and their impact on cotton improvement. NAT GENET 52:525-533\u003c/li\u003e\n\u003cli\u003eCheng Y, Huang C, Hu Y, Jin S, Zhang X, Si Z, Zhao T, Chen J, Fang L, Dai F, Yang W, Wang P, Mei G, Guan X, Zhang T (2024) Gossypium purpurascens genome provides insight into the origin and domestication of upland cotton. J ADV RES 56:15-29\u003c/li\u003e\n\u003cli\u003eCombs E, Bernardo R (2013) Accuracy of Genomewide Selection for Different Traits with Constant Population Size, Heritability, and Number of Markers. 
PLANT GENOME-US 6:120\u003c/li\u003e\n\u003cli\u003eCui Y, Li R, Li G, Zhang F, Zhu T, Zhang Q, Ali J, Li Z, Xu S (2020) Hybrid breeding of rice via genomic selection. PLANT BIOTECHNOL J 18:57-67\u003c/li\u003e\n\u003cli\u003eDou L, Li Z, Wang H, Li H, Xiao G, Zhang X (2022) The hexokinase Gene Family in Cotton: Genome-Wide Characterization and Bioinformatics Analysis. FRONT PLANT SCI 13:882587\u003c/li\u003e\n\u003cli\u003eEndelman JB (2011) Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. PLANT GENOME-US 4:250-255\u003c/li\u003e\n\u003cli\u003eY\u0026aacute;\u0026ntilde;ez JM, Barr\u0026iacute;a A, L\u0026oacute;pez ME, Moen T, Garcia BF, Yoshida GM, Xu P (2023) Genome-wide association and genomic selection in aquaculture. Reviews in Aquaculture 15:645-675\u003c/li\u003e\n\u003cli\u003eFan L, Wang L, Wang X, Zhang H, Zhu Y, Guo J, Gao W, Geng H, Chen Q, Qu Y (2018) A high-density genetic map of extra-long staple cotton (Gossypium barbadense) constructed using genotyping-by-sequencing based single nucleotide polymorphic markers and identification of fiber traits-related QTL in a recombinant inbred line population. BMC GENOMICS 19:489\u003c/li\u003e\n\u003cli\u003eFang L, Zhao T, Hu Y, Si Z, Zhu X, Han Z, Liu G, Wang S, Ju L, Guo M, Mei H, Wang L, Qi B, Wang H, Guan X, Zhang T (2021) Divergent improvement of two cultivated allotetraploid cotton species. PLANT BIOTECHNOL J 19:1325-1336\u003c/li\u003e\n\u003cli\u003eFu J, Hao Y, Li H, Reif JC, Chen S, Huang C, Wang G, Li X, Xu Y, Li L (2022) Integration of genomic selection with doubled-haploid evaluation in hybrid breeding: From GS 1.0 to GS 4.0 and beyond. 
MOL PLANT 15:577-580\u003c/li\u003e\n\u003cli\u003eGeng X, Sun G, Qu Y, Sarfraz Z, Jia Y, He S, Pan Z, Sun J, Iqbal MS, Wang Q, Qin H, Liu J, Liu H, Yang J, Ma Z, Xu D, Yang J, Zhang J, Li Z, Cai Z, Zhang X, Zhang X, Zhou G, Li L, Zhu H, Wang L, Pang B, Du X (2020) Genome-wide dissection of hybridization for fiber quality and yield-related traits in upland cotton. PLANT J 104:1285-1300\u003c/li\u003e\n\u003cli\u003eGuo R, Dhliwayo T, Mageto EK, Palacios-Rojas N, Lee M, Yu D, Ruan Y, Zhang A, San VF, Olsen M, Crossa J, Prasanna BM, Zhang L, Zhang X (2020) Genomic Prediction of Kernel Zinc Concentration in Multiple Maize Populations Using Genotyping-by-Sequencing and Repeat Amplification Sequencing Markers. FRONT PLANT SCI 11:534\u003c/li\u003e\n\u003cli\u003eGuo Z, Tucker DM, Lu J, Kishore V, Gay G (2012) Evaluation of genome-wide selection efficiency in maize nested association mapping populations. THEOR APPL GENET 124:261-275\u003c/li\u003e\n\u003cli\u003eHe P, Zhang Y, Xiao G (2020) Origin of a Subgenome and Genome Evolution of Allotetraploid Cotton Species. MOL PLANT 13:1238-1240\u003c/li\u003e\n\u003cli\u003eHu J, Chen B, Zhao J, Zhang F, Xie T, Xu K, Gao G, Yan G, Li H, Li L, Ji G, An H, Li H, Huang Q, Zhang M, Wu J, Song W, Zhang X, Luo Y, Chris PJ, Batley J, Tian S, Wu X (2022) Genomic selection and genetic architecture of agronomic traits during modern rapeseed breeding. NAT GENET 54:694-704\u003c/li\u003e\n\u003cli\u003eHu Y, Chen J, Fang L, Zhang Z, Ma W, Niu Y, Ju L, Deng J, Zhao T, Lian J, Baruch K, Fang D, Liu X, Ruan YL, Rahman MU, Han J, Wang K, Wang Q, Wu H, Mei G, Zang Y, Han Z, Xu C, Shen W, Yang D, Si Z, Dai F, Zou L, Huang F, Bai Y, Zhang Y, Brodt A, Ben-Hamo H, Zhu X, Zhou B, Guan X, Zhu S, Chen X, Zhang T (2019) Gossypium barbadense and Gossypium hirsutum genomes provide insights into the origin and evolution of allotetraploid cotton. 
NAT GENET 51:739-748\u003c/li\u003e\n\u003cli\u003eHuang C, Nie X, Shen C, You C, Li W, Zhao W, Zhang X, Lin Z (2017) Population structure and genetic basis of the agronomic traits of upland cotton in China revealed by a genome-wide association study using high-density SNPs. PLANT BIOTECHNOL J 15:1374-1386\u003c/li\u003e\n\u003cli\u003eHuang X, Kurata N, Wei X, Wang ZX, Wang A, Zhao Q, Zhao Y, Liu K, Lu H, Li W, Guo Y, Lu Y, Zhou C, Fan D, Weng Q, Zhu C, Huang T, Zhang L, Wang Y, Feng L, Furuumi H, Kubo T, Miyabayashi T, Yuan X, Xu Q, Dong G, Zhan Q, Li C, Fujiyama A, Toyoda A, Lu T, Feng Q, Qian Q, Li J, Han B (2012) A map of rice genome variation reveals the origin of cultivated rice. NATURE 490:497-501\u003c/li\u003e\n\u003cli\u003eGokani SJ, Kumar R, Thaker VS (1998) Potential Role of Abscisic Acid in Cotton Fiber and Ovule Development. J PLANT GROWTH REGUL 17:1-5\u003c/li\u003e\n\u003cli\u003eJiang X, Gong J, Zhang J, Zhang Z, Shi Y, Li J, Liu A, Gong W, Ge Q, Deng X, Fan S, Chen H, Kuang Z, Pan J, Che J, Zhang S, Jia T, Wei R, Chen Q, Wei S, Shang H, Yuan Y (2021) Quantitative Trait Loci and Transcriptome Analysis Reveal Genetic Basis of Fiber Quality Traits in CCRI70 RIL Population of Gossypium hirsutum. FRONT PLANT SCI 12:753755\u003c/li\u003e\n\u003cli\u003eLan S, Zheng C, Hauck K, McCausland M, Duguid SD, Booker HM, Cloutier S, You FM (2020) Genomic Prediction Accuracy of Seven Breeding Selection Traits Improved by QTL Identification in Flax. INT J MOL SCI 21:1577\u003c/li\u003e\n\u003cli\u003eLi B, Chen L, Sun W, Wu D, Wang M, Yu Y, Chen G, Yang W, Lin Z, Zhang X, Duan L, Yang X (2020) Phenomics-based GWAS analysis reveals the genetic architecture for drought resistance in cotton. 
PLANT BIOTECHNOL J 18:2533-2544\u003c/li\u003e\n\u003cli\u003eLi F, Fan G, Lu C, Xiao G, Zou C, Kohel RJ, Ma Z, Shang H, Ma X, Wu J, Liang X, Huang G, Percy RG, Liu K, Yang W, Chen W, Du X, Shi C, Yuan Y, Ye W, Liu X, Zhang X, Liu W, Wei H, Wei S, Huang G, Zhang X, Zhu S, Zhang H, Sun F, Wang X, Liang J, Wang J, He Q, Huang L, Wang J, Cui J, Song G, Wang K, Xu X, Yu JZ, Zhu Y, Yu S (2015) Genome sequence of cultivated Upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution. NAT BIOTECHNOL 33:524-530\u003c/li\u003e\n\u003cli\u003eLi F, Fan G, Wang K, Sun F, Yuan Y, Song G, Li Q, Ma Z, Lu C, Zou C, Chen W, Liang X, Shang H, Liu W, Shi C, Xiao G, Gou C, Ye W, Xu X, Zhang X, Wei H, Li Z, Zhang G, Wang J, Liu K, Kohel RJ, Percy RG, Yu JZ, Zhu YX, Wang J, Yu S (2014) Genome sequence of the cultivated cotton Gossypium arboreum. NAT GENET 46:567-572\u003c/li\u003e\n\u003cli\u003eLi H, Durbin R (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. BIOINFORMATICS 26:589-595\u003c/li\u003e\n\u003cli\u003eLi H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R (2009) The Sequence Alignment/Map format and SAMtools. BIOINFORMATICS 25:2078-2079\u003c/li\u003e\n\u003cli\u003eLi H, Li X, Zhang P, Feng Y, Mi J, Gao S, Sheng L, Ali M, Yang Z, Li L, Fang W, Wang W, Qian Q, Gu F, Zhou W (2024) Smart Breeding Platform: A web-based tool for high-throughput population genetics, phenomics, and genomic selection. MOL PLANT 17:677-681\u003c/li\u003e\n\u003cli\u003eLi S, Kong L, Xiao X, Li P, Liu A, Li J, Gong J, Gong W, Ge Q, Shang H, Pan J, Chen H, Peng Y, Zhang Y, Lu Q, Shi Y, Yuan Y (2023) Genome-wide artificial introgressions of Gossypium barbadense into G. hirsutum reveal superior loci for simultaneous improvement of cotton fiber quality and yield traits. 
J ADV RES 53:1-16\u003c/li\u003e\n\u003cli\u003eLi W, Li W, Song Z, Gao Z, Xie K, Wang Y, Wang B, Hu J, Zhang Q, Ning C, Wang D, Fan X (2024) Marker Density and Models to Improve the Accuracy of Genomic Selection for Growth and Slaughter Traits in Meat Rabbits. Genes (Basel) 15:454\u003c/li\u003e\n\u003cli\u003eLi Y, Qin T, Wei C, Sun J, Dong T, Zhou R, Chen Q, Wang Q (2019) Using Transcriptome Analysis to Screen for Key Genes and Pathways Related to Cytoplasmic Male Sterility in Cotton (Gossypium hirsutum L.). INT J MOL SCI 20:5120\u003c/li\u003e\n\u003cli\u003eLi Z, Wang P, You C, Yu J, Zhang X, Yan F, Ye Z, Shen C, Li B, Guo K, Liu N, Thyssen GN, Fang DD, Lindsey K, Zhang X, Wang M, Tu L (2020) Combined GWAS and eQTL analysis uncovers a genetic regulatory network orchestrating the initiation of secondary cell wall development in cotton. NEW PHYTOL 226:1738-1752\u003c/li\u003e\n\u003cli\u003eLiggett LA, Cato LD, Weinstock JS, Zhang Y, Nouraie SM, Gladwin MT, Garrett ME, Ashley-Koch A, Telen MJ, Custer B, Kelly S, Dinardo CL, Sabino EC, Loureiro P, Carneiro-Proietti AB, Maximo C, Reiner AP, Abecasis GR, Williams DA, Natarajan P, Bick AG, Sankaran VG (2022) Clonal hematopoiesis in sickle cell disease. J CLIN INVEST 132:138\u003c/li\u003e\n\u003cli\u003eLu R, Li Y, Zhang J, Wang Y, Zhang J, Li Y, Zheng Y, Li XB (2022) The bHLH/HLH transcription factors GhFP2 and GhACE1 antagonistically regulate fiber elongation in cotton. PLANT PHYSIOL 189:628-643\u003c/li\u003e\n\u003cli\u003eLu R, Zhang J, Liu D, Wei YL, Wang Y, Li XB (2018) Characterization of bHLH/HLH genes that are involved in brassinosteroid (BR) signaling in fiber development of cotton (Gossypium hirsutum). BMC PLANT BIOL 18:304\u003c/li\u003e\n\u003cli\u003eM L, J P, V G, D J, P B, V G, M F, M M, C C, C R (2005) Changes in transcriptional profiles are associated with early fruit tissue specialization in tomato. 
PLANT PHYSIOL 139:750-769\u003c/li\u003e\n\u003cli\u003eMa Z, He S, Wang X, Sun J, Zhang Y, Zhang G, Wu L, Li Z, Liu Z, Sun G, Yan Y, Jia Y, Yang J, Pan Z, Gu Q, Li X, Sun Z, Dai P, Liu Z, Gong W, Wu J, Wang M, Liu H, Feng K, Ke H, Wang J, Lan H, Wang G, Peng J, Wang N, Wang L, Pang B, Peng Z, Li R, Tian S, Du X (2018) Resequencing a core collection of upland cotton identifies genomic variation and loci influencing fiber quality and yield. NAT GENET 50:803-813\u003c/li\u003e\n\u003cli\u003eMa Z, Zhang Y, Wu L, Zhang G, Sun Z, Li Z, Jiang Y, Ke H, Chen B, Liu Z, Gu Q, Wang Z, Wang G, Yang J, Wu J, Yan Y, Meng C, Li L, Li X, Mo S, Wu N, Ma L, Chen L, Zhang M, Si A, Yang Z, Wang N, Wu L, Zhang D, Cui Y, Cui J, Lv X, Li Y, Shi R, Duan Y, Tian S, Wang X (2021) High-quality genome assembly and resequencing of modern cotton cultivars provide resources for crop improvement. NAT GENET 53:1385-1391\u003c/li\u003e\n\u003cli\u003eMcKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. GENOME RES 20:1297-1303\u003c/li\u003e\n\u003cli\u003eMeuwissen TH, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. GENETICS 157:1819-1829\u003c/li\u003e\n\u003cli\u003eMir ZA, Chandra T, Saharan A, Budhlakoti N, Mishra DC, Saharan MS, Mir RR, Singh AK, Sharma S, Vikas VK, Kumar S (2023) Recent advances on genome-wide association studies (GWAS) and genomic selection (GS); prospects for Fusarium head blight research in Durum wheat. MOL BIOL REP 50:3885-3901\u003c/li\u003e\n\u003cli\u003eMontesinos-Lopez A, Crespo-Herrera L, Dreisigacker S, Gerard G, Vitale P, Saint PC, Govindan V, Tarekegn ZT, Flores MC, Perez-Rodriguez P, Ramos-Pulido S, Lillemo M, Li H, Montesinos-Lopez OA, Crossa J (2024) Deep learning methods improve genomic prediction of wheat breeding. 
FRONT PLANT SCI 15:1324090\u003c/li\u003e\n\u003cli\u003eNie X, Wen T, Shao P, Tang B, Nuriman-Guli A, Yu Y, Du X, You C, Lin Z (2020) High-density genetic variation maps reveal the correlation between asymmetric interspecific introgressions and improvement of agronomic traits in Upland and Pima cotton varieties developed in Xinjiang, China. PLANT J 103:677-689\u003c/li\u003e\n\u003cli\u003ePaterson AH, Wendel JF, Gundlach H, Guo H, Jenkins J, Jin D, Llewellyn D, Showmaker KC, Shu S, Udall J, Yoo MJ, Byers R, Chen W, Doron-Faigenboim A, Duke MV, Gong L, Grimwood J, Grover C, Grupp K, Hu G, Lee TH, Li J, Lin L, Liu T, Marler BS, Page JT, Roberts AW, Romanel E, Sanders WS, Szadkowski E, Tan X, Tang H, Xu C, Wang J, Wang Z, Zhang D, Zhang L, Ashrafi H, Bedon F, Bowers JE, Brubaker CL, Chee PW, Das S, Gingle AR, Haigler CH, Harker D, Hoffmann LV, Hovav R, Jones DC, Lemke C, Mansoor S, Ur RM, Rainville LN, Rambani A, Reddy UK, Rong JK, Saranga Y, Scheffler BE, Scheffler JA, Stelly DM, Triplett BA, Van Deynze A, Vaslin MF, Waghmare VN, Walford SA, Wright RJ, Zaki EA, Zhang T, Dennis ES, Mayer KF, Peterson DG, Rokhsar DS, Wang X, Schmutz J (2012) Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. NATURE 492:423-427\u003c/li\u003e\n\u003cli\u003ePedersen BS, Quinlan AR (2018) Mosdepth: quick coverage calculation for genomes and exomes. BIOINFORMATICS 34:867-868\u003c/li\u003e\n\u003cli\u003ePertea M, Kim D, Pertea GM, Leek JT, Salzberg SL (2016) Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. NAT PROTOC 11:1650-1667\u003c/li\u003e\n\u003cli\u003eQu Z, Wu Y, Hu D, Li T, Liang H, Ye F, Xue J, Xu S (2022) Genome-Wide Association Analysis for Candidate Genes Contributing to Kernel-Related Traits in Maize. 
FRONT PLANT SCI 13:872292\u003c/li\u003e\n\u003cli\u003eRabieyan E, Bihamta MR, Moghaddam ME, Mohammadi V, Alipour H (2022) Genome-wide association mapping and genomic prediction of agronomical traits and breeding values in Iranian wheat under rain-fed and well-watered conditions. BMC GENOMICS 23:831\u003c/li\u003e\n\u003cli\u003eDasani SH, Thaker VS (2006) Role of abscisic acid in cotton fiber development. Russian Journal of Plant Physiology 53:62-67\u003c/li\u003e\n\u003cli\u003eSong X, Zhu G, Su X, Yu Y, Duan Y, Wang H, Shang X, Xu H, Chen Q, Guo W (2024) Combined genome and transcriptome analysis of elite fiber quality in Gossypium barbadense. PLANT PHYSIOL 195:2158-2175\u003c/li\u003e\n\u003cli\u003eSpiess GM, Hausman A, Yu P, Cohen JD, Rampey RA, Zolman BK (2014) Auxin Input Pathway Disruptions Are Mitigated by Changes in Auxin Biosynthetic Gene Expression in Arabidopsis. PLANT PHYSIOL 165:1092-1104\u003c/li\u003e\n\u003cli\u003eSreedasyam A, Lovell JT, Mamidi S, Khanal S, Jenkins JW, Plott C, Bryan KB, Li Z, Shu S, Carlson J, Goodstein D, De Santiago L, Kirkbride RC, Calleja S, Campbell T, Koebernick JC, Dever JK, Scheffler JA, Pauli D, Jenkins JN, Mccarty JC, Williams M, Boston L, Webber J, Udall JA, Chen ZJ, Bourland F, Stiller WN, Saski CA, Grimwood J, Chee PW, Jones DC, Schmutz J (2024) Genome resources for three modern cotton lines guide future breeding efforts. NAT PLANTS 10:1039-1051\u003c/li\u003e\n\u003cli\u003eStrader LC, Culler AH, Cohen JD, Bartel B (2010) Conversion of endogenous indole-3-butyric acid to indole-3-acetic acid drives cell expansion in Arabidopsis seedlings. PLANT PHYSIOL 153:1577-1586\u003c/li\u003e\n\u003cli\u003eSu J, Wang C, Ma Q, Zhang A, Shi C, Liu J, Zhang X, Yang D, Ma X (2020) An RTM-GWAS procedure reveals the QTL alleles and candidate genes for three yield-related traits in upland cotton. 
BMC PLANT BIOL 20:416\u003c/li\u003e\n\u003cli\u003eSu X, Zhu G, Song X, Xu H, Li W, Ning X, Chen Q, Guo W (2020) Genome-wide association analysis reveals loci and candidate genes involved in fiber quality traits in sea island cotton (Gossypium barbadense). BMC PLANT BIOL 20:289\u003c/li\u003e\n\u003cli\u003eSun Z, Wang X, Liu Z, Gu Q, Zhang Y, Li Z, Ke H, Yang J, Wu J, Wu L, Zhang G, Zhang C, Ma Z (2017) Genome-wide association study discovered genetic variation and candidate genes of fibre quality traits in Gossypium hirsutum L. PLANT BIOTECHNOL J 15:982-996\u003c/li\u003e\n\u003cli\u003eTechnow F, Burger A, Melchinger AE (2013) Genomic prediction of northern corn leaf blight resistance in maize with combined or separated training sets for heterotic groups. G3 (Bethesda) 3:197-203\u003c/li\u003e\n\u003cli\u003eVelez-Torres M, Jesus Garcia-Zavala J, Hernandez-Rodriguez M, Lobato-Ortiz R, Jesus Lopez-Reynoso J, Benitez-Riquelme I, Apolinar Mejia-Contreras J, Esquivel-Esquivel G, Domingo Molina-Galan J, Perez-Rodriguez P, Zhang X (2018) Genomic prediction of the general combining ability of maize lines (Zea mays L.) and the performance of their single crosses. PLANT BREEDING 137:379-387\u003c/li\u003e\n\u003cli\u003eWang K, Abid MA, Rasheed A, Crossa J, Hearne S, Li H (2023) DNNGP, a deep neural network-based method for genomic prediction using multi-omics data in plants. MOL PLANT 16:279-293\u003c/li\u003e\n\u003cli\u003eWang K, Li M, Hakonarson H (2010) ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. NUCLEIC ACIDS RES 38:e164\u003c/li\u003e\n\u003cli\u003eWang L, Cheng H, Xiong F, Ma S, Zheng L, Song Y, Deng K, Wu H, Li F, Yang Z (2020) Comparative phosphoproteomic analysis of BR-defective mutant reveals a key role of GhSK13 in regulating cotton fiber development. 
SCI CHINA LIFE SCI 63:1905-1917\u003c/li\u003e\n\u003cli\u003eWang L, Wang X, Maimaitiaili B, Kafle A, Khan KS, Feng G (2021) Breeding Practice Improves the Mycorrhizal Responsiveness of Cotton (Gossypium spp. L.). FRONT PLANT SCI 12:780454\u003c/li\u003e\n\u003cli\u003eWang M, Tu L, Lin M, Lin Z, Wang P, Yang Q, Ye Z, Shen C, Li J, Zhang L, Zhou X, Nie X, Li Z, Guo K, Ma Y, Huang C, Jin S, Zhu L, Yang X, Min L, Yuan D, Zhang Q, Lindsey K, Zhang X (2017) Asymmetric subgenome selection and cis-regulatory divergence during cotton domestication. NAT GENET 49:579-587\u003c/li\u003e\n\u003cli\u003eWang M, Tu L, Yuan D, Zhu D, Shen C, Li J, Liu F, Pei L, Wang P, Zhao G, Ye Z, Huang H, Yan F, Ma Y, Zhang L, Liu M, You J, Yang Y, Liu Z, Huang F, Li B, Qiu P, Zhang Q, Zhu L, Jin S, Yang X, Min L, Li G, Chen LL, Zheng H, Lindsey K, Lin Z, Udall JA, Zhang X (2019) Reference genome sequences of two cultivated allotetraploid cottons, Gossypium hirsutum and Gossypium barbadense. NAT GENET 51:224-229\u003c/li\u003e\n\u003cli\u003eWang N, Li Y, Chen YH, Lu R, Zhou L, Wang Y, Zheng Y, Li XB (2021) Phosphorylation of WRKY16 by MPK3-1 is essential for its transcriptional activity during fiber initiation and elongation in cotton (Gossypium hirsutum). PLANT CELL 33:2736-2752\u003c/li\u003e\n\u003cli\u003eWang N, Li Y, Meng Q, Chen M, Wu M, Zhang R, Xu Z, Sun J, Zhang X, Nie X, Yuan D, Lin Z (2023) Genome and haplotype provide insights into the population differentiation and breeding improvement of Gossypium barbadense. J ADV RES 54:15-27\u003c/li\u003e\n\u003cli\u003eWang N, Wang H, Zhang A, Liu Y, Yu D, Hao Z, Ilut D, Glaubitz JC, Gao Y, Jones E, Olsen M, Li X, San Vicente F, Prasanna BM, Crossa J, Perez-Rodriguez P, Zhang X (2020) Genomic prediction across years in a maize doubled haploid breeding program to accelerate early-stage testcross testing. 
Theoretical and Applied Genetics: International Journal of Breeding Research and Cell Genetics 133:2869-2879\u003c/li\u003e\n\u003cli\u003eWechter WP, Levi A, Harris KR, Davis AR, Fei Z, Katzir N, Giovannoni JJ, Salman-Minkov A, Hernandez A, Thimmapuram J, Tadmor Y, Portnoy V, Trebitsh T (2008) Gene expression in developing watermelon fruit. BMC GENOMICS 9:275\u003c/li\u003e\n\u003cli\u003eWerner CR, Qian L, Voss-Fels KP, Abbadi A, Leckband G, Frisch M, Snowdon RJ (2018) Genome-wide regression models considering general and specific combining ability predict hybrid performance in oilseed rape with similar accuracy regardless of trait architecture. Theoretical and Applied Genetics: International Journal of Breeding Research and Cell Genetics 131:299-317\u003c/li\u003e\n\u003cli\u003eXu Y, Zhang X, Li H, Zheng H, Zhang J, Olsen MS, Varshney RK, Prasanna BM, Qian Q (2022) Smart breeding driven by big data, artificial intelligence, and integrated genomic-enviromic prediction. MOL PLANT 15:1664-1695\u003c/li\u003e\n\u003cli\u003eYang AQ, Chen B, Ran ML, Yang GM, Zeng C (2020) The application of genomic selection in pig cross breeding. Yi Chuan 42:145-152\u003c/li\u003e\n\u003cli\u003eYang Y, Lai W, Long L, Gao W, Xu F, Li P, Zhou S, Ding Y, Hu H (2023) Comparative proteomic analysis identified proteins and the phenylpropanoid biosynthesis pathway involved in the response to ABA treatment in cotton fiber development. Sci Rep 13:1488\u003c/li\u003e\n\u003cli\u003eYu J, Hui Y, Chen J, Yu H, Gao X, Zhang Z, Li Q, Zhu S, Zhao T (2021) Whole-genome resequencing of 240 Gossypium barbadense accessions reveals genetic variation and genes associated with fiber strength and lint percentage. 
THEOR APPL GENET 134:3249-3261\u003c/li\u003e\n\u003cli\u003eYuan D, Tang Z, Wang M, Gao W, Tu L, Jin X, Chen L, He Y, Zhang L, Zhu L, Li Y, Liang Q, Lin Z, Yang X, Liu N, Jin S, Lei Y, Ding Y, Li G, Ruan X, Ruan Y, Zhang X (2015) The genome sequence of Sea-Island cotton (Gossypium barbadense) provides insights into the allopolyploidization and development of superior spinnable fibres. Sci Rep 5:17662\u003c/li\u003e\n\u003cli\u003eZafar MM, Jia X, Shakeel A, Sarfraz Z, Manan A, Imran A, Mo H, Ali A, Youlu Y, Razzaq A, Iqbal MS, Ren M (2021) Unraveling Heat Tolerance in Upland Cotton (Gossypium hirsutum L.) Using Univariate and Multivariate Analysis. FRONT PLANT SCI 12:727835\u003c/li\u003e\n\u003cli\u003eZaidi SS, Naqvi RZ, Asif M, Strickler S, Shakir S, Shafiq M, Khan AM, Amin I, Mishra B, Mukhtar MS, Scheffler BE, Scheffler JA, Mueller LA, Mansoor S (2020) Molecular insight into cotton leaf curl geminivirus disease resistance in cultivated cotton (Gossypium hirsutum). PLANT BIOTECHNOL J 18:691-706\u003c/li\u003e\n\u003cli\u003eZhang A, Wang H, Beyene Y, Semagn K, Liu Y, Cao S, Cui Z, Ruan Y, Burgueno J, San VF, Olsen M, Prasanna BM, Crossa J, Yu H, Zhang X (2017) Effect of Trait Heritability, Training Population Size and Marker Density on Genomic Prediction Accuracy Estimation in 22 bi-parental Tropical Maize Populations. FRONT PLANT SCI 8:1916\u003c/li\u003e\n\u003cli\u003eZhang T, Hu Y, Jiang W, Fang L, Guan X, Chen J, Zhang J, Saski CA, Scheffler BE, Stelly DM, Hulse-Kemp AM, Wan Q, Liu B, Liu C, Wang S, Pan M, Wang Y, Wang D, Ye W, Chang L, Zhang W, Song Q, Kirkbride RC, Chen X, Dennis E, Llewellyn DJ, Peterson DG, Thaxton P, Jones DC, Wang Q, Xu X, Zhang H, Wu H, Zhou L, Mei G, Chen S, Tian Y, Xiang D, Li X, Ding J, Zuo Q, Tao L, Liu Y, Li J, Lin Y, Hui Y, Cao Z, Cai C, Zhu X, Jiang Z, Zhou B, Guo W, Li R, Chen ZJ (2015) Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. 
NAT BIOTECHNOL 33:531-537\u003c/li\u003e\n\u003cli\u003eZhang W, Lin K, Fu W, Xie J, Fan X, Zhang M, Luo H, Yin Y, Guo Q, Huang H, Chen T, Lin X, Yuan Y, Huang C, Du S (2024) Insights for the Captive Management of South China Tigers Based on a Large-Scale Genetic Survey. Genes (Basel) 15:398\u003c/li\u003e\n\u003cli\u003eZhang X, Perez-Rodriguez P, Semagn K, Beyene Y, Babu R, Lopez-Cruz MA, San VF, Olsen M, Buckler E, Jannink JL, Prasanna BM, Crossa J (2015) Genomic prediction in biparental tropical maize populations in water-stressed and well-watered environments using low-density and GBS SNPs. Heredity (Edinb) 114:291-299\u003c/li\u003e\n\u003cli\u003eZhao H, Wang Y, Zhao S, Fu Y, Zhu L (2021) HOMEOBOX PROTEIN 24 mediates the conversion of indole-3-butyric acid to indole-3-acetic acid to promote root hair elongation. NEW PHYTOL 232:2057-2070\u003c/li\u003e\n\u003cli\u003eZhao N, Wang W, Grover CE, Jiang K, Pan Z, Guo B, Zhu J, Su Y, Wang M, Nie H, Xiao L, Guo A, Yang J, Cheng C, Ning X, Li B, Xu H, Adjibolosoo D, Aierxi A, Li P, Geng J, Wendel JF, Kong J, Hua J (2022) Genomic and GWAS analyses demonstrate phylogenomic relationships of Gossypium barbadense in China and selection for fibre length, lint percentage and Fusarium wilt resistance. PLANT BIOTECHNOL J 20:691-710\u003c/li\u003e\n\u003cli\u003eZhou Z, Guan H, Liu C, Zhang Z, Geng S, Qin M, Li W, Shi X, Dai Z, Lei Z, Wu Z, Tian B, Hou J (2021) Identification of genomic regions affecting grain peroxidase activity in bread wheat using genome-wide association study. 
BMC PLANT BIOL 21:523\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":true,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"theoretical-and-applied-genetics","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"taag","sideBox":"Learn more about [Theoretical and Applied Genetics](https://www.springer.com/journal/122)","snPcode":"122","submissionUrl":"https://submission.nature.com/new-submission/122/3","title":"Theoretical and Applied Genetics","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"Gossypium barbadense, GWAS, yield, plant morphological, fiber quality, GS","lastPublishedDoi":"10.21203/rs.3.rs-5667934/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5667934/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"Sea Island cotton (Gossypium barbadense), a premier tetraploid cotton species, is globally renowned for its fibers, which exhibit thermal expansion and contraction properties similar to those of animal fibers such as cashmere. Despite its significance, there remains a limited understanding of how genes influence primary traits across germplasms and the relationship between predictive factors identified through genomic selection (GS) technology and heritability. This study aimed to address this academic gap. A total of 203 Sea Island cotton accessions were incorporated for resequencing. 
Population evolution analysis revealed three distinct groups, which were largely shaped by geographical distribution and breeding objectives. A genome-wide association study (GWAS) was then performed on 15 traits related to yield, fiber quality, and plant morphology, identifying a greater number of loci for fiber quality traits, which also exhibited higher broad-sense heritability. Transcriptomic and gene expression analyses identified six key genes involved in regulating fiber length (GB_A05G1764 and GB_A05G1761), fiber micronaire (GB_A05G1895 and GB_A05G1771), and fiber elongation (GB_A05G1702 and GB_A05G1707). Furthermore, geographical and temporal analyses indicated that these traits underwent directional selection in Sea Island cotton. In addition, this study explored the effects of marker density and population size on GS prediction accuracy, finding that traits with higher broad-sense heritability, such as fiber quality traits, achieved higher prediction accuracy, while those with lower broad-sense heritability, such as plant morphological traits, showed reduced accuracy. 
This study provides an important reference for future GS breeding, in addition to deepening the scientific understanding of the genetic evolution of cotton","manuscriptTitle":"GWAS and GS analysis revealed the selection and prediction efficiency for yield, plant morphological, and fiber quality in Gossypium barbadense","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-04-03 09:08:45","doi":"10.21203/rs.3.rs-5667934/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"reviewerAgreed","content":"","date":"2025-04-04T05:50:58+00:00","index":0,"fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-04-02T07:23:45+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-04-02T07:05:02+00:00","index":"","fulltext":""},{"type":"submitted","content":"Theoretical and Applied Genetics","date":"2025-04-01T08:47:21+00:00","index":"","fulltext":""},{"type":"decision","content":"Minor revisions","date":"2025-03-27T10:07:33+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"theoretical-and-applied-genetics","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"taag","sideBox":"Learn more about [Theoretical and Applied Genetics](https://www.springer.com/journal/122)","snPcode":"122","submissionUrl":"https://submission.nature.com/new-submission/122/3","title":"Theoretical and Applied Genetics","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"Springer Hybrid","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"b80cf90c-7e92-4ef6-ad32-fc64eb12a241","owner":[],"postedDate":"April 3rd, 
2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2025-06-16T16:03:10+00:00","versionOfRecord":{"articleIdentity":"rs-5667934","link":"https://doi.org/10.1007/s00122-025-04911-1","journal":{"identity":"theoretical-and-applied-genetics","isVorOnly":false,"title":"Theoretical and Applied Genetics"},"publishedOn":"2025-06-09 15:57:14","publishedOnDateReadable":"June 9th, 2025"},"versionCreatedAt":"2025-04-03 09:08:45","video":"","vorDoi":"10.1007/s00122-025-04911-1","vorDoiUrl":"https://doi.org/10.1007/s00122-025-04911-1","workflowStages":[]},"version":"v1","identity":"rs-5667934","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5667934","identity":"rs-5667934","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
